
Adam Cochran (adamscochran.eth)|May 06, 2025 12:29
Being motivated to work on OCR for Sanskrit after working on GPT-5 makes me wonder if AI companies have run out of human-generated content for training.
The largest models have trained on every internet post and page, and every book in history, in every major deciphered language.
We have to be hitting not just a plateau but a ceiling soon.
This might be an indicator of that.
What comes after human training data is a massive problem in scaling the current LLM training approach.
As we've seen, recursively training on LLM-generated data that is only human-reviewed rather than human-produced yields worse and worse quality with each iteration, due to subtle artifacts that we as humans can't really perceive during screening at first.
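This failure mode is often called "model collapse." A minimal sketch of the mechanism, assuming a toy Gaussian "model" and an illustrative corpus size (none of these numbers come from the post): each generation is fit only to the previous generation's samples, so estimation noise compounds and diversity drains away.

```python
# Toy sketch of recursive-training degradation ("model collapse"):
# fit a Gaussian to a small corpus, regenerate the corpus by sampling
# from the fit, and repeat. Sampling noise compounds, and the learned
# distribution loses diversity (sigma drifts toward zero).
import numpy as np

rng = np.random.default_rng(42)
N = 50  # a small "corpus" per generation exaggerates the effect

# Generation 0: human-produced data, standard normal.
corpus = rng.normal(loc=0.0, scale=1.0, size=N)

for gen in range(301):
    mu, sigma = corpus.mean(), corpus.std()  # "train" on the corpus
    if gen % 50 == 0:
        print(f"gen {gen:3d}: mu={mu:+.3f}  sigma={sigma:.3f}")
    # The next generation trains only on the previous model's output.
    corpus = rng.normal(loc=mu, scale=sigma, size=N)
```

Over enough generations sigma collapses toward zero: the log of the fitted spread is a random walk with downward drift, which is the one-line version of why purely self-generated corpora degrade.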
The company that figures out how to keep improving models after hitting the corpus ceiling is likely to be the winner of AGI.