From: Epoch AI
Will We Run Out of Data to Train Large Language Models?
https://epochai.org/blog/will-we-run-out-of-data-limits-of-llm-scaling-based-on-human-generated-data
Tagged with: data, trends, training data
We estimate the stock of human-generated public text at around 300 trillion tokens. If trends continue, language models will fully utilize this stock between 2026 and 2032, or even earlier if models are intensely overtrained.
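
To make the extrapolation concrete, here is a minimal sketch of the reasoning, not the paper's model: it assumes a base-year training set of roughly 15 trillion tokens growing about 2.5x per year (both figures are illustrative assumptions, not the paper's fitted values) and compounds until the estimated ~300 trillion token stock is exhausted.

```python
# Illustrative sketch of the trend-extrapolation argument.
# Assumed parameters -- NOT the paper's fitted values:
STOCK_TOKENS = 300e12   # estimated stock of human-generated public text (from the quote)
BASE_YEAR = 2024        # assumed reference year
BASE_DATASET = 15e12    # assumed largest training set in the base year
GROWTH_PER_YEAR = 2.5   # assumed yearly growth factor in dataset size

year, dataset = BASE_YEAR, BASE_DATASET
while dataset < STOCK_TOKENS:
    year += 1
    dataset *= GROWTH_PER_YEAR

# Under these assumptions the largest dataset reaches the stock around 2028,
# inside the paper's 2026-2032 window.
print(f"Stock fully utilized around {year} ({dataset:.1e} tokens)")
```

Overtraining (training on more tokens per parameter than compute-optimal scaling suggests) shifts the exhaustion date earlier, since each model consumes a larger share of the stock per run.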