Harm de Vries
Posts
Go smol or go home
The Chinchilla scaling laws suggest we haven’t reached the limit of training smaller models for longer.
Harm de Vries
Last updated on Jul 3, 2023
110 min read
In the long (context) run
It’s not the quadratic attention; it’s the lack of long pre-training data
Harm de Vries
Last updated on Sep 16, 2023
21 min read