No, OpenAI Strawberry isn’t imminent — but it sure trolled the AI doomers

David Gerard@awful.systems · 8 months ago

UnseriousAcademic@awful.systems · 8 months ago

Does this mean they’re not going to bother training a whole new model again? I was looking forward to seeing AI Mad Cow Disease after it consumed an Internet’s worth of AI generated content.

David Gerard@awful.systems · 8 months ago

I think they will do whatever gets more investor cash

anton@lemmy.blahaj.zone · 8 months ago

If you change the tokenizer you have to retrain from scratch, but you can do so with the old, unpolluted data.

It’s genius if you think about it:* you can waste energy and tell your investors it’s a new, better model, while staying upstream from the river you pollute.
* At least for consultants, compute providers and other middlemen.
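
A toy sketch of why swapping the tokenizer forces a retrain, assuming two made-up tokenizers (whitespace words vs. single characters) and a vocabulary built by order of first appearance. None of these helper names come from any real library; they are purely illustrative. The point is only that the same text maps to different integer ids, so whatever a model learned about "id 0" under one tokenizer means nothing under the other.

```python
def build_vocab(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    vocab = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def word_tokenize(text):
    # Hypothetical "old" tokenizer: split on whitespace.
    return text.split()


def char_tokenize(text):
    # Hypothetical "new" tokenizer: one token per character.
    return list(text)


corpus = "the cat sat on the mat"

word_vocab = build_vocab(word_tokenize(corpus))
char_vocab = build_vocab(char_tokenize(corpus))

sample = "the mat"
word_ids = [word_vocab[t] for t in word_tokenize(sample)]
char_ids = [char_vocab[t] for t in char_tokenize(sample)]

# Same text, different id sequences and different vocabulary sizes.
# Id 0 is the word "the" in one scheme and the letter "t" in the other,
# so an embedding row trained for id 0 does not carry over.
print(word_ids)  # [0, 4]
print(char_ids)  # [0, 1, 2, 3, 9, 5, 0]
```

Real tokenizers are far more elaborate than this, but the mismatch between learned weights and reassigned ids is the same basic problem.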

UnseriousAcademic@awful.systems · 8 months ago

I remember one time in a research project I switched out the tokeniser to see what impact it might have on my output. Spent about a day re-running and the difference was minimal. I imagine it’s wholly the same thing.

*Disclaimer: I don’t actually imagine it is wholly the same thing.

David Gerard@awful.systems · 8 months ago

there’s a research result that the precise tokeniser makes bugger all difference, it’s almost entirely the data you put in

because LLMs are lossy compression for text

froztbyte@awful.systems · 8 months ago

latent space go brrrr