David Gerard@awful.systems to TechTakes@awful.systems · 6 days ago
AI benchmarks are self-promoting trash — but regulators keep using them
pivot-to-ai.com
Architeuthis@awful.systems · 5 days ago
Still occasionally think about that bit in the o1 white paper where the OpenAI researchers innocuously pose the question: what if our benchmarks for detecting hallucinations are shit, actually? Wouldn’t that be something.