David Gerard@awful.systems (mod) to TechTakes@awful.systems · English · 2 months ago

AI benchmarks are self-promoting trash — but regulators keep using them

pivot-to-ai.com

Every new LLM and every new tweak to an old LLM has a press release bragging about how well it tests on some benchmark you’ve never heard of. Every new model is trained heavily to the previous tren…
  • Architeuthis@awful.systems · English · 5 points · 2 months ago

    Still occasionally think about that bit in the o1 white paper where the OpenAI researchers innocuously pose the question of what if our benchmarks for detecting hallucinations are shit, actually. Wouldn't that be something.
