Chat GPT AI


Lambourne

Ahn'Qiraj Raider
2,905
6,928
So at what point do we start telling all the coders to #Learn2Journalism?

Talking heads' days are numbered too: this new tool generates facial expressions for conversations from a single image. It was created by ByteDance, the owner of TikTok. If you think social media is bad now, imagine unleashing a few billion artificial humans onto it.


 

Lambourne

Ahn'Qiraj Raider
2,905
6,928
o3 also scored 25% on FrontierMath, a benchmark specifically made to test mathematical ability at an extreme level. Only part of it is public, so the private part can't be in any training data. Sample problems below. Remember when we were laughing at LLMs getting simple arithmetic wrong because of hallucination?

1734862936719.png
 

Leaton

Trakanon Raider
132
92
Yeah, only 175th best in coding is still very impressive, just give it a couple more versions and it should hit #1.

Once it hits #1, it would probably still lose to a collaboration of top-100 devs. Still, give it a couple more years, maybe until 2030, and then even a top-100 dev group collaboration won't be able to beat it anymore.

That is very exciting: you can ditch Windows, have your own custom apps built to your own spec, your own browser, etc. Basically, own your software and go API for all apps from that point onwards.

I have applied for safety testing of o3, but I do not think I will get in at all (I only have an Org ID thanks to a business tax file number, but nothing more in the area of AI :( ). It does not hurt to at least try, yeah?

For those who have a business tax number, can get an Org ID, and would like to try applying -> Early access for safety testing
 

Mist

REEEEeyore
<Gold Donor>
31,365
23,787
The version of the model that hit the 87.5% ARC-AGI score costs roughly $5000 in electricity per prompt and works on standardized logic problems. It also trained on 75% of the questions in the eval ahead of time, as OpenAI themselves admitted. It likely did the same with the Codeforces questions. Each question in either evaluation required careful handcrafted prompts (aka semantic computer programming) to get an answer.

The ARC-AGI eval looks like this:

1734963832583.png

1734963841774.png


This is only AGI by redefining the word AGI to mean "really good at standardized tests." Meeting the criteria for actual AGI requires a highly autonomous system capable of doing any work that a human could do. This is not that. o3 is at best a semantic programming language/interpreter for solving standardized problems using a massive amount of compute.

1734964279549.png
 

Daidraco

Avatar of War Slayer
10,329
10,738
Just a general question: who is this "Mike Knoop" guy that these pictures originate from? I see that he's a co-founder of stuff, but I don't know what he's looked up to for.