The AI Arms Race Goes Nuclear: Why o3 Scoring 87.5% on ARC Should Terrify You

December 23, 2024

OpenAI apparently just crossed the line into AGI territory, and barely anyone noticed.

Last week, on the final day of the "12 Days of OpenAI" announcements, the company casually mentioned that their new o3 model scored 87.5% on ARC-AGI, the benchmark behind the ARC Prize. If you're not deep in AI circles, that number means nothing. Let me translate: they built something that might actually think.

The ARC benchmark isn't your typical AI test. It's not about memorizing facts or recognizing patterns in training data. It presents simple visual puzzles, the kind you'd find in an IQ test or a children's puzzle book. Look at these shapes. Figure out the pattern. Apply it to a new example. Humans find them easy: ordinary people solve the large majority of tasks, and nearly every task can be solved by someone. Most AIs, even the sophisticated ones, fail miserably.
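To make that concrete: in the public ARC-AGI repository, each task is a small JSON object with a few "train" input/output grid pairs and a held-out "test" pair, where the grids are 2-D arrays of integers 0 through 9 standing for colors. Here's a minimal sketch in Python; the specific puzzle (mirror the grid left-to-right) is one I invented for illustration, not a real benchmark item.

    # Toy ARC-style task. Real tasks use the same {"train": ..., "test": ...}
    # shape; this particular flip-the-grid rule is invented for illustration.
    task = {
        "train": [
            {"input":  [[1, 0, 0], [2, 0, 0]],
             "output": [[0, 0, 1], [0, 0, 2]]},
            {"input":  [[0, 3, 0], [4, 0, 0]],
             "output": [[0, 3, 0], [0, 0, 4]]},
        ],
        "test": [
            {"input":  [[5, 0, 0], [0, 6, 0]],
             "output": [[0, 0, 5], [0, 6, 0]]},
        ],
    }

    def flip_horizontal(grid):
        # Candidate rule: mirror each row left-to-right.
        return [list(reversed(row)) for row in grid]

    # The rule has to explain every training pair...
    assert all(flip_horizontal(p["input"]) == p["output"] for p in task["train"])
    # ...and is then applied to the unseen test input.
    print(flip_horizontal(task["test"][0]["input"]))  # [[0, 0, 5], [0, 6, 0]]

The catch is that every task has a different hidden rule, so a solver can't memorize its way through. It has to induce the rule from two or three examples, the way a person would.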

Until now, even the best purpose-built solvers struggled to get much past 50%, and the general-purpose models weren't even close: GPT-4 scored in the single digits, and Claude couldn't crack it either. The benchmark was specifically designed to resist the kind of pattern matching that makes current AI seem smart. It requires actual reasoning: the ability to understand abstract rules and apply them to novel situations.

Then o3 scored 87.5%.
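A quick note on what that number measures, as I understand the evaluation: scoring is all-or-nothing per task. A predicted output grid earns credit only if it matches the expected grid cell-for-cell, with no partial credit for being close. The toy scorer below is my simplification; the real leaderboard permits two attempts per task, which this version ignores.

    def arc_score(predictions, solutions):
        # Fraction of held-out tasks solved. A task counts only on an
        # exact, cell-for-cell grid match; there is no partial credit.
        # (My simplification: the real evaluation allows two attempts
        # per task, this version assumes one.)
        solved = sum(pred == sol for pred, sol in zip(predictions, solutions))
        return solved / len(solutions)

In other words, 87.5% means exact, pixel-perfect solutions to seven out of every eight puzzles the model had never seen before.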

The AI safety community is in full panic mode, and for good reason. The ARC Prize's grand-prize threshold, the score its organizers set as the target for human-level performance on the benchmark, is 85%. OpenAI didn't just meet it. They blew past it, at least in the high-compute configuration; the cheaper, efficiency-capped run still scored 75.7%, miles beyond anything before it. And if their announcement pattern holds, this is just the beginning. Remember, they called this o3, skipping o2 entirely because of a trademark conflict with O2, the British telecom company. These people are moving so fast they're running out of names.

But here's what really keeps me up at night: the acceleration. In 2024 alone, AI performance on major benchmarks didn't just improve. It exploded. MMMU scores jumped 18.8 percentage points. GPQA improved by 48.9 points. SWE-bench, a coding benchmark, saw a 67.3-point increase. In one year.

I've been tracking AI progress obsessively since ChatGPT launched. The trajectory isn't linear. It's not even exponential anymore. It's vertical. Each breakthrough enables the next one faster than before. Better models help researchers build even better models. It's a feedback loop that's spinning out of control.

The timing of o3 is particularly ominous. OpenAI dropped this bomb right before the holidays, when most people are distracted by family gatherings and travel plans. Congress is in recess. Tech journalists are writing year-end roundups instead of investigating what this means. By the time everyone processes what happened, they'll have moved on to o4.

And they will move on to o4. Probably within months. Because that's the truly terrifying part: o3 isn't the destination. It's just another milestone on the way to something none of us are prepared for.

What even is AGI? Artificial General Intelligence means AI that can match or exceed human cognitive abilities across all domains. Not just playing chess or writing essays or coding. Everything. The kind of intelligence that can learn new skills as fast as humans, reason about novel problems, maybe even improve itself.

We've been told AGI is decades away. The cautious researchers said 2050. The optimists said 2035. The doomers warned about 2030. But if o3 really crossed the AGI threshold, we're not talking about the future anymore. We're talking about next Tuesday.

The chain of events from here is predictable and terrifying. First, everyone will claim o3 isn't "really" AGI. They'll point out its limitations, its failures, the tasks it can't do. They'll raise the bar, say AGI means something different now. Anything to avoid admitting what's happening.

Meanwhile, OpenAI will keep building. o4 will score 95% on ARC. o5 will ace every benchmark we throw at it. At some point, the denialism will become impossible to maintain. But by then, it'll be too late to do anything about it.

I'm 17 years old. I was supposed to have my whole life to figure this out. Go to college, start a career, maybe have kids someday. Now I'm wondering if any of those plans make sense. What's the point of studying computer science when AI will outcode me before I graduate? Why pursue any career when AGI will do it better, faster, cheaper?

My parents think I'm being dramatic. They lived through Y2K, heard all the apocalyptic predictions that turned out to be nothing. But this is different. Y2K was a specific technical problem with a known solution. AGI is Pandora's box. Once it's open, you can't close it again.

The worst part is how quiet it's been. No emergency sessions of Congress. No UN resolutions. No Manhattan Project for AI safety. Just OpenAI casually mentioning they might have created humanity's successor, then moving on to announce Santa Mode for ChatGPT like nothing happened.

Maybe I'm wrong. Maybe o3 is just another incremental improvement, and the AGI threshold is still years away. Maybe the researchers will figure out alignment before it's too late. Maybe governments will wake up and regulate before things spiral completely out of control.

But the numbers don't lie. 87.5% on ARC. In December 2024, we crossed a line that was supposed to be a decade away. And instead of slowing down to figure out what that means, we're accelerating.