News Drop #14 - March 26, 2025

March 26, 2025

Yeah guys, I totally forgot about the news drop. This one's on me.

Google just dropped a new version of Gemini:

Gemini 2.5 Pro came out yesterday, and it delivers pretty impressive performance across a wide range of tasks. It's a chain-of-thought model, which partially explains why its performance is so good.

One particular strength is Humanity's Last Exam (HLE). Gemini 2.5 Pro scored 18.8%, which is far higher than most of the other models currently available (<13%). HLE is not like the ARC Prize: it's designed to just be really hard for whoever attempts it. The mean human score is 0% (for 1000 questions), and if you'd like to give some of the problems a try, I've put them up on one of my sites here: https://kaileh.dev/hle

Another benchmark in this space is ARC-AGI-2. This is a new version that came out this year, and it's designed to assess a model's ability to reason like a human. On this one the average human can score 95%, whereas most models cannot score above ~10%. Here's a link to their site if you wanna try it out. It takes a few minutes to figure out the pattern, but once you've got one, the rest are pretty similar: https://arcprize.org/

ChatGPT can now generate images all on its own:

The new system is not a part of DALL-E. Image generation is now native to the LLM itself.

Performance is pretty solid, rivaling other tools like Imagen 3.
It's great at editing images, much better than competitors.
It's really strong at generating text inside images and staying coherent. This is an area where we've seen a lot of improvement recently, and it's really nice to see!
It can generate images of public figures, as long as they're adults. This is a move away from their old policies, and it definitely opens up some interesting possibilities.

Team blue and team green (Intel and Nvidia) are teaming up:

There's an as-yet-unconfirmed deal between Intel and Nvidia that would have the GPU titan shift away from Taiwan-based TSMC and toward Intel's fabrication facilities. TSMC is currently at the leading edge of chip manufacturing, with 90% of cutting-edge chips worldwide being made in their fabs. This would be a huge deal for Intel, which has seen its tech stagnate and its profits crumble. The deal is likely a result of the push to move manufacturing back to the US, but it's still unclear whether Intel has the tech that Nvidia needs for the world's most advanced chips.

DeepSeek is back to flex on all the US AI companies:

DeepSeek V3 was just released! It's about as powerful as Anthropic's Claude 3.5 Sonnet, which is definitely impressive performance for a model you can run locally. Unlike R1, this is NOT a chain-of-thought model, meaning it does no reasoning before it jumps into responding. This is a huge gain for local AI enthusiasts, as it's able to run at a healthy 20 tokens per second on a few thousand dollars of hardware (Mac Studio).

That's it for this week!