News Drop #10 - February 26, 2025

February 26, 2025

Hey guys, I have to make these on Wednesdays now because of scheduling conflicts. Doesn't mean they'll be any more on time though!

Starting off as always with OpenAI:

OpenAI has continued the rollout of its agentic AIs, Deep Research (a name first coined by Google) and Operator. These are agents designed to do research or perform tasks independently, allowing one human to supervise an army of AIs working in parallel. These kinds of tools continue to automate aspects of human jobs, making research much quicker and even allowing autonomous agents to operate computers all on their own. If you remember, a few months back Sundar Pichai, CEO of Google, mentioned we are moving into the Agentic Era, where these models stop being just chatbots and start doing work all on their own.

Other companies have been working on agents as well. Anthropic has Claude Code (discussed shortly), and Rabbit (the team that made the R1 AI device) has released an Android app that allows an agent to control a smartphone. As we continue, it's likely that Pichai's claim will be proven more and more true, with these tools greatly enhancing the productivity of a single human.

Anthropic had a busy week:

Claude 3.7 Sonnet is here! This is the newest version of their Sonnet (mid/high end) model, and it's pretty impressive. Claude continues to outperform others in coding tests, and has shown impressive improvements across the board. Claude can now do extended thinking as well, which brings the kind of significant performance gains seen in other models that use the same technique. The maximum output length has also been hugely increased, with Claude now able to generate thousands of lines of code in a single go, making it a much more effective coder for complex projects.

Claude Code is also now available! This is a command line tool that can automatically create, edit, and delete files across your project. It's pretty easy to set up, and is incredibly good at software development. However, because it is billed through API tokens, it is pretty expensive to keep running, though still far cheaper than a human of the same skill level. Some engineers even set Claude up to play Pokémon!

Additionally, Anthropic continues to lead in the AI safety space, with Claude remaining one of the hardest models to jailbreak (they even go as far as to offer $10k to anyone who can). Claude does a much better job than others when deciding if/how to respond to sensitive issues, and Anthropic remains committed to only releasing models with a low risk of causing catastrophic harm, whereas OpenAI is willing to release models in the medium range.

They've also released the Anthropic Economic Index, which is a research project tracking how people are using AI. If you're interested, I highly recommend giving it a look!

Amazon has announced Alexa+, with smarter responses powered in part by Anthropic's Claude models.

DeepSeek has continued to surprise us:

DeepSeek kicked off their Open Source Week (somewhat reminiscent of the 12 Days of OpenAI from before the holidays) by starting to publish code from the infrastructure behind their models, including DeepSeek R1! This gives pretty much anyone more of the tools needed to follow in DeepSeek's footsteps, and it is likely the start of a whole new wave of AI breakthroughs!

xAI did a thing again:

xAI has released Grok 3! Grok is now topping the leaderboards in a wide range of areas, and is capable of both chain-of-thought reasoning and multimodal generation. This is a pretty surprising result from the xAI team, as they've beaten most of the established players in a pretty short amount of time. One point to note is that Grok's guardrails are much weaker than those of most other models, and xAI does not perform much safety testing. Grok is happy to create pictures of celebrities doing almost anything, and this irresponsible approach to development is likely to have negative consequences.

IBM:

IBM Granite 3.2 just dropped! While it's not nearly as performant as top models, it is incredibly efficient and easy to run locally, which makes it a pretty interesting choice for those looking to get into the local LLM space.

That's all for this week! News Drops come out on Wednesdays now. Probably Thursdays for those getting email updates!