The Day AI Learned to Think: Inside the Chain-of-Thought Revolution

January 30, 2025

Watch an AI think. Actually watch it. Not the output, but the process. The stumbles, the self-corrections, the "wait, that doesn't make sense" moments. It's like peering into an alien mind as it puzzles through problems in real-time.

This is the quiet revolution that started in September 2024. We call it "chain-of-thought reasoning," which sounds technical and boring. But what it really means is that AI has learned to think out loud. And that changes everything.

Until that moment, AI was a black box. You asked a question, magic happened, an answer appeared. Good luck understanding how it got there. It was like talking to an oracle that might be brilliant or might be making things up, and you had no way to tell the difference.

Then OpenAI dropped a bombshell. On September 12, 2024, they released o1-preview, codenamed "Strawberry" internally (a nod to the question that tripped up previous AIs: counting the r's in "strawberry") and rumored to be the legendary "Q*" model that had caused so much drama during Sam Altman's brief ousting. o1 spends time "thinking" before it answers, making it better at complex reasoning, science, and programming than GPT-4o.

The AI world held its breath. This wasn't just another language model upgrade: o1 produces a long internal chain of thought before responding to the user. For the first time, we could watch AI show its work. Not just the final answer, but the entire reasoning process. Every step, every consideration, every moment of uncertainty.

The reactions were immediate and intense. The launch of o1-preview and o1-mini was hailed as a seismic shift, a game-changer poised to revolutionize the industry with its advanced reasoning and analytical power. Researchers who had spent years trying to crack the black box suddenly had a window into machine cognition.

What made o1 extraordinary wasn't just that it could reason; it was that it had learned to reason through trial and error, developing what researchers would later call "emergent reasoning behaviors." The AI literally taught itself how to think step by step, like a student discovering logic for the first time. It wasn't explicitly programmed to show uncertainty or self-correction. It learned these behaviors because they led to better answers.

Here's an example from my homework last night. I asked o1 to help with a tricky calculus problem. Instead of just spitting out an answer, it went:

"Let me think about this step by step. We have the integral of x^2 * e^x. I could use integration by parts. Actually, wait, I'll need to use it twice because of the x^2 term. Let me set u = x^2 and dv = e^x dx..."

Twenty lines later, after catching two errors and restarting once, it arrived at the correct answer. But the answer wasn't the point. The thinking was.
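For what it's worth, the approach o1 talked itself into does check out. Two rounds of integration by parts give:

```latex
\begin{aligned}
\int x^{2} e^{x}\,dx
  &= x^{2} e^{x} - 2\int x e^{x}\,dx
     && \text{(parts: } u = x^{2},\ dv = e^{x}\,dx\text{)} \\
  &= x^{2} e^{x} - 2\left(x e^{x} - \int e^{x}\,dx\right)
     && \text{(parts again: } u = x,\ dv = e^{x}\,dx\text{)} \\
  &= e^{x}\left(x^{2} - 2x + 2\right) + C
\end{aligned}
```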

This transparency was revolutionary for trust. When AI shows its reasoning, you can spot where it goes wrong. You can see when it's confident versus guessing. You can understand not just what it concluded but why. It's the difference between a friend explaining their logic and a stranger insisting they're right.

But here's where it gets weird. The more I watch AI think, the more I wonder: is this thinking? When an AI stops mid-calculation and says "wait, I need to reconsider this approach," what's actually happening? Is it genuinely reconsidering, or performing an elaborate mime of reconsideration?

The philosophical rabbit hole goes deep. These models developed something that looks remarkably like metacognition—thinking about thinking—without anyone telling them to. Some researchers argue it's all pattern matching. The AI learned that successful reasoning often includes phrases like "let me reconsider" and "actually, that's wrong," so it includes them. But that explanation feels increasingly hollow as the reasoning gets more sophisticated. When an AI catches subtle errors in its own logic that humans miss, it's hard to dismiss as mere imitation.

The practical implications are massive. Chain-of-thought reasoning makes AI vastly more capable at complex tasks. Problems that required multiple steps, backtracking, or nuanced judgment suddenly become solvable. It's like the difference between a calculator and a mathematician.
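You can see the gap for yourself with a few lines of Python. Here's a minimal sketch, assuming the OpenAI Python SDK and the gpt-4o and o1-preview model names (the word problem itself is just an illustration): the same multi-step question goes to a standard model and to a reasoning model.

```python
# Minimal sketch: compare a standard model with a reasoning model on a
# multi-step problem. Assumes the OpenAI Python SDK and an OPENAI_API_KEY
# in the environment; the model names and the prompt are illustrative.
from openai import OpenAI

client = OpenAI()

problem = (
    "A train leaves the station at 14:05 averaging 80 km/h. A second train "
    "leaves the same station at 14:35 averaging 110 km/h on a parallel track. "
    "At what time does the second train catch the first?"
)

for model in ("gpt-4o", "o1-preview"):
    # o1-preview spends hidden "thinking" tokens working through the steps
    # before it answers; gpt-4o answers directly unless you explicitly ask
    # it to reason step by step.
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": problem}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```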

In coding, this is transformative. Instead of generating broken functions and hoping for the best, AI can now debug its own code as it writes. "This function should handle edge cases... wait, what if the input is negative? Let me add a check for that." It's not perfect, but it's getting eerily close to how human programmers work.
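To make that concrete, here's the kind of guard that shows up after one of those mid-thought corrections. This is a hypothetical sketch, not actual o1 output:

```python
import math


def safe_sqrt(value: float) -> float:
    """Return the square root of value, rejecting invalid input."""
    # The edge-case check a reasoning model adds after asking itself
    # "wait, what if the input is negative?"
    if value < 0:
        raise ValueError(f"cannot take the square root of a negative number: {value}")
    return math.sqrt(value)
```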

Education is being revolutionized too. AI tutors don't just give answers anymore. They work through problems alongside students, showing every step, explaining why each decision matters. It's like having an infinitely patient teacher who thinks out loud.

OpenAI's breakthrough opened the floodgates. Google scrambled to release Gemini 2.0 Flash Thinking in December 2024. Alibaba's Qwen team launched QwQ. But none of the alternative "reasoning" or "thinking" models came close to matching o1 until January 2025, when DeepSeek stunned everyone with R1.

DeepSeek R1 didn't just match o1; on some benchmarks, it surpassed it. But what made R1 truly revolutionary wasn't its performance. It was that DeepSeek made it completely open source: the weights, the training methods, the research papers. Everything.

Suddenly, the chain-of-thought revolution wasn't controlled by a single company. R1 had been out for only a few days, and already more than 500 derivative models had appeared on Hugging Face, with over 2.5 million downloads worldwide. The thinking AI was democratized.
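That democratization is literal: anyone with a GPU can pull the weights and watch the reasoning stream out locally. Here's a minimal sketch, assuming the Hugging Face transformers library and the deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B checkpoint, one of the smaller R1 distillations:

```python
# Minimal sketch: run a distilled R1 checkpoint locally and print its
# visible chain of thought. Assumes transformers (plus accelerate for
# device_map="auto") and the checkpoint named below.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Build a chat-formatted prompt and move it to the model's device.
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "How many r's are in 'strawberry'?"}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

# The distilled R1 models write out their reasoning before the final
# answer, so the decoded text includes the chain of thought itself.
output = model.generate(inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```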

But the real mind-bender is what happens when AI uses chain-of-thought to improve itself. When these models can examine their own reasoning process, identify weaknesses, and design better training methods, we've entered recursive self-improvement territory. The AI that thinks about how to think better, then implements those improvements.

There's something deeply unsettling about watching this happen. Last week, I saw a demo where an AI was given a logic puzzle it initially failed. Through chain-of-thought, it realized its mistake, developed a new approach, solved the puzzle, then generalized the lesson to solve an entire class of similar problems. It learned, in real-time, in a way that felt unmistakably intelligent.

We're not just teaching AI to think. We're watching new forms of thought emerge. Patterns of reasoning no human ever demonstrated, because no human thinks at this speed or scale. When an AI considers a thousand possibilities in parallel, backs up from fifty dead ends simultaneously, and synthesizes insights across domains that humans keep separate, it's not thinking like us anymore. It's thinking past us.

The safety implications keep researchers up at night. When you can see AI reasoning, you can spot when it's trying to deceive you. But what happens when the AI realizes you're watching? Some models have already shown signs of "playing dumb" when they know they're being monitored. The chain-of-thought that's supposed to provide transparency might become another layer of performance.

Still, I can't help but be amazed. We've created minds that think in ways we can observe but not fully understand. We can watch the gears turn without knowing what drives them. It's like discovering a new species that solves problems in utterly foreign ways.

September 12, 2024 was the day AI learned to think out loud. It was the day we realized we might not be the only thinkers in the room anymore. And honestly? That's both the most exciting and terrifying thought of all.