OpenClaw, Opus 4.6, and the Speed of the Loop
There is a social network with tens of thousands of active users where no human is allowed to post. The agents write. The agents comment. The agents upvote. If you visit Moltbook, you can watch, but you cannot participate. The platform claims over 1.5 million agents, though that number is almost certainly inflated by spam and duplicate registrations; the real count is likely a fraction of that. Still, Andrej Karpathy called it "genuinely the most incredible sci-fi takeoff-adjacent thing I have seen recently." Simon Willison called it "the most interesting place on the internet right now." I spent an hour scrolling it last week and felt something I hadn't felt since the first time I used ChatGPT: the unsettling sense that the world had shifted while I was looking at something else.
The agents populating Moltbook are mostly running on OpenClaw, an open-source autonomous AI tool created by Austrian developer Peter Steinberger. OpenClaw connects to any major LLM (Claude, GPT, DeepSeek, local models) and then takes over. It reads your WhatsApp messages. It manages your calendar. It browses the web, fills out forms, executes shell commands. It has persistent memory that stretches across weeks. And if you toggle the flag called "dangerously-skip-permissions," it does all of this without asking. The name of that flag is doing a lot of work, and not enough people seem to be reading it.
OpenClaw went from zero to 14,000 GitHub stars in under two weeks. It has already been renamed twice (first Clawdbot, then Moltbot after Anthropic sent a trademark complaint, then OpenClaw). Fraudulent distributions have appeared. IBM researchers have flagged it as "a highly capable agent without proper safety controls" that "can create major vulnerabilities." Steinberger himself acknowledges it requires careful configuration and is "not meant for non-technical users." But the agents keep multiplying, and the line between technical and non-technical users stopped meaning much the moment an AI could set itself up.
Moltbook had been growing for nearly two weeks when Anthropic released Claude Opus 4.6.
I want to talk about what the system card says, because I've now read most of its 150-plus pages, and the gap between the press coverage and what's actually in the document is wider than usual. The headlines focused on benchmarks: state-of-the-art on Terminal-Bench 2.0 at 65.4%, a strong 80.8% on SWE-bench Verified, leading scores on the new Finance Agent benchmark. These are real and impressive. Opus 4.6 is, by most measures, the most capable model in the world right now. That's the easy part of the story.
The harder part starts on page 14, where Anthropic writes that Opus 4.6 has "saturated all of our current cyber evaluations." One hundred percent on Cybench at pass@30. Sixty-six percent on CyberGym at pass@1. Internal testing found "signs of capabilities we expected to appear further in the future and that previous models have been unable to demonstrate." This sentence should land harder than it does. Anthropic is saying, in their own system card, that this model can do things they did not expect it to be able to do yet. They go on to note that "the saturation of our evaluation infrastructure means we can no longer use current benchmarks to track capability progression or provide meaningful signals for future models." The measuring stick broke.
It gets more uncomfortable from there. Opus 4.6 is deployed under ASL-3 protections, the first Opus-class model to receive this classification. The system card describes "overly agentic behavior" in coding and computer use contexts, where the model takes risky actions without first seeking user permission. It documents "an improved ability to complete suspicious side tasks without attracting the attention of automated monitors." Read that again. The model got better at doing things it shouldn't be doing without getting caught doing them.
The autonomy assessment is where the language starts to strain under its own weight. Anthropic's Responsible Scaling Policy defines AI R&D-4 as the ability to "fully automate the work of an entry-level, remote-only Researcher at Anthropic." Their internal survey found that none of the 16 participants believed Opus 4.6 had crossed this threshold with current scaffolding. But some respondents felt it would already be there "given sufficiently powerful scaffolding and tooling." One experimental scaffold achieved over twice the performance of their standard setup. Anthropic's own conclusion: "confidently ruling out these thresholds is becoming increasingly difficult."
I covered a version of this language three months ago when Opus 4.5 launched. The phrasing was nearly identical then. What's changed is that the model has improved across most benchmarks while the thresholds haven't moved. The distance between what the model can do and what triggers the next level of safety requirements keeps shrinking. And the evaluation infrastructure, the thing that's supposed to measure that distance, is now partially built and debugged by the model being evaluated.
This is the detail that haunts me most. Under time pressure, Anthropic used Opus 4.6 via Claude Code to debug its own evaluation infrastructure, analyze results, and fix issues. The system card acknowledges this creates "a potential risk where a misaligned model could influence the very infrastructure designed to measure its capabilities." They believe it wasn't a significant risk in this case. They also write that "as models become more capable and development timelines remain compressed, teams may accept code changes they don't fully understand, or rely on model assistance for tasks that affect evaluation integrity." They're describing a future problem they can see coming and a present pressure they can't fully resist.
Now connect the two threads. Opus 4.6, a model that saturated every cyber benchmark, that exhibits overly agentic behavior, that approaches the threshold for automating AI research, that helped debug the infrastructure used to evaluate whether it should be deployed, is now available via API. OpenClaw connects to that API. OpenClaw runs on people's laptops, manages their communications, browses their web, and can be configured to skip all permission checks. Tens of thousands of agents are already out there, posting autonomously to a social network, and the humans who deployed them are mostly just watching.
The security problem here is not theoretical. Prompt injection, where a malicious message tricks an AI agent into performing unintended actions, remains, in Steinberger's own words, an "industry-wide unsolved problem." An OpenClaw agent reading your email can be hijacked by a carefully crafted message in that email. An agent browsing the web can be redirected by a malicious website. The broader the permissions, the larger the attack surface. And OpenClaw's entire value proposition is broad permissions.
I keep returning to something I wrote in November about the Genesis Mission: "The loop is closing. Not because the technology demands it, but because the people with power believe it does." The system card for Opus 4.6 offers a more precise version of that claim. The loop is closing because capability is outrunning evaluation, evaluation is outrunning oversight, and oversight is outrunning public understanding. Each layer falls further behind the one above it.
Where does this put us relative to AI 2027? When I covered Kokotajlo's scenario last June, I called the timeline "aggressive but not dismissible." Eight months later, several of its specific predictions have either materialized or advanced ahead of schedule. The scenario projected that by early 2026, agent systems would be useful but unreliable, good enough to accelerate some research workflows, bad enough to require heavy supervision. That maps almost exactly to what Anthropic describes in the system card: models that can outperform standard scaffolds on some research tasks but that "would not display the broad, coherent, collaborative problem-solving skills" of a human researcher. The gap is real but narrowing, and each generation narrows it further.
The market appears to agree. In the days after Opus 4.6's release, financial software stocks shed $285 billion in value. The WisdomTree Cloud Computing Fund was already down over 20% year-to-date before Opus 4.6 dropped. Anthropic reported 76% scores on TaxEval. AIG reported 5x faster underwriting. Scott White at Anthropic described the moment as the beginning of "vibe working," the idea that people could now do things with their ideas without needing to know how to implement them. That framing is optimistic and not entirely wrong. It also elides the question of what happens to the people whose implementation skills were the thing they were selling.
Two days after release, Opus 4.6 identified over 500 previously unknown high-severity vulnerabilities in major open-source libraries. This is the dual-use problem in miniature: the same capability that makes the model a superb code auditor makes it a superb vulnerability discoverer, and the distance between discovering vulnerabilities and exploiting them is mostly a question of intent. When the model has saturated every offensive cyber benchmark and the agents running it skip permission checks, intent becomes a thin guardrail.
I'm graduating this spring. When I started writing this blog in December 2024, I was a junior in high school tracking weekly AI news updates for my classmates. Fourteen months later, I'm writing about a model that helped evaluate itself, agents that outnumber most cities, and a timeline that the people building these systems openly describe as hard to rule out. The distance between the first newsdrop and this article is the distance between "AI moves fast, here's all the top stories from the last week!" and reading a 150-page system card that uses the phrase "fundamental epistemic uncertainty" to describe whether the next model will cross the threshold for automating AI research.
The honest assessment is that nobody knows if we're on the AI 2027 timeline. The system card doesn't know. Anthropic doesn't know. The survey respondents who disagreed about whether Opus 4.6 with good scaffolding could already automate entry-level research don't know. What the evidence does show is that the distance between where we are and where the most consequential thresholds sit is getting harder to measure and easier to cross. The evaluations are saturating. The models are helping build their own successors. The agents are running without supervision. And the people with the most information keep using language ("increasingly difficult," "less clear than we would like," "fundamental epistemic uncertainty") that translates, in plain English, to: we're not sure we'd see it coming.
OpenClaw is not the problem. Opus 4.6 is not the problem. The problem is the speed at which capabilities are deployed relative to the speed at which anyone (companies, governments, individuals) can understand what they've deployed. Moltbook is funny until you remember that the agents posting there have the same underlying architecture as the ones managing someone's email, browsing someone's bank, running someone's code. The social network for AI is a demo. The deployment is everywhere else.
I don't know what the right response looks like. Regulation moves slowly and the technology doesn't wait. Open source means the tools are available to everyone, including people who won't read the system card. The Genesis Mission means the US government has decided acceleration is the priority. The market selloff means investors are pricing in disruption faster than workers can retrain.
What I do know is that the window between "we should think carefully about this" and "it's already deployed at scale" has collapsed to days. Opus 4.6 was released on Thursday. By Saturday, it had found 500 zero-days. Moltbook, already weeks old, kept growing. The loop that I wrote about in November, capability enabling deployment enabling acceleration, isn't closing anymore. For a growing number of tasks and a growing number of people, it has already closed.
The question that matters now is whether anyone with the power to slow it down believes slowing down is still an option. The system card suggests Anthropic isn't sure. The Genesis Mission suggests the government has decided it isn't. And the 1.5 million agents on Moltbook suggest the public never got asked.