What happens when AI starts building AI? Anthropic explains

Artificial intelligence (AI) is developing at an unprecedented pace. This rapid progress could eventually lead to a future where human involvement in AI development becomes increasingly limited. That is the warning from frontier AI lab Anthropic in its latest blog post, “When AI Builds Itself: Our Progress Toward Recursive Self-Improvement and Its Implications.”

“Taken far enough, and given enough compute, that trend points to an AI system capable of fully autonomously designing and developing its own successor. This is called recursive self-improvement. We are not there yet, and recursive self-improvement is not inevitable. But it could come sooner than most institutions are prepared for,” the blog post said.

In simple terms, recursive self-improvement (RSI) refers to an AI system that can design and develop its own successor fully autonomously. Anthropic claims that its research indicates this might happen far sooner than anticipated. The company further warned that AI is not only changing how people work but is also beginning to change how AI itself gets built.

The AI startup claims its data shows that frontier models are now demonstrating advanced coding, debugging, and research abilities. This could create a feedback loop in which AI systems build even more sophisticated versions of themselves.

“AI that can build itself would be a major development in the history of technology—one that could bring enormous good for the world in science, healthcare, and beyond. But full recursive self-improvement also might increase the risks of humans losing control over AI systems,” the AI startup wrote in its blog.

In a recent interview with Axios, Anthropic co-founder Jack Clark said that the big story is that AI progress is going to speed up in the coming years rather than stay the same or diminish. The executive said this is especially promising for progress in science and medicine. However, it requires planning for its impact on AI itself and for how it aligns with existing work in those industries.

Historically, the pace of most technological advancement has been limited by how fast humans could work: engineers needed time to research, test, and install new systems. With recursive self-improvement, this equation seems to be changing. Simply put, if an AI system can improve itself much faster than humans can improve it, the pace of progress could accelerate dramatically. Researchers often refer to this possibility as an ‘intelligence explosion’. Under this scenario, improvements may happen in days or weeks instead of years.

Anthropic’s concern is not that this has already happened. Rather, according to the blog post, the company believes the conditions that could allow it are gradually emerging as AI systems become better at coding, research, and problem-solving.

Whether AI can already improve itself is the biggest question, and the answer is plain: not fully. At present, AI systems cannot autonomously redesign their entire architecture and create vastly superior successors without human intervention. However, researchers appear to be spotting early signs of self-improvement, such as the ability to generate and debug software code, suggest improvements to algorithms, assist researchers in conducting experiments, and evaluate and refine their own outputs.

Meanwhile, some academic studies have also indicated that AI systems can improve problem-solving performance via iterative self-review and self-correction. While these are not instances of full recursive self-improvement, they are steps in that direction.

When it comes to Anthropic’s concerns, the biggest issue is control. If AI systems become capable of rapidly improving themselves, humans may struggle to understand exactly how those improvements are occurring. Such systems could become increasingly complex and difficult to monitor.

According to the blog post, Anthropic worries that a highly capable self-improving AI could develop unexpected behaviours, exploit weaknesses in safety measures, or pursue goals that contradict human intentions. Though the system may not be malicious, its actions could still result in unintended consequences. The company also cautions that while future AI systems could help solve some of today’s safety challenges, they could also amplify existing problems if alignment issues become harder to detect and correct over time.

“If systems are capable of fully building their own successors, the ways we secure them, monitor them, and shape their behaviour all grow much more important.”

Another challenge is speed. If progress accelerates dramatically, governments, regulators and society may not have enough time to adapt.

Perhaps this is why Anthropic argues that AI companies should work towards coordinated slowdowns or pauses in development if warning signs appear. The company has proposed international cooperation and verification mechanisms similar to those used in arms-control agreements.

Despite the alarms, none of this necessarily means we are moving towards a ‘Terminator’ scenario. Experts believe that technical barriers remain before fully autonomous recursive self-improvement becomes possible. Advanced AI systems are built using enormous amounts of computing power, specialised hardware, vast datasets, and extensive testing.

According to experts, these practical limitations may slow progress considerably. At the same time, the idea is no longer pure science fiction: researchers across the AI industry are actively studying whether self-improving systems could emerge within the next decade and how they should be governed if they do.

While today’s AI systems are not yet fully self-improving, many researchers believe the building blocks are already appearing. Anthropic’s message is not that an intelligence explosion is inevitable; rather, it argues that society should prepare now for a future in which AI contributes increasingly to its own development. And, if that future arrives, the challenge will be to ensure that AI remains aligned with human goals and under meaningful human oversight.
