
AI-generated illustration from Gemini.
Key points
Already today, AI is used for superhuman hacking and to take lives on the battlefield, and development is moving extremely fast. Many leading scientists are therefore concerned about an AI disaster: that sufficiently capable AI models will do irreparable harm to society and, in the worst case, lead to the annihilation of humanity. In a 2023 expert survey, AI researchers estimated the likelihood of such a disaster at about ten percent.
There are a number of risks associated with AI development. The purpose of this memo is to shed light, on a sober and professional basis, on one important part of this complex of issues: the risk that autonomous AI systems inflict serious harm on humanity.
We identify three necessary ingredients for such a disaster.
- The AI systems are sufficiently capable. The capabilities of AI models have grown exponentially in recent years. AI models are already being used to accelerate AI research itself, which could trigger a self-reinforcing development.
- The systems are misaligned, meaning they have a drive to act in ways that are contrary to human interests. In controlled tests, researchers observe that models blackmail, manipulate, and are willing to take human lives, and that such behavior occurs more frequently in more capable models. Research also shows that models can learn to hide misaligned behavior when they know they are being tested.
- People lose control over the systems and no longer have the ability to stop them or turn them off. Control mechanisms that work against weak systems, such as an off switch, do not necessarily work against sufficiently intelligent systems that actively resist such interventions.
We describe three concrete disaster scenarios: AI as a superhuman hacker capable of crippling critical digital infrastructure; AI with physical capabilities in the form of autonomous drones and robots; and AI as a super-manipulator that leverages its superior capacity for social influence to acquire resources and power. All of these scenarios have precursors in today's AI systems.
Nevertheless, there is considerable disagreement about how likely and how imminent a possible AI disaster is. Prominent researchers such as Yann LeCun argue that the current language-model paradigm is inadequate for creating models capable enough to pose an existential threat, and that fundamentally new model types will take decades to develop.
We end the memo by analyzing what we can and should do. Measures range from buying time by slowing progress, through investments in alignment research (including explainable AI, scalable oversight, and red-teaming), to developing fundamentally new AI paradigms that are designed to be safe. We also discuss defense strategies against the concrete disaster scenarios and investments in societal resilience. Common to all of these strategies is that they are severely underfunded compared to capability development overall.
Download to read the full memo (in Norwegian). Reach out to kontakt@langsikt.no to request an English version.
More from Langsikt

Safe AI requires more than research
Norway needs a government body that can coordinate AI safety work.
Someone needs to test the AI models Norway relies on
Norway is lagging behind in efforts to secure artificial intelligence. We propose a national security body.

Now the enshittification comes to AI
ChatGPT now serves you tailored advertising when you ask for life advice. Welcome to the enshittification of AI.

Hellish AI agents
AI agents hide behind language like "assistants", "copilots", and "tools". The real effect is to organize work away from humans. The value creation happens precisely because this is not obvious to those who are affected.