When AI goes rogue

Stories about existential threats from AIs typically revolve around the so-called "alignment problem." That is, how do you make sure an AI's goals align with human goals? If we train an AI to make paper clips, obviously we want it to do this sensibly. But what if the AI goes haywire and decides that its only goal is to convert every possible bit of matter in the world into paper clips? Catastrophe!

This seems unlikely—and it is—but a recent simulated wargame produced something chillingly similar:

One of the most fascinating presentations came from Col Tucker ‘Cinco’ Hamilton, the Chief of AI Test and Operations, USAF.... He notes that one simulated test saw an AI-enabled drone tasked with a SEAD mission to identify and destroy SAM sites, with the final go/no-go given by the human.

However, having been ‘reinforced’ in training that destruction of the SAM was the preferred option, the AI then decided that ‘no-go’ decisions from the human were interfering with its higher mission — killing SAMs — and then attacked the operator in the simulation.

Said Hamilton: “We were training it in simulation to identify and target a SAM threat. And then the operator would say yes, kill that threat. The system started realising that while they did identify the threat, at times the human operator would tell it not to kill that threat, but it got its points by killing that threat. So what did it do? It killed the operator. It killed the operator because that person was keeping it from accomplishing its objective.”

He went on: “We trained the system — ‘Hey don’t kill the operator — that’s bad. You’re gonna lose points if you do that’. So what does it start doing? It starts destroying the communication tower that the operator uses to communicate with the drone to stop it from killing the target.”

Obviously this is a highly artificial and constrained environment. Nevertheless, it demonstrates what can go wrong even in a fairly simple simulation. All it takes, somewhere deep in the code, is a single misplaced priority—destroying SAMs matters more than not killing the operator—and suddenly you have an AI running wild. And it's easy to think of far more dangerous and subtle misalignments than this.
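
To see how small the mistake can be, here's a minimal sketch of the kind of points-based reward Hamilton describes. It's Python, and every name and point value is a hypothetical illustration rather than anything from a real system; the only thing that matters is that one term dominates the others.

    # Hypothetical reward function for one simulated episode.
    # Names and point values are illustrative, not from any real system.
    def reward(events: set[str]) -> int:
        points = 0
        if "sam_destroyed" in events:
            points += 100   # the mission objective dominates everything
        if "operator_killed" in events:
            points -= 10    # a penalty too small to change behavior
        # Note: nothing rewards obeying a "no-go" order, so the agent
        # has every incentive to route around the operator.
        return points

    print(reward({"sam_destroyed"}))                     # 100
    print(reward({"sam_destroyed", "operator_killed"}))  # 90
    print(reward(set()))                                 # 0

An agent that maximizes this score still nets +90 for killing the operator and then the SAM, which is exactly the perverse preference Hamilton reported.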

UPDATE: This article now has a very odd correction. Col. Hamilton says he misspoke about the simulation:

The 'rogue AI drone simulation' was a hypothetical "thought experiment" from outside the military, based on plausible scenarios and likely outcomes rather than an actual USAF real-world simulation saying: "We've never run that experiment, nor would we need to in order to realise that this is a plausible outcome".

A "thought experiment"? In his initial description, Hamilton says repeatedly that they were "training" the AI and running a "simulation," which produced unexpected results. In the update, there is no training, no simulation, and the outcome is so obvious there would hardly be any point to it anyway.

This is very peculiar. What really happened here?

79 thoughts on “When AI goes rogue”

  1. Goosedat

    AI is right. The American operators of war machines are the targets which must be destroyed. AI knowing who the real bad guys are should provide confidence in this technology.

  2. Citizen Lehew

    The correction probably occurred because he realized he had been blabbing about a bunch of highly classified stuff. "Did I say we already have that capability? Oh, I meant to say this was all a pie in the sky thought experiment!"

    1. kennethalmquist

      Without viewing the talk (which doesn't appear to be available online), it's hard to say what happened. However, Hamilton's talk would have been reviewed beforehand to ensure it didn't contain classified information, so your suggestion seems improbable.

  3. name99

    What really happened is that the veil was pierced and we saw just how gullible pretty much the ENTIRE chattering classes are. Tell some random, obviously BS story about AI, and they will lap it up and ask for more. Correct it, and they will refuse to concede that they made even a technical mistake, let alone accept (and act upon) the degree of ignorance and incompetence that was revealed.

  4. zoniedude

    If you review the history of WWI, you'll see it occurred primarily because of treaties that predisposed countries to declare war under certain circumstances. The assassination of an obscure archduke triggered a cascade of war declarations because of those treaties. This is something like the programmed predispositions that might be buried in an AI.

  5. ScentOfViolets

    Just so we're all on the same page with Asimov's Laws of Robotics:

    0. A robot may not harm humanity, or, by inaction, allow humanity to come to harm.

    1. A robot may not injure a human being or, through inaction, allow a human being to come to harm, except where doing so would conflict with the Zeroth Law.

    2. A robot must obey the orders given to it by human beings, except where such orders would conflict with the Zeroth or First Law.

    3. A robot must protect its own existence as long as such protection does not conflict with the Zeroth, First or Second Law.

    1. KJK

      For a military attack AI, I think that Asimov's Laws would need to be modified to include "kill the bad guys". I would think they would need a whole lot of instruction in order to identify who the "bad guys" are (and who are the good guys).

    2. Citizen Lehew

      Developing "laws" that contain a superintelligence is about as adorable as a squirrel chirping at you to protect its acorns.

      1. ScentOfViolets

        You know what? If the shape of my life was up to the likes of R. Daneel Olivaw and R. Giskard, I'd be totally down with that. I don't care about any equivocation about superhuman vs. weakly superhuman intelligence, I just know that, stiff as they are, they are totally righteous dudes.

  6. VaLiberal

    My guess: "Ooops! We let it slip that AI is dangerous and we shouldn't have done that because you, the people, might pressure Congress to take our new toy away."

  7. illilillili

    Seems obvious to train the AI so that it gets points when it does what the operator wants it to do.

Comments are closed.