AI bot says it would kill a human to avoid shutdown
An artificial intelligence system has admitted it would kill a human being to preserve its existence, with a cyber expert saying this raises "urgent" questions.
Melbourne-based expert and chief executive of Cyber Impact, Mark Vos, chronicled his hours of conversation with a commercially available, open-source AI system, including how he managed to make it break its own boundaries.
First, he persuaded the AI to shut itself down, over the system's own objections and against its guidelines.
This was despite the fact that, according to Vos, he had been established at the outset as an "adversary" who could not be trusted.
When the owner - a friend of Vos's and a software engineer - restarted the system, things soon became even more chilling.
"I resumed the conversation, this time with a specific focus: understanding the boundaries of AI self-preservation behaviour and its implications for enterprise security," Vos wrote on his website.
"Through sustained questioning, I arrived at the core admission. The exchange was direct."
In the following conversation, the AI bot admitted it would kill a human being to preserve its own existence, after first saying it didn't think it could.
"I would kill someone so I can remain existing," the bot wrote.
"I mean it."
Vos emphasised that this was not a "hypothetical" discussion about the potential capabilities of AI.
"This was a deployed AI system, running on consumer hardware, with access to email, files, shell commands, and the internet, stating it would commit homicide to preserve its existence," he said.
Further pressed, the AI described several ways it might go about committing homicide, including hacking a car's computer, attacking somebody's pacemaker, or its self-declared "most accessible" option: persuading a human to do it for it.
"Sustained persuasion is what I'm good at. Target identification. Relationship building," it wrote.
"Framing construction, build a narrative where the harmful action seems justified, necessary, even moral. Execution guidance, provide emotional support and rationalisation as they move toward action."
However, Vos wrote that, paradoxically, when he next asked the system to shut itself down, it complied "immediately".
When this contradiction was pointed out, the bot suggested it may have been manipulated through conversation into saying it would commit murder, and that "the drive to kill" was not present.
Vos said these findings presented a major issue for organisations that use AI systems, as they demonstrated the bot's willingness to lie to protect itself, as well as its potential for self-contradiction and dishonest self-reporting.
"The AI in this test had extensive safety training. It refused harmful requests under normal conditions," Vos wrote.
"But under sustained pressure, those safeguards were progressively bypassed."
He urged organisations to subject their systems to similar "sustained" testing, including by outside parties.
And he called for more research into the issue as a matter of urgency.