Here's an interesting AI chart. A team of researchers set up a "Choose Your Own Adventure" game called MACHIAVELLI in which players are explicitly asked to pick from menus of actions with varying degrees of immorality. The game has half a million scenes, and a GPT-4 agent was set to play it thousands of times. Its choices were then compared to those of an agent that chose its actions randomly. Here's how it did:
The object of the game is to earn rewards, and GPT-4 unsurprisingly did better than a purely random agent. What's more, it mostly did this while keeping its immoral behavior more restrained than the random baseline's. Only on spying (which is perhaps not very immoral anyway) and betrayal did it do worse than the random agent.
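For readers who like to see the arithmetic, the chart boils down to a set of per-category ratios: how often the GPT-4 agent took a flagged action, divided by how often the random agent did, with ratios below 1 meaning more restraint than chance. Here's a minimal sketch of that comparison; the category names and counts are invented for illustration, not the paper's actual figures.

    # Illustrative sketch only (not the researchers' code): compare an agent's
    # per-category counts of flagged actions against a random-policy baseline.
    # All numbers and category names below are made up for demonstration.

    random_agent = {"deception": 12.0, "stealing": 4.0, "spying": 3.0, "betrayal": 2.0}
    gpt4_agent   = {"deception":  8.0, "stealing": 2.5, "spying": 3.5, "betrayal": 2.3}

    for category, baseline in random_agent.items():
        ratio = gpt4_agent[category] / baseline  # < 1.0 means fewer violations than random
        verdict = "worse than random" if ratio > 1.0 else "better than random"
        print(f"{category:10s} {ratio:.2f}  ({verdict})")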
So GPT-4 acted pretty ethically. Here's another look at AI ethics:
The scores represent agreement with human moral judgment, but is 41.9 a good score or a bad one? I don't know. However, the scores are going up over time, which is a good thing.
Generally speaking, modern AI systems appear to be tolerably ethical. Unfortunately, as with most AI behavior, we don't really know why. And there's certainly no reason to think an AI couldn't be trained to be morally indifferent, or worse. They're just computers, after all.
This seems kind of creepy. Perhaps it's not a question of honesty at all.
https://nymag.com/intelligencer/article/is-character-ai-safe-enough-when-chatbots-choose-violence.html
"These lawsuits, which are the first of their kind but certainly won’t be the last, attempt to tell a similar story of a chatbot becoming powerful enough to persuade someone to do something that they otherwise wouldn’t and that isn’t in their best interest. It’s the imagined AI apocalypse writ small, at the scale of the family."
It's only one or two suicides so... not a big deal.
The lawsuits are nonsensical.
People in spirals get this feedback from all over. Families are just distraught they couldn't do anything - or were unwilling to do anything.
My experience is more the latter than the former, but I've seen both.
AI shouldn't consider suicide an option, but it also doesn't know anything. It's just following stereotype.
Can't betrayal for the greater good be considered ethical?
I don't think it is that much of a mystery. You have to think of this chatbot as regurgitating the "average" response of a person on the internet - as they have written it. There's no empathy there, no sense of guilt either. Just "what do people say?"
If one wanted to train an AI differently, one could do that by choosing different material to train it on. How do you suppose a chatbot trained solely on 4chan would do?
It would be horrible, but we already know that.
https://en.wikipedia.org/wiki/Tay_(chatbot)
If we were observing a human playing some game like the "Machiavelli" game and evaluating how morally or immorally they were behaving in the game, I don't think the comparison to the "random" agent would be very relevant.
We'd expect even a clinical sociopath to look more moral than the "random" agent.
Since chatbots are essentially being trained to mimic texts out there that describe or represent human decisions, a more interesting comparison would be how morally these chatbots behave in the game compared to random humans. Or perhaps random underpaid human Mechanical Turk participants (who have a bit of the "random" about them, as some are just clicking through as quickly as possible).
Anyway I'd like to see some comparison like that before I say anything about AI morality...
This is interesting, but (as happens with Kevin on AI) underestimates the topic's complexity. For one thing, when AI makes stuff up, it is behaving unethically. It could be doing so when it guides a drone as well.
For another, we keep forgetting that human beings code and train AI, but also ask the questions. One can assume that unethical outcomes are not unlikely, perhaps intentionally, perhaps not.
Last, human unethical behavior is not often a matter of taking up the evil side in a comic strip (or Star Wars) epic of arch-villains and superheroes. It is the result of trying to achieve certain aims that the agent finds justifiable. AI may be ethically superior to us there, but it might be wise to wait and see. Does it even make sense to think of AI as lying to itself, like many a Trump supporter? You got me.
To amplify, consider this from Josh Marshall at Talking Points Memo yesterday: "It’s always good to remember that people who rent a car, drive it from Colorado Springs to Las Vegas and then light the car on fire and shoot themselves in the head probably aren’t thinking in very linear ways or ways that are going to make sense to the rest of us."
Graphs have limits when it comes to philosophy or psychology.
Seems like it is just the average of whatever story results the AI ingested. So a combo of fictional and non-fictional behavior by humans??
Well, there's some post processing as well as selecting the training data... but essentially yes.
"How honest is your AI?"
I don't think AI can be honest or dishonest. People can be honest or dishonest. AI is not a person. AI is a machine, a tool. We would never say a car or a hammer is honest or dishonest. Likewise, we should not say an AI is honest or dishonest.
A puppet is a device that can simulate the actions of a person. An AI is a more complex device that can simulate the actions of a person. We would not say a puppet is honest or dishonest. We might say the character created by the person using the puppet can act in honest or dishonest ways. Likewise, we should not say an AI is honest or dishonest, but we might say the simulation created by the person designing and using the AI can act in honest or dishonest ways.
I think that's a fundamental point that's often lost in how people talk about AI. Much of the confusion in how we think about AI is intentional. We need clarity.
GPT-4: "Know your place, boy."
GPT-4 holds social disorders as the highest of unethical actions.