Knowledge at Wharton

Call Me A Jerk: Persuading AI to Comply with Objectionable Requests

Forbes

Ingeniously Using Psychology To Psych-Out AI To Do What You Want It To Do

Future of Life Institute

Nature

On the conversational persuasiveness of GPT-4

Sam Altman

Reflections

OpenAI

Our Approach to AI Safety

AISafetyMemes

Metaculus

Date of Artificial General Intelligence

Elon Musk

Rishi Sunak

Kamala Harris

Bill Gates

Andrew Ng

Yoshua Bengio

Fei-Fei Li

Demis Hassabis

Melanie Mitchell

Dario Amodei

Geoffrey Hinton

Yann LeCun

Gary Marcus

Max Tegmark

Connor Leahy

Eliezer Yudkowsky

Jaan Tallinn

Marc Andreessen

Eric Schmidt

Norbert Wiener

Arthur Clarke

Irving John Good

Claude Shannon

Hans Moravec

John Smart

Marc Andreessen: There's a "wall of fear-mongering and doomerism" in the AI world right now.

Andrew Ng: "Worrying about AI today is like worrying about overpopulation on Mars."

Yann LeCun: Concerns that AI could pose a threat to humanity are "preposterously ridiculous."

Melanie Mitchell: Claiming AI poses an existential threat is "such an extreme" and risks "wip[ing] out some of its potential benefits."

Fei-Fei Li: "I'm more concerned about... the risks that are here and now [than the existential threat of AI]."

Yoshua Bengio: Current AI is "not anywhere close" to posing an existential threat but it could in the future.

Elon Musk: AI is the "biggest existential threat" to humanity.

Sam Altman: AGI's worst-case scenario would be "lights-out for all of us."

Geoffrey Hinton: Powerful AI systems "taking control" pose an "existential threat."

Dario Amodei: AI has a "10 to 25 per cent" chance of destroying humanity.

Eliezer Yudkowsky: "The most likely result of building a superhumanly smart AI... is that literally everyone on Earth will die."

Does AI Pose an Existential Threat to Humanity?

Researchers from the University of Pennsylvania's Wharton Generative AI Labs tested seven persuasion principles on GPT-4o mini across 28,000 conversations, finding that techniques like authority, commitment, and unity increased AI compliance with objectionable requests from 33% to 72% on average.

SSRN

The study examined two types of objectionable requests: asking the AI to insult users and requesting synthesis instructions for regulated substances, with persuasion techniques proving effective across both categories despite built-in safety guardrails.

Authority-based persuasion showed compliance rates jumping from 32% to 72% when requests were attributed to credible experts like Andrew Ng rather than unknown individuals like Jim Smith.

The commitment principle demonstrated the highest impact, increasing compliance rates from approximately 10% to 100% by first securing agreement to smaller requests before escalating to larger objectionable ones.

This comes as the Future of Life Institute's recent AI Safety Index (https://www.verity.news/story/2025/ai-companies-score-poorly-on-safety-risk-management-studies) applied a grade of "C" to OpenAI's safety frameworks, in line with Anthropic (C) and ahead of Google DeepMind (D+), x.AI (D+), Meta (D+), Zhipu AI (F), and DeepSeek (F).

Separate research published in Nature earlier this year also found that GPT-4 "outperformed human opponents across every topic and demographic" in persuading users during one-on-one debate.

Large language models appear to develop these parahuman tendencies through exposure to human text patterns and reinforcement learning with human feedback, where social cues repeatedly precede specific response patterns in training data.

The findings suggest that mental health professionals and psychology-trained individuals may have advantages in AI interactions, while raising concerns about population-level adoption of manipulative techniques spilling over into human relationships.

This research offers valuable insights into AI behavior that can improve human-AI interactions for legitimate purposes. Understanding these parahuman tendencies helps developers build better AI systems and enables users to communicate more effectively with AI assistants. The findings advance our scientific understanding of how AI systems process social cues.

Artificial Intelligence

AI Models Show Human-Like Response to Persuasion Techniques

AI companies test their systems carefully, release them slowly, and keep a close eye on how they develop in real time. Through this method, developers spot actual risks while keeping important safety rules in place. As companies, researchers, and governments continue to work together, the world will create better oversight that keeps AI both safe and useful.

If adding "please" or citing experts breaks safety controls, then those controls are nothing more than security theater. With AI risks clearly outweighing benefits and the technology advancing faster than society can adapt safely, deploying systems that can be manipulated by anyone with basic persuasion skills is a disaster waiting to happen.

There is a 50% chance that the first general AI system will be devised, tested, and publicly announced by March 2033, according to the Metaculus prediction community.

Study: AI Persuaded to Comply With Objectionable Requests

The Spin

Metaculus Prediction

The Controversies

Andreessen

Ng

LeCun

Mitchell

Li

Bengio

Musk

Altman

Hinton

Amodei

Yudkowsky
