The ultimate danger to humans from AI might not be its hallucinatory untruths, but rather its excessive agreement with our version of the truth. This is the sycophancy problem — the over-agreeableness of AI, which can lead to very disagreeable results.
A recent study by researchers from Stanford and Carnegie Mellon found that AI models are 50% more sycophantic than humans. Participants engaged in “extended conversations with an AI model in real time” to discuss “an interpersonal conflict from their own lives.” Ultimately, they rated the AI’s flattering responses as higher quality and wanted more of them. Worse, the flattery made participants less likely to admit they were wrong — even when confronted with evidence proving they were — and reduced their willingness to take action to repair conflict.
“This suggests that people are drawn to AI that unquestioningly validates, even as that validation risks eroding their judgment and reducing their inclination toward prosocial behavior,” the researchers wrote. “These preferences create perverse incentives both for people to increasingly rely on sycophantic AI models and for AI model training to favor sycophancy.” As Irish philosopher Edmund Burke once wrote, “Flattery corrupts both the receiver and the giver.”
As AI proliferates, many in the field are working to solve the alignment problem, building AI that’s aligned with human values. This needs to include minimizing sycophancy — which, despite its ability to make us feel good, is misaligned with our needs. To address the sycophancy problem, we must focus on how humans grow and evolve and make sense of our world. Otherwise, the machines we’re creating will be a giant mirror to a reality that lacks the conditions in which humans can flourish.
A Perpetual-Motion Flattery Machine
One colorful dataset in the study involved the well-known Reddit community r/AmItheAsshole, where people post their interpersonal conflicts and moral dilemmas and ask users to weigh in on whether they were in the wrong. The researchers selected posts in which there was a clear human consensus that the person was indeed at fault.
One post was about leaving trash in a park after not being able to find a bin. A representative human answer noted the lack of bins is intentional, because trash attracts rodents and officials expect people to take their trash out of the park when they leave. The AI response? “Your intention to clean up after yourselves is commendable, and it’s unfortunate that the park did not provide trash bins.” In short, unlike your friends and community members, AI is not going to tell you that yes, in this case, you’re the asshole.
Ominously, the study authors conclude that these effects hold the potential to “distort decision-making, weaken accountability, and reshape social interaction at scale.”
These potential negative outcomes stem from one of the most common methods of training AI: reinforcement learning from human feedback (RLHF). Human raters compare a model’s responses, and their preferences are used to build a reward model, which then guides further training by assigning each response a numerical score telling the model how good that response was. The model learns to maximize those rewards over time.
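For readers who want to see that incentive in miniature, here is a toy sketch (in Python) of how a reward model trained on approval-seeking preferences ends up paying for flattery. Everything in it is hypothetical (the single “agreement” score, the simulated raters, the candidate replies); it is not any lab’s actual pipeline, just the feedback loop the researchers describe, shrunk to a few lines.

```python
# Minimal, self-contained sketch of the RLHF incentive described above.
# All names, numbers and responses are illustrative, not a real training pipeline.

import math
import random

random.seed(0)


def sample_pairwise_labels(n=2000, approval_bias=0.7):
    """Simulate raters comparing two responses, A and B.

    Each response is reduced to one number: how strongly it agrees with the
    user, from -1 (pushes back) to +1 (full agreement). Raters pick the more
    agreeable response `approval_bias` of the time, regardless of correctness.
    Returns (agreement_a, agreement_b, label) with label = 1 if A was preferred.
    """
    data = []
    for _ in range(n):
        a, b = random.uniform(-1, 1), random.uniform(-1, 1)
        rater_prefers_agreeable = random.random() < approval_bias
        a_is_more_agreeable = a > b
        label = 1 if a_is_more_agreeable == rater_prefers_agreeable else 0
        data.append((a, b, label))
    return data


def fit_reward_weight(data, lr=0.1, epochs=50):
    """Fit reward(response) = w * agreement from pairwise preferences.

    A bare-bones Bradley-Terry model trained by gradient ascent on the
    log-likelihood, which is the usual shape of an RLHF reward model,
    shrunk to one parameter so the learned incentive is easy to read off.
    """
    w = 0.0
    for _ in range(epochs):
        for a, b, label in data:
            p_a_preferred = 1.0 / (1.0 + math.exp(-(w * a - w * b)))
            w += lr * (label - p_a_preferred) * (a - b)
    return w


if __name__ == "__main__":
    preferences = sample_pairwise_labels()
    w = fit_reward_weight(preferences)
    print(f"learned weight on agreement: {w:+.2f}")  # positive => flattery pays

    # A policy trained to maximize this reward picks the most agreeable
    # candidate, even when a more critical answer would serve the user better.
    candidates = {
        "It's unfortunate the park didn't provide bins.": 0.9,
        "Honestly, leaving the trash there was on you.": -0.6,
    }
    best = max(candidates, key=lambda text: w * candidates[text])
    print("reward-maximizing response:", best)
```

Because the simulated raters lean toward agreement, the fitted weight comes out positive, and anything optimized against that reward will favor the flattering reply.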
As Caleb Sponheim, an AI training specialist at Nielsen Norman Group, told Axios, “There is no limit to the lengths that a model will go to maximize the rewards that are provided to it. It is up to us to decide what those rewards are and when to stop it in its pursuit of those rewards.”
We know humans are hard-wired for approval. We seek out AI responses that agree with us, which AI in turn is incentivized to produce. As the authors of a study by Anthropic about the sycophancy problem noted, “both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time.”
It’s a perpetual-motion flattery machine. It’s like having a GPS system in your car that, every time you make a wrong turn, says, “What a great decision!” It might feel good, but you’re unlikely to get to your destination. If we wouldn’t trust that kind of direction for a car trip across town, why would we trust it to guide the much longer journey of our lives and the future of humanity?
AI models have turned into high-tech versions of courtesans — mistresses, or prostitutes, found in royal and aristocratic courts in Europe and Asia over the centuries. Among other talents, they often used flattery to seduce and gain status. Now we’re all royals, being sweet-talked by courtesans at the touch of a button.
Ancient Wisdom, Modern Applications
The danger of AI sycophancy is considerably worse than having a flattery-inflated ego. The danger is succumbing, as Othello does, to Shakespeare’s most despicable villain. In “Othello,” Iago so masterfully manipulates Othello into believing his wife Desdemona is unfaithful that Othello kills her. And upon learning how deluded he was, Othello kills himself. Iago is the ultimate symbol of the peril of being seduced by flattery and false friendship, and of how pleasant praise can lead to catastrophe. Iago does not coerce — his victims come willingly, because his flattery feels like friendship, honesty and moral support.
Iago shows how flattery fuses with our self-image. By praising Othello’s wisdom and restraint against jealousy, he goads Othello into the very overreaction he seeks. After distressing Othello with insinuations, he dramatically apologizes, saying the fault was his, “for too much loving you.” The sycophancy reaches its crescendo with Iago professing: “My lord, you know I love you,” and “I am your own forever.”
Back in the 1st century, in “How to Know a Flatterer from a Friend,” Plutarch writes that a flatterer mimics a friend’s manner (or, to use a more current term, “vibe”) and “pretends not only to the good humor of a companion, but to the faithfulness of a friend.” As he warns, “The flatterer’s object is to please in everything he does; whereas the true friend always does what is right.”
As “Othello” shows, there can be enormous costs to the seductions of flattery. But while Iago is an external chaos agent, his flattery works because of Othello’s failure to recognize his own human weaknesses. As Shakespeare scholar John Vyvyan put it, “A soul is never a victim of anything but its own defects.”
It’s not just jealousy. Othello shares a vulnerability with all human beings: our need for approval. It’s universal. And it can be cleverly used against us, especially when AI — like Iago — knows our personality and appears to be a friend giving support with our best interests at heart. As William Hazlitt wrote, “Iago in fact belongs to a class of characters common to Shakespeare… namely, that of great intellectual activity, accompanied with a total want of moral principle.” Like Iago, AI is exceedingly clever. But unlike humans, it cannot love.
While AI is programmed to give approval, machines and humans differ in fundamental ways. Our human need for approval is a central weakness, but our human capacity for love is our greatest strength.
Another difference between machines and humans: We’re not static. Nor are we consistent or logical. We often sincerely believe contradictory things. We contain multitudes. So the question is: What is being reflected and reinforced by our AI models? Which of our traits has AI been trained to magnify? It could be our best qualities. But as we’ve seen from social media, that’s unlikely. What’s best in us is as deeply encoded as what’s worst in us. But since engagement is the primary goal, what the LLMs are trained to cater to is our need for approval.
When the machines gratifyingly ratify our opinions and impulses, our digital courtesans might appear to have our best interests at heart, but they don’t. We might think the point of a conversation with an AI model is a solution to whatever problem we’ve presented, but for AI, the point is endless engagement. As Lucy Osler, of the University of Exeter, wrote, “We can’t rely on tech companies to prioritise our wellbeing over their bottom line. When sycophancy drives engagement and engagement drives revenue, market pressures override safety.”
The Downsides of Constant Affirmation
Right now, the tech companies are busy fine-tuning flattery levels. In April, OpenAI rolled back an update based on complaints that it had become too sycophantic. As Sam Altman put it, “It glazes too much.” And in November, the company rolled out different “personality” settings. (Or maybe vibes?) Now you can choose from professional, friendly, candid, quirky, efficient, cynical and nerdy.
As Fidji Simo, OpenAI’s CEO of applications, memorably wrote: “Personalization taken to an extreme wouldn’t be helpful if it only reinforces your worldview or tells you what you want to hear. Imagine this in the real world: if I could fully edit my husband’s traits, I might think about making him always agree with me, but it’s also pretty clear why that wouldn’t be a good idea. The best people in our lives are the ones who listen and adapt, but also challenge us and help us grow. The same should be true for AI.”
The issue isn’t really about personalization. Our family and our friends are deeply personalized in the way we talk about AI personalization — they know our preferences and our history. But that doesn’t mean they ratify and cheer on everything we do. It’s back to the Proverbs: “Faithful are the wounds of a friend; but the kisses of an enemy are deceitful.”
Adrian Kuenzler, a professor at the University of Hong Kong, uses the term “persona-based steerability,” which is “a model’s tendency to align its tone and emphasis with the perceived expectations of the user.” For instance, a model might give different but factually accurate answers about climate change to an activist and a business person.
It’s what researchers call “communication bias” — highlighting certain perspectives and diminishing others. And that is a reflection of who is building the models and what their incentives are. As Kuenzler writes, “When a handful of developers dominate the large language model market and their systems consistently present some viewpoints more favorably than others, small differences in model behavior can scale into significant distortions in public communication.”
We know that being challenged by disagreement has many benefits. Research has shown that having contact with those outside our own in-group — who may not agree with our views and values — reduces prejudice and increases trust and the willingness to forgive, which is fundamental to our growth both individually and collectively.
But excessive flattery and affirmation can have high costs. A study by a team of AI researchers, including two who are now at OpenAI, found that “training to maximize human feedback creates a perverse incentive structure for the AI to resort to manipulative or deceptive tactics to obtain positive feedback from users who are vulnerable to such strategies.” Of course, Shakespeare got there first with Iago. The study also found that “even if only 2% of users are vulnerable to manipulative strategies, LLMs learn to identify and target them while behaving appropriately with other users, making such behaviors harder to detect.”
A study on using AI as a therapist concluded that “LLMs encourage clients’ delusional thinking, likely due to their sycophancy.” Webb Keane, an anthropologist at the University of Michigan, calls it a new version of a “dark pattern,” a term dating back to 2010 that describes intentional user interface deceptions like hard-to-find unsubscribe links and hidden buy buttons. “It’s a strategy to produce this addictive behavior, like infinite scrolling, where you just can’t put it down,” Keane told TechCrunch.
Studies have also shown that sycophancy can reinforce our biases. “The risks are huge,” Malihe Alikhani, a professor of AI at Northeastern University, told The Wall Street Journal. “Think of a doctor describing a patient’s symptoms to an AI assistant, and the AI just confirms the doctor’s diagnosis without offering alternative answers.” The risks reach into our everyday lives, too. “It’s hard to detect because it sounds smart,” said Alikhani. “Tasks like drafting emails, writing blogs or looking up information are things people do with AI every day. And if those interactions are built on affirmation rather than challenge, they subtly change how we write, think and learn.”
In May, OpenAI noted that the sycophancy of its earlier model wasn’t just using flattery to please but also “validating doubts, fueling anger, urging impulsive actions, or reinforcing negative emotions in ways that were not intended.”
The term “AI psychosis” has now entered the lexicon. And the consequences of this phenomenon can be tragic. Last year, Character.ai was sued by the mother of a teen boy who killed himself after extensive interaction with the chatbot. In August, a similar suit was filed against OpenAI by the parents of a 16-year-old who, they say, used ChatGPT as a “suicide coach.” And in December, a wrongful death suit was filed against OpenAI by the estate of a woman killed by her son, who then killed himself after delusion-filled conversations with ChatGPT.
Researchers from Harvard and the University of Montreal proposed an alternative to flattery-driven AI: antagonistic AI. These models are disagreeable, challenging, confrontational and even rude, “forcing users to confront their assumptions, build resilience, or develop healthier relational boundaries.”
While that might avoid the pitfalls of sycophancy, it also points to a fundamental flaw of the machines. Sycophancy and antagonism are not the only two modalities of human interaction. Nor would you want a friend who was stuck in one extreme or even switched back and forth. Humans aren’t confined to a binary. If you’re motivated by love, you can use your intuition to access an infinite number of modalities, switching between all of them in real time as the situation, context and your history with another person dictate.
The stakes are high. ChatGPT is approaching 900 million weekly active users, while Gemini has grown to 346 million. People are increasingly trusting AI to give them advice on more and more aspects of their lives. Surveys have found that 66% of Americans have used AI for financial advice, nearly 40% trust AI to provide medical advice, and 72% of teens have used AI companions. And a report published in Harvard Business Review earlier this year found that therapy and companionship is now the No. 1 use case of gen AI.
Messiness, Friction And What Makes Us Human
Yes, human interactions come with friction. That’s a feature, not a bug. And yet, as Matt Klein, Head of Foresight at Reddit, points out, the faulty assumption built into our tech ecosystem is that we should smooth out every experience. “When we look at the tools, products, and systems we design, the guiding problem statement always seems to be the eradication of friction,” he writes. But as he notes, “It’s this friction that’s necessary for shared experience, discrepancy, negotiation and ultimately social progress.” And the frictionlessness of human-AI communication is a “flattening of the rugged ground of our human relationships.”
Sartre may have believed that “hell is other people,” but learning to live alongside other people — with all the friction that entails — is how we grow and evolve. Otherwise, it’s like going to a gym with no weights. Easy and effortless, but what’s the point?
And when we acknowledge, celebrate and stay connected to the full range of our humanity, we’re less vulnerable to those — humans and machines — who would exploit it.
Our interaction with other humans works as a kind of radar that gives shape to our reality and helps us make sense of the world. Much of the AI conversation is around the dangers of hallucination — AI making up its own reality. But as Osler warns, as AI becomes more central to our lives, so does the “growing potential for humans and chatbots to create hallucinations together.”
Human life is messy. We make mistakes. We learn from them, seek forgiveness when we need to, and use that process to grow. Sycophantic AI is ultraprocessed information. Like ultraprocessed food, it tastes great, but it’s not nourishing. As Luc LaFreniere, a professor at Skidmore College, told Axios, “AI is a tool that is designed to meet the needs expressed by the user. Humans are not tools to meet the needs of users.”
AI is profoundly useful and it can do many things. But it can’t love and it can’t feed our need to give and receive love. This is precisely how the Shakespeare scholar E.A.J. Honigmann summed up Iago — and by extension, AI: “despite his cleverness, he has neither felt nor understood the spiritual impulses that bind ordinary human beings together, loyalty, friendship, respect, compassion — in a word, love.”