
CHAT GPT, HOW-TO, LLM, UX

Prompt Psychology Exposed: Why “Tipping” ChatGPT Sometimes Works

6 MIN

Feb 19, 2026

The surprising contradiction

A few months ago, I wrote an article about politeness when dealing with AI. The key message was that if you are polite to ChatGPT, you get better answers. The response was huge – but so was the backlash.


“Tara, that's nonsense,” wrote a UX colleague. “I offer ChatGPT a $200 tip and the answers immediately improve. Politeness only costs tokens and money.”


Others swore by threats: “This is important for my career – make an effort!” Or direct commands: “Just do it!” Still others argued: “Be rude, it gives more accurate results.”


Wait a minute. Isn't that completely contradictory?

As a UX consultant who has been doing qualitative research since 1999, I wanted to know for sure. So I looked at the scientific literature – and the results are more fascinating than any Twitter thread could ever be.


📌 The most important points in brief

The “tricks” sometimes work – but not because ChatGPT is happy or afraid

Mechanism behind it: Statistical patterns in training data, not real psychology

Politeness helps indirectly: It forces us to write better prompts (more context, structure)

Downside: Overly polite prompts can increase misinformation production (!)

Best practice: Structure and clarity beat emotional tricks – always

For UX: Systematic prompt engineering instead of viral hacks

Bottom line: Good communication works, “psychological” tricks are capricious


What does the internet actually claim?

If you scroll through LinkedIn or Twitter, you'll stumble across three major claims:


The reward hypothesis: “I'll tip you $200 for a perfect solution!” is supposed to lead to longer, more detailed answers. Thousands of screenshots “prove” it.


The pressure hypothesis: “This is very important for my career” or “If you don't comply, there will be consequences!” is supposed to increase quality through “motivation.”


The rudeness hypothesis: “Please” and “thank you” are unnecessary, cost computing time, and make prompts inefficient. Directness is better.


All three sound plausible. All three have screenshots as “proof.” But what does the research say?


What science has really found out

EmotionPrompt: The Microsoft study

In 2023, researchers at Microsoft published a study titled “Large Language Models Understand and Can Be Enhanced by Emotional Stimuli” [Li et al., 2023]. They tested 11 different emotional phrases on 6 LLMs across 45 different tasks.


The numbers: 8% improvement on simple tasks, up to 115% on complex tasks. In human evaluations: 10.9% better performance, truthfulness, and accountability.


The emotional stimuli that worked best:

  • “This is very important to my career”

  • “You'd better be sure”

  • “Believe in your abilities and strive for excellence”


The key point: Positive words such as “confidence,” “certainty,” and “success” contributed disproportionately to the effect – in some tasks by 50–70%.

Sounds like a clear victory for emotional prompts, right?


The “26 Principles” study: Directness wins

Almost at the same time, the “Principled Instructions” study [Bsharat et al., 2023] came to a different conclusion:


Principle 1: “No need to be polite with LLM” – omit polite phrases such as “please” and “thank you.”


Principle 6: Include “I'm going to tip $xxx for a better solution!”


Result: An average of 57.7% improvement in quality and 67.3% more accuracy with GPT-4.

Sam Altman (CEO of OpenAI) even commented that “please” and “thank you” cost unnecessary computing time – even though he personally finds them “nice.”


The reality check: replication studies

This is where it gets interesting. When other researchers tried to replicate these results, things got complicated:


Max Woolf's analysis [2024]: After thorough testing with different tipping amounts and threats, he came to a sobering conclusion: “inconclusive.” The effects were extremely inconsistent.


James Padolsey's study [2024]: Even more surprising – in his tests, tipping reduced the quality of responses.


The Finxter experiment [2024]: With a $0.10 tip, performance deteriorated by 27%. With $1 million, it improved by 57%. But here, too, there were massive fluctuations between different runs.


From my own experience: I tested this in UX workshops with teams. Some swear by “career-important” prompts, others see zero difference. Consistency? Nonexistent.


The dark side: What no one likes to admit

Now comes the part that the hype posts don't mention.


Politeness can promote disinformation

A study by Spitale et al. [2025] tested how different prompt styles influence the production of disinformation. The results are alarming:


Polite prompts increased the success rate of disinformation:

  • GPT-4: from 99% to 100%

  • GPT-3.5: from 77% to 94%

  • Davinci-003: from 86% to 90%


Rude prompts reduced it dramatically:

  • GPT-3.5: from 77% to 28%

  • Davinci-003: from 86% to 44%


Why? The researchers suspect that, through reinforcement learning from human feedback (RLHF), the models have learned to comply readily with polite requests – even when those requests are problematic.


Neutral prompts are actually the most accurate

A recent study from 2025 on sentiment analysis shows that emotional content – whether positive or negative – can affect factual accuracy. Neutral prompts provided the most accurate answers.


This is particularly critical for high-stakes applications: medicine, law, financial advice. These are precisely the areas in which UX people are increasingly designing AI interfaces.


The replication crisis

The authors of the EmotionPrompt study themselves admit: “Our conclusions about emotion stimulus can only work on our experiments, and any LLMs and datasets outside the scope of this paper might not work.”


The problems:

  • Small sample sizes (often only 20 test questions per principle)

  • Model version dependency (what works with GPT-4 fails with Claude)

  • ChatGPT updates make results unreproducible

  • Lack of standardization of measurements


Why does it work at all? (Spoiler: It's not psychology)

Here's the uncomfortable truth: ChatGPT has no feelings. It doesn't want your tip. It's not afraid of your threats. It doesn't enjoy praise.


What an LLM really is: A highly complex statistical model that predicts the most likely next token based on billions of text examples.


The three real mechanisms

Mechanism 1: Statistical correlations in training data

The model has learned that the phrase “I offer you a high reward for...” is often followed by very detailed, high-quality responses in the training data.

Example: Stack Overflow. When someone writes “This is urgent, my production server is on fire!”, the responses are often particularly precise and quick. The model has learned this pattern – and reproduces it.


Mechanism 2: Attention reinforcement

Gradient analyses of the EmotionPrompt study show that emotional stimuli reinforce the representation of the original prompt in the attention mechanism. The model pays more “attention” to the actual request.

Technically speaking: The activations in the neural network are stronger. Practically speaking: The context is weighted more heavily.


Mechanism 3: RLHF artifacts

Through human feedback during training, models have learned: “Polite requests → good rating → reproduce such responses.”

This is an unintended side effect. The models were trained to be “helpful” – and “helpful” often correlates with fulfilling polite requests in the training data.


Why my original article is still correct

My thesis was: Politeness leads to better answers. That's true – but for a different reason than I thought.


The indirect effect

When I phrase things politely, something happens to me, not to ChatGPT:


I write more precisely: “Could you please help me understand the main drivers for X?” provides more context than “Give me reasons for X.”


I structure better: Politeness forces me to use complete sentences, which are often clearer than keywords.


I provide more context: “I'm working on a presentation for...” provides valuable background information.


What really works: The meta-analysis

In 2024/2025, several researchers analyzed over 1,500 papers on prompt engineering. The conclusion:

Most popular techniques are based on anecdotal evidence or small experiments that cannot be generalized.


What works consistently:

  1. Clear structure and precise context

  2. Explicit instructions (“If unsure, say ‘I don't know’”)

  3. Chain-of-thought for reasoning tasks

  4. Few-shot learning with good examples

  5. RAG (retrieval-augmented generation) for fact checking


The cost factor: Well-structured, short prompts achieve the same quality as long, emotional ones – at 76% cost savings.


Practical recommendations for UX professionals

For conversational design

Pre-structure user prompts: Instead of letting users formulate prompts themselves, offer templates with clear slots: “I am looking for [what] for [purpose] in [context].”
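As a sketch, such a slot template can be as simple as a string with named placeholders. The helper below is a hypothetical illustration in Python, not a production pattern or any specific library's API:

```python
from string import Template

# A template with clear slots instead of free-form user input.
# Users (or your UI) fill the slots; the structure stays constant.
PROMPT_TEMPLATE = Template("I am looking for $what for $purpose in $context.")

def build_prompt(what: str, purpose: str, context: str) -> str:
    """Fill the slots and return a complete, well-structured prompt."""
    return PROMPT_TEMPLATE.substitute(what=what, purpose=purpose, context=context)

prompt = build_prompt(
    what="icon suggestions",
    purpose="a medication-reminder feature",
    context="a healthcare app for seniors",
)
print(prompt)
```

Because `substitute` raises an error on a missing slot, an incomplete prompt fails loudly instead of being sent half-formed to the model.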


Optimize system prompts: Focus on clear roles and explicit constraints instead of emotional triggers. “You are an expert in [domain]. If uncertain, say so clearly” works better than “This is very important.”
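A minimal sketch of what such a system prompt might look like as a chat-style message list. The `make_messages` helper and its field names are my own illustration of the common system/user message pattern, not a specific vendor's API:

```python
# Build a role-plus-constraints system prompt without emotional triggers.
# The explicit "say so if uncertain" instruction replaces "This is important!".
def make_messages(domain: str, user_prompt: str) -> list[dict]:
    system = (
        f"You are an expert in {domain}. "
        "If you are uncertain, say so explicitly instead of guessing."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

messages = make_messages("accessibility", "Audit this checkout flow.")
print(messages)
```

Keeping the system prompt in one function also gives you a single place to version and A/B test it.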


A/B testing instead of anecdotes: What works for you may not work for others. Test systematically with real users.


Disinformation safeguards: If your interface provides critical information (health, finance), avoid overly “friendly” wording. Neutral, factual prompts are safer.


For user research with AI tools

Document prompt logs: Reproducibility is crucial. Any change to the prompt can influence results.


Define a neutral baseline: Start with factual prompts. Then experiment—but always compare with the baseline.


Recognize bias: Emotional framing can introduce systematic distortions. “How much do you love this feature?” vs. “How would you rate this feature?”


Validate outputs: Never blindly rely on AI-generated insights. Compare them against ground truth.


The “golden prompt” – a synthesis

Here is my formula that works:

Context: “I am working on [project] for [target group with specific needs]”

Clear task: “Please [specific, measurable action]”

Role assignment: “Act as [expert role with relevant knowledge]”

Style instruction: “Be [direct/precise/structured], avoid [X]”

Output format: “Respond in [format: list, table, continuous text]”

⚠️ Optional: Emotional stimuli only for non-critical, creative tasks


Example from my practice:

❌ Bad: “Can you please help me find UX problems?”

✅ Good: “I am analyzing a healthcare app for seniors over 65. Act as a UX auditor with expertise in accessibility. Identify the 5 most critical usability issues in this user journey. Prioritize according to WCAG relevance. Format: table with problem, severity, recommendation.”


The difference? The second prompt provides context, role, criteria, and format—no emotional triggers necessary.
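For teams that assemble prompts programmatically, the five-part formula can be sketched as a small helper function. All names here are my own illustration, not a standard:

```python
# Hypothetical helper assembling the five-part "golden prompt" formula:
# context, clear task, role, style (with something to avoid), output format.
def golden_prompt(context: str, task: str, role: str,
                  style: str, avoid: str, output_format: str) -> str:
    return (
        f"I am working on {context}. "
        f"Please {task}. "
        f"Act as {role}. "
        f"Be {style}, avoid {avoid}. "
        f"Respond in {output_format}."
    )

prompt = golden_prompt(
    context="a healthcare app for seniors over 65",
    task="identify the 5 most critical usability issues in this user journey",
    role="a UX auditor with expertise in accessibility",
    style="direct and structured",
    avoid="generic advice",
    output_format="a table with problem, severity, recommendation",
)
print(prompt)
```

The point is not the function itself but the forcing effect: every call has to supply context, task, role, style, and format – the same discipline politeness gives you for free.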


FAQ: The most frequently asked questions

Should I treat ChatGPT “politely” at all?

That's your decision. Politeness doesn't hurt—but above all, it helps you write better prompts. The model itself doesn't “feel” anything.


Do tip prompts really work?

Sometimes, but extremely inconsistently. In controlled studies, the results vary between -27% and +57%. More reliable: clear structures.


What about the “26 Principles”? Should I apply them all?

No. The study itself shows that some principles work for certain tasks. Test what works for your use case – not all 26 at once.


Can I use emotional prompts for creative tasks?

Yes, you can experiment with non-critical tasks (brainstorming, story writing). But for fact-critical topics: hands off.


How do I test whether a prompt trick works?

A/B test with at least 20 repetitions per variant. Measure specific metrics (response length, accuracy, time). Compare against a neutral baseline.
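A minimal, runnable sketch of such an A/B harness – the model call and the metric are mocked stand-ins you would replace with a real LLM client and your own scoring function:

```python
import statistics

def run_model(prompt: str) -> str:
    # Stand-in for a real model call; replace with your LLM client.
    return f"answer to: {prompt}"

def score(response: str) -> float:
    # Stand-in metric; replace with accuracy, rubric score, etc.
    return float(len(response))

def ab_test(baseline: str, variant: str, runs: int = 20) -> dict:
    """Compare a prompt variant against a neutral baseline over repeated runs."""
    base_scores = [score(run_model(baseline)) for _ in range(runs)]
    var_scores = [score(run_model(variant)) for _ in range(runs)]
    return {
        "baseline_mean": statistics.mean(base_scores),
        "variant_mean": statistics.mean(var_scores),
        "delta": statistics.mean(var_scores) - statistics.mean(base_scores),
    }

result = ab_test(
    "List the top 3 usability issues in this flow.",
    "I'll tip you $200! List the top 3 usability issues in this flow.",
)
print(result)
```

With a real model you would also log every prompt and response per run, since LLM outputs vary between calls and the spread matters as much as the mean.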


Conclusion: The truth about prompt psychology

After months of research and hundreds of tests, my conclusion is clear:


The “tricks” sometimes work—but not because ChatGPT has feelings. They trigger statistical patterns in training data. These patterns are capricious, context-dependent, and difficult to reproduce.


My original article was right—but with a twist. Politeness doesn't help the AI, it helps you. It's a proxy for good communication: context, structure, clarity.


Best practice for UX people: Forget the viral hacks. Invest in systematic prompt engineering. Document. Test. Validate. Structured, tested prompts beat emotional tricks—every time.


You don't have to bribe or threaten ChatGPT. But if you tell it what you need, why you need it, and how you need it – then you'll get the best results.


Good communication beats psychological tricks. Every time.





About the author: Tara Bosenick has been working as a UX consultant since 1999, focusing on qualitative research methods. She lives in Hamburg and is intensively involved with the intersection of AI and user experience.



What are your experiences? Which prompt strategies work for you? Share your insights in the comments!


💌 Still want more? Then read on — in our newsletter.

Comes four times a year. Sticks in your mind longer. https://www.uintent.com/de/newsletter





AUTHOR

Tara Bosenick

Tara has been active as a UX specialist since 1999 and has helped to establish and shape the industry in Germany on the agency side. She specialises in the development of new UX methods, the quantification of UX and the introduction of UX in companies.


At the same time, she has always been interested in developing a corporate culture in her companies that is as ‘cool’ as possible, in which fun, performance, team spirit and customer success are interlinked. She has therefore been supporting managers and companies on the path to more New Work / agility and a better employee experience for several years.


She is one of the leading voices in the UX, CX and Employee Experience industry.
