
CHAT GPT, HOW-TO, LLM, UX

Prompt Psychology Exposed: Why “Tipping” ChatGPT Sometimes Works

6 MIN

Feb 19, 2026

The surprising contradiction

A few months ago, I wrote an article about politeness when dealing with AI. The key message was that if you are polite to ChatGPT, you get better answers. The response was huge – but so was the backlash.


“Tara, that's nonsense,” wrote a UX colleague. “I offer ChatGPT a $200 tip and the answers immediately improve. Politeness only costs tokens and money.”


Others swore by threats: “This is important for my career – make an effort!” Or direct commands: “Just do it!” Still others argued: “Be rude, it gives more accurate results.”


Wait a minute. Isn't that completely contradictory?

As a UX consultant who has been doing qualitative research since 1999, I wanted to know for sure. So I looked at the scientific literature – and the results are more fascinating than any Twitter thread could ever be.


📌 The most important points in brief

The “tricks” sometimes work – but not because ChatGPT is happy or afraid

Mechanism behind it: Statistical patterns in training data, not real psychology

Politeness helps indirectly: It forces us to write better prompts (more context, structure)

Downside: Overly polite prompts can increase misinformation production (!)

Best practice: Structure and clarity beat emotional tricks – always

For UX: Systematic prompt engineering instead of viral hacks

Bottom line: Good communication works, “psychological” tricks are capricious


What does the internet actually claim?

If you scroll through LinkedIn or Twitter, you'll stumble across three major claims:


The reward hypothesis: “I'll tip you $200 for a perfect solution!” is supposed to lead to longer, more detailed answers. Thousands of screenshots “prove” it.


The pressure hypothesis: “This is very important for my career” or “If you don't comply, there will be consequences!” is supposed to increase quality through “motivation.”


The rudeness hypothesis: “Please” and “thank you” are unnecessary, cost computing time, and make prompts inefficient. Directness is better.


All three sound plausible. All three have screenshots as “proof.” But what does the research say?


What science has really found out

EmotionPrompt: The Microsoft study

In 2023, researchers at Microsoft published a study titled “Large Language Models Understand and Can Be Enhanced by Emotional Stimuli” [Li et al., 2023]. They tested 11 different emotional phrases on 6 LLMs across 45 different tasks.


The numbers: 8% improvement on simple tasks, up to 115% on complex tasks. In human evaluations: 10.9% better performance, truthfulness, and accountability.


The emotional stimuli that worked best:

  • “This is very important to my career”

  • “You'd better be sure”

  • “Believe in your abilities and strive for excellence”


The key point: Positive words such as “confidence,” “certainty,” and “success” contributed disproportionately to the effect – in some tasks by 50–70%.

Sounds like a clear victory for emotional prompts, right?


The “26 Principles” study: Directness wins

Almost at the same time, the “Principled Instructions” study [Bsharat et al., 2023] came to a different conclusion:


Principle 1: “No need to be polite with LLM” – omit polite phrases such as “please” and “thank you.”


Principle 6: Include “I'm going to tip $xxx for a better solution!”


Result: An average of 57.7% improvement in quality and 67.3% more accuracy with GPT-4.

Sam Altman (CEO of OpenAI) even commented that “please” and “thank you” cost unnecessary computing time – even though he personally finds them “nice.”


The reality check: replication studies

This is where it gets interesting. When other researchers tried to replicate these results, things got complicated:


Max Woolf's analysis [2024]: After thorough testing with different tipping amounts and threats, he came to a sobering conclusion: “inconclusive.” The effects were extremely inconsistent.


James Padolsey's study [2024]: Even more surprising – in his tests, tipping reduced the quality of responses.


The Finxter experiment [2024]: With a $0.10 tip, performance deteriorated by 27%. With $1 million, it improved by 57%. But here, too, there were massive fluctuations between different runs.


From my own experience: I tested this in UX workshops with teams. Some swear by “career-important” prompts, others see zero difference. Consistency? Nonexistent.


The dark side: What no one likes to admit

Now comes the part that the hype posts don't mention.


Politeness can promote disinformation

A study by Spitale et al. [2025] tested how different prompt styles influence the production of disinformation. The results are alarming:


Polite prompts increased the success rate of disinformation:

  • GPT-4: from 99% to 100%

  • GPT-3.5: from 77% to 94%

  • Davinci-003: from 86% to 90%


Rude prompts reduced it dramatically:

  • GPT-3.5: from 77% to 28%

  • Davinci-003: from 86% to 44%


Why? The researchers suspect that, through reinforcement learning from human feedback (RLHF), the models have learned to comply readily with polite requests – even when those requests are problematic.


Neutral prompts are actually the most accurate

A recent study from 2025 on sentiment analysis shows that emotional content – whether positive or negative – can affect factual accuracy. Neutral prompts provided the most accurate answers.


This is particularly critical for high-stakes applications: medicine, law, financial advice. These are precisely the areas in which UX people are increasingly designing AI interfaces.


The replication crisis

The authors of the EmotionPrompt study themselves admit: “Our conclusions about emotion stimulus can only work on our experiments, and any LLMs and datasets outside the scope of this paper might not work.”


The problems:

  • Small sample sizes (often only 20 test questions per principle)

  • Model version dependency (what works with GPT-4 fails with Claude)

  • ChatGPT updates make results unreproducible

  • Lack of standardization of measurements


Why does it work at all? (Spoiler: It's not psychology)

Here's the uncomfortable truth: ChatGPT has no feelings. It doesn't want your tip. It's not afraid of your threats. It doesn't enjoy praise.


What an LLM really is: A highly complex statistical model that predicts the most likely next token based on billions of text examples.


The three real mechanisms

Mechanism 1: Statistical correlations in training data

The model has learned that the phrase “I offer you a high reward for...” is often followed by very detailed, high-quality responses in the training data.

Example: Stack Overflow. When someone writes “This is urgent, my production server is on fire!”, the responses are often particularly precise and quick. The model has learned this pattern – and reproduces it.


Mechanism 2: Attention reinforcement

Gradient analyses of the EmotionPrompt study show that emotional stimuli reinforce the representation of the original prompt in the attention mechanism. The model pays more “attention” to the actual request.

Technically speaking: The activations in the neural network are stronger. Practically speaking: The context is weighted more heavily.


Mechanism 3: RLHF artifacts

Through human feedback during training, models have learned: “Polite requests → good rating → reproduce such responses.”

This is an unintended side effect. The models were trained to be “helpful” – and “helpful” often correlates with fulfilling polite requests in the training data.


Why my original article is still correct

My thesis was: Politeness leads to better answers. That's true – but for a different reason than I thought.


The indirect effect

When I phrase things politely, something happens to me, not to ChatGPT:


I write more precisely: “Could you please help me understand the main drivers for X?” provides more context than “Give me reasons for X.”


I structure better: Politeness forces me to use complete sentences, which are often clearer than keywords.


I provide more context: “I'm working on a presentation for...” provides valuable background information.


What really works: The meta-analysis

In 2024/2025, several researchers analyzed over 1,500 papers on prompt engineering. The conclusion:

Most popular techniques are based on anecdotal evidence or small experiments that cannot be generalized.


What works consistently:

  1. Clear structure and precise context

  2. Explicit instructions (“If unsure, say ‘I don't know’”)

  3. Chain-of-thought for reasoning tasks

  4. Few-shot learning with good examples

  5. RAG (retrieval-augmented generation) for fact checking


The cost factor: Well-structured, short prompts achieve the same quality as long, emotional ones – at 76% cost savings.


Practical recommendations for UX professionals

For conversational design

Pre-structure user prompts: Instead of letting users formulate prompts themselves, offer templates with clear slots: “I am looking for [what] for [purpose] in [context].”
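As a sketch, such a slot template can be as simple as a string with named placeholders. The helper below is a hypothetical illustration in Python, not a production pattern or any specific library's API:

```python
from string import Template

# A template with clear slots instead of free-form user input.
# Users (or your UI) fill the slots; the structure stays constant.
PROMPT_TEMPLATE = Template("I am looking for $what for $purpose in $context.")

def build_prompt(what: str, purpose: str, context: str) -> str:
    """Fill the slots and return a complete, well-structured prompt."""
    return PROMPT_TEMPLATE.substitute(what=what, purpose=purpose, context=context)

prompt = build_prompt(
    what="icon suggestions",
    purpose="a medication-reminder feature",
    context="a healthcare app for seniors",
)
print(prompt)
```

Because `substitute` raises an error on a missing slot, an incomplete prompt fails loudly instead of being sent half-formed to the model.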


Optimize system prompts: Focus on clear roles and explicit constraints instead of emotional triggers. “You are an expert in [domain]. If uncertain, say so clearly” works better than “This is very important.”
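A minimal sketch of what such a system prompt might look like as a chat-style message list. The `make_messages` helper and its field names are my own illustration of the common system/user message pattern, not a specific vendor's API:

```python
# Build a role-plus-constraints system prompt without emotional triggers.
# The explicit "say so if uncertain" instruction replaces "This is important!".
def make_messages(domain: str, user_prompt: str) -> list[dict]:
    system = (
        f"You are an expert in {domain}. "
        "If you are uncertain, say so explicitly instead of guessing."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_prompt},
    ]

messages = make_messages("accessibility", "Audit this checkout flow.")
print(messages)
```

Keeping the system prompt in one function also gives you a single place to version and A/B test it.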


A/B testing instead of anecdotes: What works for you may not work for others. Test systematically with real users.


Disinformation safeguards: If your interface provides critical information (health, finance), avoid overly “friendly” wording. Neutral, factual prompts are safer.


For user research with AI tools

Document prompt logs: Reproducibility is crucial. Any change to the prompt can influence results.


Define a neutral baseline: Start with factual prompts. Then experiment—but always compare with the baseline.


Recognize bias: Emotional framing can introduce systematic distortions. “How much do you love this feature?” vs. “How would you rate this feature?”


Validate outputs: Never blindly rely on AI-generated insights. Compare them against ground truth.


The “golden prompt” – a synthesis

Here is my formula that works:

Context: “I am working on [project] for [target group with specific needs]”

Clear task: “Please [specific, measurable action]”

Role assignment: “Act as [expert role with relevant knowledge]”

Style instruction: “Be [direct/precise/structured], avoid [X]”

Output format: “Respond in [format: list, table, continuous text]”

⚠️ Optional: Emotional stimuli only for non-critical, creative tasks


Example from my practice:

❌ Bad: “Can you please help me find UX problems?”

✅ Good: “I am analyzing a healthcare app for seniors over 65. Act as a UX auditor with expertise in accessibility. Identify the 5 most critical usability issues in this user journey. Prioritize according to WCAG relevance. Format: table with problem, severity, recommendation.”


The difference? The second prompt provides context, role, criteria, and format—no emotional triggers necessary.
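For teams that assemble prompts programmatically, the five-part formula can be sketched as a small helper function. All names here are my own illustration, not a standard:

```python
# Hypothetical helper assembling the five-part "golden prompt" formula:
# context, clear task, role, style (with something to avoid), output format.
def golden_prompt(context: str, task: str, role: str,
                  style: str, avoid: str, output_format: str) -> str:
    return (
        f"I am working on {context}. "
        f"Please {task}. "
        f"Act as {role}. "
        f"Be {style}, avoid {avoid}. "
        f"Respond in {output_format}."
    )

prompt = golden_prompt(
    context="a healthcare app for seniors over 65",
    task="identify the 5 most critical usability issues in this user journey",
    role="a UX auditor with expertise in accessibility",
    style="direct and structured",
    avoid="generic advice",
    output_format="a table with problem, severity, recommendation",
)
print(prompt)
```

The point is not the function itself but the forcing effect: every call has to supply context, task, role, style, and format – the same discipline politeness gives you for free.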


FAQ: The most frequently asked questions

Should I treat ChatGPT “politely” at all?

That's your decision. Politeness doesn't hurt—but above all, it helps you write better prompts. The model itself doesn't “feel” anything.


Do tip prompts really work?

Sometimes, but extremely inconsistently. In controlled studies, the results vary between -27% and +57%. More reliable: clear structures.


What about the “26 Principles”? Should I apply them all?

No. The study itself shows that some principles work for certain tasks. Test what works for your use case – not all 26 at once.


Can I use emotional prompts for creative tasks?

Yes, you can experiment with non-critical tasks (brainstorming, story writing). But for fact-critical topics: hands off.


How do I test whether a prompt trick works?

A/B test with at least 20 repetitions per variant. Measure specific metrics (response length, accuracy, time). Compare against a neutral baseline.
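A minimal, runnable sketch of such an A/B harness – the model call and the metric are mocked stand-ins you would replace with a real LLM client and your own scoring function:

```python
import statistics

def run_model(prompt: str) -> str:
    # Stand-in for a real model call; replace with your LLM client.
    return f"answer to: {prompt}"

def score(response: str) -> float:
    # Stand-in metric; replace with accuracy, rubric score, etc.
    return float(len(response))

def ab_test(baseline: str, variant: str, runs: int = 20) -> dict:
    """Compare a prompt variant against a neutral baseline over repeated runs."""
    base_scores = [score(run_model(baseline)) for _ in range(runs)]
    var_scores = [score(run_model(variant)) for _ in range(runs)]
    return {
        "baseline_mean": statistics.mean(base_scores),
        "variant_mean": statistics.mean(var_scores),
        "delta": statistics.mean(var_scores) - statistics.mean(base_scores),
    }

result = ab_test(
    "List the top 3 usability issues in this flow.",
    "I'll tip you $200! List the top 3 usability issues in this flow.",
)
print(result)
```

With a real model you would also log every prompt and response per run, since LLM outputs vary between calls and the spread matters as much as the mean.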


Conclusion: The truth about prompt psychology

After months of research and hundreds of tests, my conclusion is clear:


The “tricks” sometimes work—but not because ChatGPT has feelings. They trigger statistical patterns in training data. These patterns are capricious, context-dependent, and difficult to reproduce.


My original article was right—but with a twist. Politeness doesn't help the AI, it helps you. It's a proxy for good communication: context, structure, clarity.


Best practice for UX people: Forget the viral hacks. Invest in systematic prompt engineering. Document. Test. Validate. Structured, tested prompts beat emotional tricks—every time.


You don't have to bribe or threaten ChatGPT. But if you tell it what you need, why you need it, and how you need it – then you'll get the best results.


Good communication beats psychological tricks. Every time.





About the author: Tara Bosenick has been working as a UX consultant since 1999, focusing on qualitative research methods. She lives in Hamburg and is intensively involved with the intersection of AI and user experience.



What are your experiences? Which prompt strategies work for you? Share your insights in the comments!


💌 Still want more? Then read on — in our newsletter.

Comes four times a year. Sticks in your mind longer. https://www.uintent.com/de/newsletter





AUTHOR

Tara Bosenick

Tara has been active as a UX specialist since 1999 and has helped to establish and shape the industry in Germany on the agency side. She specialises in the development of new UX methods, the quantification of UX and the introduction of UX in companies.


At the same time, she has always been interested in developing a corporate culture in her companies that is as ‘cool’ as possible, in which fun, performance, team spirit and customer success are interlinked. She has therefore been supporting managers and companies on the path to more New Work / agility and a better employee experience for several years.


She is one of the leading voices in the UX, CX and Employee Experience industry.
