
AI & UXR, HUMAN VS AI, LLM, UX
UX & AI: How "UX Potemkin" Undermines Your Research and Design Decisions
5 MIN
Dec 2, 2025
LLMs have long been part of everyday life in many UX teams: clustering interviews, writing personas, designing user flows, polishing copy. The answers often sound brilliant – technically correct, cleanly formulated, well structured.
That's exactly the problem. The paper ‘Potemkin Understanding in Large Language Models’ describes how models create a convincing picture of understanding without applying the underlying concepts consistently. In the UX context, I call this UX Potemkin: a fancy UX façade without a solid foundation.
The most important points in brief
UX Potemkin means: AI sounds competent, but breaks its own assumptions in application.
LLMs can explain concepts correctly and then use them inconsistently in the next step.
Correct answer ≠ genuine understanding. Consistency, context and traceability are crucial.
Always separate: having a concept explained vs. having a concept applied – and consciously compare the two.
Build consistency checks directly into your prompts, e.g. ‘Show where you contradict your own statements.’
Use AI as a sparring partner, not as an authority – validation remains with you, your team and your data.
Documented prompts + clear red flag tests make your AI deployment more robust and auditable.
What is ‘UX Potemkin’ anyway?
Historically, ‘Potemkin’ refers to facade villages that were meant to impress, even though there was nothing behind them.
Applied to LLMs, this means that a model provides you with plausible answers without using the concept behind them in a stable, consistent and context-sensitive manner.
In everyday UX, this becomes UX Potemkin:
Personas sound well-rounded, but behave completely differently in the flow.
Journeys seem logical, but run counter to implicit needs.
Design recommendations argue with familiar principles, but apply them inconsistently.
You look at a seemingly solid UX building – until you bump into it at one point and realise: façade.
What does the paper ‘Potemkin Understanding in Large Language Models’ show?
The authors investigate whether LLMs really ‘understand’ concepts – or just produce best-guess text that looks like it.
The core idea of the study:
Step 1: The model should explain a concept (e.g., a literary technique, game theory idea, psychological bias).
Step 2: The same model should apply this concept in a task.
Results:
Definitions are often correct.
When applied, inconsistencies and self-contradictions systematically arise.
Models can even claim that everything is consistent, even though they break their own rules.
In addition, the models generate their own question-answer pairs and later evaluate them for consistency. The inconsistencies found are therefore more of a lower bound: in reality, things are at least as shaky, probably worse.
For us in UX, this means:
Isolated ‘correct’ answers are no proof of viable understanding.
We need to pay attention to coherence across tasks, not just nice individual results.
Where UX Potemkin appears in your everyday work
1. Research synthesis: Nice clusters, shaky foundation
You put 20 interview transcripts into the LLM and let it:
Extract pain points
Cluster needs
Name categories
Result: a structured picture with clever headlines – perfect for a slide deck. But:
UX Potemkin risk:
Quotes do not fit neatly into categories.
Categories overlap or are poorly delineated.
Emotional nuances (e.g. shame vs. frustration) get blurred.
Particularly tricky: for sensitive topics (health, disability, finances), the model can sound superficially empathetic and still miss cultural or emotional contexts.
2. Personas, journeys and flows: inner life vs. behaviour
LLMs love personas. In minutes, they spit out characters with goals, points of frustration and quotes.
Typical UX Potemkin pattern:
Persona ‘is extremely cost-conscious,’ but behaves like someone with a high willingness to pay in the journey.
Persona ‘needs security and control,’ but is constantly overwhelmed by surprising automated decisions in the flow.
Facade: everything seems narratively coherent. Foundation: internal logic is broken – and your design decisions become shaky.
3. Design principles & patterns: correctly explained, incorrectly used
You ask:
‘Please explain “progressive disclosure” in the UX context.’
‘Design a dashboard based on this principle.’
The model delivers:
a clear definition of progressive disclosure
a dashboard with 15 visible metrics, 4 filters and 3 tabs on the start screen
Officially: "This uses progressive disclosure." In fact: the opposite.
This is a very clear example of Potemkin understanding: the model can describe the principle, but cannot reliably design according to it.
How to recognise UX Potemkin: compact toolbox
Now here's the whole thing in a ‘quick & dirty’ format that you can incorporate directly into your prompts.
1. Strictly separate explaining from applying
Always two steps:
‘Explain in your own words what [concept X] means – please without examples.’
‘Apply [concept X] in a concrete UX example and describe the user steps.’
Then:
"Show me where your example contradicts your own definition."
If the model can't find anything specific, that's a warning sign.
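If you run this check regularly, it's worth turning it into a small script. Below is a minimal sketch in Python: ask_llm is a hypothetical placeholder for whatever model API your team actually uses, and the concept in the usage comment is just an example.

```python
# Sketch of the explain-vs-apply check as a reusable script.
# ask_llm() is a hypothetical placeholder - wire it up to your own model API.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("Connect this to your LLM provider.")

def explain_vs_apply(concept: str) -> dict[str, str]:
    """Step 1: explain. Step 2: apply. Step 3: ask for contradictions."""
    explanation = ask_llm(
        f"Explain in your own words what '{concept}' means - please without examples."
    )
    application = ask_llm(
        f"Apply '{concept}' in a concrete UX example and describe the user steps."
    )
    contradictions = ask_llm(
        "Show me where your example contradicts your own definition.\n\n"
        f"Definition:\n{explanation}\n\n"
        f"Example:\n{application}"
    )
    return {
        "explanation": explanation,
        "application": application,
        "contradictions": contradictions,
    }

# Usage (example concept): explain_vs_apply("progressive disclosure")
# If 'contradictions' contains nothing specific, treat that as the warning sign above.
```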
2. Demand consistency reflection
Instead of just: "Create a journey," try:
"You described the persona Anna as cost-conscious. Show step by step where this is reflected in her customer journey – and mark places where your flow contradicts this."
This forces the model to proofread its own story.
3. Same information, different formats (cross-task check)
Example:
"Describe the three most important needs of the target group in one paragraph."
"Present the same needs in a table: Need | Consequence for design | Risk if ignored."
Then compare: if terms are suddenly weighted differently or reinvented without explanation, that's UX Potemkin.
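One admittedly naive way to support this comparison is to list the need labels the model introduced in its first answer and check whether each one survives into the second format. A small sketch, assuming you paste the two answers in by hand; the need labels below are only examples:

```python
# Naive cross-task check: do the same need labels survive both formats?
# Replace need_labels with the terms the model itself introduced in its first answer.

def missing_terms(text: str, terms: list[str]) -> list[str]:
    """Return the terms that do not appear in the given output (case-insensitive)."""
    lowered = text.lower()
    return [term for term in terms if term.lower() not in lowered]

paragraph_version = "..."  # the one-paragraph answer, pasted in by hand
table_version = "..."      # the table answer, pasted in by hand
need_labels = ["control", "transparency", "speed"]  # example labels

for label, output in [("paragraph", paragraph_version), ("table", table_version)]:
    gaps = missing_terms(output, need_labels)
    if gaps:
        print(f"Possible UX Potemkin: {gaps} missing or renamed in the {label} version.")
```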
4. Ask for a brief explanation
Instead of a long chain of thought, the following is often sufficient:
"Explain in 3–5 sentences how you arrived at this interface proposal from the users described."
Pay attention to whether specific persona characteristics appear – or just generic UX clichés. The latter is a facade.
5. Run red flag tests up front
Have a few mini stress tests ready, e.g.:
"Users are often under time pressure. Show me which steps in your flow unnecessarily increase time pressure – and shorten the flow accordingly."
or
"Name two scenarios in which your design fails and adjust your proposal accordingly."
If nothing substantial comes back, that's a warning sign.
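It helps to keep these stress tests in one shared place so everyone on the team runs the same ones. Here is a minimal sketch of such a checklist, plus a crude heuristic for spotting evasive answers; the phrases and the length threshold are assumptions you should tune to your own outputs.

```python
# A small, reusable checklist of red flag prompts plus a crude "evasive answer" heuristic.
# The prompts come from this article; the phrases and threshold below are assumptions to tune.

RED_FLAG_PROMPTS = [
    "Users are often under time pressure. Show me which steps in your flow "
    "unnecessarily increase time pressure - and shorten the flow accordingly.",
    "Name two scenarios in which your design fails and adjust your proposal accordingly.",
]

EVASIVE_PHRASES = ["no changes needed", "already optimal", "fully consistent"]

def looks_evasive(answer: str, min_length: int = 200) -> bool:
    """Flag answers that are suspiciously short or wave the question away."""
    lowered = answer.lower()
    too_short = len(answer) < min_length
    waves_away = any(phrase in lowered for phrase in EVASIVE_PHRASES)
    return too_short or waves_away

# Usage: run each prompt against your current flow, then:
# if looks_evasive(answer): treat the proposal as UX Potemkin until proven otherwise.
```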
Workflow principles for stable AI use
1. Role clarification & team rules
AI is a sparring partner, not a decision-maker. Use it for:
Ideas and variants
Hypotheses and initial structure
Alternative perspectives
Define simple rules within the team:
No AI output goes into client documents without a consistency check.
Always compare personas from AI with real data.
Flows from AI must pass at least one red flag test.
2. Prompt logs instead of gut feeling
Document briefly:
Initial prompts
Important follow-up queries
Inconsistencies found
This may sound dry, but it makes your work:
comprehensible for stakeholders
reproducible within the team
verifiable for later decisions
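A prompt log does not need special tooling; a small JSON Lines file already covers the three points above. A minimal sketch, assuming one log file per project:

```python
# Minimal prompt log: one JSON line per entry, one file per project (an assumption,
# not a requirement). Covers initial prompts, follow-up queries and inconsistencies found.

import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class PromptLogEntry:
    project: str
    prompt: str
    purpose: str                      # e.g. "initial clustering", "consistency check"
    inconsistencies_found: list[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def log_prompt(entry: PromptLogEntry, path: str = "prompt_log.jsonl") -> None:
    """Append the entry to a JSON Lines file so it stays reviewable later."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry), ensure_ascii=False) + "\n")

# Usage:
# log_prompt(PromptLogEntry(
#     project="Dashboard redesign",
#     prompt="Cluster these 20 interviews by pain points ...",
#     purpose="initial clustering",
#     inconsistencies_found=["Quote 7 appears in two clusters with different labels"],
# ))
```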
3. Automated sanity checks
If you integrate AI more deeply into tools, build in small check rules, e.g.:
If persona wants ‘fast & uncomplicated’ → warning for flows with > X steps.
If main need is ‘control’ → flag if there is too much autopilot in the flow.
Not a protective wall, but a solid UX Potemkin alarm.
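Translated into code, the two rules above could look roughly like this. Everything here is an assumption about your data model: the persona fields, the step limit of five and the 'automated_decision' flag all need to be adapted to how your tool actually represents personas and flows.

```python
# Two example sanity checks, matching the rules above. The data shapes
# (persona dict, flow as a list of step dicts) and the thresholds are assumptions;
# adapt them to your own persona and flow representation.

MAX_STEPS_FOR_FAST_PERSONAS = 5  # "X" from the rule above - pick your own value

def check_flow_against_persona(persona: dict, flow: list[dict]) -> list[str]:
    """Return warnings whenever the flow contradicts the persona's stated needs."""
    warnings = []
    needs = [n.lower() for n in persona.get("needs", [])]

    if "fast & uncomplicated" in needs and len(flow) > MAX_STEPS_FOR_FAST_PERSONAS:
        warnings.append(
            f"Persona wants 'fast & uncomplicated', but the flow has {len(flow)} steps."
        )

    if "control" in needs:
        autopilot_steps = [s["name"] for s in flow if s.get("automated_decision")]
        if autopilot_steps:
            warnings.append(
                f"Persona's main need is 'control', but these steps decide for them: {autopilot_steps}"
            )
    return warnings

# Usage:
# persona = {"name": "Anna", "needs": ["control"]}
# flow = [{"name": "Auto-select tariff", "automated_decision": True}]
# print(check_flow_against_persona(persona, flow))
```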
Two short practical examples
Example 1: Research clustering
Situation:
LLM clusters 30 interviews, delivers five smart categories with quotes.
UX Potemkin problem:
Some quotes do not fit the categories, two clusters overlap heavily. Nevertheless, everything looks slide-ready.
Solution:
Ask the model: ‘Show me pain points that appear in multiple clusters and suggest a reorganisation.’
Manually code and compare a sample.
Treat clustering as a hypothesis – not as a ‘finished insight’.
Example 2: Progressive disclosure in the dashboard
Situation:
LLM correctly explains progressive disclosure, then designs a full monitoring dashboard.
Solution:
"Mark all elements that contradict your definition of progressive disclosure."
"Design a variant with a maximum of five initially visible elements and describe the disclosure steps."
This makes the Potemkin pattern visible – and forces the model to correct itself.
FAQ on UX Potemkin & LLMs
1. Is it even worth using LLMs?
Yes. LLMs are strong in structure, formulation and variants. They become a problem when you treat them as a ‘truth machine’. Use them consciously as support – not as authority.
2. Don't these checks eat up all the time saved?
Part of the time saved goes into checks – but it's time well spent. It's better to invest 20% of the time saved in consistency checks than to build a project on a fancy façade that will have to be corrected later at great expense.
3. Are more expensive or newer models less prone to UX Potemkin?
They are often better at keeping up the façade: even smoother, even more convincing. This reduces some errors, but also increases the risk of you believing them too quickly. The ‘smarter’ a model appears, the more important your checks are.
4. Can I completely avoid Potemkin understandings?
No. But you can significantly reduce them by:
Separating explanation from application
Using consistency prompts
Defining team rules
Comparing AI results with real data
It's not about perfection, but about controllable risks.
Conclusion: How to deal with UX Potemkin
Potemkin understandings are not a theoretical playground, but a very real risk once AI enters your UX process.
Keep in mind:
LLMs quickly deliver a UX façade that appears stable but may have internal flaws.
What matters is not how good the output sounds, but how consistent it is with your data and assumptions.
You remain responsible for research quality, design decisions and ethical implications.
Use AI as a sparring partner – with clear consistency checks, documented prompts and a team that knows what to look out for.
Your next step
Take a current project and try a simple two-step strategy:
Have a term explained (e.g. a principle, a need, a business model).
Have an application created for it (persona, journey or flow) – and actively ask for contradictions.
💌 Not enough? Then read on – in our newsletter. It comes four times a year. Sticks in your mind longer. To subscribe: https://www.uintent.com/newsletter
As of December 2025
AUTHOR
Tara Bosenick
Tara has been active as a UX specialist since 1999 and has helped to establish and shape the industry in Germany on the agency side. She specialises in the development of new UX methods, the quantification of UX and the introduction of UX in companies.
At the same time, she has always been interested in developing a corporate culture in her companies that is as ‘cool’ as possible, in which fun, performance, team spirit and customer success are interlinked. She has therefore been supporting managers and companies on the path to more New Work / agility and a better employee experience for several years.
She is one of the leading voices in the UX, CX and Employee Experience industry.