
AI & UXR, HUMAN VS AI, LLM, UX
UX & AI: How "UX Potemkin" Undermines Your Research and Design Decisions
5 MIN
Dec 2, 2025
LLMs have long been part of everyday life in many UX teams: clustering interviews, writing personas, designing user flows, polishing copy. The answers often sound brilliant – technically correct, cleanly formulated, well structured.
That's exactly the problem. The paper ‘Potemkin Understanding in Large Language Models’ describes how models create a convincing picture of understanding without applying the underlying concepts consistently. In the UX context, I call this UX Potemkin: a fancy UX façade without a solid foundation.
The most important points in brief
UX Potemkin means: AI sounds competent, but breaks its own assumptions in application.
LLMs can explain concepts correctly and then use them inconsistently in the next step.
Correct answer ≠ genuine understanding. Consistency, context and traceability are crucial.
Always separate: having a concept explained vs. having a concept applied – and consciously compare the two.
Build consistency checks directly into your prompts, e.g. ‘Show where you contradict your own statements.’
Use AI as a sparring partner, not as an authority – validation remains with you, your team and your data.
Documented prompts + clear red flag tests make your AI deployment more robust and auditable.
What is ‘UX Potemkin’ anyway?
Historically, ‘Potemkin’ refers to facade villages that were meant to impress, even though there was nothing behind them.
Applied to LLMs, this means that a model provides you with plausible answers without using the concept behind them in a stable, consistent and context-sensitive manner.
In everyday UX, this becomes UX Potemkin:
Personas sound well-rounded, but behave completely differently in the flow.
Journeys seem logical, but run counter to implicit needs.
Design recommendations argue with familiar principles, but apply them inconsistently.
You look at a seemingly solid UX building – until you bump into it at one point and realise: façade.
What does the paper ‘Potemkin Understanding in Large Language Models’ show?
The authors investigate whether LLMs really ‘understand’ concepts – or just produce best-guess text that looks like it.
The core idea of the study:
Step 1: The model should explain a concept (e.g., a literary technique, game theory idea, psychological bias).
Step 2: The same model should apply this concept in a task.
Results:
Definitions are often correct.
When applied, inconsistencies and self-contradictions systematically arise.
Models can even claim that everything is consistent, even though they break their own rules.
In addition, the models generate their own question-answer pairs and later evaluate them for consistency. The inconsistencies found are therefore more of a lower bound: in reality, things are at least as shaky, probably worse.
For us in UX, this means:
Isolated ‘correct’ answers are no proof of viable understanding.
We need to pay attention to coherence across tasks, not just nice individual results.
Where UX Potemkin appears in your everyday work
1. Research synthesis: Nice clusters, shaky foundation
You put 20 interview transcripts into the LLM and let it:
Extract pain points
Cluster needs
Name categories
Result: a structured picture with clever headlines – perfect for a slide deck. But:
UX Potemkin risk:
Quotes do not fit neatly into categories.
Categories overlap or are poorly delineated.
Emotional nuances (e.g. shame vs. frustration) get blurred.
Particularly tricky: for sensitive topics (health, disability, finances), the model can sound superficially empathetic and still miss cultural or emotional contexts.
2. Personas, journeys and flows: inner life vs. behaviour
LLMs love personas. In minutes, they spit out characters with goals, points of frustration and quotes.
Typical UX Potemkin pattern:
Persona ‘is extremely cost-conscious,’ but behaves like someone with a high willingness to pay in the journey.
Persona ‘needs security and control,’ but is constantly overwhelmed by surprising automated decisions in the flow.
Facade: everything seems narratively coherent. Foundation: internal logic is broken – and your design decisions become shaky.
3. Design principles & patterns: correctly explained, incorrectly used
You ask:
‘Please explain “progressive disclosure” in the UX context.’
‘Design a dashboard based on this principle.’
The model delivers:
a clear definition of progressive disclosure
a dashboard with 15 visible metrics, 4 filters and 3 tabs on the start screen
Officially: "This uses progressive disclosure." In fact: the opposite.
This is a very clear example of Potemkin understanding: the model can describe the principle, but cannot reliably design according to it.
How to recognise UX Potemkin: compact toolbox
Now here's the whole thing in a ‘quick & dirty’ format that you can incorporate directly into your prompts.
1. Strictly separate explaining from applying
Always two steps:
‘Explain in your own words what [concept X] means – please without examples.’
‘Apply [concept X] in a concrete UX example and describe the user steps.’
Then:
"Show me where your example contradicts your own definition."
If the model can't find anything specific, that's a warning sign.
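If you run this check regularly, it's worth turning it into a small script. Below is a minimal sketch in Python: ask_llm is a hypothetical placeholder for whatever model API your team actually uses, and the concept in the usage comment is just an example.

```python
# Sketch of the explain-vs-apply check as a reusable script.
# ask_llm() is a hypothetical placeholder - wire it up to your own model API.

def ask_llm(prompt: str) -> str:
    raise NotImplementedError("Connect this to your LLM provider.")

def explain_vs_apply(concept: str) -> dict[str, str]:
    """Step 1: explain. Step 2: apply. Step 3: ask for contradictions."""
    explanation = ask_llm(
        f"Explain in your own words what '{concept}' means - please without examples."
    )
    application = ask_llm(
        f"Apply '{concept}' in a concrete UX example and describe the user steps."
    )
    contradictions = ask_llm(
        "Show me where your example contradicts your own definition.\n\n"
        f"Definition:\n{explanation}\n\n"
        f"Example:\n{application}"
    )
    return {
        "explanation": explanation,
        "application": application,
        "contradictions": contradictions,
    }

# Usage (example concept): explain_vs_apply("progressive disclosure")
# If 'contradictions' contains nothing specific, treat that as the warning sign above.
```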
2. Demand consistency reflection
Instead of just: "Create a journey," try:
"You described the persona Anna as cost-conscious. Show step by step where this is reflected in her customer journey – and mark places where your flow contradicts this."
This forces the model to proofread its own story.
3. Same information, different formats (cross-task check)
Example:
"Describe the three most important needs of the target group in one paragraph."
"Present the same needs in a table: Need | Consequence for design | Risk if ignored."
Then compare: if terms are suddenly weighted differently or reinvented without explanation, that's UX Potemkin.
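One admittedly naive way to support this comparison is to list the need labels the model introduced in its first answer and check whether each one survives into the second format. A small sketch, assuming you paste the two answers in by hand; the need labels below are only examples:

```python
# Naive cross-task check: do the same need labels survive both formats?
# Replace need_labels with the terms the model itself introduced in its first answer.

def missing_terms(text: str, terms: list[str]) -> list[str]:
    """Return the terms that do not appear in the given output (case-insensitive)."""
    lowered = text.lower()
    return [term for term in terms if term.lower() not in lowered]

paragraph_version = "..."  # the one-paragraph answer, pasted in by hand
table_version = "..."      # the table answer, pasted in by hand
need_labels = ["control", "transparency", "speed"]  # example labels

for label, output in [("paragraph", paragraph_version), ("table", table_version)]:
    gaps = missing_terms(output, need_labels)
    if gaps:
        print(f"Possible UX Potemkin: {gaps} missing or renamed in the {label} version.")
```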
4. Ask for a brief explanation
Instead of a long chain of thought, the following is often sufficient:
"Explain in 3–5 sentences how you arrived at this interface proposal from the users described."
Pay attention to whether specific persona characteristics appear – or just generic UX clichés. The latter is a facade.
5. Run red flag tests up front
Have a few mini stress tests ready, e.g.:
"Users are often under time pressure. Show me which steps in your flow unnecessarily increase time pressure – and shorten the flow accordingly."
or
"Name two scenarios in which your design fails and adjust your proposal accordingly."
If nothing substantial comes back, that's a warning sign.
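It helps to keep these stress tests in one shared place so everyone on the team runs the same ones. Here is a minimal sketch of such a checklist, plus a crude heuristic for spotting evasive answers; the phrases and the length threshold are assumptions you should tune to your own outputs.

```python
# A small, reusable checklist of red flag prompts plus a crude "evasive answer" heuristic.
# The prompts come from this article; the phrases and threshold below are assumptions to tune.

RED_FLAG_PROMPTS = [
    "Users are often under time pressure. Show me which steps in your flow "
    "unnecessarily increase time pressure - and shorten the flow accordingly.",
    "Name two scenarios in which your design fails and adjust your proposal accordingly.",
]

EVASIVE_PHRASES = ["no changes needed", "already optimal", "fully consistent"]

def looks_evasive(answer: str, min_length: int = 200) -> bool:
    """Flag answers that are suspiciously short or wave the question away."""
    lowered = answer.lower()
    too_short = len(answer) < min_length
    waves_away = any(phrase in lowered for phrase in EVASIVE_PHRASES)
    return too_short or waves_away

# Usage: run each prompt against your current flow, then:
# if looks_evasive(answer): treat the proposal as UX Potemkin until proven otherwise.
```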
Workflow principles for stable AI use
1. Role clarification & team rules
AI is a sparring partner, not a decision-maker. Use it for:
Ideas and variants
Hypotheses and initial structure
Alternative perspectives
Define simple rules within the team:
No AI output goes into client documents without a consistency check.
Always compare personas from AI with real data.
Flows from AI must pass at least one red flag test.
2. Prompt logs instead of gut feeling
Document briefly:
Initial prompts
Important follow-up queries
Inconsistencies found
This may sound dry, but it makes your work:
comprehensible for stakeholders
reproducible within the team
verifiable for later decisions
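A prompt log does not need special tooling; a small JSON Lines file already covers the three points above. A minimal sketch, assuming one log file per project:

```python
# Minimal prompt log: one JSON line per entry, one file per project (an assumption,
# not a requirement). Covers initial prompts, follow-up queries and inconsistencies found.

import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone

@dataclass
class PromptLogEntry:
    project: str
    prompt: str
    purpose: str                      # e.g. "initial clustering", "consistency check"
    inconsistencies_found: list[str] = field(default_factory=list)
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

def log_prompt(entry: PromptLogEntry, path: str = "prompt_log.jsonl") -> None:
    """Append the entry to a JSON Lines file so it stays reviewable later."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(entry), ensure_ascii=False) + "\n")

# Usage:
# log_prompt(PromptLogEntry(
#     project="Dashboard redesign",
#     prompt="Cluster these 20 interviews by pain points ...",
#     purpose="initial clustering",
#     inconsistencies_found=["Quote 7 appears in two clusters with different labels"],
# ))
```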
3. Automated sanity checks
If you integrate AI more deeply into tools, build in small check rules, e.g.:
If persona wants ‘fast & uncomplicated’ → warning for flows with > X steps.
If main need is ‘control’ → flag if there is too much autopilot in the flow.
Not a protective wall, but a solid UX Potemkin alarm.
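Translated into code, the two rules above could look roughly like this. Everything here is an assumption about your data model: the persona fields, the step limit of five and the 'automated_decision' flag all need to be adapted to how your tool actually represents personas and flows.

```python
# Two example sanity checks, matching the rules above. The data shapes
# (persona dict, flow as a list of step dicts) and the thresholds are assumptions;
# adapt them to your own persona and flow representation.

MAX_STEPS_FOR_FAST_PERSONAS = 5  # "X" from the rule above - pick your own value

def check_flow_against_persona(persona: dict, flow: list[dict]) -> list[str]:
    """Return warnings whenever the flow contradicts the persona's stated needs."""
    warnings = []
    needs = [n.lower() for n in persona.get("needs", [])]

    if "fast & uncomplicated" in needs and len(flow) > MAX_STEPS_FOR_FAST_PERSONAS:
        warnings.append(
            f"Persona wants 'fast & uncomplicated', but the flow has {len(flow)} steps."
        )

    if "control" in needs:
        autopilot_steps = [s["name"] for s in flow if s.get("automated_decision")]
        if autopilot_steps:
            warnings.append(
                f"Persona's main need is 'control', but these steps decide for them: {autopilot_steps}"
            )
    return warnings

# Usage:
# persona = {"name": "Anna", "needs": ["control"]}
# flow = [{"name": "Auto-select tariff", "automated_decision": True}]
# print(check_flow_against_persona(persona, flow))
```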
Two short practical examples
Example 1: Research clustering
Situation:
LLM clusters 30 interviews, delivers five smart categories with quotes.
UX Potemkin problem:
Some quotes do not fit the categories, two clusters overlap heavily. Nevertheless, everything looks slide-ready.
Solution:
Ask the model: ‘Show me pain points that appear in multiple clusters and suggest a reorganisation.’
Manually code and compare a sample.
Treat clustering as a hypothesis – not as a ‘finished insight’.
Example 2: Progressive disclosure in the dashboard
Situation:
LLM correctly explains progressive disclosure, then designs a full monitoring dashboard.
Solution:
"Mark all elements that contradict your definition of progressive disclosure."
"Design a variant with a maximum of five initially visible elements and describe the disclosure steps."
This makes the Potemkin pattern visible – and forces the model to correct itself.
FAQ on UX Potemkin & LLMs
1. Is it even worth using LLMs?
Yes. LLMs are strong in structure, formulation and variants. They become a problem when you treat them as a ‘truth machine’. Use them consciously as support – not as authority.
2. Don't these checks eat up all the time saved?
Part of the time saved goes into checks – but it's time well spent. It's better to invest 20% of the time saved in consistency checks than to build a project on a fancy façade that will have to be corrected later at great expense.
3. Are more expensive or newer models less prone to UX Potemkin?
They are often better at keeping up the façade: even smoother, even more convincing. This reduces some errors, but also increases the risk of you believing them too quickly. The ‘smarter’ a model appears, the more important your checks are.
4. Can I completely avoid Potemkin understandings?
No. But you can significantly reduce them by:
Separating explanation from application
Using consistency prompts
Defining team rules
Comparing AI results with real data
It's not about perfection, but about controllable risks.
Conclusion: How to deal with UX Potemkin
Potemkin understandings are not a theoretical playground, but a very real risk once AI enters your UX process.
Keep in mind:
LLMs quickly deliver a UX façade that appears stable but may have internal flaws.
What matters is not how good the output sounds, but how consistent it is with your data and assumptions.
You remain responsible for research quality, design decisions and ethical implications.
Use AI as a sparring partner – with clear consistency checks, documented prompts and a team that knows what to look out for.
Your next step
Take a current project and try a simple two-step strategy:
Have a term explained (e.g. a principle, a need, a business model).
Have an application created for it (persona, journey or flow) – and actively ask for contradictions.
💌 Not enough? Then read on – in our newsletter. It comes four times a year. Sticks in your mind longer. To subscribe: https://www.uintent.com/newsletter
As of December 2025
AUTHOR
Tara Bosenick
Tara has been active as a UX specialist since 1999 and has helped to establish and shape the industry in Germany on the agency side. She specialises in the development of new UX methods, the quantification of UX and the introduction of UX in companies.
At the same time, she has always been interested in developing a corporate culture in her companies that is as ‘cool’ as possible, in which fun, performance, team spirit and customer success are interlinked. She has therefore been supporting managers and companies on the path to more New Work / agility and a better employee experience for several years.
She is one of the leading voices in the UX, CX and Employee Experience industry.