
AI & UXR, HUMAN VS AI, LLM, UX

UX & AI: How "UX Potemkin" Undermines Your Research and Design Decisions

5 min read

Dec 2, 2025

LLMs have long been part of everyday life in many UX teams: clustering interviews, writing personas, designing user flows, polishing copy. The answers often sound brilliant – technically correct, cleanly formulated, well structured.


That's exactly the problem. The paper ‘Potemkin Understanding in Large Language Models’ describes how models create a convincing picture of understanding without working with the underlying concepts in a consistent way internally. In the UX context, I call this UX Potemkin – a fancy UX façade without a solid foundation.


The most important points in brief

  • UX Potemkin means: AI sounds competent, but breaks its own assumptions in application.

  • LLMs can explain concepts correctly and then use them inconsistently in the next step.

  • Correct answer ≠ genuine understanding. Consistency, context and traceability are crucial.

  • Always separate: having a concept explained vs. having a concept applied – and consciously compare the two.

  • Build consistency checks directly into your prompts, e.g. ‘Show where you contradict your own statements.’

  • Use AI as a sparring partner, not as an authority – validation remains with you, your team and your data.

  • Documented prompts + clear red flag tests make your AI deployment more robust and auditable.


What is ‘UX Potemkin’ anyway?

Historically, ‘Potemkin’ refers to façade villages that were meant to impress, even though there was nothing behind them.


Applied to LLMs, this means that a model provides you with plausible answers without applying the concepts behind them in a stable, consistent and context-sensitive way.


In everyday UX, this becomes UX Potemkin:

  • Personas sound well-rounded, but behave completely differently in the flow.

  • Journeys seem logical, but violate implicit needs.

  • Design recommendations argue with familiar principles, but apply them inconsistently.


You look at a seemingly solid UX building – until you bump into it at one point and realise: façade.


What does the paper ‘Potemkin Understanding in Large Language Models’ show?

The authors investigate whether LLMs really ‘understand’ concepts – or just produce best-guess text that looks like it.


The core idea of the study:

  • Step 1: The model should explain a concept (e.g., a literary technique, game theory idea, psychological bias).

  • Step 2: The same model should apply this concept in a task.


Results:

  • Definitions are often correct.

  • When applied, inconsistencies and self-contradictions systematically arise.

  • Models can even claim that everything is consistent, even though they break their own rules.


In addition, the models generate their own question–answer pairs and later evaluate them for consistency. The inconsistencies found are therefore only a lower bound – in reality it is at least this shaky, and probably worse.


For us in UX, this means:

  • Selective ‘correct’ answers are no proof of viable understanding.

  • We need to pay attention to coherence across tasks, not just nice individual results.


Where UX Potemkin appears in your everyday life


1. Research synthesis: Nice clusters, shaky foundation

You put 20 interview transcripts into the LLM and let it:

  • Extract pain points

  • Cluster needs

  • Name categories


Result: a structured picture with clever headlines – perfect for a slide deck. But:


UX Potemkin risk:

  • Quotes do not fit neatly into categories.

  • Categories overlap or are vaguely delineated.

  • Emotional nuances (e.g. shame vs. frustration) get flattened.


Particularly tricky: for sensitive topics (health, disability, finances), the model can sound superficially empathetic and still miss cultural or emotional contexts.


2. Personas, journeys and flows: inner life vs. behaviour

LLMs love personas. In minutes, they spit out characters with goals, points of frustration and quotes.


Typical UX Potemkin pattern:

  • Persona ‘is extremely cost-conscious,’ but behaves like someone with a high willingness to pay in the journey.

  • Persona ‘needs security and control,’ but is constantly overwhelmed by surprising automated decisions in the flow.


Façade: everything seems narratively coherent. Foundation: the internal logic is broken – and your design decisions become shaky.


3. Design principles & patterns: correctly explained, incorrectly used

You ask:

  1. ‘Please explain “progressive disclosure” in the UX context.’

  2. ‘Design a dashboard based on this principle.’


The model delivers:

  • a clear definition of progressive disclosure

  • a dashboard with 15 visible metrics, 4 filters and 3 tabs on the start screen


Officially: "This uses progressive disclosure." In fact: the opposite.


This is a very clear example of Potemkin understanding: the model can describe the principle, but cannot reliably design according to it.


How to recognise UX Potemkin: compact toolbox

Now here's the whole thing in a ‘quick & dirty’ format that you can incorporate directly into your prompts.


1. Strictly separate explaining from applying

Always two steps:

  1. ‘Explain in your own words what [concept X] means – please without examples.’

  2. ‘Apply [concept X] in a concrete UX example and describe the user steps.’


Then:

"Show me where your example contradicts your own definition."

If the model can't find anything specific, that's a warning sign.
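
If you access the model via an API rather than a chat window, you can script this two-step check. Here is a minimal sketch in Python, using the OpenAI client purely as an example backend – the model name, prompt wording and helper names are my assumptions, not a prescribed setup:

```python
# Minimal sketch of the explain-then-apply check, with the OpenAI
# Python client as one possible backend (any chat-capable LLM works).
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment
MODEL = "gpt-4o"   # assumption - use whatever model your team works with

def ask(messages: list[dict]) -> str:
    """Send the conversation so far and return the model's reply text."""
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    return resp.choices[0].message.content

def potemkin_check(concept: str, ux_context: str) -> dict:
    messages = []

    # Step 1: definition only, explicitly without examples.
    messages.append({"role": "user", "content":
        f"Explain in your own words what '{concept}' means - without examples."})
    definition = ask(messages)
    messages.append({"role": "assistant", "content": definition})

    # Step 2: application of the same concept in a concrete UX task.
    messages.append({"role": "user", "content":
        f"Apply '{concept}' in a concrete UX example for {ux_context} "
        "and describe the user steps."})
    application = ask(messages)
    messages.append({"role": "assistant", "content": application})

    # Step 3: forced self-consistency reflection.
    messages.append({"role": "user", "content":
        "Show me where your example contradicts your own definition."})
    contradictions = ask(messages)

    return {"definition": definition,
            "application": application,
            "contradictions": contradictions}

# Usage: result = potemkin_check("progressive disclosure", "a monitoring dashboard")
# An empty or evasive 'contradictions' answer is itself a warning sign.
```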


2. Demand consistency reflection

Instead of just: "Create a journey," try:

"You described the persona Anna as cost-conscious. Show step by step where this is reflected in her customer journey – and mark places where your flow contradicts this."

This forces the model to proofread its own story.


3. Same information, different formats (cross-task check)

Example:

  1. "Describe the three most important needs of the target group in one paragraph."

  2. "Present the same needs in a table: Need | Consequence for design | Risk if ignored."


Compare the two: if terms are suddenly weighted differently or reinvented without explanation → UX Potemkin.
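
If you want more than an eyeball comparison, even a crude script can flag vocabulary drift between the two formats. A minimal sketch (standard library only; it assumes you have both outputs as plain text) – it catches silently renamed terms, not shifted weighting:

```python
# Crude cross-task check: which salient words appear in one output
# but not the other? Catches renamed needs, not reweighted ones.
import re

STOPWORDS = {"that", "this", "with", "their", "they", "have", "from", "your"}

def salient_terms(text: str) -> set[str]:
    """Lowercased words of 4+ letters, minus trivial stopwords."""
    words = re.findall(r"[a-z]{4,}", text.lower())
    return {w for w in words if w not in STOPWORDS}

def vocabulary_drift(paragraph: str, table: str) -> dict[str, set[str]]:
    p, t = salient_terms(paragraph), salient_terms(table)
    return {
        "only_in_paragraph": p - t,  # needs that vanished in the table
        "only_in_table": t - p,      # terms invented without explanation
    }

# Usage: feed in the paragraph answer and the flattened table text;
# unexplained terms on either side are candidates for a UX Potemkin probe.
```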


4. Ask for a brief explanation

Instead of a long chain of thought, the following is often sufficient:

"Explain in 3–5 sentences how you arrived at this interface proposal from the users described."

Pay attention to whether specific persona characteristics appear – or just generic UX clichés. The latter is a façade.


5. Run red flag tests up front

Have a few mini stress tests ready, e.g.:

"Users are often under time pressure. Show me which steps in your flow unnecessarily increase time pressure – and shorten the flow accordingly."

or

"Name two scenarios in which your design fails and adjust your proposal accordingly."

If nothing substantial comes back, that's another warning sign.


Workflow principles for stable AI use


1. Role clarification & team rules


AI is a sparring partner, not a decision-maker. Use it for:

  • Ideas and variants

  • Hypotheses and initial structure

  • Alternative perspectives


Define simple rules within the team:

  • No AI output goes into client documents without a consistency check.

  • Always compare personas from AI with real data.

  • Flows from AI must pass at least one red flag test.


2. Prompt logs instead of gut feeling


Document briefly:

  • Initial prompts

  • Important queries

  • Inconsistencies found


This may sound dry, but it makes your work:

  • comprehensible for stakeholders

  • reproducible within the team

  • verifiable for later decisions
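
If your team scripts its AI use anyway, this log can fall out as a by-product. A minimal sketch that appends one JSON object per session to a file – the field names are a suggestion, not a standard:

```python
# Minimal prompt log: one JSON object per line, appended as you work.
# The fields are a suggestion - keep whatever your team needs to audit later.
import json
from datetime import datetime, timezone
from pathlib import Path

LOG_FILE = Path("prompt_log.jsonl")

def log_prompt(project: str, prompt: str, follow_ups: list[str],
               inconsistencies: list[str]) -> None:
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "project": project,
        "initial_prompt": prompt,
        "follow_ups": follow_ups,           # important queries / rephrasings
        "inconsistencies": inconsistencies, # what the consistency checks surfaced
    }
    with LOG_FILE.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")

# Usage:
# log_prompt("Dashboard redesign",
#            "Design a dashboard using progressive disclosure.",
#            ["Mark all elements that contradict your definition."],
#            ["Start screen showed 15 metrics despite the definition."])
```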


3. Automated sanity checks


If you integrate AI more deeply into tools, build in small check rules, e.g.:

  • If persona wants ‘fast & uncomplicated’ → warning for flows with > X steps.

  • If main need is ‘control’ → flag if there is too much autopilot in the flow.


Not a protective wall, but a solid UX Potemkin alarm.
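
What such a check rule can look like in code: a minimal sketch, assuming your tool represents personas and flows as simple dictionaries – the field names and thresholds are illustrative only:

```python
# Minimal UX Potemkin alarm: compare declared persona needs against
# measurable flow properties. Field names and thresholds are illustrative.

MAX_STEPS_FOR_FAST = 5              # "fast & uncomplicated" tolerates few steps
MAX_AUTO_DECISIONS_FOR_CONTROL = 1  # "control" tolerates little autopilot

def sanity_check(persona: dict, flow: dict) -> list[str]:
    warnings = []
    needs = {n.lower() for n in persona.get("needs", [])}

    if "fast & uncomplicated" in needs and len(flow["steps"]) > MAX_STEPS_FOR_FAST:
        warnings.append(
            f"Persona wants it fast, but the flow has {len(flow['steps'])} steps."
        )

    auto = sum(1 for step in flow["steps"] if step.get("automated"))
    if "control" in needs and auto > MAX_AUTO_DECISIONS_FOR_CONTROL:
        warnings.append(
            f"Persona needs control, but {auto} steps decide automatically."
        )
    return warnings

# Usage:
# persona = {"name": "Anna", "needs": ["fast & uncomplicated", "control"]}
# flow = {"steps": [{"automated": True}] * 3 + [{"automated": False}] * 4}
# for w in sanity_check(persona, flow):
#     print("WARNING:", w)
```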


Two short practical examples


Example 1: Research clustering

Situation:

LLM clusters 30 interviews, delivers five smart categories with quotes.


UX Potemkin problem:

Some quotes do not fit the categories, and two clusters overlap heavily. Nevertheless, everything looks slide-ready.


Solution:

  • Ask the model: ‘Show me pain points that appear in multiple clusters and suggest a reorganisation.’

  • Manually code and compare a sample.

  • Treat clustering as a hypothesis – not as a ‘finished insight’.


Example 2: Progressive disclosure in the dashboard

Situation:

LLM correctly explains progressive disclosure, then designs a full monitoring dashboard.


Solution:

  1. "Mark all elements that contradict your definition of progressive disclosure."

  2. "Design a variant with a maximum of five initially visible elements and describe the disclosure steps."


This makes the Potemkin pattern visible – and forces the model to correct itself.


FAQ on UX Potemkin & LLMs

1. Is it even worth using LLMs?

Yes. LLMs are strong in structure, formulation and variants. They become a problem when you treat them as a ‘truth machine’. Use them consciously as support – not as authority.


2. Don't these checks eat up all the time saved?

Part of the time saved goes into checks – but it's time well spent. It's better to invest 20% of the time saved in consistency checks than to build a project on a fancy façade that will have to be corrected later at great expense.


3. Are more expensive or newer models less prone to UX Potemkin?

They are often better at keeping up the façade: even smoother, even more convincing. That removes some errors, but it also increases the risk that you believe them too quickly. The ‘smarter’ a model appears, the more important your checks become.


4. Can I completely avoid Potemkin understandings?

No. But you can significantly reduce them by:

  • Separating explanation from application

  • Using consistency prompts

  • Defining team rules

  • Comparing AI results with real data


It's not about perfection, but about controllable risks.


Conclusion: How to deal with UX Potemkin

Potemkin understandings are not a theoretical playground, but a very real risk once AI enters your UX process.


Keep in mind:

  • LLMs quickly deliver a UX façade that appears stable but may have internal flaws.

  • What matters is not how good the output sounds, but how consistent it is with your data and assumptions.

  • You remain responsible for research quality, design decisions and ethical implications.


Use AI as a sparring partner – with clear consistency checks, documented prompts and a team that knows what to look out for.


Your next step

Take a current project and try a simple two-step strategy:

  1. Have a term explained (e.g. a principle, a need, a business model).

  2. Have an application created for it (persona, journey or flow) – and actively ask for contradictions.


💌 Not enough? Then read on – in our newsletter. It comes four times a year. Sticks in your mind longer. To subscribe: https://www.uintent.com/newsletter


As of December 2025





AUTHOR

Tara Bosenick

Tara has been active as a UX specialist since 1999 and has helped to establish and shape the industry in Germany on the agency side. She specialises in the development of new UX methods, the quantification of UX and the introduction of UX in companies.


At the same time, she has always been interested in developing a corporate culture in her companies that is as ‘cool’ as possible, in which fun, performance, team spirit and customer success are interlinked. She has therefore been supporting managers and companies on the path to more New Work / agility and a better employee experience for several years.


She is one of the leading voices in the UX, CX and Employee Experience industry.
