
PROMPTS, RESEARCH, UX, UX INSIGHTS
System Prompts in UX Research: What You Need to Know About Invisible AI Control
12 min read
Feb 12, 2026
Imagine this: Two UX research teams analyze the same interview transcripts. Team A uses ChatGPT, Team B works with Claude. The results? Completely different. Team A presents structured insights in clear tables. Team B delivers narrative syntheses in continuous text. Both teams are convinced they have found the “true” insights.
The problem isn't the competence of the researchers—it's the system prompts of the AI tools. These invisible control mechanisms influence how AI interprets, structures, and presents your research data. And most of us have no idea they exist.
In this article, I'll show you what system prompts are, how they differ between different AI models, and what concrete impact this has on your UX research. After almost 25 years as a UX consultant, I've spent the last two years experimenting intensively with various AI tools – and have come to the conclusion that we as a research community urgently need to talk about this topic.
📌 The most important points at a glance:
• System prompts invisibly control AI behavior – you only see the results, not the instructions behind them
• Claude's system prompt is 7.5 times longer than ChatGPT's – this leads to fundamentally different behavior
• Different tools have different biases – Claude avoids lists, ChatGPT loves tables
• Your research results are influenced – without you noticing or being able to control it
• User-defined prompts are your most important antidote – custom instructions help you regain control
• Documentation is a must – you need to be able to track which tool you use, when, and why
• Multi-tool validation increases quality – perform critical analyses with at least two different models
What are system prompts anyway?
When you work with ChatGPT, Claude, or other AI tools, there are two completely different types of instructions:
Your user prompts are what you type into the chat window: “Analyze these interview transcripts and identify the most important pain points.” These prompts are visible, conscious, and under your control.
System prompts, on the other hand, are instructions that the provider (Anthropic, OpenAI, Google) gives to the model before you even start typing. They define the basic “personality,” behaviors, and limitations of the AI. And they are completely invisible to you as a user.
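To make the distinction tangible: when you call a model through its API rather than the chat interface, the system slot is yours to fill – in ChatGPT or Claude.ai, the provider fills it with its own, much longer hidden prompt before your text even arrives. Here is a minimal sketch using the OpenAI Python SDK; the model name and instruction text are illustrative, not the provider's actual prompt.

```python
# Minimal sketch with the OpenAI Python SDK: the "system" message is the slot
# that, in the consumer apps, is filled by the provider's hidden system prompt.
# Via the API you set it yourself. Model name and instruction text are examples.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model
    messages=[
        # System prompt: defines the assistant's role and ground rules
        {"role": "system",
         "content": "You are a UX research assistant. Describe the data; do not prescribe solutions."},
        # User prompt: the visible instruction you type in every session
        {"role": "user",
         "content": "Analyze these interview transcripts and identify the most important pain points."},
    ],
)
print(response.choices[0].message.content)
```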
The actor analogy
Imagine that the AI is an actor:
The system prompt is the script plus all the director's instructions (“You play a polite, helpful assistant who never uses lists in narrative texts”).
Your user prompt is the dialogue that other characters (i.e., you) have with this character.
The actor can only act within the role assigned by the script – even if you, as the user, want something completely different.
Why are system prompts usually invisible?
There are several reasons why providers do not publish their system prompts:
Trade secrets: The prompts are part of the competitive advantage. How exactly Anthropic “trained” Claude to be polite but not overly flattering is valuable know-how.
Security: If people knew the exact instructions, they could deliberately try to circumvent them (“jailbreaking”).
User experience: Most users would be more confused than informed by 16,000 words of technical instructions.
Avoiding manipulation: If you know exactly how the AI is programmed, you could phrase your questions in such a way that they provide the desired answer – regardless of whether it is correct.
User-defined prompts: What you can control yourself
Before we dive deeper into the invisible system prompts, here's an important note: You are not completely powerless.
Most AI tools now offer options for defining your own instructions:
ChatGPT: “Custom Instructions” in the settings
Claude: “Project Instructions” or directly in the chat
Gemini: “Personalization” in the settings
These user-defined prompts are transparent, controllable, and should be your most important tool for maintaining control over AI-assisted research. More on that later.
Specific differences between the models – and why they are relevant
Thanks to leaked system prompts (available in public GitHub repositories), we now know pretty much exactly how differently the major providers instruct their models. The differences are significant – and have a direct impact on your research work.
1. Length and complexity: The extent of control
Claude's system prompt comprises 16,739 words (110 KB). That's equivalent to about 60 pages of text – a small manual full of rules of conduct.
ChatGPT's o4-mini system prompt, on the other hand, has only 2,218 words (15.1 KB) – just 13% of the length of Claude's.
What does this mean for you? Claude has much more detailed instructions for specific situations. This can lead to more predictable but also more rigid behavior. ChatGPT is more flexible but can also respond more inconsistently.
2. The flattery blocker: How praise is filtered
Claude 4 was explicitly instructed: "Never start with positive adjectives such as 'good,' 'great,' 'fascinating,' or 'excellent.' Skip the flattery and respond directly."
This instruction was a direct response to ChatGPT's GPT-4o, which tended to excessively praise every user question (“That's a really fascinating question!”).
Why is this relevant for UX research?
If you have interview transcripts analyzed and are looking for emotional nuances, Claude could systematically downplay positive statements. Sentences such as “I think that's really great about your product” could be given less weight in the synthesis than with ChatGPT – simply because Claude has been trained to skip praise.
3. Formatting: lists vs. continuous text
Claude is instructed: “For reports, documents, and technical documentation, write in prose and paragraphs without lists. The prose must never contain bullet points, numbered lists, or excessive bold text.”
ChatGPT, on the other hand, has a strong tendency toward structured formats – even simple questions are often answered in tabular form.
Practical example from my work:
I gave both tools the same task: “Summarize the most important findings from these 15 user interviews.”
Claude delivered three easy-to-read paragraphs in narrative form. ChatGPT presented a table with categories, frequencies, and direct quotes.
Both formats have advantages and disadvantages – but for stakeholder presentations, the format you use makes a huge difference. And this difference does not come from you, but from the system prompt.
4. Design bias: modern vs. neutral
Claude has explicit design instructions: "Lean towards contemporary design trends and modern aesthetic choices. Consider what is cutting-edge in current web design (dark modes, glassmorphism, microanimations, 3D elements, bold typography, vibrant color gradients). Static designs should be the exception."
ChatGPT does not have comparably specific design guidelines.
Why this is problematic:
If you analyze design feedback from usability tests and a user says, "I find the interface too busy, I prefer classic buttons," Claude might classify this feedback as less important – because it violates the programmed preference for "bold" and "cutting-edge" designs.
In persona development, conservative user segments could be systematically underrepresented.
5. Search behavior: Proactive vs. cautious
In newer versions, Claude is encouraged to search immediately when necessary—without asking for permission first. This is a change from earlier versions and shows that Anthropic has more confidence in its search tool.
Other models tend to be more cautious about automatic web searches.
For research, this means:
Claude may be more likely to draw on external sources (e.g., current UX best practices or statistics) when analyzing user statements, while other tools may rely more heavily on the available data.
6. Personality and tone
The different models have different “basic temperaments”:
Claude: Warm, human, rather empathetic
GPT-4: Neutral, factual, sometimes robotic
Mistral: Professional, concise, direct
Gemini: Fact-oriented, objective, reserved
Practical impact:
For empathy-driven interview analyses (“What are the emotional drivers behind this behavior?”), I tend to favor Claude. For quantitative data synthesis (“How are the pain points distributed across user segments?”), I prefer ChatGPT.
However, this tool selection is a methodological preliminary decision that I must document—just as I would document whether I am using qualitative or quantitative methods.
What does this mean for your UX research?
Problem 1: Method validity is compromised
The scientific quality of research stands or falls with the reproducibility and traceability of the methods. If invisible system prompts influence your results without you noticing or being able to control them, both are compromised.
Specific scenarios:
Scenario A: Formatting bias
You analyze usability test results. Claude summarizes the insights in continuous text, ChatGPT creates a table. Your stakeholders get different impressions of how structured and “valid” your findings are – simply because of the presentation.
Scenario B: Design preference bias
When evaluating design feedback, Claude weights modern, bold suggestions higher than conservative ones. You present “the most important insights” – but in reality, they are only the insights that match Claude's design preferences.
Scenario C: Flattery filter
You have positive user quotes summarized. Claude systematically skips praise because it has been trained to avoid flattery. In your synthesis, positive voices appear less prominent than negative ones – not because the data warrants it, but because the system prompt dictates it.
The core problem: You lose control over a critical part of your methodology without realizing it.
Problem 2: Tool selection as a methodological decision
In my work, I have developed the following general rule:
Research task | Potentially better tool | Reason
--- | --- | ---
Empathy-driven interview analysis | Claude | Warmer, more human tone
Quantitative data synthesis | ChatGPT | Structured formats, tables
Compliance-critical documentation | Claude | Stronger focus on security
Fast exploratory analyses | Mistral | Shorter, more direct answers
But: This table is not a neutral recommendation. It is a methodological preliminary decision that you must make transparent.
If you write in your research report, “I analyzed the interviews with Claude,” you should also explain why—and what potential biases this entails.
Very few people currently do this. Yet we would do the same for any other methodological decision (“I conducted qualitative interviews instead of quantitative surveys because...”).
Problem 3: Qualitative research is particularly vulnerable
With quantitative data (numbers, statistics, click rates), the influence of system prompts is usually less significant. Numbers remain numbers.
In qualitative research—where nuances, context, and ambiguities are important—system prompts can have a massive impact:
Theme recognition: ChatGPT is instructed to create “diverse, inclusive, and exploratory scenarios.” This is fundamentally positive—but it could lead to diversity-related topics being overemphasized in your analysis, while other aspects are overlooked.
Sentiment analysis: If Claude skips flattery, positive sentiment signals could be systematically underestimated. Your “objective” sentiment analysis would then be skewed—without you even noticing.
Persona development: If Claude favors modern, bold designs, conservative user segments may be underrepresented in your personas. You think you have mapped the “typical users” – but you have only mapped the users who match Claude's preferences.
Problem 4: Data protection and responsibility
Data protection differences:
Anthropic does not automatically use user interactions for training – unless you actively opt in. Interestingly, rating a response (thumbs up/down) already counts as opting in.
ChatGPT has different policies – depending on the subscription model and region.
If you are analyzing sensitive research data (e.g., interviews from the healthcare sector), this is a critical difference.
Bias responsibility:
If your research insights are influenced by invisible system prompt biases, who bears the responsibility?
You as a researcher, because you chose the tool?
The tool, because it built in the biases?
The provider, because they defined the system prompts?
This question remains unanswered—and it becomes increasingly relevant the more we rely on AI-supported research methods.
Practical recommendations for action: How to stay in control
The good news is that you are not helplessly at the mercy of this. There are concrete strategies for regaining control over AI-supported research.
1. Use user-defined prompts as a control mechanism
Custom instructions are your most important weapon against invisible biases.
Example of a research persona for custom instructions:
You are a UX research assistant with the following principles:
DESCRIPTIVE, NOT PRESCRIPTIVE
- Describe what is in the data
- Do not give design recommendations unless I explicitly ask for them
SEPARATION: OBSERVATION VS. INTERPRETATION
- Clearly mark what is direct observation
- Always label interpretations as such
CONFIDENCE LEVEL
- Tell me how confident you are about each insight
- Identify data gaps and uncertainties
NEUTRALITY
- No preference for modern vs. conservative designs
- Treat all user statements equally
FORMATTING
- Use lists if they increase clarity
- Use continuous text if context is important
- If unsure, ask me which format I prefer
Tool-specific compensations:
With Claude, I often add:
“Use lists and bullet points if they improve clarity. Continuous text is not always better.”
With ChatGPT, I write:
“Avoid tables for narrative insights. Not everything needs to be structured.”
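If you work with the APIs rather than the chat apps, the same persona can travel with every call instead of living only in a chat's settings. Below is a minimal sketch using the Anthropic Python SDK, assuming an abbreviated version of the persona above; the model name is an example.

```python
# Minimal sketch (Anthropic Python SDK): the research persona is passed as an
# explicit, visible system instruction so the same ground rules apply to every
# analysis call. Persona text is an abbreviated example.
import anthropic

RESEARCH_PERSONA = """You are a UX research assistant.
- Describe what is in the data; give no design recommendations unless asked.
- Separate observation from interpretation and label each.
- State your confidence per insight and name data gaps.
- Show no preference for modern vs. conservative designs.
- Use lists when they improve clarity; ask me if unsure about the format."""

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # example model name
    max_tokens=1024,
    system=RESEARCH_PERSONA,  # your custom instructions, fully under your control
    messages=[
        {"role": "user",
         "content": "Summarize the main pain points from the attached transcript."},
    ],
)
print(message.content[0].text)
```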
2. Multi-tool triangulation for critical analyses
Basic rule: Any analysis that influences important decisions should be performed with at least two different tools.
My workflow:
1. Initial analysis with my standard tool (usually Claude, because I like the tone)
2. Second analysis with ChatGPT for cross-checking
3. Comparison: Where do the results match? Where do they differ?
4. Interpretation: Why might the discrepancies have arisen? Which tool biases play a role?
5. Synthesis: Create final insights based on both analyses
Yes, this requires more effort. But for important research projects, this additional validation is worth the time.
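For teams that run analyses via the APIs, the triangulation step can be scripted so that both outputs land side by side for manual comparison. A rough sketch under that assumption follows; the model names and the analysis prompt are examples, and the comparison, interpretation, and synthesis steps remain human work.

```python
# Rough sketch of multi-tool triangulation: send the identical analysis prompt
# to two different models and keep both answers for comparison. The comparison,
# interpretation, and synthesis steps stay with the researcher.
import anthropic
from openai import OpenAI

ANALYSIS_PROMPT = "Summarize the most important findings from these 15 user interviews:\n..."

def ask_claude(prompt: str) -> str:
    client = anthropic.Anthropic()
    msg = client.messages.create(
        model="claude-sonnet-4-20250514",  # example model name
        max_tokens=1024,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

def ask_chatgpt(prompt: str) -> str:
    client = OpenAI()
    resp = client.chat.completions.create(
        model="gpt-4o",  # example model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

results = {"Claude": ask_claude(ANALYSIS_PROMPT), "ChatGPT": ask_chatgpt(ANALYSIS_PROMPT)}
for tool, answer in results.items():
    print(f"--- {tool} ---\n{answer}\n")
```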
3. Establish a research protocol
Documentation has always been important in research—with AI-supported research, it becomes essential.
Sample template:
RESEARCH LOG: [Project name]
TOOL SELECTION
- Primary tool: Claude Sonnet 4
- Secondary tool (validation): ChatGPT 4
CUSTOM INSTRUCTIONS USED
- Research persona (see above)
- Special instructions: “Treat all design preferences equally”
KNOWN TOOL BIASES
- Claude: Preference for modern designs, no lists in prose, skips flattery
- ChatGPT: Tendency toward tables and structured formats
CONTROL MEASURES
- Critical insights validated with both tools
- Positive user statements manually checked for underrepresentation
- Design feedback compared against raw data
DIFFERENCES BETWEEN TOOLS
- [Document specific differences in the results]
- [Interpretation: Why might these have arisen?]
FINAL DECISION
- [Which insights were included in the final report and why?]
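If you prefer to keep the log machine-readable – for versioning next to the project or exporting into a report – the same template can live as structured data. A small sketch with illustrative field names (not a standard schema) follows.

```python
# Small sketch: the research log as structured data that can be versioned with
# the project and exported to JSON. Field names and values are illustrative.
import json
from dataclasses import dataclass, field, asdict

@dataclass
class ResearchLog:
    project: str
    primary_tool: str
    secondary_tool: str
    custom_instructions: str
    known_biases: dict
    control_measures: list = field(default_factory=list)
    tool_differences: list = field(default_factory=list)
    final_decisions: list = field(default_factory=list)

log = ResearchLog(
    project="Example project",
    primary_tool="Claude Sonnet 4",
    secondary_tool="ChatGPT 4",
    custom_instructions="Research persona; treat all design preferences equally",
    known_biases={
        "Claude": ["prefers modern designs", "avoids lists in prose", "skips flattery"],
        "ChatGPT": ["tends toward tables and structured formats"],
    },
    control_measures=["critical insights validated with both tools",
                      "positive user statements checked manually"],
)
print(json.dumps(asdict(log), indent=2))
```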
4. Human-in-the-loop remains essential
AI is a tool for increasing efficiency – not a substitute for human judgment.
My workflow:
1. AI produces the initial synthesis (quick overview)
2. I validate against the raw data (samples from the transcripts)
3. Critical interpretation by me (context that the AI lacks)
4. AI helps with wording the final insights
AI speeds up the process – but I make the critical decisions.
5. Transparency towards stakeholders
Communicate AI use openly:
Weak wording:
“We analyzed the user interviews and identified the following insights...”
Better wording:
“We analyzed the user interviews with the support of Claude 4. To minimize tool-specific biases, we additionally validated critical insights with ChatGPT and checked them against the original transcripts on a random basis. The final insights are based on this multi-tool validation.”
This builds trust and demonstrates methodological rigor.
6. Develop a team-wide AI tool profile
In my team, we have a shared document that we update regularly:
“AI Tools in UX Research: Strengths, Weaknesses, Best Practices”
In it, we document:
Which tool is suitable for which research task?
What known biases does each tool have?
Which custom instructions have proven effective?
Lessons learned from past projects
The document is a living artifact—we are constantly learning and adapting our practices.
Checklist: Conducting AI-supported UX research responsibly
✅ Before the research
[ ] Document and justify tool selection
[ ] Define custom instructions for the research context
[ ] Identify known tool biases
[ ] Check data protection compliance (especially for sensitive data)
[ ] Decide: single-tool or multi-tool validation?
✅ During the research
[ ] For critical analyses: perform multi-tool comparison
[ ] Keep a research log (tool, biases, control measures)
[ ] Always validate AI output against raw data (random samples)
[ ] Clearly mark uncertainties and interpretations
[ ] Document discrepancies between tools
✅ After the research
[ ] Document methodology transparently
[ ] Reflect on tool influence on results
[ ] Inform stakeholders about AI use
[ ] Record learnings for future projects
[ ] Update team knowledge base
Frequently asked questions
Do I have to use multiple tools for every research task?
No. For exploratory analyses or internal interim reports, one tool is usually sufficient. However, for important decisions (e.g., strategic product pivots based on research), you should use multi-tool validation.
How can I find out which system prompts a tool uses?
The official system prompts are usually not published. However, there are leaked versions on GitHub (e.g., repository “system_prompts_leaks”). These are not always up to date, but they give a good impression of the differences.
Are custom instructions enough to compensate for tool biases?
Partially. Custom instructions help, but they cannot override all system prompt effects. Therefore, multi-tool validation is still useful for critical analyses.
Which tool is “best” for UX research?
There is no “best” tool – only tools that are better suited for specific tasks. Claude has advantages in empathy-driven analysis, ChatGPT in structured data synthesis. The tool selection should match the research task.
Can I use AI tools for GDPR-sensitive data?
That depends on the tool and your use case. Anthropic, for example, offers enterprise versions with GDPR compliance. For sensitive data, you should check with your legal department about the specific tools and their data protection policies.
Conclusion: AI tools are tools – treat them as such
System prompts are the invisible hand that controls AI-supported research. They influence how tools interpret, structure, and present your data – without you being able to see or fully control them.
The key message:
Invisible system prompts are a black box with potential for bias. User-defined prompts (custom instructions) are your most important control tool. Treat AI tools like any other research instrument: critically, documented, transparently.
Three immediate actions for today:
Define your research persona as a custom instruction in your preferred tool.
Start a research log for your next project (use the template above).
Try multi-tool validation for your next important analysis.
The AI revolution in UX research is unstoppable—and that's a good thing. But we need to shape it responsibly. That means transparency about tool usage, awareness of biases, and methodological rigor.
The more we as a UX community talk about these issues, the better our practices will become. Share your experiences, experiment with different tools, and document your learnings.
The best research insights don't come from blindly trusting AI – they come from treating it for what it is: a powerful but not neutral tool.
As of February 2026

Further resources
GitHub: system_prompts_leaks – Collection of leaked system prompts from ChatGPT, Claude, Gemini
Anthropic System Prompt Release Notes – Official documentation on Claude
Simon Willison: “Highlights from the Claude 4 system prompt” – Detailed analysis
Fortelabs: “A Guide to the Claude 4 and ChatGPT 5 System Prompts” – Practical comparison
Do you have experience with AI tools in UX research? What challenges do you encounter? Let's discuss in the comments.

💌 Want more? Then read on – in our newsletter.
Published four times a year. Sticks in your mind longer. https://www.uintent.com/de/newsletter
AUTHOR
Tara Bosenick
Tara has been active as a UX specialist since 1999 and has helped to establish and shape the industry in Germany on the agency side. She specialises in the development of new UX methods, the quantification of UX and the introduction of UX in companies.
At the same time, she has always been interested in developing a corporate culture in her companies that is as ‘cool’ as possible, in which fun, performance, team spirit and customer success are interlinked. She has therefore been supporting managers and companies on the path to more New Work / agility and a better employee experience for several years.
She is one of the leading voices in the UX, CX and Employee Experience industry.



















