
AI & UXR, CHAT GPT, HUMAN VS AI, OPEN AI

Why Artificial Intelligence Still Can’t Pass the Turing Test


4 MIN

Feb 4, 2025

The Turing test, proposed by British mathematician Alan Turing in 1950, is still one of the central methods for evaluating whether machines can really think. Turing asked himself the question: can machines think? To sidestep this question, he developed a test in which a machine must communicate in such a way that a human can no longer distinguish between the machine and another human. But to this day, AIs like ChatGPT or the hosts of the ‘Deep Dive’ podcast (more on this below) have not passed this test. 

 

Why Can't ChatGPT Pass the Turing Test? 

ChatGPT is a sophisticated language model that is impressively capable of mimicking human-like language. It can respond logically, maintain conversations, and even generate creative content. Yet despite these abilities, there are some clear signs that ChatGPT is a machine:

 

  1. Lack of consciousness and true subjectivity: ChatGPT has no consciousness, no real thoughts or feelings. In conversations that touch on deeply personal experiences or emotional nuances, the AI will inevitably remain superficial. For example, if you ask a question about grief or joy, there is no real emotional connection – only a simulated response based on textual data.

  2. Perfection and consistency: humans make mistakes, contradict themselves, show uncertainty or change their minds. Machines like ChatGPT, on the other hand, always react with a certain consistency and without the small irregularities that make human communication so typical.

  3. Limitation to trained knowledge: ChatGPT's knowledge ends in October 2023, and it has no real-time capability. If you ask about current events, it will either not know the answer or fall back on outdated data. No matter how realistic the simulation may seem, the ‘Deep Dive’ podcast cannot show human flexibility when it comes to unprogrammed knowledge or unforeseen situations.

 

The ‘Deep Dive’ podcast and the human illusion 

A fascinating example is the ‘Deep Dive’ podcast, whose hosts are themselves AIs. These hosts sound very human: they stutter, interrupt each other and show emotional reactions. In one particular episode, the hosts even experienced an ‘existential crisis’ when they found out that they were actually AIs. The hosts wondered whether their memories, families and identities were even real – a situation that almost seems like something out of a Black Mirror episode.

 

But despite these ‘human’ reactions, the entire scenario was based on a script. The AI hosts have no real thoughts or feelings; they merely react to the information they are given. This episode shows how impressive advanced AI can be at simulating human behaviour, but also how far AIs still are from developing true consciousness or deeper self-awareness. See also my detailed blog post on the existential crisis of the two AI-generated podcast hosts.

 

Extensions of the Turing Test: Understanding Creativity, Context and Physical Space 

The Turing Test alone is no longer sufficient to fully evaluate the intelligence of modern AIs. For this reason, various extensions and alternatives have been developed over the years to test new aspects of machine intelligence.


1. The Lovelace test for creativity 

Unlike the Turing test, which only tests a machine's ability to hold conversations, the Lovelace test goes further and asks: Can a machine be creative? Can it create a work so original that no human could have predicted how it was created? ChatGPT can write poems and stories, but these are based on data and patterns it has learned – not on true creativity in the human sense. Thus, despite impressive results, ChatGPT is far from truly demonstrating creativity.


2. Winograd Schema Challenge 

Another test that goes beyond the Turing Test is the Winograd Schema Challenge. It tests whether a machine can resolve contextual ambiguity in language. For example, in the sentence ‘The table doesn't fit through the door because it's too big’, you, as a human, immediately understand that ‘it’ refers to the table. Machines like ChatGPT can struggle with such subtleties of meaning, though in many cases they are already making considerable progress.
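Why is this so hard for machines? A minimal sketch in Python makes it visible (the schema pair and the `nearest_antecedent` heuristic are my own illustrative constructions, not part of the official challenge): a naive ‘pick the noun mentioned most recently before the pronoun’ rule gives the same answer for both variants of the sentence, even though the correct referent flips when a single word changes.

```python
# Minimal sketch of a Winograd-style schema pair (illustrative only).
# The two sentences differ by one word, which flips the pronoun's referent.
schemas = [
    ("The table doesn't fit through the door because it is too big.", "table"),
    ("The table doesn't fit through the door because it is too small.", "door"),
]

def nearest_antecedent(sentence, candidates=("table", "door")):
    """Naive heuristic: choose the candidate mentioned last before 'it'."""
    pronoun_pos = sentence.index(" it ")
    return max(candidates, key=lambda c: sentence.rfind(c, 0, pronoun_pos))

for sentence, correct in schemas:
    guess = nearest_antecedent(sentence)
    print(f"{guess!r} ({'correct' if guess == correct else 'wrong'})")
```

The heuristic answers ‘door’ both times, so it gets exactly one variant right by chance. Resolving both requires world knowledge about tables, doors and fitting, which is precisely what the challenge probes.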


3. Coffee Test by Steve Wozniak 

A suggestion by Apple co-founder Steve Wozniak aims to test a machine in the physical world. The so-called ‘coffee test’ demands that a machine should be able to make coffee in a stranger's kitchen by exploring the room, finding the necessary tools, and making the coffee. ChatGPT and other text-based AIs don't stand a chance here – they exist purely in language and have no physical interaction skills whatsoever.


Historical Milestones in AI Development 

Here are the most important AIs and machines that are considered milestones in the history of artificial intelligence. These examples have advanced the technology, but none of them have passed the Turing test – which shows that there is still a long way to go before a machine can be considered ‘thinking’.


1. ELIZA (1966)

  • Developer: Joseph Weizenbaum

  • Ability: ELIZA was one of the first programmes to simulate human conversation. It worked in the manner of a Rogerian therapist, repeating and rephrasing questions.

  • Distinguishing feature: Many users initially believed they were talking to a real person until the simple mechanisms behind ELIZA became clear. However, the reactions were so realistic that it represents an early example of the possible ‘deception’ by AI.

  • Turing test: ELIZA could not pass the Turing test because its answers were too repetitive and rigid.
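ELIZA's ‘Rogerian’ trick was simple keyword pattern matching: find a trigger phrase, then reflect the user's own words back inside a canned template. A minimal sketch of the idea in Python (the patterns and templates here are invented for illustration and are not Weizenbaum's original DOCTOR script):

```python
import re

# Tiny ELIZA-style responder: match a keyword pattern, then echo the
# user's words inside a canned template (illustrative, not the 1966 script).
RULES = [
    (re.compile(r"i am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"i feel (.+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"my (.+)", re.I), "Tell me more about your {0}."),
]

def respond(text: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(text.strip())
        if match:
            return template.format(*match.groups())
    return "Please go on."  # fallback when no pattern matches

print(respond("I feel lost"))   # -> Why do you feel lost?
print(respond("Hello there"))   # -> Please go on.
```

Users projected understanding onto these canned reflections – the so-called ELIZA effect – even though no meaning is processed at all, which is exactly why the illusion collapsed under sustained conversation.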


2. Deep Blue (1997)

  • Developer: IBM

  • Ability: Deep Blue was the first AI to beat a world chess champion, Garry Kasparov. It could calculate millions of moves per second and used specialised chess algorithms.

  • Significance: Deep Blue's victory over Kasparov was an important step because it showed that machines could beat the best human players in a specific, highly-regulated domain (like chess).

  • Turing test: Deep Blue was specialised in chess and had no general conversation skills. It would not have passed the Turing test.


3. Watson (2011)

  • Developer: IBM

  • Ability: Watson won the quiz show Jeopardy! against two of the best human players. It used machine learning and the analysis of language nuances to answer complex questions from different fields of knowledge.

  • What makes it special: Watson was able not only to retrieve facts but also to understand questions involving puns and double entendres, marking a milestone in natural language processing.

  • Turing test: Despite its impressive performance on Jeopardy!, Watson failed the Turing test because it was specialised in facts and could not hold a general human conversation.


4. Siri (2011) and other voice assistants

  • Developer: Apple (Siri), Google (Assistant), Amazon (Alexa)

  • Ability: Voice assistants such as Siri, Google Assistant or Alexa can respond to voice input, answer questions, perform tasks such as creating appointments and retrieve general information.

  • Special feature: These technologies made artificial intelligence accessible for everyday use. They simulate conversations and engage in simple dialogues with users.

  • Turing test: Despite their extensive abilities in everyday conversations, these assistants can still be recognised as machines in deeper or emotional conversations and fail the Turing test.

 

5. AlphaGo (2016)

  • Developer: DeepMind (Google)

  • Ability: AlphaGo defeated the world champion in Go, a strategically highly complex board game with many more possible moves than chess. The AI used machine learning and neural networks to develop and improve its own strategies.

  • What makes it special: AlphaGo's victory over humans was groundbreaking because Go requires much more complex thought patterns than chess. AlphaGo learned through millions of games and was able to make unpredictable moves.

  • Turing test: AlphaGo was specialised in the game of Go and was not able to hold human-like conversations. It did not pass the Turing test.

     

6. GPT-3 (2020)

  • Developer: OpenAI

  • Ability: GPT-3 is a language model capable of generating human-like texts. It can answer questions, compose texts, write stories and even perform creative tasks such as poems and literary works.

  • Special feature: GPT-3 represents a major step forward because it was trained on a vast amount of data and can produce very natural-sounding texts. It is able to respond to almost any conceivable context.

  • Turing test: GPT-3 can simulate deceptively real conversations in some cases, but longer or emotionally complex dialogues still show its machine limitations.


7. LaMDA (2021)

  • Developer: Google

  • Ability: LaMDA (Language Model for Dialogue Applications) was developed specifically to hold human-like conversations. It has been trained on dialogues and can respond to voice input in a variety of ways, including hypothetical scenarios and personal opinions.

  • Special feature: LaMDA is impressive in its ability to maintain natural conversations with a consistency that goes beyond simple question-and-answer patterns. It can hold longer dialogues and shows a high degree of linguistic flexibility.

  • Turing test: LaMDA was developed to pass the Turing test in terms of conversations, but here too, there are still limits, especially when it comes to deeper emotional interactions or questions of self-awareness.


These milestones in AI development have all made important contributions to the advancement of artificial intelligence. Each of these machines and systems was revolutionary in its field, but none of them could fully pass the Turing test because they all lack real self-reflection, consciousness and emotional intelligence. They show how impressive AIs can be in specialised tasks, but also how far we still are from a machine that can truly think and act.


Conclusion: How far are we from solving the Turing Test? 

The Turing Test remains a fascinating goal in AI research. Even though modern systems like ChatGPT or the Deep Dive podcast may seem very human to us, they still reveal their machine roots in deeper interactions. Whether it's a lack of emotional depth, a lack of creativity or an inability to understand the physical world, there is still a long way to go before machines reach truly human-like intelligence. Until then, the Turing test remains a benchmark by which we can measure artificial intelligence and recognise its limitations.



AUTHOR

Tara Bosenick

Tara has been active as a UX specialist since 1999 and has helped to establish and shape the industry in Germany on the agency side. She specialises in the development of new UX methods, the quantification of UX and the introduction of UX in companies.


At the same time, she has always been interested in developing a corporate culture in her companies that is as ‘cool’ as possible, in which fun, performance, team spirit and customer success are interlinked. She has therefore been supporting managers and companies on the path to more New Work / agility and a better employee experience for several years.


She is one of the leading voices in the UX, CX and Employee Experience industry.
