
AI & UXR, CHAT GPT, HUMAN VS AI, OPEN AI

Why Artificial Intelligence Still Can’t Pass the Turing Test


4 MIN

Feb 4, 2025

The Turing test, proposed by British mathematician Alan Turing in 1950, is still one of the central methods for evaluating whether machines can really think. Turing asked himself the question: can machines think? To sidestep this question, he developed a test in which a machine must communicate in such a way that a human can no longer distinguish between the machine and another human. But to this day, AIs like ChatGPT or the hosts of the ‘Deep Dive’ podcast (more on this below) have not passed this test. 

 

Why Can't ChatGPT Pass the Turing Test? 

ChatGPT is a sophisticated language model that is impressively capable of mimicking human-like language. It can respond logically, maintain conversations, and even generate creative content. Yet despite these abilities, there are some clear signs that ChatGPT is a machine:

 

  1. Lack of consciousness and true subjectivity: ChatGPT has no consciousness, no real thoughts or feelings. In conversations that touch on deeply personal experiences or emotional nuances, the AI will inevitably remain superficial. For example, if you ask a question about grief or joy, there is no real emotional connection – only a simulated response based on textual data.

  2. Perfection and consistency: humans make mistakes, contradict themselves, show uncertainty or change their minds. Machines like ChatGPT, on the other hand, always react with a certain consistency and without the small irregularities that make human communication so typical.

  3. Limitation to trained knowledge: ChatGPT's knowledge ends in October 2023, and it has no real-time capability. If you ask about current events, it will either not know the answer or fall back on outdated data. No matter how realistic the simulation may seem, the ‘Deep Dive’ podcast cannot show human flexibility when it comes to unprogrammed knowledge or unforeseen situations.

 

The ‘Deep Dive’ podcast and the human illusion 

A fascinating example is the ‘Deep Dive’ podcast, whose hosts are themselves AIs. These hosts sound very human: they stutter, interrupt each other and show emotional reactions. In one particular episode, the hosts even experienced an ‘existential crisis’ when they found out that they were actually AIs. The hosts wondered whether their memories, families and identities were even real – a situation that almost seems like something out of a Black Mirror episode.

 

But despite these ‘human’ reactions, the entire scenario was based on a script. The AI hosts have no real thoughts or feelings; they merely react to the information they are given. This episode shows how impressive advanced AI can be at simulating human behaviour, but also how far AIs still are from developing true consciousness or deeper self-awareness. See also my detailed blog post on the existential crisis of the two AI-generated podcast hosts.

 

Extensions of the Turing Test: Understanding Creativity, Context and Physical Space 

The Turing Test alone is no longer sufficient to fully evaluate the intelligence of modern AIs. For this reason, various extensions and alternatives have been developed over the years to test new aspects of machine intelligence.


1. The Lovelace test for creativity 

Unlike the Turing test, which only tests a machine's ability to hold conversations, the Lovelace test goes further and asks: Can a machine be creative? Can it create a work so original that no human could have predicted how it was created? ChatGPT can write poems and stories, but these are based on data and patterns it has learned – not on true creativity in the human sense. Thus, despite impressive results, ChatGPT is far from truly demonstrating creativity.


2. Winograd Schema Challenge 

Another test that goes beyond the Turing Test is the Winograd Schema Challenge. It tests whether a machine can resolve contextual ambiguity in language. For example, in the sentence ‘The table doesn't fit through the door because it's too big’, you, as a human, immediately understand that ‘it’ refers to the table. Machines like ChatGPT can struggle with such subtleties of meaning, though in many cases they are already making considerable progress.
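Why is this so hard for machines? A minimal sketch in Python makes it visible (the schema pair and the `nearest_antecedent` heuristic are my own illustrative constructions, not part of the official challenge): a naive ‘pick the noun mentioned most recently before the pronoun’ rule gives the same answer for both variants of the sentence, even though the correct referent flips when a single word changes.

```python
# Minimal sketch of a Winograd-style schema pair (illustrative only).
# The two sentences differ by one word, which flips the pronoun's referent.
schemas = [
    ("The table doesn't fit through the door because it is too big.", "table"),
    ("The table doesn't fit through the door because it is too small.", "door"),
]

def nearest_antecedent(sentence, candidates=("table", "door")):
    """Naive heuristic: choose the candidate mentioned last before 'it'."""
    pronoun_pos = sentence.index(" it ")
    return max(candidates, key=lambda c: sentence.rfind(c, 0, pronoun_pos))

for sentence, correct in schemas:
    guess = nearest_antecedent(sentence)
    print(f"{guess!r} ({'correct' if guess == correct else 'wrong'})")
```

The heuristic answers ‘door’ both times, so it gets exactly one variant right by chance. Resolving both requires world knowledge about tables, doors and fitting, which is precisely what the challenge probes.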


3. Coffee Test by Steve Wozniak 

A suggestion by Apple co-founder Steve Wozniak aims to test a machine in the physical world. The so-called ‘coffee test’ demands that a machine should be able to make coffee in a stranger's kitchen by exploring the room, finding the necessary tools, and making the coffee. ChatGPT and other text-based AIs don't stand a chance here – they exist purely in language and have no physical interaction skills whatsoever.


Historical Milestones in AI Development 

Here are the most important AIs and machines that are considered milestones in the history of artificial intelligence. These examples have advanced the technology, but none of them have passed the Turing test – which shows that there is still a long way to go before a machine can be considered ‘thinking’.


1. ELIZA (1966)

  • Developer: Joseph Weizenbaum

  • Ability: ELIZA was one of the first programmes to simulate human conversation. It worked in the manner of a Rogerian therapist, repeating and rephrasing questions.

  • Distinguishing feature: Many users initially believed they were talking to a real person until the simple mechanisms behind ELIZA became clear. However, the reactions were so realistic that it represents an early example of the possible ‘deception’ by AI.

  • Turing test: ELIZA could not pass the Turing test because its answers were too repetitive and rigid.
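ELIZA's ‘Rogerian’ trick was simple keyword pattern matching: find a trigger phrase, then reflect the user's own words back inside a canned template. A minimal sketch of the idea in Python (the patterns and templates here are invented for illustration and are not Weizenbaum's original DOCTOR script):

```python
import re

# Tiny ELIZA-style responder: match a keyword pattern, then echo the
# user's words inside a canned template (illustrative, not the 1966 script).
RULES = [
    (re.compile(r"i am (.+)", re.I), "How long have you been {0}?"),
    (re.compile(r"i feel (.+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"my (.+)", re.I), "Tell me more about your {0}."),
]

def respond(text: str) -> str:
    for pattern, template in RULES:
        match = pattern.match(text.strip())
        if match:
            return template.format(*match.groups())
    return "Please go on."  # fallback when no pattern matches

print(respond("I feel lost"))   # -> Why do you feel lost?
print(respond("Hello there"))   # -> Please go on.
```

Users projected understanding onto these canned reflections – the so-called ELIZA effect – even though no meaning is processed at all, which is exactly why the illusion collapsed under sustained conversation.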


2. Deep Blue (1997)

  • Developer: IBM

  • Ability: Deep Blue was the first AI to beat a world chess champion, Garry Kasparov. It could calculate millions of moves per second and used specialised chess algorithms.

  • Significance: Deep Blue's victory over Kasparov was an important step because it showed that machines could beat the best human players in a specific, highly-regulated domain (like chess).

  • Turing test: Deep Blue was specialised in chess and had no general conversation skills. It would not have passed the Turing test.


3. Watson (2011)

  • Developer: IBM

  • Ability: Watson won the quiz show Jeopardy! against two of the best human players. It used machine learning and the analysis of language nuances to answer complex questions from different fields of knowledge.

  • What makes it special: Watson was able not only to retrieve facts but also to understand questions involving puns and double entendres, marking a milestone in natural language processing.

  • Turing test: Despite its impressive performance on Jeopardy!, Watson failed the Turing test because it was specialised in facts and could not hold a general human conversation.


4. Siri (2011) and other voice assistants

  • Developer: Apple (Siri), Google (Assistant), Amazon (Alexa)

  • Ability: Voice assistants such as Siri, Google Assistant or Alexa can respond to voice input, answer questions, perform tasks such as creating appointments and retrieve general information.

  • Special feature: These technologies made artificial intelligence accessible for everyday use. They simulate conversations and engage in simple dialogues with users.

  • Turing test: Despite their extensive abilities in everyday conversations, these assistants can still be recognised as machines in deeper or emotional conversations and fail the Turing test.

 

5. AlphaGo (2016)

  • Developer: DeepMind (Google)

  • Ability: AlphaGo defeated the world champion in Go, a strategically highly complex board game with many more possible moves than chess. The AI used machine learning and neural networks to develop and improve its own strategies.

  • What makes it special: AlphaGo's victory over humans was groundbreaking because Go requires much more complex thought patterns than chess. AlphaGo learned through millions of games and was able to make unpredictable moves.

  • Turing test: AlphaGo was specialised in the game of Go and was not able to hold human-like conversations. It did not pass the Turing test.

     

6. GPT-3 (2020)

  • Developer: OpenAI

  • Ability: GPT-3 is a language model capable of generating human-like texts. It can answer questions, compose texts, write stories and even perform creative tasks such as poems and literary works.

  • Special feature: GPT-3 represents a major step forward because it was trained on a vast amount of data and can produce very natural-sounding texts. It is able to respond to almost any conceivable context.

  • Turing test: GPT-3 can simulate deceptively real conversations in some cases, but longer or emotionally complex dialogues still show its machine limitations.


7. LaMDA (2021)

  • Developer: Google

  • Ability: LaMDA (Language Model for Dialogue Applications) was developed specifically to hold human-like conversations. It has been trained on dialogues and can respond to voice input in a variety of ways, including hypothetical scenarios and personal opinions.

  • Special feature: LaMDA is impressive in its ability to maintain natural conversations with a consistency that goes beyond simple question-and-answer patterns. It can hold longer dialogues and shows a high degree of linguistic flexibility.

  • Turing test: LaMDA was developed to pass the Turing test in terms of conversations, but here too, there are still limits, especially when it comes to deeper emotional interactions or questions of self-awareness.


These milestones in AI development have all made important contributions to the advancement of artificial intelligence. Each of these machines and systems was revolutionary in its field, but none of them could fully pass the Turing test because they all lack real self-reflection, consciousness and emotional intelligence. They show how impressive AIs can be in specialised tasks, but also how far we still are from a machine that can truly think and act.


Conclusion: How far are we from solving the Turing Test? 

The Turing Test remains a fascinating goal in AI research. Even though modern systems like ChatGPT or the Deep Dive podcast may seem very human to us, they still reveal their machine roots in deeper interactions. Whether it's a lack of emotional depth, a lack of creativity or an inability to understand the physical world, there is still a long way to go before machines reach truly human-like intelligence. Until then, the Turing test remains a benchmark by which we can measure artificial intelligence and recognise its limitations.



AUTHOR

Tara Bosenick

Tara has been active as a UX specialist since 1999 and has helped to establish and shape the industry in Germany on the agency side. She specialises in the development of new UX methods, the quantification of UX and the introduction of UX in companies.


At the same time, she has always been interested in developing a corporate culture in her companies that is as ‘cool’ as possible, in which fun, performance, team spirit and customer success are interlinked. She has therefore been supporting managers and companies on the path to more New Work / agility and a better employee experience for several years.


She is one of the leading voices in the UX, CX and Employee Experience industry.
