AI in User Research: A Case Study on Analyzing International Surveys using ChatGPT
Our recent experiment to analyze an international survey using ChatGPT. Spoiler for fellow user researchers: Quote picking belongs to the past!
User research plays a vital role in evaluating digital products and understanding user experiences. With advances in artificial intelligence (AI), specifically chatbots built on large language models (LLMs) such as ChatGPT, we have an opportunity to speed up the analysis of user research studies. In this blog article, we report on our recent experiment in analyzing an international user research survey with ChatGPT.
Fellow user researchers, quote picking belongs to the past
Caution! ChatGPT uses user input to improve its model. As it is possible to retrieve training data from LLMs (Large Language Models), no confidential information should be given to ChatGPT!
Setting
The case study involved a usability survey of a digital product with approximately 1000 responses from different markets. The survey included both multiple-choice and open-ended questions, the latter giving participants the chance to provide additional details. Our analysis focused on the open-ended answers.
Goal
The primary goal of this case study was to identify frequently mentioned topics. Additionally, we aimed to select a few representative participant quotes for each topic.
Approach: Zero-shot prompting + two-step approach
The most important aspects of our approach in utilizing ChatGPT in survey analysis are the following:
Zero-shot prompting:
We created a prompt that included a short description of the survey question, instructions for analysis and output, and the survey data. Because ChatGPT limits the length of its input, we had to analyze the data for each market separately (around 300 responses was the limit). Here again, be careful not to pass sensitive information to ChatGPT!
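The batching described above can be sketched in a few lines of Python. This is only an illustration, not the tooling we used: the names `chunk`, `build_prompt` and `prompts_by_market`, the shortened template, and the fixed limit of 300 responses per prompt are all assumptions made for the example.

```python
# Hypothetical helper names; the template is a shortened stand-in for the real prompt.
STEP1_TEMPLATE = (
    "Below is an open-ended response from a survey, in which participants "
    "describe {question}. One line is one response.\n"
    "<<Instructions>>\n"
    "{instructions}\n"
    "<<data>>\n"
    "{data}"
)


def chunk(responses, max_items=300):
    """Split a list of responses into batches of at most max_items."""
    return [responses[i:i + max_items] for i in range(0, len(responses), max_items)]


def build_prompt(question, instructions, responses):
    """Fill the template: one response per line, blank answers dropped."""
    # Collapse line breaks inside an answer so "one line is one response" holds.
    lines = [" ".join(r.split()) for r in responses if r.strip()]
    return STEP1_TEMPLATE.format(
        question=question, instructions=instructions, data="\n".join(lines)
    )


def prompts_by_market(responses_by_market, question, instructions, max_items=300):
    """Return {market: [prompt, ...]} with one prompt per batch of responses."""
    return {
        market: [
            build_prompt(question, instructions, batch)
            for batch in chunk(responses, max_items)
        ]
        for market, responses in responses_by_market.items()
    }
```

Each resulting prompt can then be pasted into ChatGPT one at a time. A market with more responses than the limit simply yields several prompts.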
Two-step approach:
- Step 1: We created a prompt to analyze frequently mentioned topics in the open-ended questions. Using the prompt, we made ChatGPT generate the response multiple times. By comparing the outputs, we selected the topics that appeared most frequently.
- Step 2: In this step, we included the selected top topics in the prompt and asked ChatGPT to list representative quotes for each. Where necessary, we removed categories for which only a few quotes were listed.
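The comparison of repeated outputs in step 1 can be approximated with a simple count. The sketch below is an assumption-laden illustration rather than what we actually ran (we compared the outputs by hand): `stable_topics` is a hypothetical helper, and it only matches topic labels that are identical after lowercasing and whitespace cleanup, so differently worded labels for the same topic still need a human eye.

```python
from collections import Counter


def normalize(topic):
    """Lowercase and collapse whitespace so trivially different labels match."""
    return " ".join(topic.lower().split())


def stable_topics(runs, min_runs=2):
    """Return topics that appear in at least min_runs of the generated outputs.

    runs is a list of topic lists, one list per regenerated ChatGPT answer.
    """
    counts = Counter()
    for topics in runs:
        # Count each topic at most once per run.
        counts.update({normalize(t) for t in topics})
    kept = [(t, n) for t, n in counts.items() if n >= min_runs]
    # Most frequent first; ties broken alphabetically for a stable order.
    return [t for t, n in sorted(kept, key=lambda item: (-item[1], item[0]))]
```

With three regenerated answers, `min_runs=2` keeps every topic that shows up in a majority of them.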
Also, to compare the quality of the AI-supported analysis with analysis by a human researcher, we analyzed the results from a few markets manually.
Prompts
The prompts we used in the case study were as follows:
For step 1
Below is an open-ended response from a survey, in which participants describe [what the question was about], with focus [specification]. One line is one response. Acting as a usability expert, analyze the responses according to the following instructions.
<<Instructions>>
Please create a table with the most frequently mentioned [topic]. In the first column, write up to 10 most frequently mentioned topics in the following <<data>>. Please ignore non-specific comments such as "I don't know" or comments that are not relevant for usability evaluation of the website such as "cars are not useful". In the second column, write the frequency of appearance of the topic. In the third column, cite all the related comments from <<data>> that fall into the topic in the first column. Do not modify the comments when you cite them.
Then create another table regarding LIKES following the procedure described above.
<<data>>
[Data, one response in one line]
For step 2
Below is an open-ended response from a survey, in which participants describe [what the question was about], with focus [specification]. One line is one response. Acting as a usability expert, analyze the responses according to the following instructions.
<<Instructions>>
Some of the responses in <<data>> falls into the categories shown in the section <<categories>>. For each category, pick FIVE responses from <<data>> that are LONGER THAN SIX WORDS and clearly elaborate the category. Output should be in form of bulletpoints with indenting.
<<categories>>
[List of the categories defined in step 1]
<<data>>
[Data, one response in one line]
Learnings
The case study yielded several valuable learnings:
- The averaged topics derived by ChatGPT were very similar to the categories created by human researchers, encompassing all the frequently mentioned aspects.
- Outputs from ChatGPT varied with each generation. To obtain reliable results, we recommend generating at least three responses using the “regenerate response” button in ChatGPT and keeping the topics that appear most frequently.
- By using appropriate prompts, it is possible to make ChatGPT retrieve quotes word for word, without summarizing or modifying them! Fellow user researchers, we must make use of it. Quote picking is so yesterday 😍
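Since the value of this last point depends on the quotes really being unmodified, it is worth checking them against the original data. The following is a minimal sketch under our own assumptions: `check_quotes` is a hypothetical helper, it ignores differences in whitespace only, and it treats "longer than six words" as at least seven words.

```python
def normalize(text):
    """Collapse runs of whitespace; everything else must match exactly."""
    return " ".join(text.split())


def check_quotes(quotes, responses, min_words=7):
    """Split quotes into (verbatim, suspicious).

    A quote is verbatim if it matches a survey response exactly (apart from
    whitespace) and has at least min_words words.
    """
    known = {normalize(r) for r in responses}
    verbatim, suspicious = [], []
    for quote in quotes:
        cleaned = normalize(quote)
        if cleaned in known and len(cleaned.split()) >= min_words:
            verbatim.append(quote)
        else:
            suspicious.append(quote)
    return verbatim, suspicious
```

Anything that lands in the suspicious list was either reworded by the model, is too short, or does not exist in the data at all, and should be reviewed by hand.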
Other Approaches
Apart from the approach followed in this case study, there are other methods that can be explored for more complex analysis:
Prompt in form of pseudo-programs
For a more complex analysis and/or more control over ChatGPT’s behaviour, pseudo-programs might help. Just describe the steps in a format that looks like a programming language, using “program vocabulary” such as if, for, while etc. For example, this pseudo-program checks whether the responses comply with a given instruction, improves them if necessary, and then prints the relevant comments in a table format.
For each topics in the table:
    while !comply(instruction):
        improve responses
Print_in_table(topics, frequency, relevant comments)
Print("done. Comply")
Utilize ChatGPT+
If available, the "Code Interpreter" or Browsing function of ChatGPT+ can provide additional background information and enhance the analysis process. For a detailed application, stay tuned for our next case study!
Embedding and clustering
If the survey responses are of high quality, you might want to consider employing embedding techniques to cluster frequently mentioned topics together.
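To give a feel for the idea, here is a toy sketch that runs with the Python standard library only. It is not a real embedding pipeline: `embed` is a plain word-count vector standing in for a proper sentence-embedding model, `cluster` is a simple greedy grouping rather than an established clustering algorithm, and the threshold of 0.5 is an arbitrary assumption.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding: a word-count vector.

    In a real setup this is where a sentence-embedding model would be called.
    """
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def cluster(responses, threshold=0.5):
    """Greedy clustering: join the first cluster whose seed is similar enough."""
    clusters = []  # list of (seed_vector, [member responses])
    for text in responses:
        vec = embed(text)
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(text)
                break
        else:
            # No existing cluster was close enough, so start a new one.
            clusters.append((vec, [text]))
    return [members for _, members in clusters]
```

The largest clusters then hint at the most frequently mentioned topics, which still need a human-readable label.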
Caution!
While AI-powered analysis can be beneficial, we need to take certain precautions:
Quality check
Just like with normal surveys, you need to thoroughly check the quality of the responses. Participants may provide inaccurate information, or even just make up a non-existent feature, so it is crucial to cross-verify and validate the results generated by ChatGPT.
Data security
It is important to consider data security when using ChatGPT. As ChatGPT uses user inputs as training data, there is a possibility that sensitive data can be retrieved from the model later (even when the data appears only once in the training set! For details see this paper). Therefore, DO NOT FEED ChatGPT with ANY SENSITIVE INFORMATION!
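A first, very rough line of defence is to mask obvious personal data before anything is pasted into a prompt. The sketch below is only an assumption about how such a pre-filter could look: the two regular expressions are simplified patterns of our own, they catch only e-mail addresses and phone-like number sequences, and they are no substitute for a manual review (names, addresses or customer numbers will slip through).

```python
import re

# Simplified, illustrative patterns; they will miss many real-world formats.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s()/-]{6,}\d")


def redact(text):
    """Replace e-mail addresses and phone-like numbers with placeholders."""
    text = EMAIL.sub("[EMAIL]", text)
    return PHONE.sub("[PHONE]", text)
```

Running every response through `redact` before building the prompt removes the most obvious identifiers while leaving ordinary numbers such as "30 seconds" untouched.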
Conclusion
AI, particularly large language models, can significantly contribute to user research analysis. This case study demonstrates the feasibility and effectiveness of using ChatGPT to analyze international surveys. By leveraging the capabilities of AI, researchers can gain valuable insights in a faster and more efficient manner. However, it is crucial to approach AI analysis with caution, ensuring quality checks and safeguarding data privacy.