Further Explorations on the Use of Large Language Models for Thematic Analysis. Open-Ended Prompts, Better Terminologies and Thematic Maps
DOI:
https://doi.org/10.17169/fqs-25.3.4196
Keywords:
thematic analysis, large language models, semi-structured interviews, initial coding, thematic maps
Abstract
In this manuscript, I build upon an initial body of research on procedures for leveraging large language models (LLMs) in qualitative data analysis by carrying out thematic analysis (TA) with LLMs. TA is used to identify patterns in qualitative data through initial labelling (coding) of the data, followed by the organisation of the resulting codes into themes.
First, I propose a new set of LLM prompts for initial coding and generation of themes. These prompts differ from those typically deployed for such analysis in that they are entirely open-ended and rely on TA language. Second, I investigate the process of removing duplicate initial codes through a comparative analysis of the codes of each interview against a cumulative codebook. Third, I explore the construction of thematic maps from the themes elicited by the LLM. Fourth, I evaluate the themes produced by the LLM against the themes produced manually by humans. To conduct this research, I employed a commercial LLM via an application programming interface (API). Two datasets of open-access semi-structured interviews were analysed to demonstrate the methodological possibilities of this approach. I conclude with practical reflections on performing TA with LLMs, enhancing our knowledge of the field.
License
Copyright (c) 2024 Stefano De Paoli
This work is licensed under a Creative Commons Attribution 4.0 International License.