The CS3 lab (Computational Survey and Social Science) is an interdisciplinary group of researchers from fields such as social science and computer science, combining expert knowledge in survey methodology, UX research, machine learning, NLP, and generative AI. The CS3 lab is led by Prof. Dr. Jan Karem Höhne and is situated in the Research Infrastructure and Methods Department at the German Centre for Higher Education Research and Science Studies (DZHW). Together, the members of the CS3 lab continually explore new avenues for extending the methodological and analytical toolkit of substantive social science research.

In our research, we use online surveys as a comprehensive tool for collecting various kinds of digital data about people's attitudes, traits, and behaviors. This includes trace data from mobile apps, search queries, and website visits, which allow us, for example, to draw conclusions about people's living conditions, such as pregnancy and parenthood. This work is accompanied by research on smartphone sensors, such as accelerometers, whose data can be used to infer motion conditions and activity levels. Similarly, we bring qualitative research impulses into quantitative data collection by gathering voice answers to open narrative questions, recorded through the built-in microphones of smartphones. In doing so, we go beyond pure text-as-data methods by extracting tonal cues to infer affective states in situ. Finally, we are increasingly engaging in social media sampling strategies and investigating the potential of synthetic data for social science research.

Most recently, we have started working on combining conversational AI-based interviewers with online surveys. Specifically, we envision fusing elements of interviewer-administered and online surveys through a multimodal agent that is responsive to text, voice, and video input, potentially increasing survey participation, respondent satisfaction, and data quality. The ultimate goal is to create Embodied Conversational Agents (ECAs) that autonomously conduct interviews, facilitating human-like interactions in online surveys.

Our methodological and data collection efforts are accompanied by powerful data analysis techniques. For example, we use machine learning algorithms in unsupervised settings to analyze respondents' answer behavior. We extract features from both textual and non-textual data. This includes downstream text analyses, such as topic modeling, sentiment analysis, and textual entailment, using specialized embeddings, linguistic features, and transformers. We also apply pre-trained classification models, such as Support Vector Machines, with modular structures that facilitate the fusion of, for example, voice and image data.

In all our research, we are committed to open science. We release data collection tools, analysis code, and models through open repositories and platforms, such as Harvard Dataverse, Hugging Face, and GitHub. We also release data for extended replications and quality assurance, following the principle of the European Research Council (ERC): as open as possible, as closed as necessary.

In addition, the CS3 lab hosts the regular CS3 meeting, to which it invites international expert researchers to present their current work in the field of computational survey and social science. The CS3 meeting takes place online during the summer (April to July) and winter (October to January) terms and is broadcast to the scientific community. In doing so, the CS3 lab aims to provide a forum for scientific exchange and for discussing novel research directions in the social sciences and beyond.

Upcoming and recent CS3 meetings:
Anke Radinger, Ulrike Efu Nkong, & Dorothée Behr – GESIS Leibniz Institute for the Social Sciences (January 16, 2025 from 3:15 to 4:00 PM)
Comparing professional translators and social scientists when producing questionnaire translations from scratch vs. based on machine translation output

Laura Boeschoten – Utrecht University (December 12, 2024 from 3:15 to 4:00 PM)
Digital trace data collection through data donation

Oriol Bosch – University of Oxford (July 12, 2024 from 12:15 to 1:00 PM)
Tell me what you read, and I will tell you who you are: a novel method for measuring ideology using web browsing data

Carolina Haensch, Leah von der Heyde, & Alexander Wenz – LMU Munich and University of Mannheim (June 26, 2024 from 12:15 to 1:00 PM)
Assessing bias in LLM-generated synthetic datasets: examining LLM personas in German and European elections

Timo Lenzner – GESIS Leibniz Institute for the Social Sciences (June 12, 2024 from 12:15 to 1:00 PM)
Integrating ChatGPT into cognitive pretesting procedures