The CS3 lab for Computational Survey and Social Science is an interdisciplinary group of researchers from fields such as the social sciences and computer science. It brings together expertise in survey methodology, UX research, machine learning, natural language processing (NLP), and generative AI. CS3 lab is led by Prof. Dr. Jan Karem Höhne and is situated in the Research Infrastructure and Methods Department at the German Centre for Higher Education Research and Science Studies (DZHW). Together, the members of CS3 lab explore new avenues for extending the methodological and analytical toolkit of substantive social science research.

In our research, we use online surveys as a comprehensive tool for collecting various digital data about people’s attitudes, traits, and behaviors. This includes trace data from mobile apps, search queries, and website visits, which we use, for example, to draw conclusions about people’s living conditions, such as pregnancy and parenthood. We complement this with research on smartphone sensors, such as accelerometer data for inferring motion conditions and activity levels. Similarly, we introduce qualitative research impulses into quantitative data collection by gathering voice answers to open narrative questions, recorded through the built-in microphone of smartphones. In doing so, we go beyond pure text-as-data methods by extracting tonal cues to infer affective states in situ. Finally, we increasingly engage with social media sampling strategies and evaluate the integrity of the resulting data. This especially concerns the threat posed by bots, which can potentially shift research outcomes.

Most recently, we started working on fusing conversational AI-based interviewers with online surveys. Specifically, we envision combining elements of interviewer-based and online surveys in a multi-modal agent that is sensitive to text, voice, and video input, potentially increasing survey participation, respondent satisfaction, and data quality. The ultimate goal is to create Embodied Conversational Agents (ECAs) that autonomously conduct interviews and facilitate human-like interactions in online surveys.

Our methodological and data collection efforts are accompanied by powerful data analysis techniques. For example, we use machine learning algorithms in unsupervised settings to analyze human answer behavior. We extract features from textual and non-textual information. This includes downstream analyses of text, such as topic modeling, sentiment analysis, and entailment, using specialized embeddings, linguistic features, and transformers. We also apply pre-trained classification models, such as Support Vector Machines, to predict affective states, using modular structures that facilitate the fusion of, for example, voice and image data.

In all research efforts, we commit ourselves to open science. We release data collection tools, analysis code, and models through open-source repositories, such as Harvard Dataverse, Hugging Face, and GitHub. We also release data for extended replications and quality assurance, following the principle of the European Research Council (ERC): as open as possible, as closed as necessary.

CS3 lab members and collaborators
Jan Karem Höhne (Computational Survey and Social Science; Head)
Joshua Claaßen (Computational Survey and Social Science; Member)
Manuel Uwihs (Political Science; Member)
Hayastan Avetisyan (Computational Linguistics; Collaborator)
David Broneske (Computer Science; Collaborator)
Saijal Shahania (Computer Science; Collaborator)