What
The reflexive machine learning (ML) toolbox provides a pipeline and task-oriented tools for analyzing text data with machine learning techniques. The tools are tailored to use cases in interpretivist social research and the humanities (e.g. digital media studies, sociology, historical research, anthropology, political science, science and technology studies). Rather than providing statistical methods for text analysis, our tools aim to support qualitative analysis processes such as data exploration, labeling, coding and analysis. We strive for a ‘reflexive’ approach to digital data analysis, one that enables critical reflection on the ways our techniques co-construct our research objects and phenomena.
Where
The code of the toolbox will be published here: https://github.com/FUB-HCC/Reflexive-Machine-Learning-Toolbox
(following soon - please bear with us!)
For whom
We are working with a variety of people to make the toolbox valuable for real research scenarios. So far, these scenarios include:
Social media research (Sonja)
Sonja (pseudonym) is a media scholar who analyzes online debates on social media platforms. She is interested in the ways people debate certain issues online (e.g. climate change, COVID-19) and in how and why they debate (e.g. political deliberation, hate speech, factoids). In the past, she focused on post-video discussions on YouTube and debates on Twitter, but she plans to turn to other platforms in the future (e.g. Telegram, Reddit). Her main empirical material consists of postings (e.g. tweets) arranged in a list. Sonja typically uses qualitative coding techniques for theory building (e.g. grounded theory) to make sense of the data. She labels the data items by hand in MS Excel, then iteratively refines her codes (“labels”) and compares the data with existing theories from the media studies and social science literature. Sonja has also experimented with qualitative data analysis software (MAXQDA), which recently added functionalities for social media text analysis. She decided, however, that these functionalities are not helpful for her purposes or even impede the way she likes to structure her data coding and analysis.
The main challenge for Sonja is to become familiar with relatively large datasets, identify interesting data points and grasp the changing dynamics in the discussion of a subject. She typically spends a lot of time scrolling through the data and looking for these elements. Much of the data is not particularly interesting to her, and hidden gems may only appear after long sessions of scrolling and diagonal reading. She would welcome technical instruments that help her gain new perspectives and ‘surface’ interesting data points and their relationships.
Sonja has basic knowledge of programming languages for data science (R, Python) and is familiar with data analysis software used in social media analysis (e.g. network analysis). She is open to setting up her own programming environment (e.g. a Jupyter Notebook) and is interested in incorporating and experimenting with state-of-the-art technologies (e.g. machine learning).
More scenarios to come…
By whom
The reflexive ML toolbox is the outcome of an interdisciplinary collaboration at the Human-Centered Computing research group of Freie Universität Berlin. Main contributors are Michael Tebbe, Simon David Hirsbrunner and Claudia Müller-Birn.
Re-use
We strongly support and encourage the re-use and further development of our software!
The toolbox is licensed under AGPL-3.0.
Cite
If you use the toolbox in your own research projects, please use the following citation in your publications:
Tebbe, Michael, Simon David Hirsbrunner and Claudia Müller-Birn (2021): Reflexive Machine Learning Toolbox. Version 0.1. [Software] doi: https://zenodo.org/badge/latestdoi/372527512