Counting Potatoes vs. Computational Mysticism - Using CHAOSS for Research


28 February 2024

In this episode, host Georg Link is joined by Daniel, Anita, Sophia, and Sean to discuss their research experiences with CHAOSS metrics and software for open source community health analysis. They dive into various topics, such as collecting and interpreting data from different perspectives, considerations regarding privacy and ethics, and the importance of collaboration between academics and industry professionals. They also highlight some significant projects and studies where CHAOSS metrics and software were employed, and share their hopes and concerns for the future direction of research in the field. Furthermore, they discuss the necessity of bridging the gap between academia and industry and touch on the importance of linguistics and cultural context when examining data. Download this episode now!

[00:02:48] Anita discusses the history of open source software research and how CHAOSS provides a common framework for various metrics used by researchers, and Sean emphasizes the standardization of metrics by CHAOSS, which aids in consistency across research.

[00:04:52] Sophia highlights the discrepancies in metric calculations and definitions, seeking standard methodologies, especially for non-academic publications, and Daniel reflects on the differences in research approaches between academia and industry, emphasizing the importance of methodological rigor.

[00:08:25] Sean critiques academic papers for often lacking complete method descriptions, calling for more rigorous methodological transparency, and Daniel shares his experience transitioning from academia to industry and the different expectations for communication and results.

[00:10:44] Georg inquires about the impact of CHAOSS research capabilities, and Daniel explains that CHAOSS is shaping research by reflecting the interests and observations of its contributors.

[00:12:16] Sean talks about the increased capacity for research offered by CHAOSS, particularly through tools like GrimoireLab and Augur. Anita shares her experience using GrimoireLab to create interventions and dashboards that let open source communities monitor their projects, and Daniel adds historical context and mentions the importance of tools that allow the replication of analysis in research.

[00:17:10] Georg introduces a study using CHAOSS metrics and software that hasn’t been officially published yet, and Sophia shares some details and explains the study’s premise.

[00:21:00] Anita raises a philosophical point about the potential limitations of metrics, suggesting that they may only reflect what is observable and could lead to gamification if people optimize their behavior based on the metrics.

[00:22:14] Sean speaks about the importance of deep field engagement and the combination of social science with data mining to fully understand the data’s underlying human behavior. Sophia shares her perspective from market research, discussing the design of surveys, the selection bias inherent in data collection, and the importance of understanding the population that is excluded by the research filters used.

[00:25:56] Anita discusses the challenges of academic surveys, and Daniel discusses the bias that may arise from the data available.

[00:28:10] Sophia contemplates the behavioral nuances dictated by different platforms’ processes, and Sean suggests a focus on common software engineering processes across different tools and advocates for social scientific research in open source to better understand the human aspects.

[00:30:32] Georg transitions to discussing survey methodologies and their relation to CHAOSS metrics, and Anita shares her experiences with survey design and implementation for the international Apache Software Foundation community.

[00:33:10] Daniel reflects on the collaborative effort with the ASF community to ensure the survey’s terms and questions were appropriately adapted for an international audience. Sophia suggests the need for a consistent taxonomy in research to ensure cultural sensitivity and understanding.

[00:36:15] Sean touches on the use of large language models in research to identify common language patterns, discussing the ethical considerations of using machine learning to evaluate inclusivity in projects. Anita shares thoughts on presenting survey data responsibly and the need for careful consideration of what information is shared.

[00:38:53] Georg questions the future direction for research in open source using metrics and software. Sean advocates for deeper social scientific engagement, Anita points out the silos between industry and academics, highlighting the need for more interaction and collaboration to synergize efforts and ask more relevant questions, and Sophia stresses the need to focus on gaps in data and to consider work not visible in trace data.

[00:42:59] Daniel brings a pessimistic view, cautioning that the different goals of industry and academia might lead to problems unless they find ways to work together more effectively.

[00:44:11] Georg asks Daniel to clarify the problems he foresees with the current research trajectories. Daniel elaborates on the potential ethical and legal issues that may arise when data is used beyond the limits of fair use, such as in mental health analysis from developer messages, and Sean and Anita add some thoughts as well.

Value Adds (Picks) of the week:

  • [00:47:09] Georg’s pick is baking cookies.
  • [00:47:59] Sean’s pick is a book he read called “Language Variation and Change in Social Networks.”
  • [00:48:31] Anita’s pick is a book she is helping write on “Inclusive Open Source.”
  • [00:48:59] Daniel’s pick is two books he read called “The Culture Map” and “From the Soil.”
  • [00:50:54] Sophia’s pick is returning to FOSDEM, seeing people, and learning about a new tool called Cosma.


