Causation, Correlation, and Big Data in Social Science Research

Cowls, Josh and Schroeder, Ralph (2015) Causation, Correlation, and Big Data in Social Science Research. Policy & Internet 7 (4), 447-472.

The emergence of big data not only offers a potential boon for social scientific inquiry, but also raises distinct epistemological issues for this new area of research. Drawing on interviews conducted with researchers at the forefront of big data research, we offer insight into questions of causal versus correlational research, the use of inductive methods, and the utility of theory in the big data age. While our interviewees acknowledge challenges posed by the emergence of big data approaches, they reassert the importance of fundamental tenets of social science research such as establishing causality and drawing on existing theory. They also discuss more pragmatic issues, such as collaboration between researchers from different fields, and the utility of mixed methods. We conclude by putting the themes emerging from our interviews into the broader context of the role of data in social scientific inquiry, and draw lessons about the future role of big data in research.

The Ethics of Given-off versus Captured Data in Digital Social Research

Cowls, Josh and Schroeder, Ralph (2015) The Ethics of Given-off versus Captured Data in Digital Social Research. Workshop on Ethics for Studying Sociotechnical Systems in a Big Data World, CSCW 2015, March 2015, Vancouver, B.C., Canada.

This paper proposes new terminology to enhance understanding of how big data can be used for research, in both commercial and academic contexts. We distinguish between data as given-off and data as captured, and draw on insights from interviews conducted with researchers using such data to elaborate on this distinction. We conclude with a series of recommendations for research design and conduct, based on this re-conceptualization of ‘data’ and ‘capta’.

Ad-hoc encounters with big data: Engaging citizens in conversations around tabletops

Fjeld, Morten, Woźniak, Paweł, Cowls, Josh and Nardi, Bonnie (2015). Ad-hoc encounters with big data: Engaging citizens in conversations around tabletops. First Monday 20 (2).

The increasing abundance of data creates new opportunities for communities of interest and communities of practice. We believe that interactive tabletops will allow users to explore data in familiar places such as living rooms, cafés, and public spaces. We propose informal, mobile possibilities for future generations of flexible and portable tabletops. In this paper, we build upon current advances in sensing and in organic user interfaces to propose how tabletops in the future could encourage collaboration and engage users in socially relevant data-oriented activities. Our work focuses on the socio-technical challenges of future democratic deliberation. As part of our vision, we suggest switching from fixed to mobile tabletops and provide two examples of hypothetical interface types: TableTiles and Moldable Displays. We consider how tabletops could foster future civic communities, expanding modes of participation originating in the Greek Agora and in European notions of cafés as locales of political deliberation.

Big Data and Positive Change in the Developing World: Challenges and Opportunities

Taylor, Linnet, Cowls, Josh, Schroeder, Ralph and Meyer, Eric T. (2014) Big Data and Positive Change in the Developing World: Challenges and Opportunities. Policy & Internet 6 (4), 418-444.

This paper is the product of a workshop that brought together practitioners, researchers, and data experts to discuss how big data is becoming a resource for positive social change in low- and middle-income countries (LMICs). We include in our definition of big data sources such as social media data, mobile phone use records, digitally mediated transactions, online news media sources, and administrative records. We argue that there are four main areas where big data has potential for promoting positive social change: advocacy; analysis and prediction; facilitating information exchange; and promoting accountability and transparency. These areas all have particular challenges and possibilities, but there are also issues shared across them, such as open data and privacy concerns. Big data is shaping up to be one of the key battlefields of our time, and the paper argues that this is therefore an opportune moment for civil society groups in particular to become a larger part of the conversation about the use of big data, since questions about the asymmetries of power involved are especially urgent in these uses in LMICs. Civil society groups are also currently underrepresented in debates about privacy and the rights of technology users, which are dominated by corporations, governments and nongovernmental organizations in the Global North. We conclude by offering some lessons drawn from a number of case studies that represent the current state-of-the-art.

The Crowd in the Cloud? Three challenges for gauging public opinion online

Cowls, Josh (2014) The Crowd in the Cloud? Three challenges for gauging public opinion online. IPP2014: Crowdsourcing for Politics and Policy, September 2014, Oxford, UK.

Much excitement surrounds the use of social sources of big data – harvested from popular networking platforms like Twitter and Facebook, as well as other forms of socially generated data including Wikipedia edits and Google searches – in the pursuit of social scientific discovery. In this paper I assess the extent to which these newly available sources of socially-generated big data can tell us about public opinion in a society at large. I draw on data from a series of interviews conducted with researchers at the forefront of big data approaches to social science, in order to outline the opportunities and issues around this area of research. In my analysis I identify three challenges to the validity of online public opinion measurement – the reliability of the data collected, the representativeness of the ‘sample’ being analysed, and the replicability of this form of public opinion research – and suggest various ways in which these challenges can be met.

Big Data, Ethics, and the Social Implications of Knowledge Production

Schroeder, Ralph and Cowls, Josh (2014) Big Data, Ethics, and the Social Implications of Knowledge Production. Data Ethics Workshop, KDD@Bloomberg, August 24 2014, New York, NY, USA.

This position paper addresses current debates about data in general, and big data specifically, by examining the ethical issues arising from advances in knowledge production. Typically, ethical issues such as privacy and data protection are discussed in the context of regulatory and policy debates. Here we argue that this overlooks a larger picture whereby human autonomy is undermined by the growth of scientific knowledge. To make this argument, we first offer definitions of data and big data, and then examine why the uses of data-driven analyses of human behaviour in particular have recently experienced rapid growth. Next, we distinguish between the contexts in which big data research is used, and argue that this research has quite different implications in the context of scientific as opposed to applied research. We conclude by pointing to the fact that big data analyses are both enabled and constrained by the nature of data sources available. Big data research will nevertheless inevitably become more pervasive, and this will require more awareness on the part of data scientists, policymakers and a wider public about its contexts and often unintended consequences.

Mapping the UK webspace: fifteen years of British universities on the web

Hale, SA, Yasseri, T, Cowls, J, Meyer, ET, Schroeder, R and Margetts, H (2014) Mapping the UK webspace: fifteen years of British universities on the web. Proceedings of the 2014 ACM conference on Web science, 62-70.

This paper maps the national UK web presence on the basis of an analysis of the .uk domain from 1996 to 2010. It reviews previous attempts to use web archives to understand national web domains and describes the dataset. Next, it presents an analysis of the .uk domain, including the overall number of links in the archive and changes in the link density of different second-level domains over time. We then explore changes over time within a particular second-level domain, the academic subdomain .ac.uk, and compare linking practices with variables, including institutional affiliation, league table ranking, and geographic location. We do not detect institutional affiliation affecting linking practices and find only partial evidence of league table ranking affecting network centrality, but find a clear inverse relationship between the density of links and the geographical distance between universities. This echoes prior findings regarding offline academic activity, which allows us to argue that real-world factors like geography continue to shape academic relationships even in the Internet age. We conclude with directions for future uses of web archive resources in this emerging area of research.