Focus: The Emerging Contours of Data Science

William Housley (Cardiff University)

Revolution is everywhere, in everything. It is infinite. There is no final revolution, no final number. The social revolution is only one of an infinite number of numbers: the law of revolution is not a social law, but an immeasurably greater one. It is a cosmic, universal law — like the laws of the conservation of energy and of the dissipation of energy (entropy). Some day, an exact formula for the law of revolution will be established. And in this formula, nations, classes, stars — and books — will be expressed as numerical quantities. (Zamyatin, WE)

Zamyatin’s dystopian novel WE interrogates the relationship between individual freedom and the application of mathematics to social integration and engineering. Central motifs of the book are the Table of Hours and the transparent monitoring and granular co-ordination of social life and interaction, exemplified by the ubiquitous use of glass architecture and mutual surveillance within this imaginary social world.

It is said that the book influenced both Huxley’s Brave New World and Orwell’s 1984. In some respects WE has particular relevance today because of the way it speaks to the rise of computation and big data as a transformative global social force. The Table of Hours is, after all, a pre-electronic computational literary construct that serves to order social organization and relations according to mathematical principles: a latent goal of the Big Data Imaginary.

Data science has been promoted and heralded by a number of voices as a new paradigm through which the emerging computational age and the associated data deluge can be understood. It has been described as a new scientific revolution, but it has its roots in engineering. In an age of disruptive technologies and the alleged emergence of the automated society, one of the functional requirements of this new social system, it is argued, is to assemble and support the skills and means to analyse and make sense of the big and broad data that is routinely produced by digital devices and networks, in order to inform the refinement of the emerging infrastructure and optimize the integration of human populations and requirements with computerized social systems. These include data generated via Web 2.0, social media, smart sensors in urban space, body proxemic wearable technologies, Google searches and mobile telephony. Big social data in particular has the potential to provide significant insights into the nature and character of social life. It also has the capacity to emerge as a social force in its own right, wherein the recorded quantification of everyday life radically alters routine decision making by people, communities, organisations and government as well as commercial entities.

These contemporary processes are mirrored by some reconsideration and repositioning in the social sciences. This has been framed as an empirical crisis, though it is now recognized that this early pessimism needs to be balanced with the emerging theoretical, methodological and empirical opportunities for sociology, and social science more broadly, that are generated by these developments in digitization and data. The real disciplinary threat comes from the fact that the domain of the ‘social’ is becoming a strategic commercial and scientific resource in the digital age, thereby attracting competition from established and new forms of occupational and disciplinary practice that are beginning to colonize the subject area of social science.

Two areas of expertise represent a key source for colonizing this domain in the digital age: computer science and advanced statistics. Other allied disciplines are also emerging as key contributors to this field; one example is physics, which can approach the generation and flow of data within systems with radically advanced mathematical explanations.

Why should any of this have anything to do with society, politics or even culture? The simple answer is that the emerging contours of digital society, characterized by ‘disruption’, speak to the core questions of sociological inquiry and raise a raft of issues connected to privacy, transparency, citizenship, surveillance and new forms of economic organization and public policy. A key issue here is how new forms of data and technology re-order social and economic relations. Yet sociology and related social sciences have been slow to respond to the theoretical, empirical and methodological challenges and opportunities presented by significant system transformation. At the same time many have heralded the opportunities presented by new forms of data as the catalyst for the identification of a new interdisciplinary ‘data science’ paradigm. However, this interdisciplinary creature in the data foliage is hard to discern.

If ‘data science’ aims to make claims about social life and social organization then it would seem logical to envisage social science having some input into this emerging domain. However, things are never so simple within the contested business of making claims about social and economic life. This may well be partly to do with the way in which new disciplines and domains come to be and, as stated earlier, the chatter around ‘data science’ comes at a time when social science is undergoing a process of re-invigoration and repositioning. If we accept that data science is a tangible interdisciplinary domain, then at this moment it may be useful to remind ourselves that sociology and the social sciences have a number of strategies, grounded firmly in their theoretical and empirical traditions, which might facilitate engagement with this new field of inquiry. In the following sections of this article I will try to outline what some of these strategies might look like and why they might be important.

Strategy 1: Collaboration
Collaborative working as a practice and topic of inquiry in its own right has an established history in the social sciences. Matters relating to collaboration and computing have been conceptualized and empirically studied through the programme of computer supported cooperative work (CSCW). In the UK the e-science programme (Jirotka et al., 2006) “… suggested a way to address these challenges through the development of global, collaborative multi-disciplinary research communities that, in turn, rely upon the construction of more powerful computational, data and communication infrastructures.”

This has resulted in a number of successful projects that responded to and enhanced the process of multidisciplinary collaborative design with a particular focus on the human computer interface. More recently the emergence of the networked researcher supported by a variety of information and communication technologies has led some commentators to consider the reshaping of research collaboration through specific forms of virtual research environments known as ‘collaboratories’; Carusi and Jirotka (2010:277) state: “Virtual research organizations, collaboratories, and VRE’s have emerged around new capabilities of cyberinfrastructure that are potentially exploitable by researchers and institutions. The thrust of these converging terms is towards creating cyberinfrastructures that are real to researchers and are emerging as part of their research resources in a form that impacts on their routine research practices within the institutions in which they work.”

This work can involve software that enables access by researchers to remote instruments, data analysis, model testing and other resources (Wulf, 1993:854). A key dimension of this collaborative work is the relationship between different domains of expertise and practice; as Carusi and Jirotka (2010:292) argue: “Because of the collective nature of research practices, the disciplinary, social, or community dimension of research is fully recognized as being implicated in the development of these research tools and technologies in an ongoing process of coevolution.”

The explosion of social data in recent years has required a response from both the social science and computational science communities. This has been framed as an attempt to move beyond the coming crisis of empirical sociology (Savage and Burrows, 2007, 2009; Edwards et al., 2013). These responses have necessarily taken place in a context where there has been a significant focus on big and broad data, although previous areas of co-development and design, such as interfaces and workflow, remain pertinent. In particular, the study programme known as computer supported co-operative work and the broader frame of ethnomethodologically oriented studies have shed light on interdisciplinary working by examining how social scientists and computer scientists work together in order to inform things like human computer interfaces, online interaction configuration and design. Interdisciplinary collaboration is notoriously difficult to accomplish, and communication and interaction are often key aspects of successful collaborative enterprise.

The age of big and broad social data presents particular problems and opportunities within what is a contested and competitive field. ‘Data Science’ may provide an opportunity for technical skill and statistical competence in the social sciences to find translation in modelling and other practices alongside data mining, machine learning, annotation strategies, coding and algorithm design. A more difficult task is exploring issues associated with the need to ‘refresh algorithms’ due to rapid socio-cultural change, linguistic variation and the migration of key demographic groups from platform to platform, which renders social data engineering an ongoing process without end. The key challenge here is realising sufficient social science input into sustainable digital infrastructures in ways that mitigate disciplinary and occupational closure.

Strategy 2: Observational Studies
The second strategy involves the development and operationalization of observational studies within the realm of social media usage and the aligned technology of big data. At a fundamental level this will include studies of how social actors use technologies such as Twitter, Instagram and Foursquare, and how they make use of these technologies in realizing everyday and institutional life. In terms of everyday life, the ‘Quantified Self’ movement is a useful frame through which to observe and examine the interactional appropriation of technology and data production; it also represents the emergent integration of body proxemic sensors and enhancements that measure bio-information for self-analytics.

These technologies also facilitate networked access to social media for sharing with a trusted ‘peer’ community for commentary and feedback as well as providing a means of integrating these data with other forms of data such as location and activity updates. This suggests an assemblage of digital conduits, sensors and networks that can be understood to produce a form of digitally augmented interaction order worthy of systematic scrutiny and study. In the context of self-improvement Pantzar and Shove (2005:5) state: “… technological developments in the portability, precision and ‘accuracy’ of heart rate meters has transformed the realm of everyday calculability. They allow us to ‘see’ our own heart (instant feedback), and in seeing, allow us to make adjustments in what we do: they allow us to quite literally tune our own engine. The results are made evident through longer term record keeping – a personalising of the medical record. As such heart rate meters have the potential to re-define the meaning of being well.”

More recent developments have confirmed the way in which technology and digital data are driving responsibilisation and new forms of quantified lifestyle. The recent arrival of ‘Fitbit’ onto the market may herald an acceleration of this process. The digital data generated through such devices represents an opportunity for scoping and understanding emerging flows of real time digital health data, whilst the ways in which this reconstitutes situated social relations and identity on the move represent an important opportunity for understanding these significant social changes.

Strategy 3: Method and Measurement 2.0 – A Methodographic Approach
‘Methodography’ is a study programme within sociology that takes formal methods not merely as resources but as topics of inquiry in their own right. It has its roots in Aaron Cicourel’s classic Method and Measurement in Sociology (1964) and has been given renewed focus and vigour in recent years through the work of Professor Wes Sharrock of Manchester University. A range of studies have examined interviewing, survey design, statistical analysis, focus groups and research teams as practical matters. In doing so, a range of unreported practices and policies have been rendered visible in such a way that we can gain insight into how empirical data and analysis are actually produced.

This is important in order to generate a full social understanding of knowledge generation and claims making. It would appear that a fruitful strategy for sociology and social science is to apply this study policy to the generation of digital data analysis, the practical interrogation of algorithms, crowdsourcing techniques for data annotation, the use of particular categories and terms as proxies for demography, distributed and networked research practice and so on. In addition to site specific studies of data science practices and procedures, a methodographic approach would also engage with the ways and means through which populations contribute to the generation of data and new forms of crowdsourced public analyses; a form of labour extraction liberally described as ‘citizen science’.

Other topics might include the ethics and politics of data and a concern with what has come to be called ‘the social life of methods’, inclusive of the occupational and commercial dynamics inherent in the drive for a new form of scientific inquiry where data has been identified as ‘the new oil’. Of import here is an acknowledgement of the commercial interest in the analysis of big and broad social data and the way in which this is generating new forms of occupational identification, i.e. ‘the data scientists’, and associated forms of credentialising, regulation and professional closure: the rise of a new form of methodologically defined profession in the digital age needs to be explored and understood. Last, but by no means least, the emergence of big and broad social data, and the massive interest it has received from government, policy makers and commercial agents, means that it has to be understood as a social, political, cultural and economic force in its own right.

Strategy 4: Critical Engagement with the Data Imaginary
Despite the empirical focus of data science (and its primary object, i.e. big and broad data) it represents fecund ideological territory for critical inquiry and analysis. Ruth Levitas’s development of utopia as method is a promising starting point for interrogating this imaginary within the context of futures and the dynamics of social change unleashed and promoted by different social agents within complex late modern social forms. The ideational character, impact and reproduction of the digital data imaginary and the accompanying tropes of big and broad data, automation, social computation, the social graph etc. represent a new plenum that has received little critical scrutiny.

For example, ideas concerning the ‘networked social factory’ and the ‘social graph’ are informing the ways in which governance is being re-imagined. The notion of the 10,000 foot view and the visual representation of the interconnection of relationships in online social networks are emerging modes and resources for mass marketing and governance. These ideas connect with the handmaiden of digital capital, data science, and with the new imaginary of big data, which rests on the optimisation of real time data extraction from ‘free and open’ services and platforms. This not only provides revenue and data streams for new forms of digital governance and commercialisation/extraction (exemplified by the social graph) but also serves the ideological drive for mass machine learning and algorithm design in order to bring about convergence, i.e. an artificially intelligent web infrastructure trained on data generated by populations.

Sociology and social science are at the early stages of engaging with this imaginary; but it is one that resonates with earlier debates across the social sciences and modern literature. Furthermore, questions concerning citizenship, surveillance, ethics and privacy are beginning to emerge as counterpoints and sites of resistance to the vision of society as a social machine. In terms of the emerging contours of digital society the reflexive promise of the sociological imagination needs to be exercised in a way that makes clear that the social is both the object and subject of history.

This article has tentatively and briefly identified some broad strategies through which sociology, and social science more generally, might continue to relate to the challenges and opportunities presented by the rise of the digital data society, wherein new forms of knowledge production, claims making, occupational closure and competition, and commercial, state and public interests move within a contested, complex and changing terrain. In many ways these broad strategies also need to be underpinned by engagement with a public sociology that makes use of new digital technologies as social forces in their own right and as a means for engaging with different audiences in the production of sociological knowledge. Citizen social science represents one way in which this might be achieved, where mass participation is mobilized to address social problems and issues through the use of networked digital technologies and tools, for example through the use of crowdsourcing techniques to annotate data, test concepts and comment on findings.

However, these new forms of public collaboration and engagement will also need to be augmented with a variety of traditional methods and theoretical work that takes the classic questions of the sociological imagination (namely: how is social organization possible? why do societies change over time? and what type of identity is promoted in a given social form?) and operationalizes them in order to engage and interrogate the emerging contours of digital society, as well as the claims advanced by those who believe that the ‘data speaks for itself’. It is the task of sociology and social science more broadly to demonstrate, once again, why the idea of ‘data as mouthpiece’ is reductive and problematic, and misses the full texture of social life wherever we might find it. The tensions between engineering and the social are central to many of the deepest concerns and significant events of the twentieth century.

The embedding of automation and computation into social life will have significant unintended consequences. Sociology and social science will be vital to documenting, mapping and anticipating some of these consequences in ways that will allow a digital dimension of modernity to reflect upon and know itself in ways that are life affirming and support the grounds for human flourishing. To this extent the sociology of the digital data society will be one of the core concerns for social science in an age where engineering and social life are becoming intertwined, inter-dependent and imbricated in novel, complex and significant ways that have consequences for private and public life. The extent to which this can be done in dialogue with other disciplines, new paradigms and publics remains to be seen, but any study of the inexorable logic of sociality in the digital age would do well to heed the insights of modernity’s reflexive science.

Carusi, Annamaria, and Marina Jirotka. “Reshaping Research Collaboration: The Case of Virtual Research Environments.” World Wide Research: Reshaping the Sciences and Humanities (2010): 277.
Cicourel, Aaron V. “Method and measurement in sociology.” (1964).
Edwards, Adam, William Housley, Matthew Williams, Luke Sloan, and Malcolm Williams. “Digital social research, social media and the sociological imagination: Surrogacy, augmentation and re-orientation.” International Journal of Social Research Methodology 16, no. 3 (2013): 245-260.
Jirotka, Marina, Rob Procter, Tom Rodden, and Geoffrey C. Bowker. “Special issue: Collaboration in e-Research.” Computer Supported Cooperative Work (CSCW) 15, no. 4 (2006): 251-255.
Pantzar, Mika, and Elizabeth Shove. “Manufacturing leisure: Innovations in happiness, well-being and fun.” (2005).
Savage, Mike, and Roger Burrows. “Some further reflections on the coming crisis of empirical sociology.” Sociology 43, no. 4 (2009): 762-772.
Savage, Mike, and Roger Burrows. “The coming crisis of empirical sociology.” Sociology 41, no. 5 (2007): 885-899.
Wulf, William A. “The collaboratory opportunity.” Science 261, no. 5123 (1993): 854-855.


William Housley is a sociologist based at the Cardiff University School of Social Sciences who works across a number of research areas that include language and interaction, social media, the social aspects of disruptive technologies and the emerging contours of digital society, economy and culture. Professor Housley was a co-founder of COSMOS and is currently working on a number of ESRC funded projects that relate to digital society and research. He co-convenes the Digital Sociology Research Group at Cardiff University, is co-editor of Qualitative Research (SAGE) and serves on the editorial board of Big Data and Society (SAGE).