Big Data, Predictive Machines and Security: Enthusiasts, Critics and Sceptics

Big Data, Predictive Machines and Security: Enthusiasts, Critics and Sceptics

Adam Edwards (Cardiff University)

Big data is in part responsible for a rejuvenated interest in the use of artificial intelligence for security applications such as predictive policing. Enthusiasts argue that using this data in concert with advances in computational machine learning will enhance the capacity to anticipate and pre-empt problems of crime and security. In response, critics either challenge the political feasibility of predictive machines, noting their unjustifiable intrusion into private communications or their self-fulfilling prophesies, or they doubt the technical feasibility of training machines to learn the improvisation that is central to social relations such as crime and insecurity. Alternatively, sceptics are concerned with the powers and liabilities of hybrid human-machine learning in predicting security ‘scripts’.

The emergence of ‘social computing’ given the arrival of ‘read/write technologies’ (such as blogs, micro-blogs, social networking etc.) on the ‘interactive World Wide Web’ has provoked further interest in the prospect of integrating such user-generated data with digitally collected and archived administrative and commercial datasets to provide a ‘bird’s eye view’ of social relations. Computational scientists refer to this amalgamation of big data in terms of the ‘10,000 foot view’ of ‘the social graph’ and argue that read/write technologies are just the beginning of an ‘age of social machines’.  It is envisaged that these machines will rapidly evolve from a situation in which various read/write applications operate in isolation from one another (an exchange on Facebook, a discussion on Twitter, comments on a broadcast media website, opinions registered through on-line surveys, the retrieval and annotation of digitally archived police, health, education and census data and so forth) to one in which they interact with each other. In this way it is believed that social machines will enable an exponential increase in the kinds of artificial intelligence and collaborative work needed to grasp and solve the complexity of social problems that confront us, and which are irremediable through individual thought and effort, from climate change through major public health challenges to mobilising local community responses to crime and violence.

There is fertile speculation about the kinds of security scenarios that could unfold once such faith is placed in artificial intelligence. Enthusiasts for predictive policing (PREDPOL) in the United States, and for ‘prospective crime mapping’ (PROMAP), in the United Kingdom have developed algorithms, premised on a ‘contagion thesis’, which seek to detect when and where crimes will occur by factoring in different kinds of assumptions about how crime spreads from an initial offence in particular environments given the routine activities and rational calculations of offenders, victims and control agents. These predictions are then tested against the crime patterns actually registered through conventional methods of police recording and self-report studies of offending and victimisation and then the algorithms are subsequently revised as a means of better anticipating crimes and targeting pre-emptive interventions. The enthusiasm for building predictive machines is now being further extended to design algorithms or ‘machine classifiers’ to better ‘sense’ and anticipate patterns of threatening or ‘hateful’ on-line communications through social media and forecast their putative relationship to off-line events such as terror attacks.

There are predictable criticisms of engineering predictive machines for security applications. In this context the most notable are those political concerns raised by the whistle-blower Edward Snowden about the massive and routine invasion of privacy through the US National Security Agency’s PRISM surveillance programme. PRISM collects communications through the Internet, without reasonable suspicion, and then mines them for intelligence on, and forecasting about, various security threats including terror plots and illicit drugs markets. The Snowden revelations suggest how an understanding of security premised on Big Data necessarily contravenes the right to private communications because the 10,000-foot view of the ‘security graph’ cannot be accomplished without generalised data collection from whole populations.

A related concern is that predictive machines generate self-fulfilling prophecies. They become active ingredients in the targeting of suspects such that problems of security become artefacts of the way the algorithms, machine classifiers and underlying assumptions of predictive machines, speculate about security to include certain concerns (e.g. speech about ‘radicalised’ Muslim youth) whilst obviating others (e.g. speech about the culpability of Western foreign policy in the Middle East). In this regard, targeting the usual suspects ceases to be just a consequence of episodic prejudicial police actions and becomes automatically reproduced by a social machine. The alienation of entire social groups as a consequence of this kind of group profiling and targeting, along with the creation of a policing environment conducive to miscarriages of justice is an established theme in critical criminology, particularly in the UK with reference to the war in Ireland and the long history of antagonism between the police and street populations, particularly of young males from minority social groups. However, the consequences of automating this policy failure are only just beginning to be appreciated.

Finally enthusiasm for predictive machines can be criticised on grounds of technical feasibility. It is argued there are only a few instances in which machines can be programmed to effectively mimic human actions, such as swinging golf clubs or dialling telephone numbers, whilst there are many of these actions that machines cannot accomplish and, crucially, will never be able to accomplish, because they require the fundamentally human capacity for improvisation, such as in writing love letters or subverting factory work regimes.  From this perspective, the potential success or colossal failure of predictive policing hangs on the question of how improvised security problems are. It has been argued, for example, that improvisation is the central dynamic of much crime, particularly sophisticated organised crimes, in which perpetrators and preventers are in an ongoing correspondence, in this case an ‘arms race’ rather than an amorous exchange, to outflank and outwit each other. In this scenario predictive machines will fail because they cannot adapt quickly enough to improvised real world social relations.

Sceptics of both the enthusiastic embrace and rejection of predictive machines argue that social relations may not be akin to swinging golf clubs but they may be sufficiently ‘scripted’ to be predicted, in part, by automated learning. Security problems may, to continue the analogy, be more like performances of a play in which the actors improvise around the script but still rehearse their lines and do not completely rewrite the story from one performance to another.  How these scripts and their narrative structures can be understood is the subject of current methodological argument and innovation in ‘digital social research’, in particular the prospects for hybrid human-machine learning in which algorithms driving the automated collation and analysis of big data are collaboratively designed and frequently refreshed.

An example of a security script is Hope’s thesis on the ‘immunisation’ of non-victims of volume crimes, such as domestic burglary in England and Wales. In this script, non-victims immunise themselves through their access to private ‘club goods’ such as access to commercial household security, the market value of their homes as a proxy for the segregation of the residential population into wealthier neighbourhoods less susceptible to crime and through their participation in ‘gated communities’ with enhanced security surveillance and patrols. It is argued this immunisation explains the grossly unequal distribution of victimisation and non-victimisation for this problem in which four fifths of the residential population experience only one fifth of the burglary whilst 20 per cent of the population, with negligible access to these security club goods, are chronically victimised, experiencing an estimated 80 per cent of the problem. There may be improvisation around this script, in that property crime is often further concentrated within particular households even within ‘high crime’ neighbourhoods, but this is unlikely to result in any radical redistribution of household burglary by neighbourhood and therefore any refutation of the immunisation script.  This is an institutional question of the balance between public and private policing rather than a question of household dynamics. If this distribution of victims and non-victims is relatively stable, it ought, therefore, to be anticipated by a predictive machine driven by algorithms premised on Hope’s thesis. Whether more improvised security problems, such as urban riots, could ever be anticipated by predictive machines is a moot point, but between the extremes of stable domestic burglary patterns and episodic civil unrest, there are other security problems whose scripted and predictable qualities are open to further research, debate and argument. An example of this is the use of artificial intelligence to indicate tension in social media communications.

Sceptics consequently offer an altogether messier and less certain reflection on the limits to hybrid human-machine learning but one that is irreducibly driven by humans in constituting problems of security not simply registering objective truths. There are also grounds for scepticism about the integration of variegated data sets composed of material collected over hugely varying temporal and spatial horizons. There may be opportunities for recomposing this data in ways that enable it to be meaningfully linked but even where relatively robust administrative data sets are concerned this entails a substantial input from human intelligence. Whether and how the lower fidelity data generated by users of social media can be meaningfully linked to other administrative and commercial data as well as the primary data sets produced by social scientists remains a very challenging, possibly insurmountable, methodological problem. Even if technically feasible, and this is a very big ‘if’, there are genuine ethical and political concerns about engineering predictive machines capable of collating person-specific data from multiple sources in order to circumvent controls on the anonymity of such data, and thus the profiling and monitoring of ‘risky’ individuals and groups. This is particularly so where predictive machines could be deployed within any governing regime not just in liberal democracies with lively, open, debates about ‘snooper’s charters’ and Orwellian objections to Big Brother. At best we are in a situation that requires deliberation about the levels of confidence inspired by the technical feasibility of predictive machines and then about the appropriate regulatory frameworks for governing the access to and uses of such machines.


Adam Edwards is Senior Lecturer in the Cardiff University School of Social Sciences. He is co-founder of the Collaborative On-line Social Media Observatory (COSMOS) and Deputy Director of the Cardiff Centre for Crime, Law and Justice.