Big Data – What’s New(s)?

The following is a slightly edited version of a talk I gave at the Data Power conference in Sheffield this week, presenting work by myself and Ralph Schroeder.

The question of what drives news coverage far pre-dates the Internet and the rise of social media, and over the decades – or indeed the centuries – of mass media, myriad explanations have been offered in answer. I don’t have enough time here to dwell on all or even very many of those explanations. Suffice it to say, though, that media elites sitting in front of the typewriter, microphone and more recently the keyboard have for a long time enjoyed great power over precisely what they believe the public ‘needs to know’ in regard to information about the outside world. Media elites, it has been said, may not tell the public what to think, but they do tell the public what to think about (Cohen, 1963).

It’s probably fair to say that until recently, not many newspeople thought of their craft as especially data-driven. Certainly, rudimentary stats like sales figures and viewing ratings shaped the provision of coverage to some extent, as well as a far wider selection of less quantifiable but no less pertinent concerns such as the commercial interest and ideological leaning of their organisation. It’s fairly simple I think to see why. News producers were hamstrung by the lack of availability of data at a granular enough level to pick out trends – at the level of individual stories, and at the required frequency – to get a vivid enough sense of specifically which news stories appealed to their audience. Consider that it was not until 1972, with a famous study by McCombs and Shaw, that a relationship between audience interest and news provision was demonstrable even on an issue-by-issue basis – and even here, the relationship ran in the other direction to what has been discussed – from the media to the public, with no evidence of a feedback loop (McCombs and Shaw, 1972).

To be sure, I don’t want to disregard decades of careful study into what drives news production, whether through focus groups, media content analysis, or whatever else. But it’s hard to deny that with the entrance of the Internet, social media, and big data, we face a different ball game altogether. Numerous changes, both social and technical, in how the public consumes news in the Internet era have led to a massive increase in the information available to news producers and, as a result, changed how the news is made.

The definition often used of big data – datasets of unprecedented volume, velocity and variety (Manyika et al., 2011) – has probably run its course, yet it is a useful lens through which to gauge the impact of the bonanza of information about news consumption that has suddenly become available. At the most basic level, the currency of news consumption is now measured at the level of the story, at best, rather than the edition or programme. Newsmakers now know with far more granularity which issues most grab the public’s attention – and, of course, which writers command the most loyal readership. During the 2012 election it was estimated that 20% of visitors to the New York Times website visited Nate Silver’s blog. Even the most self-confident journalist would have found it hard to make such a case for a pay rise without this sort of evidence.

Relatedly, there is no longer really any such thing as an edition. Information and attention on the Internet are characterised more by velocity (and volatility) than by any kind of permanence. This too yields unprecedented amounts of data. Finally, the third ‘v’ of big data, ‘variety’, also comes into play. Not only the greater variety of content available, but also innovations in format (think of BuzzFeed’s listicles) and platform have diversified the experience of news consumption.

This latter point – platform – is I think of particular significance as we turn to look at the effects of this abundance of data. To put it more pointedly: who gets this data, and who can use it? The picture here, and the implications for power which arise from this information about audience preference, are somewhat nuanced and multifarious, and I’ll spend the rest of the talk walking through them.

In the first instance, of course, data about user engagement with news comes straight back to news producers themselves. From an optimistic standpoint, this may have the potential to reverse the traditional media-to-public effects seen in earlier agenda-setting studies, allowing the media to know in far greater depth what the public cares about. You’d be hard pushed to argue that news organisations shouldn’t, at least to some extent, represent and reflect their audience’s views and interests, and so a more granular measurement of this can be seen as a good thing in comparison to what has gone before. Typically, public opinion was judged through slow, costly and narrowly constituted surveys. Moreover, in the specific case of election prediction – as we saw in the recent general election – such surveys may not even work very well. Today, Twitter for example offers up fast, cheap and spontaneously expressed public opinions in abundance. I use the term public opinions here advisedly – Twitter is not representative of national populations at large. But a recent study by Neuman et al. suggests that nonetheless, social media can and does affect the traditional media online (Neuman et al., 2014). This phenomenon is in contrast to the offline media landscape, as anyone who has dared to read the Daily Mail in its digital and paper formats, and seen the substantial differences in coverage between them, can attest.

Moreover, in certain circumstances, the information served up to audiences themselves also plays a role. Recent work by colleagues at the OII, Jonathan Bright and Tom Nicholls, shows that featuring in the ‘most read’ section of the BBC News website increases the length of time a story stays on the front page of the site (Bright and Nicholls, 2014). Online editors, then, are influenced by readership statistics, even when this data is available to the audience.

But perhaps the most serious implications for the realignment of power relations in the age of online news are in the new platforms and means of consumption, and how these relate to third party commercial interests. Yet again, analogies exist with the offline world here: any newspaper or television channel run for profit has long had market pressures which substantially influence output, ordinarily in the form of setting aside substantial space for advertisements.

Yet the emergence of myriad social media platforms on which news, amongst other things, is published, read, and shared, has introduced a raft of new ‘signals’ and hence new influences on news production. For me the most interesting platform for news consumption online isn’t one of those I’ve already mentioned, like Twitter or BuzzFeed, but Facebook. There are a few reasons for this. The first is simply scale. Not only does Facebook have more than a billion users, but a recent Pew survey suggests that, at least among younger generations, the use of Facebook for reading news is enormous. The survey showed that 61% of the millennial generation turn to Facebook as a primary source for news. So how Facebook serves up news is ipso facto important.

And clearly Facebook realises this. Its new Instant Articles service, announced recently in conjunction with nine partners from the news industry, allows content from these partners to be read directly from Facebook. This is I think significant especially because of three characteristics which Facebook as a social network epitomises. The first is individualisation. Facebook knows a huge amount about all its users and has spent years using this data to tailor what content it serves up – in contrast to Twitter, which despite a few experiments with adverts, still serves up tweets in an easily-understood, reverse chronological order. The second is networkisation. Facebook doesn’t only know you, but it knows your friends. Needless to say, this data is also fed into what appears in your feed – not merely in terms of content your friends directly share, but as a basis for serving you ads based on what they like. The third characteristic is hybridisation. As we can see from the launch of Instant Articles, Facebook increasingly sees itself as a walled garden. Except, instead of keeping you out, which is the effect of newspaper paywalls, it wants to keep you in.

These characteristics are not necessarily threatening for certain purposes. Many would see using Facebook for free at the expense of seeing a few ads powered by expressed preferences as a perfectly acceptable trade-off. But news stories aren’t the same as advertisements, and the use of algorithms geared to sell stuff to serve up news provokes three challenges in particular.

The first is ideological. Here’s a thought experiment: what if Mark Zuckerberg woke up feeling like Rupert Murdoch? At present, it’s seen as acceptable for Murdoch’s print titles to espouse ideologically biased opinions – particularly in advance of elections – but not for Zuckerberg to do so. But the potential is clearly there – consider the 2010 experiment reported by Bond et al., which showed that merely showing prospective voters social information regarding an election can encourage them to vote to a statistically significant – and potentially election-swinging – extent (Bond et al., 2012). In that experiment, there was no ideological slant, merely the provision of voting information. But that sort of algorithmic ability suggests that even the smallest tweak could have enormous effects. And even if Facebook’s algorithms for serving news content are entirely benign, they could still be detrimental: Zeynep Tufekci amongst others has written powerfully about how different events, particularly divisive or controversial ones such as the riots in Ferguson, Missouri, play at different speeds and with different frames on different social media platforms.

The second implication is commercial. An interesting nugget of information in the Instant Articles announcement was that BuzzFeed, one of the launch partners, would be allowed to serve their ‘sponsored stories’, which are presented much in the form of standard articles on the site but paid for and in service of third party corporations. This is hybridisation-within-hybridisation: a convergence of platforms – Facebook and BuzzFeed – and a convergence of functions – news reporting and advertising.

The third implication is societal. As I showed earlier, data can flow in multiple directions – to news producers and to the public. But if Facebook is the future of news, we should be more concerned. Yes, news producers themselves will receive stats in order to tailor their own content, but the chance of ordinary users getting access to this information is of course minute. It is, as Joseph Turow mentioned earlier in the case of dynamic pricing, ‘under the hood’. Facebook’s complex algorithms are both intensely personal and almost completely invisible, and it’s hard to imagine this changing for the better with the introduction of news. Instead, the stakes are simply higher.

There’s much about news online which I haven’t had time to mention, including many positive consequences such as citizen journalism and the spreading of information to keep regimes more accountable. But the promise of what I would summarise as ‘A/B testing the news’, as I’ve discussed today, has substantial and unsettling consequences for where power is held in modern democracies.


Bond, Robert M., Christopher J. Fariss, Jason J. Jones, Adam D. I. Kramer, Cameron Marlow, Jaime E. Settle & James H. Fowler (2012). A 61-million-person experiment in social influence and political mobilization. Nature 489, 295–298. doi:10.1038/nature11421

Bright, J., and Nicholls, T. (2014). The Life and Death of Political News: Measuring the Impact of the Audience Agenda Using Online Data. Social Science Computer Review 32 (2), 170–181.

Cohen, Bernard C. (1963). The Press and Foreign Policy. Princeton, Princeton University Press.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.

McCombs, M.E., & Shaw, D. (1972). The agenda-setting function of mass media. Public Opinion Quarterly, 36, pp. 176-185.

Neuman, W. R., Guggenheim, L., Mo Jang, S. and Bae, S. Y. (2014). The Dynamics of Public Attention: Agenda-Setting Theory Meets Big Data. Journal of Communication, 64: 193–214.
