When Trump tweeted last week « … FLOTUS [the first lady] and I tested positive for COVID-19… », the message spread like wildfire. His followers retweeted it nearly a million times and commented on it another half a million times, a new personal record for Trump. Interestingly, such events open up big opportunities for social scientists.
Comments on platforms like Twitter are anything but calm words arranged into multi-faceted paragraphs. Most of the time, they are short statements. Examples in this case included: « Please get well soon! Our country and the world need you Mr. President! », « So what, NOT such an ‘HOAX’ after all?? » and, ignominiously, « HAHAHAHAHA eat shit ». Leaving aside the debate about the appropriateness of such statements, and the general debate about hate speech in online forums, one can easily guess the general sentiment of each tweet.
This is an opportunity for social scientists, particularly in political science. Because people's reactions to major events are publicly available, they can be investigated in detail. Years ago, gaining insight into public opinion required costly surveys; having learned a thing or two from their friends in the computer science departments, social scientists are now well equipped to investigate it quickly on their laptops. Analysing the sentiment of half a million Twitter comments sounds complicated, but in fact very simple techniques are used to do so. First, keywords can simply be counted and compared over time. Second, unsupervised machine learning can be used to cluster similar tweets. Last, the words can be compared against dictionaries like AFINN, which assigns each word a value indicating, for example, how positive or negative it is. Beyond these basic steps, the bag-of-words model and a thousand other methods are available for tweets that are more complicated, for example, those involving negation or irony.
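The dictionary step can be sketched in a few lines of Python. The word scores below are illustrative stand-ins chosen for this example, not the real list (AFINN assigns each of roughly 3,300 English words an integer between -5, very negative, and +5, very positive); the tweets are the examples quoted above.

```python
import re

# Hand-picked, AFINN-style scores for illustration only.
AFINN_SAMPLE = {"well": 2, "need": 1, "hoax": -3, "shit": -4}

def sentiment_score(text):
    """Sum the dictionary values of all known words in a tweet.

    Words not in the dictionary contribute 0.
    """
    words = re.findall(r"[a-z]+", text.lower())
    return sum(AFINN_SAMPLE.get(w, 0) for w in words)

tweets = [
    "Please get well soon! Our country and the world need you Mr. President!",
    "So what, NOT such an 'HOAX' after all??",
    "HAHAHAHAHA eat shit",
]

for t in tweets:
    print(sentiment_score(t), "->", t)
# The first tweet scores positive (2 + 1 = 3), the other two negative.
```

With a full dictionary, the same loop runs over half a million tweets in seconds; aggregating the scores per hour then gives the kind of over-time comparison described above.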
In recent years, there has been a wave of research using data offered by platforms like Facebook, Google, Weibo, WeChat and, in particular, Twitter. Still, there have been many critiques, too. What about data privacy? How can we be sure that the data on Twitter is statistically representative? And how do we deal with the fact that platforms only offer researchers limited access to their data? These are all important questions, but they leave a big one aside, one that actually seems quite obvious but somehow has not been part of the academic critique.
As Angela Xiao Wu, assistant professor at NYU, and Harsh Taneja show in a recent paper, data from platforms are not unobtrusive recordings of human behaviour. « Rather, they are direct records of how we behave under platforms’ influence. » The data generated by platforms, for example the number of reactions to a tweet, is actually used by the company itself to measure its success. User engagement matters for Twitter's revenue, and the company employs hundreds of people to tweak the design and functioning of the platform so that users are nudged to interact more. When user interaction falls, Twitter is likely to change its recommendation algorithm, its search query autocompletion, its personalized recommendations or its social feed curation. In other words, the company constantly changes the measurement conditions.
This is an old and well-known problem. In 1968, Andrew Ehrenberg investigated the differences between which programs people watched at a given time of day and their preferences for certain programs, as examined in surveys. The interesting finding was that the program schedule and people's socially situated availability mattered more than their content preferences. Analogous to the Twitter case, it was the behaviour of the TV companies that determined user behaviour to a great extent.
So, having seen the underlying problems with data generated by social media platforms, social scientists should become more careful when conducting such research. Whether we are data journalists trying to predict the next election or academics studying people's reactions to political events, we should bear in mind that the data also shows how effective these platforms are at nudging our behaviour.
But we should not abandon research based on platform data altogether, at least according to Angela Wu. As an improvement to the current situation, she proposes instead that platform data be collected by independent third-party measurement firms, just as it was in the TV age. Data could then be collected that does not simultaneously serve as a measure of the company's success but would be a more unobtrusive record of human behaviour. Still, for this to become reality, big tech would have to agree voluntarily. Given their current market power, chances are low.