Using AI to predict your tweets

By Yusuf Ali

Feb. 20, 2019
Posted in Science & Tech

A recent study, ‘Information flow reveals prediction limits in online social activity’, shows that it is possible to estimate how predictable a person’s future words are by analysing their Twitter stream and the streams of their 15 closest contacts.

Researchers at the University of Vermont in Burlington calculated the level of randomness (entropy) in the sequence of words, and then used Fano’s inequality, an information theory tool, to calculate how well a user’s stream could predict the first word in his or her next tweet. The upper bound on accuracy for the first word in the next tweet was, on average, 53%. When the estimate was based on the user’s stream plus the Twitter feeds of their 15 closest friends, the upper bound rose to about 60%. When the individual’s own stream was removed from the equation, it dropped to about 57%.
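To make the step from entropy to an accuracy ceiling more concrete, here is a minimal sketch of how Fano’s inequality can turn an entropy estimate into an upper bound on prediction accuracy. It is an illustration only, not the researchers’ code: the function names, the vocabulary size and the entropy values are assumptions chosen for the example, and the study’s own entropy estimation from tweet text is not reproduced here.

```python
import math


def fano_entropy(p, vocab_size):
    """Entropy (in bits) that Fano's inequality associates with accuracy p."""
    binary = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            binary -= q * math.log2(q)
    # Extra uncertainty when the guess is wrong: any of the other words.
    return binary + (1.0 - p) * math.log2(vocab_size - 1)


def max_predictability(entropy_bits, vocab_size, tol=1e-9):
    """Largest accuracy consistent with the given entropy (hypothetical helper).

    Solves fano_entropy(p) == entropy_bits by bisection on [1/N, 1],
    where the Fano expression decreases as p grows.
    """
    if vocab_size < 2:
        raise ValueError("vocab_size must be at least 2")
    if not 0.0 <= entropy_bits <= math.log2(vocab_size):
        raise ValueError("entropy must lie between 0 and log2(vocab_size)")
    lo, hi = 1.0 / vocab_size, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if fano_entropy(mid, vocab_size) > entropy_bits:
            lo = mid  # bound is still too pessimistic; accuracy can be higher
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Illustrative values only: 5,000 distinct words, two entropy levels.
    for bits in (6.0, 5.0):
        bound = max_predictability(bits, vocab_size=5000)
        print(f"{bits} bits per word -> accuracy bound {bound:.3f}")
```

The lower the entropy of the word sequence, the higher the ceiling on accuracy, which is why adding information from friends’ feeds can raise the bound.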

Therefore, in principle, even if a person isn’t on Twitter, it is still possible to predict what they might write by identifying their closest friends and analysing those friends’ feeds. Given that apps such as Facebook ask for access to contact lists, they already hold lists of users’ contacts who are not on the platform, the basis of so-called “shadow profiles.”

As mentioned in the research, it’s vital to “emphasise that the information-theoretic predictability as defined here is distinct from prediction, in that it does not actually make prediction about future text. Instead, this predictability provides a method-independent upper bound on prediction accuracy.”

The danger is to privacy, and to how much can be revealed about a user. Political orientation, sexuality, and location are just a few of the things that can be inferred from Twitter behaviour. As stated in the research, “Data collected from online social platforms are a boon for researchers, but are also of concern for privacy, as the social flow of predictive information can reveal details on both users and non-users of the platform.”

James Bagrow, the lead author of the research, which was published in Nature Human Behaviour, stated, “When they [people] give up their own data, they’re also giving up data on their friends. What we think is an individual choice in a social network is not really.”

One benefit is that big data analysis can be used to infer and perhaps predict potential health risks. On the other hand, insight into online behaviour could be used to harm a user and their closest connections.

The researchers explored both the content and timing of messages, looking at factors such as the extent to which information is encoded through language into an individual’s social ties, the role of tie strength between individuals in the flow of information, and the relationship between information and structural network properties such as homophily.

Social media and word analysis have also been used to predict personality. James Pennebaker, a social psychologist at the University of Texas, stated, “There’s a revolution going on in the analysis of language and its links to psychology. Now, we can analyse everything that you’ve ever posted, ever written, and increasingly how you and Alexa talk, which results in richer and richer pictures of who people are.”

The problem arises when companies exploit such information to boost their revenue, as in the recent case of Facebook and Google, in which ‘research’ apps were used to learn more about users, who would opt in and be paid a small fee of around $20. The apps gave the companies access to the data on users’ phones. The case caused a stir, and Apple pulled the apps, claiming they broke its iOS App Store guidelines.

User behaviours are a gold mine for data-oriented companies and nation states. It is therefore vital for users to be aware of what is done with their online information. The goal for technologists and researchers alike should be to ensure that such reports are accessible and understandable to the public, so that users know how their data can be used.
