Arseniy Tretyakov, a PhD student at ITMO’s Faculty of Infocommunication Technologies, spent two semesters at the Charles III University of Madrid. During this exchange, he gathered a dataset for a system aimed at identifying fake news. Arseniy is still expanding and improving it. We talked to him and other ITMO specialists about the difficulties associated with such a task.
The issue of fake news is as old as journalism itself. It was not always possible for authors to get precise information first-hand, so they had to rely on questionable sources or simply on rumors. The situation started to change with the development of means of communication such as regular mail, the telegraph, the telephone, and finally the internet. These inventions made communication between journalists and their sources much easier, and with time, the quality of publications began to improve.
Still, we shouldn’t forget that fake news appears not only due to technical difficulties or authors’ mistakes. Fake news was often commissioned and used to destroy a competitor’s reputation or to destabilize the situation in a country.
ITMO PhD student Arseniy Tretyakov is working on a system that will make use of neural networks and datasets to calculate the probability of a text being fake. Before joining a Master’s program at ITMO, Arseniy studied journalism. Having changed his focus to technical research, he began the development of an automated system for identifying fake news.
“Arseniy decided to tackle an understudied field. Despite the fact that the term ‘fake news’ and the phenomenon itself were well-known, no one knew how to identify fake news automatically. I believe that his invention will help protect society from excessive fears, panic, and reckless reactions,” comments Natalia Gorpushkina, Arseniy’s research advisor and an associate professor at the Faculty of Infocommunication Technologies.
At first, the PhD student was planning to work with data in Russian, but he encountered a problem. It turned out that in Russia, there aren’t enough fact-checking projects that aggregate and systematize fake news, and such collections are essential for training a neural network.
“In order to work with databases, you need data – simple as that. In Russia, there was a problem with getting enough organized data, but the key factor that contributed to working with the Spanish language was my internship destination. I went to Spain, where they got very interested in my idea. They had lots of data on identified fake news: in recent years, Spain experienced a series of political and economic events that initiated a surge in falsified information. What’s more, there are fact-checking agencies that monitor such cases. One of them, Maldito Bulo, shared their data with me; I also took some data from MyNews.es and gathered some more manually. After that, I proceeded with creating a dataset and teaching my neural network,” explains Arseniy Tretyakov.
The student notes that most research on fake news has focused on content in English, and very little has dealt with identifying fake news in Spanish or Russian. What’s more, most researchers prefer to work with tweets, which is easier and more effective, but as other types of fakes become widespread, for example, fakes spread through WhatsApp, it makes sense to cover them all.
The PhD student is currently continuing to expand the future system’s database, test the system, and study the opportunities for introducing metadata. His goal is to develop software or a plugin where one can upload content to be checked. According to Arseniy, tests show that the system will be able to identify fake news with 90% accuracy thanks to the combination of deep learning, natural language processing, and named entity recognition.
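The article does not describe the architecture of Arseniy’s system, so the following is only a minimal sketch of the general idea of scoring a text with a probability of being fake. It swaps in a much simpler technique, logistic regression over word counts, in place of the deep learning and named entity recognition mentioned above. Every name here (`tokenize`, `train`, `fake_probability`, `TOY_EXAMPLES`) is hypothetical, and the four training strings are invented placeholders, not items from the real dataset.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fake_probability(text, weights, bias):
    """Return the model's estimate (0..1) that the text is fake."""
    counts = Counter(tokenize(text))
    z = bias + sum(weights[tok] * c for tok, c in counts.items())
    return sigmoid(z)


def train(examples, epochs=200, lr=0.5):
    """Fit bag-of-words logistic regression with plain SGD.

    `examples` is a list of (text, label) pairs, where label 1 means fake.
    """
    weights = Counter()  # missing words implicitly have weight 0
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            error = label - fake_probability(text, weights, bias)
            for tok, c in Counter(tokenize(text)).items():
                weights[tok] += lr * error * c
            bias += lr * error
    return weights, bias


# Invented placeholder data (assumption): NOT taken from the real dataset.
TOY_EXAMPLES = [
    ("shocking secret cure doctors hide", 1),
    ("miracle cure shocking truth revealed", 1),
    ("city council approves annual budget", 0),
    ("university publishes annual research report", 0),
]

weights, bias = train(TOY_EXAMPLES)
print(fake_probability("shocking miracle cure", weights, bias))
print(fake_probability("council publishes budget report", weights, bias))
```

A real system would need a large labeled corpus, such as the fact-checked items mentioned above, and a held-out test set before any accuracy figure could be quoted for it.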
“The instruments offered by Arseniy can not only reduce the information noise but also teach users to rely on trustworthy data and hence improve the quality of their economic and social decisions. Of course, this instrument of news control shouldn’t be available to just any user, or we’ll get negative results. It would be better if it worked without a user’s participation, much like an antivirus, by cutting out everything that it sees as untrustworthy data,” comments Oleg Basov, deputy dean of the Faculty of Infocommunication Technologies.
He added that a toolset for the analysis, classification, and interpretation of heterogeneous information flows in poorly formalized fields of knowledge is constantly being developed at the School of Translational Information Technologies.
“The emergence of a legal mechanism that will regulate the distribution of fake news definitely makes the issue of its identification even more relevant. I’d like to note that if there were a government order for such a system, our School would be able to offer the first version in just a year. All we’d have to do then is to expand the dataset and aim for better accuracy at identifying fakes,” adds Oleg Basov.
He also notes that the Faculty already has several technological solutions that make it possible to determine whether information communicated through speech or video messages is fake. According to him, this experience can help create the means for dealing with untrustworthy information in the Russian media space.
We’d like to add that at the end of March 2020, a law was enacted in the Russian Federation that makes spreading fake news about the COVID-19 coronavirus a punishable offense.