Social internet data to forecast disease outbreaks


Wikipedia-based searches could hold the key to forecasting the spread of diseases in the near future

Social internet data could soon become an effective tool to predict the spread and progress of diseases. At least this is what a group of scientists from Los Alamos National Laboratory, New Mexico, are proposing in a recent paper published in the journal PLOS Computational Biology.

The basic concept is simple. By cross-correlating search queries for a specific disease in, let's say, a web search engine with the geographical location of the queries themselves, one could think of forecasting the appearance and development of that disease. The concept is not new: attempts have already been made to use tweets and Google Flu Trends as proxies for predicting diseases, so far with unconvincing results. The researchers from Los Alamos believe Wikipedia data could have better success for at least two reasons. Firstly, the Wikimedia Foundation is much more open with its data, and secondly, Wikipedia data is already organized in categorized pages. This is a great advantage over Google which, for instance, could not tell whether a search for “dengue fever” referred to the disease or to the band of the same name. The authors of the paper correlated the spread of a disease with the number of relevant searches on Wikipedia, sorted by language, and demonstrated the model’s forecasting abilities in 8 out of 14 location-disease combinations.
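To make the idea concrete, here is a minimal sketch of a lagged correlation between weekly page views and reported cases. It is not the model from the paper: the function names (`pearson`, `lagged_correlation`, `best_lag`) are hypothetical, and the numbers are synthetic, constructed so that cases trail page views by two weeks.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / sqrt(var_x * var_y)


def lagged_correlation(views, cases, lag):
    """Correlate page views in week t with cases in week t + lag."""
    if lag < 0:
        raise ValueError("lag must be non-negative")
    if lag == 0:
        return pearson(views, cases)
    return pearson(views[:-lag], cases[lag:])


def best_lag(views, cases, max_lag):
    """Return the lag (in weeks) with the highest correlation."""
    return max(range(max_lag + 1),
               key=lambda lag: lagged_correlation(views, cases, lag))


# Synthetic weekly data (assumption, for illustration only):
# cases are built to follow page views two weeks later.
weekly_views = [10, 12, 30, 56, 40, 20, 12, 8, 10, 12]
weekly_cases = [4, 5, 5, 6, 15, 28, 20, 10, 6, 4]

lag = best_lag(weekly_views, weekly_cases, max_lag=4)
print("best lag (weeks):", lag)
print("correlation at that lag:",
      round(lagged_correlation(weekly_views, weekly_cases, lag), 3))
```

A high correlation at a positive lag would suggest that page views lead case counts. Real data would also need the language-versus-location mapping and noise handling that the article discusses.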

Does this mean that soon we will be able to predict the spread of certain diseases and take adequate measures well in advance? Well, maybe… Or maybe not. While very promising, the development of this idea is still in its infancy. For instance, the model accounts for language but not for country, which is clearly a limitation for languages such as English or Spanish, which are spoken worldwide. Also, the model needs to be tuned to account for scenarios such as the recent outbreak of Ebola, which likely created a surge in Wikipedia traffic driven mainly by fear rather than by actual cases of the disease. Finally, one should not ignore that some of these diseases originate and spread in poor countries which do not have access to the internet.

Carlo Bradac

Dr Carlo Bradac is a Research Fellow at the University of Technology, Sydney (UTS). He studied physics and engineering at the Polytechnic of Milan (Italy), where he earned his Bachelor of Science (2004) and Master of Science (2006) in Engineering for Physics and Mathematics. He has also worked as an Application Engineer and as a Process Automation & Control Engineer. In 2012 he completed his PhD in Physics at Macquarie University, Sydney (Australia). He worked as a Postdoctoral Research Fellow at Sydney University and Macquarie University before moving to UTS upon receiving the Chancellor's Postdoctoral Research and DECRA Fellowships.
