Date of Award
Bachelor of Science
The purpose of this research is to create an efficient way of detecting disease outbreaks from news articles using Support Vector Machines (SVM). An SVM is a supervised machine learning method used for classification and regression problems. The role of the SVM in this project is to “learn” to distinguish between news articles that may indicate a disease outbreak and those that do not.
A series of health-related articles from the World Health Organization is parsed with a Java program to create input vectors for the SVM, one vector per article. A basic negation detection algorithm is built into the parser to identify which words are negated and thereby improve vector accuracy. An SVM model is trained on approximately 63% of the resulting vectors, and the remainder is used to test the model's accuracy. The findings of this research might be useful for other projects aiming to develop systems for predicting and preventing disease outbreaks.
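As a rough illustration of the parsing step described above, the following Java sketch turns an article into a term-count vector with simple negation marking and computes the size of a 63% training split. It is not the thesis's parser: the class name ArticleVectorizer, the cue-word list, the three-word negation window, the "NOT_" prefix, and the sample vocabulary are all assumptions made for this example, and the SVM training itself is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

final class ArticleVectorizer {
    // Assumed cue words and scope length; the abstract does not specify them.
    private static final Set<String> NEGATION_CUES =
            new HashSet<>(Arrays.asList("no", "not", "never", "without"));
    private static final int NEGATION_WINDOW = 3;

    /** Lowercases the text and prefixes up to NEGATION_WINDOW words after a cue with "NOT_". */
    static List<String> tokenizeWithNegation(String text) {
        List<String> tokens = new ArrayList<>();
        for (String sentence : text.toLowerCase(Locale.ROOT).split("[.!?;]")) {
            int remaining = 0; // negation scope never crosses a sentence boundary
            for (String word : sentence.split("[^a-z]+")) {
                if (word.isEmpty()) {
                    continue;
                }
                if (NEGATION_CUES.contains(word)) {
                    remaining = NEGATION_WINDOW; // the cue itself is not emitted
                    continue;
                }
                if (remaining > 0) {
                    tokens.add("NOT_" + word);
                    remaining--;
                } else {
                    tokens.add(word);
                }
            }
        }
        return tokens;
    }

    /** Counts how often each vocabulary term occurs among the tokens. */
    static double[] toVector(List<String> tokens, List<String> vocabulary) {
        double[] vector = new double[vocabulary.size()];
        for (String token : tokens) {
            int index = vocabulary.indexOf(token);
            if (index >= 0) {
                vector[index] += 1.0;
            }
        }
        return vector;
    }

    /** Number of vectors assigned to training for a given fraction, e.g. 0.63. */
    static int trainingCount(int totalVectors, double trainFraction) {
        return (int) Math.round(totalVectors * trainFraction);
    }

    public static void main(String[] args) {
        // Hypothetical vocabulary; a real one would be built from the corpus.
        List<String> vocabulary = Arrays.asList("outbreak", "NOT_outbreak", "cholera");
        List<String> tokens = tokenizeWithNegation(
                "No outbreak was reported. Cholera cases were confirmed.");
        System.out.println(tokens);
        System.out.println(Arrays.toString(toVector(tokens, vocabulary)));
        System.out.println(trainingCount(100, 0.63));
    }
}
```

Marking negated words with a separate prefix keeps "outbreak" and "NOT_outbreak" in different vector dimensions, so a classifier can weight them differently. A fixed-length window is only one simple way to approximate negation scope.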
Dragu, Nicolae, "Predicting Disease Outbreaks using a Support Vector Machine model". Senior Theses, Trinity College, Hartford, CT 2012.
Trinity College Digital Repository, http://digitalrepository.trincoll.edu/theses/158