Date of Award
Spring 2012
Degree Name
Bachelor of Science
Major
Computer Science
First Advisor
Ralph Morelli
Second Advisor
Takunari Miyazaki
Abstract
This research aims to develop an efficient way of detecting disease outbreaks from news articles using a Support Vector Machine (SVM). An SVM is a supervised machine learning method used for classification and regression problems. In this project, the SVM “learns” to distinguish news articles that may indicate a disease outbreak from those that do not.
A series of health-related articles from the World Health Organization is parsed by a Java program to create vectors for the SVM, one vector per article. A basic negation detection algorithm is built into the parser to detect which words are negated and thereby improve vector accuracy. An SVM model is trained on approximately 63% of the resulting vectors, and the remainder (approximately 37%) is used to test the accuracy of the SVM. The findings of this research may be useful to other projects that aim to develop systems for predicting and preventing disease outbreaks.
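The pipeline described above can be illustrated with a minimal Java sketch. This is not the thesis's parser, which is not reproduced on this page. The class name, the list of negation cues, the three-word negation window, the NOT_ feature prefix, and the fixed random seed are all assumptions made for illustration. The sketch turns an article into word counts, marks words that follow a negation cue, and splits a collection into roughly 63% training and 37% test data. Training the SVM itself is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.Set;

class OutbreakVectorSketch {
    // Hypothetical negation cues; the thesis does not list the ones it uses.
    private static final Set<String> NEGATION_CUES =
            new HashSet<>(Arrays.asList("no", "not", "never", "without"));

    // Assumption: a cue negates at most the next three words in its sentence.
    private static final int NEGATION_WINDOW = 3;

    // Builds a sparse word-count vector; negated words become separate NOT_ features.
    static Map<String, Integer> toCounts(String article) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String sentence : article.toLowerCase().split("[.!?;]")) {
            int remaining = 0; // words still inside the current negation window
            for (String token : sentence.trim().split("[^a-z]+")) {
                if (token.isEmpty()) {
                    continue;
                }
                if (NEGATION_CUES.contains(token)) {
                    remaining = NEGATION_WINDOW;
                    continue; // the cue itself is not counted as a feature
                }
                String feature = remaining > 0 ? "NOT_" + token : token;
                if (remaining > 0) {
                    remaining--;
                }
                counts.merge(feature, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Shuffles a copy of the items and returns [training part, test part].
    static <T> List<List<T>> split(List<T> items, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(items);
        Collections.shuffle(shuffled, new Random(seed));
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, cut)));
        parts.add(new ArrayList<>(shuffled.subList(cut, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        String article = "There is no cholera outbreak in the region. "
                + "Officials confirmed an outbreak.";
        System.out.println(toCounts(article));

        List<Integer> articleIds = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            articleIds.add(i);
        }
        List<List<Integer>> parts = split(articleIds, 0.63, 42L);
        System.out.println(parts.get(0).size() + " training, "
                + parts.get(1).size() + " test");
    }
}
```

Keeping negated words as separate features means that "no cholera outbreak" and "confirmed an outbreak" contribute to different vector components, which is one simple way negation detection can affect the vectors. A fixed-size window is only one possible heuristic; negating up to the next punctuation mark is a common alternative.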
Recommended Citation
Dragu, Nicolae, "Predicting Disease Outbreaks using a Support Vector Machine model". Senior Theses, Trinity College, Hartford, CT 2012.
Trinity College Digital Repository, https://digitalrepository.trincoll.edu/theses/158
Comments
Senior project completed at Trinity College for the degree of Bachelor of Science in Computer Science.