Date of Award

Spring 2012

Degree Name

Bachelor of Science


Computer Science

First Advisor

Ralph Morelli

Second Advisor

Takunari Miyazaki


The purpose of this research is to develop an efficient method for detecting disease outbreaks from news articles using Support Vector Machines (SVMs). An SVM is a supervised machine learning method used for classification and regression problems. The role of the SVM in this project is to “learn” to distinguish between news articles that may indicate a disease outbreak and those that do not.

A series of health-related articles from the World Health Organization is parsed by a Java program to create input vectors for the SVM, one vector per article. A basic negation detection algorithm is built into the parser to identify which words are negated and thereby improve vector accuracy. An SVM model is trained on approximately 63% of the resulting vectors, and the remainder is used to test the accuracy of the model. The findings of this research may be useful for other projects that aim to develop systems for predicting and preventing disease outbreaks.
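
The pipeline described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's Java parser. The negation cue list, the `NOT_` prefix convention, the rule that negation ends at the next punctuation mark, and all function names are assumptions made for illustration; the abstract does not specify how the original parser handles any of these.

```python
import re
from collections import Counter

# Hypothetical cue list; the cues used by the original parser are not stated.
NEGATION_CUES = {"no", "not", "never", "without"}
PUNCTUATION = set(".,;:!?")


def tokenize_with_negation(text):
    """Tokenize text, marking words that follow a negation cue.

    Assumed rule: every word after a cue gets a NOT_ prefix until the
    next punctuation mark. The cue itself is dropped from the output.
    """
    tokens = []
    negated = False
    for tok in re.findall(r"[a-z']+|[.,;:!?]", text.lower()):
        if tok in PUNCTUATION:
            negated = False          # punctuation closes the negation scope
        elif tok in NEGATION_CUES:
            negated = True           # open a negation scope
        else:
            tokens.append("NOT_" + tok if negated else tok)
    return tokens


def vectorize(tokens, vocabulary):
    """Turn a token list into a count vector ordered by the vocabulary."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def split_train_test(vectors, train_fraction=0.63):
    """Split vectors into a training part and a testing part, in order."""
    cut = round(len(vectors) * train_fraction)
    return vectors[:cut], vectors[cut:]


if __name__ == "__main__":
    # Invented example sentences, not taken from the project's data.
    articles = [
        "No new cases of cholera were reported.",
        "Officials confirmed a cholera outbreak.",
    ]
    token_lists = [tokenize_with_negation(a) for a in articles]
    vocabulary = sorted({t for toks in token_lists for t in toks})
    vectors = [vectorize(toks, vocabulary) for toks in token_lists]
    train, test = split_train_test(vectors)
    print(vocabulary)
    print(train, test)
```

With this scheme, "cholera" in a negated clause and "cholera" in an affirmative clause become different features (`NOT_cholera` and `cholera`), which is one simple way negation detection could change the vectors an SVM is trained on. The resulting vectors could then be passed to any SVM library for training and testing.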


Senior project completed at Trinity College for the degree of Bachelor of Science in Computer Science.