Research Profile – The Right Algorithm for the Job


Dr. David Buckeridge

A McGill University researcher is evaluating statistical algorithms that can comb through health data and detect disease outbreaks.


How do you know when an infectious disease is on the loose?

Traditionally, disease surveillance is a very labour-intensive process. It relies on lab technicians faxing test results to public health officials, who then manually input the reports into a database.

But electronic health records and social media have created new opportunities to expand and automate public health surveillance. Newer approaches use doctors’ reports, pharmacy records, emergency calls, even tweets and internet searches to detect and respond to outbreaks. How do officials begin to sort through all this information?

At a Glance

Who – Dr. David Buckeridge, McGill University.

Issue – Automated disease surveillance is on the rise, but there isn’t much research to help public health officials choose the best algorithms for analyzing different types of data.

Approach – Dr. Buckeridge and his colleagues are using a large computer simulation of Montreal’s water system to test different algorithms for detecting water-borne disease.

Impact – Dr. Buckeridge hopes to develop a guide that will help public health officials choose the right algorithm for the job.

“Once you’ve collected these data, you want to look for unexpected changes that suggest there is something going on,” says Dr. David Buckeridge at McGill University.

Researchers and public health officials typically rely on statistical algorithms to “read” the data.

In computer science, an algorithm is a set of step-by-step instructions that a computer must follow to complete a task. Statistical algorithms sift through large data sets to detect unexpected patterns or changes. They can be used, for instance, to spot a sudden spike in the number of people being admitted to hospital, or being prescribed a specific medication.
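To make the idea concrete, here is a minimal sketch in Python of one very simple way to spot a sudden spike in daily counts. It is an illustration only, not one of the algorithms evaluated in the study: the function name, the seven-day window, the threshold of three standard deviations and the admission counts are all assumptions made up for this example.

```python
from statistics import mean, stdev


def flag_spikes(counts, window=7, threshold=3.0):
    """Return the indices of days whose count is unusually high.

    Each day is compared with the mean of the preceding `window` days.
    A day is flagged when it exceeds that mean by more than `threshold`
    standard deviations. All parameter values are illustrative.
    """
    alerts = []
    for day in range(window, len(counts)):
        baseline = counts[day - window:day]
        mu = mean(baseline)
        # Floor the spread at 1.0 so a perfectly flat baseline
        # does not cause a division by zero.
        sigma = max(stdev(baseline), 1.0)
        if (counts[day] - mu) / sigma > threshold:
            alerts.append(day)
    return alerts


# Hypothetical daily hospital admissions with a jump on day 9.
daily_admissions = [20, 22, 19, 21, 20, 23, 21, 22, 20, 41, 45, 24]
print(flag_spikes(daily_admissions))
```

Even a detector this simple involves choices, such as how long a baseline to use and how high to set the threshold, and those choices affect how often it raises the alarm and how quickly.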

When they are looking for signs of a disease outbreak, public health officials may have several different algorithms to choose from. These can vary in their sensitivity (how many real outbreaks they catch), specificity (how rarely they raise false alarms) and timeliness (how quickly they raise the alarm).

“There is really very little evidence to guide surveillance practitioners in public health settings, if they want to detect changes in data, as to what is the best algorithm to use,” says Dr. Buckeridge.

With funding from the Canadian Institutes of Health Research, Dr. Buckeridge and his team are evaluating the performance of different algorithms in public health surveillance. One project is a collaboration with researchers at École Polytechnique de Montréal who have developed a very detailed computer simulation of the city’s water distribution system. The model allows the researchers to simulate failures in Montreal’s water treatment system and see which areas of the city would be affected by the contaminated water.

Dr. Buckeridge and his team have created a second model, which builds on the results of the water distribution simulation and further simulates infections and symptoms in people and their resulting use of health resources. The second model will allow the researchers to generate public health data for different contamination scenarios, such as how many people will become sick and when they will start showing up in emergency rooms. Working with colleagues in computer science at McGill, Dr. Buckeridge is using many simulated data sets to test how effectively different algorithms detect the signs of water-borne disease outbreaks.

“Because we’ve simulated the data, we know exactly where the outbreaks are and where they aren’t, so we can score which algorithms seem to be doing better,” he says.
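The scoring Dr. Buckeridge describes can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the team's actual evaluation method: it scores alerts day by day, and the function name, the 30-day period, the outbreak days and the two sets of alerts are all hypothetical.

```python
def score_alerts(alert_days, outbreak_days, n_days):
    """Score an algorithm's alerts against known (simulated) outbreak days.

    Returns day-level sensitivity and specificity, plus the delay in days
    between the start of the outbreak and the first correct alert.
    """
    alerts = set(alert_days)
    outbreak = set(outbreak_days)
    hits = sorted(alerts & outbreak)      # alerts raised during the outbreak
    false_alarms = len(alerts - outbreak)  # alerts raised on quiet days
    n_outbreak = len(outbreak)
    n_quiet = n_days - n_outbreak
    return {
        "sensitivity": len(hits) / n_outbreak if n_outbreak else None,
        "specificity": (n_quiet - false_alarms) / n_quiet if n_quiet else None,
        "delay": hits[0] - min(outbreak) if hits else None,
    }


# Hypothetical 30-day simulation with an outbreak on days 10 to 14.
outbreak_days = range(10, 15)
score_a = score_alerts([12, 13, 20], outbreak_days, n_days=30)
score_b = score_alerts([3, 10, 11, 12, 25], outbreak_days, n_days=30)
print("Algorithm A:", score_a)
print("Algorithm B:", score_b)
```

In this made-up example, the second algorithm catches the outbreak sooner and on more days, but at the cost of an extra false alarm. Trade-offs of this kind are why a single score rarely settles which algorithm is best.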

Dr. Buckeridge hopes to develop guidelines that will help public health analysts pick the right algorithm for the right dataset, or build automated surveillance systems that can switch back and forth between algorithms depending on the type of data they’re trying to analyze.

As Dr. Buckeridge wraps up this study, he is beginning to look beyond infectious disease detection and explore the applications of statistical algorithms in disease outbreak management.

“If you use the analogy of a clinician seeing an individual patient, the problem is often not the diagnosis, but the management of the disease,” says Dr. Buckeridge. “Here, the diagnosis is the detection of the epidemic, and really the bigger problem is figuring out how to intervene to best control the outbreak and limit illness, death, and cost.”

