University of Rochester: Researchers have developed an app that uses natural language processing and machine learning to monitor social media and identify health risks posed by restaurants | Case Study

Context

nEmesis analyzes tweets to generate alerts about foodborne illness outbreaks and to identify their likely sources.

The Project

"Computational approaches to health monitoring and epidemiology continue to evolve rapidly. We present an end-to-end system, nEmesis, that automatically identifies restaurants posing public health risks. Leveraging a language model of Twitter users’ online communication, nEmesis finds individuals who are likely suffering from a foodborne illness. People’s visits to restaurants are modeled by matching GPS data embedded in the messages with restaurant addresses. As a result, we can assign each venue a “health score” based on the proportion of customers that fell ill shortly after visiting it. Statistical analysis reveals that our inferred health score correlates (r = 0.30) with the official inspection data from the Department of Health and Mental Hygiene (DOHMH). We investigate the joint associations of multiple factors mined from online data with the DOHMH violation scores and find that over 23% of variance can be explained by our factors. We demonstrate that readily accessible online data can be used to detect cases of foodborne illness in a timely manner. This approach offers an inexpensive way to enhance current methods to monitor food safety (e.g., adaptive inspections) and identify potentially problematic venues in near-real time."
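The abstract reports two statistics on different scales: a correlation of r = 0.30 between the inferred health score and the DOHMH data, and over 23% of variance explained by several factors jointly. For a single predictor, the share of variance explained is r², so r = 0.30 on its own corresponds to roughly 9%. The sketch below computes Pearson's r from two samples. The function name and any inputs you pass it (for example, per-restaurant health scores and violation scores) are illustrative assumptions, not the study's code or data.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances, each left unnormalized (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# With one predictor, variance explained (R squared) is simply r squared.
reported_r = 0.30
print(f"variance explained by the health score alone: {reported_r ** 2:.0%}")  # 9%
```

This is why the joint model's 23% is notable: the additional factors mined from online data explain substantially more variance than the health score alone.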

AI Usage

"Leveraging a language model of Twitter users’ online communication, nEmesis finds individuals who are likely suffering from a foodborne illness. People’s visits to restaurants are modeled by matching GPS data embedded in the messages with restaurant addresses. As a result, we can assign each venue a “health score” based on the proportion of customers that fell ill shortly after visiting it."
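The visit-matching and scoring steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the nEmesis implementation: the `Tweet` and `Restaurant` types, the 50 m visit radius, the 72-hour illness window, and the `sick` flag (standing in for the language-model classifier's output) are all hypothetical. Distances use the haversine great-circle formula, and every tweet is assumed to carry GPS coordinates.

```python
import math
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical parameters for illustration only; not taken from nEmesis.
VISIT_RADIUS_M = 50.0                 # max distance from a restaurant to count as a visit
ILLNESS_WINDOW = timedelta(hours=72)  # how soon after a visit a sick tweet counts
EARTH_RADIUS_M = 6_371_000.0          # mean Earth radius


@dataclass
class Tweet:
    user: str
    time: datetime
    lat: float
    lon: float
    sick: bool  # stands in for the language-model classifier's output


@dataclass
class Restaurant:
    name: str
    lat: float
    lon: float


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def health_score(restaurant: Restaurant, tweets: list[Tweet]) -> Optional[float]:
    """Share of visiting users who posted a sick tweet within ILLNESS_WINDOW of a visit."""
    # A visit is any geotagged tweet posted within VISIT_RADIUS_M of the restaurant.
    visits: dict[str, list[datetime]] = {}
    for t in tweets:
        if haversine_m(t.lat, t.lon, restaurant.lat, restaurant.lon) <= VISIT_RADIUS_M:
            visits.setdefault(t.user, []).append(t.time)
    if not visits:
        return None  # no Twitter visits, so no score can be assigned

    # A visitor counts as ill if any of their sick tweets follows one of their visits.
    ill_users = {
        t.user
        for t in tweets
        if t.sick and t.user in visits
        and any(v < t.time <= v + ILLNESS_WINDOW for v in visits[t.user])
    }
    return len(ill_users) / len(visits)


# Usage with made-up data: two visitors, one of whom reports illness a day later.
t0 = datetime(2024, 1, 1, 12, 0)
diner = Restaurant("Example Diner", 40.7128, -74.0060)
tweets = [
    Tweet("alice", t0, 40.7128, -74.0060, sick=False),
    Tweet("alice", t0 + timedelta(hours=24), 40.7500, -74.0000, sick=True),
    Tweet("bob", t0, 40.7129, -74.0060, sick=False),
    Tweet("carol", t0, 40.8000, -73.9000, sick=True),
]
print(health_score(diner, tweets))  # 0.5
```

Carol's sick tweet does not affect the score because she never tweeted from the restaurant; only visitors enter the denominator.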

Data

"Restaurants in DOHMH inspection database: 24,904
Restaurants with at least one Twitter visit: 17,012
Restaurants with at least one sick Twitter visit: 120
Number of tweets: 3,843,486
Number of detected sick tweets: 1,509
Sick tweets associated with a restaurant: 479
Number of unique users: 94,937
Users who visited at least one restaurant: 23,459

The model is always evaluated on a static independent held-out set of 1,000 tweets. The model M achieves 63% precision and 93% recall after the final learning iteration. Only 9,743 tweets were adaptively labeled by human workers to achieve this performance: 6,000 for the initial model, 1,176 found independently by human computation, and 2,567 labeled by workers as per M’s request."
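The precision and recall reported for model M follow the standard definitions over true positives (tp), false positives (fp), and false negatives (fn). The study's confusion-matrix counts are not given here, so the sketch below only shows the definitions, derives the F1 score implied by the reported 63% precision and 93% recall (about 0.75), and checks that the three labeling sources sum to the reported total. The function names are illustrative.

```python
def precision(tp: int, fp: int) -> float:
    """Share of tweets flagged as sick that really are sick."""
    return tp / (tp + fp)


def recall(tp: int, fn: int) -> float:
    """Share of truly sick tweets that the model flags."""
    return tp / (tp + fn)


def f1(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Reported figures for model M on the 1,000-tweet held-out set.
print(f"F1 for P=0.63, R=0.93: {f1(0.63, 0.93):.3f}")  # 0.751

# The human-labeled tweets add up to the reported total of 9,743.
labels = {"initial model": 6_000, "found by human computation": 1_176, "requested by M": 2_567}
assert sum(labels.values()) == 9_743
```

High recall with moderate precision means the classifier misses few sick tweets but also flags a fair number of false alarms.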

Results

In a deployment in Las Vegas, inspections guided by nEmesis resulted in citations for health violations in 15 percent of cases, compared with 9 percent for randomly selected inspections. Researchers estimate that this improvement in inspection efficacy led to 9,000 fewer food poisoning incidents and 557 fewer hospitalizations in Las Vegas during the three-month study period.
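To put the two citation rates side by side: nEmesis-guided inspections found violations about 6 percentage points more often, roughly 1.67 times the rate of random inspections. The arithmetic is shown below; the helper name is illustrative.

```python
def compare_rates(treatment: float, control: float) -> tuple[float, float]:
    """Return (absolute difference in percentage points, relative ratio)."""
    return (treatment - control) * 100, treatment / control


# Citation rates: 15% with nEmesis, 9% with random selection.
diff_pp, ratio = compare_rates(0.15, 0.09)
print(f"+{diff_pp:.0f} percentage points, {ratio:.2f}x the citation rate")
# +6 percentage points, 1.67x the citation rate
```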

Related Use Cases

Predict population health patterns

Predict hazardous locations and establishments based on open source data
