In an effort to reduce the public’s exposure to food-borne illness, the City of Chicago partnered with Allstate’s Quantitative Research & Analytics department to develop a predictive model to help prioritize the city's food inspection staff.
The model calculates individualized risk scores for more than fifteen thousand Chicago food establishments based on real-time, publicly available data. The results of this predictive model are delivered through a web application developed in R using the Shiny package. This project was presented at the data science pop-up in Chicago. Here’s the backstory of how all this came together. The story, featured below, is an article originally published by Data-Smart City Solutions.
Consider this: Chicago, a city with nearly three million people and more than 15,000 food establishments, has fewer than three dozen inspectors who are in charge of annually checking the city’s entire lot.
When inspectors check this entire lot, 15% of these establishments, on average, earn a critical violation.
Having a critical violation, which generally relates to food temperature control, can drastically increase the odds that a restaurant may start or spread a foodborne illness. Because of the obvious negative effects this can have on a population, efficiently and effectively targeting food establishments with critical violations is a top public health priority.
Chicago’s challenging task of quickly locating and addressing these violations is a prime candidate for optimization with advanced analytics. It’s also an opportunity that Chicago’s analytics team has been sure to seize as the City pioneers new uses of data.
Chicago committed to build the first-ever municipal open-source, predictive analytics platform when selected as one of five cities for Bloomberg Philanthropies’ inaugural Mayors Challenge, an ideas competition that encourages cities to generate innovative ideas to solve major challenges and improve city life. The City’s goal was to aggregate and analyze information to help leaders make smarter, faster decisions and prevent problems before they develop.
The City’s recently completed pilot program to optimize the city’s food inspections process – conducted by the Chicago Department of Innovation and Technology (DoIT), along with the Department of Public Health (CDPH) and research partnerships with Civic Consulting Alliance and Allstate Insurance – has been a milestone that has yielded striking results. When using an analytics-based procedure, Chicago was able to discover critical violations, on average, seven days earlier than if it had used the traditional inspection procedure.
The results have implications not only for Chicago, but for cities anywhere that wish to optimize inspections processes using advanced analytics. Moreover, Chicago’s collaborative and open method for launching such an initiative provides lessons for other places that wish to start analytics programs of their own.
Understanding the Pilot
Tom Schenk, Chicago’s Chief Data Officer, has had food inspections on his mind for a long time. Nearly two years ago, Schenk held discussions with CDPH about how the two departments could collaborate on the city’s growing analytics program. Following on the City’s goal to build the first municipal open-source, predictive analytics platform with the $1,000,000 award from Bloomberg Philanthropies’ Mayors Challenge, the City had begun constructing the SmartData Platform. To test the application for operationalizing predictive analytics, the City had also already introduced its first analytics pilot program, which used advanced analytics to enhance the Department of Streets and Sanitation’s rodent baiting efforts.
Schenk and CDPH reached out to Civic Consulting Alliance (CCA), a local organization that pairs corporations with City of Chicago departments to engage in meaningful pro bono projects. Through CCA, Chicago met with Allstate, the Chicago-area insurance giant with a history of community involvement. Allstate has worked with Chicago before, notably as a leading member of Get IN Chicago, a cross-sector initiative to improve neighborhood safety across the city.
With CCA’s help, the city teamed up specifically with Allstate Insurance's data science team—a new kind of relationship for both parties. Allstate Insurance operates Project Lightbulb, which commits up to 10% of team members to pro bono projects. Often, these projects include volunteer opportunities, such as the Adopt-a-Highway program or service in soup kitchens. This project, however, offered an opportunity to let Allstate’s data scientists apply their skills to a research project with a new level of impact.
With a coalition in place, Schenk and the team then began by interviewing CDPH inspectors in order to better understand the logistics of their jobs and see how they interact with data. Chicago identified food inspection reports, 311 service data, and weather data as top candidates for exploration for predictors of food inspection outcomes. The team also used other information on the city’s open data portal, such as community and crime information, to bolster the model.
The importance of Chicago’s open data program to this project cannot be overstated. With multiple parties involved, the Chicago coalition needed to work with a centralized, comprehensive, and universally accessible source of data. Fitting this bill, the portal was able to provide a means for partners to easily exchange research and analysis while working on the project. Without the data portal, Chicago’s key datasets would have been located in disparate systems, making this collaboration extremely time-consuming and difficult.
In processing and analyzing the data, Chicago found several key predicting variables that, when observed, indicated a considerable likelihood that a restaurant would earn a critical violation. These predicting variables include the following:
- Possession of a tobacco and/or incidental alcohol consumption license
- Length of time establishment has been operating
- Length of time since last inspection
- Location of establishment
- Nearby garbage and sanitation complaints
- Nearby burglaries
- Three-day average high temperature
These predictors were then factored together into a model, which was tested against food inspection procedures via a double-blind post-diction analysis. In other words, after collecting a set of data, Chicago performed a simulation that used this past data to predict what its future outcome would have been under data-optimized conditions.
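The idea of a post-diction analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the pilot's actual code: the `Inspection` record, its field names, and the `risk_score` values are hypothetical. The sketch keeps the real inspection schedule fixed, fills the slots with the riskiest establishments first, and measures how many days earlier, on average, critical violations would have surfaced.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Inspection:
    establishment: str
    day: int            # day of the trial on which the inspection actually happened
    critical: bool      # whether a critical violation was found
    risk_score: float   # hypothetical model score, computed from data known beforehand


def mean_days_earlier(inspections: list[Inspection]) -> float:
    """Replay the trial on the same schedule, visiting the riskiest places first."""
    # The inspection slots (days) stay fixed; only who fills each slot changes.
    slots = sorted(i.day for i in inspections)
    by_risk = sorted(inspections, key=lambda i: i.risk_score, reverse=True)
    simulated_day = {i.establishment: d for i, d in zip(by_risk, slots)}
    # Compare actual and simulated discovery days for establishments with violations.
    return mean(
        i.day - simulated_day[i.establishment] for i in inspections if i.critical
    )


# Made-up toy data: four inspections on four consecutive days.
trial = [
    Inspection("A", 1, False, 0.10),
    Inspection("B", 2, True, 0.80),
    Inspection("C", 3, False, 0.20),
    Inspection("D", 4, True, 0.90),
]
result = mean_days_earlier(trial)
print(result)  # 1.5
```

In this toy data, establishment D moves from day 4 to day 1 and B stays on day 2, so violations are found 1.5 days earlier on average.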
What does that process mean, then, when put into practice in a government setting?
In September and October of 2014, CDPH performed food inspections using their traditional method, and then handed detailed inspection information over to Chicago’s analytics team. During that trial period, CDPH’s inspectors visited 1,637 food establishments in total. Of these inspections, CDPH found there were 258 establishments—approximately 15% of that total—that had at least one critical violation.
Keep in mind that the goal of the pilot is for analytics to help deliver inspection results faster, so that critical violations may be detected earlier. This is why data was collected over a two-month period: September’s and October’s numbers were used as a source for comparison. In September, CDPH inspectors found more than half of those aforementioned establishments with violations: 141 of them, or 55%, to be precise. In October, the remaining 45% were found.
If 55% of critical violations were found during a normal-operations first half, then what percent of critical violations could be found with an analytics-optimized-operations first half?
It’s worth noting that CDPH inspectors had a good first month—finding more than 50% of violations within the first half of a given time period is always a good sign. But what if, instead of inspecting these food establishments via standard procedure, inspectors first visited all locations that met the key predicting variables listed above?
To find out, the analytics team assigned each inspected food establishment from the trial period a probability of earning a critical violation, which was based on how many predicting variables each establishment met. For example, if one establishment met eight out of ten predicting factors, it would be assigned a higher probability than an establishment that met two out of ten predicting factors. Since this newly ranked “forecast” list was assembled using trial period data, it provided an ideal source for comparison between the two inspection methods.
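As a rough illustration of that ranking step, the sketch below scores each establishment by the share of predicting factors it meets and sorts the list from highest to lowest. It follows the simplified description in this article only; a fitted statistical model would typically weight factors differently. The function names, the factor names, and the ten-factor example are all hypothetical.

```python
def factor_score(factors_met: dict[str, bool]) -> float:
    """Share of predicting factors an establishment meets (illustrative score only)."""
    return sum(factors_met.values()) / len(factors_met)


def forecast_list(establishments: dict[str, dict[str, bool]]) -> list[str]:
    """Order establishments from highest to lowest score."""
    return sorted(
        establishments,
        key=lambda name: factor_score(establishments[name]),
        reverse=True,
    )


def example_factors(n_met: int, total: int = 10) -> dict[str, bool]:
    """Build a made-up factor checklist in which the first n_met factors are met."""
    return {f"factor_{k}": k < n_met for k in range(total)}


establishments = {
    "two_of_ten": example_factors(2),
    "eight_of_ten": example_factors(8),
}
ranking = forecast_list(establishments)
print(ranking)  # ['eight_of_ten', 'two_of_ten']
```

The establishment meeting eight of ten factors scores 0.8 and is placed ahead of the one meeting two of ten, which scores 0.2.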
Chicago’s analytics team found that inspections could be allocated more efficiently with the data-optimized forecast list than they were with the traditional procedure’s list. In the simulation, 178 food establishments, or 69% of inspections with critical violations, were found in the first half.
These numbers mean that had inspectors been using the data-optimized forecast list instead, an additional 37 establishments with critical violations would have been detected during the first month. By detecting these establishments earlier, CDPH is able to prevent restaurant patrons from becoming potential victims of foodborne illnesses.
When comparing the traditional procedure to the data-optimized procedure for inspecting food establishments, the pilot’s results led to a 25% increase in the number of critical violations found in the first half. Given the first half was one month, this 25% increase when measured in time shows us that using a data-optimized procedure for inspecting food establishments leads to finding critical violations, on average, 7 days earlier.
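The first-half figures quoted above can be checked with a few lines of arithmetic. The variable names below are ours; the counts come from the article. The 25% figure appears to match the ratio of the rounded shares (69% to 55%), while the raw counts give an increase of about 26%.

```python
# Counts reported for the two-month trial.
total_with_violations = 258        # establishments with at least one critical violation
found_first_half_actual = 141      # found in September under the traditional procedure
found_first_half_simulated = 178   # found in the first half of the simulated forecast list

actual_share = found_first_half_actual / total_with_violations
simulated_share = found_first_half_simulated / total_with_violations
additional = found_first_half_simulated - found_first_half_actual

# Relative increase computed two ways: from raw counts and from rounded shares.
increase_from_counts = additional / found_first_half_actual
increase_from_rounded_shares = round(simulated_share * 100) / round(actual_share * 100) - 1

print(f"actual first-half share:    {actual_share:.1%}")                  # 54.7%
print(f"simulated first-half share: {simulated_share:.1%}")               # 69.0%
print(f"additional establishments:  {additional}")                        # 37
print(f"increase from counts:       {increase_from_counts:.1%}")          # 26.2%
print(f"increase from shares:       {increase_from_rounded_shares:.1%}")  # 25.5%
```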
What Does It All Mean?
The success of Chicago’s food inspections pilot does not mean more critical violations are being found, nor does it mean that CDPH will be changing its inspections processes overnight. Rather, the pilot is a key step towards the continued adoption of advanced analytics into city operations. While the same general critical violations average of 15% is still being found, it is being found faster than before, which brings public health benefits to the city. These results, and the process used to obtain them, are similar to Chicago’s aforementioned work on rodent baiting analytics.
When Chicago’s rodent pilot concluded earlier in 2014, its key gain for rodent baiter staff was a 20% increase in the amount of time that staff have been able to spend actually rat baiting (as opposed to figuring out where to bait rats). Chicago’s Department of Streets and Sanitation has already adopted analytics into their operations, with data-optimized lists of rat locations being sent to rat baiting teams on a regular basis.
Both cases show that advanced analytics isn’t an indictment of current operations, but rather an enhancement. Following the completion of the pilot, the food inspections use case is in the process of becoming a more operational mechanism within CDPH.
Forecasting and the Future
Chicago’s new use case is a key marker for the continued development of the SmartData Platform. Each completed pilot provides a new algorithmic model which can then potentially be replicated to apply to other city services and operations.
Furthermore, with each use case, Chicago’s collaborative, multi-partner system for advanced analytics initiatives is becoming more adept and experienced.
While the rodent control pilot pulled primarily from one source of data—311 calls—the food inspections pilot included a broader range of datasets, and required a more complex model to be developed. To accommodate this, Chicago expanded its working base to include more partners on the project. By leveraging the talents of Allstate data engineers working pro bono, Chicago was able to grow—and even help formalize—a partnership-centric working plan. This allows the city to take on projects it would otherwise not have the resources to complete entirely on its own. This new working plan has piqued the interest of other cities as well.
Lastly, in line with its collaborative approach, Chicago has created an open-source repository to share the code used for the pilot, as well as evaluation information, on code-sharing site GitHub. This means that other cities interested in adopting similar projects of their own can begin by using Chicago’s analytics work as a starting point, rather than having to begin the entire process from scratch.
All the data that Chicago used to develop its food inspections forecasting model was from its publicly available data portal. By releasing the model’s source code, Chicago is essentially giving away both the tools and the building materials it used to more quickly locate critical food establishment violations, and is inviting others to use these materials and do the same. The effort hopes to foster increased innovative development and cooperation between cities.
Watch the presentation from the data science pop-up in Chicago, where Gene Leynes from the City of Chicago analytics team and Gavin Smart and Stephen Collins from Allstate discuss technical model development, the application design and deployment, and a model for a successful research partnership.