How Morningstar’s Data Science and IT teams are setting the stage for scale
By Karina Babcock, Director, Corporate Marketing, Domino on March 11, 2020 in Perspective
Editor’s note: This is part of a series of articles sharing best practices from companies on the road to becoming model-driven. Some articles will include information about their use of Domino.
Just five years ago, Morningstar’s Quant Research team (the company’s centralized data science function) consisted of a handful of data scientists who rolled out a few models a year. Lee Davidson, who leads the team, says that back then one of the biggest challenges was identifying a problem data science could solve and getting the budget to solve it.
That’s no longer the case; the business is becoming increasingly model-driven across all facets. Davidson’s team consists of nearly 70 quantitative researchers, data scientists, and research engineers today, and in the last year they put more than 50 new models into production. One of their most significant challenges now? How to deploy and maintain these models at scale.
Below is a summary of a few key points that Davidson and Jeff Hirsch discussed during a joint session.
Jeff Hirsch: In terms of our working relationship, the key for me is our touchpoints and how we can integrate early and often. I think setting conventions for how models are built and where data will reside, and getting aligned on the handoff, will pay dividends. For example, we’ve developed a checklist for the handoff when models are ready to be productionized. And as part of our 2020 roadmap, we’re looking at how we can create a convention-based approach to coding so that we can decouple things or scale them more efficiently.
Lee Davidson: We’ve also started defining project stages. We’ve developed an internal vernacular with five project stages:
Exploration and active research
Launch and maintain
We try to categorize every project into one of those five buckets, and the way we manage a project changes depending on its stage. We’ve also adopted agile. When we’re in the first phases [exploration and research], we’re doing what in agile parlance is called a “spike”: data scientists can research something for a month or two, and if it looks promising and we want to continue, we can. Once we settle on a solution that we want to develop, we start working in a more agile fashion and applying a lot of software engineering principles and best practices. We’re trying to bring in people with engineering backgrounds as early in the process as possible, and we have a data engineer, a data scientist, and a software engineer working together on a problem during the first four project stages.
The handoff to IT
Lee Davidson: Anyone who’s been doing data science for the past few years knows that if you have a successful project, there’s a decent likelihood your model will get out of the lab, and then you’re on the hook for maintaining it. Several years ago, we were flying by the seat of our pants, which put a lot of pressure on researchers to maintain what they produced. Today, we’re trying to clarify upfront what the researcher is expected to do and what the handoff looks like, whether it’s to a different person or a different team. For example, are researchers responsible for delivering production-level, thoroughly tested code, or will they only need to provide a prototype that someone else is going to implement? We’ve also broken down the division of labor, identifying who maintains what, how frequently models run, and how they are trained.
Jeff Hirsch: If you look at how support works in an IT org, you usually have three tiers.
Tier 1 is a technical operations center;
Tier 2 is some sort of DevOps with a little bit more app-specific expertise; and
Tier 3 goes to the developers or the quants of the model in this case.
Our approach has been to push as much as we can to Tier 1 and Tier 2. One of the things we’ve been doing is establishing relationships to share Jupyter notebooks and ensuring our QA staff understand the results of those notebooks. The overall spirit is that the IT organization needs to move closer to the Quant organization and learn some of its skills, and the Quant organization needs to move closer to development and engineering and learn some of those skills.
Prioritizing new projects
Lee Davidson: We work with IT in the typical way you would see departments working together. When we do roadmap planning together, we sit in the war room, draw things out on diagrams, look at resources and the like, and hash all that out months in advance. Additionally, Jeff [Hirsch], the P&L leaders, and I meet monthly to talk about what we’re working on and our priorities, and to discuss any changes.
Jeff Hirsch: The commercial heads of the various product suites, such as our risk model, also provide feedback at monthly meetings. For example, they may want the risk model to include fixed income data. So then the question is, how do we prioritize getting that data into the data lake and updating the model to leverage it? We respond to that kind of feedback in an agile manner and try to adjust our roadmap. It’s definitely a collaboration.
Reproducibility and auditability
Lee Davidson: One of our QA practices when we’re retraining models or running health checks is saving the results for querying later. Many of our clients want to validate the models independently, so we often need to provide those model statistics in conjunction with the insights and analytics. Having a good QA plan upfront that’s specific to your use case is critical. Another aspect is tracking all the data transformations. We’re trying not to have researchers maintain the models that they build. That means people who didn’t build them are maintaining something they aren’t intimately familiar with, so we’re building in processes to save the output at different transformational steps.
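The article doesn’t describe the tooling behind saving output at each transformational step, so the following is only a minimal Python sketch of the general idea, under our own assumptions. The helper `run_with_checkpoints`, the `(name, function)` step shape, the JSON file format, and the toy data are all hypothetical. Each intermediate result is written to disk in execution order, so someone who didn’t build the model can inspect exactly what each step produced.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Sequence

# Hypothetical step shape: a (name, function) pair applied in order.
Step = tuple[str, Callable[[Any], Any]]


def run_with_checkpoints(data: Any, steps: Sequence[Step], out_dir: Path) -> Any:
    """Apply each step in order, persisting the input and every intermediate result as JSON."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Save the raw input so a maintainer can replay the chain from the start.
    (out_dir / "00_input.json").write_text(json.dumps(data))
    for i, (name, fn) in enumerate(steps, start=1):
        data = fn(data)
        # A zero-padded index keeps the saved files sorted in execution order.
        (out_dir / f"{i:02d}_{name}.json").write_text(json.dumps(data))
    return data


# Illustrative run on toy prices (not real data): drop gaps, then rescale.
with tempfile.TemporaryDirectory() as tmp:
    result = run_with_checkpoints(
        [100.0, None, 102.0],
        [
            ("drop_missing", lambda xs: [x for x in xs if x is not None]),
            ("rescale", lambda xs: [x / 100 for x in xs]),
        ],
        Path(tmp),
    )
    saved = sorted(p.name for p in Path(tmp).iterdir())
    first_step = json.loads((Path(tmp) / "01_drop_missing.json").read_text())

print(result)  # [1.0, 1.02]
print(saved)
```

In practice the checkpoints would go to durable storage rather than a temporary directory; the temporary directory here just keeps the example self-contained.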
Jeff Hirsch: For our data lake, we want to know which files in the lake were leveraged by which models. We have certain principles that we adhere to; immutability is one of them. Once something hits the data lake, it never gets changed; we just keep adding new files or deltas, so tracking that is a huge thing. From a model perspective, the audit trail is, in my mind, essential to debugging. You’re taking steps to make sure a model is working the way it should, and writing that information out to disk and persisting it adds value while you’re doing it; then, as a side effect, it becomes part of the audit trail.
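The two principles Hirsch describes, immutability and knowing which files each model used, can be illustrated with a toy in-memory class. This is a hypothetical sketch, not Morningstar’s data lake. The class name `AppendOnlyLake`, its methods, the file paths, and the model names are our own assumptions. Writes never overwrite an existing file, and every read is logged against the model that made it.

```python
from collections import defaultdict


class AppendOnlyLake:
    """Toy in-memory stand-in for a data lake: files can be added, never changed."""

    def __init__(self) -> None:
        self._files: dict[str, bytes] = {}
        # Model name -> set of file paths that model has read.
        self._lineage: defaultdict[str, set[str]] = defaultdict(set)

    def put(self, path: str, content: bytes) -> None:
        # Immutability: refuse to overwrite; changes arrive as new files or deltas.
        if path in self._files:
            raise FileExistsError(f"{path} already exists; write a new file or delta instead")
        self._files[path] = content

    def read(self, path: str, model: str) -> bytes:
        content = self._files[path]  # Raises KeyError if the file was never written.
        # Record lineage only after a successful read.
        self._lineage[model].add(path)
        return content

    def files_used_by(self, model: str) -> set[str]:
        return set(self._lineage.get(model, set()))

    def models_using(self, path: str) -> set[str]:
        return {m for m, paths in self._lineage.items() if path in paths}


# Hypothetical usage: a base file plus a delta, read by two illustrative models.
lake = AppendOnlyLake()
lake.put("prices/2020-03-01.csv", b"ticker,price\nABC,10\n")
lake.put("prices/2020-03-02_delta.csv", b"ticker,price\nABC,11\n")
lake.read("prices/2020-03-01.csv", model="risk_model")
lake.read("prices/2020-03-02_delta.csv", model="risk_model")
lake.read("prices/2020-03-01.csv", model="style_model")

try:
    lake.put("prices/2020-03-01.csv", b"overwrite attempt")
    overwrite_blocked = False
except FileExistsError:
    overwrite_blocked = True
```

A production system would keep this lineage in a persistent metadata store rather than in memory; the sketch only shows that reads are logged per model and that writes never replace existing data.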