Originally posted to the Domino Data Science Blog.
Over the past few years, we’ve seen a new community of data science leaders emerge.
Whatever their industry, these leaders keep raising the same three themes: 1) Companies are recognizing that data science is a competitive differentiator. 2) People are worried that their companies are falling behind, and that other companies are doing a better job with data science. 3) Data scientists and data science leaders are struggling to explain to executives why data science is different from other types of work, and what those differences imply for how to equip and organize data science teams.
We recently gathered this community of data science leaders at Rev. There, we shared our vision for “Model Management”, a set of processes and technologies that allows companies to drive competitive advantage from data science at scale. This post summarizes that talk; you can watch the talk here or download the whitepaper here.
Since we started Domino five years ago, we have talked to hundreds of companies that are investing in data science, and heard all about their successes and their challenges.
At various points during that time, we focused on different aspects of the challenges that face data scientists and data science teams.
At every point along the way, we felt like there was something larger we wanted to say, but we didn’t quite know how. Like the parable of the blind men describing different parts of an elephant, we knew we were describing pieces but not the whole.
So about a year ago we took a step back. We had long discussions with our customers to distill and synthesize what makes data science different and what differentiates companies who apply it most effectively.
Our major insight came when we asked ourselves: “what do data scientists make?”
Beyond the hype about AI and machine learning, at the heart of data science is something called a model. By “model,” I mean an algorithm that makes a prediction or recommendation, or prescribes some action, based on a probabilistic assessment.
Models can make decisions and take action autonomously, with a speed and sophistication that humans usually can’t match. That makes models a new type of digital life.
Data scientists make models.
And if you look at the most successful companies in the world, you’ll find models at the heart of their business driving that success.
An example that everyone is familiar with is the Netflix recommendation model. It has driven subscriber engagement, retention, and operational efficiency at Netflix. In 2016, Netflix indicated that their recommendation model is worth more than $1B per year.
Coca-Cola uses a model to optimize orange juice production. Stitch Fix uses models to recommend clothing to its customers. Insurance companies are beginning to use models to make automated damage estimates from accident photos, reducing dependence on claims adjusters.
Though obvious in one sense, the realization that data scientists make models is powerful because it explains most of the challenges that companies face in making effective use of data science.
Fundamentally, the reasons companies struggle with data science all stem from misunderstandings about how models are different from other types of assets they’ve built in the past.
Many companies try to develop and deploy models the way they develop and deploy software. And many companies try to equip data scientists with technology as if they were equipping business analysts to run queries and build business intelligence dashboards.
It’s easy to see why companies fall into this trap: models involve code and data, so it’s easy to mistake them for software or data assets.
We call this the Model Myth: it’s the misconception that because models involve code and data, companies can treat them like they have traditionally treated software or data assets.
Models are fundamentally different, in three ways:
The companies that make the most effective use of data science, the ones that consistently drive competitive advantage through it, are the ones that recognize that models are different and treat them differently.
We’ve studied the various ways these companies treat models differently, and organized that into a framework we call Model Management.
Historically, “model management” has referred narrowly to practices for monitoring models once they are running in production. We mean it as something much broader.
Model Management encompasses a set of processes and technologies that allow companies to consistently and safely drive competitive advantage from data science at scale.
Model Management has five parts to it:
Each of these facets of managing models requires unique processes and products. When integrated together, they unlock the full potential of data science for organizations.
Data science is a new era of computing. The first era was hardware, where engineers made chips and boards. The second era was software, where engineers made applications. In the third era, data scientists make models.
And like past revolutions in computing, two things are true about the data science era:
Model Management is the set of processes and technologies a company needs to put models at the heart of its business. It’s required because models are different from software, so companies need new ways to develop, deliver, and manage them. And by adopting Model Management, organizations can unlock the full potential of data science, becoming model-driven businesses.