By Josh Poduska, Chief Data Scientist, Domino on January 19, 2021 in Perspective
The evolution of technology, much like the evolution of life on our planet, has been characterized by steady progress interspersed with occasional mass extinctions and bursts of innovative new life. We are fortunate to be experiencing one of those evolutionary inflection points. Innovations in data science and AI abound and are already changing the very nature of business and life. With this technological boom come real risks for organizations that do not acquire the survival traits of the new era.
Many executives and analytical professionals lack a clear vision of where this process is taking us. Their definition of the current era, which we are calling the MLOps era, is too narrow. This leads them to entrench technologies and adopt processes (traits, in the evolutionary sense) that will limit their ability to compete in the new marketplace. And what of the next evolutionary era for enterprise AI? Those who arrive first will find themselves atop the food chain. In this post we provide a definition of MLOps and discuss how top analytical enterprises are already evolving beyond our current era. We give insight into what the next era will look like and, importantly, what kind of organizations will survive and thrive.
The MLOps Era
As we begin 2021, the data science, ML, and AI industry is in the early days of the MLOps era. This is an exciting evolution built on the innovations of the past. It seeks to solve the last mile problem of getting more data science products into production – to operationalize ML and AI. It promises modularized and reusable components to meet that end. It adopts principles from DevOps and software engineering, such as CI/CD methods, modified to fit the needs of data science work. It emphasizes clean data pipelines to support the operationalization process. Importantly, each of these aspects of the MLOps era is transitioning to cloud workloads. The stages of the data science lifecycle emphasized in this era are Validate, Deploy, and Monitor.
The Data Science Lifecycle
This is the prevailing view of the MLOps era, but we are early in this era and have work to do before we transition to the next big thing. In particular, we will see a focus in academia and in industry on rounding out the validation and monitoring aspects of the data science lifecycle.
Validation is currently ahead of monitoring in its progress. Academia has published a large amount of research on explainability, ethics, and bias in models. Industry is translating those ideas into tools and processes used by leading organizations today. There is still a lot of work to do, and model validation will be much more mature by the time we leave this era.
Academia is later to the game when it comes to model monitoring. This is partly because much of it is a solved problem, at least in an academic sense. However, we have plenty of unanswered questions about the best way to apply known monitoring principles to models in production. Even when best practices are established, implementation will not be trivial; in order to be effective in an MLOps sense, it must complete the feedback loop to retrain or completely rebuild models. Both academia and industry have work to do before model monitoring is on a solid foundation, but they will get there by the end of this era.
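To make the idea of a monitoring feedback loop concrete, here is a minimal sketch of one way a statistical drift check could trigger retraining. It is an illustration rather than a method described in this post: it assumes the Population Stability Index (PSI) as the drift measure, and the function names, the equal-width binning, and the 0.25 threshold (a commonly cited rule of thumb) are all hypothetical choices.

```python
import math
from typing import List, Sequence


def population_stability_index(
    baseline: Sequence[float], live: Sequence[float], bins: int = 10
) -> float:
    """PSI between two samples, using equal-width bins over the baseline range."""
    lo, hi = min(baseline), max(baseline)
    # Fall back to a width of 1.0 if the baseline is constant.
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the baseline range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm finite for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    expected = bin_fractions(baseline)
    actual = bin_fractions(live)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def needs_retraining(psi: float, threshold: float = 0.25) -> bool:
    """Hypothetical policy: flag the model for retraining above the threshold."""
    return psi > threshold


# Illustrative data: a feature at training time vs. the same feature in production.
training_feature = [i / 100 for i in range(100)]
production_feature = [0.5 + i / 200 for i in range(100)]  # shifted upward

psi = population_stability_index(training_feature, production_feature)
if needs_retraining(psi):
    print("Drift detected: send the model back to the research team")
```

In practice the retraining signal would feed a pipeline or a ticket for the research team rather than a print statement, and the choice of drift measure and threshold would need to be validated per model.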
Current and Future Aspects of the MLOps Era
Current Focus of the MLOps Era
- Modularized and reusable components
- DevOps and software engineering principles applied to data science
- Data integrity
- CI/CD (continuous integration and continuous delivery or deployment)
- Functional model validation
- Functional model monitoring
- Trend toward cloud workloads
Future Focus of the MLOps Era
- Model validation will expand to include explainability and ethics
- Model monitoring will leverage statistical principles and integrate with the research process, completing the feedback loop
The Unnamed Future
For all the energy and talk around MLOps, it is still about the “Production” half of the lifecycle. MLOps requires us to figure out complicated APIs and stitch together services and technologies. MLOps is about the how. The next era, still unnamed, will apply some of the same principles to the “R&D” half of the lifecycle. It will be characterized by efficiencies of scale.
As one forward-looking data science leader who is already building for this future era put it,
“Ten years ago, data was our competitive advantage. Then it was our models. Today it is our process.”
With the tools of the MLOps era in place, the next era will be about process. That doesn’t mean technology will not play a part. Innovations will help leaders and teams operationalize the rest of the data science lifecycle. We’re not talking about auto-ML or an easy button for data science. It is about standardization. Operationalizing the Ideate, Analyze, and Develop stages of the lifecycle means providing tools to capture and share institutional analytical knowledge. It means providing a way to automatically track research and reproduce it with the click of a button. It has a lot to do with data science portfolio management and establishing a hierarchy of needs. It will see a focus on data science project management. It will be characterized by holistic asset management, from models to datasets to images and all things in between. It involves tracking the business value of data products. In short, it’s about emphasizing the science in data science: providing structure so teams can operate like a group of collaborative research scientists.
The Future Era - It’s About the Process
- Data science as a true enterprise capability
- Institutional knowledge management
- Optimize research team workflows across the entire lifecycle
- Collaboration for analytical professionals
- Maximize business value with finite resources and a big talent gap
- System of record
- Top-down portfolio view
- Project management
Traits Needed to Survive the Next Mass Extinction
Based on this vision of what MLOps will look like by the end of the current era and what the next era will bring, we see three strong traits organizations and companies must develop in order to survive and thrive in the next era. Some are already developing them.
Traits of the Future Era
- Lifecycle management
- Knowledge management
- Portfolio management
Lifecycle management will be the end state of organizations that effectively adapt during our current era, the MLOps era. It leverages the principles and tools of MLOps as a means of optimizing the process of getting models into production. It sets an organization up to extend those principles to research teams across the entire lifecycle.
Knowledge management relies on a well-defined system of record strategy for data science work. This will be made possible by technology – data science platforms to be specific – but will also require leadership to standardize work without stifling creativity. Knowledge management systems will be the spark of inspiration for analytical breakthroughs. They will provide a compounding effect in the value created by data science teams.
Lastly, those who adopt an effective portfolio management strategy, with the tools to support it, will finally realize their hierarchy of needs, in contrast to today’s bottom-up approach to data science project work. All leaders, from team leads to the C-suite, will have visibility into data science projects and research. Tracking business value will become a reality. The analytical engine of organizations will finally begin to fire on all cylinders.
Organizations should embrace the current MLOps era while simultaneously laying the foundation for the next. Embrace the operationalization mindset of today. Build the pipelines. Invest in the right talent. At the same time, begin to experiment with standardization of analytical research. Begin to think about and test knowledge management principles and tools. Be cognizant of your analytical portfolio, where it came from, and how to manage it. Read about how to structure your organizations for success based on the latest research from top analytical leaders. Taking these steps now will be the key to winning in the coming decade of analytical evolution.