    MLOps: Machine Learning Operations

    Machine Learning Operations (MLOps) is a relatively new arrival to the world of data science. Barely five years old, it is already viewed as a critical requirement for organizations in just about every industry and business sector that want to become model-driven by weaving data science models into the core fabric of their business.

    However, organizations are finding that implementing MLOps at the enterprise level is a much more complex problem than just implementing MLOps for a few models or a single team. Scaling data science and MLOps practices swiftly, safely, and successfully across an enterprise requires a broader version of MLOps that encompasses the entire data science lifecycle and meets the requirements of various teams both now and in the future. Enterprise MLOps is a new, robust category of MLOps that solves this problem.

    What is MLOps?

    MLOps is a system of processes for the end-to-end data science lifecycle at scale. It provides a venue for data scientists, engineers, and other IT professionals to work together efficiently, with enabling technology, on the development, deployment, monitoring, and ongoing management of machine learning (ML) models.

    Enterprise MLOps extends this so that organizations can quickly and efficiently scale data science practices across the entire organization without sacrificing safety or quality. It is specifically designed for large-scale production environments where security, governance, and compliance are critical.

     

    How Did We Get Here: The Journey to Today's Enterprise MLOps

    Until a decade ago, the majority of work done in machine learning (ML) was experimental due to limitations in computing power. As it became practical to process vast amounts of data, those companies that were able to transition experimental ML models into production reaped huge rewards – but these successes were exceptions, not the norm.

    The majority of projects stumble when models are handed from the data scientists to the production engineers. The reasons vary widely and include the need to recode models into different languages for deployment (e.g., Python/R vs. Java), the inability to recreate in production the data used for training, and a lack of standardization in deployment processes.

    This is because the majority of companies are still using what Deloitte describes as an "artisanal" approach to ML development and deployment. This lack of scalable patterns and practices delays the value of data science. This is borne out by the results of a recent survey by DataIQ where one-third of respondents reported that it took months to get models into production. Visibility into project progress is also limited, with over 45 percent of respondents providing no or only periodic updates. According to another survey, 47 percent of ML projects never get out of the testing phase. Of those that do, another 28 percent fail anyway.

    To overcome these challenges, the data science community looked to DevOps (a blend of development and operations) from the software engineering field for inspiration. Many of its concepts for shortening development time and increasing speed and quality were adopted. However, because data science and application development produce very different products, a new practice, MLOps, was born.

    What Are the Benefits of Enterprise MLOps?

    One of the key benefits of enterprise MLOps is the ability to generate rapid business value within your organization through the repeated deployment of different models and the associated continuous monitoring of those models. In practice, this shows up in the following ways:

    Business Integration

    Successful ML projects are characterized by the fact that all responsible employees are aware of the benefits of ML technology from the beginning of a project. In addition, the challenges associated with the implementation of a model should be known. Through a structured integration of MLOps, ML models can be used successfully in the long term, and existing applications can be updated and exchanged at any time.

    Technical Integration

    With an MLOps process flow, technical development, testing, and integration steps are largely automated, which supports short development cycles and consistent quality assurance. It also means that all processes can be monitored effectively from the beginning of each project.

    Scalability

    Experience shows that the use of scalable platforms for ML applications is worthwhile in practice. The advantage of these is that they map the entire lifecycle of a model and offer the possibility of continuous improvement from development to implementation.

    Additional Benefits of MLOps

    Other benefits beyond business integration, technical integration, and scalability are as follows:

    • Rapid deployment of multiple models through automated processes
    • Accelerated time-to-value by building and deploying models faster
    • Increased productivity due to improved cooperation and the reuse of models
    • Reduced risk of using unproductive models

    With enterprise MLOps, everything from data analysis and data processing to scalability and tracking can be made more efficient.

    MLOps vs. DevOps: ML Models Are Not Software Applications

    To understand why DevOps doesn't meet the needs of data science, it's important to understand the key differences between models and software applications. Both involve code and are saved as files, but the behavior of software is predetermined whereas model behavior changes over time.

    The materials used to develop them are different

    Models involve code, but they use different techniques and different tools than software engineering. Unlike software, they feed on data as a critical input. They use more computationally intensive algorithms, so they benefit from scalable compute and specialized hardware like GPUs. And they leverage packages from a vibrant open source ecosystem that's innovating every day.

    The process to build them is different

    Data science is research — it's experimental and iterative and exploratory. You might try dozens or hundreds of ideas before getting something that works. Often you pick up where another team left off, with their work being a jumping-off point of discovery and innovation. To facilitate breakthroughs, data science teams need tools to test many ideas, organize and preserve that work, and search and discover it later.

    Their behavior is different

    Models make predictions based on the data they are fed. They have no a priori correct behavior; they can only behave better or worse once they're live in the real world. Software does not need retraining and typically needs updating only when the business process changes, whereas models need both. Model performance can change as the world changes around them, creating risk from unexpected or degraded behavior. So organizations need different ways to review, quality control, and continually monitor them to control risk.

    Roles in an Enterprise MLOps Team

    When properly scoped, an Enterprise MLOps platform supports the needs of everyone involved in the data science lifecycle. While the composition of any Enterprise MLOps team is going to vary from one organization to another, most members take on one of seven roles:

    Data Scientist: Often seen as the central player in any MLOps team, the Data Scientist is responsible for analyzing and processing data. They build and test the ML models and then send the models to the production unit. In some enterprises, they are also responsible for monitoring the performance of models once they are put into production.

    Data Analyst: The Data Analyst works in coordination with product managers and the business unit to uncover insights from user data. They typically specialize in different types of tasks, such as marketing analysis, financial analysis, or risk analysis. Many have quantitative skills comparable to those of data scientists, while others can be classified as citizen data scientists who have some knowledge of what needs to be done but lack the coding skills and statistical background to work alone as data scientists do.

    Data Engineer: The Data Engineer manages how data is collected, processed, and stored so that it can be reliably imported into and exported from software. They may have expertise in specific areas, such as SQL databases, cloud platforms, or particular distribution systems, data structures, or algorithms. They are often vital in operationalizing data science results.

    DevOps Engineer: The DevOps engineer provides data scientists and other roles with access to the specialized tools and infrastructure (e.g., storage, distributed compute, GPUs, etc.) they need across the data science lifecycle. They develop the methodologies to balance unique data science requirements with those of the rest of the business to provide integration with existing processes and CI/CD pipelines.

    ML Architect: The ML Architect develops the strategies, blueprints, and processes for MLOps to be used, while identifying any risks inherent in the life cycle. They identify and evaluate the best tools and assemble a team of engineers and developers to work on them. Throughout the project life cycle, they oversee MLOps processes. They unify the work of data scientists, data engineers, and software developers.

    Software Developer: The Software Developer works with data engineers and data scientists, focusing on the productionalization of ML models and the supporting infrastructure. They develop solutions based on the ML architect's blueprints, selecting and building necessary tools and implementing risk mitigation strategies.

    Domain Expert/Business Translator: A Domain Expert/Business Translator has in-depth knowledge of business domains and processes. They help the technical team understand what is possible and how to frame the business problem as an ML problem. They help the business team understand the value offered by models and how to use them. They can be instrumental in any phase where a deeper understanding of the data is crucial.

    MLOps and the Data Science Lifecycle

    There are four phases in the data science lifecycle:

    1. Manage: This stage focuses on understanding the objectives and requirements of the project and prioritizing the work.
    2. Develop: This is where data scientists build and assess various models based on a variety of different modeling techniques.
    3. Deploy: This stage is when the model moves into a state where it can be used within business processes for decision making.
    4. Monitor: This is the operational phase of the lifecycle where organizations ensure that the model is delivering the expected business value and performance.

    Figure: The Four Phases of the Data Science Lifecycle.

    Today, most MLOps platforms provide a stable environment for data science and data engineering, and they typically focus on the production side of the data science lifecycle. They help prevent models from degrading due to unplanned or inconsistent refresh cycles, without the constant manual monitoring models would normally require. They're also used for testing and validating models. The diagram below outlines how MLOps works within the data science lifecycle.

     

    Figure: MLOps in the Data Science Lifecycle.

    Developing an MLOps Strategy

    A successful MLOps strategy consists of multiple components, such as the following:

    Unified Experiment Management

    As mentioned before, MLOps usually involves several different roles working together. For this reason, an ML team needs a centralized platform for model training and evaluation. This gives the entire team a central hub to access, which facilitates better cross-team communication and makes it quick to find and build on one another’s work.
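
    As a rough illustration of what a central hub for experiments might record, the sketch below keeps every run in one shared log and lets any team member look up the best run for a metric. The class names, fields, and sample values are hypothetical and stand in for whatever experiment store a team actually uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExperimentRun:
    """One training run recorded in the shared log (hypothetical schema)."""
    run_id: str
    author: str
    params: Dict[str, float]
    metrics: Dict[str, float]


@dataclass
class ExperimentLog:
    """In-memory stand-in for a centralized experiment store."""
    runs: List[ExperimentRun] = field(default_factory=list)

    def record(self, run: ExperimentRun) -> None:
        self.runs.append(run)

    def best(self, metric: str) -> ExperimentRun:
        # Higher is treated as better; runs without the metric are skipped.
        candidates = [r for r in self.runs if metric in r.metrics]
        if not candidates:
            raise ValueError(f"no run reported metric {metric!r}")
        return max(candidates, key=lambda r: r.metrics[metric])


# Two team members log runs to the same place (illustrative values).
log = ExperimentLog()
log.record(ExperimentRun("run-1", "alice", {"lr": 0.1}, {"accuracy": 0.81}))
log.record(ExperimentRun("run-2", "bob", {"lr": 0.01}, {"accuracy": 0.86}))

best_run = log.best("accuracy")
print(best_run.run_id, best_run.author)
```

    A real platform would persist these records and attach code, data, and environment details, but the principle is the same: one place where every run can be found and compared.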

    Automated Training and Comparison

    Automation is a key element of MLOps because the number of ML models, experiments, and tests is so large that manual management becomes difficult. Establishing an automated pipeline for training, optimization, and testing of ML models shortens each iteration and speeds up the deployment of a model to production.
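
    To make the idea of automated training and comparison concrete, here is a minimal sketch that evaluates several candidate configurations without manual intervention and ranks them on a validation set. It assumes a toy one-dimensional k-nearest-neighbours classifier and made-up data (including two deliberately noisy training labels); none of this comes from a specific MLOps product.

```python
from typing import List, Tuple

Sample = Tuple[int, int]  # (feature, label)

# Toy data for illustration only; (3, 1) and (8, 0) are noisy training labels.
TRAIN: List[Sample] = [(1, 0), (2, 0), (3, 1), (4, 0), (5, 0),
                       (6, 1), (7, 1), (8, 0), (9, 1), (10, 1)]
VALID: List[Sample] = [(3, 0), (8, 1), (1, 0), (10, 1)]


def knn_predict(train: List[Sample], x: int, k: int) -> int:
    """Majority vote among the k training points closest to x."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train: List[Sample], valid: List[Sample], k: int) -> float:
    hits = sum(knn_predict(train, x, k) == y for x, y in valid)
    return hits / len(valid)


def run_sweep(candidates: List[int]) -> List[Tuple[int, float]]:
    """Evaluate every candidate k and return results, best first."""
    results = [(k, accuracy(TRAIN, VALID, k)) for k in candidates]
    # Stable sort: candidates with equal scores keep their original order.
    return sorted(results, key=lambda r: r[1], reverse=True)


leaderboard = run_sweep([1, 3, 5])
best_k, best_score = leaderboard[0]
print(leaderboard)  # [(3, 1.0), (5, 1.0), (1, 0.5)]
```

    In a real pipeline the loop body would launch training jobs and write each result to the shared experiment store, but the shape stays the same: enumerate candidates, score them consistently, and pick the winner automatically.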

    Automated Deployment

    After an ML model has been successfully trained and validated, it needs to be deployed to production. Over the entire lifetime of an ML project, this step can be repeated multiple times. Every time an improvement is made to the model (e.g., through hyperparameter optimization or retraining with new data), the model must be redeployed. Before a model can be deployed, the process will likely involve a human-in-the-loop review of the model’s performance.

    However, to speed up the deployment process, an automatic test of the model could be established that checks whether the new version of the model satisfies the acceptance criteria. Once the model passes the test, the new model is automatically deployed to production.
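
    One way such an automatic acceptance test could look is sketched below. The metric names, minimum thresholds, and the rule that a candidate must not regress against the current production model are all assumptions chosen for illustration, not criteria prescribed by any particular platform.

```python
from typing import Dict

# Hypothetical acceptance criteria: metric name -> minimum required value.
ACCEPTANCE_CRITERIA: Dict[str, float] = {"accuracy": 0.85, "recall": 0.80}


def passes_acceptance(candidate: Dict[str, float],
                      production: Dict[str, float],
                      criteria: Dict[str, float]) -> bool:
    """True if the candidate meets every floor and does not regress."""
    for metric, floor in criteria.items():
        value = candidate.get(metric)
        if value is None or value < floor:
            return False  # missing metric or below the required minimum
        if value < production.get(metric, floor):
            return False  # worse than the model currently in production
    return True


def deployment_decision(candidate: Dict[str, float],
                        production: Dict[str, float]) -> str:
    if passes_acceptance(candidate, production, ACCEPTANCE_CRITERIA):
        return "deploy"
    return "hold for review"


production_metrics = {"accuracy": 0.88, "recall": 0.82}
good_candidate = {"accuracy": 0.90, "recall": 0.84}
weak_candidate = {"accuracy": 0.91, "recall": 0.78}

print(deployment_decision(good_candidate, production_metrics))  # deploy
print(deployment_decision(weak_candidate, production_metrics))  # hold for review
```

    Routing failures to "hold for review" rather than rejecting them outright keeps the human-in-the-loop step available for borderline cases.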

    Automated Monitoring

    MLOps doesn’t end once an ML model has been deployed to production. Models in production need to be constantly monitored. Monitoring takes on even greater importance in MLOps than in DevOps because a model’s performance can degrade over time. This degradation, often called data drift, is caused by natural changes in the input data: the data the model receives in production gradually comes to differ from the data it was trained on during development.
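
    A common way to quantify this kind of input drift is the Population Stability Index (PSI), which compares how a feature is distributed in the training data and in recent production data. The sketch below is one possible implementation using equal-width bins; the bin count, the small floor used to avoid taking the logarithm of zero, and the 0.2 alert threshold (a widely used rule of thumb) are assumptions, and the data is synthetic.

```python
import math
from typing import List, Sequence

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb alert level


def bin_fractions(values: Sequence[float], lo: float, hi: float,
                  bins: int) -> List[float]:
    """Share of values per equal-width bin; out-of-range values are clamped."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]


def population_stability_index(reference: Sequence[float],
                               current: Sequence[float],
                               bins: int = 10, eps: float = 1e-6) -> float:
    """PSI = sum over bins of (actual - expected) * ln(actual / expected)."""
    lo, hi = min(reference), max(reference)
    if hi == lo:
        raise ValueError("reference data has no spread to bin")
    expected = bin_fractions(reference, lo, hi, bins)
    actual = bin_fractions(current, lo, hi, bins)
    psi = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor empty bins to avoid log(0)
        psi += (a - e) * math.log(a / e)
    return psi


# Synthetic feature values: training data vs. shifted production data.
training_values = [i / 100 for i in range(100)]
production_values = [0.5 + i / 200 for i in range(100)]

psi_same = population_stability_index(training_values, training_values)
psi_shifted = population_stability_index(training_values, production_values)
drift_alert = psi_shifted > DRIFT_THRESHOLD
print(psi_same, drift_alert)
```

    A monitoring job could run a check like this per feature on a schedule and trigger retraining or review whenever the alert fires.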

    Expanding on MLOps for the Enterprise

    Organizations have realized that even if they have implemented some level of MLOps, there are still things standing in the way of safely and universally scaling data science.

    • Inflexible Infrastructure. Data scientists are unproductive without access to powerful compute, high-value data, and the latest open-source tools. Even worse, time spent on DevOps tasks with bespoke tools and hardware reduces innovation. A commonly cited survey finding is that data scientists spend as much as 80% of their time working with data and infrastructure, leaving little bandwidth for analysis and insights.
    • Wasted Work. Data scientists often work independently and with many different tools. Low standardization and visibility of work create duplicate effort, barriers to collaboration, and poor reproducibility. A recent Forrester survey of 467 enterprises found that 39% of respondents claimed IT and developers “don’t collaborate at critical stages of the AI journey if they ever collaborate at all.”
    • Production Pitfalls. Recent Gartner research shows that only 53% of projects make it from AI prototypes to production. Many data science models stop performing well in production due to issues such as data drift. The lack of repeatable processes from deployment to monitoring adds hidden costs, unnecessary complexity, delays, and compliance risk.

    Tackling these three challenges requires a discipline that looks beyond the deployment portion of the data science lifecycle, which is where MLOps platforms have focused to date. It requires enterprise-grade capabilities that allow projects to progress through the end-to-end data science lifecycle faster and that provide for safely and universally scaling data science with the requisite security, governance, compliance, reproducibility, and auditability features. For these reasons, leading organizations are adopting Enterprise MLOps practices and enabling platforms.

    Capabilities of an Enterprise MLOps Platform

    An Enterprise MLOps platform needs to serve the requirements of all of the different members of the MLOps team, the organization's management, its workflows and lifecycles, and the continued growth of the organization as a whole. Enterprise MLOps capabilities can be thought of in two ways: tooling enhancements and process transformations.

    Tooling enhancement capabilities include:

    • On-demand access to data and scalable compute
    • On-demand access to centralized tooling
    • User access control and security
    • Version control and reproducible research

    These capabilities dramatically increase productivity for data science and IT teams as well as provide storage and organization of all data science artifacts including data sources, data sets, and algorithms for reproducibility and reusability. They allow IT to manage infrastructure and costs, govern and secure technology and data, as well as enable data scientists to self-serve the tools and infrastructure they need.
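
    To illustrate what reproducibility tracking can mean in practice, the following sketch derives a single fingerprint from the code version, a checksum of the training data, and the hyperparameters of a run, so that two runs with the same fingerprint can be treated as the same experiment. The function name, the choice of inputs, and the sample values are hypothetical; a real platform would typically also capture the software environment.

```python
import hashlib
import json
from typing import Any, Dict


def run_fingerprint(code_version: str, data_checksum: str,
                    params: Dict[str, Any]) -> str:
    """Return a stable SHA-256 fingerprint for one training run."""
    payload = {
        "code_version": code_version,
        "data_checksum": data_checksum,
        "params": params,
    }
    # sort_keys makes the JSON text independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative inputs: a tiny dataset and a made-up commit identifier.
dataset_bytes = b"age,income\n34,52000\n41,61000\n"
checksum = hashlib.sha256(dataset_bytes).hexdigest()

fp_a = run_fingerprint("abc123", checksum, {"lr": 0.01, "depth": 4})
fp_b = run_fingerprint("abc123", checksum, {"depth": 4, "lr": 0.01})
fp_c = run_fingerprint("abc123", checksum, {"lr": 0.02, "depth": 4})

print(fp_a == fp_b)  # True: same inputs, different key order
print(fp_a == fp_c)  # False: one hyperparameter changed
```

    Storing the fingerprint next to each model artifact gives a quick way to tell whether a result can be reproduced from known inputs or whether something changed between runs.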

    Process transformation capabilities include:

    • Collaboration
    • End-to-end orchestration of the data science lifecycle
    • Project management
    • Knowledge management and governance

    These capabilities are what allow organizations to safely and universally scale data science by making the most efficient use of resources, building on prior work, providing context, and enhancing learning loops. Everyone uses consistent patterns and practices regardless of how or where the model was developed. Together, they eliminate manual, inefficient workflows across all the activities of the data science lifecycle. This creates momentum that increases model quality, reduces the time required to deploy successful models from months to weeks or days, and alerts teams immediately to changes in model performance so that models can be quickly retrained or replaced.

    Everyone learns from the successes and failures. Collaboration also includes engaging with the business in a non-technical manner so they can understand the projects and outcomes. Finally, data science leaders can easily manage workloads and track project progress, impact and cost.

    When these tooling and process transformation capabilities are all available, an Enterprise MLOps platform optimizes the throughput across the data science lifecycle, driving more models from development into production faster, while keeping them at peak performance and providing the tools and knowledge needed to repeat the cycle.

    Core Components of the Domino Enterprise MLOps Platform

    The Domino Enterprise MLOps platform is feature-rich and designed to handle the needs of model-driven organizations using state-of-the-art data science tools and algorithms. The platform provides three critical functions for modern data science teams:

    As a system of record, Domino captures all data science work in a central repository, so your team can easily find, reproduce and reuse work. Gone are the days of data scientists starting projects from scratch only to find out another team member is working on the same problem. Instead, knowledge is compounded with reusable code, artifacts, and learnings from previous experiments, integrated project management capabilities, and the ability to replicate development environments.

    As an integrated model factory, Domino supports the end-to-end data science lifecycle from ideation to production: explore data, train machine learning models, validate, deploy, and monitor. Then rinse and repeat – all in one place. Enable repeatable processes and workflows that get models into production faster, enable automated monitoring, retrain and republish models more often, and much more – all designed to reduce friction and increase model velocity on your way to becoming a model-driven business.

    And finally, as a self-service infrastructure portal, Domino automates the time-consuming DevOps tasks required for data science work at scale. With only a few clicks you can spin up a development sandbox pre-loaded with your preferred tools, languages, and compute, including popular distributed compute frameworks. Jump between environments, bring in more data, compare experiments, deploy and iterate on models, and just be more productive with a platform optimized for code-first data science teams.

    Benefits of Domino's Enterprise MLOps Platform

    Customers who have adopted the Domino Enterprise MLOps platform consistently point to three primary reasons that have allowed them to effectively scale data science:

    Open & Flexible

    Domino supports the broadest ecosystem of open-source and commercial tools and infrastructure. Unlike SageMaker, which is AWS-specific, or Databricks, which is tied to Spark, Domino is an open system. Domino’s unique architecture supports on-premises, cloud, and hybrid environments for maximum flexibility. Domino supports the latest tools, packages, and compute frameworks such as Spark, Ray, and Dask.

    Built for Teams

    Domino is designed for data science at scale. Teams using different tools can seamlessly collaborate on projects and rely on Domino to automatically track all data science artifacts. Domino establishes full visibility, repeatability, and reproducibility at any time for every use case. Dashboards let managers set project goals and inspect in-flight work.

    Integrated Workflows

    Domino integrates workflows to accelerate the end-to-end data science lifecycle from experimentation to production. For example, Domino automatically sets up prediction data capture pipelines and model monitoring for deployed models to ensure peak model performance. Domino’s integrated approach ensures everyone involved in data science can maximize their productivity and impact.

     

    The Model-Driven Future with Domino Data Lab Enterprise MLOps

    In just a few short years, data science has brought us self-driving cars, risk analysis engines, AlphaGo, movie recommendation engines, and even a photorealistic painting app. Where data science takes us from here is anyone's guess (specifically, an innovative and well-researched guess).

    The companies that scale ML innovation over the next decade will be those that are model-driven, making money on their projects, building on each subsequent success, learning faster, developing more efficiently, reducing costs, and minimizing poor outcomes.

    Does your company strive to become model-driven? Work with Domino Data Lab to ensure your company's success. To see the Domino Enterprise MLOps Platform in action, you can watch a demo.
