By David Bloch, Data Science Evangelist, Domino on June 17, 2020 in Perspective
Deploying machine learning models in a repeatable, scalable manner requires an understanding that the algorithms and techniques that underpin models are rapidly evolving and are managed differently than traditional software development tools.
With significant advances happening in the open-source community, the tools, techniques and algorithms that your data scientists use today to solve business problems will undoubtedly change very soon. Of course, not all change is good, and in many situations, implementing techniques and approaches that aren’t understood can lead to catastrophic failures.
Here are some guiding principles to ensure your company increases its chance of being successful in its machine learning practices.
1) Encourage your data scientists to experiment with and explore new techniques.
Maslow’s hammer talks of a cognitive bias that involves using a familiar tool: “If all I have is a hammer, every problem looks like a nail.” A significant challenge at the start of many data scientists’ careers is becoming comfortable with new approaches and techniques. In academic institutions, they received guidance and had the opportunity to validate the path they were taking with supervisors. At your organization, being able to confirm a new approach, test it against other methods, and ensure it is appropriate for the problem at hand becomes a critical step for data scientists to grow comfortable with new technologies and techniques.
The role of a data science leader is to ensure that their teams have the time, space, and assistance to learn new algorithms, as well as real scenarios in which a machine learning model would be applicable.
The benefits of getting this right are that data scientists become more comfortable exploring outside the boundaries of their existing knowledge. Not only does this improve their understanding of potential modeling techniques, but it also becomes a critical step in assessing new problem statements and thinking about the most appropriate solutions.
2) Make it easy for your data scientists to test new technologies.
As machine learning packages, algorithms and underlying infrastructure are continually evolving, flexibility and scale need to be at the heart of your machine learning practice.
But for many data scientists, the hurdles and barriers in trialing new tooling make that impossible to do. Restrictive environments or legacy platforms that do not encourage openness and extensibility create massive overheads in trialing new approaches—so much so that, in many cases, data scientists either give up or utilize their laptops and desktops to bypass the problem (“shadow IT”). Any new techniques they test become problematic to implement, which significantly disrupts the ability to productionize and scale their models.
Centralized and open data science platforms make it easier to implement new environments and keep a contrast between gold-standard, production-worthy environments and evolve research and development environments that seek to test new coding approaches, techniques, or underlying technologies. By lowering the barrier to entry, you can ensure that time spent assessing new methods is primarily spent on the model outputs and inputs, rather than configuring machines that make it possible to run the codebase.
3) Have a process for validating and reproducing new modeling approaches.
Not all change is good, and not all techniques will be worth implementing. Your data scientists must be able to validate the way a model works and explain or interpret the results correctly. To do this, they must be able to reproduce results and create visibility into the inner mechanics of why a model produces those results. Being able to offer interpretation as to why the results of a model are what they are and what input variables are the most significant is essential to creating change management programs that enable successful model implementation.
Data scientists must be able to understand and clearly explain the algorithm they’re using within the model before they implement it. New models should only be put into production when data science teams have the knowledge they require to implement the model safely and limit the possibility of unintended consequences.
4) Prepare for anything.
Predicting the future as it relates to machine learning isn’t possible, even for the models themselves. All aspects of machine learning are prone to change, be it the algorithms, the underlying compute resources that best suit them, or the frameworks companies need to implement them successfully.
Encouraging a culture of experimentation and learning at the heart of your data science practice is the only way to ensure that the knowledge they bring to your organization remains current, and that the techniques and approaches they use to solve business challenges are appropriate.