At Domino, we’re fortunate to be able to work with the leaders of the world’s largest data science organizations. One thing that’s really stood out is that even the most sophisticated organizations struggle to understand and manage the depth and breadth of their data science investments. This situation gets even worse when you expand “data science” to include the broader big data, BI, and analytics capabilities that organizations are investing in. The complexity stems from the explosion in new tools, capabilities, and processes over the last five years. As a result, many CIOs and CDAOs don’t actually know which systems are driving which business processes, or which teams are getting stuck and which are delivering value.
Software development has a robust “Software Development Lifecycle” that has matured over the last two decades. Data science needs its own “Data Science Lifecycle”, and later this year we'll be introducing the Model Velocity Assessment. With it, data science leaders will be able to gauge their maturity across the four stages of the data science lifecycle. We’re finding that even the most mature companies have areas they can improve to make data science more scalable and valuable to their business.
That’s why we partnered with DataIQ to survey their membership of data and analytics professionals to understand more about their approaches to data science. Of the businesses that can measure the business benefit of data science, about one in four expect data science to impact topline revenue by more than 11%. For these companies, their investment in data science and their emphasis on making it a first-class corporate function are already paying significant dividends.
Unfortunately, almost a third of those surveyed (29.1%) don’t know what business benefit they can expect from data science. Of those that can estimate it, over half (57.3%) believe that data science is driving only nominal gains of less than 5% impact on annual revenue. Digging deeper into the data suggests that the root cause is the lack of a model-driven culture in which business stakeholders, data science practitioners, and IT have a close bond and common goals.
Consider some of the numbers from the survey:
- Two out of five organizations (39.5%) see a ‘Weak understanding or support for data science in business’ as their biggest challenge.
- One out of eight (12.8%) sees ‘use cases not compelling’ as one of their biggest challenges. This result isn’t surprising given the weak understanding of data science as highlighted in the previous point.
- One out of three organizations (33.7%) sees ‘Conflict between data science and IT’ as one of their biggest challenges.
- Even companies that rate themselves as ‘Advanced’ or ‘Reaching maturity’ in terms of their level of adoption of data and analytics are not immune to conflict. For both of these groups, ‘Conflict between data science and IT’ is their biggest challenge (52.4% and 50% respectively).
Models are a New Type of Digital Life
The challenges identified in the survey are a direct result of the way data science is done within many companies. For example, department leaders fall in love with the idea that business analysts (who lack formal data science training or experience) can suddenly use automated machine learning tools to create and deploy models. In our experience, these tools can solve some basic business challenges, but only when the “citizen data scientists” who use them are paired with expert, properly trained data scientists who can validate their work. Companies that go too far and think they can replace expert data scientists often wind up with problems related to weak understanding and support.
‘Conflict between data science and IT’ manifests itself in many ways, and it often starts because IT treats data science models the way it treats other software projects. But models are developed in an experimental fashion, are built with many different software tools, and need frequent retraining once in production, whereas there is no need to “retrain” software code. Companies that realize the most significant value from data science understand that models are a new type of digital life that calls for different people, processes, and platforms.
We also see many conflicts stem from inadequate access to the scalable infrastructure that data science teams need to work with larger datasets and more complex algorithms to address a broader set of use cases. The solution for data scientists is often ‘shadow IT’ workarounds where work and deployed models exist on private servers or in the public cloud outside of IT’s purview – at the expense of visibility, governance, and security.
Establish a Positive Culture for Data Science
Fortunately, the survey results also provide several suggestions about how to establish a data culture that allows data science to flourish. Since many organizations have trouble measuring the benefits of their data science efforts, it’s not surprising that ‘better metrics for ROI on data science’ is at the top of the list (39.5%). Close behind it, however, are three suggestions that will help to establish a positive culture for data science.
- 39.5% want a ‘clearer definition of needs state from stakeholders’
- 38.4% want ‘training across [the] business in data science concepts’
- 32.6% want a more ‘positive relationship between stakeholders and data science’
Understanding the barriers that inhibit scalable data science is one of the first steps toward becoming one of the ‘advanced’ businesses in this survey. By working together, business stakeholders, data science teams, and the IT community that supports them can better understand each team’s unique needs, establish clear metrics, and focus on the most impactful use cases.
You can view the complete survey results here.