To build or not build. That is the question.
We see many organizations struggling with whether or not to build a data science platform. But this is too narrow a question. For most organizations, what is really important is scaling data science. In fact, a study by Accenture found that 75% of business executives believed they would go out of business in five years if they didn’t scale data science.
So the question they should be answering is: What platform will help us accelerate our ability to scale data science?
Organizations are rapidly realizing that they must remove infrastructure friction, eliminate siloed work, and accelerate the velocity of models into production in order to scale data science across the organization. This is because productivity drops dramatically when team members are using different tools, struggling to set up environments, storing data science artifacts in different places, and manually shepherding projects through the data science lifecycle. Similarly, when IT has to support bespoke data science hardware and tools, and recreate deployment processes for each model, its support burden and cost escalate.
That’s when someone inevitably says, “Stop the insanity! We need a data science platform!”
And that’s true - they do need a platform to scale data science. But to do it correctly they need a platform that orchestrates the end-to-end data science lifecycle, provides flexibility, and turbocharges productivity. As shown in the figure below, the platform needs to provide basic requirements such as on-demand access to data, scalable compute, and tools. It also needs to provide process transformation requirements such as project management, reproducibility, knowledge management, and governance.
Unfortunately, what we often see is organizations only including self-service data, compute, and tools in the scope for their platform. That appears easy to deliver and early progress is exciting and shows promise. But that only solves one piece of the scale problem - and only for a short period of time. Often this limited platform limps along for a while and then is abandoned because too many capabilities are missing. Other times, additional requirements will be attempted but capabilities like security, version control, collaboration, orchestration, knowledge management, and governance are hard to deliver quickly, particularly if they were not scoped in the project initially. The platform becomes a never-ending development project that takes years to deliver value.
Building a platform to deliver all the requirements takes significant resources to scope, build, and manage, particularly the high-value process transformation requirements. It can also take significant time - we have seen customers spend years trying to deliver the high-value requirements. Organizations also need to plan for the constant evolution of the tool ecosystem and deployment options. That requires ongoing support and a dedicated development team.
And there is no time to lose, as time is money for data science. Delays in delivering models to the business come at great opportunity cost.
So if the next sentence after “We need a data science platform!” is “Let’s build it! It will be faster/easier/cheaper/better!”, stop and really think it through. Can your organization truly deliver a robust data science platform that meets all the needs of your organization? Can you support it over the long term? Can you deliver it in months, not years?
Even if you answer yes to all these questions, will the effort and investment required deliver competitive differentiation to your organization? Or is the real competitive differentiator scaling data science? That’s the conclusion our customers have come to.
“Our leadership directive is, if it’s not a differentiating capability, we shouldn’t be building it; we should be looking to buy it. In my experience, there is initial excitement about building in-house tools, and they’re great for two years, and then by year three, nobody cares about maintaining them anymore.”
Senior Director of Decision Sciences, Software Services
SOURCE: Forrester Total Economic Impact of the Domino Data Lab Enterprise MLOps Platform
7 key questions to ask before building a data science platform
To decide whether building or buying is the right approach for your organization, answer these seven questions:
- What capabilities are needed to scale data science?
- If we build a platform, how long will it take to fully deliver those capabilities?
- Are our needs really unique compared to commercially available platforms on the market?
- Can we deliver the level of focus, commitment, skills, and funding needed to develop, support, and augment a platform over the long term?
- Can we future-proof our platform so we don’t have to start over or do significant rework as requirements change?
- What is the difference in opportunity cost between building and buying a platform?
- Is a data science platform a core competency we should support in our organization?
For more information, check out our new whitepaper, The True Cost of Building a Data Science Platform. This paper provides information about the capabilities, costs, and considerations for building a data science platform, including:
- A lessons-learned case study from an executive who went down the build route and never wants to make that mistake again.
- Details on the capabilities each person in your organization needs from a data science platform.
- A detailed checklist of features for each of the eight capabilities.