Mashable is “not for the casually curious,” according to Chief Data Scientist Dr. Haile Owusu. The global, multi-platform media and entertainment company has built its business publishing deep coverage of viral topics.
Mashable relies on data science to predict how much social engagement each piece of content will get, and has become a core driver of Mashable’s strategy to spread content through influential circles.
By deploying a Domino Data Lab platform, Mashable’s data scientists can work in a flexible and collaborative environment that allows them to leverage their range of expertise while accommodating individual preferences for specific tools and methods. The result? Faster innovation and ability to explore new areas in data science and deep learning.
“One thing about media that I think is a little bit different from other environments in which data science is prosecuted is that you don’t know very much,” said Owusu. “There are no fundamental models that give you a basic theory for how media propagates primarily in the social media world. This is a set of extremely unsolved questions.”
Mashable relies on input and collaboration from various team members with different areas of expertise to address a broad range of questions.
“We needed a consolidated environment where we could build models, chat about models, critique our models and also deploy models into production,” said Owusu. Such an environment should accelerate the team’s pace of innovation by allowing researchers to:
- Provision computing resources easily, so they can quickly get to work building models at broad scale without worrying about infrastructure. This process was “unnecessarily cumbersome” in the legacy environment, according to Owusu.
- Experiment collaboratively and coherently, encouraging input and commentary in a central repository that captures data and versions of analysis on top of the data.
- Push models to production, without getting stuck on implementation details.
Mashable considered using its own engineering resources to build a data science hub, but it would have taken too long and constrained data science progress while consuming at least two full-time employees’ time. Instead, Mashable implemented Domino. The first major use of the platform, referred to internally as “Mash That,” was an initiative to consolidate the company’s data into one central repository. Mashable coordinated extract, transform and load (ETL) processes using Domino’s scheduling functionality. From there, Mashable started running reports through Domino, feeding dashboards and presentations to executives and stakeholders across the company. “It has become a central point for our basic reporting needs,” said Owusu.
Next, Mashable standardized its data science efforts on the Domino platform. Domino satisfies Mashable’s aforementioned needs, allowing researchers to “commandeer computing resources” an order of magnitude faster than before, without requiring engineering support. It provides the ability to essentially run races between different models, easily do parameter searches and make results apparent to the entire team. The team can then comment on the results and formulate adjustments to the experiments. And the transparency of Domino allows one team member to pick up where another left off if necessary.
“It’s the closest I’ve seen to kind of a computational laboratory and that’s really what I was looking for,” said Owusu. “The platform has been really, really remarkable.” Beyond the platform capabilities, Mashable values its partnership with Domino because of the customer service Domino provides along two dimensions: Domino provides fast resolution to Mashable’s support needs, and Domino looks to Mashable for feedback and alignment on its vision and product roadmap.
“I would recommend Domino Data Lab unreservedly,” noted Owusu.
The Domino Effect
As a result of implementing Domino, Mashable innovates faster and is more productive. Every member of the Mashable team can collaborate, leveraging each other’s unique skill sets and ideas, with the flexibility to work using the tools they’re most comfortable with. The reproducibility Domino provides is another benefit.
One area of innovation that’s been possible to explore with Domino: deep learning approaches to understanding images. This became an all-hands-on-deck operation, involving a large parameter search and numerous data scientists going down different paths. Thanks to Domino, all of those paths and parameters were well documented. And when any one path led to a seemingly dead end, a different team could fork it in a different direction without having to recreate any of the work that had already been done. Without Domino, it would have been very onerous to coordinate the parallel experimentation paths and to prevent redundancies.
“Domino has been a really ideal platform for providing a coherent, collaborative environment,” summarized Owusu. “If you’re going to do the science component of data science, you really have to have a methodology for running experiments, building on models and theorizing on top of data. That’s much more of a lab environment, and to my mind, Domino is the only group in the space that’s addressing things from that point of view.”