Our mission at Domino is to enable organizations to put models at the heart of their business. Models differ from software in fundamental ways: they require much more data during development, they involve a more experimental research process, and they behave non-deterministically. As a result, organizations need new products and processes to enable data science teams to develop, deploy, and manage models at scale.
Today we’re announcing two major new capabilities in Domino that make model development easier and faster for data scientists. “Datasets” offers data scientists the flexibility they need to make use of large data resources when developing models. And “Experiment Manager” gives data scientists a way to track, find, and organize all the ideas they have tested over the course of their research.
Domino Datasets
A frequently cited estimate holds that data scientists spend 80% of their time “wrangling” data: data preparation, data engineering, and feature extraction, all time-consuming activities separate from model development and training. This pain point is magnified in organizations where teams of data scientists work on numerous experiments.
Datasets provide a high-performance, revisioned data store that enables data scientists to track, share, and reuse large file-based data resources, so teams can iterate on research faster and accelerate model development. Rather than storing data in your Domino project files, you can keep it in Datasets, which provide localized collections of data readily available for data science experiments. Datasets have been engineered to handle use cases that involve a large number of individual files, massive file sizes, or both. The ability to share and reuse curated data in Datasets can drastically reduce the time spent wrangling data by eliminating redundant work and ensuring everyone works from the latest and greatest version.
“The Datasets feature has saved us a lot of time,” said Luiz Scheinkman, principal software engineer at machine intelligence company Numenta. “It used to take hours to process data, so we’d run those processes overnight. With Domino Datasets, we can preprocess the data and attach the preprocessed data to an experiment in 21 minutes. This means we can iterate, see our results, and make continuous improvements throughout the day — it makes a huge difference in expediting the whole model development process.”
Domino Datasets also decouple the versioning of data from the versioning of code, enabling higher-fidelity lineage and full reproducibility of experiments. In addition, they allow data privacy and protection controls to be applied at a finer grain. For more about Datasets in Domino 3.3, see the Domino Support site and the accompanying tutorial video.
Domino Experiment Manager
In addition to Datasets, Domino 3.3 introduces the Experiment Manager. Data science differs from other workstreams such as software development in that it involves open-ended exploration and experimentation to find optimal solutions. Traditional knowledge management systems aren’t equipped to track and organize this kind of work, so data scientists often resort to recording metadata manually in spreadsheets or, worse, lose track of the many threads of their experiments.
Data scientists and data science leaders need a single view to track, organize, and manage experiments at various levels of detail. The Experiment Manager provides this visibility in Domino, serving as a “modern lab notebook” for data scientists as they go through the iterative process of training and tuning models. A single, high-performance view into the status, performance, and impact of experiments helps data science teams find relevant past work, perform meta-analysis, and run retrospectives.
The Experiment Manager lets users search, group, sort, and filter experiments, making it easy to find specific past results and view each experiment’s performance in granular detail, while placing those experiments in the broader context of the other ideas they were tested against.
With the Experiment Manager, data scientists can compare the results of two distinct runs on a single screen for rapid analysis, and access detailed logs of their experiments.
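The search, filter, sort, group, and compare operations described above are presented as features of the Experiment Manager's interface. To make the underlying idea concrete, here is a minimal Python sketch over in-memory run records. The `Run` record, its fields, and every function name below are hypothetical illustrations, not Domino's API or data model. The record stores the code commit and the dataset snapshot as separate fields to mirror the decoupled data and code versioning described for Datasets.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Run:
    """One experiment run (hypothetical record layout, not Domino's data model)."""
    run_id: str
    idea: str              # the hypothesis or approach this run tested
    code_commit: str       # code version, recorded separately from the data version
    dataset_snapshot: str  # data version, recorded separately from the code version
    params: dict = field(default_factory=dict)
    metrics: dict = field(default_factory=dict)


def search(runs, text):
    """Case-insensitive substring match on run IDs and idea labels."""
    needle = text.lower()
    return [r for r in runs if needle in r.run_id.lower() or needle in r.idea.lower()]


def filter_runs(runs, predicate):
    """Keep only the runs for which predicate(run) is true."""
    return [r for r in runs if predicate(r)]


def sort_by_metric(runs, metric, descending=True):
    """Order runs by one metric; runs that never logged it go last."""
    logged = [r for r in runs if metric in r.metrics]
    missing = [r for r in runs if metric not in r.metrics]
    logged.sort(key=lambda r: r.metrics[metric], reverse=descending)
    return logged + missing


def group_by_idea(runs):
    """Bucket runs by the idea they tested, preserving first-seen order."""
    groups = defaultdict(list)
    for r in runs:
        groups[r.idea].append(r)
    return dict(groups)


def _flatten(run):
    """Flatten versions, params, and metrics into one comparable mapping."""
    flat = {"code_commit": run.code_commit, "dataset_snapshot": run.dataset_snapshot}
    flat.update({f"param.{k}": v for k, v in run.params.items()})
    flat.update({f"metric.{k}": v for k, v in run.metrics.items()})
    return flat


def compare(a, b):
    """Return {field: (value_in_a, value_in_b)} for every field that differs.

    A field present in only one run shows None for the other.
    """
    fa, fb = _flatten(a), _flatten(b)
    return {
        key: (fa.get(key), fb.get(key))
        for key in sorted(fa.keys() | fb.keys())
        if fa.get(key) != fb.get(key)
    }


# Illustrative sample runs (made-up values, for demonstration only).
runs = [
    Run("run-1", "baseline logistic regression", "a1b2c3", "snap-1", {"C": 1.0}, {"auc": 0.81}),
    Run("run-2", "gradient boosting", "a1b2c3", "snap-1", {"depth": 4}, {"auc": 0.86}),
    Run("run-3", "gradient boosting", "d4e5f6", "snap-2", {"depth": 6}, {"auc": 0.88}),
]

strong = filter_runs(runs, lambda r: r.metrics.get("auc", 0.0) > 0.8)
print([r.run_id for r in sort_by_metric(strong, "auc")])  # ['run-3', 'run-2', 'run-1']
print(list(group_by_idea(runs)))  # ['baseline logistic regression', 'gradient boosting']
print(list(compare(runs[1], runs[2])))  # ['code_commit', 'dataset_snapshot', 'metric.auc', 'param.depth']
```

Because each record keeps its data version alongside its code version, a comparison such as `compare(runs[1], runs[2])` can show that two runs differ in the dataset snapshot they used as well as in their hyperparameters, which is the kind of lineage the decoupled versioning is meant to preserve.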