Domino 3.3: Datasets and Experiment Manager

Domino | 2019-03-20 | 5 min read

Our mission at Domino is to enable organizations to put models at the heart of their business. Models are so different from software — e.g., they require much more data during development, they involve a more experimental research process, and they behave non-deterministically — that organizations need new products and processes to enable data science teams to develop, deploy and manage them at scale.

Today we’re announcing two major new capabilities in Domino that make model development easier and faster for data scientists. “Datasets” offers data scientists the flexibility they need to make use of large data resources when developing models. And “Experiment Manager” gives data scientists a way to track, find, and organize all the ideas they have tested over the course of their research.

As is often cited, data scientists spend 80% of their time “wrangling” data (data prep, data engineering, feature extraction), all time-consuming activities that come before model development and training. This pain point is magnified in organizations with teams of data scientists working on numerous experiments.

Domino Datasets

Datasets provide a high-performance, revisioned data store that enables data scientists to track, share, and reuse large file-based data resources so teams can iterate on research faster and accelerate model development. Rather than storing data in your Domino project files, Datasets provide localized collections of data readily available for data science experiments. Datasets have been engineered to handle use cases that involve a large number of individual files, very large files, or both. The ability to share and reuse curated data in Datasets can drastically reduce the time spent wrangling data by eliminating redundant work and ensuring everyone works from the latest version of the data.

Datasets in Domino Data Lab Enterprise MLOps platform

“The Datasets feature has saved us a lot of time,” said Luiz Scheinkman, principal software engineer at machine intelligence company Numenta. “It used to take hours to process data, so we’d run those processes overnight. With Domino Datasets, we can preprocess the data and attach the preprocessed data to an experiment in 21 minutes. This means we can iterate, see our results, and make continuous improvements throughout the day -- it makes a huge difference in expediting the whole model development process.”

Domino Data Lab Enterprise MLOps workspace

Domino Datasets also decouple the versioning of data from the versioning of code, allowing for higher-fidelity lineage with full reproducibility of experiments. Datasets also allow for more fine-grained application of data privacy and protection. For more about Datasets in 3.3, see the Domino Support site and the accompanying video tutorial.

Domino Experiment Manager

In addition to Datasets, Domino 3.3 introduces the Experiment Manager. Data science differs from other workstreams like software development in that it involves open-ended exploration and experimentation to find optimal solutions. Traditional knowledge management systems aren't equipped to track and organize work like this, so data scientists often resort to manually preserving metadata in spreadsheets or, worse, lose track of the many threads of their experiments.

Data scientists and data science leaders require a single view to track, organize, and manage experiments at various levels of detail. The Experiment Manager provides this visibility in Domino, serving as a "modern lab notebook" for data scientists as they go through the iterative process of training and tuning models. A single, high-performance view into the status, performance, and impact of experiments lets data science teams find relevant past work, perform meta-analysis, and run retrospectives.

The Experiment Manager allows the user to search, group, sort, and filter experiments, making it easy to find specific past results and view experiments’ performance in granular detail, while putting those experiments in the broader context of the ideas they were tested against.
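To make the search, group, and sort operations concrete, here is a minimal Python sketch over in-memory experiment records. It is not Domino's API or data model: the `Run` class, the function names, and the sample values are all hypothetical and exist only to illustrate the idea.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical experiment record (an assumption, not Domino's schema)."""
    run_id: str
    tags: set = field(default_factory=set)
    params: dict = field(default_factory=dict)
    accuracy: float = 0.0


def search(runs, tag):
    """Keep only the runs that carry the given tag."""
    return [r for r in runs if tag in r.tags]


def group_by_param(runs, name):
    """Group runs by the value of one parameter."""
    groups = defaultdict(list)
    for r in runs:
        groups[r.params.get(name)].append(r)
    return dict(groups)


def sort_by_accuracy(runs):
    """Order runs from highest to lowest accuracy."""
    return sorted(runs, key=lambda r: r.accuracy, reverse=True)


# Made-up sample data, for illustration only.
runs = [
    Run("run-1", {"baseline"}, {"model": "logreg"}, 0.81),
    Run("run-2", {"tuned"}, {"model": "gbm"}, 0.88),
    Run("run-3", {"tuned"}, {"model": "logreg"}, 0.84),
]

tuned = sort_by_accuracy(search(runs, "tuned"))
by_model = group_by_param(runs, "model")

print([r.run_id for r in tuned])
print({m: [r.run_id for r in rs] for m, rs in by_model.items()})
```

Keeping the records in a spreadsheet forces this kind of filtering to be done by hand; a tool that stores tags, parameters, and metrics per run can answer the same questions directly.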

Searching for tag in Domino Data Lab experiment manager

Data visualization in Domino Data Lab experiment manager

With the Experiment Manager, data scientists can compare the results of two distinct runs in one screen for rapid analysis, and access detailed log records of experiments.
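As a rough illustration of what a side-by-side comparison reports, the sketch below diffs two runs stored as plain dictionaries. The `compare_runs` function, the keys, and the values are hypothetical and are not taken from Domino; the point is only that a comparison surfaces what changed between two runs and hides what stayed the same.

```python
def compare_runs(a, b):
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Made-up parameters and metrics for two runs, for illustration only.
run_a = {"model": "gbm", "learning_rate": 0.1, "n_estimators": 200, "accuracy": 0.86}
run_b = {"model": "gbm", "learning_rate": 0.05, "n_estimators": 200, "accuracy": 0.88}

diff = compare_runs(run_a, run_b)
for key, (left, right) in diff.items():
    print(f"{key}: {left} -> {right}")
```

Here only `accuracy` and `learning_rate` appear in the result, since `model` and `n_estimators` are identical in both runs.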

Comparing runs in Domino Enterprise MLOps platform

For more about the Experiment Manager in 3.3, see the Domino Support site.

Domino 3.3 is now generally available. Be sure to check out the product demo to see the latest platform capabilities.

