Subject archive for "reproducibility," page 3

Data Science

Principles of Collaboration in Data Science

Data science is no longer the specialization of a single person or small group. It is now a key source of competitive advantage, and as a result, the scale of projects continues to grow. Collaboration is critical because it enables teams to take on larger problems than any individual could. It also allows for specialization and a shared context that reduces dependency on "unicorn" employees who don't scale and are a major source of key-man risk. The problem is that collaboration is a vague term that blurs multiple concepts and best practices. In this post, we clarify the differences between repeatability, reproducibility, and, whenever possible, the gold standard of replicability. By establishing best practices for frictionless in-team and cross-team collaboration, you can dramatically improve the efficiency and impact of your data science efforts.

By Eduardo Ariño de la Rubia · 17 min read

Data Science

Achieving Reproducibility with Conda and Domino Environments

Managing “environments” (i.e., the set of packages, configuration, etc.) is a critical capability of any data science platform. Not only does environment setup waste time when onboarding people, but configuration issues across environments can also undermine reproducibility and collaboration, and can introduce delays when moving models from development to production.

By Eduardo Ariño de la Rubia · 8 min read

Data Science

Domino raises $10.5M in funding for collaborative, reproducible data science

Today we’re announcing that we have raised $10.5 million in a funding round led by Sequoia Capital.

By Nick Elprin · 4 min read

Data Science

Reproducible Research in Computational Sciences

This guest post was written by Arnu Pretorius, a Master's student in Mathematical Statistics at the MIH Media Lab, Stellenbosch University. Arnu's research interests include machine learning and statistical learning theory.

By Arnu Pretorius · 11 min read

Data Science

Providing Digital Provenance: from Modeling through Production

At last week's useR! R User conference, I spoke on digital provenance, the importance of reproducible research, and how Domino has solved many of the challenges faced by data scientists when attempting this best practice. More on the topic, and a recording of the talk, below.

By Eduardo Ariño de la Rubia · 1 min read

Data Science

The Real Value of Containers for Data Science

"Every year, $50 of your taxes is invested in research that can't be reproduced."
Erik Andrejko, VP Science, The Climate Corporation, speaking at Strata+Hadoop World San Jose 2016

By Daniel Chalef · 3 min read
