Subject archive for "model-management," page 12

Data Science

The Cost of Doing Data Science on Laptops

At the heart of the data science process are the resource-intensive tasks of modeling and validation. During these tasks, data scientists will try and discard thousands of temporary models to find the optimal configuration. Even for small data sets, this can take hours to process.

By Eduardo Ariño de la Rubia | 6 min read

Data Science

Benchmarking Predictive Models

It's been said that debugging is harder than programming. If we, as data scientists, are developing models ("programming") at the limits of our understanding, then we're probably not smart enough to validate those models ("debug") effectively.

By Eduardo Ariño de la Rubia | 13 min read

Data Science

Data Science on AWS: Benefits and Common Pitfalls

More than two years ago, we wrote about the misguided fear of the cloud among many enterprise companies. How quickly things change! Today, every enterprise we work with is either using the cloud or in the process of moving there. We work with companies that insisted, just two years ago, that they “can’t use the cloud” — and are now undertaking strategic initiatives to have “real work in AWS by the end of 2017.” We see this happening across industries including finance, insurance, pharmaceuticals, retail, and even government.

By Nick Elprin | 4 min read

Data Science

Principles of Collaboration in Data Science

Data science is no longer a specialization of a single person or small group. It is now a key source of competitive advantage, and as a result, the scale of projects continues to grow. Collaboration is critical because it enables teams to take on larger problems than any individual could. It also allows for specialization and a shared context that reduces dependency on "unicorn" employees who don't scale and are a major source of key-man risk. The problem is that collaboration is a vague term that blurs multiple concepts and best practices. In this post, we clarify the differences between repeatability, reproducibility, and, whenever possible, the gold standard of replicability. By establishing best practices of frictionless in-team and cross-team collaboration, you can dramatically improve the efficiency and impact of your data science efforts.

By Eduardo Ariño de la Rubia | 17 min read

Data Science

Fitting Gaussian Process Models in Python

A common applied statistics task involves building regression models to characterize non-linear relationships between variables. It is possible to fit such models by assuming a particular non-linear functional form, such as a sinusoidal, exponential, or polynomial function, to describe one variable's response to the variation in another. Unless this relationship is obvious from the outset, however, this can involve extensive model selection procedures to ensure the most appropriate model is retained. Alternatively, a non-parametric approach can be adopted by defining a set of knots across the variable space and using a spline or kernel regression to describe arbitrary non-linear relationships. However, knot layout procedures are somewhat ad hoc and can also involve variable selection. A third alternative is to adopt a Bayesian non-parametric strategy and directly model the unknown underlying function. For this, we can employ Gaussian process models.

By Chris Fonnesbeck | 27 min read

Data Science

Introducing the Data Science Maturity Model

Many organizations have been underwhelmed by the return on their investment in data science. This is due to a narrow focus on tools, rather than a broader consideration of how data science teams work and how they fit within the larger organization. To help data science practitioners and leaders identify their existing gaps and direct future investment, Domino has developed a framework called the Data Science Maturity Model (DSMM).

By Mac Steele | 2 min read
