Subject archive for "python," page 9

Data Science

Multicore Data Science with R and Python

This post shows a number of different packages and approaches for leveraging parallel processing with R and Python.

By Eduardo Ariño de la Rubia | 16 min read

Data Science

Sampling Based Methods for Class Imbalance in Datasets

Imagine you are a medical professional who is training a classifier to detect whether an individual has an extremely rare disease. You train your classifier, and it yields 99.9% accuracy on your test set. You're overcome with joy by these results, but when you check the labels outputted by the classifier, you see it always outputted "No Disease," regardless of the patient data. What's going on?!

By Manojit Nandi | 11 min read

Data Science

Fitting Gaussian Process Models in Python

A common applied statistics task involves building regression models to characterize non-linear relationships between variables. It is possible to fit such models by assuming a particular non-linear functional form, such as a sinusoidal, exponential, or polynomial function, to describe one variable's response to the variation in another. Unless this relationship is obvious from the outset, however, this can involve extensive model selection procedures to ensure the most appropriate model is retained. Alternatively, a non-parametric approach can be adopted by defining a set of knots across the variable space and using a spline or kernel regression to describe arbitrary non-linear relationships. However, knot layout procedures are somewhat ad hoc and can also involve variable selection. A third alternative is to adopt a Bayesian non-parametric strategy, and directly model the unknown underlying function. For this, we can employ Gaussian process models.

By Chris Fonnesbeck | 27 min read

Data Science

Achieving Reproducibility with Conda and Domino Environments

Managing “environments” (i.e., the set of packages, configuration, etc.) is a critical capability of any Data Science Platform. Not only does environment setup waste time on-boarding people, but configuration issues across environments can undermine reproducibility and collaboration, and can introduce delays when moving models from development to production.

By Eduardo Ariño de la Rubia | 8 min read

Data Science

Python 3.6 with Domino in Minutes

For Pythonistas like me, the holidays started a little early with today's release of Python 3.6.

By Mark Silverberg | 2 min read

Data Science

Python for SAS Users: The Pandas Data Analysis Library

This post is a chapter from Randy Betancourt's Python for SAS Users quick start guide. Randy wrote this guide to familiarize SAS users with Python and Python's various scientific computing tools.

By Randy Betancourt | 15 min read
