Anaconda

What is Anaconda?

Anaconda is an open-source distribution of the Python and R programming languages for data science that aims to simplify package management and deployment. Package versions in Anaconda are managed by its package management system, conda, which analyzes the current environment before running an installation so that it does not disrupt other frameworks and packages.
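The idea of analyzing the current environment before installing can be made concrete with a minimal Python sketch. It is not how conda works internally. It only compares a new package's version requirements, written as half-open [low, high) ranges, against a snapshot of installed versions and lists the packages an install would disrupt. The package snapshot, the requirement ranges, and the `find_conflicts` helper are all hypothetical.

```python
# Toy pre-install check: would installing a new package disturb what is
# already in the environment? All names and versions here are hypothetical.


def parse(version: str) -> tuple[int, ...]:
    """Turn '1.26.4' into (1, 26, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def find_conflicts(installed: dict[str, str],
                   requires: dict[str, tuple[str, str]]) -> list[str]:
    """List installed packages whose version lies outside the required [low, high)."""
    conflicts = []
    for name, (low, high) in requires.items():
        current = installed.get(name)
        if current is None:
            continue  # not installed yet, so there is nothing to disrupt
        if not parse(low) <= parse(current) < parse(high):
            conflicts.append(f"{name} {current} not in [{low}, {high})")
    return conflicts


# Hypothetical snapshot of the current environment.
installed = {"numpy": "1.26.4", "tensorflow": "2.15.0"}
# Hypothetical new package that needs numpy >= 2.0 and < 3.0.
new_package_requires = {"numpy": ("2.0", "3.0")}

print(find_conflicts(installed, new_package_requires))
# ['numpy 1.26.4 not in [2.0, 3.0)']
```

Comparing versions as integer tuples rather than strings avoids the classic mistake of sorting "1.9" after "1.10"; a real tool would use a full version-parsing library instead.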

The Anaconda distribution comes with more than 250 packages preinstalled. Over 7,500 additional open-source packages can be installed from PyPI or with conda, the package and virtual environment manager. The distribution also includes Anaconda Navigator, a graphical user interface (GUI) that serves as an alternative to the command line. Navigator lets users launch applications and manage conda packages, environments, and channels without typing commands; it can search for packages, install them in an environment, run them, and update them.

[Figure: Anaconda Navigator (GUI)]

The biggest difference between conda and the pip package manager is how they manage package dependencies, which is a significant challenge in Python data science. When pip installs a package, it also installs the package's Python dependencies, but it does not take all previously installed packages into account when choosing their versions. It will install a package and its dependencies regardless of the state of the existing installation; recent pip versions only print a warning about the resulting conflicts afterward. As a result, a user with a working installation of, for example, TensorFlow can find that it stops working after using pip to install a different package that requires a different version of the NumPy library than the one TensorFlow uses. In some cases, the package may appear to work but produce different results when run. In contrast, conda analyzes the current environment, including everything already installed, together with any version constraints the user specifies (e.g., TensorFlow 2.0 or higher). It then works out how to install a compatible set of dependencies and reports a conflict if this cannot be done.
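The difference between installing packages one at a time and solving for the whole environment can be illustrated with a toy solver. The sketch below is a brute-force search over a small, entirely hypothetical package index (the `plotlib` package, the version numbers, and the `solve` helper are invented for illustration); conda's real solver is far more efficient. It picks one version of every requested package and every dependency so that all constraints hold at once, and returns `None` when no such set exists.

```python
from __future__ import annotations

from itertools import product

# Hypothetical package index: package -> version -> {dependency: allowed versions}.
INDEX = {
    "tensorflow": {"2.0": {"numpy": {"1.16", "1.17"}},
                   "2.1": {"numpy": {"1.17", "1.18"}}},
    "plotlib": {"1.0": {"numpy": {"1.18", "1.19"}},
                "2.0": {"numpy": {"1.20"}}},
    "numpy": {v: {} for v in ("1.16", "1.17", "1.18", "1.19", "1.20")},
}


def version_key(v: str) -> tuple[int, ...]:
    """Compare versions numerically, so '1.10' sorts after '1.9'."""
    return tuple(int(part) for part in v.split("."))


def closure(names):
    """All packages reachable from `names` through any version's dependencies."""
    seen, stack = set(), list(names)
    while stack:
        name = stack.pop()
        if name not in seen:
            seen.add(name)
            for deps in INDEX[name].values():
                stack.extend(deps)  # extends with the dependency names (dict keys)
    return sorted(seen)


def solve(requested: dict[str, set[str]]) -> dict[str, str] | None:
    """Pick one version per package so every constraint holds, or return None.

    Tries every combination, newest versions first; fine only for a toy index.
    """
    names = closure(requested)
    candidates = [sorted(INDEX[n], key=version_key, reverse=True) for n in names]
    for combo in product(*candidates):
        choice = dict(zip(names, combo))
        # The user's own constraints, e.g. "TensorFlow 2.0 or higher".
        if any(choice[pkg] not in allowed for pkg, allowed in requested.items()):
            continue
        # Every chosen version's dependencies must be met by the other choices.
        if all(choice[dep] in allowed
               for pkg, ver in choice.items()
               for dep, allowed in INDEX[pkg][ver].items()):
            return choice
    return None  # unsatisfiable: report the conflict instead of installing


print(solve({"tensorflow": {"2.0", "2.1"}, "plotlib": {"1.0", "2.0"}}))
# {'numpy': '1.18', 'plotlib': '1.0', 'tensorflow': '2.1'}
print(solve({"tensorflow": {"2.0", "2.1"}, "plotlib": {"2.0"}}))
# None
```

An installer that simply took the newest `plotlib` would pull in NumPy 1.20, which no TensorFlow version in this index accepts. Solving for the environment as a whole instead settles on `plotlib` 1.0 and NumPy 1.18, and when the user insists on `plotlib` 2.0 it reports that no compatible set exists.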

Open-source packages can be installed individually from the Anaconda repository, Anaconda Cloud (anaconda.org), or the user's own private repository or mirror using the conda install command. Anaconda Inc. compiles and builds the packages in the Anaconda repository itself and provides binaries for Windows 32/64-bit, Linux 64-bit, and macOS 64-bit. Anything available on PyPI can be installed into a conda environment using pip, and conda keeps track of which packages it installed itself and which ones pip installed.
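At the level of Python package metadata, one way to see which tool installed a package is the optional `INSTALLER` file described in the Python packaging specification for recording installed projects. The sketch below reads it with the standard library's `importlib.metadata`; the `installers_by_package` helper is hypothetical. It is an assumption that conda writes `conda` into this file for the packages it installs, and not every package has the file at all, so `conda list`, which shows pip-installed packages separately, remains the authoritative view of a conda environment.

```python
from importlib.metadata import distributions


def installers_by_package() -> dict[str, str]:
    """Map each installed distribution to the tool named in its INSTALLER file."""
    result = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip distributions with broken or missing metadata
        # read_text returns None when the installer never wrote the file.
        installer = dist.read_text("INSTALLER")
        result[name] = installer.strip() if installer else "unknown"
    return result


if __name__ == "__main__":
    for name, tool in sorted(installers_by_package().items()):
        print(f"{name:30} {tool}")
```

Run inside a conda environment where some packages came from pip, the output would typically list `pip` for those, though the exact labels depend on which tools wrote the metadata.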

Differences between Anaconda and Data Science Platforms

Anaconda offers some of the functionality found in data science platforms such as Domino, but only a subset of it. Domino and other platforms support not only package management but also capabilities such as collaboration, reproducibility, scalable compute, and model monitoring. Conda can also be used within the Domino environment.
