Image source: https://www.r-project.org/
The latest major release of R arrived on 24 April 2020, taking the statistical computing language to version 4.0. The release includes new features, bug fixes, and changes to the behavior and syntax of existing functions. For example, `data.frame()` and `read.table()` now default to `stringsAsFactors = FALSE`.
Because R 4.0 changes core components, packages installed under earlier versions of R need to be reinstalled to work correctly. This can cause model reproducibility headaches. It also revives the age-old challenge of keeping versions consistent across a team of data scientists, so that they can co-develop models, run each other's work on their own machines, and deploy models into production.
A full list of the changes made in R 4.0 can be found at the Comprehensive R Archive Network.
Environment Management is a key component of Domino’s platform. It lets teams configure and maintain multiple analytics environments that can be applied to their projects and workbenches safely and seamlessly, with little or no IT involvement. Developers don’t need to configure their own local machines or set up new virtual environments to test updates or new packages, and once an environment is deployed it can easily be rolled out to other team members.
This functionality is often used to maintain “gold standard” production environments, in which every package and software version has been tested and approved for production use, alongside research and development environments where data scientists can try the latest packages and software updates.
Like everything else in Domino, environment management is fully version-controlled. Changes can be tracked over time, and a complete environment can be restored if an underlying package or software component introduces a breaking change. This is critical for a major upgrade like R 4.0, which could otherwise break your models. Having a platform that can recall and use the exact environment needed to reproduce a model, even years later, is key.
So how does Domino help?
The tools and technology that underpin data science are constantly evolving. Traditionally, data science teams have stood up virtual machine environments as development hubs and tested packages there, then rolled out instructions for each data scientist to follow when updating their own development environment. This approach often led to package and version incompatibilities that required troubleshooting and slowed data scientists down.
Environment Management makes it easy for data scientists to test new packages and software updates. More importantly, these environments are easy to share across data science teams, so the days of spending hours setting up an environment just to test a peer’s work are long gone.
This gives data scientists the ability to set up gold-standard, production-worthy environments that can be trusted, without sacrificing the freedom and flexibility to try new packages and software as they become available.