Apache Spark is the de facto standard for processing large datasets, and it is increasingly used for fitting and scoring complex machine learning models. GPU-accelerated worker nodes can substantially speed up the model training phase while also reducing costs, frequently by orders of magnitude. Data scientists are usually comfortable using Spark through Scala, Python, or R, but provisioning and maintaining the Spark cluster itself can be considerably complex.
We’ll present an integrated solution based on the Domino Data Science Platform, NVIDIA NGC containers, and the RAPIDS Accelerator for Apache Spark. It enables data scientists to easily provision a Spark/RAPIDS cluster with an arbitrary number of GPU-accelerated workers and to access it through their favorite integrated development environment.
Speaker: Nikolay Manchev - Principal Data Scientist for EMEA, Domino Data Lab