Running complex workloads using on-demand GPU-accelerated Spark/RAPIDS clusters

Apache Spark is the de facto standard for processing large datasets and is increasingly used for fitting and scoring complex machine learning models. GPU-accelerated worker nodes can substantially speed up the model training phase while also reducing costs (frequently by orders of magnitude). Although data scientists are usually comfortable using Spark through Scala, Python, or R, provisioning and maintaining the Spark cluster itself can be considerably complex.

We’ll present an integrated solution based on the Domino Data Science Platform, NVIDIA NGC containers, and the RAPIDS Accelerator for Apache Spark. It enables data scientists to easily provision a Spark/RAPIDS cluster with an arbitrary number of GPU-accelerated workers and to access it through their favorite integrated development environment.
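
To make the idea of a GPU-accelerated Spark/RAPIDS cluster more concrete, here is a minimal Python sketch of the Spark settings typically involved in enabling the RAPIDS Accelerator. It is an illustration, not the configuration used in the talk. The helper names (`rapids_spark_conf`, `build_session`), the master URL, and the default core count are hypothetical. The sketch also assumes that the RAPIDS Accelerator jar is already on the cluster's classpath and that each executor is given a single GPU.

```python
def rapids_spark_conf(master_url, num_workers, cores_per_worker=4):
    """Build Spark settings for a cluster of GPU-accelerated executors.

    All values are illustrative; tune them for the actual cluster.
    """
    if num_workers < 1 or cores_per_worker < 1:
        raise ValueError("num_workers and cores_per_worker must be >= 1")
    return {
        "spark.master": master_url,
        "spark.executor.instances": str(num_workers),
        "spark.executor.cores": str(cores_per_worker),
        # Load the RAPIDS Accelerator plugin and enable GPU SQL execution.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Assumption: one GPU per executor, shared by all of its task slots.
        "spark.executor.resource.gpu.amount": "1",
        "spark.task.resource.gpu.amount": str(1 / cores_per_worker),
    }


def build_session(conf, app_name="rapids-demo"):
    """Create a SparkSession from the settings above (requires pyspark)."""
    # Imported lazily so the config helper works without pyspark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in conf.items():
        builder = builder.config(key, value)
    return builder.getOrCreate()


if __name__ == "__main__":
    # Hypothetical master URL; three GPU workers with four cores each.
    conf = rapids_spark_conf("spark://spark-master:7077", num_workers=3)
    for key, value in sorted(conf.items()):
        print(f"{key}={value}")
```

The number of workers is the only parameter that changes when scaling the cluster out. On a managed platform, settings like these are usually applied for the user at provisioning time.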

Speaker: Nikolay Manchev - Principal Data Scientist for EMEA, Domino Data Lab
