How Janssen R&D Accelerated Model Training on Multi-GPU Machines for Faster Cancer Cell Identification

Video Event

Learn how global pharmaceutical research leader Janssen Research & Development accelerated model training on multi-GPU machines, allowing its researchers to diagnose and characterize cancer cells more quickly and accurately through whole-slide image analysis. The company relies on computational data science research across its immunology, compositional chemistry, and biology groups to develop new drugs, optimize clinical trials, and automate diagnosis techniques. We’ll discuss one instance in which the team was unable to train CNN image classifier models on a large dataset because of memory constraints and difficulty fully utilizing multi-GPU compute resources. They also wanted to run distributed training for hyperparameter tuning so they could build a reusable, scalable image processing ML pipeline that would support use cases across different parts of the organization.

Watch this session to learn how and why Janssen Research & Development implemented the Domino Data Lab data science platform to give researchers self-service access to diverse tools, languages, datasets, and scalable compute, including NVIDIA GPUs, which are critical for training deep learning models on large datasets. We’ll discuss the improvements made to their data loading pipeline and model using Horovod Spark on Domino with NVIDIA GPUs, and how these changes have accelerated their ability to identify cancerous cells.

