By David Schulman, Head of Partner Marketing, and Caroline Phares, Global Head of Health and Life Sciences
From early disease detection to intelligent clinical trial designs to personalized medicine, the promise of AI/ML in healthcare and life sciences is great, with “more than 57% of healthcare and life sciences organizations rating AI/ML important or very important,” according to Ventana Research in their recent white paper Top AI Considerations for Chief Data and Analytics Executives.
- Janssen has accelerated training of deep learning models, in some cases as much as 10 times faster, to more quickly and accurately diagnose and characterize cancer cells through whole-slide image analysis.
- Eli Lilly combines multimodal data* across electronic medical records, insurance payer claims, real-world evidence (RWE), and other sources to translate population-level patterns into the next-best-action for patient treatments (*Note: 36:56 into panel recording).
- Evidation continuously measures the health of individuals using patient-generated health data (PGHD) from apps and wearable technologies, such as smartphones, activity trackers, and smartwatches.
- NVIDIA’s Mona Flores, Global Head of Medical AI, describes using federated learning with privacy-protected data to recognize diseases and predict which treatments will be needed.
But machine learning models require data, and balancing the openness and flexibility required by data scientists for experimentation and innovation with governance and data privacy regulations is a major challenge for data and analytics executives managing software and infrastructure in global healthcare and life sciences enterprises.
Managing the Proliferation of Distributed, Multimodal Data
Healthcare and life sciences organizations work with massive amounts of disparate data, including electronic medical records, billing, patient visit reports, insurance claims, radiological images, PGHD, and many more. This data is often distributed across cloud regions, providers, and on-premises data sources – 32% of organizations report using more than 20 data sources, while 58% self-report as using “big data”, with petabyte-size databases becoming more common, according to Ventana Research. For example, Janssen’s histopathology images can be between two and five gigabytes in size, and larger clinical trials can include over a hundred thousand images.
While managed cloud databases offer promise, ingress and egress costs can dramatically hinder data science efforts. Ventana Research notes that extracting and analyzing a petabyte of data could cost as much as $50,000. Data gravity performance considerations (i.e., reducing model latency by co-locating data and compute) and data residency/sovereignty regulations further complicate data collection and processing, often locking data sets in a single geography. These constraints, paired with regulations such as HIPAA and GDPR, highlight the importance of hybrid and multi-cloud configurations to ensure appropriate data management and geofencing. Ventana Research highlights that:
“By 2026, nearly all multinational organizations will invest in local data processing infrastructure and services to mitigate against the risks associated with data transfer.”
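To put the figures above in perspective, here is a rough back-of-envelope sketch in Python. It uses only the numbers quoted in this section (two to five gigabytes per image, a hundred thousand images, and up to $50,000 per petabyte). The decimal unit conversion, the linear cost scaling, and the function names are our own illustrative assumptions, not figures or methods from Ventana Research or Janssen.

```python
# Back-of-envelope sizing based on the figures quoted in this section.
# Assumptions (not from the source): decimal units (1 PB = 1,000,000 GB),
# exactly 100,000 images, and cost that scales linearly with data volume.

GB_PER_PB = 1_000_000
COST_PER_PB_USD = 50_000      # quoted upper estimate to extract and analyze 1 PB
IMAGES_IN_TRIAL = 100_000     # "over a hundred thousand images", used as a floor
IMAGE_SIZE_GB = (2, 5)        # quoted per-image size range


def dataset_size_pb(image_count: int, image_size_gb: float) -> float:
    """Total data set size in petabytes for a given image count and size."""
    return image_count * image_size_gb / GB_PER_PB


def estimated_cost_usd(size_pb: float, cost_per_pb: float = COST_PER_PB_USD) -> float:
    """Upper-bound cost estimate, assuming cost is proportional to volume."""
    return size_pb * cost_per_pb


for size_gb in IMAGE_SIZE_GB:
    size_pb = dataset_size_pb(IMAGES_IN_TRIAL, size_gb)
    cost = estimated_cost_usd(size_pb)
    print(f"{size_gb} GB per image: {size_pb:.1f} PB, up to ${cost:,.0f}")
```

Under these assumptions, a single large trial lands somewhere between 0.2 and 0.5 petabytes. Real cloud pricing varies by provider, region, and tier, so treat the result as an order-of-magnitude guide only.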
Governing AI/ML with Hybrid/Multi-Cloud MLOps
While data is distributed, data science is a team sport, requiring fast experimentation, easy access to data, and easy reproducibility for true innovation. AI/ML/Analytics “Centers of Excellence” (CoE) strategies are becoming more common, compounding knowledge through collaboration while providing infrastructure and governance. Ventana Research notes that 8 in 10 organizations realize the importance of governing AI/ML. Johnson & Johnson has an internal Data Science Council helping to “integrate a company’s data science community into the business workflows, enabling faster application of machine learning models, feedback, and impact.” Additionally, these CoEs foster innovation by ensuring data scientists have access to the required data, tooling, and infrastructure so they can focus on building breakthrough models rather than DevOps.
Many healthcare and life sciences machine learning use cases, such as computer vision (e.g., Janssen’s deep learning use case), require purpose-built AI infrastructure, including GPUs from firms like NVIDIA. Taking into account data transfer costs, security, regulations, and performance, it often makes more sense to bring the compute/processing to the data, as opposed to transferring or replicating data sets across clouds or geographies.
In theory, GPUs on cloud infrastructure solve the problem, until cost and performance are accounted for. Protocol recently reported on the trend of companies shifting ML data and models back into in-house, on-premises settings, “spending less money and getting better performance.” Data science workloads are variable by nature, with hard-to-predict bursts of compute required for training models. Repatriating some of these workloads to in-house infrastructure can significantly reduce costs while improving performance.
For ML CoEs, governing AI/ML from a single-pane-of-glass becomes even more challenging in hybrid/multi-cloud and on-prem environments, especially across distributed data and the blend of on-premises and cloud infrastructure present in global firms. Data and analytics executives have difficult decisions to make across the sprawling data and analytics technology stack, from data management to analytics to data science.
Data Science Platform Considerations for Healthcare and Life Sciences Organizations
Domino Data Lab is at the frontier of hybrid/multi-cloud support for AI workloads with our recent Nexus Hybrid Cloud Data Science Platform announcement. A true hybrid data science platform enables data scientists to access data, compute resources, and code in every environment where the company operates, in a secure, governed fashion. Our deep collaboration with NVIDIA and support for the broader data and analytics ecosystem give data and analytics executives confidence in fostering AI/ML innovation while providing the flexibility required for enterprise-wide governance.
Ventana Research emphasizes the importance of an open and flexible data science platform, “future-proofing your data science practice in the face of evolving hybrid strategies, ever-changing data science innovations, and maximize value from purpose-built AI infrastructure on-premises or in the cloud.” To learn more, check out their white paper.