The Pros and Cons of Spark in a Modern Enterprise Analytics Stack


Spark is a distributed computing framework that has skyrocketed in popularity over the last several years for data engineering and analytics use cases. This paper provides a brief overview of Spark’s strengths and weaknesses in the context of data science and machine learning workflows.

While Spark is extremely effective for certain types of workloads on very large datasets, it has drawbacks, including performance overhead for some workloads, onerous setup and management, and competition from more modern distributed computing frameworks. Enterprises that understand these pros and cons can implement an analytics technology strategy that incorporates Spark for projects that benefit from it and supports alternatives when its complexity is unnecessary or even detrimental to the business.
