Skip to content

    Beyond Spark: Dask and Ray as Multi-node Accelerated Compute Frameworks

    Apache Spark has been the incumbent distributed compute framework for the past 10+ years. But the overhead and complexity of Spark has been eclipsed by new frameworks like Dask and Ray.

    We'll discuss the history of the three, their intended use-cases, their strengths, and their weaknesses. From individual trade-offs, we pose the question of how to select the right framework based on the available infrastructure, data volumes, workload complexity, etc.

    Finally, we’ll explore their capabilities in the context of GPU-accelerated computing, presenting an integrated solution that enables data scientists to easily provision a Spark/Ray/Dask cluster and access it through an integrated development environment.

    Speaker: Nikolay Manchev - Principal Data Scientist for EMEA, Domino Data Lab

    Watch On-Demand