What if you wanted to do something really ambitious in data science, something like designing an innovative new search engine? Today, that would be a daunting task, and you’d probably need a big, highly qualified team of data scientists and programmers to bring your innovation to life. And you’d need months, if not years, to finish it.
That will change, Robert Nishihara recently told Domino Data Lab, when rapidly improving distributed computing interfaces and technology make it easy to get all the resources you need from your laptop.
Could Today's Unthinkably Huge Python Data Science Projects Become Commonplace?
“Today, that [creating a new search engine] would be a huge project. There’s a lot of data processing, you have to do web crawling to get the pages, you have to do data processing to extract the key words and build the search indices, you need to train ML models to rank pages, and you need to do serving to handle queries,” he says.
“Every single one of these components needs to be scalable, and it’s a tremendous infrastructure lift,” he explains. “One of our goals with Ray is to enable developers to build scalable applications like that in a day without any knowledge of distributed systems. We’re going to enable developers to reason only about their application logic.”
Nishihara envisions that someday his company, Anyscale, will make it easy to develop Python applications that scale across hundreds of nodes or GPUs, unleashing a new wave of innovations that previously would have been infeasible or impossible.
Domino recently interviewed Nishihara for its ebook on data science and its top innovators, The Data Science Innovator’s Playbook. Download the ebook to read the full interview.
Other Featured Innovators Weigh in on Data Science’s Ascendancy
Download The Data Science Innovator’s Playbook for free to read more insights from Nishihara, as well as from many other top innovators, on the themes, strategies, and innovations that are making data science such a transformative force in business and beyond. This exclusive content includes interviews with these top leaders:
- Cassie Kozyrkov—Chief Decision Scientist, Google
- Andy Nicholls—Senior Director, Head of Statistical Data Sciences, GSK plc
- Mona G. Flores—Global Head of Medical AI at NVIDIA
- Najat Khan—Chief Data Science Officer and Global Head, Strategy & Operations for Research & Development at the Janssen Pharmaceutical Companies of Johnson & Johnson
- Robert Nishihara—Co-creator of Ray, and Co-founder & CEO, Anyscale
- John K. Thompson—Analytics Thought Leader, Best-selling Author, Innovator in Data & Analytics
- Glenn Hofmann—Chief Analytics Officer, New York Life Insurance Co.