Data science, advanced analytics and the advantages provided by open source technologies are critical for modern businesses to remain relevant and competitive in our rapidly evolving technology environment. Startups are disrupting a variety of industries by leveraging new technical and analytic techniques to change what customers expect from the industry as a whole. Many larger organizations struggle to adopt new analytic approaches, and broad adoption is best achieved through a cultural shift towards leveraging analytics in every aspect of the business.
Companies, both big and small, are rapidly expanding their data science teams as a means to transform their business and drive technical change across the enterprise. But simply hiring data scientists is no guarantee of the paradigm shift most firms envision. The key to this crucial transition is data science literacy.
At S&P Global, a financial services provider of credit ratings, data and analytics, research, and benchmarks, we pride ourselves on a long history of providing essential insights. The company’s roots trace back to 1860, when Henry Varnum Poor published an investor’s guide to the U.S. railroad industry to help investors make smart investment decisions.
The company was born from smart, effective applications of data and analytics processes, which are now being supplemented with predictive models wherever we’re able; our core competency centers around our ability to be model-driven. As many companies are coming to realize, sustaining and building on this approach takes more than a team of data scientists crunching numbers. To truly enable a cultural change, we need to arm employees with essential data science literacy.
Where to start?
First, we had to decide what mechanism(s) we’d use to educate 17,000 globally distributed employees on data science. We needed to figure out how to implement a training program that would scale in an interesting and interactive way. We also needed to think through the right set of resources and materials that would be valuable to a wide-ranging group of participants, educating those new to data science while also providing useful, hands-on experience for more advanced learners.
We decided to offer a hybrid approach to data science education, leveraging open source learning materials available in Massive Open Online Courses (MOOCs) and supplementing them with additional internal resources to tie all of the course modules back to specific “familiar” applications. To facilitate this, we identified four primary components for the program:
- Videos and exercises from an open source MOOC
- Live, interactive review sessions facilitated by internal experts
- Online forum to facilitate discussion and community among participants
- Online data science platform to facilitate curation, delivery and execution of technical materials
In selecting the right course, we set out to find one that would provide enough breadth and depth to appeal to a broad base without sacrificing technical rigor. Believe it or not, though critical to the business, learning about data science might not be a top priority for the average employee.
We aimed to identify a course that would align with employees’ job priorities, and one into which we could incorporate domain- and company-specific information. For the interactive sessions, we tapped members of the S&P Global Market Intelligence Data Science Department to facilitate discussions and showcase existing initiatives, demonstrating how data science techniques are currently being leveraged within the organization. We also wanted to identify a free and easy-to-use forum platform for participants to interact with the instructor and, more importantly, with other participants. And finally, we needed to make it technically simple for employees to set up data science environments and get technical support as needed.
Here’s what we did.
- We turned to Udacity’s open source course UD120: Intro to Machine Learning for computer-based guided training composed of lectures (videos) and coding exercises, which employees completed in a GitHub repository over a 10-week period.
- We leveraged Piazza’s free Q&A platform to facilitate asynchronous communication and collaboration among employees.
- We hosted live, weekly sessions for synchronous reviews. During these sessions, which also helped us gauge the progress and ongoing success of the program, we would:
  - Review the week’s materials.
  - Provide supplemental information or recommended resources to give employees topics to look into further.
  - Review coding exercises and address any roadblocks or challenges.
  - Engage in interactive, open-ended Q&A.
  - Apply course learnings and exercises to actual data science projects within S&P.
- We turned to the Domino Data Lab platform to essentially act as the “glue” underpinning this program. Domino:
  - Hosted our course materials.
  - Provided a shared platform for employees to work from, making it simple for them to spin up and down compute resources as needed, and facilitating easy collaboration and results sharing.
  - Streamlined “pushes” of updates to course materials to employees.
How’s it going? What results have we seen?
The first run of the program lasted 10 weeks, during which more than 130 employees spanning multiple global divisions participated. Engagement stayed active throughout all 10 weeks, both in every guided study session and on the online forum.
The feedback we’ve collected has been quite positive. Here are some of my favorite comments from employees who participated:
- “The review sessions were useful as it guided through the lessons, clarifying doubts that come midway.”
- “It is hard to cover the breadth of data science models in 10 weeks. I thought the mix of math lessons and coding practice was the best recipe for understanding.”
- “I thought the technical exercises tempted your curiosity well, and people dove into parts that were interesting. The Guided Sessions were great to see real world examples and as a chance to ask for questions.”
- “I loved hearing different data science folks explain concepts. Visuals are great and answering questions were helpful.”
We still have a long way to go before all 17,000 S&P employees are “data science literate,” but we’re off to a running start and look forward to seeing how this program fosters an even more model-driven culture than we have today.