Being a manager anywhere is hard, yet overseeing a data science team can be especially challenging. Roles are still in flux, turnover is high, and companies are ironing out the best ways for teams to function. And being a technical whiz doesn’t necessarily prepare you to manage others.
Whether you’re managing a data science team today, preparing to launch one, or hoping to do so in the future, this field guide will help you become a better data science manager in the enterprise.
In a panel discussion at the recent Rev summit for data science leaders, three experts shared their tips for hiring, retaining, and nurturing data science talent.
Their tips are summarized below.
If you start with a junior hire or someone fresh from academia, they’re likely to feel lost and frustrated without mentorship. Michelangelo D’Agostino, senior director of data science at ShopRunner, suggested installing a more experienced person first to give the team direction.
Don’t just focus on technical talent and experience. The panelists agreed that humility, curiosity and an ability to listen and take feedback are crucial traits for a senior role. “Someone who’s going to be in charge has to know they don’t have all the good ideas or all the answers,” D’Agostino said. To gauge a candidate’s capacity for self-reflection, he suggested asking prospective hires to describe a situation in which they failed and how they would avoid repeating it.
Given the competitive hiring landscape, onerous take-home tests can screen out qualified candidates and create a tense, exam-like atmosphere. You don’t need these challenges to make strong hires, said Patrick Phelps, lead data scientist at Insight Data Science. “It’s really hard to scale…[and] it takes a huge amount of time to grade,” he said. “I’d rather just put a good data scientist on my team in a room with them for an hour.” If you do include a challenge, D’Agostino suggested having candidates complete a coding exercise in the office and talk through it, as in an informal code review.
This hiring and onboarding plan template walks through key questions to help you find and train new data scientists, covering how to attract top talent, the hiring process, onboarding, retention, and more.
By taking a systematic approach, data science leaders will maximize the odds of finding and cultivating a team that is greater than the sum of its constituent parts.
Don’t oversell the role. Half of data scientists stay at their jobs for two years or less. To reduce turnover, be truthful about the position you’re hiring for, advised Conor Jensen, customer success manager at Domino. “Be very realistic upfront about what the role is, what the pain is going to be, where you think the impact is going to be, and what the timeline looks like,” he said. “A lot of times we get very excited about what we’re going to accomplish as data scientists, and we can get a little ahead of ourselves.”
Understand team members’ motivations. Jensen recommended taking time to discover each employee’s goals, interests and personal incentives. Then you can pair them with rewarding projects and recognize accomplishments in a meaningful way.
Offer support. “Data science can be a discipline of failures: Models fail, processes fail, data sources turn out to be terrible,” Phelps said. He suggested offering positive reinforcement and reminding team members that it can take years to see an impact. Jensen also suggested breaking problems into manageable chunks so employees aren’t intimidated by an overwhelming project.
Create learning opportunities. Data scientists often leave their jobs because they’re bored, observed D’Agostino. If core projects aren’t cutting-edge, he suggested creating opportunities for team members to learn new things, such as a weekly lunch to discuss the latest research or occasional hackathons to test a new software framework or computational technique.
The following are the seven habits we have observed in many successful data science managers, in no particular order.
Data scientists often greet the topic of knowledge management with a sense of dread. Some see it as a time-sucking distraction from their “real” jobs; others don’t fully grasp what it means. Even many who see the concept’s value find the process painful.
But knowledge management capabilities will become a key source of competitive advantage for companies, according to Matthew Granade, chief market intelligence officer at Point72, and Mac Steele, director of product at Domino Data Lab. In the video below, the pair laid out why knowledge management matters and how businesses should make it a priority.
The key points about knowledge management for data science teams are below.
The goal of knowledge management is to capture insight, which can be defined as “better understanding.” Insight is thus relative—it’s about constantly improving upon previous ideas. From Einstein to Freud, insight is often seen as the purview of the “lone genius.” In reality, most insight comes from collaborating with others and expanding on existing ideas.
Creating that kind of “compounding machine” requires a way to capture knowledge, a framework for users to follow and mechanisms to improve through feedback. Increasingly, companies’ futures will be determined by how well they do this. With more algorithms and infrastructure widely available, the pool of data science talent growing and requirements to share data expanding, the ability to capture and augment unique insights will become a key differentiator.
Some knowledge management challenges plague every industry:
Other obstacles are unique to data science teams:
There are four steps that can help data science leaders improve knowledge management in their enterprise organizations:
The more things are in there, the more connections you have across them, and the value grows that way. You don’t want people operating on the fringes. A common platform that encompasses both the core work and knowledge management is key to ensure it gets done and minimizes the burden. If you can’t capture everything, start with the most valuable model or knowledge, and build a system around that.
Test: Ask five data scientists in your company, separately, “How many projects do you think this team is doing right now?” They’ll probably have different answers.
Discovery: Data scientists spend much of their time searching for information, which cuts into productivity. Teams have to decide whether to curate knowledge (the Yahoo approach) or index it (the Google approach). Curation makes sense when the domain is relatively stable. Indexing and search work best when the domain is fluid and you can’t know beforehand what the taxonomy should look like.
Test: Ask a new hire to work on a topic, and time how long it takes them to collect the right artifacts. If it’s weeks or months, that’s a red flag.
Provenance: Let people focus on the aspects of knowledge management that matter. Use a platform that lets people spend their time synthesizing their work rather than tracking which software version they used.
Test: Write down beforehand what percentage of time you think your team members should spend on documentation. Then ask a few of them how much time they actually spend. The gap could be eye-opening.
Reuse: If it won’t run, it won’t get reused. That requires access to not only code, but also historical versions of datasets.
Test: Ask a new hire to reproduce work that another data scientist did six months ago, preferably someone who has since left the team or organization. Then ask the new hire to update it with the most recent data. If it takes a week or a month, that’s troubling.
Decompose and Modularize: Ensure that people have the incentives and tools to create building blocks that can be reused and built upon.
Test: Ask two teams that have worked on similar projects to do a post-mortem together and identify overlapping work.
Compounding systems rely on units of knowledge. In academia, those are books and papers; in software, it’s code. In data science, the model is the right thing to organize around, because it’s the thing data scientists make. The model includes the data, code, parameters and results.
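As a rough illustration of organizing around the model, the sketch below bundles the four parts named above (data, code, parameters and results) into one record and links each record to the one it builds on. The class name `ModelRecord`, the `builds_on` field, the `lineage` helper and all sample values are hypothetical, chosen only for this example; a real platform would store far more.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ModelRecord:
    """One unit of knowledge: a model plus what is needed to rerun it."""
    name: str
    data_version: str   # identifier of the dataset snapshot used (assumed scheme)
    code_version: str   # identifier of the code used, e.g. a commit hash
    parameters: dict = field(default_factory=dict)
    results: dict = field(default_factory=dict)
    builds_on: Optional[str] = None  # name of the earlier record this one extends


def lineage(records: Dict[str, ModelRecord], name: str) -> List[str]:
    """Follow builds_on links from a record back to its earliest ancestor."""
    chain: List[str] = []
    current: Optional[str] = name
    while current is not None and current not in chain:  # guard against cycles
        chain.append(current)
        current = records[current].builds_on
    return chain


# Made-up placeholder values, for illustration only.
baseline = ModelRecord("churn-v1", "snapshot-a", "commit-1",
                       {"depth": 3}, {"score": 0.71})
improved = ModelRecord("churn-v2", "snapshot-b", "commit-2",
                       {"depth": 5}, {"score": 0.74}, builds_on="churn-v1")
records = {r.name: r for r in (baseline, improved)}

print(lineage(records, "churn-v2"))
```

Keeping the link to the earlier record is what lets later work compound on earlier work: anyone picking up `churn-v2` can trace which data, code and parameters produced the result it improved on.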
Changes at the people and process levels are also important. Reframe how people see their jobs: They should spend less time doing and more time codifying and learning. Make collaboration a priority in hiring and compensation. Finally, while knowledge management should be seen as everyone’s job, some organizations create new roles for curating or facilitating knowledge.
The following three videos provide a range of lessons on fostering collaboration among data scientists and other stakeholders within the enterprise.
What does it take to run a sophisticated data science organization? What are some of the things that need to be on your mind as you scale to a repeatable, high-throughput data science machine? The two videos below provide two perspectives.
Erik Andrejko, VP of Science at The Climate Corporation, has spent a number of years focused on this problem, building and growing multi-disciplinary data science teams.
In the video below, Erik discusses what it takes to continue building world-class data science teams. He also discusses the practice of data science, the scaling of organizations, and key components and best practices of a data science project.
Through working with companies ranging from agile startups to the Fortune 500, we have been able to curate use cases and learnings from these organizations about the challenges and successes of growing data science teams.
In this video we share some of those learnings, including goals for data science programs, their challenges, performing a diagnosis, managing projects and systems, and leveraging a data science platform to scale.
This field guide covered the human components of managing data science teams in the enterprise: Hiring and onboarding, nurturing teams to success, building the right habits, capturing and managing knowledge, and fostering collaboration. Now, those data scientists need something to do. Learn how to manage data science projects in the enterprise.
The Model Management whitepaper provides a framework to overcome the Model Myth and to build a model-driven business.