Data Science Leaders | Episode 19 | 42:29 | September 14, 2021
People Analytics: Data Science, Ethics, and Opportunity in HR
People analytics—the application of data science and analytics in the world of HR—can provide valuable insights into recruitment, retention, and productivity.
But when working with people's sensitive demographic, compensation, and performance data, ethical and privacy considerations must come first.
In this episode, Adam McElhinney, Chief Data Science Officer, VP of Data Insights at Paylocity, explains how his company approaches people analytics, and what all data science leaders can learn from the discipline. Plus, he offers a view into the hiring process Paylocity uses to add top-notch data science talent to its team.
The conversation covers:
- People analytics and HR
- Data science in the hiring process
- Embedding data science into SaaS platforms
Welcome to another episode of the Data Science Leaders podcast. I'm your host, Dave Cole, and our guest is Adam McElhinney. He comes to us from Paylocity. Adam, you are the Chief Data Science Officer and the VP of Data Insights at Paylocity. So first of all, what is Paylocity?
Thanks Dave. Paylocity, we are a publicly traded software company. We make a class of software called HCM, or Human Capital Management software. You can think of it as everything you need to run an HR department—so payroll, benefits administration—but more and more we're really focused on employee engagement and productivity solutions.
Very cool. So today’s agenda: we're going to start off talking a bit about people analytics, HR analytics. That's your wheelhouse at Paylocity. I think it's a universal use case and opportunity for data science leaders out there because everybody has a people team, an HR team. If you're looking for various ways, as a data science leader, in which you can help your company, I think HR is a great place to start. So we're going to dive in and talk a little about some of the use cases and some of the challenges, some of the data that you need to have in order to get those projects going.
And then we're going to segue briefly into the hiring process and how data science can be used in the hiring process, to improve it. That will be an interesting topic.
Then last but not least, Paylocity is a SaaS, software as a service, company and there are data science products within Paylocity. I'd love to talk a little bit about how to embed data science analytics into a SaaS offering and SaaS platform. What are some of the tips and tricks and challenges? Does that sound good, Adam?
Sounds great, absolutely!
Awesome. Well first of all, what, to you, is people analytics? Just talk to us at a high level and we'll dive in.
Sure. The background of people analytics is pretty interesting. It's something that's been talked about for quite a while, but I think there's really been a shift in the landscape in the past maybe five to seven years. A lot of the large tech companies like Google, Amazon, etc. really went forward and public with some pretty high profile and impressive people analytics initiatives that they have been doing internally for themselves. That really started educating executives and HR professionals about the value of people analytics. Of course people have been using descriptive reporting for decades, but you're really starting to see a lot more machine learning and predictive and prescriptive analytics being applied to the human resources landscape as evidenced by those FANG companies that I talked about.
Now what's been an interesting shift, though, is that it has started to come down market. So the way those large companies (Amazon and Google) have done this is they have huge internal people analytics teams that they've spun out, that roll up to HR and support these initiatives. We've seen more centralized data science teams at smaller and mid-market type companies (still large companies, but not on the scale of a Google or an Amazon) spending some of their data science resources working on people analytics projects. We've also seen a lot of software solutions start to embed more sophisticated people analytics into those tools for those people that just want to buy something off the shelf.
So for the audience, who may not be up to speed on people analytics, where did Google start? You know a little bit about the history there. You mentioned some of these larger companies, the FANG companies, actually creating analytics teams within HR departments and people teams. Can you give us a little bit about the history, the use case, and what they did?
Yeah. Google has done a ton of research on studying what makes the most effective teams. Studying characteristics of developers and seeing how that affects developer productivity, bug rates in the code, things like that. They have such a large and well curated dataset for that. They're really in a unique position to do that. Amazon famously uses a lot of people analytics for predicting headcount requirements: figuring out how many people they're going to need in the distribution centers, and what percentage of people are going to call in sick on a given day. If you think you need 100 people, but you know that five people are probably going to call in sick, then "Okay, well maybe I want to staff 105 people." It's really helping optimize their labor force through the use of people analytics.
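The staffing arithmetic in that example can be written down in a few lines. This is only an illustrative sketch: the function names are hypothetical, and the second function is a slightly different assumption (an absence rate applied to everyone scheduled) rather than anything described in the conversation.

```python
import math


def staff_simple(required: int, expected_absent: int) -> int:
    """Add the expected number of absences on top of the required headcount."""
    return required + expected_absent


def staff_for_absence_rate(required: int, absence_rate: float) -> int:
    """Schedule enough people that, after the expected share is absent,
    about `required` people are still present.

    Assumption: the absence rate applies to everyone scheduled,
    including the extra people added as a buffer.
    """
    if not 0 <= absence_rate < 1:
        raise ValueError("absence_rate must be in [0, 1)")
    # round() guards against floating-point noise before taking the ceiling
    return math.ceil(round(required / (1 - absence_rate), 9))


print(staff_simple(100, 5))               # the 100 + 5 rule of thumb from the example
print(staff_for_absence_rate(100, 0.05))  # rate-based variant, slightly more conservative
```

The two versions differ by one person here because the rate-based version also allows for absences among the buffer staff.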
Yeah, I think if I'm out there right now, I might think to myself, do I have the amount of data to actually use data science to move the needle? Do I have to be a 100,000+ person company or 1,000 or even 5,000? Do I still need people analytics? Talk to us a little bit about what data do you think is important to build out some of these capabilities.
Sure. So regarding sample sizes, regardless of the size of the company, I really recommend that everybody start with just some basic descriptive statistics. Provided you have even a couple hundred employees you'll probably find some value there, particularly once you start stitching all the data sources together. Some of the main data sources that you'll find, the most important dataset typically, is the employee dataset. In the HCM world this is sometimes called the HRIS, the Human Resources Information System. These are things like people's names, demographic information, job title, salary or hourly rates, supervisor, department...all the types of attributes that you would assume to have about an employee.
Other important datasets are typically your recruiting dataset. That's where you have all the candidate information: what were the contacts that were made with the candidate? What's the funnel that was set up for recruiting? How far did this candidate make it in the funnel? That's usually pretty useful.
Another useful dataset is typically employee performance data. This one, depending on the nature of the job and the company, can be pretty subjective and sometimes pretty dirty and require quite a bit of cleaning. Most companies do some type of periodic performance review, feedback process, maybe yearly, twice yearly, something like that. That's another very powerful dataset.
Other datasets are really around employee engagement. That's typically measured via a large annual survey, many times it's anonymized, sometimes it's not. Any other periodic pulse or engagement surveys that your team might be sending. The best practice there is, instead of doing one big yearly survey, do much smaller periodic surveys so you can collect and trend that data over time.
So those are a lot of the main datasets that you'll have. Any exit interview and turnover data, as well, is another typically very valuable dataset. How those are all stored, how easy it is to get them into a single place and join them together for analysis, depends on how you set up your infrastructure, but those are typically the main datasets that you're interested in.
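As a rough illustration of "stitching the data sources together" and starting with descriptive statistics, here is a minimal sketch in plain Python. All records, field names, and the `employee_id` join key are made up for the example, and it assumes the engagement scores are not anonymized, which the conversation notes is not always the case.

```python
from collections import defaultdict

# Hypothetical extracts, each keyed by employee_id.
hris = [
    {"employee_id": 1, "department": "Sales", "role": "AE"},
    {"employee_id": 2, "department": "Sales", "role": "AE"},
    {"employee_id": 3, "department": "Engineering", "role": "Developer"},
    {"employee_id": 4, "department": "Engineering", "role": "Developer"},
]
exits = [{"employee_id": 2, "reason": "new job"}]
engagement = [
    {"employee_id": 1, "score": 4},
    {"employee_id": 3, "score": 5},
    {"employee_id": 4, "score": 3},
]


def join_sources(hris, exits, engagement):
    """Left-join exit and engagement records onto the employee dataset."""
    exited = {row["employee_id"] for row in exits}
    scores = {row["employee_id"]: row["score"] for row in engagement}
    return [
        {
            **emp,
            "left_company": emp["employee_id"] in exited,
            "engagement": scores.get(emp["employee_id"]),  # None if no survey response
        }
        for emp in hris
    ]


def summarize_by_department(rows):
    """Headcount, turnover rate, and mean engagement per department."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["department"]].append(row)
    summary = {}
    for dept, members in groups.items():
        scored = [m["engagement"] for m in members if m["engagement"] is not None]
        summary[dept] = {
            "headcount": len(members),
            "turnover_rate": sum(m["left_company"] for m in members) / len(members),
            "mean_engagement": sum(scored) / len(scored) if scored else None,
        }
    return summary


summary = summarize_by_department(join_sources(hris, exits, engagement))
for dept, stats in summary.items():
    print(dept, stats)
```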
Yeah. One dataset, maybe it was unintentional, that I didn't hear you mention, is pay. That is near and dear to many people's hearts. I imagine that's also probably tied to the performance data as well, right?
Yeah. Compensation data is definitely important. When we look at what are the factors that drive people to leave a company, we definitely see being paid under market as a factor that's positively associated with people leaving. There's been some conflicting research on this. Other third party research has said that pay isn't a super strong predictor. I think a lot of this gets nuanced based on what type of roles you're talking about and how you are defining pay, but compensation certainly is an important dataset to look at.
Right. Certainly in the news and in general in society, there's been a lot of talk of equitable pay for the same role, tenure, experience. That's something that an analytics team can be helpful with. You mentioned some of the leading indicators for employees leaving. I imagine that's another big use case: looking at employee turnover or employee churn. I've always heard, you don't leave companies... what is it?
You leave bosses, not companies.
Yes. I don't know if that's true or not, but maybe you have more information. Is that apocryphal or is that more of a true adage? I don't know if you have any evidence on that.
Yeah, we definitely see evidence of that. One of the use cases that my team works on a lot is a retention risk dashboard for our customers that shows the employees most likely to leave your company. And then we try to surface up the risk factors for attrition, so that you can take some type of preventative action to keep them. Some of the top risk factors that we see are things that intuitively make a lot of sense. One thing we see is that attrition begets more attrition. If you're in a high period of attrition overall, that's going to be a leading indicator that you're going to have more attrition in the future, which is not surprising.
But we do typically tend to see pockets of attrition. We'll see a certain manager or department tend to spike together. Whether that's being driven by some change in the organization, or maybe not a particularly effective manager or department, it's difficult to say without more anecdotal data, but that's something we definitely see. We see people that are paid lower relative to the median for their role are more likely to leave. We see tenure-related effects, unsurprisingly. This varies a lot based on whether it's an hourly role versus a salary role, the industry, level of experience, etc.
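One way to express "paid lower relative to the median for their role" as a feature is a simple pay ratio. The sketch below uses invented salaries and an arbitrary 0.9 cutoff purely for illustration; it is not a description of how Paylocity computes this.

```python
from collections import defaultdict
from statistics import median

# Hypothetical employee records.
employees = [
    {"employee_id": "A", "role": "Analyst", "salary": 60000},
    {"employee_id": "B", "role": "Analyst", "salary": 70000},
    {"employee_id": "C", "role": "Analyst", "salary": 80000},
    {"employee_id": "D", "role": "Developer", "salary": 95000},
    {"employee_id": "E", "role": "Developer", "salary": 105000},
]


def pay_ratio_to_role_median(rows):
    """Return each employee's salary divided by the median salary for their role."""
    by_role = defaultdict(list)
    for row in rows:
        by_role[row["role"]].append(row["salary"])
    medians = {role: median(values) for role, values in by_role.items()}
    return {row["employee_id"]: row["salary"] / medians[row["role"]] for row in rows}


ratios = pay_ratio_to_role_median(employees)

# Assumed cutoff: flag anyone paid less than 90% of their role's median.
below_median_band = sorted(eid for eid, ratio in ratios.items() if ratio < 0.9)
print(ratios)
print(below_median_band)
```

A flag like this is a conversation starter rather than a conclusion, since roles with only a handful of people will have unstable medians.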
Typically you see some type of bathtub curve where there's some people in the first three to six months that just aren't going to make it or they decide the job isn't for them. Then attrition tends to flatten out. After some period of time, you'll start to see it slowly increase a little bit. That duration varies a lot. As you might suspect in more entry-level or hourly positions, it's going to be a much shorter window. For salaried professional positions that window could be decades.
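The bathtub shape can be checked with a basic attrition rate per tenure bucket. This is a simplified sketch on made-up records with arbitrary bucket boundaries; in particular it ignores censoring, meaning current employees who have not yet finished a bucket still count as "at risk" for all of it.

```python
# Each record is (tenure_in_months, left_company). Values are invented.
records = [
    (2, True), (4, True), (5, True), (10, False),
    (14, True), (30, False), (40, False), (50, False),
    (70, True), (80, False), (90, False), (100, False),
]


def attrition_by_tenure(records, edges):
    """For each [start, end) bucket, divide exits inside the bucket by the
    number of people whose tenure reached the start of the bucket."""
    rates = []
    for start, end in zip(edges, edges[1:]):
        at_risk = [r for r in records if r[0] >= start]
        exits = [r for r in at_risk if r[1] and r[0] < end]
        rate = len(exits) / len(at_risk) if at_risk else None
        rates.append(((start, end), rate))
    return rates


# Assumed buckets: first six months, six months to five years, five to ten years.
rates = attrition_by_tenure(records, edges=[0, 6, 60, 120])
for bucket, rate in rates:
    print(bucket, rate)
```

With this toy data the rate is high in the first bucket, lower in the middle, and higher again in the last, which is the pattern described above.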
Yeah. I'm curious. Let's say the data is clearly showing that a manager has high attrition. In your role and in your experience, there's some tough conversations to be had. I imagine what you're doing is that you're really arming the people or HR team to go and have those conversations, which often can be very tough. The reason that you might have high turnover in a particular department certainly might be due to the department head not being a great manager, or it could just be for other reasons. Maybe that manager just has a very high bar; the expectation is that if you don't meet that bar, I'm going to go out and look for somebody who's better. Maybe they're performing at a high level. All these things, they're very nuanced in tough conversations. Is that how you see it, or not? Does the data science team play a different role?
No, you're absolutely right. My advice to anybody getting started with people analytics is that you need to approach all of these use cases with a very high degree of humility. You need to surface up conversations and data points that kick off much larger discussions. Sure, we can run some descriptive statistics and look at turnover rate by a given manager, but one very much just cannot blindly say "Oh, Adam's team has a higher turnover rate than Dave's team, therefore Adam's doing a worse job."
There's so much nuance and interpretation there so we really advise people and our customers and we try to bake that into the product. This is the same advice if you're doing people analytics yourself, which I suspect a lot of the audience will. You really just need to put this data out there and then say, "Okay, what do we think is going on here?"
I strongly discourage you from leading with conclusions like saying that the turnover rate in a department is unacceptable. You’ve really got to approach this with a high degree of humility.
We're all statisticians. We all know correlation doesn't equal causation and we all know that these machine learning algorithms are just fitting functions to data points. There's a lot of interpretation and subjectivity here, so approach this with a high degree of humility and make sure you're involving subject matter experts early and just using this to facilitate discussion.
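One concrete way to build that humility into a turnover-by-manager report is to show an interval next to each rate instead of a bare number. The sketch below uses the Wilson score interval, which is my choice of technique for illustration and not something named in the conversation; the team sizes and exit counts are invented.

```python
import math


def wilson_interval(exits: int, headcount: int, z: float = 1.96):
    """Approximate 95% Wilson score interval for a turnover proportion."""
    if headcount <= 0:
        raise ValueError("headcount must be positive")
    p = exits / headcount
    denom = 1 + z * z / headcount
    center = (p + z * z / (2 * headcount)) / denom
    half_width = z * math.sqrt(p * (1 - p) / headcount + z * z / (4 * headcount**2)) / denom
    return center - half_width, center + half_width


def intervals_overlap(a, b):
    """True if two (low, high) intervals share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical teams of ten: 3 exits on one team, 2 on the other.
adam_team = wilson_interval(3, 10)
dave_team = wilson_interval(2, 10)
print(adam_team, dave_team, intervals_overlap(adam_team, dave_team))
```

With teams this small the intervals are wide and overlap heavily, so a 30% versus 20% turnover rate on its own says very little about which manager is doing a better job.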
Right. Segue a little bit here. You mentioned Google looking at the performance and the number of bugs that, maybe, the developers have created, as some indicator of performance. Have you also looked into, seeing as the HR function looks at productivity, helping the other areas of the business be smarter and better? An enablement capability to some of the other departments?
Yeah. Productivity is really tough to measure. The datasets that we look at, we don't have access to financial data or anything like that. We do have access to turnover data and headcount growth and employee sentiment and satisfaction. Measuring productivity is really difficult.
In the Google example, because they have such a large sample size of software engineers, and there's a defined target variable there, you can look at the number of bugs that were introduced by somebody, and number of lines of code written. Developers listening to this will know: there's still a bunch of caveats with that. That's a best case example.
Sales is another example where maybe you can have something of an objective target variable there. You have dollars of revenue they brought in, but even still, there can be differences in territories or products that certain people are assigned, so even that's tough. Developing that objective measure of performance is extremely difficult in practice.
Yeah. In the world of sales, looking at what makes a great account executive or salesperson. Looking at how many meetings they have, how many conversations do they have, time to close. Obviously you look at the amount of revenue that they bring in, but what data can you potentially look at that is the predecessor to those outcomes? Landing opportunities? It's hard. Meetings: how good are the meetings? There's all sorts of qualitative aspects of it but I think some of it can be extremely helpful. If you're a salesperson out there and you're not having much in the way of any meetings, that's a leading indicator that probably your quarter's not going to be very good. So there's some things that can start a conversation.
How could you be a little more curious about what's going on? I imagine there's a risk of being a little bit “Big Brother-y.” Like, "Oh man, the people team is really analyzing things all over the place." I don't know if you've ever gotten pushback along those lines.
Oh yeah, definitely. Whenever you're doing HR analytics, there's huge ethical and privacy considerations.
So for our own internal usage at Paylocity, for example, we actually have a whole AI ethics committee that has a statement of ethics. Every project goes through an ethics scorecard that we've developed. Depending on the nature of the project, there are certain remediation steps and documentation that have to be filled out. The ethics committee reads out to Paylocity's broader D&I initiative. We publish our ethics risk assessments for anyone in product and technology to read and review and push back on. We're constantly tweaking that and we're also partnering with our director of data privacy and our AppSec and InfoSec teams. That's hugely important, whenever you're doing any of these HR initiatives. You really have to give a lot of thought to that.
Like you said, there's the “Big Brother” aspect, legal aspects, fairness, and ethical aspects. You're dealing with some of the most sensitive data that you can ever have about a given human being. You really need to tread carefully and give some thought into that before you kick off any of these initiatives.
This is not on the agenda, but this is a big topic: ethics and data science, bias, all these other things. You clearly have at Paylocity, it sounds to me, a well established process and approach. What advice do you have for the data science leaders out there who might be going down the path of building models that might have impact from an ethics standpoint? Data science leaders that might be worried about D&I and bias and gender and ethnicity and so on and so forth. What approach would you recommend they take?
There's a few tips that I think we've learned. So first you really need to think through these considerations independent of any projects, and you need to really outline that framework ahead of time. If you only start considering these things once you're halfway through a project, it's easy to get trapped in the sunk cost fallacy. The teams are motivated about the project, they've already spent time on it, and they'll just see this as an impediment, or they're just excited about the momentum that they have. We're data scientists: we love building models, we love crunching numbers, and this type of paperwork sometimes feels less important. So I recommend you really do this work ahead of time.
There's a lot of really good third-party research out there if you do your homework on this. I don't think you have to reinvent the wheel and I think you should leverage third-party research. But I do think you should spend some time to figure out what research is most applicable to your situation and then use that as your grounding.
Hospitals have these medical ethics committees or these ethics boards. We've modeled our AI ethics practices off that and we found that to be useful. This way, it's not people's full-time jobs, but it's something they're held accountable for. It creates somewhat of an arm's length distance from the project team and the people reviewing the ethics assessment aren't on the project team. So having some amount of arm's length distance from there makes a lot of sense.
Also, this really can't be a data-science-only initiative. You need to bring in outside people who have a vested interest. They might not be data scientists or be technical at all, but that's fine. That's exactly the point. You need to bring in different viewpoints as part of this process. I recommend being very open with your assessments and saying, "Here's where we think the risk is. Here's what we're doing to mitigate it.” Allowing people to view it or debate it if they push back.
What we learned when we went through our process is that it's not cut and dry. It's all shades of gray. Being very explicit about what are the risky areas and what you can do to mitigate those, is the approach that we've found works the best.
We're also constantly evolving. Every quarter, the ethics committee sets quarterly goals for things that they want to change and improvements they want to make. We just keep iterating.
The last piece is a substantial training component as well. We do an annual training for all employees, and then there's a special training for new hires on AI ethics and our best practices there.
That's great. I think there are a lot of good nuggets in there, like having a committee that is not part of the project, but has that arm's length, enough separation to be able to ask and approach questions about the dataset that you're using and maybe even question the project at all, should it even be embarked upon. So I think that's all really good advice and hopefully there are some easy-to-find third-party resources, like you said, that are out there for the folks who are listening.
We talk a lot on the Data Science Leaders podcast about the challenge of hiring data scientists. There are two aspects of it. First of all, I'm curious to know how you hire into your team, Adam.
Secondly, I'm also interested in how data science can be used to improve the hiring process. There are both aspects. Take whichever one and dive in. Maybe start with what sort of approach do you take to hiring on your team?
Sure. Our hiring process is probably pretty similar to most companies. We have an internal Wiki page where we specify the qualifications and types of candidates we're looking for. We try to write everything down so that it's clear and people can debate it. People can say, “I don't think this is reasonable,” or that it’s potentially unfair or something like that, so we have all that documented. Then we have our recruiter phone screen, and then a phone screen with somebody from data science.
If somebody at that point wants to move forward, we ask them to complete a case study for us. It can be done on their own time. We send them the materials and they send it back to us. That's really meant to mirror what they would actually do in the day-to-day job. It's a very realistic scenario, scaled down because of time requirements and stuff like that.
They have two main outputs. One is a business presentation that they would present to a non-technical audience. The second is their code and models. We do have a Kaggle-style holdout dataset that they submit predictions for, and we have scripts that score them, but honestly, that's probably the least important part of the process.
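For readers who have not set up a holdout-style scoring step before, a scoring script can be very small. This is a generic sketch, not Paylocity's script: the row ids, labels, and the choice of accuracy as the metric are all assumptions made for the example.

```python
def score_submission(predictions: dict, holdout_labels: dict) -> float:
    """Return accuracy of a candidate's predictions against held-back labels.

    Both arguments map a row id to a class label. The candidate never
    sees `holdout_labels`; only the scorer does.
    """
    if not holdout_labels:
        raise ValueError("holdout set is empty")
    missing = set(holdout_labels) - set(predictions)
    if missing:
        raise ValueError(f"submission is missing predictions for rows: {sorted(missing)}")
    correct = sum(predictions[row_id] == label for row_id, label in holdout_labels.items())
    return correct / len(holdout_labels)


# Hypothetical holdout labels and one candidate submission.
holdout = {101: "stay", 102: "leave", 103: "stay", 104: "leave"}
submission = {101: "stay", 102: "stay", 103: "stay", 104: "leave"}
print(score_submission(submission, holdout))
```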
It is a time commitment from them. If they agree to do that, then we promise them that they'll get a final round interview with us and then we'll give them a decision within a week. We think it is a fair bargain. They're going to give us some time, they're going to get a guaranteed final round interview, and then we'll give them a yes or no in a week, so they can just know and get the return from their investment in time.
So what that final round interview looks like: the first part of that is them presenting that business case study that we talked about. They present that to a data science manager, and they also present it to a product owner from Paylocity. The reason we do that is that our data science model is a centralized model that we embed into other teams. The data scientists will embed into these teams at the service of the product owner, with the goal of productionalizing machine learning enabled features that live in those products. So the first part of that presentation is really focused on, “Can this person work with a product owner? Can they translate between business requirements and technical requirements? Can they translate model outputs back into business impact? Can they produce useful insights from the data that a product owner would find interesting?”
The second part of the interview is when they meet with a data science manager and another data scientist, going really deep into the technical aspects. They actually look at the code, at the modeling. How did you deal with missing data? How did you deal with dirty data? What are the models you used? How have you validated those models? Is the code clean, is it not clean?
Then the last interview is an interview to see if there's an alignment with our culture and values. That's a much more behavioral focused interview, and we have a great team that runs that.
This is a phenomenal blueprint. I'm nodding my head as you're talking here. You're evaluating the candidates on your team, the data scientists on their technical acumen, on their ability to work with their future manager, who might be a DSM, a data science manager. You're also evaluating them on their ability to work and present a business outcome and work with the product owners.
I'm curious, are the data scientists solid-lined to you, they’re reporting into your team? You're nodding your head, yes. Okay, cool.
And then just a dotted line, I guess, into the various embedded business units, is that basically how it works?
That makes total sense. Yeah. And I think there's a lot of conversation too: how business savvy should the data scientists be, versus that nice blend of being able to present your work and making it relatable to somebody who may not be steeped in statistics or machine learning, versus just being an expert in deep learning and something highly technical. I think clearly you're hiring for that mix.
Some companies out there have said, "You know what, not every data scientist is going to clearly be able to describe business outcomes. They might be great at building models." Sometimes they create an engagement manager type role, or even a data science product manager, who can work as a translator.
You're looking for the combination. Both models work, but there's one less person in your model for sure and that's great.
Yeah. I think a lot of that decision comes down to the size of the team, the maturity with respect to data science, and the breadth of use cases that you're working on.
So for example, let's say you're a company, I don't know, you do mortgage issuance. I'm just coming up with something that's at the top of my head. And you need to build that mortgage default model. That might be an area where you don't need all your team to be super business focused. You probably need a core group of data scientists who are just going to crank on that model every month, incorporate new data sources, and shave off the prediction error. It's well understood how that fits into the rest of the business and what the business impact of an improvement in the model performance is. That's an aspect where I think you're totally right.
Where we're at, we're at a point at Paylocity where we're seeking to embed machine learning into every product that we have. And a lot of times we're entering a new product for the first time.
So there's a fair amount of work in understanding the product. There'll be a number of possible use cases, figuring out which ones are feasible, which ones are going to have high business value, and then helping translate from what that product owner wants to achieve with their product, to how data science and machine learning can help. That's why where we are at in our current maturity level, we need more of those generalists. As we expect to grow, I definitely plan to hire more specialists. I foresee us hiring a natural language processing specialist. I foresee us hiring a survey design and survey analysis specialist. I think it's a function of size and maturity and breadth of use cases.
Makes total sense. If you know what your business outcome is going to be, and you're just working on improving the accuracy of your model and you're using additional data sources, maybe different types of algorithms to approach that, maybe you don't need the most business savvy data science team.
If you're more in that creative problem solving type mode, and each month comes with a new set of challenges, then yeah, you need to weigh more heavily on the more well-rounded data scientists. But I think as you just said, as you grow, there might be changes that are needed.
Yep. Jumping back to that, we tried to make the interview process mirror what their day-to-day would look like. I want a candidate to leave the interview process and say, "I really understand what I'd be doing there on a day-to-day basis," and decide if it is or isn’t for them. Totally fine either way.
I want them to leave feeling like, "Hey, this was a good experience. I have a lot of clarity into what I'm going to be doing." If somebody doesn't want that focus or that business aspect, that's totally fine. Then they can decide, "Hey, I get it. I understand what they're looking for, that's just not what I want to do. I just want to try out fancy new machine learning techniques with lower prediction error, work on model version 20 or something like that." That's just not what we're looking for right now.
Yeah. Well, the second part of my original question was talking a little bit about applying data science to the hiring process, and not just for your team, I imagine, but for the wider team. For companies themselves to look at what makes up a great hiring process. I imagine most managers want to have their own process that they've used in the past. I'm curious: what are some nuggets or interesting things that may be in datasets that you have used to evaluate the whole recruiting process?
Yeah. This is really, really tricky. It can get unintentionally very ethically fraught. If you're just saying, "Okay, how can we use data science to improve the hiring process," something might pop in your head like, "Okay, let's build a machine learning model that figures out who is a successful candidate and who's not a successful candidate." You'll quickly get into a couple of different problems.
One problem is that defining what a successful candidate even means is very difficult. Is it somebody who got an offer? Is it somebody who made it to a final round interview? Is it somebody who got an offer and accepted an offer? Is it somebody who was a successful employee for six months or one year? So even just defining that is very difficult.
Second, unless you're a really huge company, you'll probably run into a sample size issue, particularly for a given position. So you probably need some amount of a sample size for a given position. The interview processes are probably different for different positions, so you'll run into that.
One of the biggest issues is there's a huge risk of unintended bias, not statistical bias, but ethical bias, creeping into those models. It might learn that people who are on the lacrosse team are great candidates and that could just be totally unrelated to the job. It just happens to be that's who they've hired or it's reinforcing some existing prejudice or bias in the hiring process. It's really, really difficult so we actually very purposely don't have that in any of our products and we don't do that internally.
Some of the more interesting stuff that I've seen companies do is use data science for masking and extracting data. Instead of a recruiter reading a resume that has a bunch of information like names or organizations from which you could determine potentially somebody's demographics, can you use data science to identify those and obfuscate those in such a way that you have more of a blind assessment of the candidate based on their actual qualifications? That's very promising and something we're really excited about.
Basic descriptive reporting here is still really important. How many people came into the funnel? How many people made it to stage one, stage two, stage three?
Sure, basic stuff.
Where are you losing people? And then slice and dice by various factors to, again, just to start conversations and say, "Hey, we're losing this type of person at step two of our funnel." Again, just start a conversation there, assume humility and say, "What do we think is going on there? And is there an opportunity for us to iterate on that?"
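The funnel questions above (how many people reached each stage, and where people are being lost) reduce to a few counts. In this sketch the stage names are borrowed loosely from the hiring process described earlier and the candidate data is invented.

```python
from collections import Counter

# Hypothetical stage names, in funnel order.
STAGES = ["applied", "phone_screen", "case_study", "final_round", "offer"]


def funnel_counts(furthest_stages):
    """Count candidates who reached each stage, given each one's furthest stage."""
    reached = Counter()
    for furthest in furthest_stages:
        idx = STAGES.index(furthest)
        for stage in STAGES[: idx + 1]:
            reached[stage] += 1
    return [reached[stage] for stage in STAGES]


def stage_conversion(counts):
    """Share of candidates at each stage who made it to the next one."""
    return [nxt / cur if cur else None for cur, nxt in zip(counts, counts[1:])]


# Ten made-up candidates and the furthest stage each one reached.
candidates = (
    ["applied"] * 4 + ["phone_screen"] * 3 + ["case_study"] + ["final_round"] + ["offer"]
)
counts = funnel_counts(candidates)
conversion = stage_conversion(counts)
for stage, count in zip(STAGES, counts):
    print(stage, count)
print(conversion)
```

The same two functions can be run on any slice of candidates to compare conversion between groups, keeping in mind that small slices will give noisy rates.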
That's great. I think as a manager, I would absolutely purchase something that would allow my team to mask resumes. There's a lot of press on that—it might be unintentional bias—but our bias creeps in, just looking at the name. So I think that's not an easy problem. It sounds easy, but it can be very difficult to do it right.
Absolutely. You might think, "Oh, I'll start with the name," but then you might want to then mask the address or location or the college, or then various organizations and activities. And as you can imagine there's a very long tail of these, huge different numbers of formats.
And you can't mask it so much so that you're like, "Okay, I have no idea what this person did."
So yeah, again, it's a tightrope that you have to walk. So that's fascinating.
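To make the masking idea above concrete, here is a minimal sketch that uses simple pattern matching. The regular expressions, the placeholder tokens, and the `known_terms` parameter are assumptions for illustration; as the conversation notes, real resumes have a long tail of formats that rules this simple will not cover.

```python
import re

# Hypothetical patterns; a production system would need far broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def mask_resume(text, known_terms=()):
    """Replace emails, phone numbers, and caller-supplied terms with placeholders."""
    masked = EMAIL.sub("[EMAIL]", text)
    masked = PHONE.sub("[PHONE]", masked)
    # known_terms might hold the candidate's name, a college, or a club name.
    # Longer terms go first so a short term cannot split a longer one.
    for term in sorted(known_terms, key=len, reverse=True):
        masked = re.sub(re.escape(term), "[REDACTED]", masked, flags=re.IGNORECASE)
    return masked


if __name__ == "__main__":
    resume = (
        "Jane Doe, jane.doe@example.com, 555-123-4567. "
        "Captain, Lacrosse Team. Built payroll models in Python."
    )
    print(mask_resume(resume, ["Jane Doe", "Lacrosse Team"]))
```

Note that the qualifications ("Built payroll models in Python") are left intact, which is the balance discussed above: mask the demographic signals without hiding what the person actually did.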
Look, Paylocity is a SaaS platform and your team is also embedding data science into the platform itself. I want to talk briefly about just some of the advice you have for others out there who are also maybe data science leaders for SaaS companies and what to be aware of as they put their models into production.
Sure. So, with respect to productionalization, there are four patterns that I think everything falls into, and they're in ascending order of implementation difficulty in my mind.
The first is just the one-time analysis. This might be just a strategic analysis that you're doing for somebody, maybe in how to reduce customer churn, just as an example. Your deliverable is going to be some type of research or report. In which case that's fine, your code can just run on one person's machine. It doesn't have to be super well-documented. There isn't a whole lot of productionalization there. I still would encourage you to follow best practices with respect to code reviews and to make sure that somebody else has tried to run this code and replicate it in case you want to revisit it in the future, or somebody is on vacation and you need to extend or change something about the analysis. But there's not a whole lot of productionalization that you're doing there.
Second is batch productionalization. You can really use this for any model that doesn't need to update more than once an hour: once an hour, once a day, once a week, once a month, something like that. In a lot of HR use cases, HR data doesn't move super quickly depending on what you're doing, but typically most people's characteristics aren't changing too quickly or the data is not changing too fast. So a batch productionalization can do a lot. And so we've invested in building out paved road deployment patterns for all these. So it's a combination of documentation, plus code, plus procedures and education to do a batch productionalization as an example. It's very tightly linked to our infrastructure and our environments. That's something companies have to very much modify for their type of infrastructure and environment.
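The batch pattern described above might look roughly like the following sketch. The column names, the `score_row` stand-in model, and the CSV input and output are assumptions for illustration only; the actual paved road is tied to the team's own infrastructure and is not described in the conversation.

```python
import csv
import io
from datetime import datetime, timezone


def score_row(row):
    """Hypothetical stand-in for a trained model's predict call."""
    return round(0.1 * float(row["tenure_years"]), 3)


def run_batch(in_file, out_file, run_time=None):
    """Score every input row, write one output row each, return the row count."""
    run_time = run_time or datetime.now(timezone.utc)
    reader = csv.DictReader(in_file)
    writer = csv.DictWriter(out_file, fieldnames=["employee_id", "score", "scored_at"])
    writer.writeheader()
    n_rows = 0
    for row in reader:
        writer.writerow({
            "employee_id": row["employee_id"],
            "score": score_row(row),
            # Stamp each row so downstream users can see how fresh it is.
            "scored_at": run_time.isoformat(),
        })
        n_rows += 1
    return n_rows


if __name__ == "__main__":
    source = io.StringIO("employee_id,tenure_years\n1,2\n2,5\n")
    sink = io.StringIO()
    print(run_batch(source, sink), "rows scored")
    print(sink.getvalue())
```

A scheduler would call something like `run_batch` once an hour, day, or week; which scheduler and which storage layer to use depends on the environment, as noted above.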
The third is really an API-based productionalization. That's very useful where you need something that updates more frequently and maybe there's multiple applications that want this information. So they can just submit an API request and say, "Hey, run the model or score this particular dataset and get our result back."
Again, there's a lot of good tools and scaffolding that you can find in just the public research, but we're working on really fleshing out our paved road for that one a lot more and making that really clear. When you start to do things like that, there's a whole lot of things that a typical data scientist wouldn't think about, like logging, error handling, what happens if this thing breaks at 2:00 AM, is the data scientist getting a call, or who's getting a call? What does that support and escalation look like?
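For the API pattern and the logging and error-handling concerns raised above, here is a framework-free sketch of a request handler. The payload shape, the status codes chosen, the logger name, and the `predict` stand-in are all assumptions; a real service would sit behind a web framework and an alerting setup that the conversation does not specify.

```python
import logging

logger = logging.getLogger("scoring_api")  # hypothetical logger name


def predict(features):
    """Hypothetical stand-in for a trained model: the mean of the inputs."""
    return sum(features) / len(features)


def handle_score_request(payload):
    """Return (status_code, body) for one scoring request."""
    # Validate input first so caller mistakes are reported as 400s
    # rather than waking someone up at 2:00 AM.
    try:
        features = payload["features"]
        if not isinstance(features, list) or not features:
            raise ValueError("features must be a non-empty list")
        features = [float(x) for x in features]
    except (KeyError, TypeError, ValueError) as exc:
        logger.warning("rejected request: %r", exc)
        return 400, {"error": "invalid input"}

    # Anything failing past this point is a service-side problem:
    # log the traceback so whoever is on call has something to work with.
    try:
        score = predict(features)
    except Exception:
        logger.exception("model failure")
        return 500, {"error": "scoring failed"}

    logger.info("scored request with %d features", len(features))
    return 200, {"score": score}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    print(handle_score_request({"features": [1, 2, 3]}))
    print(handle_score_request({"features": []}))
```

Separating caller errors from service errors is one way to decide who, if anyone, gets the 2:00 AM call: only the second kind would normally trigger an escalation.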
I'm curious, do you have a separate role for that? Because your data scientists are great at building models and also your team is great at talking to and driving business outcomes with your business counterparts.
When it comes to, "Hey, can this API handle 10,000 requests per minute?" sometimes that can be a different skill set. Do you have a machine learning engineer or some title along those lines, or is the expectation that your team is able to figure that out?
We don't have that role. The expectation is the team can figure that out and we try to give them as much scaffolding and guidance on that as possible. And then we follow just our general escalation procedures that we use across all of product and technology for dealing with those issues. But we want the team that built the models to own the code and own the productionalization.
I'm a little skeptical of the machine learning engineer role for most use cases, unless you get really big or have really some specialized deployment patterns. I think A) it can become a bottleneck, B) there can be incentives to just throw something over the wall that maybe isn't in a good spot, and C) in the process of doing hardening and productionalization, I think you learn a lot that informs how you do the whole data science life cycle.
"Can I functionalize this data cleaning thing that originally was just some ad-hoc script that I ran? Hey, do you know that we actually don't have great documentation for what we're doing here? Can I abstract this piece of code out? Hey, this code actually could be useful for three or four projects. Right now it's just in some model training script. Can I abstract that out and harden it and develop unit tests around it?"
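As a small illustration of pulling an ad-hoc cleaning step out into a reusable, tested function, consider the sketch below. The job-title example and the function name are hypothetical and chosen only to show the shape of the refactor.

```python
import unittest


def clean_job_title(raw):
    """Normalize a free-text job title: trim, collapse whitespace, lowercase."""
    if raw is None:
        return ""
    return " ".join(str(raw).split()).lower()


class CleanJobTitleTest(unittest.TestCase):
    def test_collapses_whitespace_and_case(self):
        self.assertEqual(
            clean_job_title("  Senior   Data Scientist "),
            "senior data scientist",
        )

    def test_missing_value_becomes_empty_string(self):
        self.assertEqual(clean_job_title(None), "")


if __name__ == "__main__":
    unittest.main()
```

Once the logic lives in a function with tests around it, the three or four projects mentioned above can import it instead of copying lines out of a model training script.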
For some specialized use cases, the machine learning engineer route makes a lot of sense, but that's even harder to hire for than a traditional data scientist. If you want to go that route, it introduces a whole other set of challenges.
The last use case in my mind is the streaming use case, where you need true streaming model deployment. Before my current role, I was in IoT for a number of years, and that was an area we were looking at. From the time a piece of data was captured on a machine, we were looking to have the model pipelines run, the model execution happen, and a result come back in about 20 milliseconds, so it was a true streaming-based architecture. That's something where I think a machine learning engineer role makes a lot more sense because that is quite a bit higher of a deployment hurdle. In that scenario, you really want to invest in a lot of very high quality tooling for that type of deployment, but that really is a minority of use cases for most types of typical businesses.
Well Adam, this has been fantastic, I've learned a lot. We covered a broad swath of topics, talking about the hiring process, your hiring process. We even dove a bit into ethics and we started, of course, with people analytics.
I think if you're a data science leader out there, take a hard look at what your HR team is doing today and maybe even offer up your services. It's something that more companies need to do.
When we talked about ethics and bias in data science, we're just talking about the art of doing data science, but I think data science can be helpful in uncovering biases and hopefully just helping out your people team be better, more proficient and more ethical, I guess, in that hiring process.
It's super fascinating and interesting. If people want to find out more and talk to you more, Adam, can they reach out to you on LinkedIn? Or do you have any social media?
Yeah, LinkedIn is the best place to get a hold of me.
LinkedIn is the best place. Awesome. Well, Adam, this has been great. Thank you so much for joining the Data Science Leaders podcast.
Really appreciate it.
Thank you so much.
About the show
Data Science Leaders is a podcast for data science teams that are pushing the limits of what machine learning models can do at the world’s most impactful companies.
In each episode, host Dave Cole interviews a leader in data science. We’ll discuss how to build and enable data science teams, create scalable processes, collaborate cross-functionally, communicate with business stakeholders, and more.
Our conversations will be full of real stories, breakthrough strategies, and critical insights—all data points to build your own model for enterprise data science success.