
Embedding Responsible AI in Your Models and Your Team
Summary
Who uses the models that we create and how do they use them? Those key questions underpin the notion of responsible AI.
Since algorithms can have a significant societal impact, it’s vital that data scientists are aware of the broader context in which they may be applied.
In this episode, Anand Rao, Global Artificial Intelligence Lead at PwC, breaks down why responsible AI should be an important consideration for every data science team. Plus, he explains what you need to be successful in AI consulting, and why a portfolio approach to ROI is the best way to demonstrate value to the business.
We discuss:
- The difference between AI in the 1980s and today
- Why data science leaders should care about responsible AI
- The ingredients for an effective data science consulting practice
- ROI analysis in data science
Transcript
DAVE COLE
Hello, welcome to another episode of the Data Science Leaders podcast. I’m your host, Dave Cole. Today's guest is Anand Rao. Anand is the Global AI Lead at PwC. He has over 25 years’ experience in data science, primarily in the research and consulting space. He also has a PhD in AI from the University of Sydney. Anand, welcome!
ANAND RAO
Great to be here. Thanks, Dave. Thanks for having me on your podcast.
DAVE COLE
We have a few exciting topics today. We're going to talk about responsible AI. What is it? Why, as a data science leader, should I care about it?
We'll also be talking about consulting in AI. You have a number of years' experience in consulting. So we'll answer the question, “If I were starting a data science consulting practice today, what are some tips and tricks that you would give us?”
And then, to align with the recurring topic on the Data Science Leaders podcast of ROI and AI, we’ll explore some things that a data science leader should be thinking about. And I think you have a unique perspective there.
Before we dive into those interesting topics, one thing that I can't help but comment on is that you received your PhD in Artificial Intelligence from the University of Sydney. I don't want to date you, but it was sometime in the ‘80s. That's pretty rare, right? I was just a young boy, and I had no idea what AI was at that time. I'm curious: what has been the difference in AI, based on what you learned in college, versus AI today?
ANAND RAO
It's a very different world now versus what we were in those days. I would say there are a few key differences. We didn't have as much data as we have now, as we all know, right? We've accumulated data since the 2000s, with the internet and everything that came with it. In fact, I remember ARPANET, the precursor to the internet, which we used on the academic network. Data was much harder to come by, and it came in very structured forms, as opposed to all the things we have now: images, video, audio and the whole lot. That's one difference.
The other major difference, I would say, is the notion of open source. As a data scientist today, you almost never start anything from scratch. You always have libraries to build on. In fact, you have many pieces of other people's code to build on, and it's all open source. Everyone knows about GitHub and everything you can get from there and build on. In those days, none of that existed, right? You were almost always starting from scratch.
I'll give an example. If I had to build a natural language processing system, an NLP system, I would have to understand everything completely and build it from scratch. I'd need to get the data, but then I'd need to understand English grammar, right? So I'd need to build a parser for it from scratch. You had to build so much machinery. None of that was available freely; there were lots of academic papers, but papers still aren't code. You had to build all your code from scratch before you could work with it further. Today we're all so used to pulling in data, code and everything else, and then moving forward. In that sense, it's very different.
The third thing I would add is compute power. The things we are doing today in my area of specialty, agent-based modeling, agent-based simulation, multi-agent systems and so on, were limited by how much compute we had. Now we can run millions of these customer agents, have them make decisions and so on. In those days it was restricted to a few dozen, or maybe a maximum of a few hundred. So a very different world in terms of compute, data and open source, I would say.
DAVE COLE
Let's move on to responsible AI. As a data science leader, why should I care about responsible AI? Are there any laws out there? Are there any regulations that I should be aware of?
ANAND RAO
Yeah. There are a number of related terms that people use. One is, as you said, responsible AI. Other terms include ethical AI, beneficial AI and, for people in the European Union, trustworthy AI. I wouldn't say they are the same, but rather that they are related concepts.
The reason why you should pay attention to any of these words is that, as data scientists, we have traditionally been very focused on making things work. Give me a prediction problem or a forecasting problem. How do we get the best accuracy, given the data? What kind of data do we need? How do I improve my accuracy? What are the different algorithms? How do I choose between these algorithms? We’re very, very focused on performance, specifically around accuracy and various metrics of accuracy.
What all these terms (responsible AI, ethical AI etc.) raise is more about when and how these models get used by people. Who are the people who are going to be using them, and how are they going to use them? In what context, and what broader context, are they being used? The societal settings in which these models and data science algorithms are embedded can carry a number of potentially negative implications and risks. As data scientists, while we are building the model, we should also be very conscious and aware of the context in which our models are going to be used. That's why I think we need this broader framing.
The examples people talk about for machine learning models start with the fact that you build them on the data you have, the historical data. If the historical data is biased in some way, right, either in terms of time period or in terms of the overall cohort, then your model is going to be biased. Your accuracy could be very high with respect to that given data, but if the data itself is biased, the accuracy level doesn't matter. Overall, the result is not going to be satisfactory from a societal perspective. That's just one example.
Explainability is the other thing people talk about, right? So now we have built a model. Who's using it? Is it an end consumer? Is it a statistician? Is it another data scientist? Based on that, you need to give the right kind of explanation. You can use different types of models to explain, versus to execute, the primary task. Those things become important. Many communities have been looking at what kinds of constraints, guidelines and regulations would be required. There's a whole spectrum, from whether the industry can police and govern itself with self-governance guidelines, through to more formal regulation.
You mentioned something about some of the regulations coming through. One of the things that has happened over the past five or six years is that the European Union has been very active with various committees, academics, policy makers and corporates, in drafting many of these policy documents. Of course there are a number of other professional associations, like IEEE and others, who have also been involved in looking at the ethical consequences and challenges around these models.
We do have a policy proposal, a regulation I should say, that is under discussion by the EU members. It's called the Artificial Intelligence Act, which was released in April. Various EU members are examining it and also incorporating it within their respective national legislation. It's still probably a year or two from actually being enacted in every country, and then there will also be a rollout period.
In addition to that regulation, we see a number of other national governments or policy bodies in these governments coming up with various guidelines. The US is very active, as well as Canada and the UK. A number of other countries are actively publishing various guidelines.
That also starts impacting data scientists and how they work. If you're a data scientist, you definitely should be looking beyond the technical articles into some of these broader ones. All of this has also drawn attention to some of the technical aspects of fairness, explainability, interpretability and so on, so there's now a data science community very much focused on ethics as well.
DAVE COLE
Certainly when you talk about bias, most people think about racial bias or negatively impacting a specific community. Are there any other types of biases that data scientists should be aware of? Is it merely looking at the features and making sure that the data is representative? Are there also other aspects, like the types of algorithms that the data scientists use, that they should be aware of?
How should we think about both the types of biases in the data sets, and then the algorithms themselves?
ANAND RAO
As data scientists, I think we need to be concerned about a number of related things. You mentioned bias. There's a recent book by a couple of authors around the notion of noise: noise in human judgment; noise in algorithms, the latter of which data scientists are familiar with. You need to be worried about noise.
You also need to be looking at bias. Then there's the related concept of fairness. Fairness is much more of a social construct, whereas bias has a very specific statistical or technical meaning. Similarly, noise can be categorized in different ways. The book by Daniel Kahneman, Olivier Sibony, and Cass R. Sunstein, “Noise,” categorizes the different types of noise in human judgment. That's also important for us to be cognizant of.
Now, bias. I'll just tackle bias here. Bias can come in different ways at different stages of the model. One is, as we just went through, from the data itself. The sampling of the data and its historical nature might lead to biases, depending on what you want to do. The notion of bias is also very closely tied to the objective of your model. What is it trying to do? What is the goal or objective you're after: trying to minimize, maximize, predict, etc.? It depends on that. That's one way bias can come in.
The other way people talk about is the lack of diversity in the people who are building the model. The algorithm captures the various features, but which data you include (or not) used to depend largely on availability; now, with the vast amounts of data we have, we make choices about which data to include, and which features to include in feature engineering. That is very much human-dependent. There has been a lot of work in trying to understand why we need a diverse group of data scientists, so that we can make sure the model development is not biased.
You can get through both of those and still have bias, just based on the way the model gets used. Some of it can be extremely random. Let's say you have a recommendation system that recommends a list of the top five songs. At initialization, no one has listened to any of them, so all five have equal value, zero plays each, but they still have to be shown in some order. Economics clearly tells us that we, as humans, go through a list in a certain order, and that ordering matters.
Let's say someone randomly picks one of them; it doesn't really matter which. Now suddenly one of the songs has one play and the other four have zero. The second person comes in, looks at it, and says, "Oh, no one has listened to the others, so I'm going to listen to the same song." Now suddenly you have a popularity of two versus zero, right? This is a very artificial scenario, but that's exactly what happens. Maybe the counts aren't zero, just lower, and people sometimes choose randomly, but it can still lead to bias. Detecting that bias is also important.
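To make that snowball effect concrete, here is a minimal sketch in Python, with entirely made-up songs and click probabilities, of the feedback loop Anand describes: identical items ranked by play count, where whichever item sits at the top keeps attracting the clicks.

```python
import random

# A minimal sketch of position/popularity feedback bias in a recommender.
# All five songs are identical in quality; the only difference is that users
# see them ranked by play count and tend to pick items near the top.

random.seed(7)
songs = ["A", "B", "C", "D", "E"]
plays = {s: 0 for s in songs}

# Assumed probability of clicking the item shown in positions 1..5 (position bias).
position_click_prob = [0.45, 0.25, 0.15, 0.10, 0.05]

for _ in range(10_000):  # 10,000 simulated listeners
    # Rank songs by current popularity; ties keep their existing order.
    ranking = sorted(songs, key=lambda s: plays[s], reverse=True)
    # Each listener picks one position according to the position bias.
    choice = random.choices(ranking, weights=position_click_prob, k=1)[0]
    plays[choice] += 1

print(plays)
# The song that happens to reach the top of the ranking early keeps getting the
# lion's share of plays, even though all five songs were equally "good":
# the ranking created the popularity, not the quality.
```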
There are some other very interesting biases that come in more because of the system; interactions between multiple things cause bias. It's really something we need to be cognizant of so that we can guard against some of them. I don't think every bias can be prevented. I think we just need to be conscious of what bias we are introducing and, therefore, of how we are using the model. Back to my earlier point, that's where usage becomes important: what is the impact of this model on the broader community? Is it on their health? On their financial status? That becomes important, because it determines what counts as biased.
Similarly, on the fairness side, there are 30-plus definitions of fairness. It's not so much that the algorithm itself is fair or unfair; it is more about our conceptions of what is fair or not fair. What might be fair to me might be unfair to you, right? That becomes much more of a human or societal issue, as opposed to an algorithmic issue. Knowing that helps data scientists as well, so that they can have the right conversation with the business stakeholders: it's not all in the technical solution. Quite a bit of it is how the model gets used in a broader context, which the business folks obviously know, even if they can't always articulate it. That's something the data scientists should tease out: the business context, and the harms and risks these models could cause, so that you can prevent them.
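As a small illustration of how those definitions can disagree, here is a hypothetical example (all numbers invented) comparing two common statistical notions of fairness, demographic parity and equal opportunity, on the same set of decisions; the decisions can look fair under one definition and clearly unfair under the other.

```python
# Hypothetical example: two groups, each person has a true label (qualified or not)
# and a model decision (approved or denied). All counts are made up for illustration.

group_a = {"qualified_approved": 45, "qualified_denied": 5,
           "unqualified_approved": 5, "unqualified_denied": 45}
group_b = {"qualified_approved": 20, "qualified_denied": 20,
           "unqualified_approved": 30, "unqualified_denied": 30}

def selection_rate(g):
    """Share of the whole group that gets approved (the demographic parity view)."""
    approved = g["qualified_approved"] + g["unqualified_approved"]
    return approved / sum(g.values())

def true_positive_rate(g):
    """Share of qualified people who get approved (the equal opportunity view)."""
    return g["qualified_approved"] / (g["qualified_approved"] + g["qualified_denied"])

for name, g in [("A", group_a), ("B", group_b)]:
    print(name, round(selection_rate(g), 2), round(true_positive_rate(g), 2))

# Output:
# A 0.5 0.9
# B 0.5 0.5
# Both groups are approved at the same overall rate (demographic parity holds),
# yet qualified people in group B are approved far less often (equal opportunity fails).
```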
That's why I think it's a very good move the community is making, thinking more broadly. I think you asked me earlier how things have changed from the ‘80s and ‘90s to now.
DAVE COLE
Right.
ANAND RAO
There is far more adoption now. We need to take these societal impacts into account, more so now than before. Back then we were still in the lab; the consequence of what we did was whether a paper got published or not. Now, people are actually using this. Millions, even billions, of people might be using what we build. We had better be careful about what we really recommend.
DAVE COLE
That's exactly right. The widespread adoption of AI and machine learning has forced us to have these tough conversations with our business users, and with the wider community, to help educate them on the best ways to put these models together.
It’s also about demystifying the models themselves. I wanted to recap a little bit. You mentioned noise in human judgment from Daniel Kahneman's book. My understanding is, given the same amount of data, you and I could look at that data set and come to very different conclusions. That is something to be aware of as you're building your models.
The other is fairness. How is fairness defined? You and I can disagree on what is fair and what is not fair. I can imagine that, as a data science leader, trying to draw those boundaries around what is fair is a useful conversation to have with your team.
ANAND RAO
That's right.
DAVE COLE
There's also the background of the modelers and the data scientists themselves. People should be aware that they're bringing their biases into their work, not just into their political opinions and the like.
Lastly, when the model is actually implemented, bias can creep in as to how you display the results of that model, if it's a recommendation engine or something like that. It's important to be aware of all facets of these things as you're putting your models into production.
ANAND RAO
Yup.
DAVE COLE
Great. Let's move on to our next topic. This has all been very fascinating.
I also want to talk a little bit about your day job, which is at PwC. PwC is a large consulting firm, as we all know. If you were advising somebody who was just creating a small boutique consulting firm focused on data science (maybe we'll get to the larger firms later), what would you talk to them about?
ANAND RAO
First, of course it's all about the people, right?
DAVE COLE
Right.
ANAND RAO
The kind of data scientists you bring in. Next, what kind of environment you provide. Third, who else needs to be part of the consulting practice for you to be really successful with your clients in solving their problems.
Taking those three pieces in turn, starting with how we bring people in: there are a couple of things we look at in the type of people we want to bring in. For a data scientist or an entry-level person, I would look for three specific characteristics. I call them the three Cs: curiosity, creativity and communication.
As a data scientist, you should be curious about why something is happening and how you can tease out insights from the data. That's the data scientist's job, in my view. If you're not curious, I don't think you can be very effective at teasing out those insights. That's number one.
The second one is creativity. You need to be a creative problem solver. It's not that you're given a data set and you just run through a set number of algorithms. These days there are libraries with hundreds of algorithms: run all of them, understand them, then pick the one with the highest accuracy, and so on.
DAVE COLE
Right. You don't sound like a big fan of AutoML. Is it a sign?
ANAND RAO
Yes. That's right. That's fine to some extent, but then you really need to be creative in how you solve the problem and how you go deeper.
The third one is communication. For a model to actually be useful, you need to be clear on what actions and decisions it's going to drive, and communicate the value of that, how well or otherwise the model is doing, and the implications of the model. Communication is, I think, critical for a data scientist. Technical communication, so you can stand behind your algorithm and the way you built it, but also the ability to stand in front of business users and stakeholders and explain what is happening: how you arrived at the result, how you took the data, labeled it, and applied a particular learning mechanism or natural language processing. All three of those become very important.
Two other criteria we usually look for, as people progress up the data science ladder, are two more Cs: one is being a coach, the other is being a collaborator. On the coach: data science is still very much an art rather than pure engineering or science; it's the art of that science, rather than everything being nailed down like a software engineering practice. Data scientists, as they get more senior, need to be coaches and guides to the younger ones, to show them what to look for, what not to look for, where to look and so on.
It's a very active area, so new algorithms are coming out almost daily. You need to be able to play around with them, and that's the other side of curiosity: "Hey, I found this interesting thing. How did they actually do it? Is it a different technique? Is it a technique I already know?" For those kinds of things, a coach becomes an important role.
The collaborator: as you get into the managerial side, you need to collaborate with other domain and business experts. You have the techniques and they have the problems, right? Collaborating with them to understand and break down the problem, in a way that lets you actually go find the data and do the work, becomes a critical skill. These five Cs are probably critical both in getting people in and in keeping them there, keeping them engaged.
Then there's the question of retention in data science. It's a very, very dynamic area; lots of things are happening. This is true for other areas too, but data science is one of the few where every day there is something new. There's so much research happening and so many papers. You need to allocate the right amount of time for your data scientists to do what we call exploration.
Not everyone necessarily has to do this, but in our team we have a 60-40 rule. 60% of the time is focused on taking ideas, assessing problems, and then implementing them for our clients, for the businesses: hands-on application. The other 40% of the time is very much that exploration: reading articles, then going into the open source, downloading things, experimenting with them, combining them with what you have and trying to apply them to some of the problems you are seeing. We call those innovation sprints. Exploration and innovation sprints become very critical.
One reason is obviously to retain people, because that exploration is something people really enjoy. We also don't want to lose track of what's happening; the other important thing is that you can fall behind very quickly. Whether you are in a corporate job or, especially, in consulting, working with the technology and NLP of five years ago and not keeping pace with the new approaches could be disastrous. I'm not suggesting that everything new has superseded everything that came earlier. You just need to be aware of the pros and cons of the different approaches so that you can bring the best of everything to a consulting environment. That's the main thing, I would say, that you need to be conscious of as you build your data science group.
Finally, the other side: who are the other people? One of the key roles we have is what we call a bilingual, or a multilingual. What we mean by that is data scientists who have naturally grown into a specific industry sector, like financial services or healthcare, or who might be functional experts: customer service, customer experience, risk and so on. These are people either coming in from the business or functional domain and picking up data science knowledge, or data scientists moving into those domains as they gain more experience.
But it's not just data science and the business domain. You might also want technologists or software engineers. You might also want data experts. Ideally you end up with a multilingual person who knows quite a bit about a particular domain and about data science, but has also been a software engineer in the past and knows about software, testing and so on. That is the kind of capability I think is good for a data science group to have. Again, I'm not suggesting that everyone should be multilingual, but the team itself, I think, should be very much multilingual. That way you have people who can talk to the business and functional areas, work with the software engineers and work with the data folks. The bulk of them might still be data scientists, but you also need the MLOps people, the ML engineers and all of the ops people.
DAVE COLE
Yeah.
ANAND RAO
So that's how we build a team.
DAVE COLE
As I heard you answering that question, I was asking myself: how is what you're describing different from building a data science team within a single, non-services company? I heard a strong emphasis on communication, and I think that's so important. The fact that you call out that it's almost a different language, being bilingual, being able to understand a specific vertical or a specific function, plus also having that data science background, I think is really important.
What makes the consulting industry somewhat unique is that, as a consultant doing data science, you're getting exposed to many different companies, right? And to many different problems, in potentially many different verticals. The types of problems you deal with can be a lot broader. Then, as you spend more time at PwC, you have an opportunity to focus, and that's where the bilingual, deeper expertise comes in.
ANAND RAO
We have what we call a major and a minor, right? The staff are allowed to choose, and that's something that evolves as they go through their careers. The major might be a specific technology. It might be, "Hey, I want to go deeper into NLP: natural language processing, generation, understanding," or, "I want to work more with vision: computer vision, deep learning, the areas driven by that kind of data." Some might say, "I want to be more in simulation, reinforcement learning, agent-based modeling." So that's a technology-based major, and they might have a minor in a particular functional area or a particular vertical. And we sometimes reverse it.
Your primary thing might be banking, for example, but you'll know a variety of different techniques and use cases within banking, which you can bring various data science elements to. Banking might be your major, and then AI and various areas of AI might be your minor. You might even add a specific thing there in terms of, "I'm a risk expert in a banking environment. Anything associated with risk, I know I can do it. I can bring the right people." That major-minor also works quite well in developing those skills. You tend to rotate that, so over a period of time, you can build the right balance of expertise.
DAVE COLE
Got it. I love the three Cs: curiosity, creativity, and communication. I mean, the communication I hear loud and clear. I think there are some data science leaders that create separate roles, like engagement managers who are responsible for doing the translation between the business and the data science team.
ANAND RAO
Yes.
DAVE COLE
It sounds to me like your expectation is that the data scientists themselves work on that communication and their ability to present their findings back to the business users. In so doing, that's how you get that vertical and potentially subject matter expertise and so on.
ANAND RAO
That's right. Yeah.
DAVE COLE
Totally agree there. Let's move on to our last and final topic, one of my favorite ones on the Data Science Leaders podcast: discussing ROI and AI. That's also very important in starting up this new consulting firm: making sure that the work that you do for your customers is driving value, right?
ANAND RAO
Yup.
DAVE COLE
How should a data science leader think about ROI?
ANAND RAO
I think we need to dig a little bit deeper into the very term ROI from a data science perspective, right? I know everyone is used to that word, at least in the business world. You need an ROI for whatever system you are implementing. It might just be a program or a marketing campaign. For anything, you do need to have the ROI, which is, as everyone knows, return on investment.
The key thing for data science is to try and unravel that definition a little bit. What is the investment? What is the return? In one sense, it's all dollars at the end of the day: dollars that you're getting back versus dollars that you're putting in. That's a hard R and a hard I, a hard return and a hard investment, but there are a number of softer things that also come into play.
Even on the dollar side, if you look at AI and analytics more broadly, or data science more specifically, these models fundamentally do a couple of things. They improve the productivity of the person using them. Some are automation-related, and then you get into RPA and IPA: robotic process automation and intelligent process automation. Those are more productivity-enhancing, which means you are going to save time. Whatever investment you make in building something, the hope is that you'll save time. If something took eight hours to do, you're promising that using your system it can be done in, say, one hour. You don't have to eliminate the task completely; let's say it comes down to one hour.
So time is one, but then people also look at headcount savings. Can you actually transfer these time savings into headcount savings? That gets a little bit tricky.
DAVE COLE
You're scaring many of our business audience out there.
ANAND RAO
It gets very tricky, as you know. If I'm saving 20% of 100 people's time, can I aggregate all of those 20% slices and then get rid of a few people? Unfortunately it may not work that way. If I free up 20% of your time, you might have other things that fill up that 20%, and there may be no good way of measuring what else you are doing with it. Depending on whether that is knowledge work or other value-added work, it's very difficult to show that the freed-up 20% is being spent on something valuable, and equally hard to turn it into people being removed. Headcount reduction becomes challenging, but that's definitely one metric people use.
If I did the reverse, taking a particular task and eliminating 80% of the work through automation, leaving only 20% of the work to the individual, then it might be difficult for that individual to fill the other 80% of their time with "value-added work," or whatever they may be doing. That's where I think you need to be careful. The number of hours saved doesn't necessarily equate to headcount reduction; you can't just do the simple math. It really depends on how you are getting that saving. And this is just one level of getting deeper.
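A back-of-the-envelope sketch of why the simple math breaks down, using purely hypothetical numbers and an assumed "recoverable fraction" of saved time:

```python
# Hypothetical back-of-the-envelope numbers: naive hours-to-headcount math
# versus a more conservative view of how much freed-up time is actually recoverable.

people = 100
time_saved_per_person = 0.20          # the tool saves each person 20% of their week

naive_fte_saved = people * time_saved_per_person
print(naive_fte_saved)                # 20.0 "FTEs" -- the headline number

# In practice the 20% is scattered across many people and tends to be absorbed by
# other work; only some fraction can realistically be consolidated into headcount.
recoverable_fraction = 0.3            # assumption; varies widely by task and organization
realistic_fte_saved = naive_fte_saved * recoverable_fraction
print(realistic_fte_saved)            # 6.0 -- a very different ROI story
```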
Then of course the more problematic one is revenue enhancement. There are a number of measures where effectiveness-focused data science models help you target better: you are recommending things better, making better decisions and so on. Those add more to the top line as opposed to the bottom line. There are definitely ways to measure that, but you also need to subtract what would have happened anyway, which is why you should have the typical test-and-learn cycles and control groups.
The third one, which is more difficult to measure, is experience quality: more intangible things that still have an impact. And that's just on the return side; I haven't even gotten to the investment side yet. On the investment side people think, "Yeah, of course it's the dollars. It's the people, and whatever you pay your people." Yes. Some people might also count the compute and storage: the compute time and the storage dollars. But then there are other things that sometimes stay hidden.
If you're doing machine learning, you need a labeled data set, right? If you don't have one, labeling might be a significant cost. It might be a fixed cost upfront, and once you have paid it the variable cost is much lower and you can spread and scale it much better, but you still have to incur that labeling cost, or the data preparation cost. You need to be conscious of those.
If you go even broader, it depends on what compute you use. I know everyone is thinking about the sustainability side now. You're crunching through all of these algorithms and using more and more compute time: does it really make an appreciable difference in accuracy? If it's only a one percentage point difference for a huge amount of compute time, is it really worth it? And how do you determine whether it's worth it without actually doing it, right? That's why you need to estimate that upfront rather than saying, "You know, I'll run all my deep learning algorithms and everything else I need to run, and then determine the ROI."
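A tiny, purely illustrative comparison of that trade-off, with hypothetical accuracy and cost figures:

```python
# Hypothetical comparison of two model options: is one extra point of accuracy
# worth an order of magnitude more compute? All figures are illustrative only.

options = [
    # (name, accuracy, assumed training-plus-serving compute cost in dollars)
    ("gradient-boosted baseline", 0.91,  2_000),
    ("large deep-learning model", 0.92, 60_000),
]

base_name, base_acc, base_cost = options[0]
for name, acc, cost in options[1:]:
    extra_accuracy = acc - base_acc
    extra_cost = cost - base_cost
    print(f"{name}: +{extra_accuracy:.2%} accuracy for ${extra_cost:,} extra compute")
    print(f"  -> ${extra_cost / (extra_accuracy * 100):,.0f} per additional accuracy point")

# Whether that price per accuracy point is "worth it" depends on the business value
# each point creates, which is why it has to be estimated before running everything.
```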
Trying to break it down into hard and soft factors on the return and the investment is one task. There's a second issue, which is also very important, and that is the claim that the data science model is performing better: better than what? Often the ROI measure is that the algorithm does better than a human. That's easy, very understandable to everyone. But which human, right?
Let's say you're in an insurance company. Is it your best underwriter? Is the algorithm performing better than your best person? Is it one person, or is it a group of people, say all your underwriters? Is it better than the average, or is it better than your entry-level person? All of these might give you different answers, but you need to be clear going in as to what ROI you are expecting. When you measure the ROI for an initiative, you need that understanding, because it changes how you run your testing and your data science model. The person who tests it shouldn't be someone who just joined your company as an underwriter; that's not who you are measuring against. You need your average underwriter to be the one testing and validating the system from a user perspective. It matters what the ROI is and against whom you're measuring.
Then there is a further complication. Even if you decide which group it is, you might still have lots of noise. This is where the book Noise highlights that there's noise across multiple people looking at the same problem, and noise within the same person on the same problem: depending on the time of day and your mood, you might do different things.
DAVE COLE
Yeah.
ANAND RAO
And then of course, different people are simply different, right? All of those things have to be weighed in doing the comparison. So again, as you can see, it's a bit of a can of worms that opens up. That is the second aspect. The first is the definition of ROI, its hard and soft aspects; the second is the baseline, and how you are comparing against that baseline.
The third set of issues is that data science is more of a science than engineering. It's not software engineering; it is data science. If you go back to pharma, life sciences and drugs, a huge number of drug formulations are tried, and a very small proportion comes out the other end and gets approved by the authorities. That whole process takes literally 10 years and billions of dollars. I'm not suggesting data science is like that, but in some sense it is testing and learning. You have various phases, and you need to divide your thinking into those phases. In other words, you should think of your data science projects as a portfolio, where some projects use approaches that are by now well and truly tested.
Sentiment analysis, recommendations, chatbots: not that they're "easy," but we know there are hundreds, if not thousands, of companies that have done them. Given the right data and team, you can build them, right? For some of those, you can expect an ROI that you can estimate based on experience. At the same time, you want to be pushing the boundary into some of the newer areas.
Let's say you are embedding reinforcement learning within a simulation system, a digital twin: that's exploring newer techniques. You need to have a balanced portfolio. What we also recommend is that, as the person running a data science unit, you don't go and promise your stakeholders an ROI on every little project. Take it as an overall portfolio, and then say, "For this investment in this portfolio, I'll get you this return." There might be some models that turn out to be "blockbusters"; in other words, they'll save huge amounts of money for the firm, or generate a lot of revenue. There may be a few that just fail, and that's okay.
We need to have that portfolio mentality, as opposed to having our data scientists define an exact ROI for every data science project and insisting that if X hours are spent and you hit Y% accuracy, you need to generate Z dollars at the end of the day. That's not really going to work. If your measurement is structured that way, I'm pretty sure you're bound to fail, because data science is still not an area that is clear-cut like that. It's very much a science and an exploration; it's an experiment that you are running.
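A minimal sketch of what that portfolio view might look like on paper, with entirely hypothetical projects and dollar figures: the return is promised at the portfolio level, so one blockbuster can carry a failure or two.

```python
# Hypothetical portfolio of data science projects: (name, investment, expected return).
# The point is not the numbers, which are made up, but that ROI is judged on the
# portfolio as a whole rather than demanded of every individual project.

portfolio = [
    ("churn model (well-trodden)",          200_000,   600_000),
    ("chatbot (well-trodden)",              150_000,   300_000),
    ("pricing RL experiment (frontier)",    300_000,         0),   # fails, and that's okay
    ("digital-twin simulation (frontier)",  250_000, 2_000_000),   # the "blockbuster"
    ("forecasting refresh",                 100_000,   150_000),
]

total_invested = sum(inv for _, inv, _ in portfolio)
total_returned = sum(ret for _, _, ret in portfolio)

for name, inv, ret in portfolio:
    print(f"{name:38s} project-level ROI: {(ret - inv) / inv:+.0%}")

print(f"\nPortfolio ROI: {(total_returned - total_invested) / total_invested:+.0%}")
# One project returns -100% and another +700%; the portfolio as a whole is
# comfortably positive, which is the number worth promising to stakeholders.
```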
DAVE COLE
If you look at it just project by project, you run the risk of being very cost-conscious, right? You fixate on that one bad project, whereas if you look at the entire portfolio (I sound like a financial advisor here, but there is truth to it), you're not going to win them all. Taking the broader perspective is the right way to go. There is a test-and-learn approach to this, and there are unforeseen circumstances.
As you were talking through some of the harder benefits on the R side of the ROI analysis, like productivity and potential headcount savings, I was also thinking there's a combination: the human in the loop, the AI model plus the human. If you're just comparing your top underwriter to a model, the third option is: what if you combine both? What if the underwriter was using the model to inform their decision, right?
ANAND RAO
Yeah.
DAVE COLE
And that, I think, also is really interesting. We don't have time today, but if we did have time, I would be very curious to hear about rolling that out, and what that might look like.
We touched on a lot here today. Anand, I really appreciate you taking the time. We talked about, obviously, the ROI analysis. We also talked about responsible AI. We talked about building out our consulting practice. I learned a lot and I really appreciate you being on the Data Science Leaders podcast.
ANAND RAO
Thank you very much for having me, Dave. I enjoyed the conversation. Thank you.
DAVE COLE
Great. If people want to reach out and chat with you, is LinkedIn one of the better ways to do that?
ANAND RAO
That's good. Yeah.
DAVE COLE
Perfect. Thank you very much.
ANAND RAO
Thank you.