Data Science Leaders | Episode 23 | 37:32 | October 12, 2021
Tracking Business Value with Data Science Portfolio Management
You may not have a formal “portfolio management” function within your data science team, but in all likelihood, you’re executing some of its key components already.
But being more intentional around portfolio management can pay big dividends. Without it, you could be missing out on a powerful and holistic way of demonstrating the value your team provides to the business.
In this episode, Katya Hall, Director of Enterprise Analytics at McKesson, explains how the portfolio management process sets the groundwork for defining KPIs that track the actual value derived from predictive models and insights. Plus, she shares her thoughts on a process for validating model accuracy and managing risk.
- Tips for working with business counterparts
- Data science portfolio management
- Model risk management
- Supply chain analytics
Hello, welcome to another episode of the Data Science Leaders podcast. I am your host, Dave Cole. And today's guest is Katya Hall. Katya is the Director of Enterprise Analytics at McKesson. Prior to McKesson, she was the Senior Manager of Enterprise Data Science and Analytics at CUNA Mutual Group. Welcome, Katya, to the Data Science Leaders podcast.
Hello, thanks for having me!
Great. Well, on today's episode, we're going to be talking a little bit about your unique origin story of how you became a data science leader. A lot of folks out there wonder whether they can become data science leaders themselves, maybe without a PhD in stats. I think your story is certainly unique: you started your career more as an engagement manager, and now you're in a data science leader role. We're going to delve a bit into that.
I also want to talk about data science portfolio management: what is a portfolio when it pertains to data science?
We're going to delve into model risk management. I think there are a lot of companies, especially in regulated industries, that know model risk management very well. If you're not in a regulated industry, there are probably some great lessons in understanding that whole process and why it's important.
Last but not least, we'll delve into an area of expertise for you from a data science perspective and talk a little bit about the supply chain. We'll hear about some of those interesting use cases and learn about topics that might be universal from a data science perspective—many companies have supply chain challenges—and how data science is being applied.
Let's go ahead and just start off. So you started your career basically being an engagement manager, right? How did you make that leap to becoming a data science leader?
Sure. So being an engagement manager, I felt like it prepared me for working with business customers and understanding their problems. We did a lot of process optimization type work in an enterprise setting. That leap really happened when I was at CUNA Mutual Group; it was one of those serendipitous moments. I attended a presentation by our Chief Digital Officer at the time, and he spoke about the data science, predictive modeling, and prescriptive analytics they did back at Capital One. It just really blew me away. That's when I was bitten by the data science bug. It just seemed like black magic. I was totally fascinated by how you can take a lot of data and predict the future so accurately. My background was in macroeconomics, but this takes processing and understanding data to a different level.
It's a leap from statistics, from econometrics, into data science. From that fascination, I ended up running into him—Harsh Tiwari was our Chief Data Officer—and chatting with him about opportunities to work in his area. I knew that I could bring a lot of that organizational expertise, problem solving, working with the business, being that translator into the mix. My missing component was the data science knowledge and understanding. At that point, he and his team pointed me towards training that I could take. I enrolled in a couple of online programs.
First I did a Johns Hopkins program, Data Science for Executives or Data Science Management, something like that. Then I did a business analytics and data science program at the Wharton School of Management. It was a crash course. You don't become a data scientist from attending a program like that, but you start to really understand the art of the possible: what data science can do in an organization, what problems it can and cannot solve, the realities of data science in the real world, how to manage a team of data scientists, what some of the pitfalls are. It was very educational, and I was able to hop aboard the data science team.
My role initially was model risk management and portfolio management. Eventually, after a year, it evolved into a translator role where I worked with our sales and marketing folks in the B2B sales area. I helped them solve some of the problems around understanding customers and doing customer segmentation: how do we take the insights from customer segmentation and create action from them that benefits the business?
It takes a lot of grit and effort to take an online course in addition to your day job. When you mentioned that you were bitten by the data science bug, I'd say that's quite the bite. I'm impressed. It then sounds like you had an opportunity to actually work with the sales side. When I hear about executive courses or management training, my guess is that it doesn't delve too deep into hands-on data science work, but I could be wrong. For some of these projects that you were working on with the sales team, were you actually doing some of the data science, or were you managing the projects with a team of data scientists? How did you cut your teeth?
That's a double question; it has two sides. As part of the training, there was some hands-on work. We did build some optimization models. It wasn't anything particularly sophisticated, but we got a feel for how to build an optimizer and learned a lot about the concepts. You can talk the talk, but not necessarily walk the walk. You're still heavily leaning on the team of data scientists, while also learning how to best support them, give them the right tools, space and time, and how to talk sense into the business to protect the team, because the business will misinterpret things sometimes.
Not intentionally. I'm sure it's unintentional, right?
Yeah, they'll work themselves into a corner. There's a lot of art and science in even managing the process and the business and the team. I felt like I could contribute to that. I never personally wrote code for the customer segmentation model. It was more managing the process and making sure that those insights were being adopted. It was extremely interesting for me, learning to see just how many different avenues you can take customer segmentation insights down and how many different uses there are. It's kind of a foundational analysis that ends up growing a lot of additional insights on top of it, depending on who's applying the segmentation. Is it people in customer engagement, marketing and sales? Is it underwriting? They're different users, and then they start to customize it. It was more of a product owner role at that point in time.
Yeah. With customer segmentation, the goal typically is to better understand your customer, better understand what products they're interested in. What does the customer look like, that purchases product A, product B, etc.? It could be to figure out what the right marketing message is that will resonate with them; what is the right approach from a sales perspective—should you first hit them with a direct mail campaign or email campaign before you talk to them? By better understanding your customer, you can better understand how you can engage with them. Do they want to engage online, or do they want to engage on the phone or other ways, right? There's just so much you can do with that customer segment by better understanding your customer.
You brushed over something that I want to delve into, which I think is really important. You said that sometimes working with the business side can be challenging. That's something that comes up time and again, right, if you're a data science leader: managing expectations; properly being able to communicate status and, obviously, results. Are there any words of wisdom that you have for other data science leaders out there in terms of being able to work with business counterparts? What would your two minute crash course look like?
Well, I think you hit on that with managing expectations. You can build a model in a day, but it takes at least a year to build a really good model.
Setting correct expectations with the business is definitely key, but also interpreting the results and boiling them down, because you're communicating to a lay audience. You're potentially talking to them about giving you access to the sacred process they manage, where their job is on the line, their career is on the line if something goes wrong. You are asking them to trust your data, trust your model, trust your insights, and go ahead and execute on what the predictive or prescriptive analytics are telling you. It takes a lot of trust, and it takes time to build that trust. It probably starts with building some credible descriptive analytics for them first, so that they can start trusting your data set, your understanding of their domain, your understanding of their business.
They need to feel like they connect with you personally and that you get them and their business before they can start taking bets on the predictive insights that you are providing them. My other piece of advice would be patience. Be patient with your business customers, because building that true impact sometimes takes time, trust, and relationship building. I think that is key. Also, challenge them. Push the business to take that action. Sometimes we've had to actually illustrate, through data, the cost of inaction to the business.
Right. That's good, I like that.
Yeah. For example, when the business doesn't take the bets and doesn't build inventory where we're forecasting demand spikes, we demonstrate the omitted orders and what that's costing. Do you really want to keep sitting on your hands, or do you want to start taking bets?
I like that. You mentioned something in passing there that I think is really important to point out, which is one way in which to build that trust, right, is starting with descriptive analytics: simple visualizations of the data; data scientists actually saying what they're seeing, just looking at the data, asking if it looks right; sharing the distribution of orders by some customer demographic or some customer segment that you've built out, etc. Then you start that conversation, right? Then you have the folks on the business side who say, “Yeah, that looks about right. That's what I would expect to see,” or, “No, that looks a little off. I want to drill into that. I want to learn more.” Then you have a dialogue back and forth. As you go through those conversations, I imagine the data scientist is learning a heck of a lot, right, about how the business has traditionally worked, what gut instincts are out there, whether those gut instincts are right, or whether they need to be refreshed if they're no longer accurate. I think all of that is a great way to build that relationship.
Then you can say you actually want to build a model that predicts or classifies something, or take some existing process and use data science for it instead of gut instinct. I think that's great, and 'here's the cost of not taking your medicine' is a great insight as well. I want to talk a little bit about portfolio management. I'd never heard that concept as it pertains to data science, but I can imagine what it means. What does it mean to you? What is portfolio management?
It's the process that really allows you to take a bird's eye view of the body of work that your team has completed: what it's working on, what it's got in the backlog, etc. You may or may not have a formal function called portfolio management, but if you are a data science leader, the odds are you're at least informally doing it already. You have a list of use cases that you maintain; you understand what modeling methodologies are being used, who the business sponsor of each use case is, who the business owner is, who the SMEs (subject matter experts) are, what data is being used, what tooling is being used, and whether there's an opportunity to reuse those models elsewhere within the company, because the code can be reusable. Excellent models can be applied in different contexts.
What having more structured portfolio management allows you to do is, first of all, see how your body of work is evolving and performing. You can define key performance indicators for it. For example, how quickly do you turn around your work requests? How do they flow through the lifecycle stages of a use case? How well are you doing at intake? When you sit down with a stakeholder to understand what they want from you in a model, do you really dig in and understand the problem they're trying to solve, or the opportunity they want to capitalize on? Do you structure your hypothesis well, so that you can then prove it out and say, “Here's how I'm going to solve your problem with this kind of model,” or an analytical approach, and really improve the rigor of that?
Then, also, how do you monitor adoption of the use cases with the business? Good portfolio management can really shine a light on the lack of adoption. You can start to diagnose what's going on. Are you throwing use cases over the wall and saying, “Hey, I built a dashboard for you. Why don't you go play with it, and tell me how you feel about it,” versus, “Hey, I've got a demand forecast for you, and here's the action you want to take with it. Please would you go take that action?” You're giving them a to-do list rather than an insight to play with. Then you can start to track the value that your insights are driving in the organization, through that action that the business takes.
At McKesson we've been able to implement a lot of rigor around that, which means we partner with the business and we build financial models around what value these use cases bring to the business. Then we bring in financial analysts who review the model and validate how we track the value of the action that the business takes on our predictive models and insights. We take it to a pretty deep transactional level. Every hold placed on the inventory, every buy, every price adjustment can all be tracked through the data, because we're already swimming in the data. We might as well build a dashboard that quantifies the business lift that is happening through the action.
You can roll that up into a portfolio view. The beauty of doing that is you start to create a complete value proposition for your data science team. You're showing the dollars you're creating for the organization. If you ever want to grow the team, you have yourself a platform of value to stand on. You can show the body of work; you can say, “Here's what we've done, here's what we intend to do. We needed this many people to do this much. We could do this much more if we had these people, because these are the opportunities we have scoped, and this is what they're worth.” It just makes a much more powerful value story.
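To make that portfolio view concrete, here is a minimal Python sketch of the kind of use-case inventory and value roll-up being described. Every field, use-case name, stage, and dollar figure is an illustrative assumption, not McKesson's actual portfolio or tooling.

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class UseCase:
    name: str
    stage: str               # e.g. "intake", "pilot", "production", "on_hold"
    sponsor: str             # business sponsor
    methodology: str         # modeling approach, useful for reuse scans
    days_in_flight: int      # time since intake, a cycle-time KPI input
    annual_value_usd: float  # finance-validated value; 0 if informational

# Illustrative portfolio entries; names, stages, and values are invented.
portfolio = [
    UseCase("supply_disruption", "production", "Purchasing", "ensemble", 420, 4_000_000),
    UseCase("regional_demand", "production", "Inventory Mgmt", "time_series", 310, 1_500_000),
    UseCase("ops_report_automation", "production", "DC Ops", "descriptive", 90, 0),
    UseCase("retail_insights", "on_hold", "Marketing", "segmentation", 200, 0),
]

# Roll up the portfolio: use-case count and annual value by lifecycle stage.
by_stage = defaultdict(lambda: {"count": 0, "value": 0.0})
for uc in portfolio:
    by_stage[uc.stage]["count"] += 1
    by_stage[uc.stage]["value"] += uc.annual_value_usd

for stage, agg in by_stage.items():
    print(f"{stage:>12}: {agg['count']} use cases, ${agg['value']:,.0f}/yr")

# A simple turnaround KPI: average days in flight for active work.
active = [uc for uc in portfolio if uc.stage != "on_hold"]
print(f"avg days in flight: {sum(uc.days_in_flight for uc in active) / len(active):.0f}")
```

The point of the structure, rather than the specific fields, is that once the inventory exists as data, the KPIs and the value story fall out of simple aggregations.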
So, Katya, the picture I have in my mind, right, is there's some visualization, some dashboard, right, that shows the portfolio of model assets. It could be anything from a visualization to an actual model, right. Then there's some team that is responsible for putting this together and doing the financial analysis. One of the hard parts here is actually being able to tie the output of the data science team to a change of behavior. This team of folks, I think you called them financial analysts, is focused on looking at that change of behavior.
Is there an A/B test or something that's going on here? How is that done? How is the financial analyst actually being able to tie that change in action to something that your team has put together?
Yeah, let me give you an example. We have a predictive model that looks for patterns in the data indicating a key supply disruption: some kind of upstream supply chain event is happening. There's a ship stuck in the Suez Canal and some pharmaceuticals are impacted; all of a sudden we can't get Metformin in the building. We actually have a very accurate model, probably in the 95th percentile, that always comes through for us. We bubble up these disruption predictions to our purchasing team; we know the action they're going to take is to buy the product before it's disrupted (before it disappears off the manufacturer rosters, formularies rather).
When they buy it, that's quantifiable inventory that they will then sell through, and that revenue, or the operating margin, whichever way we want to track the value, is something we can quantify. We can see it very easily. We know the product. We know when we gave them notice. We know how much they were able to buy. We know when it did get disrupted. We know when it all sold, because we have all the data from the enterprise resource planning system. We can then quantify it ourselves. So the data scientists who built the model, as the interested party, show the value that's been produced. Then we have the business look it over.
We have finance look it over. Finance will validate the approach and say whether it makes sense: are we quantifying revenue correctly, in alignment with how we do P&L statements? We get everyone's alignment. This is a defensible number that we can then put on the books and say the supply disruption model delivered X millions of dollars worth of margin.
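As a back-of-the-envelope illustration of that quantification, here is a hedged Python sketch: given hypothetical early-buy records (product, units bought on the model's alert, units sold through, per-unit margin), it computes the margin attributable to the action. The record fields and all numbers are invented for the example, not actual figures or the finance team's method.

```python
# Hypothetical early-buy records: units bought on the model's alert,
# units sold through after the disruption, and per-unit margin.
early_buys = [
    {"product": "metformin_500mg", "units_bought": 120_000,
     "units_sold": 118_500, "unit_margin_usd": 0.42},
    {"product": "albuterol_inh",   "units_bought": 40_000,
     "units_sold": 36_000, "unit_margin_usd": 1.10},
]

def realized_margin(rec: dict) -> float:
    # Only credit units actually sold through; unsold units are
    # inventory carrying risk, not realized value.
    return min(rec["units_sold"], rec["units_bought"]) * rec["unit_margin_usd"]

total = sum(realized_margin(r) for r in early_buys)
print(f"model-attributed margin: ${total:,.0f}")
```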
So in this case, in this example, the data scientist is actually responsible for putting together the analysis of the ROI, then the finance team is responsible for blessing the method and the approach. Do I have that right? Then at some point you're aggregating the value in some way, shape or form into some sort of cross-portfolio view. So by portfolio management, it's that ability to look across all of your data science output, whether models or visualizations, and see where you're driving the most value. I imagine that there are some stinkers out there, right? There might be some model that's just sitting out there that isn't providing a whole heck of a lot of value. What do you do in that case?
We accept that a lot of the portfolio constituents will not deliver value. Some of it is accepted from the get-go as just informative. It might be just a report that helps the business operate, something they've already been doing that maybe we automated for them. We call those relationship builders: they make your life easier, and we get to know your business and data. Then, as we learn whether there's an opportunity, we'll recommend a truly impactful, robust predictive model. Not everything is going to create value, and that's okay. We prioritize work that does create value. As soon as we see an opportunity, we will definitely want to push it forward. We will alert the business that there's an opportunity.
Sometimes we find those opportunities. Sometimes the business will bring it to us and say that we have this inefficiency here and ask if they can work with us to solve it. It's bidirectional. Sometimes we think it's going to deliver value and the hypothesis doesn't check out. We do a proof of concept or a pilot and the analytical solution doesn't solve the problem, for whatever reason. It just doesn't work sometimes and that's okay. We fail fast. We recognize that, we pivot, maybe we change our approach. Maybe we didn't have the right data. That happened to us once, where we were sourcing retail insights and the data just never quite materialized in the richness and the volume that we were expecting. We put the whole project on hold and we went seeking additional third party sources and we weren't able to pursue it. Curve balls happen.
That's another thing with portfolio management. You create a forum to make those decisions. “How is this project going? Where are we? Are we stuck? Are we moving forward? Do we have insurmountable dependencies here? How can we help? Which levers can we push to keep the projects engaged and energized and moving in the right direction?”
Another thing is having this place to have those conversations to keep the value moving forward.
You mentioned conversations. I imagine there are lots of great conversations about where resources should be placed based on taking that portfolio view, and also about identifying inefficiencies, say similar models that offer consolidation opportunities, by taking that portfolio view. I think this is a great recommendation to all data science teams out there: take that portfolio view. I wanted to segue into the next topic: model risk management. Help the audience understand: what is model risk management?
It's actually a pretty robust discipline. I think it originated in the banking sector, and there's quite a bit of regulation around it; I think it's called SR 11-7. It's a whole playbook for how to do model risk management, and it's surprisingly readable.
It will be my next bedtime read for sure.
There you go, if you can't sleep at night. Model risk management, especially keeping your models accurate, is like model hygiene: keeping your models running well and healthily and accurately. There's a surprising number of variables and factors that can impact that accuracy and can derail your models. The good old 'garbage in, garbage out', of course: data quality. You're also relying on the platform and tools, with a lot of moving parts there too. If you look at your end-to-end data flow through the modeling environment, things can go wrong from the initial SQL query all the way through the scoring algorithms, all the way out to egress to your downstream applications: a lot of moving parts, so things can go wrong. Part of model risk management rigor is to really examine that end-to-end flow and find what are called failure modes and effects.
We did these exercises. It's fascinating how you can really visualize the entire flow of data and bring together the people who work in those environments (platform engineers, cloud engineers, data engineers, data scientists), all thinking through it in a safe place, asking what can go wrong at each step. Before you know it, you've got a list of 30 steps and 50 risks of things that can go wrong. A lot of them will tell you where things have gone wrong if you give them a safe space. Then you work out how to prioritize controls: how to plug those holes and soft spots in the process where things have gone wrong. Do you create a test suite that runs every time you merge new code? Then that's your merge criterion: before you merge new code, you have to run thorough tests.
Do you have good unit test coverage, so that you're not leaving any parts or packages untested on merge, plus regression testing? A lot of different controls can go in, and there are diminishing returns, right? You can control the heck out of the whole process to the point where you're not building any models and you're just doing controls all day long. There's a balance, a sweet spot of risk appetite, that you have to determine: what are you comfortable with in your context and environment? Is it life or death? Are you curing patients in an ICU? Or are you doing a marketing campaign?
How expensive is your marketing campaign? How many dollars are on the line? Quantifying the risk is important, and we were fortunate enough that we were in an insurance context. We had access to tremendous risk management professionals and actuaries who sat down and looked at the process with us and showed us how much risk was trapped in there, dollar values.
“If you do this, this and this, you can slash this much risk off and squeeze it out of the process.” It was like, “Wow, we know what to do now.”
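One common way to prioritize the controls coming out of a failure modes and effects exercise like the one described above is a risk priority number (severity × occurrence × detectability). Here is a small Python sketch of that ranking; the failure modes and the 1-10 scores are made-up examples, not the actual list from that exercise.

```python
# Each failure mode scored 1-10 on severity, likelihood of occurrence,
# and difficulty of detection, as in a standard FMEA exercise.
# All entries and scores below are illustrative.
failure_modes = [
    ("upstream SQL query silently returns partial data", 8, 4, 7),
    ("feature drift after source schema change",          7, 5, 6),
    ("scoring job egress fails to downstream app",        6, 3, 2),
    ("untested package bump breaks the pipeline",         5, 6, 4),
]

# Risk Priority Number: severity * occurrence * detection.
# Higher RPN means build that control first.
ranked = sorted(failure_modes, key=lambda fm: fm[1] * fm[2] * fm[3], reverse=True)
for desc, sev, occ, det in ranked:
    print(f"RPN {sev * occ * det:>3}  {desc}")
```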
So just to summarize, I think the first step is getting together the various members of the team who are actually responsible for producing the final output: the final model. Obviously those are the data scientists, but then you have cloud engineers, ML engineers and other counterparts who are all involved in this process. You get them together, you get them talking about all the potential failure points along the way, from drift in the data, to not receiving some data, to the model going down while it's doing some real-time scoring, or what have you. Once you have identified all those various points, that will inform how you go ahead and test the model.
It also informs what controls you put in place to try to mitigate the potential risk: "If this failure were to happen, how do we get alerted so that we can be proactive about fixing and resolving it?" But all of this needs to be balanced against speed, right? You have the risk and what your appetite for risk is, the use case, the actual problem that your model is attempting to solve, and, if it were to have a catastrophic failure, what's truly at risk. That risk needs to be balanced with your ability to be agile, right, your ability to quickly retrain the model and keep its accuracy high.
Or just get it out into production because it's going to have some obvious lift over the previous process that you might have had in place. If it is life-threatening, if it is a critical model, then I imagine the amount of testing and the amount of rigor that goes into the controls is of critical importance. There are a lot of lessons there, around model risk management, that should be applied. All data science teams should be thinking about model risk and these controls and monitoring and looking at the data coming in. I think it's pretty universal.
Yeah, I would agree with you. I had the most hands-on experience with deploying model risk management in a regulated industry, but the main reason the model risk exercise bubbled up to top priority for us was the dollars we were losing. At some point, when the model reaches a certain level of adoption, the business is really placing bets (millions of dollars) on the predictions the model is providing. Your risk grows with that. It was a marketing campaign that unfortunately crashed and burned because of a model risk problem. It was actually caused by a failure in master data management. We went through a system migration; we moved our master data from one system to another, and it created false new leads in the system.
For some folks, the exceptionally conservative find-and-match algorithms created duplicates that the scoring model did not recognize as recipients of past campaigns, who of course are going to be less responsive to your campaign. It ranked them as brand new 'fresh meat' leads and put them at the top of the campaign. That campaign wasn't very effective compared to previous campaigns. So there's understanding that master data management is a link in the chain that can fail you in an ugly way; it's about understanding that entire environment and what impacts your models. That's the end to end. But also, of course, model validation, and having what's called 'effective challenge': having another data scientist come in and be able to reproduce the model from the ground up, from nothing, retrain it on a fresh data set and see if they...
Just from the documentation and from like a different sample data set?
Exactly, reproduce the steps laid out. That validates that the documentation was accurate, and that the model research is reproducible: very important. You're getting the same results. You're getting the same accuracy, the same rankings and predictions. That's critical. It takes a lot of work and a lot of commitment, so you're not going to do it for every single model. That's where the model inventory comes into play. You catalog all your models and you rank them in terms of importance: financial impact, how critical each is, whether there's reputational risk, whether there's monetary risk. Then you create a top tier of models where nothing can go wrong, and that's where you're going to put all your rigor and effort: validating the models, scrubbing the environment, putting all the controls in so that the data flows cleanly through that tier. The rest can be a little looser, because you just don't have unlimited resources.
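Here is a minimal Python sketch of that tiering idea, assuming a made-up inventory and simple 1-5 scores on the risk axes mentioned (financial impact, reputational risk, criticality). Real model risk frameworks weight and document these far more carefully; this only shows the mechanics.

```python
# Hypothetical model inventory; all names and scores are invented.
inventory = [
    # (model, financial impact 1-5, reputational risk 1-5, criticality 1-5)
    ("supply_disruption", 5, 4, 5),
    ("b2b_lead_scoring",  4, 3, 3),
    ("customer_segments", 2, 2, 2),
    ("ops_report",        1, 1, 1),
]

def tier(fin: int, rep: int, crit: int) -> str:
    # Simple additive score; the thresholds are arbitrary illustrations.
    score = fin + rep + crit
    if score >= 12:
        return "tier 1: full validation, effective challenge, tight controls"
    if score >= 7:
        return "tier 2: periodic review, automated monitoring"
    return "tier 3: lightweight checks"

for name, fin, rep, crit in inventory:
    print(f"{name:>20} -> {tier(fin, rep, crit)}")
```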
That makes perfect sense. Let's segue into the last topic here, which is an expertise of yours: data science and how it applies to the supply chain. What are some of the typical supply chain related use cases and challenges that you think any company with a supply chain in its mix should be thinking about, and applying data science to?
Sure. So we have a supply chain center of excellence in enterprise analytics, focused completely on all things inventory management, labor management, and transportation management: the three main pillars, so to speak. Distribution center operations are also part of that, of course. I would say we've done the most use cases to date in the inventory management space, concerning demand forecasting, getting really good at longer-term demand (30- and 60-day demand). We can communicate that to our suppliers upstream and get their commitment to supply to those levels. We've really improved the accuracy of those models, and it forms a binding commitment with our suppliers: they have to supply to that level, and it has improved our inventory positions because of that accuracy. Then there's short-term demand forecasting, so that we can respond to things like regional COVID-19 outbreaks.
We have a model that monitors patient dispense regionally and responds to that patient demand. We can move inventory to where the outbreaks are happening. It doesn't have to be COVID. It can be seasonal allergies: pollen levels all of a sudden spiked over here in Georgia, so we moved the right product there as needed. That's more of a short-term forecast model. Supply disruption is a really big one. We've got an ensemble model running there, monitoring purchase order fulfillment patterns upstream from suppliers, and then monitoring 'like products'. There are products that correlate and move together in a disruption pattern: this one gets disrupted after that one, so you monitor that.
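To illustrate the 'like products' idea, here is a hedged Python sketch using NumPy: it flags product pairs whose purchase-order fill rates are highly correlated, so a disruption in one can serve as an early warning for the other. The products, fill-rate series, and the 0.9 threshold are fabricated for the example, and the actual ensemble model is certainly more sophisticated than a pairwise correlation.

```python
import numpy as np

# Hypothetical weekly PO fill rates (units fulfilled / units ordered).
fill_rates = {
    "drug_a": np.array([0.98, 0.97, 0.95, 0.70, 0.40, 0.35]),
    "drug_b": np.array([0.99, 0.98, 0.96, 0.75, 0.50, 0.42]),  # moves with drug_a
    "drug_c": np.array([0.97, 0.98, 0.97, 0.98, 0.96, 0.97]),  # stable
}

# Flag 'like products': pairs whose fill-rate series are highly correlated,
# so a disruption in one is an early warning for the other.
names = list(fill_rates)
for i, a in enumerate(names):
    for b in names[i + 1:]:
        r = np.corrcoef(fill_rates[a], fill_rates[b])[0, 1]
        if r > 0.9:
            print(f"{a} and {b} move together (r={r:.2f}), watch both")
```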
So this part of the supply chain gets disrupted in some form or fashion, then this is how it impacts supplies downstream. Is that right?
Yeah, kind of. That's something where we're branching out now to start looking at externalities like shipping disruptions or port disruptions, or manufacturing in India, API (active pharmaceutical ingredient) manufacturing. We're starting to really partner. We have a global security operations center, and they monitor several different locales stateside, but also abroad, looking for weather disruptions, social unrest, and security issues. We are starting to look at how we can translate that into supply chain disruptions: how to connect the dots from geography to manufacturers to products, so we can weave that fabric of supply chain understanding and have greater lead time in anticipating disruptions for certain products.
That's something that we're starting to look at now. Then transportation, just having transparency of movement of goods. Are there route optimization opportunities? Are there ways we can just ultimately save money for patients by being more efficient as to how we move goods?
We're playing with the concept of a digital twin of the distribution network. We are simulating how our network will respond to different external factors: demand fluctuations, supply fluctuations. We try out different configurations of customers dedicated to certain distribution centers and products, playing it out without changing things in the physical world. Can we do that digitally and arrive at an accurate representation of what that could look like? That's something we're working towards.
We've got a version of that on the inventory side, so we're working to improve and expand on it. The latest cool thing we've done: we're just getting started on an initiative to take sensor, spatial, and video data in distribution centers and convert it into structured data and insights, to be able to catch things related to risk, security and inefficiencies. That's a pilot that's spinning up right now that we're pretty excited about.
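For a flavor of the digital-twin idea described above, here is a toy Python simulation: two hypothetical customer-to-distribution-center configurations are played out against random demand, comparing how much demand goes unfilled. Capacities, demand distributions, and configurations are all invented; a real network twin would be vastly richer than this sketch.

```python
import random

# Toy digital-twin sketch: compare two DC-assignment configurations
# against simulated daily demand. All numbers are illustrative,
# not McKesson's actual network.
random.seed(7)
configs = {
    "current":  {"dc_east": ["cust1", "cust2", "cust3"], "dc_west": ["cust4"]},
    "proposed": {"dc_east": ["cust1", "cust2"], "dc_west": ["cust3", "cust4"]},
}
DC_CAPACITY = 250  # units per day each DC can ship (assumed)

def simulate(config: dict, days: int = 1000) -> float:
    """Return the fraction of total demand that goes unfilled."""
    unfilled, total = 0.0, 0.0
    for _ in range(days):
        for dc, customers in config.items():
            # Each customer's daily demand: roughly 100 units, noisy.
            demand = sum(random.gauss(100, 30) for _ in customers)
            total += demand
            unfilled += max(0.0, demand - DC_CAPACITY)
    return unfilled / total

for name, cfg in configs.items():
    print(f"{name}: {simulate(cfg):.1%} of demand unfilled")
```

The payoff of even a toy version is the one Katya names: you can rebalance customers across distribution centers on paper and see the effect before touching the physical world.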
That's great. Wow. I learned a lot in this episode. We started out talking about your career and how you became a data science leader. We talked about portfolio management. We talked about model risk management, and finally we talked through some use cases in the supply chain: how data science is able to improve the flow of goods and make sure inventories are always kept up to date, in order to satisfy patients and create a great customer experience. I thoroughly enjoyed it, Katya; I learned a lot. Thank you so much for being on the Data Science Leaders podcast. Is there a way in which people can get in contact with you if they have any follow-on questions?
Yeah, I'm on LinkedIn, just look up Katya Hall at McKesson and I'll be happy to chat with you and answer any questions you might have.
Fantastic. Well, thank you so much for being on the Data Science Leaders podcast, Katya. I appreciate it.
Thanks so much, Dave.
About the show
Data Science Leaders is a podcast for data science teams that are pushing the limits of what machine learning models can do at the world’s most impactful companies.
In each episode, host Dave Cole interviews a leader in data science. We'll discuss how to build and enable data science teams, create scalable processes, collaborate cross-functionally, communicate with business stakeholders, and more.
Our conversations will be full of real stories, breakthrough strategies, and critical insights—all data points to build your own model for enterprise data science success.