
Data Challenges and the Promising Role of Product Analytics in Healthcare
Summary
Transcript
In a perfect world, healthcare data would always be strategically organized, up-to-date, and easily accessible—all in a patient-centered, privacy-first way. But the reality is much more complex.
Robin Foreman, Director of Data Science at CVS Health, joins the show to discuss the challenging world of data science in clinical trials. She also explains how product analytics can be used on the back end of model implementation to answer the key question of “did it work?”
Robin shared her perspective on:
- Turning a PhD in public health into a career as a data science leader
- Navigating data science and clinical trials
- The life cycle of product analytics
DAVE COLE
Welcome to another episode of the Data Science Leaders podcast. I’m your host, Dave Cole. Today our guest is Robin Foreman. Robin is the Director of Data Science at CVS Health. Prior to that, she received her PhD in public health from the University of South Florida. Robin, welcome to the Data Science Leaders podcast!
ROBIN FOREMAN
Thank you for having me! Just as a quick note, all of my opinions are my own and not necessarily representative of my company, but I'm excited to share them with you.
DAVE COLE
And that's why you're here: I want to hear what you have to say, not what your company has to say. That's going to be probably where we start, because there are a few topics I want to dive into today. Your path from public health to a data science leader is kind of a unique one and an interesting one. I want to dive in and hear a bit about that because I think what's interesting about our various data science leaders is how they become data science leaders. You may not always start with a PhD in statistics. You do have a PhD, but not in statistics, but we'll talk about how that played a role in what you're doing today.
The other thing that's interesting is that you have expertise around data science and clinical trials. We'll dive into a little bit of that. I'm going to get educated on what that means for your world.
Last but not least, we'll talk a bit about product analytics. Before we dive into all of those topics, CVS is a big place. Can you at least talk to us about what CVS Health is, specifically?
ROBIN FOREMAN
Oh, that is a tough question. Many of you are not aware that CVS has their hands in all aspects of the healthcare system. We have the pharmacy, which is probably our most recognizable piece of it. CVS recently bought out Aetna health insurance, so some of you may have had our health insurance; I came from that side. We sell health insurance, we do the pharmacy, pharmacy benefit management and continue to try and expand. CVS really wants to be a healthcare company and not kind of siloed into these traditional areas. Right now that's really the goal of the company.
DAVE COLE
Got it. Let's dive into your journey into becoming a data science leader. You started out of school, you got your PhD in public health. What did you want to be, back in the day when you were getting your PhD? What was your goal? What did you think you'd be doing right now?
ROBIN FOREMAN
I think I started before my PhD, so I'm going to jump back a little bit.
DAVE COLE
Okay. All right, great!
ROBIN FOREMAN
I got my Bachelor's degree in mathematics and psychology, which everybody told me was a strange combo…it probably was. After graduating, I went on to get my Master's degree in forensic psychology. I gave up the math aspect of it and, while doing that, I worked as a mental health counselor. I worked with people with serious mental illness and did that for a few years, then decided that direct client care was not what I wanted to do. As part of my Master's, there was an internship. While I was an intern, my mentor was like, "Hey Robin, you seem really good at the research aspect. Have you ever thought about going into public health?"
I responded, "I don't even really know what that is," but he introduced me to someone else who said, "It's very heavy statistics. It would bring you back to math. It's still trying to improve health and you can still focus on people with mental illness, but get some of the math back in there."
DAVE COLE
On a broader scale.
ROBIN FOREMAN
Yeah, so that's why I went to get my PhD. While working on my PhD, my focus was really on comorbidity, which means people with mental illness and medical conditions.
Some people would cross both boundaries. I got specific within that so I focused on people who were involved with the criminal justice system. I looked at interventions that could potentially keep people out of the criminal justice system and get them the right kind of care.
DAVE COLE
When you talk about comorbidity, I assume you're talking specifically about mental care and mental health.
ROBIN FOREMAN
Yeah. A lot of it is mental health care, but there are also people with mental health conditions who have medical conditions too. Years ago I had a client with a pretty severe mental health condition. She also had a history of heart problems in her family. Because of her mental illness, doctors didn't really ever pay attention to the history of heart problems in her family. She had a heart attack and died when she was 60. It was devastating, but it's nobody's fault, right? It's the way our system functions. We have silos. That's what I was focusing on in my PhD: how do you bring these systems back together to really treat people holistically?
DAVE COLE
Right.
ROBIN FOREMAN
Which doesn't, at all, sound like data science.
DAVE COLE
Well, it can. It depends. I'd imagine that data science can play a role in helping treat patients holistically, and not just being very siloed, right? Obviously in the medical profession people specialize, which is great because they become experts in heart disease and oncology and so on and so forth. You always want to be working with an expert, but I think what you're saying is that some of these things interplay.
ROBIN FOREMAN
They do.
DAVE COLE
They can either be overlooked or they can interrelate. If you don't see the forest for the trees, if you don't take a step back, then you're not able to treat the patient in a great way. Carry on with your journey. Sorry for that little segue.
ROBIN FOREMAN
No, it was great, but yeah you’re right. I was doing my PhD, taking all these statistics courses as my electives. I found them interesting. Of course, this is where data science comes in, taking stats courses as electives.
That's where I realized that if I ever want to have a large-scale impact, I want to focus on the quantitative side of things. That's where my interest is and what I'm good at. If I really want to fix some of these systemic problems, data science can be that avenue.
DAVE COLE
Right.
ROBIN FOREMAN
Back then we didn't call it data science. I think it was still referred to as analytics but, you know, it’s the same thing.
DAVE COLE
Great. You decided that your love of statistics was really the way for you to impact people; many more people than working one-on-one. You came out with this PhD in public health and then how did you then get into data science? Did you just start as a data scientist?
ROBIN FOREMAN
Yeah, I got a job working as a data analyst and then moved into a role that, I think, was called a research analyst. I did that for a couple of years and then applied to my current company as senior data scientist, which is what it was at the time. It was an interesting time, when titles were changing. It used to be analysts or informatics or things like that. That was the point where data science was really coming out as a job title. The role sounded interesting. It was similar to my background so I jumped on it and I've been in data science ever since.
DAVE COLE
That’s great. I see this all the time. Data science and data scientists have become quite the job titles, quite the buzzwords. There are a lot of analysts out there who might have more data visualization expertise or a BI (business intelligence) sort of expertise. They're wanting to move into data science. Some of them are going back and getting degrees, or taking courses in statistics and data science, to become data scientists. Others are just sort of rebranding.
You happen to have, I imagine with those electives, a strong foundation in statistics, but is there anything else that you did? Or did you think, "Hey, this is right up my alley. It's easy for me to move from data analyst and change my title to be data scientist because that's basically what I am."
ROBIN FOREMAN
That's an interesting question. I was trained a little bit more classically, so you would think of traditional statistics. Data scientists nowadays are not trained classically. You still get people with a statistics background, but even that's a little different. It's more a focus on predictive modeling. When I was trained, the focus was more on inferential models. To give a sense of the difference in practice: now they call logistic regression a predictive model, which it is, but it used to be treated as an inferential one. So, there's some overlap there, but some of the machine learning techniques weren't as popular back then. For me, I learned those on the job whereas now there's this expectation to come into data science already pre-trained in all of those. I think it's forever changing. Even when I talk to other data scientists, there are things they do that are completely out of my wheelhouse.
DAVE COLE
Before we go there...when you say, "I was more classically trained," I assume you mean classically trained as a statistician?
ROBIN FOREMAN
Yeah. I was doing means by hand. You had to calculate chi-square by hand, and learn how that actually worked, and do your first linear regression by hand.
DAVE COLE
Yeah. I don't think it's a slight against somebody who's trained as a data scientist, but what I've seen when I talk to guests about the difference between a classical statistician and a data scientist is that the data scientist is a little bit more focused on the end result: "I don't care quite how we get..." There's a little bit of "the ends justify the means. I don't care what algorithm is the right one, as long as the accuracy is as high as possible and we don't overfit," and that kind of thing. A statistician really takes a lot of care in figuring out which features and which model to use, really understanding the nature of the data set they're working with. Is that a fair summary? I'm not sure if I'm onto something there.
ROBIN FOREMAN
Yeah, I do think that's fair. I think it depends on the organization that you're working for and on which of those skill sets is most important. Even within my company, there are certain areas where accuracy is the most important thing. There are other areas where you're building models and people are going to ask you, "What's in it? What features are in it? What is popping up as being important?"
I don't think that one way is better than the other. Honestly, I think a combination of those skills makes you the most effective data scientist.
DAVE COLE
Yeah. I can see both backgrounds being very complementary in a lot of ways, in working together. I think one piece of hidden advice coming out of this conversation is just thinking about that mix: classically trained statisticians who became labeled as data scientists, and folks who came up through the data science ranks, maybe with more of an engineering background and an interest in statistics. I think that's an interesting combination.
Let's talk a little bit about what you're doing now, though. I do want to talk about clinical trials and controlled experiments and RCTs—I'll let you fill in that acronym. It sounds like they're a big part of what you do. Talk to us a little bit about what it is that CVS Health does.
ROBIN FOREMAN
I'll try to keep it kind of broad. The team that I work on, we really are focused on people who have chronic conditions and how we can help people better manage their chronic conditions so that they stay healthier and stay out of the hospital. It’s a tall order but we have programs in place where we employ nurses who reach out to people and try to help them manage their condition: helping get medication, talking through treatment plans or wherever people need help. Historically a lot of that was based on, “This is who we think we should be talking to,” so it might've just been by diagnosis, “anybody with hypertension: we should talk to them." Over time, as data science has gotten more embedded in some of those clinical processes, we've been able to drive that conversation.
Where we were previously at, "Hey, we're just going to reach out to everyone with hypertension,” they were asking, "Does it work? Does it do anything?" Your response then is kind of like, "Well, it's hard to tell."
We went from all that to saying, "Hey, why don't we build a predictive model based on future risks? We know which of these people are going to end up in the hospital. Build a predictive model to see this risk level. Then instead of reaching out to everyone, do some randomization." What that means, and this is the randomized controlled trial (RCT), is holding back a small control group from any kind of intervention and reaching out to everyone else, then asking: did the people we reached out to actually do better than the people who were held back?
In a perfect world, that's how an RCT works. We live in the real world where there's not just one program; there's overlap. There are people that are held back, probably getting some other kind of treatment. It's incremental: on top of whatever else they would have gotten, is it better? So for us, we have that predictive analytics aspect but we also need to be able to say, "Does whatever we're doing work? And is it worthwhile to keep doing it?" That's why we're trying to move towards RCTs, but it's complicated in healthcare.
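As a rough illustration of the holdout design described above, here is a minimal Python sketch. The function names, the 10% holdout fraction, and the integer member IDs are assumptions made for illustration; nothing here comes from the conversation or reflects how any company actually implements this.

```python
import random


def assign_holdout(member_ids, holdout_fraction=0.1, seed=0):
    """Randomly split members into an outreach group and a held-back control group."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(member_ids)
    rng.shuffle(shuffled)
    n_control = round(len(shuffled) * holdout_fraction)
    return {"control": shuffled[:n_control], "outreach": shuffled[n_control:]}


def rate(members, hospitalized):
    """Share of a group that ended up in the hospital."""
    return sum(1 for m in members if m in hospitalized) / len(members)


def incremental_effect(groups, hospitalized):
    """Outreach rate minus control rate; a negative value means fewer hospitalizations."""
    return rate(groups["outreach"], hospitalized) - rate(groups["control"], hospitalized)


# Hypothetical population of 1,000 member IDs (assumption for illustration).
groups = assign_holdout(range(1000), holdout_fraction=0.1, seed=42)
```

Once outcomes are observed, `incremental_effect(groups, hospitalized)` takes a set of member IDs that were hospitalized and returns the difference in rates. In the overlapping-program setting described above, that number is the effect on top of whatever else both groups received, not the effect compared with no care at all.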
DAVE COLE
Right. It's complicated because the patient is receiving all sorts of different treatments and not all of that may be represented in the data. The data set might be just a subset of what is actually going on in their world. It's hard, right, to work with an incomplete data set as a data scientist, even when you know it's incomplete? I imagine that's one of the big challenges. Are there any tips or tricks that you’ve found that work well?
ROBIN FOREMAN
That's why we're trying to do randomization, to say, "We know we don't have everything but if we randomize we should be missing the same information for both groups."
DAVE COLE
Right.
ROBIN FOREMAN
Does it always work? No. When you get into the nitty gritty of randomization with people, it gets really tough.
If you're thinking about a marketing campaign, let's say I'm on the internet, I get an ad for a pair of pants. I'm like, "Oh, I like those pants." I click on it, right? From that perspective, it’s really easy to say, "Oh, we randomly chose Robin. We served the pants ad to her, she clicked on it and bought the pants." You can really easily follow this through. When you talk about health care, like when we talk about our programs, it's not a simple click-through rate.
DAVE COLE
Right.
ROBIN FOREMAN
We have people involved here. We actually have to call up the members or the patient and then get them to agree. To track that, you need a system. That's added complexity. It's not all in a single website where you can track and see the transactional data. We don't have that because there are people involved in different steps and you have to track it. It gets really complicated and there are a lot of places where the randomization can fail. It's nobody's fault but you have to be really precise about where in the process you put the randomization into place.
DAVE COLE
What do you mean the randomization can fail?
ROBIN FOREMAN
Like I said, we have a lot of programs. Let's say I have program A. That's what I'm focused on and I want to know if program A works. It's where we're going to do the randomization. Well we also have program B. Some of the people that we would outreach for program A actually also qualify for program B. We have hierarchies, so program B actually gets the hierarchy. What that means is: if people qualify for B, they get B over A. We would take them out of A and send them to B.
DAVE COLE
Right.
ROBIN FOREMAN
But if you just take everybody who's qualified for A and you randomize them at that point, they'll be balanced but some of those people are going to fall out to B.
DAVE COLE
Right.
ROBIN FOREMAN
So it's actually better to wait, take A, take out the people that are going to go to B, then take that smaller sample and randomize at that point. Otherwise you're introducing this other bias where you make it unbalanced.
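The ordering point above (apply the program hierarchy first, randomize second) can be sketched in a few lines of Python. The record fields `qualifies_a` and `qualifies_b`, the function name, and the 20% holdout are hypothetical, chosen only to make the example concrete.

```python
import random


def randomize_program_a(members, holdout_fraction=0.1, seed=0):
    """Drop members who will be routed to program B, then randomize the rest."""
    rng = random.Random(seed)
    # Step 1: apply the hierarchy first, so B takes its members before any random split.
    eligible = [m for m in members if m["qualifies_a"] and not m["qualifies_b"]]
    # Step 2: randomize only the people who can actually end up in program A.
    rng.shuffle(eligible)
    n_control = round(len(eligible) * holdout_fraction)
    return {"control": eligible[:n_control], "outreach": eligible[n_control:]}


# Hypothetical members: everyone qualifies for A, every fourth one also qualifies for B.
members = [
    {"id": i, "qualifies_a": True, "qualifies_b": i % 4 == 0}
    for i in range(200)
]
groups = randomize_program_a(members, holdout_fraction=0.2, seed=7)
```

Because the B-qualified members are removed before the shuffle, nobody drops out of either arm afterwards, so the two arms are drawn from the same population.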
DAVE COLE
Yeah. Once you start removing everybody who's participating in B, even if they're participating in B for a specific reason, you now have a bias. Your attempt at this random group of ‘A folks’ has now been sort of tainted and it's no longer random. That is certainly a big challenge. I imagine, from a complexity standpoint on your end, there's not much you can do to say, "Hey, please don't let B trump A." There's a business decision downstream that you have no control over.
So yeah, I can imagine that that brings a whole host of challenges. Any tips or tricks on how to approach challenges like that?
ROBIN FOREMAN
What I've learned so far, and honestly this is something we're constantly working on because there's always something new, is...if I am all about program A, I need to understand the landscape. I can't just focus on A. I need to understand programs B, C, and D. I need to know what's happening there.
DAVE COLE
Right.
ROBIN FOREMAN
I think sometimes that's hard. You have people who are embedded in the business and they do understand everything, but you're just there to support them on A and so they may not realize why you need to understand it all. They probably don't even think about it, right? As data scientists we really need to make sure that we're speaking up, "Hey, I need to also understand if there's anything else going on," and explain why. I find our business partners are like, "Oh yeah, of course we'll tell you that. We want it all to work, right?” If I didn't say anything, if I didn't know that was potentially an issue, they wouldn't know it's potentially an issue. Then six months in you come to this realization and you're like, "Oh, no."
DAVE COLE
Right, so I think the nugget there is to really understand as best you can, the holistic experience that your members are undergoing. In an ideal world, you'd have this amazing system that understood all the various touch points of that member, all the programs they were participating in and you’re able to work with that downstream to better create your RCTs. There's a strong sort of data management challenge there, in getting your data all prepped. In the non-healthcare world you talk about that customer 360, right? You may talk about that in the retail world, the marketing world. What I'm hearing is it's really no different in the world of healthcare.
ROBIN FOREMAN
Yeah, 100%. We use this holistic view as much as we can. We want that 360 view. The problem becomes, if I am a retail site, I have one site where a person goes to, so I can track that. In healthcare, there are different tools being used by different teams, different things are rolled out in different ways. It gets really complicated. I don't know why that is. Well, healthcare in general is like that.
DAVE COLE
It absolutely is. I've talked to a number of folks. It is healthcare in general. The systems there, I think we've all experienced it. Every time we go to a doctor we fill out the form again. You just filled this out last time you were here.
ROBIN FOREMAN
"I thought I told you this already."
DAVE COLE
"We just want to see if anything's changed." It's like, "Oh God." It is really crazy. That's the case. But I think there's a lot of reluctance to share information because this is health information. This is information about me that is very private. I think there's this bend, and rightfully so, towards being as conservative as possible in terms of making sure that the data is collected in an ethical way and in a privacy-conscious way.
ROBIN FOREMAN
Yeah, 100%. But I mean, and then you have a lot of different players in the market, right? You have health insurance companies, which is where a lot of my data comes from. You have the actual doctors’ offices that have their own data. You have hospitals that have theirs. It’s complex. It's not me going into a store and buying a pair of pants. I’m going to a single store—
DAVE COLE
—where all the data is being collected by that one retailer and that's that. One retailer might say, "Oh yes, they bought this pair of pants here, but they also bought the top at a competing store." I'd love to know about that. If you're trying to truly understand what Dave Cole likes to wear, and understand my entire wardrobe, you'd have to collect data from a bunch of different places. I think the difference there is they're competing against each other whereas in this world you'd like them not to compete. Why couldn't they just share the information, because it's all about your and my health? That should be the most important thing but I know there are certainly data and infrastructure challenges that you have to work through, that I think are relevant to all of us out there, right? When you think about building out a data science team, making sure that data quality and that data infrastructure and understanding your customer as best you can is really a prerequisite most of the time.
ROBIN FOREMAN
Yeah and it's probably something some of the newer data scientists don't want to hear. We spend a lot of time on our data. I've been working with the same data for five years; I know it really well and there are constantly new things I'm learning. We spend a lot of time on data quality, making sure that it's being pulled in a consistent way. Maybe it’s not the most attractive part of the job for a lot of people, but it's the most important piece of it.
DAVE COLE
There's not a data scientist out there who isn’t nodding their head right now. They all do it. They all know it's a big part of their job. I think there are a lot of companies who are trying to make that less like what a data scientist does. It's that experimentation, bringing disparate data sets together to see if there are things that pop; that's just the nature of the job. It's hard to create that perfect system. I talked about the prereqs. I'm sort of going back on what I just said. It's sort of hard to create that perfect system as a data scientist, because you don't know what they're going to want to use in their next model, right? Even they don't.
ROBIN FOREMAN
Right. Yeah.
DAVE COLE
Yeah. Well, great. Let's switch gears a little bit. I know we want to talk a little bit about product analytics. What is product analytics? Let's just start there.
ROBIN FOREMAN
I like this term. For years, data science has been focused on predictive models, which is great, but what happens after the model gets put into place? What it means to me is that you have some product or program that you're trying to sell, promote or whatever it might be. You build a predictive model to figure out how to market it to someone, how to target someone, timing, etc. That is where the field has been for years. We need to build predictive models, and you can do that across disciplines for whatever reason, but we're starting to move beyond just that, and this is what I love. The other half of that is, "Well okay, we have this prediction model. We set this as the person we want to reach out to. Did it work? Did it do anything?"
DAVE COLE
Right.
ROBIN FOREMAN
We just talked about RCTs. In the best case you can do something really simple, like a t-test afterwards, if your randomization works. In healthcare, it often doesn't work. Then you can't just do a t-test to say whether or not things went well and our outreach actually worked. This is where you bring in causal inference, which is my absolute joy and where my team focuses a lot.
So, yeah, we have the predictive model, but did it actually work? I think data science is starting to move in this direction. I know Uber's doing some cool stuff with causal inference, and some of the other big tech companies as well. To me, product analytics is this life cycle of predicting and then coming in on the backend and asking, "Did it work?"
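For the simple best case mentioned above, where randomization holds and a t-test is enough, a minimal sketch using only the Python standard library might look like this. It computes Welch's t statistic (the unequal-variance form, which is an assumption here since the conversation does not say which variant is used), and the outcome numbers are made up.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(outreach, control):
    """Welch's t statistic and approximate degrees of freedom for two samples."""
    n1, n2 = len(outreach), len(control)
    # Squared standard error of each group mean (sample variance / n).
    v1, v2 = variance(outreach) / n1, variance(control) / n2
    t = (mean(outreach) - mean(control)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Made-up hospital days per member in each arm (illustrative only).
outreach_days = [0, 1, 0, 2, 0, 1, 0, 0]
control_days = [1, 2, 0, 3, 1, 2, 1, 2]
t_stat, dof = welch_t(outreach_days, control_days)
```

A negative `t_stat` here means the outreach group had fewer hospital days on average. Turning the statistic into a p-value needs the t distribution, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.ttest_ind` with `equal_var=False`) covers that step. None of this addresses the harder case raised above, where randomization breaks and causal inference methods are needed.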
DAVE COLE
Right. So predicting, and then testing on the backend to make sure it worked. I would imagine, also in healthcare, getting the ground truth is what you're talking about in terms of being very difficult. It's not always like, "Hey, I sent them an ad and then I'm going to wait a month to see if they bought the pants." It could take months before that treatment is actually expected to take effect. I imagine it's not always immediate. Also, getting the data set back to know whether or not it worked, just the practicality; working with disparate systems also can be a challenge. Am I right on that?
ROBIN FOREMAN
You're right on that. We're looking at behavior change, right, and it's hard to measure. Do you know that somebody did it 100% because of what you did, or was there potentially something in there that confounded it? Super interesting, but it's complicated.
DAVE COLE
Right.
ROBIN FOREMAN
On the backend for us, sometimes it's about someone doing something. Sometimes it's about avoiding something. We don't want people to end up in the hospital. If you end up in the hospital it means you're really sick, right? Nobody wants that, so a lot of what we do is about keeping people healthier now so they don't end up in the hospital. It's down the road avoiding an event, so you have to wait, right? You don't know if what you're doing now is going to avoid something three months from now.
DAVE COLE
Right. So if the program is all about eating better: more vegetables, something I need to do, and that's to reduce your risk of heart disease. I might eat my vegetables and you might not, but neither of us may go to the hospital. You don't really know. You could survey folks I suppose and say, "Hey, are you actually following the program? How closely are you following the program?" Not everyone answers surveys, right? There's a lot of messiness in your world, it sounds like.
ROBIN FOREMAN
Oh yeah. You're talking biologically; all of us know somebody who's the healthiest person we know. They get sick and you're like, "Whoa. They exercise, they eat right. They're not over here eating pizza for lunch like me.”
You're like, "Well, how did that happen?" Because we're talking about averages, and that person probably had some risks that we didn't have any information on. It's extremely messy. There are some really cool methods out there to deal with it.
DAVE COLE
What are some of those methods that you're excited about?
ROBIN FOREMAN
One of the people on my team is so good at this, he's awesome: moving into heterogeneous treatment effects. Not everybody's the same, we're not all homogeneous. Within these populations and within these pockets, there will be people who respond better to interventions than others. Using some of these methods we try to pinpoint who those pockets are, who these people are that will respond the best. There are different methods for it. One of them is a causal forest. It's kind of like a random forest, but not exactly the same. It brings together the prediction piece with the causal inference. It came out of Stanford, I believe, so they're doing a lot of really cool stuff in their econ area over there.
DAVE COLE
Cool. Well, that's beyond my pay grade, but that sounds interesting. It certainly makes sense that you would want to go after and focus more on the folks who are actually going to eat their vegetables than not. I'll just put it out there.
ROBIN FOREMAN
Eat the vegetables and get the benefit from them, right?
DAVE COLE
And get the benefit. Yeah, there you go. Right, right.
ROBIN FOREMAN
Yeah. So, that's what it is. Maybe we have people that can eat whatever they want and it doesn't matter, so we don't need to worry about them. But there is this group of people that, if they eat vegetables, we can actually make them healthier. That's who we want to go after.
DAVE COLE
Got it.
ROBIN FOREMAN
Then there's a group of people that eat vegetables and they get worse. Who knows why, but you don't want to push vegetables on them.
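The three groups in the vegetable example (people who benefit, people for whom it makes no difference, people who get worse) can be illustrated with a very simple per-subgroup comparison. To be clear, this is not a causal forest: it assumes the subgroups are already known and the arms were randomized within them, whereas the methods mentioned above try to discover those pockets from the data. The subgroup labels and records are invented for illustration.

```python
from collections import defaultdict
from statistics import mean


def effects_by_subgroup(records):
    """Per-subgroup difference in mean outcome: outreach minus control."""
    outcomes = defaultdict(lambda: {"outreach": [], "control": []})
    for r in records:
        outcomes[r["subgroup"]][r["arm"]].append(r["outcome"])
    # Only report subgroups that have members in both arms.
    return {
        group: mean(arms["outreach"]) - mean(arms["control"])
        for group, arms in outcomes.items()
        if arms["outreach"] and arms["control"]
    }


def make(subgroup, arm, outcomes):
    """Helper to build made-up records; outcome is 1 if hospitalized, else 0."""
    return [{"subgroup": subgroup, "arm": arm, "outcome": o} for o in outcomes]


records = (
    make("benefits", "outreach", [0, 0]) + make("benefits", "control", [1, 1])
    + make("no_change", "outreach", [0, 1]) + make("no_change", "control", [0, 1])
    + make("worse", "outreach", [1, 1]) + make("worse", "control", [0, 0])
)
effects = effects_by_subgroup(records)
```

With these toy records the estimated effect is negative for the group that benefits, zero for the group where outreach makes no difference, and positive for the group that does worse, which is the pattern that motivates targeting outreach at the first group only.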
DAVE COLE
Right, right. Gosh. Your world is not for the faint of heart Robin but I think you are in the right spot. Clearly you started your career wanting to help people and that passion hasn't gone away. You're just helping folks on a broader scale and using data science to make that all happen. Along the way I learned a few things from you about RCTs, how to go about working in the world of messy data and how to think holistically about your customer or member. All of that makes a lot of sense and there's some really good insight there. Thank you for being on the Data Science Leaders podcast. I've had a blast chatting with you, Robin.
ROBIN FOREMAN
Thank you. It's been fun.
DAVE COLE
Great. If people want to reach out to you to learn more, can they reach out to you via LinkedIn?
ROBIN FOREMAN
Yes, they can.
DAVE COLE
Great. Well, thanks again Robin, and have a great rest of your week!
ROBIN FOREMAN
Thank you. You too!
About the show
Data Science Leaders is a podcast for data science teams that are pushing the limits of what machine learning models can do at the world’s most impactful companies.
In each episode, host Dave Cole interviews a leader in data science. We’ll discuss how to build and enable data science teams, create scalable processes, collaborate cross-functionally, communicate with business stakeholders, and more.
Our conversations will be full of real stories, breakthrough strategies, and critical insights—all data points to build your own model for enterprise data science success.