Data Science Leaders | Episode 07 | 29:03 | June 15, 2021
Derrick Higgins, Head of Enterprise Data Science & AI
Blue Cross and Blue Shield of Illinois, Montana, New Mexico, Oklahoma & Texas
Data scientists in the healthcare industry face some especially tough challenges. Not only do they have to contend with complex regulatory landscapes impacting the data they can work with, but they’re also constrained by some less-than-modern processes.
75% of medical communication is still delivered by fax. And that’s just one example.
Derrick Higgins, Head of Enterprise Data Science & AI at Blue Cross and Blue Shield of Illinois, Montana, New Mexico, Oklahoma, and Texas, talks to us about the processes his team has put in place to overcome these unique challenges.
Welcome to the Data Science Leaders podcast. I’m your host, Dave Cole. Our guest today is Derrick Higgins. He is the Senior Director of Data Science at Blue Cross and Blue Shield—and let me make sure I get this right—of Illinois, Montana, New Mexico, Oklahoma, and Texas. Prior to Blue Cross Blue Shield, Derrick was the data science manager at American Family Insurance. He also happens to have received his PhD in linguistics from the University of Chicago, which we might touch on in today’s podcast. So welcome! How are you doing, Derrick?
Doing well. Thanks for having me.
Great. So one of the topics that I wanted to cover in today’s episode is working in a regulated industry, so specifically in healthcare and what sort of challenges that poses to you as a data science leader. We’re also going to talk a little bit about collaboration and your approach to building out teams as a DSL. And if we have time, we’ll also talk a little bit about how IT helps to make a lot of this happen for you and sort of the partnership with IT, and maybe some tips and tricks for our audience. But let’s start off with the meaty topic, which is working in the healthcare industry. So maybe you can start off with just telling me, what are some of the challenges and safeguards that are in place that you have to be aware of as a data scientist?
As you say, healthcare is a very regulated industry and that makes everything just a little bit more difficult. At a high level, things like HIPAA govern our use of data and that presents some challenges, but really not challenges that are different in kind from the challenges that other companies face that also want to have strict privacy standards around the data of their customers or their members, in our case. What I think is really interesting and different and challenging for us, is just how different the regulation is across different insurance products or different sub-areas in which we work. So we have different types of accounts that could be government accounts. We have Medicaid members that are insured or Medicare Advantage members that are insured, people coming in from retail exchanges to Obamacare, people who have insurance through a group account, and depending on the contracts in place or the government regulations that govern each of those categories of insurance, we may have different opportunities to use that data.
So specifically, different clients that we have may have different restrictions on how we can use their data, depending on the financial obligation that they have: whether they’re administrative services only, which means that they are on the hook for member spending, or whether they’re fully insured, which means that we as an insurer cover the healthcare spending of their members. And then on another axis, there are the different states. So it is a regulated industry, but it is regulated at the state level as well as the federal level, which means that as we deal with data and think about the capabilities we can develop, we have to think about each of those different states and how they govern insurance within their borders.
So one thing that came up really recently is we’re trying to support vaccination efforts. So we’re part of a Blue Cross Blue Shield Association effort to support equity in the distribution of COVID vaccines. And so the first thing we really need to know is which of our members already have been vaccinated so that we can reach out to the right people. We don’t want to reach out to people who’ve already gotten the vaccine or people who are at low risk. So the best way of knowing that is for us to go to state immunization registries.
Because when you get a vaccine, whether it’s for COVID or for HPV or something else, there’s a state-level registry, but different states have different rules around who can access that data. So in Oklahoma, we can pull that data and we can see which of our members have been vaccinated and not reach out to them. In Illinois though, only doctors and hospitals and clinics can get access to that data. So we can’t actually pull it. We only know that somebody’s been vaccinated if they submit a claim for reimbursement for a vaccination event, which doesn’t tend to happen that often because the government is paying for vaccines.
So if you’re a data scientist on your team, Derrick, are you just aware of all these rules and regulations? I mean, is it sort of upstream to the data scientists as to what data they can access? How does this actually get imposed from a day-to-day standpoint?
We have a certain level of corporate training that informs people about insurance regulations and about best practices for using data and security constraints, sharing minimum necessary, and so on. But then the specific constraints that are relevant to a project may be very specific to that project and not generalized beyond it. So we have an agile model under which we engage with business stakeholders. Data scientists are involved in this business-facing conversation, and that’s where the constraints and the needs of the business get married up with the art of the possible and what data science can actually do.
Right. Are there any specific roles or partners that you can work with, who can get brought in if you have any questions? Or do you sort of rely on the business for that?
I wish I could say we had a great solution to that. I mean, we’re building up a lot. We’re trying to be more systematic about cataloging tribal knowledge about data and about constraints around data. But there’s not a single person you can go to to ask all of these questions. There are the data producers in the data and analytics solutions team for us, who maybe know where the data came from and can talk about provenance. There are the application developers, who can say what source system it came from and how it was entered, what the user experience was in the entry. And then there’s the legal department, which can answer very targeted questions about specific use cases but isn’t often the best partner in terms of solutioning and thinking about what is actually possible.
Yeah, the team of “No” basically is the legal department, yes. So you mentioned, I mean, topical in today’s times is equity in the COVID vaccine. And the fact that some states have a registry, some states do not. I’m wondering do you ever work with business counterparts to exert pressure, especially in these times when getting access to that registry can be really life or death. And maybe I’m overstating it a little bit, but it certainly would be a lot more seamless if more states were, I guess, like Oklahoma there, and they actually had a registry and we knew who was being vaccinated and who was not. I imagine the machines of government move slowly, but I can imagine in crises, you would want them to speed up. Is there any talk of that?
There’s definitely corporate advocacy that goes on around issues like this around data transparency and supporting what we need to do to support our members effectively. Recent things that have been going on around price transparency are areas where insurers and payers are very invested and want to have a seat at the table. But you’re exactly right. When we’re talking about a pandemic where we need to have things change quickly, if it’s to really affect us in a way that’s going to be relevant to our members, that’s often not effective. We’re not able to work through our channels and influence legislators to get data access changed.
That’s a challenge that hopefully we’ll solve at some point when COVID-19 is behind us. So I’m sure there’ll be lots and lots of lessons learned and hopefully we’ll improve processes. One interesting statistic that I came across in preparation for this podcast episode: 30% of all the world’s data is actually healthcare data. But then another interesting fact was that 75% of communication of medical information—I assume that means medical records and things like that—is still being done by fax. So do you ever run into challenges in working within an industry that might not be on the cutting edge? No offense to your employer.
No, exactly. We face that challenge every day. Exactly the challenge that you mentioned, that we are not always getting digital transmission of the data that we need to make decisions. So we’d love to be more responsive to our members when they submit a claim or they submit a prior authorization request, just to use artificial intelligence to auto-approve that, so they don’t have to wait.
If we can parse the supporting information that’s sent along to find the evidence that would allow us to approve it, that would be great. But often it’s not just an electronic health record. There are new standards around digital transmission of electronic health records, FHIR and HL7 and CCDA, and so on. It could be just a PDF or it could be a TIFF image that’s attached to a request. And so we have to have these pipelines that support multiple different legacy platforms as well as the future state.
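An intake process like the one Derrick describes typically starts by routing each attachment to the right pipeline based on its file type. Here is a minimal sketch of that routing step; the pipeline names are hypothetical, not anything from the episode or from Blue Cross systems:

```python
import mimetypes

def route_attachment(path):
    """Route a prior-authorization attachment to a processing pipeline
    based on file type. Pipeline names are illustrative placeholders."""
    mime, _ = mimetypes.guess_type(path)
    if mime == "application/pdf":
        return "pdf_text_extraction"   # scanned or generated PDFs
    if mime == "image/tiff":
        return "ocr_pipeline"          # faxed images often arrive as TIFF
    if path.endswith(".json"):
        return "fhir_parser"           # structured FHIR/HL7 resources
    return "manual_review"             # anything unrecognized
```

In practice each branch would hand off to a real extraction service, but even this shape illustrates the point: the pipeline has to support legacy formats like TIFF faxes alongside modern standards like FHIR.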
Right, right. That’s very interesting. It’s a whole podcast in and of itself to dive deep into the types of healthcare use cases. If you can talk at a high level just about some of those challenges… If you could modernize the healthcare industry, where would you start? Obviously my guess is the focus is somewhere around data, but what are some areas where you’d like to see improvements?
If I could modernize the healthcare industry... I mean, I think the move to digital platforms and interoperability and FHIR and so on, that’s great. But I’m going to stand up a little bit for the providers here, on whom I think a lot of the burden of data capture has fallen historically: actually producing that electronic health record that constitutes sort of a patient’s passport, a record that can move between multiple clinics and that we can also use as payers to make decisions. Providers are in a really tough spot right now. Clinicians complain that they spend 10 minutes with a patient and then have to spend 20 minutes afterwards recording what happened, just because the front-end systems for providers are so cumbersome right now.
So if we could think about how to bring AI to bear, to support better, intelligent, efficient data capture for providers after encounters with patients, I think that would help everybody because we’d get more electronic data potentially. And I also feel like the data would be of higher quality because right now, when physicians are under such time constraints and have really poor affordances in their front end systems for entering patient data, sometimes they’re just going to click on the wrong thing, or they’re just not going to take the time that is needed to record things that are not relevant to billing or not relevant to what the claims are going to be submitted for that patient.
My guess is when physicians went to medical school, data entry was not a course that they spent a lot of time on. But maybe, I don’t know, there’s certainly the AI to solve it, but maybe there are more practical ways to solve it in the meantime. But I agree with you: if you clean up the data at the point when it gets entered, it just makes everyone’s life easier downstream. That’s very interesting, and I think a lot of those challenges are not unique to the healthcare industry. I mean, there are data quality problems all over the place. But let’s talk about your team actually doing the data science itself, switching gears a little bit to how your team collaborates. Certainly in the world of COVID, everyone’s in a distributed mode, but is your team generally located where you are in the Chicago area, or is it strewn all over the place?
We’re clustered around two locations. You mentioned we’re in five states, but most of our staff is in either Chicago or Richardson, Texas, which is kind of near Dallas. So most of the data science team is in Chicago or Chicagoland. There are some folks in Richardson, then we have a couple of people who are full-time remote in other locations.
What are some of the tips and tricks for our audience here in terms of doing collaboration? What are some of the challenges that you’ve seen or some approaches that you’ve seen that work well when it comes to collaborating?
We use our own kind of version of agile. So we have people organized into scrum teams. We typically have three to five data scientists working with a product manager and working with one or more data engineers. And that seems to be a pretty manageable group for focusing on a specific domain. And then within the teams, the technical groups will always be pairing up on a specific project. I’m a big believer in code review and it’s something we use pretty extensively. Not a lot of data science teams do that, but once we move past the stage of exploratory work, which is just in notebook, then we have code that’s moving toward a model that might be in production of some sort. All of the code that people write gets code reviewed in GitHub. So one advantage of that I think is that everybody has a little bit of insight into other people’s projects and things are more easily transferable.
You don’t end up with the person who can’t go on vacation because if they do, that piece of code is at risk or whatever. I think the other advantage that I see is it just insulates us from dumb mistakes. If there’s some kind of bug that makes it through into production, it’s not just the fault of the person who wrote the code, it’s the fault of the person who reviewed the code as well. There are multiple eyes on there. Nobody writes perfect code and that shouldn’t be the expectation, but we need to have a process in place that limits our risks. One funny story, earlier in my career, I’ve been in a few different places so I can tell the story without people knowing necessarily where it came from. I came onto a project, transitioned onto it after somebody had been working on it for some time.
And it was a pretty big, complicated project involving some NLP, and there was some specific piece of this code that involved just generating a random number between one and 10, something like that. And it’s funny, the way it was implemented, it used a standard library function for generating a random number. I think the number would by default be generated between one and 65,535, something like that. And so if it didn’t happen to fall between one and 10, the code would generate another random number, and another, until one was found in the range that was needed. So it was thousands of times more inefficient than it needed to be, just to generate a random number. Just eliminating that issue sped the code up from taking hours to run to taking minutes to run. That’s the kind of thing that is a learning opportunity for somebody who may not know the best practice for doing something simple like that. And it’s a very simple fix, very simple fix. The code review process is great for learning and produces higher quality code.
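The bug Derrick describes can be reconstructed as a small Python sketch. The function names and exact ranges are illustrative, not from the original code:

```python
import random

def rand_1_to_10_slow():
    """Anti-pattern from the story: draw from a 16-bit range and
    reject until a value happens to land in 1-10. On average this
    takes roughly 6,500 draws per accepted number."""
    while True:
        n = random.randint(1, 65535)
        if 1 <= n <= 10:
            return n

def rand_1_to_10():
    """The one-line fix: draw directly from the desired range."""
    return random.randint(1, 10)
```

Both functions return the same distribution; the second one just skips the rejection loop entirely, which is the kind of simple fix a reviewer can spot in seconds.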
This is a great story. I could see myself being somebody who does something like that, hopefully not. But where do you insert that code review process? Do you have it on a regular basis for a project or do you wait until the team feels like they have sort of trained their model and they’re about to move into the production phase. Where do you see it happen?
No, for it to be successful, it has to be a continuous process. So if you give somebody a hundred lines of code to review at a time, they can do a careful and helpful job at that. If you give them a thousand lines or 5,000 lines of code to review, it’s going to be too much. And if there are things that should be changed, it’s very difficult to ask for changes at that point, when architectural decisions are very locked in to the entire code base that’s being submitted for review. So it has to really be a continuous process. And that’s part of the learning as well. People really have to adapt their workflow to do the development in smaller chunks that can be reviewed, as opposed to saying, “Well, it’s not ready to look at yet. Once it’s finished and polished, then I’ll show it to you.”
Right. I think you also have to have a culture of learning and being accepting of the critique of others. And that can be a bit jarring if you’re not used to collaborating in that fashion. Then when it comes to the code reviews themselves, you mentioned in your example an inefficiency, but also there’s feature selection, there’s picking which model, all that sort of good stuff. Is basically anything fair game for the code review?
That’s a good question. Yeah, it’s hard to know where to draw the line. Certainly some types of comments in a code review are not helpful if it’s just matters that are more subjective and maybe one person’s preference or somebody else’s. So I don’t know if I have a concise answer, but I’ll say we did a workshop for the team on GitHub and sort of best practices for GitHub last year.
We developed that internally and created our own repository and gave people a sample code review task as part of the workshop, so that then we could do a meta review of the code reviews that people provided. Because yeah, some things that are related to code structure, or where you’re maybe sharing knowledge about a package that somebody was not aware of, are really helpful and improve the quality of people’s code and reduce the need for those things to be reviewed in the future. Other things, where it’s just about conforming to a particular convention, you maybe have to know the person a little bit more and understand if that’s an area they want to attend to in writing their code or not.
Good stuff. So you mentioned GitHub, and you were talking a little bit about the data science life cycle as well. Where do you see IT playing a role in helping? Typically you see IT play a role, it could be on the getting access to data side, it also could be on the production side, but where do you see it working best?
Ideally, IT is an enabler. They put teams in a position where they can succeed, where they have everything they need to be effective in their roles. And there’s discussion about exactly where that border is between roles, but I think that’s my framework for thinking about IT. They should be enablers, and that’s enabling on the front end. So as you say, providing the right access to data, although in our organization it’s set up a little differently and the data team’s not within IT. It can also be about access to tools. Often IT owns the budget for tooling that has to be provided. It should be a collaboration to work with IT around data science tooling and determining what’s needed. It doesn’t work well when there’s an enterprise architecture that is sort of forced on data science teams and there isn’t a more productive relationship around that.
I mean, IT has specific legitimate concerns around security and around integration with the rest of the tool chain, but then data science and data science teams have to have an opportunity to weigh in about their needs as well. And then IT are good partners as we think about putting work into production. So there is some point at which the data science capability turns into an enterprise capability. And it could be some scheduled process that runs and populates the table. It could be an API that’s created. And it’s important for us to be able to work collaboratively with IT as partners in thinking about how to integrate with their systems. That involves both the technical aspect and the workflow. What is their process? What is their delivery schedule? How do we engage with them given that probably we work a little bit differently from them.
Right. And you mentioned code reviews…do you see the IT team, if you know that a model is eventually going to make it into production, do they also get involved in some of the code reviews if you’re building out an API they want to be in production? Or do they actually take the code and maybe write it themselves?
No, I think that’s not a good model. When models get reimplemented, that’s, in my mind, an anti-pattern, because once that model exists on the IT side, it’s no longer something that can be further developed, fixed, or enhanced on the data science side. We want to live, ideally, in much more of a continuous integration world, where as the code evolves and the model develops, we can continue to push out improvements and get those incorporated into production. Rewriting the code entirely breaks that model; it becomes “just throw it over the wall.”
I’ve seen that at many places as well. It really slows down, obviously, getting your models into production, but also, to your point, you’re now worried: is the model that your team built actually the model that’s in production or not? There are those slight differences that can really impact things from a business standpoint. Then once your model’s in production, where do you see the ownership lying to actually monitor the performance of the model itself?
IT definitely needs to be involved there. Typically data scientists don’t sign up for PagerDuty. If it’s in the middle of the night, then somebody needs to be that line of support, and that’s IT. There needs to be some first-line governance by the IT team, but at a certain point, if things go really off the rails, data scientists need to understand that they still own those models, or the data science team owns those models and as the ones who truly understand how they work and should be expected to address serious problems—like model internal problems—if they should arise.
Well, there’s two aspects of it, right? There’s is the model sort of working as designed? Is it outputting inferences? But then there’s actually the accuracy of the model. Right? And I imagine that, is IT playing a role there in monitoring the accuracy or any other sort of metric that you may be using?
No, in terms of drift and models becoming stale and so on, no, that’s much more of the responsibility of the data science team to monitor that. I mean, again, if IT can provide tools and platforms where those metrics are easily visible and actionable by data scientists, that’s great. But we need to engage with business teams in a way that anticipates these issues, that this model is not going to be something that you can just use forever, and we’re going to need to stay in communication about the future of this capability.
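One common way a data science team monitors the kind of drift Derrick mentions is the population stability index (PSI), which compares the score distribution the model saw at training time against what it is seeing in production. This is an illustrative sketch, not anything described in the episode:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Compare two score distributions. `expected` is the baseline
    (e.g., training-time scores); `actual` is current production scores."""
    # Bin edges from the baseline's quantiles, widened to catch outliers.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    e_frac = np.histogram(expected, edges)[0] / len(expected)
    a_frac = np.histogram(actual, edges)[0] / len(actual)
    # Clip to avoid log(0) when a bin is empty.
    e_frac = np.clip(e_frac, 1e-6, None)
    a_frac = np.clip(a_frac, 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))
```

A common rule of thumb (an industry convention, not from the episode) is that a PSI above roughly 0.2 suggests the model is going stale and warrants a closer look.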
Great. Totally agree. Switching gears yet again, your background is, you have a PhD in linguistics, your background is in NLP. When you look at your team, what is the background of your team? Do you have each member sort of being an expert in various sort of branch of data science, or do you find some people have, is it depth or is it breadth of sort of skills and expertise, how would you define your team? And then how would you define what your goal is for your team in terms of that breadth and depth?
I think for us, it is very important that we have a core level of competence in fundamental data science knowledge. So people have to be good programmers. And we tend to be a Python shop, so they have to have at least some level of fluency in Python. They have to have a very solid understanding of statistical learning theory: good instincts in terms of how to create data sets, how to monitor performance, what sorts of pitfalls might arise that would cause performance statistics to be misleading, ideas of out-of-sample and out-of-time evaluation, regularization and how to avoid overfitting, these types of things. So if people are solid in sort of basic data science engineering work, and then also in the statistical aspect of machine learning, there’s just a lot they can do. It opens the doors for a lot of things.
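The out-of-time evaluation Derrick mentions can be sketched as a simple split on a timestamp column, so that test performance reflects how the model will behave on genuinely future data rather than leaking temporal patterns the way a random split can. The function and column names here are illustrative:

```python
import pandas as pd

def out_of_time_split(df, time_col, cutoff):
    """Train on records strictly before `cutoff`; evaluate on records
    at or after it. This mimics deployment, where the model always
    scores data from a later period than it was trained on."""
    train = df[df[time_col] < cutoff]
    test = df[df[time_col] >= cutoff]
    return train, test
```

A random (out-of-sample) split answers "does the model generalize to unseen rows?"; the out-of-time split answers the harder question, "does it generalize to unseen time periods?"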
And then individuals will either come in with specialized skills or have areas in which they choose to develop over their careers. So we have folks who are specialized in computer vision and natural language processing. Other people may decide they want to work on time series analysis or reinforcement learning. It’s never really been a problem I think, to maintain the kind of diversity and skills that we need for the range of projects that we’re involved in. I guess if you have a big enough team and we have somewhere around 40-50 data scientists, then that sort of has come up organically, given that we do continuously underscore to our team the importance of continuing to develop. The fact that the skills you have today as a data scientist are not necessarily going to be the skills you need tomorrow as a data scientist.
Absolutely. The one constant in data science is change, for sure. So for those in the audience who are data scientists, and maybe they’re looking at moving into the healthcare space, if you look back on your career and could go back in a time machine as you’ve moved into healthcare, what one thing do you think would be beneficial or helpful for you to have known as you moved into healthcare?
Well, it’s a little interesting that you asked me for advice on sort of career trajectory, because my career is sort of a random walk between different domains. I started out in education, and ed tech, and educational NLP, and consulting in different domains and then kind of Insure Tech, and now healthcare. So that’s not necessarily the path everybody will want to follow to get to healthcare and data science. But in terms of life lessons and things I’ve learned through working in data science, I would say one of the pitfalls for me early in my career is that I really enjoy technical problems. I like to solve technical problems.
If you tell me in a meeting about some potential problem, my first instinct is to go back to my computer and start writing code. And often that’s the wrong approach; that’s exactly the wrong thing to do, to start building the solution too early. If you spend a little time, talk with people, look at approaches that other people have taken to the same problem, or think about whether the problem I have in my head that’s really cool really aligns with the business needs, then the time spent at the keyboard is much more efficient and much more on point. So taking a step back, sleeping on it before you go too deep, is the advice I would give.
That’s good advice. I think that doesn’t just apply to the world of data science. I think just in general, it’s good advice for all of us as we look to solve problems in the real world. But Derrick, it’s been a pleasure chatting with you. Is there any way to get ahold of you if people are interested in following up to ask more detailed questions? LinkedIn, I imagine. Any other ways?
I would suggest LinkedIn. Yeah, I’m kind of a Luddite when it comes to social media.
No worries, so am I. Well, hey, thanks a ton, Derrick. This has been a blast and hopefully we’ll have you again one day as you get more years under your belt, and we can talk a bit more about the world of healthcare and hopefully we’ll solve some problems in the meantime. So thanks a ton, it’s been a blast.
I appreciate you having me, Dave.
Data Science Leaders is a podcast for data science teams that are pushing the limits of what machine learning models can do at the world’s most impactful companies.
In each episode, host Dave Cole interviews a leader in data science. We’ll discuss how to build and enable data science teams, create scalable processes, collaborate cross-functionally, communicate with business stakeholders, and more.
Our conversations will be full of real stories, breakthrough strategies, and critical insights—all data points to build your own model for enterprise data science success.