
Governing Models and Structuring Teams in Highly Regulated Industries
Summary
Model governance is vital, especially in heavily regulated industries like insurance.
Strong governance can help ensure that key models are reproducible, explainable, and auditable—all important factors for both internal model development workflows and for external regulatory compliance. But the best governance strategy isn’t always obvious.
Anju Gupta, VP Data Science & Analytics at Northwestern Mutual, is a big believer in establishing model governance practices early, and she shares her thoughts on the topic in the episode. Plus, she talks about some surprising roles on her data science team and the unique value that comes from pairing actuaries with data scientists.
We discuss:
- How to establish scalable model governance practices
- The intersection of actuarial work and machine learning
- Roles you didn’t know you needed on your data science team
Transcript
DAVE COLE
Hello! Welcome to another episode of the Data Science Leaders podcast. I'm your host Dave Cole, and today's guest is Anju Gupta. Anju, how are you doing today?
ANJU GUPTA
I'm doing very well. Thank you.
DAVE COLE
Great. Thanks for being here. So, Anju is the Vice President of Data Science and Analytics at Northwestern Mutual. She also has a PhD in genetics and statistics from The Ohio State University.
Today, we are going to be talking about the importance of model governance in data science. We're also going to be talking about machine learning and actuaries, obviously Northwestern Mutual being an insurance company, there's a lot to talk about there. Last but not least, we're going to talk about the roles on your data science team that you didn't know you needed until now. You have all the roles, and I'm very curious to talk about all those various roles and how they work together.
So, let's start from the top. Insurance is one of those regulated industries, and so naturally the oversight that happens due to being in a regulated industry is just different than other industries. So, governance plays an important role. So, maybe just start at the top. How do you see the importance of governance in your role today at Northwestern Mutual?
ANJU GUPTA
Yeah, and thank you for the THE Ohio State—
DAVE COLE
You bet!
ANJU GUPTA
Huge Ohio State fan.
Governance is huge for us. We have data governance teams and a steering committee that manages data governance. Then we have a model governance team, separate from the data science team, which has responsibility towards the Enterprise Risk Executive Committee (we call it EREC). That committee includes members of the exec team who help define how we look at our model governance policies as well.
So, we have a fairly robust model governance policy, which has been in place for quite some time at Northwestern Mutual, and we define what our key models are within the company. If a model is defined as a key model, then it has to go through certain steps as part of model governance: defining who the owner of the model is, who the user of the model is, and so on. There are different roles within the model governance policy by which we govern all of our models. There's very robust documentation that goes in as part of model governance for any of the key models that we put into production. Each of these models has a fairly well-defined model explainability document that we put in place as it becomes a key model.
So, it goes through several layers of governance, instituting a known base of knowledge across a set of governance teams. Those teams look at each model independently and decide whether it should be a key model or not. If it is a key model, then we go through the steps of figuring out who's the owner, who's the user, and who are the folks putting guardrails around the model, by which it gets put into production. So, there are various levers that we pull once a model gets into production.
DAVE COLE
There's a lot of data out there, and I think governance, in general, is obviously a good thing. But in certain industries like yours, there are actually regulations around it. So, I have some questions about your process, and I'll get to those in a second, but why all this documentation? What's the purpose behind it?
ANJU GUPTA
It's model auditing, because these are the models that will be making decisions for clients when we give a policy to a client. If a model gets defined as a key model, then the audit trail becomes really critical for us. So if we ever get audited, we have entire documentation end to end for each one of these models.
DAVE COLE
And the audit is going to be along the lines of why did you give this policy to this person versus that person? And having a full audit trail and understanding, well, the model took these characteristics into play and here's how the model works, and you have all that documentation.
ANJU GUPTA
You have, yes.
DAVE COLE
"Here's the data that we were using at the time..."
ANJU GUPTA
And you also have full visibility into who are the model owners, who are the model developers, who are the model users, and also who are the folks who put controls around those models. So, those are fairly well defined roles, and we do that. We do that for each one of the models that go into production and if it is defined as a key model for the company.
DAVE COLE
So that's my next question. You mentioned that a number of times, “defined as a key model.” How do you determine whether or not a model is a key model? What are the criteria?
ANJU GUPTA
We were discussing how we define a model as a key model just two days ago as well. That's written within our policy: what gets defined as a key model. There are several levers that you can pull. Every company will have its own recipe for defining what a key model is, which could be different from how we define ours. But for us, if it's making a client decision for some of the products that we are selling, then it becomes a key model. There are other levers as well that you can pull. But again, it gets instituted within the policy that you put together for your own company. So, each company has to figure out what they want to call their key models, and that will differ.
DAVE COLE
And then you have also the members of this governance board, I think you mentioned with the EREC team. How do you get on that governance board? Is there a special separate team that is governance focused or is it pulled from various data science teams throughout the company?
ANJU GUPTA
It's not the data science teams that are part of the enterprise risk committee. These are execs from various business lines, whether risk or insurance; the CIO will be part of it, and the CISO will be part of it as well. It's a cross-functional exec team that gets pulled together to ensure that governance is looked at from the highest level of the company.
DAVE COLE
Got it.
ANJU GUPTA
It's very much ingrained into Northwestern Mutual’s DNA, for sure.
DAVE COLE
Awesome. You've been at Northwestern Mutual for just a couple years now, but I don't know if you have any advice for others who are trying to stand up and mature the model governance process they have internally. One thing that always is in the back of my mind is there's that ability to be agile. There's things that obviously you have to do just for the sake of your customers and for the sake of potentially getting audited, but that potentially slows down the ability for you to publish a model and put it into production. So, what advice do you have for those data science leaders out there who are wondering how they can speed up this whole process and so on? What advice do you have for that?
ANJU GUPTA
I would say that somebody sitting outside would think that governance slows down the process. I wouldn't say that at all. If you're bringing governance in later in the process, then it's definitely going to slow things down; they become the bottleneck. But if you bring them in early, they're working with you towards defining your key model and towards putting those governance documents in place. You have to ensure that you have headcount dedicated for this as well, because this is not an easy body of work either.
If you're wanting to define what your policy looks like, there are various articles out there that'll tell you what a policy should look like. One can take some of those best industry practices; Gartner has a number of them as well. So there are various articles out there which tell you what the standard industry practice is. In my mind, it is a defined science. What's not out there today is a platform that lets you track all of this. A lot of companies right now are essentially working through Word documents and Excel spreadsheets to really understand what models they have within the four walls of their company and to ensure that they have model governance tied to each one of them.
But there are platforms evolving in the industry as well, which we are taking a close look at. There's more to come, but those are the platforms that can help you streamline things. If you don't have a model governance policy and you bring in one of those platforms, which I would highly recommend, it just streamlines the process for you and makes it much easier.
For us, we do have a well-defined policy. We do not have a platform right now, so what we are looking into is bringing in some of those platforms so that we can lean in and further accelerate what we are doing. But to be truthful, I wouldn't say governance slows down the process. What I would definitely say is: if you are in a regulated industry, and even if you are not, model governance is extremely critical, and understanding what your models are doing, versus depending on black-box models, which is what it has become quite a bit, is critical.
So make sure you have, I call them unicorns, your best data scientists, who sit on the cusp of both data science and data engineering. Make sure they are ensuring that all your model explainability documents are well presented as part of your rigorous process already, and not as something on the side that you need to do.
DAVE COLE
There's a general best practice there that—
ANJU GUPTA
Yep.
DAVE COLE
You should be thinking about how your model works and why it's making the predictions it's making, and documenting that as well. There's also, from an audit perspective, the reproducibility aspect: being able to reproduce the model, ideally using the same data.
ANJU GUPTA
Huge.
DAVE COLE
Which is hugely important. It's also a very tricky problem to solve. There are tools and platforms, like you mentioned, that help with that. I think anytime anyone from the engineering side or the data science side hears that somebody is using a spreadsheet to track things, generally you want to steer clear of that. A lot of good information there.
I'm curious if we can switch topics here and talk a little bit about the world of data science and machine learning and the actuarial role. The actuary has been around for many years, and basically it's their job to assist in the underwriting process. If you look at their skills in terms of statistical background, it's not too dissimilar from a data scientist's. So, how do you see that role evolving at Northwestern Mutual?
ANJU GUPTA
At Northwestern Mutual, our data scientists work very closely with actuaries. I tend to call them a pair team. That's where we see a lot of success: a pair team, where one of our data scientists, or a group of them, is working with a group of actuaries. Data scientists are well versed in dealing with big data, and actuaries are really good at articulating the value proposition of the data that's coming in, or of the results. One of the things that I think has led to some of our recent successes with our models is that we have been able to do paired programming with them.
But I've also seen recently that the Institute and Faculty of Actuaries has put a lot of focus on machine learning and computer science education for actuaries as well. So, it is evolving, and we are seeing more and more of it. I would like to see more, but it is creeping up within the actuarial department: actuaries learning within the machine learning and AI space. It hasn't touched deep learning that much yet, but it's definitely into the machine learning space. So, there's more and more to come. As we do the paired programming, we are training both our data scientists and our actuarial scientists as well. I tend to call that a converged science. It's going to happen no matter what.
DAVE COLE
In terms of getting those two roles, what do you mean by converged science? Those two roles getting together and working more collaboratively?
ANJU GUPTA
Yes, and one learning from the other, so that the math and science behind it starts to converge.
DAVE COLE
Got it. Just being selfish, if I'm in your shoes and if you're trying to build out your data science team, do you see actuaries as being a pool to potentially pull from for future data scientists?
ANJU GUPTA
Oh, absolutely, absolutely. Think about the work that we do within data science for, let's say, underwriting acceleration: bringing in external data sources, using our internal data sources, and the guardrails by which we make certain decisions within a model. It's all defined and driven by actuaries within the company. So, they are a close partner in how we build our models and how we put guardrails around those models, because there's this historical knowledge and experience that is just huge, which the data science team will not have.
DAVE COLE
Yeah.
ANJU GUPTA
To be truthful.
DAVE COLE
The life insurance business is a bit on the morbid side from a data science perspective. You're basically predicting somebody might pass and—
ANJU GUPTA
It is.
DAVE COLE
What policy you need to put in place, and so on. It has been around for a long time, but as you know, the data elements that we're able to use have really exploded over the past couple of decades, and I think that's changed things and allowed the actuarial side of the house to be more accurate in a lot of ways. I think there are a lot of learnings there.
ANJU GUPTA
So, when you think about what our team is comprised of: it's comprised of data scientists and data engineers, and it also includes medical staff, whom we work with very closely. It includes actuaries and underwriters as well. That brings the convergence of what good looks like; you wouldn't be able to do it in a silo with just data. Data science helps accelerate all the thought processes together and articulate them in a fashion that is reproducible and that leads to an outcome.
DAVE COLE
Allows you to operationalize the whole process as well.
ANJU GUPTA
Yep.
DAVE COLE
Well, okay. That's a good segue. You mentioned a few different roles you have on your team. When I was talking to you in preparation for this podcast, I was amazed at how many different roles you have. Maybe we can just start there and list them out, and then I'd like to understand some of the unique ones. Some of them, most people might have heard before, but some were news to me.
ANJU GUPTA
Within my team, we have data scientists, we have data engineers, we have analytical engineers (that's a new role that we have created this past year), and then we have a fairly robust program management team, and then we have data analysts as well. All of these roles work end to end in what we call CRISP-DM. You’ve heard of CRISP-DM, right?
DAVE COLE
I have, but maybe you can describe it briefly for us.
ANJU GUPTA
I don't remember each phase exactly, but you understand the data, which is more of an exploratory analysis, and you prepare the data. You do some initial model evaluation. You do the testing of the model, or the A/B testing, and then the model monitoring. And then you keep going through this cycle over and over again. That's essentially CRISP-DM.
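For reference, the published CRISP-DM phases are business understanding, data understanding, data preparation, modeling, evaluation, and deployment, repeated as a cycle rather than run once. A minimal sketch of that loop (the phase list comes from the methodology itself; the function is purely an illustration, not Northwestern Mutual's process):

```python
# The six CRISP-DM phases, in their published order.
CRISP_DM_PHASES = [
    "business understanding",
    "data understanding",
    "data preparation",
    "modeling",
    "evaluation",
    "deployment",
]

def run_crisp_dm_cycle(phases, iterations=2):
    """Walk the phases in order, then loop back: CRISP-DM is iterative,
    not a one-shot waterfall. Each tuple stands in for the real work."""
    log = []
    for cycle in range(iterations):
        for phase in phases:
            log.append((cycle, phase))
    return log

# Two full passes through the cycle yield 12 (cycle, phase) steps.
steps = run_crisp_dm_cycle(CRISP_DM_PHASES)
```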
I was trained as a data scientist myself, and this was 15 years ago. When I was doing all my data analysis, we had quantified that 80% of my job as a data scientist was manipulating data, 15% was modeling and A/B testing, and 5% was putting the storyline together: the communication of what the outcomes look like. Fifteen years later, having had multiple teams, I keep asking my team: what is the split now? And what I hear is still 80% data manipulation, 15 to 20% the rest. The stats haven't changed.
One way to think about it a little differently is creating this notion of analytical engineers. Analytical engineers are fairly tightly tied to my data scientists and data engineers. Data engineers are responsible for building the data pipeline: they bring the big data in house, create the unified data platform (or whatever you want to call it), and ingest all the data into one place, which is then available to a large number of data scientists. But then the analytical engineers come into play.
Their role essentially is: what is the problem that we are trying to solve, and what data do we need? So, they take the data. They are in reality data engineers as well, but they have subject matter expertise in the problem that we are trying to solve. Their job essentially is to bring that data asset together for the data scientist to work on and to provide the ingest pipeline for the data assets that we want to use, because this is all big data that we are talking about.
DAVE COLE
Right.
ANJU GUPTA
So AEs play a critical role in preparing the asset for the data scientist. The data scientist then starts to work on the data. Then comes the real R&D: we do tons of R&D and model evaluation, go back through the CRISP-DM cycle, and come up with an outcome. When all of this happens, the AE comes back again, and their job is to ensure that the model gets deployed into production through an API.

There's tons of optimization and performance tuning that happens, because (and being a data scientist, I may be wrong) you can't really ask a data scientist to do tons of optimization and performance tuning; they love to do R&D. So, that's the job that we take away from the data scientist. We know their focus is on increasing the prediction accuracy for what we are trying to do: get us the best prediction accuracy through iterations of the model, adding more data or running tons of different types of algos to see which one gives the best outcome. Then the final model goes to the AE, who is now focused on optimization, performance tuning, and creating the model API by which the business can consume the model for the outcome they're wanting to drive.
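As an illustration of the hand-off described here, where the AE wraps the data scientist's final model behind an API the business can call, the serving layer might look something like this sketch. The class, the field names, and the stand-in model are all hypothetical, not Northwestern Mutual's actual stack:

```python
import json

class ModelAPI:
    """Hypothetical thin serving layer an analytical engineer might build
    around a trained model: validate the request, extract features, predict."""

    def __init__(self, model, required_fields):
        self.model = model                    # any callable: features -> prediction
        self.required_fields = required_fields

    def predict(self, request_body: str) -> str:
        payload = json.loads(request_body)
        missing = [f for f in self.required_fields if f not in payload]
        if missing:
            return json.dumps({"error": "missing fields", "fields": missing})
        features = [payload[f] for f in self.required_fields]
        return json.dumps({"prediction": self.model(features)})

# Usage with a stand-in "model" that just sums its inputs.
api = ModelAPI(model=sum, required_fields=["age", "bmi"])
response = api.predict('{"age": 40, "bmi": 25}')  # prediction: 65
```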
I joined Northwestern Mutual a year and a couple of months ago, and we didn't have that role, so we created it. I've seen huge success with that role in my past life, and at NM too, where each of our work streams has a set of AEs along with data scientists and data analysts. The data analysts are very much focused on building visualizations for the outcomes that the data scientists are producing and translating them for the business leaders.
DAVE COLE
The storytelling, if you will.
ANJU GUPTA
The storytelling…I just feel like we can get so much, especially in the data science space. If there's one cry I could put out to academia: if there's a course or curriculum we can create to make scientists better communicators, the whole industry will definitely benefit a lot from that.
DAVE COLE
So those listeners out there, make sure you can explain what you're doing to your business counterparts, because if you don't, there's frustration on both sides. And to be fair, there are also courses geared towards teaching the basics of data science and statistics to executives. I'd like to see both parties meet in the middle to a certain extent.
But it's interesting: when you first started talking about the AE, the analytical engineer, I was thinking it sounds like they're building the data mart, building up maybe a feature store for the data scientists. So, they're hopefully reducing that 80% of data manipulation that you mentioned. But then you said they also help stand up the model APIs, and I assume maybe even the model monitoring. That almost sounds like, and you have this role too, an ML engineer. I've also heard of machine learning engineers. Where do you see the boundary between the ML engineer and the AE?
ANJU GUPTA
I'm glad you mentioned it, because I mixed the two of them together. But we do have an MLOps team, and the MLOps team, which is where—
DAVE COLE
I told you she has all the roles!
ANJU GUPTA
So we have an MLOps team, which we recently instituted. That team is responsible for ensuring that all of the models we put into production are fully monitored. And the intent of the model monitoring that we do is to provide visibility to the model governance team on all the models we are monitoring.
We have a fully robust model monitoring pipeline, and the MLOps team is responsible for the machine learning operationalization framework: from model deployment, to model governance, to model monitoring. That's a 24/7 team; it's a support team as well, but they're also responsible for optimizing the performance of the models going into production. We tend to call them ML engineers.
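As a rough illustration of the kind of check a monitoring pipeline like this might run, here is a minimal drift detector that compares a monitored feature's live mean against its training-time baseline. The function and the 10% threshold are hypothetical choices for the example, not the team's actual framework:

```python
def detect_drift(baseline, current, threshold=0.10):
    """Flag drift when the mean of a monitored feature shifts by more than
    `threshold` (relative) from its training-time baseline."""
    base_mean = sum(baseline) / len(baseline)
    cur_mean = sum(current) / len(current)
    relative_shift = abs(cur_mean - base_mean) / abs(base_mean)
    return relative_shift > threshold, relative_shift

# Training-time values averaged 2.0; live traffic averages 3.0: a 50% shift.
drifted, shift = detect_drift([1, 2, 3], [2, 3, 4])  # (True, 0.5)
```

Production monitoring would typically use distribution-level statistics (e.g. population stability index) rather than a bare mean, but the feedback loop to the governance team is the same idea.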
DAVE COLE
Got you, but then the analytical engineer, that individual is responsible for taking the model and turning it into an API before it gets handed off to the MLOps team. Okay. I got it.
ANJU GUPTA
Yep. And MLOps teams and AEs work very closely together.
DAVE COLE
I'm sure. My brain works with diagrams. I do see that there's lots of overlap between the various teams. Like I see the AE working very closely with the data scientist.
ANJU GUPTA
Yep.
DAVE COLE
And through that CRISP-DM cycle, in the very early stages; but then in the latter stages, the AE is working very closely with the MLOps team, the ML engineer, and so on. And then for the storytelling, you have the data analyst.
ANJU GUPTA
And one thing I have definitely seen help quite a bit is having rigor on the communication channel throughout this process. It's better to have more communication than less, so that everybody is aware of what's going on. Bringing each individual in earlier does create a communication overhead, but it just helps everybody. It helps you move faster with the alignment that needs to happen within the organization to move towards the outcome that all of us are trying to achieve.
DAVE COLE
So, let's dig into that a little bit. Are these folks all separate teams, or, let's say they're all working on the same model, are they all part of the same team? Are they tiger-team based? When you mentioned communication, I'm picturing the data scientists and the AEs reaching out to the MLOps person at a certain stage.
ANJU GUPTA
It's project based, depending on what project the team is working on. The team gets together on a regular basis: we have daily stand-ups and we have huddles. Daily stand-ups are for individual tracks of a project, but huddles are where we bring everybody together to make sure everybody is familiar with the bigger picture.
But we do have tiger teams, where you have AEs and MLOps teams working together on model deployment and performance tuning. Then you have tiger teams of AEs and data scientists working on model iteration and delivering an increase in prediction accuracy for that model, and things like that.
DAVE COLE
Last question on this topic, which I just find so fascinating: the storytelling element. How do you see that working between the data analyst and the data scientists? Is it a deck? Is it prose? Is it a written document? Are they presenting in tandem, or is the data analyst the go-between for the business and the data science team?
ANJU GUPTA
It crosses both the data analysts and the data scientists, depending on the project itself, but there are several pieces of collateral that we create. One for sure is the white paper. For every model that we put together, we have a white paper associated with it, so one can go in and read about what that model is supposed to do.
Then we have an internal data science review committee, which is a set of senior data scientists; it's run by data scientists, led by data scientists. That team gets together once a month, and if there are models we are developing, we bring them in front of the data scientists so that everybody's familiar with them at a high level, and we get input from our data scientists on the technical rigor of some of the models we are building and on whether there are other ways to think about them. That's more of an academic exercise for us that builds technical rigor within the team, and that's where the data scientist or the data analyst, whoever is working on the project, will go in and present the work to the data science review committee.
We also present our work, with the model outputs that come in, to the business stakeholders. That's when our project management group and our data analysts come in full swing, put the collateral together, and share it with the business leaders.
DAVE COLE
Well, that's great. I said that would be my last question. I feel like I have a bunch more, but I want to stop it here. I really appreciate you taking the time, Anju. If people want to reach out to you, can they link up with you on LinkedIn?
ANJU GUPTA
Yes.
DAVE COLE
Awesome. Well, thank you so much for taking the time. We covered a lot of topics, from model governance, to actuaries, to analytical engineers and MLOps and every role in between.
ANJU GUPTA
We did!
DAVE COLE
So thank you all for listening and Anju, thank you so much for being here.
ANJU GUPTA
Thank you very much.
About the show
Data Science Leaders is a podcast for data science teams that are pushing the limits of what machine learning models can do at the world’s most impactful companies.
In each episode, host Dave Cole interviews a leader in data science. We’ll discuss how to build and enable data science teams, create scalable processes, collaborate cross-functionally, communicate with business stakeholders, and more.
Our conversations will be full of real stories, breakthrough strategies, and critical insights—all data points to build your own model for enterprise data science success.