Data Science at a Fortune 500 Insurance Leader
Insurers are under enormous pressure to innovate and be nimble today, especially as the industry works to address global changes to risk profiles, behavioral patterns, and both micro- and macroeconomic uncertainties surrounding the novel coronavirus. Buyers have become more price-sensitive than ever, viewing insurance providers as interchangeable. New technology-driven players have come on the scene, winning market share from long-established players. And real-time data from IoT sources like smart homes, car sensors, health devices, and drones has become mainstream, offering vast insights to those able to tap into it.
Whereas once insurers focused data science efforts mainly on risk management, data science is foundational across all areas of the business today. One Fortune 500 insurer has centralized data science work on Domino, slashing model delivery and validation cycles by months so the firm can provide a differentiated customer experience and better prevent fraud.
For nearly 30 years, this Fortune 500 insurer has used advanced analytics to better understand customer needs. Data science initiatives have grown organically across the company’s core product lines and functions, and the number of data scientists has swelled into the hundreds.
One area of focus in particular has been fraud detection. Risk leaders understand that there’s a fine line financial services organizations must walk when it comes to detecting (and stopping) fraud. Get it right, and customers are grateful that their financial institution prevented a fraudulent transaction, account breach, or false claim. Get it wrong, and customer frustration sets in as transactions and claims are put on hold.
But as data scientists worked to build new fraud detection models and integrate new data sources that would enable the business to get this balance right, they faced many hurdles that slowed their progress:
- Data scientists couldn’t always access the infrastructure and tools they needed. Centralized R and Python servers were oversubscribed, and at any given time, a significant number of data scientists couldn’t do their work.
- Rather than wait for infrastructure and tooling, data scientists opted to work on their laptops. This “Shadow IT” situation created massive governance risk for the organization.
- There was no easy way to share knowledge, datasets, models, and revisions. As a result, data scientists often spent time cleaning data without realizing another data scientist had already done so. Project history and details were often lost when data scientists moved to a different team or left the company, making it difficult to build on past work.
- Model code and data were often “thrown over the wall” to risk managers for approval prior to deployment with little documentation, adding complexity to governance and compliance processes. Lack of auditability also made it difficult for technology leaders to make sure data science projects adhered to security, governance, and compliance requirements, especially surrounding data access.
- It wasn’t easy to operationalize models; data scientists had to rewrite models from Python and R into SPSS to bring them to production.
The company conducted a two-week pilot of data science technologies from Domino Data Lab, Dataiku, DataScience.com (acquired by Oracle), and IBM. Pilot participants, who represented the company’s Banking, Property and Casualty, Innovation, and Risk divisions among others, unanimously selected Domino for its unique ability to support the end-to-end data science management lifecycle, accelerating model development, validation, and deployment.
The organization took a phased approach toward implementing Domino, onboarding approximately 30 users per week. Today, the platform serves more than 400 users across Data Science and IT, including:
- Data science leaders, who gained comprehensive visibility into data science projects underway. (Previously, they had to ask every team member individually for updates.) Integrated data science project management capabilities help them set goals for projects and track completed work by their team. As a result, they can better keep stakeholders apprised of progress and more quickly identify and resolve development bottlenecks.
- Data scientists and data engineers, who can develop and deploy models faster. A self-service environment ensures they have access to the tools and resources they need and a single place to search and find past work. Automatic tracking enables them to easily reproduce and share results so they can better collaborate with other data scientists and streamline downstream processes such as model validation. The ability to deploy models using APIs and web apps eliminates the need for data scientists to recode models on a different platform. And integration with the company’s Continuous Integration/Continuous Deployment (CI/CD) system makes it easier for teams to export and deploy models when they’re ready.
- IT and platform managers, who now have better oversight of data science work. They can confirm adherence to security, governance, and compliance, while empowering data science teams to independently access containerized research environments and compute resources. The IT team plans to integrate Trifacta data wrangling software, SAS, and autoML tools from DataRobot and H2O with the Domino platform to expand the tooling available to data scientists. And as the company migrates data science capabilities to the cloud, technology leaders say Domino’s Kubernetes-native orchestration capabilities will help them more quickly make the transition.
The Domino Effect
- Improved fraud detection. Before, teams faced time-consuming processes to mine a growing number of data sources and long cycle times (up to a year) to get new models into production. With Domino, data scientists tapped into a wider array of data, including customer insights, behavior trends, and spending patterns, and deployed a new, more sophisticated fraud detection model in a fraction of the time. The speed and success of their work, including the ability to quickly deploy new models, have helped the fraud team make a case to expand data science efforts.
- Recouping an estimated ten percent of lost time and reducing cycle times by months. Previously, data science teams lost an estimated ten percent of their time reconstructing past work and environments during lengthy (up to 18 months) model validation processes. These time-wasters have been eliminated thanks to Domino, enabling both data science teams and model validators to maximize their productivity and achieve greater economies of scale. In the future, the organization plans to invite model validators and regulators into the Domino platform, which could trim months off model deployment.
- Improved risk management. Financial services organizations are under the microscope to ensure data science processes, and the resulting models, adhere to regulatory rules. Now, with automatic tracking and full reproducibility of experiments, datasets, tools, and environments, the company has an audit trail of exactly what work has been done and who accessed what data.
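To make the audit-trail idea concrete, here is a minimal Python sketch of the kind of record such tracking might capture for each experiment: who ran it, a fingerprint of the data used, the environment, and the parameters. It is purely illustrative and is not Domino's API; the names `RunRecord`, `record_run`, and `fingerprint`, along with the example user, environment tag, and parameters, are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone


def fingerprint(payload: bytes) -> str:
    """Return a SHA-256 hex digest so identical data always maps to the same ID."""
    return hashlib.sha256(payload).hexdigest()


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical audit entry: who ran what, on which data, in which environment."""
    user: str
    dataset_sha256: str
    environment: str  # assumed to be something like a container image tag
    params: dict
    started_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_json(self) -> str:
        # sort_keys gives a stable serialization that is easy to store and diff
        return json.dumps(asdict(self), sort_keys=True)


def record_run(user: str, dataset: bytes, environment: str, params: dict) -> RunRecord:
    """Build an audit record for one experiment run."""
    return RunRecord(user, fingerprint(dataset), environment, dict(params))


# Example usage with made-up values
rec = record_run(
    "analyst_a",
    b"claim_id,amount\n1,250\n",
    "fraud-model-env:v1",
    {"threshold": 0.8},
)
print(rec.to_json())
```

Because the dataset is identified by a content hash rather than a file name, a validator could later confirm that a rerun used exactly the same data, which is the property that makes the reproducibility described above checkable.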