Applying Data Science to Fight Child Abuse
Thorn builds technology to defend children from sexual abuse
Data Science at Thorn
Thorn is a nonprofit that explores and analyzes sensitive data sets, often in partnership with law enforcement, to fight the sexual exploitation of children. Thorn uses data science to uncover insights that help its partners sift through massive amounts of data and identify and elevate actionable content. Its web-based tool, Spotlight, is used by over 1,300 agencies in all 50 U.S. states and Canada to find human trafficking victims faster.
With a small team largely relying on a rotating batch of volunteers, Thorn needed a platform that was secure and instilled best practices from the get-go. Thanks to Domino, Thorn has been able to scale and enhance collaboration without compromising sensitive data. “We’re a small nonprofit, so we’re never going to have a massive data science team. With our first batch of volunteers working in Domino, we’ve been tackling three projects that I wouldn’t have been able to touch otherwise,” said Ruben van der Dussen, director of Thorn’s Innovation Lab.
As a nonprofit, Thorn has limited resources to invest in a large data science department. To expand its reach, the organization decided to bring in a small number of Ph.D. students and skilled volunteers who were passionate about the cause. Yet relying on a team of geographically distributed individuals working varied hours and for short durations posed obstacles.
Static infrastructure: Volunteers’ time is precious and limited; it is critical to expedite the onboarding process and rapidly procure the tools and infrastructure necessary to become productive fast. But volunteers were using their own machines, not all of which were powerful enough to run the required tools.
Limited security and access controls: Thorn lacked the ability to embed security and access controls that would allow volunteers to work on critical projects using data sets that include confidential information, often pertaining to law enforcement investigations. Volunteers were limited to samples of non-sensitive data shared via GitHub and SQL databases.
Silos of knowledge and iteration friction: The main risk associated with a volunteer-based team became a mission-critical problem. Because volunteers worked on individual laptops, their work ended up in silos, which hampered transparency.
Many volunteers would take on a project, deliver great work, and then return to their day jobs or start something new. They might leave behind an interesting insight or, ideally, a model, but it was nearly impossible for others to understand what work had been done or how that model was built. Thorn needed the ability to oversee data scientists’ work and keep a history of all workstreams and experiments, so that when one volunteer left, the work they’d done would be preserved, accessible to others and reproducible. The team also needed a central hub in which to share results, collaborate and institutionalize knowledge.
Thorn recognized the need for a central data science platform, but its engineers lacked the bandwidth to build and then continuously maintain, improve and support one. As the team started exploring commercially available products that would allow Thorn to expand its volunteer base, “Domino kind of rose to the top,” in part because of its security features and ease of use, according to van der Dussen.
“We immediately needed a platform where we could securely give volunteers access to whichever data we saw fit and monitor their work. If people dropped off, we wanted to be able to revoke access.”
As soon as Thorn implemented the Domino data science platform, its engineers were free to focus on high-impact projects instead of maintaining tools for the data science team.
Other benefits included:
Management and security: Domino doesn’t just help data scientists build models; it facilitates the end-to-end workflow, from ideation to discovery of existing work, model sharing, deployment to production (via either APIs supporting external models or human consumption), quality assurance, monitoring, and documentation. This end-to-end process enables rapid model delivery and iteration, which translates to faster innovation and competitive differentiation.
Reproducibility: Maintaining a system of record for experiments and results is critical with constant turnover on the team. “I really love the project and package structure of Domino. Whenever a volunteer does some work, I push them to package it up really nicely. Then it can be run seamlessly by another volunteer that comes in, with expected results and proper documentation,” said van der Dussen.
Collaboration: Domino makes it easy for a spread-out team to share digestible results with one another and with other stakeholders at Thorn, including product, engineering and executives. Sharing progress and challenges keeps the team on track toward relevant, high-impact results.
Easy onboarding: Domino allows Thorn to provide volunteers with a standardized compute environment so they can make an immediate impact. “We can say, ‘Hey, go ahead and spin up this massive cluster to do whatever work you need to do.’ And it’s sitting within our environment, so we can control what you use,” van der Dussen said. Team members can use the data science tools they’re comfortable with, including Jupyter and Python.
The Domino Effect
Domino has enabled Thorn’s data science team to scale by more than 3X by fully leveraging students and volunteers. Van der Dussen expects that the platform will underpin the organization’s plans to scale further, including doubling the number of collaborators.
“I was the only one that could work on this before, and now suddenly we have four projects going in parallel,” van der Dussen said. “We can expect progress on fronts that we weren’t really paying attention to before, with more efficiency as we bring on more people.”