High-profile data scandals have made waves in recent years, from Cambridge Analytica accessing Facebook's user information to influence voter opinion to Equifax exposing the personal information of 148 million Americans. Consumers are now more concerned than ever about how companies collect, store and use their data. That also means data scientists are reckoning with what it means to be ethical data stewards in a field that is still a bit of a Wild West.
At Rev, Domino’s annual summit for data science leaders, three experts shared their vision of what data responsibility can and should look like.
Defining Data Responsibility
Natalie Evans Harris, COO at BrightHive, a data platform for social service providers, says data responsibility involves three components:
Why Data Ethics Are Important
Beyond the inherent value of being ethical, companies and nonprofits have other reasons to move toward data responsibility. For many, the motivation is to follow the law and avoid punishment or scandal, while for others it's a response to more customers asking how their data is used, Harris says.
Others see data responsibility as a way to stand out from the competition. Chad Wilsey, Director of Conservation Science at the National Audubon Society, says transparency around the group’s quantitative impact was a way to boost funding and public support.
For Margit Zwemer, VP of Systematic Active Equities at BlackRock, it comes down to risk: “If you’re not protecting your data, you’re at risk of being hacked or having a data breach.”
How Teams Can and Should Be More Responsible
Until an industry-wide shift takes hold, it’s up to individuals, teams, and organizations to cultivate data responsibility. But no roadmap exists, and figuring out how to do this can be challenging and expensive. “Asking every data scientist to be a philosopher and InfoSec expert is putting a lot of burden on people who shouldn’t necessarily have to be showing leadership in that space…because those cultures aren’t yet in place,” Zwemer says.
To address the gap, Harris founded the Community-driven Principles for Ethical Data Sharing, an effort involving more than 800 data scientists to crowdsource a code of ethics. The code enshrines principles like informed consent, security, transparency and preventing unfair bias. "There's nothing groundbreaking in these principles, but they've served as a launching pad for people to create their own individual ethos and approaches for the way they build products," she says.
Making Data Responsibility Stick
An optional code of ethics isn’t a substitute for real industry standards. What is it going to take for companies to truly follow ethical data practices en masse? Harris expects that regulations along the lines of the GDPR will expand and that universities will increasingly incorporate ethics discussions into data science curricula. But she thinks the cachet of being an ethical company will be even more effective. “We’re going to get to this place where it’s going to be cool to be responsible and ethical with the way that you use people’s information,” she says.
Zwemer thinks technology companies will eventually face training and reportability requirements around data responsibility, akin to the regulatory obligations in industries like finance. But she says companies will only take data ethics seriously if it affects the bottom line, with customers or shareholders punishing players that don’t comply. Given the fast pace of technological innovation, she predicts a lot of missteps will occur before that happens. “We’re creating new problems as fast as we’re solving them,” she says. “It’s going to take a lot of falling down and getting back up.”