Algorithms can have an outsized impact on society. That’s why many data science leaders have focused a lot of effort recently on defining data literacy and ethics in a way that’s operationalizable in their company culture.
What we talked about:
Building a data science team at the New York Times
Creating a data culture
The relationship between a data science team and a data analyst team
Necessary soft skills for data scientists
Hello, my name is Dave Cole and I am your host of the DSL Podcast, DSL standing for Data Science Leaders. This is our inaugural episode, so I would like to talk briefly about the goal of the podcast. And I promise I won’t do this in subsequent episodes.
The goal of the podcast stems from a realization I had when I was a Chief Analytics Officer, prior to my current job here at Domino as the Chief Customer Officer. I realized that I was really learning on the job. I had a lot of background in analytics and data science, but I was learning as I went, and I was looking outward to find mentors, other people who could help me better understand the art of doing data science. I thought a good idea would be to create a podcast that helped out existing data science leaders, as well as aspiring data science leaders, or even just people who are interested in data science and want to hear from other data science leaders about the challenges they’re facing in their jobs, and some of the things they’re trying to do to further data science in the business world.
I’m really excited to bring this to you. I hope it’s educational, I hope it’s helpful, and I also hope it’s fun. I want each and every one of you to laugh in the course of the podcast; if you don’t at least chuckle once or twice, then I’m not doing my job. So I will try to keep it light and fun, but also, most importantly, informational. Without further ado, let’s dive into our first guest here on the Data Science Leaders Podcast.
Today, we have the chief data scientist of the New York Times, Chris Wiggins. He also happens to be an associate professor at Columbia in applied math, and the co-founder of hackNY.org. Do you want to talk a little bit about hackNY.org?
Sure, so hackNY really grew out of concerns that I saw in New York City after the financial crisis of 2008. After the financial crisis, there were all these students at Columbia who had their job offers rescinded and didn’t know what they were going to do. And it just seemed like a time when the rules and the norms of careers for young technologists were up for grabs. In particular, it was the end of this sort of Madoff economy, where everything just went up without reason. I was teaching all these young engineers who got on the subway, and there was only one vertical that they were all going to. And for technologists who were interested in doing something creative, the dominant narrative was, well, we’re going to have to move to San Francisco if we want to do something that’s technologically creative. Which made me sad.
And so I started talking to friends from startups about what they thought New York City needed to be a stronger startup community. Far and away the strongest answer was connections with young software developers who would help grow a startup company. Which was also sad, because I knew all these young students who had their job offers rescinded and were actually very interested in trying something like a startup, something that was not the dominant narrative in New York City from 2000 through 2008. I had already been involved in undergraduate research programs for the summer, so I talked to a couple of startup people and tried to sell them on this idea of doing something like an internship program. And in fact it was John Borthwick from Betaworks who first said to me, “We need an internship program like the banks have.”
And as soon as he said it, I was like, “Yeah, that is a good idea.” Because young people get to their senior year and they think, “Oh well, I’ll do whatever I did my junior year.” Or else they think, “I’ll do whatever those smart seniors did last year after I graduate.” So I talked to a couple of startup people and we put together this program to do something like a structured research program for undergraduates, except it would be residential and at startups. The students work at different startups during the summer, and we house them together in NYU dorms. I already have a couple of jobs, so the only way this works is for the alumni to own it and for the gray-haired people to just stand back.
So at this point, it’s a network of very talented young people. The alumni vet the applications for the next years, the alumni are the residential mentors, sort of like the dorm RAs. And it’s built up this whole network of talented young technologists. It’s also a great way to learn the future. Like hanging out with smart young people, they really are constantly telling you the future, if you just listen to them.
Was it just software engineers, was it data scientists? What was the type of folks who participated?
Yeah, we always say front end, back end, data science, and design: people who code. I would be happy to branch out into, I don’t know, product or something else. I just don’t know those skill sets so well, and I don’t exactly know the companies that are hiring for them. But based on the initial network I had, that was the easiest place to start.
I’m going to hazard a guess that’s been beneficial to you as well. Not just, I mean, personally being able to create this network, but also… I don’t know if any of your team came from this, potentially?
Yeah, actually there’s a couple of New York Times software engineers who are also hackNY alumni. That’s true.
Perfect. That’s awesome.
Yeah, actually we have had a summer intern now. No, we’ve had three summer interns in my group who were hackNY alumni, now that you mentioned it.
There you go.
But more generally, among the software engineers at New York Times, there are several that are alumni of the program.
That is fantastic. I think now is no different than it has been for probably the last couple of years: it’s obviously very hard to find talent, and you’ve got to get a little creative in reaching folks who are coming right out of school, building out a network like you have. And then it sort of feeds on itself as it grows, and the alumni network you create can be another way of hiring talent.
Let’s go ahead and get started. The first thing that I think most people have on their minds, at least I do, is: tell us a little bit about your journey at the New York Times; you’ve been there for quite some time. What does it mean to do data science at the New York Times?
The data science group is a group that, I like to say, develops and deploys machine learning to solve problems on the newsroom and business sides. We are ourselves on the business side; that is, we’re not doing data journalism or infographics or something like that. We are developing and deploying machine learning that might help editors with either understanding how stories are promoted or guiding how stories might be promoted in the future. Or working with people from advertising or product: developing recommendation engines and deploying those, developing new innovative ad products, developing machine learning to do better marketing for subscription offers. So it’s mostly problems that could happen at any digital subscription service.
Tell us a little bit too about your journey at the New York Times. I mean, the New York Times has been around for 170 years. I think it was founded in 1851. And so what has its journey been and how did data science come to be? And I believe you played a part in that, is that correct?
Yeah, yeah. I’ve been helping them with that.
Yeah, a little bit!
When I showed up, I was hired into the BI team, which had just been created. I think in 2012 the New York Times first created the BI team, by essentially getting together all the database analysts from different parts of the org, and the New York Times had just started building out the infrastructure to do event tracking. So that was an exciting time, because there were all these data that not a lot of people had really looked at; say, correlating the web data with the subscription data, or other things that you might imagine as sort of table stakes. Once you have event tracking data, there are so many new questions you can ask.
I took a sabbatical from Columbia in the fall of 2013, and had asked a bunch of smart faculty members for advice on where to do it. A good friend of mine who’s in the journalism school advised me to go to the New York Times. He said, “Don’t go to Facebook or Google. That would be just like your normal day job.” Because my day job was already doing machine learning. “Go to the New York Times. It would be weird.” And he was right.
I can guess, but what do you think sort of makes it weird? Like what makes it interesting?
Well, it was weird for an academic to go do it. At this point, I had been in academia for a long time and understood what “mission accomplished” looks like to academics. And there’s kind of a different definition of “mission accomplished” in industry. So that was fun, getting to know it. I had my own understanding of what data science meant among academics: the craft of applying machine learning to a problem from the natural sciences, for example, which is a lot of what my published research has been. But data science applied to industrial problems sort of has its own process and its own definitions of “mission accomplished.”
At a high level, one of the things that’s very different is that in academia, you’re mostly optimizing for peer review. You write a paper, you release a code base or something, and it’s going to be used by other people whose brains are shaped just like your brain, because that’s how you’re writing it. In the real world, you’re doing a sort of complementary review, where things really work well if you’re building something that’s useful, something that changes the life of somebody who’s not like you, who’s got some sort of complementary skill set. It’s a basic product mindset: how do you get to know the people who are going to be using your thing, understand what their real pain points are, and then iterate with them to build something that they can integrate into their lives?
That’s been really fun. So that’s been an academic, or an intellectual, exercise: being educated about what data science looks like in the real world. But then there’s the fact that it’s the New York Times. The mission really matters to me, and it’s a particularly unusual time for journalism. The business model of newspapers has just sort of gone away over the last decade because of the move of advertising to digital, and then the swallowing of digital advertising by one or two major players has really caused a big problem for the craft of journalism. The craft of journalism has been tied to the business of newspapering, and the business of newspapering is in real disruption, so to speak. It’s been very educational to participate in the way the New York Times has met this digital challenge and made this digital transition, I would say, successfully.
Speaking of disruption, I imagine just based on how you’re describing it, that data science was relatively new to the New York Times. I mean, they were just moving into this digital world from the paper based world. And what has that journey been like? Were you first? Was there a small team and you built that team out? Tell us a little bit about what that was like.
At the end of 2013, at the end of my sabbatical, the head of BI and I had decided that we would do something, that we would create a data science team. We agreed that I was going to try to recruit and lead a data science group. So we created a group and hired a bunch of people. And at this point it’s, I think, 15 people in the data science team. I often think about this great blog post by Monica Rogati from 2017 about the hierarchy of needs, which says at the base you need to have really good data infra, you need to have your infrastructure correct. Then you can think about testing and monitoring, getting your A/B testing working. Then you can think about fancy AI and deep learning and stuff like that.
When we showed up, we were able to do some really interesting machine learning, but I feel like at the time it was really just provocations. Like, “Look, we can make these models that are predictive of subscriber churn or something else that matters to people.” But actually integrating into people’s processes means that you need to have the infrastructure correct. Set up an API, set up your data engineering, set up your data pipelines. And that took a long time.
Great, so you built out the data science team, and you mentioned that pyramid of needs where the bottom is the data infrastructure. You’ve also written a bit about the importance of data literacy and data ethics, so I’m going to segue into a conversation around data. Talk to us a little bit, as a data science leader, about the importance of data literacy. I imagine not just for your team, obviously, but also for those who consume your output. And then, how does data ethics weave into this?
I think really, if you want to have a data science team that has an impact on the culture, that is actually part of culture change at an org, it’s very useful to have a culture of people who are willing to engage around data. Not just people who are functionally capable with data, like writing machine learning code, but people who are rhetorically capable and people who are critically capable. Rhetorically capable, meaning I need many people who understand what the output of machine learning means, so they can tell a story where the machine learning models are an asset that drives that story. So it’s not just my group. I work with a much larger data analyst team, and I work with product and marketing friends who understand the methods well enough that they understand how they have narrative impact. And it helps us all understand, for example, the user journey.
And a critical capability is useful, I think, for all of us, so that when we get a result, we know whether to say, “Eureka!” or “Nuts!” Meaning, is that a discovery? Or, I don’t really trust that. It’s useful to work with analysts and product people and other people who have that sort of critical capability. When they see some trend in the data, whether it’s our output of a model or inputs to our models, we all have the critical capability to investigate, is that something that makes sense? And if not, how do we query it and dig in and sometimes do the shoe leather work to go figure out why the data are the way they are?
Right. You mentioned the data analyst team. I’m interested in, how do you see the difference between your team and the data analyst team? How does that relationship play out?
We work very closely. Part of the culture change story was, when I was hired, there was a BI team which reported up to the CTO, who reported to the CIO. Separately, there were a bunch of analysts who reported up to the Chief Marketing Officer. Part of the dynamic that I stepped into in 2013 was the dynamic between the CIO and the CMO, in totally different parts of the org chart. We had a real case of the technological infrastructure mirroring the org chart, where the analysts in marketing really needed reliable data from the database analysts, who were just very distant in the org chart and didn’t necessarily have aligned priorities.
One of the things that’s happened is the creation of a centralized data group. The person I report to is an SVP of Data, who reports directly to the CEO. And the data analysts and the data scientists are side-by-side in the org chart, along with, I should say, the head of data governance. So data governance, data analysts, and data scientists are all part of one group, and we work very closely with the data analysts. We tend to focus on problems that are either predictive or prescriptive, so we’re focusing on machine learning, or machine learning in the service of decision support. Sometimes that means reinforcement learning, contextual bandits, or what have you. Sometimes it just means predictive models that are optimized for predictive power.
We work very closely with analysts who deeply understand the pathologies of the data. So sometimes that just means the gap between your mental model of how the data were generated and how they really were generated. There’s some fluidity, so there are people who have moved from being data analysts to being software engineers, or people who’ve moved from data analysts to being product people. But that’s a larger and very helpful team with which to partner.
Got it. And then data ethics, which is a little bit different from all of this. Obviously you need to have data literacy if you’re a data analyst, and if you’re a data scientist as well. But where do you see data ethics playing a role within your team? I don’t know if you have any stories, or if it’s more of a passion about what you’re seeing in the industry as a whole.
It is happening in the industry as a whole, or at least with the people that I talk to in the industry as a whole. But I would say within our group, we tend to hire for a kind of person that cares about what they do. We all know that our algorithms are having impact. We’re writing algorithms that, for example in the case of recommendation engines, those algorithms are powering choices about what content is being shown to which people. We want to understand that well enough that we know how those algorithms are working and we all can sleep well at night. Part of it is just a function of who you hire. And again, that’s true in data science and also certainly in our data governance team.
The head of data governance and I have worked quite a bit on defining ethics in a way that’s operationalizable, and then defining processes and procedures so that everybody, not only my team but also the marketing, advertising, and other partners we work with, feels like they know how our process manifests an applied sense of ethics, whether we work quantitatively or qualitatively. Sometimes that means quantifying the fairness of our algorithms; there are several ways to go about doing that, to monitor that fairness and to think about what data we use for different algorithms, disparate treatment versus disparate impact. It’s been a concern of ours for many years, but now that concern is so pervasive throughout the company that we’re really starting to document it. It’s not just something we talk about with each other and with our teams; we’re putting it in writing, so that all of our product partners know, first of all, that we take it seriously, and that we have principles and processes for defending them.
That’s fantastic. That’s relatively new ground. When I talk to other data science leaders, I don’t know that I’ve talked to anyone who’s – maybe they just haven’t mentioned it – but who’s written the rules of the road in terms of data ethics and the do’s and don’ts. You mentioned one that I know is of interest to me, which is the role that data science plays in serving up content.
And certainly in social media, there’s been a lot that’s been written about content that is just reinforcing your current beliefs instead of challenging your current beliefs, because the likelihood is that you’re going to read it and if you read it, you’re going to see more ads and you’re going to be essentially a more profitable user.
Can you talk at all about striking that balance? Where does data ethics play a role there in serving up content?
Well, let me give you an answer that’s specific to the New York Times, and then an answer maybe that’s useful to other data science leaders. So for the New York Times, the scale of potential harms is much smaller for a couple of reasons. One is, we are a publisher.
Yeah, all the content is based in fact.
Yeah. So there are a lot of companies that say, “We’re not a publisher.” It’s just that they aggregate billions and billions of URLs, and then they algorithmically perform an editorial function: a function about what’s going to be shown to which people. It’s just that the editorial function is itself algorithmically driven. So the New York Times doesn’t have that problem. The New York Times doesn’t have a pool of content that says black is white and night is day, or the earth is flat or the earth is round, where we can figure out, you look like a round-earth person and you look like a flat-earth person… Because we don’t have content that’s itself incoherent. All the content is coherent in its worldview, and it’s very carefully vetted. Part of the process on the newsroom side is truth, making sure that things are true, and retractions… Anyway, process around good faith journalism, let’s call it.
Both because of scale and because it’s not user-generated content, it’s content generated by the newsroom, we don’t have quite the scale of harms that a platform company has. That said, I do think that some of the ways we think about ethics generalize to any company that works on algorithms. And part of it is that we try not to make up ethics. That is, there actually is a philosophical ethical tradition that’s very old, and an applied ethical tradition that comes from human subjects research, let’s say the 1970s or more recent, which is really about rights, harms, and justice. It’s about respect for the personhood of the people using the product, which includes informed consent. It’s about harms, and about whether or not you are paying attention to the unknowable consequences of your actions, meaning every technology has unintended consequences.
Are you monitoring and mitigating those harms? And then justice, including fairness. At this point, over the last five years, fairness has become a well-defined technological subject within the computer science community, so we have benefitted from that as well as just general clear thinking around justice. I would say part of what we do that I think is useful to any digital service company is, how are you going to monitor and mitigate harms? Because you cannot predict every harm in advance; all you can do is monitor and mitigate those harms as they become realizable.
Yeah. Do you have any tips on monitoring those harms, any examples for our audience on how to do that?
Part of it is, any responsible company is going to build some sort of dashboard of how their products are working. In addition to whatever metric you’re optimizing, there is a family of counter metrics.
Those counter metrics are things that you know you would drive down if you were to brainlessly optimize the thing you’re trying to optimize. Among those counter metrics can be, and in our experience are, counter metrics around fairness and equitable allocation. They could be, how is the algorithm affecting different types of content? They could be, how is the algorithm affecting different groups of people? So those are things you can monitor and dashboard, if I can use dashboard as a verb, the same way you would any sort of product KPI.
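As a sketch of what monitoring a counter metric can look like in code (purely illustrative, not anything from the Times: the `section` field, the helper names, and the 5% exposure floor are all hypothetical assumptions), an equitable-allocation check can be computed from the same logs that feed the primary KPI:

```python
from collections import Counter

def exposure_share(recommendations):
    """Fraction of total impressions that each content section receives."""
    counts = Counter(item["section"] for item in recommendations)
    total = sum(counts.values())
    return {section: n / total for section, n in counts.items()}

def check_counter_metrics(recommendations, floor=0.05):
    """Flag sections whose exposure share falls below a minimum floor.

    A brainlessly optimized engagement metric might drive some section's
    exposure toward zero; this is the kind of counter metric you would
    dashboard next to the KPI.
    """
    shares = exposure_share(recommendations)
    return {section: share for section, share in shares.items() if share < floor}

# Hypothetical impression log: which section each recommended item came from.
recs = [
    {"id": 1, "section": "politics"},
    {"id": 2, "section": "politics"},
    {"id": 3, "section": "arts"},
    {"id": 4, "section": "science"},
]
print(exposure_share(recs))                   # {'politics': 0.5, 'arts': 0.25, 'science': 0.25}
print(check_counter_metrics(recs, floor=0.3)) # {'arts': 0.25, 'science': 0.25}
```

The same shape works for groups of people instead of content sections; the point is that the counter metric is computed and monitored the same way as any product KPI.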
That makes a lot of sense. So, switching gears a little bit, one of the things we talked about in preparation for this podcast is the idea of building out the teams themselves. What sort of advice do you have on the role that soft skills play in your data science team? What sort of thinking do you have around the skills a data scientist needs outside of the more traditional skills of understanding data, being able to code, and understanding the basics of statistics and machine learning, etc.?
I’m on record for years of saying that we look for collaboration skills in addition to machine learning skills.
So those collaboration skills include… I want people who can communicate the results to the rest of the organization in a clear and concise fashion. And that particular quote I actually stole from Jeff Hammerbacher. Jeff Hammerbacher wrote this essay in the 2009 book Beautiful Data called “Information Platforms and the Rise of the Data Scientist.” He has this particular paragraph about why they came up with the job title “data scientist” at Facebook. He said, “A data scientist might be doing all these different things that a business analyst or statistician is not quite doing.” And he listed a bunch of things, and the last item is, “and communicate the results to the rest of the organization in a clear and concise fashion.”
The ability to communicate is really important to the people I hire, because I believe in the sort of product thinking that you should actually interact with the people who are going to be using your stuff. So I want all the data scientists to go shadow the people who are going to use their tools, learn what their pains are, and then iterate with them as they build it out, so that they know they’re building something people want. You can only do that if you’re willing to communicate with them. That means you communicate in their language, not yours. Several people have now written that it’s part of the job of the data scientist to speak the language of the domain expert you’re working with, not to ask them to speak your technological secret language.
So communication skills are key. And then for getting things done, because we are building products, the basic collaboration skills, which are sometimes called project management, are very useful: identify, communicate, and eliminate blockers; identify, communicate, and meet deadlines. At a high level, people who can do that are just really pleasant to collaborate with. So in addition to the tech skills, Python and SQL, those collaboration and communication skills are very important. There are many great candidates we have not hired because we didn’t feel comfortable dropping them in a room with somebody from marketing, product, or some other part of the org.
Right. I can also imagine that even still through the hiring process, there’s still room to grow in that area. You might have somebody who is a great communicator, but they’re working with say, the marketing team and they never worked with the marketing team before, and they’re working on improving the latest campaign to drive subscribers and subscriptions up. How do you get that data scientist to get closer and to speak in the language of their business counterparts? Do you have them just sit with them and just hear all their problems or is there some other technique to get them up to speed? Is there any magic sauce there?
I think you put people in higher risk positions until you can tell that they’re not feeling comfortable. You just put somebody in a situation where like, “Okay, why don’t you try to drive this meeting?” And what does driving this meeting mean? It means that you walk into the meeting and you’ve thought through what are the interests of everybody else in that meeting? And you know those interests, because you’ve interviewed them directly and you should do that. Or at least you have some empathy. You walk into the meeting with some empathy for everyone else in the room. What are their interests? And then you can actually add value, you can be useful.
Yeah. I’ve certainly been in meetings with data scientists and it’s not always that you find a data scientist actually leading the meeting and driving the conversation from a business perspective and then putting it in terms that hopefully both parties can understand in terms of how to actually operationalize it. Like, “Okay, I think we should build a model that predicts this. Here’s how the output of that prediction is going to go into the system and then here’s how you’re going to act upon it. And here’s how it’s going to improve this business process of eliminating churned customers or what have you.”
That’s very important to us. I should say, at the New York Times, it’s a company where they’re comfortable having a data scientist go present to the CEO, which happens not infrequently. So that’s important to me that I know that the people in my group are building something people want. At the New York Times, it’s the type of company where everybody wants the individual contributors to have the opportunity to present their stuff, rather than to have the manager or manager’s manager who has only a managerial understanding of the details go and represent that work.
And then you’ve got the telephone tag problem of the fact that it’s been filtered through…
Which can be really bad. I don’t think I’ve ever seen a situation where somebody asked me something and I had to be like, “That person actually knows.” But I would much rather just have that person present their work directly.
I think that’s just a general managerial best practice: have the person who actually created the content present it. It empowers them; they feel even more of a sense of ownership. And you’d better believe that if they’re going in front of the CEO, they’re going to make sure their presentation is on point, factually correct, well thought out, and, to your point, concise. I imagine you play a part in that.
And presented in a language that everybody can understand. There’s no benefit to anybody presenting things in a language that’s only meaningful to your fellow data scientists.
Yeah. The last thing, which I think you’ve written a bit about, we’ll just touch on briefly; we could probably have a whole other podcast episode on it. From your vantage point, where do you see data science going in five years, maybe in ten?
That’s a good question. Unlike an academic field that moves very slowly, data science has always been a field really defined by industrial concerns, and industry will make up new terms or redefine old terms. I guess one general thing is that infrastructure is getting so good that the pain of deploying data science models as products is dropping every year. It’s just easier and easier every year to go from a model to a product or to a scheduled process. The term some people use for this, Eric Colson for example at Stitch Fix often talks about a full-stack data scientist: somebody who can not only model, but deploy. So we’ve done a lot of that. And I presume that as the infra gets better and better, it’ll be easier for data scientists to go from model to product pretty easily, which is exciting. It’s just a lot easier to have impact that way.
I’ve also seen that role separated into two. You have data scientist, then you have this role of ML engineer as well, who’s responsible for productionalizing it. If you can have somebody who has that full-stack expertise, obviously there’s not going to be a lot lost in translation when it goes from model creation to model deployment.
The way we’ve dealt with that at the Times is, in the last few months, or maybe a year, we’ve created an MLE group. The idea is that for many projects, the data scientist can go from model to deploy. But for things that are actually driving algorithmic recommendations on the web, we really want to make sure that stuff doesn’t fall down. So there’s a partnership between a data scientist who reports to me, who leads this group in algorithmic recommendations, and the functional group she works with, which reports within the engineering org. They work very closely together to develop and deploy machine learning. Most of our stuff is written in Python; most of their stuff is written in Go. A lot of my data scientists have gone from being academics to Go-slinging software engineers, to the point that they can push to the same code base as the MLEs. For data science projects that really need to be highly performant and reliable, that are going to be on the front page and must not fall down, that’s where we’ve been working closely with a dedicated MLE group. But much of what we do is a model that runs in batch, retrained every 15 minutes or every 24 hours. Those are things we can code and schedule as processes, monitored sufficiently.
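That batch pattern, retrain on a fixed schedule and swap the new model in for serving, can be sketched as a simple loop. This is an illustration only, not the Times’ actual stack: in production such a job would run under a real scheduler (cron, Airflow, etc.), and `fetch_training_data`, `train`, and the `model_store` dict are hypothetical stand-ins.

```python
import time
from datetime import datetime, timedelta, timezone

def retrain(model_store, fetch_training_data, train):
    """One batch cycle: pull fresh data, fit a model, swap it in for serving."""
    data = fetch_training_data()
    model = train(data)
    model_store["current"] = model  # serving code always reads "current"
    model_store["trained_at"] = datetime.now(timezone.utc)
    return model

def run_schedule(model_store, fetch_training_data, train,
                 interval=timedelta(minutes=15), cycles=None):
    """Retrain every `interval` (e.g. 15 minutes or 24 hours).

    `cycles` bounds the loop for testing; None means run indefinitely.
    """
    n = 0
    while cycles is None or n < cycles:
        retrain(model_store, fetch_training_data, train)
        n += 1
        if cycles is None or n < cycles:
            time.sleep(interval.total_seconds())
```

The “monitored sufficiently” part would wrap `retrain` with checks on training-data volume and model metrics before the swap into `model_store` is allowed.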
Got it. That makes a ton of sense. Well Chris, this has just been fantastic. If people want to get in touch with you, I imagine they can reach out on your LinkedIn page?
LinkedIn’s good, and we’re hiring! So let me know on LinkedIn if you’re looking for data science or data analytics, or MLE or data governance or data product. There are a lot of other roles too.
Certainly, the picture you painted of working at the New York Times on your team is a fascinating one. If I were back to my coding and data analytics and data science days, I’d certainly apply. It’s been a pleasure having you on the Data Science Leaders Podcast, Chris. Thank you so much for taking the time.
Sure, my pleasure.