Welcome back to our monthly burst of themespotting and conference summaries. We held Rev 2 May 23-24 in NYC, as the place where “data science leaders and their teams come to learn from each other.” The conference more than doubled from last year: 2 days, 3 tracks, 5 sponsors, 39 sessions, 65 speakers, 600 attendees. The many reviews, discussions, debates, etc., on social media speak quite loudly, and I’ll provide some pointers to those here. Note that there’s not enough room in an article to cover these presentations adequately so I’ll highlight the keynotes plus a few of my favorites. Videos will be coming out later.
Nick Elprin, CEO and co-founder of Domino Data Lab. Image provided courtesy of A. Spencer of Domino.
First item on our checklist: did Rev 2 address how to lead data teams? Yes! In many, many ways. To quote Brian Lindauer from Duo Security:
“Enjoyed #dominorev so much that it left me wanting a Slack for data science leaders. If you lead a data science team/org, DM me and I’ll send you an invite to data-head.slack.com”
We had data science leaders presenting about lessons learned while leading data science teams, covering key aspects including scalability, being model-driven, being model-informed, and how to shape the company culture effectively. As one of the co-chairs for Rev, I felt bad that I did not get to see each of the talks. Again, I’m looking forward to the videos!
Data science leadership: importance of scaling, culture, and specialization
One of my favorite presentations – and the one I kept hearing quoted by attendees – was the day 1 keynote “Data Science at Netflix: Principles for Speed & Scale” by Michelle Ufford. First off, her slides are fantastic! Secondly, I talked backstage with Michelle, who got into the field by working on machine learning projects, though recently she led data infrastructure supporting data science teams. She had much to say to leaders of data science teams, coming from perspectives of data engineering at scale. And by “scale” I’m referring to what is arguably the largest, most successful data analytics operation in the cloud of any public firm that isn’t a cloud provider. Netflix is cloud native and was founded on a culture of sophisticated data science practices. In terms of personalization, the sheer magnitude of model-informed decisions is startling; without machine learning, work such as selecting video frames for thousands of shows would have to be done manually, which wouldn’t be feasible.
Michelle’s talk examined an organizational trade-off between simplicity and flexibility, especially through the lens of how practices have evolved over time; for example, how the organization responds to challenges. It’s that practice of engineering at Netflix which I find so interesting – really since Ariel Tseitlin led Cloud Solutions there. For example, while enjoying dinner the evening before Rev, a close friend questioned the use of Jupyter beyond exploratory data science. I asked, “Have you seen Papermill from Netflix?” The following day, Michelle off-handedly mentioned how Jupyter notebooks get used in production especially by the data engineers. To paraphrase, when you encounter a problem once, make it simpler for others to understand the next time they encounter that same issue. Make that part of the company culture. That’s the essence of engineering. Jupyter fits brilliantly for that purpose.
“I’ve been in a lot of talks, but a lot of times the advice isn’t very feasible or applicable. I’m trying to avoid that.” @MichelleUfford #DominoRev
Another gem of an observation from Michelle Ufford: too many prescriptive rules for how data teams approach problems leads to brittle process and ineffective culture. In other words, autonomy at the team level allows for scaling more effectively:
“We hire competent people then get out of their way. We minimize policy, remove restrictions, and give everyone freedom and responsibility.”
My reactions to that point were visceral. Wish I could’ve quoted Michelle throughout, like, the past 15 years! Wow, that’s one of the largest stumbling points that I’ve seen for companies (in my case, the past 5-6 employers), where corporate leaders prescribe how to handle data, how to analyze data, and how to organize data science teams. Because reasons. Because of bad culture. Because of anxieties and misunderstandings around the HIPPO (highest paid person’s opinion), who may have little understanding about technology and use cases. Bad things happen this way. In one of the most egregious cases I’ve seen, the CEO of a Silicon Valley tech start-up forbade employees from writing LinkedIn recommendations for former co-workers who’d been terminated. While it took a few years, eventually their board walked out that CEO, long after much damage had been done.
In case you hadn’t noticed, corporate execs typically don’t excel at making prescriptions about data practices. In fact, recent industry surveys point out how:
Company culture is one of the most significant stumbling blocks for enterprise adoption of effective data-related practices. Those problems tend to come from the top, i.e., misunderstandings at the board level.
Many enterprise organizations with sophisticated data practices place those kinds of decisions on data science team leads rather than the executives or product managers.
Michelle’s observation is the first time I’ve seen an argument within data science that corresponds with Bruce Schneier’s arguments about security from his book, Beyond Fear: Thinking Sensibly about Security in an Uncertain World. To paraphrase Schneier, this is how centralization and uniformity make security efforts brittle: “Defense in depth, which means overlapping responsibilities to reduce single points of failure, both for the actual defensive matters and intelligence functions.” Similarly, from a cognitive perspective, centralization and uniformity make data science approaches brittle, and prescriptive leadership is a serious misstep. It doesn’t reconcile with achieving scale.
One subjective point that Michelle and I debated backstage: specialization.
In an organization the size of Netflix, teams need to be differentiated. Her talk addressed career paths for people in data science going into specialized roles, such as data visualization engineers, algorithm engineers, and so on. I’m generally averse to overly-specialized job titles – given how our field is based on interdisciplinary work. Even so, I agree 100 percent with Michelle’s point here, which I might paraphrase as: bring in people who are good at being generalists, then let them dive deeply into areas they love. That approach fits well with the differentiated needs in enterprise.
Data science leadership: importance of being model-driven and model-informed
Considering how Rev 2 was largely about leadership, I’m tempted to place all the reviews under this same heading. That wouldn’t be quite fair, but how about at least a few more? Randi Ludwig, a data science leader at Dell Technologies, captured these zen kōans from Nick Elprin, CEO and co-founder of Domino Data Lab, about the importance of model-driven business:
“Being data-driven is like navigating by watching the rearview mirror. Being model-driven is like using GPS.”
“If your business is using big data and putting dashboards in front of analysts, you’re missing the point.”
“Being model-driven is about being disciplined enough to actually use the models to make your decisions. Just doing machine learning is not enough, and sometimes not even necessary.”
and … my favorite line from Rev 2 … (wait for it) …
“If you aren’t a model-driven business, you’ll soon be losing to one.”
In addition to model-driven business, another term that resonated strongly with Rev 2 attendees was what Thomas Laffont, CEO and co-founder of Coatue, called model-informed. FWIW, my keynote (“Data Science: Past and Future” – see head explosion emoji) followed a discussion led by Matthew Granade with Thomas and Alex Izydorczyk from Coatue. They covered so many of the items that I was going to discuss; my follow-up to their panel seemed mostly about unpacking their key points onto timelines, along with summarizing some of the more unconventional insights from this column, then projecting ahead into likely near-term outcomes. Like you do.
My take on where models fit into the “metasyntactic-variable-eats-the-world” continuum?
We lead teams of people plus automation (e.g., models) to address customer needs, with lots of amazing two-way feedback loops possible betwixt any pair of these entities. Perhaps not model -driven nor -informed, but instead model-intertwingled. Because I <3 graphs, networks, tensors, and so on. Then again, Gartner and Forrester aren’t exactly rushing to co-opt my line.
Trying to dissect a model to divine an interpretation of its results is a good way to throw away much of the crucial information – especially about non-automated inputs and decisions going into our workflows – that will be required to mitigate existential risk. Because of compliance. Admittedly less Descartes, more Wednesday Addams.
Cognition, decision-making, and bias
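Second item on our checklist: did Rev 2 dig into some of the mysteries about cognition and decision-making? Check! Daniel Kahneman was awarded the 2002 Nobel Memorial Prize in Economic Sciences, and the committee’s citation reads: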
“For having integrated insights from psychological research into economic science, especially concerning human judgment and decision-making under uncertainty.”
I recall a “Data Drinkup Group” gathering at a pub in Palo Alto, circa 2012, where I overheard Pete Skomoroch talking with other data scientists about Kahneman’s work. Rather, they were beaming about Kahneman’s work and its significance in our field. Clearly, when we work with data and machine learning, we’re swimming in those waters of decision-making under uncertainty. Whether an organization is model-driven, or model-informed, or any adjacent descriptor related to models, at some point we’re managing teams of people plus automation, and the people – particularly their leadership – must be effective at working with probabilistic process. That’s where the human judgment mentioned by the Nobel committee comes into play.
Pete describes how, in general, machine learning shifts engineering from a deterministic process to a probabilistic one. Unfortunately, humans suck at probabilities. Humans appear to be hardwired by evolution along the lines of screaming “Eek! There’s a monster, everybody run!” Also, consider the phenomenon of somebody yelling “Fire!” in a crowded theater. Humans typically don’t stop to assess the probability of a monster or a fire, thinking carefully before acting. Instead they tend to stampede over each other in their rush to get the hell away.
Our day 1 keynote at Rev 2 featured Dr. Kahneman presenting on “The psychology of intuitive judgment and choices.” You can see many of the highlights on Twitter threads. Noise, along with its close relationship to information theory, became a subject of study using probabilistic tools in the mid-twentieth century thanks to Claude Shannon. Throughout his keynote, Kahneman developed a theme exploring noise vis-à-vis human cognition and decision-making.
“Humans have a lot of valuable inputs; they have impressions, they have judgments. But humans are not very good at integrating information in a reliable and robust way. That’s what algorithms are designed to do – they reduce the noise.”
“Humans are noisy, and we prefer the natural to the artificial, even though models are regularly better than humans.”
“People need a story after a decision to feel good about the results – just not too quickly. Discuss evidence before intuition.”
Moving beyond introductions, i.e., the more descriptive and anecdotal aspects, Kahneman explored tools we can use to overcome the effects of cognitive bias. The most poignant for me was a simple approach for measuring noise within an organization. To do this, first review quantitative decisions being made by staff – for example, settlement prices quoted by insurance claims adjusters. Measure how these decisions vary across your population. Then calculate the variance divided by the mean to construct a metric for noise in decision-making. Public service advisory: be sure to wrap that rascal in a confidence interval.
Kahneman described how in many professional organizations, people would intuitively estimate that metric near 0.1 – however, in reality, that value often exceeds 0.5 such that collectively the staff’s decision-making process performs worse than random. Worse than flipping a coin! For kicks, try calculating this kind of metric within your own organization. Clearly in these situations, becoming model-driven addresses an existential risk and that’s the point Nick Elprin made in his keynote. Better training (for people and for models) is also indicated.
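For anyone who wants to try that calculation, here is a minimal sketch in Python. It is my own illustration under stated assumptions, not anything Kahneman presented. The settlement quotes are made-up numbers, and the function names are hypothetical. One caveat: variance divided by the mean depends on the units of the data (dollars, in this case), while figures like 0.1 and 0.5 read like a unit-free ratio, so the sketch also computes the standard deviation divided by the mean (the coefficient of variation) as an assumed alternative. The confidence interval comes from a simple percentile bootstrap, which is one reasonable choice among several.

```python
import random
import statistics


def variance_over_mean(values):
    """Sample variance divided by the mean, as described in the text."""
    return statistics.variance(values) / statistics.mean(values)


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (unit-free)."""
    return statistics.stdev(values) / statistics.mean(values)


def bootstrap_ci(values, metric, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a metric."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    n = len(values)
    # Resample with replacement, recompute the metric each time, then sort.
    estimates = sorted(
        metric(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower_idx = max(0, int((alpha / 2) * n_resamples))
    upper_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lower_idx], estimates[upper_idx]


# Hypothetical quotes (in dollars) from eight adjusters for the same claim.
quotes = [9500, 12000, 7800, 15000, 11000, 8200, 13500, 10000]

ratio = variance_over_mean(quotes)
cv = coefficient_of_variation(quotes)
ci = bootstrap_ci(quotes, coefficient_of_variation)

print(f"variance / mean: {ratio:.1f}")
print(f"stdev / mean:    {cv:.3f}")
print(f"95% bootstrap CI for stdev / mean: ({ci[0]:.3f}, {ci[1]:.3f})")
```

With only a handful of decisions per case the interval will be wide, which is rather the point of wrapping the metric in one.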
Another subtle but powerful suggestion: collect evidence before the debate begins in a meeting to avoid confirmation bias. In other words, let people express their concerns privately before anyone starts to take a stand. Otherwise, people will tend to rally around the early set of opinions expressed. Kahneman described the history of decision analysis:
“Decision analysis was going to take over business, but then CEOs didn’t want it. They didn’t want to be wrong or second-guessed. It will be fascinating when ML models start predicting CEO decisions and whether they feel threatened.”
Truer words were seldom spoken. There’s an important point here, in that data science really only addresses part of the puzzle. There’s an entirely different field of decision analysis that concerns how we handle the actions that data science informs. A few people have been vocal about this point recently – Cassie Kozyrkov and Ajay Agrawal come most immediately to mind. And to Kahneman’s point, once ML models begin to disrupt/automate the value of C-suite decision-making, this is all going to get really interesting.
Addressing cognitive bias with pre-mortems
That may take a while. More near-term, Kahneman suggested the use of pre-mortems – also called backcasting, as a contrapositive of forecasting. That practice formalizes means for collecting evidence prior to introducing the cognitive bias that is inherent in meetings. It’s a way of leveraging human cognitive biases to produce tangible results.
One of the people attending Rev was Carl Spetzler – a Ramsey Medal winner who’s also in the SRI Hall of Fame as an expert in decision analysis. I got to speak with Carl later during the conference, and I highly recommend his recent “Rethink: Premortem 3.0” webinar. While the discussion in that webinar begins a bit oddly, one quickly realizes that they are (1) running components of a pre-mortem with their live audience, and (2) leveraging a facilitator to mitigate bias. They’re using a cloud-based tool called PowerNoodle to automate some of the mechanics of the pre-mortem process.
Seriously, in my next start-up, scheduling a meeting without first leveraging a pre-mortem will likely become a fireable offense. This kind of practice is essential for supporting diversity and inclusion, for averting catastrophes that might otherwise be ignored due to human difficulties with probability, and for eliminating so much wasted time in meetings listening to the same HIPPOs drone on time and again.
To be fair, Kahneman made a statement or two that wrinkled my brows. Others attending felt the same way. There were comments about bias embedded in ML models, related to discrimination, and I don’t quite buy what he said. Also, Kahneman seemed to be talking at multiple points about fully-automated use of ML models – without considering model evaluation, audits, human-in-the-loop, model transparency, or other adjacent concerns that are current. There was a lively Q&A session after his keynote, and if we’d had more time I’d love to have dug into those points.
Hard problems, in practice
Chris Wiggins is a Columbia professor, Chief Data Scientist at the New York Times, and the co-founder of hackNY. If you look into the origins of our field, you’ll quickly find that Chris has been writing and teaching about the history of data science, for example, the “Data: Past, Present, and Future” course at Columbia – and as you may have noticed, that’s one of my favorite topics. We’ve been involved in a discussion forum for several years, although we’d never met. Arguably, I learned more from a few conversations with Chris than from the rest of the conference combined.
In his day 2 session talk, “Data Science at the New York Times,” Chris opened with “The Talk” that nearly every one of my editors has sat me down and retold about “separation of church and state” in journalism. He showed a diagram of those two concerns as separate boxes, then added another box labeled “data” as the common foundation for both. Working through distinctions of descriptive analytics, predictive analytics, and prescriptive analytics, Chris recounted several stories about how managers had requested one kind of deliverable from the data science team while needing something entirely different. A key aspect of leadership in our field is to recognize those distinctions and reframe projects to fit the needs of the business, instead of merely responding to requests. For example, a request for a descriptive dashboard to “compare whether a red button or a blue button leads to lower churn” might be better served by a prescriptive model to personalize pages so that customers churn less. Don’t hesitate to be proactive. Another theme was about how often project results for internal decisions get turned around as customer-facing products – as soon as the salespeople see which insights can be surfaced, then monetized. That’s an important reminder about the interdisciplinary nature of data science and our team process.
Hypothetically speaking, if particular federal programs are at risk of being cut unless they can prove their results, but the required data is siloed at Census, IRS, HUD, Illinois Corrections, etc., with no possibility of running a join across agencies while maintaining compliance … then clearly many important social services are at risk of getting cut. Jupyter and NYU have been busy addressing that problem. Beyond the government use cases, this work has enormous utility for data science in enterprise in general. Full disclosure: I’m part of that effort and consulting on behalf of NYU.
In another of my favorite talks, Robert Toguchi from U.S. Army Special Operations Command presented “Fostering a Culture of Change” on day 1. Compared with enterprise scale and compliance requirements w.r.t. data science, the military faces large challenges that are typical of enterprise organizations, along with the added aspect of keeping pace with machine learning advances. The latter has the added nuance that the rapidly evolving field of adversarial machine learning is becoming increasingly adversarial. Without going into details, one might even find AI teams who represent the latest in ML technology for large nations vying with each other on Kaggle – and that’s not a matter of “proxy wars” any longer, it’s a matter of swapping out datasets between the real and the simulated. In other words, these could escalate into higher intensity conflicts. DoD has much to gain by learning lessons about data science from enterprise, as well as much to gain and contribute to open source. I find this especially interesting given the “committer wars” of big data over the past decade or so: now we’re seeing more non-vendor contributors in popular open source projects. I consider that a healthy trend.
Also, Josh Wills presented the day 1 closing keynote “How to Play Well with Others,” which was arguably one of the most pragmatic talks about data science and leadership, especially the part about the “infinite loop of sadness empathy” illo. That fits surprisingly well with another famous illustration from “Computing with Data: Concepts and Challenges,” where John Chambers (creator of the S language, the precursor to R) was citing John Tukey (inventor of data analytics). Whenever you get the opportunity, run, do not walk, to catch Josh speaking.
We followed Josh with a social mixer called Data Science in the Senses, featuring several projects that leverage sensory data. That even included a custom cocktail, available only at Rev 2. For another “only-at-Rev-2” feature, we gave away socks as schwag! By the time we arrived in NYC, I’d nearly forgotten a task several months earlier where we’d profiled luminaries in data science history then made socks to commemorate them: Grace Hopper, Thomas Bayes, Karen Spärck Jones, John Tukey, Ada Lovelace. To make sure that NLP represents, of course I got the Karen Spärck Jones socks.
Rev 2 provided a rare and much-needed forum about data science leadership, where teams could come and learn about practices from each other. Kudos to Domino Data Lab for sponsoring this unique format, and especially to co-chairs Jon Rooney, who handled the conference speaker introductions expertly, and Karina Babcock, who performed so much of the core work producing the conference. I’d like to say that it was a one-of-a-kind event, except that we’re doing this annually. See you at Rev 3 in 2020!