Numenta is a research-focused company that applies neuroscience findings to machine learning algorithms.
Data Science at Numenta
Neuroscience algorithms used by Numenta are most commonly used to detect anomalies on streaming data sets in areas such as:
- Data center monitoring for cyber security protection
- Traffic flow analysis
- Social media activity around a particular company, product or topic
- Proactive maintenance of machinery
By implementing the Domino Data Lab data science platform, Numenta researchers have scalable, flexible computation resources and can run thousands of experiments in parallel. They’re delivering algorithmic results in hours that would have taken weeks before, and their results are more accurate because they’ve been tested across a wider range of parameters.
Before Domino, Numenta’s data scientists worked in isolation, either on their laptops or in private Amazon Web Services (AWS) cloud environments. Because researchers with neuroscience and machine learning backgrounds don’t often have AWS expertise, most went the laptop route.
Those who ran AWS did so inefficiently; without a standardized, simple way to configure and run AWS. “We want our researchers focusing on the task at hand and not necessarily the infrastructure behind the scenes,” said Subutai Ahmad, Numenta’s vice president of research.
In addition, the team hadn’t standardized practices around documentation of experiments. This resulted in duplicative efforts and lack of reproducibility. “You easily lose track of processes and have to rerun things,” Ahmad added.
Numenta implemented the Domino data science platform to satisfy the following needs:
Easily run experiments in the cloud. Accommodate ever-changing needs for compute power to support data science workflows, while minimizing manual overhead.
Facilitate rapid experimentation. Make it easy for researchers to tweak parameters, understand what they’ve already looked at and optimize algorithms.
Improve tracking and reproducibility. Enforce a standardized way of tracking work history and replicating it in the future.
“With Domino, you’re able to set up an environment once and then share it and reuse it,” explained Austin Marshall, a senior staff engineer at Numenta.
Domino makes it easy to test algorithms at different size and performance levels, switching seamlessly between compute environments without worrying about infrastructure. The platform’s integration with data science tools such as Jupyter Notebooks is also convenient, supporting existing workflows. And by automatically tracking every step of the data science process in Domino, Numenta preserves past work and makes it easy to publish results.
This is important so that, for example, external audiences leveraging its algorithms under the Affero General Public License (AGPL) can reproduce the same results. Ahmad added, “When you need to execute your experiments, you’re operating within Domino’s execution environment where you’re paying by the minute as opposed to the sum cost you’d have with AWS.”
Beyond the platform benefits, Numenta values working with the Domino team. “The support team at Domino has been fantastic,” said Ahmad. “They’ve been really responsive to requests and over time have built a bunch of new features that make our lives a lot easier.”
The Domino Effect
By relieving infrastructure headaches and allowing researchers to experiment in parallel, Numenta researchers achieve more. They deliver results in hours that would have previously taken weeks, and the results have greater accuracy because they’ve been more thoroughly tested.
One key deliverable that Numenta brought to market after implementing Domino was its Numenta Anomaly Benchmark (NAB). NAB is the first benchmark for evaluating anomaly detection models for streaming data. “When developing NAB, we easily saved a couple weeks of time, if not more, thanks to productivity gains afforded by Domino,” said Ahmad.
And Numenta was able to test more than twice as many algorithms against it as they would have had the capacity to do pre-Domino. Next, Numenta is tackling even bigger challenges. “The next generation of algorithms that we’re working on are 5 to 10 times larger than the ones we use in NAB. When we want to run large-scale experiments, testing parameter combinations or seeing how the algorithms scale in different situations, you often have to run thousands of individual experiments,” summarized Ahmad.
“Domino is helping us tackle problems we wouldn’t be able to otherwise, because they’re just too large for laptops.”