Radical New Neural Network Design Could Overcome Big Challenges in AI

David Duvenaud et al., Neural Ordinary Differential Equations, arXiv:1806.07366
The trajectories of neural ordinary differential equations.

David Duvenaud was collaborating on a project involving medical data when he ran up against a major shortcoming in artificial intelligence (AI).

An AI researcher at the University of Toronto, he wanted to build a deep-learning model that would predict a patient’s health over time. But data from medical records is kind of messy: Throughout your life, you might visit the doctor at different times for different reasons, generating a smattering of measurements at arbitrary intervals. A traditional neural network struggles to handle this. Its design requires it to learn from data with clear stages of observation. Thus, it is a poor tool for modeling continuous processes, especially ones that are measured irregularly over time.

The challenge led Duvenaud and his collaborators at the university and the Vector Institute to redesign neural networks as we know them. Their paper was one of four crowned “best paper” at the Neural Information Processing Systems conference, one of the largest AI research gatherings in the world.

Neural nets are the core machinery that makes deep learning so powerful. A traditional neural net is made up of stacked layers of simple computational nodes that work together to find patterns in data. The discrete layers are what keep it from effectively modeling continuous processes.

In response, the research team’s design scraps the layers entirely. (Duvenaud is quick to note that they didn’t come up with this idea. They were just the first to implement it in a generalizable way.) To understand how this is possible, let’s walk through what the layers do in the first place.

How a Traditional Neural Net Transforms an Image of a Lion Into the Name “Lion”

The most common process for training a neural network (i.e., supervised learning) involves feeding it a bunch of labeled data. Let’s say you wanted to build a system that recognizes different animals. You’d feed a neural net animal pictures paired with corresponding animal names. Under the hood, it begins to solve a crazy mathematical puzzle. It looks at all the picture/name pairs and figures out a formula that reliably turns one (the image) into the other (the category). Once it cracks that puzzle, it can reuse the formula again and again to correctly categorize any new animal photo—most of the time.

But finding a single formula to describe the entire picture-to-name transformation would be overly broad and result in a low-accuracy model. It would be like trying to use a single rule to differentiate cats and dogs. You could say dogs have floppy ears. But some dogs don’t and some cats do, so you’d end up with a lot of false negatives and positives.

This is where a neural net’s layers come in. They break up the transformation process into steps and let the network find a series of formulas that each describe a stage of the process. So, the first layer might take in all the pixels and use a formula to pick out which ones are most relevant for cats versus dogs. A second layer might use another to construct larger patterns from groups of pixels and figure out whether the image has whiskers or ears. Each subsequent layer would identify increasingly complex features of the animal, until the final layer decides “dog” on the basis of the accumulated calculations. This step-by-step breakdown of the process allows a neural net to build more sophisticated models—which in turn should lead to more accurate predictions.
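To make the "series of formulas" idea concrete, here is a minimal sketch in plain Python. It is not taken from the paper: the helper names (`make_layer`, `make_network`), the three-number "image," and the hand-picked weights are all assumptions for illustration. A real network would learn its weights from labeled data rather than have them typed in.

```python
import math

def make_layer(weights, biases):
    """Build one layer: a formula that maps a list of numbers to a new list."""
    def layer(inputs):
        return [
            math.tanh(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)
        ]
    return layer

def make_network(layers):
    """Chain layers so each one works on the output of the one before it."""
    def forward(inputs):
        values = inputs
        for layer in layers:
            values = layer(values)
        return values
    return forward

# Hand-picked, purely illustrative weights (a trained network learns these).
pick_pixels = make_layer([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]], [0.0, 0.0])
find_features = make_layer([[0.5, 0.5]], [0.0])

net = make_network([pick_pixels, find_features])
score = net([0.2, 0.4, 0.1])  # a single number between -1 and 1
```

The point of the sketch is the structure: the transformation happens in a fixed number of separate stages, one per layer.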

The layer approach has served the AI field well—but it also has a drawback. If you want to model anything that transforms continuously over time, you also have to chunk it up into discrete steps. In practice, if we returned to the health example, that would mean grouping your medical records into finite periods like years or months. You could see how this would be inexact. If you went to the doctor on 11 January and again on 16 November, the data from both visits would be grouped together under the same year.
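The loss of detail is easy to see in a few lines of Python. This is only a sketch of the bucketing described above; the year (2018) and the helper name `bucket_by_year` are assumptions added for illustration.

```python
from collections import defaultdict
from datetime import date

# The two visits from the example; the year 2018 is an assumption.
visits = [date(2018, 1, 11), date(2018, 11, 16)]

def bucket_by_year(dates):
    """Group visit dates into yearly bins, discarding when in the year they fell."""
    buckets = defaultdict(list)
    for d in dates:
        buckets[d.year].append(d)
    return dict(buckets)

by_year = bucket_by_year(visits)
gap_days = (visits[1] - visits[0]).days  # the spacing the yearly bin hides
```

Both visits land in the same bucket even though they are 309 days apart, so a model that sees only yearly bins cannot tell them from two visits in the same week.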

So the best way to model reality as closely as possible is to add more layers to increase the granularity. (Why not break your records up into days or even hours? You could have gone to the doctor twice in one day.) Taken to the extreme, this means the best neural network for this job would have an infinite number of layers to model infinitesimal step changes. The question is whether this idea is even practical.

If this is starting to sound familiar, that’s because we have arrived at exactly the kind of problem that calculus was invented to solve. Calculus gives you all these nice equations for how to calculate a series of changes across infinitesimal steps—in other words, it saves you from the nightmare of modeling continuous change in discrete units. This is the magic of Duvenaud and his collaborators’ paper: It replaces the layers with calculus equations.
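The article does not spell out the equations, but the step from layers to calculus can be sketched as follows, using the formulation in the cited paper. Here h is the network's internal state, f is the learned update function, and θ stands for its parameters; the step size Δt is introduced only to show the limit.

```latex
% A layered (residual-style) network updates its state in discrete steps:
h_{k+1} = h_k + f(h_k, \theta_k)

% Shrinking each step to size \Delta t and sharing parameters across steps:
h(t + \Delta t) = h(t) + \Delta t \, f(h(t), t, \theta)

% Letting \Delta t \to 0 turns the stack of layers into a differential equation:
\frac{dh(t)}{dt} = f(h(t), t, \theta)

% The output at any time T is then the solution of that equation:
h(T) = h(0) + \int_0^T f(h(t), t, \theta) \, dt
```

Because T can be any real number, the state can be evaluated at whatever moments the measurements happen to fall on.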

The result is really not even a network anymore; there are no more nodes and connections, just one continuous slab of computation. Nonetheless, sticking with convention, the researchers named this design an “ODE net”—ODE for “ordinary differential equations.”
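A rough sketch in plain Python shows the idea at toy scale. Several things here are assumptions for illustration: `dynamics` is a fixed decay rule standing in for a function a real model would learn, and the fixed-step Euler method is a deliberately simple substitute for the more sophisticated ODE solvers the paper relies on. The function names are hypothetical.

```python
import math

def dynamics(h, t):
    """Hypothetical stand-in for a learned function f(h, t): simple decay."""
    return -h

def euler_solve(f, h0, t0, t1, steps):
    """Integrate dh/dt = f(h, t) from t0 to t1 using `steps` equal Euler steps."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        h = h + dt * f(h, t)  # each step acts like one very thin layer
        t = t + dt
    return h

def state_at(f, h0, times, steps_per_unit=1000):
    """Evaluate the state at arbitrary, irregularly spaced times (ascending)."""
    results = []
    h, t = h0, 0.0
    for target in times:
        steps = max(1, round((target - t) * steps_per_unit))
        h = euler_solve(f, h, t, target, steps)
        t = target
        results.append(h)
    return results

# More steps (more "layers") bring the answer closer to the exact value exp(-1).
coarse = euler_solve(dynamics, 1.0, 0.0, 1.0, steps=4)
fine = euler_solve(dynamics, 1.0, 0.0, 1.0, steps=1024)

# Measurements taken at uneven moments need no binning.
irregular = state_at(dynamics, 1.0, [0.3, 0.35, 2.1])
```

With 4 steps the result is a crude approximation; with 1,024 it is close to the exact solution, which is the continuous limit the layered picture was reaching for. The same routine also returns the state at 0.3, 0.35, and 2.1 time units without forcing those moments into fixed periods.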


