
Data as Prior/Innate Knowledge for Deep Learning Models

Rapid advances in deep learning continue to demonstrate the significance of end-to-end training with no a priori knowledge. However, when models need to do forward prediction, most AI researchers agree that incorporating prior knowledge with end-to-end training can introduce better inductive bias. 


There is a long history of debate about how much of human knowledge is innate and how much is learned from experience and data, also known as the nature vs. nurture debate. Traditional Bayesian methods model these two components explicitly: a prior, which can be derived from existing knowledge, and a likelihood, which is learned from the data.
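In symbols, Bayes' rule makes this split explicit. Writing the unknown quantity as theta and the observed data as D, the updated (posterior) belief is

\[
p(\theta \mid D) \propto p(D \mid \theta)\, p(\theta),
\]

where the first factor on the right is the likelihood learned from the data and the second is the prior encoding what we believed before seeing it.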

The recent rapid advances in deep learning continue to demonstrate the significance of end-to-end training with no a priori knowledge (such as domain-aware feature engineering) in many computer vision and natural language processing tasks. However, when models require reasoning, need to do forward prediction, or operate in a low-data regime, most AI researchers and practitioners will agree that incorporating prior knowledge with end-to-end training can introduce a better inductive bias, beyond what is provided in the training data. This is especially true in complex domains such as healthcare, where the precision of the system is critical and the combinatorial space the model must generalize over is large.

Despite the general agreement that innate structure and learned knowledge need to be combined, there is no simple way to incorporate this innate structure into learning systems. As a scientific community, we are only beginning to research approaches for incorporating priors into deep learning models. In fact, not surprisingly, there is little consensus on whether priors need to be hard-coded into the model or can be learned through, for example, a reward function.
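A familiar example of a hard-coded prior is the convolutional layer: by reusing one small filter at every spatial position, it builds locality and translation equivariance into the architecture before any data is seen. The sketch below contrasts it with a fully connected layer that encodes no such structure (the layer sizes are illustrative assumptions, not taken from any particular system):

    # Illustrative sketch: a convolutional layer as a hard-coded architectural prior.
    import torch
    import torch.nn as nn

    image = torch.randn(1, 3, 32, 32)  # one 32x32 RGB image

    # Fully connected layer: no spatial prior; every input pixel connects to every output.
    fc = nn.Linear(3 * 32 * 32, 3 * 32 * 32)

    # Convolutional layer: the same 3x3 filter is reused across the whole image,
    # encoding locality and translation equivariance by construction.
    conv = nn.Conv2d(in_channels=3, out_channels=3, kernel_size=3, padding=1)

    print(sum(p.numel() for p in fc.parameters()))    # roughly 9.4 million parameters
    print(sum(p.numel() for p in conv.parameters()))  # 84 parameters

The drastic reduction in free parameters is the prior doing its work: the architecture rules out most of the functions a fully connected layer could represent before training even starts.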

The prior is, generally speaking, a probability distribution that expresses one’s beliefs about a quantity before some evidence is taken into account. If we restrict ourselves to a machine-learning model, the prior can be thought of as the distribution that is imputed before the model sees any data. However, a more holistic view of the process should consider not only the training of the model but the design of the system as a whole, including the choice of model and of data. From that perspective, the prior includes everything from the choice of the algorithm itself to the way the data is labeled.
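One concrete way a prior hides inside the algorithm itself is weight regularization: the familiar L2 penalty on a model's weights is equivalent to placing a zero-mean Gaussian prior on them and doing maximum a posteriori estimation. A minimal sketch, assuming a PyTorch setup with placeholder model and data:

    import torch
    import torch.nn as nn

    model = nn.Linear(10, 1)                        # placeholder model
    x, y = torch.randn(32, 10), torch.randn(32, 1)  # placeholder batch

    mse = nn.functional.mse_loss(model(x), y)              # data-fit term (likelihood)
    l2 = sum((p ** 2).sum() for p in model.parameters())   # negative log of a Gaussian prior, up to constants
    loss = mse + 1e-4 * l2                                  # MAP objective: likelihood plus prior
    loss.backward()

Even a pipeline described as trained end to end with no prior knowledge usually includes a term like this, so the question is rarely whether there is a prior, but which one.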

A perhaps not-so-obvious way to inject prior knowledge and beliefs into the process, then, is through the selection of training data. Indeed, by selecting and labeling training samples for any supervised learning model, we encode knowledge into the learning system. The labeling process might be trivial in some domains but requires a good deal of domain knowledge in others. For example, to label cats and dogs we only need to tell the two animals apart, which is common sense; but to label medical images as “cancer” or “not cancer,” we need deep medical expertise.
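As a toy illustration (the labels here are hypothetical, not from any real dataset), even the choice of label granularity encodes a belief about which distinctions matter for the task:

    # Collapsing fine-grained categories into coarse ones asserts, before training,
    # that breed-level differences are irrelevant to the task at hand.
    fine_grained = ["beagle", "labrador", "persian_cat", "siamese_cat"]
    coarse_label = {"beagle": "dog", "labrador": "dog",
                    "persian_cat": "cat", "siamese_cat": "cat"}
    labels = [coarse_label[name] for name in fine_grained]
    print(labels)  # ['dog', 'dog', 'cat', 'cat']

A different labeling scheme would push the same model toward learning different distinctions, which is exactly the kind of knowledge injection described above.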
