What is a generative adversarial network (GAN), and how does it make computers creative?


This article is part of Demystifying AI, a series of posts that (try to) disambiguate the jargon and myths surrounding AI.

Moments of epiphany tend to come in the unlikeliest of circumstances. For Ian Goodfellow, PhD in machine learning, it came while discussing artificial intelligence with friends at a Montreal pub one late night in 2014. What came out of that fateful meeting was the “generative adversarial network” (GAN), an innovation that AI experts have described as the “coolest idea in deep learning in the last 20 years.”

Goodfellow’s friends were discussing how to use AI to create photos that looked realistic. The problem they faced was that the prevailing AI techniques and architectures, namely deep learning algorithms and deep neural networks, are good at classifying images but not very good at creating new ones.

Goodfellow came up with the idea of a new technique in which two neural networks challenge each other, learning to create and improve new content in an iterative process. That same night, he coded and tested his idea, and it worked. With the help of fellow scholars and alums from his alma mater, Université de Montréal, Goodfellow later completed and compiled his work into a famous and highly cited paper titled “Generative Adversarial Nets.”

Since then, GAN has sparked many new innovations in the domain of artificial intelligence. It has also landed the now 33-year-old Ian Goodfellow a job at Google Research and a stint at OpenAI, and turned him into one of the few highly coveted AI geniuses.


Deep learning’s imagination problem

GAN addresses the lack of imagination haunting deep neural networks (DNNs), the popular AI structure that roughly mimics how the human brain works. DNNs rely on large sets of labeled data to perform their functions. This means that a human must explicitly define what each data sample represents before a DNN can use it.

For instance, give a neural network enough pictures of cats and it will glean the patterns that define the general characteristics of cats. It will then be able to find cats in pictures it has never seen before. The same logic is behind facial recognition and cancer diagnosis algorithms. This is how self-driving cars can determine whether they’re rolling on a clear road or running into a car, bike, child, or another obstacle.
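To make the idea of learning from labeled examples concrete, here is a minimal sketch in plain Python. It is not a deep neural network: it trains a single artificial neuron, and the "pictures" are hypothetical two-number feature vectors invented for illustration. The names `make_labeled_data`, `train`, and `predict` are assumptions made for this example, not part of any library.

```python
import math
import random


def sigmoid(s):
    # Numerically stable logistic function.
    if s >= 0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


def make_labeled_data(rng, n_per_class):
    # Hypothetical stand-in for labeled images: each sample is two made-up
    # feature values plus a human-provided label (1 = cat, 0 = not a cat).
    data = []
    for _ in range(n_per_class):
        data.append(((rng.gauss(2.0, 0.5), rng.gauss(2.0, 0.5)), 1))
        data.append(((rng.gauss(-2.0, 0.5), rng.gauss(-2.0, 0.5)), 0))
    return data


def train(data, epochs=50, lr=0.1):
    # Single-neuron classifier fitted by stochastic gradient descent on log loss.
    w1, w2, b = 0.0, 0.0, 0.0
    for _ in range(epochs):
        for (x1, x2), label in data:
            error = sigmoid(w1 * x1 + w2 * x2 + b) - label
            w1 -= lr * error * x1
            w2 -= lr * error * x2
            b -= lr * error
    return w1, w2, b


def predict(model, sample):
    # Returns 1 ("cat") when the predicted probability is at least 0.5.
    w1, w2, b = model
    return 1 if sigmoid(w1 * sample[0] + w2 * sample[1] + b) >= 0.5 else 0


rng = random.Random(0)
model = train(make_labeled_data(rng, 100))

# Samples the model never saw during training.
unseen = make_labeled_data(rng, 50)
accuracy = sum(predict(model, x) == y for x, y in unseen) / len(unseen)
print(f"accuracy on unseen samples: {accuracy:.2f}")
```

The important part is the label attached to every sample. Without it, the training loop has no error signal to learn from, which is why labeled data matters so much to this kind of model.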

But deep neural networks suffer from severe limitations. Prominent among them is the heavy reliance on quality data. The training data of a deep learning application often determines the scope and limit of its functionality.

The problem is that in many cases, such as image classification, you need human operators to label the training data, which is time-consuming and expensive. In other areas, such as training self-driving cars, it takes a lot of time to generate the necessary data. And in domains such as health care, the data required for training algorithms has legal and ethical implications because it is sensitive personal information.

The real limits of neural networks manifest themselves when you use them to generate new data. Deep learning is very efficient at classifying things but not so good at creating them. This is because what DNNs learn from the data they ingest does not directly translate into the ability to generate similar data. That’s why, for instance, when you use deep learning to draw a picture, the results usually look very weird (if nonetheless fascinating).
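A toy illustration of the gap between classifying and creating, sketched in plain Python. The assumptions are heavy: each sample is a single hypothetical number, the classifier is a simple threshold, and the generative side is a fitted bell curve. This is neither a deep neural network nor a GAN. The names `classify` and `generate_cat` are made up for this example.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical one-number "features" for two labeled classes.
cats = [rng.gauss(3.0, 1.0) for _ in range(1000)]
dogs = [rng.gauss(-3.0, 1.0) for _ in range(1000)]

# Classifying: all the model keeps is a boundary between the two classes.
threshold = (statistics.mean(cats) + statistics.mean(dogs)) / 2.0


def classify(x):
    return "cat" if x > threshold else "dog"


# Generating: the model must describe how cat data is distributed,
# so that new samples can be drawn from that description.
cat_mean = statistics.mean(cats)
cat_std = statistics.stdev(cats)


def generate_cat():
    return rng.gauss(cat_mean, cat_std)


new_cats = [generate_cat() for _ in range(1000)]

# The boundary alone says nothing about what a typical cat looks like:
# a wildly implausible value still lands on the "cat" side.
print(classify(50.0))
print(f"mean of generated cats: {statistics.mean(new_cats):.2f}")
```

The threshold is enough to sort samples, yet it cannot produce a convincing new one. Producing samples requires a model of the data itself, and building such a model for something as complex as images is far harder than fitting this one-dimensional curve.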