October 24, 2020

Why training neural networks comes with a hefty price tag

In recent years, deep learning has proven to be an effective solution to many of the hard problems of artificial intelligence. But deep learning is also becoming increasingly expensive. Running deep neural networks requires a lot of compute resources, and training them requires even more.

The costs of deep learning are causing several challenges for the artificial intelligence community, including a large carbon footprint and the commercialization of AI research. And with more demand for AI capabilities away from cloud servers and on “edge devices,” there’s a growing need for neural networks that are cost-effective.

While AI researchers have made progress in reducing the costs of running deep learning models, the larger problem of reducing the costs of training deep neural networks remains unsolved.

Recent work by AI researchers at the MIT Computer Science and Artificial Intelligence Lab (MIT CSAIL), the University of Toronto, the Vector Institute, and Element AI explores the progress made in the field. In a paper titled “Pruning Neural Networks at Initialization: Why Are We Missing the Mark?” the researchers discuss why current state-of-the-art methods fail to reduce the costs of neural network training without considerably degrading the networks’ performance. They also suggest directions for future research.

Pruning deep neural networks after training

The past decade has shown that, in general, large neural networks provide better results. But large deep learning models come at an enormous cost. For instance, to train OpenAI’s GPT-3, which has 175 billion parameters, you’ll need access to huge server clusters with very powerful graphics cards, and the costs can soar to several million dollars. Furthermore, you need hundreds of gigabytes’ worth of VRAM and a powerful server just to run the model.
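To see where a figure like “hundreds of gigabytes” comes from, here is a back-of-the-envelope sketch. It counts only the memory needed to hold the weights and assumes each parameter is stored as a 16-bit or 32-bit floating-point number. The function name and the choice of precisions are illustrative assumptions, not details from the paper or from OpenAI.

```python
def weight_memory_gb(num_parameters, bytes_per_parameter):
    """Memory needed just to store the weights, in gigabytes (10^9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Parameter count quoted in the article for GPT-3.
gpt3_parameters = 175_000_000_000

# Assumed storage precisions: 2 bytes (16-bit floats) and 4 bytes (32-bit floats).
fp16_gb = weight_memory_gb(gpt3_parameters, 2)
fp32_gb = weight_memory_gb(gpt3_parameters, 4)

print(f"16-bit weights: {fp16_gb:.0f} GB")
print(f"32-bit weights: {fp32_gb:.0f} GB")
```

Under these assumptions the weights alone take 350 GB at 16-bit precision and 700 GB at 32-bit precision, before counting activations or, during training, gradients and optimizer state.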

There’s a body of work showing that neural networks can be “pruned.” This means that within a very large neural network, there’s a much smaller subnetwork that can match the accuracy of the original AI model. For instance, earlier this year, a pair of AI researchers showed that while a large deep learning model could learn to predict future steps in John Conway’s Game of Life, there almost always exists a much smaller neural network that can be trained to perform the same task with perfect accuracy.

Much progress has already been made in post-training pruning. After a deep learning model goes through the entire training process, you can throw away many of its parameters, sometimes shrinking it to 10 percent of its original size. You do this by scoring the parameters based on how much their weights affect the network’s output, then removing the lowest-scoring ones.
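The scoring step described above can be sketched in a few lines. The example below uses weight magnitude as the score, a common heuristic that is an assumption here rather than the specific method any given company or paper uses. The function name, the flat list of weights, and the 10 percent keep ratio are all illustrative.

```python
import random


def magnitude_prune(weights, keep_fraction=0.1):
    """Keep the largest-magnitude weights and zero out the rest.

    Returns the pruned weights and a 0/1 mask marking which entries survive.
    """
    k = max(1, round(keep_fraction * len(weights)))
    # Rank indices by absolute value, largest first (the assumed score).
    ranked = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    keep = set(ranked[:k])
    mask = [1 if i in keep else 0 for i in range(len(weights))]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


# Stand-in for a trained layer: 1,000 random weights.
random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

pruned, mask = magnitude_prune(weights, keep_fraction=0.1)
print(sum(mask), "of", len(weights), "weights kept")
```

In practice, pruning is usually followed by additional fine-tuning of the surviving weights to recover any lost accuracy, and real frameworks apply the mask to tensors rather than Python lists.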

Many tech companies are already using this method to compress their AI models and fit them on smartphones, laptops, and smart-home devices. Aside from slashing inference costs, this provides many benefits such as obviating the need to send user data to cloud servers and providing real-time inference. In many areas, small neural networks make it possible to employ deep learning on devices that are powered by solar batteries or button cells.

Pruning neural networks early