MIT’s CompreSSM Technique Uses Control Theory to Make AI Models Leaner During Training

MIT researchers develop a new method that compresses AI models mid-training, speeding up training by as much as 1.5x while maintaining nearly identical accuracy, a potential game-changer for sustainable AI development.
Author

AI News Digest

Published

2026-04-25 08:45

Training large AI models is expensive—not just in dollars, but in time, energy, and computational resources. The traditional approach has been to train a massive model first, then compress it afterward. But MIT researchers have a better idea: compress during training instead of after.

A new technique called CompreSSM, developed by researchers at MIT’s Computer Science and Artificial Intelligence Laboratory (CSAIL), Liquid AI, and international partners, uses mathematical tools from control theory to shed unnecessary complexity from AI models mid-training. The approach targets state-space models, which power applications from language processing to audio generation and robotics.

How It Works

The key insight is surprisingly simple: the relative importance of different components within AI models stabilizes much earlier than expected—typically after just 10 percent of training. The researchers use something called Hankel singular values, which measure how much each internal state contributes to the model’s overall behavior. By calculating these values early, they can reliably rank which dimensions matter and which are dead weight.
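For readers curious what this looks like in practice, here is a minimal sketch of computing Hankel singular values for a toy diagonal state-space layer. The matrices and dimensions below are illustrative stand-ins, not CompreSSM's actual parameters; the Gramian-based formula itself is standard control theory.

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Toy discrete-time state-space layer: x_{t+1} = A x_t + B u_t, y_t = C x_t.
rng = np.random.default_rng(0)
n = 16                                         # state dimension (illustrative)
A = np.diag(rng.uniform(0.1, 0.9, n))          # stable diagonal dynamics
B = rng.standard_normal((n, 1))
C = rng.standard_normal((1, n))

# Controllability and observability Gramians from discrete Lyapunov equations:
#   Wc = A Wc A^T + B B^T,   Wo = A^T Wo A + C^T C
Wc = solve_discrete_lyapunov(A, B @ B.T)
Wo = solve_discrete_lyapunov(A.T, C.T @ C)

# Hankel singular values: square roots of the eigenvalues of Wc @ Wo.
# Each one measures how much a state direction contributes to the
# layer's input-output behavior.
hsv = np.sqrt(np.clip(np.linalg.eigvals(Wc @ Wo).real, 0, None))

# Rank state dimensions from most to least important.
order = np.argsort(hsv)[::-1]
print(hsv[order])
```

States with tiny Hankel singular values barely affect the output and are candidates for removal.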

Once those rankings are established, the less important components get surgically removed, and the remaining 90 percent of training proceeds at the speed of a much smaller model. The model essentially discovers its own efficient structure as it learns.

The Numbers

The results are striking. On image classification benchmarks, compressed models maintained nearly the same accuracy as their full-sized counterparts while training up to 1.5 times faster. On CIFAR-10, a model reduced to roughly a quarter of its original state dimension achieved 85.7 percent accuracy—compared to just 81.8 percent for a model trained at that smaller size from scratch.

On Mamba, one of the most widely used state-space architectures, the method achieved approximately 4x training speedups, compressing a 128-dimensional model down to around 12 dimensions while maintaining competitive performance.
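A back-of-envelope estimate shows why early compression pays off so well. Assume, purely for illustration, that per-step training cost scales linearly with state dimension and that compression happens after 10 percent of training; real overheads the article's ~4x figure reflects are not captured by this simple model.

```python
# Illustrative speedup estimate for compressing a 128-dimensional model
# to 12 dimensions after 10% of training, assuming cost is linear in
# state dimension (an assumption, not a claim from the paper).
full_dim, pruned_dim = 128, 12
warmup = 0.10                                   # fraction trained at full size
relative_cost = warmup * 1.0 + (1 - warmup) * pruned_dim / full_dim
speedup = 1 / relative_cost
print(round(speedup, 2))                        # → 5.42
```

The idealized estimate lands above the reported ~4x, consistent with per-step costs not shrinking perfectly linearly in practice.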

Why It Matters

Current approaches like post-training pruning still require paying the full computational cost of training the big model first. Knowledge distillation—another popular technique—requires training a large “teacher” model to completion and then training a second, smaller “student” model, effectively doubling training effort.

CompreSSM avoids both costs. Compared to Hankel nuclear norm regularization, a recently proposed spectral technique, CompreSSM was more than 40 times faster while achieving higher accuracy. Against knowledge distillation on heavily compressed models, CompreSSM maintained near-full performance while distilled models saw significant accuracy drops.

The Bigger Picture

“This turns compression from an afterthought into part of the learning process itself,” said Daniela Rus, MIT professor and director of CSAIL. “Instead of training a large model and then figuring out how to make it smaller, CompreSSM lets the model discover its own efficient structure as it learns.”

As AI models continue to grow in size and computational demands, techniques like CompreSSM could prove essential for making AI development more sustainable—and accessible to researchers without access to massive GPU clusters.


This post is part of the daily AI news digest. For more AI news, visit AI News.