
Stanford and Nvidia Make GPU Code Up to 2x Faster with TTT-Discover
TL;DR
Researchers from Stanford, Nvidia, and Together AI have developed TTT-Discover, a technique that lets a model keep training during inference while it optimizes GPU code, producing algorithms that run up to twice as fast as those written by human experts.
Researchers Optimize GPU Performance Using TTT-Discover
Researchers from Stanford, Nvidia, and Together AI have developed an innovative technique called TTT-Discover (Test-Time Training to Discover) that improves code optimization on GPUs: by continuing to learn during inference, the model produces algorithms that run up to twice as fast as those written by human experts.
The technique challenges the prevailing paradigm of "frozen" models, whose weights cannot change after training. With TTT-Discover, the model continues to train, adjusting its weights while it works on a specific problem.
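The paper's exact algorithm is not reproduced here, but the core idea can be sketched with a toy example: adapt a model's parameters on a single problem while solving it, so that later samples get better. The distribution update below is a cross-entropy-method-style stand-in for the paper's reinforcement learning update, and all numbers are illustrative.

```python
import random

# Toy illustration of the test-time-training idea, not the paper's actual
# algorithm: the "model" is a simple sampling distribution whose parameters
# are updated while solving one specific problem, so later samples improve.

def reward(candidate: float) -> float:
    # Stand-in for a verifiable score, e.g. a measured kernel speedup.
    return -(candidate - 3.7) ** 2

def test_time_train(steps: int = 200, samples_per_step: int = 16):
    mean, std = 0.0, 2.0                      # the toy model's "weights"
    best, best_r = None, float("-inf")
    for _ in range(steps):
        candidates = [random.gauss(mean, std) for _ in range(samples_per_step)]
        candidates.sort(key=reward, reverse=True)
        if reward(candidates[0]) > best_r:
            best, best_r = candidates[0], reward(candidates[0])
        # "Training at test time": shift the distribution toward top samples
        # (a cross-entropy-method-style update standing in for the paper's RL).
        elite = candidates[: samples_per_step // 4]
        mean = sum(elite) / len(elite)
        std = max(0.95 * std, 0.05)           # gradually narrow the search
    return best, best_r

print(test_time_train())  # converges near 3.7, this problem's optimum
```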
The Limitation of 'Frozen' Models
Enterprise AI models typically keep their parameters static after deployment. While they perform well on familiar problems, they struggle in situations that demand novel solutions, such as devising previously unknown algorithms.
As Mert Yuksekgonul, one of the study's authors and a PhD student at Stanford, put it: "Thinking models wouldn't be able to prove P != NP without training during inference, just as Andrew Wiles wouldn't have proven Fermat's Last Theorem without years of effort."
A New Approach to Reinforcement Learning
TTT-Discover marks a significant step forward in training reasoning models. Unlike standard training, which optimizes for average performance across many tasks, this technique focuses on finding the best possible solution to one specific problem.
The researchers implemented two key components that differentiate TTT-Discover:
- Entropic objective: this component pushes the model past merely average solutions toward exceptional, higher-reward ones; one possible formalization is sketched after this list.
- PUCT search: a tree-search strategy inspired by AlphaZero that explores different solution paths, balancing known-good branches against unexplored ones; see the selection-rule sketch below.
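The paper's precise loss is not quoted in this article, but a standard way to write an entropic, risk-seeking objective (an assumption here, not the paper's confirmed formula) is:

$$J_\tau(\theta) = \tau \log \mathbb{E}_{y \sim \pi_\theta}\!\left[\exp\!\big(R(y)/\tau\big)\right],$$

which interpolates between the average reward as $\tau \to \infty$ and the maximum reward as $\tau \to 0^+$; a small $\tau$ therefore makes the policy chase exceptional, high-reward solutions rather than typical ones.

For the search component, here is a minimal sketch of AlphaZero-style PUCT child selection (illustrative; the paper's variant may differ in its details):

```python
import math
from dataclasses import dataclass, field

@dataclass
class Node:
    prior: float                      # policy prior P(s, a) for this action
    visits: int = 0                   # visit count N(s, a)
    value_sum: float = 0.0            # sum of backed-up rewards
    children: list["Node"] = field(default_factory=list)

    @property
    def q(self) -> float:             # mean action value Q(s, a)
        return self.value_sum / self.visits if self.visits else 0.0

def puct_select(parent: Node, c_puct: float = 1.5) -> Node:
    """Select the child maximizing Q + c_puct * P * sqrt(N_parent) / (1 + N_child)."""
    sqrt_total = math.sqrt(sum(c.visits for c in parent.children) + 1)
    return max(
        parent.children,
        key=lambda c: c.q + c_puct * c.prior * sqrt_total / (1 + c.visits),
    )
```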
The technique is most effective on problems with a continuous reward signal, where gradual improvements can be measured rather than only pass/fail outcomes.
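To make that distinction concrete, here is a minimal sketch (with hypothetical timings, not figures from the study) contrasting a binary pass/fail reward with a continuous speedup-based reward:

```python
# Contrast between a binary and a continuous reward signal for code
# optimization (illustrative; not taken from the study).

def binary_reward(candidate_time: float, baseline_time: float) -> float:
    # Pass/fail: gives the search no sense of gradual progress.
    return 1.0 if candidate_time < baseline_time else 0.0

def continuous_reward(candidate_time: float, baseline_time: float) -> float:
    # Speedup over the baseline: 1.0 is parity, 2.0 is twice as fast.
    # Every small improvement moves the score, so progress can be tracked.
    return baseline_time / candidate_time

print(binary_reward(9.5, 10.0), continuous_reward(9.5, 10.0))  # 1.0 ~1.05
print(binary_reward(5.0, 10.0), continuous_reward(5.0, 10.0))  # 1.0 2.0
```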
Economic Considerations on Heavy Inference
Companies paying per API call may need to rethink their cost model: a single TTT-Discover run can cost around $500. The technique therefore pays off most for static, high-value assets.
For example, optimizing a critical piece of code at a company that processes large volumes of data can yield substantial savings. Yuksekgonul says the approach is ideal for high-impact decisions where the improvement delivers a visibly immediate return on investment.
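As a back-of-the-envelope illustration of that trade-off (every figure below except the $500 run cost is an assumption, not from the study):

```python
# Payback estimate for one TTT-Discover run. All numbers other than the
# ~$500 run cost are illustrative assumptions.

ttt_run_cost = 500.0      # dollars for one TTT-Discover optimization run
gpu_hourly_rate = 2.0     # dollars per GPU-hour (assumed)
daily_gpu_hours = 100.0   # GPU-hours/day spent in the kernel being optimized
speedup = 2.0             # measured speedup of the discovered kernel

daily_savings = daily_gpu_hours * gpu_hourly_rate * (1 - 1 / speedup)
payback_days = ttt_run_cost / daily_savings
print(f"Saves ${daily_savings:.0f}/day; pays for itself in {payback_days:.1f} days")
# -> Saves $100/day; pays for itself in 5.0 days
```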
Implementation Considerations
One advantage of TTT-Discover is that it does not require a proprietary model. The researchers used gpt-oss-120b, an open-weight model, and released their code to the community.
This flexibility lets companies run their optimizations inside secure environments without sending data to external servers. "If a company already uses reinforcement learning, there is no need for additional infrastructure," Yuksekgonul asserts.
Real-World Use Cases
TTT-Discover has been applied across four technical domains and set new performance records in several of them. In one experiment, optimized GPU kernels for matrix multiplication ran up to twice as fast as the best previously available implementations.
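A speedup claim like that is typically verified with a simple timing harness. A minimal sketch follows (illustrative; `candidate_matmul` is a hypothetical stand-in for a discovered kernel, and the study's actual benchmarking setup is not shown here):

```python
import time
import numpy as np

def benchmark(fn, *args, repeats=50):
    """Average wall-clock time of fn(*args) over several runs, after a warm-up."""
    fn(*args)                              # warm-up call
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args)
    return (time.perf_counter() - start) / repeats

a = np.random.rand(1024, 1024).astype(np.float32)
b = np.random.rand(1024, 1024).astype(np.float32)

baseline_time = benchmark(np.matmul, a, b)
# candidate_time = benchmark(candidate_matmul, a, b)  # hypothetical discovered kernel
# print(f"speedup: {baseline_time / candidate_time:.2f}x")
```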
The technique is best suited to domains that offer verifiable progress signals, such as logistics and resource management, where performance can be measured objectively.
Future Perspectives and Implications
Enterprise AI adoption may need to evolve toward systems that support problem-driven learning. "Companies must learn to specify problems and to provide internal feedback data for test-time learning to be effective," concludes Yuksekgonul.
Identifying problems that may benefit from TTT-Discover represents a new opportunity to transform inference into an automated R&D laboratory.