
Mastra Introduces 'Observational Memory' and Cuts AI Costs by Up to 10x
TL;DR
Observational memory, developed by Mastra, is a new memory architecture that promises to cut artificial intelligence (AI) costs by up to 10x.
Introduction to Observational Memory
Observational memory is a new approach to memory architecture that promises to reduce costs for artificial intelligence (AI) agents by up to 10 times. Developed by Mastra, the technique also reportedly outperforms Retrieval-Augmented Generation (RAG) pipelines on long-horizon benchmarks.
How It Works
Unlike RAG, which retrieves context dynamically at query time, observational memory uses two background agents, the Observer and the Reflector, to compress the conversation history into a dated log of observations. Because the full log travels with every prompt, no retrieval step is needed, which simplifies the architecture.
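A minimal sketch of what this structure implies, written in TypeScript (Mastra's language). All names here (Message, Observation, MemoryState, observe, reflect) are illustrative assumptions, not Mastra's actual API:

```typescript
// Minimal sketch of the two-block memory layout; names are illustrative.

interface Message {
  role: "user" | "assistant";
  content: string;
}

interface Observation {
  date: string; // ISO date the observation was recorded
  text: string; // a compressed fact or decision extracted from the history
}

interface MemoryState {
  observations: Observation[]; // block 1: compressed, dated log
  recent: Message[];           // block 2: raw messages from the current session
}

// The Observer turns a batch of raw messages into dated observations.
// In practice this is an LLM call; it is stubbed here for illustration.
async function observe(messages: Message[]): Promise<Observation[]> {
  return [
    {
      date: new Date().toISOString().slice(0, 10),
      text: `Condensed ${messages.length} messages into key facts and decisions`,
    },
  ];
}

// The Reflector periodically condenses the observation log itself
// so the log does not grow without bound. Also stubbed for illustration.
async function reflect(observations: Observation[]): Promise<Observation[]> {
  return observations;
}
```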
Compression Process
Observational memory divides the context window into two blocks: one holds the compressed, dated observations, while the other holds the raw message history of the current session. When the raw history reaches roughly 30,000 tokens, the Observer condenses it into observations, appends them to the first block, and discards the original messages.
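Continuing the sketch above, the trigger might look like the following. The ~4-characters-per-token estimate and the helper names are assumptions for illustration:

```typescript
// Hypothetical trigger: once the raw session history exceeds ~30k tokens,
// hand it to the Observer and discard the originals. Reuses the types and
// observe() stub from the sketch above.

const OBSERVE_THRESHOLD_TOKENS = 30_000;

// Rough heuristic (~4 characters per token); a real system would use
// the model's tokenizer.
function estimateTokens(messages: Message[]): number {
  return messages.reduce((n, m) => n + Math.ceil(m.content.length / 4), 0);
}

async function maybeObserve(state: MemoryState): Promise<MemoryState> {
  if (estimateTokens(state.recent) < OBSERVE_THRESHOLD_TOKENS) {
    return state; // under threshold: keep raw messages as-is
  }
  const fresh = await observe(state.recent);
  return {
    observations: [...state.observations, ...fresh], // append to block 1
    recent: [],                                      // originals are discarded
  };
}
```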
Economic Impact
The savings come from prompt caching, under which providers bill cached input tokens at up to a tenth of the uncached rate. Most memory systems rewrite the prompt on every interaction, which invalidates the cache and drives up costs; observational memory keeps the context prefix stable, so most tokens can be served as cheap cache reads.
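A back-of-envelope calculation makes the effect concrete. The prices below are placeholders, and the 1/10 cache-read rate is an assumption based on the article's 10x figure:

```typescript
// Illustrative cost arithmetic; prices and the 1/10 cache-read discount
// are assumptions, not quoted provider rates.

const INPUT_PRICE = 3 / 1_000_000;     // $ per uncached input token (placeholder)
const CACHED_PRICE = INPUT_PRICE / 10; // $ per cache-read token (assumed 10x cheaper)

const contextTokens = 100_000; // stable observation log plus session history
const turns = 50;              // interactions in a session

// A memory system that rewrites the prompt every turn caches nothing.
const uncachedCost = turns * contextTokens * INPUT_PRICE;

// With a stable prefix, every turn after the first is mostly cache reads.
const cachedCost =
  contextTokens * INPUT_PRICE + (turns - 1) * contextTokens * CACHED_PRICE;

console.log({ uncachedCost, cachedCost }); // the gap approaches 10x as turns grow
```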
Advantages Over Traditional Compression
Most coding agents rely on compaction: they summarize the entire history in one pass only when the context window is about to overflow, which can lose critical details. Observational memory instead processes small slices of history frequently, producing a running decision log that better supports consistent, long-running tasks; the sketch below contrasts the two triggers.
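Here is compact-on-overflow sketched with the same illustrative types and helpers as above; the 200k context limit and 90% trigger point are assumptions:

```typescript
// Compact-on-overflow: the whole history is summarized in one pass only when
// the context nears its limit, so fine-grained details are easier to lose
// than with frequent small observation passes.

const CONTEXT_LIMIT_TOKENS = 200_000; // illustrative model context limit

async function compactOnOverflow(state: MemoryState): Promise<MemoryState> {
  if (estimateTokens(state.recent) < CONTEXT_LIMIT_TOKENS * 0.9) {
    return state; // nothing happens until the window is nearly full
  }
  // One large, lossy summary replaces the entire history at once.
  const summary = await observe(state.recent);
  return { observations: summary, recent: [] };
}
```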
Business Use Cases
Observational memory suits a range of business applications, from chatbots on content management platforms to alert-management systems for engineering teams. What these applications share is the need to maintain context over months, so agents can reference past conversations without users having to repeat information.
Significance for Production AI Systems
Observational memory offers an architectural alternative to the vector databases and RAG pipelines that dominate the market today. Its simpler architecture and stable context make agents easier to debug and maintain, and because that stability keeps prompt caching viable, the approach can hold costs down at scale.
Final Considerations
For business teams evaluating memory approaches for AI, the fundamental questions are how long context must be maintained, how much lossy compression is tolerable, and whether dynamic retrieval is truly needed. Observational memory may be ideal for teams whose agents must preserve work continuity and user history. As AI becomes an integral part of systems of record and production workflows, memory design becomes as critical as the choice of model.


