Building a Memory Stack: What We Learned Designing Flumes

We tried vector DBs, Redis, SQL, and object stores to build agent memory and hit a wall. Here’s what we learned, and why Flumes is a new kind of memory engine for AI.

When we started building Flumes, we didn’t set out to reinvent memory infrastructure. We just wanted to give our AI agents a way to remember things.

That turned out to be a much harder problem than we expected.

This post is a look behind the scenes: what we tried, what broke, and what we ultimately learned designing a purpose-built memory stack for AI agents.

The Problem: Memory Was a Mess

At first, we did what many teams do and stitched together a stack of existing tools (a sketch of the glue code follows the list):

  • Vector database (Pinecone) for semantic search
  • Redis for fast lookups of recent turns
  • Postgres for structured state
  • S3 for transcript logs and backups
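
Even on the happy path, a single agent turn had to be fanned out to every store by hand, roughly like this (a reconstruction, not our production code; the index name, keys, buckets, and the precomputed embedding are placeholders):

```python
import json
import time

import boto3                    # S3: transcripts and backups
import psycopg2                 # Postgres: structured state
import redis                    # Redis: recent-turn cache
from pinecone import Pinecone   # Pinecone: semantic search

pc_index = Pinecone(api_key="...").Index("agent-memory")
cache = redis.Redis(host="localhost", port=6379)
pg = psycopg2.connect("dbname=agents")
s3 = boto3.client("s3")

def remember_turn(agent_id: str, turn_id: str, text: str, embedding: list[float]) -> None:
    """Write one turn to four stores; keeping them consistent was entirely on us."""
    # 1. Semantic index, so the agent can later retrieve "similar" turns
    pc_index.upsert(vectors=[(turn_id, embedding, {"agent_id": agent_id})])

    # 2. Hot cache of the last 50 turns for fast context assembly
    cache.lpush(f"turns:{agent_id}", text)
    cache.ltrim(f"turns:{agent_id}", 0, 49)

    # 3. Structured row so we can query sessions and state later
    with pg.cursor() as cur:
        cur.execute(
            "INSERT INTO turns (agent_id, turn_id, created_at) VALUES (%s, %s, %s)",
            (agent_id, turn_id, time.time()),
        )
    pg.commit()

    # 4. Cold storage of the raw transcript
    s3.put_object(
        Bucket="agent-transcripts",
        Key=f"{agent_id}/{turn_id}.json",
        Body=json.dumps({"text": text}),
    )
```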

It kind of worked. But it was a mess.

  • No unified view of what the agent "knew"
  • Hard to debug or inspect memory state
  • Expensive to scale across agents or sessions
  • Complex orchestration logic to move data between layers

More than anything, we felt like we were building a memory system on top of tools that weren’t built for memory at all.

Why Existing Tools Fell Short

Each of the existing infra tools did one thing well:

  • Vector DBs retrieved similar stuff
  • Caches stored recent stuff
  • SQL stored known stuff
  • Object stores stored everything else

But none of them gave us what memory needs:

  • Continuity: The ability to track context across time
  • Abstraction: A way to reason about memory without micromanaging data
  • Structure: Semantic tagging, object types, or timelines
  • Adaptivity: Auto-summarization, compression, and expiry

We weren’t looking for a database. We needed a memory engine.
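
To make that concrete, this is roughly the shape of record we kept wishing for: first-class memories with a type, timestamps, importance, tags, and an expiry. It is an illustration of the idea, not the actual Flumes schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class MemoryRecord:
    agent_id: str
    kind: str                        # e.g. "fact", "event", "preference", "summary"
    content: str
    created_at: datetime
    importance: float = 0.5          # drives what gets summarized or expired first
    tags: set[str] = field(default_factory=set)
    ttl: timedelta | None = None     # adaptivity: records can age out on their own

    def is_expired(self, now: datetime) -> bool:
        # A record with no TTL lives until some policy decides otherwise.
        return self.ttl is not None and now - self.created_at > self.ttl
```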

Our Design Goals

That led us to build Flumes as a unified memory layer: not a replacement for your DB, but a layer that treats memory as its own system.

We designed it with four core principles:

  1. Structured: Memories aren’t blobs. They have shape, relationships, timestamps, importance, types.
  2. Queryable: You should be able to recall memory based on context, tags, or recency, not just semantic similarity.
  3. Scalable: Memory should grow without exploding token budgets or storage cost.
  4. Self-managing: The system should know when to summarize, forget, or archive, without manual intervention.

Flumes is not a wrapper. It’s a runtime for memory.
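
As a hypothetical example of what "queryable" means in practice (this is a sketch of the behavior, not the Flumes API): recall should be able to filter structured records by tags and recency and rank them by importance, with vector similarity as one signal among several rather than the only one.

```python
from datetime import datetime, timedelta

def recall(records: list[dict], tags: set[str], window: timedelta, limit: int = 20) -> list[dict]:
    """Recall by tags + recency, ranked by importance; no embeddings required."""
    now = datetime.now()
    hits = [
        r for r in records
        if r["created_at"] >= now - window and tags & set(r["tags"])
    ]
    # Rank by importance, then recency, instead of raw similarity scores.
    hits.sort(key=lambda r: (r["importance"], r["created_at"]), reverse=True)
    return hits[:limit]

# Usage with record-shaped dicts like the earlier sketch:
records = [
    {"content": "User prefers concise answers", "tags": ["preference"],
     "importance": 0.9, "created_at": datetime.now() - timedelta(days=2)},
    {"content": "Old onboarding question", "tags": ["support"],
     "importance": 0.4, "created_at": datetime.now() - timedelta(days=40)},
]
print(recall(records, tags={"preference"}, window=timedelta(days=7)))
```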

What We Got Wrong Early On

Early designs were overly index-centric. We relied too heavily on vector search. It made memory feel fuzzy and opaque.

We also underestimated:

  • How important memory observability would be
  • How expensive token churn would become
  • How brittle prompt-level hacks were for persistence

Once we rethought memory as an intentional, structured, and tiered system, the design started to click.
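
A minimal sketch of what "tiered" means here, with made-up thresholds rather than Flumes' actual policy: recent raw turns stay hot, older overflow gets folded into summaries, and stale records move to cold storage instead of sitting in the prompt.

```python
from datetime import datetime, timedelta

RAW_TURN_BUDGET = 50                 # illustrative threshold, not a real default
ARCHIVE_AFTER = timedelta(days=30)   # likewise

def plan_tiering(records: list[dict]) -> dict[str, list[dict]]:
    """Decide per record: keep it hot, fold it into a summary, or archive it."""
    now = datetime.now()
    plan = {"keep": [], "summarize": [], "archive": []}

    # Anything past the archive horizon leaves the hot path entirely.
    fresh = []
    for r in records:
        (fresh if now - r["created_at"] <= ARCHIVE_AFTER else plan["archive"]).append(r)

    # Of what remains, the oldest raw turns beyond the budget get summarized.
    raw = sorted((r for r in fresh if r["tier"] == "raw"), key=lambda r: r["created_at"])
    overflow = set(map(id, raw[:-RAW_TURN_BUDGET])) if len(raw) > RAW_TURN_BUDGET else set()
    for r in fresh:
        (plan["summarize"] if id(r) in overflow else plan["keep"]).append(r)
    return plan
```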

Lessons We Learned

  1. Memory is not I/O. It’s reasoning infrastructure.
  2. You can’t bolt memory on. It has to be baked into the agent’s runtime.
  3. Token optimization is architecture, not cleanup.
  4. Summarization should be a policy, not a patch.
  5. Observability is a feature, not a dashboard.

Where We're Going

We’re continuing to refine the Flumes memory engine: better summarization pipelines, tighter latency bounds, smarter cold storage policies. But the core principle is clear:

Memory deserves its own system.

If you’re tired of stitching together memory from tools that were never designed for it, come see what we’re building.

[Join the early access] and start building agents with real memory—structured, queryable, and built to scale.
