Building a Memory Stack: What We Learned Designing Flumes

We tried vector DBs, Redis, SQL, and object stores to build agent memory and hit a wall. Here’s what we learned, and why Flumes is a new kind of memory engine for AI.

When we started building Flumes, we didn’t set out to reinvent memory infrastructure. We just wanted to give our AI agents a way to remember things.

That turned out to be a much harder problem than we expected.

This post is a look behind the scenes: what we tried, what broke, and what we ultimately learned designing a purpose-built memory stack for AI agents.

The Problem: Memory Was a Mess

At first, we did what many teams do and stitched together a stack of existing tools (a sketch of the glue code follows the list):

  • Vector database (Pinecone) for semantic search
  • Redis for fast lookups of recent turns
  • Postgres for structured state
  • S3 for transcript logs and backups
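
Even on the happy path, a single agent turn had to be fanned out to every store by hand, roughly like this (a reconstruction, not our production code; the index name, keys, buckets, and the precomputed embedding are placeholders):

```python
import json
import time

import boto3                    # S3: transcripts and backups
import psycopg2                 # Postgres: structured state
import redis                    # Redis: recent-turn cache
from pinecone import Pinecone   # Pinecone: semantic search

pc_index = Pinecone(api_key="...").Index("agent-memory")
cache = redis.Redis(host="localhost", port=6379)
pg = psycopg2.connect("dbname=agents")
s3 = boto3.client("s3")

def remember_turn(agent_id: str, turn_id: str, text: str, embedding: list[float]) -> None:
    """Write one turn to four stores; keeping them consistent was entirely on us."""
    # 1. Semantic index, so the agent can later retrieve "similar" turns
    pc_index.upsert(vectors=[(turn_id, embedding, {"agent_id": agent_id})])

    # 2. Hot cache of the last 50 turns for fast context assembly
    cache.lpush(f"turns:{agent_id}", text)
    cache.ltrim(f"turns:{agent_id}", 0, 49)

    # 3. Structured row so we can query sessions and state later
    with pg.cursor() as cur:
        cur.execute(
            "INSERT INTO turns (agent_id, turn_id, created_at) VALUES (%s, %s, %s)",
            (agent_id, turn_id, time.time()),
        )
    pg.commit()

    # 4. Cold storage of the raw transcript
    s3.put_object(
        Bucket="agent-transcripts",
        Key=f"{agent_id}/{turn_id}.json",
        Body=json.dumps({"text": text}),
    )
```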

It kind of worked. But it was a mess.

  • No unified view of what the agent "knew"
  • Hard to debug or inspect memory state
  • Expensive to scale across agents or sessions
  • Complex orchestration logic to move data between layers

More than anything, we felt like we were building a memory system on top of tools that weren’t built for memory at all.

Why Existing Tools Fell Short

Each of the existing infra tools did one thing well:

  • Vector DBs retrieved similar stuff
  • Caches stored recent stuff
  • SQL stored known stuff
  • Object stores stored everything else

But none of them gave us what memory needs:

  • Continuity: The ability to track context across time
  • Abstraction: A way to reason about memory without micromanaging data
  • Structure: Semantic tagging, object types, or timelines
  • Adaptivity: Auto-summarization, compression, and expiry

We weren’t looking for a database. We needed a memory engine.
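
To make that concrete, this is roughly the shape of record we kept wishing for: first-class memories with a type, timestamps, importance, tags, and an expiry. It is an illustration of the idea, not the actual Flumes schema:

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

@dataclass
class MemoryRecord:
    agent_id: str
    kind: str                        # e.g. "fact", "event", "preference", "summary"
    content: str
    created_at: datetime
    importance: float = 0.5          # drives what gets summarized or expired first
    tags: set[str] = field(default_factory=set)
    ttl: timedelta | None = None     # adaptivity: records can age out on their own

    def is_expired(self, now: datetime) -> bool:
        # A record with no TTL lives until some policy decides otherwise.
        return self.ttl is not None and now - self.created_at > self.ttl
```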

Our Design Goals

That led us to build Flumes as a unified memory layer: not a replacement for your DB, but a layer that treats memory as its own system.

We designed it with four core principles:

  1. Structured: Memories aren’t blobs. They have shape, relationships, timestamps, importance, types.
  2. Queryable: You should be able to recall memory based on context, tags, or recency, not just semantic similarity.
  3. Scalable: Memory should grow without exploding token budgets or storage cost.
  4. Self-managing: The system should know when to summarize, forget, or archive, without manual intervention.

Flumes is not a wrapper. It’s a runtime for memory.
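
As a hypothetical example of what "queryable" means in practice (this is a sketch of the behavior, not the Flumes API): recall should be able to filter structured records by tags and recency and rank them by importance, with vector similarity as one signal among several rather than the only one.

```python
from datetime import datetime, timedelta

def recall(records: list[dict], tags: set[str], window: timedelta, limit: int = 20) -> list[dict]:
    """Recall by tags + recency, ranked by importance; no embeddings required."""
    now = datetime.now()
    hits = [
        r for r in records
        if r["created_at"] >= now - window and tags & set(r["tags"])
    ]
    # Rank by importance, then recency, instead of raw similarity scores.
    hits.sort(key=lambda r: (r["importance"], r["created_at"]), reverse=True)
    return hits[:limit]

# Usage with record-shaped dicts like the earlier sketch:
records = [
    {"content": "User prefers concise answers", "tags": ["preference"],
     "importance": 0.9, "created_at": datetime.now() - timedelta(days=2)},
    {"content": "Old onboarding question", "tags": ["support"],
     "importance": 0.4, "created_at": datetime.now() - timedelta(days=40)},
]
print(recall(records, tags={"preference"}, window=timedelta(days=7)))
```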

What We Got Wrong Early On

Early designs were overly index-centric. We relied too heavily on vector search. It made memory feel fuzzy and opaque.

We also underestimated:

  • How important memory observability would be
  • How expensive token churn would become
  • How brittle prompt-level hacks were for persistence

Once we rethought memory as an intentional, structured, and tiered system, the design started to click.
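
A minimal sketch of what "tiered" means here, with made-up thresholds rather than Flumes' actual policy: recent raw turns stay hot, older overflow gets folded into summaries, and stale records move to cold storage instead of sitting in the prompt.

```python
from datetime import datetime, timedelta

RAW_TURN_BUDGET = 50                 # illustrative threshold, not a real default
ARCHIVE_AFTER = timedelta(days=30)   # likewise

def plan_tiering(records: list[dict]) -> dict[str, list[dict]]:
    """Decide per record: keep it hot, fold it into a summary, or archive it."""
    now = datetime.now()
    plan = {"keep": [], "summarize": [], "archive": []}

    # Anything past the archive horizon leaves the hot path entirely.
    fresh = []
    for r in records:
        (fresh if now - r["created_at"] <= ARCHIVE_AFTER else plan["archive"]).append(r)

    # Of what remains, the oldest raw turns beyond the budget get summarized.
    raw = sorted((r for r in fresh if r["tier"] == "raw"), key=lambda r: r["created_at"])
    overflow = set(map(id, raw[:-RAW_TURN_BUDGET])) if len(raw) > RAW_TURN_BUDGET else set()
    for r in fresh:
        (plan["summarize"] if id(r) in overflow else plan["keep"]).append(r)
    return plan
```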

Lessons We Learned

  1. Memory is not I/O. It’s reasoning infrastructure.
  2. You can’t bolt memory on. It has to be baked into the agent’s runtime.
  3. Token optimization is architecture, not cleanup.
  4. Summarization should be a policy, not a patch.
  5. Observability is a feature, not a dashboard.

Where We're Going

We’re continuing to refine the Flumes memory engine: better summarization pipelines, tighter latency bounds, smarter cold storage policies. But the core principle is clear:

Memory deserves its own system.

If you’re tired of stitching together memory from tools that were never designed for it, come see what we’re building.

[Join the early access] and start building agents with real memory—structured, queryable, and built to scale.
