Introducing an Advanced RAG System with RAPTOR

Mohammad Sadegh Nemat Pour
Ever felt like your search tools just don’t “get” what you’re really looking for?
You’re definitely not alone. As AI and information retrieval keep moving forward, it’s easy to notice the gap between what we want and what most tools actually deliver—especially when you’re dealing with huge documents or tricky, nuanced questions.
That’s exactly why we decided to build something different: an advanced Retrieval-Augmented Generation (RAG) system, powered by a technique called RAPTOR (Recursive Abstractive Processing for Tree-Organized Retrieval).
This isn’t just another search engine tweak—it’s a real leap forward in how machines can understand and find information.
Why Standard RAG Falls Short
Let’s be real: most RAG pipelines do a decent job finding documents that look similar to your query.
But what about the deeper meaning? The context? Those subtle connections that make an answer actually useful?
Standard systems often miss these, especially if you’re having an ongoing conversation or your data is a bit messy.
We wanted to change that. Our goals were pretty ambitious:
See the Big Picture (and the Details): Understand documents as more than just a pile of words—really get their structure and meaning.
Get Inside the User’s Head: Make sense of queries in context, even as conversations evolve.
Deliver Answers, Not Just Documents: Actually synthesize and rank information so you get what you really need.
Our Solution: A Two-Stage RAG Pipeline (with a Twist)
Here’s how we tackled the challenge.
Our system has two main parts, and honestly, it’s easiest to show you:

A Visual Glimpse: The RAPTOR Image
To give you a better feel for the RAPTOR approach, here’s an official image from the RAPTOR GitHub repository:
[Image: RAPTOR’s recursive, tree-structured clustering of document chunks, from the official RAPTOR GitHub repository]
This illustration captures the essence of RAPTOR’s recursive, tree-structured approach to document organization and retrieval. Each branch represents a cluster of related information, making it easier for the system to find contextually relevant answers—no matter how complex your data is.
Stage 1: Deep Document Understanding with RAPTOR
Before anyone even asks a question, RAPTOR is already busy.
Think of it as a super-organized librarian who not only reads every book but also arranges them into a multi-level map of topics and themes.
Chunking & Embedding: We break documents into bite-sized pieces and turn them into vector embeddings using Google Vertex AI. It’s like putting every bit of text on a map.
Hierarchical Clustering: This is where RAPTOR really shines. Using UMAP and Gaussian Mixture Models, we group similar chunks not into flat clusters but into a tree-like hierarchy, so you can see both the forest and the trees (the first sketch after this list shows one layer of this).
Summarization: Each cluster, at every level, gets its own summary. That way, we build up a layered understanding, from the nitty-gritty details to the big-picture themes.
Efficient Storage: All this structure is stored in PostgreSQL with pgvector, so it’s ready to go when you need it (a schema sketch follows the clustering example below).
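To make Stage 1 concrete, here’s a minimal sketch of one clustering layer. `embed` and `summarize` are hypothetical thin wrappers you’d write around Vertex AI’s embedding and LLM endpoints, and the UMAP and GMM parameters are illustrative, not our production settings:

```python
import numpy as np
import umap                                  # umap-learn
from sklearn.mixture import GaussianMixture

def build_layer(chunks, embed, summarize, n_clusters=8):
    """Embed chunks, cluster them, and summarize each cluster."""
    embeddings = np.array([embed(c) for c in chunks])

    # Reduce dimensionality before clustering: Gaussian mixtures behave
    # far better in ~10 dimensions than in the raw embedding space.
    reduced = umap.UMAP(n_components=10, metric="cosine").fit_transform(embeddings)

    # Cluster the reduced vectors with a Gaussian Mixture Model.
    labels = GaussianMixture(n_components=n_clusters, random_state=0).fit_predict(reduced)

    # One summary per cluster becomes a node in the next layer of the tree.
    summaries = []
    for k in range(n_clusters):
        members = [c for c, label in zip(chunks, labels) if label == k]
        if members:
            summaries.append(summarize(members))
    return summaries

# Recursing on the output is what builds the multi-level map:
#   layer = chunks
#   while len(layer) > n_clusters:
#       layer = build_layer(layer, embed, summarize)
```

Reducing with UMAP before fitting the GMM mirrors the original RAPTOR recipe: the mixture model is far better behaved in a handful of dimensions than in the hundreds an embedding model produces.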
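And here’s roughly how the resulting tree can land in Postgres. The table layout, connection string, and 768-dimension embedding size are assumptions for illustration, not the project’s actual schema:

```python
import psycopg
from pgvector.psycopg import register_vector

conn = psycopg.connect("postgresql://localhost/rag", autocommit=True)
conn.execute("CREATE EXTENSION IF NOT EXISTS vector")
conn.execute("""
    CREATE TABLE IF NOT EXISTS raptor_nodes (
        id        bigserial PRIMARY KEY,
        level     int  NOT NULL,        -- 0 = raw chunk, 1+ = cluster summaries
        content   text NOT NULL,
        embedding vector(768) NOT NULL  -- must match your embedding model's size
    )
""")
register_vector(conn)  # lets psycopg send/receive embeddings as numpy arrays

def nearest(query_embedding, k=5):
    """Cosine-distance search across every level of the tree at once."""
    return conn.execute(
        "SELECT level, content FROM raptor_nodes "
        "ORDER BY embedding <=> %s LIMIT %s",
        (query_embedding, k),
    ).fetchall()
```

Because chunks and summaries live in the same table, a single similarity search can surface a big-picture summary or a fine-grained detail, whichever sits closest to the query.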
Stage 2: Smarter Query Processing
When you ask a question, the system doesn’t just grab for keywords.
Instead, it thinks—almost like a human would.
Contextualization: The system looks at the conversation so far, making sure it really gets what you mean.
Query Expansion: It comes up with variations of your question, casting a wider net to catch related ideas (this step, together with the contextualization above, is sketched after this list).
Retrieval: Using those expanded queries, it fetches the most relevant chunks and summaries from RAPTOR’s map.
Re-Ranking: A cross-encoder model takes a closer look, re-ranking results for true relevance, not just surface similarity (see the re-ranking sketch below).
LLM Response Generation: Finally, a Large Language Model (like Google’s Gemini) pulls together the best info into a clear, concise answer, complete with sources.
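Here’s a hedged sketch of the contextualize-and-expand step from the list above. The prompt wording and the Gemini model id are illustrative stand-ins, not our exact production prompts:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_google_vertexai import ChatVertexAI

llm = ChatVertexAI(model_name="gemini-1.5-pro")  # model id is an assumption

rewrite_prompt = ChatPromptTemplate.from_template(
    "Given the conversation so far:\n{history}\n\n"
    "Rewrite the user's question so it stands alone, then list three "
    "alternative phrasings, one per line.\n\n"
    "Question: {question}"
)

def expand_query(history: str, question: str) -> list[str]:
    """Return the contextualized question plus paraphrased variants."""
    response = (rewrite_prompt | llm).invoke(
        {"history": history, "question": question}
    )
    return [line.strip() for line in response.content.splitlines() if line.strip()]
```

Each variant can then be sent to the retriever independently, with the merged result set handed to the re-ranker below.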
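And the re-ranking step itself, sketched with sentence-transformers’ CrossEncoder; the public ms-marco checkpoint named here is an assumption, not necessarily the model we deploy:

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query: str, candidates: list[str], top_k: int = 5) -> list[str]:
    """Score every (query, candidate) pair jointly and keep the best."""
    scores = reranker.predict([(query, c) for c in candidates])
    ranked = sorted(zip(candidates, scores), key=lambda p: p[1], reverse=True)
    return [c for c, _ in ranked[:top_k]]
```

Unlike the bi-encoder used for retrieval, a cross-encoder reads the query and each candidate together, which is slower but much sharper at judging true relevance; that’s why it only runs on the short list the retriever returns.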
The Tech Stack (for the Curious)
Framework: LangChain
AI Models: Google Vertex AI (for both embeddings and LLM)
Vector Store: PostgreSQL + pgvector
Clustering: UMAP & Gaussian Mixture Models
Web Scraping: Trafilatura
Interface: Streamlit
What Makes This Special?
Recursive Clustering (RAPTOR): We don’t just group data; we build a semantic map, layer by layer.
Context-Aware Queries: Every question is understood in context, not isolation.
Query Expansion: We go beyond your words to find what you meant.
Cross-Encoder Re-Ranking: Results aren’t just semantically close; they actually answer the question.
Wrapping Up
By combining RAPTOR’s deep document understanding with a smart, multi-step query process, we’ve built a RAG system that’s more than just a search tool.
It’s a true assistant—one that understands, synthesizes, and delivers answers you can actually use.
We’re genuinely excited about what this means for anyone working with big, complex information sets.
If you’re ready for smarter search, give RAPTOR a try.
And a big thanks again to remolab for helping make this possible!