Core Concepts

How RAG Works

Retrieval-Augmented Generation (RAG) is a technique that gives LLMs access to specific, up-to-date information without fine-tuning.

The Ingestion Pipeline

Chunking

When you upload a document, we break it into smaller, overlapping "chunks". The overlap preserves context across chunk boundaries, while the smaller size makes search more granular.
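The chunking step can be sketched as a simple sliding window. This is a minimal illustration only: sizes here are in characters, whereas a production pipeline would typically count tokens and respect natural boundaries.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks using a sliding window.

    Minimal sketch: chunk_size and overlap are character counts,
    standing in for the token-based sizing a real pipeline would use.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

Because the window advances by `chunk_size - overlap`, the last `overlap` characters of each chunk reappear at the start of the next one, so a sentence cut by one boundary survives intact in the neighboring chunk.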

Embedding

Each chunk is fed into an embedding model that maps the text to a high-dimensional vector. Texts with similar meanings end up near each other in this vector space.
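To make "near each other" concrete, closeness between embeddings is usually measured with cosine similarity. The sketch below uses invented 3-dimensional vectors purely for illustration; real embedding models output hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (made up for this example, not real model output).
dog = [0.9, 0.1, 0.0]
puppy = [0.8, 0.2, 0.1]
car = [0.0, 0.1, 0.9]

# Related concepts score higher than unrelated ones.
print(cosine_similarity(dog, puppy) > cosine_similarity(dog, car))
```

Cosine similarity ignores vector length and compares direction only, which is why it is the standard choice for comparing embeddings.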

Vector Storage

These vectors are stored in a vector database and indexed so that nearest-neighbor lookups stay fast even across millions of chunks.
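A minimal in-memory sketch of what a vector store does, using brute-force search; real vector databases use approximate nearest-neighbor indexes (e.g., HNSW or IVF) to avoid scanning every vector.

```python
import math

class VectorStore:
    """Toy vector store: brute-force cosine-similarity search.

    Illustration only; production stores replace the linear scan
    with an approximate nearest-neighbor index.
    """

    def __init__(self) -> None:
        self.entries: list[tuple[list[float], str]] = []  # (vector, chunk text)

    def add(self, vector: list[float], chunk: str) -> None:
        self.entries.append((vector, chunk))

    def search(self, query: list[float], k: int = 3) -> list[str]:
        def cos(a: list[float], b: list[float]) -> float:
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb)

        # Rank every stored chunk by similarity to the query vector.
        ranked = sorted(self.entries, key=lambda e: cos(e[0], query), reverse=True)
        return [chunk for _, chunk in ranked[:k]]
```

The interface is the important part: `add` at ingestion time, `search` at query time.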

The Retrieval Loop

  1. User Query — You ask a question.
  2. Vector Search — We convert your question into a vector and find the most similar chunks.
  3. LLM Synthesis — The original question + retrieved chunks are sent to the AI model.
  4. Verified Answer — The model is instructed to answer based only on the provided context, which reduces hallucinations.
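The loop above can be sketched end to end. Here `embed` and `llm` are hypothetical stand-ins for a real embedding model and LLM API, and the similarity search is brute-force for brevity.

```python
import math

def answer(question: str, corpus: list[str], embed, llm, k: int = 2) -> str:
    """Sketch of the retrieval loop.

    `embed` and `llm` are placeholder callables, not real APIs:
    embed(text) -> vector, llm(prompt) -> answer string.
    """
    def cos(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)

    query_vec = embed(question)  # 2. convert the question into a vector
    ranked = sorted(corpus, key=lambda c: cos(embed(c), query_vec), reverse=True)
    context = "\n".join(ranked[:k])  # the most similar chunks

    # 3. send question + retrieved chunks to the model,
    # 4. with an instruction to stay grounded in that context.
    prompt = (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)
```

In practice the corpus embeddings are computed once at ingestion and read from the vector store, rather than re-embedded on every query as in this sketch.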