Core Concepts
How RAG Works
Retrieval-Augmented Generation (RAG) is a technique that gives LLMs access to specific, up-to-date information without fine-tuning.
The Ingestion Pipeline
Chunking
When you upload a document, we break it into smaller, overlapping chunks. The overlap preserves context across chunk boundaries while keeping each unit of search granular.
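As a rough sketch of how overlapping chunking works (the actual chunk size and overlap used by the pipeline are not specified here; the values below are illustrative):

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap,
    so sentences near a boundary appear in two adjacent chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the final window already reaches the end of the text
    return chunks
```

Note how consecutive chunks share their last and first `overlap` characters, which is what keeps context from being cut mid-thought.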
Embedding
Each chunk is fed into an Embedding Model that turns text into a high-dimensional vector. Similar meanings end up near each other in this mathematical space.
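"Near each other" is usually measured with cosine similarity between vectors. A trained embedding model produces the vectors in practice; the hand-written vectors below are purely illustrative:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means the same
    direction (similar meaning), 0.0 means orthogonal (unrelated)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```

Two vectors pointing the same way score 1.0 regardless of their length, which is why cosine similarity (rather than raw distance) is the common choice for comparing embeddings.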
Vector Storage
These vectors are stored in our vector database, where they are indexed for fast nearest-neighbor retrieval.
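Conceptually, a vector store maps vectors to their source chunks and returns the top-k most similar chunks for a query vector. This toy in-memory version uses brute-force cosine search (production databases use approximate indexes instead, but the interface is the same idea):

```python
import math

def _cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

class InMemoryVectorStore:
    """Toy vector store: keeps (vector, chunk) pairs and ranks every
    stored chunk against the query vector at search time."""

    def __init__(self):
        self._entries: list[tuple[list[float], str]] = []

    def add(self, vector: list[float], chunk: str) -> None:
        self._entries.append((vector, chunk))

    def search(self, query_vector: list[float], k: int = 3) -> list[str]:
        ranked = sorted(self._entries,
                        key=lambda entry: _cosine(query_vector, entry[0]),
                        reverse=True)
        return [chunk for _, chunk in ranked[:k]]
```

Brute force is O(n) per query, which is why real vector databases trade a little accuracy for speed with approximate nearest-neighbor indexes.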
The Retrieval Loop
- User Query — You ask a question.
- Vector Search — We convert your question into a vector and find the most similar chunks.
- LLM Synthesis — The original question + retrieved chunks are sent to the AI model.
- Grounded Answer — The AI is instructed to answer based only on the provided context, which reduces hallucinations.
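The synthesis step boils down to assembling a prompt from the question and the retrieved chunks before calling the model. A minimal sketch (the exact prompt wording our system uses is not shown here; this illustrates the structure):

```python
def build_rag_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Combine the user's question with retrieved chunks into one prompt
    that instructs the model to stay grounded in the provided context."""
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer using ONLY the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

Numbering the chunks (`[1]`, `[2]`, ...) also makes it easy to ask the model to cite which chunk supported each claim.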