Advanced Training
RAG & Knowledge Systems
Build retrieval-augmented generation pipelines that give LLMs access to your data.
Vector databases, embedding strategies, GraphRAG, and knowledge graphs: create
AI systems that know what you know.
Course Overview
RAG transforms LLMs from general-purpose chatbots into domain experts. By retrieving
relevant context before generation, you reduce hallucinations and ground responses
in your actual data. This course covers the full RAG stack, from basic retrieval to
advanced graph-based approaches.
1
Embedding Fundamentals
How text becomes vectors. Embedding models, dimensionality, and semantic similarity.
OpenAI Ada
Cohere
Sentence Transformers
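Semantic similarity between embeddings is usually measured with cosine similarity. A minimal stdlib sketch, using made-up 3-dimensional vectors (real models emit hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" - invented for illustration
cat = [0.9, 0.1, 0.0]
kitten = [0.85, 0.2, 0.05]
invoice = [0.0, 0.1, 0.95]

print(cosine_similarity(cat, kitten))   # high: semantically close
print(cosine_similarity(cat, invoice))  # low: unrelated
```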
2
Vector Databases
Storing and querying embeddings at scale. Index types, filtering, and hybrid search.
Pinecone
ChromaDB
Weaviate
3
Chunking Strategies
Document preprocessing, chunk sizes, overlap, and semantic chunking approaches.
Recursive
Semantic
Agentic
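The simplest of these strategies, fixed-size chunking with overlap, fits in a few lines; the chunk and overlap sizes below are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Fixed-size chunking with overlap, so content split at a chunk
    boundary still appears intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

doc = "word " * 300  # ~1500 characters of toy content
chunks = chunk_text(doc, chunk_size=500, overlap=50)
print(len(chunks), len(chunks[0]))  # 4 500
```

Each chunk's first 50 characters repeat the previous chunk's last 50, which is what preserves sentences that straddle a boundary.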
4
Retrieval Strategies
Beyond basic similarity search: reranking, query expansion, and multi-query retrieval.
Cohere Rerank
HyDE
Multi-Query
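One common way to merge the ranked lists that multi-query retrieval produces is reciprocal rank fusion. This sketch assumes each retriever run returns an ordered list of document IDs; the IDs and the conventional k=60 constant are illustrative:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked doc-id lists from several query variants.
    Each doc scores sum(1 / (k + rank)); documents ranked well by
    multiple variants rise to the top."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Top-3 results for the original query plus two generated variants
runs = [
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a", "doc_d"],
    ["doc_b", "doc_c", "doc_a"],
]
print(reciprocal_rank_fusion(runs))  # doc_b first: it ranked high in all runs
```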
5
GraphRAG
Knowledge graphs for RAG. Entity extraction, relationship mapping, and graph traversal.
Neo4j
LlamaIndex
Microsoft GraphRAG
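At its core, a GraphRAG retriever walks outward from entities mentioned in the query and hands the resulting subgraph to the LLM as context. A toy in-memory triple store sketches the traversal; the entity and relation names are invented for illustration:

```python
from collections import defaultdict

class KnowledgeGraph:
    """Minimal in-memory triple store: (subject, relation, object)."""
    def __init__(self):
        self.edges: dict[str, list[tuple[str, str]]] = defaultdict(list)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges[subject].append((relation, obj))

    def neighborhood(self, entity: str, hops: int = 2) -> list[tuple[str, str, str]]:
        """Breadth-first walk out to `hops` hops from the seed entity."""
        triples, frontier, seen = [], [entity], {entity}
        for _ in range(hops):
            next_frontier = []
            for node in frontier:
                for relation, obj in self.edges[node]:
                    triples.append((node, relation, obj))
                    if obj not in seen:
                        seen.add(obj)
                        next_frontier.append(obj)
            frontier = next_frontier
        return triples

kg = KnowledgeGraph()
kg.add("Acme Corp", "acquired", "WidgetCo")
kg.add("WidgetCo", "founded_by", "Ada Smith")
kg.add("Ada Smith", "lives_in", "Berlin")

# Two hops from "Acme Corp" reaches the acquisition and the founder,
# but not Ada Smith's outgoing edges
print(kg.neighborhood("Acme Corp", hops=2))
```

Production systems do the same thing against Neo4j or a similar store, with entity extraction supplying the seed nodes.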
6
Evaluation & Optimization
Measuring RAG quality. Retrieval metrics, generation quality, and continuous improvement.
RAGAS
Faithfulness
Relevance
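Two standard retrieval metrics, recall@k and mean reciprocal rank, are simple to compute by hand; the document IDs below are illustrative:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of relevant documents that appear in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant)

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant document (0.0 if none found)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

retrieved = ["d3", "d1", "d7", "d2"]
relevant = {"d1", "d2"}
print(recall_at_k(retrieved, relevant, k=3))  # 0.5 - only d1 made the top 3
print(mrr(retrieved, relevant))               # 0.5 - first hit at rank 2
```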
RAG Pipeline Architecture
┌──────────────────────────────────────────────────────────────────┐
│                        INGESTION PIPELINE                        │
├──────────────────────────────────────────────────────────────────┤
│  Documents → Chunker → Embedder → Vector DB                      │
│  [PDF/MD]    [Split]   [Ada-3]    [Pinecone]                     │
│  [HTML]      [Overlap] [Cohere]   [ChromaDB]                     │
│  [JSON]      [Semantic]           [Weaviate]                     │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│                        RETRIEVAL PIPELINE                        │
├──────────────────────────────────────────────────────────────────┤
│  Query → Query Expansion → Retrieval → Reranking → Context       │
│  [User]  [HyDE/MQ]         [Top-K]     [Cohere]    [Prompt]      │
│          [Decompose]       [Hybrid]    [CrossEnc]                │
└──────────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────────┐
│                       GENERATION PIPELINE                        │
├──────────────────────────────────────────────────────────────────┤
│  Context + Query → Prompt Template → LLM → Response + Citations  │
│  [Merged]          [System]          [Claude]  [Grounded]        │
│                    [Few-shot]        [GPT-4]   [Sourced]         │
└──────────────────────────────────────────────────────────────────┘
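Collapsed into a toy example, the three stages look like this. The corpus, its hand-written 3-d "embeddings", and the prompt template are all stand-ins for a real embedder, vector store, and LLM call:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Ingestion result: chunks with pre-computed toy vectors
corpus = [
    ("Refunds are issued within 14 days.", [0.9, 0.1, 0.1]),
    ("Our office is in Lisbon.",           [0.1, 0.9, 0.1]),
    ("Shipping takes 3-5 business days.",  [0.2, 0.1, 0.9]),
]

def answer(query: str, query_vec: list[float], top_k: int = 2) -> str:
    # Retrieval: rank chunks by similarity, keep top-k
    ranked = sorted(corpus, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    context = "\n".join(text for text, _ in ranked[:top_k])
    # Generation: fill the prompt template (a real LLM call would go here)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

print(answer("How long do refunds take?", [0.95, 0.05, 0.15]))
```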
Structural Patterns for RAG
RAG systems benefit from structural patterns that manage complexity and enable
flexible component swapping.
Composite
Build document trees where folders contain documents contain chunks. Process entire hierarchies uniformly with recursive operations.
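A sketch of that idea, with hypothetical Document and Folder node types sharing one chunks() interface:

```python
from abc import ABC, abstractmethod

class DocNode(ABC):
    """Composite: folders and documents share one interface."""
    @abstractmethod
    def chunks(self) -> list[str]: ...

class Document(DocNode):
    def __init__(self, text: str, chunk_size: int = 20):
        self.text, self.chunk_size = text, chunk_size

    def chunks(self) -> list[str]:
        return [self.text[i:i + self.chunk_size]
                for i in range(0, len(self.text), self.chunk_size)]

class Folder(DocNode):
    def __init__(self, children: list[DocNode]):
        self.children = children

    def chunks(self) -> list[str]:
        # Recursive: a folder's chunks are all of its children's chunks
        return [c for child in self.children for c in child.chunks()]

tree = Folder([Document("a" * 30), Folder([Document("b" * 45)])])
print(len(tree.chunks()))  # 5: two chunks from the first doc, three from the nested one
```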
Flyweight
Share embedding model instances and vector DB connections across retrievers. Avoid loading the same expensive resources multiple times.
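One way to sketch this is with functools.lru_cache as the flyweight factory; the Embedder class and model name here are placeholders for a real, expensive model load:

```python
from functools import lru_cache

class Embedder:
    """Stands in for an expensive resource (model weights, GPU memory, etc.)."""
    def __init__(self, model_name: str):
        self.model_name = model_name

@lru_cache(maxsize=None)
def get_embedder(model_name: str) -> Embedder:
    """Flyweight factory: one shared Embedder per model name."""
    return Embedder(model_name)

a = get_embedder("all-MiniLM-L6-v2")
b = get_embedder("all-MiniLM-L6-v2")
print(a is b)  # True - both retrievers share the same instance
```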
Bridge
Separate retrieval abstraction from implementation. Switch between Pinecone, ChromaDB, or Weaviate without changing retriever logic.
Template Method
Define the RAG pipeline skeleton: chunk → embed → store → retrieve → generate. Let subclasses customize specific steps.
from abc import ABC, abstractmethod
from dataclasses import dataclass

@dataclass
class Document:
    """Minimal retrieved-document record."""
    content: str

class RAGPipeline(ABC):
    """Template for RAG pipelines - subclasses customize steps"""

    async def process_query(self, query: str) -> str:
        """Template method - defines the algorithm structure"""
        expanded = await self._expand_query(query)
        docs = await self._retrieve(expanded)
        ranked = await self._rerank(query, docs)
        context = self._build_context(ranked)
        response = await self._generate(query, context)
        return response

    async def _expand_query(self, query: str) -> list[str]:
        """Default: no expansion. Override for HyDE, multi-query."""
        return [query]

    @abstractmethod
    async def _retrieve(self, queries: list[str]) -> list[Document]: ...

    async def _rerank(self, query: str, docs: list[Document]) -> list[Document]:
        """Default: no reranking. Override to add Cohere rerank."""
        return docs

    def _build_context(self, docs: list[Document]) -> str:
        return "\n\n".join(d.content for d in docs)

    @abstractmethod
    async def _generate(self, query: str, context: str) -> str: ...
class VectorStore(ABC):
    @abstractmethod
    async def search(self, query: str, k: int) -> list: ...

class PineconeStore(VectorStore):
    def __init__(self, embedder, index):
        self.embedder = embedder  # async embedding client
        self.index = index        # Pinecone index handle

    async def search(self, query: str, k: int) -> list:
        embedding = await self.embedder.embed(query)
        return self.index.query(embedding, top_k=k)

class ChromaStore(VectorStore):
    def __init__(self, collection):
        self.collection = collection  # ChromaDB collection (embeds internally)

    async def search(self, query: str, k: int) -> list:
        return self.collection.query(query_texts=[query], n_results=k)
Iterator Pattern for Document Processing
Memory Efficiency
When processing large document collections, use iterators to avoid loading
everything into memory. Process documents one at a time or in batches.
from pathlib import Path
from typing import Iterator

class ChunkIterator:
    """Iterator pattern for memory-efficient document processing"""

    def __init__(self, documents: list[Path], chunk_size: int = 500):
        self.documents = documents
        self.chunk_size = chunk_size
        self.doc_index = 0
        self.chunk_buffer: list[str] = []

    def __iter__(self) -> Iterator[str]:
        return self

    def __next__(self) -> str:
        # Refill the buffer from the next document whenever it runs dry
        while not self.chunk_buffer:
            if self.doc_index >= len(self.documents):
                raise StopIteration
            doc = self._load_document(self.documents[self.doc_index])
            self.chunk_buffer = self._chunk_document(doc)
            self.doc_index += 1
        return self.chunk_buffer.pop(0)

    def _load_document(self, path: Path) -> str:
        return path.read_text()

    def _chunk_document(self, text: str) -> list[str]:
        return [text[i:i + self.chunk_size]
                for i in range(0, len(text), self.chunk_size)]

async def ingest_documents(paths: list[Path], vector_store, embedder):
    # Only one document's chunks are held in memory at a time
    for chunk in ChunkIterator(paths):
        embedding = await embedder.embed(chunk)
        await vector_store.upsert(chunk, embedding)
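When an embedding API accepts batches, the same streaming approach still works: group the iterator's output into fixed-size batches with a small stdlib helper (the batch size here is illustrative):

```python
from itertools import islice
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")

def batched(items: Iterable[T], batch_size: int) -> Iterator[list[T]]:
    """Group any iterator's output into fixed-size batches without
    materializing the whole stream - handy for batch embedding calls."""
    iterator = iter(items)
    while batch := list(islice(iterator, batch_size)):
        yield batch

chunks = (f"chunk-{i}" for i in range(7))    # stands in for ChunkIterator output
print([len(b) for b in batched(chunks, 3)])  # [3, 3, 1]
```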
Vector Database Comparison
| Database | Best For | Scaling | Setup |
|----------|----------|---------|-------|
| Pinecone | Production, managed service | Automatic, serverless | pip install pinecone-client |
| ChromaDB | Local development, prototyping | Single machine | pip install chromadb |
| Weaviate | Hybrid search, GraphQL API | Kubernetes, cloud | Docker or Weaviate Cloud |
| Qdrant | Advanced filtering, Rust performance | Cluster mode | pip install qdrant-client |
| pgvector | Existing Postgres infrastructure | Postgres scaling | Postgres extension |
Hands-On Projects
- Build a basic RAG system with ChromaDB and sentence-transformers
- Implement the Template Method pattern for a configurable RAG pipeline
- Create a Bridge pattern to swap between Pinecone and ChromaDB
- Build a document iterator that processes 10GB of PDFs efficiently
- Implement HyDE (Hypothetical Document Embeddings) for query expansion
- Add Cohere reranking to improve retrieval quality
- Build a GraphRAG system with Neo4j and entity extraction
- Evaluate your RAG system with RAGAS metrics
Ready to Build Knowledge Systems?
Give your AI applications access to enterprise knowledge. Continue with
Fine-Tuning & Customization to create domain-specific models.
Enroll Now