Course Overview
Fine-tuning adapts pre-trained models to your specific domain. When RAG isn't enough—when you need the model to think differently, not just access different data—fine-tuning is the answer. This course covers modern parameter-efficient techniques that make training accessible.
Transfer Learning Basics
How pre-trained models encode knowledge and why fine-tuning works.
Dataset Preparation
Curating training data, formatting for instruction tuning, quality filtering.
LoRA & QLoRA
Parameter-efficient fine-tuning. Train billion-parameter models on consumer GPUs.
Training Infrastructure
Setting up training environments, distributed training, monitoring.
Evaluation & Iteration
Measuring fine-tuning success. Benchmarks, human evaluation, A/B testing.
Deployment & Serving
Quantization, model merging, and efficient inference for production.
When to Fine-Tune vs RAG
| Use Case | Best Approach | Why |
|---|---|---|
| Answer questions about documents | RAG | Model doesn't need to memorize, just reason |
| Generate code in company style | Fine-Tune | Style is embedded in weights, not retrievable |
| Domain-specific terminology | Both | Fine-tune for fluency, RAG for facts |
| Structured output formats | Fine-Tune | Consistent formatting requires training |
| Real-time information | RAG | Can't retrain for every update |
| Specific tone/personality | Fine-Tune | Voice emerges from training, not retrieval |
Design Patterns for Training
Training pipelines benefit from creational patterns that manage complex object construction and configuration, plus the behavioral Memento pattern for capturing and restoring training state.
Builder
Construct complex training configurations step by step. Model settings, LoRA config, training arguments—validated and assembled correctly.
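A minimal sketch of the Builder idea in plain Python. The class and field names here are hypothetical (not from any training library), and the model id is illustrative; the point is stepwise assembly with validation at `build()` time:

```python
from dataclasses import dataclass

@dataclass
class TrainingConfig:
    base_model: str
    lora_rank: int
    learning_rate: float
    epochs: int

class TrainingConfigBuilder:
    """Assemble a validated TrainingConfig step by step."""
    def __init__(self):
        self._base_model = None
        self._lora_rank = 8        # sensible defaults
        self._learning_rate = 2e-4
        self._epochs = 3

    def model(self, name):
        self._base_model = name
        return self                # return self to allow chaining

    def lora(self, rank):
        if rank <= 0:
            raise ValueError("LoRA rank must be positive")
        self._lora_rank = rank
        return self

    def training(self, lr, epochs):
        self._learning_rate = lr
        self._epochs = epochs
        return self

    def build(self):
        # Validation happens once, at assembly time.
        if self._base_model is None:
            raise ValueError("base model is required")
        return TrainingConfig(self._base_model, self._lora_rank,
                              self._learning_rate, self._epochs)

config = (TrainingConfigBuilder()
          .model("meta-llama/Llama-3.2-3B")
          .lora(rank=16)
          .training(lr=1e-4, epochs=2)
          .build())
```

Because each setter returns `self`, configurations read as a single fluent chain, and invalid combinations fail before any GPU time is spent.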
Abstract Factory
Create families of related objects: model + tokenizer + trainer for different base models (Llama, Mistral, Qwen).
Prototype
Clone training configurations for hyperparameter sweeps. Start from a working config and modify specific parameters.
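A Prototype sweep can be as simple as deep-copying a known-good config dict and varying one field per clone. The config keys and the alpha heuristic below are illustrative assumptions, not a fixed schema:

```python
import copy

# A working configuration serves as the prototype.
base_config = {
    "base_model": "meta-llama/Llama-3.2-3B",
    "lora": {"rank": 8, "alpha": 16, "dropout": 0.05},
    "learning_rate": 2e-4,
}

sweep = []
for rank in (4, 8, 16, 32):
    cfg = copy.deepcopy(base_config)   # clone nested structure, not just the top level
    cfg["lora"]["rank"] = rank
    cfg["lora"]["alpha"] = rank * 2    # common heuristic: alpha = 2 * rank
    sweep.append(cfg)
```

`copy.deepcopy` matters here: a shallow copy would share the nested `lora` dict, so every sweep member would silently end up with the last rank.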
Memento
Save training state for checkpointing and recovery. Resume from any point after crashes or preemption.
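A toy Memento, with a dict snapshot standing in for a real checkpoint file (real checkpoints would also persist optimizer and adapter weights to disk; names here are hypothetical):

```python
import copy

class TrainerState:
    """Originator: holds the mutable training state."""
    def __init__(self):
        self.step = 0
        self.best_loss = float("inf")

    def save(self):
        # Memento: an opaque snapshot the caretaker stores but never edits.
        return copy.deepcopy(self.__dict__)

    def restore(self, memento):
        self.__dict__.update(copy.deepcopy(memento))

trainer = TrainerState()
checkpoints = []  # caretaker: keeps mementos, knows nothing of their contents

for step in range(1, 6):
    trainer.step = step
    trainer.best_loss = min(trainer.best_loss, 1.0 / step)
    if step % 2 == 0:
        checkpoints.append(trainer.save())

# Simulate a crash after step 5, then resume from the last checkpoint (step 4).
trainer.restore(checkpoints[-1])
```

The caretaker/originator split is what makes recovery clean: checkpoint storage can rotate, upload, or prune snapshots without ever understanding their internals.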
LoRA Fine-Tuning Example
Why LoRA?
LoRA (Low-Rank Adaptation) freezes the base model and trains small adapter matrices. Full fine-tuning of a 7B model needs memory for weights, gradients, and optimizer states, far beyond a single consumer card; with LoRA, only the frozen 16-bit base (~14GB for 7B) plus tiny adapters must fit, which works on a 24GB GPU. QLoRA adds 4-bit quantization of the base model, enabling training on 8GB GPUs.
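The savings follow from simple arithmetic. LoRA replaces the weight update ΔW (a `d_out × d_in` matrix) with the product `B @ A`, where `B` is `d_out × r` and `A` is `r × d_in` for a small rank `r`. A back-of-envelope count for one 4096×4096 attention projection (dimensions chosen for illustration):

```python
def lora_trainable_params(d_in, d_out, rank):
    # B has d_out * rank entries, A has rank * d_in entries.
    return rank * (d_in + d_out)

full = 4096 * 4096                          # full update: 16,777,216 params
lora = lora_trainable_params(4096, 4096, 8) # rank-8 adapter: 65,536 params
print(full // lora)                         # adapter is 256x smaller here
```

Summed across all adapted layers, trainable parameters typically drop from billions to a few million, which is why gradients and optimizer states fit in consumer VRAM.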
Hands-On Projects
- Prepare a dataset in Alpaca format from your domain documents
- Implement the Builder pattern for training configuration
- Fine-tune Llama 3.2 3B with LoRA on a single GPU
- Use the Prototype pattern to run hyperparameter sweeps
- Implement checkpointing with the Memento pattern
- Merge LoRA adapter weights back into the base model
- Quantize to GGUF for local deployment with llama.cpp
- Evaluate with task-specific benchmarks and A/B testing
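The Alpaca format used in the first project is a flat JSON structure with three fields per example; a minimal record looks like this (the field contents are illustrative):

```python
import json

record = {
    "instruction": "Summarize the following support ticket in one sentence.",
    "input": "Customer reports login failures after upgrading to version 2.3.",
    "output": "A customer cannot log in since the v2.3 upgrade.",
}

# Datasets are typically a JSON array (or JSONL file) of such records;
# "input" may be an empty string when the instruction stands alone.
print(json.dumps(record, indent=2))
```

Keeping every example in this uniform shape is what lets a single prompt template turn the whole dataset into training text.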
GPU Requirements
This course requires GPU access. QLoRA can train 7B models on 8GB VRAM, but 24GB+ is recommended for comfortable iteration. Cloud options: RunPod, Lambda Labs, or Colab Pro.
Ready to Train Custom Models?
Build AI that thinks like your domain expert. Continue with Enterprise AI Strategy to learn how to deploy and govern AI at scale.
Enroll Now