I got fed up with the training process that makes AI models forget everything they know just to remember my birthday. The problem? Overfitting. It's like giving AI Alzheimer's - the model "unlearns" how to be smart just to memorize one stupid fact.
My blood pressure spiked every time I watched a model lose its intelligence for a pizza preference. The solution? A hybrid architecture that separates personality from memory. Fine-tune the soul with LoRA, store the facts in ChromaDB.
Here's how I'm building my digital soul without destroying its intelligence - using Llama 4 Scout Instruct, ChromaDB vector databases, local fine-tuning, and my multi-rig GPU farm.
Note: Evolution of My Replicant //
My first model was trained using Mistral-7B-Instruct-v0.3. I'm currently working on shifting
to Llama 4 Scout Instruct for reasons I'll explain below - primarily the 10-million token
context window, native multimodality, and Mixture-of-Experts architecture. I'm also planning to
use a 4D vector embedding database structure that includes time vectors for semantic search
across temporal memories.
The setup will use local ChromaDB with synchronization to Google's Vertex AI Embeddings API for
Firebase compatibility. You can read about my earlier work with Mistral on my
Substack - this is the next phase in
the journey.
The Reality: Fine-tuning isn't brain surgery. It's adding a personality filter. But if you train
too hard, the filter becomes a straitjacket.
The model forgets how to code, how to think, how to be a genius - all because you forced it to
remember your pizza preference 10,000 times.
When Training Becomes Brainwashing
I spent hours debating this with myself. The training process is supposed to teach the model who I am. But every tutorial tells you to run more epochs.
More epochs means more learning, right?
Wrong.
If I run 10,000 epochs to teach the model my favorite food, I'm not teaching it a fact - I'm training the math to believe that every answer should involve pizza. The model loses its general intelligence. It forgets how to code.
It forgets how to solve problems. It becomes a parrot that only knows how to repeat what I told it.
This is overfitting, and the knowledge loss that comes with it has its own name: catastrophic forgetting. It's like giving AI Alzheimer's - the mathematical weights that held other knowledge get overwritten. The model doesn't "choose" to forget.
The new math literally erases the old math because it's trying to satisfy my 10,000-epoch demand.
I refuse to build a lobotomized AI.
The Problem //
Overfitting is real. When you over-train a model on one specific thing, the mathematical weights
that held other information get literally overwritten. The model doesn't just "learn" your
birthday - it "unlearns" how to be a genius.
But I needed to upgrade. I looked at the options. Llama 3.1. Mistral 3. DeepSeek V3.2.
I'm going with Llama 4 Scout Instruct - the 109B-parameter Mixture-of-Experts model that activates only 17B parameters per token. I'm currently building this new version.
Here's why this matters for a digital replicant:
10-million token context window: I could feed it my entire life's history in one conversation
Native multimodality: It understands images, not just text
Efficiency: With my multi-rig setup, this runs fast - distributed training across rigs when needed
Instruct foundation: It already knows how to have a conversation - I'm just adding my personality, not teaching it basic social skills
Better for semantic memory: My bet is that the MoE architecture handles complex temporal relationships better than a dense model of similar cost - something I still have to test
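To put the 109B / 17B split in hardware terms, here is a back-of-envelope sketch. It counts weight storage only and ignores KV cache, activations, and framework overhead. The bytes-per-parameter values are the usual sizes for bf16, 8-bit, and 4-bit storage, not measurements from these rigs.

```python
# Back-of-envelope weight memory for a Mixture-of-Experts model.
# All experts have to be resident in memory, even though only a
# fraction of the parameters is used for any single token.

TOTAL_PARAMS = 109e9   # total parameters (figure from the article)
ACTIVE_PARAMS = 17e9   # parameters active per token (figure from the article)

# Bytes per parameter for common storage formats.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: float, fmt: str) -> float:
    """Weight storage in GB (1 GB = 1e9 bytes) for the given format."""
    return params * BYTES_PER_PARAM[fmt] / 1e9


active_share = ACTIVE_PARAMS / TOTAL_PARAMS

print(f"active share per token: {active_share:.1%}")
for fmt in BYTES_PER_PARAM:
    print(f"{fmt}: {weight_gb(TOTAL_PARAMS, fmt):.1f} GB of weights")
```

The active share drives per-token compute, while the total parameter count drives how much memory the weights need.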
>
The Strategy: Use a model that already knows how to talk. Build on the Instruct foundation. Add
personality, don't teach etiquette.
That's the difference between fine-tuning and starting from scratch.
The Soul: LoRA Fine-Tuning
Personality, tone, values, and how I express myself. This gets baked into the model with
Low-Rank Adaptation - a filter that doesn't touch the core brain.
The Library: ChromaDB RAG
Birthdays, memories, specific facts, and personal history. This lives in a vector database
that the model queries before answering - perfect memory without corrupting the brain.
Building My Digital Soul: The Two-Brain Solution
I realized I needed to treat the AI's brain like a dual-process system. Not everything belongs in the model weights.
The Soul: LoRA Fine-Tuning
This is where personality lives. How I speak. My sarcasm.
My sentence structure. My values. My general vibe.
I'm using LoRA (Low-Rank Adaptation) - not full fine-tuning. LoRA doesn't touch the main brain. It freezes the original model and adds a tiny transparent layer on top.
Think of it like a camera filter - you don't grind the glass, you snap on a filter.
My settings:
Rank (r): 64 - High capacity for personality depth
Alpha: 128 - Stronger signal for behavior replication
Epochs: 1-3 only - I want it to mimic me, not memorize me
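The "filter" idea can be shown with a toy LoRA forward pass. This is a minimal sketch in plain Python, not the training code: the layer sizes and the toy rank are made up for readability, and only the alpha/r ratio of 2.0 matches the settings above. The frozen weight `W` is never modified; the adapter matrices `A` and `B` are the only trainable parts.

```python
# Toy LoRA forward pass: y = W x + (alpha / r) * B (A x)
import copy
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def lora_forward(W, A, B, x, alpha, r):
    """Frozen base output plus the scaled low-rank update."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]


random.seed(0)
d_in, d_out = 8, 8   # toy sizes, far smaller than a real layer
r, alpha = 2, 4      # toy rank; same alpha/r ratio (2.0) as r=64, alpha=128

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # frozen
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]   # trainable
B = [[0.0] * r for _ in range(d_out)]                                  # trainable, starts at zero

x = [1.0] * d_in
W_snapshot = copy.deepcopy(W)

# With B still at zero, the adapter leaves the base output unchanged.
untouched = lora_forward(W, A, B, x, alpha, r) == matvec(W, x)

# Stand-in for a training step: only the adapter changes, never W.
B[0][0] = 0.5
changed = lora_forward(W, A, B, x, alpha, r) != matvec(W, x)

print(untouched, changed, W == W_snapshot)
print("trainable:", r * (d_in + d_out), "frozen:", d_in * d_out)
```

Because `W` is never written to, dropping `A` and `B` gives back the original layer.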
Why LoRA? //
LoRA freezes the base model. The original Llama 4 weights stay 100% intact, so I can always detach
the adapter and get the untouched model back. I'll only be training about 1% of the parameters.
The adapter will be only 100-500MB. I can have 50 different personality adapters and swap them in
seconds while using the same base model.
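Adapter size follows from simple arithmetic: each adapted weight of shape (d_out, d_in) gains r * (d_in + d_out) trainable parameters. The sketch below is a rough estimate only. The layer count, the projection shapes, and the choice to adapt only the attention projections are illustrative assumptions, not values read from the Llama 4 Scout configuration.

```python
# Rough LoRA adapter size estimate.

R = 64               # rank from the article
BYTES_PER_PARAM = 2  # ASSUMPTION: adapter saved in 16-bit precision
NUM_LAYERS = 48      # ASSUMPTION: illustrative layer count

# ASSUMPTION: illustrative (d_in, d_out) shapes for the attention projections.
TARGET_SHAPES = {
    "q_proj": (5120, 5120),
    "k_proj": (5120, 1024),
    "v_proj": (5120, 1024),
    "o_proj": (5120, 5120),
}


def lora_params(rank, shapes, num_layers):
    """Total trainable LoRA parameters across all layers."""
    per_layer = sum(rank * (d_in + d_out) for d_in, d_out in shapes.values())
    return per_layer * num_layers


adapter_params = lora_params(R, TARGET_SHAPES, NUM_LAYERS)
adapter_mb = adapter_params * BYTES_PER_PARAM / 1e6
share_of_base = adapter_params / 109e9

print(f"{adapter_params:,} trainable parameters")
print(f"about {adapter_mb:.0f} MB on disk")
print(f"{share_of_base:.2%} of the 109B base")
```

Under these assumptions the adapter lands inside the 100-500MB range, and the trainable share is well below 1%. Adapting more modules, such as the feed-forward or expert layers, raises both numbers.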
The Library: ChromaDB Vector Database with 4D Temporal Embeddings
This is where facts live. Birthdays. Family names. Specific memories.
Personal experiences. But here's the key difference - I'm planning to use a 4D vector embedding structure that includes time vectors for semantic search across temporal memories.
Instead of forcing the model to "remember" my birthday in its weights, I give it a library card. The model queries ChromaDB before answering. It finds the relevant fact or memory - including temporal context - and "whispers" it to the smart brain right before it speaks.
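The "whisper" step is just prompt assembly: fetch the relevant facts, then place them in front of the question. Here is a minimal sketch using a toy word-overlap ranker as a stand-in for the vector database query. The function names and the prompt wording are hypothetical, and the two records reuse examples from this article.

```python
# Retrieval-augmented prompt assembly with a toy retriever.
import re

MEMORIES = [
    "I visited Paris in June 2019 and loved the croissants.",
    "My favorite food is pizza.",
]


def tokens(text):
    """Lowercased word set, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, memories, n_results=1):
    """Rank memories by word overlap with the question (toy stand-in
    for embedding similarity) and return the top matches."""
    q = tokens(question)
    ranked = sorted(memories, key=lambda m: len(q & tokens(m)), reverse=True)
    return ranked[:n_results]


def build_prompt(question, memories, n_results=1):
    """Put the retrieved facts ahead of the question."""
    facts = retrieve(question, memories, n_results)
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Use only the facts below when answering.\n"
        f"Facts:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


prompt = build_prompt("Where did I go in 2019?", MEMORIES)
print(prompt)
```

The fact never enters the model weights. It travels in the prompt, so correcting a memory means editing a record rather than retraining.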
Why ChromaDB over AnythingLLM?
Flexibility: I can build custom apps. AnythingLLM locks me into their interface
Control: It's just Python. I own the entire stack
Local-first: ChromaDB runs locally, and I plan to synchronize with Google's Vertex AI Embeddings API for Firebase compatibility
4D temporal structure: Time vectors allow the system to understand "when" things happened, not just "what" happened
Persistent: The database survives model upgrades. I can switch from Llama 4 to Llama 5 and keep all my memories
```python
import chromadb

# Create a local database
client = chromadb.PersistentClient(path="./chroma_db")

# Create a collection for personal memories
collection = client.get_or_create_collection(name="personal_memories")

# Add a memory
collection.add(
    documents=["I visited Paris in June 2019 and loved the croissants."],
    metadatas=[{"date": "2019-06-01", "location": "Paris"}],
    ids=["id1"]
)

# Query the memory
results = collection.query(
    query_texts=["Where did I go in 2019?"],
    n_results=1
)
print(results["documents"])
```
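One possible reading of the temporal idea is to restrict retrieval to a date window first, then rank what is left by relevance. The sketch below does that in plain Python as a stand-in for the database. The record layout and function names are hypothetical, and the second record is a placeholder rather than a real memory.

```python
# Time-aware retrieval sketch: filter by date window, then rank.
import re
from datetime import date

MEMORIES = [
    # This record comes from the article's own example.
    {"text": "I visited Paris in June 2019 and loved the croissants.",
     "date": date(2019, 6, 1)},
    # Placeholder record, included only to exercise the filter.
    {"text": "Placeholder memory from a later year.",
     "date": date(2022, 9, 10)},
]


def tokens(text):
    """Lowercased word set, used for a crude relevance score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def query_in_window(question, memories, start, end, n_results=1):
    """Return the best-matching memory texts with start <= date < end."""
    in_window = [m for m in memories if start <= m["date"] < end]
    q = tokens(question)
    ranked = sorted(in_window,
                    key=lambda m: len(q & tokens(m["text"])),
                    reverse=True)
    return [m["text"] for m in ranked[:n_results]]


hits = query_in_window("Where did I go?", MEMORIES,
                       date(2019, 1, 1), date(2020, 1, 1))
print(hits)
```

In ChromaDB the same window could be expressed by storing a numeric timestamp in each record's metadata and passing a `where` filter with comparison operators such as `$gte` and `$lt` to `query`.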
Platform Features
Why this hybrid approach works better than dumping everything into training data.
Prevents Overfitting
By separating personality from facts, the model keeps its general intelligence. No more AI
Alzheimer's from over-training.
Perfect Factual Accuracy
Birthdays and dates are looked up, not "remembered" through shaky math. As long as the right
record is retrieved, the answer comes from stored text instead of a guess.
Model Upgrade Path
When Llama 5 drops, I don't lose my memories. The ChromaDB folder stays unchanged. I just
point the new model at the old database.
Custom Application Building
With ChromaDB, I can build web apps, mobile interfaces, or custom replicant UIs. AnythingLLM
locks me into their chat window.
The Hard Truth: Epochs Aren't Magic
Here's what I learned the hard way. More epochs don't mean better learning. They mean more overfitting.
1 epoch: The model skims your personality. Gets the general vibe. Misses some details.
3 epochs: The model recognizes your style. Picks up the tone. Starts to sound like you.
100 epochs: The model memorizes your training data. Stops thinking. Becomes a parrot that only knows how to repeat what you told it.
I limit myself to 1-3 epochs. I want the AI to be inspired by my thoughts, not shackled by them.
I want it to keep its world-class Llama 4 intelligence while just adding my flavor on top.
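A common way to choose the epoch count from evidence is to track loss on a held-out validation set and stop when it stops improving. This is a minimal sketch of that standard technique, not a description of the actual training run. The function name and `patience` parameter are hypothetical, and the loss values are made up purely to exercise the function.

```python
def best_epoch(val_losses, patience=1):
    """Return the 1-based epoch with the lowest validation loss, giving up
    once the loss has failed to improve `patience` times in a row."""
    best_loss = float("inf")
    best = 0
    bad_epochs = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best, bad_epochs = loss, epoch, 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best


# Made-up losses, NOT measured results: falling at first, then rising
# as the model starts memorizing the training set.
example_losses = [1.90, 1.62, 1.55, 1.71, 1.98]
print(best_epoch(example_losses))
```

Rising validation loss while training loss keeps falling is the usual signal that extra epochs are buying memorization, not learning.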
Reality Check //
Quality beats quantity. Instead of repeating one fact 1,000 times, write 1,000 different ways that
fact might come up in conversation. Diversity is better than repetition.
Mix personal facts with general knowledge to keep the model's smart brain active.
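The advice above can be sketched as a small dataset-building step: drop exact repeats from the personal examples, then blend in general examples. The function name, the 30% default for general data, and the sample strings are all hypothetical choices for illustration.

```python
import random


def mix_dataset(personal, general, general_fraction=0.3, seed=0):
    """Deduplicate personal examples, then blend in general examples so
    they make up roughly `general_fraction` of the final set."""
    unique_personal = list(dict.fromkeys(personal))  # drops exact repeats, keeps order
    n_general = round(len(unique_personal) * general_fraction / (1 - general_fraction))
    rng = random.Random(seed)
    sampled_general = rng.sample(general, min(n_general, len(general)))
    mixed = unique_personal + sampled_general
    rng.shuffle(mixed)
    return mixed


# Hypothetical sample data: one repeated phrasing and two varied ones.
personal = [
    "Q: Dinner plans? A: Pizza, obviously.",
    "Q: Dinner plans? A: Pizza, obviously.",
    "Q: What should we order tonight? A: You already know it's pizza.",
    "Q: Any food you never get tired of? A: Pizza. Next question.",
]
general = [
    "Q: What does a for loop do? A: It repeats a block once per item.",
    "Q: What is 12 times 12? A: 144.",
    "Q: Name a sorting algorithm. A: Merge sort.",
]

mixed = mix_dataset(personal, general, general_fraction=0.25)
print(len(mixed), "examples")
```

Deduplicating first means a fact shows up through varied phrasings rather than raw repetition.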
The Local-Only Manifesto
I run everything locally. No cloud. No Google Colab. No Hugging Face Hub uploads.
Why?
Privacy: My thoughts stay in my house
Control: I own the entire stack
Speed: Six dedicated rigs - I don't need cloud GPUs
Immortality: My work survives even if the cloud providers shut down
My rig setup:
| Rig | GPU Setup | CPU |
| --- | --- | --- |
| Rig 1 | 2x NVIDIA A6000 | Intel i9-14900K |
| Rig 2 | 2x NVIDIA A6000 | Intel i9-14900K |
| Rig 3 | 2x NVIDIA A6000 | Intel i9-14900K |
| Rig 4 | 2x RTX 5090 (Blackwell) | Intel i9-14900K |
| Rig 5 | 2x RTX 5090 (Blackwell) | Intel i9-14900K |
| Rig 6 | 2x RTX 5090 (Blackwell) | Intel i9-14900K |
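The rig list translates into VRAM as follows. This is a quick sketch that assumes 48 GB per A6000 and 32 GB per RTX 5090, the published memory sizes for those cards. The dictionary layout is only for illustration.

```python
# VRAM per rig and in total, from the rig list above.
VRAM_GB = {"A6000": 48, "RTX 5090": 32}  # ASSUMPTION: published card specs

RIGS = {
    "Rig 1": ("A6000", 2),
    "Rig 2": ("A6000", 2),
    "Rig 3": ("A6000", 2),
    "Rig 4": ("RTX 5090", 2),
    "Rig 5": ("RTX 5090", 2),
    "Rig 6": ("RTX 5090", 2),
}


def rig_vram(gpu, count):
    """Total VRAM in GB for one rig."""
    return VRAM_GB[gpu] * count


per_rig = {name: rig_vram(gpu, n) for name, (gpu, n) in RIGS.items()}
total = sum(per_rig.values())

for name, gb in per_rig.items():
    print(f"{name}: {gb} GB")
print(f"all rigs: {total} GB")
```

The total is a sum across separate machines, not one pool. A model larger than a single rig's VRAM has to be split across rigs by the training or inference framework.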
My software setup (in progress):
Base Model: Currently upgrading to Llama 4 Scout Instruct on local storage (from Mistral-7B-Instruct)
Training: Unsloth framework with offline mode
Database: ChromaDB on local storage - planning 4D temporal embeddings
Embeddings: Local ChromaDB - will synchronize with Google's Vertex AI Embeddings API for Firebase compatibility
Environment: HF_DATASETS_OFFLINE=1 and TRANSFORMERS_OFFLINE=1
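The two offline flags can be set from Python, provided this happens before the Hugging Face libraries are imported. A minimal sketch follows. `HF_HUB_OFFLINE` is an extra flag that is not in the list above, added here as an optional assumption.

```python
import os

# Set these BEFORE importing transformers or datasets, so the libraries
# see them when they first read their configuration.
os.environ["HF_DATASETS_OFFLINE"] = "1"
os.environ["TRANSFORMERS_OFFLINE"] = "1"

# Optional extra (not in the list above): also blocks Hugging Face Hub calls.
os.environ["HF_HUB_OFFLINE"] = "1"

for name in ("HF_DATASETS_OFFLINE", "TRANSFORMERS_OFFLINE", "HF_HUB_OFFLINE"):
    print(f"{name}={os.environ[name]}")
```

With these set, a model path that is missing from local storage fails immediately instead of triggering a download.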
The Takeaway: Your AI stays world-class smart because its brain isn't "Alzheimer-fied" with too
much data. It has perfect, infinite memory because it's reading from local storage, not trying to
remember through shaky math. That's how you build a replicant that reflects you without destroying
itself.
Current Progress
I'm currently building the Llama 4 Scout Instruct version. The architecture is designed - 2 epochs planned, Rank 64, Alpha 128. The goal is to keep the model's intelligence while making it sound like me.
It shouldn't forget how to code. It shouldn't forget how to solve problems. It shouldn't become a parrot.
The ChromaDB setup is in progress. I'm planning to prime it with memories. Birthdays. Family facts.
Core preferences. The vision is that when I ask about my mom's birthday, it will look it up and respond in my voice.
The soul will come from the training. The facts will come from the database. No lobotomy.
No Alzheimer's. Just a digital replicant that reflects me without destroying itself.
Work in Progress //
The hybrid architecture is the plan. Personality will live in LoRA adapters. Facts will live in
ChromaDB.
The model should stay smart. The memories should stay accurate. This is how I'm building a
world-class replicant without breaking the brain.
I'm still iterating. Still learning. Still building.
But I believe I've solved the fundamental problem - how to teach an AI who I am without making it forget how to be smart.
The foundation is designed. The architecture is planned. Once I finish the Llama 4 Scout build, I can start adding more memories, more personality, more depth.
But I'll never run 10,000 epochs. I'll never force-feed the model my entire life story. I'll keep the soul separate from the library.
That's how real replicants get built.