My Road to i80

The Spark
I love to cook. Not by the book, but with intuition, memory, and whatever ingredients are at hand. Over the years, I've built a rich personal archive: dishes invented on the fly, surprising flavor combinations, and moments of improvisation captured in photos, notes, and conversations.
Then, in January 2023, I encountered ChatGPT. I was blown away by its ability to answer questions and generate ideas. I'd ask about cuisines, techniques, or flavor pairings, and the responses were impressive. But when I asked, "What did I cook that night with salmon and miso?", the reply was: "I don't have access to your personal cooking history." For all their brilliance, these large language models (LLMs) didn't know my stories. They couldn't access the unique experiences, preferences, and memories that make me who I am.
The Turning Point
That's when it hit me: What if I could build an AI that feels like an LLM but knows what I know? Not a generic model trained on the internet, but a personal system capturing my cooking stories, my thought processes, my unique way of seeing the world. I imagined an AI that could answer, "What would I have cooked with those ingredients?" or even "How would I have approached that problem?" - not based on generic data, but on my own lived experience.
This led me to Retrieval-Augmented Generation (RAG), a technology that combines the conversational fluency of LLMs with a structured knowledge base tailored to specific domains. I started building a personal knowledge base - not of recipes, but of my cooking stories, complete with ingredients, intentions, and moments of inspiration. This system wouldn't just recite facts; it would reflect my style, my creativity, my voice.
That's also where the name "i80" comes from - inspired by the interstate highway built for clarity, structure, and speed. Just like that road, this system is meant to navigate complexity with confidence. No guesswork. No detours. Just grounded, context-aware answers that get you where you need to go.
My vision goes beyond cooking. It's about creating an AI that acts like an extension of myself - one that preserves my memories and thought patterns for future reflection, or for others to explore. It's a step toward digital continuity, a way to keep my perspective alive. And this approach isn't just personal. The same technology can empower organizations - hotel guest services, HR help desks, new-employee onboarding assistants, and many others - where public LLMs fall short due to a lack of domain-specific, trustworthy knowledge.
The Challenge Ahead
Building an AI that knows you like you know yourself is no small task. It requires blending the expressive power of LLMs with the precision of curated data. But that's the road I'm on: a path to an AI that doesn't just answer, but remembers, reflects, and resonates.
I don't know how far I'll go, or exactly what the final destination might look like. But that's part of the journey. What started as a spark - an idea to build an AI that truly understands me - has become a deeply personal exploration of memory, language, and technology.
A Living Journal
This website will document my road to i80. Through this space, I'll share what I'm learning along the way - the breakthroughs and the dead ends, the tools and techniques, the insights that emerge when intuition meets iteration. Whether it leads to a fully functioning personal AI, a new kind of storytelling engine, or something I haven't imagined yet, I'm here to explore it - and you're welcome to follow along.
- Alex P. Wang
October 28, 2024
How I Got Started
I jumped in after the idea struck me - despite not knowing much about large language models.
Back in graduate school, I studied expert systems, AI, and neural networks, but LLMs were a whole new world.
I knew I'd need Python, but at that point, I hadn't written a single line of it.
Thankfully, there's an abundance of resources online - and even better, we now have LLMs to help along the way. It didn't take long for me to set up my initial Python environment using VS Code. And just like that, I was on my way. (Set Up Your Python App Environment)
At first, I tried downloading open-source LLMs and fine-tuning them with my own knowledge base. That quickly turned into a dead end. Fine-tuning is resource-intensive, hard to iterate, and - most importantly - poorly suited for domain knowledge that changes frequently or needs precise control. It simply wasn't the right tool for a task where accuracy, flexibility, and explainability matter. Eventually, I came across the concept of Retrieval-Augmented Generation (RAG), and everything started to make more sense.
How It Works
I think the best way to explain how a Retrieval-Augmented Generation (RAG) system works is with an analogy - how a human answers a question.
Imagine you're answering a question from your friend. First, you think back through your memory to find anything relevant - that's retrieval. Then, based on what you remember, your brain puts the answer together in your own words - that's generation.
That's essentially what a RAG system does. It has a curated knowledge base (its memory). The retriever pulls in the most relevant information from the knowledge base, and the orchestrator - like your brain - decides how to respond: either directly or by calling on an LLM to help craft a clear, conversational answer.
- Curate a knowledge base (memory) - Use your private, domain-specific content - facts, stories, documentation - to build a structured foundation.
- Retrieve relevant content (retrieval) - When a question is asked, the retriever searches the knowledge base for snippets that are semantically related and have high similarity scores.
- Orchestrate a response (brain):
- If the similarity score is high and the answer is straightforward, it returns the response directly.
- If the similarity is lower or the question is more nuanced, it sends the retrieved content to the LLM - along with clear instructions to stay grounded in the facts and avoid hallucination.
The result is an AI that answers more like a well-informed person: thoughtful, relevant, and context-aware.
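To make that concrete, here's a minimal sketch of the retrieve-then-orchestrate loop in Python, using ChromaDB as the vector store. The sample stories, the distance threshold, and the call_llm stub are placeholders for illustration - not my exact production code.

```python
import chromadb

# Memory: a vector database holding curated stories.
client = chromadb.Client()  # in-memory; a persistent client works the same way
kb = client.create_collection(name="cooking_stories")
kb.add(
    ids=["story-001", "story-002"],
    documents=[
        "That night I improvised a miso-glazed salmon with mirin and honey.",
        "A rainy Sunday risotto with leftover roast chicken and lemon zest.",
    ],
)

def call_llm(prompt: str) -> str:
    """Placeholder - wire in your LLM client of choice here."""
    raise NotImplementedError

# Retrieval: find the entries closest in meaning to the question.
question = "What did I cook that night with salmon and miso?"
hits = kb.query(query_texts=[question], n_results=2)
best_doc = hits["documents"][0][0]
best_dist = hits["distances"][0][0]  # smaller distance = closer match

# Orchestration: answer directly on a strong match; otherwise ask the LLM
# to compose an answer grounded in the retrieved content.
DISTANCE_THRESHOLD = 0.5  # illustrative value; tuning this is a topic of its own
if best_dist <= DISTANCE_THRESHOLD:
    answer = best_doc
else:
    answer = call_llm(
        f"Answer using ONLY this context:\n{best_doc}\n\nQuestion: {question}"
    )
```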
Key Building Blocks
It didn't take me long to piece together the basic building blocks and get things up and running. At the heart of what I built is a modular system - one that reflects how we, as humans, think and respond. I set up a curated knowledge base to serve as memory. A retriever fetches the most relevant information based on your question. An orchestrator - a component I tuned carefully - decides whether to respond directly or synthesize information from multiple sources with the help of an LLM.
Once I had these core parts working together, things really started to click. The setup gave me a solid foundation to start experimenting - exactly the kind of system I had envisioned from the very beginning.
Knowledge Base
A structured memory system built from your content. Text is transformed into semantic vectors and stored in a vector database for fast, meaningful retrieval.
Retriever
Finds the most relevant entries from the knowledge base by comparing the question to stored vectors using similarity scores.
Orchestrator
Decides how to respond - sometimes returning retrieved content directly, or invoking the LLM for synthesis when needed.
LLM
Produces answers in natural language, using the retrieved content as context for thoughtful, accurate responses.
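To give a feel for the modularity, here's roughly how those pieces fit together as interfaces. The names, signatures, and threshold below are my own illustration, not a fixed API:

```python
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, question: str, k: int = 3) -> list[tuple[str, float]]:
        """Return (snippet, similarity) pairs from the knowledge base."""

class Generator(Protocol):
    def generate(self, question: str, context: list[str]) -> str:
        """Compose a natural-language answer grounded in the context."""

class Orchestrator:
    """Routes each question: answer directly, or synthesize via the LLM."""

    def __init__(self, retriever: Retriever, llm: Generator, threshold: float = 0.8):
        self.retriever = retriever
        self.llm = llm
        self.threshold = threshold

    def answer(self, question: str) -> str:
        results = self.retriever.retrieve(question)
        snippets = [text for text, _ in results]
        top_score = results[0][1] if results else 0.0
        if top_score >= self.threshold:
            return snippets[0]                        # confident, direct answer
        return self.llm.generate(question, snippets)  # grounded synthesis
```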
Key Challenges
The concept sounded simple - until I actually started building and testing my initial knowledge base. One of the first things I had to understand was how the system makes sense of conversational language. It does this through a technique called embeddings. In simple terms, embeddings convert text into numbers (technically, vectors) - representations that capture meaning, not just literal words. This allows the system to compare concepts and retrieve content that's semantically relevant to the question being asked, even when the wording or language differs.
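Here's a small, self-contained example of what embeddings buy you, using OpenAI's embeddings API (the model choice and sample texts are just illustrative):

```python
import numpy as np
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    """Turn each text into a semantic vector."""
    resp = client.embeddings.create(model="text-embedding-3-large", input=texts)
    return np.array([d.embedding for d in resp.data])

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

vecs = embed([
    "What did I cook with salmon and miso?",
    "A weeknight miso-glazed salmon improvisation",
    "How to patch a bicycle tire",
])

# Meaning, not wording: the two salmon texts score high despite sharing
# few literal words; the bicycle text scores low.
print(cosine(vecs[0], vecs[1]))  # high similarity
print(cosine(vecs[0], vecs[2]))  # low similarity
```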
With that foundation in place, a new challenge emerged: what content should I embed? At first, I followed the conventional method of breaking documents into evenly sized chunks. But in practice, this approach often missed the essence of what users were really asking. What turned out to be far more effective - especially in a focused domain - was creating query-focused embeddings that align more closely with the actual questions people tend to ask. That shift significantly improved the system's performance and reliability, and it became the foundation of my solution. Of course, embeddings are just one part of the puzzle - many other challenges remain.
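To show the difference between the two strategies, here's a side-by-side sketch with made-up data: chunk-based embedding stores slices of raw text, while query-focused embedding stores the questions people actually ask, with the answer carried along as metadata.

```python
import chromadb

client = chromadb.Client()
story = ("That night I improvised a miso-glazed salmon: white miso, mirin, "
         "a little honey, broiled until the glaze caramelized.")

# Chunk-based: embed fixed-size slices of the raw document.
chunks = [story[i:i + 80] for i in range(0, len(story), 80)]
chunk_kb = client.create_collection(name="chunks")
chunk_kb.add(ids=[f"chunk-{i}" for i in range(len(chunks))], documents=chunks)

# Query-focused: embed the question itself; matching becomes
# question-to-question, a much tighter semantic target.
qa_kb = client.create_collection(name="qa")
qa_kb.add(
    ids=["qa-0"],
    documents=["What did I cook that night with salmon and miso?"],
    metadatas=[{"answer": story}],
)

hits = qa_kb.query(query_texts=["what was that miso salmon dish?"], n_results=1)
print(hits["metadatas"][0][0]["answer"])
```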
Embedding Quality
Poor or inconsistent embedding quality due to vague text, evolving language, or model drift. Embedding mismatches lead to retrieval errors - especially over time.
Query Understanding
Interpreting vague or conversational questions - especially when context is implied or missing. Handling multi-turn conversations or follow-ups that reference previous questions or answers.
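One common way to handle follow-ups - sketched here with a placeholder LLM call, not my exact implementation - is to rewrite the conversation into a standalone question before retrieval:

```python
def call_llm(prompt: str) -> str:
    """Placeholder - substitute your LLM client of choice."""
    raise NotImplementedError

def rewrite_query(history: list[str], follow_up: str) -> str:
    """Condense a follow-up into a standalone question before retrieval."""
    prompt = (
        "Rewrite the final question so it stands alone, resolving any "
        "references to the earlier conversation.\n\n"
        "Conversation:\n" + "\n".join(history) +
        f"\n\nFinal question: {follow_up}\n\nStandalone question:"
    )
    return call_llm(prompt)

# Example: after an exchange about the salmon dish, "How long did it take?"
# becomes something like "How long did the miso-glazed salmon take to cook?"
```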
Retrieval Accuracy
Ensuring the right pieces of information are found, ranked by relevance, and passed correctly to the language model.
Knowledge Base Coverage
Capturing enough structured, relevant knowledge to confidently answer the real-world questions users actually ask. Keeping the knowledge base up to date, pruning outdated info.
Multilingual Support
Accurately retrieving and generating across multiple languages. Embeddings and language models trained in one language often degrade in performance with others.
Hallucination Control
Making sure the model generates responses based only on trusted knowledge, avoiding plausible-sounding fabrications.
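One simple but effective control is the instruction wrapper placed around the retrieved content before it goes to the LLM. Mine looks roughly like this (wording simplified for illustration):

```python
GROUNDED_PROMPT = """\
You are answering on behalf of a private knowledge base.
Use ONLY the context below. If the context does not contain the answer,
reply exactly: "I don't have that in my knowledge base."
Do not add details that are not in the context.

Context:
{context}

Question: {question}
"""

prompt = GROUNDED_PROMPT.format(
    context="Miso-glazed salmon with mirin and honey, improvised one weeknight.",
    question="What did I cook with salmon and miso?",
)
```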
Research
Based on the challenges I've encountered while building private-domain RAG systems - from query understanding to embedding design - I've identified several key areas that deserve deeper exploration. This section is where I document that journey. I'm actively researching these topics through a mix of formal papers and shorter articles. In the papers, I explore core architectural questions, compare retrieval strategies, and evaluate embedding methods across different domains - including hotels and my cooking stories. The articles share practical insights, design choices, and lessons learned from building real systems. All of it is a work in progress - continuously updated as I discover new ideas, run new experiments, and refine the path forward.
Papers
Enhancing Query Retrieval Precision Through Optimized Embedding Text Selection
This paper analyzes how embedding construction affects semantic search precision in private-domain applications. It evaluates normalization, synonyms, typo handling, and conversational phrasing using text-embedding-3-large and ChromaDB, providing evidence-backed recommendations to improve retrieval accuracy.
Optimizing Retrieval in Private Knowledge Systems: A Comparison of Query-Focused and Chunk-Based Embedding Strategies
This paper compares query-focused embedding and chunk-based retrieval in private knowledge systems, using hotel data and cooking stories. It evaluates precision, latency, and efficiency, showing how query-focused methods can improve accuracy and reduce overhead in structured, domain-specific RAG applications.
Tracking Domain Coverage Growth in Narrow Knowledge Spaces
This paper presents a method for tracking how well a hotel-focused knowledge base grows to cover user queries. It introduces metrics to measure domain coverage, identify gaps, and evaluate how retrieval accuracy improves as new content is added over time.
Articles
Selecting the Best Embedding Method for Limited Domain Knowledge
This article compares chunk-based and query-focused embedding strategies for RAG systems in limited-domain settings. It outlines 15 chunking methods and highlights why query-based embedding offers greater precision and efficiency for structured knowledge bases.
Embedding with Targeted Language: Enhancing Intent Matching in Multilingual RAG Systems
This article explores techniques for improving intent matching in multilingual RAG systems by embedding content in the user's native language. It demonstrates how targeted-language embeddings increase retrieval accuracy across English, Chinese, and bilingual queries.
Learning from Queries: Automating Intent Expansion in RAG Systems
This article proposes a method for using real user queries to discover new intents and improve RAG coverage. It outlines a feedback loop where unmatched or ambiguous queries are clustered and used to suggest new embedding entries.
Precision Routing in Hybrid RAG: Balancing Retrieval, Generation, and Escalation
This article introduces a hybrid RAG architecture that routes queries to retrieval, generation, or escalation paths based on confidence and intent type. It aims to optimize system performance and user trust in private-domain assistants.
Calibrating Confidence in RAG: Threshold Tuning for Trustworthy Retrieval and Routing
This article investigates how to set and tune similarity thresholds in RAG systems to avoid hallucinations and misrouting. It provides practical guidelines for confidence-based decision making in domain-constrained environments.
Beyond Retrieval: Integrating RAG with APIs and Agent Actions in Private Domains
This article explores how RAG systems can be extended to trigger API calls or agent workflows. It demonstrates how retrieval results can serve as the decision layer in structured action pipelines for hotel, service, or task automation.
Want to Collaborate?
This is an ongoing personal research journey - not a finished product, but a path of discovery. There are many challenges ahead: from capturing intent to handling ambiguity, from structuring knowledge to scaling across subjects within a domain. I'm approaching it step by step, using a divide-and-conquer mindset to explore, test, and refine. Is the end goal too ambitious? Perhaps. But I'm confident that I'll make progress, uncover interesting insights, and grow along the way.
If you're working on similar problems or have ideas to share, I welcome collaboration and conversation. Let's learn from each other.
Want to collaborate or learn more?
Email me at alex@i80.com