Introducing: Building Agentic AI (Workflows, Fine-Tuning, Optimization, and Deployment)
My name is Sinan Ozdemir, and I have been writing about AI for over a decade. My new book, Building Agentic AI, launches in mid-December 2025 and focuses on the real implementation details of deploying and iterating on agentic AI. This blog series will give you the highlights of over a dozen detailed case studies from the book, each with working code, benchmarks, and lessons learned from extensive testing.
Alongside Julian, a friend and AI enthusiast, I will show you why LLMs forget things mid-conversation, why agents get stuck in endless loops, why your RAG system can't seem to retrieve the right document even when it seems "obvious" to you, and much more.
About the Book, “Building Agentic AI”
We don't work in the research lab of a frontier AI company making small changes that most people won't notice; we work in the research labs of startups and enterprise companies. Every case study and example is born from one of Sinan's real client projects or from the lectures he gives to working professionals up-leveling themselves in modern AI. The book is 300+ pages of ready-to-use code and hard-won insights from deploying AI systems that actually work in production.
The book takes you from basic LLM workflows to sophisticated multi-agent systems. For example:
Build a RAG-focused SQL agent that outperforms pure LLM prompting by at least 30%.
Create agents that follow company policies without hallucinating.
Deploy voice bots with sub-second latency.
Learn when not to use agents (spoiler: more often than you'd think).
Each chapter builds on real problems hit in production. Like when a customer support bot started making up refund policies, or when a "smart" agent burned through $500 in LLM usage and API calls solving a simple task that could have been handled for $20 and a little more foresight.
By the end of the book and this series, you will know exactly when to use workflows vs. agents, how to evaluate AI systems at scale, and why fine-tuning isn't always the answer, but when it is, you'll know how to do it right.
The journey ahead
There's a lot of ground to cover here: from basic RAG pipelines to computer-controlling agents. And every chapter will include code you can follow along with to understand why it works (or, in some cases, doesn't).
Here's a sneak peek at the case studies covered in this series:
Case Study 1: Text to SQL Workflow (LangGraph, RAG, document retrieval) Build a system that converts questions to SQL using RAG. Your SQL accuracy will beat raw LLMs by 30% at half the token cost.
LLM Evaluation (evaluating Case Study 1) An exploration of how well the SQL system we built actually works.
LLM Experimentation (high-level concepts and why you should care) Run systematic tests on prompts, models, and embedding models. Sometimes even the smallest changes can 2x your performance when you experiment efficiently.
Case Study 2: "Simple" Summary Prompt (embedding similarity, positional bias) Discover why LLMs favor content at the start and end of prompts. This bias is breaking your RAG system and chatbots, and you may not even know it.
Case Study 3: From RAG to Agents Convert a workflow into an agent that makes its own decisions using tools. Agents handle the weird edge cases your workflow never imagined, but at what cost (literally)?
AI Rubrics for Grading Create scoring systems to evaluate AI outputs consistently and with mitigated bias. Less arguing about quality; more clear, measurable criteria.
Case Study 4: AI SDR with MCP (multi-agent systems) Build multiple agents that research contacts and send emails. Your outreach can finally sound human at scale.
Case Study 5: Prompt Engineering Agents Create agents that follow company policies, using synthetic test data as a measuring stick. See how a single sentence in a prompt can move accuracy from 44% to 70%.
Case Study 6: Deep Research + Agentic Workflows Combine structured workflows with agent flexibility for research tasks. Get reliability without sacrificing adaptability.
Case Study 7: Image Retrieval Pipelines Build image search using CLIP-like embeddings and cosine similarity.
Case Study 8: Visual Q&A with Moondream Use a tiny (relatively) LLM called Moondream to answer questions about images.
Case Study 9: Coding Agents Build an agent that writes code to solve tasks more efficiently and accurately.
Case Study 10: Agentic Tool Selection Performance Test how well different LLMs choose the right tools. Tool order in prompts can shift accuracy by 40%.
Case Study 11: Benchmarking Reasoning Models Compare reasoning models like o1 and Claude against standard LLMs. They may even lose to cheaper models on real tasks; we'll see!
Case Study 12: Computer Use Build agents that control browsers and applications through screenshots. Your agent can finally use software you can’t API into.
Case Study 13: Classification vs Multiple Choice (fine-tuning) Compare fine-tuning against multiple choice prompting for classification. The winner might depend on whether you have 100 or 10,000 examples.
Case Study 14: Domain Adaptation (fine-tuning Qwen for Airbnb policies) Fine-tune Qwen on domain-specific documents. Generic models become experts in your exact business rules.
Case Study 15: Speculative Decoding with Qwen Speed up inference by having a small model draft for a large model. Same exact outputs, 2-3x faster, sometimes ;)
Case Study 16: Voice Bot - Need for Speed (Twilio + WebSockets + Groq) Even as newer voice-to-voice models mature, real-time voice bots built on streaming audio can still perform well, with sub-500ms responses that make conversations feel natural.
Case Study 17: Fine-Tuning Matryoshka Embeddings Train embeddings that work at multiple dimensions. Dynamically trade speed for accuracy based on each query’s needs.
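Several of these case studies (the text-to-SQL RAG workflow, image retrieval, Matryoshka embeddings) lean on the same primitive: ranking documents by cosine similarity between embedding vectors. As a taste of the follow-along style the book favors, here is a minimal sketch. The vectors and document names below are toy values made up for illustration; a real system would get its embeddings from an embedding model.

```python
import math

def cosine_similarity(a, b):
    # cos(theta) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec, doc_vecs, top_k=1):
    # Rank documents by similarity to the query, highest first.
    ranked = sorted(
        doc_vecs.items(),
        key=lambda kv: cosine_similarity(query_vec, kv[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]

# Toy 3-dimensional "embeddings" -- hypothetical documents for illustration.
docs = {
    "refund_policy": [0.9, 0.1, 0.0],
    "shipping_faq":  [0.1, 0.8, 0.1],
    "api_reference": [0.0, 0.2, 0.9],
}
query = [0.85, 0.15, 0.05]  # pretend this embeds "how do refunds work?"
print(retrieve(query, docs))  # -> ['refund_policy']
```

Everything else in a RAG pipeline, including chunking, embedding models, and reranking, is refinement on top of this ranking step, which is why so many of the case studies come back to it.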
Who we are
Sinan Ozdemir - I was building AI systems before Transformers were cool. I founded one of YC's first GenAI companies in 2015, patented AI agents in 2019, and have worked as a Director of Data Science and an AI consultant for several private and public companies. I previously taught AI and math at Johns Hopkins, and I've written a half dozen books, including The Principles of Data Science.
Julian Alvarado - AI enthusiast. Vibe marketer. Fun guy. I've been around the block over the last 10 years doing content and PMM work across the tech scene, from FinTech to Data & Analytics to even Crypto.
Next up: Building our first production-ready workflow
The next post covers the LLM concepts that make or break production systems: tokens, context windows, alignment, prompt ordering, workflows vs agents.
Let’s dive in →