AI Core

Fundamentals of AI models, architecture, and how they work

221 episodes · Page 2 of 10

#2693: When AI Ignores Your Style Guide

Why your AI ignores formatting instructions and how to fix it with pipeline architecture, not model swaps.

prompt-engineering · fine-tuning · ai-reasoning

#2692: Type Safety: Static vs Dynamic, Soundness & More

Static vs dynamic, strong vs weak, and the truth about TypeScript's unsoundness. A deep dive into type theory.

software-development · static-vs-dynamic-typing · type-soundness

#2684: When Agent Skills Collide: Context Windows & Plugin Design

How to handle overlapping agent skills and whether context windows will ever make the problem go away.

ai-agents · context-window · prompt-engineering

#2682: Live Retrieval vs. RAG: What an Agent Actually Does

Does every AI conversation create a tiny vector store? We unpack the real tradeoffs between live document fetching and pre-indexed RAG.

rag · ai-agents · vector-databases

#2676: Vector Database Schema Design for AI Memory Layers

Stop dumping vectors blindly. Design metadata schemas and namespaces for retrieval that actually works at scale.

vector-databases · rag · ai-memory

#2674: Why Your Agent's Context Window Is Getting Eaten Before You Start

Stop shipping the whole toolbox to every session. A bridge plugin pattern that fetches skills on demand instead.

context-window · ai-agents · prompt-engineering

#2673: The Embedding Coupling Problem: Editing Vector Stores

Can you edit or delete individual chunks in Pinecone? And can you actually back up a vector index? Yes—but with critical caveats.

vector-databases · rag · ai-agents

#2672: When a Startup Claims to Break the Quadratic Wall

A startup claims linear attention scaling at 12M tokens, beating GPT-5.5 on retrieval benchmarks.

large-language-models · context-window · benchmarks

#2664: Can You Trust an LLM's Raw Knowledge?

Why pre-trained knowledge isn't reliable for facts — and what actually makes models useful.

large-language-models · fine-tuning · rag

#2651: AI Training Itself: Student, Teacher, and Grader

Can models generate their own training data and judge their own outputs? The promise and pitfalls of fully AI-led pipelines.

large-language-models · ai-training · model-collapse

#2650: How to Catch an LLM's Bad Writing Habits

A practical guide to analyzing podcast transcripts for repetitive language and dialogue patterns — from Python word counts to embedding clustering.

large-language-models · prompt-engineering · fine-tuning

#2640: Why Instructional Models Beat Conversational for Batch AI

Beyond cheaper tokens—how batch inference changes AI workflows and why instructional models beat conversational ones for automated jobs.

llm-as-a-judge · batch-inference · instruction-following

#2639: The Hidden Layer That Makes Search Work

Why your search results miss the mark — and how cross-encoders fix it.

rag · search · information-retrieval

#2634: The Two-Stage Pipeline for Persistent User Memory

How to extract durable personal context from raw prompts and build a self-healing memory layer for AI systems.

ai-memory · context-window · prompt-engineering

#2622: How Transformers Actually Work: Attention, Tokens, and Context

How one architectural change unlocked chatbots, image generation, and protein folding — explained without the jargon.

transformers · large-language-models · gpu-acceleration

#2559: The Smartest Path to Python for AI

A practical guide to the best courses and platforms for learning Python, specifically for machine learning.

software-development · ai-training · python-for-ai

#2551: How Progressive Disclosure Saves MCP from Token Bloat

Why dumping all tool schemas into context breaks accuracy — and three implementations that fix it.

model-context-protocol · context-window · ai-agents

#2540: Does Your AI Framework Change the Output?

Same model, same prompts, different harness. Does the plumbing change the water?

ai-agents · prompt-engineering · agent-framework-comparison

#2517: How Unsloth Makes LLM Fine-Tuning 2x Faster

Unsloth cuts memory usage by 50-70% and speeds up training 2.2x for models like Llama 3 and Mistral.

fine-tuning · gpu-acceleration · open-source

#2516: Overfitting Is Not a Binary Condition

Overfitting isn't binary. Learn the real triggers, the bias-variance tradeoff, and modern techniques to prevent it.

fine-tuning · training-data · model-collapse

#2511: Measuring AI API Latency Through the Black Box

How to benchmark token throughput and debug slowdowns in closed CLI tools like Claude Code using OpenTelemetry and mitmproxy.

latency · api-integration · open-source

#2497: Tracing One Python Print Through 6 Abstraction Layers

What actually happens when you print "Hello" in Python? Six layers, 562 system calls, and a hardware-enforced kernel boundary.

operating-systems · software-development · hardware-engineering

#2495: How to Bake Personality Into an LLM in 15 Minutes

Fine-tune a model's personality with ~300 examples and a consumer GPU. SFT + DPO explained.

fine-tuning · small-language-models · gpu-acceleration

#2494: Active Prompt Engineering: Daniel's Diff-Based Loop

A deep dive into iterative prompt refinement using inter-iteration prediction change as an uncertainty signal.

prompt-engineering · active-learning · few-shot-learning