#gpu-acceleration
46 episodes
#2622: How Transformers Actually Work: Attention, Tokens, and Context
How one architectural change unlocked chatbots, image generation, and protein folding — explained without the jargon.
#2517: How Unsloth Makes LLM Fine-Tuning 2x Faster
Unsloth cuts memory usage by 50-70% and speeds up training 2.2x for models like Llama 3 and Mistral.
#2495: How to Bake Personality Into an LLM in 15 Minutes
Fine-tune a model's personality with ~300 examples and a consumer GPU. SFT + DPO explained.
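For a taste of the DPO half of that recipe, here is the per-pair DPO loss written out in plain Python. The log-probabilities and the beta value are made-up illustrations, not numbers from the episode; in practice they come from the model being tuned and a frozen reference model.

```python
import math

# Toy DPO (Direct Preference Optimization) loss for one preference pair.
# All log-probabilities below are illustrative stand-ins.
def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # Margin: how much more the policy prefers the chosen answer over the
    # rejected one, relative to the reference model's preferences.
    margin = beta * ((policy_logp_chosen - ref_logp_chosen)
                     - (policy_logp_rejected - ref_logp_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

# Example: the policy has drifted toward the chosen answer vs. the reference.
print(dpo_loss(policy_logp_chosen=-12.0, policy_logp_rejected=-15.0,
               ref_logp_chosen=-13.0, ref_logp_rejected=-14.0))
```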
#2464: Batch APIs: The 50% Discount You're Probably Misusing
Batch inference APIs offer 50% off — but only for the right workloads. Here's when they actually make sense.
#2456: Choosing Between AI Cloud Providers
A practical guide to choosing between Modal, RunPod, Nebius, and Baseten for AI workloads.
#2432: From RTL to GDSII: How Custom Silicon Is Designed
The economics and engineering of ASICs vs. CPUs and GPUs, from transistor placement to hyperscaler strategy.
#2431: The 3 Markets in an AI Trench Coat
GPUs, LPUs, and ASICs: why the best hardware for AI depends entirely on what you're trying to do.
#2376: Iran’s Crypto Sanctions Workaround
How Iran turns cheap electricity into cryptocurrency to bypass sanctions—and the tradeoffs of this digital alchemy.
#2177: Skip Fine-Tuning: Shape LLMs With Alignment Alone
Can you build a personalized LLM by skipping traditional fine-tuning and using only post-training alignment methods like DPO and GRPO? We break it down.
#2115: Why AI Answers Differ Even When You Ask Twice
You ask an AI the same question twice and get two different answers. It’s not a bug—it’s physics.
#2065: Why Run One AI When You Can Run Two?
Speculative decoding makes LLMs 2-3x faster with zero quality loss by using a small draft model to guess tokens that a large model verifies in parallel.
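A minimal sketch of that draft-and-verify loop, using toy stand-in functions over a ten-token vocabulary rather than real LLMs; in a real system the large model checks all drafted positions in one batched forward pass, which is where the speedup comes from.

```python
# Toy speculative decoding step: a cheap "draft" model proposes k tokens,
# the expensive "target" model confirms them, and the first disagreement
# is replaced by the target's own choice. Both models are dummy functions.

def draft_model(context, k=4):
    # Hypothetical fast model: greedily proposes the next k tokens.
    proposals = []
    for _ in range(k):
        nxt = (context[-1] + 1) % 10
        proposals.append(nxt)
        context = context + [nxt]
    return proposals

def target_model(context):
    # Hypothetical slow model: returns its own preferred next token.
    return min(context[-1] + 1, 5)

def speculative_step(context, k=4):
    accepted = []
    for tok in draft_model(context, k):
        if target_model(context + accepted) == tok:
            accepted.append(tok)                               # target agrees: keep the draft token
        else:
            accepted.append(target_model(context + accepted))  # disagreement: take the target's token, stop
            break
    return accepted

print(speculative_step([3]))  # -> [4, 5, 5]: several tokens committed per round
```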
#2063: That $500M Chatbot Is Just a Base Model
That polite chatbot? It started as a raw, chaotic autocomplete engine costing half a billion dollars to build.
#2017: That Q4_K_M Is Not a Cat Sneeze
Those cryptic letters on Hugging Face actually map how much brain power you trade for speed.
#1992: Israel's 4,000-GPU National Supercomputer
Israel is building a sovereign AI supercomputer with 4,000 Nvidia B200 GPUs to keep startups local.
#1940: Why Google's 31B Model Fits in Your GPU
Google just dropped Gemma 4, and its 31-billion-parameter size is a masterclass in hardware-aware AI design.
#1820: Renting vs. Owning GPUs: The Break-Even Math
Is it cheaper to rent serverless GPUs or buy your own hardware? We break down the math on utilization, depreciation, and hidden costs.
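A back-of-the-envelope version of that break-even math; every figure below (purchase price, rental rate, power cost, utilization) is an illustrative assumption, not a number from the episode.

```python
# Rent-vs-buy comparison for a single GPU. All prices are hypothetical.
purchase_price  = 30_000.0   # up-front cost of the card ($)
useful_life_yrs = 3          # depreciation horizon
power_cost_hr   = 0.30       # electricity + cooling per active hour ($)
rental_rate_hr  = 2.50       # serverless / on-demand price ($/hr)

def owning_cost_per_hour(active_hours_per_year):
    depreciation = purchase_price / (useful_life_yrs * active_hours_per_year)
    return depreciation + power_cost_hr

for utilization in (0.05, 0.25, 0.75):          # fraction of the year in use
    active = utilization * 365 * 24
    own = owning_cost_per_hour(active)
    print(f"{utilization:>4.0%} utilization: own ~${own:5.2f}/hr vs rent ${rental_rate_hr:.2f}/hr")
```

Under these assumptions, owning only wins once the card is busy most of the year; at low utilization the depreciation per active hour dwarfs the rental price.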
#1809: The TTS Developer's Dilemma: Size vs. Speed
Stop guessing. We break down the critical trade-offs between model size, latency, and sample rate for production-ready voice apps.
#1807: Why GPU Containers Force You to Build
Docker promised "run anywhere," but GPU images make you compile for hours. Here’s why the abstraction breaks down.
#1806: Why Mac Minis Are Eating AI's Hardware Race
Apple Silicon's unified memory is crushing traditional GPUs for local LLMs. Here's why the M4 Mac Mini is the new king of affordable AI hardware.
#1752: Whisper Small Beats Whisper Large in Speed & Accuracy
A 4-GPU benchmark on Ubuntu shows the 1.5B-parameter Whisper Large is slower and less accurate than the far smaller Whisper Small.
#1534: The Rise of the Agentic Terminal: Beyond the Command Line
Stop drowning in terminal tabs. Discover how tools like Zellij and Ghostty are transforming the command line into an Agentic Development Environment.
#1224: Cracking the CUDA Code: NVIDIA’s Software Dominance
Discover why NVIDIA’s CUDA is the oxygen of the AI industry and how tools like OpenAI’s Triton are finally challenging its 20-year software moat.
#1109: The T-FLOP Trap: Measuring the Power of Modern AI
Are teraflops the "horsepower" of AI, or just a marketing gimmick? Explore why raw compute speed isn't the whole story in the race for AI power.
#1102: Beyond the Boost: Mastering Modern GPU and RAM Tuning
Is manual hardware tuning still worth it? Discover why undervolting and curve optimization are the new secrets to peak PC performance.
#1081: The K-V Cache: Solving AI’s Invisible Memory Tax
Why does your AI get slower as you chat? Discover the K-V cache, the invisible bottleneck of generative AI, and how we're fixing it in 2026.
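A toy NumPy sketch of what that cache actually stores: each generated token appends one key and one value, which every later step reuses instead of recomputing. The weights and dimensions here are random stand-ins, not any real model.

```python
import numpy as np

# Single-head attention step with a KV cache, toy dimensions.
d = 8
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))

k_cache, v_cache = [], []   # grows by one entry per generated token

def attend(new_token_vec):
    # Only the newest token's key/value are computed; past ones are reused.
    k_cache.append(new_token_vec @ Wk)
    v_cache.append(new_token_vec @ Wv)
    q = new_token_vec @ Wq
    K = np.stack(k_cache)            # (seq_len, d), read straight from cache
    V = np.stack(v_cache)
    scores = K @ q / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V               # attention output for the new position

for _ in range(4):                   # each step reuses all previously cached K/V
    out = attend(rng.standard_normal(d))
print("cached entries:", len(k_cache))   # memory grows linearly with context length
```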
#1021: Python: The Accidental King of Artificial Intelligence
Why did a 1980s hobby project become the backbone of AI? Explore the history of Python and the chaos of modern dependency management.
#675: The Intelligence Factory: How AI is Rebuilding the Cloud
From liquid cooling to nuclear power, Herman and Corn explore how AI is transforming data centers into high-density "intelligence factories."
#663: Workstation vs. Consumer: The Real Cost of Power
Is a high-end desktop enough, or do you need a workstation? Herman and Corn break down the "three pillars" of professional hardware.
#633: Memory Wars: The Future of Local Agentic AI
Can your PC handle the next wave of AI agents? Herman and Corn dive into VRAM, quantization, and the future of running LLMs locally.
#484: The Silicon Sharing Economy: Inside Serverless GPUs
How do small teams run massive AI models without $50,000 chips? Corn and Herman dive into the hidden plumbing of serverless GPU providers.
#170: The Heavy Metal of Machine Learning: Inside PyTorch
Discover why PyTorch is the "oxygen" of AI. Herman and Corn explore its history, the magic of Autograd, and the move to the PyTorch Foundation.
#162: Beyond the Desktop: Defining the 2026 Workstation
Is your PC a workstation or just a fast desktop? Herman and Corn break down the hardware that defines professional computing in 2026.
#110: Building the Ultimate Local AI Inference Server
Learn how to build a high-performance local AI server for agentic coding, from dual-GPU PC builds to the power of Mac's unified memory.
#84: The Silicon Arms Race: Why GPUs are the New Oil
Are high-end microchips the new enriched uranium? Herman and Corn dive into the high-stakes world of GPU export bans and global AI supremacy.
#82: Why GPUs Are the Kings of the AI Revolution
From video game dragons to digital brains: Herman and Corn explain why your graphics card is the secret engine behind the AI boom.
#56: Building an AI Model from Scratch: The Hidden Costs
Building an AI model from scratch? It's a brutal reality of trillions of training tokens and millions of dollars in GPUs. Discover the hidden costs of modern AI.
#55: Running Video AI at Home: The Real Technical Challenge
Video AI: Hype vs. Reality. Can your GPU handle it? We dive into the technical challenges of running video AI at home.
#34: Red Team vs. Green: Local AI Hardware Wars
NVIDIA's CUDA rules AI, leaving AMD users battling a "green wall." Explore the hardware wars and thorny paths forward.
#31: ComfyUI: Power, Polish, & The AI Creator's Frontier
ComfyUI: Unlocking AI's true power, but is your rig ready? Dive into the future of digital artistry.
#27: AMD AI: Taming Environments with Conda & Docker
Tired of AI environment headaches on AMD? We demystify Conda, Docker, and host environments to unlock your GPU's full potential.
#25: GPU Brains: CUDA, ROCm, & The AI Software Stack
Unraveling how GPUs power AI. We dive into CUDA, ROCm, and the software stack that makes it all think.
#18: Beyond the GPU: Unpacking AI's Chip Revolution
Beyond the GPU: we're unpacking AI's chip revolution. Discover the crucial, often overlooked world of AI's fundamental building blocks.
#17: Cloud Render Superpowers: Local Edit, Remote Muscle
Unleash cloud superpowers! Edit locally, render remotely with AI-accelerated GPUs like NVIDIA A100s.
#12: The AI Breakthrough: Transformers & The Perfect Storm
AI's everywhere. How did chatbots, art, and video all emerge so suddenly? The secret lies in Transformers and a perfect storm.
#6: How To Fine Tune Whisper
Build your own AI transcription tool! We'll walk you through fine-tuning Whisper, from data to notebook.
#2: Local STT For AMD GPU Owners
AMD GPU? No problem! Dive into local AI adventures like on-device speech to text.