The Pipeline

Voice-to-Podcast Automation with AI

Last Updated: December 2025

Overview

My Weird Prompts transforms voice-recorded prompts into full podcast episodes with AI-generated dialogue, cover art, and automatic publishing. The pipeline uses Inworld AI TTS to create natural conversations between two AI hosts: Corn the Sloth and Herman the Donkey.

How It Works

Voice Input

Queue-Based Processing - Drop audio files into the processing queue

Record your question or prompt as a voice message
Place audio files in the prompts/to-process/ directory
Run the episode generator to process queued prompts
Supports MP3, WAV, and other common audio formats

Technology: Local filesystem queue with Python pipeline
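As a rough illustration of the queue step, the sketch below lists the audio files waiting in a queue directory, oldest first. The function name `queued_prompts` and the extension list are assumptions made for this example, not the project's actual code; only the prompts/to-process/ location and MP3/WAV support come from the description above.

```python
from pathlib import Path

# Assumed set of accepted extensions; the page only names MP3 and WAV explicitly.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def queued_prompts(queue_dir):
    """Return audio files waiting in the queue directory, oldest first."""
    queue = Path(queue_dir)
    if not queue.is_dir():
        return []
    files = [
        p for p in queue.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    ]
    # Sort by modification time, then by name so ties are deterministic.
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


if __name__ == "__main__":
    for prompt in queued_prompts("prompts/to-process"):
        print(prompt)
```

Sorting by modification time means prompts are handled in the order they were dropped into the queue, which is one reasonable choice among several.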

Processing

Transcription: Google Gemini 3 Flash transcribes the voice prompt
Metadata: Gemini 3 Flash generates episode metadata (title, description, tags)
Audio Processing: FFmpeg normalizes and prepares prompt audio
Format Conversion: Ensures compatible audio format for concatenation

Technology: Gemini 3 Flash for transcription and metadata, FFmpeg for audio processing
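The format conversion step could look something like the following sketch, which builds an FFmpeg command that resamples a prompt recording to a fixed sample rate and channel count so it can be concatenated later. The function names, the 16-bit WAV target, and the stereo default are assumptions for illustration; the page does not say which intermediate format the pipeline uses.

```python
import subprocess


def build_prepare_command(src, dst, sample_rate=44100, channels=2):
    """Build an FFmpeg command that converts src to uniform PCM audio."""
    return [
        "ffmpeg", "-y",           # overwrite the output file if it exists
        "-i", str(src),           # input prompt recording
        "-ar", str(sample_rate),  # output sample rate in Hz
        "-ac", str(channels),     # output channel count
        "-c:a", "pcm_s16le",      # 16-bit PCM (assumed intermediate codec)
        str(dst),
    ]


def prepare_prompt_audio(src, dst):
    """Run the conversion; requires ffmpeg on PATH."""
    subprocess.run(build_prepare_command(src, dst), check=True)


if __name__ == "__main__":
    print(" ".join(build_prepare_command("prompt.m4a", "prompt.wav")))
```

Keeping command construction separate from execution makes the command easy to inspect or log before anything is run.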

Generation

Research: Tavily provides research augmentation for episode content
Script Generation: Nano Banana creates dialogue script between Corn and Herman
Cover Art: Nano Banana Pro generates unique episode artwork (3 variants)
TTS Dialogue: Inworld AI TTS generates voice audio with character personalities

Technology: Tavily for research, Nano Banana for scripting, Nano Banana Pro for images, Inworld AI for TTS
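Before a two-host script can be voiced, it has to be broken into per-speaker turns. The sketch below shows one way to do that, assuming a hypothetical plain-text script format in which each turn starts with "Corn:" or "Herman:". The actual script format and the TTS request itself are not documented here, so both are left out.

```python
import re

SPEAKERS = ("Corn", "Herman")
# Matches lines such as "Corn: Hello there." (assumed script format).
TURN_RE = re.compile(r"^(%s):\s*(.*)$" % "|".join(SPEAKERS))


def split_turns(script):
    """Split a dialogue script into (speaker, text) tuples."""
    turns = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = TURN_RE.match(line)
        if match:
            turns.append([match.group(1), match.group(2)])
        elif turns:
            # A line without a speaker label continues the previous turn.
            turns[-1][1] += " " + line
    return [(speaker, text.strip()) for speaker, text in turns]


if __name__ == "__main__":
    demo = "Corn: Hello there.\nHerman: Hi Corn.\nGood to be here."
    for speaker, text in split_turns(demo):
        print(f"{speaker}: {text}")
```

Each tuple could then be sent to the TTS service with the voice that belongs to that character.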

Assembly

Combines intro jingle, disclaimer, user prompt, AI dialogue, and outro
Loudness normalization to -16 LUFS (a common loudness target for podcasts)
MP3 encoding at 192 kbps, 44.1 kHz

Technology: FFmpeg for audio assembly and normalization
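The assembly step can be sketched as a single FFmpeg invocation that concatenates the segments, normalizes loudness, and encodes the result. This is a minimal sketch under stated assumptions: the function name and segment file names are hypothetical, and single-pass `loudnorm` is used for brevity, while the pipeline itself may use a different normalization approach.

```python
import subprocess


def build_assembly_command(segments, output, lufs=-16, bitrate="192k", sample_rate=44100):
    """Build an FFmpeg command that joins segments and normalizes loudness."""
    if not segments:
        raise ValueError("at least one audio segment is required")
    cmd = ["ffmpeg", "-y"]
    for segment in segments:
        cmd += ["-i", str(segment)]
    # Feed every input's audio stream into the concat filter, then loudnorm.
    inputs = "".join(f"[{i}:a]" for i in range(len(segments)))
    graph = f"{inputs}concat=n={len(segments)}:v=0:a=1,loudnorm=I={lufs}[out]"
    cmd += [
        "-filter_complex", graph,
        "-map", "[out]",
        "-c:a", "libmp3lame",     # MP3 encoder
        "-b:a", bitrate,          # 192 kbps
        "-ar", str(sample_rate),  # 44.1 kHz
        str(output),
    ]
    return cmd


if __name__ == "__main__":
    # Hypothetical segment names in the order listed above.
    parts = ["intro.mp3", "disclaimer.mp3", "prompt.wav", "dialogue.mp3", "outro.mp3"]
    print(" ".join(build_assembly_command(parts, "episode.mp3")))
    # To actually run it: subprocess.run(build_assembly_command(parts, "episode.mp3"), check=True)
```

Using the concat filter rather than the concat demuxer lets the segments differ in codec and container, at the cost of re-encoding everything once.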

Publishing

CDN Upload: Audio and images uploaded to Cloudinary
Archive: Full episode backed up to Wasabi S3-compatible storage
Database: Metadata inserted into Neon PostgreSQL
Blog Post: Markdown file generated for Astro static site

Technology: Cloudinary CDN, Wasabi object storage, Neon PostgreSQL
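For the blog post step, a Markdown file with YAML frontmatter is the usual shape for an Astro content entry. The sketch below renders one from episode metadata. The frontmatter field names (`title`, `description`, `pubDate`, `tags`, `audioUrl`) and the function name are assumptions for illustration, since the site's real schema is not shown on this page.

```python
import json
from datetime import date


def render_post(title, description, tags, audio_url, pub_date):
    """Render a Markdown post with YAML frontmatter for one episode."""
    # json.dumps produces double-quoted strings and flow lists,
    # both of which are valid YAML and avoid manual escaping.
    lines = [
        "---",
        f"title: {json.dumps(title)}",
        f"description: {json.dumps(description)}",
        f"pubDate: {pub_date.isoformat()}",
        f"tags: {json.dumps(list(tags))}",
        f"audioUrl: {json.dumps(audio_url)}",
        "---",
        "",
        description,
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_post(
        "Example Episode",
        "A placeholder description.",
        ["ai", "podcast"],
        "https://example.com/episode.mp3",  # placeholder URL
        date(2025, 12, 1),
    ))
```

The returned string can be written into the site's content directory, after which the regular Git push picks it up.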

Deployment

Git push triggers automatic Vercel deployment
New episode goes live on website within minutes
RSS feed automatically updated for podcast apps

Technology: Vercel auto-deploy, Astro static site generator

Voice Capture

The pipeline processes voice prompts from a local queue. Record your question using any audio recording app and drop it into the processing queue.

Record Your Question

Use any voice recorder app to capture your prompt

Add to Queue

Place the audio file in the prompts/to-process/ directory

Run Generator

Execute the episode generation script to process the queue

Published Episode

Episode is automatically published and deployed to the website

Technology Stack

AI Services

Google Gemini 3 Flash (Transcription, Metadata)
Tavily (Research Augmentation)
Nano Banana (Episode Generation)
Nano Banana Pro (Cover Art)
Inworld AI TTS
Flux Schnell (via fal.ai)
Replicate (backup)

Input & Integration

Local Queue Processing
Python Pipeline
FFmpeg (Audio)

Storage

Cloudinary (CDN)
Wasabi S3 (Archive)
Neon PostgreSQL
GitHub (Source)

Deployment

Astro (Static Site)
Vercel (Hosting)
GitHub Actions

Episode Output

For each episode, the pipeline creates:

Final Audio: MP3 file with full podcast episode

Cover Art: 3 AI-generated cover image variants

Metadata: Title, description, tags, timestamps

Transcript: Full dialogue script

Blog Post: Markdown file for website

Cost Estimate

Service	Cost per Episode	Notes
Inworld AI TTS	~$0.30-0.40	15-minute episode
Image Generation	~$0.01-0.05	3 cover variants
Transcription	Minimal	Free tier
Storage	~$0.01	Wasabi + Cloudinary
Total per Episode	~$0.35-0.50	Approximate
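As a quick check on the table above, the line items can be summed directly. The sketch works in cents to avoid floating-point rounding and treats the "Minimal" transcription cost as zero, which is an assumption.

```python
# Per-episode cost ranges in US cents (low, high), copied from the table above.
# Transcription is listed as "Minimal" on a free tier; counting it as 0 is an assumption.
COSTS_CENTS = {
    "Inworld AI TTS": (30, 40),
    "Image Generation": (1, 5),
    "Transcription": (0, 0),
    "Storage": (1, 1),
}


def total_range(costs):
    """Sum the low and high ends of every cost range."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = total_range(COSTS_CENTS)
print(f"${low / 100:.2f} to ${high / 100:.2f} per episode")
# prints "$0.32 to $0.46 per episode"
```

The itemized sum comes out slightly below the stated total of about $0.35-0.50, which is consistent with that figure being a rounded-up approximation.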

Key Features

📱

Cross-Platform Input

Record voice prompts on any device with any recorder app, then drop the files into the processing queue

🎙️

Voice Synthesis

Inworld AI TTS creates natural-sounding AI hosts with distinct personalities

🎨

AI Art Generation

Unique cover artwork for every episode, generated with Nano Banana Pro

⚡

Fully Automated

Voice prompt to published episode in minutes

📊

Production Quality

Professional audio normalization and podcast standards

🔔

Status Notifications

Get notified via Telegram when your episode is ready

Open Source

The entire pipeline is open source and available on GitHub. View the code, contribute improvements, or adapt it for your own podcast automation projects.


Previous Versions

Documentation for previous pipeline iterations is preserved for reference:

Pipeline V4 (Fish Audio)
Pipeline V3 (Chatterbox TTS)
Pipeline V2 (Original)
