Archive: This documents the V4 pipeline implementation. See Pipeline for the current documentation.
Pipeline V4 Documentation
Voice-to-Podcast Automation with AI - Featuring Fish Audio TTS
Last Updated: December 2025
Overview
The My Weird Prompts pipeline transforms voice-recorded prompts into full podcast episodes with AI-generated dialogue, cover art, and automatic publishing. The pipeline uses Fish Audio TTS with pre-trained voice models to create natural conversations between two AI hosts: Corn the Sloth and Herman the Donkey.
Pipeline Phases
Processing
- Voice Upload: The user's voice prompt is uploaded to the processing queue
- Transcription: Google Gemini 2.5 Flash transcribes the prompt and extracts metadata
- Audio Processing: FFmpeg normalizes and prepares prompt audio
Technology: Gemini 2.5 Flash for transcription, FFmpeg for audio processing
Generation
- Script Generation: Gemini creates a dialogue script between Corn and Herman
- Cover Art: Flux AI generates unique episode artwork (3 variants)
- TTS Dialogue: Fish Audio TTS generates voice audio with character personalities
Technology: Gemini for scripting, Flux Schnell for images, Fish Audio for TTS
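Before TTS, a generated script has to be split by speaker so that each line can be voiced by the matching character. The sketch below shows one way this might look, assuming the script arrives as plain text with `Corn:` and `Herman:` prefixes. That format, the `HOSTS` constant, and the `split_into_turns` helper are illustrative assumptions, not the pipeline's actual code, and no Fish Audio call is shown.

```python
from typing import List, Tuple

# Assumed speaker labels; the real script format is not documented here.
HOSTS = ("Corn", "Herman")


def split_into_turns(script: str) -> List[Tuple[str, str]]:
    """Split a plain-text script into (speaker, text) turns."""
    turns: List[Tuple[str, str]] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        speaker, sep, text = line.partition(":")
        if sep and speaker.strip() in HOSTS:
            # A new turn starts with a known host name and a colon.
            turns.append((speaker.strip(), text.strip()))
        elif turns:
            # Anything else is treated as a continuation of the previous turn.
            prev_speaker, prev_text = turns[-1]
            turns[-1] = (prev_speaker, f"{prev_text} {line}".strip())
    return turns
```

Each resulting turn could then be sent to the TTS voice model configured for that host, and the returned clips joined in order.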
Assembly
- Combines intro jingle, disclaimer, user prompt, AI dialogue, and outro
- Loudness normalization to -16 LUFS (podcast standard)
- MP3 encoding at 192 kbps, 44.1 kHz
Technology: FFmpeg for audio assembly and normalization
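The assembly settings above map onto standard FFmpeg options. The following is a minimal sketch that builds such a command, assuming a single FFmpeg invocation using the `concat` and `loudnorm` filters. The `build_assembly_command` helper and the file names are hypothetical, and the actual pipeline may structure this step differently (for example, with two-pass loudness measurement).

```python
from typing import List


def build_assembly_command(segments: List[str], output_path: str) -> List[str]:
    """Build an ffmpeg argument list that joins segments and normalizes loudness."""
    if not segments:
        raise ValueError("at least one segment is required")

    cmd = ["ffmpeg", "-y"]
    for path in segments:
        cmd += ["-i", path]

    # Reference the audio stream of every input: [0:a][1:a]...
    inputs = "".join(f"[{i}:a]" for i in range(len(segments)))
    filter_graph = (
        f"{inputs}concat=n={len(segments)}:v=0:a=1[joined];"
        "[joined]loudnorm=I=-16[out]"  # integrated loudness target of -16 LUFS
    )

    cmd += [
        "-filter_complex", filter_graph,
        "-map", "[out]",
        "-c:a", "libmp3lame",  # MP3 encoder
        "-b:a", "192k",        # 192 kbps bitrate
        "-ar", "44100",        # 44.1 kHz sample rate
        output_path,
    ]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, with the segments given in playback order: intro jingle, disclaimer, user prompt, AI dialogue, outro.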
Publishing
- CDN Upload: Audio and images uploaded to Cloudinary
- Archive: Full episode backed up to Wasabi S3-compatible storage
- Database: Metadata inserted into Neon PostgreSQL
- Blog Post: Markdown file generated for Astro static site
Technology: Cloudinary CDN, Wasabi object storage, Neon PostgreSQL
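The blog post step amounts to writing a Markdown file with front matter that the Astro site can read. Here is a rough sketch of that step. The field names (`title`, `pubDate`, `audioUrl`, `coverImage`) and the `render_episode_post` helper are assumptions for illustration, since the site's actual content schema is not documented here.

```python
import json
from datetime import date


def render_episode_post(
    title: str,
    published: date,
    audio_url: str,
    cover_url: str,
    description: str,
) -> str:
    """Render a Markdown post with YAML front matter for one episode."""
    # Hypothetical front matter fields; adjust to the site's real schema.
    front_matter = {
        "title": title,
        "pubDate": published.isoformat(),
        "audioUrl": audio_url,
        "coverImage": cover_url,
    }
    lines = ["---"]
    # json.dumps emits double-quoted, escaped strings, which are valid YAML scalars.
    lines += [f"{key}: {json.dumps(value)}" for key, value in front_matter.items()]
    lines += ["---", "", description, ""]
    return "\n".join(lines)
```

Quoting every value through `json.dumps` avoids broken front matter when a title contains a colon or quotation mark.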
Deployment
- Git push triggers automatic Vercel deployment
- New episode goes live on website within minutes
- RSS feed automatically updated for podcast apps
Technology: Vercel auto-deploy, Astro static site generator
Technology Stack
AI Services
- Google Gemini 2.5 Flash
- Fish Audio TTS
- Flux Schnell (via fal.ai)
- Replicate (backup)
Storage
- Cloudinary (CDN)
- Wasabi S3 (Archive)
- Neon PostgreSQL
- GitHub (Source)
Deployment
- Astro (Static Site)
- Vercel (Hosting)
- FFmpeg (Audio)
- Python Pipeline
Episode Output
For each episode, the pipeline creates:
Cost Estimate
| Service | Cost per Episode | Notes |
|---|---|---|
| Fish Audio TTS | ~$0.30-0.40 | 15-minute episode |
| Image Generation | ~$0.01-0.05 | 3 cover variants |
| Transcription | Minimal | Free tier |
| Storage | ~$0.01 | Wasabi + Cloudinary |
| Total per Episode | ~$0.35-0.50 | Approximate |
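As a quick sanity check on the table, the line items can be summed directly. This sketch copies the ranges from the rows above, works in cents to avoid floating-point rounding, and treats the "Minimal" transcription cost as zero, which is an assumption.

```python
# (low, high) cost per episode in US cents, copied from the table above.
COSTS_CENTS = {
    "Fish Audio TTS": (30, 40),
    "Image Generation": (1, 5),
    "Transcription": (0, 0),  # "Minimal" on the free tier, assumed to be zero
    "Storage": (1, 1),
}


def total_range_cents(costs):
    """Sum the low and high ends of each line item separately."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = total_range_cents(COSTS_CENTS)
print(f"${low / 100:.2f}-${high / 100:.2f} per episode")
# prints "$0.32-$0.46 per episode"
```

The itemized sum lands at roughly $0.32-0.46, in line with the approximate total given in the table.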
Key Features
Voice Cloning
Fish Audio TTS creates natural-sounding AI hosts with distinct personalities
AI Art Generation
Unique cover artwork for every episode using Flux AI
Fully Automated
Voice prompt to published episode in minutes
Production Quality
Professional audio normalization and podcast standards
Open Source
The entire pipeline is open source and available on GitHub. View the code, contribute improvements, or adapt it for your own podcast automation projects.