The Pipeline

Voice-to-Podcast Automation with AI

Last Updated: December 2025

Overview

My Weird Prompts transforms voice-recorded prompts into full podcast episodes with AI-generated dialogue, cover art, and automatic publishing. The pipeline uses Inworld AI TTS to create natural conversations between two AI hosts: Corn the Sloth and Herman the Donkey.

How It Works

1. Voice Input

Queue-Based Processing - Drop audio files into the processing queue

  • Record your question or prompt as a voice message
  • Place audio files in the prompts/to-process/ directory
  • Run the episode generator to process queued prompts
  • Supports MP3, WAV, and other common audio formats

Technology: Local filesystem queue with Python pipeline
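
The queue scan described above can be sketched as follows. This is a minimal illustration, assuming the queue is nothing more than audio files sitting in prompts/to-process/. The function name `list_queued_prompts` and the exact extension set are hypothetical and not taken from the project's code.

```python
from pathlib import Path

# Assumed extension set; the source names MP3 and WAV and "other common formats".
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac"}


def list_queued_prompts(queue_dir: Path) -> list[Path]:
    """Return audio files waiting in the queue, oldest first."""
    if not queue_dir.is_dir():
        return []
    files = [
        p for p in queue_dir.iterdir()
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    ]
    # Sort by modification time, then by name, so the order is deterministic.
    return sorted(files, key=lambda p: (p.stat().st_mtime, p.name))


if __name__ == "__main__":
    for prompt in list_queued_prompts(Path("prompts/to-process")):
        print(prompt.name)
```

Processing oldest-first is a design assumption here; any stable ordering would work for a queue of independent prompts.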

2. Processing

  • Transcription: Google Gemini 3 Flash transcribes the voice prompt
  • Metadata: Gemini 3 Flash generates episode metadata (title, description, tags)
  • Audio Processing: FFmpeg normalizes and prepares prompt audio
  • Format Conversion: Ensures compatible audio format for concatenation

Technology: Gemini 3 Flash for transcription and metadata, FFmpeg for audio processing
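
The format conversion step might look something like the sketch below. It covers only the conversion, not loudness normalization, and assumes the target is 44.1 kHz stereo 16-bit WAV. The function names and the choice of WAV as the intermediate format are assumptions made for illustration; the FFmpeg flags themselves are standard.

```python
import subprocess
from pathlib import Path


def build_prepare_command(src: Path, dst: Path) -> list[str]:
    """Build an FFmpeg command that converts a prompt to 44.1 kHz stereo WAV."""
    return [
        "ffmpeg", "-y",        # overwrite the output file if it exists
        "-i", str(src),
        "-vn",                 # ignore any video or embedded cover-art stream
        "-ar", "44100",        # resample to 44.1 kHz
        "-ac", "2",            # force two channels (assumed target layout)
        "-c:a", "pcm_s16le",   # 16-bit PCM
        str(dst),
    ]


def prepare_prompt(src: Path, dst: Path) -> None:
    """Run the conversion; raises CalledProcessError if FFmpeg fails."""
    subprocess.run(build_prepare_command(src, dst), check=True)
```

Converting every segment to one sample rate and channel layout up front keeps the later concatenation step simple.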

3. Generation

  • Research: Tavily provides research augmentation for episode content
  • Script Generation: Nano Banana creates dialogue script between Corn and Herman
  • Cover Art: Nano Banana Pro generates unique episode artwork (3 variants)
  • TTS Dialogue: Inworld AI TTS generates voice audio with character personalities

Technology: Tavily for research, Nano Banana for scripting, Nano Banana Pro for images, Inworld AI for TTS
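
Before a two-host script can be sent to TTS, it has to be split into per-speaker turns. The sketch below shows one way to do that, assuming the script is plain text with lines such as "Corn: ..." and "Herman: ...". That line format, the `Turn` class, and `parse_script` are all hypothetical; the source does not document the actual script format.

```python
from dataclasses import dataclass

HOSTS = ("Corn", "Herman")


@dataclass
class Turn:
    speaker: str
    text: str


def parse_script(script: str) -> list[Turn]:
    """Split a 'Speaker: text' script into ordered dialogue turns."""
    turns: list[Turn] = []
    for raw in script.splitlines():
        line = raw.strip()
        if not line:
            continue
        speaker, sep, rest = line.partition(":")
        if sep and speaker.strip() in HOSTS:
            turns.append(Turn(speaker.strip(), rest.strip()))
        elif turns:
            # Not a new speaker label: treat as a continuation of the last turn.
            turns[-1].text += " " + line
        # Lines before the first speaker label are ignored.
    return turns
```

Each resulting turn could then be synthesized with the voice that matches its speaker.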

4. Assembly

  • Combines intro jingle, disclaimer, user prompt, AI dialogue, and outro
  • Normalizes loudness to -16 LUFS (podcast standard)
  • Encodes MP3 at 192 kbps, 44.1 kHz

Technology: FFmpeg for audio assembly and normalization
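
The assembly step can be expressed as a single FFmpeg invocation that concatenates the segments, normalizes loudness, and encodes the MP3. The sketch below builds such a command. The -16 LUFS target, 192 kbps bitrate, and 44.1 kHz sample rate come from the list above; the true-peak (`TP`) and loudness-range (`LRA`) values, the single-pass approach, and the function name are assumptions.

```python
def build_assembly_command(segments: list[str], output: str) -> list[str]:
    """Build an FFmpeg command: concat segments, loudnorm to -16 LUFS, encode MP3."""
    if not segments:
        raise ValueError("at least one audio segment is required")

    cmd = ["ffmpeg", "-y"]
    for seg in segments:
        cmd += ["-i", seg]

    # "[0:a][1:a]...": feed the audio stream of every input into the concat filter.
    inputs = "".join(f"[{i}:a]" for i in range(len(segments)))
    graph = (
        f"{inputs}concat=n={len(segments)}:v=0:a=1[joined];"
        "[joined]loudnorm=I=-16:TP=-1.5:LRA=11[out]"  # TP and LRA are assumed values
    )

    cmd += [
        "-filter_complex", graph,
        "-map", "[out]",
        "-c:a", "libmp3lame",
        "-b:a", "192k",
        "-ar", "44100",  # loudnorm upsamples internally, so set the rate explicitly
        output,
    ]
    return cmd
```

A call might pass the five segments in playback order, for example `build_assembly_command(["intro.wav", "disclaimer.wav", "prompt.wav", "dialogue.wav", "outro.wav"], "episode.mp3")`, with the resulting list handed to `subprocess.run(..., check=True)`. The file names here are placeholders.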

5. Publishing

  • CDN Upload: Audio and images uploaded to Cloudinary
  • Archive: Full episode backed up to Wasabi S3-compatible storage
  • Database: Metadata inserted into Neon PostgreSQL
  • Blog Post: Markdown file generated for Astro static site

Technology: Cloudinary CDN, Wasabi object storage, Neon PostgreSQL
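
As an illustration of the blog post step, the sketch below renders a Markdown file with YAML front matter of the kind an Astro site typically reads. The field names (`title`, `description`, `pubDate`, `tags`, `audioUrl`) and the function name are assumptions; the project's real schema is not given in the source.

```python
import json
from datetime import date


def render_episode_post(
    title: str,
    description: str,
    tags: list[str],
    audio_url: str,
    pub_date: date,
    transcript: str,
) -> str:
    """Render a Markdown post with YAML front matter followed by the transcript."""
    # json.dumps yields double-quoted strings and flow lists, both valid YAML.
    front_matter = [
        "---",
        f"title: {json.dumps(title)}",
        f"description: {json.dumps(description)}",
        f"pubDate: {pub_date.isoformat()}",
        f"tags: {json.dumps(tags)}",
        f"audioUrl: {json.dumps(audio_url)}",
        "---",
    ]
    return "\n".join(front_matter) + "\n\n" + transcript.strip() + "\n"
```

Quoting values through `json.dumps` avoids broken front matter when a title contains a colon or quotation mark.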

6. Deployment

  • Git push triggers automatic Vercel deployment
  • New episode goes live on website within minutes
  • RSS feed automatically updated for podcast apps

Technology: Vercel auto-deploy, Astro static site generator
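
Since a Git push is what triggers the deployment, the final step of a run could be scripted roughly as below. This is a sketch under the assumption that the generated post is committed from a local clone with a configured remote; the function names and commit message format are hypothetical.

```python
import subprocess


def build_publish_commands(post_path: str, episode_title: str) -> list[list[str]]:
    """Return the Git commands that stage, commit, and push a new episode post."""
    return [
        ["git", "add", post_path],
        ["git", "commit", "-m", f"Add episode: {episode_title}"],
        ["git", "push"],
    ]


def publish(post_path: str, episode_title: str, repo_dir: str = ".") -> None:
    """Run each Git command in order, stopping at the first failure."""
    for cmd in build_publish_commands(post_path, episode_title):
        subprocess.run(cmd, cwd=repo_dir, check=True)
```

Passing arguments as a list rather than a shell string means episode titles with quotes or spaces need no escaping.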

Voice Capture

The pipeline processes voice prompts from a local queue. Record your question using any audio recording app and drop it into the processing queue.

Record Your Question: Use any voice recorder app to capture your prompt
Add to Queue: Place the audio file in the prompts/to-process/ directory
Run Generator: Execute the episode generation script to process the queue
Published Episode: Episode is automatically published and deployed to the website

Technology Stack

AI Services

  • Google Gemini 3 Flash (Transcription, Metadata)
  • Tavily (Research Augmentation)
  • Nano Banana (Episode Generation)
  • Nano Banana Pro (Cover Art)
  • Inworld AI TTS
  • Flux Schnell (via fal.ai)
  • Replicate (backup)

Input & Integration

  • Local Queue Processing
  • Python Pipeline
  • FFmpeg (Audio)

Storage

  • Cloudinary (CDN)
  • Wasabi S3 (Archive)
  • Neon PostgreSQL
  • GitHub (Source)

Deployment

  • Astro (Static Site)
  • Vercel (Hosting)
  • GitHub Actions

Episode Output

For each episode, the pipeline creates:

Final Audio: MP3 file with full podcast episode
Cover Art: 3 AI-generated cover image variants
Metadata: Title, description, tags, timestamps
Transcript: Full dialogue script
Blog Post: Markdown file for website

Cost Estimate

Service | Cost per Episode | Notes
Inworld AI TTS | ~$0.30-0.40 | 15-minute episode
Image Generation | ~$0.01-0.05 | 3 cover variants
Transcription | Minimal | Free tier
Storage | ~$0.01 | Wasabi + Cloudinary
Total per Episode | ~$0.35-0.50 | Approximate
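
The total can be checked by summing the per-service ranges. The sketch below works in whole cents to avoid floating-point rounding and treats the "Minimal" transcription cost as zero, which is an assumption.

```python
# (low, high) cost per episode in US cents, taken from the estimates above.
COSTS_CENTS = {
    "Inworld AI TTS": (30, 40),
    "Image Generation": (1, 5),
    "Transcription": (0, 0),  # "Minimal" on the free tier, assumed to be zero
    "Storage": (1, 1),
}


def total_range(costs: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Sum the low and high ends of every line item."""
    low = sum(lo for lo, _ in costs.values())
    high = sum(hi for _, hi in costs.values())
    return low, high


low, high = total_range(COSTS_CENTS)
print(f"${low / 100:.2f} to ${high / 100:.2f}")  # prints "$0.32 to $0.46"
```

The line items sum to about $0.32 to $0.46, which is consistent with the rounded, approximate total of ~$0.35-0.50.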

Key Features

📱

Cross-Platform Input

Record on any device with any voice recorder app, then drop the file into the processing queue

🎙️

Voice Synthesis

Inworld AI TTS creates natural-sounding AI hosts with distinct personalities

🎨

AI Art Generation

Unique cover artwork for every episode, generated by Nano Banana Pro (3 variants)

Fully Automated

Voice prompt to published episode in minutes

📊

Production Quality

Loudness normalized to -16 LUFS and encoded as 192 kbps, 44.1 kHz MP3

🔔

Status Notifications

Get notified via Telegram when your episode is ready

Open Source

The entire pipeline is open source and available on GitHub. View the code, contribute improvements, or adapt it for your own podcast automation projects.

Previous Versions

Documentation for previous pipeline iterations is preserved for reference: