How project processing works

Transcription, speaker selection, moment extraction, and post generation.

When you submit a project, SparkVox runs an asynchronous pipeline. Credits are deducted once, at the start, at a per-minute rate with the content length rounded up to the next whole minute. Post generation and sprout-tree images are included in that fee.
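
The rounding rule above can be sketched as follows. This is only an illustration: the function name and the `credits_per_minute` parameter are hypothetical, and the actual rate is not stated on this page.

```python
import math


def credits_for(duration_seconds: float, credits_per_minute: int = 1) -> int:
    """Estimate the one-time charge for a recording.

    Assumption: every started minute is billed as a full minute.
    The default rate of 1 credit per minute is a placeholder.
    """
    if duration_seconds <= 0:
        return 0
    billable_minutes = math.ceil(duration_seconds / 60)
    return billable_minutes * credits_per_minute
```

Under these assumptions, a recording of 60 seconds counts as one minute and a recording of 61 seconds counts as two.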

Project statuses

pending → transcribing → extracting → generating → ready
  (or awaiting_speaker between transcribing and extracting)
  (or failed at any stage)
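
If you track project status in your own tooling, the flow above can be modelled as a small state machine. This is a minimal sketch, not SparkVox's implementation: the `Status` enum, `ALLOWED` table, and helper names are hypothetical, and it assumes `awaiting_speaker` can also end in `failed`.

```python
from enum import Enum


class Status(str, Enum):
    PENDING = "pending"
    TRANSCRIBING = "transcribing"
    AWAITING_SPEAKER = "awaiting_speaker"
    EXTRACTING = "extracting"
    GENERATING = "generating"
    READY = "ready"
    FAILED = "failed"


# Each status maps to the statuses that may follow it in the diagram.
ALLOWED = {
    Status.PENDING: {Status.TRANSCRIBING, Status.FAILED},
    Status.TRANSCRIBING: {Status.AWAITING_SPEAKER, Status.EXTRACTING, Status.FAILED},
    Status.AWAITING_SPEAKER: {Status.EXTRACTING, Status.FAILED},
    Status.EXTRACTING: {Status.GENERATING, Status.FAILED},
    Status.GENERATING: {Status.READY, Status.FAILED},
    Status.READY: set(),
    Status.FAILED: set(),
}


def can_transition(current: Status, nxt: Status) -> bool:
    """Return True if the diagram allows moving from current to nxt."""
    return nxt in ALLOWED[current]


def is_terminal(status: Status) -> bool:
    """ready and failed have no outgoing transitions."""
    return not ALLOWED[status]
```

For example, `can_transition(Status.TRANSCRIBING, Status.AWAITING_SPEAKER)` is true, while nothing follows `ready` or `failed`. The sketch follows the diagram only; which statuses a transcript-file project passes through is not modelled here.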

What you can upload

URL: YouTube or direct audio/video links.
YouTube: Sparky tries native captions first (fast); if captions are unavailable or speaker diarization is needed, it falls back to downloading the audio and transcribing it with Gladia.
File upload: MP3, WAV, MP4, and other common formats, uploaded from the New Project form.
Transcript file: .txt or .srt. These skip transcription and go straight to moment extraction.
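
The routing described in this list can be sketched as a single decision function. Everything here is an assumption made for illustration: the function name, the stage labels, the YouTube host list, and the `has_captions` / `needs_diarization` flags are hypothetical and not part of any SparkVox API.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

TRANSCRIPT_EXTS = {".txt", ".srt"}
# Assumed set of hostnames treated as YouTube links.
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def first_stage(source: str, *, has_captions: bool = False,
                needs_diarization: bool = False) -> str:
    """Pick the first processing step for an upload or URL."""
    parsed = urlparse(source)
    if parsed.scheme in ("http", "https"):
        host = (parsed.hostname or "").lower()
        if host in YOUTUBE_HOSTS and has_captions and not needs_diarization:
            # Fast path: reuse the video's native captions.
            return "native_captions"
        # YouTube fallback and direct audio/video links.
        return "audio_transcription"
    if PurePosixPath(source).suffix.lower() in TRANSCRIPT_EXTS:
        # Transcript files skip transcription entirely.
        return "moment_extraction"
    # Uploaded audio/video files (MP3, WAV, MP4, ...).
    return "audio_transcription"
```

With this sketch, `first_stage("notes.srt")` returns `"moment_extraction"`, and a YouTube URL with captions but `needs_diarization=True` returns `"audio_transcription"`.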

For YouTube projects, title, channel, and tags from the video help Sparky spell names and technical terms correctly in excerpts and posts.

Project perspective

When creating a project, you choose a source kind and a perspective. Together they control how the transcript is processed:

Podcast / Interview

Perspective	Best for	What happens
Host	Hosts building a personal brand	Speaker selection - only your lines are used.
Guest	Guest appearances on someone else's show	Speaker selection - only your lines are used.
Full Conversation	Highlight reels of the whole episode	Full cleaned transcript is used.

Knowledge & Advisory

Perspective	Best for	What happens
Expert	Keynotes, solo trainings, thought leadership	Extracts frameworks and masterclass lessons from your content.
Advisor	Client calls and strategy sessions	Speaker selection - isolates your strategic advice from collaborative calls.
Trainer	Workshops, demos, walkthroughs	Full transcript used for step-by-step playbook posts.

Transcript file uploads (.txt / .srt) support only the Full Conversation and Trainer perspectives; Host, Guest, and Advisor are disabled for transcript files.
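
The perspective rules in the two tables and the transcript-file restriction can be expressed as simple lookups. This is a hedged sketch: the identifiers below are hypothetical. The Expert perspective is treated as not available for transcript files because the allowed list names only Full Conversation and Trainer; that reading is an assumption.

```python
# Perspectives whose "What happens" column says speaker selection.
SPEAKER_SELECTION = {"host", "guest", "advisor"}
# Perspectives whose "What happens" column says the full transcript is used.
FULL_TRANSCRIPT = {"full_conversation", "trainer"}
# "expert" is in neither set: the table does not state which mode it uses.
ALL_PERSPECTIVES = SPEAKER_SELECTION | FULL_TRANSCRIPT | {"expert"}


def needs_speaker_selection(perspective: str) -> bool:
    """True when only the chosen speaker's lines are used."""
    return perspective in SPEAKER_SELECTION


def validate_perspective(perspective: str, is_transcript_file: bool) -> None:
    """Raise ValueError for unknown or disallowed combinations."""
    if perspective not in ALL_PERSPECTIVES:
        raise ValueError(f"unknown perspective: {perspective!r}")
    if is_transcript_file and perspective not in FULL_TRANSCRIPT:
        raise ValueError(
            f"{perspective!r} is not available for transcript file uploads"
        )
```

For example, `validate_perspective("trainer", is_transcript_file=True)` passes, while `validate_perspective("host", is_transcript_file=True)` raises `ValueError`.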

Moments and posts

Sparky identifies up to 15 insight-rich moments from your transcript, then writes one LinkedIn post per moment in your voice. When processing finishes, review everything in your sprout tree. A moment whose post fails to generate may not show a post card.
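
The relationship between moments and posts can be pictured with a short sketch. It assumes the moments arrive already ranked and that a failed generation is skipped while the rest continue; `GenerationError`, `generate_posts`, and the `write_post` callback are hypothetical names used only for illustration.

```python
from typing import Callable, Dict, Sequence

MAX_MOMENTS = 15  # the cap stated above


class GenerationError(Exception):
    """Hypothetical error raised when a post cannot be written."""


def generate_posts(moments: Sequence[str],
                   write_post: Callable[[str], str]) -> Dict[int, str]:
    """Write one post per moment, keyed by the moment's index.

    Moments beyond the cap are ignored; a moment whose generation
    fails gets no entry, which mirrors a missing post card.
    """
    posts: Dict[int, str] = {}
    for index, moment in enumerate(moments[:MAX_MOMENTS]):
        try:
            posts[index] = write_post(moment)
        except GenerationError:
            continue
    return posts
```

A moment with no entry in the returned mapping corresponds to a moment in the sprout tree without a post card.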

Every recording you process is a content asset. Advisor and Expert perspectives are built for calls and trainings where your strategic thinking would otherwise stay in Otter and never reach LinkedIn.