
UGC Video Ads with AI: Generate Authentic-Looking Testimonials at Scale

BP Corp Engineering
8 min read

User-generated content (UGC) video ads consistently outperform studio-produced creative across every platform we've tested. The problem: real UGC is expensive, slow to produce, and doesn't scale to multiple markets.

AI-generated UGC solves this. We've produced over 500 testimonial-style video ads using CAST, BP Corp's AI avatar engine. These videos use synthetic voices, lip-sync technology, and human-like avatars to simulate authentic customer testimonials.

This guide shows you how to generate production-ready UGC video ads with AI, walks through the technical stack that makes it possible, and shares performance data from 200+ campaigns.

Why UGC Video Ads Work

Before we discuss AI generation, here's why the UGC format matters:

Completion rates: In our A/B tests across Meta and TikTok, UGC-style videos achieve 2.3x higher completion rates than static image ads and 1.8x higher than polished video ads.

Trust signals: UGC bypasses the "ad blindness" filter. Viewers perceive testimonial-format content as peer recommendations, not marketing.

Platform algorithms: Meta's 2025 algorithm updates prioritize "authentic" content in feed placements. In our campaigns, UGC-style creative averaged 23% lower CPMs than studio creative.

Multi-format adaptability: A single UGC video script can be rendered in 9:16 (Stories/Reels), 1:1 (Feed), and 16:9 (YouTube) without re-shooting.

The challenge is production overhead: hiring real users, coordinating shoots, editing testimonials, and translating for international markets. Cost per video: $200-500. Timeline: 2-4 weeks.

AI generation brings cost to near-zero and timeline to under 10 minutes per video.

How CAST Generates AI UGC Videos

CAST is BP Corp's AI video generation module inside GENESIS. It combines three technologies:

  1. AI Avatars — Photorealistic human models with natural idle movements
  2. Voice Cloning — ElevenLabs integration for 47 languages with emotional tone control
  3. Lip-Sync — Wav2Lip-based mouth movement synchronized to audio

The Generation Pipeline

Step 1: Script Input

You provide the testimonial script. CAST analyzes:

  • Sentence structure (for natural pauses)
  • Emotional tone markers (enthusiasm, relief, trust)
  • Key phrases (for emphasis gestures)

Example script for a solar lead generation brand:

I was spending over €200 a month on electricity. I submitted one form on PapaPrevoit, and three solar companies called me the next day. I went with the best quote and now my bill is down to €40. I wish I'd done this years ago.
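This section doesn't describe how CAST performs this analysis. As a rough illustration only, the sketch below splits a script into sentences (pause candidates), tags each one with a tone using a small keyword lexicon, and treats tokens containing digits (such as "€200") as key phrases for emphasis. `TONE_LEXICON` and the digit rule are hypothetical stand-ins, not CAST's method; a real system would more likely use a trained classifier.

```python
import re
from dataclasses import dataclass, field

# Hypothetical tone lexicon; the cue words are illustrative only.
TONE_LEXICON = {
    "relief": {"finally", "down", "saved", "relieved"},
    "enthusiasm": {"best", "love", "amazing", "great"},
    "trust": {"recommend", "honest", "reliable", "trust"},
}

@dataclass
class SentenceInfo:
    text: str
    word_count: int
    tones: list[str] = field(default_factory=list)
    key_phrases: list[str] = field(default_factory=list)

def analyze_script(script: str) -> list[SentenceInfo]:
    # Naive split: a sentence ends at . ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s.strip()]
    results = []
    for sentence in sentences:
        words = sentence.split()
        normalized = {w.strip(".,!?;:").lower() for w in words}
        # A sentence gets every tone whose cue words it contains.
        tones = sorted(t for t, cues in TONE_LEXICON.items() if normalized & cues)
        # Hypothetical rule: amounts and numbers are emphasis candidates.
        keys = [w.strip(".,!?;:") for w in words if any(ch.isdigit() for ch in w)]
        results.append(SentenceInfo(sentence, len(words), tones, keys))
    return results
```

Run on the solar script above, this yields four sentences; the "€200" and "€40" amounts come out as key phrases.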

Step 2: Avatar Selection

CAST provides 120+ avatars categorized by:

  • Demographic (age, gender, ethnicity)
  • Setting (home office, living room, outdoor)
  • Lighting quality (natural, studio, casual)

For lead generation verticals, we've found best performance with:

  • Insurance: 45-55 age range, home setting, warm lighting
  • Solar: 35-50 age range, natural light, outdoor or kitchen backgrounds
  • Home renovation: 40-60 age range, construction-adjacent settings

Step 3: Voice Generation

CAST sends your script to the ElevenLabs API with these parameters:

  • Language: Auto-detected or manually specified (47 options)
  • Voice ID: Selected from ElevenLabs library or custom cloned voice
  • Stability: 0.5-0.7 (higher = more consistent, lower = more expressive)
  • Similarity Boost: 0.75 (how closely to match reference voice characteristics)
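A minimal sketch of such a request using only the Python standard library. It targets ElevenLabs' text-to-speech route (`POST /v1/text-to-speech/{voice_id}` with an `xi-api-key` header and a `voice_settings` object holding `stability` and `similarity_boost`). The `model_id` default and the clamp of stability to the 0.5-0.7 band are assumptions; check the current ElevenLabs API reference before relying on field names. The function only builds the request; it does not send it.

```python
import json
import urllib.request

ELEVENLABS_TTS_URL = "https://api.elevenlabs.io/v1/text-to-speech/{voice_id}"

def build_tts_request(text, voice_id, api_key, stability=0.6,
                      similarity_boost=0.75, model_id="eleven_multilingual_v2"):
    # Keep stability inside the 0.5-0.7 band described above.
    stability = min(max(stability, 0.5), 0.7)
    body = {
        "text": text,
        "model_id": model_id,  # assumed multilingual model
        "voice_settings": {"stability": stability, "similarity_boost": similarity_boost},
    }
    return urllib.request.Request(
        ELEVENLABS_TTS_URL.format(voice_id=voice_id),
        data=json.dumps(body).encode("utf-8"),
        headers={"xi-api-key": api_key, "Content-Type": "application/json",
                 "Accept": "audio/mpeg"},
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` returns the audio bytes in the response body.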

CAST also injects SSML tags for natural pauses:

```xml
<speak>
  I was spending over two hundred euros a month on electricity.
  <break time="500ms"/>
  I submitted one form on PapaPrevoit, and three solar companies called me the next day.
</speak>
```

Average voice generation time: 3-8 seconds for a 30-second script.
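The pause injection can be sketched as a function that splits the script into sentences, escapes XML special characters, and joins the sentences with `<break>` tags, producing the same layout as the SSML example above. The fixed 500 ms pause and the regex sentence splitter are assumptions, and number normalization (writing "€200" as "two hundred euros") is left out. TTS providers differ in which SSML tags they honor, so confirm break-tag support for your voice engine.

```python
import re
from xml.sax.saxutils import escape

def script_to_ssml(script: str, pause_ms: int = 500) -> str:
    # Split on sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", script.strip()) if s.strip()]
    brk = f'<break time="{pause_ms}ms"/>'
    # Escape &, < and > so script text can't break the markup.
    body = f"\n  {brk}\n  ".join(escape(s) for s in sentences)
    return f"<speak>\n  {body}\n</speak>"
```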

Step 4: Lip-Sync Rendering

CAST uses a modified Wav2Lip implementation:

  • Takes avatar base video (3-5 second loop of idle movement)
  • Extracts facial mesh
  • Maps phonemes from ElevenLabs audio to mouth shapes
  • Renders new video with synchronized lip movement

Rendering time: 30-90 seconds per video depending on length.
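For orientation: the open-source Wav2Lip code does not convert audio to phonemes. It feeds the model short mel-spectrogram windows (16 mel frames, at 80 mel frames per second), one window per output video frame, and reuses a short base clip by cycling its frames with a modulo index. The sketch below reproduces that indexing in plain Python to show how a 3-5 second idle loop covers a longer voiceover. It illustrates the upstream approach, not CAST's modified implementation, and `plan_lipsync_frames` is a hypothetical name.

```python
def plan_lipsync_frames(num_mel_frames, fps, base_loop_frames,
                        mel_step=16, mel_frames_per_sec=80):
    """Pair each output frame with (base clip frame index, mel window start).

    fps is an integer frame rate, e.g. 25."""
    if num_mel_frames < mel_step:
        raise ValueError("audio too short for a single mel window")
    plan, i = [], 0
    while True:
        # Integer form of Wav2Lip's int(i * 80 / fps), avoiding float rounding.
        start = i * mel_frames_per_sec // fps
        if start + mel_step > num_mel_frames:
            # The final frame reuses the last complete window, then stops.
            plan.append((i % base_loop_frames, num_mel_frames - mel_step))
            break
        # Cycling the base clip is what lets a short idle loop span long audio.
        plan.append((i % base_loop_frames, start))
        i += 1
    return plan
```

Because the loop simply wraps around, the base clip's last frame should flow cleanly into its first; otherwise the jump shows up once per loop.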

Step 5: Format Export

CAST outputs in three aspect ratios simultaneously:

  • 1080x1920 (9:16 for Stories, Reels, TikTok)
  • 1080x1080 (1:1 for Feed posts)
  • 1920x1080 (16:9 for YouTube, display ads)

File format: MP4 with H.264 video, with files kept under 8 MB for fast mobile loading.
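One way to produce the three exports is to run FFmpeg once per aspect ratio, scaling and center-cropping the master render and deriving a video bitrate from the 8 MB target. This is a sketch under assumptions: the 90% headroom, 128 kbps AAC audio, and crop-to-fill (rather than letterboxing) are our choices, not CAST's documented settings. The `scale`/`crop` filters and the `libx264`, `-maxrate`, `-bufsize`, `-pix_fmt`, and `-movflags +faststart` options are standard FFmpeg.

```python
FORMATS = {"9x16": (1080, 1920), "1x1": (1080, 1080), "16x9": (1920, 1080)}

def video_bitrate_kbps(duration_s, max_bytes=8_000_000, audio_kbps=128, headroom_pct=90):
    # Bit budget in kilobits, keeping a margin for container overhead.
    budget_kbits = max_bytes * 8 * headroom_pct // 100 // 1000
    video_kbps = int(budget_kbits // duration_s) - audio_kbps
    if video_kbps <= 0:
        raise ValueError("clip too long for the size target")
    return video_kbps

def export_commands(src, duration_s, out_prefix="ad", audio_kbps=128):
    v_kbps = video_bitrate_kbps(duration_s, audio_kbps=audio_kbps)
    cmds = []
    for name, (w, h) in FORMATS.items():
        # Scale to cover the target frame, then center-crop the overflow.
        vf = f"scale={w}:{h}:force_original_aspect_ratio=increase,crop={w}:{h}"
        cmds.append([
            "ffmpeg", "-y", "-i", src, "-vf", vf,
            "-c:v", "libx264", "-pix_fmt", "yuv420p",
            "-b:v", f"{v_kbps}k", "-maxrate", f"{v_kbps}k", "-bufsize", f"{2 * v_kbps}k",
            "-c:a", "aac", "-b:a", f"{audio_kbps}k",
            "-movflags", "+faststart",  # moov atom first for faster mobile start
            f"{out_prefix}_{name}.mp4",
        ])
    return cmds
```

Each command can be run with `subprocess.run(cmd, check=True)`. Single-pass rate control can overshoot slightly; two-pass encoding gives tighter control when the size cap is strict.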

Multi-Language Scaling

CAST's primary advantage over traditional UGC: instant localization.

We operate lead generation brands in 4 countries (France, Hungary, UK, US). Before CAST, we'd shoot separate testimonial videos for each market. Now:

  1. Write script in source language (English)
  2. Translate via DeepL API (integrated in CAST)
  3. Select language-appropriate avatar (Hungarian name, Hungarian setting)
  4. Generate with ElevenLabs Hungarian voice ID
  5. Export video

Total time: 6 minutes including translation.
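The per-market loop can be sketched as follows. It builds requests for DeepL's `/v2/translate` endpoint (a JSON body with `text` and `target_lang`, and a `DeepL-Auth-Key` authorization header) and skips translation for English-language markets, since the source is English. The `MARKETS` table, its voice IDs, and its avatar tags are hypothetical placeholders. DeepL free-plan keys use the `api-free.deepl.com` host instead.

```python
import json
import urllib.request

DEEPL_URL = "https://api.deepl.com/v2/translate"

# Hypothetical config: DeepL target code, ElevenLabs voice ID, avatar tag.
MARKETS = {
    "FR": {"target_lang": "FR", "voice_id": "VOICE_FR_PLACEHOLDER", "avatar_tag": "fr-home"},
    "HU": {"target_lang": "HU", "voice_id": "VOICE_HU_PLACEHOLDER", "avatar_tag": "hu-home"},
    "UK": {"target_lang": "EN-GB", "voice_id": "VOICE_UK_PLACEHOLDER", "avatar_tag": "uk-home"},
    "US": {"target_lang": "EN-US", "voice_id": "VOICE_US_PLACEHOLDER", "avatar_tag": "us-home"},
}

def build_translate_request(texts, target_lang, auth_key, source_lang="EN"):
    body = {"text": texts, "source_lang": source_lang, "target_lang": target_lang}
    return urllib.request.Request(
        DEEPL_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": f"DeepL-Auth-Key {auth_key}",
                 "Content-Type": "application/json"},
        method="POST",
    )

def plan_localization(script, auth_key):
    jobs = []
    for market, cfg in MARKETS.items():
        # English markets reuse the source script as-is.
        needs_translation = not cfg["target_lang"].startswith("EN")
        req = build_translate_request([script], cfg["target_lang"], auth_key) if needs_translation else None
        jobs.append({"market": market, "translate_request": req, **cfg})
    return jobs
```

DeepL returns the result as `{"translations": [{"detected_source_language": ..., "text": ...}]}`; the translated text then goes to voice generation with the market's voice ID.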

Voice quality by language (subjective ratings from native speakers, 1-10 scale):

| Language | Voice Quality | Lip-Sync Accuracy | Naturalness |
|---|---|---|---|
| English | 9.2 | 8.8 | 9.0 |
| French | 8.9 | 8.5 | 8.7 |
| Spanish | 8.8 | 8.3 | 8.6 |
| German | 8.5 | 8.0 | 8.3 |
| Hungarian | 8.2 | 7.8 | 8.0 |
| Polish | 8.0 | 7.5 | 7.8 |

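Chinese, Japanese, and Korean score lower on lip-sync accuracy (6.5-7.2). Wav2Lip works from audio rather than text, so the writing system itself isn't the cause; the more likely reason is that the model was trained predominantly on English speech (the LRS2 dataset).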

Production Workflow for Campaign Launch

Here's our process for launching a new campaign with CAST-generated videos:

Week 1: Script Development

Write 5-7 testimonial scripts covering different pain points and outcomes. For solar vertical:

  1. High electricity bills → savings after installation
  2. Environmental concern → feel-good about sustainability
  3. Installation anxiety → smooth process testimonial
  4. Multiple quotes confusion → got best deal through aggregator
  5. Skepticism → regret not doing it sooner

Each script: 20-35 seconds spoken (roughly 60-90 words).

Week 1: Avatar Testing

Generate 3 variations of each script with different avatars, and run them as a test campaign with a €50 budget per variation.

Metrics to track:

  • 3-second view rate
  • 15-second completion rate
  • CTR
  • CPA (for lead generation)

Kill underperformers after 48 hours. Double budget on top 2 performers.
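The 48-hour decision rule can be written as a small function. Ranking by CPA and treating every variation outside the top two as an underperformer is our reading of the steps above. In practice a CPA based on one or two leads is noise, so the sketch also requires a minimum lead count; `min_leads` and its default of 3 are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Variation:
    name: str
    spend_eur: float
    leads: int

    @property
    def cpa(self) -> float:
        # No leads means an infinite CPA, so the variation ranks last.
        return self.spend_eur / self.leads if self.leads else float("inf")

def allocate_after_48h(variations, base_budget_eur=50.0, keep=2, min_leads=3):
    # Only variations with enough leads have a CPA worth trusting.
    eligible = [v for v in variations if v.leads >= min_leads]
    winners = sorted(eligible, key=lambda v: v.cpa)[:keep]
    budgets = {v.name: base_budget_eur * 2 for v in winners}  # double the budget
    killed = [v.name for v in variations if v.name not in budgets]
    return budgets, killed
```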

Week 2: Scale + Localization

Translate winning scripts. Generate localized versions with region-appropriate avatars. Launch in new markets with proven creative formula.

Performance Data: AI UGC vs Real UGC vs Static Ads

We ran a controlled A/B test across 8 campaigns in Q4 2025 (solar, insurance, home renovation verticals). Budget: €15,000 per cohort.

Creative formats tested:

  • Group A: CAST-generated AI UGC videos (5 avatars × 3 scripts = 15 variations)
  • Group B: Real UGC videos from paid users (5 people × 3 scripts = 15 videos, cost: €3,500)
  • Group C: Static image ads with testimonial quote overlays (15 variations)

Results (Meta Ads, 30-day campaign):

| Metric | AI UGC (CAST) | Real UGC | Static Ads |
|---|---|---|---|
| Impressions | 2,847,392 | 2,801,445 | 2,823,108 |
| 3-sec view rate | 68.2% | 71.5% | 52.3% |
| Completion rate | 31.4% | 34.8% | 13.6% |
| CTR | 2.84% | 3.02% | 1.97% |
| CPA (cost per lead) | €18.43 | €17.21 | €26.77 |
| Production cost | €0 | €3,500 | €450 |
| Production time | 2 hours | 3 weeks | 4 hours |

Key findings:

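  1. Real UGC edges out AI UGC on engagement: about 11% higher completion rate (34.8% vs 31.4%) and 6% higher CTR (3.02% vs 2.84%). Its media CPA is also slightly lower (€17.21 vs €18.43), but that advantage reverses once production cost is counted: spreading the €3,500 production cost over the roughly 870 leads that €15,000 buys at €17.21 brings Real UGC to about €21.23 per lead, versus €18.43 for AI UGC.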

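  2. AI UGC outperforms static ads by 2.3x on completion rate (31.4% vs 13.6%) and 1.4x on CTR (2.84% vs 1.97%), with a 31% lower CPA (€18.43 vs €26.77; put the other way, static CPA is 45% higher).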

  3. Volume advantage: CAST produced 15 variations in 2 hours. Real UGC took 3 weeks and required user coordination. Static ads took 4 hours but performed worst.

  4. Refresh cycle: AI UGC allows weekly creative refresh to combat ad fatigue. Real UGC limits you to 1-2 refreshes per quarter due to production overhead.
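The cost comparison is easier to see with production cost folded into CPA. The sketch below derives each cohort's lead count from its €15,000 budget and reported CPA, then adds production cost. It assumes the reported CPA covers media spend only, which the table doesn't state explicitly.

```python
def all_in_cpa(media_budget_eur, media_cpa_eur, production_cost_eur):
    # Leads bought by the media budget at the reported CPA.
    leads = media_budget_eur / media_cpa_eur
    return (media_budget_eur + production_cost_eur) / leads

# (media budget, reported CPA, production cost) from the results table
COHORTS = {
    "AI UGC (CAST)": (15_000, 18.43, 0),
    "Real UGC": (15_000, 17.21, 3_500),
    "Static Ads": (15_000, 26.77, 450),
}
ALL_IN = {name: round(all_in_cpa(*args), 2) for name, args in COHORTS.items()}
```

Counting production cost this way reverses the Real UGC vs AI UGC ranking on CPA, while static ads stay last.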

Quality Threshold: When AI UGC Is "Good Enough"

Not all AI-generated video looks authentic. Here are the quality gates we enforce in CAST:

Lip-Sync Accuracy

If mouth movement is off by more than 100ms, viewers notice. CAST re-renders videos that fail automated sync scoring (we use a proprietary Wav2Lip confidence score). Acceptance threshold: >0.82.
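CAST's confidence score is proprietary, so the sketch below swaps in a simpler, well-known technique: cross-correlate a per-frame audio loudness envelope with a per-frame mouth-opening signal and take the best-matching lag as the audio/video offset. How the two signals are obtained (for example RMS energy per video frame and lip distance from a face-landmark model) is assumed, and the 100 ms gate mirrors the threshold above.

```python
def _pearson(a, b):
    """Pearson correlation of equal-length sequences; -inf if undefined."""
    n = len(a)
    if n < 2:
        return float("-inf")
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("-inf")
    return cov / (var_a * var_b) ** 0.5

def estimate_av_offset_ms(audio_env, mouth_open, fps=25, max_lag_frames=10):
    """Lag in ms where the mouth signal best matches the audio.

    Positive: the mouth trails the audio. Negative: it leads."""
    best_lag, best_r = 0, float("-inf")
    for lag in range(-max_lag_frames, max_lag_frames + 1):
        if lag >= 0:  # compare audio[i] with mouth[i + lag]
            a, b = audio_env[: len(audio_env) - lag], mouth_open[lag:]
        else:  # compare audio[i + |lag|] with mouth[i]
            a, b = audio_env[-lag:], mouth_open[: len(mouth_open) + lag]
        n = min(len(a), len(b))
        r = _pearson(a[:n], b[:n])
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag * 1000.0 / fps

def passes_sync_gate(offset_ms, max_offset_ms=100.0):
    return abs(offset_ms) <= max_offset_ms
```

At 25 fps one frame is 40 ms, so the estimate resolves offsets in 40 ms steps; a higher frame rate or interpolated signals would be needed for finer resolution.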

Voice Naturalness

ElevenLabs voices occasionally produce artifacts (robotic cadence, weird emphasis). CAST flags these via:

  • Spectral analysis (looking for unnatural frequency patterns)
  • Pause duration variance (robotic voices have unnaturally consistent pauses)
  • Human QA spot-check (1 in 10 videos reviewed manually)
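The pause-variance check can be sketched from per-frame RMS energy: find silent runs between stretches of speech, then flag the clip when pause lengths are suspiciously uniform (a low coefficient of variation). The silence threshold, 150 ms minimum pause, and 0.15 cutoff are hypothetical values to tune on labeled clips.

```python
from statistics import mean, pstdev

def pause_durations_ms(frame_rms, frame_ms=10, silence_rms=0.02, min_pause_ms=150):
    """Durations of silent runs that sit between two stretches of speech."""
    pauses, run, seen_speech = [], 0, False
    for rms in frame_rms:
        if rms < silence_rms:
            run += 1
            continue
        # Speech resumed: count the preceding silence only if speech came before it.
        if seen_speech and run * frame_ms >= min_pause_ms:
            pauses.append(run * frame_ms)
        seen_speech, run = True, 0
    return pauses  # trailing silence is never appended

def flag_robotic_pauses(pauses, min_cv=0.15):
    if len(pauses) < 3:
        return False  # too few pauses to judge
    return pstdev(pauses) / mean(pauses) < min_cv
```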

Avatar Realism

Early avatars had uncanny valley issues (dead eyes, frozen expressions). Current CAST avatars use:

  • Subtle idle animation (micro-movements, breathing, eye blinks every 3-5 seconds)
  • Natural lighting that matches background setting
  • Authentic clothing (no stock photo "business casual" uniformity)

Realism score: 7.8/10 on average (based on blind tests where viewers rate real vs AI).
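A blink "every 3-5 seconds" reads as natural only with jitter; a perfectly regular interval looks mechanical. A minimal sketch of a seeded blink scheduler, assuming blinks are placed by timestamp and then rendered by the avatar animation layer (which the source doesn't describe). The 0.5 s earliest first blink is an assumption.

```python
import random

def blink_times(duration_s, min_gap_s=3.0, max_gap_s=5.0, seed=None):
    """Blink timestamps (seconds) with a random 3-5 s gap between blinks."""
    rng = random.Random(seed)  # seeded, so re-renders get the same blinks
    times = []
    t = rng.uniform(0.5, max_gap_s)  # avoid a blink on the very first frame
    while t < duration_s:
        times.append(t)
        t += rng.uniform(min_gap_s, max_gap_s)
    return times
```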

Common Mistakes to Avoid

1. Over-Scripting

Testimonials should sound conversational, not rehearsed. Bad: "I utilized the platform to connect with service providers." Good: "I filled out one form and got three calls the next day."

2. Mismatched Avatar Demographics

If your product targets homeowners aged 55+, don't use a 28-year-old avatar. Testimonial credibility depends on viewers identifying with the speaker.

3. Same Avatar, Multiple Brands

We made this mistake early: we reused avatars across brands in the same vertical, and viewers who saw both ads noticed. Now we enforce strict avatar segregation.

4. Ignoring Platform Specs

TikTok's algorithm favors videos under 15 seconds. Meta feed ads perform better at 20-30 seconds, and YouTube tolerates 45-60 seconds. Generate a separate edit for each platform.
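Before rendering platform edits, it helps to check each script's estimated spoken length against these targets. A minimal sketch, assuming a speaking rate of about 2.5 words per second (150 words per minute); `PLATFORM_WINDOWS_S` encodes the durations above, with a lower bound of 0 where none is given.

```python
# Target spoken length in seconds per platform: (min, max).
PLATFORM_WINDOWS_S = {"tiktok": (0, 15), "meta_feed": (20, 30), "youtube": (45, 60)}

def estimate_seconds(script, words_per_second=2.5):
    return len(script.split()) / words_per_second

def platforms_that_fit(script, words_per_second=2.5):
    secs = estimate_seconds(script, words_per_second)
    return [p for p, (lo, hi) in PLATFORM_WINDOWS_S.items() if lo <= secs <= hi]
```

The 45-word solar script from Step 1 comes out at about 18 seconds: too long for the TikTok target and short for Meta feed, so it would need both a tighter cut and a longer variant.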

5. No Captions

68% of video ads on Meta are watched without sound. CAST auto-generates captions (burned-in or SRT file) for every video. CTR increases by 23% on average when captions are present.
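Captions in SRT form are straightforward to generate once you have timed caption segments (for example from TTS timing data or a forced aligner; how CAST obtains timings isn't described). A minimal SRT writer sketch; the `(start_s, end_s, text)` tuples are an assumed input format. Each SRT cue is a sequence number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` line, the text, and a blank line.

```python
def srt_timestamp(seconds: float) -> str:
    total_ms = round(seconds * 1000)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"  # SRT uses a comma before ms

def to_srt(segments):
    """segments: iterable of (start_s, end_s, text) tuples."""
    blocks = []
    for i, (start, end, text) in enumerate(segments, start=1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    return "\n".join(blocks)  # blank line between cues
```

For burned-in captions, FFmpeg's `subtitles` filter (available in builds with libass) renders the file onto the video, e.g. `ffmpeg -i ad.mp4 -vf subtitles=captions.srt -c:a copy ad_captioned.mp4`.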

Technical Stack Breakdown

For engineers building similar systems, here's CAST's architecture:

Frontend:

  • Script input: Next.js form with real-time character count and tone analysis
  • Avatar selection: Grid view with filters (demographic, setting, lighting)
  • Preview player: React Video component with playback controls

Backend:

  • Voice generation: ElevenLabs API (text-to-speech endpoint)
  • Lip-sync: Python microservice running Wav2Lip on a GPU instance (NVIDIA T4)
  • Video processing: FFmpeg for format conversion, compression, caption burning
  • Storage: Cloudflare R2 (cheap egress, fast CDN delivery)

Performance optimizations:

  • Avatar base videos pre-rendered and cached (reduces generation time by 40%)
  • Batch processing: queue up to 10 videos, render concurrently
  • Lazy loading: preview low-res version while high-res renders in background

Average end-to-end generation time: 4.2 minutes per video (including queue wait time during peak hours).
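The "queue up to 10 videos, render concurrently" behavior maps naturally onto a semaphore-bounded worker pattern. A minimal asyncio sketch; `render_video` is a hypothetical stand-in for the call to the lip-sync microservice, and the limit of 10 comes from the list above.

```python
import asyncio

MAX_CONCURRENT_RENDERS = 10

async def render_video(job_id: str) -> str:
    # Hypothetical placeholder for the GPU lip-sync + FFmpeg pipeline call.
    await asyncio.sleep(0.01)
    return f"{job_id}.mp4"

async def render_batch(job_ids, limit=MAX_CONCURRENT_RENDERS, render=render_video):
    sem = asyncio.Semaphore(limit)

    async def run(job_id):
        async with sem:  # at most `limit` renders in flight
            return await render(job_id)

    # gather preserves input order in its results.
    return await asyncio.gather(*(run(j) for j in job_ids))
```

Jobs beyond the limit simply wait on the semaphore, so a burst of requests queues up instead of oversubscribing the GPU.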

Ethical Considerations

AI-generated testimonials blur the line between authentic UGC and synthetic marketing content. Our policy:

  1. Disclosure: CAST videos carry a small "Illustrative testimonial" label, in a corner of the frame or in the ad description, wherever platform policy requires it.
  2. No impersonation: We never clone real users' voices or likenesses without written consent.
  3. Truthful claims: Scripts must reflect realistic outcomes from actual user data (we don't fabricate success metrics).
  4. Platform compliance: Meta, TikTok, and Google all allow AI-generated creative with proper labeling. We follow each platform's disclosure requirements.

Some advertisers in regulated industries (finance, healthcare) avoid AI UGC entirely due to compliance risk. Know your industry rules before deploying.

ROI Calculation

Let's compare cost models for a lead generation business running continuous campaigns:

Scenario: 4 brands, 3 markets each, refreshing creative every 2 weeks.

Traditional UGC approach:

  • Hire 12 users per quarter (4 brands × 3 markets)
  • Pay €300 per testimonial video
  • Editing/localization: €150 per video
  • Total quarterly cost: €5,400
  • Creative output: 12 videos

CAST approach:

  • CAST subscription: €149/month (part of GENESIS suite)
  • ElevenLabs API: ~€0.30 per voice generation
  • GPU compute: €0.15 per video render
  • Total quarterly cost: €447 + (€0.45 × number of videos)
  • Creative output: Unlimited (realistic: 100+ videos/quarter)

Break-even point: about one video per quarter. At one video, CAST costs €447.45 against €450 for a traditional testimonial, and every additional video costs €0.45 instead of €450.
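The cost model can be checked in a few lines. A sketch working in euro cents to avoid rounding error; all prices come from the two lists above, and the model ignores the time cost of writing scripts and reviewing output.

```python
EUR = 100  # cents per euro

TRADITIONAL_PER_VIDEO = (300 + 150) * EUR  # testimonial fee + editing/localization
CAST_FIXED_PER_QUARTER = 3 * 149 * EUR     # subscription, three months
CAST_PER_VIDEO = 30 + 15                   # ElevenLabs (€0.30) + GPU render (€0.15)

def quarterly_cost_cents(videos: int) -> tuple[int, int]:
    traditional = videos * TRADITIONAL_PER_VIDEO
    cast = CAST_FIXED_PER_QUARTER + videos * CAST_PER_VIDEO
    return traditional, cast

def break_even_videos() -> int:
    # Smallest n with fixed + n * per_video <= n * traditional (integer ceiling).
    margin = TRADITIONAL_PER_VIDEO - CAST_PER_VIDEO
    return -(-CAST_FIXED_PER_QUARTER // margin)
```

At the scenario's 12 videos per quarter this gives €5,400 for the traditional approach against €452.40 for CAST.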

For agencies managing multiple clients, CAST pays for itself in the first week.

What's Next for AI UGC

Current limitations we're working to solve:

  1. Interactive elements: Can't yet generate videos where avatar responds to viewer input (branching testimonials, choose-your-own-journey).
  2. Emotion range: Current avatars handle neutral-to-positive testimonials well. Negative-to-positive transformation stories (common in weight loss, addiction recovery verticals) still look robotic.
  3. Multi-person scenes: CAST generates single-speaker testimonials. Real UGC often features couples or families discussing decisions together. Multi-avatar rendering is in beta.
  4. Real-time generation: Current 4-minute render time is too slow for dynamic creative optimization (DCO) use cases. We're targeting <30 seconds for DCO integration.

If you're launching AI-powered ad creative across multiple markets, CAST pairs with PRISM for complete creative automation. Read our performance comparison of AI vs human-made ads for the full dataset.

Get Started with CAST

CAST is available inside GENESIS at getgenesis.app/cast. Free plan includes 10 video generations per month. Pro plan ($149/month) includes unlimited generations, custom avatar uploads, and priority rendering.

Try CAST Free →
