VidCraft has five quality gates that operate at different stages:
| Gate | Stage | Skill |
|---|---|---|
| Script Review | Pre-generation | script-reviewer |
| Voice Check | Pre-generation | voice-checker |
| Pre-Generation Gates | Pre-generation | pre-generation-check |
| Video Review | Post-generation | video-reviewer |
| Brand + A11y | Post-generation | brand-checker + accessibility-checker |
The quality gates are not cosmetic. Skip them and you will generate video content that sounds like AI slop, and you'll end up producing it twice.
The skill /vidcraft:script-reviewer checks a script against 15 points. Step 5 of the review runs check_pronunciation to flag TTS-unfriendly numbers and acronyms.
| # | Point | Check |
|---|---|---|
| 1 | Timing | WPM calculation against target duration (e.g., tutorial 120-140 WPM) |
| 2 | Readability | Flesch score >= 60 |
| 3 | Structure | Type-conform scene sequence (hook → ... → CTA) |
| 4 | Hook quality | First 5 seconds: no "In this video..." |
| 5 | CTA | Present, clear, single (not 3 options) |
| 6 | Tone | Consistent, matches brand + audience |
| 7 | No AI language | No "journey", "tapestry", "embrace", hedging |
| 8 | Sentence length | Max 20 words per sentence (for narration clarity) |
| 9 | Scene duration | No scene > 60s (account for HeyGen splits) |
| 10 | Transitions | Defined (cut/fade/match-cut/whip) |
| 11 | Visual cues | Present per scene ([zoom], [highlight], etc.) |
| 12 | On-screen text | Max 7 words per overlay |
| 13 | No jargon | No jargon without explanation; check against the audience's knowledge level |
| 14 | Summary | Present for tutorials/trainings |
| 15 | Expression range (Synthesia) | No >60s stretch with zero ! / ? on Expressive Avatar scripts |
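
Points 1 and 8 are simple arithmetic on the script text. The following is a minimal sketch of how such checks could be computed, not the skill's actual implementation; the function names `estimate_duration` and `long_sentences` are hypothetical, and the sentence splitter is deliberately naive.

```python
import re


def estimate_duration(word_count: int, wpm: int) -> str:
    """Return the estimated narration time as M:SS (point 1, Timing)."""
    total_seconds = int(word_count / wpm * 60)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def long_sentences(text: str, max_words: int = 20) -> list[str]:
    """Return sentences longer than max_words (point 8, Sentence length)."""
    # Naive split on whitespace that follows ., ! or ?
    # (abbreviations such as "e.g." will be split too).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if len(s.split()) > max_words]


if __name__ == "__main__":
    print(estimate_duration(380, 130))
    print(long_sentences("Install the module. Then clear the cache."))
```

With 380 words at 130 WPM the estimate comes out just under three minutes, which can then be compared against the target duration for the video type.
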
## Script Review: "OXID Gallery Tutorial - Episode 01"
### 15-Point Quality Check
- [x] 1. Timing - 380 words / 130 WPM = 2:55 (target: 3-15 min) ✓
- [x] 2. Readability - Flesch 68 (target: >= 60) ✓
- [x] 3. Structure - Hook → Prereqs → 4 steps → Verification → CTA ✓
- [WARN] 4. Hook - "In this tutorial I'll show you..." → AI-typical.
  Suggestion: "OXID Gallery installs in 8 minutes, if you have
  the right dependencies. Here they are."
- [x] 5. CTA - "Next video: configuration" → single ✓
- [x] 6. Tone - factual, patient (matches audience: shop admins) ✓
- [WARN] 7. AI language - Line 47: "in the realm of modern galleries"
  → abstract filler, replace with concrete
- [x] 8. Sentence length - Max 19 words, median 14 ✓
- [x] 9. Scene duration - Longest scene 47s ✓
- [WARN] 10. Transitions - Scene 3→4 not defined
- [x] 11. Visual cues - All scenes have [highlight]/[zoom]/[screencast] ✓
- [x] 12. On-screen text - Max 6 words per overlay ✓
- [WARN] 13. Jargon - "Composer dependencies" without explanation (L. 89)
- [x] 14. Summary - Present in scene 7 ✓
- [x] 15. Expression range - N/A (HeyGen script) ✓
### Pronunciation Check (P15 advisory)
- "2025" detected: suggest "twenty twenty-five" for TTS clarity
- "PHP" detected: suggest "P-H-P" or "PHP" with phonetic note
## Overall: 11 / 15 - Revise before Generation
Critical: none
Polish: 4 (hook), 7 (AI language), 10 (transition), 13 (jargon)
The skill /vidcraft:voice-checker scans narration for typical AI patterns. Patterns live in knowledge/ai-language-patterns.md. For Synthesia Expressive Avatar scripts, the checker also runs an expression range pass.
| Category | Examples |
|---|---|
| Abstract noun stacks | "tapestry of innovation", "landscape of solutions" |
| Hedging | "it's worth noting", "one might say" |
| Over-explained metaphors | "like a journey through..." followed by explanation |
| Cliché escalation | "game-changer", "next-level", "unleash" |
| Generic filler | "in today's fast-paced world", "in the digital age" |
| Imperative inflation | "let me show you", "let's dive into" |
| Expression flatline (Synthesia) | >60s of narration with no !, ?, :) on Expressive Avatar scripts |
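
A phrase scan of this kind can be approximated with a few regular expressions. The sketch below is illustrative only: `AI_PATTERNS` and `scan_narration` are hypothetical names, the pattern list is a small hand-picked subset of the table above, and the real patterns live in knowledge/ai-language-patterns.md in a format not shown here.

```python
import re

# Hypothetical subset of patterns, taken from the category table above.
# "let's dive in" is shortened so it also matches "Let's dive in...".
AI_PATTERNS = {
    "Hedging": [r"it's worth noting", r"one might say"],
    "Cliché escalation": [r"game-changer", r"next-level", r"unleash"],
    "Generic filler": [r"in today's fast-paced world", r"in the digital age"],
    "Imperative inflation": [r"let me show you", r"let's dive in"],
}


def scan_narration(text: str) -> list[tuple[int, str, str]]:
    """Return (line number, category, matched phrase) for each finding."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for category, patterns in AI_PATTERNS.items():
            for pattern in patterns:
                match = re.search(pattern, line, flags=re.IGNORECASE)
                if match:
                    findings.append((line_no, category, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = "First step: install.\nLet's dive in...\nA true game-changer for your shop"
    for line_no, category, phrase in scan_narration(sample):
        print(f"L. {line_no}: {phrase!r} ({category})")
```

The scan only reports findings; like the skill itself, it leaves the rewrite to the author.
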
Voice Check: oxid-gallery-tutorial / 01-installation
---
Findings: 3 (advisory only)
L. 47: "in the realm of modern galleries"
  → Abstract filler noun ("realm")
  → Suggestion: "modern gallery solutions" or be concrete
L. 89: "Let's dive in..."
  → Imperative inflation, opening cliché
  → Suggestion: "First step: ..."
L. 134: "a true game-changer for your shop"
  → Marketing cliché
  → Suggestion: be concrete about what it solves
Advisory only. The voice checker does not rewrite; it flags spots with recommendations. You decide.
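
The expression range pass for Synthesia Expressive Avatar scripts looks for stretches of more than 60 seconds without any `!` or `?`. One way to approximate that is to estimate each sentence's duration from its word count. This is a rough sketch under stated assumptions: the 130 WPM default, the sentence splitting, and both function names are hypothetical and not taken from the skill.

```python
import re


def longest_flat_stretch(text: str, wpm: int = 130) -> float:
    """Longest run (estimated seconds) of sentences containing no ! or ?."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    longest = current = 0.0
    for sentence in sentences:
        if re.search(r"[!?]", sentence):
            # An expressive sentence ends the flat stretch.
            current = 0.0
        else:
            current += len(sentence.split()) / wpm * 60
            longest = max(longest, current)
    return longest


def has_expression_flatline(text: str, wpm: int = 130,
                            limit_seconds: float = 60.0) -> bool:
    """True if any flat stretch exceeds the limit."""
    return longest_flat_stretch(text, wpm) > limit_seconds


if __name__ == "__main__":
    sentence = "one two three four five six seven eight nine ten."
    flat = " ".join([sentence] * 14)  # 140 words without ! or ?
    print(has_expression_flatline(flat))
```
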
/vidcraft:pre-generation-check runs all gates before generation.
| # | Gate | Type | Description |
|---|---|---|---|
| 1 | Script status | BLOCK | Script must be Script Reviewed |
| 2 | Scenes present | BLOCK | At least one scene with narration |
| 3 | Narration complete | BLOCK | All scenes must have narration |
| 4 | Visual direction | WARN | All scenes should have visual direction |
| 5 | Timing | INFO | Show estimated duration |
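
The three gate types behave differently: only a failed BLOCK gate stops generation, while WARN and INFO are reported and generation may proceed. A minimal sketch of that logic follows, assuming a hypothetical script structure (a dict with `status` and a list of `scenes`, each with `narration` and `visual_direction`); the real skill's data model is not documented here.

```python
from dataclasses import dataclass


@dataclass
class GateResult:
    number: int
    name: str
    severity: str  # "BLOCK", "WARN" or "INFO"
    passed: bool
    message: str = ""


def run_standard_gates(script: dict, wpm: int = 130) -> list[GateResult]:
    """Evaluate the five standard gates on a hypothetical script dict."""
    scenes = script.get("scenes", [])
    narrated = [s for s in scenes if s.get("narration", "").strip()]
    directed = [s for s in scenes if s.get("visual_direction", "").strip()]
    words = sum(len(s.get("narration", "").split()) for s in scenes)
    seconds = round(words / wpm * 60)
    return [
        GateResult(1, "Script status", "BLOCK",
                   script.get("status") == "Script Reviewed"),
        GateResult(2, "Scenes present", "BLOCK", len(narrated) >= 1),
        GateResult(3, "Narration complete", "BLOCK",
                   bool(scenes) and len(narrated) == len(scenes)),
        GateResult(4, "Visual direction", "WARN",
                   len(directed) == len(scenes)),
        GateResult(5, "Timing", "INFO", True,
                   f"Estimated duration {seconds // 60}:{seconds % 60:02d}"),
    ]


def is_blocked(results: list[GateResult]) -> bool:
    """Only a failed BLOCK gate prevents generation."""
    return any(r.severity == "BLOCK" and not r.passed for r in results)
```
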
In addition to the 5 standard gates, platform validation runs:
HeyGen:
| # | Gate | Type | Description |
|---|---|---|---|
| - | Char limit per scene | BLOCK | <= 5,000 (API hard limit) |
| - | No avatar switch within scene | BLOCK | Split if needed |
| - | One background per scene | BLOCK | Split if needed |
| 10 | SSML prosody tags | WARN | Warn if prosody tags present without a Custom Voice set |
| 11 | Undeclared variables | WARN | Warn on {{variable}} placeholders not declared |
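
Two of the HeyGen checks are easy to reproduce locally before running the skill: the per-scene character limit and the scan for `{{variable}}` placeholders that were never declared. The sketch below assumes scenes are plain strings and declared variables are a set of names; both function names are hypothetical.

```python
import re

HEYGEN_CHAR_LIMIT = 5000  # per-scene limit from the table above


def scenes_over_char_limit(scenes: list[str],
                           limit: int = HEYGEN_CHAR_LIMIT) -> list[int]:
    """Return 1-based scene numbers whose text exceeds the limit."""
    return [i for i, text in enumerate(scenes, start=1) if len(text) > limit]


def undeclared_variables(text: str, declared: set[str]) -> list[str]:
    """Return {{placeholder}} names used in text but not declared."""
    found = re.findall(r"\{\{\s*(\w+)\s*\}\}", text)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(found) if name not in declared]


if __name__ == "__main__":
    narration = "Welcome to {{company_name}}. Your plan: {{plan_name}}."
    print(undeclared_variables(narration, declared=set()))
```
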
Synthesia:
Pre-Generation Check: oxid-gallery-tutorial / 01-installation
---
✅ Gate 1: Script Status = "Script Reviewed"
✅ Gate 2: 7 scenes present
✅ Gate 3: Narration complete
⚠️ Gate 4: Scene 5 has no visual direction
ℹ️ Gate 5: Estimated duration 2:55 (target: 3-15 min)
❌ HeyGen Validation:
   Scene 03: Avatar switch inside scene
   → "Anna_Professional" → "Marcus_Casual"
   → Fix: split scene
⚠️ Gate 10: SSML prosody tags found but no Custom Voice set
⚠️ Gate 11: Undeclared variables: {{company_name}}, {{plan_name}}
🔴 BLOCKED - Please fix issues before generation.
/vidcraft:video-reviewer checks a generated video against 20 points.
| Category | Points | Focus |
|---|---|---|
| Pacing | 4 | Total duration, scene rhythm, pauses, hook timing |
| Visual | 4 | Avatar consistency, background quality, visual cues visible, transitions |
| Narration | 4 | Pronunciation, emphasis, tempo, clarity |
| Brand | 4 | Logo, lower-third, color palette, tonality |
| Accessibility | 4 | Subtitle readiness, reading pace, color contrast, inclusivity |
## Video Review: "OXID Gallery Tutorial - Episode 01"
### Pacing (4/4) ✓
- [x] Total duration 2:58 (target: 3-15 min, tight but okay)
- [x] Scene rhythm: 25-50s, well-varied
- [x] Pauses after each step present
- [x] Hook under 5s
### Visual (3/4)
- [x] Avatar consistency
- [WARN] Background "office.jpg" too pixelated in scene 03
- [x] Visual cues visible (zoom, highlights)
- [x] Transitions clean
### Narration (4/4) ✓
- [x] OXID pronounced correctly ("OH-XID", not "OX-ID")
- [x] Natural emphasis
- [x] Tempo matches WPM target
- [x] Clear
### Brand (3/4)
- [WARN] Logo missing in outro scene (scene 07)
- [x] Lower-third consistent
- [x] Color palette
- [x] Tonality
### Accessibility (4/4) ✓
- [x] Subtitles ready for Whisper generation
- [x] Reading pace comfortable
- [x] Contrast sufficient (WCAG AA)
- [x] Inclusive language
## Overall: 18 / 20 - Approved with minor revisions
Required: background scene 03 (re-render with better image),
add logo in scene 07
/vidcraft:brand-checker checks brand consistency, especially in multi-episode series.
Brand tone source: config.yaml > brand.tone
Brand Check: oxid-gallery-tutorial / 01-installation
---
3 violations found:
1. Tone: "That's super cool!" (L. 78)
   → Brand tone: "professional + friendly" (no "super cool")
   → Suggestion: "That's an elegant solution."
2. Terminology: "OXID Shop" (multiple)
   → Correct: "OXID eShop"
3. Logo: missing in closing scene
   → Brand guideline: logo in every outro
/vidcraft:accessibility-checker checks WCAG 2.1 compliance.
| Aspect | Check |
|---|---|
| Subtitle readiness | Narration is Whisper-friendly (no slang, clear pronunciation) |
| Reading pace | Subtitles readable at 200 WPM (max ~21 chars/second) |
| Color contrast | Text overlays >= 4.5:1 ratio (AA), >= 7:1 (AAA) |
| Inclusive language | Gendering, disability language, cultural sensitivity |
| Audio description | For purely visual processes β narration describes what happens |
| Jargon density | If > 5% technical terms, recommend a glossary |
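
The color contrast thresholds (4.5:1 for AA, 7:1 for AAA) refer to the WCAG contrast ratio, which is computed from the relative luminance of the two colors. The sketch below implements the WCAG 2.1 formula for hex colors; the function names are our own, and how the accessibility-checker obtains overlay and background colors is not covered here.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.1 definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rgb or #rrggbb color."""
    h = hex_color.lstrip("#")
    if len(h) == 3:
        h = "".join(ch * 2 for ch in h)  # expand shorthand, e.g. 999 -> 999999
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio, from 1.0 (identical) to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    for color in ("#999", "#555"):
        ratio = contrast_ratio(color, "#ffffff")
        print(f"{color} on white: {ratio:.2f}:1, AA pass: {ratio >= 4.5}")
```

The usage example assumes a white background; the ratio for a real overlay depends on whatever sits behind the text in that scene.
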
Accessibility Check: oxid-gallery-tutorial / 01-installation
---
WCAG 2.1 Level: AA (with 1 warning)
✅ Subtitle readiness - narration clear, Whisper-compatible
✅ Reading pace - avg 18 chars/sec (limit: 21)
⚠️ Color contrast - Scene 04 text overlay: 3.8:1 (AA requires 4.5:1)
   → Fix: change text color from #999 to #555
✅ Inclusive language - no problematic phrasing
✅ Audio description - visual steps are narrated
✅ Jargon density - 3.2% (below threshold)
Pre-generation:
1. /vidcraft:script-reviewer - 15-point check (incl. pronunciation advisory)
2. Apply fixes
3. /vidcraft:voice-checker - AI-tell scan (optional, but recommended)
4. /vidcraft:storyboard-creator + screenshot-planner + asset-collector
5. /vidcraft:pre-generation-check - final block gate
6. /vidcraft:heygen-engineer / synthesia-engineer
Post-generation:
1. /vidcraft:video-reviewer - 20-point check
2. /vidcraft:brand-checker - for multi-episode series
3. /vidcraft:accessibility-checker - mandatory for public sector / education
4. Apply fixes / re-generate
5. /vidcraft:subtitle-generator
Possible causes:
- voice-checker was not also run
- knowledge/ai-language-patterns.md is outdated (new AI patterns missing)

Solution:
/vidcraft:voice-checker my-project 01-foo
For German-specific (DE) patterns, open a PR against knowledge/ai-language-patterns.md that adds a DE section.
Most common cause: the status fields in the YAML frontmatter are outdated.
Check:
grep -A1 "status:" ~/video-projects/projects/MY-PROJECT/episodes/01-FOO/README.md
Solution: manually set status to Script Reviewed.
Cause: the video reviewer works on the storyboard and script, not on the rendered video itself. Manual visual inspection is always required as a complement.
Solution: the skill is a diagnostic tool, not a QA replacement. Always plan 5-10 min of visual inspection.