The doc analyzer is VidCraft's entry point for doc-to-video pipelines. It parses PDF/DOCX/Markdown, extracts structure, suggests video types, and produces a scene sketch — without you assembling the script by hand.
Rule of thumb: if you have a plugin README, manual, or spec, always start with doc-analyzer. As a rough estimate, the skill saves about 80% of the research time.
| Use Case | Recommendation |
|---|---|
| Plugin tutorial from README | Direct — typical complexity 3-5 episodes |
| Software manual to training series | Required — break the manual down or it gets bloated |
| Whitepaper to explainer | Helpful — extract top statements |
| Spec to onboarding | Helpful — find user flows |
| Recycle webinar transcript | Direct — re-cut mode |
Decision outcomes (diagram not recoverable): "project-conceptualizer is better", "brief-creator is enough", "doc-analyzer".

`/vidcraft:doc-analyzer <file-path> [video-type]`
Examples:
/vidcraft:doc-analyzer ~/projekte/oxid-gallery/README.md tutorial
/vidcraft:doc-analyzer ~/Downloads/synthesia-spec.pdf training
/vidcraft:doc-analyzer ~/docs/altcha-howto.docx
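
The examples above pass a file path and an optional video type. A minimal sketch of how such an argument string could be validated before analysis is shown here. `parse_invocation`, `SUPPORTED_SUFFIXES`, and `KNOWN_VIDEO_TYPES` are hypothetical names, and the accepted values are only those that appear in this document, not a confirmed list from VidCraft.

```python
import os.path
from pathlib import Path
from typing import NamedTuple, Optional

# Assumption: formats taken from the supported-formats table in this document.
SUPPORTED_SUFFIXES = {".pdf", ".docx", ".md"}
# Assumption: only the video types that appear in this document's examples.
KNOWN_VIDEO_TYPES = {"tutorial", "training", "explainer", "how-to"}


class Invocation(NamedTuple):
    file_path: Path
    video_type: Optional[str]


def parse_invocation(args: str) -> Invocation:
    """Split '<file-path> [video-type]' and validate both parts."""
    parts = args.split()
    if not 1 <= len(parts) <= 2:
        raise ValueError("expected: <file-path> [video-type]")
    # Expand a leading '~' to the home directory when it can be resolved.
    path = Path(os.path.expanduser(parts[0]))
    if path.suffix.lower() not in SUPPORTED_SUFFIXES:
        raise ValueError(f"unsupported format: {path.suffix or '(none)'}")
    video_type = parts[1] if len(parts) == 2 else None
    if video_type is not None and video_type not in KNOWN_VIDEO_TYPES:
        raise ValueError(f"unknown video type: {video_type}")
    return Invocation(path, video_type)
```

This sketch splits on whitespace, so paths containing spaces would need quoting logic that is left out here.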
research/doc-analysis.md:
# Document Analysis: oxid-gallery README.md
## Stats
- Words: 3,520
- Sections: 12
- Code blocks: 18
- Lists: 7
- Images: 3
## Structure
1. Introduction (380 words)
2. Prerequisites (210 words)
3. Installation (940 words, 6 code blocks)
4. Configuration (620 words, 4 code blocks)
5. Theming (310 words)
6. Troubleshooting (480 words)
7. FAQ (320 words)
8. License (260 words)
## Complexity Assessment
- Readability: Flesch 58 (medium-complex)
- Tech depth: medium (Composer, OXID modules)
- Visual: 3 screenshots, all in "Theming"
- Audience: shop admins with OXID experience
## Recommendation
- **Video type:** tutorial series
- **Episodes:** 3
  - Ep 01: Installation (sections 1-3)
  - Ep 02: Configuration (section 4)
  - Ep 03: Theming + troubleshooting (sections 5-6)
- **Estimated duration:** 8-12 min total
- **Audience match:** mid-level shop admin
analyze_document

Parses the file and extracts structure.
Input:
analyze_document(file_path: str)
Output:
{
"stats": {
"words": 3520,
"sections": 12,
"code_blocks": 18,
"lists": 7,
"images": 3
},
"structure": [
{"level": 1, "title": "Introduction", "words": 380},
{"level": 1, "title": "Prerequisites", "words": 210},
...
],
"code_languages": ["bash", "php", "yaml"],
"language_detected": "en"
}
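
To make the `stats` fields more concrete, here is a minimal sketch of how comparable counts could be derived from a Markdown file using only the standard library. It is not VidCraft's implementation: `markdown_stats` and its regular expressions are hypothetical, and the counting rules (a list ends at the first non-item line, code is excluded from the word count) are assumptions.

```python
import re

# `{3} avoids writing a literal fence inside this example.
FENCE = re.compile(r"^\s*(`{3}|~{3})\s*(\w+)?")
HEADING = re.compile(r"^#{1,6}\s+\S")
LIST_ITEM = re.compile(r"^\s*(?:[-*+]|\d+[.)])\s+")
IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
WORD = re.compile(r"[A-Za-z0-9][\w'-]*")


def markdown_stats(text: str) -> dict:
    """Count words, headings, fenced code blocks, lists, and images."""
    stats = {"words": 0, "sections": 0, "code_blocks": 0, "lists": 0, "images": 0}
    languages = []
    in_code = False
    in_list = False
    for line in text.splitlines():
        fence = FENCE.match(line)
        if fence:
            if not in_code:  # opening fence: count the block and its language
                stats["code_blocks"] += 1
                lang = fence.group(2)
                if lang and lang not in languages:
                    languages.append(lang)
            in_code = not in_code
            in_list = False
            continue
        if in_code:
            continue  # code lines are not counted as prose
        if HEADING.match(line):
            stats["sections"] += 1
        is_item = bool(LIST_ITEM.match(line))
        if is_item and not in_list:
            stats["lists"] += 1  # first item of a new list
        in_list = is_item  # assumption: any non-item line ends the list
        stats["images"] += len(IMAGE.findall(line))
        stats["words"] += len(WORD.findall(line))
    return {"stats": stats, "code_languages": languages}
```

The returned dictionary mirrors only part of the output shown above; `structure` and `language_detected` are left out of this sketch.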
extract_key_points

Extracts the most important statements.
Input:
extract_key_points(file_path: str, max_points: int = 10)
Output:
{
"key_points": [
{
"rank": 1,
"text": "OXID Gallery 1.5+ requires PHP 8.1 and OXID 7.0",
"section": "Prerequisites",
"type": "constraint"
},
{
"rank": 2,
"text": "Composer install without plugin activation does not auto-activate",
"section": "Installation",
"type": "warning"
}
]
}
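
One way to use this output downstream is to group the statements by `type`, so that warnings and constraints can be turned into on-screen callouts. The following is a small sketch under that assumption; `group_key_points` is a hypothetical helper, and only the fields shown in the output above are used.

```python
from collections import defaultdict


def group_key_points(result: dict) -> dict:
    """Group key point texts by their 'type', keeping rank order."""
    grouped = defaultdict(list)
    for point in sorted(result["key_points"], key=lambda p: p["rank"]):
        grouped[point["type"]].append(point["text"])
    return dict(grouped)


# Sample data copied from the output above.
sample = {
    "key_points": [
        {"rank": 2,
         "text": "Composer install without plugin activation does not auto-activate",
         "section": "Installation", "type": "warning"},
        {"rank": 1,
         "text": "OXID Gallery 1.5+ requires PHP 8.1 and OXID 7.0",
         "section": "Prerequisites", "type": "constraint"},
    ]
}

for kind, texts in group_key_points(sample).items():
    print(kind, len(texts))
```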
suggest_video_structure

Suggests a scene sketch.
Input:
suggest_video_structure(file_path: str, video_type: str = "tutorial")
Output (excerpt):
{
"video_type": "tutorial",
"episodes": [
{
"title": "Installation",
"duration_estimate": "3-4 min",
"scenes": [
{"id": "01-hook", "narration_seed": "OXID Gallery installs in...", "duration": "5s"},
{"id": "02-prereqs", "narration_seed": "You'll need PHP 8.1...", "duration": "20s"},
{"id": "03-composer", "narration_seed": "First step: composer require...", "duration": "60s"},
...
]
}
]
}
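
The duration fields are strings such as `"5s"` or `"3-4 min"`. To compare the sum of scene durations against an episode estimate, they have to be converted to seconds first. A minimal sketch, assuming only the two units and the `low-high` range form seen in this document; `parse_duration` and `scene_total_seconds` are hypothetical helpers.

```python
import re
from typing import Tuple

_DURATION = re.compile(r"^\s*(\d+)(?:\s*-\s*(\d+))?\s*(s|min)\s*$")


def parse_duration(value: str) -> Tuple[int, int]:
    """Return (min_seconds, max_seconds) for strings like '5s' or '3-4 min'."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) else low
    factor = 60 if match.group(3) == "min" else 1
    return low * factor, high * factor


def scene_total_seconds(episode: dict) -> int:
    """Sum the lower bound of every scene duration in one episode."""
    return sum(parse_duration(scene["duration"])[0] for scene in episode["scenes"])


# The three scenes listed in the excerpt above (the elided scenes are not included).
episode = {
    "title": "Installation",
    "duration_estimate": "3-4 min",
    "scenes": [
        {"id": "01-hook", "duration": "5s"},
        {"id": "02-prereqs", "duration": "20s"},
        {"id": "03-composer", "duration": "60s"},
    ],
}
print(scene_total_seconds(episode), parse_duration(episode["duration_estimate"]))
```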
analyze_complexity

Assesses complexity for video type recommendation.
Input:
analyze_complexity(file_path: str)
Output:
{
"readability": 58,
"tech_depth": "medium",
"visual_density": "low",
"audience_level": "mid-admin",
"recommended_types": ["tutorial", "how-to", "training"],
"primary_recommendation": "tutorial"
}
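
The `readability` value is reported elsewhere in this document as a Flesch score. For orientation, here is a sketch of the standard Flesch Reading Ease formula for English text. How the analyzer actually counts sentences and syllables is not documented here, so the vowel-group syllable heuristic and the helper names are assumptions.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula (English)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def count_syllables(word: str) -> int:
    """Rough heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def score_text(text: str) -> float:
    """Score plain text; raises ValueError if it has no words or sentences."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text has no sentences or words")
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)
```

Higher scores mean easier text. Because the syllable count is heuristic, results from this sketch will not match a dictionary-based implementation exactly.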
suggest_video_topics

Suggests multiple video topics (for multi-episode plans).
Input:
suggest_video_topics(file_path: str, max_topics: int = 5)
Output:
{
"topics": [
{"title": "Install OXID Gallery", "type": "tutorial", "duration": "3-4 min"},
{"title": "Theming the Gallery", "type": "how-to", "duration": "4-5 min"},
{"title": "Performance tuning", "type": "training", "duration": "8-10 min"},
{"title": "Lightbox configuration", "type": "tutorial", "duration": "2-3 min"},
{"title": "OXID Gallery vs. native image component", "type": "explainer", "duration": "60-90s"}
]
}
/vidcraft:doc-analyzer ~/projekte/oxid-gallery/README.md
→ Recommendation: tutorial series, 3 episodes
/vidcraft:new-project "OXID Gallery Tutorial" tutorial
→ Project structure created
/vidcraft:project-conceptualizer oxid-gallery-tutorial
→ Derive concept from doc-analysis.md
/vidcraft:brief-creator oxid-gallery-tutorial
→ Brief with audience + goals
(Per episode:)
/vidcraft:script-writer oxid-gallery-tutorial 01-installation
/vidcraft:script-reviewer oxid-gallery-tutorial 01-installation
...
/vidcraft:doc-analyzer ~/Documents/altcha-whitepaper.pdf
→ Recommendation: explainer (60-120s)
/vidcraft:new-project "What is ALTCHA?" explainer
/vidcraft:project-conceptualizer altcha-explainer
→ Top-3 key statements from extract_key_points
/vidcraft:script-writer altcha-explainer 01-explanation
/vidcraft:doc-analyzer ~/docs/altcha-spec.pdf
/vidcraft:doc-analyzer ~/docs/altcha-integration.md
/vidcraft:doc-analyzer ~/docs/altcha-faq.docx
→ 3 separate doc-analysis.md files
/vidcraft:new-project "ALTCHA Training" training
/vidcraft:project-conceptualizer altcha-training
→ Aggregates the 3 analyses, suggests a 5-episode plan
/vidcraft:script-writer altcha-training 01-what-is-altcha
/vidcraft:script-writer altcha-training 02-installation
...
| Format | Library | What is extracted |
|---|---|---|
| PDF | pdfplumber | Text, headings, tables (best-effort) |
| DOCX | python-docx | Text, headings, lists, tables |
| Markdown (.md) | native | Text, headings, code blocks, lists, frontmatter |
| HTML | BeautifulSoup (planned) | — |
| EPUB | ebooklib (planned) | — |
PDFs without a text layer (scanned) deliver empty results. OCR is not built in.
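
A scanned PDF can be detected before analysis by checking whether any page yields extractable text. The sketch below uses pdfplumber's `open` and `Page.extract_text`; the helper names and the `min_chars` threshold are assumptions, not part of VidCraft.

```python
from typing import Iterable, Optional


def has_text_layer(page_texts: Iterable[Optional[str]], min_chars: int = 20) -> bool:
    """True if the pages together contain at least `min_chars` of text."""
    total = sum(len((text or "").strip()) for text in page_texts)
    return total >= min_chars


def pdf_has_text_layer(path: str) -> bool:
    """Check a PDF file; requires the third-party pdfplumber package."""
    import pdfplumber  # imported lazily so has_text_layer works without it

    with pdfplumber.open(path) as pdf:
        # The generator is consumed inside the with-block, while the file is open.
        return has_text_layer(page.extract_text() for page in pdf.pages)
```

If `pdf_has_text_layer` returns `False`, the file most likely needs OCR before it can be analyzed.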
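Doc analyzer is language-neutral, but language detection picks the dominant language, so content in other languages of a mixed-language document may be categorized sub-optimally.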
Table extraction from PDFs is unreliable: pdfplumber makes a best-effort attempt, but complex layouts (merged cells, multi-column) are lost. For important tables, verify the content manually against the source document.
Doc analyzer does not extract images. It only detects that they exist and reports their position. For screenshots: use screenshot-planner, not doc-analyzer.
Practical limit: ~50,000 words per doc. For larger documents the recommendation gets less precise — split into logical parts (e.g., chapter-wise).
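
Chapter-wise splitting of a large Markdown document can be sketched as follows: split before each level-1 heading, then pack consecutive chapters into parts that stay under a word budget. `split_by_chapter` is a hypothetical helper, and the default budget simply reuses the ~50,000-word figure from above.

```python
import re
from typing import List


def split_by_chapter(text: str, max_words: int = 50_000) -> List[str]:
    """Split Markdown at level-1 headings and pack chapters into parts."""
    # Zero-width split before every line that starts with '# '.
    chapters = [c for c in re.split(r"(?m)^(?=# )", text) if c.strip()]
    parts: List[str] = []
    current: List[str] = []
    count = 0
    for chapter in chapters:
        words = len(chapter.split())
        # Start a new part when adding this chapter would exceed the budget.
        if current and count + words > max_words:
            parts.append("".join(current))
            current, count = [], 0
        current.append(chapter)
        count += words
    if current:
        parts.append("".join(current))
    return parts
```

Two limitations of this sketch: a single chapter larger than the budget is kept whole, and a line starting with `# ` inside a code block (for example a shell comment) is mistaken for a heading.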
# Doc Analysis: oxid-gallery/README.md
> Analyzed: 2026-04-25 14:32 UTC
## Source Stats
- Path: ~/projekte/oxid-gallery/README.md
- Format: Markdown
- Size: 18 KB
- Words: 3,520
- Sections: 12 (max depth: 3)
- Code blocks: 18 (bash: 12, php: 4, yaml: 2)
- Lists: 7 (ordered: 4, unordered: 3)
- Images: 3 (all in section "Theming")
- Language: en (confidence 0.94)
## Section Breakdown
| # | Title | Level | Words | Has Code | Has List |
|---|-------|-------|-------|----------|----------|
| 1 | Introduction | 1 | 380 | no | no |
| 2 | Prerequisites | 1 | 210 | no | yes |
| 3 | Installation | 1 | 940 | yes (6) | yes |
| 4 | Configuration | 1 | 620 | yes (4) | yes |
| 5 | Theming | 1 | 310 | yes (2) | no |
| ... | ... | ... | ... | ... | ... |
## Key Points (Top 10)
1. [HIGH] OXID Gallery 1.5+ requires PHP 8.1 and OXID 7.0
2. [HIGH] Composer install without plugin activation does not auto-activate
3. [MID] Theme override via `bin/oe-console oe:theme:activate`
4. [MID] Lightbox lib is Fancybox 5 (LGPL — license context matters)
5. [LOW] Default image size: 1200x800px
6. ...
## Complexity Assessment
- Readability: Flesch 58 (medium-complex)
- Tech depth: medium (Composer, OXID modules)
- Visual: 3 screenshots — all in "Theming"
- Audience: mid-level shop admin
## Recommended Video Type
- Primary: tutorial-series (3 episodes)
- Alternative: training (single 12-min)
- Reason: code-heavy + step-based + admin audience
## Suggested Episode Plan
### Ep 01 — Installation (target: 3-4 min)
- Hook: "Stop messing with FTP-only plugins."
- Prereqs scene
- Composer install scene
- Activate plugin scene
- Verification scene
- CTA: "Ep 02: Configuration"
### Ep 02 — Configuration (target: 3-4 min)
- Hook (about config options)
- Theme settings scene
- Lightbox config scene
- ...
### Ep 03 — Theming + Troubleshooting (target: 4-5 min)
...
analyze_document fails with import error

Cause: pdfplumber or python-docx missing in venv.
Solution:
~/.vidcraft/venv/bin/pip install pdfplumber python-docx
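
To confirm which package is actually missing, a short check can be run with the Python interpreter of the same venv. This is a sketch, not a VidCraft command; `missing_packages` is a hypothetical helper. Note that python-docx is imported under the module name `docx`.

```python
from importlib.util import find_spec

# Maps the pip package name to the module name used in import statements.
REQUIRED = {"pdfplumber": "pdfplumber", "python-docx": "docx"}


def missing_packages(required: dict = REQUIRED) -> list:
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, module in required.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    print("missing:", ", ".join(missing) if missing else "none")
```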
Cause: scanned PDF without a text layer.
Solution: OCR first (e.g., via ocrmypdf):
ocrmypdf input.pdf output-ocr.pdf
/vidcraft:doc-analyzer output-ocr.pdf
Cause: doc covers multiple topics — recommendation gets fuzzy.
Solution: split the doc or explicitly pass a video type:
/vidcraft:doc-analyzer my-doc.md training
Cause: language detection picks the dominant language, so content in other languages is categorized sub-optimally.
Solution: split the doc by language and run two analyses.