Architecture
Overview
The pipeline has two stages, joined by the filesystem:
build.py ──► scenes/NN_<id>.html ──┐
         ──► audio/NN_<id>.mp3   ──┤
         ──► index.html          ──┘
                                   └──► hyperframes render ──► final.mp4
Stage 1 — build.py (content + composition)
build.py owns everything before rendering:
- Source of truth: the SCENES list at the top of the file defines all content — kicker, title, bullets, narration text, and implicit references to image and audio assets (a sketch of one entry follows this list).
- TTS: for each scene, calls ElevenLabs via the REST API and writes audio/NN_<id>.mp3. Files are cached — they are not re-fetched if they already exist.
- Timing: probes each audio file with ffprobe, then computes per-scene durations and start offsets with configurable lead/trail padding and crossfade overlap.
- Scene blocks: emits scenes/NN_<id>.html — one self-contained HyperFrames sub-composition per scene. Each block has its own <style>, composition root, GSAP timeline, and Ken Burns image.
- Host: emits index.html — a 44-line host that references the blocks via data-composition-src and contains all <audio> elements with absolute timestamps.
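A minimal sketch of what this implies in code, assuming a requests-based client. Only the id field, the NN_<id> output path, and the caching behavior are stated above; the other SCENES keys, the environment variable name, and VOICE_ID are illustrative:

import os
from pathlib import Path

import requests

API_KEY = os.environ["ELEVENLABS_API_KEY"]    # illustrative env var name
VOICE_ID = "..."                              # placeholder ElevenLabs voice id

SCENES = [
    {
        "id": "intro",                        # confirmed: drives the NN_<id> prefix
        "kicker": "...",                      # illustrative field names
        "title": "...",
        "bullets": ["...", "..."],
        "narration": "Text sent to ElevenLabs for TTS.",
    },
    # ... six more scenes ...
]

def tts(i: int, scene: dict) -> Path:
    """Fetch narration audio once; later runs reuse the cached MP3."""
    out = Path("audio") / f"{i:02d}_{scene['id']}.mp3"
    if out.exists():                          # cache hit: never re-fetch
        return out
    r = requests.post(
        f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        headers={"xi-api-key": API_KEY},
        json={"text": scene["narration"]},
    )
    r.raise_for_status()
    out.write_bytes(r.content)                # response body is the MP3
    return out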
Stage 2 — hyperframes render (frame capture + encoding)
The HyperFrames CLI renders the composition deterministically:
- Compile: parses index.html, resolves sub-compositions and media, and builds a scene graph. Reports audioCount, videoCount, and total duration.
- Extract video frames: any <video> elements (none in this project) are decoded.
- Process audio: all <audio> elements in the host are mixed into a single track.
- Frame capture: Chrome/Puppeteer seeks the GSAP timeline to each frame time and screenshots the canvas. Four workers run in parallel by default (see the scheduling sketch after this list).
- Encode video: H.264 yuv420p @ 30fps via libx264.
- Assemble: mux video + audio into the final MP4.
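A rough model of the capture schedule, as a sketch (the function names are illustrative, not the HyperFrames API):

FPS = 30
WORKERS = 4

def frame_times(total_duration: float) -> list[float]:
    """Every timestamp the renderer must screenshot, at 30fps."""
    return [i / FPS for i in range(round(total_duration * FPS))]

def worker_chunks(times: list[float]) -> list[list[float]]:
    """Contiguous frame ranges, one per parallel capture worker."""
    size = -(-len(times) // WORKERS)          # ceiling division
    return [times[i:i + size] for i in range(0, len(times), size)]

# For this project's 182.572s composition: 5477 frames, at most 1370 per
# worker. Each worker seeks the GSAP timeline to t and screenshots; libx264
# then encodes the numbered frames as H.264 yuv420p.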
File tree
video/hf/
├── build.py # Stage 1 entrypoint (tracked in git)
├── index.html # Generated host composition
├── final.mp4 # Rendered output
│
├── scenes/ # Generated HyperFrames blocks (one per scene)
│ ├── 00_intro.html
│ ├── 01_llm.html
│ ├── 02_ai.html
│ ├── 03_mcpgw.html
│ ├── 04_mcpreg.html
│ ├── 05_skill.html
│ └── 06_outro.html
│
├── audio/ # ElevenLabs MP3s (generated, cached)
│ ├── 00_intro.mp3
│ └── ...
│
└── img/ # PNG image assets (tracked in git)
├── 00_intro.png
├── 01_llm.png
├── 02_ai.png
├── 03_mcpgw.png
├── 04_mcpreg.png
├── 05_skill.png
└── 06_outro.png
Naming convention
All assets share the same NN_<id> prefix, where NN is the zero-based scene index and <id> is the scene's id field from SCENES. This keeps audio, images, and scene blocks in sync without any mapping tables.
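In build.py terms the prefix is a single f-string (a sketch; the helper name is illustrative):

def asset_prefix(i: int, scene: dict) -> str:
    """00_intro, 01_llm, ...: shared by scenes/, audio/, and img/."""
    return f"{i:02d}_{scene['id']}"

# scenes/{prefix}.html, audio/{prefix}.mp3, and img/{prefix}.png all derive
# from the same value, so no mapping table is needed.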
HyperFrames composition structure
Host (index.html)
The host is intentionally minimal. It contains:
- One <div data-composition-src="..."> per scene block (the visual sub-compositions)
- Seven <audio> elements with absolute data-start times (scene start + lead pad)
- A #fade-out overlay clip for the closing fade
- A single GSAP timeline that only drives the final fade
<div id="root" data-composition-id="root" data-width="1920" data-height="1080"
data-start="0" data-duration="182.572">
<!-- Scene blocks -->
<div data-composition-id="scene-intro"
data-composition-src="scenes/00_intro.html"
data-start="0.000" data-duration="15.946"
data-track-index="1" data-width="1920" data-height="1080"></div>
<!-- ... more scenes ... -->
<!-- Audio — must live in the host, not inside sub-compositions -->
<audio id="aud-0" class="clip"
data-start="0.600" data-duration="14.446"
data-track-index="3" data-volume="1"
src="audio/00_intro.mp3"></audio>
<!-- ... -->
</div>
Audio must be in the host
HyperFrames only assembles audio declared directly in the root index.html. Audio elements inside sub-composition blocks (data-composition-src files) are rendered visually but their audio tracks are not extracted into the final MP4. Always declare <audio> in the host with absolute start times.
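In build.py this rule means the host emitter computes absolute starts itself. A sketch, assuming a helper like the following (emit_audio_tag is an illustrative name; the attribute values mirror the host example above):

PAD_LEAD = 0.6

def emit_audio_tag(i: int, prefix: str, scene_start: float,
                   audio_duration: float) -> str:
    """One host <audio> clip with an absolute timeline start."""
    start = scene_start + PAD_LEAD            # absolute, never block-relative
    return (
        f'<audio id="aud-{i}" class="clip" '
        f'data-start="{start:.3f}" data-duration="{audio_duration:.3f}" '
        f'data-track-index="3" data-volume="1" '
        f'src="audio/{prefix}.mp3"></audio>'
    )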
Scene blocks (scenes/NN_<id>.html)
Each scene block is a full standalone HTML file containing:
- A <div data-composition-id="scene-<id>"> root scoped to data-start="0" (time is relative within the block)
- All visual structure: accent bar, background fill, split-panel content, footer
- A dedicated GSAP timeline registered as window.__timelines["scene-<id>"]
- The Ken Burns effect on the image (#img-kb)
The HyperFrames runtime loads each block into an isolated context, finds its window.__timelines entry, and seeks it in sync with the host timeline, offset by data-start.
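A trimmed sketch of the block build.py could emit, as a str.format template. The composition root, the window.__timelines registration, and the #img-kb Ken Burns tween are confirmed above; the GSAP script path, tween values, and markup details are illustrative:

SCENE_TEMPLATE = """<!DOCTYPE html>
<html>
<head><style>/* per-scene styles: accent bar, panels, footer */</style></head>
<body>
<div data-composition-id="scene-{id}" data-start="0" data-duration="{dur:.3f}"
     data-width="1920" data-height="1080">
  <img id="img-kb" src="img/{prefix}.png">
  <!-- accent bar, background fill, split-panel content, footer -->
</div>
<script src="gsap.min.js"></script>
<script>
  // Register the block's timeline under its composition id so the
  // HyperFrames runtime can find it and seek it with the host.
  window.__timelines = window.__timelines || {{}};
  window.__timelines["scene-{id}"] = gsap.timeline({{ paused: true }})
    .fromTo("#img-kb", {{ scale: 1.0 }},
            {{ scale: 1.08, duration: {dur:.3f}, ease: "none" }});  // Ken Burns
</script>
</body>
</html>"""

# build.py would write SCENE_TEMPLATE.format(id=scene["id"], dur=duration,
# prefix=prefix) to scenes/{prefix}.html.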
Timing model
scene_duration = PAD_LEAD + audio_duration + PAD_TRAIL
               = 0.6s     + X seconds      + 0.9s

scene_start[0] = 0
scene_start[i] = scene_start[i-1] + scene_duration[i-1] - OVERLAP
                                                          └─ 0.4s crossfade

total_duration = scene_start[-1] + scene_duration[-1]
Within each scene block, audio starts at data-start="0.6" (= PAD_LEAD). In the host, the audio data-start is scene_start[i] + PAD_LEAD — the same moment expressed as an absolute timeline position.
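The same model in code, taken directly from the formulas above (the layout name is illustrative; the constants are the documented values):

PAD_LEAD, PAD_TRAIL, OVERLAP = 0.6, 0.9, 0.4

def layout(audio_durations: list[float]) -> tuple[list[float], list[float], float]:
    """Per-scene durations, absolute start offsets, and total length."""
    durations = [PAD_LEAD + a + PAD_TRAIL for a in audio_durations]
    starts = [0.0]
    for d in durations[:-1]:
        starts.append(starts[-1] + d - OVERLAP)   # each crossfade reclaims 0.4s
    total = starts[-1] + durations[-1]
    return durations, starts, total

# Host audio element i gets data-start = starts[i] + PAD_LEAD; inside the
# scene block the same moment is the relative data-start="0.6".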