Transcoding
Baander converts video files into streamable CMAF segments on the fly using FFmpeg. Segments are served via HLS v6 and DASH manifests, with per-segment adaptive bitrate selection across multiple quality tiers.
How It Works
When a client requests a video stream, Baander creates a transcode session and an associated transcode job for the requested quality tier. The encoding loop runs in a Swoole coroutine and dispatches FFmpeg work to isolated worker processes:
- Probe the source video (resolution, HDR, interlacing, framerate, audio channels).
- Encode the init segment (movie header for the chosen codec and bitrate).
- Analyze loudness (two-pass EBU R128 measurement) so the audio filter chain can normalize to the target loudness standard.
- Build filter chains -- video scaling, deinterlacing, HDR tonemapping, framerate capping, audio downmixing, loudness normalization, DRC.
- Encode media segments -- 6-second CMAF segments, dispatched to the CPU process pool with a sliding window of in-flight work.
- Serve manifests -- the client receives an HLS v6 media playlist or DASH MPD that references the encoded segments.
Segments are encoded with libx265 (HEVC) tagged as hvc1 for broad player compatibility. The pixel format is always yuv420p. Each segment is a fragmented MP4 with frag_keyframe+separate_moof+default_base_moof flags for independent seekability.
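As a rough illustration of how the encoder settings above map onto an FFmpeg command line, the following Python sketch builds the argument list for one 6-second segment. It is not Baander's actual command: the function name, its parameters, the input-seeking approach (-ss/-t before -i) and the scale filter are assumptions, and audio options are left out. Only the codec, tag, pixel format and movflags come from the description above.

```python
from typing import List

FFMPEG = "/usr/local/bin/ffmpeg"  # path named in the Configuration section
SEGMENT_SECONDS = 6               # segment length stated above


def build_segment_command(source: str, output: str, index: int, height: int,
                          bitrate_k: int, maxrate_k: int, bufsize_k: int) -> List[str]:
    """Hypothetical helper: FFmpeg argv for the video part of one segment."""
    start = index * SEGMENT_SECONDS  # segment N starts at N * 6 seconds
    return [
        FFMPEG, "-nostdin", "-y",
        "-ss", str(start), "-t", str(SEGMENT_SECONDS),  # assumed seeking strategy
        "-i", source,
        "-vf", f"scale=-2:{height}",                    # assumed scaling filter
        "-c:v", "libx265", "-tag:v", "hvc1", "-pix_fmt", "yuv420p",
        "-b:v", f"{bitrate_k}k",
        "-maxrate", f"{maxrate_k}k",
        "-bufsize", f"{bufsize_k}k",
        "-movflags", "+frag_keyframe+separate_moof+default_base_moof",
        "-f", "mp4", output,
    ]


if __name__ == "__main__":
    print(" ".join(build_segment_command("in.mkv", "seg_3.m4s", 3, 720, 2800, 4200, 5600)))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with source paths that contain spaces.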
Quality Tiers
The quality ladder defines six HEVC tiers. The client selects a tier at session creation time based on its capabilities and available bandwidth.
| Tier | Resolution | Video Bitrate | Max Bitrate | Buffer Size | Codec |
|---|---|---|---|---|---|
| 360p | 640 x 360 | 800 kbps | 1.2 Mbps | 1.6 Mbit | hvc1 |
| 480p | 854 x 480 | 1.4 Mbps | 2.1 Mbps | 2.8 Mbit | hvc1 |
| 720p | 1280 x 720 | 2.8 Mbps | 4.2 Mbps | 5.6 Mbit | hvc1 |
| 1080p | 1920 x 1080 | 5 Mbps | 7.5 Mbps | 10 Mbit | hvc1 |
| 1440p | 2560 x 1440 | 10 Mbps | 15 Mbps | 20 Mbit | hvc1 |
| 4K | 3840 x 2160 | 20 Mbps | 30 Mbps | 40 Mbit | hvc1 |
All tiers use the RFC 6381 codec string hvc1.1.6.L93.B0 with AAC audio (mp4a.40.2) in manifests.
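In the table, every maximum bitrate is 1.5 times the target bitrate and every buffer size is twice the target. The Python sketch below encodes the ladder that way and shows one possible client-side selection rule. The selection policy (highest tier whose maximum bitrate fits the measured bandwidth and whose height fits the display) and all names are assumptions for illustration; the source does not describe how clients actually choose.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    width: int
    height: int
    bitrate_kbps: int

    @property
    def maxrate_kbps(self) -> int:
        return self.bitrate_kbps * 3 // 2  # 1.5x target, as in the table

    @property
    def bufsize_kbit(self) -> int:
        return self.bitrate_kbps * 2       # 2x target, as in the table


# Ordered from lowest to highest quality.
LADDER: Tuple[Tier, ...] = (
    Tier("360p", 640, 360, 800),
    Tier("480p", 854, 480, 1400),
    Tier("720p", 1280, 720, 2800),
    Tier("1080p", 1920, 1080, 5000),
    Tier("1440p", 2560, 1440, 10000),
    Tier("4K", 3840, 2160, 20000),
)


def pick_tier(bandwidth_kbps: int, max_height: int) -> Tier:
    """Hypothetical policy: best tier that fits bandwidth and display height."""
    fitting = [t for t in LADDER
               if t.height <= max_height and t.maxrate_kbps <= bandwidth_kbps]
    # Fall back to the lowest tier when nothing fits.
    return fitting[-1] if fitting else LADDER[0]


if __name__ == "__main__":
    print(pick_tier(bandwidth_kbps=5000, max_height=1080).name)
```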
Audio Profiles
Each session is assigned an audio profile that controls codec, bitrate, channel layout, sample rate, loudness target, and dynamic range compression.
| Profile | Codec | Bitrate | Channels | Sample Rate | Loudness Standard | DRC |
|---|---|---|---|---|---|---|
| mobile_mono | AAC | 32 kbps | 1.0 (mono) | 44.1 kHz | Mobile (-14 LUFS) | On |
| mobile_stereo | AAC | 64 kbps | 2.0 (stereo) | 44.1 kHz | Mobile (-14 LUFS) | On |
| streaming_stereo | AAC | 128 kbps | 2.0 (stereo) | 48 kHz | Streaming (-16 LUFS) | Off |
| streaming_5.1 | AAC | 256 kbps | 5.1 (surround) | 48 kHz | Streaming (-16 LUFS) | Off |
| broadcast_stereo | AAC | 192 kbps | 2.0 (stereo) | 48 kHz | EBU R128 (-23 LUFS) | Off |
| broadcast_5.1 | AAC | 384 kbps | 5.1 (surround) | 48 kHz | EBU R128 (-23 LUFS) | Off |
| hifi_stereo | AAC | 256 kbps | 2.0 (stereo) | 48 kHz | Dialogue (-20 LUFS) | Off |
| opus_stereo | Opus | 96 kbps | 2.0 (stereo) | 48 kHz | Streaming (-16 LUFS) | Off |
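To show how a profile row could translate into FFmpeg audio options, here is a small Python sketch. The profile values are copied from the table; everything else is an assumption: the function and field names are hypothetical, loudnorm is shown in its simple single-pass form (without the measured values a two-pass run would feed in), and acompressor with default settings merely stands in for whatever DRC filter Baander uses.

```python
from typing import Dict, List, NamedTuple


class AudioProfile(NamedTuple):
    encoder: str        # FFmpeg encoder name ("aac" or "libopus")
    bitrate_kbps: int
    channels: int       # 1 = mono, 2 = stereo, 6 = 5.1
    sample_rate: int
    target_lufs: int
    drc: bool


PROFILES: Dict[str, AudioProfile] = {
    "mobile_mono":      AudioProfile("aac", 32, 1, 44100, -14, True),
    "mobile_stereo":    AudioProfile("aac", 64, 2, 44100, -14, True),
    "streaming_stereo": AudioProfile("aac", 128, 2, 48000, -16, False),
    "streaming_5.1":    AudioProfile("aac", 256, 6, 48000, -16, False),
    "broadcast_stereo": AudioProfile("aac", 192, 2, 48000, -23, False),
    "broadcast_5.1":    AudioProfile("aac", 384, 6, 48000, -23, False),
    "hifi_stereo":      AudioProfile("aac", 256, 2, 48000, -20, False),
    "opus_stereo":      AudioProfile("libopus", 96, 2, 48000, -16, False),
}


def audio_args(name: str) -> List[str]:
    """Hypothetical helper: FFmpeg audio arguments for a named profile."""
    p = PROFILES[name]
    filters = []
    if p.drc:
        filters.append("acompressor")            # stand-in DRC filter (assumption)
    filters.append(f"loudnorm=I={p.target_lufs}")  # integrated loudness target
    return [
        "-c:a", p.encoder,
        "-b:a", f"{p.bitrate_kbps}k",
        "-ac", str(p.channels),
        "-ar", str(p.sample_rate),
        "-af", ",".join(filters),
    ]


if __name__ == "__main__":
    print(audio_args("streaming_stereo"))
```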
CPU Process Pool
FFmpeg is invoked through PHP's proc_open(), which is not hooked by Swoole's coroutine runtime. If FFmpeg ran directly in an HTTP worker, it would block that worker for the entire duration of the encode. To avoid this, all FFmpeg work is dispatched to a CPU process pool -- a set of isolated worker processes that communicate with the main server over Unix sockets.
The pool is managed by CpuProcessPool and accessed through the domain-specific TranscodeProcessPool facade. Results are written to a shared Swoole\Table so the encoding coroutine can poll for completion without blocking.
Worker Count
The default pool size is 2 workers, configured in config/services.yaml:
    App\Shared\Infrastructure\Swoole\ProcessPool\CpuProcessPool:
        arguments:
            $handlers: !tagged_iterator 'swoole.cpu_pool_worker'
            $workerCount: 2
The encoding loop uses a sliding window of at most workerCount in-flight segments per job. All active transcode sessions share the same pool, so workerCount is an upper bound: a job gets fewer concurrent encodes when other jobs are competing for the same workers. Increase the worker count to match your CPU cores if you need more parallelism.
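The sliding-window idea can be sketched in Python as follows. This is an analogy, not Baander's PHP implementation: a thread pool stands in for the CPU process pool, and encode_with_window and its parameters are hypothetical names. The loop tops the window up to worker_count submissions, waits for at least one to finish, then refills.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from typing import Callable, Iterable, List


def encode_with_window(segments: Iterable[int],
                       encode: Callable[[int], int],
                       worker_count: int = 2) -> List[int]:
    """Run encode() over segments with at most worker_count in flight."""
    pending = iter(segments)
    in_flight = set()
    completed: List[int] = []
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        while True:
            # Top up the window until it is full or no work remains.
            while len(in_flight) < worker_count:
                nxt = next(pending, None)
                if nxt is None:
                    break
                in_flight.add(pool.submit(encode, nxt))
            if not in_flight:
                break  # nothing running and nothing left to dispatch
            # Block until at least one segment finishes, then refill.
            finished, in_flight = wait(in_flight, return_when=FIRST_COMPLETED)
            completed.extend(f.result() for f in finished)
    return completed


if __name__ == "__main__":
    print(sorted(encode_with_window(range(5), lambda i: i, worker_count=2)))
```

Because only worker_count segments are ever submitted at once, the not-yet-dispatched work stays in a plain queue that can still be reordered or paused.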
Worker Process Details
Workers are plain PHP processes (no Symfony container) that receive a JSON payload, execute FFmpeg, and return the result. Each job type has its own timeout: 300 seconds for segment encoding, 120 seconds for the init segment, and 600 seconds for loudness analysis. A process that exceeds its timeout is killed with SIGKILL.
Job Types
The pool handles three job types:
| Job Type | Description | Timeout |
|---|---|---|
| encode_segment | Encode a single 6-second media segment with video and audio filters | 300s |
| encode_init_segment | Encode the init segment (movie header, no audio) | 120s |
| analyze_loudness | Run EBU R128 loudness analysis pass on the source audio | 600s |
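A worker's receive, execute, return cycle with the per-job timeouts from the table might look like the Python sketch below. The real workers are PHP; the payload fields ("type", "argv") and the result shape are hypothetical. On POSIX systems, subprocess.run kills the child with SIGKILL when the timeout expires, which mirrors the behaviour described above.

```python
import json
import subprocess
import sys
from typing import Dict, Mapping

# Timeouts in seconds, taken from the job type table.
TIMEOUTS: Dict[str, float] = {
    "encode_segment": 300,
    "encode_init_segment": 120,
    "analyze_loudness": 600,
}


def handle_job(payload_json: str, timeouts: Mapping[str, float] = TIMEOUTS) -> dict:
    """Hypothetical worker body: run one job and report the outcome."""
    job = json.loads(payload_json)          # assumed fields: "type", "argv"
    timeout = timeouts[job["type"]]
    try:
        proc = subprocess.run(job["argv"], capture_output=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return {"ok": False, "error": "timeout"}
    return {"ok": proc.returncode == 0, "exit_code": proc.returncode}


if __name__ == "__main__":
    # A trivial command stands in for FFmpeg so the sketch runs anywhere.
    payload = json.dumps({"type": "encode_segment",
                          "argv": [sys.executable, "-c", "pass"]})
    print(handle_job(payload))
```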
Job Monitoring and State
Segment-by-Segment Tracking
Each transcode job tracks progress per segment. When a segment completes, the job records the segment index, output file path, file size, and duration. This data is persisted to the database and used by the manifest generator to build accurate playlists.
State transitions for a transcode job: pending -> in_progress -> completed (or failed / cancelled).
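One way to enforce these transitions is a small lookup table, sketched here in Python. The sketch reads the line above strictly: failed and cancelled are treated as alternatives to completed, reachable only from in_progress. Whether a pending job can be cancelled directly is not stated, so that rule, along with the JobState and transition names, is an assumption.

```python
from enum import Enum


class JobState(str, Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Allowed next states; terminal states map to an empty set.
ALLOWED = {
    JobState.PENDING: {JobState.IN_PROGRESS},
    JobState.IN_PROGRESS: {JobState.COMPLETED, JobState.FAILED, JobState.CANCELLED},
    JobState.COMPLETED: set(),
    JobState.FAILED: set(),
    JobState.CANCELLED: set(),
}


def transition(current: JobState, target: JobState) -> JobState:
    """Return target if the move is allowed, otherwise raise ValueError."""
    if target not in ALLOWED[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


if __name__ == "__main__":
    state = transition(JobState.PENDING, JobState.IN_PROGRESS)
    state = transition(state, JobState.COMPLETED)
    print(state.value)
```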
Seek-Aware Queue
The encoding loop listens for playback events (seeks and pauses) via the SeekSignalBroker. When a client seeks:
- In-flight segments always finish -- a seek never kills a worker.
- The remaining pending queue is reorganized so segments closest to the seek target are encoded first.
- This ensures the client gets watchable content around the new position as quickly as possible.
On pause, dispatching stops and the loop waits. In-flight segments are allowed to finish before the loop goes idle.
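The reordering step can be illustrated with a short Python sketch. It assumes the seek target maps to a segment index by dividing the position by the 6-second segment length, and that ties between equally distant segments are broken in favour of the one at or after the target (forward playback). Both the tie-break and the function name are assumptions, not documented behaviour.

```python
from typing import Iterable, List


def reorder_for_seek(pending: Iterable[int], seek_seconds: float,
                     segment_seconds: int = 6) -> List[int]:
    """Sort pending segment indices so those nearest the seek target come first."""
    target = int(seek_seconds // segment_seconds)
    # Key: distance to target, then prefer segments at/after the target, then index.
    return sorted(pending, key=lambda i: (abs(i - target), i < target, i))


if __name__ == "__main__":
    # Segments 3 and 4 are assumed to be already encoded or in flight.
    print(reorder_for_seek([0, 1, 2, 5, 6, 7, 8, 9], seek_seconds=40))
    # prints [6, 7, 5, 8, 9, 2, 1, 0]
```

Only the not-yet-dispatched queue is passed in, which matches the rule that in-flight segments are left alone.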
Graceful Restarts
When the server receives a shutdown signal, the GracefulRestartHandler persists all active job state to disk (var/transcode_state/<job-public-id>.json). The state file includes the list of completed segments, the current segment index, and the quality tier.
On restart, the handler scans for persisted state files, verifies that previously completed segments still exist on disk, and resumes encoding from the next unencoded segment. This means a server restart does not lose transcoding progress.
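The resume check could work roughly like this Python sketch. The JSON field names (completed_segments, index, path) are hypothetical, since the source lists what the state file contains but not its schema. The sketch also assumes "next unencoded segment" means the lowest index whose output file is not verified on disk.

```python
import json
from pathlib import Path


def next_segment_to_encode(state_file: Path) -> int:
    """Return the lowest segment index without a verified file on disk."""
    state = json.loads(state_file.read_text())
    # Keep only completed segments whose output file still exists.
    verified = {
        seg["index"]
        for seg in state["completed_segments"]
        if Path(seg["path"]).is_file()
    }
    index = 0
    while index in verified:
        index += 1
    return index


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "seg0.m4s").write_bytes(b"x")
        state = {
            "quality_tier": "720p",
            "current_segment_index": 2,
            "completed_segments": [
                {"index": 0, "path": str(root / "seg0.m4s")},
                {"index": 1, "path": str(root / "gone.m4s")},  # file missing
            ],
        }
        state_path = root / "job.json"
        state_path.write_text(json.dumps(state))
        print(next_segment_to_encode(state_path))
```

A segment recorded as completed but missing on disk is treated as unencoded, so it is produced again rather than leaving a hole in the playlist.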
See Monitoring for checking job status and pool health.
Hardware Acceleration
Hardware acceleration is not currently implemented. FFmpeg uses libx265 software encoding exclusively.
The docker-compose.yml file contains commented-out configuration for both NVIDIA (NVENC/NVDEC) and Intel (QSV/VAAPI) GPU passthrough. Enabling this in the future would require:
- Installing the NVIDIA Container Toolkit on the host and uncommenting the GPU device reservation.
- Passing /dev/dri into the container for Intel iGPU access.
- Updating TranscodePoolWorker to select a hardware encoder when available.
Configuration
Transcoding does not have dedicated environment variables. The relevant configuration is:
- CPU process pool worker count -- set $workerCount in config/services.yaml (default: 2). See Configuration for general server settings.
- State directory -- persisted job state is written to var/transcode_state/ inside the container. Ensure this directory is on a persistent volume if you deploy with ephemeral containers.
- FFmpeg path -- hardcoded to /usr/local/bin/ffmpeg in TranscodePoolWorker.
- Source media path -- mounted read-only into the container (see docker-compose.yml volumes).