PSM-V3 is live — multimodal generation with native audio support
Live API & Open Source

Multimodal Video Generation Model

A production-grade foundation model deployable via API, on-prem, or open weights.

Get API Key Open Source
Integrated with
ComfyUI · Hugging Face · Replicate · RunPod · NVIDIA NIM · fal.ai · Together AI · Solana

One Model. Every Modality.

Text, image, audio, and video inputs. Cinematic video output. Production-ready quality at any resolution.

Text to Video

Natural language prompts to cinematic video. Scene composition, camera movement, and lighting from text.

Scene-level coherence
Camera control
Multi-subject tracking

Image to Video

Animate any still image. Reference frames become living scenes with consistent style and motion.

Style preservation
Motion synthesis
Temporal consistency

Audio to Video

Audio-reactive generation. Music, voice, or ambient sound drives visual rhythm, cuts, and scene transitions.

Beat-synced motion
Voice-driven scenes
Ambient matching

Video to Video

Style transfer, upscaling, and re-rendering. Transform existing footage with new aesthetics while preserving structure.

Style transfer
Super resolution
Frame interpolation
Native Resolution: 4K
Frame Rate: 60fps
Max Duration: 120s
Latency (10s clip): <4s

Open Weights.
No Lock-in.

Full model weights on Hugging Face. Run locally, fine-tune on your data, deploy anywhere. Apache 2.0 license.

  • Hugging Face model hub with full documentation
  • ComfyUI nodes for visual workflow integration
  • Fine-tuning scripts and LoRA training support
  • Community-driven development on GitHub
View on GitHub
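A minimal sketch of pulling the open weights locally via `huggingface_hub`. The repo id `psm/psm-v3` is a placeholder, not a confirmed hub path; check the model card for the real one.

```python
def weights_dir(repo_id: str, revision: str = "main") -> str:
    """Build a predictable local directory name for a given checkout."""
    return f"./weights/{repo_id.replace('/', '--')}@{revision}"

# Fetch the snapshot (requires `pip install huggingface_hub`; the repo id
# below is a placeholder -- see the model card for the real one):
#
#   from huggingface_hub import snapshot_download
#   snapshot_download(repo_id="psm/psm-v3",
#                     local_dir=weights_dir("psm/psm-v3"))
```

From there the checkout can be loaded by the fine-tuning scripts or wired into a ComfyUI workflow.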
PSM-V3 Architecture
Multimodal Encoder: text / image / audio / video
Diffusion Transformer: 3.2B params
Temporal Attention: scene coherence
VAE Decoder: 4K / 60fps output
C2PA Signing: provenance
Apache 2.0 · Hugging Face · ONNX · TensorRT

Production API.
Three Lines of Code.

REST API with streaming output, webhook callbacks, and batch processing. Generate at scale without managing infrastructure.

  • Sub-second first-frame latency
  • Webhook callbacks for async workflows
  • Batch processing with priority queues
  • 99.9% uptime SLA
Get API Key
import psm

client = psm.Client("your-api-key")

video = client.generate(
    prompt=(
        "Aerial drone shot over a misty mountain range "
        "at golden hour, cinematic, 4K"
    ),
    duration="10s",
    resolution="4k",
)

video.save("output.mp4")
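For async workflows, a webhook handler boils down to parsing the callback payload and reacting to the job status. A minimal sketch, assuming a hypothetical payload shape — the field names (`status`, `output.video_url`, `error`) are illustrative, not the documented schema:

```python
import json


def handle_webhook(body: bytes) -> str:
    """Parse a generation-complete callback and return the video URL.

    The payload fields used here are assumptions about the callback
    schema, not confirmed API documentation.
    """
    event = json.loads(body)
    if event.get("status") == "succeeded":
        return event["output"]["video_url"]
    raise RuntimeError(f"generation failed: {event.get('error')}")
```

In production this handler would sit behind whatever HTTP framework you already run, with the returned URL handed off to your download or post-processing queue.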

Run It Your Way

Cloud API for speed. On-prem for control. Open weights for everything else.

Cloud API

Managed infrastructure with auto-scaling. Pay per generation. No GPUs to manage, no models to host.

On-Prem

Deploy in your own VPC or data center. Full data sovereignty. Docker containers with NVIDIA GPU support.

Open Weights

Download from Hugging Face. Fine-tune, distill, or integrate into your own pipeline. Apache 2.0.

Common Questions

What resolutions and frame rates are supported?

Native 4K (3840x2160) at up to 60fps. Also supports 1080p, 720p, and custom aspect ratios including portrait (9:16) and square (1:1).

How long can a generated video be?

Up to 120 seconds per generation. For longer content, use the segment stitching API to chain clips with scene-level continuity.
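One way to drive the stitching workflow is to plan segment durations up front. A minimal sketch: the splitting logic is plain Python, while `client.stitch` and its `continuity` parameter are assumptions about the stitching API, not documented calls.

```python
MAX_SEGMENT_S = 120  # per-generation cap from the FAQ


def plan_segments(total_s: int, max_s: int = MAX_SEGMENT_S) -> list[int]:
    """Split a target duration into per-segment durations <= max_s."""
    full, rem = divmod(total_s, max_s)
    return [max_s] * full + ([rem] if rem else [])


# Hypothetical usage with the SDK (method names are assumptions):
#
#   client = psm.Client("your-api-key")
#   clips = [client.generate(prompt="...", duration=f"{d}s")
#            for d in plan_segments(300)]
#   final = client.stitch(clips, continuity="scene")  # assumed method
#   final.save("long.mp4")
```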

Can I fine-tune the model on my own data?

Yes. Open weights include fine-tuning scripts and LoRA training support. Fine-tune on as few as 50 video clips to adapt the model to your visual style.
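A sketch of what a LoRA fine-tuning config might look like. The key names mirror common LoRA setups (rank, alpha, target modules) but are assumptions; the released fine-tuning scripts define the real schema.

```python
def lora_config(num_clips: int, rank: int = 16) -> dict:
    """Build an assumed LoRA config; enforces the 50-clip minimum
    recommended in the FAQ. Key names are illustrative, not the
    schema used by the released scripts."""
    if num_clips < 50:
        raise ValueError("at least 50 clips recommended for fine-tuning")
    return {
        "method": "lora",
        "rank": rank,
        "alpha": 2 * rank,  # common convention: alpha = 2 * rank
        "target_modules": ["attn.q_proj", "attn.k_proj", "attn.v_proj"],
        "dataset_size": num_clips,
    }
```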

What hardware do I need to run it locally?

Minimum: 1x NVIDIA A100 (80GB) or H100. Recommended: 2x H100 for real-time generation at 4K. Quantized variants available for consumer GPUs (RTX 4090).

How is PSM-V3 licensed and priced?

Open weights are Apache 2.0 — use commercially without restriction. The hosted API is pay-per-generation with volume discounts. Enterprise plans available.

How does audio-to-video generation work?

Pass an audio file as input. The model extracts rhythm, tonality, and speech patterns to drive visual generation — beat-synced cuts, mood-matched scenes, and lip-sync for voice.
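A sketch of building an audio-driven request. The `audio` parameter and the mode names (`beat_sync`, `voice`, `ambient`) are assumptions modelled on the three audio-to-video features above, not confirmed API parameters.

```python
def audio_request(audio_path: str, mode: str = "beat_sync") -> dict:
    """Build assumed request kwargs for audio-driven generation."""
    modes = {"beat_sync", "voice", "ambient"}  # assumed mode names
    if mode not in modes:
        raise ValueError(f"mode must be one of {sorted(modes)}")
    return {"audio": audio_path, "mode": mode, "resolution": "1080p"}


# Hypothetical usage with the SDK, as in the quickstart:
#
#   client = psm.Client("your-api-key")
#   video = client.generate(**audio_request("track.mp3"))
#   video.save("music_video.mp4")
```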

Are outputs watermarked?

No visible watermarks. All outputs include invisible C2PA content credentials for provenance verification. You own the output.

Get Started

Start generating in under a minute.

Free tier available. No credit card required. Upgrade when you're ready.

Get API Key View on GitHub