PSM-V3 is live — multimodal generation with native audio support
Live API & Open Source

Multimodal Video Generation Model

A production-grade foundation model deployable via API, on-prem, or open weights.

Get API Key Open Source
Integrated with
ComfyUI · Hugging Face · Replicate · RunPod · NVIDIA NIM · fal.ai · Together AI · Solana

One Model. Every Modality.

Text, image, audio, and video inputs. Cinematic video output. Production-ready quality at any resolution.

Text to Video

Natural language prompts to cinematic video. Scene composition, camera movement, and lighting from text.

Scene-level coherence
Camera control
Multi-subject tracking

Image to Video

Animate any still image. Reference frames become living scenes with consistent style and motion.

Style preservation
Motion synthesis
Temporal consistency

Audio to Video

Audio-reactive generation. Music, voice, or ambient sound drives visual rhythm, cuts, and scene transitions.

Beat-synced motion
Voice-driven scenes
Ambient matching

Video to Video

Style transfer, upscaling, and re-rendering. Transform existing footage with new aesthetics while preserving structure.

Style transfer
Super resolution
Frame interpolation
Native Resolution: 4K
Frame Rate: 60fps
Max Duration: 120s
Latency (10s clip): <4s

Open Weights.
No Lock-in.

Full model weights on Hugging Face. Run locally, fine-tune on your data, deploy anywhere. Apache 2.0 license.

  • Hugging Face model hub with full documentation
  • ComfyUI nodes for visual workflow integration
  • Fine-tuning scripts and LoRA training support
  • Community-driven development on GitHub
View on GitHub
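A minimal sketch of pulling the open weights locally via `huggingface_hub`. The repo id `psm/psm-v3` is a placeholder, not a confirmed hub path; check the model card for the real one.

```python
def weights_dir(repo_id: str, revision: str = "main") -> str:
    """Build a predictable local directory name for a given checkout."""
    return f"./weights/{repo_id.replace('/', '--')}@{revision}"

# Fetch the snapshot (requires `pip install huggingface_hub`; the repo id
# below is a placeholder -- see the model card for the real one):
#
#   from huggingface_hub import snapshot_download
#   snapshot_download(repo_id="psm/psm-v3",
#                     local_dir=weights_dir("psm/psm-v3"))
```

From there the checkout can be loaded by the fine-tuning scripts or wired into a ComfyUI workflow.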
PSM-V3 Architecture
Multimodal Encoder: text / image / audio / video
Diffusion Transformer: 3.2B params
Temporal Attention: scene coherence
VAE Decoder: 4K / 60fps output
C2PA Signing: provenance
Apache 2.0 · Hugging Face · ONNX · TensorRT

Production API.
Three Lines of Code.

REST API with streaming output, webhook callbacks, and batch processing. Generate at scale without managing infrastructure.

  • Sub-second first-frame latency
  • Webhook callbacks for async workflows
  • Batch processing with priority queues
  • 99.9% uptime SLA
Get API Key
import psm

client = psm.Client("your-api-key")

video = client.generate(
    prompt=(
        "Aerial drone shot over a misty mountain range "
        "at golden hour, cinematic, 4K"
    ),
    duration="10s",
    resolution="4k",
)

video.save("output.mp4")
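For async workflows, a webhook handler boils down to parsing the callback payload and reacting to the job status. A minimal sketch, assuming a hypothetical payload shape — the field names (`status`, `output.video_url`, `error`) are illustrative, not the documented schema:

```python
import json


def handle_webhook(body: bytes) -> str:
    """Parse a generation-complete callback and return the video URL.

    The payload fields used here are assumptions about the callback
    schema, not confirmed API documentation.
    """
    event = json.loads(body)
    if event.get("status") == "succeeded":
        return event["output"]["video_url"]
    raise RuntimeError(f"generation failed: {event.get('error')}")
```

In production this handler would sit behind whatever HTTP framework you already run, with the returned URL handed off to your download or post-processing queue.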

Run It Your Way

Cloud API for speed. On-prem for control. Open weights for everything else.

Cloud API

Managed infrastructure with auto-scaling. Pay per generation. No GPUs to manage, no models to host.

On-Prem

Deploy in your own VPC or data center. Full data sovereignty. Docker containers with NVIDIA GPU support.

Open Weights

Download from Hugging Face. Fine-tune, distill, or integrate into your own pipeline. Apache 2.0.

Common Questions

What resolutions and frame rates are supported?

Native 4K (3840x2160) at up to 60fps. Also supports 1080p, 720p, and custom aspect ratios including portrait (9:16) and square (1:1).

How long can a generated video be?

Up to 120 seconds per generation. For longer content, use the segment stitching API to chain clips with scene-level continuity.
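One way to drive the stitching workflow is to plan segment durations up front. A minimal sketch: the splitting logic is plain Python, while `client.stitch` and its `continuity` parameter are assumptions about the stitching API, not documented calls.

```python
MAX_SEGMENT_S = 120  # per-generation cap from the FAQ


def plan_segments(total_s: int, max_s: int = MAX_SEGMENT_S) -> list[int]:
    """Split a target duration into per-segment durations <= max_s."""
    full, rem = divmod(total_s, max_s)
    return [max_s] * full + ([rem] if rem else [])


# Hypothetical usage with the SDK (method names are assumptions):
#
#   client = psm.Client("your-api-key")
#   clips = [client.generate(prompt="...", duration=f"{d}s")
#            for d in plan_segments(300)]
#   final = client.stitch(clips, continuity="scene")  # assumed method
#   final.save("long.mp4")
```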

Can I fine-tune the model on my own data?

Yes. Open weights include fine-tuning scripts and LoRA training support. Fine-tune on as few as 50 video clips to adapt the model to your visual style.
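A sketch of what a LoRA fine-tuning config might look like. The key names mirror common LoRA setups (rank, alpha, target modules) but are assumptions; the released fine-tuning scripts define the real schema.

```python
def lora_config(num_clips: int, rank: int = 16) -> dict:
    """Build an assumed LoRA config; enforces the 50-clip minimum
    recommended in the FAQ. Key names are illustrative, not the
    schema used by the released scripts."""
    if num_clips < 50:
        raise ValueError("at least 50 clips recommended for fine-tuning")
    return {
        "method": "lora",
        "rank": rank,
        "alpha": 2 * rank,  # common convention: alpha = 2 * rank
        "target_modules": ["attn.q_proj", "attn.k_proj", "attn.v_proj"],
        "dataset_size": num_clips,
    }
```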

What hardware do I need to run it locally?

Minimum: 1x NVIDIA A100 (80GB) or H100. Recommended: 2x H100 for real-time generation at 4K. Quantized variants available for consumer GPUs (RTX 4090).

How is PSM-V3 licensed and priced?

Open weights are Apache 2.0 — use commercially without restriction. The hosted API is pay-per-generation with volume discounts. Enterprise plans available.

How does audio-to-video generation work?

Pass an audio file as input. The model extracts rhythm, tonality, and speech patterns to drive visual generation — beat-synced cuts, mood-matched scenes, and lip-sync for voice.
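A sketch of building an audio-driven request. The `audio` parameter and the mode names (`beat_sync`, `voice`, `ambient`) are assumptions modelled on the three audio-to-video features above, not confirmed API parameters.

```python
def audio_request(audio_path: str, mode: str = "beat_sync") -> dict:
    """Build assumed request kwargs for audio-driven generation."""
    modes = {"beat_sync", "voice", "ambient"}  # assumed mode names
    if mode not in modes:
        raise ValueError(f"mode must be one of {sorted(modes)}")
    return {"audio": audio_path, "mode": mode, "resolution": "1080p"}


# Hypothetical usage with the SDK, as in the quickstart:
#
#   client = psm.Client("your-api-key")
#   video = client.generate(**audio_request("track.mp3"))
#   video.save("music_video.mp4")
```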

Are outputs watermarked?

No visible watermarks. All outputs include invisible C2PA content credentials for provenance verification. You own the output.

Get Started

Start generating in under a minute.

Free tier available. No credit card required. Upgrade when you're ready.

Get API Key View on GitHub