A production-grade foundation model deployable via API, on-prem, or open weights.
Text, image, audio, and video inputs. Cinematic video output. Production-ready quality at every supported resolution.
Natural language prompts to cinematic video. Scene composition, camera movement, and lighting from text.
Animate any still image. Reference frames become living scenes with consistent style and motion.
Audio-reactive generation. Music, voice, or ambient sound drives visual rhythm, cuts, and scene transitions.
Style transfer, upscaling, and re-rendering. Transform existing footage with new aesthetics while preserving structure.
Full model weights on Hugging Face. Run locally, fine-tune on your data, deploy anywhere. Apache 2.0 license.
REST API with streaming output, webhook callbacks, and batch processing. Generate at scale without managing infrastructure.
```python
import psm

client = psm.Client("your-api-key")
video = client.generate(
    prompt="Aerial drone shot over a misty mountain range at golden hour, cinematic, 4K",
    duration="10s",
    resolution="4k",
)
video.save("output.mp4")
```
Cloud API for speed. On-prem for control. Open weights for everything else.
Managed infrastructure with auto-scaling. Pay per generation. No GPUs to manage, no models to host.
Deploy in your own VPC or data center. Full data sovereignty. Docker containers with NVIDIA GPU support.
Download from Hugging Face. Fine-tune, distill, or integrate into your own pipeline. Apache 2.0.
Native 4K (3840x2160) at up to 60fps. Also supports 1080p, 720p, and custom aspect ratios including portrait (9:16) and square (1:1).
Up to 120 seconds per generation. For longer content, use the segment stitching API to chain clips with scene-level continuity.
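With the 120-second cap per generation, a longer piece is planned as a chain of clips handed to the stitching API. A minimal planning sketch, assuming only the cap stated above (the helper name and approach are illustrative, not part of the SDK):

```python
def plan_segments(total_seconds: int, max_clip: int = 120) -> list[int]:
    """Split a target duration into clip lengths within the per-generation cap."""
    segments, remaining = [], total_seconds
    while remaining > 0:
        clip = min(remaining, max_clip)  # never exceed the 120s cap
        segments.append(clip)
        remaining -= clip
    return segments

print(plan_segments(300))  # three clips: [120, 120, 60]
```

Each planned length would then be one generation call, with the stitching API handling scene-level continuity between consecutive clips.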
Yes. Open weights include fine-tuning scripts and LoRA training support. Fine-tune on as few as 50 video clips to adapt the model to your visual style.
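Fine-tuning on your own footage means pairing each clip with a caption describing the target style. One way to assemble such a dataset list in plain Python (the manifest schema here is an assumption for illustration, not the format the shipped training scripts require):

```python
import json
import pathlib

def build_manifest(clip_dir: str, caption: str) -> list[dict]:
    """Pair every .mp4 under clip_dir with a style caption.
    The {"video": ..., "caption": ...} schema is illustrative only."""
    return [
        {"video": str(path), "caption": caption}
        for path in sorted(pathlib.Path(clip_dir).glob("*.mp4"))
    ]

def save_manifest(entries: list[dict], out_path: str) -> None:
    """Write the manifest as JSON Lines, one clip per line."""
    with open(out_path, "w") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")
```

At roughly 50 clips, a manifest like this is small enough to review by hand before starting a LoRA run.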
Minimum: 1x NVIDIA A100 (80GB) or H100. Recommended: 2x H100 for real-time generation at 4K. Quantized variants available for consumer GPUs (RTX 4090).
Open weights are Apache 2.0 — use commercially without restriction. The hosted API is pay-per-generation with volume discounts. Enterprise plans available.
Pass an audio file as input. The model extracts rhythm, tonality, and speech patterns to drive visual generation — beat-synced cuts, mood-matched scenes, and lip-sync for voice.
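The beat-synced cuts can be pictured as cut points spaced one bar apart at the track's tempo. A toy illustration of that idea in pure Python (this is not the model's internal logic, which extracts rhythm from the audio itself rather than from a BPM number):

```python
def beat_cut_times(bpm: float, duration_s: float, beats_per_cut: int = 4) -> list[float]:
    """Timestamps (seconds) for one cut per bar at the given tempo.
    Illustrative only: shows how tempo maps to cut spacing."""
    seconds_per_cut = beats_per_cut * 60.0 / bpm
    times, t = [], seconds_per_cut
    while t < duration_s:
        times.append(round(t, 3))
        t += seconds_per_cut
    return times

print(beat_cut_times(120, 10))  # cuts every 2s: [2.0, 4.0, 6.0, 8.0]
```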
No visible watermarks. All outputs include invisible C2PA content credentials for provenance verification. You own the output.
Free tier available. No credit card required. Upgrade when you're ready.