Simple, transparent pricing
Start free. Scale with pay-as-you-go compute. No hidden fees.
Free
- Pay-as-you-go compute
- OpenAI-compatible API
- Fal.ai-compatible API
- Community support
Pro
- Pay-as-you-go compute
- OpenAI-compatible API
- Fal.ai-compatible API
- Priority support
- Higher rate limits
- Workflow orchestration
Enterprise
- Custom compute pricing
- Dedicated GPU capacity
- SLA guarantees
- SSO / SAML
- Audit logs
- Dedicated support
How billing works
Two components, no surprises.
Plan subscription
Your monthly plan (Free, Pro, or Enterprise) sets your rate limits and included quotas. Think of it as your capacity reservation.
Per-model compute
GPU time is metered to the second and billed at the rate card for each model. You pay only for what you use; there is no idle cost.
Per-model rates
All rates are in USD. Compute is billed per second for continuous workloads and per request for discrete tasks.
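To illustrate the two billing modes, here is a minimal sketch of how the charge for a single job could be computed. The rates, the `BillingMode` names, and the `job_charge` helper are hypothetical placeholders, not values or APIs from the rate card.

```python
from decimal import Decimal
from enum import Enum


class BillingMode(Enum):
    PER_SECOND = "per_second"    # continuous workloads
    PER_REQUEST = "per_request"  # discrete tasks


def job_charge(mode: BillingMode, rate_usd: Decimal, gpu_seconds: int = 0) -> Decimal:
    """Return the USD charge for one job under the given billing mode."""
    if mode is BillingMode.PER_SECOND:
        # Metered compute: the rate is USD per GPU-second.
        return rate_usd * gpu_seconds
    # Discrete task: one flat rate per request, regardless of duration.
    return rate_usd


# Hypothetical rates, for illustration only.
print(job_charge(BillingMode.PER_SECOND, Decimal("0.0005"), gpu_seconds=8))
print(job_charge(BillingMode.PER_REQUEST, Decimal("0.002")))
```

`Decimal` is used instead of `float` so that small per-second amounts add up without binary rounding drift.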
Frequently asked questions
Is there a free tier?
Yes. The Free plan includes 1,000 jobs and 3,600 compute seconds per month at no cost. You only pay for compute beyond the included quota.
What happens when I hit my quota limits?
Once you exhaust your monthly quota, new job submissions return an HTTP 429 (Too Many Requests) error until the quota resets at the start of the next billing cycle. Upgrade to Pro for higher limits, or contact us for enterprise capacity.
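On the client side, it can help to treat a 429 on submission differently from other failures. The sketch below is one way to do that; `SubmitOutcome` and `interpret_submit_status` are hypothetical names, and it assumes a 429 on submission means the quota is exhausted (a 429 could also signal an ordinary rate limit, which this page does not describe).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmitOutcome:
    accepted: bool      # the job was accepted
    retry_later: bool   # retrying may succeed after the quota resets
    message: str


def interpret_submit_status(status_code: int) -> SubmitOutcome:
    """Map the HTTP status of a job submission to a client-side decision."""
    if 200 <= status_code < 300:
        return SubmitOutcome(True, False, "job accepted")
    if status_code == 429:
        # Assumed to mean quota exhaustion: tight retry loops will not help,
        # because the quota only resets at the start of the next billing cycle.
        return SubmitOutcome(False, True, "quota exhausted; wait for reset or upgrade")
    return SubmitOutcome(False, False, f"unexpected status {status_code}")


print(interpret_submit_status(429).message)
```

The point of the separate `retry_later` flag is to stop a client from retrying in a short backoff loop when the only remedies are waiting for the reset or upgrading the plan.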
How does billing work?
Your plan subscription covers your monthly quota. Compute usage is metered per second (or per request for discrete tasks) and billed separately at the per-model rates shown above. You can set a spending cap to avoid surprise charges.
Can I set a spending cap?
Yes. You can configure a monthly spending cap in the Billing settings. Once the cap is reached, job submissions are paused until you raise the cap or the billing period resets.
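If you want to check your own budget before submitting work, a client-side guard along these lines is one option. This is only a sketch: the function names are hypothetical, the cap itself is enforced in Billing settings rather than by this code, and how a job that would cross the cap is handled is not specified here.

```python
from decimal import Decimal


def remaining_budget(cap_usd: Decimal, spent_usd: Decimal) -> Decimal:
    """USD left under the monthly cap, never below zero."""
    return max(cap_usd - spent_usd, Decimal("0"))


def fits_under_cap(cap_usd: Decimal, spent_usd: Decimal, estimated_job_usd: Decimal) -> bool:
    """True if the estimated job cost fits in what is left of the cap."""
    return estimated_job_usd <= remaining_budget(cap_usd, spent_usd)


# Hypothetical figures, for illustration only.
print(remaining_budget(Decimal("100"), Decimal("97.50")))
print(fits_under_cap(Decimal("100"), Decimal("97.50"), Decimal("3.00")))
```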
What is enterprise pricing?
Enterprise plans offer custom rate cards, dedicated GPU capacity, SLA guarantees, and volume discounts. Contact our sales team to discuss your requirements.
How do I estimate my costs?
Multiply the per-second rate by your expected GPU time per job (in seconds), then by your monthly job volume. Most LLM inference jobs run in 2–10 seconds; video generation typically takes 2–10 minutes.
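The estimate above can be sketched as a short calculation. The rate and job volumes below are hypothetical, chosen only to show the arithmetic, and the sketch ignores any included quota on your plan.

```python
from decimal import Decimal


def estimate_monthly_cost(rate_per_second_usd: Decimal,
                          seconds_per_job: int,
                          jobs_per_month: int) -> Decimal:
    """Per-second rate x GPU seconds per job x monthly job volume."""
    return rate_per_second_usd * seconds_per_job * jobs_per_month


# Hypothetical rate of $0.0005 per GPU-second, for illustration only.
rate = Decimal("0.0005")

# LLM inference: about 6 seconds per job, 50,000 jobs per month.
llm_cost = estimate_monthly_cost(rate, 6, 50_000)

# Video generation: about 5 minutes (300 seconds) per job, 200 jobs per month.
video_cost = estimate_monthly_cost(rate, 5 * 60, 200)

print(f"LLM inference:    ${llm_cost:.2f}")
print(f"Video generation: ${video_cost:.2f}")
```

Convert minutes to seconds before multiplying, since rates are quoted per second.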
What payment methods are accepted?
We accept all major credit and debit cards via Stripe. Enterprise customers may arrange invoice-based billing.
Can I switch plans?
Yes, you can upgrade or downgrade your plan at any time. Upgrades take effect immediately; downgrades apply at the start of the next billing period.
Need dedicated capacity?
Enterprise plans include custom rate cards, dedicated GPU pools, SLA guarantees, and a dedicated support channel.