Simple, transparent pricing

Start free. Scale with pay-as-you-go compute. No hidden fees.

Free

$0 forever
1K jobs / month
10 jobs / min
  • Pay-as-you-go compute
  • OpenAI-compatible API
  • Fal.ai-compatible API
  • Community support

Pro (most popular)

$49 per month
100K jobs / month
100 jobs / min
  • Pay-as-you-go compute
  • OpenAI-compatible API
  • Fal.ai-compatible API
  • Priority support
  • Higher rate limits
  • Workflow orchestration

Enterprise

Custom pricing
Unlimited jobs / month
1,000 jobs / min
  • Custom compute pricing
  • Dedicated GPU capacity
  • SLA guarantees
  • SSO / SAML
  • Audit logs
  • Dedicated support

How billing works

Two components, no surprises.

Plan subscription

Your monthly plan (Free, Pro, or Enterprise) sets your rate limits and included quotas. Think of it as your capacity reservation.

Per-model compute

GPU time is metered to the second and billed at the rate-card price for each model. You pay only for what you use; there is no idle cost.

Per-model rates

All rates are in USD. Compute is billed per second for continuous workloads and per request for discrete tasks.
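To make the two billing modes concrete, here is a minimal sketch of how the charge for a single job might be computed. The function name, the mode strings, the example rates, and the choice to round partial seconds up are all assumptions made for illustration; this page does not say how fractional seconds are handled, and no rates have been published yet.

```python
import math
from decimal import Decimal


def job_charge(mode: str, rate: Decimal, gpu_seconds: float = 0.0) -> Decimal:
    """Return the charge in USD for one job.

    mode "per_second":  rate is USD per GPU-second (continuous workloads).
    mode "per_request": rate is a flat USD price per request (discrete tasks).
    """
    if mode == "per_second":
        # Assumption: a partial second is rounded up to the next whole second.
        billed_seconds = math.ceil(gpu_seconds)
        return rate * billed_seconds
    if mode == "per_request":
        # GPU time is ignored; the request has a flat price.
        return rate
    raise ValueError(f"unknown billing mode: {mode!r}")


# Hypothetical rates, not taken from any published rate card.
streaming = job_charge("per_second", Decimal("0.0005"), gpu_seconds=7.2)
single = job_charge("per_request", Decimal("0.02"))
```

Decimal is used instead of float so that small per-second rates do not pick up binary rounding error when multiplied.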

Per-model rates are coming soon. Contact us for early-access rates.

Frequently asked questions

Is there a free tier?

Yes. The Free plan includes 1,000 jobs and 3,600 compute seconds per month at no cost. Compute beyond the included seconds is billed at pay-as-you-go rates.

What happens when I hit my quota limits?

Once you exhaust your monthly quota, new job submissions return an HTTP 429 (Too Many Requests) error until the quota resets at the start of the next billing cycle. Upgrade to Pro for higher limits, or contact us for enterprise capacity.
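On the client side, a 429 is worth handling separately from other failures, because retrying immediately is unlikely to help while the quota is exhausted. The sketch below shows one way to do that. `send` is a hypothetical transport function that returns an (HTTP status, response body) pair, and `QuotaExhaustedError` is a name invented for this example; neither is part of a documented SDK.

```python
from typing import Any, Callable, Tuple


class QuotaExhaustedError(Exception):
    """Raised when a job submission is rejected with HTTP 429."""


def submit_job(send: Callable[[dict], Tuple[int, Any]], payload: dict) -> Any:
    """Submit one job through `send` and return the response body.

    `send` is a hypothetical callable returning (http_status, body).
    """
    status, body = send(payload)
    if status == 429:
        # The quota stays exhausted until the next billing cycle,
        # so surface this to the caller instead of retrying in a loop.
        raise QuotaExhaustedError(
            "monthly quota exhausted; upgrade the plan or wait for the reset"
        )
    if status >= 400:
        raise RuntimeError(f"job submission failed with HTTP {status}")
    return body
```

Callers can catch `QuotaExhaustedError` to queue work for the next billing cycle or prompt for an upgrade, while other errors follow the usual failure path.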

How does billing work?

Your plan subscription covers your monthly quota. Compute usage is metered per second (or per request for discrete tasks) and billed separately at the rates shown above. You can set a spending cap to avoid surprise charges.

Can I set a spending cap?

Yes. You can configure a monthly spending cap in the Billing settings. Once the cap is reached, job submissions are paused until you raise the cap or the billing period resets.
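The cap behavior described above can be sketched as a small state machine: spend accumulates during the billing period, submissions pause once spend reaches the cap, and raising the cap or starting a new period resumes them. The class and method names here are hypothetical and only model the rule; the real cap is configured in the Billing settings.

```python
from decimal import Decimal


class SpendingCap:
    """Tracks spend in the current billing period against a monthly cap (USD)."""

    def __init__(self, cap: Decimal) -> None:
        self.cap = cap
        self.spent = Decimal("0")

    def record(self, charge: Decimal) -> None:
        # Add a compute charge to the current period's spend.
        self.spent += charge

    def submissions_paused(self) -> bool:
        # Submissions pause as soon as spend reaches the cap.
        return self.spent >= self.cap

    def raise_cap(self, new_cap: Decimal) -> None:
        # Raising the cap above current spend resumes submissions.
        self.cap = new_cap

    def reset_period(self) -> None:
        # A new billing period starts with zero spend.
        self.spent = Decimal("0")
```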

What is enterprise pricing?

Enterprise plans offer custom rate cards, dedicated GPU capacity, SLA guarantees, and volume discounts. Contact our sales team to discuss your requirements.

How do I estimate my costs?

Multiply the per-second rate by your expected GPU time per job, then by your monthly job volume. Most LLM inference jobs run in 2–10 seconds; video generation typically takes 2–10 minutes.
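As a worked example of that multiplication, the sketch below estimates a monthly bill. The rate of $0.001 per second and the volume of 50,000 jobs are made-up numbers chosen only to show the arithmetic, since per-model rates have not been published. The optional `plan_fee` parameter is an addition for convenience, so the estimate can include the plan subscription alongside compute.

```python
from decimal import Decimal


def estimate_monthly_cost(
    rate_per_second: Decimal,
    seconds_per_job: int,
    jobs_per_month: int,
    plan_fee: Decimal = Decimal("0"),
) -> Decimal:
    """Estimate the monthly bill in USD: plan fee plus metered compute.

    Pass the rate and fee as Decimal and the other values as int.
    """
    compute = rate_per_second * seconds_per_job * jobs_per_month
    return plan_fee + compute


# Hypothetical inputs: $0.001 per GPU-second, 6 seconds per job,
# 50,000 jobs per month, on the $49 Pro plan.
estimate = estimate_monthly_cost(
    Decimal("0.001"), 6, 50_000, plan_fee=Decimal("49")
)
```

With these inputs the compute portion is 0.001 × 6 × 50,000 = $300, and the estimate including the Pro subscription is $349.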

What payment methods are accepted?

We accept all major credit and debit cards via Stripe. Enterprise customers may arrange invoice-based billing.

Can I switch plans?

Yes, you can upgrade or downgrade your plan at any time. Upgrades take effect immediately; downgrades apply at the start of the next billing period.

Need dedicated capacity?

Enterprise plans include custom rate cards, dedicated GPU pools, SLA guarantees, and a dedicated support channel.