
Agents that see, speak, and create.

Every modality. One API.

Text, image, video, voice — your agents handle them all through a single endpoint. Compose multi-modal workflows. Scale on managed infrastructure.


From anything, to anything

Pick an input modality. Pick an output. Casola handles the rest.

Text → Image

Prompt → Generated artwork

Text → Voice

Script → Narration

Text → Video

Description → Generated clip

Voice → Text

Audio → Transcript

Image → Text

Photo → Description

Image → Video

Still → Animated sequence

Video → Text

Clip → Summary

Voice → Voice

Audio → Cloned speech

Chain modalities into workflows

Compose multi-step pipelines that cross modality boundaries — declaratively.

Upload audio (voice) → Transcribe (transcription) → Summarize (text) → Generate cover (image)
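The chain above can be sketched as ordinary async composition. This is a minimal illustration of the idea, not the Casola pipeline API: the step functions below are stand-in stubs, and a real workflow would call the SDK endpoints shown further down.

```typescript
// Each step maps one artifact to the next; chaining two steps yields a new step.
type Step<A, B> = (input: A) => Promise<B>;

function chain<A, B, C>(first: Step<A, B>, second: Step<B, C>): Step<A, C> {
  return async (input) => second(await first(input));
}

// Stub steps mirroring the pipeline: voice → transcription → text → image.
const transcribe: Step<string, string> = async (audio) => `transcript of ${audio}`;
const summarize: Step<string, string> = async (text) => `summary: ${text}`;
const generateCover: Step<string, string> = async (prompt) => `image for "${prompt}"`;

const pipeline = chain(chain(transcribe, summarize), generateCover);
```

Because every step shares the same shape, crossing a modality boundary is just another link in the chain.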

Drop-in compatible

Use the SDKs you already know. Just point them at Casola.

app.ts
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.casola.ai/openai/v1",
  apiKey: process.env.CASOLA_API_TOKEN,
});

// Text → Image
const image = await client.images.generate({
  model: "flux-schnell",
  prompt: "A sunset over Tokyo, ukiyo-e style",
});

// Audio → Text
const transcript = await client.audio.transcriptions.create({
  model: "whisper-large-v3-turbo",
  file: audioBlob,
});
app.ts
import { fal } from "@fal-ai/client";

fal.config({
  credentials: process.env.CASOLA_API_TOKEN,
  requestMiddleware: (config) => ({
    ...config,
    url: config.url.replace("https://rest.fal.ai", "https://api.casola.ai/fal"),
  }),
});

// Text → Video
const result = await fal.subscribe("fal-ai/wan/v2.2-5b/text-to-video", {
  input: {
    prompt: "A cat walking on a treadmill",
    num_frames: 81,
  },
});
claude_desktop_config.json
{
  "mcpServers": {
    "casola": {
      "command": "npx",
      "args": ["@casola/mcp-server"],
      "env": {
        "CASOLA_API_TOKEN": "sk-..."
      }
    }
  }
}

Run your models, our GPUs

Bring your own weights. We handle everything below the model.

Bring your weights

Upload fine-tuned model weights, run them on Casola's GPU fleet

Auto-scale

Scale from zero to hundreds of GPUs based on demand

Zero infra

No CUDA drivers, no Docker, no cloud accounts to manage

MLOps-friendly

Integrates with your existing training and deployment pipelines

Built for production

The controls your team needs before going live.

Regional data processing

Route jobs to EU or US regions. Data stays where you need it.
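One way region routing could look from the client side, sketched here as an assumption: the regional hostnames below are illustrative placeholders, not documented Casola endpoints.

```typescript
// Hypothetical regional endpoints; substitute the URLs from your dashboard.
const REGION_ENDPOINTS = {
  eu: "https://eu.api.casola.ai/openai/v1",
  us: "https://us.api.casola.ai/openai/v1",
} as const;

type Region = keyof typeof REGION_ENDPOINTS;

// Resolve the base URL for a region before constructing an SDK client.
function endpointFor(region: Region): string {
  return REGION_ENDPOINTS[region];
}
```

Pinning the region at client-construction time keeps every request in that job's data boundary.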

Content filtering

Built-in safety filters or bring your own moderation pipeline

Audit logging

Every request logged with full provenance for compliance

Team access control

Organizations, roles, and scoped API tokens out of the box

Start building

Free tier included. No credit card required.