Pricing

You only pay for what you use on Replicate. Some models are billed by hardware and time, others by input and output.

Public models

Thousands of open-source machine learning models have been contributed by our community, and more are added every day. We also host a wide variety of proprietary models.

Most models are billed by the time they take to run. The price-per-second varies according to the hardware in use. When running or training one of these public models, you only pay for the time it takes to process your request.
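As a rough sketch of time-based billing, the cost of one prediction is its run time multiplied by the hardware's price per second. The helper below is illustrative only: the name `estimate_time_cost` is hypothetical (not part of any Replicate client), the rate is the Nvidia T4 price from the hardware pricing table on this page, and any rounding or minimum charge Replicate may apply is not modeled.

```python
# Hypothetical helper: estimate the cost of a time-billed prediction.
# Assumption: cost = run time (seconds) x hardware price per second,
# with no rounding or minimum charge modeled.

T4_PRICE_PER_SEC = 0.000225  # Nvidia T4 rate from the hardware pricing table


def estimate_time_cost(run_seconds: float, price_per_sec: float) -> float:
    """Return the estimated cost in dollars for a single run."""
    if run_seconds < 0:
        raise ValueError("run_seconds must be non-negative")
    return run_seconds * price_per_sec


if __name__ == "__main__":
    # A prediction that takes 12 seconds on a T4.
    cost = estimate_time_cost(12, T4_PRICE_PER_SEC)
    print(f"${cost:.4f}")
```

At the T4 rate, a 12-second run works out to 12 × $0.000225 = $0.0027.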

Some models are billed by input and output. We've included some examples below.

Each model's page shows an estimate of how much it will cost you to run.

anthropic/claude-3.7-sonnet

The most intelligent Claude model and the first hybrid reasoning model on the market (claude-3-7-sonnet-20250219)

$0.015 / thousand output tokens
$3.00 / million input tokens

black-forest-labs/flux-1.1-pro

Faster, better FLUX Pro. Text-to-image model with excellent image quality, prompt adherence, and output diversity.

$0.04 / output image

black-forest-labs/flux-dev

A 12 billion parameter rectified flow transformer capable of generating images from text descriptions

$0.025 / output image

black-forest-labs/flux-schnell

The fastest image generation model tailored for local development and personal use

$3.00 / thousand output images

deepseek-ai/deepseek-r1

A reasoning model trained with reinforcement learning, on par with OpenAI o1

$0.01 / thousand output tokens
$3.75 / million input tokens

google/veo-2

State of the art video generation model. Veo 2 can faithfully follow simple and complex instructions, and convincingly simulates real-world physics as well as a wide range of visual styles.

$0.50 / second of output video

ideogram-ai/ideogram-v3-quality

The highest-quality Ideogram v3 model. Ideogram v3 creates images with stunning realism, creative designs, and consistent styles.

$0.09 / output image

recraft-ai/recraft-v3

Recraft V3 (code-named red_panda) is a text-to-image model that can generate long text and images in a wide range of styles. At the time of writing, it is state of the art in image generation according to the Text-to-Image Benchmark by Artificial Analysis.

$0.04 / output image

wavespeedai/wan-2.1-i2v-480p

Accelerated inference for Wan 2.1 14B image to video, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation.

$0.09 / second of output video

wavespeedai/wan-2.1-i2v-720p

Accelerated inference for Wan 2.1 14B image to video with high resolution, a comprehensive and open suite of video foundation models that pushes the boundaries of video generation.

$0.25 / second of output video
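The input/output rates above mix units (per thousand or per million tokens, per image, per thousand images, per second of video), so it helps to normalize everything to a price per single unit before adding up a request. The sketch below does that for a subset of the listed models; the prices come from the list above, while the names `RATES` and `estimate_usage_cost` and the example quantities are hypothetical.

```python
# Hypothetical estimator for models billed by input and output.
# Rates are copied from a subset of the examples above and normalized to a
# price per single unit (one token, one image, or one second of video).

PER_THOUSAND = 1_000
PER_MILLION = 1_000_000

RATES = {
    "anthropic/claude-3.7-sonnet": {
        "input_tokens": 3.00 / PER_MILLION,
        "output_tokens": 0.015 / PER_THOUSAND,
    },
    "deepseek-ai/deepseek-r1": {
        "input_tokens": 3.75 / PER_MILLION,
        "output_tokens": 0.01 / PER_THOUSAND,
    },
    "black-forest-labs/flux-schnell": {"output_images": 3.00 / PER_THOUSAND},
    "black-forest-labs/flux-dev": {"output_images": 0.025},
    "google/veo-2": {"output_video_seconds": 0.50},
}


def estimate_usage_cost(model: str, **usage: float) -> float:
    """Sum quantity x unit price over each billed dimension of a model."""
    rates = RATES[model]
    unknown = set(usage) - set(rates)
    if unknown:
        raise KeyError(f"{model} is not billed by: {sorted(unknown)}")
    return sum(rates[dim] * qty for dim, qty in usage.items())


if __name__ == "__main__":
    # 2,000 input tokens and 500 output tokens on Claude 3.7 Sonnet.
    print(estimate_usage_cost(
        "anthropic/claude-3.7-sonnet", input_tokens=2_000, output_tokens=500))
    # Four images from flux-schnell.
    print(estimate_usage_cost("black-forest-labs/flux-schnell", output_images=4))
```

Normalizing makes the relative prices visible: Claude 3.7 Sonnet's output tokens ($0.015 per thousand, i.e. $15 per million) cost five times as much as its input tokens ($3 per million), so the example request above comes to about $0.006 + $0.0075 = $0.0135.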

Private models

You aren't limited to the public models on Replicate: you can deploy your own custom models using Cog, our open-source tool for packaging machine learning models.

Unlike public models, most private models (all except fast-booting fine-tunes) run on dedicated hardware, so you don't have to share a queue with anyone else. This means you pay for all the time instances of the model are online: the time they spend setting up; the time they spend idle, waiting for requests; and the time they spend active, processing your requests. As your traffic rises and falls, we automatically scale up and down to handle the demand.

For fast-booting fine-tunes, you're billed only for the time the model is active and processing your requests, so unlike other private models, you don't pay for idle time. Fast-booting fine-tunes are labeled as such in the model's version list.
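To make the difference between the two private-model billing modes concrete, the sketch below bills a dedicated instance for setup, idle, and active time, and a fast-booting fine-tune for active time only. It assumes a single instance (autoscaling is not modeled); the function name and the durations are hypothetical, and the rate is the Nvidia L40S price from the hardware pricing table on this page.

```python
# Hypothetical comparison of the two private-model billing modes.
# Assumes a single instance; automatic scaling is not modeled.

L40S_PRICE_PER_SEC = 0.000975  # Nvidia L40S rate from the hardware pricing table


def private_model_cost(setup_s: float, idle_s: float, active_s: float,
                       price_per_sec: float, fast_booting: bool) -> float:
    """Dedicated instances bill setup + idle + active time;
    fast-booting fine-tunes bill only active time."""
    billed_seconds = active_s if fast_booting else setup_s + idle_s + active_s
    return billed_seconds * price_per_sec


if __name__ == "__main__":
    # One hour online: 2 minutes setting up, 48 minutes idle, 10 minutes active.
    setup, idle, active = 120, 2_880, 600
    dedicated = private_model_cost(setup, idle, active, L40S_PRICE_PER_SEC,
                                   fast_booting=False)
    fast = private_model_cost(setup, idle, active, L40S_PRICE_PER_SEC,
                              fast_booting=True)
    print(f"dedicated: ${dedicated:.3f}, fast-booting: ${fast:.3f}")
```

In this made-up hour, the dedicated instance is billed for the full 3,600 seconds ($3.51 at the L40S rate), while the fast-booting fine-tune is billed for only the 600 active seconds (about $0.585).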

Hardware pricing

Hardware                   ID                 Price/sec      Price/hr   GPU  CPU  GPU RAM  RAM
CPU                        cpu                $0.000100/sec  $0.36/hr   -    4x   -        8GB
Nvidia A100 (80GB) GPU     gpu-a100-large     $0.001400/sec  $5.04/hr   1x   10x  80GB     144GB
2x Nvidia A100 (80GB) GPU  gpu-a100-large-2x  $0.002800/sec  $10.08/hr  2x   20x  160GB    288GB
4x Nvidia A100 (80GB) GPU  gpu-a100-large-4x  $0.005600/sec  $20.16/hr  4x   40x  320GB    576GB
8x Nvidia A100 (80GB) GPU  gpu-a100-large-8x  $0.011200/sec  $40.32/hr  8x   80x  640GB    960GB
Nvidia H100 GPU            gpu-h100           $0.001525/sec  $5.49/hr   1x   13x  80GB     72GB
Nvidia L40S GPU            gpu-l40s           $0.000975/sec  $3.51/hr   1x   10x  48GB     65GB
2x Nvidia L40S GPU         gpu-l40s-2x        $0.001950/sec  $7.02/hr   2x   20x  96GB     144GB
Nvidia T4 GPU              gpu-t4             $0.000225/sec  $0.81/hr   1x   4x   16GB     16GB

Additional hardware

Hardware                   ID                 Price/sec      Price/hr
2x Nvidia H100 GPU         gpu-h100-2x        $0.003050/sec  $10.98/hr
4x Nvidia H100 GPU         gpu-h100-4x        $0.006100/sec  $21.96/hr
8x Nvidia H100 GPU         gpu-h100-8x        $0.012200/sec  $43.92/hr

Additional H100 capacity is reserved for committed spend contracts.
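A common question when picking hardware for a private model is which option is cheapest for a given amount of GPU memory. The sketch below encodes the GPU rows of the main hardware table (leaving out the CPU-only option and the reserved multi-H100 options) and returns the lowest per-second price that meets a GPU RAM requirement. The helper names are hypothetical, and it treats GPU RAM as the only constraint; a real model may also need more system RAM, more CPUs, or a specific GPU type.

```python
# Hypothetical hardware picker built from the hardware pricing table.
# Only GPU RAM is considered; the CPU-only option and the reserved
# multi-H100 options are excluded.
from typing import Optional

HARDWARE = {
    # sku: (price per second in dollars, total GPU RAM in GB)
    "gpu-t4": (0.000225, 16),
    "gpu-l40s": (0.000975, 48),
    "gpu-a100-large": (0.001400, 80),
    "gpu-h100": (0.001525, 80),
    "gpu-l40s-2x": (0.001950, 96),
    "gpu-a100-large-2x": (0.002800, 160),
    "gpu-a100-large-4x": (0.005600, 320),
    "gpu-a100-large-8x": (0.011200, 640),
}


def hourly_price(sku: str) -> float:
    """Convert the per-second price to a per-hour price."""
    return HARDWARE[sku][0] * 3600


def cheapest_for_gpu_ram(min_gpu_ram_gb: int) -> Optional[str]:
    """Return the cheapest SKU with at least min_gpu_ram_gb of GPU RAM, or None."""
    candidates = [
        (price, sku)
        for sku, (price, gpu_ram) in HARDWARE.items()
        if gpu_ram >= min_gpu_ram_gb
    ]
    return min(candidates)[1] if candidates else None


if __name__ == "__main__":
    for need in (16, 40, 90, 100):
        print(need, cheapest_for_gpu_ram(need))
```

For example, a model that needs 90GB of GPU RAM lands on gpu-l40s-2x ($0.001950/sec) rather than gpu-a100-large-2x ($0.002800/sec), and `hourly_price` reproduces the hourly column (per-second price × 3,600).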

Learn more

For a deeper dive, check out how billing works on Replicate.

Enterprise & volume discounts

If you need more support or have complex requirements, we can offer:

  • Dedicated account manager
  • Priority support
  • Higher GPU limits
  • Performance SLAs
  • Help with onboarding, custom models, and optimizations

We've also got volume discounts for large amounts of spend. Email us at sales@replicate.com to learn more.