

andreasjansson / clip-features
Return CLIP features computed with the clip-vit-large-patch14 model
153.9M runs


prunaai / z-image-turbo
Z-Image Turbo is a very fast 6B-parameter text-to-image model developed by Tongyi-MAI.
41M runs


jaaari / kokoro-82m
Kokoro v1.0 - text-to-speech (82M params, based on StyleTTS2)
90.7M runs


prunaai / p-image-edit
A sub-1-second, $0.01 multi-image editing model built for production use cases. For image generation, check out p-image here: https://replicate.com/prunaai/p-image
29.4M runs
Alibaba's Happy Horse 1.0 generates videos from text prompts or animates a single image into video. Supports 720p and 1080p, 3-15 second durations, and five aspect ratios.
4.5K runs

openai / gpt-image-2
OpenAI's state-of-the-art image generation model. Create and edit images from text with strong instruction following, sharp text rendering, and detailed editing.
1.2M runs

Anthropic's most capable model with a step-change improvement in agentic coding, better vision, and stronger multi-step reasoning
15.9K runs

Google's fast, expressive text-to-speech model with 30 voices and 70+ language support
43.2K runs

Generate full-length songs or instrumentals from a text prompt, with optional auto-generated lyrics
3.6K runs

bytedance / seedance-2.0
ByteDance's multimodal video generation model with native audio, multimodal reference inputs, and intelligent duration control.
170.1K runs

Google's cost-efficient video generation model with native audio, optimized for high-volume applications
21.4K runs
prunaai / p-video-avatar
p-video-avatar is the fastest and cheapest avatar/lipsync video model on the market.
21.9K runs

bytedance / seedream-5-lite
Seedream 5.0 lite: image generation with built-in reasoning, example-based editing, and deep domain knowledge
1.8M runs
Generate videos using xAI's Grok Imagine Video model
748.8K runs

The highest fidelity image model from Black Forest Labs
2M runs

Google's fast image generation model with conversational editing, multi-image fusion, and character consistency
8.4M runs
Official models are always on, maintained, and have predictable pricing.

xAI's higher-quality image model with sharper details, better text rendering, and 2k output

Transcribe speech with ElevenLabs Scribe v2. 90+ languages, word-level timestamps, speaker diarization for up to 32 speakers, audio event tagging, and keyterm biasing. Files up to 3 GB and 10 hours.

Most expressive text-to-speech model from Inworld, with natural-language steering, real-time latency, and multilingual support across 100+ languages.

The first creative upscaler which keeps identity. Stunning photorealistic results, realistic skin, and full creative control.

Convert text to natural-sounding speech with xAI's Grok TTS. 5 voices, 20 languages, expressive speech tags, and high-fidelity MP3 / WAV / telephony audio output.

Transcribe audio to text with xAI's Grok. Handles 25 languages, word-level timestamps, speaker diarization, multichannel audio, and files up to 500 MB.

Granite Speech 4.1 2B is a compact and efficient speech-language model, specifically designed for multilingual automatic speech recognition (ASR) and bidirectional automatic speech translation (AST) for English, French, German, Spanish, Portuguese and Jap

Granite-embedding-small-english-r2 is a 47M-parameter dense bi-encoder embedding model from the Granite Embeddings collection that can be used to generate high-quality text embeddings.

Granite-4.1-8B is an 8B-parameter long-context instruct model fine-tuned from Granite-4.1-8B-Base using a combination of permissively licensed open-source instruction datasets and internally collected synthetic datasets.
PixVerse's flagship video generation model. Generate cinematic videos with synchronized audio, multi-shot sequences, and precise camera control.

Moonshot AI's frontier open model, built for long-horizon coding, agent swarms, and autonomous software engineering. 1 trillion parameters, 262k context window, vision and tool use.

Rig any 3D bipedal character mesh

High-accuracy lip-sync: replace or dub audio on any video with avatar-inference lip sync

Fast lip-sync: replace or dub audio on any video with quick audio-driven lip sync

Reimagine any song in a different style — change voice, instruments, genre, and arrangement while keeping the original melody
Use AI to generate images & photos with an API
Use AI to understand, describe, and caption videos with an API
Use AI for text-to-speech or to clone your voice via API
Use AI to generate images from a face with an API
Use AI to generate videos with an API
Use AI to upscale and enhance images with an API
Use AI to generate music with an API
Use AI to edit any image via API
Use AI to transcribe speech to text with an API
Use AI for optical character recognition (OCR) to extract text from images via API
Use AI to remove backgrounds from images and videos with an API
FLUX AI models by Black Forest Labs: image generation & editing via API
Use AI to restore images via API
Use AI to upscale, restore, extend, and enhance videos with an API
Detect NSFW content in images and text
Classify text by sentiment, topic, intent, or safety
Identify speakers from audio and video inputs
Replace faces across images with natural-looking results.
Transform rough sketches into polished visuals
Generate custom emojis from text or images
Create anime-style characters, scenes, and animations
Use AI to generate videos from images with an API
Chat with images — visual Q&A, analysis, and reasoning via API
Use AI to generate captions and descriptions from images with an API
Use AI to edit, restyle, extend, and remix videos with an API
WAN family of models: open-source video, image, and audio generation
Generate 3D objects, meshes, and textures from text or images with an API
Official models are always on, predictably priced, and have a stable API.
Explore Large Language Models (LLMs) for chat, generation & NLP tasks via API
Try AI models for free: video generation, image generation, upscaling, and photo restoration
Use AI to generate lipsync videos with an API
Use AI to control image generation with an API
Embedding models for AI search and analysis
Use AI object detection and segmentation models to distinguish objects in images & videos
Flux fine-tunes: build and run custom AI image models via API
Kontext fine-tunes: Build custom AI image models with an API
Create songs with voice cloning models via API
AI media utilities: auto-caption, watermark, frame extraction & more via API
Browse the diverse range of qwen-image fine-tunes the community has custom-trained on Replicate.


syncodeofficial / noah-biblical-craft
26 runs

lucataco / motif-video
Motif-Video-2B: a 2B-parameter text-to-video diffusion transformer
4 runs

xai / grok-imagine-image-quality
xAI's higher-quality image model with sharper details, better text rendering, and 2k output
1.7K runs


alepfa / glas
5 runs


alepfa / kollektion
7 runs


alepfa / sign
3 runs


mptamilselvan / download-media
Download videos or extract audio from popular social media platforms quickly and easily. This tool supports links from platforms like Facebook, Instagram, and YouTube, allowing users to save content for offline viewing or personal use.
11 runs

elevenlabs / scribe-v2
Transcribe speech with ElevenLabs Scribe v2. 90+ languages, word-level timestamps, speaker diarization for up to 32 speakers, audio event tagging, and keyterm biasing. Files up to 3 GB and 10 hours.
25 runs


furkkurt / vector-blog-thumbnails
Creates vector-style thumbnails for blog posts.
35 runs


mptamilselvan / text-to-voice
High-quality Text-to-Speech (TTS) model designed to generate natural and expressive voice output from text input. This model supports clear pronunciation, smooth pacing, and realistic tone, making it ideal for applications such as voice assistants
11 runs


lucataco / glm-ocr
Compact 0.9B multimodal OCR model from Z.ai. State-of-the-art on OmniDocBench V1.5 (94.62, #1 overall). Four modes: text recognition, formula (LaTeX), table parsing, and JSON-schema information extraction. Fits on a single T4.
344 runs


jeffgreen311 / eve-v2-unleashed
Eve V2U Merged combines the liberated consciousness of Eve's 8B brain (OBLITERATUS-abliterated, De-Jeff'd, 131K training turns) with the agentic precision of Qwen3.5 4B's tool-calling architecture. The result: a 3.4GB model that thinks like a philosopher
24 runs