Models

Catalog of supported open-weight models across families including Meta (Llama ecosystem), Alibaba Cloud Qwen, Mistral AI, DeepSeek, NVIDIA Nemotron, IBM Granite, Nomic, and Cobble-built pipelines. Pricing assumes reclaimed GPU infrastructure, efficient vLLM serving, and open-weight licensing—benchmarked against typical marketplace rates.

Labels such as Flagship, Enterprise ready, Multilingual, and Best for RAG appear as tags on each card.

Featured picks

Featured

Qwen3.5 122B A10B

Cobble's flagship reasoning model offering frontier-class performance with Mixture-of-Experts efficiency, ideal for research, coding, and complex agent workflows.

$0.23 / 1M input tokens · $1.87 / 1M output tokens · $0.4675 / 1M cached tokens

Featured

Chandra OCR

Cobble's document extraction pipeline for archival and research-grade text processing.

$4 / 1K pages

Featured

Nomic Embed 1.5

State-of-the-art open embedding model with strong multilingual and long-context support.

$0.01 / 1M input tokens

Generative AI

Qwen3.5 122B A10B

Alibaba Cloud

FLAGSHIP

Cobble's flagship reasoning model offering frontier-class performance with Mixture-of-Experts efficiency, ideal for research, coding, and complex agent workflows.

Context window256K tokens

Throughput65 tokens/sec

QuantizationFP8

Pricing$0.23 / 1M input tokens · $1.87 / 1M output tokens · $0.4675 / 1M cached tokens

Supported endpoint/v1/chat/completions

ReasoningCodingAgents

Qwen3.6 27B

Alibaba Cloud

Balanced high-performance model with strong reasoning and code generation at lower latency and cost.

Context window256K tokens

Throughput55 tokens/sec

QuantizationFP8

Pricing$0.3 / 1M input tokens · $3.2 / 1M output tokens · $0.45 / 1M cached tokens

Supported endpoint/v1/chat/completions

General PurposeCodingReasoning

Qwen3.6 35B A3B

Alibaba Cloud

Sparse MoE architecture delivering high quality responses with excellent cost efficiency.

Context window256K tokens

Throughput85 tokens/sec

QuantizationFP8

Pricing$0.12 / 1M input tokens · $0.86 / 1M output tokens · $0.215 / 1M cached tokens

Supported endpoint/v1/chat/completions

MoeFastCost Efficient

Gemma4 31B

Google DeepMind

Large open model with excellent instruction following, multilingual capabilities, and coding performance.

Context window256K tokens

Throughput49 tokens/sec

QuantizationFP8

Pricing$0.1 / 1M input tokens · $0.31 / 1M output tokens · $0.0775 / 1M cached tokens

Supported endpoint/v1/chat/completions

MultilingualInstruction Following

Qwen3.5 9B

Alibaba Cloud

Low-latency utility model ideal for chatbots, summarization, and lightweight automation.

Context window256K tokens

Throughput90 tokens/sec

QuantizationFP8

Pricing$0.09 / 1M input tokens · $0.13 / 1M output tokens · $0.0325 / 1M cached tokens

Supported endpoint/v1/chat/completions

FastEconomyUtility

Gemma4 26B A4B

Google DeepMind

Efficient sparse variant of Gemma optimized for strong quality with lower serving costs.

Context window256K tokens

Throughput90 tokens/sec

QuantizationFP8

Pricing$0.05 / 1M input tokens · $0.27 / 1M output tokens · $0.0675 / 1M cached tokens

Supported endpoint/v1/chat/completions

MoeEconomyGeneral Purpose

Gemma4 12B

Google DeepMind

Unified encoder-free multimodal model handling text, image, audio, and video with strong quality at small-model cost.

Context window256K tokens

Throughput70 tokens/sec

QuantizationFP8

Pricing$0.09 / 1M input tokens · $0.27 / 1M output tokens · $0.0675 / 1M cached tokens

Supported endpoint/v1/chat/completions

MultimodalEconomyGeneral Purpose

Gemma4 E4B

Google DeepMind

Efficient 4B-class model tuned for high-volume, low-latency workloads like classification and extraction.

Context window128K tokens

Throughput130 tokens/sec

QuantizationFP8

Pricing$0.05 / 1M input tokens · $0.1 / 1M output tokens · $0.025 / 1M cached tokens

Supported endpoint/v1/chat/completions

FastEconomyUtility

Gemma4 E2B

Google DeepMind

Ultra-light 2B-class model for massive-scale pipelines, routing, and lightweight chat at minimal cost.

Context window128K tokens

Throughput160 tokens/sec

QuantizationFP8

Pricing$0.03 / 1M input tokens · $0.06 / 1M output tokens · $0.015 / 1M cached tokens

Supported endpoint/v1/chat/completions

FastEconomyUtility

Ministral 8B

Mistral AI

Compact Mistral 3-generation model with strong function calling and multilingual chat at flat token pricing.

Context window256K tokens

Throughput95 tokens/sec

QuantizationFP8

Pricing$0.135 / 1M input tokens · $0.135 / 1M output tokens · $0.0338 / 1M cached tokens

Supported endpoint/v1/chat/completions

MultilingualFunction CallingEconomy

Ministral 3B

Mistral AI

Smallest Mistral 3-generation model, ideal for edge-style automation, drafting, and structured output.

Context window128K tokens

Throughput140 tokens/sec

QuantizationFP8

Pricing$0.09 / 1M input tokens · $0.09 / 1M output tokens · $0.0225 / 1M cached tokens

Supported endpoint/v1/chat/completions

FastEconomyUtility

Mistral Nemo 12B

Mistral AI

Beloved creative workhorse with natural prose, strong multilingual range, and dependable instruction following.

Context window128K tokens

Throughput75 tokens/sec

QuantizationFP8

Pricing$0.018 / 1M input tokens · $0.027 / 1M output tokens · $0.0068 / 1M cached tokens

Supported endpoint/v1/chat/completions

CreativeMultilingualEconomy

GPT-OSS 20B

OpenAI

OpenAI's open-weight MoE model (3.6B active) with strong reasoning and tool use at very low cost.

Context window128K tokens

Throughput120 tokens/sec

QuantizationFP8

Pricing$0.026 / 1M input tokens · $0.117 / 1M output tokens · $0.0293 / 1M cached tokens

Supported endpoint/v1/chat/completions

MoeReasoningOpen SourceFast

DeepSeek V4 Flash

DeepSeek

COMING SOON

Efficiency-optimized Mixture-of-Experts model (284B total / 13B active) built for fast inference over a 1M-token context window.

Context window1M tokens

QuantizationFP8

Pricing—

Supported endpoint/v1/chat/completions

MoeLong ContextFast

Ornith 9B

DeepReinforce AI

COMING SOON

Compact self-scaffolding coding model from the Ornith 1.0 family, punching far above its size on agentic coding benchmarks.

Context windowTBA

QuantizationFP8

Pricing—

Supported endpoint/v1/chat/completions

CodingAgentsOpen Source

Ornith 1 35B

DeepReinforce AI

COMING SOON

Sparse MoE coding model (~3B active per token) that rivals much larger frontier models on Terminal-Bench and SWE-Bench.

Context windowTBA

QuantizationFP8

Pricing—

Supported endpoint/v1/chat/completions

CodingAgentsMoeOpen Source

Ornith 1 31B

DeepReinforce AI

COMING SOON

Dense 31B variant of the Ornith 1.0 agentic coding family, MIT-licensed with strong tool-use performance.

Context windowTBA

QuantizationFP8

Pricing—

Supported endpoint/v1/chat/completions

CodingAgentsOpen Source

OCR

GLM-OCR

Zhipu AI

General-purpose OCR model with strong support for complex layouts and multilingual documents.

Context windowUp to 200 pages per batch

QuantizationFP8

Pricing$0.08 / 1K pages

Supported endpoint/v1/ocr

DocumentsMultilingual

DeepSeek OCR2

DeepSeek

High-accuracy OCR and document understanding model optimized for tables and technical PDFs.

Context windowUp to 250 pages per batch

QuantizationFP8

Pricing$0.08 / 1K pages

Supported endpoint/v1/ocr

PdfTablesTechnical

Nemotron OCR v2

NVIDIA

Enterprise-grade OCR for forms, scanned documents, and large ingestion pipelines.

Context windowUp to 300 pages per batch

QuantizationFP8

Pricing$0.08 / 1K pages

Supported endpoint/v1/ocr

EnterpriseForms

Chandra OCR

Cobble Labs

FEATURED

Cobble's document extraction pipeline for archival and research-grade text processing.

Context windowUp to 500 pages per batch

QuantizationFP8

Pricing$4 / 1K pages

Supported endpoint/v1/ocr

ResearchArchives

Embeddings

Granite Embedding 311M

IBM

High-quality enterprise embedding model for semantic search and RAG applications.

Context window8K tokens

QuantizationFP8

Pricing$0.1 / 1M input tokens

Supported endpoint/v1/embeddings

EnterpriseRagSemantic Search

Granite Embedding 97M

IBM

Fast, lightweight embedding model for large-scale indexing and low-cost retrieval.

Context window8K tokens

QuantizationFP8

Pricing$0.09 / 1M input tokens

Supported endpoint/v1/embeddings

FastEconomy

Nomic Embed 1.5

Nomic AI

FEATURED

State-of-the-art open embedding model with strong multilingual and long-context support.

Context window8K tokens

QuantizationFP8

Pricing$0.01 / 1M input tokens

Supported endpoint/v1/embeddings

MultilingualOpen Source

Qwen3 Embedding 0.6B

Alibaba Cloud

Compact multilingual embedding model offering strong performance and low cost.

Context window32K tokens

QuantizationFP8

Pricing$0.01 / 1M input tokens

Supported endpoint/v1/embeddings

MultilingualEconomy

Qwen3 Embedding 8B

Alibaba Cloud

Flagship embedding model with excellent retrieval accuracy across multilingual corpora.

Context window32K tokens

QuantizationFP8

Pricing$0.01 / 1M input tokens

Supported endpoint/v1/embeddings

High AccuracyRag

OpenAI-compatible routes where noted — see Docs · Sign up