Models

Catalog of supported open-weight models across families including Meta (Llama ecosystem), Alibaba Cloud Qwen, Mistral AI, DeepSeek, NVIDIA Nemotron, IBM Granite, Nomic, and Cobble-built pipelines. Pricing assumes reclaimed GPU infrastructure, efficient vLLM serving, and open-weight licensing—benchmarked against typical marketplace rates.

Labels such as Flagship, Enterprise ready, Multilingual, and Best for RAG appear as tags on each card.

Featured picks

Featured

Qwen3.5 122B A10B

Cobble's flagship reasoning model offering frontier-class performance with Mixture-of-Experts efficiency, ideal for research, coding, and complex agent workflows.

$0.90 / 1M input tokens · $3.50 / 1M output tokens

Featured

Chandra OCR

Cobble's document extraction pipeline for archival and research-grade text processing.

$1.25 / 1K pages

Featured

Nomic Embed 1.5

State-of-the-art open embedding model with strong multilingual and long-context support.

$0.025 / 1M input tokens

Generative AI

Qwen3.5 122B A10B

Alibaba Cloud

FLAGSHIP

Cobble's flagship reasoning model offering frontier-class performance with Mixture-of-Experts efficiency, ideal for research, coding, and complex agent workflows.

Context window128K tokens
Throughput120–220 tokens/sec
Pricing$0.90 / 1M input tokens · $3.50 / 1M output tokens
Supported endpoint/v1/chat/completions
ReasoningCodingAgents

Qwen3.6 27B

Alibaba Cloud

Balanced high-performance model with strong reasoning and code generation at lower latency and cost.

Context window128K tokens
Throughput180–350 tokens/sec
Pricing$0.35 / 1M input tokens · $1.25 / 1M output tokens
Supported endpoint/v1/chat/completions
General PurposeCodingReasoning

Qwen3.6 35B A3B

Alibaba Cloud

Sparse MoE architecture delivering high quality responses with excellent cost efficiency.

Context window128K tokens
Throughput220–420 tokens/sec
Pricing$0.25 / 1M input tokens · $0.95 / 1M output tokens
Supported endpoint/v1/chat/completions
MoeFastCost Efficient

Gemma4 31B

Google DeepMind

Large open model with excellent instruction following, multilingual capabilities, and coding performance.

Context window128K tokens
Throughput150–300 tokens/sec
Pricing$0.45 / 1M input tokens · $1.50 / 1M output tokens
Supported endpoint/v1/chat/completions
MultilingualInstruction Following

Qwen3.5 9B

Alibaba Cloud

Low-latency utility model ideal for chatbots, summarization, and lightweight automation.

Context window128K tokens
Throughput350–700 tokens/sec
Pricing$0.10 / 1M input tokens · $0.35 / 1M output tokens
Supported endpoint/v1/chat/completions
FastEconomyUtility

Nemotron 3 Nano Omni 30B A3B

NVIDIA

Efficient NVIDIA MoE model tuned for enterprise assistants and agentic applications.

Context window128K tokens
Throughput220–450 tokens/sec
Pricing$0.22 / 1M input tokens · $0.85 / 1M output tokens
Supported endpoint/v1/chat/completions
NvidiaEnterpriseAgents

Gemma4 26B A4B

Google DeepMind

Efficient sparse variant of Gemma optimized for strong quality with lower serving costs.

Context window128K tokens
Throughput240–480 tokens/sec
Pricing$0.20 / 1M input tokens · $0.75 / 1M output tokens
Supported endpoint/v1/chat/completions
MoeEconomyGeneral Purpose

OCR

Chandra OCR

Cobble Labs

FEATURED

Cobble's document extraction pipeline for archival and research-grade text processing.

Context windowUp to 500 pages per batch
Throughput80–180 pages/minute
Pricing$1.25 / 1K pages
Supported endpoint/v1/ocr
ResearchArchives

GLM-OCR

Zhipu AI

General-purpose OCR model with strong support for complex layouts and multilingual documents.

Context windowUp to 200 pages per batch
Throughput50–120 pages/minute
Pricing$1.50 / 1K pages
Supported endpoint/v1/ocr
DocumentsMultilingual

DeepSeek OCR2

DeepSeek

High-accuracy OCR and document understanding model optimized for tables and technical PDFs.

Context windowUp to 250 pages per batch
Throughput60–140 pages/minute
Pricing$1.80 / 1K pages
Supported endpoint/v1/ocr
PdfTablesTechnical

Nemotron OCR v2

NVIDIA

Enterprise-grade OCR for forms, scanned documents, and large ingestion pipelines.

Context windowUp to 300 pages per batch
Throughput70–160 pages/minute
Pricing$2.00 / 1K pages
Supported endpoint/v1/ocr
EnterpriseForms

Embeddings

Nomic Embed 1.5

Nomic AI

FEATURED

State-of-the-art open embedding model with strong multilingual and long-context support.

Context window8K tokens
Throughput10,000–20,000 texts/minute
Pricing$0.025 / 1M input tokens
Supported endpoint/v1/embeddings
MultilingualOpen Source

Granite Embedding 311M

IBM

High-quality enterprise embedding model for semantic search and RAG applications.

Context window8K tokens
Throughput8,000–15,000 texts/minute
Pricing$0.030 / 1M input tokens
Supported endpoint/v1/embeddings
EnterpriseRagSemantic Search

Granite Embedding 97M

IBM

Fast, lightweight embedding model for large-scale indexing and low-cost retrieval.

Context window8K tokens
Throughput15,000–30,000 texts/minute
Pricing$0.015 / 1M input tokens
Supported endpoint/v1/embeddings
FastEconomy

Qwen3 Embedding 0.6B

Alibaba Cloud

Compact multilingual embedding model offering strong performance and low cost.

Context window32K tokens
Throughput8,000–18,000 texts/minute
Pricing$0.020 / 1M input tokens
Supported endpoint/v1/embeddings
MultilingualEconomy

Qwen3 Embedding 8B

Alibaba Cloud

Flagship embedding model with excellent retrieval accuracy across multilingual corpora.

Context window32K tokens
Throughput3,000–8,000 texts/minute
Pricing$0.06 / 1M input tokens
Supported endpoint/v1/embeddings
High AccuracyRag

OpenAI-compatible routes where noted — see Docs · Sign up