OpenAI-compatible API for running open-weight AI models
https://api.featherless.ai/v1
All endpoints below are relative to this base URL. Authentication uses a Bearer token in the Authorization header.
Gateway Alternative (no API key needed): This page documents the upstream Featherless provider API. To use these models without an API key through our gateway, use https://api.kim8.s4s.host/v1/, which requires no auth and proxies to Featherless plus three other providers. The gateway automatically logs all API calls and provides analytics at https://api.kim8.s4s.host/analytics/. API Gateway Docs →
Include your API key in the Authorization header for all requests:
Authorization: Bearer YOUR_API_KEY
Get your API key from your Featherless dashboard.
List all available models.
curl https://api.featherless.ai/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"
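Because the API is OpenAI-compatible, the models list follows the familiar { data: [{ id, ... }] } response shape. A minimal sketch for pulling out the model IDs, assuming that shape (listModelIds and extractModelIds are illustrative names, not part of the API):

```javascript
// List available model IDs. Assumes the OpenAI-style list response
// shape: { data: [{ id: "...", ... }, ...] }.
async function listModelIds(apiKey) {
  const res = await fetch('https://api.featherless.ai/v1/models', {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`models request failed: ${res.status}`);
  return extractModelIds(await res.json());
}

// Pure helper: pull the id field out of each entry in the data array.
function extractModelIds(body) {
  return (body.data ?? []).map((m) => m.id);
}
```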
Send messages to a chat model and receive a completion.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g. nousresearch/hermes-3-llama-3.1-8b) |
| messages | array | Yes | Array of message objects with role and content |
| temperature | float | No | Sampling temperature (0-2). Default: 1 |
| top_p | float | No | Nucleus sampling threshold. Default: 1 |
| top_k | int | No | Top-k sampling parameter |
| max_tokens | int | No | Maximum number of tokens to generate |
| stop | string/array | No | Stop sequences that end generation |
| presence_penalty | float | No | Penalize new tokens based on presence. Range: -2 to 2 |
| frequency_penalty | float | No | Penalize new tokens based on frequency. Range: -2 to 2 |
curl https://api.featherless.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "nousresearch/hermes-3-llama-3.1-8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"temperature": 0.7,
"max_tokens": 1024
}'
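The assistant's reply lives at choices[0].message.content in the response. A small sketch, assuming the standard OpenAI-compatible response shape (buildChatRequest and extractReply are illustrative helper names, not part of the API):

```javascript
// Build the fetch arguments for a chat completion request.
function buildChatRequest(apiKey, model, messages, options = {}) {
  return {
    url: 'https://api.featherless.ai/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      // Extra sampling parameters (temperature, max_tokens, ...) are
      // spread directly into the JSON body.
      body: JSON.stringify({ model, messages, ...options }),
    },
  };
}

// Pull the assistant's reply out of an OpenAI-style response body.
function extractReply(body) {
  return body.choices?.[0]?.message?.content ?? '';
}
```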
Text completion for base/non-chat models.
curl https://api.featherless.ai/v1/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen/qwen2.5-coder-7b",
"prompt": "def fibonacci(n):",
"max_tokens": 256,
"temperature": 0.3
}'
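Note that, assuming the standard OpenAI-compatible response shape, the text completions endpoint returns its output under choices[0].text rather than choices[0].message.content:

```javascript
// Pull the completion out of an OpenAI-style /v1/completions response.
// The field here is choices[0].text, not choices[0].message.content.
function extractCompletionText(body) {
  return body.choices?.[0]?.text ?? '';
}
```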
Count tokens for a given input text.
curl https://api.featherless.ai/v1/tokenize \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "nousresearch/hermes-3-llama-3.1-8b",
"input": "Hello, how are you today?"
}'
Playground chat endpoint (same format as /v1/chat/completions).
curl -X POST /api/chat \
-H "Content-Type: application/json" \
-d '{"model":"Qwen/Qwen2.5-Coder-32B-Instruct","messages":[{"role":"user","content":"Write a hello world"}]}'
Send an image for analysis. Accepts base64-encoded images or URLs.
curl -X POST /api/vision \
-H "Content-Type: application/json" \
-d '{"model":"mistralai/Magistral-Small-2506","image":"data:image/png;base64,...","prompt":"Describe this image"}'
Extract text from an image using OCR-capable vision models.
curl -X POST /api/ocr \
-H "Content-Type: application/json" \
-d '{"model":"mistralai/Mistral-Small-3.2-24B-Instruct-2506","image":"data:image/png;base64,..."}'
Search for models by name or category.
curl "/api/search?q=Qwen&category=coding"
Get model catalog data as JSON (used by the help pages).
Chain multiple models together for review, refinement, or multi-step analysis:
const models = [
  'Qwen/Qwen2.5-Coder-32B-Instruct',
  'nvidia/OpenCodeReasoning-Nemotron-32B'
];

async function sequentialPipeline(prompt) {
  let context = prompt;
  const results = [];
  for (const model of models) {
    const res = await fetch('https://api.featherless.ai/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: context }],
        max_tokens: 2048,
      }),
    });
    if (!res.ok) throw new Error(`Request to ${model} failed: ${res.status}`);
    const data = await res.json();
    const output = data.choices[0].message.content;
    results.push({ model, output });
    // Feed each model's output to the next model in the chain.
    context = `Previous output from ${model}:\n${output}\n\nImprove this further.`;
  }
  return results;
}
How a CI tool would call the API for automated code review:
#!/bin/bash
# ci-code-review.sh - Run AI code review on a diff
API_KEY="${FEATHERLESS_API_KEY}"
MODEL="Qwen/Qwen2.5-Coder-32B-Instruct"
DIFF=$(git diff main...HEAD)
RESPONSE=$(curl -s https://api.featherless.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "$(jq -n \
--arg model "$MODEL" \
--arg diff "$DIFF" \
'{
model: $model,
messages: [
{"role":"system","content":"You are a code reviewer. Be concise."},
{"role":"user","content":("Review this diff:\n"+$diff)}
],
max_tokens: 2048
}'
)")
echo "$RESPONSE" | jq -r '.choices[0].message.content'
# Post as PR comment (GitHub example)
COMMENT=$(echo "$RESPONSE" | jq -r '.choices[0].message.content')
# gh pr comment "$PR_NUMBER" --body "$COMMENT"
Rate limit state is reported in the X-RateLimit-Limit and X-RateLimit-Remaining response headers; when you are rate limited, the response includes a Retry-After header indicating how long to wait before retrying. For complete details, parameters, and troubleshooting, see the official docs: