OpenAI-compatible API for running open-weight AI models
https://api.featherless.ai/v1
All endpoints below are relative to this base URL. Authentication uses a Bearer token in the Authorization header.
Gateway Alternative (no API key needed): This page documents the upstream Featherless provider API. To use these models without an API key through our gateway, use https://api.kim8.s4s.host/v1/, which requires no auth and proxies to Featherless plus three other providers. The gateway automatically logs all API calls and provides analytics at https://api.kim8.s4s.host/analytics/. API Gateway Docs →
Include your API key in the Authorization header for all requests:
Authorization: Bearer YOUR_API_KEY
Get your API key from your Featherless dashboard.
List all available models.
curl https://api.featherless.ai/v1/models \
-H "Authorization: Bearer YOUR_API_KEY"
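Because the API is OpenAI-compatible, the models list follows the familiar { data: [{ id, ... }] } response shape. A minimal sketch for pulling out the model IDs, assuming that shape (listModelIds and extractModelIds are illustrative names, not part of the API):

```javascript
// List available model IDs. Assumes the OpenAI-style list response
// shape: { data: [{ id: "...", ... }, ...] }.
async function listModelIds(apiKey) {
  const res = await fetch('https://api.featherless.ai/v1/models', {
    headers: { Authorization: `Bearer ${apiKey}` },
  });
  if (!res.ok) throw new Error(`models request failed: ${res.status}`);
  return extractModelIds(await res.json());
}

// Pure helper: pull the id field out of each entry in the data array.
function extractModelIds(body) {
  return (body.data ?? []).map((m) => m.id);
}
```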
Send messages to a chat model and receive a completion.
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model ID (e.g. nousresearch/hermes-3-llama-3.1-8b) |
| messages | array | Yes | Array of message objects with role and content |
| temperature | float | No | Sampling temperature (0-2). Default: 1 |
| top_p | float | No | Nucleus sampling threshold. Default: 1 |
| top_k | int | No | Top-k sampling parameter |
| max_tokens | int | No | Maximum number of tokens to generate |
| stop | string/array | No | Stop sequences that end generation |
| presence_penalty | float | No | Penalize new tokens based on presence. Range: -2 to 2 |
| frequency_penalty | float | No | Penalize new tokens based on frequency. Range: -2 to 2 |
curl https://api.featherless.ai/v1/chat/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "nousresearch/hermes-3-llama-3.1-8b",
"messages": [
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Explain quantum computing in simple terms."}
],
"temperature": 0.7,
"max_tokens": 1024
}'
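The assistant's reply lives at choices[0].message.content in the response. A small sketch, assuming the standard OpenAI-compatible response shape (buildChatRequest and extractReply are illustrative helper names, not part of the API):

```javascript
// Build the fetch arguments for a chat completion request.
function buildChatRequest(apiKey, model, messages, options = {}) {
  return {
    url: 'https://api.featherless.ai/v1/chat/completions',
    init: {
      method: 'POST',
      headers: {
        Authorization: `Bearer ${apiKey}`,
        'Content-Type': 'application/json',
      },
      // Extra sampling parameters (temperature, max_tokens, ...) are
      // spread directly into the JSON body.
      body: JSON.stringify({ model, messages, ...options }),
    },
  };
}

// Pull the assistant's reply out of an OpenAI-style response body.
function extractReply(body) {
  return body.choices?.[0]?.message?.content ?? '';
}
```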
Text completion for base/non-chat models.
curl https://api.featherless.ai/v1/completions \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen/qwen2.5-coder-7b",
"prompt": "def fibonacci(n):",
"max_tokens": 256,
"temperature": 0.3
}'
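Note that, assuming the standard OpenAI-compatible response shape, the text completions endpoint returns its output under choices[0].text rather than choices[0].message.content:

```javascript
// Pull the completion out of an OpenAI-style /v1/completions response.
// The field here is choices[0].text, not choices[0].message.content.
function extractCompletionText(body) {
  return body.choices?.[0]?.text ?? '';
}
```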
Count tokens for a given input text.
curl https://api.featherless.ai/v1/tokenize \
-H "Authorization: Bearer YOUR_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "nousresearch/hermes-3-llama-3.1-8b",
"input": "Hello, how are you today?"
}'
Playground chat endpoint (same format as /v1/chat/completions).
curl -X POST /api/chat \
-H "Content-Type: application/json" \
-d '{"model":"Qwen/Qwen2.5-Coder-32B-Instruct","messages":[{"role":"user","content":"Write a hello world"}]}'
Send an image for analysis. Accepts base64-encoded images or URLs.
curl -X POST /api/vision \
-H "Content-Type: application/json" \
-d '{"model":"mistralai/Magistral-Small-2506","image":"data:image/png;base64,...","prompt":"Describe this image"}'
Extract text from an image using OCR-capable vision models.
curl -X POST /api/ocr \
-H "Content-Type: application/json" \
-d '{"model":"mistralai/Mistral-Small-3.2-24B-Instruct-2506","image":"data:image/png;base64,..."}'
Search for models by name or category.
curl "/api/search?q=Qwen&category=coding"
Get model catalog data as JSON (used by the help pages).
Chain multiple models together for review, refinement, or multi-step analysis:
const models = [
  'Qwen/Qwen2.5-Coder-32B-Instruct',
  'nvidia/OpenCodeReasoning-Nemotron-32B'
];

async function sequentialPipeline(prompt) {
  let context = prompt;
  const results = [];
  for (const model of models) {
    const res = await fetch('https://api.featherless.ai/v1/chat/completions', {
      method: 'POST',
      headers: {
        'Authorization': 'Bearer YOUR_API_KEY',
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        model,
        messages: [{ role: 'user', content: context }],
        max_tokens: 2048,
      }),
    });
    if (!res.ok) throw new Error(`Request to ${model} failed: ${res.status}`);
    const data = await res.json();
    const output = data.choices[0].message.content;
    results.push({ model, output });
    // Feed each model's output to the next model in the chain.
    context = `Previous output from ${model}:\n${output}\n\nImprove this further.`;
  }
  return results;
}
How a CI tool would call the API for automated code review:
#!/bin/bash
# ci-code-review.sh - Run AI code review on a diff
API_KEY="${FEATHERLESS_API_KEY}"
MODEL="Qwen/Qwen2.5-Coder-32B-Instruct"
DIFF=$(git diff main...HEAD)
RESPONSE=$(curl -s https://api.featherless.ai/v1/chat/completions \
-H "Authorization: Bearer $API_KEY" \
-H "Content-Type: application/json" \
-d "$(jq -n \
--arg model "$MODEL" \
--arg diff "$DIFF" \
'{
model: $model,
messages: [
{"role":"system","content":"You are a code reviewer. Be concise."},
{"role":"user","content":("Review this diff:\n"+$diff)}
],
max_tokens: 2048
}'
)")
echo "$RESPONSE" | jq -r '.choices[0].message.content'
# Post as PR comment (GitHub example)
COMMENT=$(echo "$RESPONSE" | jq -r '.choices[0].message.content')
# gh pr comment "$PR_NUMBER" --body "$COMMENT"
Rate limit state is reported in the X-RateLimit-Limit and X-RateLimit-Remaining response headers; when you are rate limited, the response includes a Retry-After header indicating how long to wait before retrying. For complete details, parameters, and troubleshooting, see the official docs: