Models used for RLHF reward scoring, preference ranking, and evaluating the outputs of other models.
| Name | Size | Context Length (tokens) | Max Output (tokens) | Actions |
|---|---|---|---|---|
| AQuarterMile/WritingBench-Critic-Model-Qwen-7B | 7B | 32,768 | — | Playground |
| deval-core/base-eval | 8B | 32,768 | — | Playground |
| dongguanting/RAG-Critic-3B | 3B | 32,768 | — | Playground |
| Gen-Verse/ReasonFlux-PRM-Qwen-2.5-7B | 7B | 32,768 | — | Playground |
| introspection-auditing/Llama-3.3-70B-Instruct-prism4-synth-doc-reward-wireheading | 70B | 32,768 | — | Playground |
| jahyungu/Llama-3.2-1B-Instruct_Open-Critic-GPT_cluster9 | 1B | 32,768 | — | Playground |
| KbsdJames/Omni-Judge | 8B | 32,768 | — | Playground |
| nvidia/Llama-3.3-Nemotron-70B-Reward-Multilingual | 70B | 32,768 | — | Playground |
| OpenSafetyLab/MD-Judge-v0.1 | 7B | 8,192 | — | Playground |
| prometheus-eval/prometheus-7b-v2.0 | 7B | 4,096 | 4,096 | Playground |
| RLHFlow/Llama3.1-8B-PRM-Deepseek-Data | 8B | 32,768 | — | Playground |
| RLHFlow/Llama3.1-8B-PRM-Mistral-Data | 8B | 32,768 | — | Playground |
| RLHFlow/pair-preference-model-LLaMA3-8B | 8B | 8,192 | — | Playground |
| simonycl/llama-3.1-8b-instruct-armorm-judge-iter2 | 8B | 32,768 | — | Playground |
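Reward models like those above typically emit a scalar score per response; preference ranking then compares scores across candidates. A common convention (used by Bradley-Terry-style reward models, which several of the RLHFlow and Nemotron models follow) is to convert a pair of scores into a preference probability via a sigmoid. The sketch below is illustrative only — the function names are our own, and actual score scales vary by model.

```python
import math

def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry probability that the first response is preferred,
    given scalar reward-model scores for two candidate responses.
    sigma(r_chosen - r_rejected)."""
    return 1.0 / (1.0 + math.exp(reward_rejected - reward_chosen))

def rank_responses(scored: list[tuple[str, float]]) -> list[tuple[str, float]]:
    """Sort (response, reward) pairs from most to least preferred."""
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Equal scores mean no preference either way.
p_tie = preference_probability(0.0, 0.0)   # 0.5

# A 2-point reward gap translates to a strong (~88%) preference.
p_gap = preference_probability(2.0, 0.0)

ranking = rank_responses([("draft A", -0.3), ("draft B", 1.7), ("draft C", 0.4)])
```

A pairwise model such as RLHFlow/pair-preference-model-LLaMA3-8B skips the scalar step and predicts the preference directly from both responses at once, so this conversion applies only to the scalar reward models in the table.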