
Reward & Judge Models

Models used for RLHF reward scoring, preference ranking, and evaluation of other models.

14 models listed (14 total available)
Use cases: RLHF reward scoring, model evaluation, preference ranking, quality grading, A/B testing
| Name | Size | Context Length (tokens) | Max Output (tokens) |
| --- | --- | --- | --- |
| AQuarterMile/WritingBench-Critic-Model-Qwen-7B | 7B | 32,768 | |
| deval-core/base-eval | 8B | 32,768 | |
| dongguanting/RAG-Critic-3B | 3B | 32,768 | |
| Gen-Verse/ReasonFlux-PRM-Qwen-2.5-7B | 7B | 32,768 | |
| introspection-auditing/Llama-3.3-70B-Instruct-prism4-synth-doc-reward-wireheading | 70B | 32,768 | |
| jahyungu/Llama-3.2-1B-Instruct_Open-Critic-GPT_cluster9 | 1B | 32,768 | |
| KbsdJames/Omni-Judge | 8B | 32,768 | |
| nvidia/Llama-3.3-Nemotron-70B-Reward-Multilingual | 70B | 32,768 | |
| OpenSafetyLab/MD-Judge-v0.1 | 7B | 8,192 | |
| prometheus-eval/prometheus-7b-v2.0 | 7B | 4,096 | 4,096 |
| RLHFlow/Llama3.1-8B-PRM-Deepseek-Data | 8B | 32,768 | |
| RLHFlow/Llama3.1-8B-PRM-Mistral-Data | 8B | 32,768 | |
| RLHFlow/pair-preference-model-LLaMA3-8B | 8B | 8,192 | |
| simonycl/llama-3.1-8b-instruct-armorm-judge-iter2 | 8B | 32,768 | |
