RLHFlow/Llama3.1-8B-PRM-Deepseek-Data - Playground


Size

Context: 32,768

Tool Use: Yes

Reward & Judge Models

RLHF reward scoring, Model evaluation, Preference ranking, Quality grading, A/B testing

This model is cataloged as a reward and judge model: it is meant for scoring and ranking responses rather than for general conversation. Use the tabs below to test different capabilities.
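The preference-ranking use case listed above can be illustrated with a minimal sketch. It assumes some scoring function that maps a prompt and a response to a number; `rank_by_reward` and `toy_score` are hypothetical names, and `toy_score` is a simple word-overlap stand-in, not the model's real scoring interface, which this page does not describe.

```python
from typing import Callable, List, Tuple


def rank_by_reward(
    prompt: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each candidate and return (candidate, score) pairs, best first."""
    scored = [(c, score_fn(prompt, c)) for c in candidates]
    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_score(prompt: str, response: str) -> float:
    """Hypothetical stand-in for a reward model call.

    Returns the fraction of distinct response words that also appear in the
    prompt. A real reward model would replace this function.
    """
    prompt_words = set(prompt.lower().split())
    response_words = set(response.lower().split())
    if not response_words:
        return 0.0
    return len(prompt_words & response_words) / len(response_words)


if __name__ == "__main__":
    ranked = rank_by_reward(
        "what is 2 plus 2",
        ["bananas are yellow", "2 plus 2 is 4"],
        toy_score,
    )
    for text, score in ranked:
        print(f"{score:.2f}  {text}")
```

Keeping the scorer as a plain callable means the same ranking code could serve A/B testing or quality grading by swapping in a different `score_fn`.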


Temperature: 0.7

Max Tokens
