
RLHFlow/Llama3.1-8B-PRM-Deepseek-Data

Size: 8B
Context: 32,768
Tool Use: Yes
Reward & Judge Models
RLHF reward scoring, Model evaluation, Preference ranking, Quality grading, A/B testing

This model excels at general conversation and instruction following.
