RewardFunctionsConfig (GRPO)
Below are the class definitions for RewardFunctionsConfig and the models it nests.
from pydantic import BaseModel, Field

class RewardFunction(BaseModel):
    # Serialized under the key "encodedFn"
    encoded_fn: str = Field(serialization_alias="encodedFn")

class RewardFunctionsRuntimeConfig(BaseModel):
    # Packages to install so the reward functions can run
    packages: list[str] | None = Field(default=None)

class RewardFunctionsConfig(BaseModel):
    runtime: RewardFunctionsRuntimeConfig | None = Field(default=None)
    functions: dict[str, RewardFunction] = Field(default_factory=dict)
This config specifies the reward functions to use for GRPO training, along with any package dependencies that must be installed for those functions to run. For example:
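def my_reward_function(prompt, completion, example) -> float:
    import my_pkg
    # Return the score so the function yields a float as annotated
    return my_pkg.score(...)

cfg = RewardFunctionsConfig(
    runtime=RewardFunctionsRuntimeConfig(
        # Dependencies required by the reward functions
        packages=[
            "mypkg",
        ]
    ),
    # Maps a reward name to its function
    functions={
        "my_reward": my_reward_function
    },
)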
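To make the reward function signature more concrete, here is a minimal sketch of a reward function that depends only on the standard library, so it would need no entry in packages. Several details are assumptions made for illustration and are not stated above: that prompt and completion are strings, that example is a dict with an "answer" key, and that the model is expected to wrap its final answer in <answer> tags. The name exact_answer_reward is hypothetical.

```python
import re


def exact_answer_reward(prompt: str, completion: str, example: dict) -> float:
    """Return 1.0 if the tagged answer in the completion matches the reference, else 0.0.

    Assumptions (not from the source): `example` is a dict with an "answer"
    key, and the completion wraps its final answer in <answer>...</answer>.
    """
    # Reference answer from the dataset row; empty if the key is missing
    expected = str(example.get("answer", "")).strip()
    if not expected:
        return 0.0

    # Extract the first <answer>...</answer> span from the completion
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    if match is None:
        return 0.0

    predicted = match.group(1).strip()
    return 1.0 if predicted == expected else 0.0
```

A function like this could then be registered the same way as my_reward_function above, for example functions={"exact_answer": exact_answer_reward}, with runtime left unset since no extra packages are required.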