SamplingParamsConfig (GRPO)
Below is the class definition for SamplingParamsConfig, a Pydantic model whose fields all default to None.
```python
from pydantic import BaseModel, Field


class SamplingParamsConfig(BaseModel):
    temperature: float | None = Field(default=None)  # non-negative float, default is 0.9
    top_p: float | None = Field(default=None)  # value in (0, 1], default is 1.0
    top_k: int | None = Field(default=None)  # non-negative integer or -1 to consider all tokens, default is -1
    max_tokens: int | None = Field(default=None)  # non-negative integer, default is 1024
```
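Each field is declared with default=None, yet its comment names a concrete default. This suggests that a field left as None falls back to the documented value. The sketch below assumes exactly that and shows how the fallback and the documented ranges could be applied. It uses only the standard library; resolve_sampling_params and ResolvedSamplingParams are hypothetical helpers for illustration, not part of the library that defines SamplingParamsConfig.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: these are the values used when a field is left as None,
# taken from the comments in SamplingParamsConfig.
DEFAULTS = {"temperature": 0.9, "top_p": 1.0, "top_k": -1, "max_tokens": 1024}


@dataclass(frozen=True)
class ResolvedSamplingParams:
    """Hypothetical container holding sampling params with every default filled in."""
    temperature: float
    top_p: float
    top_k: int
    max_tokens: int


def resolve_sampling_params(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    top_k: Optional[int] = None,
    max_tokens: Optional[int] = None,
) -> ResolvedSamplingParams:
    """Hypothetical helper: replace None with the documented default, then check ranges."""
    t = DEFAULTS["temperature"] if temperature is None else temperature
    p = DEFAULTS["top_p"] if top_p is None else top_p
    k = DEFAULTS["top_k"] if top_k is None else top_k
    n = DEFAULTS["max_tokens"] if max_tokens is None else max_tokens

    # Range checks mirror the comments on each field.
    if t < 0:
        raise ValueError("temperature must be non-negative")
    if not 0 < p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    if k < -1:
        raise ValueError("top_k must be a non-negative integer or -1")
    if n < 0:
        raise ValueError("max_tokens must be non-negative")
    return ResolvedSamplingParams(t, p, k, n)
```

For example, resolve_sampling_params(temperature=0.7) keeps the explicit temperature and fills in top_p=1.0, top_k=-1 and max_tokens=1024.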
The SamplingParamsConfig parameters control sampling during GRPO fine-tuning, the stage in which the model generates a group of candidate completions for each prompt.
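To make the role of each knob concrete, here is a minimal sketch of a single next-token draw using the conventional meanings of temperature, top-k and top-p (nucleus) sampling. It is a generic illustration, not the inference engine the GRPO trainer actually uses. The function sample_next_token is hypothetical, and three choices are assumptions: top-k is applied before top-p, a temperature of 0 means greedy decoding, and any non-positive top_k (such as -1) disables the top-k filter. max_tokens would cap how many times a step like this repeats within one completion.

```python
import math
import random
from typing import Optional, Sequence


def sample_next_token(
    logits: Sequence[float],
    temperature: float = 0.9,
    top_p: float = 1.0,
    top_k: int = -1,
    rng: Optional[random.Random] = None,
) -> int:
    """Hypothetical helper: pick one token id from raw logits."""
    if rng is None:
        rng = random.Random()

    # Assumption: temperature 0 is treated as greedy decoding (argmax).
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])

    # Temperature scaling followed by a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Token ids ordered from most to least likely.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)

    # Top-k: keep only the k most likely tokens; a non-positive value such as -1 keeps all.
    if top_k > 0:
        order = order[:top_k]

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # random.choices renormalises the surviving weights before drawing.
    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]
```

Lower temperatures sharpen the distribution toward the most likely tokens, while smaller top_k or top_p values shrink the candidate set. Both reduce the diversity among the completions generated for a prompt.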