Checklist
Describe the bug
I ran the Qwen3 30B VL model on sglang v0.5.10 under slime’s debug-rollout-only mode and found that the raw_reward curves on NPU and GPU do not match.
Could the numerical precision be aligned between NPU and GPU?
Reproduction
Run the Qwen3 30B VL model on sglang v0.5.10 under slime’s debug-rollout-only mode, once on NPU and once on GPU.
Compare the NPU and GPU raw_reward curves.
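
To make the comparison concrete, a small script like the one below could put a number on the gap between the two curves. This is only a sketch: the function name `compare_reward_curves`, the tolerance `atol`, and the sample values are hypothetical and not part of sglang or slime. It assumes the per-step raw_reward values have already been exported from each run as plain lists of floats, with the same prompts and sampling settings on both devices.

```python
from statistics import mean


def compare_reward_curves(gpu, npu, atol=1e-2):
    """Summarize the per-step gap between two raw_reward curves.

    `atol` is an arbitrary illustrative threshold, not a recommended value.
    """
    if len(gpu) != len(npu):
        raise ValueError("curves must cover the same rollout steps")
    if not gpu:
        raise ValueError("curves must not be empty")
    diffs = [abs(g - n) for g, n in zip(gpu, npu)]
    # Index of the first step whose gap exceeds the tolerance, or None.
    first_bad = next((i for i, d in enumerate(diffs) if d > atol), None)
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": mean(diffs),
        "first_step_over_atol": first_bad,
    }


# Made-up values for illustration only, not measured results.
gpu_curve = [0.50, 0.52, 0.55, 0.60]
npu_curve = [0.50, 0.52, 0.50, 0.48]
summary = compare_reward_curves(gpu_curve, npu_curve)
print(summary)
```

Reporting the first step at which the curves diverge, along with the maximum and mean gap, may help separate a steady precision drift from a one-off divergence.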
Environment
sglang: v0.5.10
slime: v0.2.2
GPU: H100; NPU: A3