[Bug] Qwen3 30B VL precision mismatch between NPU and GPU #23222

### Checklist

- [x] I searched related issues but found no solution.
- [x] The bug persists in the latest version.
- [x] Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- [x] If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- [x] Please use English. Otherwise, it will be closed.

### Describe the bug

I ran the Qwen3 30B VL model on SGLang v0.5.10 under slime's debug-rollout-only mode and found that the raw_reward curves on NPU and GPU do not match.

Can you align the precision between NPU and GPU?

### Reproduction

Run the Qwen3 30B VL model on SGLang v0.5.10 under slime's debug-rollout-only mode, once on NPU and once on GPU.

Then compare the raw_reward curves from the two runs.
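
To make the comparison concrete, here is a minimal sketch of how the two curves could be compared step by step. It assumes the per-step raw_reward values from each run have already been exported into plain Python lists. The function name `compare_reward_curves`, the `atol` tolerance, and the sample values are hypothetical and are not part of slime or SGLang.

```python
from statistics import mean


def compare_reward_curves(npu, gpu, atol=1e-2):
    """Summarize the gap between two per-step raw_reward series."""
    if not npu or len(npu) != len(gpu):
        raise ValueError("curves must be non-empty and cover the same steps")
    diffs = [abs(a - b) for a, b in zip(npu, gpu)]
    # First step at which the gap exceeds the tolerance, or None if it never does.
    first_bad = next((i for i, d in enumerate(diffs) if d > atol), None)
    return {
        "max_abs_diff": max(diffs),
        "mean_abs_diff": mean(diffs),
        "first_step_over_atol": first_bad,
    }


# Placeholder values for illustration only; not measurements from this issue.
npu_curve = [0.5, 0.5, 0.75]
gpu_curve = [0.5, 0.25, 0.75]
print(compare_reward_curves(npu_curve, gpu_curve))
```

Reporting the first step that exceeds the tolerance may help separate a gap that is present from the start from one that builds up over rollout steps.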

### Environment

SGLang: v0.5.10
slime: v0.2.2
GPU: H100; NPU: A3
