Motivation
We have already implemented initial support for EAGLE speculative decoding with the overlap scheduler. This issue is the roadmap for further feature support and optimizations. The initial skeleton code is in PR #11398.
The design illustration is here
Note
The arg --enable-beta-spec has been deprecated; please use export SGLANG_ENABLE_SPEC_V2=1 to enable this feature.
page size & topk support
memory allocation
Attention backend support
verify_done.synchronize() an option @hnyls2002
sampling
speculative methods
SpecTpWorker for all speculative decoding backends @hnyls2002
SpecTpWorker compatible with all TpModelWorker features.
DP attention support
EP support
PD disaggregation
LoRA Support
Aggressive Optimizations
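For anyone trying the feature while the items above are in progress, here is a minimal sketch of the switch described in the note: set SGLANG_ENABLE_SPEC_V2=1 in the environment instead of passing --enable-beta-spec. The helper name build_spec_v2_launch and the model paths are hypothetical placeholders. The launch flags other than the environment variable are not taken from this issue; they are assumed from SGLang's usual server arguments and may need adjusting for your version.

```python
import os
import sys


def build_spec_v2_launch(model_path, draft_model_path, base_env=None):
    """Return (cmd, env) for launching an SGLang server with spec v2 enabled.

    Hypothetical helper for illustration; it only builds the command and
    environment, it does not start anything.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Replaces the deprecated --enable-beta-spec argument (see the note above).
    env["SGLANG_ENABLE_SPEC_V2"] = "1"
    cmd = [
        sys.executable, "-m", "sglang.launch_server",
        "--model-path", model_path,
        # Assumed flags: standard SGLang speculative decoding arguments.
        "--speculative-algorithm", "EAGLE",
        "--speculative-draft-model-path", draft_model_path,
    ]
    return cmd, env


# Placeholder paths; substitute a real target model and EAGLE draft model.
cmd, env = build_spec_v2_launch("<target-model>", "<eagle-draft-model>")
# To actually start the server: subprocess.run(cmd, env=env, check=True)
```

Building the command separately from running it keeps the environment change scoped to the child process rather than the current shell.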
Related resources
No response