My MTP post showed multi-token prediction roughly doubling Qwen3.6-27B's generation speed on a 3090. A reader asked the question I'd skipped: what about prompt processing at long context?
Source: [Dev.to](https://dev.to/sysoft/the-prefill-wall-why-mtps-2x-barely-moves-long-context-latency-qwen36-27b-rtx-3090-185i)