Loouis

LLM Inference Engine Performance Comparison: vLLM, SGLang, and LMDeploy Throughput Tests

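A quick comparison of the throughput of three LLM inference engines, measured in output tokens/s in a short-input, long-output scenario. The remaining parameters are listed after the table.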
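The post does not say which benchmarking tool was used, so the following is only a minimal sketch of how output tokens/s at a given concurrency is commonly computed: total output tokens across all concurrent requests divided by wall-clock time. `fake_generate` is a hypothetical stand-in for a real request to an inference server, and `measure_throughput` and its parameters are illustrative names, not taken from the original.

```python
import asyncio
import time


async def fake_generate(prompt: str, max_tokens: int) -> int:
    """Hypothetical stand-in for one request to an inference server.

    Returns the number of output tokens produced. A real client would
    call the server here and count the tokens in the response.
    """
    await asyncio.sleep(0.01)  # pretend the server is decoding
    return max_tokens


async def measure_throughput(concurrency: int, max_tokens: int = 256) -> float:
    """Send `concurrency` requests at once and return output tokens per second."""
    start = time.perf_counter()
    counts = await asyncio.gather(
        *(fake_generate("Hi", max_tokens) for _ in range(concurrency))
    )
    elapsed = time.perf_counter() - start
    # Aggregate throughput: all output tokens divided by total wall time.
    return sum(counts) / elapsed


if __name__ == "__main__":
    for c in (1, 2, 4, 8):
        print(c, round(asyncio.run(measure_throughput(c)), 2))
```

A short prompt with a large `max_tokens` mirrors the short-input, long-output scenario described above. The numbers printed by this sketch reflect only the simulated delay, not any real engine.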

vLLM | SGLang | LMDeploy

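Throughput in output tokens/s:

| Concurrency | vLLM 0.6.1.post2 | vLLM 0.6.3.post1 | LMDeploy 0.6.0a0 | LMDeploy 0.6.2 | SGLang 0.3.4.post2 | SGLang 0.3.4.post2 (--disable-cuda-graph) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 28.73 | 28.76 | 56.19 | 57.24 | 37.23 | 29.96 |
| 2 | 71.53 | 73.26 | 113.12 | 113.48 | 73.59 | 58.28 |
| 4 | 133.38 | 136.05 | 205.51 | 199.01 | 136.73 | 111.24 |
| 8 | 246.14 | 251.59 | 398.73 | 393.48 | 258.21 | 215.53 |
| 16 | 394.25 | 401.67 | 704.69 | 709.27 | 461.89 | 444.48 |
| 32 | 480.26 | 481.75 | 967.34 | 973.24 | 562.36 | 557.93 |
| 64 | 520.11 | 526.01 | 1119.22 | 1123.07 | 594.03 | 602.36 |
| 128 | 479.02 | 481.63 | 989.14 | 890.44 | 534.69 | 582.97 |
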
  • Test model: Qwen2.5-14B-Instruct-AWQ
  • Hardware: E5-2680 v4 + 2080 Ti 22G × 1
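To make the table easier to read at a glance, the figures can be reduced to a peak throughput per engine and a relative speedup against a baseline. The sketch below copies the output tokens/s values for the newest version of each engine from the table; the helper names `peak` and `speedup` are illustrative and not part of the original post.

```python
# Output tokens/s copied from the table above, one value per concurrency level.
CONCURRENCY = [1, 2, 4, 8, 16, 32, 64, 128]
THROUGHPUT = {
    "vLLM 0.6.3.post1": [28.76, 73.26, 136.05, 251.59, 401.67, 481.75, 526.01, 481.63],
    "LMDeploy 0.6.2": [57.24, 113.48, 199.01, 393.48, 709.27, 973.24, 1123.07, 890.44],
    "SGLang 0.3.4.post2": [37.23, 73.59, 136.73, 258.21, 461.89, 562.36, 594.03, 534.69],
}


def peak(engine: str) -> tuple[int, float]:
    """Return (concurrency, tokens/s) at which the engine's throughput is highest."""
    values = THROUGHPUT[engine]
    best = max(values)
    return CONCURRENCY[values.index(best)], best


def speedup(engine: str, baseline: str) -> list[float]:
    """Return the engine/baseline throughput ratio at each concurrency level."""
    return [
        round(a / b, 2)
        for a, b in zip(THROUGHPUT[engine], THROUGHPUT[baseline])
    ]


if __name__ == "__main__":
    for name in THROUGHPUT:
        print(name, "peaks at concurrency", *peak(name))
    print(speedup("LMDeploy 0.6.2", "vLLM 0.6.3.post1"))
```

With these numbers, all three engines peak at concurrency 64 and drop off at 128, and LMDeploy 0.6.2 delivers roughly 1.5x to 2.1x the throughput of vLLM 0.6.3.post1 across the tested range.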

[Image: Pasted image 20241123103935]

[Image: Pasted image 20241123103943]


Do not reproduce without authorization.
