Loouis

LLM Inference Engine Performance Comparison: VLLM, SGLang, and LMDeploy Throughput Tests

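A quick comparison of the throughput of three LLM inference engines, measured in output tokens/s, in a short-input, long-output scenario. The remaining parameters are listed after the table.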
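The post does not include its benchmark script, so the following is only a rough sketch of how an aggregate output tokens/s figure is commonly computed for a batch of concurrent requests: total generated tokens divided by the wall-clock span of the batch. `RequestRecord` and `output_tokens_per_second` are hypothetical names introduced for illustration, not part of any of the three engines.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class RequestRecord:
    """Timing for one completed request (hypothetical helper type)."""
    start: float        # time the request was sent, in seconds
    end: float          # time the last token arrived, in seconds
    output_tokens: int  # number of generated tokens


def output_tokens_per_second(records: Iterable[RequestRecord]) -> float:
    """Aggregate throughput: all output tokens over the batch's wall-clock span."""
    records = list(records)
    if not records:
        raise ValueError("no requests recorded")
    wall_time = max(r.end for r in records) - min(r.start for r in records)
    if wall_time <= 0:
        raise ValueError("wall-clock span must be positive")
    return sum(r.output_tokens for r in records) / wall_time


if __name__ == "__main__":
    # Two requests running concurrently, each producing 512 tokens in 8 s.
    batch = [RequestRecord(0.0, 8.0, 512), RequestRecord(0.0, 8.0, 512)]
    print(output_tokens_per_second(batch))
```

Under this definition, throughput counts only generated tokens, which suits a short-input, long-output workload where prompt processing is a small share of the total time.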

VLLM | SGLang | LMDeploy

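| Concurrency | VLLM 0.6.1.post2 | VLLM 0.6.3.post1 | LMDeploy 0.6.0a0 | LMDeploy 0.6.2 | SGLang 0.3.4.post2 | SGLang 0.3.4.post2 (--disable-cuda-graph) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | 28.73 | 28.76 | 56.19 | 57.24 | 37.23 | 29.96 |
| 2 | 71.53 | 73.26 | 113.12 | 113.48 | 73.59 | 58.28 |
| 4 | 133.38 | 136.05 | 205.51 | 199.01 | 136.73 | 111.24 |
| 8 | 246.14 | 251.59 | 398.73 | 393.48 | 258.21 | 215.53 |
| 16 | 394.25 | 401.67 | 704.69 | 709.27 | 461.89 | 444.48 |
| 32 | 480.26 | 481.75 | 967.34 | 973.24 | 562.36 | 557.93 |
| 64 | 520.11 | 526.01 | 1119.22 | 1123.07 | 594.03 | 602.36 |
| 128 | 479.02 | 481.63 | 989.14 | 890.44 | 534.69 | 582.97 |
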
  • Test model: Qwen2.5-14B-Instruct-AWQ
  • Hardware: E5 2680v4 + 2080ti 22G * 1
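
To make the table easier to read at a glance, here is a small sketch that finds each engine's peak throughput and expresses it relative to a baseline. The numbers are copied from the table (newest version of each engine, CUDA graphs enabled for SGLang). The helper names `peak` and `summarize` are illustrative only.

```python
# Output tokens/s copied from the results table, indexed by concurrency level.
CONCURRENCY = [1, 2, 4, 8, 16, 32, 64, 128]
RESULTS = {
    "VLLM 0.6.3.post1": [28.76, 73.26, 136.05, 251.59, 401.67, 481.75, 526.01, 481.63],
    "LMDeploy 0.6.2": [57.24, 113.48, 199.01, 393.48, 709.27, 973.24, 1123.07, 890.44],
    "SGLang 0.3.4.post2": [37.23, 73.59, 136.73, 258.21, 461.89, 562.36, 594.03, 534.69],
}


def peak(series):
    """Return (concurrency, throughput) at the highest measured throughput."""
    best = max(range(len(series)), key=lambda i: series[i])
    return CONCURRENCY[best], series[best]


def summarize(results, baseline):
    """Peak throughput per engine, plus its ratio to the baseline engine's peak."""
    _, base_tps = peak(results[baseline])
    summary = {}
    for name, series in results.items():
        conc, tps = peak(series)
        summary[name] = {
            "peak_concurrency": conc,
            "peak_tps": tps,
            "vs_baseline": round(tps / base_tps, 2),
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(RESULTS, "VLLM 0.6.3.post1").items():
        print(f"{name}: peak {row['peak_tps']} tok/s at concurrency "
              f"{row['peak_concurrency']} ({row['vs_baseline']}x baseline)")
```

On these figures, all three engines peak at a concurrency of 64 and fall off at 128, with LMDeploy's peak a little over twice that of VLLM on this hardware.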


Do not repost without permission.
