I bought a 1C1G AMD Ryzen 9 7950X VPS on Black Friday, a machine that can just barely run an LLM. Here is a record of how to quickly install and run one on such a VPS.
Hardware Configuration#
--------------------- Basic Information (thanks to all open-source projects) ---------------------
CPU Model : AMD Ryzen 9 7950X 16-Core Processor
CPU Core Count : 1
CPU Frequency : 4491.540 MHz
CPU Cache : L1: 64.00 KB / L2: 512.00 KB / L3: 16.00 MB
AES-NI Instruction Set : ✔ Enabled
VM-x/AMD-V Support : ✔ Enabled
Memory : 90.74 MiB / 960.70 MiB
Swap : 0 KiB / 2.00 MiB
Disk Space : 1.12 GiB / 14.66 GiB
---------------------- CPU Test (via sysbench) ----------------------
-> CPU Test in Progress (Fast Mode, 1-Pass @ 5sec)
1 Thread Test (Single Core) Score: 6402
--------------------- Memory Test (thanks to LemonBench) ---------------------
-> Memory Test (Fast Mode, 1-Pass @ 5sec)
Single Thread Read Test: 75694.60 MB/s
Single Thread Write Test: 42458.49 MB/s
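Single-core token generation is usually memory-bandwidth-bound: every generated token has to stream the full set of model weights through the CPU. A back-of-envelope sketch of what the read bandwidth above implies (the ~398 MB model size is an assumption based on the Q4 model chosen below):

```python
# Rough upper bound on decode speed for a memory-bandwidth-bound CPU:
# each generated token streams all model weights from RAM once.
read_bandwidth_mb_s = 75694.60   # single-thread read speed measured above
model_size_mb = 398.0            # assumed size of the Q4 qwen2.5:0.5b weights

max_tokens_per_s = read_bandwidth_mb_s / model_size_mb
print(f"theoretical ceiling: ~{max_tokens_per_s:.0f} tokens/s")
```

The ~31 tokens/s measured later in this post is well under this ceiling, which is expected: compute cost, cache behavior, and the KV cache all add overhead on top of the raw weight traffic.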
Software Configuration#
- Inference engine: since this is pure CPU inference, we use Ollama.
- Model: the Qwen2.5-0.5B model in its Q4 quantization, which is under 400 MB and fits within 1 GB of memory.
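A quick sanity check on the model choice: at 4-bit quantization, each of the 0.5B parameters costs roughly half a byte, plus some runtime overhead. A minimal estimate (the bits-per-weight and overhead figures are rough assumptions, not Ollama internals):

```python
# Rough RAM estimate for a Q4-quantized 0.5B-parameter model.
params = 0.5e9          # qwen2.5:0.5b parameter count
bits_per_weight = 4.5   # Q4_K-style quants average slightly over 4 bits/weight
weights_mb = params * bits_per_weight / 8 / 1024**2

overhead_mb = 200       # assumed: KV cache + runtime buffers
total_mb = weights_mb + overhead_mb
print(f"weights ~= {weights_mb:.0f} MB, total ~= {total_mb:.0f} MB")
```

Even with generous overhead, the total stays comfortably below the ~960 MiB this VPS has.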
Install and Run the Model#
curl -fsSL https://ollama.com/install.sh | sh
ollama run qwen2.5:0.5b
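Besides the interactive prompt, Ollama also serves a local REST API on port 11434. A minimal stdlib-only sketch of a non-streaming request to its /api/generate endpoint (actually sending it requires the server started above to be running):

```python
import json
from urllib import request

def build_generate_request(model: str, prompt: str) -> request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return request.Request(
        "http://127.0.0.1:11434/api/generate",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_generate_request("qwen2.5:0.5b", "hello, who are you?")
# resp = json.load(request.urlopen(req))  # uncomment with the server running
```

With `stream` set to `False`, the server returns a single JSON object whose `response` field holds the full completion.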
Engage in Conversation#
>>> hello, who are you?
I am Qwen, an AI language model developed by Alibaba Cloud. I was trained using millions of natural language processing (NLP) examples from the internet and my responses are generated through advanced neural network algorithms. My primary goal is to assist with tasks such as text generation, summarization, answering questions, and more. If you have any questions or need further clarification on a topic, feel free to ask!
To exit the conversation, type /bye.
>>> /bye
Performance Testing#
- Download the test script
wget https://github.com/Yoosu-L/llmapibenchmark/releases/download/v1.0.1/llmapibenchmark_linux_amd64
- Make the script executable
chmod +x ./llmapibenchmark_linux_amd64
- Run the performance test
./llmapibenchmark_linux_amd64 -base_url="http://127.0.0.1:11434/v1" -concurrency=1,2,4 # -concurrency is optional
Output Example#
################################################################################################################
LLM API Throughput Benchmark
https://github.com/Yoosu-L/llmapibenchmark
Time:2024-12-03 03:11:48 UTC+0
################################################################################################################
Input Tokens: 45
Output Tokens: 512
Test Model: qwen2.5:0.5b
Latency: 0.00 ms
| Concurrency | Generation Throughput (tokens/s) | Prompt Throughput (tokens/s) | Min TTFT (s) | Max TTFT (s) |
|-------------|----------------------------------|-------------------------------|--------------|--------------|
| 1 | 31.88 | 976.60 | 0.05 | 0.05 |
| 2 | 30.57 | 565.40 | 0.07 | 0.16 |
| 4 | 31.00 | 717.96 | 0.11 | 0.25 |
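A quick read of the table: generation throughput stays flat at roughly 31 tokens/s regardless of concurrency, which suggests the tool reports the aggregate across all streams (that interpretation is my assumption). On a single core, each concurrent request then simply gets an equal slice:

```python
# Aggregate generation throughput is roughly constant on one core,
# so each concurrent request receives an equal share of it.
aggregate_tps = {1: 31.88, 2: 30.57, 4: 31.00}  # from the table above

for concurrency, tps in aggregate_tps.items():
    per_request = tps / concurrency
    print(f"concurrency={concurrency}: ~{per_request:.2f} tokens/s per request")
```

So at concurrency 4, each chat crawls along at under 8 tokens/s; adding clients does not add capacity on a single core.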
Uninstall#
# Stop the Ollama service:
sudo systemctl stop ollama
# Disable the Ollama service:
sudo systemctl disable ollama
# Remove the Ollama service file:
sudo rm /etc/systemd/system/ollama.service
# Remove the Ollama binary (find its location first with `which ollama`):
sudo rm $(which ollama)
# Remove downloaded models and the Ollama service user/group:
sudo rm -r /usr/share/ollama
sudo userdel ollama
sudo groupdel ollama
Disclaimer#
This tutorial is for entertainment purposes only. A 0.5B LLM can hardly meet production requirements, and inference puts heavy load on CPU and memory bandwidth, which may degrade the experience of neighbors on the same host and could even get your VPS suspended.