Running large language models on a VPS

I bought a 1C1G (one vCPU, 1 GB RAM) VPS on an AMD Ryzen 9 7950X host on Black Friday, which can barely run an LLM. Here's a record of how to quickly install and run an LLM on such a VPS.

Hardware Configuration#

---------------------Basic Information Query--Thanks to all open-source projects---------------------
 CPU Model          : AMD Ryzen 9 7950X 16-Core Processor
 CPU Core Count     : 1
 CPU Frequency      : 4491.540 MHz
 CPU Cache          : L1: 64.00 KB / L2: 512.00 KB / L3: 16.00 MB
 AES-NI Instruction Set : ✔ Enabled
 VM-x/AMD-V Support : ✔ Enabled
 Memory             : 90.74 MiB / 960.70 MiB
 Swap               : 0 KiB / 2.00 MiB
 Disk Space         : 1.12 GiB / 14.66 GiB
----------------------CPU Test--Passed sysbench test-------------------------
 -> CPU Test in Progress (Fast Mode, 1-Pass @ 5sec)
 1 Thread Test (Single Core) Score:          6402 Scores
---------------------Memory Test--Thanks to lemonbench open-source-----------------------
 -> Memory Test (Fast Mode, 1-Pass @ 5sec)
 Single Thread Read Test:          75694.60 MB/s
 Single Thread Write Test:          42458.49 MB/s
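
The readout above comes from a LemonBench-style benchmarking script. If you only need the basic facts rather than the scores, standard Linux tools will do; this is a plain-Linux sketch, nothing Ollama-specific:

lscpu | grep -E 'Model name|^CPU\(s\)'  # CPU model and vCPU count
free -h                                 # memory and swap
df -h /                                 # disk space on the root filesystem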

Software Configuration#

  1. Select Inference Engine: Since inference runs on the CPU alone, we use Ollama as the inference engine.
  2. Select Model: The Q4-quantized Qwen2.5-0.5B, which is under 400 MB and fits in 1 GB of RAM (see the swap note right after this list).
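
One caveat before loading the model: the hardware readout shows almost no swap (2 MiB), so a memory spike can get the Ollama process OOM-killed on a 1 GB box. Adding a small swap file first is a standard Linux safety net; the 1 GB size below is my assumption, not part of the original setup:

sudo fallocate -l 1G /swapfile   # reserve 1 GB on disk
sudo chmod 600 /swapfile         # restrict access to root
sudo mkswap /swapfile            # format it as swap
sudo swapon /swapfile            # enable it immediately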

Install and Run the Model#

curl -fsSL https://ollama.com/install.sh | sh
ollama run qwen2.5:0.5b
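
The second command drops you into an interactive chat. For a one-shot, non-interactive reply (handy in scripts), ollama run also accepts the prompt as an argument; the prompt here is just an example:

ollama run qwen2.5:0.5b "Explain in one sentence what a 1C1G VPS is."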

Engage in Conversation#

>>> hello, who are you?
I am Qwen, an AI language model developed by Alibaba Cloud. I was trained using millions of natural language processing (NLP) examples from the internet and my responses are generated through advanced neural network algorithms. My primary goal is to assist with tasks such as text generation, summarization, answering questions, and more. If you have any questions or need further clarification on a topic, feel free to ask!

To exit the conversation, please type /bye.

>>> /bye
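
Beyond the interactive chat, Ollama exposes an OpenAI-compatible HTTP API on port 11434, which is what the benchmark below points its -base_url at. A minimal request against a default local install looks roughly like this:

curl http://127.0.0.1:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5:0.5b",
    "messages": [{"role": "user", "content": "hello, who are you?"}]
  }'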

Performance Testing#

  1. Download Test Script

    wget https://github.com/Yoosu-L/llmapibenchmark/releases/download/v1.0.1/llmapibenchmark_linux_amd64
    
  2. Set Script Permissions

    chmod +x ./llmapibenchmark_linux_amd64
    
  3. Run Performance Test

    ./llmapibenchmark_linux_amd64 -base_url="http://127.0.0.1:11434/v1" -concurrency=1,2,4 #optional
    
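Before benchmarking, it's worth confirming the API endpoint is actually up. A quick sanity check (the /v1/models route assumes a reasonably recent Ollama with OpenAI compatibility):

curl http://127.0.0.1:11434/api/tags   # native endpoint: lists pulled models
curl http://127.0.0.1:11434/v1/models  # OpenAI-compatible model listing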

Output Example#

################################################################################################################
                                          LLM API Throughput Benchmark
                                    https://github.com/Yoosu-L/llmapibenchmark
                                         Time:2024-12-03 03:11:48 UTC+0
################################################################################################################
Input Tokens: 45
Output Tokens: 512
Test Model: qwen2.5:0.5b
Latency: 0.00 ms

| Concurrency | Generation Throughput (tokens/s) |  Prompt Throughput (tokens/s) | Min TTFT (s) | Max TTFT (s) |
|-------------|----------------------------------|-------------------------------|--------------|--------------|
|           1 |                            31.88 |                        976.60 |         0.05 |         0.05 |
|           2 |                            30.57 |                        565.40 |         0.07 |         0.16 |
|           4 |                            31.00 |                        717.96 |         0.11 |         0.25 |
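
To put these numbers in perspective: at roughly 31 tokens/s of generation throughput, the 512-token response above takes about 512 / 31 ≈ 16 seconds end to end, and throughput stays flat from 1 to 4 concurrent requests, presumably because the single vCPU is already saturated by one stream.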

Uninstall#

# Stop Ollama service:
sudo systemctl stop ollama

# Disable Ollama service:
sudo systemctl disable ollama

# Remove Ollama service file:
sudo rm /etc/systemd/system/ollama.service

# Remove Ollama binary file:
sudo rm /usr/local/bin/ollama
# sudo rm /usr/bin/ollama
# sudo rm /bin/ollama
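
The steps above follow Ollama's documented Linux uninstall. If the install script also created a dedicated service user and a model directory (the default on systemd distros), remove those as well; paths may differ on your system:

# Remove downloaded models and the Ollama service user/group:
sudo rm -r /usr/share/ollama
sudo userdel ollama
sudo groupdel ollama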

Disclaimer#

This tutorial is for entertainment purposes only. A 0.5B LLM can hardly meet production requirements, and inference eats significant CPU and memory bandwidth, degrading the experience of neighboring tenants (VPS instances) on the host.
