
DeepSeek Environment Deployment

1. DeepSeek Overview

DeepSeek is a rapidly emerging startup that recently gained widespread attention with the launch of its DeepSeek-V3 large language model. After several rounds of technical iteration and optimization, DeepSeek-V3's performance has reached a level comparable to the OpenAI o1 model, and even surpasses it in certain respects. Most notably, the DeepSeek-R1 model has been fully open-sourced and is available for free use.

2. DeepSeek Deployment

There are two ways to deploy DeepSeek on the Luckfox Omni3576 running Debian 12: with the Ollama tool, or with Rockchip's official RKLLM quantized deployment. The following sections introduce both methods.

Name                                                                        | Download Link
Ollama Software Package (linux-arm64)                                       | Google Drive Download
DeepSeek Sample Program                                                     | Google Drive Download
RKLLM Model                                                                 | Google Drive Download
Cross-compilation Tool: gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu  | Google Drive Download

2.1 Deploying with the Ollama Tool

Ollama is an open-source framework for running large language models (LLMs) locally. It is designed to let users easily deploy and run LLMs on their own machines, and it supports the latest DeepSeek models.

  1. Download the Linux-arm64 version of the Ollama software package.

    curl -L https://ollama.com/download/ollama-linux-arm64.tgz -o ollama-linux-arm64.tgz
  2. Extract the file to the /usr directory.

    sudo tar -C /usr -xzf ollama-linux-arm64.tgz
  3. Create the ollama user and group, and add the current user to the ollama group.

    sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
    sudo usermod -a -G ollama $(whoami)
  4. Create the Ollama systemd service file and start the service. (A quick check that the service is running is shown after this list.)

    # Create the service file with the following contents
    sudo vim /etc/systemd/system/ollama.service
    [Unit]
    Description=Ollama Service
    After=network-online.target

    [Service]
    ExecStart=/usr/bin/ollama serve
    User=ollama
    Group=ollama
    Restart=always
    RestartSec=3
    Environment="PATH=$PATH"

    [Install]
    WantedBy=default.target
    # Reload systemd configuration and enable the Ollama service
    sudo systemctl daemon-reload
    sudo systemctl enable ollama
    sudo systemctl start ollama

    # After successful installation, the version will be displayed
    luckfox@luckfox:~$ ollama -v
    ollama version is 0.5.11
  5. Run Ollama to execute the DeepSeek R1 1.5B model.

    ollama run deepseek-r1:1.5b
  6. On the first run, the model files will be downloaded from the Ollama website; afterwards you can chat with the model in the terminal, or query it over Ollama's HTTP API as sketched below.
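If Ollama was installed as a systemd service (step 4), you can confirm that the service is running before pulling any models. A minimal check using standard systemd commands:

    # Check that the Ollama service is active
    sudo systemctl status ollama

    # Follow the service log if startup fails
    sudo journalctl -u ollama -f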
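Besides the interactive CLI, Ollama listens on port 11434 by default and exposes an HTTP API. A minimal sketch of querying the model over that API (the prompt text is only an example):

    # Send a single prompt to the locally running model via Ollama's REST API
    curl http://localhost:11434/api/generate -d '{
      "model": "deepseek-r1:1.5b",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'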

2.2 Deploying with RKLLM Quantization (PC Ubuntu 22.04)

The RKLLM-Toolkit is a development suite that lets users quantize and convert large language models on a PC. Like the previously introduced RKNN-Toolkit2, it provides a Python interface that simplifies model deployment and execution. To run a model on the RKNPU, the trained model must first be converted to the RKLLM format with RKLLM-Toolkit on the PC, then deployed on the development board through the RKLLM Runtime C API. Model training and conversion are left to the user; for conversion, refer to rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/Readme.md in the rknn-llm repository. This section focuses on using the RKLLM model provided by Rockchip.
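For reference, a minimal sketch of the PC-side conversion workflow, assuming the repository has already been cloned (step 1 below). The wheel filename comes from the repository's rkllm-toolkit directory; the export script name and location are assumptions based on the demo's Readme:

    # Install the RKLLM-Toolkit wheel matching your Python version (filename from rkllm-toolkit/)
    pip3 install rknn-llm/rkllm-toolkit/rkllm_toolkit-x.x.x-cp310-cp310-linux_x86_64.whl

    # Run the demo's export script to quantize and convert the model to .rkllm format
    # (script name assumed; follow the demo's Readme for the exact steps)
    cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export
    python3 export_rkllm.py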

  1. Clone the rknn-llm repository.

    git clone https://github.com/airockchip/rknn-llm.git --depth 1
  2. After cloning, check the directory structure.

    doc
    ├── Rockchip_RKLLM_SDK_CN.pdf            # RKLLM SDK Chinese Documentation
    └── Rockchip_RKLLM_SDK_EN.pdf            # RKLLM SDK English Documentation
    examples
    ├── DeepSeek-R1-Distill-Qwen-1.5B_Demo   # Board-side API inference demo
    ├── Qwen2-VL-2B_Demo                     # Multimodal inference demo
    └── rkllm_server_demo                    # RKLLM-Server deployment demo
    rkllm-runtime
    └── runtime
        ├── Android
        │   └── librkllm_api
        │       ├── arm64-v8a
        │       │   └── librkllmrt.so        # RKLLM Runtime library
        │       └── include
        │           └── rkllm.h              # Runtime header file
        └── Linux
            └── librkllm_api
                ├── aarch64
                │   └── librkllmrt.so        # RKLLM Runtime library
                └── include
                    └── rkllm.h              # Runtime header file
    rkllm-toolkit
    ├── rkllm_toolkit-x.x.x-cp38-cp38-linux_x86_64.whl
    └── rkllm_toolkit-x.x.x-cp310-cp310-linux_x86_64.whl
    rknpu-driver
    └── rknpu_driver_x.x.x_xxxxxxx.tar.bz2
    scripts
    ├── fix_freq_rk3576.sh                   # RK3576 fixed-frequency script
    └── fix_freq_rk3588.sh                   # RK3588 fixed-frequency script
  3. Go to the example directory.

    cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy
  4. Configure the cross-compiler path by modifying the build-linux.sh file.

    GCC_COMPILER_PATH=~/opts/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
    Change it to:
    GCC_COMPILER_PATH=<sdk path>/prebuilts/gcc/linux-x86/aarch64/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
    • Note: Cross-compilers are typically backward compatible but not forward compatible. It is recommended to use version 10.2 or later. You can download the cross-compiler from the official site or use the version provided in the SDK.
  5. Run the build-linux.sh script to cross-compile the example program.

    ./build-linux.sh
  6. After compilation, an install folder will be generated in the deploy directory, containing the compiled executable and RKLLM runtime library.

    install/
    └── demo_Linux_aarch64
        ├── lib
        │   └── librkllmrt.so
        └── llm_demo
  7. Transfer the generated demo_Linux_aarch64 folder to the development board. (The .rkllm model file must also be copied to the board; see the sketch after this list.)

    scp -r install/demo_Linux_aarch64 luckfox@192.168.9.185:/home/luckfox
  8. Run the executable on the development board. (For more stable performance numbers, see the fixed-frequency note after this list.)

    cd /home/luckfox/demo_Linux_aarch64/

    # Set up the dependency library environment
    export LD_LIBRARY_PATH=./lib

    # View board-side inference performance:
    export RKLLM_LOG_LEVEL=1

    # Run the Executable
    # Usage: ./llm_demo <model_path> <max_new_tokens> <max_context_len>
    ./llm_demo DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm 2048 4096
  9. After the program starts, output like that shown in the image below will appear, and you can begin asking the model questions:
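Note that llm_demo expects the .rkllm model file to be present on the board (step 8 passes it as the first argument), while step 7 only copies the demo_Linux_aarch64 folder. A minimal sketch of copying the downloaded RKLLM model into the same folder, assuming the board address and paths used above:

    # Copy the RKLLM model into the demo folder on the board
    scp DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm luckfox@192.168.9.185:/home/luckfox/demo_Linux_aarch64/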
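When measuring inference performance with RKLLM_LOG_LEVEL=1, the fixed-frequency script from the repository's scripts directory (see the tree in step 2) can be run on the board first to make the numbers more repeatable. A minimal sketch; the destination path is only an example:

    # On the PC: copy the RK3576 fixed-frequency script to the board
    scp rknn-llm/scripts/fix_freq_rk3576.sh luckfox@192.168.9.185:/home/luckfox/

    # On the board: lock CPU/NPU frequencies before running llm_demo
    sudo bash /home/luckfox/fix_freq_rk3576.sh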