DeepSeek Environment Deployment
1. DeepSeek Overview
DeepSeek is a rapidly emerging AI startup that gained widespread attention with the launch of its DeepSeek-V3 large language model. After several rounds of technical iteration and optimization, its follow-up reasoning model DeepSeek-R1 has reached performance comparable to OpenAI's o1 model, and even surpasses it in certain aspects. Most notably, DeepSeek-R1 has been fully open-sourced and is free to use.
2. DeepSeek Deployment
There are two ways to deploy DeepSeek on the Luckfox Omni3576 running Debian 12: using the Ollama tool, or using Rockchip's official RKLLM quantization and deployment flow. The following sections introduce both methods; the table below lists the required downloads.
| Name | Download Link |
|---|---|
| Ollama Software Package (linux-arm64) | Google Drive Download |
| DeepSeek Sample Program | Google Drive Download |
| RKLLM Model | Google Drive Download |
| Cross-compilation Tool: gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu | Google Drive Download |
2.1 Deploying with the Ollama Tool
Ollama is an open-source framework for running large language models (LLMs) locally. It is designed to let users easily deploy and manage models on their own machines, and it supports the latest DeepSeek models.
Download the Linux arm64 version of the Ollama software package.

```
curl -L https://ollama.com/download/ollama-linux-arm64.tgz -o ollama-linux-arm64.tgz
```

Extract the archive to the `/usr` directory.

```
sudo tar -C /usr -xzf ollama-linux-arm64.tgz
```

Create the `ollama` user and group, and add the current user to the `ollama` group.

```
sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
sudo usermod -a -G ollama $(whoami)
```

Create a systemd service file for Ollama.

```
sudo vim /etc/systemd/system/ollama.service
```

Paste in the following service definition:

```
[Unit]
Description=Ollama Service
After=network-online.target

[Service]
ExecStart=/usr/bin/ollama serve
User=ollama
Group=ollama
Restart=always
RestartSec=3
Environment="PATH=$PATH"

[Install]
WantedBy=default.target
```

```
# Reload systemd configuration and enable the Ollama service
sudo systemctl daemon-reload
sudo systemctl enable ollama
sudo systemctl start ollama
# After successful installation, the version will be displayed
luckfox@luckfox:~$ ollama -v
ollama version is 0.5.11
```

Run Ollama to launch the DeepSeek-R1 1.5B model.
```
ollama run deepseek-r1:1.5b
```

On the first run, the model files will be downloaded from the Ollama website.


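Besides the interactive command line, Ollama also exposes a local REST API (by default on port 11434), which is handy for scripted access. Below is a minimal sketch in Python, assuming the `ollama` service is running on the board and `deepseek-r1:1.5b` has already been pulled; the prompt text is just an example.

```python
import json
import urllib.request

# Send one prompt to the local Ollama server. "stream": False makes
# Ollama return the whole answer as a single JSON object instead of
# a stream of partial responses.
payload = {
    "model": "deepseek-r1:1.5b",
    "prompt": "Why is the sky blue?",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

print(result["response"])
```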
2.2 Deploying with RKLLM Quantization (PC running Ubuntu 22.04)
The RKLLM-Toolkit is a development suite that helps users quantize and convert large language models on a PC. Like the previously introduced RKNN-Toolkit2, it simplifies model deployment and execution by providing a Python interface on PC platforms. To run a model on the RKNPU, you must first use RKLLM-Toolkit on the computer to convert the trained model to the RKLLM format, then deploy it on the development board via the RKLLM C API. For the conversion procedure, refer to rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/Readme.md in the rknn-llm repository; a sketch of the flow is shown below. The rest of this section focuses on deploying the pre-converted RKLLM model provided by Rockchip.
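For orientation only, the PC-side conversion follows the pattern below. This is a minimal sketch based on the export script in the repository's DeepSeek demo; the model path is illustrative, and the exact keyword arguments may differ between rkllm_toolkit versions, so consult the Readme.md above for the authoritative version.

```python
from rkllm.api import RKLLM  # installed from the rkllm_toolkit wheel

llm = RKLLM()

# Load the original Hugging Face model from a local directory
# (the path here is a placeholder).
ret = llm.load_huggingface(model="./DeepSeek-R1-Distill-Qwen-1.5B")
assert ret == 0, "model load failed"

# Quantize and convert for the RK3576 NPU; w4a16 matches the
# DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm model used later.
ret = llm.build(do_quantization=True,
                quantized_dtype="w4a16",
                target_platform="rk3576")
assert ret == 0, "build failed"

# Export the converted model for deployment on the board.
ret = llm.export_rkllm("./DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm")
assert ret == 0, "export failed"
```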
Clone the rknn-llm repository.

```
git clone https://github.com/airockchip/rknn-llm.git --depth 1
```

After cloning, check the directory structure.

```
doc
├── Rockchip_RKLLM_SDK_CN.pdf              # RKLLM SDK documentation (Chinese)
└── Rockchip_RKLLM_SDK_EN.pdf              # RKLLM SDK documentation (English)
examples
├── DeepSeek-R1-Distill-Qwen-1.5B_Demo     # Board-side API inference demo
├── Qwen2-VL-2B_Demo                       # Multimodal inference demo
└── rkllm_server_demo                      # RKLLM-Server deployment demo
rkllm-runtime
└── runtime
    ├── Android
    │   └── librkllm_api
    │       ├── arm64-v8a
    │       │   └── librkllmrt.so          # RKLLM Runtime library
    │       └── include
    │           └── rkllm.h                # Runtime header file
    └── Linux
        └── librkllm_api
            ├── aarch64
            │   └── librkllmrt.so          # RKLLM Runtime library
            └── include
                └── rkllm.h                # Runtime header file
rkllm-toolkit
├── rkllm_toolkit-x.x.x-cp38-cp38-linux_x86_64.whl
└── rkllm_toolkit-x.x.x-cp310-cp310-linux_x86_64.whl
rknpu-driver
└── rknpu_driver_x.x.x_xxxxxxx.tar.bz2
scripts
├── fix_freq_rk3576.sh                     # RK3576 fixed-frequency script
└── fix_freq_rk3588.sh                     # RK3588 fixed-frequency script
```

Go to the example directory.
```
cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy
```

Configure the cross-compiler path by modifying the `build-linux.sh` file. Change

```
GCC_COMPILER_PATH=~/opts/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
```

to:

```
GCC_COMPILER_PATH=<sdk path>/prebuilts/gcc/linux-x86/aarch64/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
```

- Note: Cross-compilers are generally backward compatible but not forward compatible, so version 10.2 or later is recommended. You can download the cross-compiler from the official site or use the one provided in the SDK.
Run the `build-linux.sh` script to cross-compile the example program.

```
./build-linux.sh
```

After compilation, an `install` folder will be generated in the `deploy` directory, containing the compiled executable and the RKLLM runtime library.

```
install/
└── demo_Linux_aarch64
    ├── lib
    │   └── librkllmrt.so
    └── llm_demo
```

Transfer the generated `demo_Linux_aarch64` folder to the development board.

```
scp -r ./install/demo_Linux_aarch64 luckfox@192.168.9.185:/home/luckfox
```

Run the executable on the development board.

```
cd /userdata/demo_Linux_aarch64/
# Set up the dependency library environment
export LD_LIBRARY_PATH=./lib
# View board-side inference performance:
export RKLLM_LOG_LEVEL=1
# Run the Executable
# Usage: ./llm_demo <model_path> <max_new_tokens> <max_context_len>
./llm_demo DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm 2048 4096
```

After running, the output will appear as shown in the following image, and you can start asking the model questions:

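For network access instead of the interactive console, the repository's `rkllm_server_demo` (see the directory listing above) can serve the converted model over HTTP. The sketch below is a rough client example: the board IP, port, endpoint path, and payload fields follow the OpenAI-style format used by the Flask demo but are assumptions here, so verify them against the demo's own README before use.

```python
import json
import urllib.request

# Chat request in the OpenAI-style format used by the rkllm_server
# Flask demo. IP, port, endpoint, and field names are assumptions;
# check rkllm_server_demo's README for the exact interface.
payload = {
    "model": "DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm",
    "messages": [{"role": "user", "content": "Introduce yourself briefly."}],
    "stream": False,
}
req = urllib.request.Request(
    "http://192.168.9.185:8080/rkllm_chat",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.loads(resp.read())

print(reply["choices"][-1]["message"]["content"])
```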