
DeepSeek Environment Deployment

1. DeepSeek Overview

DeepSeek is a rapidly emerging startup that recently gained widespread attention with the launch of its DeepSeek-V3 large language model. After several rounds of technical iteration and optimization, DeepSeek-V3's performance has reached a level comparable to the OpenAI o1 model, and even surpasses it in certain respects. Most notably, the DeepSeek-R1 model has been fully open-sourced and is available for free use.

2. DeepSeek Deployment

There are two ways to deploy DeepSeek on the Luckfox Omni3576 running Debian 12: with the Ollama tool, or with Rockchip's official RKLLM quantized deployment. The following sections introduce both methods.

Name                                                                        | Download Link
Ollama Software Package (linux-arm64)                                       | Google Drive Download
DeepSeek Sample Program                                                     | Google Drive Download
RKLLM Model                                                                 | Google Drive Download
Cross-compilation Tool: gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu  | Google Drive Download

2.1 Deploying with the Ollama Tool

Ollama is an open-source framework for running large language models (LLMs) locally. It is designed to let users easily deploy and run LLMs on their own machines, and it supports the latest DeepSeek models.

  1. Download the Linux-arm64 version of the Ollama software package.

    curl -L https://ollama.com/download/ollama-linux-arm64.tgz -o ollama-linux-arm64.tgz
  2. Extract the file to the /usr directory.

    sudo tar -C /usr -xzf ollama-linux-arm64.tgz
  3. Create the ollama user and group, and add the current user to the ollama group.

    sudo useradd -r -s /bin/false -U -m -d /usr/share/ollama ollama
    sudo usermod -a -G ollama $(whoami)
  4. Create the Ollama systemd service file and start the service. (A quick check that the service is running is shown after this list.)

    # Create the service file with the following contents
    sudo vim /etc/systemd/system/ollama.service
    [Unit]
    Description=Ollama Service
    After=network-online.target

    [Service]
    ExecStart=/usr/bin/ollama serve
    User=ollama
    Group=ollama
    Restart=always
    RestartSec=3
    Environment="PATH=$PATH"

    [Install]
    WantedBy=default.target
    # Reload systemd configuration and enable the Ollama service
    sudo systemctl daemon-reload
    sudo systemctl enable ollama
    sudo systemctl start ollama

    # After successful installation, the version will be displayed
    luckfox@luckfox:~$ ollama -v
    ollama version is 0.5.11
  5. Run Ollama to execute the DeepSeek R1 1.5B model.

    ollama run deepseek-r1:1.5b
  6. On the first run, the model files will be downloaded from the Ollama website; afterwards you can chat with the model in the terminal, or query it over Ollama's HTTP API as sketched below.
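If Ollama was installed as a systemd service (step 4), you can confirm that the service is running before pulling any models. A minimal check using standard systemd commands:

    # Check that the Ollama service is active
    sudo systemctl status ollama

    # Follow the service log if startup fails
    sudo journalctl -u ollama -f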
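Besides the interactive CLI, Ollama listens on port 11434 by default and exposes an HTTP API. A minimal sketch of querying the model over that API (the prompt text is only an example):

    # Send a single prompt to the locally running model via Ollama's REST API
    curl http://localhost:11434/api/generate -d '{
      "model": "deepseek-r1:1.5b",
      "prompt": "Why is the sky blue?",
      "stream": false
    }'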

2.2 Deploying with RKLLM Quantization (PC Ubuntu 22.04)

The RKLLM-Toolkit is a development suite that lets users quantize and convert large language models on a PC. Like the previously introduced RKNN-Toolkit2, it provides a Python interface that simplifies model deployment and execution. To run a model on the RKNPU, the trained model must first be converted to the RKLLM format with RKLLM-Toolkit on the PC, then deployed on the development board through the RKLLM Runtime C API. Model training and conversion are left to the user; for conversion, refer to rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/Readme.md in the rknn-llm repository. This section focuses on using the RKLLM model provided by Rockchip.
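For reference, a minimal sketch of the PC-side conversion workflow, assuming the repository has already been cloned (step 1 below). The wheel filename comes from the repository's rkllm-toolkit directory; the export script name and location are assumptions based on the demo's Readme:

    # Install the RKLLM-Toolkit wheel matching your Python version (filename from rkllm-toolkit/)
    pip3 install rknn-llm/rkllm-toolkit/rkllm_toolkit-x.x.x-cp310-cp310-linux_x86_64.whl

    # Run the demo's export script to quantize and convert the model to .rkllm format
    # (script name assumed; follow the demo's Readme for the exact steps)
    cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/export
    python3 export_rkllm.py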

  1. Clone the rknn-llm repository.

    git clone https://github.com/airockchip/rknn-llm.git --depth 1
  2. After cloning, check the directory structure.

    doc
    ├── Rockchip_RKLLM_SDK_CN.pdf            # RKLLM SDK Chinese Documentation
    └── Rockchip_RKLLM_SDK_EN.pdf            # RKLLM SDK English Documentation
    examples
    ├── DeepSeek-R1-Distill-Qwen-1.5B_Demo   # Board-side API inference demo
    ├── Qwen2-VL-2B_Demo                     # Multimodal inference demo
    └── rkllm_server_demo                    # RKLLM-Server deployment demo
    rkllm-runtime
    └── runtime
        ├── Android
        │   └── librkllm_api
        │       ├── arm64-v8a
        │       │   └── librkllmrt.so        # RKLLM Runtime library
        │       └── include
        │           └── rkllm.h              # Runtime header file
        └── Linux
            └── librkllm_api
                ├── aarch64
                │   └── librkllmrt.so        # RKLLM Runtime library
                └── include
                    └── rkllm.h              # Runtime header file
    rkllm-toolkit
    ├── rkllm_toolkit-x.x.x-cp38-cp38-linux_x86_64.whl
    └── rkllm_toolkit-x.x.x-cp310-cp310-linux_x86_64.whl
    rknpu-driver
    └── rknpu_driver_x.x.x_xxxxxxx.tar.bz2
    scripts
    ├── fix_freq_rk3576.sh                   # RK3576 fixed-frequency script
    └── fix_freq_rk3588.sh                   # RK3588 fixed-frequency script
  3. Go to the example directory.

    cd rknn-llm/examples/DeepSeek-R1-Distill-Qwen-1.5B_Demo/deploy
  4. Configure the cross-compiler path by modifying the build-linux.sh file.

    GCC_COMPILER_PATH=~/opts/gcc-arm-10.2-2020.11-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
    Change it to:
    GCC_COMPILER_PATH=<sdk path>/prebuilts/gcc/linux-x86/aarch64/gcc-arm-10.3-2021.07-x86_64-aarch64-none-linux-gnu/bin/aarch64-none-linux-gnu
    • Note: Cross-compilers are typically backward compatible but not forward compatible. It is recommended to use version 10.2 or later. You can download the cross-compiler from the official site or use the version provided in the SDK.
  5. Run the build-linux.sh script to cross-compile the example program.

    ./build-linux.sh
  6. After compilation, an install folder will be generated in the deploy directory, containing the compiled executable and RKLLM runtime library.

    install/
    └── demo_Linux_aarch64
        ├── lib
        │   └── librkllmrt.so
        └── llm_demo
  7. Transfer the generated demo_Linux_aarch64 folder to the development board. (The .rkllm model file must also be copied to the board; see the sketch after this list.)

    scp -r install/demo_Linux_aarch64 luckfox@192.168.9.185:/home/luckfox
  8. Run the executable on the development board. (For more stable performance numbers, see the fixed-frequency note after this list.)

    cd /home/luckfox/demo_Linux_aarch64/

    # Set up the dependency library environment
    export LD_LIBRARY_PATH=./lib

    # View board-side inference performance:
    export RKLLM_LOG_LEVEL=1

    # Run the Executable
    # Usage: ./llm_demo <model_path> <max_new_tokens> <max_context_len>
    ./llm_demo DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm 2048 4096
  9. After the program starts, output like that shown in the image below will appear, and you can begin asking the model questions:
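Note that llm_demo expects the .rkllm model file to be present on the board (step 8 passes it as the first argument), while step 7 only copies the demo_Linux_aarch64 folder. A minimal sketch of copying the downloaded RKLLM model into the same folder, assuming the board address and paths used above:

    # Copy the RKLLM model into the demo folder on the board
    scp DeepSeek-R1-Distill-Qwen-1.5B_W4A16_RK3576.rkllm luckfox@192.168.9.185:/home/luckfox/demo_Linux_aarch64/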
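When measuring inference performance with RKLLM_LOG_LEVEL=1, the fixed-frequency script from the repository's scripts directory (see the tree in step 2) can be run on the board first to make the numbers more repeatable. A minimal sketch; the destination path is only an example:

    # On the PC: copy the RK3576 fixed-frequency script to the board
    scp rknn-llm/scripts/fix_freq_rk3576.sh luckfox@192.168.9.185:/home/luckfox/

    # On the board: lock CPU/NPU frequencies before running llm_demo
    sudo bash /home/luckfox/fix_freq_rk3576.sh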