This article will guide you through deploying and running popular large language models (LLMs) on the LattePanda 3 Delta 864, including LLaMA, LLaMA2, Phi-2, and ChatGLM2. We will compare these LLMs' inference speed, resource consumption, and model performance to help you choose a model that meets your needs, and to provide a reference for AI research on limited hardware. We will also discuss the key steps and considerations for experiencing and testing LLM performance on the LattePanda 3 Delta 864.
How to Choose LLM
An LLM project usually states its CPU/GPU requirements up front. Since GPU inference for LLMs is not currently available on the LattePanda 3 Delta 864, we need to prioritize models that support CPU inference. Because of the RAM limitation, we should also give preference to smaller models: as a rule of thumb, a model needs roughly twice its file size in RAM to run smoothly. Quantized models have much lower memory demands, so we recommend quantized models for experiencing LLM performance on the LattePanda 3 Delta 864.
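To make that rule of thumb concrete, the sketch below estimates the footprint of a 6B-parameter model at different precisions. The bytes-per-weight figures and the double-the-file-size guideline are rough approximations, not measured values (real GGML files add some overhead for quantization scales and metadata):

# Back-of-the-envelope estimate of LLM memory needs.
# Approximate bytes per weight: fp16 = 2, 8-bit = 1, 4-bit = 0.5.
PARAMS = 6e9  # a 6B-parameter model such as ChatGLM-6B

for name, bytes_per_weight in [("fp16", 2.0), ("q8_0", 1.0), ("q4_0", 0.5)]:
    file_gb = PARAMS * bytes_per_weight / 1024**3
    # Rule of thumb from the text: ~2x the model size in RAM to run smoothly.
    ram_gb = 2 * file_gb
    print(f"{name}: ~{file_gb:.1f} GB on disk, ~{ram_gb:.1f} GB RAM recommended")

On this estimate, an fp16 6B model needs over 20GB of RAM, while only the 4-bit version (about 2.8GB on disk, roughly 5.6GB of RAM) fits comfortably within the Delta 864's 8GB, which is why we use a 4-bit model below.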
The following list is a selection of smaller models from the open_llm_leaderboard on the Hugging Face website, together with the latest popular models.
P.S. The leaderboard scores models on four benchmarks:
1. ARC (AI2 Reasoning Challenge)
2. HellaSwag (testing the model's common-sense reasoning abilities)
3. MMLU (Measuring Massive Multitask Language Understanding)
4. TruthfulQA (measuring how models mimic human falsehoods)
How to run LLM
We used LLaMA.cpp-style GGML runtimes (here, chatglm.cpp) on the CPU of the LattePanda 3 Delta 864 to run LLM inference. Below, we take ChatGLM-6B as an example and provide detailed instructions for deploying and running an LLM on the LattePanda 3 Delta 864, which has 8GB RAM and 64GB eMMC and runs Ubuntu 20.04.
Quantization
The following is the process of quantizing ChatGLM2-6B to 4-bit via GGML on a Linux PC:
The first stage of the process is to set up ChatGLM.cpp on a Linux PC, download the ChatGLM-6B-int4 model, convert it to GGML format, and copy the result to a USB drive. We need the Linux PC's extra power for the conversion because the Delta 864's 8GB of RAM is insufficient.
Clone the ChatGLM.cpp repository to your local machine:
git clone --recursive https://github.com/li-plus/chatglm.cpp.git && cd chatglm.cpp

If you forgot the --recursive flag when cloning the repository, run the following command in the chatglm.cpp folder:

git submodule update --init --recursive

Install the necessary Python packages:

python3 -m pip install -U pip
python3 -m pip install torch tabulate tqdm transformers accelerate sentencepiece

Compile the project using CMake:
sudo apt-get install cmake
cmake -B build
cmake --build build -j --config Release

Pin transformers to version 4.33.0 for the conversion step:

pip uninstall transformers
pip install transformers==4.33.0

Download the model and its other files to chatglm.cpp/THUDM/chatglm-6b: https://huggingface.co/THUDM/chatglm-6b-int4
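If you prefer to script the download instead of fetching files by hand from the model page, a minimal sketch using the huggingface_hub package (an assumption on our part; any download method that places the files in the expected folder works):

# Requires: python3 -m pip install huggingface_hub
from huggingface_hub import snapshot_download

# Download the ChatGLM-6B-int4 files into the folder convert.py expects.
snapshot_download(
    repo_id="THUDM/chatglm-6b-int4",
    local_dir="THUDM/chatglm-6b",  # path used by the convert command below
)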
Use convert.py to transform ChatGLM-6B into quantized GGML format. For example, to convert the fp16 original model to a q4_0 (4-bit quantized) GGML model, run:
python3 chatglm_cpp/convert.py -i THUDM/chatglm-6b -t q4_0 -o chatglm-ggml.bin

Copy the resulting chatglm-ggml.bin to a USB drive so you can transfer it to the board.

Model Deployment
Here is the process of deploying and running ChatGLM-6B-q4 on the LattePanda 3 Delta 864 under Ubuntu 20.04:
git clone --recursive https://github.com/li-plus/chatglm.cpp.git && cd chatglm.cpp
git submodule update --init --recursive
python3 -m pip install -U pip
python3 -m pip install torch tabulate tqdm transformers accelerate sentencepiece
sudo apt-get install cmake
cmake -B build
cmake --build build -j --config Release
pip uninstall transformers
pip install transformers==4.33.0

Copy chatglm-ggml.bin from the USB drive into the chatglm.cpp folder. To run the model in interactive mode, add the -i flag. For example:
cd chatglm.cpp
./build/bin/main -m chatglm-ggml.bin -i

In interactive mode, your chat history serves as the context for the next round of conversation.
Run ./build/bin/main -h to explore more options!
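chatglm.cpp also ships optional Python bindings, which can be handy for scripting repeatable tests on the board. A minimal sketch, assuming the chatglm-cpp package (pip install chatglm-cpp) and its Pipeline API; the interface has changed between releases, so check the project README for your version:

# Scripted chat with the quantized model via chatglm.cpp's optional
# Python bindings (API assumed from the project README; verify it
# against the chatglm-cpp version you installed).
import chatglm_cpp

pipeline = chatglm_cpp.Pipeline("./chatglm-ggml.bin")
messages = [chatglm_cpp.ChatMessage(role="user", content="Hello! Who are you?")]
reply = pipeline.chat(messages)
print(reply.content)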