xsran
Created July 31, 2024

Deploy Llama model in AMD Radeon PRO W7900

Things used in this project

Hardware components

AMD Radeon™ Pro W7900 GPU
×1

Story

Code

LLM UI

Python
import gradio
import requests


def run(prompt):
    # Send the prompt to the local completion endpoint (llama.cpp-style
    # /completion API) and ask for up to 128 generated tokens.
    url = "http://127.0.0.1:8099/completion"
    data = {"prompt": prompt, "n_predict": 128}
    resp = requests.post(url, json=data)
    data = resp.json()
    # Separate the generated text from the rest of the response metadata.
    content = data.pop("content")
    return content, data


# A simpler single-call alternative using gradio.Interface (unused):
# demo = gradio.Interface(
#     fn=run,
#     inputs=["text"],
#     outputs=["text", "json"]
# )


with gradio.Blocks() as demo:
    # Prompt input box.
    inp = gradio.Textbox(
        label="Prompt",
        placeholder="prompt",
        lines=3
    )

    btn_run = gradio.Button("Run")
    out = gradio.Textbox()   # generated text
    extra = gradio.Json()    # remaining response fields (timings, settings, ...)

    # Send the prompt through run() and display both outputs.
    btn_run.click(fn=run, inputs=inp, outputs=[out, extra])

# demo.launch(server_name="0.0.0.0")
# share=True additionally publishes the app through a temporary public Gradio link.
demo.launch(server_name="0.0.0.0", share=True)
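
The UI assumes a llama.cpp-style HTTP server is already listening on 127.0.0.1:8099; the /completion request above follows that server's API. As a quick sanity check before launching the UI, you can post a prompt to the endpoint directly. A minimal sketch, under the same assumption about the backend's address and port:

import requests

# Hit the same /completion endpoint the UI targets (assumes the Llama
# backend is already running on 127.0.0.1:8099).
resp = requests.post(
    "http://127.0.0.1:8099/completion",
    json={"prompt": "Hello from the W7900!", "n_predict": 32},
    timeout=120,
)
resp.raise_for_status()
body = resp.json()
print(body["content"])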

server

Go
No preview (download only).

Credits

xsran