With the rapid advancement of academic research and technological innovation, a vast number of papers are published each year. For researchers, effectively sifting through, understanding, and synthesizing key information from these documents has become increasingly important but also more challenging. Traditional methods of literature review often rely on manual reading and summarization, which can be time-consuming, labor-intensive, and inefficient. Therefore, developing a tool that assists users in quickly skimming, comprehending, and extracting the core content of papers is particularly important.
In recent years, large language models have seen significant development, providing new possibilities for addressing these challenges. The outstanding performance of these models makes them an ideal choice for building efficient literature-assistance tools. This project aims to leverage large language models to develop an intelligent application for assisting with paper reading. This application will efficiently summarize, organize, synthesize, and abstract relevant information from scholarly documents, thereby enhancing the productivity of researchers.
How it works

This project leverages AMD's AI PC, which is equipped with a Neural Processing Unit (NPU) specifically designed to accelerate neural network inference. The NPU supports the deployment and application of a wide range of models, providing robust hardware support for high-performance artificial intelligence tasks. Building on this foundation, we have deployed a large language model, internlm/Agent-FLAN-7b. This model was trained on the Agent-FLAN dataset on top of Llama2-7b, and it exhibits strong agent capabilities and natural language processing skills.
To enhance the model's comprehension and response capabilities, we adopted the LlamaIndex framework to build a Retrieval-Augmented Generation (RAG) system. This framework converts the user's PDF files into an external knowledge base and retrieves relevant information based on the user's queries. The retrieved information is then integrated into the prompt, enabling the model to generate more accurate and meaningful responses grounded in the document content.
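The core retrieval flow looks roughly like the minimal sketch below. It assumes a recent llama-index release with its default components; the actual pipeline in web_demo.py wires in the locally deployed Agent-FLAN-7b model and the m3e-small embedding model instead of the library defaults, and "paper.pdf" is just a placeholder path.

# Minimal RAG sketch with llama-index (library defaults, not the project's exact code).
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. Turn the user's PDF into documents that form the external knowledge base.
documents = SimpleDirectoryReader(input_files=["paper.pdf"]).load_data()

# 2. Build a vector index over the document chunks.
index = VectorStoreIndex.from_documents(documents)

# 3. Retrieve the chunks relevant to a question and hand them to the LLM as context.
query_engine = index.as_query_engine()
print(query_engine.query("What problem does this paper try to solve?"))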
Driver Install

The UM790 should come with the Integrated Processing Unit (IPU) driver pre-installed, but if you're unsure, you can reinstall it. There are two known installation paths:
1. One is the download link mentioned in the tutorial on the AMD Ryzen AI website: https://ryzenai.docs.amd.com/en/latest/inst.html
2. The other is the driver tool from the Minisforum official website under UM790 Support: https://www.minisforum.com/new/support?lang=cn#/support/page/download/79
After installing the driver, if you don't see the IPU device listed in the Device Manager, it's likely because the IPU feature is not enabled in the BIOS settings. Follow these steps to enable it:
1. Enter BIOS:
Press and hold the Delete key during startup to enter the BIOS settings.
2. Navigate to IPU Settings:
Go to Setup > Advanced > CPU Configuration, and enable the IPU feature.
3. Save and Restart:
Save the changes and restart the computer.
After restarting, you should be able to see the AMD IPU Device listed under System Devices in the Device Manager.
Remember: all of the following commands should be run in a CMD (Command Prompt) environment.
First, install the Ryzen AI Software by following the official instructions.
Then, clone our project to your computer:

git clone https://github.com/fengzhaoxin/paper_helper.git

To successfully run the LLM, move everything in the RyzenAI_Package folder to the conda env folder (YOUR_ENV_PATH\Lib\site-packages). You can run

conda env list

to find the path of your conda env.
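If you prefer to script that copy step, the following is a minimal Python sketch; it assumes the RyzenAI_Package folder is in the current directory and that the target conda env is the one currently activated.

# Sketch: copy everything from RyzenAI_Package into the active env's site-packages.
import shutil
import sysconfig
from pathlib import Path

src = Path("RyzenAI_Package")                 # assumed to be in the current directory
dst = Path(sysconfig.get_paths()["purelib"])  # YOUR_ENV_PATH\Lib\site-packages of the active env

for item in src.iterdir():
    if item.is_dir():
        shutil.copytree(item, dst / item.name, dirs_exist_ok=True)
    else:
        shutil.copy2(item, dst / item.name)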
Finally, install the other required packages, such as llama-index and streamlit.
Model Quant

After successfully setting up the environment, we can quantize the model. Everything we need is in the "model_quant" folder.
We first activate the ryzenai-transformers env, then run "run_awq.py" for quantization.
conda activate ryzenai-transformers
setup_local.bat
cd model_quant
python run_awq.py --w_bit 3 --lm_head --flash_attention --model ../Agent-FLAN-7b --output ./Agent-FLAN-7b

Here "--model ../Agent-FLAN-7b" is the trained model folder, and "--output ./Agent-FLAN-7b" is the output model name.
After quantization we get the quantized model, which we move to the model folder. We also need to download the embedding model "m3e-small" and move its tokenizer into the model folder as well.
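One way to fetch the embedding model is through huggingface_hub, sketched below; the repo id "moka-ai/m3e-small" and the target folder are assumptions to adjust to your own layout.

# Sketch: download the m3e-small embedding model from the Hugging Face Hub.
# Repo id and local_dir are assumptions; adjust them to your setup.
from huggingface_hub import snapshot_download

snapshot_download(repo_id="moka-ai/m3e-small", local_dir="./model/m3e-small")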
First, apply for a LlamaCloud API key on LlamaCloud (llamaindex.ai) and put it in your config.ini file, just like your OpenAI key:
[llamma]
api_key = llx-xxxxxxxxx

Now we have finished all the preparation and can run "web_demo.py" for a test. Have fun!
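For reference, the key can be read from config.ini with Python's standard configparser; this is only a sketch, and the actual loading code in web_demo.py may differ.

# Sketch: read the LlamaCloud key from config.ini (section name as shown above).
import configparser

config = configparser.ConfigParser()
config.read("config.ini")
llama_cloud_key = config["llamma"]["api_key"]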
streamlit run web_demo.py

Demonstration video




