The PC application I am planning to create can generate reviews and keypoints, organize presenters and their points, and produce an overall review of a video call, webinar, meeting, etc. I think this is an innovative and useful idea that can solve many problems people face in their professional and personal lives.
Some of the problems that this application can solve are:
- It can help people save time and effort by automatically summarizing and analyzing the content and quality of any video content, without having to watch or listen to the whole thing.
- It can help people improve their communication and performance skills by providing them with constructive and actionable feedback and suggestions based on the keypoints and review of their video content.
- It can help people create and present beautiful and professional reports, slides, charts, graphs, and other visual elements that showcase the main points and topics of their video content.
- It can help people enhance their learning and understanding by providing them with the keypoints and review of any video content that they are interested in, such as webinars, lectures, tutorials, etc.
- It can be realized using Ryzen AI, so it runs on-device without privacy or performance issues.
This is different from existing solutions because:
- Most of the existing solutions only provide transcription or summarization of video content, but not both. My app can do both, and also provide analysis and feedback based on the content and quality of the video.
- Most of the existing solutions only work for specific types of video content, such as lectures, podcasts, or interviews. My app can work for any type of video content, such as webinars, meetings, movies, shows, games, etc.
- Most of the existing solutions only generate text-based outputs, such as transcripts, summaries, or reports. My app can also generate visual outputs, such as slides, charts, graphs, and other elements that can enhance the presentation and communication of the keypoints and review.
My app is useful because it can help people with various tasks and goals, such as:
- Saving time and effort by automatically summarizing and analyzing the content and quality of any video content, without having to watch or listen to the whole thing.
- Improving communication and performance skills by providing constructive and actionable feedback and suggestions based on the keypoints and review of the video content.
- Creating and presenting beautiful and professional reports, slides, charts, graphs, and other visual elements that showcase the main points and topics of the video content.
- Enhancing learning and understanding by providing the keypoints and review of any video content that they are interested in, such as webinars, lectures, tutorials, etc.
- Edge processing ensuring full privacy.
I will use the Ryzen AI-powered UM790 Pro mini PC as the hardware platform for my app. The UM790 Pro mini PC has the following advantages for my solution:
- It has a powerful AMD Ryzen™ 9 7940HS processor and AMD Radeon™ 780M GPU that can handle the intensive computation and graphics tasks required by my app, such as recording, transcribing, analyzing, summarizing, and generating reviews and keypoints.
- It has a generous amount of RAM and storage, with 32 GB DDR5-5600 RAM and 1 TB PCIe 4.0 NVMe SSD, that can enable fast and smooth multitasking and data processing for my app, as well as store large files and models.
- It has a compact and portable design, with a small size that can fit in any space and be easily carried around, making it ideal for users who need to use my app in different locations and scenarios.
- It has a versatile connectivity, with RJ45 2.5G Ethernet Port, USB3.2 Gen2 Type-A Port, USB4 Port, HDMI 2.1, and Intel® Killer™ AX1675 Wi-Fi 6E Network Card, that can enable fast and reliable data transfer and communication for my app, as well as support multiple monitors and devices.
- It has an innovative cooling system, with Cold Wave 2.0 that actively cools the memory and SSD, that can prevent overheating and ensure stable and optimal performance for my app.
This part is taken directly from the Ryzen AI installation instructions (latest version, 1.2).
Prerequisites
To enable the development and deployment of applications leveraging the NPU, you must have the following software installed on the system.
- Visual Studio 2022 Community: ensure that “Desktop Development with C++” is installed.
- cmake version >= 3.26
- Anaconda or Miniconda: ensure that one of the following paths is set in the System PATH variable:
path\to\anaconda3\Scripts
or path\to\miniconda3\Scripts
(The System PATH variable should be set in the System Variables section of the Environment Variables window).
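Before going further, the prerequisites above can be sanity-checked with a short script. This is just a convenience sketch, not part of the official instructions; it uses Python's standard library to confirm that cmake and conda are reachable from the PATH.

```python
# Sketch: verify the Ryzen AI prerequisites are reachable from the PATH.
# Not part of the official instructions -- just a convenience check.
import shutil

def check_prereqs(tools=("cmake", "conda")):
    """Return a dict mapping each tool name to its resolved path (or None)."""
    return {tool: shutil.which(tool) for tool in tools}

if __name__ == "__main__":
    for tool, path in check_prereqs().items():
        print(f"{tool}: {path if path else 'NOT FOUND - check your System PATH'}")
```

If either tool prints NOT FOUND, fix the PATH entries above before running the installer.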
Install NPU Drivers
Download the NPU driver installation package (NPU Driver).
Install the NPU drivers by following these steps:
- Extract the downloaded NPU_RAI1.2.zip file.
- Open a terminal in administrator mode and execute .\npu_sw_installer.exe.
- Ensure that the NPU MCDM driver (Version: 32.0.201.204, Date: 7/26/2024) is correctly installed by opening Device Manager -> Neural processors -> NPU Compute Accelerator Device.
Install the Ryzen AI Software
- Download the Ryzen AI Software MSI installer ryzenai-1.2.0.msi.
- Launch the MSI installer and follow the instructions on the installation wizard:
  - Accept the terms of the License agreement.
  - Provide the destination folder for the Ryzen AI installation (default: C:\Program Files\RyzenAI\1.2.0).
  - Specify the name for the conda environment (default: ryzen-ai-1.2.0).
The Ryzen AI Software packages are now installed in the conda environment created by the installer. Refer to the Runtime Setup page for more details about setting up the environment before running an inference session on the NPU.
Test the Installation
The Ryzen AI Software installation folder contains a test to verify that the software is correctly installed. This installation test can be found in the quicktest subfolder.
- Open a Conda command prompt (search for “Anaconda Prompt” in the Windows start menu).
- Activate the Conda environment created by the Ryzen AI installer:
conda activate <env_name>
- Run the test:
cd %RYZEN_AI_INSTALLATION_PATH%/quicktest
python quicktest.py
- The quicktest.py script sets up the environment and runs a simple CNN model. On a successful run, you will see an output similar to the one shown below. This indicates that the model is running on the NPU and that the installation of the Ryzen AI Software was successful:
[Vitis AI EP] No. of Operators : CPU 2 IPU 398 99.50%
[Vitis AI EP] No. of Subgraphs : CPU 1 IPU 1 Actually running on IPU 1
...
Test Passed
...
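For context on what quicktest.py is doing: an ONNX Runtime inference session is created with the Vitis AI execution provider, which offloads supported operators to the NPU (the "IPU" in the log above) and falls back to the CPU for the rest. A minimal sketch of that setup follows; the model path is a placeholder, and vaip_config.json is the provider configuration file shipped with the Ryzen AI installation.

```python
# Sketch: build the provider configuration ONNX Runtime expects for NPU
# inference via the Vitis AI execution provider. The model path below is
# a placeholder, not a real file.
import os

def vitis_ai_providers(config_file):
    """Return (providers, provider_options) with CPU fallback."""
    return (
        ["VitisAIExecutionProvider", "CPUExecutionProvider"],
        [{"config_file": config_file}, {}],
    )

providers, provider_options = vitis_ai_providers(
    os.path.join(os.environ.get("RYZEN_AI_INSTALLATION_PATH", "."),
                 "voe-4.0-win_amd64", "vaip_config.json"))

# Requires the Ryzen AI conda environment and NPU drivers:
# import onnxruntime as ort
# session = ort.InferenceSession("model.onnx",
#                                providers=providers,
#                                provider_options=provider_options)
```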
Now you need to set the following environment variables based on your PC's APU. In my case, for PHX/HPT APUs:
set XLNX_VART_FIRMWARE=%RYZEN_AI_INSTALLATION_PATH%/voe-4.0-win_amd64/xclbins/phoenix/1x4.xclbin
set XLNX_TARGET_NAME=AMD_AIE2_Nx4_Overlay
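Since a wrong XLNX_VART_FIRMWARE path tends to fail only later, at session creation, a quick sanity check helps. A minimal sketch (my own helper, not part of the Ryzen AI tooling) that confirms both variables are set and the xclbin file actually exists:

```python
# Sketch: confirm the NPU environment variables are set and the xclbin
# firmware file exists before launching an inference session.
import os

def validate_npu_env(env=os.environ):
    """Return a list of problems with the NPU-related environment variables."""
    problems = []
    firmware = env.get("XLNX_VART_FIRMWARE")
    if not firmware:
        problems.append("XLNX_VART_FIRMWARE is not set")
    elif not os.path.isfile(firmware):
        problems.append(f"xclbin not found: {firmware}")
    if not env.get("XLNX_TARGET_NAME"):
        problems.append("XLNX_TARGET_NAME is not set")
    return problems
```

An empty list means the variables look good for the PHX/HPT values above.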
What I Did
Now, the normal flow would be to quantize a pretrained model and then deploy it to Ryzen AI, but I skipped that part. You see, at the time I was doing this, people were busy creating, fine-tuning, and quantizing models, so I hoped someone would publish a quantized ONNX version of Phi-3 Vision and some other smaller models. Instead, I focused on the application integration part.
But before that, I had tried running models on the CPU. Models like Llama 3 7B or Qwen 7B performed at 4 to 5 tokens per second, but the DeepSeek Coder V2 16B model surprisingly gave 12 tokens per second (an int4 K_M quantization). Overall, though, this was slow.
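The tokens-per-second figures above come from timing generation. A minimal helper for measuring throughput, assuming any iterator that yields tokens one at a time (e.g. a streaming llama.cpp or Ollama client):

```python
# Sketch: measure generation throughput from any token iterator.
import time

def tokens_per_second(token_stream):
    """Consume a token iterator; return (token_count, tokens per second)."""
    start = time.perf_counter()
    count = sum(1 for _ in token_stream)
    elapsed = time.perf_counter() - start
    return count, (count / elapsed) if elapsed > 0 else float("inf")
```

Feeding it the streaming output of a model run reproduces the kind of numbers quoted here.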
ROCm for iGPU
So I tried to run the model on the iGPU, a Radeon 780M, and checked for ROCm, which is the CUDA equivalent for AMD. Unfortunately, it wasn't supported (gfx1103), which was very disappointing. So I searched for ROCm on gfx1103 and, after an eternity of searching forum after forum, I finally found the one: a GitHub repo documenting how to get ROCm on gfx1103. For this you will need to install the HIP SDK for Windows.
If you are on LM Studio, then you should check out this wiki.
You definitely need to check this out, as it drastically increased the speed to 23 tokens per second (that's the max I got, for DeepSeek Coder V2 16B, ~13 GB).
I found this very satisfactory, as CPU utilization while running a model was around 12 percent, which was excellent. After playing around with it for a long time, I found that the best quantization to use was Q5_K_M, which gives the best accuracy-to-speed ratio.
Another interesting thing: if you are using LM Studio, always check whether the same model is available in Ollama. If so, use Ollama, as it performs better than LM Studio.
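Ollama is also easier to wire into an application, because it exposes a local HTTP API (default http://localhost:11434). A minimal sketch of a non-streaming call to its /api/generate endpoint; the model name is just an example of whatever you have pulled locally:

```python
# Sketch: query a locally running Ollama server. Requires `ollama serve`
# and a pulled model; the model name below is an example.
import json
import urllib.request

def build_generate_request(model, prompt, host="http://localhost:11434"):
    """Build the URL and JSON body for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return f"{host}/api/generate", json.dumps(payload).encode("utf-8")

# url, body = build_generate_request("deepseek-coder-v2:16b",
#                                    "Summarize this meeting transcript: ...")
# req = urllib.request.Request(url, data=body,
#                              headers={"Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["response"])
```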
The Application + ADHD??
For the application, I was deciding between using Zoom and building a fully functioning app on an open-source video conferencing solution like Jitsi Meet. It was tough: there wasn't enough documentation, and it was very complex. So I looked for other apps, and even went for decentralized solutions such as dTelecom, but those were even more of a challenge, so I came back to Jitsi, or alternatively a browser extension that could do the task. But then I got sidetracked, and the next thing I knew I was trying different projects that use LLMs, which taught me a lot about their inner workings. Then I got sidetracked again into a few of my ongoing projects, one of which is a cooking assistant using video-based segmentation techniques. I got completely sidetracked and even built a tabletop audio transcription + summarization device using a UNIHIKER board, fully on the edge, which succeeded. But all of this took a lot of time away from completing this project; all I have is fragmented parts of it, and the deadline is here.
I won't be able to complete this by the deadline. I will continue to build the application in the coming days; as I am now well aware of my ADHD and seeking treatment, I believe I will be able to attach the finished version to this project in the coming week. Thank you AMD, thank you Hackster.