In today's world, many people rely on AI tools for problem-solving and other services, yet accessibility remains a challenge for speech-, hearing-, and visually impaired individuals. Our AI device addresses this problem by leveraging advanced technologies to provide inclusive, seamless interaction for all users.
Our AI device is designed to be accessible to all types of users and features a sleek curved display that shows the generated output, such as ASL hand signs, as the result. It uses the Seeed Studio Grove Vision AI module to detect objects and user actions, and the cloud-based AMD Instinct™ MI210 accelerator to train the models efficiently; together these offer robust processing capabilities. The object detection model was trained effectively with YOLOv8 and gave good results. Based on those results, the ASL hand signs were modeled in the Blender 3D modeling software. Additionally, the integration of the GPT-5 model provides advanced natural language understanding and generation of results. We named this tool Gallaudet Accessible AI because ASL emerged as a language at the American School for the Deaf (ASD), founded by Thomas Gallaudet in 1817.
Demo Video:
1. Play a Movie Using ASL Hand Signs:
Play a movie using the "Movie" hand sign. In this process, both the "Play" and "Movie" hand signs are recognized by the action/object detection model and converted to text. The text is first checked to see whether it is an OS-related command; if it is, the command is executed directly. If it is not OS-related, the text is passed to the GPT model instead.
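A minimal sketch of that routing logic is given below; the OS_COMMANDS table, the media player path, and the ask_gpt() helper are hypothetical names used for illustration, not the project's actual code.

import subprocess

# Hypothetical table mapping recognized sign text to OS-level commands.
OS_COMMANDS = {
    "play movie": ["vlc", "C:/Users/SARATHY/Videos/movie.mp4"],  # assumed player and path
}

def handle(recognized_text, ask_gpt):
    command = recognized_text.lower().strip()
    if command in OS_COMMANDS:
        # OS-related command: execute it directly.
        subprocess.Popen(OS_COMMANDS[command])
        return "executed: " + command
    # Not OS-related: forward the text to the GPT model instead.
    return ask_gpt(command)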
Play Sign and Movie Sign:
2. Getting a Story from the GPT Model Using the "Story" Hand Sign:
Story Sign and Tell Sign:
3. Play a Song:
Song Sign:
Before training the model, set up the AMD Cloud Accelerator:
Follow the steps below to set up the model training environment:
- Log in to the AMD Cloud Accelerator, then click Create a New Workload.
- Choose the application needed for your workload, such as PyTorch or TensorFlow, and click Next.
- Select or upload the needed files, such as the dataset and Python files, and click Next.
- Set the running time, choose how many GPUs you want, and click Next.
- Select the AIG MI210 accelerator, click Next, review the details you set, and click Next.
- Review your setup and click Run Workload. Then navigate to the dashboard and check the status of the workload; once it is running, click on the workload name.
- Once the workload has been running for a few minutes, a Connect button and a key will appear; click Connect.
- After clicking Connect, a new page opens in a new tab; enter your secret key in the token field and click Log In.
- The workspace will open; choose Notebook to start the training and use the Terminal to install the needed modules (example commands are shown below).
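For example, the needed modules for this project can be installed from the terminal with commands like these; the exact package list is an assumption based on the code used later in this article:

pip install ultralytics==8.0.20
pip install opencv-python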
Collecting Datasets and Images:
I used the following Python program to collect the ASL hand sign images:
import os
import cv2
import time
import uuid

imagepath = "C:/Users/SARATHY/Desktop/collection"
labels = ['story']
number = 30  # images to capture per label

for label in labels:
    # Create a folder for this label inside the collection directory.
    os.makedirs(os.path.join(imagepath, label), exist_ok=True)
    cap = cv2.VideoCapture(0)
    print("Collecting images for {}".format(label))
    for imgnum in range(number):
        ret, frame = cap.read()
        if not ret:
            break
        # Save each frame under a unique, collision-free filename.
        imagename = os.path.join(imagepath, label, '{}.{}.jpg'.format(label, uuid.uuid1()))
        cv2.imwrite(imagename, frame)
        cv2.imshow('frame', frame)
        time.sleep(2)  # pause so the hand sign can be adjusted between shots
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
cv2.destroyAllWindows()
Annotation:
I used the Roboflow online tool to annotate all of my ASL hand sign images and create the labels for my dataset; the image preprocessing was also done with the help of this tool.
The dataset is attached below.
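For reference, a YOLOv8-format export from Roboflow typically ships with a data.yaml along these lines; the relative paths follow Roboflow's usual layout, and the class count and names here are assumptions based on the signs mentioned in this article, not the actual dataset file.

train: ../train/images
val: ../valid/images
test: ../test/images
nc: 4
names: ['movie', 'play', 'song', 'story']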
Model Training and Evaluation:
I used YOLOv8 to train my model; the training process, evaluation, and the metric scores of the model are given below:
Model Training Code:
!pip install ultralytics==8.0.20
from IPython import display
display.clear_output()
import ultralytics
ultralytics.checks()
from ultralytics import YOLO
from IPython.display import display, Image
%cd /content/drive/MyDrive/AMEN.v1i.yolov8
!yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=200 imgsz=224 plots=True
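Equivalently, the same training run can be launched through the Ultralytics Python API instead of the CLI; this sketch assumes the same data.yaml and hyperparameters as the command above.

from ultralytics import YOLO

# Load the pretrained small YOLOv8 checkpoint and fine-tune it on our dataset.
model = YOLO('yolov8s.pt')
model.train(data='data.yaml', epochs=200, imgsz=224, plots=True)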
Confusion Matrix:
Validation and Metric Scores of the Model:
Report of the Validation: The YOLOv8.0.20 model, evaluated with Python 3.10.12 and Torch 2.3.1+cu121 on a Tesla T4 GPU, demonstrates excellent performance, with a high overall box precision of 0.994, perfect recall of 1.000, and an mAP@50 of 0.995. Class-specific metrics are also strong, with precision and recall values consistently reaching 1.000 across most categories, and mAP@50-95 scores ranging from 0.751 to 0.885. The model processes images efficiently, with pre-processing and inference times of 0.8 ms and 3.3 ms respectively, although post-processing is more time-consuming at 25.5 ms per image. Overall, the model is robust and accurate.
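For reference, a validation pass like this can be run with the Ultralytics CLI. The weights path below is the default location Ultralytics saves the best checkpoint to, so adjust it if your run directory differs:

!yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml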
F1 Score:
The YOLOv8 model's F1 score of 0.80 indicates robust performance with balanced precision and recall. Continued data refinement and model optimization could enhance this score further.
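For context, the F1 score is the harmonic mean of precision and recall; a tiny helper makes the relationship explicit (the 0.80 figure above is presumably taken at a particular confidence threshold, so it will not match the peak precision and recall values directly).

def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)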
Finally, the model was ready!
3D Model Creation:
In this project we need 3D models for the ASL hand signs, so we created sample 3D models of the ASL hand signs. These 3D models are converted to video at runtime based on the text the model generates.
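As a minimal sketch of that runtime step, assume one prerendered Blender clip per word, stored as <word>.mp4 in a signs/ folder; the folder name, file layout, and play_signs() helper are illustrative assumptions, not the project's actual code.

import os
import cv2

SIGN_DIR = "signs"  # hypothetical folder of prerendered ASL clips

def play_signs(text):
    # Play the prerendered clip for each word in the generated text, in order.
    for word in text.lower().split():
        clip = os.path.join(SIGN_DIR, word + ".mp4")
        if not os.path.exists(clip):
            continue  # no clip available for this word
        cap = cv2.VideoCapture(clip)
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            cv2.imshow("ASL output", frame)
            cv2.waitKey(30)  # ~33 fps playback
        cap.release()
    cv2.destroyAllWindows()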
Future Product Design:
Curved Display:
On this part of the device, the hearing-impaired person sees the output in the form of ASL hand signs, based on the model's result.
Future Product Video:
Dataset Link:
This link contains the dataset and the Python code files for this project.
https://drive.google.com/drive/folders/1cZF4Oymw0MlPdPC4qYvkCPgvdQg8QZEN
Request to All Readers:
Please give your valuable feedback to help develop this project further.