In today's world, many people rely on AI tools for problem-solving and other services, yet accessibility remains a challenge for speech-, hearing-, and visually impaired individuals. Our AI device addresses this problem by leveraging advanced technologies to provide inclusive, seamless interaction for all users.
Our AI device is designed to be accessible to all types of users and features a sleek curved display that shows the generated output, such as ASL hand signs, as the result. It uses the Seeed Studio Grove Vision AI module to detect objects and user actions, and the cloud-based AMD Instinct™ MI210 accelerator to train the models efficiently; together these offer robust processing capabilities. The object detection model was trained effectively with YOLOv8 and gave good results. Based on those results, the ASL hand signs were modeled in the Blender 3D modeling software. Additionally, the integration of the GPT-5 model provides advanced natural language understanding and generation of results. We named this tool Gallaudet Accessible AI because ASL emerged as a language at the American School for the Deaf (ASD), founded by Thomas Gallaudet in 1817.
Demo Video:
1. Play a Movie Using ASL Hand Signs:
Play a movie using the "Movie" hand sign. In this process, both the "Play" and "Movie" hand signs are recognized by the action/object detection model and converted to text. The text is first checked to see whether it is an OS-related command; if it is, the command is executed directly. If it is not OS-related, the text is passed to the GPT model instead.
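A minimal sketch of that routing logic is given below; the OS_COMMANDS table, the media player path, and the ask_gpt() helper are hypothetical names used for illustration, not the project's actual code.

import subprocess

# Hypothetical table mapping recognized sign text to OS-level commands.
OS_COMMANDS = {
    "play movie": ["vlc", "C:/Users/SARATHY/Videos/movie.mp4"],  # assumed player and path
}

def handle(recognized_text, ask_gpt):
    command = recognized_text.lower().strip()
    if command in OS_COMMANDS:
        # OS-related command: execute it directly.
        subprocess.Popen(OS_COMMANDS[command])
        return "executed: " + command
    # Not OS-related: forward the text to the GPT model instead.
    return ask_gpt(command)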
Play Sign and Movie Sign:
2. Getting a Story from the GPT Model Using the "Story" Hand Sign:
Story Sign and Tell Sign:
3. Play a Song:
Song Sign:
Before training the model, set up the AMD Cloud Accelerator:
Follow the steps below to set up the model training environment:
- Log in to the AMD Cloud Accelerator, then click Create a New Workload.
- Choose the application needed for your workload, such as PyTorch or TensorFlow, and click Next.
- Select or upload the needed files, such as the dataset and Python files, and click Next.
- Set the running time, choose how many GPUs you want, and click Next.
- Select the AIG MI210 accelerator, click Next, review the details you set, and click Next.
- Review your setup and click Run Workload. Then navigate to the dashboard and check the status of the workload; once it is running, click on the workload name.
- Once the workload has been running for a few minutes, a Connect button and a key will appear; click Connect.
- After clicking Connect, a new page opens in a new tab; enter your secret key in the token field and click Log In.
- The workspace will open; choose Notebook to start the training and use the Terminal to install the needed modules (example commands are shown below).
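For example, the needed modules for this project can be installed from the terminal with commands like these; the exact package list is an assumption based on the code used later in this article:

pip install ultralytics==8.0.20
pip install opencv-python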
Collecting Datasets and Images:
I used the following Python program to collect the ASL hand sign images:
import os
import cv2
import time
import uuid

imagepath = "C:/Users/SARATHY/Desktop/collection"
labels = ['story']
number = 30  # images to capture per label

for label in labels:
    # Create a folder for this label inside the collection directory.
    os.makedirs(os.path.join(imagepath, label), exist_ok=True)
    cap = cv2.VideoCapture(0)
    print("Collecting images for {}".format(label))
    for imgnum in range(number):
        ret, frame = cap.read()
        if not ret:
            break
        # Save each frame under a unique, collision-free filename.
        imagename = os.path.join(imagepath, label, '{}.{}.jpg'.format(label, uuid.uuid1()))
        cv2.imwrite(imagename, frame)
        cv2.imshow('frame', frame)
        time.sleep(2)  # pause so the hand sign can be adjusted between shots
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()
cv2.destroyAllWindows()
Annotation:
I used the Roboflow online tool to annotate all of my ASL hand sign images and create the labels for my dataset; the image preprocessing was also done with the help of this tool.
The dataset is attached below.
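For reference, a YOLOv8-format export from Roboflow typically ships with a data.yaml along these lines; the relative paths follow Roboflow's usual layout, and the class count and names here are assumptions based on the signs mentioned in this article, not the actual dataset file.

train: ../train/images
val: ../valid/images
test: ../test/images
nc: 4
names: ['movie', 'play', 'song', 'story']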
Model Training and Evaluation:
I used YOLOv8 to train my model; the training process, evaluation, and the metric scores of the model are given below:
Model Training Code:
!pip install ultralytics==8.0.20
from IPython import display
display.clear_output()
import ultralytics
ultralytics.checks()
from ultralytics import YOLO
from IPython.display import display, Image
%cd /content/drive/MyDrive/AMEN.v1i.yolov8
!yolo task=detect mode=train model=yolov8s.pt data=data.yaml epochs=200 imgsz=224 plots=True
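Equivalently, the same training run can be launched through the Ultralytics Python API instead of the CLI; this sketch assumes the same data.yaml and hyperparameters as the command above.

from ultralytics import YOLO

# Load the pretrained small YOLOv8 checkpoint and fine-tune it on our dataset.
model = YOLO('yolov8s.pt')
model.train(data='data.yaml', epochs=200, imgsz=224, plots=True)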
Confusion Matrix:
Validation and Metric Scores of the Model:
Report of the Validation: The YOLOv8.0.20 model, evaluated with Python 3.10.12 and Torch 2.3.1+cu121 on a Tesla T4 GPU, demonstrates excellent performance, with a high overall box precision of 0.994, perfect recall of 1.000, and an mAP@50 of 0.995. Class-specific metrics are also strong, with precision and recall values consistently reaching 1.000 across most categories, and mAP@50-95 scores ranging from 0.751 to 0.885. The model processes images efficiently, with pre-processing and inference times of 0.8 ms and 3.3 ms respectively, although post-processing is more time-consuming at 25.5 ms per image. Overall, the model is robust and accurate.
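For reference, a validation pass like this can be run with the Ultralytics CLI. The weights path below is the default location Ultralytics saves the best checkpoint to, so adjust it if your run directory differs:

!yolo task=detect mode=val model=runs/detect/train/weights/best.pt data=data.yaml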
F1 Score:
The YOLOv8 model's F1 score of 0.80 indicates robust performance with balanced precision and recall. Continued data refinement and model optimization could enhance this score further.
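For context, the F1 score is the harmonic mean of precision and recall; a tiny helper makes the relationship explicit (the 0.80 figure above is presumably taken at a particular confidence threshold, so it will not match the peak precision and recall values directly).

def f1(precision, recall):
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)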
Finally, the model was ready!
3D Model Creation:
In this project we need 3D models for the ASL hand signs, so we created sample 3D models of the ASL hand signs. These 3D models are converted to video at runtime based on the text the model generates.
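As a minimal sketch of that runtime step, assume one prerendered Blender clip per word, stored as <word>.mp4 in a signs/ folder; the folder name, file layout, and play_signs() helper are illustrative assumptions, not the project's actual code.

import os
import cv2

SIGN_DIR = "signs"  # hypothetical folder of prerendered ASL clips

def play_signs(text):
    # Play the prerendered clip for each word in the generated text, in order.
    for word in text.lower().split():
        clip = os.path.join(SIGN_DIR, word + ".mp4")
        if not os.path.exists(clip):
            continue  # no clip available for this word
        cap = cv2.VideoCapture(clip)
        while True:
            ret, frame = cap.read()
            if not ret:
                break
            cv2.imshow("ASL output", frame)
            cv2.waitKey(30)  # ~33 fps playback
        cap.release()
    cv2.destroyAllWindows()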
Future Product Design:
Curved Display:
On this part of the device, the hearing-impaired person sees the output in the form of ASL hand signs, based on the model's result.
Future Product Video:
Dataset Link:
This link contains the dataset and the Python code files for this project.
https://drive.google.com/drive/folders/1cZF4Oymw0MlPdPC4qYvkCPgvdQg8QZEN
Request to All Readers:
Please give your valuable feedback to help develop this project further.