Overview
Technical Overview
Background
Let's understand some code snippets
How can the idea be revenue-generating? Why Xilinx FPGA is better for this project

Created November 20, 2020 © MIT

Recon With Xilinx

System to fill the gunfire data gap and detects, locates gunfire to enable a fast and precise response to over 90% of gunfire incident.

IntermediateWork in progress20 hours133

Things used in this project

Hardware components

Zynq UltraScale+ MPSoC ZCU102

Software apps and online services

TensorFlow

Story

Overview:

This project demonstrates how to build a device using the Xilinx ZCU104 development kit to solve most of the problems faced by government authorities like Police, Ambulance Services, etc. The device uses Machine Learning algorithms to check for the gunfire and notifies it to the authorities with precise information like the number of times the gun was fired, which gun was used for firing, location, direction, and the frames captured from the camera. The project also demonstrates how we can provide more insights into the community, locality to these authorities just by collecting and utilizing the data got from our device. I got inspiration from news, articles like school shootings, gunfire in public, gun injuries account for $2.8 billion in the emergency room and hospital charges in the United States each year, so I found a call within me as an active member of this innovative, committed community.

After experiencing the threat of a school shooting, as well as the changes in the school via countermeasures, students continue to experience the trauma. In several peer‐reviewed articles on mental health consequences of school shootings by Lowe & Galea, it is shown that mass shootings can bring on the onset of PTSD and continued depression. In the cities that are home to these kinds of events, the town can experience continued paranoia and an exaggerated sense of fear. Lowe & Galea continue to say that continued research is necessary to pinpoint the exact mental symptoms that occur in the victims of school shootings.

On average only 20% of gunfire incidents are reported to the police. This means police are working with an 80% deficit in intelligence as it relates to gun violence and are unable to respond to the majority of shootings. Even when gunfire is reported to police, there are several minute delays in getting information to the police. Also, the reported location of the gunfire is often vague or inaccurate due to limitations in human hearing. The result is that critical time is lost getting the police to the actual crime scene. Moreover, teenagers are at higher risk.

Its really frightening the younger generation.

Problems due to late reporting of gunshots

Police and other government authorities have less opportunity to arrive at the scene quickly to identify suspects and witnesses or render lifesaving aid it even makes it extremely limited.
For community members who live where gunfire occurs, the lack of a precise and consistent response from police can fuel a negative perception of law enforcement, and a belief that they don’t care.
Persistent gunfire in a community carries a heavy social, psychological, and economic toll.

The important measure taken while making this project was collecting data efficiently and patiently, once the data is collected the job becomes a lot easier (You are bound to fail if you don't collect correct data to feed your hungry ML frameworks). For the data collection part, I used the gunfire audio from shooting games (like CS and Valorant).

The backbone of this project is that you need to tune your device after every stage. So let's get started, enjoy learning with Machine Learning.

Technical Overview:

Gun violence is an urgent, complex, and multifaceted problem. It requires evidence-based, multifaceted solutions. Over 13,000 people are murdered with a gun every year in the United States. The majority of these shootings take place in cities, where violence is further concentrated spatially, racially, and within groups, gangs, or cliques.

1 / 4

Background:

Theme: Intelligent Video Analytics

1) City Surveillance:

"I want other people to know that no one is exempt from gun violence. It can happen to anyone at any time. No place is truly safe. Even if you do all the right things—if your son does all the right things—it still might not be enough." - Pam Bosley

With the increase in the rate of gun availability 24/7 surveillance system is needed so that the police and other officials can arrive at the spot to control the situation. Moreover, an advanced surveillance system can even help in forensics and contribute to justice. Previously police would receive the information about the gunshot from the people at the spot or if anyone finds a shot body. Now, we can use machine learning to detect the gunshot within seconds and inform the police in less than a minute. This step will control gun violence and make the local masses more loyal toward government officials. Using the correct analysis the machine would auto-suggest the police when and how much weapon is to be needed at the spot, and thus can save resources, efforts and will lead to a less panic situation in the society.

You can check this website for more problems faced by my local masses because of gun violence.

https://www.americanprogress.org/issues/guns-crime/reports/2018/05/04/450343/americas-youth-fire/

https://en.wikipedia.org/wiki/Gun_violence_in_the_United_States

2) Determining the direction and distance of the sound source.

To address the problem, I investigated the potential benefits of adding a microphone array to the system, which consists of multiple microphones that can simultaneously capture audio. As the audio is captured, digital signal processing (DSP) algorithms can be applied to cancel out echoes, determine the direction of individual sound sources, reduce background noise, etc. These are providing a much-needed boost to the performance of voice-enabled devices.

Check these links for further details:

https://medium.com/kkbankol-events/raspberry-pi-15662c3ca881

https://www.hindawi.com/journals/js/2017/6782176/

1 / 3 • Digital Microphone Array for Sound-Source Localization. Since I didn't get any project on Microphone Array with FPGA so used the audio data from Matrix Creator.

We can even make the device more smart and precise with different techniques like delocalization of audio signals.

We can be more innovative by using multiple Xilinx devices, e.g. using ultra96 as a stand-alone device at local streets which will send the data to the main-board or review office.

3) Audio classification for Gunshot Detection.

We will prepare the dataset of a gunshot sound with specific labels namely rifles, SMG, pistol, revolver. We can even access more information from the data like the number of bullets fired and the type of gun. We are purposely using video and audio files from games like Valorant as there is less noise and moreover is easily available for a game lover like me.

1 / 28

4) Image Classification for gun detection and 3D reconstruction for knowing the exact location of the firing point.

We can use different cameras and sensors to get the exact location of the culprit and would help for the forensics and also to trace back the exact cause for the open firing.

1 / 2

Let's understand some code snippets.

Let's see our audio data and try to understand the variations so that we can train our model.

Sample Spectogram

import librosa
from scipy.io import wavfile as wav
import numpy as np
filename = 'smg0.wav'
librosa_audio, librosa_sample_rate = librosa.load(filename)
scipy_sample_rate, scipy_audio = wav.read(filename)
print('Original sample rate:', scipy_sample_rate)
print('Librosa sample rate:', librosa_sample_rate)

#plt

import matplotlib.pyplot as plt
# Original audio with 2 channels
plt.figure(figsize=(12, 4))
plt.plot(scipy_audio)

Some more code snippets can be seen in the images attached.

1 / 5 • I'll attach the full code at the end.

The full code is attached at the end. And I have used the game video for this project so a similar video is to be needed for the input. We can even tune the model with a much better dataset of real guns and real environments like crowded places and many more.

Let's see our image dataset and train our model.

How can the idea be revenue-generating? Why Xilinx FPGA is better for this project?

Since, a similar idea is being used in 20 cities by ShotSpotter, Inc and the idea has resulted in a decrease in the rate of homicide by 75%. The company is almost generating approximately $45 million.

NEWARK, Calif., Nov. 09, 2020 (GLOBE NEWSWIRE) -- ShotSpotter, Inc. (NASDAQ: SSTI), the leader in acoustic gunshot detection and precision policing solutions that help law enforcement officials and security personnel prevent and reduce gun violence, reported financial results for the third quarter ended September 30, 2020.

Revenues increased 14% to $11.4 million from $10.0 million for the third quarter of 2019.
Gross profit increased 8% to $6.4 million (57% of revenues) from $6.0 million (60% of revenues) for the third quarter of 2019.
Net income increased 27% to $566,000 from $446,000 for the third quarter of 2019.
Adjusted EBITDA1 increased 44% to $3.3 million from $2.3 million for the third quarter of 2019.
Realized 3 new “go-live” square miles of coverage and 6 square miles of attrition during the quarter, bringing the total live miles to 758 at the end of the quarter.
Maintained a strong balance sheet with $28.7 million in cash and cash equivalents at the end of the quarter, and $20 million available on its line of credit.
Subsequent to quarter-end, signed a definitive agreement to acquire LEEDS, LLC, a leading investigative case management software provider, with the transaction expected to close in November 2020. Upfront consideration of $17.0 million includes $15.0 million in cash and $2.0 million in ShotSpotter common stock, plus a potential earnout of $5.0 million over the next two years.

Full-year 2020 revenue guidance narrowed and slightly increased to a range of $44.5 million to $45.0 million, representing 10% growth at the midpoint compared to full-year 2019. The company’s revenue guidance for full-year 2020 excludes any contribution from LEEDS, LLC.

1 / 2

FPGA is a better option because of the following reasons:

Latency: How long does it take to compute something?
FPGAs are good at this.
Connectivity: What input/output can we connect and with which bandwidth?
FPGAs can directly be connected to inputs and can offer very high bandwidth.
Engineering cost: How much effort does it cost to express the computation?
The engineering cost is typically much higher than for instruction based architectures, so the advantages must really be worth it.
Energy efficiency: How much energy does it cost to compute something?
This is often listed as a large benefit of FPGA.

Code

import IPython.display as ipd

ipd.Audio('dataset/smg/smg0.wav')

# Load imports

import IPython.display as ipd
import librosa
import librosa.display
import matplotlib.pyplot as plt

filename = 'dataset/smg/smg0.wav'
plt.figure(figsize=(12,4))
data,sample_rate = librosa.load(filename)
_ = librosa.display.waveplot(data,sr=sample_rate)
ipd.Audio(filename)

filename = 'dataset/smg/rifle0.wav'
plt.figure(figsize=(12,4))
data,sample_rate = librosa.load(filename)
_ = librosa.display.waveplot(data,sr=sample_rate)
ipd.Audio(filename)

filename = 'dataset/smg/pistol0.wav'
plt.figure(figsize=(12,4))
data,sample_rate = librosa.load(filename)
_ = librosa.display.waveplot(data,sr=sample_rate)
ipd.Audio(filename)

filename = 'dataset/smg/revolver0.wav'
plt.figure(figsize=(12,4))
data,sample_rate = librosa.load(filename)
_ = librosa.display.waveplot(data,sr=sample_rate)
ipd.Audio(filename)

import pandas as pd
metadata = pd.read_csv('dataset.csv')
metadata.head()

print(metadata.class_name.value_counts())

import librosa 
from scipy.io import wavfile as wav
import numpy as np

filename = 'dataset/rifle/rifle0.wav' 

librosa_audio, librosa_sample_rate = librosa.load(filename) 
scipy_sample_rate, scipy_audio = wav.read(filename) 

print('Original sample rate:', scipy_sample_rate) 
print('Librosa sample rate:', librosa_sample_rate)

print('Original audio file min~max range:', np.min(scipy_audio), 'to', np.max(scipy_audio))
print('Librosa audio file min~max range:', np.min(librosa_audio), 'to', np.max(librosa_audio))

import matplotlib.pyplot as plt

# Original audio with 2 channels 
plt.figure(figsize=(12, 4))
plt.plot(scipy_audio)

# Librosa audio with channels merged 
plt.figure(figsize=(12, 4))
plt.plot(librosa_audio)

mfccs = librosa.feature.mfcc(y=librosa_audio, sr=librosa_sample_rate, n_mfcc=40)
print(mfccs.shape)

import librosa.display
librosa.display.specshow(mfccs, sr=librosa_sample_rate, x_axis='time')

def extract_features(file_name):
   
    try:
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast') 
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        mfccsscaled = np.mean(mfccs.T,axis=0)
        
    except Exception as e:
        print("Error encountered while parsing file: ", file)
        return None 
     
    return mfccsscaled

# Load various imports 
import pandas as pd
import os
import librosa

# Set the path to the full UrbanSound dataset 
fulldatasetpath = 'dataset/'

metadata = pd.read_csv('audio_classification.csv')

features = []

# Iterate through each sound file and extract the features 
for index, row in metadata.iterrows():
    
    file_name = os.path.join(os.path.abspath(fulldatasetpath),'fold'+str(row["fold"])+'/',str(row["slice_file_name"]))
    
    class_label = row["class_name"]
    data = extract_features(file_name)
    
    features.append([data, class_label])

# Convert into a Panda dataframe 
featuresdf = pd.DataFrame(features, columns=['feature','class_label'])

print('Finished feature extraction from ', len(featuresdf), ' files')


from sklearn.preprocessing import LabelEncoder
from keras.utils import to_categorical

# Convert features and corresponding classification labels into numpy arrays
X = np.array(featuresdf.feature.tolist())
y = np.array(featuresdf.class_label.tolist())

# Encode the classification labels
le = LabelEncoder()
yy = to_categorical(le.fit_transform(y))

# split the dataset 
from sklearn.model_selection import train_test_split 

x_train, x_test, y_train, y_test = train_test_split(X, yy, test_size=0.2, random_state = 42)

import numpy as np
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Convolution2D, MaxPooling2D
from keras.optimizers import Adam
from keras.utils import np_utils
from sklearn import metrics 

num_labels = yy.shape[1]
filter_size = 2

# Construct model 
model = Sequential()

model.add(Dense(256, input_shape=(40,)))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))

model.add(Dense(num_labels))
model.add(Activation('softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', metrics=['accuracy'], optimizer='adam')

# Display model architecture summary 
model.summary()

# Calculate pre-training accuracy 
score = model.evaluate(x_test, y_test, verbose=0)
accuracy = 100*score[1]

print("Pre-training accuracy: %.4f%%" % accuracy)

from keras.callbacks import ModelCheckpoint 
from datetime import datetime 

num_epochs = 100
num_batch_size = 32

checkpointer = ModelCheckpoint(filepath='audio_classifier.hdf5', 
                               verbose=1, save_best_only=True)
start = datetime.now()

model.fit(x_train, y_train, batch_size=num_batch_size, epochs=num_epochs, validation_data=(x_test, y_test), callbacks=[checkpointer], verbose=1)


duration = datetime.now() - start
print("Training completed in time: ", duration)


# Evaluating the model on the training and testing set
score = model.evaluate(x_train, y_train, verbose=0)
print("Training Accuracy: ", score[1])

score = model.evaluate(x_test, y_test, verbose=0)
print("Testing Accuracy: ", score[1])

import librosa 
import numpy as np 

def extract_feature(file_name):
   
    try:
        audio_data, sample_rate = librosa.load(file_name, res_type='kaiser_fast') 
        mfccs = librosa.feature.mfcc(y=audio_data, sr=sample_rate, n_mfcc=40)
        mfccsscaled = np.mean(mfccs.T,axis=0)
        
    except Exception as e:
        print("Error encountered while parsing file: ", file)
        return None, None

    return np.array([mfccsscaled])


def print_prediction(file_name):
    prediction_feature = extract_feature(file_name) 

    predicted_vector = model.predict_classes(prediction_feature)
    predicted_class = le.inverse_transform(predicted_vector) 
    print("The predicted class is:", predicted_class[0], '\n') 

    predicted_proba_vector = model.predict_proba(prediction_feature) 
    predicted_proba = predicted_proba_vector[0]
    for i in range(len(predicted_proba)): 
        category = le.inverse_transform(np.array([i]))
        print(category[0], "\t\t : ", format(predicted_proba[i], '.32f') )


filename = 'rifle_test.wav' 
print_prediction(filename)

import tensorflow as tf
tf.test.gpu_device_name()

!pip install -q keras

from glob import glob
from sklearn.model_selection import train_test_split

rifles = glob('train/rifle/*.jpg')
smg = glob('train/smg/*.jpg')
pistol = glob('train/smg/*.jpg')
revolver = glob('train/smg/*.jpg')

rifle_train, rifle_test = train_test_split(rifles, test_size=0.30)
smg_train, smg_test = train_test_split(smg, test_size=0.30)
pistol_train, pistol_test = train_test_split(pistol, test_size=0.30)
revolver_train, revolver_test = train_test_split(revolver, test_size=0.30)

TRAIN_DIR = 'train'
TEST_DIR = 'test'

!mkdir test

!mkdir test/Rifle
files = ' '.join(rifle_test)
!mv -t test/Rifle $files

!mkdir test/Smg
files = ' '.join(smg_test)
!mv -t test/Smg $files

!mkdir test/Pistol
files = ' '.join(pistol_test)
!mv -t test/Pistol $files

!mkdir test/Revolver
files = ' '.join(revolver_test)
!mv -t test/Revolver $files

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt

cats = np.random.choice(rifle_train, 13)
dogs = np.random.choice(smg_train, 12)
dogs = np.random.choice(pistol_train, 11)
dogs = np.random.choice(revolver_train, 10)
data = np.concatenate((rifles, smg, pistol, revolver))
labels = 13 * ['Rifle'] + 12 *['Smg'] + 11 *['Pistol'] + 10 *['Revolver']

N, R, C = 25, 5, 5
plt.figure(figsize=(12, 9))
for k, (src, label) in enumerate(zip(data, labels)):
    im = Image.open(src).convert('RGB')
    plt.subplot(R, C, k+1)
    plt.title(label)
    plt.imshow(np.asarray(im))
    plt.axis('off')


# Model customization
from keras.models import Model
from keras.layers import Dense, GlobalAveragePooling2D, Dropout
from keras.applications.inception_v3 import InceptionV3, preprocess_input

CLASSES = 2
    
# setup model
base_model = InceptionV3(weights='imagenet', include_top=False)

x = base_model.output
x = GlobalAveragePooling2D(name='avg_pool')(x)
x = Dropout(0.4)(x)
predictions = Dense(CLASSES, activation='softmax')(x)
model = Model(inputs=base_model.input, outputs=predictions)
   
# transfer learning
for layer in base_model.layers:
    layer.trainable = False
      
model.compile(optimizer='rmsprop',
              loss='categorical_crossentropy',
              metrics=['accuracy'])


# Data augmentation
from keras.preprocessing.image import ImageDataGenerator

WIDTH = 299
HEIGHT = 299
BATCH_SIZE = 32

# data prep
train_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

validation_datagen = ImageDataGenerator(
    preprocessing_function=preprocess_input,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    fill_mode='nearest')

train_generator = train_datagen.flow_from_directory(
    TRAIN_DIR,
    target_size=(HEIGHT, WIDTH),
		batch_size=BATCH_SIZE,
		class_mode='categorical')
    
validation_generator = validation_datagen.flow_from_directory(
    TEST_DIR,
    target_size=(HEIGHT, WIDTH),
    batch_size=BATCH_SIZE,
    class_mode='categorical')


x_batch, y_batch = next(train_generator)

plt.figure(figsize=(12, 9))
for k, (img, lbl) in enumerate(zip(x_batch, y_batch)):
    plt.subplot(4, 8, k+1)
    plt.imshow((img + 1) / 2)
    plt.axis('off')
    

# Transfer learning
EPOCHS = 5
BATCH_SIZE = 32
STEPS_PER_EPOCH = 320
VALIDATION_STEPS = 64

history = model.fit_generator(
    train_generator,
    epochs=EPOCHS,
    steps_per_epoch=STEPS_PER_EPOCH,
    validation_data=validation_generator,
    validation_steps=VALIDATION_STEPS)

model.save('image_classification.h5')

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec

from keras.preprocessing import image
from keras.models import load_model


def predict(model, img):
    """Run model prediction on image
    Args:
        model: keras model
        img: PIL format image
    Returns:
        list of predicted labels and their probabilities 
    """
    x = image.img_to_array(img)
    x = np.expand_dims(x, axis=0)
    x = preprocess_input(x)
    preds = model.predict(x)
    return preds[0]

model = load_model('image_classification.h5')

img = image.load_img('sample_test_revolver.jpg', target_size=(HEIGHT, WIDTH))
preds = predict(model, img)

Credits

Shubham

6 projects • 9 followers

Turned 20 and can't resist myself from learning and using AI. Learning to tackle global problems.

Sumit Kumar

32 projects • 94 followers

19 y/o. My daily routine involves dealing with electronics, code, distributed storage and cloud APIs.

Recon With Xilinx

Things used in this project

Hardware components

Software apps and online services

Story

Overview:

Technical Overview:

Background:

Let's understand some code snippets.

How can the idea be revenue-generating? Why Xilinx FPGA is better for this project?

Schematics

Schematic

Code

Data_Exploration_and_Visualisation_Audio.py

Data_Preprocessing_and_Data_Splitting_Audio.py

Model_Training_Audio.py

Prediction_Audio.py

Image_Classification_Gun.py

Prediction_Image.py

Xilinx_Project_Submission

Credits

Shubham

Sumit Kumar

Comments

Embed the widget on your own site

Recon With Xilinx

Recon With Xilinx

Things used in this project

Hardware components

Software apps and online services

Story

Overview:

Technical Overview:

Background:

Let's understand some code snippets.

How can the idea be revenue-generating? Why Xilinx FPGA is better for this project?

Schematics

Schematic

Code

Data_Exploration_and_Visualisation_Audio.py

Data_Preprocessing_and_Data_Splitting_Audio.py

Model_Training_Audio.py

Prediction_Audio.py

Image_Classification_Gun.py

Prediction_Image.py

Xilinx_Project_Submission

Credits

Shubham

Sumit Kumar

Comments

Related channels and tags