Published September 28, 2022

TinyWild, Make Wild IoT in Your Hand

TinyWild is a low-cost, on-the-edge, real-time data stack for wildlife surveys. It fits a flexible edge IoT data stack into a book-sized, green package.

Intermediate • Full instructions provided

Things used in this project

Hardware components

Seeed Studio SenseCAP K1100 - The Sensor Prototype Kit with LoRa® and AI

Rock Pi S

Software apps and online services

JoinBase

Story

Problems and Challenges in The Wild

Central Australia's Wild

Many wild ecosystems lack good network connections, so data analysis systems that rely on the cloud struggle to work in such wildlife-rich settings. Furthermore, such cloud-based systems are usually so expensive that wildlife researchers and institutions cannot afford them.

SenseCAP K1100 Device

Received free SenseCAP K1100 kit from Seeed

I am honored to have received a free SenseCAP K1100 kit from Seeed. In this kit:

the Wio Terminal can be seen as a handheld data hub with a visualization interface, which can manage various sensor modules and even display simple sensor data charts.
the Grove AI module provides TinyML-based wildlife recognition for TinyWild.

Grove Vision AI for Animals Detection

Wio Terminal in the free kit (right) and an old iPad AC charger (left)

This is a great but challenging task. On-the-edge AI has a profound impact on intelligence in the wild, because there is no good network in the wild to connect to cloud services with their vast computing power.

At present, there is not much public research on edge AI for wildlife. Let us do a deep customization for this task:

Dataset

One of the few publicly available animal datasets, the Animals Detection Images Dataset from Kaggle (called the "animals-80" dataset here), has been used. It contains 80 animal classes in 9.6 GB of images, and should be large enough for common animal recognition tasks.

Prepare images and labels for YOLOv5 training

Function for preprocessing the animals-80 dataset

Thanks to the work of the Kaggle dataset authors, we do not need to do the labeling ourselves. However, the original label format in animals-80 is not the YOLOv5 format, so I wrote a preprocessing step to convert it. The core part is the preprocessing function shown above; see the Code section at the end for the full script.
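The heart of that conversion is small enough to show on its own. The sketch below assumes, as animals.py in the Code section does, that each source label holds pixel corner coordinates (xmin, ymin, xmax, ymax), and that YOLOv5 wants a class id followed by a normalized box center, width and height. The function names `to_yolo_box` and `format_label` are only illustrative, and the numbers are made up.

```python
def to_yolo_box(xmin, ymin, xmax, ymax, width, height):
    """Convert pixel corner coordinates to a normalized (cx, cy, w, h) box."""
    cx = (xmin + xmax) / 2.0 / width    # box center x, as a fraction of image width
    cy = (ymin + ymax) / 2.0 / height   # box center y, as a fraction of image height
    bw = (xmax - xmin) / width          # box width, as a fraction of image width
    bh = (ymax - ymin) / height         # box height, as a fraction of image height
    return cx, cy, bw, bh


def format_label(class_id, box):
    """Format one label line: class id followed by the four normalized values."""
    return "%d %f %f %f %f" % ((class_id,) + tuple(box))


# Made-up example: a 100x50 px box inside a 400x200 px image, class id 3
print(format_label(3, to_yolo_box(100, 50, 200, 100, 400, 200)))
```

Because the values are fractions of the image size, the labels stay valid when the image is later resized.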

Training

Unfortunately, I don't have enough resources to train on the full 9.6 GB dataset. So, a hand-picked subset of the animals-80 dataset was chosen.

15-animal-kinds training

python3 train.py --img 192 --batch 32 --epochs 200 --data data/animal.yaml --cfg yolov5n6-xiao.yaml --weights yolov5n6-xiao.pt --name animals --cache --project runs/train2

We trained on a 24-core Xeon SP using the command above, taken from the official example. However, after two hours (yes, this proves again that you should not train on a CPU, even a top Xeon SP), the final recognition quality was very poor. The main metrics are very low: precision is 0.6, while recall and mAP_0.5 are only around 0.3. In practice, this result is close to not working.
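One quick way to fold the precision and recall quoted above into a single number is the F1 score, their harmonic mean. F1 is not part of the training output reported here. The sketch below is only a sanity check on the rounded values, and `f1_score` is an illustrative helper name.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded values quoted for the 15-animal-kinds run
print(round(f1_score(0.6, 0.3), 2))
```

The harmonic mean is pulled toward the weaker of the two values, so the low recall dominates the result.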

Let's reduce the recognized animals to four kinds: spider, duck, magpie and butterfly, which are among the most common animals in a suburban wild area.
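Picking such a subset can be as simple as filtering the per-class directories before preprocessing. This is a minimal sketch, assuming the dataset keeps one directory per class (as animals.py in the Code section expects). The helper name `select_subdirs` and the example paths are hypothetical, and the match is case-insensitive because the exact directory spelling is not confirmed here.

```python
import os

# The four kinds kept for the second training run
WANTED = {"spider", "duck", "magpie", "butterfly"}


def select_subdirs(all_subdirs, wanted=WANTED):
    """Keep only class directories whose name is in `wanted` (case-insensitive)."""
    return sorted(
        d for d in all_subdirs
        if os.path.basename(d.rstrip("/")).lower() in wanted
    )


# Hypothetical directory names, for illustration only
dirs = ["archive/train/Duck", "archive/train/Zebra", "archive/train/Spider"]
print(select_subdirs(dirs))
```

The filtered list could then be passed to the preprocessing function in place of the full list of class directories.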

4-animal-kinds training

OK, recall and mAP_0.5 rise to around 0.6. Not too bad. We will see the result later, in TinyWild's wildlife survey.

JoinBase for No-coding On-the-Edge IoT Data Stack

JoinBase website

At the core of the server side of TinyWild is JoinBase, a free on-the-edge IoT data stack.

Until recently, it has been hard to run an MQTT broker, a database and no-code visualization together on one resource-constrained edge device. JoinBase is a game changer here.

Unlike existing IoT-for-the-wild solutions, TinyWild builds on JoinBase's edge-cloud-in-one architecture to offer a reference implementation of a real-time wildlife diversity monitoring and analysis system, from UI to data analysis on the edge, with low cost, high availability and great scalability.

Everything is done in the wild. No network connection to the cloud is needed for data analysis, even if you are in the African savannah observing rhinoceroses. This makes TinyWild especially suitable for real-world wildlife research.

JoinBase-powered SBC (right) and an old iPad AC charger (left)

Pack All Being the TinyWild

All-in-one TinyWild

Server Side Coding

The JoinBase data stack used by TinyWild is all-in-one.

We just write a SQL schema to create a table in JoinBase, as you would in a traditional database:

create table iot_into_the_wild.sensors (
  ts DateTime,
  light Int16,
  sound Int16,
  imu_x Int16,
  imu_y Int16,
  imu_z Int16,
  animal String,
  confidence Int8
)
PARTITION BY yyyymmdd(ts);

That is enough to start serving device messages, with no more code. More about using JoinBase can be found on its own website.
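To make the shape of one device message concrete, here is a hedged sketch of building a row for the table above. It assumes the device sends one JSON object per row with keys named after the columns, and that DateTime is written as `YYYY-MM-DD HH:MM:SS`. Neither assumption is confirmed in this write-up, so check the JoinBase documentation for the actual message format. `make_row` and `check_range` are illustrative names, and the reading is made up.

```python
import json
from datetime import datetime

# Value ranges of the integer column types used in the table above
INT16_RANGE = (-32768, 32767)
INT8_RANGE = (-128, 127)


def check_range(name, value, bounds):
    """Raise ValueError if value does not fit the column's integer type."""
    low, high = bounds
    if not low <= value <= high:
        raise ValueError("%s=%r does not fit in [%d, %d]" % (name, value, low, high))


def make_row(ts, light, sound, imu, animal, confidence):
    """Build one JSON message whose keys match the columns of the sensors table."""
    imu_x, imu_y, imu_z = imu
    for name, value in (("light", light), ("sound", sound),
                        ("imu_x", imu_x), ("imu_y", imu_y), ("imu_z", imu_z)):
        check_range(name, value, INT16_RANGE)
    check_range("confidence", confidence, INT8_RANGE)
    return json.dumps({
        "ts": ts.strftime("%Y-%m-%d %H:%M:%S"),  # assumed text form of DateTime
        "light": light,
        "sound": sound,
        "imu_x": imu_x,
        "imu_y": imu_y,
        "imu_z": imu_z,
        "animal": animal,
        "confidence": confidence,
    })


# Made-up reading, for illustration only
print(make_row(datetime(2022, 1, 1, 12, 0, 0), 512, 40, (1, -2, 98), "Duck", 82))
```

Checking ranges on the client side catches values that would not fit the Int16 and Int8 columns before they are sent.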

Client Side Coding

On the client side, we modify SenseCraft, SenseCAP's official no-code tooling, to make the Wio Terminal work with an edge data stack.

Compared to the official version of SenseCraft, TinyWild makes three main contributions:

1. Sensors can dynamically join or leave a wide database table.

2. The MQTT and Sampler thread event loops have been enhanced.

3. A properly calibrated real-time clock is supported via the RTC and rpcWiFi libraries.
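The first point can be pictured as a buffer that accepts readings from whichever sensors happen to be attached and emits one wide row per timer tick. The sketch below is written in Python only to illustrate the idea. It is not the TinyWild firmware, and the class name `WideRowBuffer` and its methods are hypothetical.

```python
class WideRowBuffer:
    """Collects readings from attached sensors and emits one wide row per tick."""

    def __init__(self, columns):
        self.columns = list(columns)   # all columns of the wide table except ts
        self.pending = {}              # readings received since the last flush

    def update(self, column, value):
        """Record the latest reading for one column."""
        if column not in self.columns:
            raise KeyError(column)
        self.pending[column] = value

    def flush(self, ts):
        """Return one row for this interval, then clear the buffer.

        Columns whose sensor is absent or silent in this interval are None.
        """
        row = {"ts": ts}
        for column in self.columns:
            row[column] = self.pending.get(column)
        self.pending = {}
        return row


buf = WideRowBuffer(["light", "sound", "animal", "confidence"])
buf.update("light", 700)        # only the light sensor reported in this interval
print(buf.flush(ts=1))
print(buf.flush(ts=2))          # nothing reported: every sensor column is None
```

A sensor that is unplugged simply stops calling `update`, and its column becomes empty in later rows without any schema change.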

More details can be found in the related repo in the Reference section.

With the main coding done, TinyWild is ready to use. Finally, let's recap the size of the TinyWild parts:

Wio Terminal - 72mm * 57mm * 12mm
SBC - Rock Pi S - 42mm * 42mm
Solar Charger - 305mm * 180mm

The overall footprint of TinyWild is 305mm x 180mm, set by its largest part, the solar charger, which is about the size of a book.

That is why it is named Tiny. Let's go to the Wild!

Wildlife Survey in the Wetland Park

The vlog of TinyWild's wildlife survey

To evaluate TinyWild, I went to the country park to carry out a wildlife survey.

Watch this vlog!

Location

We set out for the wetland lake in the center of the park and keep TinyWild running until we are back at the entrance of the park, where we started.

Continuous Monitoring

select count(ts) as number_of_records from iot_into_the_wild.sensors;
select animal,count(animal) as number_of_animal from iot_into_the_wild.sensors group by animal;
select count(animal) as number_of_highly_true_ducks from iot_into_the_wild.sensors where animal = 'Duck' and confidence > 70;
select count(light) as number_of_great_sunlight from iot_into_the_wild.sensors where light > 900;
select count(light) as number_of_near_lightless from iot_into_the_wild.sensors where light < 50;

The monitoring queries above run persistently in TinyWild throughout the whole survey.
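To see what these queries compute without a JoinBase instance, they can be tried on a few rows in an in-memory SQLite database. SQLite is only a stand-in here: its types and dialect differ from JoinBase, the table is reduced to the columns the queries touch, and the four rows are made up for illustration, not taken from the survey.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table sensors (ts text, light integer, animal text, confidence integer)"
)

# Made-up rows, for illustration only
rows = [
    ("2022-01-01 10:00:00", 950, "Duck", 82),
    ("2022-01-01 10:00:05", 940, "Duck", 65),
    ("2022-01-01 10:00:10", 30, "Butterfly", 90),
    ("2022-01-01 10:00:15", 500, "", 0),
]
conn.executemany("insert into sensors values (?, ?, ?, ?)", rows)


def scalar(sql):
    """Run a query that returns a single value."""
    return conn.execute(sql).fetchone()[0]


total = scalar("select count(ts) from sensors")
confident_ducks = scalar(
    "select count(animal) from sensors where animal = 'Duck' and confidence > 70"
)
bright = scalar("select count(light) from sensors where light > 900")
dark = scalar("select count(light) from sensors where light < 50")
by_animal = dict(
    conn.execute("select animal, count(animal) from sensors group by animal")
)

print(total, confident_ducks, bright, dark, by_animal)
```

Note that the row with an empty animal string still forms its own group in the group-by result.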

Static observation on Duck

There are wild ducks (mallards) in the lake. To test the static recognition performance of the Grove AI module, the camera was fixed at the lakeside on a tripod for around half an hour. See more results below.

Data Analysis to the Wildlife Survey

It is time to evaluate the performance of Grove AI and the whole TinyWild system.

Grove AI for Wildlife Recognition

Charting and data from SQL queries for the whole survey

The basic conclusion of this wildlife survey is that identification of individual animals is not particularly good, but the qualitative information collected is still useful for a survey.

Let's look at the group-by query over the whole survey (the third picture above):

Most of the animals are "Unknown"

This is because the outputs of Grove AI very often carry a confidence of 100 (100% for short), even for its own built-in people detection model. That is implausible, so we treat every detection with confidence >= 100 as "Unknown".
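The rule is simple enough to state as code. This is a sketch of the rule as described in the text, not the firmware itself. The function name `label_detection` and the exact label string are assumptions.

```python
def label_detection(animal, confidence):
    """Return the label to store for one detection.

    A confidence of 100 or more is treated as implausible, so the
    detection is kept but relabeled instead of trusted.
    """
    if confidence >= 100:
        return "Unknown"
    return animal


print(label_detection("Duck", 82))    # plausible confidence: label kept
print(label_detection("Duck", 100))   # implausible confidence: relabeled
```

Relabeling rather than dropping the row keeps the other sensor columns of that row available for analysis.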

Empty "animal" in the results

This stems from the interplay between the database storage model and the logic of SenseCAP's no-code SenseCraft, from which TinyWild's Wio Terminal code is modified. The data logic sends data at a timer interval and then clears the input buffer. If the sensors have not filled the buffer by then, the value is missing. For a single data point this is easy to ignore, but for a row of points, as stored in the TinyWild database table, we still need to do something about it.

So, let's look instead at the records with high confidence, here > 75 in the query:

Butterfly stands out, but there is no Magpie, although I saw magpies many times in the park.

It seems the magpies were recognized as butterflies. What the two have in common is that they are often seen flying.

Ducks are observed at the lakeside and in the records

We did find that Grove AI works well for detecting nearby animals, as at the lakeside: we got four counts when three ducks suddenly swam into the camera's field of view while the camera was held relatively static.

Duck observation in the wildlife survey

See more in the vlog.

To show the portability and mobility of TinyWild, let's look at another case: continuous light sensing on the edge:

Light sensor value changes on the return journey

When I headed back (at 17:09) after finishing the survey at the lakeside, I walked along a road lined with big trees (16-17 minutes in, according to the recording). The figure above plots the light sensor values over the last 22 minutes of the return journey.

I'm going through a tall wooded area (it obviously became dark at that time, although the iOS shot looks fine)

TinyWild completely and accurately recorded all changes in the light sensor values while the cameraman was moving the whole time and the network was often poor.
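A stretch like the wooded road can also be located programmatically from the recorded series. The sketch below finds consecutive runs of readings under a darkness threshold. It reuses the value 50 from the "near lightless" monitoring query as the default, which is an assumption about what counts as dark, and the function name `dark_spans` and the sample values are made up.

```python
def dark_spans(samples, threshold=50):
    """Return (start_ts, end_ts) pairs for runs of readings below threshold.

    samples: iterable of (ts, light) pairs, sorted by ts.
    """
    spans = []
    start = last = None
    for ts, light in samples:
        if light < threshold:
            if start is None:
                start = ts          # a new dark run begins
            last = ts               # extend the current dark run
        elif start is not None:
            spans.append((start, last))
            start = None            # the dark run ended at the previous sample
    if start is not None:
        spans.append((start, last)) # series ended while still dark
    return spans


# Made-up (minute, light) samples, for illustration only
samples = [(0, 800), (1, 40), (2, 30), (3, 700), (4, 20)]
print(dark_spans(samples))
```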

Cost Analysis

TinyWild is tiny and wild (no cloud needed). It is also cheap:

Seeed SenseCAP K1100: $0 (free giveaway)
SBC - Rock Pi S: $15
Solar Charger: $38

Observation endpoints such as an iPad or laptop are not included here, because they can be replaced by any device with a screen and web access, for example an unused phone. I have three unused phones and one unused pad...

The total cost of TinyWild is $53, and you can swap the solar charger for a much cheaper ordinary charger if you are not working in the wild for too long.

Outlook

Edge LoRaWAN Gateway

Due to lack of time, we did not finish adding a LoRaWAN gateway to the JoinBase server and TinyWild. But we promise to support LoRa gateways in a few weeks, even after the contest is over.

After that, JoinBase will, as far as we know, be the first data stack to support both MQTT and LoRaWAN gateways, while still running on a $15 SBC with a 3MB binary.

More Accurate Edge AI

For recognizing a small number of kinds of short-range, low-speed objects, Grove AI has shown good results. However, when the Grove AI module runs inference with the 15-animal-kinds model, the runtime latency is found to be larger than 1ms. It is also observed that the hot loop may overheat the module and cause it to hang.

So, to make better use of the Grove AI module, more community practice is needed.

Reference

[1] https://github.com/open-joinbase/yolov5-swift

[2] https://github.com/open-joinbase/tinywild

animals.py

import glob
import os

import cv2
import pandas as pd  # used to tabulate per-class image counts
from tqdm import tqdm

data_dir = "/iot_in_the_wild/yolo/animal/archive"
train_dir = os.path.join(data_dir, "train")
test_dir = os.path.join(data_dir, "test")

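# Sort so that class order (and therefore class ids) is the same for train and test
all_train_subdir = sorted(glob.glob(train_dir+"/*"))
all_test_subdir = sorted(glob.glob(test_dir+"/*"))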

train_classes = [os.path.basename(pp) for pp in all_train_subdir]
test_classes = [os.path.basename(pp) for pp in all_test_subdir]

print("There is %d classes in train dataset, and %d classes in test dataset" %
      (len(train_classes), len(test_classes)))

print(train_classes == test_classes)


train_image_counts = {os.path.basename(
    pp): [len(glob.glob(os.path.join(pp, "*.jpg")))] for pp in all_train_subdir}
test_image_counts = {os.path.basename(
    pp): [len(glob.glob(os.path.join(pp, "*.jpg")))] for pp in all_test_subdir}
# all_image_counts=train_image_counts.copy()
# all_image_counts={k:all_image_counts[k]+test_image_counts[k] for k in all_image_counts.keys()}
train_data_df = pd.DataFrame(train_image_counts, index=["train"]).transpose()
test_data_df = pd.DataFrame(test_image_counts, index=["test"]).transpose()
all_data_df = train_data_df.copy()
all_data_df["test"] = test_data_df
print(all_data_df.head())

all_data_df = all_data_df.sort_values(by=["train", "test"], ascending=False)

yolo_train_dir = "yolo2/train"
yolo_test_dir = "yolo2/test"

for dd in [yolo_train_dir, yolo_test_dir]:
    for ss in ["images", "labels"]:
        print(os.path.join(dd, ss))
        os.makedirs(os.path.join(dd, ss), exist_ok=True)


def process_dataset(subdirs, dst_dir, class_names, size=(640, 640), link=False):
    for subdir_id in tqdm(range(len(subdirs))):
        subdir = subdirs[subdir_id]
        prefix = os.path.basename(subdir)
        for image_file in glob.glob(os.path.join(subdir, "*.jpg")):
            image_file_basename = os.path.basename(image_file)
            label_file = os.path.join(
                subdir, "Label", image_file_basename).replace(".jpg", ".txt")
            dst_image_file = os.path.join(
                dst_dir, "images/%s_%s" % (prefix, image_file_basename))
            dst_label_file = os.path.join(
                dst_dir, "labels/%s_%s" % (prefix, image_file_basename.replace(".jpg", ".txt")))
            if os.path.exists(dst_label_file):
                continue

            image = cv2.imread(image_file)
            if image is None:
                # cv2.imread returns None for unreadable or corrupt files; skip them
                continue
            height, width = image.shape[0:2]
            with open(label_file) as fobj:
                with open(dst_label_file, "w") as wobj:
                    for item in fobj:
                        if not item.strip():
                            continue  # skip blank lines in the label file
                        class_name = prefix
                        item = item[len(class_name):]
                        item = item.split()
                        xmin = float(item[0])
                        ymin = float(item[1])
                        xmax = float(item[2])
                        ymax = float(item[3])

                        cx = (xmin + xmax)/2.0/width
                        cy = (ymin + ymax)/2.0/height
                        bw = (xmax - xmin)/width
                        bh = (ymax - ymin)/height
                        class_id = class_names.index(class_name)
                        output_line = "%d %f %f %f %f\n" % (
                            class_id, cx, cy, bw, bh)
                        wobj.write(output_line)

            if link:
                os.symlink(image_file, dst_image_file)
            else:
                image = cv2.resize(image, size)
                cv2.imwrite(dst_image_file, image)


# process_dataset(all_train_subdir, yolo_train_dir, train_classes, size=(640,640), link=False)
# train_subdir = all_train_subdir[0:1]
train_subdir = all_train_subdir[:]
classes = [os.path.basename(pp) for pp in train_subdir]

print("classes:")
print(classes)

process_dataset(train_subdir, yolo_train_dir,
                classes, size=(640, 640), link=False)

test_subdir = all_test_subdir[:]
classes = [os.path.basename(pp) for pp in test_subdir]
process_dataset(test_subdir, yolo_test_dir,
                classes, size=(640, 640), link=False)

yaml_file = "yolov5/data/animal.yaml"
train_images_dir = os.path.join("..", yolo_train_dir, "images")
val_images_dir = os.path.join("..", yolo_test_dir, "images")


# Build the YAML "names" entry, e.g. names: ['Duck', 'Spider']
names_str = "names: [" + ", ".join("'%s'" % item for item in classes) + "]"

with open(yaml_file, "w") as wobj:
    wobj.write("train: %s\n" % train_images_dir)
    wobj.write("val: %s\n" % val_images_dir)
    wobj.write("nc: %d\n" % len(classes))
    wobj.write(names_str+"\n")