The Human-firewall project introduces a smart intercom that notifies you through a messaging application with a photo of the person who rang the doorbell, together with a real-time evaluation of that person.
This way you know immediately whether you can trust the visitor and, if the person is not trustworthy, you avoid exposing yourself, your savings and your home to danger.
N.B.: We are not aiming to create a fully fledged intercom, but a prototype that can be attached to an existing intercom.
From the existing intercom, we capture the doorbell input, which triggers a camera to snap a photo of the subject. Using this photo and some behind-the-scenes magic (spoiler: it's machine learning), we send the subject's evaluation to the house owner.
Evaluating subjects using common knowledge over LAN
At this point, you might ask: “how can you evaluate the person that rings the intercom?”.
We opted for a peer-review approach: each house owner can classify the people who ring at their door, and by doing so a sort of common knowledge is built.
This knowledge is shared across all the intercoms on the same network so that, for example, a scammer who has been flagged by your neighbour will be correctly detected even when he comes to your door.
Condominium as our targeted environment
Our project truly shines in a condominium with multiple buildings, each with its own intercoms, and a shared network (wireless or wired, it doesn’t matter).
It could also be used in smaller environments, but with fewer intercoms the common knowledge is smaller and the detection rate is worse.
We imagine that it could also be used by shop owners to protect their businesses.
Hardware and services
For our smart intercom demo, we adopted a Raspberry Pi 3 (running Raspbian) as our reference board, but newer models work as well, as long as the board can connect to a local network.
Together with the board, we used the Pi Camera module V2 which provides HD images of the subjects that ring the bell.
You also need a power supply for the board (who would ever have guessed that?) and a button to simulate a press of the doorbell.
Cloud services? No thanks. The Human-firewall project doesn’t need them: everything works exclusively over the LAN, so that we can provide the highest level of privacy for the sensitive data that is used (but never stored).
Our project is composed of three parts:
- The slave subsystem, which captures the doorbell input, takes photos and processes them to recognize people.
- The bot subsystem, which hosts the Telegram bot of the condominium.
- The master subsystem, which is the focal point of our shared knowledge architecture.
Given these three subsystems, we have two types of boards: a master board and a slave board. The user, however, does not see any difference between the two.
All the boards are supposed to be on the same NATed Wi-Fi/wired LAN. No open ports to the internet are needed, except the ones for Telegram’s servers, which are used by the bot subsystem.
The master system
The master board hosts an instance of the master subsystem, which is composed of a web server and an MQTT broker. The board also hosts the Telegram bot of the condominium and runs the slave subsystem, so that it works as a doorbell too.
The web server is used to communicate with the slaves, and offers several endpoints:
- Identification endpoint: Lets new slaves on the network find the master by scanning the LAN.
- Ring endpoint: Handles the event raised when a person rings a doorbell connected to the network. The master sends the photo and the evaluation data to the Telegram bot, which notifies the user of the event and shows who has rung.
- Timestamp endpoint: Returns the last timestamp in the database, which lets slaves check whether they are up to date.
- Recovery endpoints: Let the slaves download the feedback and the recognition data from a given timestamp up to the last timestamp.
These endpoints are used by the slave systems, so the master must be initialized before any slave.
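To give an idea of how a slave could find the identification endpoint, here is a minimal sketch of a LAN scan. It is not the project's actual code: the `/identify` path, port 5000 and all function names are assumptions made for illustration, and the probe is injectable so the search logic can be tried without a real network.

```python
import http.client
import ipaddress
import urllib.request
from typing import Callable, Iterable, List, Optional


def candidate_urls(cidr: str, port: int = 5000, path: str = "/identify") -> List[str]:
    """Build one URL per usable host address in the given subnet.

    The port and path are hypothetical; the real endpoint may differ.
    """
    network = ipaddress.ip_network(cidr, strict=False)
    return [f"http://{host}:{port}{path}" for host in network.hosts()]


def http_probe(url: str, timeout: float = 0.3) -> bool:
    """Return True if the URL answers with HTTP 200 within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts, HTTP errors and malformed replies.
        return False


def find_master(
    urls: Iterable[str], probe: Callable[[str], bool] = http_probe
) -> Optional[str]:
    """Return the first URL accepted by the probe, or None if no host answers."""
    for url in urls:
        if probe(url):
            return url
    return None
```

On a typical home or condominium network, `find_master(candidate_urls("192.168.1.0/24"))` would try each address in turn. A sequential scan like this is slow with long timeouts, so a real deployment would probably probe hosts in parallel.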
The slave system
The slave board contains only one instance of the slave subsystem.
The slave subsystem is composed of an SQLite database, which holds the users' feedback, and an in-memory database of feature vectors, which is how we represent people's faces. The in-memory database has an on-disk copy to support restarts.
The feedback database uses a composite key, formed by:
- Feature vector, which identifies the person who has rung and needs to be classified
- Chat ID, to identify the person who is giving the feedback
With this schema, a user can cast only one feedback on a given person, even across multiple doorbells and multiple ring events: when a user changes their mind, we update the old and possibly incorrect feedback instead of adding a second entry to the database. This is done to keep our classification meaningful and updatable.
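The composite key and the "update instead of add" behaviour can be sketched with Python's built-in `sqlite3` module. This is only an illustration under our own assumptions: the table and column names are hypothetical, and `person_key` stands in for the stored feature vector of the matched person (serialized in some way the real system defines).

```python
import sqlite3


def open_feedback_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the feedback table if it is missing."""
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS feedback (
            person_key TEXT    NOT NULL,  -- stand-in for the feature vector
            chat_id    INTEGER NOT NULL,  -- Telegram chat of the reviewer
            label      TEXT    NOT NULL,  -- classification given by the user
            PRIMARY KEY (person_key, chat_id)
        )
        """
    )
    return conn


def cast_feedback(conn: sqlite3.Connection, person_key: str, chat_id: int, label: str) -> None:
    """Store a feedback, overwriting any previous one by the same user on the same person."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO feedback (person_key, chat_id, label) VALUES (?, ?, ?)",
            (person_key, chat_id, label),
        )


conn = open_feedback_db()
cast_feedback(conn, "vec-1", 42, "trusted")
cast_feedback(conn, "vec-1", 42, "untrusted")  # same user, same person: replaces
cast_feedback(conn, "vec-1", 43, "trusted")    # different user: new row
```

After these three calls the table holds two rows, one per user, and user 42's entry carries the most recent label.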
Upon startup the slave locates the master by scanning the network for the identification endpoint. Having located the master, it initiates the recovery procedure: it checks whether its data is up to date and downloads the updates, if necessary.
After having located the master and updated the databases, it will subscribe to the MQTT broker to receive the new updates from the whole system and it will wait for a local ring event.
This recovery process comes in handy both at first initialization and for disaster recovery (for example, when a slave has been powered down and its DB may be out of sync with the master or the other slaves).
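The recovery procedure boils down to "compare timestamps, then download what is missing". Below is a minimal sketch of that logic, assuming integer timestamps and covering only the feedback data (the real recovery also transfers recognition data). The two `fetch_*` callables are hypothetical stand-ins for HTTP calls to the timestamp and recovery endpoints.

```python
from typing import Callable, Dict, List, Tuple

# (timestamp, person_key, chat_id, label): an assumed shape for one update.
Update = Tuple[int, str, int, str]


def recover(
    local_feedback: Dict[Tuple[str, int], str],
    local_timestamp: int,
    fetch_last_timestamp: Callable[[], int],
    fetch_updates_since: Callable[[int], List[Update]],
) -> int:
    """Bring local_feedback up to date and return the new local timestamp."""
    master_timestamp = fetch_last_timestamp()
    if master_timestamp <= local_timestamp:
        return local_timestamp  # already up to date, nothing to download

    # Apply updates oldest first so that the newest feedback wins.
    for timestamp, person_key, chat_id, label in sorted(fetch_updates_since(local_timestamp)):
        local_feedback[(person_key, chat_id)] = label
        local_timestamp = max(local_timestamp, timestamp)
    return local_timestamp
```

The same function serves both cases mentioned above: a brand new slave calls it with an empty dictionary and timestamp 0, while a slave coming back from a power loss calls it with whatever it had saved.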
When a local ring event is intercepted (i.e. when someone rings the slave's doorbell), a photo is captured and, using the Dlib and face_recognition libraries, the person’s face is located and encoded as a feature vector. Then the slave queries its local feedback database to see whether there is any feedback for that person. Finally, the photo, the board id and the feedback are sent to the master board through the ring endpoint.
If at any moment a message from the MQTT broker is received, its content is added to the feedback and encoding databases, increasing the board's knowledge.
The bot subsystem
The bot subsystem is hosted only on the master board and it’s used to interact with users, to notify them of ring events and to collect and distribute their feedback.
The notifications are sent to the users’ Telegram accounts through the Telegram APIs.
We opted for Telegram because it lets us reach every platform (iOS, Android, Windows, Mac, Linux) without implementing our own mobile application and server infrastructure. At the same time, the user is not burdened with yet another app on their beloved device.
The Telegram bot uses a chat ID to identify a user and a board id, obtained by hashing the board’s serial number, to identify the doorbell.
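As an illustration of how a board id could be derived, here is a small sketch that reads the `Serial` line found in a Raspberry Pi's `/proc/cpuinfo` and hashes it. The choice of SHA-256, the truncation to 8 hex characters and the function names are our assumptions for this example; the project may use a different hash or length.

```python
import hashlib
from typing import Optional


def parse_serial(cpuinfo_text: str) -> Optional[str]:
    """Extract the value of the 'Serial' line from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Serial":
            return value.strip()
    return None


def board_id(serial: str, length: int = 8) -> str:
    """Hash the serial number and keep a short, easy-to-type prefix."""
    digest = hashlib.sha256(serial.encode("ascii")).hexdigest()
    return digest[:length]


sample = "Hardware\t: BCM2835\nRevision\t: a02082\nSerial\t\t: 00000000abcdef12\n"
serial = parse_serial(sample)
print(serial, board_id(serial))
```

On a real board the text would come from `open("/proc/cpuinfo").read()`. Hashing keeps the raw serial number out of the chat while still giving every doorbell a stable identifier.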
The user configures the bot by specifying the IDs of the doorbells for which they want to receive notifications. To make the application more user-friendly, we let the user give each configured doorbell a name, which is much easier to remember than the ID.
These configuration settings are maintained in the user database, where we store the user chat ID and the configured doorbells.
When a ring event is reported to the master board, the master notifies the bot subsystem, which sends a notification to every user who has configured a doorbell with the board id of the event. The notification contains the photo and the classification of the person, computed as a majority vote over the available feedback.
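The two decisions taken at this point, who gets notified and which classification is shown, can be sketched in a few lines. This is a simplified illustration: the user database is modelled as a plain dictionary, the label strings are made up, and returning `None` on a tie is our own assumption, since a tie-breaking rule is not specified here.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional


def majority_vote(labels: Iterable[str]) -> Optional[str]:
    """Return the most common label, or None when there is no feedback or a tie."""
    counts = Counter(labels)
    if not counts:
        return None
    ranked = counts.most_common(2)
    if len(ranked) == 2 and ranked[0][1] == ranked[1][1]:
        return None  # assumed behaviour: a tie gives no classification
    return ranked[0][0]


def recipients(users: Dict[int, Dict[str, str]], board_id: str) -> List[int]:
    """Return the chat ids of all users who configured the given doorbell.

    `users` maps chat id -> {board id: user-chosen doorbell name}.
    """
    return sorted(chat_id for chat_id, boards in users.items() if board_id in boards)


users = {
    1: {"ab12": "Front door"},
    2: {"cd34": "Garage"},
    3: {"ab12": "Lobby", "cd34": "Back door"},
}
print(recipients(users, "ab12"))
print(majority_vote(["trusted", "untrusted", "untrusted"]))
```

A ring on board `ab12` would therefore reach users 1 and 3, each seeing the doorbell under the name they chose for it.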
The user can then see the event on their phone. If the system has located the person's face, the user also sees the person's classification, when previous feedback is available, and can give their own classification.
When a classification is given, the bot subsystem publishes a message on an MQTT topic to which all slaves are subscribed, containing the chat id, the feature vector we use to identify the person and the classification given by the user.
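To make the content of that MQTT message concrete, here is one possible way to serialize it. The JSON layout and the field names are assumptions for illustration only; the actual wire format may differ.

```python
import json
from typing import List, Tuple


def encode_feedback(chat_id: int, encoding: List[float], label: str) -> bytes:
    """Pack a feedback message into a JSON payload suitable for MQTT."""
    message = {"chat_id": chat_id, "encoding": encoding, "label": label}
    return json.dumps(message).encode("utf-8")


def decode_feedback(payload: bytes) -> Tuple[int, List[float], str]:
    """Unpack a payload produced by encode_feedback."""
    message = json.loads(payload.decode("utf-8"))
    return (
        int(message["chat_id"]),
        [float(value) for value in message["encoding"]],
        str(message["label"]),
    )


payload = encode_feedback(42, [0.25, -0.5, 1.0], "untrusted")
print(decode_feedback(payload))
```

With the Paho client, the bot side would pass such a payload to `client.publish(topic, payload)`, and each slave would decode `msg.payload` inside its `on_message` callback; the topic name is left open here because it is not specified above.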
Face recognition library
face_recognition is an interface to dlib's state-of-the-art face recognition library that provides several face location algorithms and pre-trained vectorization models.
We use the HOG model (O(#pixels)) to locate faces instead of dlib's CNN model: HOG is less accurate but also less CPU intensive.
To compute encodings, we use the pre-trained model, which is based on ResNet, a deep CNN.
We use a threshold of 0.6 on the Euclidean distance between two encodings to decide whether they refer to the same person. This value is a bit high to scale well, but it is fine for a condominium.
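The threshold test itself is simple enough to show in plain Python. The sketch below mirrors the idea (two encodings match when their Euclidean distance is at most 0.6) without depending on dlib; the helper names are ours, and the toy vectors are 2-dimensional, whereas real face encodings have 128 dimensions.

```python
import math
from typing import List, Optional, Sequence

THRESHOLD = 0.6  # maximum distance for two encodings to count as the same person


def same_person(a: Sequence[float], b: Sequence[float], threshold: float = THRESHOLD) -> bool:
    """Return True when the Euclidean distance between a and b is within the threshold."""
    return math.dist(a, b) <= threshold


def find_known(
    encoding: Sequence[float],
    known: List[Sequence[float]],
    threshold: float = THRESHOLD,
) -> Optional[int]:
    """Return the index of the closest known encoding within the threshold, else None."""
    best_index, best_distance = None, threshold
    for index, candidate in enumerate(known):
        distance = math.dist(encoding, candidate)
        if distance <= best_distance:
            best_index, best_distance = index, distance
    return best_index


known = [[0.0, 0.0], [1.0, 1.0]]
print(find_known([0.9, 1.0], known))  # closest to the second known encoding
print(find_known([5.0, 5.0], known))  # too far from everything
```

Picking the closest match rather than the first one under the threshold is a design choice of this sketch; with a threshold as permissive as 0.6 it helps when two known people have similar encodings.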
Mosquitto and Paho
We chose Eclipse Mosquitto, an open-source implementation of an MQTT broker, as our broker.
Mosquitto is lightweight and suitable for devices with a low CPU clock.
We chose not to persist messages, so as not to waste disk space.
The MQTT client we opted for is the Eclipse Paho project, which provides open-source client implementations of MQTT and MQTT-SN messaging protocols for multiple programming languages (Python included).
On the Raspberry Pi 3 the multiprocess execution of dlib causes a deadlock, so we had to force the execution of the dlib library on a single thread with the following workaround, which is already included in our setup script:
export OPENBLAS_NUM_THREADS=1
export OPENBLAS_MAIN_FREE=1
Flask is a microframework for Python based on Werkzeug and Jinja 2.
We use it to host the master board web server since it is an extensible framework which can be easily tailored for several purposes.
For the basic needs of the project, we use only the Flask core module and a few basic extensions, which keeps our web server lightweight and fast.
To interact with Telegram’s APIs (over HTTP) in the bot subsystem, we opted for a community-made wrapper named python-telegram-bot.
The wrapper implements a wide range of functionality, from sending messages to conversation handling, while preventing synchronization issues.
It has a small memory footprint and low CPU utilization because it uses webhooks instead of polling Telegram’s servers.
- Deploy the application:
Grab the code in master_scripts and run install.py to set up the master board.
It works both on a laptop with Ubuntu and on a Raspberry Pi with Raspbian. The script will ask you for a Telegram API key.
If you want to add one or more slaves, you can do it by using the code in slave_scripts. On Raspbian the services are also added to systemd and started at boot.
- Set up the board on Telegram:
Start the bot and send the /configure command to name and add a board.
- Use the doorbell:
Press the button on the doorbell to receive a notification and leave feedback. Once the feedback has been given, the system will use it for future predictions.