The Raspberry Pi is a great platform, but something that is often frowned upon is that it relies on an SD card for its system storage.
They get corrupted...
They fail...
They are bad...
First of all, this is because cheap, poor-quality SD cards are often used. They are great for getting started with your Pi, but if you are not just playing, you may want to invest in an industrial-grade SLC SD card, which will perform as well as any other storage alternative.
Then, filesystem or data corruption is often caused by external factors that have nothing to do with SD cards. For instance, an unstable power supply could cause these issues on any storage media... but on the Raspberry Pi the poor SD card is the only one to blame, so let's blame it...
SD cards are fine...
On the other hand, SD cards are a very convenient solution: small, easy to flash, easy to replace, easy to find... basically, they are just the best option for something like the Raspberry Pi. There, I said it...
What else is there?
Enter the Raspberry Pi Compute Module: the great idea of bringing the Raspberry Pi platform to a more easily embeddable format for industrial applications.
The standard version comes with an embedded eMMC Flash device which replaces the SD card. Then there is the "Lite" version which instead replaces the SD card with... well... nothing! In fact, the bus lines which normally connect the CPU to the SD card are simply routed to the DDR2 SODIMM connector (where all the GPIO pins are too) so that you can connect it to whatever storage media you prefer...
Wait, what?
And if you can connect it to one storage device, then maybe you can connect it to something that switches between more of them... and SD cards are so convenient... maybe, if you are still not convinced of the reliability of 1 SD card, would 2 do?
And even if you already are an SD card supporter, well, having 2 of them comes with great perks... I'll get to that soon.
More SD cards, please!
Introducing Strato Pi CM Duo!
The Strato Pi family encompasses a range of industrial servers based on the Raspberry Pi platform. Specifically, the Strato Pi CM series is based on the Compute Module, and its latest version is Strato Pi CM Duo.
Amongst all the other reliability and security features shared with the other modules in the Strato Pi family, its highlight is the support for 2 SD cards!
First of all, how does it work?
Basically, the SD bus lines of the Compute Module are connected to a high-speed switching matrix that can route the bus to 2 different microSD card holders. With simple commands from the Raspberry Pi you can then switch between them.
Now you are saying: "But if I switch the SD card while the system is running wouldn't it break everything?" Yes, of course it would... be patient...
Now you are thinking: "OK, anyway it would be much cooler if I could access both SD cards at the same time..." Again, yes it would... and you can!
More SD card access, please!
If you dig deeper into the Compute Module specs you'll find that, other than the primary SD interface from which the system boots, there is a secondary one which can also be connected to external storage.
So instead of leaving the second SD card hanging, Strato Pi CM Duo lets you connect it to the secondary interface and use it as extra external storage.
This secondary interface is also available on the Compute Module with eMMC memory, so here you can have extra storage on an SD card too, but not switch between them for booting :/
What am I supposed to do with all this now?
There are many things you can do with this double, swappable SD card configuration:
- OS/data storage separation: have your system files on your primary SD card and use the secondary one only for data storage, so if the secondary one (which would be the one you write to the most) fails, your system continues to work.
- System redundancy: have a plain copy of the system on both SD cards. If one fails you switch to the other.
- In-field full-system upgrades: after many years of continuous uptime with small run-time updates and patches, your system has reached a point where it needs a full upgrade that you simply can't perform while the system is running. No worries: just flash the SD on the secondary interface with the brand-new system image and, once it's done, reboot from it... and while you're at it, flash the once-primary SD too.
- Isn't all this enough? What else would you do?
OK, let's get our hands dirty. Here I'll show you how to configure Strato Pi CM Duo for system redundancy, i.e. we'll have 2 (almost) identical SD cards and we'll switch between them when one (pretends to) fails.
In the following, I'll be assuming you have some basic knowledge about accessing your Pi via SSH, editing files from the shell and some other basic shell commands. No more is required.
SD cards setup
First of all, get 2 microSD cards and flash them with your favourite flavour of Raspbian. I'll go with Raspbian Buster Lite.
I'm sure you know how to, but just in case, here is a useful link: https://www.raspberrypi.org/documentation/installation/
Since we won't have access to the Pi through keyboard and display, we need to enable SSH access: to this end, you simply need to add an empty file named "ssh" to the /boot/ partition of the SD cards.
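If you prefer doing this from a terminal on your computer, here is a minimal sketch. The mount point of the boot partition depends on your OS, so it is passed in as an argument; the paths in the comments are only typical examples, not something you can count on.

```shell
# Create the empty "ssh" marker file on a mounted boot partition.
# The mount point varies by OS (for example /media/$USER/boot on many
# Linux desktops or /Volumes/boot on macOS), so it is an argument.
enable_ssh() {
    boot_dir="$1"
    if [ ! -d "$boot_dir" ]; then
        echo "boot partition not found at: $boot_dir" >&2
        return 1
    fi
    touch "$boot_dir/ssh"
}

# Usage (hypothetical mount point; repeat for both cards):
#   enable_ssh "/media/$USER/boot"
```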
Now set up your Strato Pi CM Duo and place the 2 SD cards in their slots.
Power it up. The system will boot from SDA (the one on the bottom).
SSH into the system. You can either find the IP address assigned to the Pi or try using its default hostname "raspberrypi":
$ ssh pi@raspberrypi.local
Enter the password (default "raspberry") and you are in.
Strato Pi kernel module installation
The simplest way to control your Strato Pi CM Duo is to install its kernel module, which gives access to all its functionality via sysfs, i.e. by simply reading and writing (virtual) files.
You can find a step-by-step guide on its GitHub repo.
Make sure it is properly installed and automatically loaded at boot.
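A quick sanity check, sketched under the assumption that a loaded module exposes the same sysfs directories used by the commands in this post (led and watchdog under /sys/class/stratopi):

```shell
# Return 0 if the Strato Pi sysfs tree looks present.
# The default path is the one used throughout this post; the optional
# argument only exists so the check can be pointed elsewhere.
stratopi_ready() {
    sysfs_dir="${1:-/sys/class/stratopi}"
    [ -d "$sysfs_dir/led" ] && [ -d "$sysfs_dir/watchdog" ]
}

if stratopi_ready; then
    echo "Strato Pi sysfs interface found"
else
    echo "Strato Pi sysfs interface missing: is the module loaded?" >&2
fi
```

Run it again after a reboot to confirm the module is also loaded automatically at boot.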
Mission-critical application
Now, we'll create a simple service that starts at boot and simply blinks Strato's LED. This will be our never-to-be-stopped application.
Create a file called "unstoppable-led.sh" in /usr/local/bin/ with this content:
#!/bin/bash
# Toggle the LED once per second, forever.
while :
do
    echo F > /sys/class/stratopi/led/status
    sleep 1
done
Make it executable:
$ sudo chmod +x /usr/local/bin/unstoppable-led.sh
Try running it:
$ /usr/local/bin/unstoppable-led.sh
You'll see the L1 LED switch on and off every second. Stop it with Ctrl-C.
Now we create a systemd service that will call this script at boot and restart it if, for any reason, it terminates.
Create a file called "unstoppable-led.service" in /etc/systemd/system/ with this content:
[Unit]
Description=Unstoppable LED
[Service]
Type=simple
ExecStart=/usr/local/bin/unstoppable-led.sh
Restart=always
[Install]
WantedBy=multi-user.target
Enable the service:
$ sudo systemctl enable unstoppable-led
Reboot the system:
$ sudo reboot
When it restarts, the LED will keep blinking forever.
So far, if the process hangs or stops running, no one will notice and our LED won't be so unstoppable.
Let's bring Strato Pi's hardware watchdog functionality to the table.
Besides blinking the LED, we'll signal that our application is running by toggling the watchdog heartbeat line. Modify unstoppable-led.sh as follows:
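#!/bin/bash
# Set the watchdog timeout to 10 seconds.
echo 10 > /sys/class/stratopi/watchdog/timeout
while :
do
    # Toggle the LED and the watchdog heartbeat once per second.
    echo F > /sys/class/stratopi/led/status
    echo F > /sys/class/stratopi/watchdog/heartbeat
    sleep 1
done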
And restart the service:
$ sudo systemctl restart unstoppable-led
Let's configure the watchdog to be always enabled, reduce its control timings and save this configuration in Strato's MCU:
$ echo 10 > /sys/class/stratopi/watchdog/down_delay
$ echo 10 > /sys/class/stratopi/power/down_delay
$ echo A > /sys/class/stratopi/watchdog/enable_mode
$ echo S > /sys/class/stratopi/mcu/config
The watchdog timeout (/sys/class/stratopi/watchdog/timeout) is set in the script because, when the watchdog is in always-enabled mode, it is restored to 60 seconds after a power-cycle. (I won't go into the details of why this is done; if you are interested, you can read the docs.)
With this configuration, if the watchdog doesn't see a heartbeat for 10 seconds, it will wait 10 seconds and initiate the power-cycle, which in turn will wait 10 seconds more and then power off the Compute Module.
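To make the arithmetic explicit, here is a tiny helper (the function name is made up for this post) that simply adds the three delays described above: the watchdog timeout, the watchdog down delay and the power down delay.

```shell
# Seconds from the last heartbeat until the Compute Module is powered off:
# watchdog timeout + watchdog down_delay + power down_delay.
time_to_power_off() {
    echo $(( $1 + $2 + $3 ))
}

time_to_power_off 10 10 10   # values configured above: prints 30
time_to_power_off 60 10 10   # with the restored 60-second timeout: prints 80
```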
In a real scenario one would want to monitor the watchdog timeout warning line (/sys/class/stratopi/watchdog/expired) and, when it is set, attempt a clean system shutdown. Here we'll simulate a severe hang of the system that can only be fixed by a plain old off/on.
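For reference, such a monitor could be sketched as below. This is not something from the Strato Pi docs: it assumes the expired file reads 1 once the timeout has elapsed (check the module documentation for the actual values), and the function name and polling interval are made up.

```shell
# Poll the watchdog "expired" line and run a command once it reads 1.
# Assumption: the file contains 1 when the timeout has expired.
watch_expired() {
    expired_file="$1"   # e.g. /sys/class/stratopi/watchdog/expired
    interval="$2"       # seconds between polls
    shift 2             # remaining arguments: command to run on expiry
    while :; do
        if [ "$(cat "$expired_file" 2>/dev/null)" = "1" ]; then
            "$@"
            return
        fi
        sleep "$interval"
    done
}

# Usage on the Pi (as root):
#   watch_expired /sys/class/stratopi/watchdog/expired 1 shutdown -h now
```

You would run this as its own service, separate from the application feeding the heartbeat.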
So now let's see what happens when we stop our process:
$ sudo systemctl stop unstoppable-led
About 30 seconds after the LED stops blinking, Strato's watchdog will forcefully reset the Compute Module and restart it. After that, the LED will shine again.
This already looks like a more unstoppable LED, but what if our process stopped because our SD card got fried and the system can no longer boot from it?
Let's fix that!
Since Strato Pi has been reset, you have been disconnected, so SSH in again and log in.
We need to set up our secondary SD (SDB) as we did for our primary one. Let's disable the watchdog for a moment and configure Strato to boot from SDB by default:
$ echo D > /sys/class/stratopi/watchdog/enable_mode
$ echo B > /sys/class/stratopi/sd/sdx_default
$ echo S > /sys/class/stratopi/mcu/config
$ echo 1 > /sys/class/stratopi/power/down_enabled
The last command tells Strato to initiate a power-cycle immediately.
When it reboots, SSH into it again and you will find yourself in SDB-land! You can see this by noticing that the LED is not blinking.
Note: when you SSH into the Pi, you may get a warning about the host identification having changed. This is expected, since we are booting a different system that has different keys. If this is blocking you, you need to delete the corresponding entry from the "known hosts" list. How to do this depends on your OS; on Linux/Mac:
$ ssh-keygen -R <hostname or ip>
So, now you need to go back to the "Strato Pi kernel module installation" section and re-do everything for SDB until you have your LED blinking at boot.
You can skip the intermediate steps and just use the final configuration. One more thing: use the following "unstoppable-led.sh" script instead of the previous one, so that we can tell which SD we are booting from by the speed of the LED blinking:
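#!/bin/bash
# Set the watchdog timeout to 10 seconds.
echo 10 > /sys/class/stratopi/watchdog/timeout
while :
do
    # Same as on SDA, but with a faster blink so we can tell SDB apart.
    echo F > /sys/class/stratopi/led/status
    echo F > /sys/class/stratopi/watchdog/heartbeat
    sleep 0.2
done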
Please, don't get trapped in a loop: go back only once and stop before reaching this point ;).
Go!
OK, if you are back here it means that you have set up everything on SDB and your unstoppable LED is blinking fast.
Now, we configure the watchdog so that, if the heartbeat fails and the reset kicks in, it will automatically switch the SD card to boot from while the Pi is off (see, no switching while the system is running):
$ echo 1 > /sys/class/stratopi/watchdog/sd_switch
Let's make sure the watchdog is configured as always enabled:
$ echo A > /sys/class/stratopi/watchdog/enable_mode
$ echo S > /sys/class/stratopi/mcu/config
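Before pulling the plug, it may be worth reading the settings back. The sketch below assumes the attributes can be read as well as written (the module docs list what each one supports); the helper name is made up.

```shell
# Print "name=value" for each given attribute under a sysfs directory.
# Assumption: the attributes are readable; anything that is not is flagged.
show_attrs() {
    dir="$1"
    shift
    for attr in "$@"; do
        if [ -r "$dir/$attr" ]; then
            echo "$attr=$(cat "$dir/$attr")"
        else
            echo "$attr=<unreadable>"
        fi
    done
}

# Usage on the Pi:
#   show_attrs /sys/class/stratopi/watchdog enable_mode timeout down_delay sd_switch
```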
And let's see what happens if we stop the unstoppable:
$ sudo systemctl stop unstoppable-led
After 30 seconds the system will reboot from the other SD and you are back to your slow blinking, i.e. booting from SDA.
Log back in and stop the process again: Strato will reset and go back to fast blinking, i.e. booting from SDB.
You could even try to remove power from Strato Pi, remove SDB (which is now our primary SD) and see what happens when you start it again...
Strato will try to boot from SDB, but there is no SD card to boot from, so the watchdog will not receive its heartbeat signal. After 60 seconds (the restored timeout we discussed above) plus 20 more seconds of delays, Strato will reset and boot from SDA, and our LED will continue to shine...
Is this real life?
It is, but in real applications there are more things you would want to do to ensure the reliability of your system. For instance, you could update the heartbeat only if a series of health checks on your application and the file system pass. Then you could have a separate process monitoring the watchdog timeout line and attempting a clean shutdown before the reset triggers. And if your process is storing data, you might want to copy this data to the secondary SD too, so that when it becomes primary, it is up to date with the work done so far. And of course, if things start to fail you may want to signal it to someone so that, for instance, a broken SD card can be replaced before the other one fails too (which is NEVER going to happen! ;)).
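The first of those ideas, gating the heartbeat on health checks, could look roughly like this. It is only a sketch: the function name is made up, and the example checks (a write probe and a hypothetical my-app service) are placeholders for whatever "healthy" means in your application.

```shell
# Toggle the watchdog heartbeat only if every health check passes.
# Each check is a shell command string; a non-zero exit status means
# "unhealthy", in which case the heartbeat is skipped.
heartbeat_if_healthy() {
    heartbeat_file="$1"   # e.g. /sys/class/stratopi/watchdog/heartbeat
    shift                 # remaining arguments: one check command each
    for check in "$@"; do
        sh -c "$check" >/dev/null 2>&1 || return 1
    done
    echo F > "$heartbeat_file"
}

# Usage inside the main loop (placeholder checks):
#   heartbeat_if_healthy /sys/class/stratopi/watchdog/heartbeat \
#       "touch /var/tmp/.rw-probe" \
#       "systemctl is-active --quiet my-app"
```

If any check keeps failing, the heartbeat stops and the watchdog takes over, exactly as when we stopped the service by hand.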
That's it folks, I hope you enjoyed it!
Now go, and make your applications unstoppable!