This article is a cookbook for creating a monitoring system for connected embedded systems. We used it to build one that can monitor all types of devices (even those that run bare-metal or an RTOS), and we thought it’d make sense to help others build similar systems. This cookbook can be used to monitor device operational data as well as your actual application data. After deploying an IoT system, you need to make sure everything is working as it should and close a feedback loop from the field. We’re about to show you how to build one using open source tools, almost for free. We also have a post that outlines the need for such a system (Check It Out).
Let’s focus on three things:
- Device and gateway code that streams data to a centralized backend.
- Backend and database to process and store the data.
- Frontend to display the data.
Once these three building blocks are in place, acting upon the monitored data and connecting it to your operation processes is relatively easy.
Device and Gateway Logging Tool
When thinking about a logging utility hosted on your end devices and gateways, the requirements that come up are:
- It must have a small footprint.
- It must never crash the device.
- It should consume little power.
- It must not disrupt the application code.
Your devices run your code. It’s challenging to generalize this topic, as a lot depends on the communication channel used to connect to the internet. We open sourced a logging tool that logs events from your end devices and sends them to your gateway/backend. If you are using an nRF52 with BLE GATT and a gateway, we also provide a sample that works out of the box. Go to our GitHub to get started.
1. jumper-ulogger: a logging framework with a sub-500-byte memory footprint that you integrate into your end device (with porting examples for nRF52 and CC3200).
2. jumper-logging-agent: the service that runs on the gateway.
3. jumper-ble-logger: an example logging agent service for a BLE gateway.
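To give a feel for the kind of compact event framing a sub-500-byte logger might use on the wire, here is a minimal gateway-side decoder. This is a hypothetical format for illustration, not jumper-ulogger’s actual protocol:

```python
import struct

# Hypothetical compact event: 1-byte event ID, 4-byte little-endian
# timestamp (seconds since boot), 2-byte payload value = 7 bytes total.
EVENT_FORMAT = "<BIH"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)  # 7 bytes

def decode_event(raw: bytes) -> dict:
    """Decode one binary log event received from the end device."""
    event_id, timestamp, value = struct.unpack(EVENT_FORMAT, raw[:EVENT_SIZE])
    return {"event_id": event_id, "timestamp": timestamp, "value": value}
```

Keeping events fixed-size and binary like this is what makes a tiny device-side footprint possible; the gateway does the translation to JSON before forwarding to the backend.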
If you have any issues or questions, send us a note at firstname.lastname@example.org.
Backend and Database
This is where you should pay close attention, because there’s some good stuff here. I recommend the following tools to make this work:
- A key-value database to store a current snapshot of each device’s state (we tested Google’s Firebase and Amazon’s DynamoDB). This is your device shadow.
- A time-series database to store historical device data and to query data for reports and graphs (we tested InfluxDB).
- A serverless infrastructure of choice for the minimal required processing (we used AWS Lambda and Google Cloud Functions via the Serverless Framework).
Let’s break it down with an example. Say you’re managing a fleet of 10,000 devices and you want to monitor the battery life and signal strength of each device. If you’re not sensitive to bandwidth and power consumption, you might opt to send the device data to the backend as frequently as possible. A data message to the server will look like this: (time, battery-level, signal-strength). If you’re concerned with bandwidth, which we found to be very common, you might split the messages between battery life and signal strength to avoid sending duplicate data to the backend. You might not even send the time of the event, since you can automatically tag the time on the backend.
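To make the two strategies concrete, here is a minimal sketch of the payloads (field names are illustrative):

```python
import json

# Strategy 1: every report carries the complete state.
full_message = {"time": 1500000000, "battery_level": 87, "signal_strength": -70}

# Strategy 2: each report carries only the field that changed;
# the backend tags the arrival time itself, so no "time" field is sent.
battery_message = {"battery_level": 87}
signal_message = {"signal_strength": -70}

# The partial messages are smaller on the wire, which is the whole point.
savings = len(json.dumps(full_message)) - len(json.dumps(battery_message))
```

Over 10,000 devices reporting frequently, those few dozen bytes per message add up, especially on constrained links like BLE or cellular.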
In the first case, where you send all the data from the device to the backend all the time, you don’t need the key-value DB. You simply store the message directly into the time-series database.
In the second case, you’ll be sending partial messages containing a subset of the data you’re monitoring. That’s where the key-value DB kicks in. The key-value DB keeps an up-to-date reflection of your device state. Every time a partial message is received, its data is written to the key-value DB. The updated device state is then fetched from the key-value DB and stored in the time-series database. We recommend this architecture because it’s more likely to scale than using a time-series DB alone. Time-series DBs are really good at inserting data and handling complex queries, but they don’t scale as well when it comes to performing lots of small queries. Key-value DBs, on the other hand, are really great at that.
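The merge-then-append flow can be sketched in a few lines, with in-memory dicts standing in for the key-value and time-series stores:

```python
import time

# In-memory stand-ins for the key-value DB (device shadow) and the
# time-series DB (append-only history).
shadow = {}    # device_id -> latest known state
history = []   # list of (timestamp, device_id, full state snapshot)

def handle_partial_message(device_id: str, partial: dict) -> dict:
    """Merge a partial message into the shadow, then append the
    full up-to-date state to the time-series store."""
    state = shadow.setdefault(device_id, {})
    state.update(partial)              # 1. update the device shadow
    snapshot = dict(state)             # 2. fetch the merged state
    history.append((time.time(), device_id, snapshot))  # 3. store history
    return snapshot
```

Note that every history row is a complete snapshot, so queries against the time-series DB never need to reassemble state from partial rows.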
We ended up using Google Firebase for the key-value store, for multiple reasons. We had already built our entire application engine, authentication, and DB on top of it, so using it for the device data was an easy choice.
As for the time-series DB, we ended up using InfluxDB, hosted on InfluxData’s servers for $160 a month. Not super cheap, but it scales to as many as 10,000 devices (depending on data load). That’s less than $0.25 per device per year. The downside of InfluxDB is that you have to be really careful about which data fields you define as indexes. The DB can easily suffer from performance issues due to over-indexing.
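In InfluxDB terms, tags are indexed and fields are not, so the rule of thumb is to tag only the low-cardinality identifiers you actually filter by (device ID, firmware version) and keep the measured values as fields. A sketch of building a line-protocol point by hand (measurement and key names are illustrative):

```python
def to_line_protocol(measurement: str, tags: dict, fields: dict,
                     timestamp_ns: int) -> str:
    """Build one InfluxDB line-protocol point.
    Tags are indexed -- keep them low-cardinality. Fields are not
    indexed; bare numeric field values are parsed as floats by InfluxDB."""
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_str = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_str} {field_str} {timestamp_ns}"

line = to_line_protocol(
    "device_health",
    tags={"device_id": "dev-0042"},
    fields={"battery_level": 87, "signal_strength": -70},
    timestamp_ns=1500000000000000000,
)
```

Putting a raw measurement like battery level into a tag is exactly the over-indexing trap described above: every distinct value creates a new series, and series cardinality is what kills InfluxDB performance.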
We also tried Keen.io. Keen is superior when it comes to querying data. The downside is pricing. I should point out that we had several pricing discussions due to some mistakes we made when inserting data into Keen, and we found their support team to be very responsive and fair.
The glue that puts everything together is the backend code used for authentication and DB storage. We’re very experienced when it comes to scaling backend systems, and our recommendation is simple: don’t build backend systems. Use serverless options. We designed a simple micro-service that receives a message from a device, stores the up-to-date device state in the key-value DB, and stores the full data in the time-series DB. Yes, it’s more expensive than managing your own infrastructure. But the amount of time you’ll invest in maintaining servers once you reach even 3,000 devices will offset the cost in favor of a serverless architecture.
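A sketch of that micro-service as an AWS-Lambda-style Python handler. The two store clients are injected as simple stand-ins here; in production they would be a Firebase/DynamoDB client and an InfluxDB client, whose APIs are not shown:

```python
import json
import time

def make_handler(kv_store: dict, ts_store: list):
    """Build a Lambda-style handler closed over the two stores.
    kv_store: dict-like device shadow; ts_store: append-only history."""
    def handler(event, context=None):
        body = json.loads(event["body"])       # message from the device
        device_id = body.pop("device_id")
        state = kv_store.setdefault(device_id, {})
        state.update(body)                     # keep the shadow current
        ts_store.append({"time": time.time(), "device_id": device_id, **state})
        return {"statusCode": 200, "body": json.dumps(state)}
    return handler
```

The handler itself holds no state, which is what makes the serverless model work: all state lives in the two managed databases, and the function can scale out (or cold-start) freely.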
Frontend
No need to reinvent the wheel here either. Keen provides a frontend for generating graphs and embedding them into other websites. We used Grafana as our frontend platform, as it comes bundled with InfluxDB’s $160/month plan. Grafana also connects to another solid choice for your longer-term DB: Elasticsearch.
Detailed System Diagram
Opting for managed and serverless options helped us create a system that can handle a load of up to 10,000 devices. Scaling upward from there only requires adding more database instances. If you’re looking for an even easier choice, go with the version of Elasticsearch that’s provided as a service by Amazon.
Read the original version of the article here.
This article was written by Jonathan Seroussi, the co-founder and CEO at Jumper Labs. Jonathan was a co-founder and CTO at VisualTao, an online CAD company which went on to be acquired by Autodesk, Inc.