In this edition, Dan Wright and Clayton Yochum from Everactive join us to share how they’re bringing analytics and real-time device monitoring to scenarios and places never before possible. Learn how they’ve set up their data stack, their database evaluation criteria, their advice for fellow developers, and more.
Everactive combines battery-free, self-powered sensors and powerful cloud analytics to provide end-to-end Industrial IoT (IIoT) solutions to our customers (read more about Industrial IoT). Our low-power semiconductor and networking technology is the foundation for our self-powered, always-on wireless sensors. We design and build our sensors in-house (down to the chip); they’re ruggedized for harsh settings and can operate indefinitely on low levels of energy harvested from heat or light. This means our customers’ devices can continuously stream asset health data despite the radio interference and physical obstacles – like equipment, ducts, and pipes – common in industrial settings.
Since they charge themselves, these sensors stay operational well beyond what’s possible with traditional, battery-powered IIoT devices. We ingest data from thousands of sensors into TimescaleDB, then surface it to our customers through dashboards, charts, and automated alerts.
Our initial products are designed to monitor steam systems, which are used in various industries and applications, like process manufacturing, chemical processing, and district energy, as well as a range of rotating equipment, such as motors, pumps, fans, and compressors. Currently, we serve large, Fortune 500 manufacturers in many sectors, including Food & Beverage, Consumer Packaged Goods, Chemical Process Industries, Pharmaceuticals, Pulp & Paper, and Facilities Management.
We show customers their data through a web-based dashboard, and we also have internal applications to help our in-house domain experts review and label customer data to improve our automated failure detection.
About the team
We’re a small team of software and data engineers, spanning the Cloud and Data Science teams at Everactive.
Between us, we’ve got several decades of experience managing databases, pipelines, APIs, and various other bits of backend infrastructure.
About the project
Our key differentiator is that our sensors are batteryless: the custom low-power silicon means that they can be put in more places, without requiring servicing for well over a decade.
In turn, this means we can monitor factory equipment that was formerly cost-prohibitive to instrument, given the difficulty and expense of servicing batteries. Being able to collect data economically from more equipment also means our industrial data streams are more detailed and cover more equipment than our competitors’.
Today, customers place our sensors on steam traps and motors, and we capture a range of metrics – from simple ones, like temperature, to more complex ones, like 3D vibrational data. (You can learn more about steam trap systems and the need for batteryless systems in this overview video.)
We then use this data to inform our customers about the health of their industrial systems, so they can take action when and where required. “Action” here could mean replacing a steam trap, replacing a bad bearing in a machine, or other corrective maintenance.
For example, we’ll automatically alert customers if their monitored equipment has failed or if machines are off when they should be on, so customers can send a crew to fix the failure, or power on the machine remotely.
In addition to receiving alerts from us, customers can use our dashboards to check the latest data and current status of their equipment at any time.
As mentioned earlier, our team’s responsible for delivering these intuitive visualizations to our customers and in-house domain experts – as well as for feeding sensor metrics into our custom analytics to automate failure detection and improve our algorithms.
Before TimescaleDB, we stored metadata in PostgreSQL, and our sensor data in OpenTSDB. Over time, OpenTSDB became an increasingly slow and brittle system.
Our data is very well-suited to traditional relational database models: we collect dozens of metrics in one packet of data, so it makes sense to store those together. Other time-series databases would force us either to bundle metrics into JSON blobs (making the data hard to work with inside the database) or to store every metric separately (forcing heavy, slow joins for most queries of interest).
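For illustration, a wide-row layout of the kind described could look like this (the table and column names are hypothetical, not Everactive’s actual schema):

```sql
-- One row per sensor packet, with each metric as a typed column
-- (names are illustrative only).
CREATE TABLE packets (
    time        TIMESTAMPTZ      NOT NULL,
    sensor_id   INTEGER          NOT NULL,
    temperature DOUBLE PRECISION,
    pressure    DOUBLE PRECISION,
    vibration_x DOUBLE PRECISION,
    vibration_y DOUBLE PRECISION,
    vibration_z DOUBLE PRECISION
);

-- Turn the plain table into a TimescaleDB hypertable, partitioned on time.
SELECT create_hypertable('packets', 'time');
```

Because every metric from a packet lands in one row, a question like “temperature versus vibration for one sensor” is a single-table scan – no joins and no JSON unpacking.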
TimescaleDB was an easy choice because it let us double down on Postgres, which we already loved using for metadata about our packet streams. We looked briefly at competitors like InfluxDB, but stopped considering them once it was clear TimescaleDB would exceed our needs.
Our evaluation criteria were pretty simple: will it handle our load requirements, and can we understand how to use it? The former was easy to test empirically, and the latter was essentially “free,” as TimescaleDB is “just” a Postgres extension.
This “just Postgres” concept also lets us carefully manage our schema as code, testing and automating changes through CI/CD pipelines. We use sqitch, but popular alternatives include Flyway and Liquibase. We like sqitch because it encourages us to write tests for each migration, and it is lightweight (no JVM).
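As a sketch of what this looks like in practice: a sqitch change is a trio of plain SQL scripts – deploy, revert, and verify (the change and object names below are hypothetical):

```sql
-- deploy/add_sensor_label.sql
BEGIN;
ALTER TABLE metadata.sensors ADD COLUMN label TEXT;
COMMIT;

-- revert/add_sensor_label.sql
BEGIN;
ALTER TABLE metadata.sensors DROP COLUMN label;
COMMIT;

-- verify/add_sensor_label.sql
-- Errors (and fails verification) if the column doesn't exist;
-- WHERE FALSE means no rows are actually read.
SELECT label FROM metadata.sensors WHERE FALSE;
```

Running `sqitch deploy --verify` applies pending changes in order and runs each verify script as it goes, so a broken migration is caught at deploy time rather than discovered downstream.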
We previously used Alembic, the migration component of the popular SQLAlchemy Python ORM, but as our TimescaleDB database grew to support many clients, it made less sense to tie our schema management to any one of them.
We maintain a layer of abstraction within TimescaleDB by separating internal and external schemas.
Our data is stored as (hyper)tables in internal schemas like “packets” and “metadata,” but we expose them to clients through an “API” schema containing only views, functions, and procedures. This lets us refactor our data layout while minimizing interruption to downstream systems by maintaining an API contract. This is a well-known pattern in the relational database world – yet another advantage of TimescaleDB being “simply” a Postgres extension.
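A minimal sketch of the pattern (schema, table, view, and role names are all hypothetical):

```sql
-- Public-facing schema: views only, no tables.
CREATE SCHEMA api;

-- Expose a stable shape over the internal layout.
CREATE VIEW api.sensor_readings AS
SELECT time, sensor_id, temperature
FROM packets.readings;

-- Clients get access to the view, never the underlying hypertable.
GRANT USAGE  ON SCHEMA api                TO app_user;
GRANT SELECT ON        api.sensor_readings TO app_user;
```

If the internal table is later split, renamed, or recolumned, only the view definition changes; clients keep querying `api.sensor_readings` unchanged.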
Current deployment & future plans
TimescaleDB is clearly much faster than our previous OpenTSDB system...in OpenTSDB, [one common query for recent data] required nearly 10 minutes to load and our first TimescaleDB deployment brought that down to around 7 seconds.
We use Timescale Cloud and love it. We already used Postgres on AWS RDS and didn’t want to have to manage our own database (OpenTSDB convinced us of that!).
It had become normal for OpenTSDB to crash multiple times per week from users asking for slightly too much data at once. TimescaleDB is clearly much faster than our previous OpenTSDB system. More importantly, nobody has ever crashed it.
One not-very-carefully-benchmarked but huge performance increase we’ve seen?
We have a front-end view that requires the last data point from all sensors: in OpenTSDB, it required nearly 10 minutes to load (due to hard-to-fix tail latencies in HBase), and our first TimescaleDB deployment brought that down to around 7 seconds. Further improvements to our schema and access patterns have brought these queries into the sub-second range.
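A “latest row per device” query of that shape can be written in plain Postgres; one common form (against the hypothetical `packets.readings` table used above) pairs `DISTINCT ON` with a matching index:

```sql
-- Supporting index: lets the planner walk sensors in order and
-- stop at the newest row for each one.
CREATE INDEX ON packets.readings (sensor_id, time DESC);

-- Latest data point for every sensor.
SELECT DISTINCT ON (sensor_id)
       sensor_id, time, temperature
FROM   packets.readings
ORDER  BY sensor_id, time DESC;
```

Recent TimescaleDB versions can also accelerate exactly this pattern with the SkipScan optimization, which is one way such last-point queries reach sub-second latency.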
✨ Editor’s Note: For more comparisons and benchmarks, see how TimescaleDB compares to InfluxDB, MongoDB, AWS Timestream, and other time-series database alternatives. To learn more and try Timescale Cloud yourself, see our step-by-step Timescale Cloud tutorial.
Timescale Cloud has been so good for us that it’s triggered a wave of transitions to managed solutions for other parts of our stack. We’ve recently moved our AWS RDS data into Timescale Cloud to further simplify our data infrastructure and make it easier and faster to work with our data.
As you’ll see in the diagram below, our sensors don’t talk directly to TimescaleDB; they pass packets of measurements to gateways via our proprietary wireless protocol. From there, we use MQTT to send those packets to our cloud.
From our cloud data brokers, Apache NiFi processes and routes packets into TimescaleDB (and Timescale Cloud), and our TimescaleDB database powers our dashboard and analytics tools.
We don’t take full advantage of TimescaleDB yet. It’s been so much better out-of-the-box than what we had before that we haven’t bothered to do much optimization: no space partitions, no compression, no continuous aggregates. Compression is on our to-do list; we’re installing larger and larger sensor fleets for new customers, and all of these packets are consuming a lot of space.
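For reference, enabling TimescaleDB’s native compression is only a couple of statements (table and column names here are hypothetical):

```sql
-- Enable compression, segmenting by sensor so per-device
-- queries against compressed chunks stay fast.
ALTER TABLE packets.readings SET (
    timescaledb.compress,
    timescaledb.compress_segmentby = 'sensor_id',
    timescaledb.compress_orderby   = 'time DESC'
);

-- Automatically compress chunks once they're older than seven days.
SELECT add_compression_policy('packets.readings', INTERVAL '7 days');
```

The segment-by column is the main tuning knob: rows sharing a `sensor_id` are compressed together, which both improves compression ratios and keeps single-device queries cheap.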
✨ Editor’s Note: For more information about the features Clayton and Dan mention, see how and why we built TimescaleDB compression, step-by-step continuous aggregates tutorial, and API documentation (straightforward information for adding space and time dimensions).
We’ll continue to innovate on our technology platform and expand Everactive’s product offerings: improving our sensors’ wireless range, lowering power requirements to increase energy-harvesting efficiency, integrating with additional sensors, and shrinking device form factor. These successive chip platform enhancements will let us monitor the condition of more and more assets, and we’re also developing a localization feature to identify where assets are deployed.
Ultimately, Everactive’s mission is to generate new, massive datasets from a wealth of currently un-digitized physical-world assets. Transforming that data into meaningful insights has the potential to fundamentally improve the way that we live our lives – impacting how we manage our workplaces, care for our environment, interact with our communities, and manage our own personal health.
Getting started advice & resources
If you’re evaluating your database options, two recommendations based on our experiences:
First, if you have enough time-series data that a general-purpose database won’t cut it (millions of rows), TimescaleDB should be your first choice. It’s easy to try out, the docs are great, and the community is very helpful.
Second, don’t underestimate the importance of using solutions that draw on a knowledge base shared by most back-end developers. The increase in team throughput and decrease in onboarding time afforded by TimescaleDB (everyone knows at least some SQL) in contrast to OpenTSDB (an esoteric system built on HBase) has been a huge advantage. We expected this to some degree, but experiencing it firsthand has confirmed its value.
Additionally, the schema-as-code tooling and the internal/external schema separation discussed above have been cornerstones of our success. We hadn’t been using these tools and patterns at Everactive previously, but have since seen them catch on in other projects and teams.