Providing real-time data comes with some very real challenges, chief among them making sure that the data stays relevant and truly real-time. Over time, real-time air quality data at BreezoMeter has improved, thanks to new processes developed by our engineering R&D team.
Air pollution monitoring station data is collected from multiple sources around the globe, but sometimes this data is released late by its original source, or the source is offline for reasons completely out of our control. Processing the data also takes time, and we want to eliminate the loss in accuracy that these delays cause for our customers and their end users. It is also worth noting that monitoring station delays are not consistent over time and vary between data sources. Some of the air pollution dispersion and meteorological models incorporated into our system already use prediction methods to keep data as real-time as possible, so we want to make sure that all of our data layers, including air pollution monitoring station data, can be assigned the same ‘timestamp’ before being processed by our algorithms. Synchronizing data sources in time is important for understanding the air quality situation.
Synchronizing all of these sources is an enormous challenge when trying to represent the data spatially in real time. The synchronization makes the result more reliable for important decisions about minimizing exposure to the air pollution around us.
We have always worked with the most up-to-date data available and processed it as quickly as possible, but there is an inherent lag time that we were determined to minimize. Most importantly, we didn’t want our output to be affected by any delay in the availability of the source data.
Given the complexity of reporting air quality data in real time, BreezoMeter’s R&D team has been working on models that address these challenges and provide an even more accurate, real-time output for our customers and end users.
How do we provide data that is accurate for now, when data sources provide inputs with various delays?
As a big-data company whose services are driven by machine learning and models, we knew that we had a big challenge at hand, but one for which we could engineer a solution.
Meet Catchup.
It’s how BreezoMeter meets the challenge of real-time data reporting. One-of-a-kind, it’s faster, more predictive, and has fewer delays.
The Catchup project was initiated by BreezoMeter’s environmental and software engineers to address the challenges of providing data that is both real-time and location-based. Location matters because air pollution is so dynamic, changing from street to street and hour to hour.
This blog post will walk you through the basics of this predictive model, while a follow-up post will discuss how we are continuously checking our accuracy, including with the Catchup project running.
The Stages of Catchup
There are two main steps in the Catchup process: we have called them Learn and Predict.
LEARN
Learn is an offline process that takes place at regular intervals. Its main goal is to create a “black-box” function that will ultimately produce predictions of air pollutant concentrations. In practice, the Learn process involves “learning” the patterns in a large chunk of data covering several months, including air pollution measurements, meteorological data, and other relevant variables. The patterns found are expressed as mathematical equations and “packed” into these “black-box” functions.
The Learn process is done separately for each and every sensor in each station with its unique criteria. This way we get hundreds of thousands of “black-boxes,” or prediction functions.
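To make the idea of one prediction function per sensor concrete, here is a minimal sketch in Python. It is not BreezoMeter's actual model: it assumes each sensor's history is a plain list of hourly readings and fits a simple least-squares line that estimates the current value from a reading `lag` hours old. The names `fit_lag_model`, `learn`, and `SensorKey` are hypothetical, and a real Learn step would also use meteorological data and other variables.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical sensor identifier: (station id, pollutant name).
SensorKey = Tuple[str, str]
PredictFn = Callable[[float], float]


def fit_lag_model(series: List[float], lag: int) -> PredictFn:
    """Fit y[t] = a * y[t - lag] + b by ordinary least squares."""
    xs = series[:-lag]   # delayed readings
    ys = series[lag:]    # readings `lag` hours later
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # A constant series has zero variance; fall back to predicting the mean.
    a = cov_xy / var_x if var_x else 0.0
    b = mean_y - a * mean_x
    return lambda delayed_value: a * delayed_value + b


def learn(history: Dict[SensorKey, List[float]], lag: int = 1) -> Dict[SensorKey, PredictFn]:
    """Build one "black-box" function per sensor that has enough history."""
    return {
        key: fit_lag_model(series, lag)
        for key, series in history.items()
        if len(series) > lag + 1
    }
```

Calling `learn` on a dictionary of sensor histories returns a dictionary of the same keys mapped to callables, so each sensor keeps its own fitted parameters.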
The ‘black boxes’ that are created during the Learn phase are put to use during the next stage of Catchup, called Predict.
PREDICT
The Predict phase is one of the steps that precede the spatial algorithms. It synchronizes the timestamps of all data points that will be fed into the algorithm to the time at which the algorithm is set to finish running.
The algorithm itself is what enables BreezoMeter to plot the data points on the map and provide location-based data down to the city block resolution.
A look at the steps involved:
- Each new air pollution measurement goes through a QA process and is saved in the database.
- When it is time to calculate an updated real-time map, the prediction process starts.
- All relevant measurements are pulled out of the database and fed into the Catchup prediction functions – the “black boxes” we mentioned before.
- The prediction functions output new concentration values, which are the most updated data points.
- These are fed into the next stages of our algorithm to produce the real-time map.
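The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production pipeline: the `Measurement` class, the `catch_up` function, and the shape of the prediction functions (taking the last measured value and the delay in hours) are all hypothetical, and the "database" is just a list of measurements that already passed QA.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Tuple

# Hypothetical sensor identifier: (station id, pollutant name).
SensorKey = Tuple[str, str]
# Hypothetical "black box": (last measured value, delay in hours) -> estimate.
PredictFn = Callable[[float, float], float]


@dataclass
class Measurement:
    sensor: SensorKey
    value: float
    measured_at: datetime


def latest_per_sensor(measurements: List[Measurement]) -> Dict[SensorKey, Measurement]:
    """Keep only the most recent measurement for each sensor."""
    latest: Dict[SensorKey, Measurement] = {}
    for m in measurements:
        current = latest.get(m.sensor)
        if current is None or m.measured_at > current.measured_at:
            latest[m.sensor] = m
    return latest


def catch_up(
    measurements: List[Measurement],
    models: Dict[SensorKey, PredictFn],
    target_time: datetime,
) -> Dict[SensorKey, float]:
    """Bring every sensor's latest value forward to the same target timestamp."""
    synced: Dict[SensorKey, float] = {}
    for sensor, m in latest_per_sensor(measurements).items():
        model = models.get(sensor)
        if model is None:
            continue  # no prediction function learned for this sensor
        delay_hours = (target_time - m.measured_at).total_seconds() / 3600
        synced[sensor] = model(m.value, delay_hours)
    return synced
```

The returned dictionary holds one value per sensor, all aligned to `target_time`, which is the form the later spatial stages would need as input.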
We have very strict thresholds for accuracy. Once the system sees that a data point is delayed by more than a certain amount, the point is automatically disqualified. If the data point is delayed but still falls under the threshold, its value is compared with other models to determine its accuracy. If the accuracy score is too low, the data point is disqualified and a value from a different model is used in its place.
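A minimal sketch of this decision logic might look like the following. The threshold values, the scoring formula, and the function names are assumptions made for illustration only; the actual thresholds and accuracy measure are not given here. In this sketch a point that is too delayed yields `None`, and a point with a low score is replaced by the value from another model.

```python
from typing import Optional

MAX_DELAY_HOURS = 6.0      # hypothetical delay threshold
MIN_ACCURACY_SCORE = 0.8   # hypothetical minimum accuracy score


def accuracy_score(predicted: float, reference: float) -> float:
    """Hypothetical score in [0, 1]: one minus the relative difference."""
    scale = max(abs(predicted), abs(reference), 1e-9)
    return max(0.0, 1.0 - abs(predicted - reference) / scale)


def select_value(predicted: float, delay_hours: float, reference: float) -> Optional[float]:
    """Return the value to use for a data point, or None if it is disqualified outright.

    `reference` stands in for the value produced by a different model.
    """
    if delay_hours > MAX_DELAY_HOURS:
        return None  # delayed beyond the threshold: disqualified automatically
    if accuracy_score(predicted, reference) < MIN_ACCURACY_SCORE:
        return reference  # score too low: use the other model's value instead
    return predicted
```

Keeping the delay check ahead of the accuracy check mirrors the order described above: the comparison with other models only happens for points that are still within the delay threshold.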
Wrapping up Catchup
The data we use originates from well-established and reliable sources, and it is always the most up-to-date available. The difference BreezoMeter provides comes from what we are able to do with the data before passing it on to our customers via our air quality API. Using the prediction methods described above to compensate for delays in availability and processing, while continuously measuring accuracy, is how we report the most real-time air quality data possible, ready to be integrated into technologies, apps, and devices worldwide. This is just one of the ways that BreezoMeter offers added value to our customers and ultimately to their end users. Other distinguishing characteristics of the air quality data we provide via API are the integration of traffic data and local weather data, the use of enhanced interpolation models, and continuous quality assurance.
Talk to an air quality specialist at BreezoMeter to learn how integrating outdoor air quality data into your product can give your customers the added value and better health opportunities they want.
Ms. Shaked Friedman has a B.Sc from the Technion, Israel Institute of Technology, and has been an Environmental Engineer in the Research and Development Team at BreezoMeter since 2014.