# BreezoMeter’s Continuous Accuracy Testing for Reliable Air Quality Data

BreezoMeter’s Algorithm Team explains how the accuracy of our computational air quality model is continuously tested and examined for error. Learn how BreezoMeter continuously monitors its air quality data algorithms in order to ensure high levels of accuracy for individuals and businesses around the world.

## Why is it Important to Calculate and Monitor Air Quality Data Accuracy?

When delivering a product which is the result of a computational model, one of the first things that needs to be checked is the quality of the result – is it good enough for the end-users’ needs? In the case of air quality data – is it accurate enough to help people reduce their exposure to harmful air pollutants in the environment around them?

A suitable way to answer these questions is to test the accuracy of the model as a whole. Such a test typically produces a single number that can serve as an accuracy metric for the entire product.

For example, when looking at a weather temperature forecast of 20 degrees Celsius, it can be useful to know how likely it is that this number will be close to reality: is it usually wrong by 1 degree or by 15 degrees? The same principle goes for air quality data.

Knowing the performance of a model, and the performance of the different parts which make up the model, is very valuable – it can point out any weak points or data anomalies, and provide insights into what areas can be improved and how.

## How Can Air Quality Data be Checked for Accuracy?

Assessing the quality of a model’s results, also called model validation, is done by comparing them to data considered to be true. In the field of statistics, this “true data,” or ground truth, can be any set of data points that represents the objective of the model.

#### Cross-Validation

When the model describes a physical phenomenon, ground truth can be obtained through measurements: by going out into the field with equipment and measuring the phenomenon that the model was designed to calculate. When the ground-truth measurements themselves are part of the model’s input, another way to assess the results is cross validation.

Cross validation (CV) is a statistical technique for model validation in which a subset of the input data is left out, to later serve as the ground truth and be compared against the model’s results. This comparison can be a simple subtraction, which yields the model’s errors.

This process is typically done more than once, each time with a different part of the data left out, to account for the data’s variance (each data chunk is slightly different). The results of all comparisons – which are the errors – are then combined together, to best represent the overall situation. A possible way to combine the errors is by taking their average, where n is the number of repetitions:

$$Error = \frac{1}{n}\sum_{i=1}^{n}\left[(ground\;truth)_i - (model's\;result)_i\right]$$

There are several types of CV, named by the way the data is divided. For example, in k-fold cross validation, the input data is randomly partitioned into k equal-sized subsets (folds), and the model is run k times, each time with a different fold held out as the ground truth and the remaining k−1 folds used as the input.
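The k-fold procedure can be sketched in a few lines of Python. This is a minimal illustration, not BreezoMeter’s pipeline: the fold splitting is a simple stride-based partition, and the “model” is a toy stand-in that predicts the training mean.

```python
import random

def k_fold_errors(data, k, fit_predict):
    """Split `data` into k folds; each fold in turn serves as ground truth
    while the model uses the remaining folds. Returns all individual errors."""
    data = data[:]
    random.shuffle(data)
    folds = [data[i::k] for i in range(k)]
    errors = []
    for i in range(k):
        held_out = folds[i]  # this fold is the ground truth for this round
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        for truth in held_out:
            errors.append(truth - fit_predict(train))
    return errors

# Toy "model": predict the mean of the training data (illustrative only).
mean_model = lambda train: sum(train) / len(train)

data = [20.1, 19.8, 20.5, 21.0, 19.9, 20.3, 20.7, 20.0]
errs = k_fold_errors(data, k=4, fit_predict=mean_model)
mean_error = sum(errs) / len(errs)  # the averaged error from the formula above
```

Each data point is held out exactly once, so the number of errors equals the number of data points.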

#### Leave-One-Out Cross Validation Method

Another example is leave-one-out cross validation (LOOCV), in which a single data point is removed from the input each time the model is run. LOOCV is equivalent to k-fold CV with k equal to the number of data points.
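A minimal, self-contained LOOCV sketch, again with a toy mean-predictor standing in for a real model (illustrative assumptions only):

```python
def loocv_errors(data, fit_predict):
    """Leave one point out at a time; the held-out point is the ground truth."""
    errors = []
    for i, truth in enumerate(data):
        train = data[:i] + data[i + 1:]  # every point except the i-th
        errors.append(truth - fit_predict(train))
    return errors

mean_model = lambda train: sum(train) / len(train)  # toy stand-in model

data = [20.1, 19.8, 20.5, 21.0, 19.9]
errs = loocv_errors(data, mean_model)  # one error per data point
```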

## Detecting Extreme Values with Percentiles

“Mean” and “average” are widely known terms: the mean is simply the sum of a group of numbers divided by how many numbers are in the group.

“Percentile” and “median,” on the other hand, are used slightly less often in non-scientific contexts. A percentile is “the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found” (Wikipedia, 2018). The median is the 50th percentile – it is the value in the middle of the dataset.
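Python’s standard library can compute these statistics directly. A brief sketch with hypothetical error values (not real BreezoMeter data), showing how an outlier pulls the mean above the median:

```python
import statistics

# Hypothetical model errors; 7.5 is an outlier
errors = [0.3, 0.5, 0.6, 0.8, 0.9, 1.0, 1.1, 1.2, 2.1, 7.5]

mean = statistics.mean(errors)      # 1.6 – pulled up by the 7.5 outlier
median = statistics.median(errors)  # 0.95 – the 50th percentile
# quantiles with n=100 returns the 1st..99th percentile cut points
pcts = statistics.quantiles(errors, n=100, method='inclusive')
p20, p90 = pcts[19], pcts[89]       # 20th and 90th percentiles
```

Here the median sits well below the mean, which is exactly the kind of distributional insight the percentiles provide.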

Percentiles are important for a deeper understanding of the model’s errors and behaviour, since they provide information on the distribution of the error values: are most errors close to the mean? Are most errors low and only a few of them very high?

A similar insight can be gained by using the root mean squared error (RMSE):

$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[(ground\;truth)_i - (model's\;result)_i\right]^2}$$

By squaring the errors we give more weight to larger values, making this statistic more sensitive to outliers (i.e. extreme values). Therefore, if the RMSE is significantly larger than the mean absolute error, we know there are probably some large error values in our results.
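The effect is easy to demonstrate numerically. In this sketch (hypothetical values, not real measurements), a single outlying error makes the RMSE far larger than the mean absolute error:

```python
import math

truth  = [20.0, 21.0, 19.5, 20.2, 20.8]  # hypothetical ground-truth values
result = [20.3, 20.7, 19.6, 20.1, 14.8]  # model results; the last is an outlier

errors = [t - r for t, r in zip(truth, result)]
mean_abs_error = sum(abs(e) for e in errors) / len(errors)          # 1.36
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))          # ~2.69
# Squaring amplifies the single large error, so rmse >> mean_abs_error
```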

## Visualizing Continuous Accuracy with CAT Reports

To complement the Continuous Accuracy Testing (CAT) system, we also use dynamically reported graphs to view the results – a live CAT report. We’ve built our CAT reports in Google’s Data Studio, which enables great flexibility: each type of accuracy test we perform has its own section and graphs, which can show data from varying time periods.

This dynamic CAT report is used on a daily basis to monitor our performance and support important decisions. For example, it makes it possible to identify areas of the model that would benefit from improvement, and to demonstrate the accuracy of our models to stakeholders.

Our CAT reports also enable us to check how changes made to the model affect the level of accuracy, and how those changes should be adjusted accordingly. Knowing accuracy levels helps us identify problems before they become ongoing issues.

Above is an example graph taken from BreezoMeter’s CAT report.

These are our model’s hourly errors over a two-week period from May 2018, for ground-level ozone (O3, in parts per billion, ppb). The data comes from the global air quality API, covering 80+ countries. Each colored line represents a different error statistic, and the grey bars represent the number of monitoring stations included in each calculation.