A multi-resolution ensemble model of three decision-tree-based algorithms to predict daily NO2 concentration in France 2005–2022

Guillaume Barbalat, Ian Hough, Michael Dorman, Johanna Lepeule, Itai Kloog

Research output: Contribution to journal › Article › peer-review


Understanding and managing the health effects of nitrogen dioxide (NO2) requires high-resolution spatiotemporal exposure maps. Here, we developed a multi-stage, multi-resolution ensemble model that predicts daily NO2 concentration across continental France from 2005 to 2022. Innovations of this work include the computation of daily predictions at a 200 m resolution in large urban areas and the use of a spatio-temporal blocking procedure to avoid data leakage and ensure fair performance estimation. Predictions were obtained after three cascading stages of modeling: (1) predicting NO2 total column density from the Ozone Monitoring Instrument satellite; (2) predicting daily NO2 concentrations at a 1 km spatial resolution using a large set of potential predictors, such as the predictions obtained from stage 1, land-cover data and road traffic data; and (3) predicting residuals from the stage 2 models at a 200 m resolution in large urban areas. The latter two stages used a generalized additive model to ensemble the predictions of three decision-tree algorithms (random forest, extreme gradient boosting and categorical boosting). Cross-validated performance of our ensemble models was very good overall, with a ten-fold cross-validated R² of 0.83 for the 1 km model and 0.69 for the 200 m model. All three base learners contributed to the ensemble predictions to varying degrees depending on time and space. In sum, our multi-stage approach was able to predict daily NO2 concentrations with a relatively low error. Ensembling the predictions maximizes the chance of obtaining accurate values if one base learner fails in a specific area or at a particular time, because the ensemble can rely on the other learners. To the best of our knowledge, this is the first study aiming to predict NO2 concentrations in France with such a high spatiotemporal resolution, large spatial extent, and long temporal coverage. Exposure estimates are available to investigate NO2 health effects in epidemiological studies.
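The abstract does not detail how the spatio-temporal blocking was implemented, so the following is only a minimal sketch of the general idea, not the authors' procedure. It assumes observations carry projected coordinates in metres and an integer day index, and it groups them into space-time blocks so that every observation in a block lands in the same cross-validation fold. The names `Obs`, `block_id`, `fold_of` and `blocked_split`, the 50 km and 30-day block sizes, and the hashing constants are all hypothetical choices made for illustration.

```python
from collections import namedtuple

# Hypothetical record: projected coordinates (metres), day index, NO2 value.
Obs = namedtuple("Obs", ["x_m", "y_m", "day", "no2"])


def block_id(obs, block_size_m=50_000, block_days=30):
    """Return the (x, y, time) index of the space-time block containing obs."""
    return (
        int(obs.x_m // block_size_m),
        int(obs.y_m // block_size_m),
        obs.day // block_days,
    )


def fold_of(block, n_folds=10):
    """Deterministically map a block to a fold in 0..n_folds-1.

    The multipliers are arbitrary large primes used to scatter blocks
    across folds; any deterministic block-to-fold assignment would do.
    """
    bx, by, bt = block
    return ((bx * 73856093) ^ (by * 19349663) ^ (bt * 83492791)) % n_folds


def blocked_split(observations, held_out_fold, n_folds=10, **block_kwargs):
    """Split observations so that whole blocks go to either train or test."""
    train, test = [], []
    for obs in observations:
        fold = fold_of(block_id(obs, **block_kwargs), n_folds)
        (test if fold == held_out_fold else train).append(obs)
    return train, test
```

Because the fold is a function of the block rather than of the individual observation, measurements that are close in both space and time are never split between training and test data within a block. Observations on either side of a block boundary can still fall in different folds, so a stricter scheme might add a buffer around held-out blocks.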

Original language: English
Article number: 119241
Journal: Environmental Research
State: Published - 15 Sep 2024


  • 200 m resolution
  • Daily predictions
  • Decision-tree
  • Nitrogen dioxide
  • Spatio-temporal blocking
  • Spatio-temporal modeling

