Datasets description
The EUPP Benchmark relies on forecast datasets to compare different postprocessing methods. At the start of the project, however, few of the available datasets were suited to this kind of activity: training and verifying postprocessing algorithms requires pairs of forecasts and observations that are matched both in space and in time. The activities of the Benchmark therefore started with the development of dedicated datasets.
Two datasets are currently available:
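The matching requirement above can be illustrated with a minimal sketch. It is not the Benchmark's own code: the record layout, the toy values and the `match_pairs` helper are all hypothetical, and only the Python standard library is used. A forecast is paired with an observation when its valid time (initialization time plus lead time) and its grid point both coincide with those of the observation.

```python
from datetime import datetime, timedelta

# Hypothetical toy forecasts: (init_time, lead_hours, (lat, lon), value)
forecasts = [
    (datetime(2017, 1, 1, 0), 24, (50.0, 4.25), 3.1),
    (datetime(2017, 1, 1, 0), 48, (50.0, 4.25), 2.4),
    (datetime(2017, 1, 1, 0), 24, (50.25, 4.25), 2.9),  # no observation here
]

# Hypothetical toy observations keyed by (valid_time, (lat, lon))
observations = {
    (datetime(2017, 1, 2, 0), (50.0, 4.25)): 2.7,
    (datetime(2017, 1, 3, 0), (50.0, 4.25)): 2.0,
}


def match_pairs(forecasts, observations):
    """Pair each forecast with the observation at the same valid time and point."""
    pairs = []
    for init_time, lead_hours, point, fc_value in forecasts:
        # Match in time: the forecast is valid at init time + lead time.
        valid_time = init_time + timedelta(hours=lead_hours)
        # Match in space: the grid point must be identical.
        obs_value = observations.get((valid_time, point))
        if obs_value is not None:
            pairs.append((valid_time, point, fc_value, obs_value))
    return pairs


pairs = match_pairs(forecasts, observations)
for valid_time, point, fc_value, obs_value in pairs:
    print(valid_time.isoformat(), point, fc_value, obs_value)
```

Forecasts without a matching observation are simply dropped in this sketch; a real pipeline might instead keep them and mark the observation as missing.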
Base dataset
The base dataset was the first to be constructed. It covers a large portion of the European domain with IFS forecasts and ERA5 gridded reanalysis (used as gridded observations). It is no longer used directly in the benchmarking activities, but it is still available online.
Data from the base dataset can only be downloaded through our climetlab plugin, with the restriction that only one forecast date can be downloaded at a time (no parallel downloads).
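Because only one forecast date can be retrieved at a time, a multi-date download has to be a plain sequential loop. The sketch below shows one possible way to organize it. The helper names (`forecast_dates`, `download_sequentially`) and the `fake_fetch` stand-in are hypothetical, and the actual climetlab plugin call is deliberately left out; see the dataset documentation for the real request.

```python
from datetime import date, timedelta


def forecast_dates(start, end, step_days=1):
    """Yield every date from start to end (inclusive), step_days apart."""
    current = start
    while current <= end:
        yield current
        current += timedelta(days=step_days)


def download_sequentially(dates, fetch_one_date):
    """Call fetch_one_date once per date, strictly one after the other."""
    results = {}
    for d in dates:
        key = d.isoformat()
        results[key] = fetch_one_date(key)  # no threads, no process pool
    return results


def fake_fetch(date_string):
    """Hypothetical stand-in; replace with the real plugin request."""
    return f"data for {date_string}"


downloaded = download_sequentially(
    forecast_dates(date(2017, 12, 1), date(2017, 12, 3)), fake_fetch
)
print(list(downloaded))
```

Passing the fetch function as an argument keeps the loop independent of the plugin, so the same loop can be tested without network access.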
Some specifications of the dataset:
- The gridded main Eumetnet postprocessing benchmark dataset contains ECMWF ensemble and deterministic forecasts over a large portion of Europe, from 36 to 67° in latitude and from -6 to 17° in longitude, and covers the years 2017-2018.
- It also contains the corresponding ERA5 reanalysis for the purpose of providing observations for the benchmark.
- For some dates, it also contains reforecasts that cover 20 years of past forecasts, recomputed with the most recent model version.
- All the forecasts and reforecasts provided are the noon ECMWF runs.
- By default, the ensemble forecasts and reforecasts also contain the control run (the 0-th member).
- The gridded data resolution is 0.25° x 0.25°, which corresponds roughly to 25 kilometers.
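The domain bounds and resolution listed above imply a grid size and a physical spacing that can be checked with a short calculation. This is a sketch under two assumptions that the list does not state: both domain bounds are included in the grid, and the Earth is treated as a sphere of radius 6371 km. The function names are illustrative only.

```python
import math

LAT_MIN, LAT_MAX = 36.0, 67.0   # degrees north, from the specification
LON_MIN, LON_MAX = -6.0, 17.0   # degrees east, from the specification
RESOLUTION = 0.25               # degrees, from the specification
EARTH_RADIUS_KM = 6371.0        # assumption: mean radius, spherical Earth


def n_points(lower, upper, step):
    """Number of grid points along one axis, assuming both bounds are included."""
    return round((upper - lower) / step) + 1


def spacing_km(latitude_deg, step_deg=RESOLUTION):
    """Return (north-south, east-west) grid spacing in km at a given latitude."""
    north_south = math.radians(step_deg) * EARTH_RADIUS_KM
    # Meridians converge towards the pole, so east-west spacing shrinks with cos(lat).
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


n_lat = n_points(LAT_MIN, LAT_MAX, RESOLUTION)
n_lon = n_points(LON_MIN, LON_MAX, RESOLUTION)
print(n_lat, n_lon)

for lat in (LAT_MIN, LAT_MAX):
    ns, ew = spacing_km(lat)
    print(f"lat {lat}: {ns:.1f} km north-south, {ew:.1f} km east-west")
```

Under these assumptions the grid has 125 latitudes and 93 longitudes. The north-south spacing is about 28 km everywhere, while the east-west spacing goes from about 22 km at the southern edge to about 11 km at the northern edge, so "roughly 25 kilometers" is best read as an order of magnitude.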
Dataset documentation: https://eupp-benchmark.github.io/EUPPBench-doc/files/base_datasets.html
EUPPBench dataset
TODO
Dataset documentation: https://eupp-benchmark.github.io/EUPPBench-doc/files/EUPPBench_datasets.html