Base datasets over Europe’s domain
These are the overall datasets covering a large portion of Europe. They form the basis for developing more specific benchmark datasets.
Warning
Access to the forecasts and observations is currently time-granular: these datasets cannot be sliced over the issuance time dimension.
Datasets description
There are two main datasets:
1 - Gridded Data
The gridded main Eumetnet postprocessing benchmark dataset contains ECMWF ensemble and deterministic forecasts over a large portion of Europe, from 36 to 67° in latitude and from -6 to 17° of longitude, and covers the years 2017-2018.
It also contains the corresponding ERA5 reanalysis for the purpose of providing observations for the benchmark.
For some dates, it also contains reforecasts that cover 20 years of past forecasts recomputed with the most recent model version.
All the forecasts and reforecasts provided are the noon ECMWF runs.
The ensemble forecasts and reforecasts also contain by default the control run (the 0-th member).
The gridded data resolution is 0.25° x 0.25° which corresponds roughly to 25 kilometers.
Please note that you can presently only retrieve one forecast date for each climetlab.load_dataset call.
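Since each call retrieves a single forecast date, covering a period means looping over dates. The following is a minimal sketch of such a loop, not part of the plugin: the helper names `daily_dates` and `load_surface_forecasts` are hypothetical, and the dataset name and argument order are taken from the surface forecast usage described in section 1.2.

```python
from datetime import date, timedelta


def daily_dates(start, end):
    """Yield ISO date strings from start to end (both inclusive)."""
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    while current <= last:
        yield current.isoformat()
        current += timedelta(days=1)


def load_surface_forecasts(start, end, parameter="2t", kind="highres"):
    """Issue one load_dataset call per forecast date (hypothetical helper).

    Returns a dict mapping each date string to its xarray dataset.
    """
    import climetlab as cml  # imported lazily; needs climetlab and the plugin

    name = "eumetnet-postprocessing-benchmark-training-data-gridded-forecasts-surface"
    return {
        day: cml.load_dataset(name, day, parameter, kind).to_xarray()
        for day in daily_dates(start, end)
    }
```

Calling `load_surface_forecasts("2017-12-01", "2017-12-03")` would then trigger three separate downloads, one per date.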
There are 8 gridded sub-datasets:
1.1 - Extreme Forecast Index
All the Extreme Forecast Index (EFI) variables can be obtained for each forecast date.
It includes:
| Parameter name | ECMWF key | Remarks |
|---|---|---|
| | 2ti | |
| | 10wsi | |
| | 10fgi | |
| | capei | |
| | capesi | |
| | mx2ti | |
| | mn2ti | |
| | sfi | |
| | tpi | |
The EFI are available for the model step range (in hours) 0-24, 24-48, 48-72, 72-96, 96-120, 120-144 and 144-168.
Usage: The EFI variables can be retrieved by calling
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-forecasts-efi', date, parameter)
ds.to_xarray()
where the date argument is a string with a single date, and the parameter argument is a string or a list of strings with the ECMWF keys described above. Setting 'all' as parameter downloads all the EFI parameters.
Example:
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-forecasts-efi', "2017-12-02", "2ti")
ds.to_xarray()
Note
By definition, observations are not available for Extreme Forecast Indices (EFI).
1.2 - Surface variable forecasts
The surface variables can be obtained for each forecast date, both for the ensemble (51 members) and deterministic runs.
It includes:
| Parameter name | ECMWF key | Remarks |
|---|---|---|
| | 2t | |
| | 10u | |
| | 10v | |
| | tcc | |
| | 100ua | Observations not available |
| | 100va | Observations not available |
| | cape | |
| | stl1 | |
| | tcw | |
| | tcwv | |
| | swvl1 | |
| | sd | |
| | cin | Observations not available |
| | vis | Observations not available |
Some missing observations will become available later.
The forecasts are available for the following model steps (in hours): every hour from 0 to 90, every 3 hours from 93 to 144, and every 6 hours from 150 to 240. All the steps are automatically retrieved.
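The step schedule above can be generated programmatically, which is convenient when selecting or checking lead times. This is only an illustrative sketch; the function name `surface_forecast_steps` is hypothetical and not part of the plugin.

```python
def surface_forecast_steps():
    """Return the surface forecast model steps, in hours."""
    hourly = list(range(0, 91))             # 0, 1, ..., 90
    three_hourly = list(range(93, 145, 3))  # 93, 96, ..., 144
    six_hourly = list(range(150, 241, 6))   # 150, 156, ..., 240
    return hourly + three_hourly + six_hourly


steps = surface_forecast_steps()
```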
Usage: The surface variables forecasts can be retrieved by calling
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-forecasts-surface', date, parameter, kind)
ds.to_xarray()
where the date argument is a string with a single date, and the parameter argument is a string or a list of strings with the ECMWF keys described above. Setting 'all' as parameter downloads all the surface parameters. The kind argument selects the deterministic or ensemble forecasts, by setting it to 'highres' or 'ensemble'.
Example:
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-forecasts-surface', "2017-12-02", "sd", "highres")
ds.to_xarray()
1.3 - Pressure level variable forecasts
The variables on pressure levels can be obtained for each forecast date, both for the ensemble (51 members) and deterministic runs.
It includes:
| Parameter name | Level | ECMWF key | Remarks |
|---|---|---|---|
| | 850 | t | |
| | 700 | u | |
| | 700 | v | |
| | 500 | z | |
| | 700 | q | |
| | 850 | r | |
The forecasts are available for the same model steps as the surface variables above.
Usage: The pressure level variables forecasts can be retrieved by calling
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-forecasts-pressure', date, parameter, level, kind)
ds.to_xarray()
where the date argument is a string with a single date, and the parameter argument is a string or a list of strings with the ECMWF keys described above. Setting 'all' as parameter downloads all the parameters at the given pressure level. The level argument is the pressure level, as a string or an integer. The kind argument selects the deterministic or ensemble forecasts, by setting it to 'highres' or 'ensemble'.
Example:
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-forecasts-pressure', "2017-12-02", "z", 500, "highres")
ds.to_xarray()
1.4 - Processed surface variable forecasts
Processed surface variables can be obtained for each forecast date, both for the ensemble (51 members) and deterministic runs. A processed variable is either accumulated, averaged or filtered.
It includes:
Parameter name |
ECMWF key |
Remarks |
---|---|---|
tp |
||
sshf |
||
slhf |
||
ssr |
||
str |
||
cp |
||
mx2t6 |
||
mn2t6 |
||
ssrd |
||
strd |
||
10fg6 |
All these variables are accumulated or filtered over the 6 hours preceding a given forecast timestamp. Therefore, the forecasts are available every 6 hours, for the model steps (in hours) from 6 to 240. All the steps are automatically retrieved.
Usage: The processed surface variables forecasts can be retrieved by calling
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-forecasts-surface-processed', date, parameter, kind)
ds.to_xarray()
where the date argument is a string with a single date, and the parameter argument is a string or a list of strings with the ECMWF keys described above. The kind argument selects the deterministic or ensemble forecasts, by setting it to 'highres' or 'ensemble'.
Note
For technical reasons, most fields cannot be retrieved together with other fields and must be downloaded one at a time. E.g. a request with parameter=['tp', 'mx2t6'] will fail while one with parameter='tp' will succeed.
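One way to work within this restriction is to issue one request per parameter and combine the results afterwards. The sketch below is an illustration under assumptions, not the plugin's own method: the helpers `one_request_per_parameter` and `load_processed` are hypothetical, and combining with `xarray.merge` assumes that the individual fields share the same coordinates.

```python
def one_request_per_parameter(date, parameters, kind):
    """Split a multi-parameter request into single-parameter argument tuples."""
    if isinstance(parameters, str):
        parameters = [parameters]
    return [(date, parameter, kind) for parameter in parameters]


def load_processed(date, parameters, kind="highres"):
    """Download each processed field alone, then merge (hypothetical helper)."""
    import climetlab as cml  # imported lazily; needs climetlab and the plugin
    import xarray as xr

    name = (
        "eumetnet-postprocessing-benchmark-training-data-"
        "gridded-forecasts-surface-processed"
    )
    datasets = [
        cml.load_dataset(name, *args).to_xarray()
        for args in one_request_per_parameter(date, parameters, kind)
    ]
    # Assumption: the fields are defined on matching coordinates.
    return xr.merge(datasets)
```

With this helper, `load_processed("2017-12-02", ["tp", "mx2t6"])` would issue two separate downloads instead of one failing request.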
Example:
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-forecasts-surface-processed', "2017-12-02", "mx2t6", "highres")
ds.to_xarray()
1.5 - Surface variable reforecasts
The surface variables for the ensemble reforecasts (11 members) can be obtained for each reforecast date. All the variables described in section 1.2 above are available.
The reforecasts are available every 6 hours, for the model steps (in hours) from 0 to 240. All the steps are automatically retrieved.
Note
The ECMWF reforecasts are only available on Mondays and Thursdays. Providing any other date will fail.
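A date can be checked before sending a request, to avoid a failed call. This is a minimal sketch using only the standard library; the function name `is_reforecast_date` is hypothetical.

```python
from datetime import date


def is_reforecast_date(day):
    """Return True if the ISO date string falls on a Monday or a Thursday."""
    # date.weekday() returns 0 for Monday and 3 for Thursday.
    return date.fromisoformat(day).weekday() in (0, 3)
```

For instance, `is_reforecast_date("2017-12-28")` is true because that day is a Thursday, while "2017-12-02" is a Saturday and would be rejected.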
Usage: The surface variables reforecasts can be retrieved by calling
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-reforecasts-surface', date, parameter)
ds.to_xarray()
where the date argument is a string with a single date, and the parameter argument is a string or a list of strings with the ECMWF keys. Setting 'all' as parameter downloads all the surface parameters.
Example:
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-reforecasts-surface', "2017-12-28", "sd")
ds.to_xarray()
1.6 - Pressure level variable reforecasts
The variables on pressure levels for the ensemble reforecasts (11 members) can be obtained for each reforecast date. All the variables described in section 1.3 above are available.
The reforecasts are available for the same model steps as the surface variable reforecasts above.
Note
The ECMWF reforecasts are only available on Mondays and Thursdays. Providing any other date will fail.
Usage: The pressure level variables reforecasts can be retrieved by calling
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-reforecasts-pressure', date, parameter, level)
ds.to_xarray()
where the date argument is a string with a single date, and the parameter argument is a string or a list of strings with the ECMWF keys. Setting 'all' as parameter downloads all the parameters at the given pressure level. The level argument is the pressure level, as a string or an integer.
Example:
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-reforecasts-pressure', "2017-12-28", "z", 500)
ds.to_xarray()
1.7 - Processed surface variable reforecasts
Processed surface variables as described in section 1.4 can also be obtained as ensemble reforecasts (11 members).
The reforecasts are available for the same model steps as the processed surface variable forecasts described in section 1.4, restricted to the reforecast schedule of section 1.5.
Note
The ECMWF reforecasts are only available on Mondays and Thursdays. Providing any other date will fail.
Usage: The processed surface variables reforecasts can be retrieved by calling
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-reforecasts-surface-processed', date, parameter)
ds.to_xarray()
where the date argument is a string with a single date, and the parameter argument is a string or a list of strings with the ECMWF keys.
Note
For technical reasons, most fields cannot be retrieved together with other fields and must be downloaded one at a time. E.g. a request with parameter=['tp', 'mx2t6'] will fail while one with parameter='tp' will succeed.
Example:
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-reforecasts-surface-processed', "2017-12-28", "mx2t6")
ds.to_xarray()
1.8 - Static fields
Various static fields associated with the forecast grid can be obtained, with the purpose of serving as predictors for the postprocessing.
Note
For consistency with the rest of the dataset, we use the ECMWF parameter names, terminology and units here. However, please note that, except for the Surface Geopotential, the fields provided come from non-ECMWF data sources evaluated at the grid points. Currently, the main data source being used is the Copernicus Land Monitoring Service.
It includes:
| Parameter name | ECMWF key | Remarks |
|---|---|---|
| | landu | Extracted from the CORINE 2018 dataset. Values and associated land types differ from the ECMWF ones. Please look at the “legend” entry in the metadata for more details. |
| | mterh | Extracted from the EU-DEMv1.1 digital elevation model dataset. |
| | z | The model orography can be obtained by dividing the surface geopotential by g = 9.80665 m s⁻². |
Usage: The static fields can be retrieved by calling
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-static-fields', parameter)
ds.to_xarray()
where the parameter argument is a string with one of the ECMWF keys described above. It is only possible to download one static field per call.
Example:
ds = cml.load_dataset('eumetnet-postprocessing-benchmark-training-data-gridded-static-fields', 'mterh')
ds.to_xarray()
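As noted in the remarks for the z field, the model orography follows from the surface geopotential through a division by g. The conversion can be sketched as follows; the function name `orography_from_geopotential` is hypothetical, and the same division works on plain numbers as well as on array-like objects that support arithmetic.

```python
# Standard gravity used in the conversion, in m s^-2 (value from the table above).
G = 9.80665


def orography_from_geopotential(z):
    """Convert surface geopotential (m^2 s^-2) to orography height (m)."""
    return z / G
```

Applied to the static field, this could look like `orography_from_geopotential(ds.to_xarray()["z"])`, assuming the variable is stored under the name "z" in the returned dataset.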
2 - Stations Data
Not yet provided.
3 - Getting the observations corresponding to the (re)forecasts
Once the forecasts or reforecasts have been obtained, the corresponding observations (if available) can be retrieved in the xarray format by using the get_observations_as_xarray method:
obs = ds.get_observations_as_xarray()
4 - Explanation of the metadata
The following metadata are available in the gridded forecast, reforecast and observation data:
latitude: The latitude of the grid points.
longitude: The longitude of the grid points.
depthBelowLandLayer: the layer below the surface (valid for some variables only, here there is only the upper surface level).
number: the number of the ensemble member. The 0-th member is the control run. Also present in observation, but set to 0.
time: the forecast or reforecast date (reforecasts are only issued on Mondays and Thursdays).
year: a dimension to identify the year in the past, year=1 means a forecast valid 20 years ago at the reforecast day and month, year=20 means a forecast valid one year before the reforecast date. Only valid for reforecasts.
step: the step of the forecast (the lead time).
surface: the layer of the variable considered (here there is just one, at the surface).
valid_time: the actual time and date of the corresponding forecast data.
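The meaning of the year dimension can be made concrete with a small helper that maps a reforecast date and a year index to the calendar year of the past forecast. This is an illustrative sketch based only on the description above; the function name `reforecast_valid_year` and its `n_years` argument are hypothetical.

```python
from datetime import date


def reforecast_valid_year(reforecast_date, year_index, n_years=20):
    """Return the calendar year targeted by a given index of the year dimension.

    year_index=1 is 20 years before the reforecast date, and
    year_index=20 is one year before it.
    """
    if not 1 <= year_index <= n_years:
        raise ValueError(f"year_index must be between 1 and {n_years}")
    issued = date.fromisoformat(reforecast_date)
    return issued.year - (n_years + 1 - year_index)
```

For the reforecast date "2017-12-28", year=1 corresponds to 1997 and year=20 to 2016.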
5 - Major ECMWF model changes
In 2017 and 2018, there were two changes to the ECMWF model in total:
| Implementation date | Summary of changes | Resolution | Full IFS documentation |
|---|---|---|---|
| 05-Jun-2018 | | Unchanged | |
| 11-Jul-2017 | | Unchanged | |
Source: https://www.ecmwf.int/en/forecasts/documentation-and-support/changes-ecmwf-model
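When training a postprocessing method, it can be useful to know which model version produced a given forecast. The sketch below labels a forecast date with the number of model changes that precede it. It is only an illustration: the function name `model_period` is hypothetical, and it assumes that a forecast issued on an implementation date already uses the new model version.

```python
from datetime import date

# Implementation dates of the two model changes listed above.
MODEL_CHANGES = (date(2017, 7, 11), date(2018, 6, 5))


def model_period(forecast_date):
    """Return 0, 1 or 2: the number of model changes on or before the date."""
    day = date.fromisoformat(forecast_date)
    return sum(day >= change for change in MODEL_CHANGES)
```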
Data License
See the DATA_LICENSE file.