Data Management in DIVA
This document describes how data is handled for the DIVA chatbot, including how climate and geographical data are downloaded and processed.
Setting up the Python environment
Before starting, ensure that the required Python environment is prepared according to the instructions found in the “How to Use DIVA” documentation.
Once the required libraries are installed, import them at the top of your Python script:
import xarray as xr
import os
import geopandas as gpd
import pandas as pd
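Before running the imports above, it can help to confirm that the libraries are actually present in the active environment. The following is a minimal sketch using only the standard library; the helper name missing_packages and the REQUIRED list are illustrative, and zarr is included on the assumption that the Zarr engine is used to read Cache B collections later in this document.

```python
import importlib.util

# Packages used in this document. "zarr" is an assumption: it is needed
# by xarray when a dataset is opened with engine="zarr".
REQUIRED = ["xarray", "geopandas", "pandas", "zarr"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

If any package is reported as missing, install it following the “How to Use DIVA” instructions before continuing.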
Downloading climate data from SesameO
ERA5 data are accessed through the SesameO platform, which provides a unified interface for exploring and retrieving datasets available in the DestinE Data Lake.
SesameO allows users to:
Browse datasets from Copernicus and Destination Earth
Filter by variables, time range, and geographic region
Access data via web download or API endpoints
To access the ERA5 collection:
Open the ERA5 dataset page on SesameO: https://sesameo.destine.eu/collections/reanalysis-era5-single-levels
Log in with your DestinE credentials
In the dataset interface, you can explore the metadata and available variables
Use the search and filter panel to define the subset of data you need: variables, temporal range, spatial region
After selecting the dataset, SesameO provides different access methods:
Direct Download
API access
Downloading climate data from “Cache B”
(!) Access to this data requires special authorization. Please refer to DestinE’s data policy for more details: https://destine-data-lake-docs.data.destination-earth.eu/en/latest/dedl-discovery-and-data-access/DestinE-Data-Policy-for-DestinE-Digital-Twin-Outputs/DestinE-Data-Policy-for-DestinE-Digital-Twin-Outputs.html
The climate data is stored in “Cache B” as Zarr stores, which can be read using the xarray library in Python. Some of the available collections in Cache B are listed below:
cache_b_collections = [
    "https://cacheb.dcms.destine.eu/era5/reanalysis-era5-land-no-antartica-v0.zarr",
    "https://cacheb.dcms.destine.eu/era5/reanalysis-era5-single-levels-v0.zarr",
    "https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-IFS-NEMO-0001-high-sfc-v0.zarr",
    "https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-IFS-NEMO-0001-high-o2d-v0.zarr",
    "https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-IFS-NEMO-0001-high-pl-v0.zarr",
    "https://cacheb.dcms.destine.eu/d1-climate-dt/ScenarioMIP-SSP3-7.0-ICON-0001-high-sfc-v0.zarr",
]
You can load a collection using xarray as follows:
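da = xr.open_dataset(
    cache_b_collections[0],
    engine="zarr",
    # trust_env=True lets the HTTP session read proxy settings and
    # ~/.netrc credentials from the environment
    storage_options={"client_kwargs": {"trust_env": True}},
    chunks={},  # open lazily as dask arrays
)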
Once the dataset is loaded, you can select specific variables of interest. For example:
da = da[['tp']] # Selecting the variable 'tp' (total precipitation)
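Beyond choosing variables, a selection can also be narrowed to a time window and a geographic bounding box, which keeps later downloads small. The sketch below is illustrative only: it runs on a small synthetic dataset rather than a real Cache B collection, it assumes coordinates named time, latitude and longitude (check the names in your collection), and the helpers ordered_slice and subset are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd
import xarray as xr


def ordered_slice(coord, low, high):
    """Build a label slice that follows the coordinate's sort order."""
    values = coord.values
    ascending = values[0] <= values[-1]
    return slice(low, high) if ascending else slice(high, low)


def subset(ds, variables, time_range, lat_range, lon_range):
    """Keep selected variables within a time window and a bounding box."""
    return ds[variables].sel(
        time=slice(*time_range),
        latitude=ordered_slice(ds["latitude"], *lat_range),
        longitude=ordered_slice(ds["longitude"], *lon_range),
    )


# Synthetic stand-in for a Cache B collection (all values are made up).
times = pd.date_range("2020-01-01", periods=10, freq="D")
lats = np.arange(50.0, 39.0, -1.0)  # 50 down to 40, descending
lons = np.arange(0.0, 11.0, 1.0)    # 0 up to 10, ascending
shape = (times.size, lats.size, lons.size)
dims = ("time", "latitude", "longitude")
demo = xr.Dataset(
    {"tp": (dims, np.zeros(shape)), "t2m": (dims, np.full(shape, 280.0))},
    coords={"time": times, "latitude": lats, "longitude": lons},
)

small = subset(demo, ["tp"], ("2020-01-03", "2020-01-05"), (42.0, 45.0), (2.0, 4.0))
print(dict(small.sizes))           # dimension sizes after subsetting
print(f"{small.nbytes} bytes")     # rough in-memory size of the selection
```

The ordered_slice helper exists because label-based slicing must follow the coordinate's direction, and latitude is often stored from north to south. Checking nbytes on the subset gives a rough idea of how much data a later save would transfer.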
Viewing and saving data locally
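At this stage, the data has not been downloaded: xarray opens the store lazily and reads only its metadata. Writing a file triggers the actual download, so reduce the dataset to what you need first. To save the data locally, write it to a NetCDF file: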
path_export = "/path/to/netcdf_file.nc"
da.to_netcdf(path_export)
The saved NetCDF file can be easily reopened later:
data = xr.open_dataset(path_export)
Downloading shapefiles for geographical zones
Shapefiles define geographical zones (e.g., cities, countries). They are available from various online platforms; this project uses Eurostat’s GISCO resources.
Once downloaded, the shapefiles can be loaded into Python using the geopandas library:
path_data = "/path/to/shapefile.shp"  # placeholder path to the downloaded shapefile
shapefile = gpd.read_file(path_data)
gpd.read_file returns a GeoDataFrame, which extends the pandas DataFrame with a geometry column, so standard pandas operations such as filtering, joining, and grouping can be used to process it.
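As an illustration of such operations, the sketch below filters zones by an attribute and derives a bounding box from the result. It builds a tiny GeoDataFrame in memory instead of reading a real file, and the column names NAME and CNTR_CODE as well as the zone values are placeholders; the attribute names in an actual shapefile should be checked with shapefile.columns.

```python
import geopandas as gpd
from shapely.geometry import box

# In-memory stand-in for a loaded shapefile (made-up zones and codes).
zones = gpd.GeoDataFrame(
    {"NAME": ["Zone A", "Zone B"], "CNTR_CODE": ["AA", "BB"]},
    geometry=[box(0, 40, 5, 45), box(5, 40, 10, 50)],
    crs="EPSG:4326",
)

# Standard pandas-style boolean filtering works on a GeoDataFrame.
selected = zones[zones["CNTR_CODE"] == "BB"]

# total_bounds returns (min_x, min_y, max_x, max_y) over all geometries.
min_lon, min_lat, max_lon, max_lat = selected.total_bounds
print(min_lon, min_lat, max_lon, max_lat)
```

A bounding box obtained this way can serve as the longitude and latitude limits when subsetting a climate dataset to the same zone.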