Welcome to DIVA’s documentation!¶
DIVA: DestinE Intelligent Visualisation Artificial Intelligence¶
Demonstration¶
Explore DIVA’s capabilities through our online demonstration: → DIVA Demonstrator
For detailed documentation of DIVA, please refer to diva.destine.eu/doc
Overview¶
DIVA (DestinE Intelligent Visualisation Artificial Intelligence) is an innovative prompt-to-chart language model designed to assist journalists in illustrating climate change data. By converting natural language prompts into detailed visualizations, DIVA provides past and future localized climate information. This tool empowers journalists to create impactful graphics that enhance storytelling and engage audiences by providing both historical context and future projections of climate change scenarios.
Features¶
Prompt-to-Chart Conversion: Enter natural language prompts to generate detailed climate visualizations.
Localized Data: Access climate information specific to your region, allowing for precise and relevant reporting.
Future Projections: Illustrate potential climate scenarios, helping readers understand possible future changes.
Past Trends: Showcase historical climate data to provide context and depth to your reporting.
Multi-language Support: Select your preferred language from the menu on the left, making DIVA accessible to a global audience.
Interactive Graphics: Engage users with interactive visualizations that offer deeper insights into the data.
Variety of Chart Types: Choose from a range of chart types, including line charts, bar charts, histograms, color stripes, and box plots, to best represent your data.
Brief Summary of Displayed Graph: Automatically generate a concise summary of the graph just displayed, providing an overview of key insights.
Export Options: Export the visualisations as images or download the underlying data in CSV format for further analysis.
Example Use Case¶
Illustrating Climate Change Forecasts for Journalism¶
As a journalist, you can leverage DIVA to create visual representations of climate change forecasts for your articles. For example, you might want to illustrate how temperature and precipitation patterns are expected to change in a specific city over the next 50 years. With DIVA, you can easily generate charts that communicate these scenarios effectively, enhancing the storytelling of your piece and engaging your audience with visually compelling data.
Installation and Usage¶
DIVA relies on a large language model (Mistral-7B) and on climate datasets stored in NetCDF format, which can become sizable depending on the user’s query range (spatial and temporal). The following requirements ensure smooth local or server execution.
Quick Installation¶
After cloning the project, you can install and configure DIVA automatically using the installation script:
🔹 On Linux¶
bash setup_diva_linux.sh
🔹 On Windows¶
setup_diva_windows.bat (double-click)
This script will:
create the Python virtual environment
install all dependencies
generate the default configuration file (~/.diva/settings.ini)
guide you interactively through the required data downloads
set up environment variables needed for DIVA to run
If you prefer to perform each step manually, verify your system, or customize your installation, continue with the detailed instructions below.
1. Hardware Requirements¶
| Component | Recommended | Notes |
|---|---|---|
| GPU | NVIDIA GPU with ≥ 11 GB VRAM | DIVA’s Mistral-7B model typically uses ~10 GB of VRAM. Tested on NVIDIA L4 (24 GB) under CUDA 12.2. |
| CUDA Toolkit | ≥ 11.8 (tested 12.2) | Must match your driver version (e.g. 535.274.02) |
| CPU | 4 cores (min), 8+ cores recommended | Used for preprocessing and chart generation |
| RAM | 16 GB min (16–32 GB recommended) | Memory usage scales with dataset size (NetCDF loads) |
| Storage | 250 GB free min (> 500 GB recommended) | For model weights + temporary NetCDF data caching |
| OS | Linux (Ubuntu 20.04 / 22.04 tested) | Any modern 64-bit Linux works |
| Python | 3.10.x | |
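The CPU, storage, and Python rows of the table can be checked with a short standard-library script. This is a minimal sketch, not part of DIVA: the function name `check_host` and the constants are hypothetical, and the thresholds are simply copied from the table.

```python
import os
import shutil
import sys

# Thresholds copied from the hardware requirements table (assumed minimums).
MIN_CPU_CORES = 4
MIN_FREE_DISK_GB = 250
REQUIRED_PYTHON = (3, 10)


def check_host(storage_path="."):
    """Return {check name: (measured value, passes?)} for the local machine."""
    cores = os.cpu_count() or 0
    free_gb = shutil.disk_usage(storage_path).free / 1e9
    python = sys.version_info[:2]
    return {
        "cpu_cores": (cores, cores >= MIN_CPU_CORES),
        "free_disk_gb": (round(free_gb, 1), free_gb >= MIN_FREE_DISK_GB),
        "python": (".".join(map(str, python)), python == REQUIRED_PYTHON),
    }


if __name__ == "__main__":
    for name, (value, ok) in check_host().items():
        print(f"{name:>12}: {value} [{'ok' if ok else 'below recommendation'}]")
```

Pass the directory where you plan to store the model and NetCDF files as `storage_path` so that the free-space figure refers to the right disk.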
Check GPU availability¶
Before starting, verify that CUDA and PyTorch detect your GPU:
python -c "import torch; ok = torch.cuda.is_available(); print(ok, torch.cuda.get_device_name(0) if ok else 'no CUDA device found')"
If the command prints True followed by your GPU model (e.g. NVIDIA L4), your environment is ready.
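To also compare the available VRAM with the 11 GB recommendation, the check can be extended as sketched below. The helper names `has_enough_vram` and `report_gpu` are hypothetical; the script only relies on `torch.cuda.is_available()` and `torch.cuda.get_device_properties()`, and it degrades to a message if PyTorch is not installed.

```python
def has_enough_vram(total_bytes, min_gb=11.0):
    """True if the device memory (in bytes) meets the recommended minimum in GB."""
    return total_bytes / 1e9 >= min_gb


def report_gpu():
    """Describe the first CUDA device seen by PyTorch, or explain why there is none."""
    try:
        import torch  # imported lazily so the script still runs without PyTorch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        return "No CUDA device visible to PyTorch"
    props = torch.cuda.get_device_properties(0)
    verdict = "ok" if has_enough_vram(props.total_memory) else "below the 11 GB recommendation"
    return f"{props.name}: {props.total_memory / 1e9:.1f} GB VRAM ({verdict})"


if __name__ == "__main__":
    print(report_gpu())
```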
2. Environment Setup¶
Before running DIVA, you’ll need to prepare your Python environment, install the required dependencies, and configure paths for both model and climate data.
2.1. Install dependencies¶
Clone the project and install all required Python packages listed in requirements.txt:
git clone <url_of_diva>
cd diva
<Create your environment (with conda, venv or other)>
pip install --upgrade pip
pip install -r requirements.txt
💡 This will install all necessary libraries, including: torch, transformers, streamlit, fastapi, xarray, geopandas, bitsandbytes, plotly, spacy, and more.
2.2. Download the LLM (Mistral-Nemo-Instruct)¶
DIVA uses the Mistral-Nemo-Instruct-2407-bnb-4bit model as its core LLM.
You can download it directly from Hugging Face using the huggingface_hub CLI:
pip install huggingface_hub
Then, download the model locally:
hf download unsloth/Mistral-Nemo-Instruct-2407-bnb-4bit
💡 The model requires approximately 10 GB of VRAM during inference. You can store it anywhere (e.g. /home/user/LLM_models/) and specify its path in your configuration file (see below).
2.3. Download ERA5 Climate Data¶
DIVA integrates both climate datasets (ERA5) and geospatial data (shapefiles for countries and cities). These resources are required for generating localized visualizations and performing spatial queries.
Example dataset structure expected by DIVA:
/home/<user>/data/
├── era5_past_t2m_sp_tg_fg10_1970_2023.nc
├── NUTS_RG_10M_2021_4326.shp
├── cities.shp
└── worldcities/
    └── worldcities.csv
🧊 ERA5 Climate Data¶
Dataset: ERA5 hourly data on single levels from 1940 to present.
This dataset provides hourly estimates of key climate variables such as temperature, precipitation, wind speed, and surface pressure.
Data period: 1940 → present
Provider: Copernicus Climate Data Store (CDS)
Format: NetCDF (.nc)
Access: Files can be downloaded directly from the CDS web interface or using the CDS API.
💡 You can select your variables, geographic extent, and time range directly through the Copernicus Data Store UI. The total file size depends on the selected area and period.
⚠️ The size of ERA5 files can vary from a few hundred MBs to several GBs depending on the region and temporal range selected by the user.
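For scripted downloads, a request to the CDS API can be sketched as follows. This is an illustration under assumptions, not DIVA's own download procedure: it assumes the third-party `cdsapi` package is installed and configured with your CDS credentials, the helper names `build_request` and `download` are hypothetical, and the bounding box is taken from the coverage zone listed under "Available data". Which variables DIVA actually requires is not specified here, so the variable list is left to the caller.

```python
import calendar

DATASET = "reanalysis-era5-single-levels"
# DIVA coverage zone in the CDS "area" order: [north, west, south, east].
DIVA_AREA = [66.7, -14.0, 35.0, 45.7]


def build_request(year, month, variables, area=DIVA_AREA):
    """Build a one-month, hourly ERA5 request dictionary for the CDS API."""
    days_in_month = calendar.monthrange(year, month)[1]
    return {
        "product_type": ["reanalysis"],
        "variable": list(variables),
        "year": [str(year)],
        "month": [f"{month:02d}"],
        "day": [f"{d:02d}" for d in range(1, days_in_month + 1)],
        "time": [f"{h:02d}:00" for h in range(24)],
        "area": list(area),
        "data_format": "netcdf",
        "download_format": "unarchived",
    }


def download(year, month, variables, target):
    """Submit the request and save the result as a NetCDF file at `target`."""
    import cdsapi  # third-party; reads credentials from ~/.cdsapirc

    cdsapi.Client().retrieve(DATASET, build_request(year, month, variables), target)


# Example call (commented out because it needs network access and credentials):
# download(2016, 1, ["2m_temperature", "surface_pressure"], "era5_2016_01.nc")
```

Requesting one month at a time keeps each file small, which helps with the size concerns noted above; the monthly files can be merged afterwards.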
🗺️ Shapefiles¶
DIVA also relies on two official geospatial datasets from Eurostat (GISCO):
Countries (NUTS): Territorial units for statistics (NUTS), used for country-level or regional aggregation. Available from the Eurostat GISCO website.
Cities (LAU): Local Administrative Units (LAU), used for fine-grained geographic analysis (city-level data). Available from the Eurostat GISCO website.
🗺️ worldcities dataset¶
DIVA also uses an additional global cities dataset to provide finer geographic resolution and enable location-based queries beyond Europe.
worldcities dataset: Use the “Basic” version, which is free and available HERE.
2.4. Create the configuration file¶
DIVA reads its settings from a configuration file located in:
~/.diva/settings.ini
If this file does not exist, create it manually:
mkdir -p ~/.diva
nano ~/.diva/settings.ini
Example configuration (sensitive values anonymized):
[paths]
path_data = /home/user/data/
shapefile_countries = NUTS_RG_10M_2021_4326.shp
shapefile_cities = cities.shp
worldcities = worldcities/worldcities.csv
path_model_mistral_nemo = /home/user/LLM_models/Mistral-Nemo-Instruct-2407-bnb-4bit
path_model_mistral7B = /home/user/LLM_models/Mistral_instruct_v01
[keycloak]
client_secret = *****
[data processing]
precipitation = {"era5_past_t2m_sp_tg_fg10_1970_2023.nc" : {"type_of_measure": "accum", "step_units": 1}}
[elasticsearch]
endpoint = https://<your-endpoint>/elasticsearch/
apikey = *****
datastream_name = <datastream_name>
[elasticsearch-dev]
endpoint = http://localhost:9200/
apikey = *****
datastream_name = <datastream_name>
[information]
last_release_date = 27-06-2025
[users]
can_see_prompts = 'user1','user2'
[monitoring]
authorized_IP = 'xx.xx.xx.xx','yy.yy.yy.yy'
💡 The settings.ini file is read automatically at startup. Make sure the path_model_* and path_data entries point to valid locations on your system.
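Before launching, the configuration can be sanity-checked with Python's standard `configparser`. The sketch below is not how DIVA itself loads the file; the function names are hypothetical, and it assumes that the shapefile and worldcities entries are relative to `path_data`, as the example values suggest.

```python
import configparser
import json
from pathlib import Path


def load_settings(path=None):
    """Parse settings.ini (default: ~/.diva/settings.ini)."""
    path = Path(path) if path else Path.home() / ".diva" / "settings.ini"
    # interpolation=None keeps any '%' characters in values (e.g. secrets) literal.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(path):
        raise FileNotFoundError(f"No settings file at {path}")
    return parser


def missing_paths(settings):
    """List the [paths] entries that do not exist on disk."""
    paths = settings["paths"]
    data_dir = Path(paths["path_data"])
    problems = []
    for key, value in paths.items():
        if key.startswith("path_"):
            candidate = Path(value)  # absolute locations (data dir, models)
        else:
            candidate = data_dir / value  # assumption: relative to path_data
        if not candidate.exists():
            problems.append(f"{key} -> {candidate}")
    return problems


def precipitation_rules(settings):
    """Decode the JSON value stored under [data processing] precipitation."""
    return json.loads(settings["data processing"]["precipitation"])


if __name__ == "__main__":
    try:
        for problem in missing_paths(load_settings()):
            print("missing:", problem)
    except FileNotFoundError as err:
        print(err)
```

Note that `configparser` lowercases option names, so `path_model_mistral7B` is reported as `path_model_mistral7b`.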
Launching the Project¶
To launch the DIVA service, navigate to the project’s root directory and execute the following commands:
cd src
python -m streamlit run diva/gui/run_app.py
Example Conversation¶
🙋: Show me the temperature from January to September 2016.
🖥️: Thank you for your request. I can give you a graphical view of the temperature. Could you please specify where you want to make the study?
🙋: I want it in Frascati.
🖥️: #Display Graph
🖥️: #Display Graph
This conversation illustrates the interactive nature of DIVA, allowing users to request specific data visualizations and receive tailored results.
Available data¶
Four variables are available in DIVA:
Temperature (2 m above the surface)
Wind Speed
Precipitation
Surface Pressure
Historical data is currently extracted from ERA5:
https://cds.climate.copernicus.eu/datasets/reanalysis-era5-single-levels?tab=overview
The reanalysis-era5-single-levels dataset contains multiple climate variables from 1940 to present, although DIVA only uses data from 1970 onward. The data are provided on a regular latitude-longitude grid and are updated daily.
The available zone:
Longitude: from -14.0 to 45.7
Latitude: from 35.0 to 66.7
(A map showing this zone is available in the app’s Information tab.)
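To check in advance whether a location falls inside this zone, the bounds can be applied directly. The helper `in_coverage` below is a hypothetical illustration using the limits listed above; the city coordinates are approximate.

```python
# Bounds of the available zone, in decimal degrees.
LON_MIN, LON_MAX = -14.0, 45.7
LAT_MIN, LAT_MAX = 35.0, 66.7


def in_coverage(lat, lon):
    """True if the point (lat, lon) lies inside DIVA's available zone."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


if __name__ == "__main__":
    print(in_coverage(41.81, 12.68))   # Frascati, Italy -> True
    print(in_coverage(40.71, -74.01))  # New York City, USA -> False
```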
Used model¶
DIVA only uses open-source European large language models. The current version relies mainly on Mistral-7B-v1-Instruct and on the NLP libraries spaCy and NLTK.
License¶
This project is licensed under the Apache License 2.0. See the LICENSE file for more details.