xdggs-dggrid4py: A xdggs plugin for IGEO7 DGGS
xdggs-dggrid4py aims to provides IGEO7 DGGS indexing for xarray datasets through xdggs. It consists of 2 main functions:
Easy access to IGEO7 indexed xarray datasets, including query the datasets with points in WGS84, or zone ID.
Convert raster dataset in GeoTiff format to IGEO7 indexed xarray datasets
Converting a GeoTiff into an IGEO7 DGGS indexed xarray dataset
There are many ways to convert a GeoTiff into a DGGS indexed dataset. The main idea is to assign a pixel with a DGGS zone ID.
Nearest zone centroid: Assign the pixel to the nearest zone centroid, measured by the CRS supported by the DGGS. For IGEO7 DGGS, it uses WGS84.
Zonal statistics: Assign pixels that are within a zone of the DGGS.
Weighted average: Overlay the DGGS zone’s grid on the raster grid, then, for each zone, calculate the percentage of area overlapped with pixels as the weights. Finally, calculate the zone’s value using the weights and raster value.
General steps of the nearest-neighbourhood method.
This notebook demonstrates conversion using the nearest neighbourhood method with IGEO7 DGGS.
The first step is to determine the refinement level to use. It affects the number of rows of the dataset after conversion.
For example, if the refinement level approximately matches the spatial resoultion (in square meters), then the expected number of rows in the resulting dataset should be roughly equal to the number of zones of the region being converted. But if a coarser refinement level is used, the number of rows in the converted result should equal the number of pixels.
Then, generate the zone centroids together with ID that exist in the region of the GeoTiff
Assign each pixel to the nearest zone by measuring the distance between the pixel and zone centroids.
Demonstration
Install the package from pypi or install it from GitHub
1# install it from pypi :
2# !pip install xdggs-dggrid4py
3# or install it from GitHub for the lastest commits.
4# pip install git+https://github.com/LandscapeGeoinformatics/xdggs-dggrid4py.git
5
6#At the time of preparing this notebook, a bug was found in dggrid4py (https://github.com/allixender/dggrid4py/issues/46)
7# that affect some functions of xdggs (e.g. `sel_lonlat`). It was fixed in the main branch, but has not yet been built into a release.
8# It needs to install the dggrid4py from git.
9#!pip uninstall -y dggrid4py
10#!pip install git+https://github.com/allixender/dggrid4py.git
Import libraries
1import xarray as xr
2import rioxarray as rio
3import os
4import warnings
5warnings.filterwarnings("ignore")
6
7# xdggs-dggrid4py require both dggrid4py and DGGRID installed
8# export the path of DGGRID binary with the environment variable "DGGRID_PATH"
9os.environ['DGGRID_PATH']= "/home/dick/micromamba/envs/xdggs/bin/dggrid"
Using rioxarray to load a GeoTiff into xarray dataset
In this example, we use a sample collection prepared by the Landscape and Geoinformatics Lab of the University of Tartu, which is available on Zenodo.
The collection consists of three GeoTiff files (DEM, TWI, Slope) located around Elva, Estonia.
1# Using DEM sample to demonstrate the conversion, you can also try with others sample using the following links
2# Slope: https://zenodo.org/records/18877265/files/est_topo_slope_10m_clipped.tif?download=1
3# TWI: https://zenodo.org/records/18877265/files/est_topo_twi_10m_clipped.tif?download=1
4
5tiff_path = "https://zenodo.org/records/18877265/files/est_topo_dem_10m_clipped.tif?download=1"
6raster_ds = rio.open_rasterio(tiff_path, band_as_variable=True, masked=True)
7raster_ds
<xarray.Dataset> Size: 8MB
Dimensions: (x: 1779, y: 1081)
Coordinates:
* x (x) float64 14kB 6.334e+05 6.335e+05 ... 6.512e+05 6.512e+05
* y (y) float64 9kB 6.468e+06 6.468e+06 ... 6.457e+06 6.457e+06
spatial_ref int64 8B 0
Data variables:
band_1 (y, x) float32 8MB ...
Attributes:
AREA_OR_POINT: AreaUsing xdggs-dggrid4py regridding module to convert Raster dataset with the nearest zone centroid method.
xdggs-dggrid4py follows the general steps described above for the nearest centroid method, it used S2PointIndex from pys2Index to calculate the distance between pixel coordinates and the zone centroids in WGS84.
Import the conversion method
xdggs-dggrid4py.regridding.centroid_based.nearestcentroid: nearest centroid method implementationxdggs-dggrid4py.regridding.centroid_based.mapblocks_nearestcentroid: nearest centroid method implementation for parallelism
Import the regridding method
xdggs-dggrid4py supports conversion with parallelism or not.
xdggs_dggrid4py.regriddingregridding: perform regridding with a single processmapblocks_regridding: perform regridding with mapblocks parallelism
Mapblocks parallelism
To run conversion in parallel, it first partitions the GeoTiff into smaller blocks. Partition is done using xarray chunks; each block is then processed individually, as described in the general steps. Then, it combines the results from each block to form the final dataset to return. The parallelism is achieved using dask.array.map_blocks.
Input parameters for regridding:
ds: the xarray dataset to be convertedgrid_name: the name of DGGS to be used for conversion.IGEO7method: method that used for regriddingcoordinates: the coordinate variables name of the ds in list. Default to[x, y]original_crs: CRS in wkt format. xdggs-dggrid4py will use the CRS fromcrs_wktif it exist in the spatial_ref ofds. Otherwise, using the wkt supplied. Defualt toNone.refinement_level: An integer to indicate at which refinement level to convert. Default to-1If set to
-1. It automatically calculates the finest refinement level that matches the spatial resolution. It is done by calculating the surface area of a sphere that represents the GeoTiff region, then dividing that area by the number of pixels to obtain the average area per pixel. The average area per pixel is used to determine the refinement level to be used.
wgs84_geodetic_conversion: Default toTrue.convert the input clip bounds from WGS84 geodetic coordinates to the authalic sphere coordinates.
convert hexagon centroids from the authalic sphere to WGS84 geodetic before assigning pixels to zone centroids.
dggs_vert0_lon: The longitude of the initial vertex. It is set to 11.20 by default; users can change it to other values by passing the parameter to the function. Default to11.20zone_id_repr: Which representation is used for the zone ID, it supports :['int', 'hexstring', 'textual'], default to ‘int’.
1%%time
2
3# To perform conversion using nearest centroid mehtod as a single process
4refinement_level=11
5from xdggs_dggrid4py.regridding.methods.centroid_based import nearestcentroid
6from xdggs_dggrid4py.regridding import regridding
7igeo7_indexed_ds = regridding(ds=raster_ds,
8 grid_name='igeo7',
9 method='nearestcentroid',
10 refinement_level=refinement_level,
11 wgs84_geodetic_conversion=True,
12 zone_id_repr='int')
13
14# To perform conversion using nearest centroid mehtod with mapblocks parallelism
15# Try with auto finer refinement level (-1), uncomment below to run
16#
17#from xdggs_dggrid4py.regridding.methods.centroid_based import mapblocks_nearestcentroid
18#from xdggs_dggrid4py.regridding import mapblocks_regridding
19#refinement_level=-1
20#raster_ds = raster_ds.chunk({'x':200, 'y':200}) # create partitions using chunks
21#igeo7_indexed_ds = mapblocks_regridding(ds=raster_ds,
22# grid_name='igeo7',
23# method='mapblocks_nearestcentroid',
24# refinement_level=refinement_level,
25# wgs84_geodetic_conversion=True,
26# zone_id_repr='int')
27igeo7_indexed_ds
Registered regridding method centerpoint
Registered regridding method nearestcentroid
Registered regridding method mapblocks_nearestcentroid
xdggs_dggrid4py.utils area of extent (km^2): 206.60011408860592
xdggs_dggrid4py.utils average area per square grid (km^2): 0.00010743082602019237
CPU times: user 12.4 s, sys: 456 ms, total: 12.8 s
Wall time: 15.6 s
<xarray.Dataset> Size: 23MB
Dimensions: (zone_id: 1923099)
Coordinates:
* zone_id (zone_id) int64 15MB 18804137500082175 ... 18669025747795967
spatial_ref int64 8B 0
Data variables:
band_1 (zone_id) float32 8MB 39.01 39.21 39.33 ... 109.7 110.0 110.2
Attributes:
AREA_OR_POINT: AreaNow, the 2D dataset is converted into a 1D dataset indexed by IGEO7 DGGRS. The related grid info is stored as the zone_id attributes.
1# Apply mean aggregation to the dataset by zone ID
2igeo7_indexed_ds = igeo7_indexed_ds.groupby('zone_id').mean()
3igeo7_indexed_ds
<xarray.Dataset> Size: 92kB
Dimensions: (zone_id: 7664)
Coordinates:
* zone_id (zone_id) int64 61kB 18667920867459071 ... 18805349754601471
spatial_ref int64 8B 0
Data variables:
band_1 (zone_id) float32 31kB 73.59 73.35 74.41 ... 68.39 68.5 68.68
Attributes:
AREA_OR_POINT: AreaOptional, create multi-refinement level dataset using IGEO7 z7 hierarchical structure
z7 indexing provides a hierarchical structure between refinement levels, known as the 1 to 7 parent-child hierarchy. Using the zone ID, it is easy to calculate the parent zone at a coarser refinement level it belongs to. Therefore, using this relation, it is simple to perform aggregation at different refinement levels for a multi-refinement level dataset.
1import numpy as np
2# function to get the parent zone id of the input zone ID
3def get_parent_z7int(z7_int_zone_id, refinement_level=int):
4 binary_repr = np.binary_repr(z7_int_zone_id, width=64)
5 binary_repr = binary_repr[:(refinement_level*3)+4]
6 binary_repr = binary_repr.ljust(64, '1')
7 return int(binary_repr,2)
1# Uisng xarray datatree to store dataset at each refinement level as a group
2ds_datatree = xr.DataTree(name="Mutli_refinement_levels_datatree")
3
4import time
5import zarr
6encode = {}
7compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=2)
8for rf_level in range(1, refinement_level+1):
9 s = time.time()
10 #print(f'working on refinement level {rf_level}')
11 tmp = igeo7_indexed_ds.copy()
12 tmp['zone_id'] = xr.apply_ufunc(get_parent_z7int, tmp.zone_id, dask='parallelized', vectorize=True,
13 kwargs={'refinement_level': rf_level})
14 tmp = tmp.groupby('zone_id').mean(skipna=True)
15 tmp = tmp.chunk('auto')
16 enc = {x: {"compressor": compressor} for x in list(tmp.data_vars.keys())}
17 enc.update({"zone_id": {"compressor": compressor}})
18 ds_datatree = ds_datatree.assign({f"refinement_level_{rf_level}": xr.DataTree(dataset=tmp)})
19 encode.update({f"/refinement_level_{rf_level}": enc})
20 print(f'done refinement level {rf_level} ({time.time()-s})')
21ds_datatree.to_zarr(f'./multi_refinement_level_ds.zarr', encoding=encode)
done refinement level 1 (0.014177799224853516)
done refinement level 2 (0.024408817291259766)
done refinement level 3 (0.03421354293823242)
done refinement level 4 (0.008975744247436523)
done refinement level 5 (0.008719444274902344)
done refinement level 6 (0.008643150329589844)
done refinement level 7 (0.008930683135986328)
done refinement level 8 (0.009763240814208984)
done refinement level 9 (0.010070085525512695)
done refinement level 10 (0.010570526123046875)
done refinement level 11 (0.011226892471313477)
<xarray.backends.zarr.ZarrStore at 0x7f2206e1b880>
Accessing IGEO7 indexed DGGS xarray datasets using xdggs-dggrid4py
After the conversion is done, the dataset is indexed with IGEO7 DGGS. However, the index doesn’t mean anything to others. Therefore, xdggs-dggrid4py provides the IGEO7Index as an interface between the user and the IGEO7 DGGS index. It is implemented as a plugin of xdggs.
1# import xdggs and register IGEO7Index by importing the IGEO7Index
2import xdggs
3from xdggs_dggrid4py.index import IGEO7Index
1# since the xdggs convention uses the name `cell_ids` as the DGGS zone ID, we have to rename the `zone_id` to `cell_ids`
2igeo7_indexed_ds = igeo7_indexed_ds.rename({'zone_id': 'cell_ids'})
Create a IGEO7Index instance for the IGEO7 DGGS dataset
Invoke the xdggs.decode function to create an IGEO7Index index object for the dataset
From the output, the
decodefunction created an IGEO7Index object using the attributes fromcell_ids(zone_id)
1igeo7_indexed_ds = igeo7_indexed_ds.pipe(xdggs.decode)
2igeo7_indexed_ds
<xarray.Dataset> Size: 92kB
Dimensions: (cell_ids: 7664)
Coordinates:
* cell_ids (cell_ids) int64 61kB 18667920867459071 ... 18805349754601471
spatial_ref int64 8B 0
Data variables:
band_1 (cell_ids) float32 31kB dask.array<chunksize=(144,), meta=np.ndarray>
Indexes:
cell_ids IGEO7Index(level=11)
Attributes:
AREA_OR_POINT: AreaUsing IGEO7Index to access the dataset
xdggs provides several accessor functions through
<dataset>.dggssel_latlon:cell_boundaries:explore:
1lats = [58.28459946, 58.27637829, 58.26980135]
2lons = [26.41758320, 26.42799669, 26.36551576]
3igeo7_indexed_subset = igeo7_indexed_ds.dggs.sel_latlon(lats, lons)
4igeo7_indexed_subset
[########################################] | 100% Completed | 103.37 ms
[########################################] | 100% Completed | 203.46 ms
<xarray.Dataset> Size: 44B
Dimensions: (cell_ids: 3)
Coordinates:
* cell_ids (cell_ids) int64 24B 18802036187332607 ... 18801924518182911
spatial_ref int64 8B 0
Data variables:
band_1 (cell_ids) float32 12B dask.array<chunksize=(3,), meta=np.ndarray>
Indexes:
cell_ids IGEO7Index(level=11)
Attributes:
AREA_OR_POINT: Area1%%time
2igeo7_indexed_ds['band_1'].compute().dggs.explore()
[########################################] | 100% Completed | 433.60 ms
[########################################] | 100% Completed | 533.49 ms
[########################################] | 100% Completed | 4.43 ss
[########################################] | 100% Completed | 4.44 s
CPU times: user 5.18 s, sys: 101 ms, total: 5.28 s
Wall time: 5.51 s