Satellite gravimetry observations as provided by the GRACE and GRACE-FO missions have revolutionized our understanding of the global water cycle under climate change. However, the available time span of slightly more than 20 years is still relatively short for isolating long-term climate-related signals such as trends or changes in the frequency of extreme events, as these signals may still be masked by dominant interannual (natural) variations in the water storage time series. Therefore, several approaches have been introduced in recent years to extend the GRACE/GRACE-FO data record into the past by exploiting additional data sets and innovative methodology, e.g. based on machine learning.
In this study, we present a reconstruction of terrestrial water storage (TWS) as observed by GRACE, using a modified spatio-temporal graph neural network tailored for multivariate time series (Wu et al., 2020). Input features include multiple climate and hydrological variables from the ERA5 reanalysis, such as precipitation, evapotranspiration and runoff. The model architecture combines graph convolution modules to capture spatial dependencies with temporal convolution modules to learn temporal patterns. A purpose-built adjacency matrix encodes relationships between regions based on both geographic distance and similarity of the historical time series. In contrast to traditional deep learning approaches that rely on large input matrices, our method exploits the inherent efficiency of graph-based data structures by explicitly encoding the time series of each feature within individual nodes.
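The abstract does not specify how the two relationship types are combined in the adjacency matrix; a minimal NumPy sketch of one plausible construction is shown below, mixing a Gaussian kernel on geographic distance with clipped Pearson correlation of the historical time series. All function names, the equirectangular distance approximation, and the weighting scheme `alpha` are illustrative assumptions, not the authors' design.

```python
import numpy as np

def build_adjacency(coords, series, alpha=0.5, length_scale=1000.0):
    """Illustrative adjacency matrix combining geographic proximity
    and time-series similarity (both terms are assumptions).

    coords: (n, 2) array of lat/lon in degrees, one row per node/region
    series: (n, t) array, one historical time series per node
    """
    # Rough pairwise geographic distance in km (equirectangular approximation)
    lat = np.radians(coords[:, 0])
    lon = np.radians(coords[:, 1])
    dlat = lat[:, None] - lat[None, :]
    dlon = (lon[:, None] - lon[None, :]) * np.cos(0.5 * (lat[:, None] + lat[None, :]))
    dist_km = 6371.0 * np.sqrt(dlat**2 + dlon**2)
    geo = np.exp(-((dist_km / length_scale) ** 2))  # Gaussian distance kernel

    # Time-series similarity: Pearson correlation, negatives clipped to zero
    sim = np.clip(np.corrcoef(series), 0.0, 1.0)

    # Convex combination of the two relationship types
    A = alpha * geo + (1.0 - alpha) * sim
    np.fill_diagonal(A, 0.0)  # no self-loops
    return A
```

In a graph neural network, such a matrix would typically be row-normalised (or symmetrically normalised) before being used in the graph convolution.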
Lara Johannsen, Lukas Arzoumanidis, Youness Dehbi, Annette Eicker
HafenCity University Hamburg, Germany

Spatial data are characterised by intrinsic properties such as autocorrelation, scale dependence, and high heterogeneity. These characteristics require explicit consideration in the development of data-driven models. With the growing availability of spatial data sets, such as those derived from remote sensing, natural images, or textual sources with a spatial reference, not only the volume increases but also the variety of potentially combinable information. Manual labelling of such data for specific tasks is resource-intensive. Self-supervised learning provides a viable approach in this context: it facilitates the automatic generation of suitable representations from large volumes of unlabelled data and, similar to foundation models, serves as a basis for adaptable, cross-domain downstream tasks.
Approaches such as SatCLIP, GeoCLIP, and CSP have effectively demonstrated the contrastive coupling of two modalities, namely image data and spatial position, within a unified embedding space. Although these two-modality models succeed in reducing the need for labelling, their representational capacity remains constrained, as they capture only part of the information available for a location.
This study explores the integration of multiple modalities into a shared embedding space through contrastive learning, utilising position as a binding element. The modalities include texts from Wikipedia, multispectral Sentinel-2 imagery, and geolocations. To achieve this, three model architectures are developed: (1) an adapted version of ImageBind, (2) cyclic training employing a fixed position encoder, and (3) a modified NT-Xent loss that fuses data pairs from diverse modalities.
Preliminary training with the text dataset reveals that the generated representations exhibit properties distinct from those of image-focused models. The evaluation is based on classification- and regression-based downstream tasks in order to systematically analyse the representation quality, generalisation ability and information content of the multimodal spatial embeddings.
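For context, the baseline that the modified loss mentioned above builds on is the standard NT-Xent (normalised temperature-scaled cross-entropy) loss. The multi-modality fusion described in the abstract is not reproduced here; the sketch below shows only the conventional two-view formulation, in NumPy, with paired embeddings from two modalities (e.g. text and imagery). Function name and temperature value are illustrative assumptions.

```python
import numpy as np

def nt_xent(z1, z2, tau=0.1):
    """Standard NT-Xent loss for a batch of paired embeddings.

    z1, z2: (n, d) L2-normalised embeddings; row i of z1 is the
    positive partner of row i of z2. All other rows act as negatives.
    """
    n = z1.shape[0]
    z = np.vstack([z1, z2])            # (2n, d) stacked views
    sim = z @ z.T / tau                # cosine similarities / temperature
    np.fill_diagonal(sim, -np.inf)     # exclude self-similarity from negatives
    # positive partner of sample i is i+n (and i-n in the second half)
    pos = np.concatenate([np.arange(n) + n, np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    loss = -(sim[np.arange(2 * n), pos] - logsumexp)
    return loss.mean()
```

Well-aligned pairs drive the positive similarity toward the top of the softmax and the loss toward zero, which is the behaviour a cross-modal fusion variant would also need to preserve.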