SpectralEarth is a large-scale, multi-temporal dataset designed to address the current lack of comprehensive, globally representative hyperspectral datasets, a gap that has limited the development of foundation models in hyperspectral remote sensing. It is intended for users in the AI and data mining domains who develop self-supervised and unsupervised learning algorithms for hyperspectral imagery.
The dataset leverages data from the Environmental Mapping and Analysis Program (EnMAP), a German hyperspectral satellite mission that monitors and characterizes Earth's environment on a global scale. EnMAP delivers accurate data that provides information on the status and evolution of terrestrial and aquatic ecosystems, supporting environmental monitoring, management, and decision-making.
SpectralEarth comprises 538,974 non-overlapping, non-georeferenced image patches extracted from 11,636 globally distributed EnMAP L2A scenes acquired between April 2022 and April 2024. The patches cover 415,153 unique locations; each patch is 128×128 pixels with 202 spectral bands (bands affected by water absorption are excluded). All patches were selected by manual visual inspection, keeping cloud cover below approximately 10%. In addition, 17.5% of the locations have observations at multiple timestamps, enabling multi-temporal hyperspectral analysis.
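The patch geometry described above can be illustrated with a short NumPy sketch. It assumes a band-last (height, width, bands) array layout and assumes that edge rows and columns too small to fill a complete patch are dropped. `tile_scene` is a hypothetical helper, not part of any SpectralEarth tooling, and the sketch does not reproduce the actual extraction pipeline or the manual cloud screening. The last lines derive two figures from the counts stated above.

```python
import numpy as np

PATCH_SIZE = 128  # patch height and width in pixels (from the dataset description)
NUM_BANDS = 202   # spectral bands kept after removing water-absorption bands


def tile_scene(scene: np.ndarray, patch_size: int = PATCH_SIZE) -> np.ndarray:
    """Cut an (H, W, B) scene into non-overlapping (patch_size, patch_size, B) patches.

    Hypothetical helper: incomplete patches at the bottom/right edge are
    discarded, which is an assumption about how edges are handled.
    """
    h, w, b = scene.shape
    n_rows, n_cols = h // patch_size, w // patch_size
    cropped = scene[: n_rows * patch_size, : n_cols * patch_size]
    # Split each spatial axis into (grid index, offset inside the patch),
    # then move both grid indices to the front.
    patches = cropped.reshape(n_rows, patch_size, n_cols, patch_size, b)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # Flatten the grid in row-major order: patch k = row * n_cols + col.
    return patches.reshape(n_rows * n_cols, patch_size, patch_size, b)


# Usage with a deliberately small, hypothetical scene (zeros as placeholder data).
scene = np.zeros((260, 390, NUM_BANDS), dtype=np.int16)
patches = tile_scene(scene)
print(patches.shape)  # (6, 128, 128, 202)

# Figures derived from the published counts.
total_patches = 538_974
unique_locations = 415_153
extra_patches = total_patches - unique_locations  # patches beyond one per location
multi_temporal_locations = round(unique_locations * 0.175)  # 17.5% is itself rounded
print(extra_patches)             # 123821
print(multi_temporal_locations)  # 72652
```

Read together, the two numbers suggest that the roughly 72,650 multi-temporal locations carry about 123,800 additional patches, or on average close to three timestamps each. This is an estimate only, because the 17.5% share is rounded.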
Labeled subsets are provided for downstream tasks covering land cover, crop type, and tree species classification. These are based on various geospatial products, including:
- CORINE Land Cover
- Cropland Data Layer (CDL)
- National Land Cover Database (NLCD)
- TreeMap, a tree species map of the continental US
- BD-Foret V2, a forest inventory product from the Institut Géographique National (IGN) France
- EuroCrops, derived from national agricultural inventories across European countries
- BNETD Land Cover Map for Ivory Coast
Two additional benchmarks are based on hyperspectral data from the DESIS and EO-1 Hyperion sensors, matched to CDL for crop-type classification. These labeled subsets serve as benchmarks for evaluating models pretrained on SpectralEarth.
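One common way to use such labeled subsets is to freeze a pretrained encoder, extract features, and fit a lightweight classifier on top. The sketch below illustrates that idea under stated assumptions. It assumes patch-level labels, and it uses a per-band spatial mean as a stand-in "encoder" together with a nearest-centroid classifier in place of a real pretrained model. All function names are hypothetical, and the data are synthetic. The actual benchmark protocol, splits, and metrics are not described on this page.

```python
import numpy as np


def spectral_mean_features(patches: np.ndarray) -> np.ndarray:
    """Stand-in encoder: average (N, H, W, B) patches over pixels -> (N, B) features.

    Hypothetical placeholder; a model pretrained on SpectralEarth would replace it.
    """
    n, b = patches.shape[0], patches.shape[-1]
    return patches.reshape(n, -1, b).mean(axis=1)


def nearest_centroid_fit(features: np.ndarray, labels: np.ndarray):
    """Compute one mean feature vector (centroid) per class."""
    classes = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in classes])
    return classes, centroids


def nearest_centroid_predict(features, classes, centroids) -> np.ndarray:
    """Assign each sample to the class whose centroid is closest (squared L2)."""
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return classes[dists.argmin(axis=1)]


# Synthetic example: two classes, class 1 brighter in every band.
rng = np.random.default_rng(0)
n, size, bands = 40, 8, 202
labels = np.repeat([0, 1], n // 2)
patches = rng.normal(loc=labels[:, None, None, None] * 1.0, scale=0.5,
                     size=(n, size, size, bands))

feats = spectral_mean_features(patches)
train = np.arange(n) % 2 == 0  # even indices for fitting
test = ~train                  # odd indices for evaluation
classes, centroids = nearest_centroid_fit(feats[train], labels[train])
pred = nearest_centroid_predict(feats[test], classes, centroids)
accuracy = (pred == labels[test]).mean()
print(f"overall accuracy: {accuracy:.2f}")
```

A frozen-feature probe like this isolates the quality of the pretrained representation, because only the small classifier is fitted on the labeled subset.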
The SpectralEarth dataset collection is a large-scale, multi-temporal dataset designed for users in the AI and data mining domains to pretrain hyperspectral foundation models. It is available only through our download service.
2022-10-10