NEON and GBIF: Partners in Biodiversity
September 8, 2021
Looking for some highly standardized U.S. zooplankton data to compare with other samples across the world? Maybe you are interested in studying whiskers from deer mice or DNA from mosquitos dispersed globally?
Each year, NEON collects and archives more than 100,000 biological specimens and samples that complement the field observations and automated measurements collected at our field sites. These samples represent a rich resource unique among natural history collections due to NEON's continental- and decadal-scale ecology. These archival samples are available upon request from the NEON Biorepository, and now NEON biological sample data can also be found in the Global Biodiversity Information Facility (GBIF) network. The partnership allows NEON data to be discovered and used alongside similar historical and global datasets, which benefits both current NEON data users and the international science community at large.
GBIF: Research Infrastructure on a Global Scale
GBIF is an international network and research infrastructure funded by the world's governments and aimed at providing anyone, anywhere, open access to data about all types of life on Earth. GBIF is a bit like Google for biodiversity data; it aggregates multiple sources of biodiversity data into one centralized, searchable data repository. So, if you want to see all available mosquito records from all participating networks, for example, a quick search will return 1.4 million records with associated GPS coordinates.
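For readers who prefer to script that kind of search, here is a minimal Python sketch against GBIF's public occurrence search API. It is an illustration under stated assumptions rather than an official NEON or GBIF workflow: the helper names `build_search_url` and `count_records` are hypothetical, and the choice of the mosquito family name "Culicidae" as the search term is ours. The count returned will differ from the figure quoted above as GBIF continues to grow.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Public GBIF occurrence search endpoint.
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def build_search_url(scientific_name, has_coordinate=True, limit=0):
    """Build an occurrence search URL (hypothetical helper).

    limit=0 asks only for the total record count, not the records.
    """
    params = {
        "scientificName": scientific_name,
        "hasCoordinate": str(has_coordinate).lower(),
        "limit": limit,
    }
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def count_records(scientific_name):
    """Return the number of georeferenced records matching the name.

    Requires network access; the result changes as new data are published.
    """
    with urlopen(build_search_url(scientific_name), timeout=30) as response:
        return json.load(response)["count"]


if __name__ == "__main__":
    # Assumption: mosquitoes are searched by their family name, Culicidae.
    print(count_records("Culicidae"))
```

Separating URL construction from the network call keeps the query easy to inspect, or to paste into a browser, before any data are downloaded.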
As an aggregator, GBIF provides a wealth of options for researchers. More than 2,000 publishers have contributed data to GBIF, including many natural history museums and conservation societies around the planet. These records are combined into a rich global database with more than 61,000 datasets, from eBird observations generated by citizen scientists to geographically tagged DNA sequences from the European Nucleotide Archive.
Data from the NEON Biorepository have joined this rich trove of ecological datasets. All records associated with NEON biological samples will be uploaded and published to the GBIF data repository, making them more accessible and searchable for the international research community.
NEON Data on the GBIF Platform
NEON is partnering with GBIF to automate the data ingestion process for the GBIF platform. Currently, there are over 145,000 occurrences of NEON specimen data on the platform, from zooplankton in Puerto Rico to terrestrial invertebrates in Alaska. There are also 35 published datasets that include data on mosquito DNA, ear tissue samples from small mammals, tree canopy foliage and litter, ground beetle trap bycatch, and so much more.
GBIF also tracks publications that use data published on its site, powered by its innovative use of digital object identifiers (DOIs) for each downloaded dataset (and, of course, by data users who cite the DOIs in their publications!). Tracking publications and usage of NEON data is critical to understanding how and where our ecological data are currently used and what we can do to improve them over the lifetime of the Observatory. Currently, GBIF shows 46 citations in the literature that use data NEON has published to its site.
"Considering that NEON has only been publishing sample data to GBIF since 2019, 46 peer-reviewed papers already is a good showing," says Dr. Katie LeVan, formerly a research scientist at NEON involved with insect ecology and data science. "It's expected to go up and up."
GBIF also publishes an annual Science Review, which identifies research uses and citations of biodiversity information accessed through GBIF's global infrastructure. The peer-reviewed articles summarized in its pages offer a partial but instructive view of research investigations that have been enhanced and supported by open-access data that GBIF members and publishers make available through the network.
The recently published 2020 report—which highlights nearly 70 of the 743 papers recorded in the GBIF literature-tracking program from 2019—includes at least ten papers that cite NEON specimens stored in the Biorepository.
NEON already contributes a disproportionate number of occurrences of carabids in the GBIF records, says LeVan. "For a lot of the taxa accessed through GBIF, NEON is becoming an outsized contributor," she continues. "NEON is really different from the Smithsonian or other collections in that it has digitized nearly everything from the start. Other institutions may have tons and tons more archival material, but it can be hard to know what's available or accessible when so much of that is not online. NEON samples are digitized from day one of collection all the way to final curation."
The Future of GBIF and NEON
The NEON Biorepository at Arizona State University manages the collections data in the Symbiota platform, which includes a publication pipeline to GBIF. GBIF has been working to expand its database to all sampling event data rather than strictly specimen-based occurrence data. To that end, NEON staff have been working with Abigail Benson, United States Geological Survey (USGS) scientist and GBIF North American Node Manager, to develop a pipeline to publish NEON occurrences that are not associated with a physical sample onto GBIF. Benson is part of the GBIF network of Node Managers throughout the world who facilitate data publication for contributors in their region.
The GBIF initiative is just one way that the NEON program is working to make data more accessible and findable for the research community. A similar agreement with the Environmental Data Initiative (EDI), which aggregates data from the Long Term Ecological Research (LTER) Network community and other networks, will make it easier to search and analyze data from the NEON program alongside data from other participating organizations.
Aggregating data from multiple sources allows researchers to compare datasets from different time periods and different parts of the world to ask and answer a broader range of ecological questions. With GBIF, data from the NEON Biorepository can now be searched and compared alongside similar datasets from other global biomes or from historical records kept by natural history museums. This puts the NEON biodiversity data into a broader historical and global context. As we move forward, the NEON program will continue to look for ways to make our data more findable, usable, and useful for the science community.