NEON Ambassador Workshop Series Puts a Spotlight on Derived Data Products
October 5, 2023
The NEON Ambassador Program brings together a diverse cohort of researchers and educators to promote the use of NEON data and grow the NEON user community. This past July, the Ambassadors convened a virtual workshop focusing on NEON derived data products. Participants are now working on a summary paper to outline best practices in creating, validating, and using derived data products. It's one more way the NEON Ambassadors are helping to make NEON data more usable and understandable for the wider research community.
What's a Derived Data Product?
A derived data product is a processed, summarized, combined, or modeled dataset created from primary data collected at NEON sites. While initial data collection captures the immediate observations and measurements (e.g., temperature, soil moisture, or species identified), derived data products process these measurements further to provide insights or summaries that are often more useful for researchers. For example:
- NEON's airborne remote sensing data, which includes high-resolution spectrometer data, can be used to compute various vegetation indices such as NDVI (Normalized Difference Vegetation Index).
- Sensor data from the NEON flux towers and soil arrays can be used to derive insights into soil-atmosphere interactions, nutrient cycling, and gas fluxes.
- Tree girth measurements and remote sensing data can be used to extrapolate total vegetation biomass at a plot or site.
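To make the first example concrete, NDVI is conventionally computed from near-infrared (NIR) and red reflectance as (NIR - Red) / (NIR + Red). The sketch below is a minimal illustration of that standard formula, not NEON's processing code; the function name and the reflectance values are hypothetical.

```python
def ndvi(nir, red):
    """Return NDVI for one pixel, or None when both reflectances are zero."""
    total = nir + red
    if total == 0:
        return None  # NDVI is undefined when NIR + Red is zero
    return (nir - red) / total


# Hypothetical reflectance pairs (NIR, red) on a 0-1 scale, not real NEON data.
pixels = [(0.50, 0.10), (0.30, 0.20), (0.0, 0.0)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
print(ndvi_values)
```

Values near 1 generally indicate dense green vegetation, while values near 0 or below indicate sparse vegetation, bare ground, or water.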
Dr. Danica Lombardozzi, a professor at Colorado State University in ecosystem science and sustainability and a project scientist at the National Center for Atmospheric Research (NCAR), says, "NEON provides a ton of data that are useful in a lot of capacities. But most of the time, the data needs additional processing to be useful for a specific research case. Derived data takes that raw data and curates or transforms it into something more usable for a research need."
One form of derived data simply fills gaps in a data series so that the series can be used for a specific purpose. For example, if instruments fail at specific times or a model needs data at a different time step, an algorithm can fill the gaps and prepare the data for further processing. Other derived products are created by partitioning, such as estimating the separate flows of carbon dioxide into and out of an ecosystem from atmospheric gas concentration measurements. Multiple data products may also be combined in various ways to produce new data products.
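As a rough illustration of gap filling, the sketch below fills missing values in an evenly spaced series by linear interpolation between the nearest valid neighbors. This is only one simple approach, chosen here for illustration; the article does not say which algorithms are used for NEON data, and the function name and sample values are hypothetical.

```python
def fill_gaps(values):
    """Linearly interpolate None entries that lie between two known values.

    Assumes evenly spaced time steps. Gaps at the start or end of the
    series are left as None because there is nothing to interpolate from.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


# Hypothetical half-hourly temperatures with a two-step sensor outage.
series = [10.0, None, None, 16.0, None]
print(fill_gaps(series))
```

Linear interpolation is usually reasonable only for short gaps in slowly varying measurements; longer gaps typically call for a model-based method.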
The NEON data portal includes a number of derived data products. All raw NEON data (Level 0 data) is processed to some extent for quality control and consistency before it is added to the NEON data portal. Higher levels of data processing are used to derive additional data products. Level 4 data, the most highly processed data in the NEON data portal, includes derived products such as stream discharge rates (derived from surface water elevation and a stage-discharge rating curve) and carbon dioxide flux between the atmosphere and surface (derived from wind and gas concentration measurements).
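To show how a stage-discharge rating curve turns a water level into a flow estimate, the sketch below uses a common power-law form, Q = a * (h - h0)^b. This form is an assumption made for illustration, not necessarily the curve NEON fits, and the coefficients a, h0, and b are hypothetical; in practice they are fitted per site from paired stage and discharge measurements.

```python
def discharge(stage_m, a=2.0, h0=0.1, b=1.5):
    """Estimate discharge (m^3/s) from stage (m) with a power-law rating curve.

    a, h0, and b are hypothetical coefficients, not values from NEON.
    h0 is the stage at which flow ceases.
    """
    if stage_m <= h0:
        return 0.0  # water level at or below the zero-flow stage
    return a * (stage_m - h0) ** b


# Hypothetical surface water elevations relative to the gauge datum.
for stage in (0.05, 0.6, 1.1):
    print(stage, round(discharge(stage), 3))
```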
Additional derived data products have been developed by the NEON user community. Some of these derived products are available through the Environmental Data Initiative (EDI) and other collaborative initiatives. However, the possibilities for derived data are nearly limitless, and the derived products available through EDI or other collaborations are only the tip of the iceberg. Many other researchers have created derived products of their own to explore a variety of research questions.
Dr. Andrew Richardson, a professor in ecosystem ecology at Northern Arizona University, explains, "A lot of people don't understand that NEON only takes the measurements. There are lots of other things that can be done with the data. But there is no reason for everyone who wants to use data derived from NEON tower measurements, for example, to have to go through all the same processing and quality control and gap-filling steps. If you want to use those data, and I want to use those data, and other people want to use those data, it's a waste of time for everyone to do the same calculations. So, our idea is that by doing that extra level of processing, putting stuff out there in a more polished and usable form, it just makes it easier for people to use NEON data."
Developing Standards for Derived Data Products
The July 2023 workshop was conceived to address questions and challenges around derived data products. The four-day virtual workshop, hosted by NEON and Knowinnovation, was organized by Lombardozzi, Richardson, and NEON Ambassadors Kelly Aho, Jeff Atkins, and Jessie Walker. Twenty applicants were selected to participate, many of them early-career scientists. Participants were selected to represent a diverse range of backgrounds, perspectives, and research interests. Lombardozzi says, "We tried to select people who had some experience with NEON data and had some interest in moving beyond the basics of what NEON data are to really explore the uses of the derived data products."
Knowinnovation facilitated the virtual sessions for the first three days of the workshop. Participants broke into small groups to explore potential opportunities for derived data products based on NEON data and to identify standards and best practices for creating, assessing, and validating user-generated derived products. Several groups developed ideas for new derived data products they would like to create. These included:
- A data product collating NEON organismal trait data from multiple trophic levels at comparable spatial and temporal scales.
- An RShiny application that allows users to estimate ecosystem responses to fire at NEON sites.
- Maps showing different kinds of disturbances at NEON sites using RShiny or Google Earth Engine.
On day four, more than a dozen participants came together to discuss the creation of a summary paper that will provide guidelines for developing and distributing NEON-derived data products, including quality standards. The paper will be submitted to an ESA journal for peer review later this year. Organizers hope the paper will help others interested in creating and sharing derived data products for the benefit of the NEON user community.
They hope to see other outcomes from the workshop as well. For example, some participants are interested in organizing a hackathon for collaborative work on some of the new derived data products suggested at the workshop. Ultimately, they hope to see several new derived data products available through EDI. Richardson says, "The goal is to have products indexed and easily accessible for people. We want them to be easy to find and easy to use." There is also interest in creating educational materials for teachers in higher education and K-12 around NEON-derived data products.
About the NEON Ambassador Program
The Derived Data Workshop would not have been possible without the dedication and expertise of our NEON Ambassadors. The Ambassador Program was created to engage the NEON user community and promote the use of NEON data within the scientific community and beyond. Ambassadors have the opportunity to work with like-minded peers representing a diverse range of disciplines, backgrounds, and institutional affiliations. They receive support from NEON staff and access to a dedicated community toolkit with resources designed by and for NEON Ambassadors.
The launch of NEON Ambassadors 2.0 is in the works! Contact ambassador@battelleecology.org with questions, or check https://www.neonscience.org/neon-ambassador-program for updates as they become available.