Case Study
Building a Forecasting Community: The EFI-RCN NEON Forecasting Challenge
June 9, 2022
Think you can predict what NEON data will show before they are collected? That's exactly what participants in the Ecological Forecasting Initiative Research Coordination Network (EFI-RCN) NEON Ecological Forecasting Challenge were invited to do.
Teams and individuals are creating ecological forecasts around five themes: terrestrial water and carbon fluxes, beetle communities, tick populations, aquatic ecosystems, and plant phenology. The goal? To build a community of practice to share forecasting methods and approaches, and to help those new to forecasting develop data skills.
In this two-part blog series, we will introduce some of the participants and explore the lessons learned from the first year of the challenge.
About the Challenge
The EFI-RCN NEON Ecological Forecasting Challenge, supported by funding from the National Science Foundation (NSF), was established in late 2020 through a collaboration between EFI and NEON. It is open to individuals and teams of all backgrounds and experience levels, from students to veteran researchers. Participants are challenged to build forecasting models that use past data from the NEON program, which has a 30-year timeline, to successfully predict future NEON data. EFI anticipates continuing the annual challenge at least through 2024. Those interested in participating in a future challenge can find more information on the EFI Forecasting Challenge website.
The second round of the Challenge in 2022 is organized around five themes:
- Aquatic Ecosystems (temperature, dissolved oxygen, and chlorophyll-a) at two lake and three river/stream NEON sites
- Terrestrial Carbon and Water Fluxes (net ecosystem exchange of CO2 and evapotranspiration) at ten terrestrial NEON sites
- Tick Populations (abundance of nymphal Amblyomma americanum, the lone star tick) at nine terrestrial NEON sites
- Plant Phenology (daily greenness and redness of plants throughout the year) at ten deciduous broadleaf forest, six grassland, and two shrubland NEON sites
- Beetle Communities (beetle abundance and species richness) at 47 terrestrial sites
Lessons Learned from Round 1
Some of our participants shared their stories and advice for future forecasters.
Tick Populations
Dr. Nick Clark is a Research Fellow at the University of Queensland in Australia with a background in ecology and quantitative genetics. He got interested in forecasting after attending an EFI workshop in 2019 and submitted his forecast this year as an individual. He says, "I had done a little modeling work in ecology, but not forecasting. EFI was a really useful resource to learn more about forecasting methods and meet other people in the field. I was really sold by the idea of using a challenge to accelerate learning."
Clark chose the tick challenge because of his previous research in the spread of tick-borne diseases, particularly in pet animals. He used a simple model that looks only at past tick data without trying to incorporate ecological theory or other data. Using time-series data from past seasons, he was able to create a forecast for nymph abundance at NEON field sites. His forecast performed reasonably well for sites where tick abundance is relatively stable, but did not predict more drastic changes seen at some sites. "I would hope that as we add environmental variables, the forecasts would do better," he says. "We need to understand what caused those drastic changes. But it was really interesting to see where the model did OK and where it failed spectacularly."
While his model did not prove accurate for all sites, for Clark, that was not the point. He says, "My goal was simply to say that if I only had access to this time series, could I use that to produce a reasonable forecast? I'm learning how to deal with challenges with time-series data and how to apply different kinds of models I found in the literature." He discovered that many general forecasting models do not translate well to ecology because of the unique challenges of working with ecological time-series data, which tend to be more complex (with more seasonal variation, for example) and messy (with missing data due to field conditions, differences in collection methods, etc.). He appreciated working with the NEON data, which provide more consistent and usable time-series data than are often available to ecologists.
Clark is continuing to play with different forecasting models and would like to try some more complex models, including models that do a better job of integrating seasonal variation and potential drivers such as climate data. He now has a graduate student working with him. They are developing some R packages that can handle ecological time-series data that they hope will lead to better benchmark models. He may also use the EFI-RCN NEON Forecasting Challenge with students in the future. Currently, he is working on developing models that could provide forecasts for multiple outcomes and species of interest.
Aquatic Ecosystems
Pricilla Ceja, a senior at Cal Poly Humboldt majoring in zoology, first heard about the challenge from her advisor, Dr. Nievita Bueno Watts. She quickly joined forces with Briana Ruiz, another Cal Poly Humboldt undergrad, and Alyssa Willson, a Ph.D. candidate at the University of Notre Dame. The team took on the dissolved oxygen portion of the aquatics challenge. For Ceja, the experience was a first introduction to ecological forecasting. She says, "The challenge was a good way to test my skills and apply what I'd learned from the EFI videos I watched in the spring."
Willson has been involved with EFI since she began her Ph.D. program. She served as a student representative on an RCN committee for the challenge. She says, "I've been particularly interested in the part of the Ecological Forecasting Initiative that is concerned with making data science and forecasting more accessible to a broader range of students. The challenge is great, because what better way to learn forecasting than to actually create a forecast with data and infrastructure that is already in place? You're not starting from scratch, wondering where to find data and which data will be useful. With the NEON data products, all of that is in place, so you can just focus on creating the forecast." She connected with Dr. Watts through a mutual collaborator at EFI. When Watts was asked if she would be an advisor for Ceja and Ruiz on the project, she jumped at the chance.
Working with data from NEON's Lake Barco (BARC) site, the team developed a forecasting model for dissolved oxygen that uses atmospheric pressure as a covariate. Ceja explains, "We saw that there was a positive correlation with atmospheric pressure and dissolved oxygen: when there's a lot of atmospheric pressure, there should be a lot of dissolved oxygen. And so we wanted to see if making a forecast with atmospheric pressure as a covariate for dissolved oxygen gave us any information."
Ultimately, they learned that using atmospheric pressure as a covariate does not improve their ability to predict dissolved oxygen for the lake; their forecast performed similarly to the null model, a simple random walk state-space model. Willson says, "I don't think creating a highly accurate forecast was our real goal here. The question was more, 'can undergraduate students with no prior knowledge of forecasting methods successfully make a forecast that involves coming up with a hypothesis, collecting and cleaning the data, and building a model?' And the answer was yes, after just six months of training, they did very well."
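For readers new to null models, the idea behind a random walk can be sketched in a few lines of Python. This is only an illustration and not the Challenge's actual null model: it is a plain random walk without the separate observation-error component of a full state-space model, and the function names and the dissolved oxygen values in the usage note are made up for the example.

```python
import random
import statistics


def random_walk_forecast(observations, horizon, n_members=500, seed=0):
    """Simulate an ensemble of future paths from a simple random walk.

    Every member starts at the last observation. At each step it adds
    Gaussian noise whose standard deviation is estimated from the
    step-to-step differences in the observed series.
    """
    if len(observations) < 3:
        raise ValueError("need at least three observations")
    steps = [b - a for a, b in zip(observations, observations[1:])]
    step_sd = statistics.stdev(steps)
    rng = random.Random(seed)
    ensemble = []
    for _ in range(n_members):
        value = observations[-1]
        path = []
        for _ in range(horizon):
            value += rng.gauss(0.0, step_sd)
            path.append(value)
        ensemble.append(path)
    return ensemble


def summarize(ensemble):
    """Return (mean, standard deviation) across members for each lead time."""
    return [(statistics.mean(col), statistics.stdev(col)) for col in zip(*ensemble)]
```

Calling `summarize(random_walk_forecast([7.9, 8.1, 8.0, 8.3, 8.2], horizon=7))` on these illustrative values gives a forecast whose mean stays near the last observation while its spread widens with lead time. A model with a covariate has to do better than that to show the covariate carries information.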
Ceja says the experience has piqued her interest in how ecological forecasting could be used in the real world to make management decisions. For example, managers could use forecasts of dissolved oxygen levels to make proactive decisions to improve the health of the lake, such as using aeration during periods where oxygen levels are likely to be low. Ceja says, "I never really made the connection of managers needing to know more about the future so they could fix issues that might come up. I was really interested in learning more about how to build ecological models and how those models might be applied to solve problems." She also valued the opportunity to learn how to use RStudio and data analysis and management methods. She would like to apply her new data and forecasting skills to other ecological questions in the future. She says, "The challenge was a lot of fun, and it was really interesting seeing all of the data available from NEON."
Phenology
Yiluan Song, a grad student at the University of California – Santa Cruz, submitted her forecast in the phenology challenge. Phenology is the science of the timing of recurring biological events, such as leaf-out for plants or migration for birds. The EFI-RCN NEON Forecasting Challenge focused on plant phenology, using color (red and green) data from the NEON phenocams to monitor greenness at NEON field sites. Song says, "We always say that phenology is nature's calendar. It also involves how these timings are affected by environmental conditions and human activities."
Song is studying the intersections between plant phenology, climate change, and human society as part of her Ph.D. program at UCSC, working under Dr. Kai Zhu in the Environmental Sciences department. Zhu's work is supported by an NSF CAREER Award (2045309); Song received additional support from Dr. Steve Munch at NOAA. She got into ecological forecasting after attending a workshop on empirical dynamic modeling, the method she ultimately used for the challenge. "The method is great because it is very flexible; it doesn't require us to know a lot about the mechanisms that are driving the dynamics," she says.
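To give a flavor of how empirical dynamic modeling can forecast without a mechanistic model, here is a minimal Python sketch of simplex projection, one common technique in that family. It is not Song's implementation, which the article does not describe: it uses a single time series with no weather covariates, and the function names, embedding dimension, and lag are assumptions chosen for illustration.

```python
import math


def delay_embed(series, dim, lag=1):
    """Build delay vectors (x[t], x[t-lag], ..., x[t-(dim-1)*lag]).

    Returns the vectors and the time index of the first one.
    """
    start = (dim - 1) * lag
    vectors = [
        tuple(series[t - k * lag] for k in range(dim))
        for t in range(start, len(series))
    ]
    return vectors, start


def simplex_forecast(series, dim=3, lag=1):
    """Forecast the next value from the most similar past states."""
    vectors, start = delay_embed(series, dim, lag)
    if len(vectors) < dim + 2:
        raise ValueError("series too short for this embedding")
    target = vectors[-1]
    # Pair every earlier state with the value that followed it.
    library = [(vectors[i], series[start + i + 1]) for i in range(len(vectors) - 1)]
    # Keep the dim + 1 states closest to the current one.
    neighbours = sorted((math.dist(vec, target), nxt) for vec, nxt in library)[: dim + 1]
    nearest = neighbours[0][0]
    if nearest == 0:
        # Exact matches exist: average what followed them.
        exact = [nxt for d, nxt in neighbours if d == 0]
        return sum(exact) / len(exact)
    # Otherwise weight each neighbour by its distance relative to the nearest.
    weights = [math.exp(-d / nearest) for d, _ in neighbours]
    return sum(w * nxt for w, (_, nxt) in zip(weights, neighbours)) / sum(weights)
```

The forecast comes entirely from what happened after similar past states, which is why the approach needs no assumptions about the driving mechanisms. It also means the method can only draw on conditions already present in the record.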
Song's forecasts used the green (green chromatic coordinate, GCC) and red (red chromatic coordinate, RCC) pixel data from the phenocams and were automated to submit daily. The forecasts came close to beating the null model, which is considered highly successful in the context of the contest. Song explains, "Green peaks in the summer, while red peaks in the spring and again in the fall. It's pretty amazing that you can capture plant phenology just by the colors of the pixels." She used an empirical model to describe a nonlinear relationship between weather data and plant phenology data from NEON forested sites. The approach is flexible, in that it does not specify whether temperature or precipitation was the environmental driver. The model performed very well at forested sites with relatively stable phenology and strong seasonality.
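A chromatic coordinate is conventionally the share of a pixel's total brightness that falls in one color channel: GCC = G / (R + G + B) and RCC = R / (R + G + B). The short Python sketch below shows the arithmetic. The function name and the pixel values in the usage note are illustrative assumptions, not NEON data or NEON's processing code.

```python
def chromatic_coordinates(red, green, blue):
    """Return (gcc, rcc) for mean red, green, and blue values of an image region."""
    total = red + green + blue
    if total == 0:
        raise ValueError("pixel values sum to zero")
    return green / total, red / total
```

For example, a region with mean channel values of 90 (red), 120 (green), and 90 (blue) has a GCC of 0.4 and an RCC of 0.3. Because each coordinate is a ratio, it is less sensitive to overall scene brightness than the raw channel values.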
This empirical approach may do better at making predictions into the future than traditional models based on historical data, Song believes, especially when it comes to making predictions in a changing climate. She says, "It's hard to model the effects of climate change because it happens on a larger timescale, ten to twenty years or longer. If we know how weather changes have impacted phenology in the past, we can model those changes pretty well in the future. But when we're talking about climate change, with variation that we have never seen in the past, modeling becomes very hard."
Song is excited to be attending an EFI short course in Boston this year, which was originally scheduled for 2020. She also plans to continue to expand on her forecasting work. Zhu and Munch received a grant from Microsoft to build a web app that delivers the phenology forecasting results. "We actually built the whole workflow to forecast leaf phenology and all the way to flowering phenology," she explains. "I think it's useful. Not even just for research, but for scientific communication, or informing recreation, or even for pollen allergies."
Song loved the experience of participating in the challenge. "I'm a big fan of hackathons and online challenges, but it's not often you see these in ecology. I've learned so much from the dashboard and materials that the organizers created for us. And I learned a lot about using the cyberinfrastructure to automate the workflows."