Case Study
What’s That Tree? This Neural Net Can Tell You
June 17, 2020
How long would it take to count and classify all of the trees in a single forest? For humans, it’s a daunting job.
Andrew Fricker, an assistant professor at California Polytechnic State University, thinks that a neural net—a form of artificial intelligence (AI)—could deliver faster and more accurate information about plant species composition and abundance. Andrew used remote sensing data from the NEON Airborne Observation Platform (AOP) to train a neural net to classify tree species in a Sierra Nevada forest. He and his coauthors describe their approach in Remote Sensing: “A Convolutional Neural Network Classifier Identifies Tree Species in Mixed-Conifer Forest from Hyperspectral Imagery.”
Teaching an AI to Classify Tree Species with Deep Learning
“Humans,” says Andrew, “are natural classifiers.” Even small children are able to accurately identify an object as a “tree.” With education and training, most of us learn to classify trees as deciduous or coniferous and to tell the difference between an oak tree and a fir. Trained botanists can accurately classify trees down to their genus, species, and subspecies, distinguishing similar species such as the Lodgepole Pine (Pinus contorta) and Jeffrey Pine (Pinus jeffreyi).
Computers see the world a little differently. A neural net is designed to mimic how the human brain works and to learn in ways similar to a human child. To learn how to classify trees, the neural net first has to be trained to recognize a tree and distinguish it from its surroundings. Then, it has to identify characteristics that will help it classify the tree by species. This may involve different datasets than humans use for species classification. For example, rather than counting needle clusters and examining pinecones, an AI may use spectral data and other clues to tell one kind of pine from another. Hyperspectral images contain spectral data that goes beyond what the human eye is able to see, so the neural net is looking at a spectrally deep dataset with superhuman eyes.
Andrew’s study trained a Convolutional Neural Network (CNN) to identify and classify trees in the Teakettle Experimental Forest (TEAK) in the Sierra Nevada Mountains in California. A CNN is a type of neural network specifically geared towards image classification. CNNs have already been applied to many types of image classification problems, such as converting handwritten documents into digital data or helping self-driving vehicles recognize people, cars, and road signs. Andrew believes this is the first time a CNN has been applied to the problem of classifying tree species.
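To make the "convolutional" part concrete, here is a minimal sketch in plain Python of the core operation: a small kernel slides across a single-band image and produces a response map. The image values, the kernel, and the function name `convolve2d_valid` are illustrative assumptions and are not taken from the study's code. A real CNN learns many such kernels from training data rather than relying on a hand-written one.

```python
def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with no padding.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped before it is applied.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0.0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        output.append(row)
    return output


# Hypothetical 4x4 single-band image: dark on the left, bright on the right.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-written kernel that responds to a dark-to-bright vertical edge.
edge_kernel = [
    [-1, 1],
    [-1, 1],
]

response = convolve2d_valid(image, edge_kernel)
for row in response:
    print(row)
```

The response is largest where the window straddles the boundary between the dark and bright columns, which is how a convolutional layer picks up local patterns such as crown edges or texture.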
Training a neural network requires data—a lot of data. Deep learning is a method of training a neural net that involves feeding it large volumes of training data and letting the network learn over time how to interpret it. In this case, the team used hyperspectral and LiDAR (Light Detection and Ranging) data from a single flightline of the AOP at TEAK. The remote sensing data was paired with “ground truth” data collected through traditional field methods. The neural net looked at all kinds of features, including structure, texture, shape, color (spectra), and the area surrounding each tree. Over several rounds of training, it learned which types of data were most relevant in making an accurate determination of tree species.
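The pairing of remote sensing data with ground truth can be sketched roughly as follows. This example assumes the hyperspectral bands and a LiDAR-derived canopy height layer share one pixel grid, stacks them per pixel, and cuts a small square patch around each field-labeled tree. The function names, array sizes, pixel values, and field records are all hypothetical; the study's actual preprocessing lives in the team's published code and may differ.

```python
def stack_layers(hyperspectral, canopy_height):
    """Append the LiDAR height to each pixel's list of band values.

    hyperspectral: rows x cols x bands (nested lists)
    canopy_height: rows x cols (nested lists)
    """
    return [
        [pixel + [canopy_height[r][c]] for c, pixel in enumerate(row)]
        for r, row in enumerate(hyperspectral)
    ]


def extract_patch(stacked, row, col, half_width):
    """Return the square window centered on (row, col), or None if the
    window would run off the edge of the image."""
    n_rows, n_cols = len(stacked), len(stacked[0])
    if (row - half_width < 0 or col - half_width < 0
            or row + half_width >= n_rows or col + half_width >= n_cols):
        return None
    return [
        stacked[r][col - half_width:col + half_width + 1]
        for r in range(row - half_width, row + half_width + 1)
    ]


# Toy 5x5 scene with 3 spectral bands (real hyperspectral data has far more).
rows, cols, bands = 5, 5, 3
hyperspectral = [
    [[0.1 * (r + c + b) for b in range(bands)] for c in range(cols)]
    for r in range(rows)
]
canopy_height = [[float(r * cols + c) for c in range(cols)] for r in range(rows)]

stacked = stack_layers(hyperspectral, canopy_height)

# Hypothetical field records: (row, col, species label).
field_trees = [(2, 2, "red fir"), (0, 4, "Jeffrey pine")]

training_examples = []
for r, c, species in field_trees:
    patch = extract_patch(stacked, r, c, half_width=1)
    if patch is not None:  # skip trees too close to the image edge
        training_examples.append((patch, species))

print(len(training_examples), "usable training example(s)")
```

Using a patch rather than a single pixel is what lets a network see the area surrounding each tree, not just its color.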
Andrew says, “The NEON remote sensing data is very dense, which makes it ideal for deep learning. In addition, we had a lot of species composition data from the ground already, either from the GPS points collected in the field that intersected the flightline or from the U.S. Forest Service.”
Improving the Accuracy of Tree Classification Using Remote Sensing Data
Remote sensing hyperspectral data, which records the wavelengths (spectra) of light reflected by the forest canopy, has been used for vegetation classification for decades. However, Andrew believes that this is the first time that remote sensing data has been paired with a convolutional neural net for tree identification. CNNs are a relatively new form of neural net that leverages deep learning. The goal is to improve the accuracy of species classification using remote sensing data.
“Spectral Mixing Analysis algorithms, or SMAs, just use the spectral data. So the program determines that red firs typically reflect light at certain wavelengths, and then looks for anything else with the same spectral characteristics and classifies it as a red fir,” Andrew explains. The CNN used in his study looks at all types of data, including structural data collected by LiDAR. Bringing in other types of data—such as canopy structure, tree shape, texture, and contextual clues from the surrounding landscape—could greatly improve the accuracy of automated tree classification.
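The spectral-only idea in the quote above can be illustrated with a very small nearest-reference classifier. Note that this sketch uses spectral angle matching as a simple stand-in; it is not the SMA method the study compares against, and the four-band reference spectra are invented for illustration.

```python
import math


def spectral_angle(a, b):
    """Angle in radians between two spectra treated as vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.acos(cosine)


def classify_by_spectrum(pixel, references):
    """Return the label whose reference spectrum has the smallest angle."""
    return min(
        references,
        key=lambda label: spectral_angle(pixel, references[label]),
    )


# Invented 4-band reference spectra, for illustration only.
references = {
    "red fir": [0.05, 0.08, 0.04, 0.45],
    "Jeffrey pine": [0.07, 0.10, 0.08, 0.30],
}

# A dimmer (shaded) pixel with the same spectral shape as the red fir reference.
shaded_pixel = [0.025, 0.04, 0.02, 0.225]

label = classify_by_spectrum(shaded_pixel, references)
print(label)
```

A classifier like this sees only the spectrum of one pixel. It has no access to height, crown shape, texture, or neighboring pixels, which is the extra information a CNN can draw on.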
Understanding the species composition and abundance of forests, their spatial distribution, and how these characteristics are changing over time is very important to ecologists. Plants form the base of the food web and provide habitat for insects, birds, and mammals. They also impact (and are impacted by) soil characteristics and microbial communities. The mix of tree species in a forest, their relative abundance, how they are distributed across space, and the overall structure of the forest canopy and understory are directly linked to the composition and abundance of animal communities, soil chemistry and structure, carbon sequestration potential, nutrient fluxes, and many other ecological variables.
Remote sensing enables ecologists to monitor forests over much larger geographic and temporal scales than is possible with traditional field methods. Botanists typically only sample a small fraction of the total trees in a forest when conducting species counts. With remote sensing, they can look at all of the trees in the forest, providing insights into variations in species composition across space that may be missed when looking at data from a limited number of sampling plots. Combining large-scale maps of tree species distribution with other types of ecological, geological, or hydrological data allows researchers to see how plant community composition and distribution are correlated with other variables, such as soil characteristics, the presence or absence of other species, precipitation patterns, or terrain features. Monitoring forests over time will allow ecologists to see how forest communities are changing in response to climate change, natural disturbances, land use and management decisions, or invasive species.
The Future of AI-Based Tree Classification
The neural net trained by Andrew and his team was able to successfully classify and map the seven dominant species of trees found within the Teakettle Experimental Forest. The maps produced by the CNN are demonstrably more accurate than maps produced by older SMAs and other machine learning algorithms, such as the “Random Forests” method. The best results were obtained by combining the hyperspectral data already used by SMAs with deep learning methods that leveraged LiDAR and other data types.
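One common way to put a number on how accurate a species map is: compare the mapped label with the field label for each tree and compute per-species precision and recall. The sketch below shows that calculation. The article does not say which metrics the team reported, so treat the choice of metrics as an assumption; the labels are invented and the function name `per_class_scores` is hypothetical.

```python
def per_class_scores(true_labels, predicted_labels):
    """Return {label: (precision, recall)} for every label that appears."""
    labels = sorted(set(true_labels) | set(predicted_labels))
    pairs = list(zip(true_labels, predicted_labels))
    scores = {}
    for label in labels:
        true_positive = sum(1 for t, p in pairs if t == label and p == label)
        predicted = sum(1 for _, p in pairs if p == label)
        actual = sum(1 for t, _ in pairs if t == label)
        precision = true_positive / predicted if predicted else 0.0
        recall = true_positive / actual if actual else 0.0
        scores[label] = (precision, recall)
    return scores


# Invented labels for six trees: field identification vs. mapped species.
field = ["red fir", "red fir", "white fir", "Jeffrey pine", "white fir", "red fir"]
mapped = ["red fir", "white fir", "white fir", "Jeffrey pine", "white fir", "red fir"]

scores = per_class_scores(field, mapped)
overall_accuracy = sum(t == p for t, p in zip(field, mapped)) / len(field)

for species, (precision, recall) in scores.items():
    print(f"{species}: precision={precision:.2f} recall={recall:.2f}")
print(f"overall accuracy={overall_accuracy:.2f}")
```

Per-species scores matter because a map can have high overall accuracy while still confusing two similar species with each other.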
He is excited about the potential of remote sensing data for ecology. “The availability of field data is often a limiting factor for the kinds of ecological questions we can ask. Using remote sensing data opens up all kinds of possibilities for ecology,” he says. In order to achieve this vision, remote sensing data must be as accurate as possible.
More work remains to be done to scale up the neural net method for larger areas and other types of forests. Ultimately, Andrew would like to see similar methods applied to map and classify trees across all of the NEON terrestrial field sites. “This is a proof of concept,” he says. “We’ve shown that the method is effective. I want to see others replicate this at other sites. Mapping all of the trees within a study site is an important steppingstone to get at all kinds of other ecological questions.”
To further that aim, the team has made their code freely available to researchers through GitHub at https://github.com/jonathanventura/canopy. Data from the study is available at https://zenodo.org/record/3470250#.Xdl3li2ZMW9. A number of researchers have already downloaded the code for their own projects. “The code is surprisingly easy to use, and it works,” Andrew says. “I encourage anyone interested to download it and see what they can do with it.”