Tutorial

Access and Work with NEON Geolocation Data

Authors: Claire K. Lunch

Last Updated: Mar 21, 2023

This tutorial explores NEON geolocation data. The focus is on the locations of NEON observational sampling and sensor data; NEON remote sensing data are inherently spatial and have dedicated tutorials. If you are interested in connecting remote sensing with ground-based measurements, the methods in the vegetation structure and canopy height model tutorial can be generalized to other data products.

In planning your analyses, consider what level of spatial resolution is required. There is no reason to carefully map each measurement if precise spatial locations aren't required to address your hypothesis! For example, if you want to use the Vegetation structure data product to calculate a site-scale estimate of biomass and production, the spatial coordinates of each tree are probably not needed. If you want to explore relationships between vegetation and beetle communities, you will need to identify the sampling plots where NEON measures both beetles and vegetation, but finer-scale coordinates may not be needed. Finally, if you want to relate vegetation measurements to airborne remote sensing data, you will need very accurate coordinates for each measurement on the ground.

Learning Objectives

After completing this tutorial you will be able to:

access NEON spatial data through data downloaded with the neonUtilities package.
access and plot specific sampling locations for TOS data products.
access and use sensor location data.

Things You’ll Need To Complete This Tutorial

R Programming Language

You will need a current version of R to complete this tutorial. We also recommend the RStudio IDE to work with R.

Setup R Environment

We'll need several R packages in this tutorial. Install the packages, if not already installed, and load the libraries for each.

# run once to get the package, and re-run if you need to get updates

install.packages("ggplot2")  # plotting

install.packages("neonUtilities")  # work with NEON data

install.packages("neonOS")  # work with NEON observational data

install.packages("devtools")  # to use the install_github() function

devtools::install_github("NEONScience/NEON-geolocation/geoNEON")  # work with NEON spatial data



# run every time you start a script

library(ggplot2)

library(neonUtilities)

library(neonOS)

library(geoNEON)



options(stringsAsFactors=F)

Locations for observational data

Plot level locations

Both aquatic and terrestrial observational data downloads include spatial data in the downloaded files. The spatial data in the aquatic data files are the most precise locations available for the sampling events. The spatial data in the terrestrial data downloads represent the locations of the sampling plots. In some cases, the plot is the most precise location available, but for many terrestrial data products, more precise locations can be calculated for specific sampling events.

Here, we'll download the Vegetation structure (DP1.10098.001) data product, examine the plot location data in the download, then calculate the locations of individual trees. These steps can be extrapolated to other terrestrial observational data products; the specific sampling layout varies from data product to data product, but the methods for working with the data are similar.

First, let's download the vegetation structure data from one site, Wind River Experimental Forest (WREF).

If downloading data using the neonUtilities package is new to you, check out the Download and Explore tutorial.

# load veg structure data

vst <- loadByProduct(dpID="DP1.10098.001", 
                     site="WREF",
                     check.size=F)

Data downloaded this way are stored in R as a large list. For this tutorial, we'll work with the individual dataframes within this large list. Alternatively, each dataframe can be assigned as its own object.

To find the spatial data for any given data product, view the variables files to figure out which data table the spatial data are contained in.

View(vst$variables_10098)

Looking through the variables, we can see that the spatial data (decimalLatitude and decimalLongitude, etc) are in the vst_perplotperyear table. Let's take a look at the table.

View(vst$vst_perplotperyear)

As noted above, the spatial data here are at the plot level; the latitude and longitude represent the centroid of the sampling plot. We can map these plots on the landscape using the easting and northing variables; these are the UTM coordinates. At this site, tower plots are 40 m x 40 m, and distributed plots are 20 m x 20 m; we can use the symbols() function to draw boxes of the correct size.

We'll also use the treesPresent variable to subset to only those plots where trees were found and measured.

# start by subsetting data to plots with trees

vst.trees <- vst$vst_perplotperyear[which(
        vst$vst_perplotperyear$treesPresent=="Y"),]



# make variable for plot sizes

plot.size <- numeric(nrow(vst.trees))



# populate plot sizes in new variable

plot.size[which(vst.trees$plotType=="tower")] <- 40

plot.size[which(vst.trees$plotType=="distributed")] <- 20



# create map of plots

symbols(vst.trees$easting,
        vst.trees$northing,
        squares=plot.size, inches=F,
        xlab="Easting", ylab="Northing")

All vegetation structure plots at WREF

We can see where the plots are located across the landscape, and we can see the denser cluster of plots in the area near the micrometeorology tower.

For many analyses, this level of spatial data may be sufficient. Calculating the precise location of each tree is only required for certain hypotheses; consider whether you need these data when working with a data product with plot-level spatial data.

Looking back at the variables_10098 table, notice that there is a table in this data product called vst_mappingandtagging, suggesting we can find mapping data there. Let's take a look.

View(vst$vst_mappingandtagging)

Here we see data fields for stemDistance and stemAzimuth. Looking back at the variables_10098 file, we see these fields contain the distance and azimuth from a pointID to a specific stem. To calculate the precise coordinates of each tree, we would need to get the locations of the pointIDs, and then adjust the coordinates based on distance and azimuth. The Data Product User Guide describes how to carry out these steps, and can be downloaded from the Data Product Details page.

However, carrying out these calculations yourself is not the only option! The geoNEON package contains a function that can do this for you, for the TOS data products with location data more precise than the plot level.

Sampling locations

The getLocTOS() function in the geoNEON package uses the NEON API to access NEON location data and then makes protocol-specific calculations to return precise locations for each sampling effort. This function works for a subset of NEON TOS data products. The list of tables and data products that can be entered is in the package documentation on GitHub.

For more information about the NEON API, see the API tutorial and the API web page. For more information about the location calculations used in each data product, see the Data Product User Guide for each product.

The getLocTOS() function requires two inputs:

A data table that contains spatial data from a NEON TOS data product
The NEON table name of that data table

For vegetation structure locations, the function call looks like this. This function may take a while to download all the location data. For faster downloads, use an API token.

# calculate individual tree locations

vst.loc <- getLocTOS(data=vst$vst_mappingandtagging,
                     dataProd="vst_mappingandtagging")

What additional data are now available in the data obtained by getLocTOS()?

# print variable names that are new

names(vst.loc)[which(!names(vst.loc) %in% 
                      names(vst$vst_mappingandtagging))]

## [1] "utmZone"                  "adjNorthing"              "adjEasting"              
## [4] "adjCoordinateUncertainty" "adjDecimalLatitude"       "adjDecimalLongitude"     
## [7] "adjElevation"             "adjElevationUncertainty"

Now we have adjusted latitude, longitude, and elevation, and the corresponding easting and northing UTM data. We also have coordinate uncertainty data for these coordinates.

As we did with the plots above, we can use the easting and northing data to plot the locations of the individual trees.

plot(vst.loc$adjEasting, vst.loc$adjNorthing, 
     pch=".", xlab="Easting", ylab="Northing")

All mapped tree locations at WREF

We can see the mapped trees in the same plots we mapped above. We've plotted each individual tree as a ., so all we can see at this scale is the cluster of dots that make up each plot.

Let's zoom in on a single plot:

plot(vst.loc$adjEasting[which(vst.loc$plotID=="WREF_085")], 
     vst.loc$adjNorthing[which(vst.loc$plotID=="WREF_085")], 
     pch=20, xlab="Easting", ylab="Northing")

Tree locations in plot WREF_085

Now we can see the location of each tree within the sampling plot WREF_085. This is interesting, but it would be more interesting if we could see more information about each tree. How are species distributed across the plot, for instance?

We can plot the tree species at each location using the text() function and the vst.loc$taxonID field.

plot(vst.loc$adjEasting[which(vst.loc$plotID=="WREF_085")], 
     vst.loc$adjNorthing[which(vst.loc$plotID=="WREF_085")], 
     type="n", xlab="Easting", ylab="Northing")

text(vst.loc$adjEasting[which(vst.loc$plotID=="WREF_085")], 
     vst.loc$adjNorthing[which(vst.loc$plotID=="WREF_085")],
     labels=vst.loc$taxonID[which(vst.loc$plotID=="WREF_085")],
     cex=0.5)

Tree species and their locations in plot WREF_085

Almost all of the mapped trees in this plot are either Pseudotsuga menziesii or Tsuga heterophylla (Douglas fir and Western hemlock), not too surprising at Wind River.

But suppose we want to map the diameter of each tree? This is a very common way to present a stem map, it gives a visual as if we were looking down on the plot from overhead and had cut off each tree at its measurement height.

Other than taxon, the attributes of the trees, such as diameter, height, growth form, and canopy position, are found in the vst_apparentindividual table, not in the vst_mappingandtagging table. We'll need to join the two tables to get the tree attributes together with their mapped locations.

The neonOS package contains the function joinTableNEON(), which can be used to do this. See the tutorial for the neonOS package for more details about this function.

veg <- joinTableNEON(vst.loc, 
                     vst$vst_apparentindividual,
                     name1="vst_mappingandtagging",
                     name2="vst_apparentindividual")

Now we can use the symbols() function to plot the diameter of each tree, at its spatial coordinates, to create a correctly scaled map of boles in the plot. Note that stemDiameter is in centimeters, while easting and northing UTMs are in meters, so we divide by 100 to scale correctly.

symbols(veg$adjEasting[which(veg$plotID=="WREF_085")], 
        veg$adjNorthing[which(veg$plotID=="WREF_085")], 
        circles=veg$stemDiameter[which(veg$plotID=="WREF_085")]/100/2, 
        inches=F, xlab="Easting", ylab="Northing")

Tree bole diameters in plot WREF_085

If you are interested in taking the vegetation structure data a step further, and connecting measurements of trees on the ground to remotely sensed Lidar data, check out the Vegetation Structure and Canopy Height Model tutorial.

If you are interested in working with other terrestrial observational (TOS) data products, the basic techniques used here to find precise sampling locations and join data tables can be adapted to other TOS data products. Consult the Data Product User Guide for each data product to find details specific to that data product.

Locations for sensor data

Downloads of instrument system (IS) data include a file called sensor_positions.csv. The sensor positions file contains information about the coordinates of each sensor, relative to a reference location.

While the specifics vary, techniques are generalizable for working with sensor data and the sensor_positions.csv file. For this tutorial, let's look at the sensor locations for soil temperature (PAR; DP1.00041.001) at
the NEON Treehaven site (TREE) in July 2018. To reduce our file size, we'll use the 30 minute averaging interval. Our final product from this section is to create a depth profile of soil temperature in one soil plot.

If downloading data using the neonUtilties package is new to you, check out the neonUtilities tutorial.

This function will download about 7 MB of data as written so we have check.size =F for ease of running the code.

# load soil temperature data of interest 

soilT <- loadByProduct(dpID="DP1.00041.001", site="TREE",
                    startdate="2018-07", enddate="2018-07",
                    timeIndex=30, check.size=F)

## Attempting to stack soil sensor data. Note that due to the number of soil sensors at each site, data volume is very high for these data. Consider dividing data processing into chunks, using the nCores= parameter to parallelize stacking, and/or using a high-performance system.

Sensor positions file

Now we can specifically look at the sensor positions file.

# create object for sensor positions file

pos <- soilT$sensor_positions_00041



# view column names

names(pos)

##  [1] "siteID"                           "HOR.VER"                         
##  [3] "sensorLocationID"                 "sensorLocationDescription"       
##  [5] "positionStartDateTime"            "positionEndDateTime"             
##  [7] "referenceLocationID"              "referenceLocationIDDescription"  
##  [9] "referenceLocationIDStartDateTime" "referenceLocationIDEndDateTime"  
## [11] "xOffset"                          "yOffset"                         
## [13] "zOffset"                          "pitch"                           
## [15] "roll"                             "azimuth"                         
## [17] "locationReferenceLatitude"        "locationReferenceLongitude"      
## [19] "locationReferenceElevation"       "eastOffset"                      
## [21] "northOffset"                      "xAzimuth"                        
## [23] "yAzimuth"                         "publicationDate"

# view table

View(pos)

The sensor locations are indexed by the HOR.VER variable - see the file naming conventions page for more details.

Using unique() we can view all the location indices in this file.

unique(pos$HOR.VER)

##  [1] "001.501" "001.502" "001.503" "001.504" "001.505" "001.506" "001.507" "001.508" "001.509" "002.501"
## [11] "002.502" "002.503" "002.504" "002.505" "002.506" "002.507" "002.508" "002.509" "003.501" "003.502"
## [21] "003.503" "003.504" "003.505" "003.506" "003.507" "003.508" "003.509" "004.501" "004.502" "004.503"
## [31] "004.504" "004.505" "004.506" "004.507" "004.508" "004.509" "005.501" "005.502" "005.503" "005.504"
## [41] "005.505" "005.506" "005.507" "005.508" "005.509"

Soil temperature data are collected in 5 instrumented soil plots inside the tower footprint. We see this reflected in the data where HOR = 001 to 005. Within each plot, temperature is measured at 9 depths, seen in VER = 501 to 509. At some sites, the number of depths may differ slightly.

The x, y, and z offsets in the sensor positions file are the relative distance, in meters, to the reference latitude, longitude, and elevation in the file.

The HOR and VER indices in the sensor positions file correspond to the verticalPosition and horizontalPosition fields in soilT$ST_30_minute.

Note that there are two sets of position data for soil plot 001, and that one set has a positionEndDateTime date in the file. This indicates sensors either moved or were relocated; in this case there was a frost heave incident. You can read about it in the issue log, which is displayed on the Data Product Details page, and also included as a table in the data download:

soilT$issueLog_00041[grep("TREE soil plot 1", 
                     soilT$issueLog_00041$locationAffected),]

##      id parentIssueID            issueDate         resolvedDate       dateRangeStart         dateRangeEnd
## 1: 9328            NA 2019-05-23T00:00:00Z 2019-05-23T00:00:00Z 2018-11-07T00:00:00Z 2019-04-19T00:00:00Z
##                                                                                                                          locationAffected
## 1: D05 TREE soil plot 1 measurement levels 1-9 (HOR.VER: 001.501, 001.502, 001.503, 001.504, 001.505, 001.506, 001.507, 001.508, 001.509)
##                                                                                                                                                                                                                           issue
## 1: Soil temperature sensors were pushed or pulled out of the ground by 3 cm over winter, presumably due to freeze-thaw action. The exact timing of this is unknown, but it occurred sometime between 2018-11-07 and 2019-04-19.
##                                                                                        resolution
## 1: Sensor depths were updated in the database with a start date of 2018-11-07 for the new depths.

Since we're working with data from July 2018, and the change in sensor locations is dated Nov 2018, we'll use the original locations. There are a number of ways to drop the later locations from the table; here, we find the rows in which the positionEndDateTime field is empty, indicating no end date, and the rows corresponding to soil plot 001, and drop all the rows that meet both criteria.

pos <- pos[-intersect(grep("001.", pos$HOR.VER),
                      which(pos$positionEndDateTime=="")),]

Our goal is to plot a time series of temperature, stratified by depth, so let's start by joining the data file and sensor positions file, to bring the depth measurements into the same data frame with the data.

# paste horizontalPosition and verticalPosition together

# to match HOR.VER in the sensor positions file

soilT$ST_30_minute$HOR.VER <- paste(soilT$ST_30_minute$horizontalPosition,
                                    soilT$ST_30_minute$verticalPosition,
                                    sep=".")



# left join to keep all temperature records

soilTHV <- merge(soilT$ST_30_minute, pos, 
                 by="HOR.VER", all.x=T)

And now we can plot soil temperature over time for each depth. We'll use ggplot since it's well suited to this kind of stratification. Each soil plot is its own panel, and each depth is its own line:

gg <- ggplot(soilTHV, 
             aes(endDateTime, soilTempMean, 
                 group=zOffset, color=zOffset)) +
             geom_line() + 
        facet_wrap(~horizontalPosition)

gg

## Warning: Removed 1488 rows containing missing values (`geom_line()`).

Tiled figure of temperature by depth in each plot

We can see that as soil depth increases, temperatures become much more stable, while the shallowest measurement has a clear diurnal cycle. We can also see that something has gone wrong with one of the sensors in plot 002. To remove those data, use only values where the final quality flag passed, i.e. finalQF = 0

gg <- ggplot(subset(soilTHV, finalQF==0), 
             aes(endDateTime, soilTempMean, 
                 group=zOffset, color=zOffset)) +
             geom_line() + 
        facet_wrap(~horizontalPosition)

gg

Tiled figure of temperature by depth in each plot with only passing quality flags

Get Lesson Code

spatialData.R