Data Formats and Conventions

From data collected by thousands of automated sensors, to hundreds of field staff working through collection protocols, to airborne instruments collecting data during the flight season, NEON produces millions of data points every day. In order to make sure that the data can be ingested into NEON’s systems, processed, published, and eventually used, careful attention to how data are organized, named, and documented is critical. In this section, we introduce the basics about the data formats and conventions that NEON uses.

General Data Formats and Conventions

This section describes data formats and conventions that NEON uses across all products. For more information specific to data collection systems (observational, instrumented, and airborne), jump down to each Data Collection System section.

Data Packages

When a query for one data product, one or more sites, and a date range is submitted to the data portal or API, a downloadable data package is generated from a store of pre-published files and then zipped. Every package includes data files and documentation files. Each NEON data collection system structures data packages differently to maximize utility of the data, but generally there are separate pre-published file sets for each site and month, which are bundled into the download package. While data are provided in this granular way, our neonUtilities code package provides a straightforward and simple method to join files over multiple sites and months.

File Names

Data files are named using a series of component abbreviations separated by periods ( . ) or underscores ( _ ). Naming conventions for data files differ between NEON data collection systems to meet the needs of their dominant user groups. A file will have the same name whether it is accessed via the data portal or the API. For more information, read the NEON Data Product Numbering Convention and explore the tables below.

Table 1. General abbreviations
Abbreviation	Definition
NEON	An identifier that specifies that the data come from NEON.
DOM	A three-character alphanumeric code, referring to the Domain of data acquisition (D01 - D20).
SITE	A four-character code, referring to the site of data acquisition.
DPL	A three-character alphanumeric code, referring to Data Product processing Level.
PRNUM	A five-character numeric code, referring to the Data Product Number.
REV	A three-digit designation, referring to the revision number of the data product. The REV value is incremented by 1 each time a major change is made in instrumentation, data collection protocol, or data processing such that data from the preceding revision is not directly comparable to the new.
HOR	A three-character alphanumeric code for the measurement locations within one horizontal plane. For example, if one surface measurement were made at each of five soil array plots, the number in the HOR field would range from 001-005.
VER	A three-character alphanumeric code for the measurement locations within one vertical plane. For example, if one temperature measurement is made at each vertical level of a tower with 8 levels, the number in the VER field would range from 010-080.
TMI	A three-character alphanumeric code for the Temporal Index. Refers to the temporal representation, averaging period, or coverage of the data product. 000 = native resolution, 001 = native resolution or 1 minute, 002 = 2 minute, 005 = 5 minute, 015 = 15 minute, 030 = 30 minute, 060 = 60 minutes or 1 hour, 100 = instantaneous measurements, 101-103 = native resolution of replicate sensor 1, 2, and 3 respectively, 999 = measurements at varied interval.
DESC	An abbreviated description of the data file or table.
YYYY-MM	Represents the year and month of the data in the file.
PKGTYPE	The type of data package downloaded. Options are 'basic', representing the basic download package, or 'expanded', representing the expanded download package (see more information below).
GENTIME	The date-time stamp when the file was generated, in UTC. The format of the date-time stamp is YYYYMMDDTHHmmSSZ.
RELEASE	The data release tag (e.g., "RELEASE-2021") or "PROVISIONAL" if not included in a data release.

Table 2. Time stamp abbreviations
Abbreviation	Definition
YYYY	Year
YY	Year, last two digits only
MM	Month: 01-12
DD	Day: 01-31
T	Separator indicating that the time portion of the time stamp begins
HH	Hours: 00-23
mm	Minutes: 00-59
SS	Seconds: 00-59
Z	Indicator that the time is in Coordinated Universal Time (UTC)

Tabular Data - Wide vs. Long

Tabular data can generally be presented in a range of dimensions, from "long" or "narrow" format (many rows, few columns) to "wide" (many columns, fewer rows). Conventions vary both across and within disciplines. It is common to present data in long format when there are repeated measurements over one or only a few parameters over time, or wide format when there are numerous parameters being tracked. Formatting is often influenced by the desire to make the measured parameters more readable to the human eye (typically wide format, but sometimes long format for short tables) or to make data easier to subset by a machine (typically long format). For a brief overview of these two formatting options, see this Wikipedia page. NEON observational and instrumented data are published in tabular format; see below for long vs. wide details for each collection system.
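The difference between the two layouts can be illustrated with a minimal sketch in plain Python. The column names (time, tempMean, windSpeedMean) and values are made up for illustration and are not taken from any NEON data product.

```python
# Hypothetical wide-format records: one row per time stamp,
# one column per measured variable.
wide_rows = [
    {"time": "2020-01-23T00:00:00Z", "tempMean": 1.5, "windSpeedMean": 3.2},
    {"time": "2020-01-23T00:30:00Z", "tempMean": 1.7, "windSpeedMean": 2.9},
]


def wide_to_long(rows, id_key):
    """Return one record per (row, variable) pair, keeping id_key on each."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name == id_key:
                continue  # the identifier is repeated on every long row
            long_rows.append({id_key: row[id_key], "variable": name, "value": value})
    return long_rows


long_rows = wide_to_long(wide_rows, "time")
for record in long_rows:
    print(record)
```

Two wide rows with two variables each become four long rows, which is why long tables are easier to subset by variable but harder to scan by eye.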

Variable Names

The column headers that are used to describe variables, also often called field names or terms, are intentionally controlled to be easily understood and used in precise ways. Different data collection systems use somewhat different methods to generate and maintain variable names.

Dates and Times

Date and time fields, or time stamps, follow the ISO 8601 standard. In general, this means that a date (for example, January 23, 2020) is typically written out in the format YYYY-MM-DD (2020-01-23). Times are typically added to a date in the format HH:mm:ss'Z', where the Z indicates UTC, or Coordinated Universal Time. UTC is appropriate for NEON data, as NEON field sites are spread across multiple time zones. The format for any given date-time field may be found within the metadata file supplied in a data package. This definition may also contain any rounding (e.g., 'floor' or 'ceiling') that was used on a time stamp.
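As a rough sketch, time stamps in the formats described above can be read with the Python standard library. This assumes the date and time are joined by a "T" separator, as is usual in ISO 8601; the exact format of any given field should be checked against the metadata in the data package.

```python
from datetime import datetime, timezone


def parse_utc(stamp):
    """Parse 'YYYY-MM-DD' or 'YYYY-MM-DDTHH:mm:ssZ' into a UTC-aware datetime.

    Assumption: date and time are joined by 'T'; other layouts need
    a different format string.
    """
    fmt = "%Y-%m-%dT%H:%M:%SZ" if "T" in stamp else "%Y-%m-%d"
    # strptime returns a naive datetime; the trailing Z means UTC,
    # so attach that time zone explicitly.
    return datetime.strptime(stamp, fmt).replace(tzinfo=timezone.utc)


date_only = parse_utc("2020-01-23")
date_time = parse_utc("2020-01-23T14:05:00Z")
print(date_only.isoformat())
print(date_time.isoformat())
```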

Data Flags

Data flags can be used to help guide decisions on whether data are fit for specific uses. A tailored suite of quality tests is applied to each data product, and the results are provided along with data values in the form of quality flags and metrics.

Documentation

The Document Library is a rich resource of information about our data products, including overarching science designs, site characterization reports, spatial data, field protocols, data processing documentation, and user guides. Relevant documents are linked to each data product's detail page, which may be accessed via the Explore Data Products page. Data packages also contain the documents that are relevant to the data product.

The following documents are available for most data products, regardless of data collection system:

Science Designs: The science design documents provide the background and strategy used for data collection. They frequently bridge related data products.

Algorithm Theoretical Basis Document (ATBD): A full explanation of the algorithms used to process data. Each ATBD details the scientific theory behind the measurement, relevant processing algorithms, as well as the steps taken to determine uncertainty and to perform quality control/quality assurance. Some ATBDs are specific to a data product, while others describe algorithms applied to many data products.

Each data collection system also has documents specific to it. To learn more, skip to the Specific Information by Data Collection System section below.

Metadata

Each data package contains one or more types of metadata files that may be either human or machine readable. The consistent metadata file provided with all data products is the README file, and EML files are provided for most data products.

README file: A short summary of the data downloaded. Includes a brief description of the data product, file naming conventions, and issue log. Much of the information in the readme can also be found on the Data Product Detail pages on the Data Portal.

EML: Contains machine-readable metadata about the data using the Ecological Metadata Language (EML). EML is a widely used, community supported XML schema that supports rich documentation of data related to ecological research, particularly including environmental, ecological, and earth science data. EML files, which are served with the extension .xml, include site location, data policy, and variable definitions and units. EML files also contain a few pieces of information included nowhere else, such as precision values and time stamp formats.

Instrument Systems (IS) Data

Files

A single field site often has multiple sensors of the same type, each at a different location. One example is soil moisture sensors along an array. A separate data table is provided for each combination of position and calculated averaging interval. An example of a table name is 2DWSD_2min, which translates to 2D Wind Speed and Direction at a 2 minute averaging interval.

The level of granularity for most IS data products is one data file per data product, site, month, vertical and horizontal position of the sensor collecting the data, and calculated averaging interval (most instrumented data are provided at one- and thirty-minute averages). Data files are named using the following pattern:

NEON.DOM.SITE.DPL.PRNUM.REV.HOR.VER.TMI.DESC.YYYY-MM.PKGTYPE.GENTIME.csv

One example of this file name would be:

NEON.D18.TOOL.DP1.00001.001.000.010.002.2DWSD_2min.2019-03.basic.20190422T205021Z.csv

This indicates 2D Wind Speed and Direction at site TOOL (Toolik), collected at tower level one in March of 2019, and that data were averaged over every two-minute interval. The data file itself was generated on April 22, 2019 at 20:50:21 UTC.
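The pattern above can be taken apart programmatically. The following is a minimal sketch, not part of any NEON tool; it assumes that no component (including DESC) contains a period, which holds for the example file name but is not guaranteed here for every product.

```python
from datetime import datetime, timezone

# Component names in the order given by the IS file name pattern.
IS_FIELDS = ["NEON", "DOM", "SITE", "DPL", "PRNUM", "REV", "HOR", "VER",
             "TMI", "DESC", "YYYY-MM", "PKGTYPE", "GENTIME"]


def parse_is_filename(name):
    """Split an IS data file name into its named components."""
    if not name.endswith(".csv"):
        raise ValueError("expected a .csv file name")
    parts = name[:-len(".csv")].split(".")
    if len(parts) != len(IS_FIELDS):
        raise ValueError("unexpected number of components: %d" % len(parts))
    fields = dict(zip(IS_FIELDS, parts))
    # GENTIME uses the format YYYYMMDDTHHmmSSZ and is always in UTC.
    fields["GENTIME"] = datetime.strptime(
        fields["GENTIME"], "%Y%m%dT%H%M%SZ"
    ).replace(tzinfo=timezone.utc)
    return fields


info = parse_is_filename(
    "NEON.D18.TOOL.DP1.00001.001.000.010.002.2DWSD_2min."
    "2019-03.basic.20190422T205021Z.csv"
)
print(info["SITE"], info["HOR"], info["VER"], info["TMI"], info["GENTIME"])
```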

Horizontal and Vertical Indices

The location of sensors within a site is provided in horizontal (HOR) and vertical (VER) indices. These indices appear in the following data sources:

File names of data files downloaded from the Portal or API
The horizontalPosition and verticalPosition fields in data tables stacked by the neonUtilities R or Python package
The sensor_positions files downloaded along with data tables via the Portal, API, or neonUtilities. These files provide the spatial locations (latitude, longitude, and elevation) mapped to the HOR and VER indices for the specific data downloaded.

Index values refer to a sensor’s location in the design layout, e.g. the top of the tower or the downstream sensor set. The same indices are used across sites. To translate indices into physical locations, refer to the sensor_positions files.

See tables below for the numbering convention for the HOR and VER indices. Sensor design changes occasionally result in the addition of a new index value; this page will be updated accordingly.

Horizontal (HOR)

Site Type	Location Index	Location
Terrestrial	000	Tower
	001-005	Soil array plots 1-5. If a soil plot becomes unusable, the replacement plot is assigned the next available number for the site (006, 007, etc.).
	700	Instrument hut, if the hut contains a single sensor of a particular type, or multiple sensors that can be differentiated by the vertical location they are connected to
	7##	Instrument hut, if the hut contains multiple sensors of the same type that can’t be differentiated by vertical location
	900	Double Fence Intercomparison Reference (DFIR), typically housing a weighing gauge precipitation collector
Aquatic	101-102	Monopod-mounted sensor set S1 and S2
	103	Buoy sensor station
	104	Lake subsurface-moored assembly
	110	Water level sensor at staff gauge
	111-112	Overhanging sensor set S1 and S2
	130, 140, & 190	Lake littoral sensor set 1, 2, & 4, respectively (littoral station 3 at TOOK was decommissioned)
	131-132	Stand-alone water level sensor associated with sensor set S1 and S2, respectively
	150 & 160	Lake inflow and outflow sensor sets, respectively
	170 & 180	Lake low-profile littoral sensor set 1 & 2, respectively
	200	On-shore meteorological station
	210	Wet deposition collector, when located on the ground
	220	Secondary precipitation (tipping bucket) at aquatic sites
	301-308	Groundwater wells 1-8

Vertical (VER)

Location Index	Location
000	Ground level/on shore
100	In-stream/river/lake
010, 020 … 0N0	Tower level 1-N (increasing values indicate higher measurement levels)
015, 025 … 0N5	Sensors mounted between tower levels
501-511	Below-ground or below water surface depth 1-11 (increasing values indicate deeper measurements)
999	Indeterminate vertical location, e.g., tower profile
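As an illustration of the VER numbering convention, the sketch below maps an index to a short description. The function name and the wording of the returned labels are hypothetical; physical heights and depths still have to come from the sensor_positions files.

```python
def describe_ver(ver):
    """Return a short description for a three-character VER index."""
    if len(ver) != 3 or not ver.isdigit():
        raise ValueError("VER must be a three-digit string")
    if ver == "000":
        return "ground level/on shore"
    if ver == "100":
        return "in-stream/river/lake"
    if ver == "999":
        return "indeterminate vertical location"
    number = int(ver)
    if 501 <= number <= 511:
        # Larger values indicate deeper measurements.
        return "below-ground or below-water depth %d" % (number - 500)
    if ver[0] == "0" and ver[1] != "0":
        if ver[2] == "0":
            # 010, 020 ... 0N0: larger values indicate higher levels.
            return "tower level %s" % ver[1]
        if ver[2] == "5":
            return "mounted between tower levels"
    return "unrecognized index"


for index in ("000", "010", "025", "505", "999"):
    print(index, "->", describe_ver(index))
```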

File Format

NEON IS data are provided as comma separated values (CSV) files. The tables are long with respect to time, but wide for all other variables; that is, there is a row in the table for every time stamp that contains all values measured at that time. The neonUtilities R package, when applied to IS data, keeps the long format with respect to time, and also combines data collected at different locations in long format, so that there is a row in the table for each time stamp by location combination.

Vocabularies

Every term is unique; it has a unique name, definition, unit, and data type. For example, the term "decimalLatitude" has the unique definition, "The geographic latitude (in decimal degrees, WGS84) of the geographic center of the reference area", the unit "decimalDegree", and data type "real". Terms may not be used more than once within the same data product table. However, terms may be reused between different data products if the meaning is exactly the same. Wherever that term is used, the definition, unit, and data type are exactly the same. Variable names and their definitions can be found in the variables file in the data downloads.

The NEON variable names for the IS datasets that are submitted to AmeriFlux are mapped to the AmeriFlux data variable names. IS data not submitted to AmeriFlux are not formally mapped to any community standard vocabularies or ontologies.

Data Values

Quantitative, or numeric variables : These are generally restricted to be integers or real values. Where needed, real values may be rounded to the significant figure appropriate for the data. Precision values may be found within the EML (.xml) file supplied in a data package.

Missing values : A blank cell indicates that data were not collected, data were filtered by automated quality control procedures, or in rare cases, data were redacted. Redacted data are also indicated by a quality flag. Some software, such as Excel or R, may interpret blank values in different ways, most frequently as "NA".
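Because blank cells are interpreted differently by different software, it can help to handle them explicitly. The sketch below reads a small in-memory CSV with the standard library and converts blanks to None; the column names and values are invented for the example.

```python
import csv
import io

# Hypothetical two-row table; the second row has a blank (missing) value.
sample = io.StringIO(
    "startDateTime,tempMean\n"
    "2020-01-23T00:00:00Z,1.5\n"
    "2020-01-23T00:30:00Z,\n"
)


def read_numeric_column(handle, column):
    """Read one numeric column, returning None wherever the cell is blank."""
    values = []
    for row in csv.DictReader(handle):
        cell = row[column]
        values.append(float(cell) if cell != "" else None)
    return values


temps = read_numeric_column(sample, "tempMean")
print(temps)
```

Keeping blanks as None (rather than 0 or an empty string) makes it harder to include missing values in a calculation by accident.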

Data Flags

Data quality flags may indicate that a data value was outside of a typical range of values or that some quality check didn't pass. Each specific flag is well defined by its term description, found in the EML and variables files.

Data quality metrics are summaries of data quality flags for data values that were derived from an aggregate of values, such as for flags on 30-minute values that were aggregated from 1-minute values.

Final quality flags aggregate the results of several quality tests into a final indication of whether each data value is trustworthy or questionable.

Science review flags indicate whether a NEON scientist has, in a specific review of data, recommended that data be marked as questionable or removed.

For more information about quality flags, particularly those used by IS, please refer to the Data Quality Program page and the ATBD: Quality Flags and Quality Metrics for TIS Data Products.

Documents

In addition to Science Designs and ATBDs, IS data products are documented by the following document types:

Sensor Command, Control, and Configuration (C3) Document: Specifies the command, control, and configuration details for operating the relevant sensor and its assembly. It includes a detailed discussion of all necessary requirements for operational control parameters, conditions/constraints, set points, and any necessary error handling.

NEON Preventive Maintenance Procedure: Specifies a list and schedule of checks and actions that NEON personnel perform on the relevant sensor and its assembly to ensure its proper operation. Detailed instructions are provided for more complex tasks.

Metadata

In addition to the README and EML files, IS data products are supplied with:

Sensor position file: Contains the positions of the sensors relative to a reference location, as well as the reference location coordinates.

Variables file: Contains variable definitions and units for each column in each table of the data product.

Surface Atmosphere Exchange (SAE) Data

Files

Surface-Atmosphere Exchange data are delivered as the "Bundled Data Products - Eddy Covariance" data product in the Hierarchical Data Format (HDF5) as a 'bundle' of many data products that are delivered together. Similar to other instrumented data products, each zip file within the downloaded zip contains data for a single site and month.

The basic package contains monthly data files, while the expanded package contains daily data files with all the information that is provided in the basic files along with half-hourly footprint weight matrices and additional data quality information. The naming conventions for the downloadable package, and for the monthly or daily files contained within the package, are as follows:

Folder: NEON.DOM.SITE.DP4.00200.001.YYYY-MM.PKGTYPE.GENTIME.RELEASE
Monthly data files: NEON.DOM.SITE.DP4.00200.001.nsae.YYYY-MM.PKGTYPE.GENTIME.h5.gz
Daily data files: NEON.DOM.SITE.DP4.00200.001.nsae.YYYY-MM-DD.PKGTYPE.GENTIME.h5.gz

Format

HDF5 is similar to NetCDF in that it is a compressed, self-describing file format that allows for hierarchical structuring of data. Each HDF5 file contains numerous data tables. We recommend using the HDFView tool provided as a free download from the HDF Group, but we also provide the stackEddy() function in our neonUtilities code package to join data tables across sites and months. Alternatively, the rhdf5 and h5py packages provide functions to interface with the files in the R and Python languages, respectively.

Vocabularies

Variable definitions are provided in the NSAE_HDF5_object_description.csv file that is downloaded with the data product as well as in the objDesc table within the HDF5 file. Here, term names are more generic than in IS files with terms such as max and mean repeated for each data product and nsae, turb, and stor repeated for each scalar. Fully descriptive terms are derived from the location of the dataset in the HDF5 file structure (e.g. the nsae dataset in the fluxCo2 group contains net surface atmosphere exchange CO2 flux data). The variables are named to help streamline the complex code involved in processing the data from raw (level 0) to highly derived (level 4) products.

Data Values

Missing Values: "NaN" is used to represent missing data, and "NA" to represent missing metadata. Missing timestamps in monthly files are due to processing failures in daily file generation.

Data Flags

Data quality flags, data quality metrics, final quality flags, and science review flags are similar to other IS data products, but rather than being embedded within the data tables, they are instead included in the qfqm group within the HDF5 file.

Documents

Same as IS data products.

Metadata

In addition to the typical README and EML files that are provided with all IS data products, the HDF5 files are self describing, and metadata about the file structure and format are embedded within as group and dataset attributes.

Observation Systems (OS) Data

Files

Within each monthly zip package, there may be multiple data tables. The level of granularity for each data table is a type of data collection activity. For example, the Ground Beetles Sampled From Pitfall Traps data product includes individual tables for field data, sorting, initial identification, and later expert identification if needed. A file containing metadata about data validation is also included. The naming convention for OS data files is:

NEON.DOM.SITE.DPL.PRNUM.REV.DESC.YYYY-MM.PKGTYPE.GENTIME.csv

Table names are descriptive. For example, the ground beetle table containing field data from sampling events may be named:

NEON.D02.BLAN.DP1.10022.001.bet_fielddata.2019-05.basic.20200504T163445Z.csv

while a table with data from sorting the field catch later in the lab is named:

NEON.D02.BLAN.DP1.10022.001.bet_sorting.2019-05.basic.20200504T163445Z.csv

And so on for other tables that are produced throughout the entire processing chain.

Data tables are published when they are available. Downloads of recent data may not include tables containing data with longer processing times, such as the identification data for beetles - field data are available much sooner than expert identification. For more details, see the Data Availability page.

File Format

NEON OS data are always long with respect to time, but may be long or wide for other variables depending on the needs of different data products. For example, sediment chemistry is provided in long format, with a row in the table for every chemical analyte. To understand the formatting of the tables in each OS data product, consult the Data Product User Guide.

OS data tables can follow any of four publication models, depending on the resolution and specificity of the data. The options are:

site-date: Data published in each site by month file are data that were collected at that site, during that month. The large majority of OS data tables follow this model; all IS tables follow this model.
site-all: All available data for a given site are published in every site by month file for the relevant site. This is typically done when contextual data are collected once or infrequently, but are needed to interpret all future data. For example, the trap establishment data for litter traps are published this way.
lab-all: All available data for a given analytical laboratory are published in every site by month file. This is done for data that are specific to a lab, rather than to a sampling location. For example, many labs provide data from analyses of standards run as unknowns along with samples; these data may be relevant to NEON samples from a wide variety of sites and sampling events.
lab-current: The most recently ingested data for a given analytical laboratory are published in every site by month file. In this case the appearance to users is essentially the same as in lab-all, although data handling internal to NEON differs slightly.

Recommended practices: to ensure use of the most up-to-date data and avoid duplication, users should consult the publication date stamp on downloaded files, and work with the most recently published lab files, and the most recently published file for each site for site-all files. The data stacking function in the neonUtilities package does this silently, retaining the most up to date files and discarding the others. Information about the publishing model for each table can be found in the table_types table in neonUtilities, or on the Data Product Details page for each data product.
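For users who handle downloaded files by hand, the recommendation above amounts to keeping, for each site, the file with the latest GENTIME stamp. The sketch below shows one way to do that; it is not how neonUtilities is implemented, and the file names (including the product number 00000, the table name example_table, and the site SCBI) are made up for illustration.

```python
from datetime import datetime


def gentime(name):
    """Extract the GENTIME stamp, assumed to sit just before the extension."""
    stamp = name.rsplit(".", 2)[1]
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ")


def latest_per_site(names):
    """Keep only the most recently generated file name for each site."""
    latest = {}
    for name in names:
        site = name.split(".")[2]  # NEON.DOM.SITE...: SITE is the third component
        if site not in latest or gentime(name) > gentime(latest[site]):
            latest[site] = name
    return latest


# Hypothetical site-all files downloaded for two sites.
files = [
    "NEON.D02.BLAN.DP1.00000.001.example_table.2019-05.basic.20200504T163445Z.csv",
    "NEON.D02.BLAN.DP1.00000.001.example_table.2019-06.basic.20200611T101500Z.csv",
    "NEON.D02.SCBI.DP1.00000.001.example_table.2019-05.basic.20200504T163445Z.csv",
]
newest = latest_per_site(files)
for site, name in sorted(newest.items()):
    print(site, name)
```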

Sampling bouts and eventIDs

Many OS protocols are organized by sampling bout. A bout represents a group of measurements collected together and reflecting a defined dataset for analysis. For example, each sampling bout for small mammal captures consists of three consecutive nights of trapping, while each bout for vegetation structure consists of measurements on all designated trees and may take a few months to complete. Bouts are scheduled independently at each site.

In published data, sampling bouts are labeled by the eventID data field, to assist the user in grouping data records appropriately. In 2024, NEON standardized the contents of eventIDs across data products; in data releases prior to RELEASE-2025, some eventIDs may not match these standards. Check the issue log for each data product for information about updates to eventIDs.

eventIDs are constructed from relevant data and metadata about the sampling bout. There are a few templates for eventIDs, depending on the sampling design. The contents of the eventID are standardized, but the order of the elements may vary. The tables below are provided to support user understanding, but note that you should not need to extract data from the eventID itself for analyses. The eventID is provided for convenience in grouping data, but the data used to construct it are also available as independent data fields.

Possible elements of an eventID:

Element	Description
MOD	Module abbreviation, e.g. MAM for small mammal sampling. Optional, and omitted in table below, but used in many eventIDs.
YEAR	Year the sampling bout began. See ISO week section below for rare exceptions.
SITE	NEON site code
DATE	Day of sampling as YYYYMMDD
ISOWEEK	Week the sampling bout began. See ISO week section below for details.
TYPE	Type of sampling. May refer to season, e.g. spring, or a sampling design distinction, e.g., different plot types that are sampled based on a different design.

Possible eventID formats, and the sampling designs they reflect (note that ordering of elements can differ between data products):

Format	Sampling design
SITE.DATE	All sampling for a bout is carried out in a single day. Common in aquatic protocols.
SITE.YEAR	There is only one bout per year (per site). Sampling may or may not occur over several days/weeks/months.
SITE.YEAR.ISOWEEK	There are several bouts per year.
SITE.YEAR.TYPE	There is more than one bout per year and bouts are determined by biophysical criteria such as season or plot type.
SITE.YEAR.TYPE.ISOWEEK	There are several bouts per year, as well as distinct bouts based on biophysical criteria.

ISO Week

Where bouts are numbered by week, NEON uses the International Organization for Standardization (ISO) standard for week numbering. The week number used in the eventID is the week the sampling bout began.

In some years, the final days of the calendar year can be numbered in a surprising way. ISO weeks begin on Monday, and each week belongs to the year that contains its Thursday. For example: January 1, 2025 was a Wednesday, so the week containing it began on Monday, December 30, 2024 and had its Thursday on January 2, 2025. This means December 30 and 31, 2024 were in ISO week 1 of 2025.

When this occurs, for NEON data products that use ISO week to count bouts AND that include yearBoutBegan as a data field, the year in yearBoutBegan is the year corresponding to the ISO week, not the calendar year.
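The year-boundary behavior can be checked with the Python standard library, whose date.isocalendar() method follows the ISO week standard. The helper name iso_year_week is only for illustration.

```python
from datetime import date


def iso_year_week(day):
    """Return the (ISO year, ISO week number) for a calendar date."""
    year, week, _weekday = day.isocalendar()
    return year, week


# December 30, 2024 falls in ISO week 1 of 2025, not in a week of 2024.
print(iso_year_week(date(2024, 12, 29)))
print(iso_year_week(date(2024, 12, 30)))
print(iso_year_week(date(2025, 1, 1)))
```

The ISO year returned here is the one that would be reported in yearBoutBegan for a bout starting on that date, which can differ from the calendar year.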

Bouts and Releases

NEON publishes an annual data Release that is tagged by DOIs and persistent over time. For OS data products, each Release typically includes all data up to a lag time of one year, ending at the end of the calendar year.

In some cases, sampling bouts can span two calendar years. This happens most commonly in vegetation structure data (DP1.10098.001), which are collected in the winter to avoid the growing season, in sampling bouts that can take a few months to complete. But any data product collected in the winter in bouts longer than a day can theoretically have this issue.

Data Release dates are not adjusted for bouts that take place in two calendar years. In these cases, the last sampling season included in a Release may not include all of the data for a bout. Download the Provisional data to work with the complete bout, or restrict your analyses to bouts that are fully contained in the Release.

Vocabularies

For categorical variables in OS data products, terms may be used in conjunction with Lists of Values (LOVs), which describe a fixed list of potential values that may be used for a term within a data product. Having terms with pre-defined lists of possible values aids in accurate data entry and also assists end users with standard values.

The name of the LOV for any given term may be found within the categoricalCodeName column of the variables file found in each monthly zip package. All values for all LOVs used may be found in the categoricalCodes file. LOV usage may also be found in the EML file supplied in a data package.

For biological data, some variable names have been standardized with Darwin Core terms, the Global Biodiversity Information Facility vocabularies, and the VegCore data dictionary, where applicable. For genomic data, some field names have been standardized with the Minimum Information about any (x) Sequence (MIxS) along with several of the environmental package extensions.

Data Values

Free-Text Variables: For free-text fields, few restrictions are applied other than avoiding the use of special characters that may render poorly in some text editors.

Missing Values: In all tabular data, a blank cell indicates data were not collected, or in rare cases, data were lost or redacted. Redacted data are also indicated by a quality flag. Some software, such as Excel or R, may interpret blank values in different ways, most frequently as "NA". Periodically, we may find errors and fill in missing cells on an ad hoc basis.

For some OS data products with long latency times between data collection and publication (due to necessary processing or analytical steps), tables may be initially published with some values missing, and later republished as more data become available. To learn more, read the documentation associated with each data product.

Data Flags

Quality flags in data tables are unique by data product and table. For example, the bet_fielddata table mentioned above contains several variables that can be used to assess the quality of the data - sampleCompromised has values of 'OK', 'handling error', 'damaged, analysis affected', 'sample incomplete', and 'other (described in remarks)'. Use the variables and categoricalCodes files to help assess data quality.

Documents

In addition to Science Designs and ATBDs, OS data products are documented by the following document types:

Data Product User Guide : A brief summary of the sampling design and the structure of the published data. In some cases a single User Guide may cover multiple closely related data products.

Protocols: The protocols used by field scientists to carry out sampling and measurements. Protocols are generally modified over time as recommended by our Technical Working Groups or our Science or Field Science staff. Each OS data table will specify which protocol version was used to collect each datum. All versions are available in our Document Library.

Metadata

In addition to the README and EML files, OS data products are supplied with:

Validation file: Contains validation rules applied to each variable during data entry and ingest. Data entry constraints are described in NEON's Ingest Conversion Language (NICL) syntax. A general description of NICL is available in Nicl_Language.pdf, while each function is more thoroughly described in nicl_function_library.pdf.

Variables file: Contains variable definitions and units for each column in each table of the data product.

Categorical Codes file: Contains the possible list of values for each categorical variable. It includes time stamps for each value, as any given list may change over time.

Airborne Observation Platform (AOP)

Data Files

The AOP is flown over each site usually no more than once per year. AOP data files are organized by data product and site (sometimes two sites if they are spatially contiguous to one another), and year of collection. The data portal and API allow for a granular approach to downloading data files because individual files can exceed tens of GB for some AOP data products. AOP file naming conventions depend on the data product and contain abbreviations unique to AOP. The AOP data files have an additional set of abbreviations in their naming convention:

Table 3. Abbreviations used only in AOP data products
Abbreviation	Definition
FLHTDATE	Date of flight, YYYYMMDD
FLIGHTSTRT	Start time of flight, YYYYMMDDHH
FLHTSTRT	Start time of flight, YYMMDDHH
YYYY	Year of flight
IMAGEDATETIME	Date and time of image capture, YYYYMMDDHHmmSS
CCCCCC	Digital camera serial number
NNN	Sequential number for indexing files
NNN	Planned flightline number
R	Repeat number
FFFFFF	Numeric code for an individual flightline
V	Visit number
EEEEEE	UTM easting of lower left corner of the tile, in meters
NNNNNNN	UTM northing of lower left corner of the tile, in meters

AOP files are also named using patterns that are different from IS and OS data products (Table 4).

Table 4. AOP file name structure
Data Products	File Name Structure
L1 Products
Digital camera	Varies with the year, camera model, and payload - see Table 5
Discrete return lidar, unclassified	NEON_DOM_SITE_DPL_LNNN-R_FLIGHTSTRT_DESC.laz
Discrete return lidar, classified	NEON_DOM_SITE_DPL_EEEEEE_NNNNNNN_DESC.laz
Spectrometer	NEON_DOM_SITE_DPL_FLHTDATE_FFFFFF_DESC.h5
Waveform lidar	NEON_DOM_SITE_DPL_LNNN-R_FLIGHTSTRT_DESC.wvz (or .plz)
L2 Products
Spectrometer	NEON_DOM_SITE_DPL_FLHTDATE_FFFFFF_DESC.zip (or .tif)
L3 products
Spectrometer / Lidar / Camera	NEON_DOM_SITE_DPL_EEEEEE_NNNNNNN_DESC.laz
Camera	YYYY_SITE_V_EEEEEE_NNNNNNN_image.tif
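
As an illustration of how the structures in Table 4 can be taken apart, the following minimal Python sketch parses the L3 camera tile pattern YYYY_SITE_V_EEEEEE_NNNNNNN_image.tif. The function name parse_camera_tile, the CameraTile type, and the sample filename are hypothetical and not part of any NEON tool, and the pattern assumes a four-letter uppercase site code.

```python
import re
from typing import NamedTuple


class CameraTile(NamedTuple):
    year: int
    site: str
    visit: int
    easting: int   # UTM easting of the lower left corner, in meters
    northing: int  # UTM northing of the lower left corner, in meters


# YYYY_SITE_V_EEEEEE_NNNNNNN_image.tif (L3 camera row of Table 4).
# Assumption: SITE is a four-letter uppercase code.
TILE_PATTERN = re.compile(
    r"^(?P<year>\d{4})_(?P<site>[A-Z]{4})_(?P<visit>\d+)_"
    r"(?P<easting>\d{6})_(?P<northing>\d{7})_image\.tif$"
)


def parse_camera_tile(filename: str) -> CameraTile:
    """Split an L3 camera tile filename into its named components."""
    match = TILE_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not an L3 camera tile name: {filename!r}")
    return CameraTile(
        year=int(match["year"]),
        site=match["site"],
        visit=int(match["visit"]),
        easting=int(match["easting"]),
        northing=int(match["northing"]),
    )


# Hypothetical filename built from the pattern, not a real NEON file.
tile = parse_camera_tile("2019_YELL_2_538000_4968000_image.tif")
print(tile.site, tile.year, tile.easting, tile.northing)
```

The other rows of Table 4 can be handled the same way, although the DESC component may itself contain underscores and needs a more permissive pattern.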

The filenames of the L1 camera images vary with the year, camera model, and payload. Examples of the image filenames are given in Table 5. The L1 filename gives no indication of the location of the image, but the location can be obtained either from the file metadata or from a KMZ file distributed with the files.

Table 5. Camera image filenames by payload, model, and year
Year	Payload	Model	Sample Filename
2017	P1	D8900	17052514_EH021656(20170525154620)-0008_ort.tif
2017	P2	D8900	17061916_EH021537(20170619170317)-0010_ort.tif
2017	P3	IQ180	0000003-000575-070518160057-Cam1_ort.tif
2018	P1	D8900	18090812_EH021537(20180908140731)-0010_ort.tif
2018	P2	IXU-RS-1000	C0129_2018-06-04_00898_65470_ort.tif
2018	P3	IQ180	0000014-001805-070418155008-Cam1_ort.tif
2019	P1	IXU-RS-1000	C0119_2019-07-20_11113_12438_ort.tif
2019	P2	IXU-RS-1000	C0129_2019-08-25_43735_51389_ort.tif
2019	P3	IQ180	0000015-002478-042519155459-Cam1_ort.tif
2020	P2	IXU-RS-1000	C0129_2020-06-04_00896_65468_ort.tif
2020	P3	IQ180	0000690-000049-0611201657469-Cam1_ort.tif

Note that the year, month, day, and hour are all encoded in some way in each filename. The serial number of the camera is sometimes included (e.g. "EH021537" and "C0119"). The other numbers are identifiers generated by the camera acquisition software.
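
The D8900 filenames in Table 5 appear to follow the layout FLHTSTRT_serial(IMAGEDATETIME)-index_ort.tif. The sketch below extracts the camera serial number and capture time under that assumption. The layout is inferred from the sample filenames only, the function name parse_d8900_name is hypothetical, and the IXU-RS-1000 and IQ180 names use different layouts that this pattern does not cover.

```python
import re
from datetime import datetime

# Layout inferred from the D8900 rows of Table 5 (an assumption):
#   YYMMDDHH_SERIAL(YYYYMMDDHHmmSS)-INDEX_ort.tif
D8900_PATTERN = re.compile(
    r"^(?P<start>\d{8})_(?P<serial>[A-Z0-9]+)"
    r"\((?P<stamp>\d{14})\)-(?P<index>\d+)_ort\.tif$"
)


def parse_d8900_name(filename: str) -> dict:
    """Return serial number, capture time, and file index for a D8900 image."""
    match = D8900_PATTERN.match(filename)
    if match is None:
        raise ValueError(f"not a D8900-style filename: {filename!r}")
    return {
        "serial": match["serial"],
        "captured": datetime.strptime(match["stamp"], "%Y%m%d%H%M%S"),
        "index": int(match["index"]),
    }


# First sample filename from Table 5.
info = parse_d8900_name("17052514_EH021656(20170525154620)-0008_ort.tif")
print(info["serial"], info["captured"].isoformat(), info["index"])
```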

File Format

The L1 surface reflectance and at-sensor radiance data, as well as the L3 surface reflectance image data, are stored in the HDF5 file format (with extension H5), which also includes extensive metadata and data quality information. The HDF5 format was selected because of the flexibility it allows in storing supplementary metadata and data quality information in different formats. However, that flexibility inhibits storage of the data in a standardized structure. As a result, the HDF5 files cannot be read by standard geospatial software packages without a dedicated reader. NEON provides code examples for working with AOP HDF5 data in Python and R as part of our educational data tutorials. All other image data are stored in the OGC-standard GeoTIFF format commonly used within the remote sensing community.

The L1 lidar point clouds are stored in LAZ format, identified by the LAZ file extension. LAZ is a lossless compression of the LAS format. LAS is the lidar file exchange format officially adopted by the American Society for Photogrammetry and Remote Sensing (ASPRS), and can be read by all commercial lidar software packages. LAZ can also be read directly by some software packages, but if LAS is required, a conversion tool that decompresses LAZ to LAS can be found at https://rapidlasso.com/lastools/.

The L1 waveform lidar product is stored in a compressed version of the PulseWaves format. In the PulseWaves file format the data are stored in two different files, with PLS and WVS file extensions. The PLS file stores metadata and geolocation information for each outgoing and returned laser pulse, while the WVS file contains the outgoing and return recorded signals. The compressed files have PLZ and WVZ extensions. A free utility named pulse2pulse, available at https://rapidlasso.com/pulsewaves/, can be used to decompress the PLZ and WVZ files to PLS and WVS.

Vocabularies

Data Values

The L1 and L3 surface reflectance data products provide the image data as values scaled by 10000. To obtain a percent reflectance between 0 and 100, divide the stored values by 100. To obtain a ratio between 0 and 1, divide them by 10000. In data delivered between 2013 and 2019, the L1 at-sensor radiance image data are provided in floating point. In data after 2019, radiance is delivered in two separate images. The first contains the integer portion of the at-sensor radiance value, and the second provides the decimal portion scaled by 50000. To recover the observed at-sensor radiance value, divide the decimal portion by 50000 and add the result to the integer portion.

All L2 spectrometer data and L3 lidar image data are delivered as floating point values, while L1 and L3 camera data are delivered as 8-bit integers. All images other than camera data store 'no data' values as -9999, which is a convention in the remote sensing community. Because 8-bit integers cannot represent -9999, the camera images store all 'no data' values as zeros.
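
The scaling rules above can be sketched in a few lines of Python. This is a minimal illustration on single values rather than whole rasters. The function and constant names are hypothetical, and returning None for a 'no data' reflectance value is a choice made for this example only.

```python
REFLECTANCE_SCALE = 10000       # stored reflectance = ratio * 10000
RADIANCE_DECIMAL_SCALE = 50000  # stored decimal image = fraction * 50000
NO_DATA = -9999                 # 'no data' value in non-camera images


def reflectance_ratio(stored):
    """Convert a stored reflectance value to a ratio between 0 and 1."""
    if stored == NO_DATA:
        return None
    return stored / REFLECTANCE_SCALE


def reflectance_percent(stored):
    """Convert a stored reflectance value to a percentage between 0 and 100."""
    if stored == NO_DATA:
        return None
    return stored / 100


def radiance(integer_part, decimal_part):
    """Rebuild at-sensor radiance from the two images delivered after 2019."""
    return integer_part + decimal_part / RADIANCE_DECIMAL_SCALE


print(reflectance_ratio(2500))    # stored value 2500 as a ratio
print(reflectance_percent(2500))  # the same value as a percentage
print(radiance(12, 25000))        # integer 12 plus decimal 25000 / 50000
```

The same arithmetic applies element by element when the values are read into arrays.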

Data Flags

Spectrometer products are flagged through a series of ancillary rasters, delivered within the H5 files, that report quality information about the image data. For example, the L3 surface reflectance H5 file includes an ancillary raster that records the weather conditions observed during data collection. The weather quality raster is a three-band color image with values corresponding to 'Green' for 0-10% cloud cover, 'Yellow' for 10-50% cloud cover, and 'Red' for >50% cloud cover. Additional ancillary rasters in the H5 files include the solar elevation angle and atmospheric conditions such as water vapor content. In L2 spectrometer products, objects such as man-made surfaces that exhibit high reflectance can cause numerical instabilities in the calculation of vegetation or water indices. We identify these values as 'no data' because the anomalous values can cause difficulty when visualizing the data or determining global statistics of the image data.

The L1 lidar point cloud contains flags in the classification attribute of the LAS file. Classifications follow the numbering guidelines recommended by the ASPRS LAS file specification. As such, points that have been identified as 'noise' are given the classification integer '7', and points that could not be definitively classified as 'ground', 'low / medium / high vegetation', 'building', or 'noise' are given the classification 'unclassified', which is represented by the integer '1'. In addition to the LAZ files that contain the standard X,Y,Z coordinate tuples that make up the point cloud, two additional full sets of LAZ files are provided, which contain X,Y,err_Z and X,Y,err_H respectively. Here, the err term refers to a simulated error, Z represents the vertical coordinate, and H the horizontal coordinate. These errors can be used to identify points that may be outside tolerance for applications of interest.
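
As a small illustration of working with the classification attribute, the sketch below counts points per class and drops noise points. Only codes 1 and 7 are stated in the text above. The remaining codes are taken from the ASPRS LAS specification, and the sample points and function names are hypothetical. Reading the LAZ file itself requires a lidar library and is not shown.

```python
from collections import Counter

# Codes 1 and 7 are given in the text; 2 through 6 follow the ASPRS LAS
# specification's standard point classes.
ASPRS_CLASSES = {
    1: "unclassified",
    2: "ground",
    3: "low vegetation",
    4: "medium vegetation",
    5: "high vegetation",
    6: "building",
    7: "noise",
}
NOISE = 7


def count_by_class(points):
    """Count points per class name; points are (x, y, z, classification)."""
    return Counter(
        ASPRS_CLASSES.get(point[3], f"other ({point[3]})") for point in points
    )


def drop_noise(points):
    """Return only the points whose classification is not 'noise'."""
    return [point for point in points if point[3] != NOISE]


# Hypothetical points: (easting, northing, elevation, classification).
sample_points = [
    (538000.1, 4968000.2, 2401.5, 2),
    (538000.4, 4968000.9, 2415.0, 5),
    (538001.0, 4968001.3, 2950.7, 7),
    (538001.2, 4968001.8, 2402.1, 1),
]

counts = count_by_class(sample_points)
kept = drop_noise(sample_points)
print(dict(counts), len(kept))
```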

Documents

In addition to science designs and ATBDs, AOP data products are supplied with:

Data Processing Quality Assurance (QA) Document: a summary of the data quality metrics used to assess the validity of the AOP data products, as well as information on flight acquisition parameters, and processing parameters. These documents are optionally delivered with the data products for which they are applicable - they must be selected in order to be downloaded. Look for PDF files in the download workflow.

Technical Memos:

Goulden, T., 2014. NEON Discrete Lidar Datum Reconciliation Report, NEON.DOC.002293vA

Goulden, T. & Kampe, T. U. NEON AOP Surveys of City of Boulder Pre and Post 2013 Flood Event.

Metadata

In addition to the README file, AOP camera data products are supplied with:

KML/KMZ/SHP: Keyhole Markup Language (KML) files, which can be zipped into KMZ files, as well as ESRI shapefiles (SHP), document the boundaries of flight lines or camera images that were acquired, along with quality information such as observed cloud cover.

When camera data are downloaded from NEON, KMZ files are included. KMZ files can be opened in Google Earth Pro and contain L1 and L3 image boundaries and filenames, overlaid on a low-resolution image mosaic. The KMZ file facilitates locating individual camera images of interest. The KMZ filename is structured as YYYY_SITE_V_mosaic.kmz, e.g. 2019_YELL_2_mosaic.kmz. The file is downloaded to a dedicated Metadata directory within the zipped data package obtained from the portal. Within the KMZ, the low-resolution browse image is shown and the tile boundaries are displayed in red, with a red thumbtack in the middle of each tile. Clicking on a thumbtack displays the full filename for that tile.

Additionally, KML and SHP files of the 1 km by 1 km mosaic boundaries can be downloaded with the L3 lidar data or L3 camera data. These files can be opened and viewed in Google Earth, ESRI ArcGIS, and other spatial software packages.
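
Because a KMZ file is a zipped KML document, the tile filenames it holds can also be listed without Google Earth. The following Python sketch uses only the standard library and runs against a small KMZ built in memory. It assumes that each tile is a Placemark whose name element holds the filename and that the KML uses the 2.2 namespace. The internal layout of the KMZ files NEON distributes may differ, and the function name and sample contents are hypothetical.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

# Assumption: the KML inside the KMZ uses the OGC KML 2.2 namespace.
KML_NS = {"kml": "http://www.opengis.net/kml/2.2"}


def placemark_names(kmz_file):
    """List the <name> of every Placemark in the first KML inside a KMZ."""
    with zipfile.ZipFile(kmz_file) as archive:
        kml_members = [
            name for name in archive.namelist() if name.lower().endswith(".kml")
        ]
        if not kml_members:
            raise ValueError("no KML document found in the KMZ archive")
        root = ET.fromstring(archive.read(kml_members[0]))
    return [
        element.text
        for element in root.iterfind(".//kml:Placemark/kml:name", KML_NS)
        if element.text
    ]


# Build a tiny, hypothetical KMZ in memory so the sketch is self-contained.
sample_kml = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<kml xmlns="http://www.opengis.net/kml/2.2"><Document>'
    "<Placemark><name>2019_YELL_2_538000_4968000_image.tif</name></Placemark>"
    "<Placemark><name>2019_YELL_2_539000_4968000_image.tif</name></Placemark>"
    "</Document></kml>"
)
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("doc.kml", sample_kml)
buffer.seek(0)

names = placemark_names(buffer)
print(names)
```

To use it on a downloaded package, pass the path of the KMZ file in the Metadata directory instead of the in-memory buffer.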