Configuration files

There are two required configuration files for processing data: the global attributes file, which describes attributes that apply to the mooring, and the instrument configuration file, which describes attributes that apply to an instrument on a mooring. Contents of both files will be included as attributes in both the xarray Dataset and the netCDF files.

A note on time and time zones

Time is always in Coordinated Universal Time (UTC).

Transitioning from EPIC to CF Conventions

Historically, data have been released according to NOAA PMEL/EPIC conventions. Today, CF Conventions are used much more frequently, and stglib supports only CF Conventions. The conventions are specified via the Conventions keyword in either the global attributes file or the instrument configuration file.

Setting CF in global attributes

Conventions; CF-1.8

Setting CF in the instrument configuration file

Conventions: 'CF-1.8'

Specifying CF-1.8 or a later release of the standard will enable straight-to-CF processing.

Global attributes configuration file

This file describes attributes that apply to the mooring. Each line holds an attribute name and its value, separated by a semicolon, as shown in the example below.

SciPi; N. Ganju
PROJECT; USGS Coastal and Marine Geology Program
EXPERIMENT; Grand Bay
DESCRIPTION; Site GB1, Heron Bay
DATA_SUBTYPE; MOORED
DATA_ORIGIN; USGS WHCMSC Sed Trans Group
COORD_SYSTEM; GEOGRAPHIC
Conventions; PMEL/EPIC
MOORING; 1076
WATER_DEPTH; 1.55
WATER_DEPTH_NOTE; (meters), nominal depth
latitude; 30.37876
longitude; -88.38794
magnetic_variation; -1.88
Deployment_date; 2016-08-04 15:41
Recovery_date; 2016-10-19 20:10
DATA_CMNT; 
platform_type; FG Lander
DRIFTER; 0
POS_CONST; 0
DEPTH_CONST; 0
WATER_MASS; Grand Bay, AL/MS
VAR_FILL; 1.e+35
institution; United States Geological Survey, Woods Hole Coastal and Marine Science Center
institution_url; https://woodshole.er.usgs.gov
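To make the format concrete, the following is a minimal sketch of how a file like the one above could be read into a Python dictionary. The function name parse_global_attributes is hypothetical and is not part of stglib; stglib's own reader may behave differently (for example, it may convert numeric values), so treat this only as an illustration of the name; value layout.

```python
def parse_global_attributes(text):
    """Parse 'name; value' lines into a dict of strings (illustrative only)."""
    attrs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split on the first semicolon only, so a value may itself contain one
        name, _, value = line.partition(";")
        attrs[name.strip()] = value.strip()
    return attrs


sample = """SciPi; N. Ganju
MOORING; 1076
WATER_DEPTH; 1.55
DATA_CMNT; 
"""
attrs = parse_global_attributes(sample)
print(attrs["MOORING"])  # values stay strings in this sketch
```

An attribute with nothing after the semicolon, such as DATA_CMNT above, becomes an empty string in this sketch.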

Instrument configuration file

This file is instrument-specific and is YAML formatted. A few examples are given below.

Note

Although YAML supports boolean values, netCDF does not support them as attributes. Because stglib saves the values specified in the instrument configuration file as netCDF attributes, you must enclose values potentially interpreted as boolean (such as true or false) in quotation marks in the YAML file.
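One way to catch unquoted booleans before processing is to inspect the configuration after it has been parsed. The sketch below assumes the YAML file has already been loaded into a plain dictionary by a YAML library; the helper find_boolean_values is hypothetical and not part of stglib.

```python
def find_boolean_values(config):
    """Return the keys of a parsed config dict whose values are Python bools."""
    return [key for key, value in config.items() if isinstance(value, bool)]


# What a YAML parser would typically hand back for these two lines:
#   puv: true              -> True (a bool, which cannot be stored as an attribute)
#   zeroed_pressure: 'Yes' -> 'Yes' (a string, which can)
parsed = {"puv": True, "zeroed_pressure": "Yes", "ClockError": 0}
print(find_boolean_values(parsed))  # ['puv']
```

Any key reported by this check would need its value enclosed in quotation marks in the YAML file.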

Options common to most instrument config files:

  • Conventions: version of the CF Conventions, 'CF-1.8' presently

  • basefile: the input filename without extension

  • filename: output filename, to which -raw.cdf, -a.nc, etc. will be appended

  • ClockError: number, in seconds, negative is slow. Applies a simple offset for times. Useful if the instrument was deployed in an incorrect time zone.

  • ClockDrift: number, in seconds, negative is slow. Linearly interpolates times for when the instrument clock has drifted.

  • initial_instrument_height: elevation of instrument in meters

  • initial_instrument_height_note

  • P_1ac_note: a note on the atmospheric pressure source used

  • zeroed_pressure: a note detailing whether the pressure sensor was zeroed before deployment, and other pertinent details such as date and time of zeroing.

  • good_dates: a list of dates to clip data by instead of the default Deployment_date and Recovery_date. Example: good_dates: ['2021-01-22 18:32', '2021-04-13 19:27'] # first burst looked suspect. Multiple date ranges can also be used. Example: good_dates: ['2021-01-22 18:32', '2021-02-28 23:59', '2021-04-01 00:00', '2021-04-13 19:27'] # the month of March was bad

  • good_ens: a list of good indices (based on the raw file, zero-based) to clip the data by. Example: good_ens: [10, 500]. To specify multiple good ranges, add additional pairs of indices: good_ens: [10, 500, 560, 600] will clip the data to samples 10-500 and 560-600 in the final file.

  • vert_dim: user specified coordinate variable for vertical dimension for data variables with non-singular vertical dimension (default = ‘z’)
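The difference between ClockError and ClockDrift can be pictured with a short sketch. It assumes that "negative is slow" means the stated value is subtracted from the recorded times, and that the drift grows linearly from zero at the first sample to the full ClockDrift at the last. Both are assumptions made for illustration; the function correct_times is hypothetical and is not how stglib necessarily implements the correction.

```python
from datetime import datetime, timedelta


def correct_times(times, clock_error=0.0, clock_drift=0.0):
    """Apply a constant offset plus a linearly growing drift correction.

    Assumed sign convention: a negative value means the instrument clock
    was slow, so subtracting it moves the timestamps later.
    """
    n = len(times)
    corrected = []
    for i, t in enumerate(times):
        # Fraction of the deployment elapsed: 0 at first sample, 1 at last
        frac = i / (n - 1) if n > 1 else 0.0
        shift = clock_error + frac * clock_drift
        corrected.append(t - timedelta(seconds=shift))
    return corrected


start = datetime(2016, 8, 4, 15, 41)
times = [start + timedelta(hours=h) for h in range(5)]
# Clock was set one hour slow and lost a further 8 s over the deployment
fixed = correct_times(times, clock_error=-3600, clock_drift=-8)
print(fixed[0], fixed[-1])
```

In this sketch the first timestamp moves by exactly one hour, while the last one moves by one hour and eight seconds.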

Multiple instruments

Options applicable to many instrument types include:

  • <VAR>_bad_ens: specify bad ensemble ranges (either index numbers or dates) that should be set to _FillValue. If you want multiple ranges, you can do this with additional values in the array. For example, Turb_bad_ens: ['2017-09-30 21:15', '2017-10-02 09:30', '2017-10-12 20:45', '2017-10-16 00:30']. This will set the ranges in late September and early October, and again in mid-October, to _FillValue.

  • <VAR>_bad_ens_indiv: specify individual ensembles (either index numbers or dates) that should be set to _FillValue. For example, Turb_bad_ens_indiv: ['2017-09-30 21:15', '2017-10-02 09:30', '2017-10-12 20:45', '2017-10-16 00:30']. This will set these four individual timestamps to _FillValue.

  • <VAR>_min: fill values less than this minimum valid value. Values outside this range will become _FillValue. Substitute your variable for <VAR>, e.g. fDOMQSU_min.

  • <VAR>_max: fill values more than this maximum valid value.

  • <VAR>_min_diff: fill values where data decreases by more than this number of units in a single time step. This will typically be a negative number.

  • <VAR>_min_diff_pct: fill values where data decreases by more than this percent in a single time step. This will typically be a negative number.

  • <VAR>_max_diff: fill values where data increases by more than this number of units in a single time step.

  • <VAR>_max_diff_pct: fill values where data increases by more than this percent in a single time step.

  • <VAR>_med_diff: fill values where difference between a 5-point (default) median filter and original values is greater than this number.

  • <VAR>_med_diff_pct: fill values where percent difference between a 5-point (default) median filter and original values is greater than this number.

  • <VAR>_max_blip: fill short-lived maximum “blips”, values that increase by more than this number and then immediately decrease at the next time step.

  • <VAR>_max_blip_pct: fill short-lived maximum “blips”, values that increase more than this percent and then immediately decrease at the next time step.

  • <VAR>_trim_fliers: fill flier values, which are data points surrounded by filled data. Set to the maximum size of flier clumps to remove.

  • <VAR>_warmup_samples: fill this many samples at the beginning of each burst.

  • drop_vars: a list of variables to be removed from the final file. For example, drop_vars: ['nLF_Cond_µS_per_cm', 'Wiper_Position_volt', 'Cable_Pwr_V'].
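As an illustration of the threshold options above, here is a minimal sketch of a <VAR>_min/<VAR>_max fill and a <VAR>_max_diff fill on a plain list of values. NaN stands in for _FillValue, and the assumption that the later of the two points is the one filled by the difference test is ours; the function names are hypothetical and stglib's implementation may differ.

```python
import math


def fill_min_max(values, vmin=None, vmax=None):
    """Replace values below vmin or above vmax with NaN (stand-in for _FillValue)."""
    out = []
    for v in values:
        too_low = vmin is not None and v < vmin
        too_high = vmax is not None and v > vmax
        out.append(math.nan if (too_low or too_high) else v)
    return out


def fill_max_diff(values, max_diff):
    """Fill points that rose by more than max_diff since the previous time step."""
    out = list(values)
    for i in range(1, len(values)):
        if values[i] - values[i - 1] > max_diff:
            out[i] = math.nan  # assumption: the later point is the one filled
    return out


turb = [5.0, 6.0, 250.0, 7.0, -1.0, 8.0]
by_range = fill_min_max(turb, vmin=0.0)      # like Turb_min: 0
by_jump = fill_max_diff(turb, max_diff=100)  # like Turb_max_diff: 100
print(by_range)
print(by_jump)
```

The range test removes only the negative reading, while the difference test removes only the spike to 250.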

Aquadopp

Aquadopp-specific options include:

  • trim_method: can be 'water level', 'water level sl', 'bin range', None, or 'none'. Or just omit the option entirely if you don’t want to use it.

  • <VAR>_trim_single_bins: trim data where only a single bin of data (after trimming via trim_method) remains. Set this value to true to enable.

  • <VAR>_maxabs_diff_2d: trim values in a 2D DataArray when the absolute value of the increase is greater than a specified amount

  • AnalogInput1_<ATTR> or AnalogInput2_<ATTR>: if <ATTR> is “standard_name”, “long_name”, “units”, “institution”, “comment”, “source”, or “references”, this will create the appropriate attribute for the given variable.

For Aquadopp waves:

  • puv: set to true to compute PUV wave statistics. (EXPERIMENTAL)

basefile: 'AQ107703'
filename: '10771Baqd'   # name of output file, -raw.cdf or .nc will be appended to this
LatLonDatum: 'NAD83'
ClockError: 0 # sec, negative is slow
orientation: 'UP'          # use this to identify orientation of profiler
initial_instrument_height: 0.15  # meters - estimated!!!
initial_instrument_height_note: ''
zeroed_pressure: 'Yes' # was pressure zeroed before deployment
trim_method: 'water level sl'  # Water Level SL trims bin if any part of bin or side lobe is out of water - works best when pressure is corrected for atmospheric
# trim_method: 'bin range'
# good_bins: [0,7] # with these two options, trim to the first 7 bins from the transducer
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'

Signature

Signature-specific options include (see Aquadopp for others):

  • outdir: output directory (make sure it exists) to write individual cdf files before being compiled into a single cdf file per data type

  • orientation: can be UP or DOWN; use this to identify the orientation of the profiler

basefile: 'AQ107703'
filename: '10771Baqd'   # name of output file, -raw.cdf or .nc will be appended to this
LatLonDatum: 'NAD83'
ClockError: 0 # sec, negative is slow
orientation: 'UP'          # use this to identify orientation of profiler
initial_instrument_height: 0.15  # meters - estimated!!!
initial_instrument_height_note: ''
zeroed_pressure: 'Yes' # was pressure zeroed before deployment
trim_method: 'water level sl'  # Water Level SL trims bin if any part of bin or side lobe is out of water - works best when pressure is corrected for atmospheric
# trim_method: 'bin range'
# good_bins: [0,7] # with these two options, trim to the first 7 bins from the transducer
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'

RBR instruments

Options specific to RBR instruments exported from the Ruskin software include:

  • basefile: the input filename without extension or data type. For example, if your exported text files are named 055170_20190219_1547_burst.txt, 055170_20190219_1547_data.txt, etc., basefile will be 055170_20190219_1547.

  • wp_min, wp_max: min/max allowable wave period, in seconds

  • wh_min, wh_max: min/max allowable wave height, in meters

  • wp_ratio: maximum allowable ratio between peak period (wp_peak) and mean period (wp_4060).

  • <VAR>_min: fill values less than this minimum valid value. Values outside this range will become _FillValue. Substitute your variable for <VAR>, e.g. P_1ac_min. Only works for P_1 and P_1ac. Useful for trimming by minimum pressure for instruments that go dry on some tidal cycles. Any data within the burst less than the threshold will result in the full burst being filled.

basefile: '055110_20161020_1503'
filename: '10793Adw'   # name of output file, -raw.cdf or .nc will be appended to this
LatLonDatum: 'NAD83'
initial_instrument_height: 0.15  # meters - estimated!!!
wp_max: 4
wh_min: 0.02
wp_ratio: 2
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'
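The burst-level behavior of <VAR>_min for RBR pressure data, where any sample below the threshold causes the whole burst to be filled, can be sketched as follows. Bursts are represented as nested lists and NaN stands in for _FillValue; the function fill_bursts_below_min is hypothetical and only illustrates the rule described above.

```python
import math


def fill_bursts_below_min(bursts, p_min):
    """Fill an entire burst with NaN if any sample in it is below p_min."""
    filled = []
    for burst in bursts:
        if any(sample < p_min for sample in burst):
            filled.append([math.nan] * len(burst))
        else:
            filled.append(list(burst))
    return filled


# Three bursts of pressure; the second one includes a nearly dry sample
bursts = [[1.2, 1.3, 1.1], [0.4, 0.05, 0.3], [0.9, 1.0, 1.1]]
result = fill_bursts_below_min(bursts, p_min=0.1)  # like P_1ac_min: 0.1
print(result)
```

Only the middle burst is filled here, even though two of its three samples are above the threshold.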

When an RBR instrument is used in CONTINUOUS mode as a profiling instrument (e.g., twisting the endcap to start/stop a profile), include the following options in your configuration file:

  • featureType: 'profile': this CF-compliant featureType instructs stglib to process these data as a profile dataset.

  • latitude: [36.959, 41.533, 27.764], longitude: [-122.056, -70.651, -82.638]: these values can each be specified as a YAML list of latitudes and longitudes, each element in the lists corresponding to a profile.

  • split_profiles: when set to True, split a multi-profile dataset into individual netCDF files for each profile

EXO

EXO-specific options include:

  • skiprows: number of lines to skip in the CSV before the real data begins

Note that negative numeric values in the YAML config file must be treated with care so as not to be interpreted as strings. If you want the minimum value to be, say, -0.2 units for a particular parameter, you must write this as -0.2 and not -.2 in the config file. The latter format will be interpreted as a string and will cause an error.
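A quick check on the parsed configuration can reveal thresholds that were read as strings. This sketch assumes the YAML file has already been loaded into a dictionary; the helper find_string_thresholds and its list of suffixes are hypothetical and not part of stglib.

```python
def find_string_thresholds(
    config, suffixes=("_min", "_max", "_min_diff", "_max_diff")
):
    """Return keys that look like thresholds but hold strings instead of numbers."""
    return [
        key
        for key, value in config.items()
        if key.endswith(suffixes) and isinstance(value, str)
    ]


# A parser that reads -.2 as a string would hand back something like this
parsed = {"C_51_min_diff": "-.2", "S_41_min_diff": -2, "Turb_max_diff": 100}
print(find_string_thresholds(parsed))  # ['C_51_min_diff']
```

Rewriting the flagged value as -0.2 in the config file resolves the problem.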

basefile: 'GB0014_14D100014_080316_120000'
filename: '10762Aexo'   # name of output file, -raw.cdf or .nc will be appended to this
# SN: '14D100014'
LatLonDatum: 'NAD83'
ClockError: 0 # sec, negative is slow
initial_instrument_height: 0.15  # meters - estimated!!!
initial_instrument_height_note: ''
zeroed_pressure: 'Yes' # was pressure zeroed before deployment
P_1ac_note: 'Corrected for variations in atmospheric pressure using Grand Bay NERR met station (GNDCRMET).'
skiprows: 25
#fDOMRFU_max_diff: 3
#fDOMQSU_max_diff: 30
C_51_min_diff: -0.3
SpC_48_min_diff: -2.5
S_41_min_diff: -2
Turb_max_diff: 100
# Example of how to trim by specifying the bad ensembles that should be removed.
# Here we will remove C_51 values in ensembles 500:600 and 905:910.
# You must specify these ranges as pairs, start and end
# This will delete 500-599 and 905-909
C_51_bad_ens: [500, 600, 905, 910]
# Here's an example of just removing a single value (51):
S_41_bad_ens: [51, 52]
# Or a single range (200-250). Note that Python's indexing means that this
# will actually remove values 200 through 249.
Turb_bad_ens: [200, 250]
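The pairing and end-exclusive behavior described in the comments above can be sketched in a few lines. NaN stands in for _FillValue, and the function apply_bad_ens is a hypothetical illustration rather than stglib's own code; it handles index-based ranges only.

```python
import math


def apply_bad_ens(values, bad_ens):
    """Fill half-open index ranges given as a flat [start, end, start, end, ...] list."""
    if len(bad_ens) % 2 != 0:
        raise ValueError("bad_ens must contain start/end pairs")
    out = list(values)
    for start, end in zip(bad_ens[0::2], bad_ens[1::2]):
        # Python slice semantics: start is included, end is excluded
        for i in range(start, min(end, len(out))):
            out[i] = math.nan
    return out


data = [float(i) for i in range(10)]
result = apply_bad_ens(data, [2, 4, 7, 8])  # fills indices 2, 3 and 7
print(result)
```

As with S_41_bad_ens: [51, 52] above, a pair whose end is one more than its start removes a single value.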

WET Labs ECO NTU

NTU-specific options include:

  • All the _min, _max, _bad_ens, etc. options available to the EXO.

  • Turb_std_max: fill turbidity based on a maximum standard deviation value.

  • spb: samples per burst

  • user_ntucal_coeffs: polynomial coefficients, e.g., [9.078E-07, 5.883E-02, -2.899E+00].
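For readers unfamiliar with polynomial calibrations, the sketch below evaluates a set of coefficients with Horner's method. It assumes the coefficients are listed from the highest-degree term to the constant term, which is a common convention but is not stated in this document, and it makes no claim about what the input variable is; eval_poly is a hypothetical helper.

```python
def eval_poly(coeffs, x):
    """Evaluate a polynomial with Horner's method; coeffs are highest degree first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


# Assumed ordering: a*x**2 + b*x + c
coeffs = [9.078e-07, 5.883e-02, -2.899e+00]
print(eval_poly(coeffs, 0))     # the constant term
print(eval_poly(coeffs, 1000))  # value for a raw input of 1000
```

If the coefficients were instead ordered from the constant term upward, the list would need to be reversed before use.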

Vaisala WXT536

WXT-specific options include:

  • RTK_elevation_NAVD88: RTK elevation of the sensor referenced to NAVD88 in meters.

  • dir_offset: a direction offset in degrees from magnetic north to be applied if the sensor was not pointing toward magnetic north.

  • dir_offset_note: a note about the direction offset being used.
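A direction offset is usually applied with wrap-around so that the result stays a valid compass direction. The sketch below assumes the offset is added to the measured direction and the result wrapped into the range 0 to 360; the sign convention and the function apply_dir_offset are assumptions for illustration, not taken from stglib.

```python
def apply_dir_offset(direction_deg, dir_offset):
    """Add an offset to a direction and wrap the result into [0, 360)."""
    return (direction_deg + dir_offset) % 360


print(apply_dir_offset(350, 20))  # 10
print(apply_dir_offset(5, -15))   # 350
```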

EofE ECHOLOGGER

  • All the _min, _max, _bad_ens, etc. options available to the EXO.

  • instrument_type: types “ea” and “aa” are supported.

  • orientation: orientation of the transducers; ‘DOWN’ and ‘UP’ are supported.

  • average_salinity: average salinity value (PSU) for the water mass for the deployment site and time period.

  • average_salinity_note: source of average salinity value.

Sequoia Scientific LISST

  • operating_mode: set to burst if instrument was deployed in burst mode

Sontek IQ

  • All the _min, _max, _bad_ens, etc. options available to the EXO.

  • orientation: can be UP or DOWN; use this to identify the orientation of the profiler

  • positive_direction: direction (degrees) of positive flow indicated by the X arrow on top of instrument (optional, recommended)

  • flood_direction: direction (degrees) of flood current in channel, may be opposite of positive flow direction depending on field set up (optional, recommended)

  • channel_cross_section_note: note specifying starting bank (left or right) for RTK transect across the channel and when the transect measurements were collected (optional, recommended)

Onset Hobo

  • All the _min, _max, _bad_ens, etc. options available to the EXO.

  • instrument_type: can be hwl (water level), hwlb (water level as barometer), hdo (dissolved oxygen), or hcnd (conductivity); choose the type based on the parameter measured by the Hobo logger

  • skipfooter: number of lines to skip in the CSV file at the end of the file

  • ncols: number of columns of data to read, starting at the first column

  • names: option for user specified column names (only recommended when code will not read names using automated/default method)

Lowell TCM Hobo

  • All the _min, _max, _bad_ens, etc. options available to the EXO.

  • skipfooter: number of lines to skip in the CSV file at the end of the file

  • ncols: number of columns of data to read, starting at the first column

  • names: option for user specified column names (only recommended when code will not read names using automated/default method)

Vector

  • pressure_sensor_height and velocity_sample_volume_height to specify the elevations of these two sensors.

  • puv: set to true to compute PUV wave statistics. (EXPERIMENTAL)

  • orientation: UP means probe head is pointing up (sample volume above probe head). DOWN means probe head is pointing down (sample volume below probe head).