Writing a guide
Every spreadsheet template should be accompanied by a data guide — a
YAML file that documents where data are located within the template and
how they should be interpreted. This guide serves as the blueprint that
tells the excelDataGuide package exactly where to find and
how to read your data.
A data guide is a human-editable, computer-readable YAML file. YAML
syntax is simple: use colons (:) for key-value pairs,
hyphens (-) for lists, and indentation for nesting. This
section walks you through constructing a guide step by step.
Step 1: Define the header
Every guide begins with five pieces of metadata that describe the template itself:
guide.version: '1.0'
template.name: myassay
template.min.version: '1.0'
template.max.version: '2.0'
plate.format: 96guide.version(required): The version of the guide itself (not the template). Usemajor.minorformat, e.g.,'1.0'or'2.1'. This allows you to track changes to the guide independently from the template.template.name(required): A short, descriptive name for the template (e.g.,'myassay','fluorescence_assay'). This is used to identify which template a guide belongs to and is checked at runtime if you enable template name validation.template.min.version(required): The minimum template version for which this guide is compatible. Usemajor.minorformat (e.g.,'1.0').template.max.version(required, can be null): The maximum template version for which this guide is compatible. If there is no upper limit, use~(YAML null) to indicate “any newer version is OK”. For example,template.max.version: ~means the guide works with all template versions fromtemplate.min.versiononwards.plate.format(conditionally required): The microplate format used in your experiments. Valid values are24,48,96, or384(referring to the number of wells). This is required only if your guide includes anyplatedatalocations (see below). The guide uses this to validate that your platedata ranges have the correct dimensions.
Step 2: Add your first location — template version cell
The .template location is a special,
required location that records the template version
from a single cell. This allows the package to check at runtime that the
data file was created with a compatible template version.
locations:
- sheet: description
type: cells
varname: .template
translate: false
variables:
- name: version
cell: B2What this does: - Reads cell B2 from
the sheet named description - Stores its value in a
variable called version within the .template
group - translate: false means no translation is applied
(the cell value is used as-is)
This location must be present in every guide. The
package uses the version value to check compatibility with
template.min.version and
template.max.version.
Step 3: Add metadata as key-value pairs
Metadata such as experimenter names, dates, and study IDs are typically stored as key-value pairs in your template. One location entry can index multiple ranges.
- sheet: description
type: keyvalue
varname: metadata
translate: true
atomicclass:
- character
- character
- date
- numeric
ranges:
- A5:B8
- A10:B12What this does: - Reads two ranges
(A5:B8 and A10:B12) from the
description sheet - The left column of each range contains
keys (variable names); the right column contains values -
translate: true means the keys will be translated from long
names (in the spreadsheet) to short names (used in your script) using
the translations section defined later -
atomicclass is a list of data types, one per key-value pair
in order. In this example, the first three key-value pairs should be
coerced to character, character, and date respectively, then numeric for
any additional pairs (coercion cycles through the list if needed). All
values are coerced to character by default if atomicclass
is omitted.
Combining multiple ranges into one variable:
Multiple ranges are combined into a single named list
(metadata in this example). This is convenient when
metadata is scattered across multiple sheets or ranges but logically
belongs together.
Step 4: Add experiment parameters
Parameters that are constant across all experiments (e.g., acceptance
criteria, standard concentrations defined in SOPs) are best stored in a
dedicated _parameters sheet. They are indexed like
metadata:
- sheet: _parameters
type: keyvalue
varname: parameters
translate: false
atomicclass: numeric
ranges:
- A2:B5
- A8:B10Here, all values will be coerced to numeric. Use
translate: false if your parameter names are already in the
format you want in the script.
Step 5: Add measured plate data
If your template records plate data (e.g., fluorescence measurements
from a 96-well plate), use the platedata type:
- sheet: _data
type: platedata
varname: plate
translate: false
atomicclass:
- character
- numeric
- numeric
ranges:
- A1:M9
- A11:M19What this does: - Reads data in plate format (e.g.,
well positions and measurements) - Each range is a block of data in the
same row-column layout as the physical plate - Multiple ranges are
stacked vertically into a single data frame - The resulting data frame
includes columns row and col (well positions)
plus one column per measured variable, each coerced to its corresponding
atomicclass
In this example, the first three variables will be character,
numeric, numeric respectively. The first range (A1:M9)
represents rows A–H of the plate (8 rows) with 12 columns of data.
Step 6: Add tabular data
Tables with column headers and multiple rows of data are indexed
using the table type:
- sheet: results
type: table
varname: userresults
translate: false
atomicclass: numeric
ranges:
- C3:E10What this does: - Reads a table from range
C3:E10 - The first row contains column names (variable
names) - All values are coerced to the specified
atomicclass - The result is a data frame, not a list
Step 7: Add single-cell values
Sometimes you need to read isolated values from single cells and give them names in the guide (rather than relying on names in the spreadsheet):
- sheet: analysis
type: cells
varname: qc_checks
translate: false
atomicclass: numeric
variables:
- name: spread_well_1
cell: G6
- name: spread_well_2
cell: G33What this does: - Reads individual cells and assigns
them names specified in the guide - spread_well_1 gets the
value from cell G6, spread_well_2 from
G33 - All values are coerced to numeric - The result is a
named list
This is useful when values are scattered or when the spreadsheet’s cell-naming scheme doesn’t align with your script’s variable-naming scheme.
Step 8: Add translations
If your spreadsheet uses long, user-friendly variable names (e.g.,
“Date of experiment”) but your script prefers short, code-friendly names
(e.g., date), define a translations
section:
translations:
- long: "Date of experiment"
short: date
- long: "Experimenter name"
short: experimenter
- long: "Study identifier"
short: studyID
- long: "Plate identifier"
short: plateIDWhen to use translations: - Set
translate: true in a location to apply these translations
to its variable names - Translations are optional; if you don’t define
any, set translate: false in all locations - The package
provides long_to_shortnames() and
short_to_longnames() functions to convert between
formats
Complete example guide
Below is a complete, working guide for a small fluorescence assay template:
guide.version: '1.0'
template.name: simple_fluorescence
template.min.version: '1.0'
template.max.version: ~
plate.format: 96
locations:
# Required: template version
- sheet: description
type: cells
varname: .template
translate: false
variables:
- name: version
cell: B2
# Metadata: user and experiment information
- sheet: description
type: keyvalue
varname: metadata
translate: true
atomicclass:
- character
- character
- date
- numeric
ranges:
- A5:B8
# Parameters: SOP-defined constants
- sheet: _parameters
type: keyvalue
varname: parameters
translate: false
atomicclass: numeric
ranges:
- A2:B5
# Plate data: fluorescence measurements
- sheet: _data
type: platedata
varname: fluorescence
translate: false
atomicclass: numeric
ranges:
- A1:M9
- A11:M19
# Quality control: derived metrics from analysis sheet
- sheet: analysis
type: cells
varname: qc
translate: false
atomicclass: numeric
variables:
- name: plate_uniformity
cell: D3
- name: signal_range
cell: D4
translations:
- long: "Experimenter name"
short: experimenter
- long: "Date of experiment"
short: date
- long: "Temperature (°C)"
short: temperature
- long: "Sample concentration (µM)"
short: concentrationValidation with CUE
Before using your guide in your scripts, validate it against the CUE schema to catch syntax and structure errors early:
# From the data-raw folder where the schema is located:
cd data-raw
./validate_and_sign.sh path/to/your/guide.ymlThis command will: 1. Check your guide against
excelguide_schema.cue 2. Abort with a detailed error
message if validation fails 3. Embed a SHA256 hash
(cue.verified field) if validation succeeds
The hash acts as proof that the guide was validated; you can verify it later with:
In R, you can also check the hash when loading a guide:
guide <- read_guide("path/to/guide.yml", verify_hash = TRUE)This will warn if the hash is missing or abort if the hash is invalid.