Common DFOS tools:
Documentation

dfos = Data Flow Operations System, the common tool set for DFO

TUTORIAL ABOUT ASSOCIATION

1. principles | 2. basic data sets | 3. match keys | 4. calibration cascade | 5. association block | 6. virtual calibrations | 7. association tools | 8. OCA | 9. configuration files

1. Principles of association

Find more details in: [1]

This tutorial collects the principles of association as used by QC and as implemented by the dfos system. The purpose of association is:

- data processing
- data packing (although this service has been terminated in the meantime).

Closely related to data processing is the association use case of the calSelector.

Association as understood by QC follows four principles.

First principle of association:
FOLLOW RULES AND PROPERTIES

Do not associate using names, nightlog entries, or phone call instructions. The result of an association should be reproducible by a different person two years later (provided the input data set is the same).

The dfos association tool createAB codes all steps of the association workflow, and all general rules. All instrument-specific rules are coded in configuration files. All file properties are kept in header files.

Second principle of association:
WORK ON A DATA SET

Since association is about connecting frames, we need a pool of frames. Within the daily QC workflow, this pool is defined by a night.

Third principle of association:
USE A TIME-MATCH RULE

To select from the stream of suitable candidate calibrations, one of the following time-matching rules can be applied:

- CLOSEST: take the calibration having the smallest difference in time; this is the default rule and is usually also the most reasonable choice since a minimized time difference also means minimal difference between the parameters to be measured by the calibration and applied to correct the science;
- ATTACHED: find a calibration taken with the same OB_ID (these are user-provided special calibrations);
- NEXT: take the closest-in-time calibration taken after the SCIENCE data;
- PREVIOUS: take the closest-in-time calibration taken before the SCIENCE data.
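
The four time-match rules above can be sketched in a few lines of Python. This is only an illustration, not the createAB implementation: the `Frame` class, its fields and the function `select_calibration` are hypothetical names, time is assumed to be given as MJD, and for ATTACHED the sketch assumes that the closest-in-time frame is taken if several share the OB_ID.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Frame:
    name: str
    mjd: float   # observation time as Modified Julian Date (assumed time axis)
    ob_id: int   # stands in for the OB_ID property


def select_calibration(science: Frame, candidates: Sequence[Frame],
                       rule: str = "CLOSEST") -> Optional[Frame]:
    """Pick one candidate calibration for a science frame by a time-match rule."""
    if rule == "CLOSEST":
        pool = list(candidates)
    elif rule == "ATTACHED":
        # Same OB_ID only (assumption: closest in time among these wins).
        pool = [c for c in candidates if c.ob_id == science.ob_id]
    elif rule == "NEXT":
        pool = [c for c in candidates if c.mjd > science.mjd]
    elif rule == "PREVIOUS":
        pool = [c for c in candidates if c.mjd < science.mjd]
    else:
        raise ValueError(f"unknown time-match rule: {rule}")
    if not pool:
        return None  # no suitable calibration in the pool
    return min(pool, key=lambda c: abs(c.mjd - science.mjd))


sci = Frame("SCI", 59000.30, 100)
cals = [
    Frame("CAL_A", 59000.25, 200),
    Frame("CAL_B", 59000.45, 100),
    Frame("CAL_C", 59000.40, 300),
]
for rule in ("CLOSEST", "ATTACHED", "NEXT", "PREVIOUS"):
    print(rule, select_calibration(sci, cals, rule).name)
```

In this toy pool, CLOSEST and PREVIOUS both select CAL_A, ATTACHED selects CAL_B (same OB_ID), and NEXT selects CAL_C.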

Fourth principle of association:
USE COMMON-PROPERTY OR FIXED-PROPERTY RULES

Common property means: match [object1 having property1] with [object2 having the same property].

Fixed-property rule means: match [object1 having property1] with [object2 always having property2]. For instance, match SCIENCE data, having whatever value of slit width, with STD calibrations taken at 10" slit width.
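
The difference between the two rule types can be made concrete with a small sketch. The function `matches` and the keyword names `GRISM` and `SLIT_WIDTH` are hypothetical and chosen for illustration only; real rules are expressed in the configuration files, not in Python.

```python
def matches(science: dict, calib: dict, common=(), fixed=None) -> bool:
    """Return True if calib satisfies all common- and fixed-property rules.

    common: keys whose value must be identical in science and calib.
    fixed:  key -> value the calibration must always have, whatever
            value the science frame has for that key.
    """
    fixed = fixed or {}
    same = all(k in science and science[k] == calib.get(k) for k in common)
    pinned = all(calib.get(k) == v for k, v in fixed.items())
    return same and pinned


science = {"GRISM": "GRIS_300V", "SLIT_WIDTH": 1.0}
std_wide = {"GRISM": "GRIS_300V", "SLIT_WIDTH": 10.0}
std_narrow = {"GRISM": "GRIS_300V", "SLIT_WIDTH": 1.0}

# Common property: same grism. Fixed property: STD taken at 10" slit width.
rule = dict(common=("GRISM",), fixed={"SLIT_WIDTH": 10.0})
print(matches(science, std_wide, **rule))    # True
print(matches(science, std_narrow, **rule))  # False
```

The science slit width plays no role in the fixed-property part: only the calibration is required to carry the configured value.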


2. Basic data sets

Association is done within a subset of files from the daily data flow. A subset which does not split further is called a basic data set. A basic data set is complete, independent, and minimal. Any association information needed is contained in this subset. Synonyms for basic data set are: setup, setting, configuration.

3. Match keys

The set of parameters which defines a basic data set is called a match key. All raw frames in a basic data set share the same match key.

Parameters correspond to FITS keywords. Often they are instrument parameters, but they may also be other keywords such as PROG.ID or OBS.ID.

More generally, match keys are needed in three varieties:
- match keys between raw files, used to define a basic data set;
- match keys between raw files and product files, used to associate raw files and calibration product files within a basic data set;
- match keys between product files.
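
As an illustration of the first variety (match keys between raw files), the following sketch groups raw frames into basic data sets by the values of their match key. The function name and the keyword names `FILTER` and `BINNING` are hypothetical.

```python
from collections import defaultdict


def group_by_match_key(frames, match_key):
    """Group frames (dicts of header keywords) by the values of match_key.

    Each resulting group shares the same match key values and therefore
    corresponds to one basic data set in the sense of this tutorial.
    """
    groups = defaultdict(list)
    for frame in frames:
        key = tuple(frame.get(k) for k in match_key)
        groups[key].append(frame["name"])
    return dict(groups)


raw_frames = [
    {"name": "raw1.fits", "FILTER": "R", "BINNING": "2x2"},
    {"name": "raw2.fits", "FILTER": "R", "BINNING": "2x2"},
    {"name": "raw3.fits", "FILTER": "V", "BINNING": "2x2"},
]
print(group_by_match_key(raw_frames, ("FILTER", "BINNING")))
```

Here the first two frames share the match key and fall into one group, while the third forms its own.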


4. Calibration cascade, association map

Within a basic data set, calibration data are organized in a calibration cascade. The cascade is a two-dimensional scheme describing all relevant calibration raw types and their mutual relation. For each raw type, there is one column. Rows describe relations between products and raw data. E.g., a master bias, generated by the first step in the FORS2 IMG cascade, is needed further downwards in the cascade to process a master skyflat. A calibration cascade is visualized by an association map.

The definition of the association map is the first step towards the configuration files for the association tools. These configuration files are a machine-readable version of the more human-friendly association map.

A graphical representation of all components of the calibration cascade, including the association rules and match keys, is called the association map.

The translation of the association map into a machine-readable form defines the configuration files.
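
One way to picture a calibration cascade is as a dependency graph, from which a valid processing order follows. The sketch below uses the bias/skyflat example from the text; the dictionary layout and the SCIENCE entry are assumptions for illustration, not the format of the dfos configuration files. It requires Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each raw type maps to the set of raw types whose products it needs.
cascade = {
    "BIAS": set(),                    # first step, needs nothing
    "SKYFLAT": {"BIAS"},              # master skyflat needs the master bias
    "SCIENCE": {"BIAS", "SKYFLAT"},   # hypothetical final consumer
}


def processing_order(cascade):
    """Return the raw types so that every type comes after its dependencies."""
    return list(TopologicalSorter(cascade).static_order())


print(processing_order(cascade))  # ['BIAS', 'SKYFLAT', 'SCIENCE']
```

A circular dependency in such a map would make `static_order()` raise `graphlib.CycleError`, which is a cheap consistency check for a hand-written cascade.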

The dfos tool createCalibMap creates a calibration map from the association configuration files, both for the dfos operational rules and the calChecker rules. Find the collection of maps here.


5. Association Blocks

The concept of the Association Block is fundamental for association. Association Blocks are described in detail in [2].

An Association Block (AB) is a generalized Reduction Block (RB). The AB contains the complete association information about a set of input frames. It is more general than the RB since it can be defined and created in a homogeneous way for all kinds of data managed by DFO, both from pipeline-supported and unsupported modes.

An AB will generally link to other ABs, following the relationships defined by the calibration cascade. Association Blocks are created at the data organization step. They are updated at later steps of the QC workflow.


6. Cascaded and virtual calibration products

Usually the creation of calibration or science products requires calibration products from an earlier step in the cascade (cascaded calibration products). At the time of AB creation, such products will generally not yet be processed and available. Their names, however, can be predicted on the basis of the association rules. Thereby, these names are made available for use in follow-up ABs which require those products as input. Since at the time of AB creation they exist as names only, they are called virtual calibration products.

If processed in the proper order, the ABs produce, step by step, all required products, turning virtual into real calibration products.

This concept works fine as long as no AB with cascaded calibrations fails. If such an AB does fail, its products remain virtual, and all following ABs from the same basic data set will also fail.
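
The transition from virtual to real products, and the way a failure propagates, can be sketched as follows. The data layout, the function `run_cascade` and the `succeeds` callback are hypothetical; the sketch models failure as "an AB fails if any product it needs is still virtual", which is a simplification of the real workflow.

```python
def run_cascade(abs_in_order, succeeds):
    """Process ABs in cascade order and track which products become real.

    abs_in_order: list of dicts with 'name', 'needs' (predicted product
                  names used as input) and 'product' (predicted output name).
    succeeds:     callable(name) -> bool, stands in for the pipeline run.
    """
    real = set()    # products that exist after processing
    status = {}
    for ab in abs_in_order:
        still_virtual = [p for p in ab["needs"] if p not in real]
        if still_virtual or not succeeds(ab["name"]):
            status[ab["name"]] = "FAILED"
        else:
            real.add(ab["product"])
            status[ab["name"]] = "OK"
    return status, real


abs_in_order = [
    {"name": "AB_BIAS", "needs": [], "product": "MASTER_BIAS"},
    {"name": "AB_FLAT", "needs": ["MASTER_BIAS"], "product": "MASTER_FLAT"},
    {"name": "AB_SCI", "needs": ["MASTER_BIAS", "MASTER_FLAT"], "product": "SCI_PRODUCT"},
]

# If the first AB fails, everything downstream fails as well.
status, real = run_cascade(abs_in_order, lambda name: name != "AB_BIAS")
print(status)
```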


7. Association tools and DFO workflow

The backbone of QC association is the association tool createAB. It provides the association workflow step.

All instrument-specific rules are stored in configuration files. Properties of raw files and calibration product files are stored in headers. The pool of raw data is a raw directory. The pool of processed calibration data is the $DFO_CAL_DIR tree.

7.1 AB creation: createAB

The tool to create an AB is createAB. This tool reads an input set of raw files, classifies them and creates ABs.

Find extensive information about createAB here.

7.2 AB updating: updateAB

When all reduction jobs have been processed and the products have been certified, the products are sorted into their final data directories. At this stage, an AB is updated.

AB updating is provided by the tool updateAB. While preserving most already existing association information, this tool adds the new information available after processing and sorting. The new information is: names of products, names of processing logs, graphical information etc.

We also need a verification mechanism for virtual calibration products. These are the predicted file names for further use within the calibration cascade. After processing, these file names are verified and either flagged as VIRTUAL (if they did not process successfully) or REAL.
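
A minimal sketch of such a verification step is shown below. It assumes, for illustration only, that a product counts as REAL when a file with the predicted name exists in the calibration directory; how updateAB actually decides this is not described here, and the function name and file names are hypothetical.

```python
import tempfile
from pathlib import Path


def flag_products(predicted_names, cal_dir):
    """Flag each predicted product name as REAL (file exists) or VIRTUAL."""
    cal_dir = Path(cal_dir)
    return {
        name: "REAL" if (cal_dir / name).is_file() else "VIRTUAL"
        for name in predicted_names
    }


# Demo in a throwaway directory: only the master bias was "processed".
with tempfile.TemporaryDirectory() as cal_dir:
    (Path(cal_dir) / "master_bias.fits").touch()
    flags = flag_products(["master_bias.fits", "master_flat.fits"], cal_dir)
print(flags)
```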

If no pipeline-processing has taken place, no information is added.

Find more information about updateAB here.

7.3 Read properties of master calibration files

The tool createAB defines a list of available calibration products, by reading the configured N_MCAL_LIST key and either applying the working DATE or a specified date as start date. It then scans all directories within the specified range and creates symbolic links to the found files in $DFO_CAL_DIR/LINKS. The headers of these files are later read by ABbuilder.

7.4 Read association information from ABs

The ABs contain association information which is written into a database for further use by external users. The tool harvestAB is used to extract that information: for each science file, it creates a list of all associated calibrations, both direct (as listed in the parent AB) and indirect (as listed in cascaded ABs).
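
The distinction between direct and indirect calibrations can be illustrated by walking the links between ABs. This is a sketch under assumptions: the dictionary layout, the key names `product` and `mcalib`, and the function `collect_calibrations` are hypothetical and do not reflect the actual AB file format or the harvestAB code.

```python
def collect_calibrations(ab_name, abs_by_name):
    """Return (direct, indirect) calibration products for one AB.

    direct:   products listed in the AB itself.
    indirect: products listed in the ABs that created the direct ones,
              followed recursively through the cascade.
    """
    producer = {ab["product"]: name for name, ab in abs_by_name.items()}
    direct = list(abs_by_name[ab_name]["mcalib"])
    indirect = []
    seen = set(direct)
    todo = list(direct)
    while todo:
        product = todo.pop()
        parent = producer.get(product)
        if parent is None:
            continue  # product not created by any known AB
        for calib in abs_by_name[parent]["mcalib"]:
            if calib not in seen:
                seen.add(calib)
                indirect.append(calib)
                todo.append(calib)
    return direct, indirect


abs_by_name = {
    "AB_SCI": {"product": "SCI_PRODUCT", "mcalib": ["MASTER_FLAT"]},
    "AB_FLAT": {"product": "MASTER_FLAT", "mcalib": ["MASTER_BIAS"]},
    "AB_BIAS": {"product": "MASTER_BIAS", "mcalib": []},
}
print(collect_calibrations("AB_SCI", abs_by_name))
# (['MASTER_FLAT'], ['MASTER_BIAS'])
```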

7.5 List of DFOS tools dealing with ABs

createAB       creates ABs
updateAB       updates existing ABs
getStatusAB    monitors AB status before, during and after processing
processAB      launches the processing of an AB
harvestAB      extracts AB information into the associations database for use on the User Portal

All tools in the middle part of the daily workflow deal with ABs in one way or another. ABs are the central data structure of dfos.


8. OCA

With createAB v2.x, the association engine is an SDD tool called ABbuilder. It has been developed by Stefano Zampieri and the DFS/BET team. This tool uses the OCA framework, which is software developed for the general purpose of Organization, Classification and Association of data. Other tools making use of this framework are the dfos tools calChecker and filterRaw, and the archive interface tool calSelector. Find more about the OCA syntax here, and about createAB here.


9. Configuration files

All instrument-specific information is stored in configuration files. The OCA syntax is described here.

All information is collected in three main files: <instr>_classification.h, <instr>_organisation.h, <instr>_association.h. They are supported by some macro files. The set of files is then pre-compiled within createAB to give a final <instr>.RLS file with all the information necessary for the tool.

The configuration files are described here.



Last update: April 26, 2021 by rhanusch