dfos = Data Flow Operations System, the common tool set for DFO
TUTORIAL ABOUT ASSOCIATION

1. principles | 2. basic data sets | 3. match keys | 4. calibration cascade | 5. association block | 6. virtual calibrations | 7. association tools | 8. OCA | 9. configuration files

1. Principles of association

Find more details in: [1]

This tutorial collects the principles of association as used by QC and as implemented by the dfos system. The purpose of association is:
Closely related to data processing is the association use case of the calSelector. Association as understood by QC follows four principles.
Do not associate using names, nightlog entries, or instructions given over the phone. The result of an association should be reproducible by a different person two years later, provided the input data set is the same. The dfos association tool createAB codes all steps of the association workflow and all general rules. All instrument-specific rules are coded in configuration files. All file properties are kept in header files.
Since association is about connecting frames, we need a pool
of frames. Within the daily QC workflow, this pool is defined by a night.
To select from the stream of suitable candidate calibrations, the following time matching rules can be applied:
- CLOSEST: take the calibration with the smallest difference in time. This is the default rule and usually also the most reasonable choice, since a minimal time difference also means a minimal difference between the parameters measured by the calibration and those applied to correct the science.

Common property means: match [object1 having property1] with [object2 having the same property].

Fixed-property rule means: match [object1 having property1] with [object2 having always property2]. For instance, match SCIENCE data, having whatever value of slit width, with STD calibrations taken at 10" slit width.
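
The three rules above can be pictured with a minimal Python sketch. It is not the createAB implementation: the function names, the dictionary keys (`filter`, `slit`, `time`) and the sample frames are all hypothetical and chosen only to mirror the SCIENCE/STD slit-width example.

```python
from datetime import datetime


def common_property(obj, candidates, key):
    """Common-property rule: keep candidates sharing obj's value for `key`."""
    return [c for c in candidates if c[key] == obj[key]]


def fixed_property(candidates, key, value):
    """Fixed-property rule: keep candidates whose `key` always equals `value`."""
    return [c for c in candidates if c[key] == value]


def closest(reference_time, candidates):
    """CLOSEST rule: pick the candidate with the smallest absolute time difference."""
    return min(candidates, key=lambda c: abs(c["time"] - reference_time))


# Made-up frames for illustration only (slit width in arcsec).
science = {"name": "SCI_1", "time": datetime(2010, 3, 5, 2, 0), "slit": 1.0, "filter": "R"}
stds = [
    {"name": "STD_A", "time": datetime(2010, 3, 4, 23, 0), "slit": 10.0, "filter": "R"},
    {"name": "STD_B", "time": datetime(2010, 3, 5, 3, 30), "slit": 10.0, "filter": "R"},
    {"name": "STD_C", "time": datetime(2010, 3, 5, 2, 10), "slit": 1.0, "filter": "B"},
]

same_filter = common_property(science, stds, "filter")   # same filter as the science frame
wide_slit = fixed_property(same_filter, "slit", 10.0)    # STD always taken at 10" slit
best = closest(science["time"], wide_slit)               # nearest in time among the rest
print(best["name"])
```

In this sketch the property rules first narrow the pool of candidates, and the time rule then picks one of the survivors. Whether the real tool applies the rules in this order is an assumption.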
2. Basic data sets

Association is done within a subset of files from the daily data flow. A subset which does not split further is called a basic data set. A basic data set is complete, independent, and minimal. Any association information needed is contained in this subset. Synonyms for basic data set are: setup, setting, configuration.

3. Match keys

The set of parameters which defines a basic data set is called a match key. All raw frames in a basic data set share the same match key.

Parameters correspond to FITS keywords. Often they are instrument parameters, but they may also come e.g. as PROG.ID or OBS.ID. More generally, match keys are needed in three varieties:

4. Calibration cascade, association map

Within a basic data set, calibration data are organized in a calibration cascade. The cascade is a two-dimensional scheme describing all relevant calibration raw types and their mutual relation. For each raw type, there is one column. Rows describe relations between products and raw data. E.g., a master bias, generated by the first step in the FORS2 IMG cascade, is needed further downwards in the cascade to process a master skyflat. A calibration cascade is visualized by an association map.

The definition of the association map is the first step towards the configuration files for the association tools. These configuration files are a machine-readable version of the more human-friendly association map.

A graphical representation of all components of the calibration cascade, including the association rules and match keys, is called the association map. The translation of the association map into a machine-readable form defines the configuration files. The dfos tool createCalibMap creates a calibration map from the association configuration files, both for the dfos operational rules and the calChecker rules. Find the collection of maps here.

5. Association Blocks

The concept of the Association Block is fundamental for association.
Association Blocks are described in detail in [2].

An Association Block (AB) is a generalized Reduction Block (RB). The AB contains the complete association information about a set of input frames. It is more general than the RB since it can be defined and created in a homogeneous way for all kinds of data managed by DFO, both from pipeline-supported and unsupported modes. An AB will generally link to other ABs, following the relationships defined by the calibration cascade.

Association Blocks are created at the data organization step. They are updated at later steps of the QC workflow.

6. Cascaded and virtual calibration products

Usually the creation of calibration or science products requires calibration products from an earlier step in the cascade (cascaded calibration products). At the time of AB creation, such products will generally not yet be processed and available. Their names, however, can be predicted on the basis of the association rules. Thereby, these names are made available for use in follow-up ABs which require those products as input. Since at the time of AB creation they exist as names only, they are called virtual calibration products.

If processed in the proper order, the ABs produce, step by step, all required products, turning virtual into real calibration products. This concept works fine as long as no AB with cascaded calibrations fails. If one does fail, all following ABs from the same basic data set will also fail.

7. Association tools and DFO workflow

The backbone of QC association is the association tool createAB. It provides the association workflow step. All instrument-specific rules are stored in configuration files. Properties of raw files and calibration product files are stored in headers. The pool of raw data is a raw directory. The pool of processed calibration data is the $DFO_CAL_DIR tree.

7.1 AB creation: createAB

The tool to create an AB is createAB. This tool reads an input set of raw files, classifies them and creates ABs.
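
As a rough illustration of the classify-and-group idea, the following Python sketch sorts a pool of raw frames into one group per raw type and match key. It is a toy model under stated assumptions, not how createAB works internally: the `MATCH_KEYS` table, the frame dictionaries and the `create_abs` function are hypothetical, and a real AB carries far more information than shown here.

```python
from collections import defaultdict

# Hypothetical match keys per raw type; in dfos these rules live in
# the instrument-specific configuration files, not in code like this.
MATCH_KEYS = {
    "BIAS": ("binning",),
    "FLAT": ("binning", "filter"),
}


def create_abs(frames):
    """Group frames by (raw type, match key) and return one toy AB per group."""
    groups = defaultdict(list)
    for frame in frames:
        raw_type = frame["type"]  # classification result, assumed already known
        match_key = tuple(frame[k] for k in MATCH_KEYS[raw_type])
        groups[(raw_type, match_key)].append(frame["name"])
    return [
        {"raw_type": raw_type, "match_key": match_key, "raw_files": sorted(groups[(raw_type, match_key)])}
        for raw_type, match_key in sorted(groups)
    ]


# Made-up pool of raw frames for one night.
pool = [
    {"name": "b1.fits", "type": "BIAS", "binning": "1x1", "filter": "none"},
    {"name": "f1.fits", "type": "FLAT", "binning": "1x1", "filter": "R"},
    {"name": "b2.fits", "type": "BIAS", "binning": "1x1", "filter": "none"},
    {"name": "f2.fits", "type": "FLAT", "binning": "1x1", "filter": "B"},
    {"name": "f3.fits", "type": "FLAT", "binning": "1x1", "filter": "R"},
]

abs_list = create_abs(pool)
for ab in abs_list:
    print(ab["raw_type"], ab["match_key"], ab["raw_files"])
```

Each group corresponds to frames that share a raw type and a match key, which is the property the tutorial requires of all raw frames in a basic data set.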
Find extensive information about createAB here.

7.2 AB updating: updateAB

When all reduction jobs have been processed and the products have been certified, the products are sorted into their final data directories. At this stage, an AB is updated. AB updating is provided by the tool updateAB. While preserving most already existing association information, this tool adds the new information available after processing and sorting. The new information is: names of products, names of processing logs, graphical information etc.

We also need a verification mechanism for virtual calibration products. These are the predicted file names for further use within the calibration cascade. After processing, these file names are verified and either flagged as VIRTUAL (if they did not process successfully) or REAL. If no pipeline processing has taken place, no information is added. Find more information about updateAB here.

7.3 Read properties of master calibration files

The tool createAB defines a list of available calibration products by reading the configured N_MCAL_LIST key and applying either the working DATE or a specified date as start date. It then scans all directories within the specified range and creates symbolic links to the found files in $DFO_CAL_DIR/LINKS. The headers of these files are later read by ABbuilder.

7.4 Read association information from ABs

The ABs contain association information which is written into a database for further use by external users. The tool harvestAB is used to extract that information: for each science file, it creates a list of all associated calibrations, both direct (as listed in the parent AB) and indirect (as listed in cascaded ABs).

7.5 List of DFOS tools dealing with ABs
All tools in the middle part of the daily workflow deal in one way or another with ABs. ABs are the central data structure of dfos.

8. OCA

With createAB v2.x, the association engine is an SDD tool called ABbuilder. It has been developed by Stefano Zampieri and the DFS/BET team. This tool uses the OCA framework, which is software developed for the general purpose of Organization, Classification and Association.

9. Configuration files

All instrument-specific information is stored in configuration files. The OCA syntax is described here. All information is collected in three main files: <instr>_classification.h, <instr>_organisation.h, <instr>_association.h. They are supported by some macro files. The set of files is then pre-compiled within createAB, to give a final <instr>.RLS file with all the information necessary for the tool. The configuration files are described here.