Common DFOS tools:
dfos = Data Flow Operations System, the common tool set for DFO |
new: v3.9:
- incremental mode: if one raw type triggers more than one action, failing ABs are created again even if one AB for this raw type was already successful
- multiple definitions of PROD_STEP2 enabled
see also: OCA syntax; createAB wrapped into autoDaily, processPreImg, calChecker
topics: description | fits files vs headers | incremental mode | completeness check | AB re-creation | architecture | assoc logs and ATAB files | output | usage | configuration | cascade | workflow | operations |
This tool creates association information and stores it in Association Blocks (ABs). It forms the central part of the daily workflow. Find its place in the workflow here. Find a general description of the Association Block concepts here.
The tool uses ABbuilder as its central association engine. ABbuilder in turn uses the OCA framework, software developed by SEG for the Organization, Classification and Association of data. Other dfos tools making use of this framework are filterRaw and calChecker.
The tool works on a pool of raw hdr data defined by <DATE> and expected to be found under $DFO_HDR_DIR/$DATE.
It has two fundamental modes:
- CALIB for calibration data,
- SCIENCE for science data.
createAB is executed on a selected DATE. It also has an incremental mode working for an incomplete date, used for the incremental processing scheme implemented with autoDaily. More about operational modes here.
Apart from ABs, createAB also creates a text file AB_list_<MODE>_<DATE> under $DFO_MON_DIR. This is the list of created ABs, sorted by the proper execution sequence. That list is fundamental for the daily workflow steps following createAB.
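As a minimal sketch of how a follow-up step might locate and read that list: the file name pattern and $DFO_MON_DIR come from the text above, while the helper names and the assumption that each line starts with the AB name are hypothetical.

```shell
# Sketch only: the one-AB-per-line layout with the AB name as the first
# whitespace-separated field is an assumption, not taken from the tool.

# Print the path of the AB list for a given mode and date.
ab_list_path() {
    # $1 = mode (CALIB or SCIENCE), $2 = date (YYYY-MM-DD)
    printf '%s/AB_list_%s_%s\n' "$DFO_MON_DIR" "$1" "$2"
}

# Print the AB names in list order, skipping empty lines.
list_abs() {
    awk 'NF > 0 { print $1 }' "$(ab_list_path "$1" "$2")"
}
```

With $DFO_MON_DIR set, `list_abs CALIB 2016-12-30` would print the ABs in the order in which they should be executed.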
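The tool has three TOOL_MODES: INTER (interactive), ERROR (partly interactive, with interaction only for missing calibrations) and AUTO (automatic, no interruption).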
The option -a can be used to override the configured value and enforce running in AUTO mode, which is useful if createAB itself is wrapped in automatic control tools like autoDaily or processPreImg.
Per configuration, the filter tool filterRaw can be called (in mode FILTER) before ABs are created. All filtered files will be hidden before AB creation. So they won't spoil your VCAL list, nor produce any AB. If required, they can be inspected under ${HIDE_DIR}/<date>.
Fits files vs. headers. The entire information needed for the creation of ABs is available in the headers. No need for fits files at that stage.
Incremental mode. This option is designed for the fast data transfer. It is triggered by the option -i and available for mode=CALIB. If set, the tool will first look for already existing CALIB ABs for the specified date. Under certain conditions, it will then move all raw header files listed in these ABs (RAWFILE section) to a temporary directory, and will associate only the remaining headers. The conditions are:
In other words, usually only new headers, and headers from incomplete or failed ABs, will be exposed to createAB; all other ABs will not be created again (and not processed either). If the above configuration keys are set to NO, only new headers will be used for AB creation. Thereby AB creation and processing can be triggered once per hour, minimizing processing delays and avoiding unnecessary multiple AB creation. Using this option in routine operations requires a properly configured autoDaily (ENABLE_INCREM=YES).
One raw type can trigger several actions, i.e. two or more ABs can have the same set of raw files. New with version 3.9: If at least one of these ABs is failing then the raw file headers listed in these ABs are not moved. With the next call of createAB, these ABs are created again. Only if all ABs are successful then headers are moved and ABs are not created again. This takes care of special use cases where at least one AB depends on pipeline products from other reduction steps that might only become available later (because the calibrations are not yet measured on Paranal). As a side effect, it can result in several successful executions of the same AB until all ABs with the same set of raw files are successful.
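The decision described above can be sketched as a small shell function. This is an illustration, not the tool's actual code: the status labels OK, INCOMPLETE and FAILED and the function name are hypothetical, while the keys TRY_AGAIN_INCOMPLETE and TRY_AGAIN_FAILED and their default YES are taken from the configuration.

```shell
# Decide whether the raw headers shared by a group of ABs may be moved away.
# Arguments: one status per AB in the group (hypothetical labels:
# OK, INCOMPLETE, FAILED).
# Returns 0 if the headers can be moved (no re-creation),
# 1 if they must stay so that the ABs are created again.
headers_can_be_moved() {
    for st in "$@"; do
        case "$st" in
            INCOMPLETE)
                if [ "${TRY_AGAIN_INCOMPLETE:-YES}" = "YES" ]; then return 1; fi ;;
            FAILED)
                if [ "${TRY_AGAIN_FAILED:-YES}" = "YES" ]; then return 1; fi ;;
        esac
    done
    return 0
}
```

For example, `headers_can_be_moved OK FAILED` returns 1 with the default configuration, so both ABs sharing these raw files would be created again on the next call.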
No VCALs. With the option -N, the tool uses only certified and renamed master calibrations (MCALs) for association, no virtual calibrations (VCALs). It is available for mode=SCIENCE only. This is a stronger association constraint than the configurable flag SCI_VIRT_MCAL which gives only a warning in case of VCALs being associated. It is available on the command-line or by configuration of the dfoMonitor (CREATEAB_VCAL=YES).
In mode SCIENCE, a completeness check is made between the raw files as extracted from the science ABs, and the science raw files as extracted from data_products. In case of a difference, it raises an alert (email and command-line). Such a difference would mean that the corresponding science ABs could not be harvested. It would be caused by an OCA configuration inconsistency or incompleteness.
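The idea of the completeness check can be illustrated with two plain text lists of raw file names. This is only a sketch: how createAB actually extracts the two lists is not described here, so both input files and the function name are hypothetical.

```shell
# Print raw file names that appear in only one of two lists
# (one name per line in each file). Empty output means the lists agree.
raw_list_mismatch() {
    # $1 = names taken from the science ABs
    # $2 = names taken from data_products
    # Each list is de-duplicated first, so a name seen exactly once in the
    # merged stream is present in only one of the two lists.
    { sort -u "$1"; sort -u "$2"; } | sort | uniq -u
}
```

Any output from `raw_list_mismatch` would correspond to the situation in which the tool raises its alert.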
Re-creation of ABs. The tool offers support for the re-creation of ABs. This might become of interest if
For AB re-creation the tool is called with option -r (recreate, or reprocess). Before calling, the user has to prepare the environment. The exact measures depend on the use case:
In re-creation mode the tool goes through the standard workflow but has certain additional parts with dialogues marked by "[RECREATE]". These are interactive by design. For better distinction, you may want to add the option '-a', thereby forcing the tool to be interactive only in the re-creation parts. You can do the call on the command line, or use the interactive call on the AB monitor ("recreate ABs"), which is more comfortable.
After having created all ABs for the specified date and mode, a window pops up with the complete AB list and you are prompted for the AB selection. This is an editor session, with your favorite editor as configured in config.createAB, in the key RECREATE_EDITOR. Mark your selection by a leading 'R<blank>', e.g.:
R GIRAF.2010-05-05T23:08:41.611_tpl.ab | NFLT | Argus_L881.7 | [selected] |
  GIRAF.2010-05-05T23:16:14.696.ab | STD_ARG | Argus_L881.7 | [not selected] |
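A minimal sketch of how such a marked list could be evaluated: the leading 'R<blank>' marker is taken from the description above, while the function name and the assumption that the AB name is the first field after the marker are hypothetical.

```shell
# Print the names of ABs marked for re-creation in an edited AB list.
# Only lines that start with "R " count as selected; the AB name is
# assumed to be the first field after the marker.
selected_abs() {
    # $1 = path of the edited list
    awk '/^R / { print $2 }' "$1"
}
```

Applied to the two lines of the example, this would print only GIRAF.2010-05-05T23:08:41.611_tpl.ab.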
You do not need to care about hidden dependencies, just select the ABs you know you want to reprocess. Next, the tool analyses dependencies. Dependent ABs are those that are using products from a selected one. The tool displays all selections and dependencies in a second interactive editor session:
R GIRAF.2010-05-05T23:08:41.611_tpl.ab | NFLT | Argus_L881.7 | [selected] |
\_GIRAF.2010-05-05T23:19:21.308.ab | WAVE | Argus_L881.7 | [AB depending on the selected one] |
At that stage you could also remove selections or dependencies, but removing dependencies usually makes no sense.
After the prompt, the tool creates a new AB list called AB_list_<mode>_<date>_recreate. As the last step, createAB then calls createJob (this step is different from normal execution!) because this needs to be done with a special syntax: createJob -m <mode> -d <date> -r <job_ID> where job_ID is <mode>_<date>_recreate, thereby creating the usual job files execAB etc. with suffix _recreate, containing only the selected ABs. This is a mechanism to protect any standard job file which might exist and contain the complete list. These job files are written into the standard JOBS_NIGHT, at which point the tool stops, leaving the execution of JOBS_NIGHT to the user. The jobs will execute as normal and will deliver the usual products which are then available for review together with any other pre-existing products.
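The special createJob syntax can be written down as a small helper that only prints the command. The option syntax and the job_ID pattern follow the description above; the helper name is hypothetical, and nothing is executed.

```shell
# Print the createJob call used at the end of re-creation mode.
# job_ID is built as <mode>_<date>_recreate, as described in the text.
recreate_job_cmd() {
    # $1 = mode, $2 = date (YYYY-MM-DD)
    job_id="${1}_${2}_recreate"
    printf 'createJob -m %s -d %s -r %s\n' "$1" "$2" "$job_id"
}
```

For instance, `recreate_job_cmd CALIB 2010-09-30` prints `createJob -m CALIB -d 2010-09-30 -r CALIB_2010-09-30_recreate`.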
Architecture. The tool is a shell-script wrapper calling the DFS tool ABbuilder. createAB provides all functionality which is specific to DFO, while ABbuilder is the central engine, largely independent of DFO-specific settings. Strictly speaking, ABbuilder is itself a shell-script wrapper around a central Java application:
createAB -> ABbuilder -> Java application
Association logs. createAB displays the association log to the console, and also into separate text files. These come per AB, have the same root name and the extension .alog, and are found under $DFO_AB_DIR. In particular they contain all associated MCALIBs/MASSOCs, their DELTA_T values, the configured DELTA_T values (validity), plus warnings: warnings if the configured threshold value has been violated, warnings if a required mcalib could not be found, and warnings if a SCIENCE AB has virtual calibrations associated (if keys SCI_VIRT_MCAL | SCI_VIRT_MASS configured as YES). These entries are scanned by the AB monitor tool getStatusAB and get displayed.
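As a rough sketch of how these logs could be scanned by hand: the location $DFO_AB_DIR and the extension .alog are taken from the text, but the assumption that a warning shows up as the literal string *NOK in the log, and the function name, are hypothetical. getStatusAB may evaluate the logs differently.

```shell
# List association logs that contain a *NOK marker.
# Assumption: the marker appears literally as "*NOK" in the .alog text.
alogs_with_nok() {
    # $1 = directory holding the .alog files (normally $DFO_AB_DIR)
    # -F treats the pattern as a fixed string, -l prints matching file names.
    grep -lF '*NOK' "$1"/*.alog 2>/dev/null
}
```

A call like `alogs_with_nok "$DFO_AB_DIR"` would then print one file name per flagged AB, and nothing if no log carries the marker.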
ATAB files (or tab files). These are text files with one line in tabular form (hence their name). Their name is <AB_ROOT>.tab. They contain the complete information about a particular AB as displayed on the AB monitor. Each file has 33 entries, e.g. processing status, display color, execution time, HC flag, name of processing log, score results. Each of these ATAB files is generated by createAB, and then updated throughout the workflow by the various tools (processAB, processQC, certifyProducts, harvestAB). At any stage in the workflow, the tool getStatusAB can read the tabular information and transform it directly into HTML code, without further queries for information. This is a very fast way of displaying the current AB monitor, and very efficient since information is only updated when needed, not every time the tool getStatusAB is called.
ATAB files are created and maintained in $DFO_AB_DIR along with the ABs, and then move to their final destination $DFO_LOG_DIR/$DATE. They are not exported to qcweb since their only value is technical (speed up the AB monitor). They do not contain any information beyond the ABs and format information for the AB monitor.
The tool creates the following output:
Type createAB -h | -H | -v for on-line help about createAB / ABbuilder, and version.
Type
createAB -m CALIB -d 2016-12-30
to create ABs for mode CALIB and date 2016-12-30;
createAB -m SCIENCE -d 2016-10-30 -c test_config.createAB
to create ABs for mode SCIENCE and date 2016-10-30, with a non-standard config file test_config.createAB (mcalib pool is defined by $N_MCAL_LIST and the latest date in $DFO_CAL_DIR, which usually is the one from -d or a few days later);
createAB -m SCIENCE -d 2006-10-30 -D 2006-11-02
to create SCIENCE ABs for date 2006-10-30 with a pool of mcalib files starting at date 2006-11-02 and going backwards by $N_MCAL_LIST days as configured;
createAB -m CALIB -d 2010-09-30 -i
to create CALIB ABs incrementally, i.e. only if they did not exist before;
createAB -m CALIB -d 2010-09-30 -r [ -a ]
for re-creation of selected ABs.
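The constraints mentioned on this page (two modes, -i for CALIB only, -N for SCIENCE only) can be checked before calling the tool. The following is a sketch under those stated constraints only; the function name is hypothetical and createAB itself may perform further checks.

```shell
# Check a planned createAB call against the constraints described on this page.
# Returns 0 if the combination looks valid, 1 otherwise.
check_createab_call() {
    # $1 = mode, $2 = date, remaining arguments = extra options (e.g. -i, -N)
    [ "$#" -ge 2 ] || return 1
    mode=$1; day=$2; shift 2
    case "$mode" in
        CALIB|SCIENCE) ;;
        *) return 1 ;;
    esac
    # Date must look like YYYY-MM-DD (format check only, no calendar check).
    case "$day" in
        [0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]) ;;
        *) return 1 ;;
    esac
    for opt in "$@"; do
        case "$opt" in
            -i) [ "$mode" = "CALIB" ] || return 1 ;;    # incremental: CALIB only
            -N) [ "$mode" = "SCIENCE" ] || return 1 ;;  # no VCALs: SCIENCE only
        esac
    done
    return 0
}
```

A guarded call could then look like `check_createab_call CALIB 2010-09-30 -i && createAB -m CALIB -d 2010-09-30 -i`.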
The tool writes the status values cal_AB or sci_AB into DFO_STATUS. It also writes a timestamp into each created AB, and creates association log files under $DFO_AB_DIR which are scanned by getStatusAB and distributed together with the ABs.
The tool has a configuration which is somewhat more complex than for most other DFOS tools. All relevant configuration is kept under $DFO_CONFIG_DIR/OCA/:
Find here more details about the OCA syntax.
The user may want to specify a non-standard config file by using the -c option (the standard one being config.createAB), e.g. to handle pre-imaging association in addition to standard DFO association. That file must also reside under $DFO_CONFIG_DIR/OCA.
The DFOS configuration file config.createAB has the following structure:
KEY | Description | Example | Comments |
1. Tool and general parameters | |||
CONF_VERSION | Version for the set of configuration files | config.createAB_v2.0 | |
TOOL_MODE | Execute mode (INTER or ERROR or AUTO) | ERROR | AUTO: automatic mode, no interruption; INTER: interactive; ERROR: partly interactive (interaction only for missing calibrations); Note: can be overridden by option -a[utomatic] at runtime |
FILTER_RAW | turn on or off filterRaw | YES | YES|NO (optional, default: NO). If YES, filterRaw is called before AB creation, and matching files are removed |
PGI_PREPROC | use optional plugin within procedure launchAB | <name> or NONE | optional, default NONE |
PGI_POSTPROC | use optional plugin just before section 4 | <name> or NONE | optional, default NONE; DATE and MODE exported |
PGI_FINAL | use optional plugin just before the tool finishes | <name> or NONE | optional, default NONE; DATE and MODE exported |
ACCEPT_060 | accept science files with run_id 60./060./0060. | YES | NO (default) | if you exceptionally want to create ABs for these SCIENCE data, configure as YES |
XTERM_GEOM | size and position of xterm with RECREATE_EDITOR call | 100x25+1000+1000 | default: 100x25+1000+1000 [bottom right] |
1.2 General parameters | |||
GEN_CALDIR | $CAL_DIR directory for general (static) calib files | $DFO_CAL_DIR/gen | DFO convention |
N_MCAL_LIST | number of nights to be scanned for mcalib_list | 5 | to be fine-tuned for your instrument |
N_VCAL_LIST | number of nights to be contained in vcalib_list | 5 | to be fine-tuned for your instrument |
CAL_N_MCAL_LIST | like N_MCAL_LIST, but for CALIB mode only (for SCIENCE, always N_MCAL_LIST applies) | 5 | optional; default: N_MCAL_LIST |
CAL_N_VCAL_LIST | like N_VCAL_LIST, but for CALIB mode only (for SCIENCE, always N_VCAL_LIST applies) | 5 | optional; default: N_VCAL_LIST |
DRS_TYPE | Type of DRS | CON | CPL | INT | used to control recipe execution (CON: CONDOR; INT: internal parallelization with likwid-pin) |
SCI_VIRT_MCAL | give a warning in the alog (*NOK) if a SCIENCE AB has VIRTUAL mcalibs associated (in the MCALIB section) | YES | NO | YES makes sense if SCIENCE ABs are processed (since this AB is doomed to fail). Default is YES, key is optional. Key is evaluated for mode SCIENCE only. |
SCI_VIRT_MASS | same as SCI_VIRT_MCAL, for MASSOC section | YES | NO | same as SCI_VIRT_MCAL, for MASSOC data |
SUPPRESS_VIRT | list of PRO.CATGs for which no VIRTUAL alert is given for the science ABs | SKY_LINES | optional key (multiple lines supported) |
PROD_STEP2 | name of ACTION for step 2 ("child") ABs | e.g. ACTION_FF_EXTSPECTRA | optional key used in case you want to launch a second action (create a second kind of ABs) which has no triggering RAW_TYPE. Check here for details. Multiple lines are supported since v3.9. |
TRY_AGAIN_INCOMPLETE | for incremental mode: create ABs again if previous version was incomplete | YES | NO | default: YES |
TRY_AGAIN_FAILED | for incremental mode: create ABs again if previous version failed | YES | NO | default: YES |
2. Additional configuration | |||
PACK_ADD | Non-FITS products to be packed into ANCillary files: PS = ps QC files; GIF = gif QC files; PNG = png files; JPG = jpg files | VALUE = PS: type of product, see above; EXTENSION = ps.gz: identification is based on product root name plus EXTENSION; DIRECTORY = $DFO_PLT_DIR: directory (DFO convention) of product | further values could be supported upon suggestion |
NOCHECK | YES | NOCHECK | default: YES | Used by updateAB v1.3.1 and higher; NOCHECK turns off checking for virtual/real status of calibrations in the ABs. This is useful for some special applications but not in usual dfos workflows. |
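To illustrate how keys from this table might be read in a script, here is a minimal sketch. It assumes that config.createAB consists of whitespace-separated "KEY VALUE" lines and that lines starting with '#' are comments; this layout is an assumption and not confirmed by the table above. The function names are hypothetical.

```shell
# Print all values configured for a key, one per line.
# This supports keys that may occur on several lines (e.g. PROD_STEP2).
conf_values() {
    # $1 = config file, $2 = key
    # Commented lines never match because their first field starts with '#'.
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Print the first value of a key, or a default if the key is absent.
conf_value_or_default() {
    # $1 = config file, $2 = key, $3 = default value
    v=$(conf_values "$1" "$2" | head -n 1)
    printf '%s\n' "${v:-$3}"
}
```

For an optional key such as FILTER_RAW, `conf_value_or_default "$DFO_CONFIG_DIR/OCA/config.createAB" FILTER_RAW NO` would fall back to the documented default NO when the key is not configured.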
The configuration of the cascade follows the matrix scheme which has from left to right all raw types in their logical sequence, and from top to bottom the process flow from raw to product, involving the grouping rule, the recipe, the required input mcalibs and the predicted products. The classification defines the raw types and is configured in <instr>_classification.h. The grouping rules are configured in <instr>_organisation.h. The association rules are configured in <instr>_association.h.
There are currently the following cascade types supported by createAB:
Unsupported types include cascades with raw data from different nights or different OBs.
The classical cascade is configured by one entry block per raw_type, in each of the OCA config files. The sequence of raw_type definition is evaluated by the tool and should reflect the processing sequence (left to right ordering).
An example for that type could be jittered science observations: create and process an AB per single raw file first, e.g. in order to derive QC information. Then create and process an AB per TPL set, in order to create the de-jittered product. Both actions are triggered by the same raw_type.
That cascade is configured in the same way as the simple cascade. In addition, the second action for a given raw_type is entered by additional blocks in the OCA config files.
BIAS | FLAT | STD (step1) | (step2) |
mbias | | | |
| mflat | | |
| | single_zp | |
| | | night_zp (made from single_zp1, single_zp2) |
In step1, all raw input STD files create corresponding ABs (step1 or parent ABs). The products of those (here called 'single_zp', typically all per setup from the night) need to be combined in another (step2, child) AB which has no input raw files and produces a single night_zp product.
That case cannot be handled in the classical, raw-type driven way. It is configured in config.createAB using the PROD_STEP2 key.
Note that the name of the step2 ACTION must be ACTION_<procatg> where procatg is the PRO.CATG triggering the second step. In config.createAB, configure PROD_STEP2 as ACTION_NIGHT_ZP.
If PROD_STEP2 is set, the tool calls ABbuilder twice:
The HDR files are deleted afterwards. This mechanism works both for hdr and fits input files. Since version 3.9, multiple definitions of PROD_STEP2 are supported.
(This description applies to the DFOS and PHOENIX installations. For OPSHUB installations, the tool is only called internally and in a simplified way.)
1. Filter input data pool
- if FILTER_RAW=YES: call filterRaw -m FILTER -H, to remove the found files from further processing. Corresponding headers are also moved, but moved back to their original folder at the end.
2. Pre-compile all OCA configuration files into a final <instr>.RLS file (using gcc)
3. Prepare links under $DFO_CAL_DIR/MCAL to all mcalibs as defined by $N_MCAL_LIST and the latest date in $DFO_CAL_DIR. If option -D <SCANDATE> has been set, the set is defined by this start date instead. Supported are both fits and hdr files in $DFO_CAL_DIR.
4. Only if PROD_STEP2 is set:
4.1 call ABbuilder to create virtual product headers; call PGI_PREPROC if configured
4.2 copy the virtual product headers as .HDR files into RAW_DIR; no ABs are created at that step
5. Only with option -i (incremental) and mode=CALIB:
5.1 check for pre-existing ABs
5.2 move their headers to temporary space, unless INCOMPLETE (if TRY_AGAIN_INCOMPLETE=YES) or FAILED (if TRY_AGAIN_FAILED=YES)
5.3 have only new headers in input data pool, plus the ones from step 5.2
6. Call ABbuilder:
6.1 Classify input data pool
- read fits keys as defined in <instr>_classification.h to classify each raw file into RAW_TYPE | DO_CLASS. (Note: PACK_DIR is obsolete and can be removed.)
6.2 Organize input data into groups
- get raw match keys from <instr>_organisation.h
- get grouping rules (e.g. SINGLE, TPL_A, TPL_D etc.)
6.3 Find association per group, reading rules in <instr>_association.h
- evaluate MCALIB file match keys to find all calibration products required for processing
- evaluate MASSOC match keys to find additional mcalib files (if any) useful for packing
- evaluate RASSOC match keys to find associated raw files (if any) useful for packing
- find recipe, recipe parameter
- find WAITFORs for CONDOR (from associated virtual calibrations)
- set the AB status to 'created'
6.4 Create the ABs in a work directory ($DFO_AB_DIR/TMP)
- call PGI_PREPROC if configured
6.5 Add virtual calibrations as predicted from the ABs to $DFO_CAL_DIR/VCAL
7A. (normal operations, including incremental mode): Verify the ABs (ordered in a cascade as configured in <instr>_classification.h)
- scan for COMPLETE/INCOMPLETE flag
- create and display the association log (includes all found MCALIB/MASSOC with their DELTA_T values, and all missing ones)
- create the .tab files for speedy AB monitor page
- edit the RAW_MATCHKEY section to suppress UNDEFINED entries (typical of mutually exclusive keys e.g. in VIMOS or FORS2)
- for SCIENCE only: check if MCALIBs and MASSOCs have VIRTUAL calibrations (flag as *NOK)
- if configured (TOOL_MODE=INTER/ERROR), ask the user what to do in case of incomplete ABs
- construct the list of all created ABs
- move ABs and AB logs to $DFO_AB_DIR
7B. (RECREATE=YES): same as 7A, plus
- offer the created ABs and let the user decide which ones to select
- create the list of ABs on the selected ones and the ones depending on them
- move only those to $DFO_AB_DIR, the other ones are deleted
8A. (normal operations) After AB creation:
- if step 5 was applied (incremental mode), move all headers back to $DFO_HDR_DIR/$DATE
- apply PGI_POSTPROC if configured
- create directories in $DFS_PRODUCT
- manage the set of virtual calibrations (headers in $DFO_CAL_DIR/VCAL): remove outdated ones (defined by $N_VCAL_LIST)
- move back hidden headers to $DFO_HDR_DIR/$DATE
8B. (RECREATE=YES): same as 8A, plus
- call createJob and create the job files for the selected set of ABs (the job files come with suffix _recreate)
9. call PGI_FINAL if configured
Last update: April 26, 2021 by rhanusch