dfos = Data Flow Operations System, the common tool set for DFO
This documentation refers to calSelector v3.0. It documents the design, architecture and usage of calSelector from the QC perspective.
Tools related to calSelector: see the general overview of calSelector here.
Topics: basics | rules and versions | testing | calSelector and dfos | syntax | calibration maps | virtual products | certified flag | validity | breakpoints
The design document is available here.
calSelector is the DFS tool linked to the archive interfaces to deliver associated calibrations for user-selected science files. The tool is based on OCA rules which are provided and managed by the QC group. To support this task, and to continuously monitor the results, the calSelector tool is also installed on the QC machines. There are several tools in the DFOS tool suite related to calSelector. This page documents the high-level functionality of calSelector.
The tool is called for a single dp_id or a set of dp_ids and looks up associations in a database. The associations are defined by a set of OCA rules, which come in Raw2Master (R2M) syntax, similar to the DFOS_OPS rules. The tool returns the result set in the form of an XML file, which carries the flag 'complete' if all associations can be satisfied.
calSelector v1 provided the Raw2Raw functionality. calSelector v2 added the Raw2Master option, and the current calSelector v3 adds breakpoints.
By default, users receive only master calibrations (mcalibs), plus those raw files configured as RASSOC, and only those mcalibs needed to execute the science recipe (the last step in the cascade).
If one or more required mcalibs cannot be found, the tool switches to Raw2Raw (R2R) mode. It still applies the R2M rules, now using either the provenance information of those master calibrations it can find, or the virtual products mechanism (see below). In R2R mode, all raw calibrations are delivered, including the "calibrations for calibrations", thus supporting the entire calibration and reduction cascade. In this mode, the tool returns only certified raw files (plus static calibrations). The certified flag is related to the dpIngest tool: when QC ingests a master calibration, its provenance (the list of parent raw files) is checked, and these raw files get the certified flag. See below for more information.
QC can also configure an OCA rule for calSelector as "R2R_only". This database configuration forces the tool into R2R mode, whereas in the case above the tool decides by itself that R2M is not possible and therefore goes R2R. This is typically done when the public pipeline is incompatible with the archived master calibrations (because these were processed with an earlier pipeline version), so that delivering master calibrations would not make sense, while the certified flag still provides useful added value.
Static calibrations ("gencalibs") are always delivered as such. Associations marked as RASSOC always deliver raw files.
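The delivery logic described above can be summarized as a small decision function. This is a minimal Python sketch under stated assumptions, not calSelector code: the `Association` class, its `kind` labels ("mcalib", "static", "rassoc") and the `found` attribute are hypothetical stand-ins for what the tool actually reads from the database.

```python
from dataclasses import dataclass


@dataclass
class Association:
    """One required association of an OCA rule (hypothetical model)."""
    raw_type: str
    kind: str     # hypothetical labels: "mcalib", "static" or "rassoc"
    found: bool   # True if a matching file exists in the archive


def choose_mode(associations, r2r_only=False):
    """Return "R2R" or "R2M".

    R2R is forced if the rule is configured as R2R_only; otherwise the tool
    goes R2R by itself as soon as one required mcalib cannot be found.
    """
    if r2r_only:
        return "R2R"
    missing_mcalib = any(a.kind == "mcalib" and not a.found for a in associations)
    return "R2R" if missing_mcalib else "R2M"


def delivered_as(assoc, mode):
    """What is delivered for one association in the given mode."""
    if assoc.kind == "static":
        return "static calibration"    # gencalibs are always delivered as such
    if assoc.kind == "rassoc":
        return "raw files"             # RASSOC always delivers raw files
    return "master calibration" if mode == "R2M" else "certified raw files"
```

For example, a rule needing a BIAS mcalib (found) and a FLAT mcalib (not found) ends up in R2R mode, while a static bad-pixel map would still be delivered as a static calibration.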
Like the DFOS_OPS rules, the calSelector OCA rules come in R2M syntax. In fact, the DFOS_OPS and CALSELECTOR rules can be merged.
This is the schema for creating CALSELECTOR OCA rules:
With these modifications, the rules become compatible with both the DFOS_OPS and the CALSELECTOR environment. The rules are then uploaded to web directories and to the database, and displayed graphically for a better overview. These tasks are supported by the dfos tools calselManager and createCalibMap.
DFOS_OPS | | CALSELECTOR
CURRENT version | unify | CURRENT version
- | from CURRENT | HISTO1 version
- | from HISTO1 | HISTO2 version
For historical periods whose OCA rules differ from the CURRENT one, the corresponding CALSELECTOR version is developed from the CURRENT one.
For future periods that require a modification of the DFOS_OPS rules, the CURRENT CALSELECTOR rule becomes a historical one, and the new DFOS_OPS rule becomes the CURRENT CALSELECTOR rule.
These steps are supported by calselManager.
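The versioning procedure can be pictured as an ordered list of rule versions, one per period, in which the last entry plays the role of the CURRENT rule. The sketch below is hypothetical: the class, its methods and the use of an MJD start value per period are assumptions, not calselManager's data model. It only illustrates that historical versions are added for earlier periods, and that a DFOS_OPS change for a future period turns the previous CURRENT rule into a historical one.

```python
import bisect
from dataclasses import dataclass


@dataclass
class RuleVersion:
    valid_from: float   # hypothetical: start of the validity period (MJD)
    text: str           # placeholder for the OCA rule content


class CalselRuleHistory:
    """Hypothetical model of CALSELECTOR rule versions ordered by period.

    The last entry is the CURRENT rule; all earlier ones are historical.
    """

    def __init__(self, current):
        self.versions = [current]

    def current(self):
        return self.versions[-1]

    def add_historical(self, version):
        """Add a version for an earlier period (HISTO1 from CURRENT, ...)."""
        if version.valid_from >= self.versions[0].valid_from:
            raise ValueError("historical versions must precede existing ones")
        self.versions.insert(0, version)

    def promote(self, new_rule):
        """New DFOS_OPS rule for a future period becomes the CURRENT rule;
        the previous CURRENT rule becomes historical by no longer being last."""
        if new_rule.valid_from <= self.current().valid_from:
            raise ValueError("the new CURRENT rule must start later")
        self.versions.append(new_rule)

    def for_mjd(self, mjd):
        """Return the rule version whose period contains the given MJD."""
        starts = [v.valid_from for v in self.versions]
        i = bisect.bisect_right(starts, mjd) - 1
        if i < 0:
            raise LookupError("no rule version covers this MJD")
        return self.versions[i]
```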
The alignment of DFOS_OPS and CALSELECTOR OCA rules is not enforced by any tool; it is a conceptual principle. The QC scientist has to make sure that this alignment is achieved and maintained. In particular, they need to make sure that any change on one side is also reflected on the other side.
With a new OCA rule set, testing the associations is an important step towards the proper formulation of the rules. Remember that the rules have a complex syntax and cannot be interpreted intuitively. The dfos tool verifyAB is used to expose dp_ids to calSelector OCA rules which are either already ingested, or local and under development. The user can define reference datasets for regression tests, typical test cases, or filelists with many dp_ids for performance tests. The tool is also used to routinely check all new science datasets in the DFOS workflow for proper associations (in particular completeness). These continuous tests logically replace the historical harvesting. For its own purposes, QC not only creates but also stores all new science ABs for future reference. Currently the only use case for these ABs is IDP production.
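A regression test against reference datasets essentially compares stored association results with new ones. The following sketch assumes a hypothetical representation (per dp_id, the 'complete' flag and the set of associated file names); it does not reflect verifyAB's actual interface or output format.

```python
def compare_associations(reference, current):
    """Compare association results per dp_id.

    Both arguments map dp_id -> (complete: bool, files: set of str).
    Returns dp_id -> list of differences; unchanged dp_ids are omitted.
    """
    report = {}
    for dp_id, (ref_complete, ref_files) in reference.items():
        if dp_id not in current:
            report[dp_id] = ["no result"]
            continue
        cur_complete, cur_files = current[dp_id]
        diffs = []
        if ref_complete != cur_complete:
            diffs.append(f"complete: {ref_complete} -> {cur_complete}")
        # Files associated before but not any more, and vice versa.
        diffs += [f"lost {f}" for f in sorted(ref_files - cur_files)]
        diffs += [f"new {f}" for f in sorted(cur_files - ref_files)]
        if diffs:
            report[dp_id] = diffs
    return report
```

An empty report means the modified rules reproduce the reference associations for all test datasets.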
The following dfos tools support the QC scientists in their tasks related to calSelector:
calSelector uses standard and well-known OCA syntax. The only new feature is the 'between' statement, which is also understood by ABbuilder. It is important to make sure that all science data types have a product defined, since this is needed for the creation of vproducts. These product definitions were originally in place, but in many cases they were dropped when QC terminated science processing, and they now need to be re-introduced.
The dfos tool createCalibMap supports the creation of calib maps. It displays only those raw types that are relevant for the science reduction. See the entire set of calibration maps here.
A core feature of calSelector is the concept of virtual products ("vproducts"). They represent datasets, as defined in the grouping rules in OCA (organization). Datasets define the smallest reasonable unit for data processing (while the traditional single file is the smallest unit for archiving). A dataset can be a single file (e.g. UVES), all files from the template (e.g. an IMAGE stack), or a subset of those (e.g. all XSHOOTER OBJECT+SKY frames from the same template and the same arm). Vproducts can exist for both calibrations and science data.
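The XSHOOTER example above amounts to a grouping step: frames of the selected types that share template and arm form one dataset, i.e. one vproduct. This is a minimal sketch only; the frame dictionaries and their keys ('dp_id', 'template', 'arm', 'type') are hypothetical and are neither FITS keywords nor calSelector internals.

```python
from collections import defaultdict


def group_datasets(frames, types=("OBJECT", "SKY")):
    """Group frames of the given types by (template, arm).

    Each resulting group corresponds to one dataset; frames of other
    types (e.g. FLAT) are ignored by this grouping rule.
    """
    groups = defaultdict(list)
    for f in frames:
        if f["type"] in types:
            groups[(f["template"], f["arm"])].append(f["dp_id"])
    return dict(groups)
```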
Vproducts are used by calSelector for the following purposes:
Vproducts are created in several ways:
From an e-mail by Ignacio Vera, 2015-03-27: "We (DBCM) have set up the following (cronjob) strategy for the generation of Master Products: 1) Every two hours the last 24 hours of calibrations are processed. 2) Once a day the last 10 days of calibrations are processed."
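The quoted cron strategy means that each calibration falls into several overlapping look-back windows. The sketch below counts the runs whose window contains a given observation time. The names are hypothetical, and the run times are assumed to be evenly spaced from an arbitrary first run; the real cron timing on the archive side is not documented here.

```python
from datetime import timedelta

# (repeat period, look-back window) as quoted in the e-mail above.
BIHOURLY = (timedelta(hours=2), timedelta(hours=24))
DAILY = (timedelta(days=1), timedelta(days=10))


def runs_covering(obs_time, first_run, schedule, horizon):
    """Return the run times first_run, first_run + period, ... (up to
    first_run + horizon) whose look-back window contains obs_time,
    i.e. run - lookback <= obs_time <= run."""
    period, lookback = schedule
    runs = []
    run = first_run
    while run <= first_run + horizon:
        if run - lookback <= obs_time <= run:
            runs.append(run)
        run += period
    return runs
```

Under these assumptions, a calibration taken at 01:00 is processed by twelve bi-hourly runs (02:00 to 24:00) and by ten daily runs, so late-arriving calibrations still get a chance to be picked up.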
The tool that creates vproducts is called qcproducthandler. It is a component of the calSelector jar package. It is not used by QC but runs in the background on the archive side. For information, here is its workflow:
An automatic process in the archive marks all raw files as 'certified' if they have been used to generate an mcalib that got archived. Raw files without the flag are of unknown (not: bad!) quality. Such files occur in the very recent data flow, or in old data with no pipeline support, poor pipeline support (e.g. a recipe is known to fail), or no QC support (e.g. SM data only, or standard setups only).
calSelector gives preference to certified raw data when it comes to raw data delivery (see above).
The certified flag is set by dpIngest upon ingestion of a new mcalib, and unset if an mcalib is hidden (dpDelete).
The proper mechanism to flag a raw file as having bad quality is to hide it. Then it will always be ignored by calSelector.
The certification flag reflects an important value added by the QC group.
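The life cycle of the certified flag can be modelled in a few lines: the flag follows from the provenance of visible mcalibs, hidden raw files are ignored, and certified files are preferred. This is a hypothetical model for illustration, not the dpIngest/dpDelete implementation. One assumption is loudly flagged: a raw file shared by two mcalibs is assumed to stay certified as long as one of them remains visible, which the description above does not specify.

```python
class CertificationModel:
    """Hypothetical model of the certified flag (not archive code)."""

    def __init__(self):
        self.provenance = {}         # mcalib id -> set of parent raw dp_ids
        self.hidden_mcalibs = set()
        self.hidden_raw = set()

    def ingest_mcalib(self, mcalib, parents):   # corresponds to dpIngest
        self.provenance[mcalib] = set(parents)
        self.hidden_mcalibs.discard(mcalib)

    def hide_mcalib(self, mcalib):              # corresponds to dpDelete
        self.hidden_mcalibs.add(mcalib)

    def hide_raw(self, dp_id):                  # flags a raw file as bad
        self.hidden_raw.add(dp_id)

    def is_certified(self, dp_id):
        # Assumption: certified if any visible mcalib lists it as a parent.
        return any(dp_id in parents
                   for m, parents in self.provenance.items()
                   if m not in self.hidden_mcalibs)

    def rank_candidates(self, dp_ids):
        """Drop hidden raw files, then list certified files first
        (sorted() is stable, so the original order is kept otherwise)."""
        visible = [d for d in dp_ids if d not in self.hidden_raw]
        return sorted(visible, key=lambda d: not self.is_certified(d))
```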
The OCA rules as maintained by QC do not contain validity information in OCA syntax. For DFOS_OPS and for calChecker, validity is supported in a non-OCA syntax (the commented DELTAT_RULE section) and is evaluated by createAB. This validity concept has three values: OK, NOK and MISS. OK means that a matching calibration within its validity range has been found. NOK means that a matching calibration exists in the data pool but is outdated. MISS means that no calibration has been found.
calSelector also supports such a scheme. It is implemented with two between-like statements:
Usually both statements are used, and calSelector then interprets the matches in the following way:
In addition there are the classical time match rules for static mcalibs (the PREVIOUS rule in most cases).
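The three-valued validity concept (OK, NOK, MISS) can be illustrated with a small classification function. This is a sketch under simplifying assumptions: a single symmetric maximum time difference in days, and all candidates already matching the rule. It is neither the DELTAT_RULE evaluation in createAB nor the interpretation of the two between-like statements in calSelector.

```python
def classify_validity(science_mjd, calib_mjds, max_delta_days):
    """Return "OK", "NOK" or "MISS" for one association.

    calib_mjds: MJD-OBS of matching calibrations found in the data pool.
    max_delta_days: hypothetical validity range, applied symmetrically.
    """
    if not calib_mjds:
        return "MISS"   # no calibration found at all
    closest = min(abs(science_mjd - m) for m in calib_mjds)
    return "OK" if closest <= max_delta_days else "NOK"   # NOK: outdated
```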
A special case of validity is covered by the concept of breakpoints: associations are not allowed to cross certain well-defined MJD-OBS values. QC stores the breakpoints in the database table calsel_breakpoints (here) and maintains them with the dfos tool writeBreakpoint. Breakpoints are implemented with calSelector v3.
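The breakpoint condition can be sketched as an interval check on a sorted list of breakpoint MJDs. The function names are hypothetical, and the edge case is an assumption: a value exactly at a breakpoint is treated as belonging to the period after it (a breakpoint b separates lo and hi if lo < b <= hi).

```python
import bisect


def crosses_breakpoint(sci_mjd, cal_mjd, breakpoints):
    """True if a breakpoint lies between the two MJD-OBS values.

    breakpoints must be sorted in ascending order.
    """
    lo, hi = sorted((sci_mjd, cal_mjd))
    i = bisect.bisect_right(breakpoints, lo)   # first breakpoint > lo
    return i < len(breakpoints) and breakpoints[i] <= hi


def allowed_calibrations(sci_mjd, cal_mjds, breakpoints):
    """Keep only calibrations on the same side of all breakpoints."""
    return [m for m in cal_mjds
            if not crosses_breakpoint(sci_mjd, m, breakpoints)]
```

With breakpoints at MJD 57100 and 57200, a science frame at 57150 could only be associated with calibrations taken in [57100, 57200).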