Common DFOS tools:
Documentation

dfos = Data Flow Operations System, the common tool set for DFO

v2.0:
- enabled for IDP ingestion within the phoenix project
v2.2:
- checks for DFOS or PHOENIX installation
- call_IT
v3.0:
- enabled for ingestion of master calibs produced with phoenix 2.0

Make sure that during IDP ingestion, no other major processes run on muc08/muc09: at least for XSHOOTER IDPs, the ingestion of some products might otherwise fail because of resource bottlenecks.
[ used databases ] phase3 for IDP ingestion; qc_products and qc1 for master calibrations
[ used dfos tools ] for IDPs: idpConvert, idp2sdp, call_IT, IngestionTool; qc1Ingest; for master calibrations: dpIngest
[ output used by ] log file list_ingest in $DFO_LST_DIR
[ upload/download ] ingestion into NGAS
topics: description | IDP ingestion: process, conversion, ingestionTool, errors, configuration | MCALIB ingestion under PHOENIX: process, deletion, ingestion, configuration | how to call | statistics & logs | operational aspects

Note:
- In this documentation, IDPs means Internal Data Products and stands for the science data products created by the phoenix process.
- MCALIBs is short for master calibrations, as created by the phoenix process.
- If nothing is mentioned in particular, the documentation applies to all kinds of products.

In some parts this documentation splits into sections applicable to the ingestion of IDPs, and others to MCALIBs. The IDP part is then shaded light blue (like this cell),

while the MCALIB part is shaded light yellow (like here). You can then ignore the respective other part.

PHOENIX

ingestProducts for phoenix

[ top ] Description

The tool ingestProducts is enabled for standard DFOS and PHOENIX environments. The environment is recognized from the key THIS_IS_PHOENIX in .dfosrc which is set to YES for PHOENIX environments, and to NO for DFOS environments.
Furthermore, if THIS_IS_PHOENIX=YES, the tool can recognize the MCALIB mode (ingestion of phoenix-generated master calibrations) via the key MCAL_CONFIG in config.phoenix.
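The decision logic above can be sketched in a few lines of shell. This is an illustration of the documented behaviour, not the tool's actual code; the function name detect_mode is invented here.

```shell
# Sketch of the environment/mode recognition described above (illustrative,
# not the tool's code). THIS_IS_PHOENIX comes from .dfosrc, MCAL_CONFIG
# from config.phoenix.
detect_mode () {
    this_is_phoenix="$1"    # YES or NO (key THIS_IS_PHOENIX in .dfosrc)
    mcal_config="$2"        # value of MCAL_CONFIG in config.phoenix, or empty
    if [ "$this_is_phoenix" != "YES" ]; then
        echo "DFOS"                 # standard DFOS environment
    elif [ -n "$mcal_config" ]; then
        echo "PHOENIX_MCALIB"       # PHOENIX with MCAL_CONFIG set: MCALIB mode
    else
        echo "PHOENIX_IDP"          # PHOENIX without MCAL_CONFIG: IDP mode
    fi
}
```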

Supported modes for ingestProducts

  environment  mode             enabled?  using ...      storage
  DFOS_OPS     CALIB            YES       dpIngest       NGAS
  DFOS_OPS     SCIENCE          NO        -              -
  PHOENIX      CALIB (MCALIBs)  YES**     dpIngest       NGAS
  PHOENIX      SCIENCE (IDPs)   YES*      ingestionTool  phase3

  *  enabled via THIS_IS_PHOENIX=YES in .dfosrc
  ** new with v3.0; enabled via THIS_IS_PHOENIX=YES in .dfosrc and MCAL_CONFIG set in config.phoenix

In the following the details for the PHOENIX environment are described. The behaviour in the DFOS_OPS environment is documented here.


[ top ] IDP ingestion: general process

IDP ingestion means to ingest the science data products and their ancillary files into the phase3 database (for the header information) and into NGAS (for the files).

Before starting to ingest an IDP stream, the phase3 environment needs to be defined in config.ingestProducts:

Special configuration for IDP ingestion

  phase 3 parameter            for UVES      for XSHOOTER      for GIRAFFE
  Phase 3 programme            UVES          XSHOOTER          GIRAFFE
  Phase 3 data collection      UVES_ECHELLE  XSHOOTER_ECHELLE  GIRAFFE_MEDUSA
  Phase 3 data release/stream  1             1                 1

These configuration keys are defined together with ASG.

The tool ingestProducts is effectively a wrapper that first calls a preparation tool (converter) and then the phase3 ingestion tool. The converter is either a DFS-provided tool (for UVES: it adds header information for phase3 compliance and modifies the FITS file structure) or a shell script customized to the instrument (adding header information for phase3 compliance). Both are described in the following.

Current installation of phase 3 tools on muc08

  function          instrument  tool name
  UVES converter    UVES        /opt/dfs/bin/idp2sdp
  pseudo-converter  XSHOOTER    $DFO_BIN_DIR/idpConvert_xs (for header tasks only)
  pseudo-converter  GIRAFFE     $DFO_BIN_DIR/idpConvert_gi (for header tasks only)
  ingestion tool    any         /opt/dfs/share/IngestionTool.jar

[ top ] Conversion tool: idpConvert and idp2sdp

UVES. For the UVES IDPs, the wrapper idpConvert calls the special DFI-provided converter tool idp2sdp, which transforms the pipeline-delivered output files into the SDP (science data products) standard format: a binary table for spectroscopic data. The converter is instrument-specific. In the current installation it is called

idp2sdp

and, for convenience, it is wrapped in the helper tool idpConvert.

Other IDPs. For the other IDPs, all structural conversion is done by the pipelines, and only header keys need to be added. This is done by customized header-conversion tools like idpConvert_xs and idpConvert_gi, which are created and maintained by QC. Find their description below.

Installation (UVES only!)

idp2sdp comes as part of the phase3 software installation. idpConvert is installed on sciproc@muc08:$HOME/UVES/bin.

Config file (UVES only!)

The idp2sdp tool has a config file in $DFO_CONFIG_DIR, idp2sdp.cfg (note its special syntax, due to its non-DFOS nature):

Section 1: matching of pro.catg and role
The 'role' is the label of the corresponding column in the binary table. The first column is always the wavelength. The first line must be [prodCatgToRole] (required by idp2sdp); each following line maps a pro.catg (note the colon at the end) to a role label:

  [prodCatgToRole]
  FLUX_CAL_ERRORBAR_BLUE: ERR
  ...
  FLUXCAL_SCIENCE_BLUE: FLUX
  ...

(all possible values of pro.catg need to be listed here, with the role labels having reserved values determined by the IDP standard)

How to call (UVES only!)

idp2sdp is called by idpConvert, and idpConvert is called by ingestProducts. You can call idpConvert from the command-line:

Type idpConvert -h for on-line help, and idpConvert -v for the version number.

Type idpConvert -H and -V for on-line help, or version number, of the idp2sdp tool.

Call

  • idpConvert -d <date>
to convert all SCIENCE products for the specified date into IDPs
  • idpConvert -m <month>
to convert all SCIENCE products for a specified month into IDPs
  • idpConvert ... -D
run idp2sdp in DEBUG mode

IDP conversion (other instruments)

The corresponding wrapper scripts are named idpConvert_xs, idpConvert_gi etc. Their name is configured in config.ingestProducts under CONVERTER.

IDP output (all instruments)

The converted products are found in the subdirectory $DFO_SCI_DIR/<date>/conv. The converter log file (from idp2sdp or from the conversion scripts) is found in $DFO_SCI_DIR/<date>/CONVERTED. This log file is also exported to the qc@qcweb site and can be found in http://qcweb/~qc/<RELEASE>/logs/<date>/CONVERTED.
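Since the CONVERTED log acts as a per-date marker file, a conversion status check can be sketched like this (needs_conversion is a hypothetical helper, not a DFOS tool):

```shell
# Check whether a night has already been converted, using the CONVERTED
# log file in $DFO_SCI_DIR/<date> as a marker (illustrative helper).
needs_conversion () {
    sci_dir="$1"    # e.g. the value of $DFO_SCI_DIR
    night="$2"      # e.g. 2013-10-11
    if [ -f "$sci_dir/$night/CONVERTED" ]; then
        echo "done"
    else
        echo "pending"
    fi
}
```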


[ top ] Ingestion: call_IT and IngestionTool

The ingestion tool (call_IT as a wrapper, /opt/dfs/share/IngestionTool.jar as the main component) provides the same kind of functionality as the dpIngest tool for DFOS master calibrations. It is a java package provided by DFI. It takes all files from a specified directory, ingests them into NGAS, and extracts the header keys into the data repository and from there into the phase3 database.

For convenience the ingestion tool is wrapped in the helper tool call_IT. This helper tool and the ingestion tool itself are used in the same way for all IDP projects.

Installation

The IngestionTool comes as part of the phase3 software installation. The wrapper tool call_IT is installed in the local $DFO_BIN_DIR.

Config file

The IngestionTool has a config file in $DFO_CONFIG_DIR, ingestiontool.properties. It is filled and maintained by the developer.

How to call

IngestionTool is called by call_IT, and call_IT is called by ingestProducts. There is no operational need to call it from the command-line but you could:

Type call_IT -h for on-line help, and call_IT -v for the version number.

Type call_IT -H and -V for on-line help, or version number, of the IngestionTool.

Call

  • call_IT -d <date>
to ingest all SCIENCE IDPs for the specified date
  • call_IT -m <month>
to ingest all SCIENCE IDPs for a specified month
  • call_IT ... -D
run IngestionTool in DEBUG mode
  • call_IT ... -F
call validation only

The tool expects the converted IDPs in the subdirectory $DFO_SCI_DIR/<date>/conv. The IngestionTool log is found in $DFO_SCI_DIR/<date>/INGESTED. This log file is also exported to the qc@qcweb site and can be found in http://qcweb/~qc/<RELEASE>/logs/<date>/INGESTED.

call_IT adds some information to the tool log file: some statistics (numbers of new files ingested and of already existing files) and performance figures (time needed for ingestion). The tool performance is about 1 sec per (UVES) IDP.
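Using the ~1 sec per IDP figure quoted above, a rough batch duration estimate can be sketched as follows (estimate_minutes is an illustrative helper, not part of the tool set):

```shell
# Back-of-envelope batch duration from the ~1 sec/IDP figure
# (illustrative helper, rounds up to whole minutes).
estimate_minutes () {
    n_idps="$1"              # number of IDPs in the batch
    per_file_sec="${2:-1}"   # seconds per IDP, default 1
    echo $(( (n_idps * per_file_sec + 59) / 60 ))
}
```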

[ top ] Error handling

The log file of the ingestion tool records the success of the three main steps:

  • file validation (header content etc.)
  • file ingestion into NGAS
  • keyword extraction (from the repository into the phase3 database).

The IngestionTool log also has entries for each single IDP ingestion.

Note that the tool log lists every single entry in the database as a "file", which is misleading. If an ingested fits file is registered in 5 other fits files as ANCILLARY file, each such record is counted as a "file" by the tool. This inflates the statistics in the log file. Don't get confused!
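To get the real file count from such a listing, count distinct names rather than records. A minimal sketch, assuming you have already extracted one file name per line from the log (the log format itself is not reproduced here):

```shell
# Count distinct file names (one name per line on stdin); repeated
# ANCILLARY records for the same file then count only once.
count_distinct_files () {
    sort -u | wc -l | tr -d ' '
}
```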


[ top ] Configuration of ingestProducts for IDP ingestion: special config keys for phoenix

The tool uses the standard DFOS config.ingestProducts. For the PHOENIX environment, there exist the following special keys:

Section 1: general
  # special config keys not needed for DFOS, only for phoenix:
  PATH_TO_IT       <full pathname of IngestionTool installation>   # needed for the time being
  CONVERTER        <full pathname of converter tool installation>  # needed for the time being
  PROGRAM_NAME     <phase3 programme name>
  COLLECTION_NAME  <phase3 collection name>
  RELEASE_TAG      <phase3 release tag>

In PHOENIX mode the tool reads a few configuration keys from config.phoenix:

config.phoenix
  key         instrument (example)  value (example)  purpose
  RELEASE     UVES                  UVESR_2          read for updating the statistics in daily_idpstat
  INSTR_MODE  UVES                  UVES_ECH         read for updating the statistics in daily_idpstat

[ top ] MCALIB ingestion under PHOENIX: general process

phoenix 2.0 supports not only the creation of IDPs but also of master calibrations. This process has many similarities with the IDP production: it is project-driven (neither bound to nor triggered by daily operations) and comes as a batch (many processing jobs). The main differences to the IDP production are that the (selected) pipeline products are ingested without modifications, do not constitute a phase 3 project (i.e. do not require coordination with ASG), and do not constitute a stream. Otherwise many aspects of their production and ingestion are very similar to those of operational master calibrations. In particular, the underlying ingestion tool (dpIngest) and the ingestion storage (NGAS) are exactly the same. Once ingested, phoenix-created master calibrations are identical to the ones created by the daily workflow. Their main motivation comes from reprocessing after pipeline changes or improvements.

Master calibrations are ingested as they are created by the pipelines. Hence no conversion is needed. Nevertheless the ingestion process has two steps, in formal analogy to the IDP ingestion:

  • the deletion of previous instances
  • the ingestion.

Before calling the ingestion, the phoenix tool already does a check for the proper file names to be used upon ingestion (see there).


[ top ] Deletion of previous instances

Depending on the configuration, the tool ingestProducts will decide before the ingestion if any pre-existing master calibrations should be deleted.

By default, only those master calibrations get deleted and overwritten which have a new instance (by name). This might however result in an unwanted mix of old and new master calibrations. In the operational environment, many if not all calibrations are processed and ingested, regardless of whether they are actually used for science reduction:

  • the calibration stream contains a mix of HC calibrations and the ones needed for science;
  • some data types are needed for maintenance only;
  • in the early days calibrations were processed for SM data only but not for VM.

Therefore it might be reasonable not only to overwrite older instances but also to delete the ones which get no new version.

The tool supports this by configuration. The user may decide to always delete (hide) certain master calibrations, whether or not they get replaced.

Several cases could occur:

  • The reprocessing covers a particular ins.mode but not others. The tool then needs to know what to do with those modes which won't be automatically replaced by new versions of master calibrations: hide them anyway, or leave them. The correct strategy depends on the circumstances of the reprocessing: are the other (not replaced) master calibrations compatible with the pipeline? Is their quality still acceptable?
  • The reprocessing is motivated by the science reduction strategy, the goal is to deliver correct master calibrations with calSelector to reduced science data. Therefore it might be reasonable to focus the reprocessing on those calibrations that are needed for science reduction, and ignore/delete the others. For that purpose, the tool can be configured by PRO.CATG to be deleted, without a replacement.
  • Static calibrations should never be deleted since they cannot be reprocessed. They can be protected if their PRO.CATG is not configured for deletion.

In general, only those pre-existing master calibrations get deleted which are configured as PHX_DELETE.

The deletion of pre-existing master calibrations is a critical step and can be fine-tuned by calling ingestProducts in DEBUG mode, which is interactive and offers file lists for review before actual hiding. The listings are done for the following cases:

  • hidden by configuration (pro.catg and ins.mode) but not replaced
  • NEW master calibrations without previous instance
  • unchanged files (not hidden and not replaced)
  • replaced files.

[ top ] Ingestion of master calibrations

The tool first creates these lists and executes the file DELETEs, calling dpDelete in -force mode. Then it calls dpIngest in the usual way (as for daily operations). All actions (deletion and ingestion) are listed in the standard list_ingest_CALIB file. If configured, the qc1_update part is executed, using the QC1 database table names from the QC1_TABLE section of the configuration file.

[ top ] Configuration of ingestProducts for MCALIB ingestion: special config keys for phoenix

The tool uses the standard DFOS config.ingestProducts. For the PHOENIX environment, there exist the following special keys:

Section 3: PHOENIX deletion configuration for MCALIBs
  # special config keys not needed for DFOS, only for PHOENIX
  # (multiple lines supported; comma-separated list for INS_MODE supported):
  PHX_DELETE  PRO_CATG      INS_MODE     # comment
  PHX_DELETE  FF_EXTERRORS  MED,IFU      # any master calibration with that PRO_CATG and INS_MODE=MED or IFU gets deleted; one with another INS_MODE does not
  PHX_DELETE  FF_EXTERRORS  ANY          # any master calibration with that PRO_CATG gets deleted, no matter what INS_MODE it has (also includes NULL values!)
  PHX_DELETE  FF_EXTERRORS  MED,IFU,ARG  # any master calibration with that PRO_CATG and one of these INS_MODEs gets deleted (excludes NULL values!)
  PHX_DELETE  FF_EXTERRORS  NULL         # only master calibrations with that PRO_CATG and INS_MODE=NULL get deleted
       
Section 4: PHOENIX definition of QC1 tables with content from reprocessing (if any)
List of QC1 tables affected by this PHOENIX project, to be updated with the origfile name; in case of doubt call 'qc1Ingest -instrume $QC1_INSTRUMENT'.

  QC1_TABLE  qc1_giraffe_wave_reproc  # name of table (multiple lines supported)
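The INS_MODE matching rules of the PHX_DELETE keys in Section 3 can be sketched as follows. This is an illustration of the documented semantics ('ANY' includes NULL, an explicit list excludes NULL, 'NULL' matches only NULL), not the tool's code; matches_delete_rule is an invented name:

```shell
# Sketch of the PHX_DELETE INS_MODE matching semantics (illustrative only).
matches_delete_rule () {
    rule="$1"    # configured INS_MODE field: ANY, NULL, or a comma list
    mode="$2"    # the file's INS_MODE; pass NULL for a missing value
    case "$rule" in
        ANY)     # matches every INS_MODE, including NULL
            echo yes ;;
        NULL)    # matches only files without an INS_MODE
            if [ "$mode" = "NULL" ]; then echo yes; else echo no; fi ;;
        *)       # explicit list: listed modes only, never NULL
            case ",$rule," in
                *",$mode,"*) echo yes ;;
                *)           echo no ;;
            esac ;;
    esac
}
```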

[ top ] How to call

To call the tool in PHOENIX mode for IDPs, make sure to call it in the PHOENIX environment: $THIS_IS_PHOENIX must be YES. This is controlled in $HOME/.dfosrc.

To call the tool in PHOENIX mode for MCALIBs, you must

  • make sure to call it in the PHOENIX environment: $THIS_IS_PHOENIX must be YES;
  • enable the key MCAL_CONFIG in config.phoenix, pointing to the additional config file that defines the specific PHOENIX MCALIB project and distinguishes it from an IDP project (see phoenix);
  • define a resource file $HOME/.dfosrc_X containing the environment for that project; this is important if you have an IDP project under the same account and need to distinguish e.g. $DFO_MON_DIR for the two PHOENIX projects; otherwise you can just copy it from the existing $HOME/.dfosrc.

You can call the DEBUG mode of ingestProducts in the PHOENIX MCALIB environment:

ingestProducts -m CALIB -d <date> -D

The tool will then ask you for confirmation before the critical steps of instance deletion (used only in the MCALIB environment).

Any other call mode is documented in the main page.


[ top ] IDP statistics and logging

The tool writes (both in IDP and MCALIB mode) into the statistics file $DFO_MON_DIR/PHOENIX_DAILY_<RELEASE>, updating the columns for number and size of ingested IDPs. It also calls qc1Ingest to ingest those entries into the DFO database tables daily_idpstat and monthly_idpstat (see also the WISQ workflow statistics). For MCALIBs, the corresponding parameters need to be interpreted as applicable to MCALIB products. Since the focus of the QC1 tables is to monitor the creation and ingestion process (in terms of performance, disk space etc.), this mixing of IDPs and MCALIBs seems justified.

The last execution of the tool is written into the log file $DFO_LST_DIR/list_ingest_SCIENCE_$DATE.txt. All executions of the tool are logged into $DFO_SCI_DIR/<date>/INGESTED which is also exported to qcweb as http://qcweb/~qc/<RELEASE>/logs/<date>/INGESTED.

The monitor tool phoenixMonitor displays whether or not a certain night with IDPs has already been converted and ingested, by checking for the files $DFO_SCI_DIR/<date>/CONVERTED and INGESTED. For MCALIBs, the tool checks if the files have been properly renamed and ingested.

[ top ] Operational aspects

For IDPs:

  • Make sure that during ingestion, no other major processes run on muc08: at least for XSHOOTER IDPs, the ingestion of some products might otherwise fail because of resource bottlenecks.
  • The ingestion tool has the following main steps:
    • "release validation" - a consistency check that the directory to ingest ($DFO_SCI_DIR/$DATE/conv) contains *all files* listed in the ASSON<n> keys of the IDPs, and *no files* that are *not* listed there. The architecture of the ingestion tool is such that it always works on a full directory, not on individual files.
    • "preparing for archive ingestion" - the ARCFILE key is added to the IDPs and the ancillary fits files (if any), and the CHECKSUM is updated
    • "file archival" - all files (fits and non-fits, IDP and ancillary) are ingested into the archive
    • "keyword extraction" - header keys are extracted into the keyword repository.
  • What to do in case of incomplete ingestion:
    • file(s) not ingested (but could be ingested)
    • calling 'ingestProducts' again does not help since the ingestion tool assumes that all fits files in the ingestion directory have to be part of the same ingestion batch (for reasons related to the original phase3 EDP version)
    • therefore, already ingested files have to be identified (from the INGESTED log file), moved to some other place
    • then call 'ingestProducts' for that date, with the new file(s) being ingested successfully
    • then move the already ingested files back, to have a complete product directory
  • What to do in case of incomplete renaming:
    • if config.renameProducts is incomplete, it might occur that not all fits files are renamed; then files named as 'r.<INSTR>' would be ingested
    • fix config.renameProducts; then call 'renameProducts', execute rn_files, call 'idpConvert'
    • then proceed as before ("incomplete ingestion").
  • What to do if files should be hidden (because e.g. they were ingested twice with different names):
    • send a mail to asg@eso.org (see below) with a file list and a description (very much like the formalized procedure through 'hideFrame' for raw files)
  • What to do if IDPs should be ingested again (because a quality problem has been detected):
    Mail by Joerg Retzlaff@asg.eso.org 2013-10-11:
    Please direct any request for support regarding the content of Internal data products in the SAF to ASG. Then, the respective ASG colleague in charge will pick it up and will support you in resolving the problem. ASG's responsibility for the corrective measures includes the initiation of further actions like the modification of the SAF content, which means ASG files the request to DBCM if need be.

    This means:
    - write an email to asg including the list of files to be hidden
    - wait for their response
    - create and ingest the corrected files, with their filename now including a label '-v2' at the end.
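The "incomplete ingestion" recipe above (park already-ingested files, re-run the ingestion, restore the parked files) can be sketched as a shell helper. reingest_missing and its arguments are hypothetical; the real call would use ingestProducts, represented here by a generic command argument:

```shell
# Sketch of the recovery for a partially ingested date: park the files that
# are already in the archive, re-run the ingestion on the rest, then restore
# the parked files so the product directory is complete again (illustrative).
reingest_missing () {
    conv_dir="$1"        # e.g. $DFO_SCI_DIR/<date>/conv
    ingested_list="$2"   # file with one already-ingested name per line
    ingest_cmd="$3"      # stands in for the real ingestProducts call
    hold=$(mktemp -d)
    while read -r f; do  # park already-ingested files
        [ -n "$f" ] && mv "$conv_dir/$f" "$hold/"
    done < "$ingested_list"
    $ingest_cmd "$conv_dir"                  # re-ingest the remaining files
    mv "$hold"/* "$conv_dir"/ 2>/dev/null    # restore parked files
    rmdir "$hold"
}
```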


For MCALIBs only:

  • Once ingested, the MCALIB files should eventually be replaced by their headers, using the standard DFOS tool cleanupProducts. Check the jobs file JOBS_CLEANUP.

For IDPs or MCALIBs:


[ top ]