Common DFOS tools: Documentation

dfos = Data Flow Operations System, the common tool set for DFO

PHOENIX

phoenix: operational hints

This page documents common operational aspects of the phoenix process. For instance-specific aspects, see the 'phoenix instances' page.

1. Reprocessing, replacement and hiding of internal data products (IDPs)

What to do if files should be replaced (a newer, better version exists, but the old one should remain visible)? There are two options:
- For a few files: call ingestProducts with option -U on the command line (this enables the --enable-updates flag of the IngestionTool). After ingestion, the new IDP version (say, v2) is offered as default, while the older version (say, v1) remains visible on demand.
- For many files, typically for a new release triggered by a new processing scheme or a new pipeline version: adding -U to every command-line call is cumbersome, so you will prefer to set the key ENABLE_UPDATES to YES. This has the same effect and is more convenient, but it requires more care: don't forget to switch it back to NO afterwards.
Deprecation (hiding): what to do if files should be hidden (e.g. because you discovered a quality issue), no matter whether a new version is available or not:
- Use the release manager: log in to http://eso.org/rm.
- Click 'Report data problem' at the top of the page and fill in the form.
- ASG will then take care of any required follow-up action (such as interfacing with DBCM).
More operational hints related to IDP ingestion here.
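The difference between replacement (with updates enabled) and deprecation can be pictured with a small toy model. This is a minimal sketch under stated assumptions, not the archive's actual logic: the class IdpRecord and its methods are hypothetical, and the rule "default = newest visible version" is only this model's simplification of "v2 is offered as default, v1 remains visible on demand".

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class IdpRecord:
    """Toy model of one IDP and its versions (hypothetical, not the archive schema)."""
    name: str
    versions: list[int] = field(default_factory=list)  # ingested versions, e.g. [1, 2]
    hidden: set[int] = field(default_factory=set)      # deprecated (hidden) versions

    def ingest_update(self) -> int:
        """Replacement with updates enabled: add a new version, keep the old ones."""
        new_version = max(self.versions, default=0) + 1
        self.versions.append(new_version)
        return new_version

    def deprecate(self, version: int) -> None:
        """Deprecation: hide a version, whether or not a newer one exists."""
        if version not in self.versions:
            raise ValueError(f"unknown version {version}")
        self.hidden.add(version)

    def visible(self) -> list[int]:
        """Versions that can still be retrieved on demand."""
        return [v for v in self.versions if v not in self.hidden]

    def default(self) -> int | None:
        """Version offered by default in this model: the newest visible one."""
        vis = self.visible()
        return max(vis) if vis else None
```

For example, a record created with versions=[1] gets version 2 from ingest_update(); default() then returns 2 while visible() still lists [1, 2]. Hiding a version with deprecate() removes it from visible().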

2. Processing of single ABs (DEEP mode, TARGET_SELECT)

In DEEP mode, you can call 'phoenix -r <run_ID> -C|-P -a <ab>', where <ab> is the name of a dpc AB that corresponds to one single target (hence the name of the mode, TARGET_SELECT). This is typically done for runs with a very large number of ABs, where managing all the tasks at once becomes too cumbersome or the data volume to be handled becomes too large.

You then process one target after the other with option -P, and it is straightforward to finish the processing with option -M in the same way, target by target. If UPDATE_YN in config.phoenix is set to YES, all products correctly collect in $DFO_SCI_DIR/<run_ID>. The (identical) ingestion jobs in JOBS_INGEST pile up as well. You will likely want to wait until all target ABs are processed, delete the duplicated ingestion commands by hand, and then ingest in one go.
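Removing the duplicated ingestion commands can also be scripted. A minimal sketch, assuming (not confirmed here) that JOBS_INGEST points to a plain text file with one shell command per line; the function names are hypothetical, and it is wise to keep a copy of the jobs file before rewriting it.

```python
from __future__ import annotations

from pathlib import Path


def dedupe_job_lines(lines: list[str]) -> list[str]:
    """Keep the first occurrence of each command line, preserving order.

    Blank lines are kept as they are; lines that differ only in surrounding
    whitespace or a missing final newline count as identical.
    """
    seen: set[str] = set()
    unique: list[str] = []
    for line in lines:
        key = line.strip()
        if key and key in seen:
            continue  # identical ingestion call already kept
        seen.add(key)
        unique.append(line)
    return unique


def dedupe_jobs_file(path: Path) -> int:
    """Rewrite a jobs file in place without duplicates; return the number of lines removed."""
    lines = path.read_text().splitlines(keepends=True)
    unique = dedupe_job_lines(lines)
    path.write_text("".join(unique))
    return len(lines) - len(unique)
```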

This scenario is safe, but it is not the most efficient one. It is probably appropriate, for instance, for one deep MUSE cube per day, but for a whole weekend of processing you will want something more efficient. In that case, collect many calls of 'phoenix -r <run_ID> -P -a <ab>' in a jobs file and execute them in one go. Afterwards, call 'phoenix -r <run_ID> -M' to certify and move all products in one go; the tool will ask whether you want all ABs reviewed and moved, or just the last one.
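Such a jobs file can be generated from a list of AB names. A minimal sketch: only the 'phoenix -r <run_ID> -P -a <ab>' call pattern comes from this page; the function name, the one-call-per-line layout, and the run ID in the example are hypothetical. Arguments are shell-quoted because run IDs contain parentheses.

```python
from __future__ import annotations

import shlex


def build_phoenix_jobs(run_id: str, ab_names: list[str]) -> str:
    """Return a jobs-file body with one 'phoenix -r <run_ID> -P -a <ab>' call per AB.

    Certification and move (-M) are not included: they are done afterwards,
    interactively, with a single 'phoenix -r <run_ID> -M' call.
    """
    if not ab_names:
        raise ValueError("no ABs given")
    lines = [
        f"phoenix -r {shlex.quote(run_id)} -P -a {shlex.quote(ab)}"
        for ab in ab_names
    ]
    return "\n".join(lines) + "\n"
```

For a hypothetical run ID 099.X-0001(A), each line reads phoenix -r '099.X-0001(A)' -P -a <ab>, with the run ID quoted so that the shell does not interpret the parentheses.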

The products are eventually collected in the $DFO_SCI_DIR directory (make sure to have ENABLE_UPDATES set to YES, see above). Decide whether they need to be separated from pre-existing products, e.g. from an earlier processing release. The phoenixMonitor, and the AB monitor linked from it, always cover the entire run.

3. QC jobs in MIDAS in multiple instances

Several instruments have their QC jobs executed with good old MIDAS. For efficiency, this is always done in parallel instances (up to 30 on muc08). Supporting infrastructure is installed for this (make_para.sh in $DFO_PROC_DIR), and it usually runs fine. Under unfavourable conditions, however (e.g. two IDP batches from different accounts running on the same machine), some of the internal MIDAS management processes may get stuck in the $MID_WORK directory. The symptom is that all MIDAS calls under condor suddenly fail. Unfortunately, there is no easy way to get this issue properly logged under condor. Therefore the best first measure is to clean up the entire $MID_WORK directory and start over.
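The cleanup can be sketched as follows. This is a minimal sketch, assuming that no MIDAS jobs are running at that moment (e.g. the condor queue for the account is empty) and that keeping the empty $MID_WORK directory itself is desirable; the function name is hypothetical.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path


def clean_mid_work(mid_work: str | None = None) -> int:
    """Delete everything inside $MID_WORK but keep the directory; return entries removed.

    Falls back to the MID_WORK environment variable if no path is given.
    Only run this when no MIDAS instances are active.
    """
    target = Path(mid_work or os.environ["MID_WORK"])
    if not target.is_dir():
        raise NotADirectoryError(str(target))
    removed = 0
    for entry in target.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # subdirectories with all their content
        else:
            entry.unlink()  # plain files and symlinks (the link itself, not its target)
        removed += 1
    return removed
```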

4. Processing of IDPs in multiple instances (MUSE)

In order to optimize efficiency, a "MASTER-SLAVE" multi-instance processing model has been developed for MUSE. It uses up to four muc machines (muc09 and muc10, at times also muc11 and muc12) for different parts of the historical batch processing. The processing itself runs independently on each machine; after ingestion and finishing, all information is ultimately stored centrally on the MASTER. See here for more.

5. Processing of IDPs in COMPLETE_MODE

With COMPLETE_MODE=YES, the CALIBs associated with the SCIENCE data are evaluated day by day. In general they belong to dates around the SCIENCE date (earlier, later and the same date), so the CALIB datasets partly overlap for SCIENCE dates close to each other. Running phoenix in this mode therefore makes sense only in larger batches, i.e. in monthly mode (like phoenix -m 2003-09).

Testing for a particular date is possible. But if that date belongs to an existing monthly batch (like phoenix -d 2003-09-09 for the above example), this might lead to inconsistencies with the monthly batch. Unfortunately this is unavoidable.
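The benefit of monthly batches can be illustrated by counting how many CALIB dates are touched when the month is processed date by date versus in one batch. This is a minimal sketch under a purely hypothetical assumption: a symmetric window of a fixed number of days around each SCIENCE date. The real association depends on the instrument and the calibration selection, and the function names are illustrative only.

```python
from __future__ import annotations

from datetime import date, timedelta


def calib_dates(science_day: date, window: int) -> set[date]:
    """Hypothetical CALIB window: the SCIENCE date plus/minus `window` days."""
    return {science_day + timedelta(days=k) for k in range(-window, window + 1)}


def month_days(year: int, month: int) -> list[date]:
    """All dates of the given month."""
    day = date(year, month, 1)
    days = []
    while day.month == month:
        days.append(day)
        day += timedelta(days=1)
    return days


def count_calib_work(year: int, month: int, window: int) -> tuple[int, int]:
    """Return (CALIB dates processed date by date, CALIB dates processed once per month)."""
    days = month_days(year, month)
    per_day = sum(len(calib_dates(d, window)) for d in days)
    monthly = len(set().union(*(calib_dates(d, window) for d in days)))
    return per_day, monthly
```

With a one-day window, count_calib_work(2003, 9, 1) returns (90, 32): date-by-date processing touches 90 CALIB dates (many of them repeatedly), whereas the monthly batch needs only the 32 distinct dates from 2003-08-31 to 2003-10-01.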

6. Ingestion of products in COMPLETE_MODE

For each processed date, there will be an entry for CALIB mode and for SCIENCE mode. The products of the CALIB mode are in $DFO_CAL_DIR/NEW and can be ingested from there. The products of the SCIENCE mode are in $DFO_SCI_DIR as usual.

The ingestion of SCIENCE products (i.e. IDPs) is done as usual. For the mcalibs, the new set is likely smaller than the existing one, since only the mcalibs actually needed for the science reduction have been processed, not the additional ones taken for HC or other purposes. We therefore cannot guarantee that each existing (old) mcalib gets replaced by a new one.

Therefore, we need to

1) hide the old mcalibs (with a DBCM request specifying the time range, which should be large enough, e.g. a full year);

2) ingest the new calibs with their new names;

3) for an intermediate time, make sure to force calSelector to raw2raw for that period, and enable raw2master afterwards.

Also make sure that hiding in bulk mode (e.g. for a full year) does not hide any static mcalibs as well: they won't be replaced by the ingestion of the new mcalibs!
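Before submitting the DBCM hiding request, the candidate list can be filtered so that static mcalibs are never included. A minimal sketch: the Mcalib record and its is_static flag are hypothetical stand-ins, since how static calibrations are identified in the archive is not described here.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Mcalib:
    """Minimal stand-in for an archived master calibration (hypothetical fields)."""
    name: str
    obs_date: date
    is_static: bool  # static mcalibs are not replaced by the new ingestion


def mcalibs_to_hide(candidates: list[Mcalib], start: date, end: date) -> list[Mcalib]:
    """Old mcalibs within [start, end] that may be hidden; static ones are always kept."""
    return [
        m for m in candidates
        if start <= m.obs_date <= end and not m.is_static
    ]
```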

7. Creation of SCIENCE ABs

Since 2020-12-01, SCIENCE ABs are no longer created in the DFOS systems, for operational reasons. Instead, the phoenix tool in IDP mode creates the SCIENCE ABs in its own environment, using calSelector and createAB. This applies to all streams, but not to historical batches of reprocessing projects, which take their ABs from the (previous) IDP processing directory on qcweb.

Last update: April 26, 2021 by rhanusch
