Common DFOS tools:
dfos = Data Flow Operations System, the common tool set for DFO |
v5.1: |
The tool manages the daily workflow. See 'Operations' for information about that workflow. The tool also supports the PHOENIX and OPSHUB environments. |
enabled for parallel execution | |
OPSHUB |
enabled for OPSHUB workflow |
This tool provides the central interface for monitoring and managing the QC daily workflow (called DFOS_OPS), the PHOENIX workflow, and the OPSHUB workflow.
It serves as the standard interface to the workflows, offering all needed functionalities and interactivity. For the daily workflow, it scans the active DFO dates, reads and displays their process status, and offers the next workflow steps. For the PHOENIX workflow, it connects to the currently open dates, or pseudo-dates in the DEEP mode. For the OPSHUB workflow, it connects to the currently open PROJECTS.
For PHOENIX and OPSHUB workflows, the dfoMonitor is mainly a passive monitor. For the QC daily workflow, it has active buttons.
This documentation has some parts shaded in grey (if applicable to DFOS_OPS and/or PHOENIX only), or in light blue (if applicable to OPSHUB only).
PHOENIX |
phoenix is the workflow tool for automatic science processing. It is used by the IDP accounts on muc08+. Find more information here. |
The dfoMonitor in PHOENIX environment supports the CALIB and SCIENCE jobs as required for the PHOENIX project, either organized in DATEs, or in PSEUDO-DATEs (for DEEP projects).
Find more information on the phoenix page.
OPSHUB |
For the OPSHUB, distillery is the workflow tool. It is supported by the dfoMonitor. Find more information here. |
The OPSHUB version of the dfoMonitor has no active buttons. It is organized by PROJECTs and displays all corresponding dates. Depending on the PROJECT configuration, it might have CALIB and SCIENCE jobs at the same time. The dates have two possible states: ABs and processing jobs created; processing jobs executing or executed. It has the XDM monitor as well as an icon bar and the Ganglia system monitors.
Installation checks. The tool monitors the DRS_TYPE (as configured in config.createAB): condor on (CON) or off (anything else). The configured $DFS_RELEASE is displayed. If configured, $MIDAS_CHECK compares the default MIDAS version to $MIDVERS. Finally, the currently enabled pipeline version is displayed (with the detmon pipeline filtered out).
Monitoring of AB counts. If the number of ABs in $DFO_AB_DIR is beyond a certain limit, the AB monitor (getStatusAB) becomes slow, and this will also slow down autoDaily. To become aware of potential issues, the total number of ABs in $DFO_AB_DIR is monitored. It scores red if a hard-coded threshold is hit. Currently this threshold is 2500.
N_ABs: | 530 |
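The AB count check can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name, the `*.ab` file pattern and the 'red'/'green' score labels are assumptions; only the threshold of 2500 is taken from the text above.

```python
from pathlib import Path

AB_THRESHOLD = 2500  # hard-coded limit quoted in the text


def score_ab_count(ab_dir, threshold=AB_THRESHOLD, pattern="*.ab"):
    """Return (count, score) for the ABs found in ab_dir.

    The glob pattern and the 'red'/'green' labels are assumptions
    made for this sketch.
    """
    # Count regular files matching the (assumed) AB file pattern.
    count = sum(1 for p in Path(ab_dir).glob(pattern) if p.is_file())
    # The score turns red as soon as the threshold is reached.
    score = "red" if count >= threshold else "green"
    return count, score
```

Calling `score_ab_count(ab_dir)` with the directory behind $DFO_AB_DIR would return the number of ABs together with the score.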
Disk space, XDM. The data disk space is monitored since with the data disk full, no automatic processing is possible. A quick overview is provided:
data disk: | 120.5 GB (30%) |
dfos_ops: It updates in the background (ash mechanism) if clicked.
dfos_ops and phoenix only: The XDM (eXtended Disk space Monitor) provides detailed feedback about the disk space usage on the data disk. It monitors the following data disk directories:
|
disk space on $DATA_DISK (total: 870 GB) | |
RAW: | $DFO_RAW_DIR | updated each time dfoMonitor is called |
CAL: | $DFO_CAL_DIR | |
SCI: | $DFO_SCI_DIR | |
DFS: | $DFS_PRODUCT | |
LST: | $DFO_LST_DIR | |
*HDR: | $DFO_HDR_DIR | These values are normally read from the DFO_STATUS file and are therefore static. They are refreshed once their entries have been removed from DFO_STATUS (entries are auto-removed when 5000 newer entries have made them outdated). dfos_ops, PHOENIX: They can also be updated on demand with [refresh], which takes a couple of seconds. |
*PLT: | $DFO_PLT_DIR | |
*LOG: | $DFO_LOG_DIR | |
SUM: | sum of all above | |
OTH: | all other data on $DATA_DISK | in non-standard folders |
FREE: | remaining free | remaining free disk space |
Disk space used by the directories is listed in GB; the bar indicates usage in percent. The disk space score turns red if more than 80% of the disk volume is occupied.
If a quota is defined in the config file (DATA_QUOTA), it is indicated and taken into account.
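A minimal sketch of the disk space score, under stated assumptions: the function names and the 'red'/'green' labels are hypothetical, and the way a quota is "taken into account" is assumed here to mean that the quota replaces the physical disk size as the reference volume. Only the 80% limit comes from the text.

```python
import shutil

RED_FRACTION = 0.80  # red above 80% occupancy, as stated in the text


def disk_score(used_gb, total_gb, quota_gb=None):
    """Return (percent_used, score).

    Assumption: a configured quota simply replaces the physical
    disk size as the reference volume.
    """
    reference = quota_gb if quota_gb else total_gb
    fraction = used_gb / reference
    score = "red" if fraction > RED_FRACTION else "green"
    return round(100 * fraction, 1), score


def disk_usage_gb(path):
    """Return (used, total) in GB for the file system holding path."""
    usage = shutil.disk_usage(path)
    return usage.used / 1e9, usage.total / 1e9
```

For example, `disk_score(*disk_usage_gb("/some/data/disk"))` would score the file system holding that (placeholder) path, without a quota.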
dfos_ops only: The XDM is exported to http://www.eso.org/observing/dfo/quality/WISQ/XDM/XDM.html and linked to the WISQ monitor on the navigation bar. |
OPSHUB only: The 'du' command used to retrieve the disk usage is extremely slow on the opshub gluster filesystem. Therefore, the XDM is disabled on the OPSHUB. |
dfos_ops only: The row labelled "CAL" gives an overview of the last N dates for the autoDaily workflow. It also provides links to launch the tools productExplorer and refreshVCAL interactively. The PHOENIX version has only the link to productExplorer. |
OPSHUB only: "Select other instruments" is a jump menu to the dfoMonitors of the other configured instruments. |
dfos_ops only: This checkbox has links related to the data transfer system (DTS), two rows for status checks of NGAS access ("ngas") and of the health of the transfer process ("transfer"), and two buttons to launch queries.
The ngas status is checked each time the dfoMonitor tool is launched, by starting an ngas download with ngasClient (the file is hard-coded as $TEST_FILE). If an error occurs, its code is displayed. As a timeout mechanism, the monitor waits at most 60 sec for ngasClient and then aborts. For performance reasons, the DTS test and the ngas download run in the background, and the result from the previous execution is displayed. This is usually good enough: dfoMonitor is called by many different tools, so the result is usually sufficiently up-to-date.
"Transfer" is checked with a query to the sara database, which hosts file names and transfer status values. The query finds all CALIB files with transfer status < 6 (meaning not yet in the primary archive) and a delay of more than 1 hr and less than 72 hrs. The one with the longest delay is displayed. If none is found, the "transfer" status is ok, otherwise nok. There is also an indication for delays of files of any type, but this is not used for the nok alert, because for incremental processing, and for the closure of the QC loop with Paranal, CALIB files are by far the most important files. To avoid false alerts, delays of less than 1 hour are not evaluated. Delays of more than 72 hours are disregarded as well, since they are assumed to be due to database inconsistencies. This is not always true, but the tool cannot decide this. The complete query result is displayed upon launching the red action button (line labelled "longest delay").
The green action button launches the inverse query: all archived files with status 6 and their delay values (time between OLAS archiving on Paranal and in the primary archive in Garching). The DTS/Evalso monitors are displayed in the bottom monitor panel called "system", on the right. There are currently monitors for the two PAR-VIT links (#1 | #2), and one monitor for the connection VIT-GAR.
In case of problems, the flags turn red.
The ngas and the transfer flags are exported to the web server and embedded in the calChecker and the HC monitor. |
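The selection logic of the "transfer" check can be sketched as follows. This is an illustration only: the record layout (file name, category, status, delay) is a made-up stand-in for the database query result, and the function name is hypothetical. The status limit of 6 and the 1 hr and 72 hrs window are taken from the description above.

```python
from datetime import timedelta

ARCHIVED_STATUS = 6              # status 6: file is in the primary archive
MIN_DELAY = timedelta(hours=1)   # shorter delays are not evaluated
MAX_DELAY = timedelta(hours=72)  # longer delays are disregarded


def transfer_status(records):
    """Evaluate records of (file_name, category, status, delay).

    Returns ("ok", None) or ("nok", record_with_longest_delay).
    The record layout is an assumption made for this sketch.
    """
    pending = [
        r for r in records
        if r[1] == "CALIB"                # only CALIB files raise the alert
        and r[2] < ARCHIVED_STATUS        # not yet in the primary archive
        and MIN_DELAY < r[3] < MAX_DELAY  # inside the evaluated window
    ]
    if not pending:
        return "ok", None
    return "nok", max(pending, key=lambda r: r[3])
```

Files of other types, and CALIB files outside the delay window, never change the result in this sketch, which mirrors the rule that only delayed CALIB files trigger the nok alert.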
Cronjobs check boxes
The operational cronjobs are monitored here (applicable to dfos_ops only).
dfos_ops only: autoDaily checkbox. This checkbox is intended to make the current status of the processing scheme more transparent. It checks for:
This box must be green for dfos installations. The configured cronjob pattern is visible when hovering the mouse. The activities of autoDaily are displayed in real-time underneath the XDM. If autoDaily is not running, this box displays:
If there is autoDaily activity, messages will inform about progress. You can follow the workflow by clicking on the 'log' link:
HC monitor checkbox. This checkbox monitors the proper update pattern of HC reports. It checks for the existence and proper scheduling of the following jobs:
calChecker checkbox. The first checkbox checks for the existence and the proper scheduling of the calChecker cronjob (to be called every half hour). The second one checks if once a day the FULL mode is called, as a safety mechanism.
AB checkboxes. These checkboxes are used to monitor the autoDaily execution. The following information is displayed:
The last autoDaily execution is written into the file $DFO_MON_DIR/autoDWatcher.html and exported to the HC web site. It is included there in the monitor page http://www.eso.org/observing/dfo/quality/ALL/qc1_info.html, ready to be inspected by the QC shiftleader. It will automatically flag red if its age exceeds 6 hours. |
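The age rule can be sketched as a check on a file's modification time. This is a sketch under assumptions: the function name and the 'red'/'green' labels are hypothetical, and the use of the file modification time as the "age" is an assumption; the 6-hour limit is taken from the text.

```python
import os
import time

MAX_AGE_HOURS = 6  # limit quoted in the text


def watcher_flag(path, now=None, max_age_hours=MAX_AGE_HOURS):
    """Return 'red' if the file at path is older than max_age_hours.

    Assumption: the age is measured from the file modification time.
    """
    now = time.time() if now is None else now
    age_hours = (now - os.path.getmtime(path)) / 3600.0
    return "red" if age_hours > max_age_hours else "green"
```

A missing file raises an exception in this sketch; a real monitor would probably treat that case as an alert as well.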
The non-automatic tasks (e.g. ingestion) are listed here:
dfos_ops, PHOENIX:
Managing means: check if the file contains valid entries; offer links to watch, edit, and execute. The open tasks appear under 'ToDo', either in grey (nothing to do) or in yellow (something to do):
The tool also offers links to some log subdirectories ($DFO_MON_DIR/AUTO_DAILY and CRON_LOGS) and to the DFO_STATUS file, with the status flags.
POSTIT. There is the option to post notes, reminders etc. of a temporary nature into a text file and include them in the monitor (POSTIT function). Just click on the 'edit' link and create or edit the file $DFO_MON_DIR/DFO_POSTIT. The text is displayed in the dfoMonitor after refreshing.
|
Service links. They come in the blue row between the header part and the date result part:
dfos_ops only:
For editing the monitor navigation bar, a link is offered to the corresponding configuration file (config.gui_navbar) which can be edited in the same way as the tool configuration file config.dfoMonitor. The monitor navigation bar is included in all monitors for the daily workflow. NOTE: to update the navigation bar, first edit the config.gui_navbar file, then call dfoMonitor. All other monitors will then show the updated navigation bar after execution of the respective tool. |
The main table is organized by DATEs. Depending on the workflow environment, there are different columns.
Links are offered to related information:
dfos_ops only: There is a link to the daily calChecker result pages ("CAL"). They are permanently stored under $DFO_LST_DIR/CALCHECK. The link 'status' is an extraction from DFO_STATUS for the corresponding date, intended as an overview of the current processing status. The tool displays filtered files, as detected by filterRaw. If an entry exists in $DFO_LST_DIR/filt_<instr>_<date>.txt, the corresponding box is colored yellow, and a link to the list is offered. |
dfos_ops and PHOENIX: The tool displays whether the night had SM or VM (or both) SCIENCE runs. This information is extracted from the data reports. |
Below the main table, there are a few more rows.
dfos_ops and PHOENIX: There is also an icon bar with (hard-coded) standard links:
|
OPSHUB: There is also an icon bar with (hard-coded) standard links:
|
System links (dfos_ops and PHOENIX): The monitor page displays the 4 GANGLIA performance reports for your host:
performance | load_report | cpu_report | mem_report | network_report |
example |
dfos_ops, PHOENIX: Use the H D w m links for easy switching between hour|day|week|month timescales for the Ganglia reports. |
OPSHUB: The reports are hard-coded as hourly reports. |
The server name is read via unix 'hostname'. These reports are produced by SOS under the main URL http://mucmp.hq.eso.org/ganglia/.
For more information about GANGLIA check out the help link on the dfoMonitor in the system monitor "GANGLIA" box.
HTML output. The result HTML page is stored locally under $DFO_MON_DIR/dfoMonitor.html.
dfos_ops only: It is copied, with stripped-off functionalities, to the QC web server (http://qcweb.hq.eso.org/~qc/<instr>/monitor). The extended disk space monitor XDM is exported as a separate page. To have it included in the WISQ information system, it goes to the overview page http://www.eso.org/observing/dfo/quality/WISQ/XDM/XDM.html. ngasWatcher.html and transferWatcher.html are exported to the QC web server (to /qc/<instr>/reports) to be included in calChecker and HC monitor.
dfoMonitor is enabled for autoDaily, the wrapper tool for automatic processing of the initial part of the daily workflow. The status table on the top right part of the monitor page displays whether an autoDaily is currently executing, monitors the execution status and offers a link to the execution log. The tool has some additional options (-a, -m, -q) which are not required for command-line usage but have been introduced for autoDaily.
The tool displays the ingestion status of calibration products (under 'cdb'). The column for science products ('sci') is not filled for dfos_ops. This is useful as a reminder about data sets not yet ingested, since ingestProducts is called off-line. The tool checks for files list_ingest_CALIB_$DATE.txt in $DFO_LST_DIR.
To support incremental processing, the tool offers a special blue button for preliminary certification of TODAY's CALIB data. There you can provide feedback to SciOps (comments about ABs, certification flags). The workflow calls certifyProducts -L ("certifyP-light"). No data are moved; the AB monitor is updated and exported. See more on the certifyProducts page.
It is possible to directly edit the configured values for the calibration memory depth, N_MCAL_LIST and N_VCAL_LIST, in the top 'CAL' section. You can also call the utility tools refreshVCAL and productExplorer there. |
dfos_ops, PHOENIX: The tool uses the standard '.esh' and '.ash' mechanism to make the browser interactive. Find a description of how to implement this here. The '.ash' functionality is used to interactively update the load or disk status in the background. |
Type dfoMonitor -h for on-line help (there is extended help available from the html page), and dfoMonitor -v for the version number.
dfos_ops, PHOENIX: Type
to create or refresh the dfoMonitor.html page. |
OPSHUB: Type
to create or refresh the dfoMonitor.html page for that instrument. |
There are also hidden options -a (switch off check for autoDaily running); -m (to display the status message for autoDaily); -q (quiet mode, no logging). These are used by autoDaily.
The option -N is available for execution without ngas checking, on the command line.
dfos_ops, PHOENIX: The tool reads its own config file plus some others. config.dfoMonitor defines: |
OPSHUB: The tool configuration file is created and managed by the distillery workflow tool. Don't touch! |
dfoMonitor reads status file information. The disk occupancies for "HDR", "PLT" and "LOG" are written into DFO_STATUS.
dfos_ops, PHOENIX:
|
dfos_ops, PHOENIX:
Find here a description of how dfoMonitor decides about the DFO status of a specific DATE. For each main step of the workflow, three fundamental states can be defined: WAIT, OFFER, and DONE.
The WAIT and DONE status per workflow step is based on finding the corresponding status flag in DFO_STATUS (no matter when the step was executed). The OFFER status is based on the last entry per DATE in DFO_STATUS. Usually these three states are reached sequentially. But there are some cases where the OFFER state is kept although the step has already been executed. This applies to the createAB option, which is offered as long as the certifyProducts/moveProducts step has not been finished. The reason is that you may want to re-execute all or selected ABs when you discover an error or a bad product.
The monitor has three colours to code these states: WAIT is coded grey, OFFER is coded yellow, DONE is coded green. As a special case, the raw_Incomplete status is coded red. |
environment | workflow step | OFFER: condition(s) to offer action | OFFER: action offered | DONE: condition | DONE: action |
dfos_ops | entry for DATE | general conditions for entry: current date always labelled as "today" | none specific, depends on status flags | ||
PHOENIX | entry for DATE or pseudo-DATE | all <DATES> with JOBS_PHOENIX jobs in $DFO_JOB_DIR | |||
OPSHUB | entry for PROJECT_DATE | all <DATES> with execAB jobs in $DFO_JOB_DIR | |||
all | complete? | green if raw_Complete set, otherwise yellow; current date: always yellow | |||
all | VCAL/MCAL | condition for entry: CALIB products for DATE in $DFO_CAL_DIR/MCAL and VCAL, resp. | blue if DATE is contained in MCAL/VCAL; check also the select list on top | ||
dfos_ops, OPSHUB | createAB (CALIB) | last status entry: raw_Complete or cal_AB or cal_Queued or cal_QC | launch 'createAB -m CALIB' | cal_AB set | |
PHOENIX | none (unless MCALIB project) | ||||
dfos_ops, OPSHUB | CALIB ABs | last status entry: cal_AB or cal_Queued or cal_QC | link to AB status page, number of ABs | n/a (DONE state not offered) | |
dfos_ops, OPSHUB | certifyProducts and moveProducts (CALIB) | last status entry: cal_QC | launch 'certifyProducts -m CALIB' plus 'moveProducts -m CALIB' | cal_Certif set | |
dfos_ops | certifyProducts -L | last status entry: cal_QC and DATE=$TODAY | launch 'certifyProducts -m CALIB -L' plus update getStatusAB | no flag set; no DONE since provisional | |
PHOENIX, OPSHUB | SCIENCE ABs | last status entry: sci_AB | link to AB status page, number of ABs; if N_AB = 0, 'finishNight' offered | n/a (DONE state not offered) | |
all | finish | last status entry: cal_Updated or sci_Updated or (sci_AB and N_AB = 0) | launch 'finishNight' | finished set | offer the 'remove from dfoMonitor' option under DATE |
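The decision rules in the table above can be sketched for a few steps as follows. This is an illustration, not the tool's code: the step keys, the function name and the data layout are hypothetical, and the precedence (OFFER is tested before DONE, so that createAB stays on offer) is an interpretation of the description. The extra 'finish' condition (sci_AB and N_AB = 0) and the red raw_Incomplete case are left out.

```python
COLOURS = {"WAIT": "grey", "OFFER": "yellow", "DONE": "green"}

# Excerpt of the table: step -> (last entries that trigger OFFER, DONE flag).
# The step keys are made-up labels for this sketch.
RULES = {
    "createAB_CALIB": (
        {"raw_Complete", "cal_AB", "cal_Queued", "cal_QC"}, "cal_AB"),
    "certifyProducts_CALIB": ({"cal_QC"}, "cal_Certif"),
    "finish": ({"cal_Updated", "sci_Updated"}, "finished"),
}


def step_state(step, flags):
    """Return (state, colour) for one workflow step of one DATE.

    flags is the chronological list of status flags found for the DATE.
    """
    offer_on, done_flag = RULES[step]
    if flags and flags[-1] in offer_on:
        state = "OFFER"   # based on the last entry for the DATE
    elif done_flag in flags:
        state = "DONE"    # flag found, no matter when it was set
    else:
        state = "WAIT"
    return state, COLOURS[state]
```

With the flags raw_Complete, cal_AB, cal_Queued, cal_QC, the sketch keeps createAB in the OFFER state although cal_AB is already set; once cal_Certif is the last entry, createAB switches to DONE.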
OPSHUB: For a given entry (PROJECT, DATE) two states can exist:
|
In the DFOS_OPS environment the tool displays the DFOS logo.
In the PHOENIX environment the tool displays the PHOENIX logo.
In the OPSHUB environment the tool displays the OPSHUB logo.
Last update: April 26, 2021 by rhanusch |