Common Trending and QC tools:
Documentation

tqs = Trending and Quality Control System
 

v2.2.1:
- option -n (needed for dfosCron)

v2.3:
- $WEB_SERVER replaced by $DFO_WEB_SERVER

closely collaborating with trendPlotter
[ used databases ]    observations..data_products; reads opslog text files; reads headers for setup keys and MJD-OBS
[ used dfos tools ]   qcdate; dataclient for headers
[ output used by ]    trendPlotter, autoDaily
[ upload/download ]   up: LAST_FILES, ftpWatcher
[ javascript ]        jCheckHealth (stored inside the tool), evaluating the ftpWatcher timestamps on the web page

qc1Parser

Description

This tool parses the OPSLOG files ($DFO_OPS_DIR/QC1_<instr>.<date>.ops.log) which contain information from the pipeline processing on Paranal and are delivered by the Data Handling Administrators (DHA) to the QC machines (under $DFO_OPS_DIR). They are parsed for QC1 parameters which are then fed into the Health Check monitor.

Since generally not all required information is contained in the OPSLOG files, header files are read by qc1Parser. To be as up-to-date as possible, the tool calls dataclient to download the most recent headers (from today), unless this is disabled by either HDR_DOWN=NO (configuration) or option -n (no header update).

The tool is designed to feed the trendPlotter tool. It is embedded in, and called by autoDaily. The trendPlotter calls to update the HC reports are executed by autoDaily.

Short excursion on OPSLOG data. The Paranal pipelines produce QC1 parameters, just like the pipelines used by QC. The Paranal pipelines use a static set of master calibrations and a default set of recipe parameters, often optimized for speed but not for ultimate accuracy. In that sense the Paranal QC1 parameters are preliminary. The ones produced by QC Garching are considered final and are stored in the QC1 database (QC1DB). With the fast data transfer and the incremental processing scheme, the Paranal QC1 values are usually not important anymore, but represent a fall-back solution in case of delivery delays. Therefore they are still handled in the dfos system.

OPSLOG data are delivered once per hour by DHA via ftp to the QC Garching machines.

Structure of QC1 OPSLOGs:

ops log                          parameter set           QC keys
QC1_<instr>_2024-11-11.ops.log   set1 (-START...-STOP)   QC key 1
                                                         QC key 2
                                                         QC key 3
                                                         QC key 4
                                 set2 (-START...-STOP)   QC keys ...
                                 set3 (-START...-STOP)   QC keys ...
                                 etc.

The tool checks for '-START' content. Each time such a line is found, all subsequent content is collected into a data set, which gets a unique identifier (SETn, where n is incremented by one for each new set). A data set is finished when the '-STOP' tag is detected.
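The set-collection logic can be sketched with awk (a minimal illustration; file name, tags and sample content are invented for this example, not taken from a real opslog):

```shell
# Sketch: split an opslog into data sets delimited by -START/-STOP tags.
# The sample content below is illustrative, not a real Paranal opslog.
opslog=/tmp/qc1parser_demo.ops.log
cat > "$opslog" <<'EOF'
-START
QC BIAS MASTER MEAN = 200.1
QC BIAS MASTER RMS = 4.3
-STOP
-START
QC FLAT MASTER MEAN = 12000.0
-STOP
EOF

# Each -START opens a new set (unique id SETn); -STOP closes it.
sets=$(awk '
    /-START/ { n++; inset=1; next }   # open a new set, increment the counter
    /-STOP/  { inset=0; next }        # close the current set
    inset    { print "SET" n " " $0 } # tag collected content with its set id
' "$opslog")
echo "$sets"
rm -f "$opslog"
```

Once tagged this way, each set can be analyzed independently in the second pass described below.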

Next, all data sets are analyzed. They are characterized by a configured unique string which could be PRO.CATG, TPL.ID or recipe ID. This depends on the instrument pipeline, is not standardized and therefore kept configurable. Examples could be 'uves_ech_bias' marking a UVES BIAS data set, or 'MASTER_LOC_FRAME' being the identifier for a VIMOS IFU localization data set.

Once a data set is identified, the tool searches its lines for configured key names (like e.g. QC BIAS MASTER RMS) and catches their content. These keys are in most cases QC1 parameter keys but in principle could be any key in the ops logs.

Typically, QC1 opslogs are not complete: they lack information about MJD-OBS or setup keys which are crucial for trending. Hence MJD-OBS is always read from the obs_metadata..data_products database, and headers are scanned by the tool in order to feed missing keys. The user configures the data source per key (OPSLOG or HEADER).

HC plots and trendPlotter. The tool is designed to deliver input data for the Health Check (HC) process, which is served by the trendPlotter tool. Typically various HC reports exist per instrument, e.g. BIAS. For each such HC report, there is one report type defined for the qc1Parser which has its own unique identifier. Normally the OPSLOG data for the BIAS report are collected into $DFO_TREND_DIR/par_BIAS.dat .

trendPlotter reads from ...   LOCAL    bias.dat, flat.dat etc.
                              QC1DB    giraffe_bias, giraffe_flat etc.
                              OPSLOG   extracted files, e.g.
qc1Parser fills ...                    par_BIAS.dat
                                       par_FFLAMP.dat

Scanning strategy. The tool parses the opslogs in chronological order, from $TODAY backwards over $N_PARSE days (configured). The range is tied to the civil date scale, meaning gaps in the opslog deliveries still count against the window. For instance, if $TODAY is 2024-01-17 and $N_PARSE is configured as 7, all opslogs earlier than 2024-01-11 are not considered, even though no opslogs exist for 2024-01-16, 2024-01-15 and 2024-01-12. In the example below, there are cases where no data were acquired and hence no opslog exists (2024-01-16), and also cases where data were acquired but no opslog entry was created (perhaps because the pipeline was not working; 2024-01-15 and 2024-01-12).

For instruments with data delivery on disks, N_PARSE should cover the typical gaps between data acquisition and data processing by QC (typically 10-12 days). For instruments with fast data delivery via the internet, the opslog data are a fallback solution and need to cover only the past few days.

civil date   ops log   headers   output table
2024-01-17   yes       yes       yes
2024-01-16   no        no        no (no ops logs)
2024-01-15   no        yes       no (no ops logs)
2024-01-14   yes       yes       yes
2024-01-13   yes       yes       yes
2024-01-12   no        yes       no (no ops logs)
2024-01-11   yes       yes       yes
2024-01-10   yes       yes       no (out of range)
2024-01-09   yes       yes       no (out of range)
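The cutoff of the civil-date window can be computed with GNU date; a sketch using the example numbers above (variable names follow the text, the actual dfos implementation may differ):

```shell
# Sketch of the civil-date window: with TODAY=2024-01-17 and N_PARSE=7,
# the window covers 7 civil dates, so the earliest date still considered
# is 2024-01-11. GNU date (Linux) is assumed for the -d syntax.
TODAY=2024-01-17
N_PARSE=7
CUTOFF=$(date -d "$TODAY -$((N_PARSE - 1)) days" +%Y-%m-%d)
echo "$CUTOFF"    # opslogs earlier than this date are ignored
```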

Incremental scanning and output. Per default, the tool works incrementally in two respects: incrementally per date, and incrementally per file.

Incremental per date: qc1Parser parses, per default, only OPSLOG files from the 3 latest dates. Outdated entries in the output tables (older than $N_PARSE days, as configured) are detected and removed automatically. This strategy performs much better than a full scan. For situations when a full scan is desirable, it can be enforced with qc1Parser -f.

Incremental per file: qc1Parser also employs a differential scheme per file. The $DFO_OPS_DIR has two subdirectories, XDIFF and XREF. The OPSLOG files from the last 3 dates are copied from $DFO_OPS_DIR to $DFO_OPS_DIR/XREF. The tool then runs a unix diff. If a difference is found, that difference (and not the whole file) is written into $DFO_OPS_DIR/XDIFF; if no difference is found, nothing is written there. The tool then works against the content of XDIFF only, and thereby does not repeat unnecessary scans. After the scans have finished, the difference files are deleted, the latest version of each OPSLOG file is copied as the new reference file, and the same procedure is repeated the next time. With the hourly delivery pattern, qc1Parser effectively scans only the content that has been added to the latest opslog file during the last hour.
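A minimal sketch of the XREF/XDIFF scheme, assuming standard unix diff output (directory layout from the text; file names and content are invented):

```shell
# Sketch of the per-file differential scan. A temp directory stands in
# for $DFO_OPS_DIR; the opslog content is illustrative.
OPS=$(mktemp -d)
mkdir -p "$OPS/XREF" "$OPS/XDIFF"
log="QC1_GIRAFFE_2024-01-17.ops.log"

printf 'old line\n' > "$OPS/XREF/$log"           # reference: state at last run
printf 'old line\nnew line\n' > "$OPS/$log"      # hourly delivery appended a line

# unix diff against the reference; only the added lines go into XDIFF
diff "$OPS/XREF/$log" "$OPS/$log" | sed -n 's/^> //p' > "$OPS/XDIFF/$log"
added=$(cat "$OPS/XDIFF/$log")
echo "$added"                                    # only this content is scanned

cp "$OPS/$log" "$OPS/XREF/$log"                  # latest version becomes new reference
rm -rf "$OPS"
```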

trendPlotter jobs. The tool collects all new calls of trendPlotter jobs in a file called $DFO_JOB_DIR/opslog_execHC. These calls are evaluated by autoDaily at the end of its execution.

Mail notification. The user may want to configure MAIL_NOTIF to YES, to get notified whenever a new $DFO_JOB_DIR/opslog_execHC has been created.

ftpWatcher. Two components of the HealthCheck process are monitored here:

The timestamp of the last opslog file in the QC system is evaluated against the current timestamp (NOW) by the javascript jCheckHealth. Likewise, the timestamp of the last execution of qc1Parser is evaluated against NOW. Both differences are constantly checked to be smaller than HOUR_DIFF, a hard-coded variable set to 2 hours. The possible outcomes are:
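The freshness check performed by jCheckHealth can be illustrated in shell (the real check runs as javascript on the web page; the timestamps here are invented):

```shell
# Sketch of the jCheckHealth freshness test, transposed to shell.
# An update older than HOUR_DIFF hours flags the process as stale.
HOUR_DIFF=2                       # hard-coded threshold (2 hours)
now=$(date +%s)
last_update=$((now - 3*3600))     # illustrative: last opslog arrived 3 hours ago

age_hours=$(( (now - last_update) / 3600 ))
if [ "$age_hours" -lt "$HOUR_DIFF" ]; then
    status=OK
else
    status=STALE                  # would turn the corresponding box red
fi
echo "$status"
```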

ftp OK  / parser OK    both the ftp and the parser processes are ok
ftp RED / parser OK    the ftp needs your attention
ftp RED / parser RED   the ftp and the parser process need intervention (or the parser only, this cannot be distinguished!)

Note: this monitor process is set up such that both boxes will eventually turn red if no update occurs ("page is frozen").

Output

The LAST_FILES_<instr>.html files are transferred to the QC web site, to /qc/<instrument>/reports/ (as configured). There they are linked to the Health Check trending reports.

How to use

Type qc1Parser -h for on-line help, and qc1Parser -v for the version number.

You can call the tool anytime from the command line:

qc1Parser parses the OPSLOG files for the 3 latest dates, updates the par_<...>.dat tables, deletes outdated entries, and exits (incremental mode).

qc1Parser -f [full] parses all OPSLOG files in the configured time range, and creates the result tables from scratch.

qc1Parser -j as above, plus creates $DFO_JOB_DIR/opslog_execHC in the end (mostly for use within autoDaily).

qc1Parser -n enforces that dataclient is not called (this is reasonable only within other tools); the HDR_DOWN configuration is overridden.

qc1Parser is normally called by the workflow tool autoDaily. This is the recommended way.

Configuration file

The tool configuration file (config.qc1Parser) defines:

Section 1: general parameters
N_PARSE      8               number of most recent nights of OPSLOG files to be parsed
HDR_DOWN     YES|NO          YES: download header files (not necessary if already provided by cron job); optional, default: YES
MAIL_NOTIF   YES|NO          notify $OP_ADDRESS if a new trendPlotter job has started
POST_PLUGIN  pgi_qc1Parser   optional plugin, expected under $DFO_BIN_DIR, executed just before scp and exit
Section 2: Definition of report types
Each report type defined here will have a corresponding trendPlotter report type.
NAME        BIAS                  name of the report
IDENTIFIER  MASTER_BIAS           string used as unique identifier (could be PRO.CATG, TPL.ID etc.)
FILTER      YES|NO                optional flag, indicating whether a filter has to be applied
DP_TYPE     BIAS; FLAT,LAMP etc.  DPR.TYPE of the corresponding raw files

Section 3: Definition of QC1 keys to be parsed
Here all keys to be extracted are defined. Two data sources exist: OPSLOG and HEADER; all keys for all reports need to be defined here. ARCFILE and MJD-OBS are always scanned in addition and need not be defined here.

REPORT_TYPE    see section 2
SOURCE         OPSLOG | HEADER (data source of the key)
FITS_KEY_NAME  QC.BIAS.MASTER.MEDIAN etc.
Section 4: Definition of filters
If the FILTER tag has been set to YES for a report, the filter needs to be defined here. It is specified by valid shell code, enclosed by &&...&& (end of line).
REPORT_TYPE BIAS &&egrep "Medusa1|Medusa2" | grep "L543.1"&&
(This example finds all entries containing the string Medusa1 or Medusa2, and L543.1.)
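Putting sections 2-4 together, a hypothetical config.qc1Parser fragment for a BIAS report could look as follows (all values are illustrative and not taken from a real instrument configuration; the exact layout of your config file may differ):

```
# Section 2: report types
NAME          BIAS
IDENTIFIER    MASTER_BIAS
FILTER        NO
DP_TYPE       BIAS

# Section 3: QC1 keys to be parsed
REPORT_TYPE   BIAS
SOURCE        OPSLOG
FITS_KEY_NAME QC.BIAS.MASTER.MEDIAN
```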

Operational hints.


Send comments to <rhanusch@eso.org>
Last update: November 4, 2013