QC shiftleader: calChecker


Use the qc_shift account on muc02 to log in to any of the instrument accounts.
 
what to check | how | issues and solutions

1. calChecker

Note: you are not required to analyse issues (but you are free to do so).

shiftleader's overview page ("sl" icon):
"last refresh": all instruments should display green --> all updates younger than 2 hours (this limit is conservative; calChecker should run every half hour)

"t_exec": all instruments should display green --> all exec_times below 10 minutes
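The two green criteria can be restated as a small decision function. This is only a sketch: the function name and its integer-minute inputs are hypothetical; the thresholds (refresh younger than 2 h, t_exec below 10 min) are the ones given on the overview page.

```shell
#!/bin/sh
# Sketch: the overview-page colour rules, restated as a function.
# Hypothetical input: minutes since the last refresh and the exec time
# in minutes, both as integers.
# Thresholds from the overview page: refresh < 120 min, t_exec < 10 min.
overview_status() {
    age_min=$1
    exec_min=$2
    if [ "$age_min" -ge 120 ]; then
        echo "red: last refresh"
    elif [ "$exec_min" -ge 10 ]; then
        echo "red: t_exec"
    else
        echo "green"
    fi
}

# Example: overview_status 25 4   -> green
#          overview_status 150 4  -> red: last refresh
```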

issue (update older than 1 hr) for one instrument?

YES --> log in to the instrument account (via qc_shift on muc02)

possible issues and their solutions:
calChecker stuck? ps -efl | grep calChecker

if so, kill it and start it again (try to find out where and why it got stuck)
load abnormally high?

uptime

if high: try to find out why; call IT for help (cc lcondere@eso.org)

cronjob active?

crontab -l | grep calChecker

if not: try to find out why; re-implement the cronjob

t_exec higher than 10 min? check whether this is an outlier (--> ignore) or a pattern; exec_times can be high because of many open, MISS, or NOK analysis cases --> analyse them (if possible!)
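The per-instrument checks listed above can be collected into a few shell helpers. This is a sketch under stated assumptions, not an existing tool: all function names are hypothetical; it assumes Linux with procps-ng `ps` (for the `etimes` column), C-locale `uptime` output, and GNU `nproc`; the 1800 s "stuck" limit, the CPU-count load limit, and the 3-of-5 exec-time rule are hypothetical choices.

```shell
#!/bin/sh
# Hypothetical helpers for the per-instrument calChecker checks.

# calChecker stuck? Reads "PID ELAPSED_SECONDS ARGS..." lines on stdin and
# prints PIDs of calChecker processes older than $1 seconds
# (default 1800 s = one half-hour cycle, a hypothetical limit).
long_running_pids() {
    awk -v max="${1:-1800}" '
        $3 ~ /awk$/ { next }   # skip this awk filter itself
        $2 > max {
            for (i = 3; i <= NF; i++)
                if ($i == "calChecker" || $i ~ /\/calChecker$/) { print $1; next }
        }'
}

# Load abnormally high? Extract the 1-minute load average from uptime output.
load1() {
    sed -n 's/.*load average[s]*: *\([0-9.]*\).*/\1/p'
}
# Succeeds if load $1 exceeds limit $2 (e.g. the CPU count, a rule of thumb).
load_is_high() {
    awk -v l="$1" -v c="$2" 'BEGIN { exit !(l > c) }'
}

# Cronjob active? Succeeds only if a non-commented line mentions calChecker
# (a plain grep would also match an entry that was commented out with '#').
has_active_cron() {
    grep -v '^[[:space:]]*#' | grep -q 'calChecker'
}

# t_exec outlier or pattern? Reads exec times in minutes, one per line,
# oldest first; "pattern" if at least 3 of the last 5 exceed 10 min.
classify_exec_times() {
    tail -n 5 | awk '
        BEGIN { n = 0 }
        $1 > 10 { n++ }
        END {
            if (n >= 3) print "pattern"
            else if (n > 0) print "outlier"
            else print "ok"
        }'
}

# Live usage on an instrument account:
#   ps -eo pid=,etimes=,args= | long_running_pids 1800
#   l=$(uptime | load1); load_is_high "$l" "$(nproc)" && echo "load high: $l"
#   crontab -l | has_active_cron || echo "calChecker cronjob missing or disabled"
```

Reading from stdin keeps each helper testable with pasted output, and the live commands only feed them.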
   

issue (update older than 1 hr) for all instruments?

YES --> something fatal ongoing --> call SOS

possible candidates:

network problem

e.g.: all processes run but the update is not uploaded to the web server
database problem

e.g.: isql queries time out; you should get plenty of errors
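The one-instrument vs all-instruments triage can be sketched as follows. The input format ("INSTRUMENT AGE_MINUTES" per line, covering every instrument) and the function name are hypothetical; "stale" uses the 1 hr limit from the questions on this page. The case where several but not all instruments are stale is not covered on this page; the sketch treats it like the one-instrument case, which is an assumption.

```shell
#!/bin/sh
# Hypothetical triage helper: one "INSTRUMENT AGE_MINUTES" line per
# instrument on stdin; stale = update older than 60 min.
triage_staleness() {
    awk '
        BEGIN { total = 0; stale = 0; names = "" }
        { total++ }
        $2 > 60 { stale++; names = names " " $1 }
        END {
            if (stale == 0) print "ok"
            else if (stale == total) print "all stale: call SOS"
            else print "stale:" names " -> log in to the instrument account(s)"
        }'
}

# Example (instrument names A and B are placeholders):
#   printf 'A 20\nB 95\n' | triage_staleness
#   -> stale: B -> log in to the instrument account(s)
```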

Put a note in the calChecker General news:
(log in to stargate1; cd qc/ALL; edit CC_POSTIT; the note becomes visible immediately)

   

calChecker lost its memory (all boxes red) for one instrument?

YES --> log in to the instrument account (via qc_shift on muc02),
call calChecker -F and check the results

The calChecker page also documents what DHA should check. Under the operational agreement for the closed QC loop, DHA monitors the data transfer and the access to the archive (important for the HC monitor).