10. Alarm¶
Revision: | |
---|---|
Status: | Released |
Repository: | |
Project: | ELT CII |
Folder: | userManual/ciiman/src/docs |
Document ID: | |
File: | alarm.rst |
Owner: | Matej Šekoranja |
Last modification: | September 14, 2022 |
Created: | July 5, 2019 |
Prepared by | Reviewed by | Approved by |
---|---|---|
Jernej Strniša (CSL), Matej Šekoranja (CSL SWE) | Borut Terpinc (CSL SWE), Jan Pribošek (CSL SWE) | Gregor Čuk (CSL) |
Revision | Date | Changed/ reviewed | Section(s) | Modification |
---|---|---|---|---|
1.0 | 6.5.2020 | jstrnisa/ bterpinc | All | Document creation. |
1.1 | 5.7.2020 | bterpinc | 4 | Added plugin tab example |
1.2 | 15.10.2020 | msekoranja | All | RIX updates |
1.3 | 17.11.2020 | msekoranja | 4.3.2. | Replaced ias-webserver sender with iasWebServer Sender |
1.4 | 13.1.2021 | jrepinc | 4.2 | Alarm Configuration GUI updates |
1.5 | 26.1.2021 | jrepinc, msekoranja | 4.2 | Alarm Configuration GUI updates (RIXes PSI 19, PSI 21). Transfer function alarm priority paragraph added (PSI 39). |
2.0 | 14.9.2022 | mschilli | all | CII v3: new tooling and workflow for defining and monitoring alarms |
Confidentiality
This document is classified as a confidential document. As such, it or parts thereof must not be made accessible to anyone not listed in the Audience section, neither in electronic nor in any other form.
Scope
This document is the manual for the Alarm system used by ESO employees involved with the ELT Core Integration Infrastructure Software project.
Audience
This document is aimed at those Cosylab and ESO employees involved with the ELT Core Integration Infrastructure Software project, as well as other Users and Maintainers of the ELT Core Integration Infrastructure Software.
Glossary of Terms
API | Application Programming Interface |
---|---|
CII | Core Integration Infrastructure |
CLI | Command Line Interface |
CDB | IAS configuration database |
DP | Data Point |
GUI | Graphical User Interface |
ES | ElasticSearch |
JSON | JavaScript Object Notation |
OLDB | Online Database |
URI | Uniform Resource Identifier |
SVN | Subversion |
IAS | Integrated Alarm System |
DASU | Distributed Alarm System Unit |
ASCE | Alarm System Computing Element |
IASIO | Integrated Alarm System Input/Output |
References
ESO, Core Integration Infrastructure Requirements Specification, ESO-192922 Version 7
ESO, Integrated Alarm System Architecture, ESO-293482 Version 2
ESO, Integrated Alarm System Design, ESO-299387 Version 1
ESO, Integrated Alarm System General information, https://github.com/IntegratedAlarmSystem-Group/ias/wiki
https://www.eso.org/~eltmgr/CII/latest/manuals/html/docs/services.html
https://www.eso.org/~eltmgr/CII/latest/manuals/html/docs/oldb.html
https://github.com/IntegratedAlarmSystem-Group/ias/wiki/Transfer-functions-how-to
https://github.com/IntegratedAlarmSystem-Group/ias/wiki/Transfer-function-in-python
https://github.com/IntegratedAlarmSystem-Group/ias/wiki/ConfigurationDatabase
10.1. Overview¶
This document is a user manual for the CII Alarm service. It explains how to set up, start, and stop the alarm system (section 10.3), how to configure alarms, and how to inspect and address alarms (both in section 10.4), including concrete working examples.
The advanced topics (section 10.5) are mostly relevant only for administrators.
10.2. Introduction¶
The CII Alarm System is based on the Integrated Alarm System (IAS).
It operates on rules that observe the OLDB and decide whether an alarm should be raised. The alarm system is therefore fully configuration-based. An application never directly triggers an alarm. An alarm is represented by an OLDB datapoint as well. These datapoints are owned by the Alarm system, and you cannot write to them. You read these datapoints to find out which alarms are active or not.
IAS is software whose main purpose is to collect alarms and monitor points from different sources and to generate alarms for presentation to users, who range from operators in the control room to engineers at their desks. The alarm system is a message-passing facility that routes information about abnormal situations detected in hardware or software to the user. For the IAS architecture and design, please refer to the IAS architecture [2] (required reading to become familiar with IAS concepts) and design [3] documents.
IAS is based on a pluggable architecture for integration with external systems. To connect the IAS to the CII, the CII-IAS bridge was developed. The bridge consists of the following components:
OLDB Plugin - The long-running process that reads from OLDB and writes (via Kafka) to IAS.
OLDB Sink - The long-running process that reads (via Kafka) from IAS and writes to OLDB.
Changes in CII v3:
The following components were retired
CII-OLDB Adaptor
CII-IAS Config Exporter
Alarm GUI webserver
WebServer Sender
Alarm Config GUI
The following features are not available in this version of the CII Alarm System
Acknowledgement
Shelving
10.3. Prerequisites¶
This section describes the prerequisites for using the CII Alarm service.
10.3.1. Set-up¶
An administrator (with root privileges) has to post-install the Alarm system on the host. The procedure is described in section 10.5.1.1 (Installation) below.
10.3.2. Start-Stop¶
Start the Alarm system service(s)
sudo cii-services start alarm
Stop the Alarm system service(s)
sudo cii-services stop alarm
10.4. Usage¶
The CII Alarm System operates on rules that observe the OLDB and decide whether an alarm should be raised. An application never directly triggers an alarm. An alarm is represented by an OLDB datapoint as well. These datapoints are owned by the Alarm system, and you cannot write to them. You read these datapoints to find out which alarms are active or not.
The general workflow is this:
You define alarm rules
Step 1: You write an alarm definition file
Step 2: You pass the definitions to the alarm system
Step 3: You request the alarm system to read the new definitions
You monitor and trigger alarms
Step 4: You monitor the alarm-datapoints
Step 5: You write to the input-datapoints
10.4.1. Write Alarm Definitions¶
- Location
Alarm rules and CDB are stored in /etc/cii_ias/
Create/Edit this file:
/etc/cii_ias/alarm-rules.yaml
Example
---
alarmId: /alarm/harmoni/sensor1_alarm
shortDesc: Demo alarm for numerical value (lowOn, highOn)
rule: [ minmax, /harmoni/fcs1/sensor1/value, 30, 50 ]
---
alarmId: /alarm/harmoni/errorstates/tracking_alarm
shortDesc: Demo alarm with a boolean input value
rule: [ bool, /harmoni/fcs1/adc1/trackingfailure ]
---
alarmId: /alarm/harmoni/hitemp/dcs_alarm
shortDesc: Detector High Temperature (highOff, highOn)
rule: [ max, /harmoni/dcs1/detector/temperature1, 70.0, 100.0 ]
---
alarmId: /alarm/harmoni/device/error_alarm
shortDesc: Demo alarm with string input
rule: [ regexp, /harmoni/dcs1/device/error_string, Error.* ]
10.4.1.1. Format¶
In the alarm definitions file, each rule consists of 4 lines:
--- #1 three minus signs (required)
alarmId: #2 rule name. must begin with "/alarm/"
shortDesc: #3 free text describing the alarm
rule: #4 the alarm function name and its arguments, in list format
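The format rules above can be checked before a file is handed to the alarm system. The following is a minimal sketch, not part of the CII tooling: the helper names `parse_rules` and `check_rule` are hypothetical, and the sketch assumes that `alarmId` and `rule` are mandatory while `shortDesc` may be omitted. It deliberately avoids a YAML library and only understands the restricted layout described in this section.

```python
REQUIRED_KEYS = ("alarmId", "rule")  # assumption: shortDesc is treated as optional


def parse_rules(text):
    """Parse the restricted rule layout into a list of dicts (one per rule)."""
    rules = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # skip blank lines and comment lines
        if line == "---":
            rules.append({})              # a separator line opens a new rule
            continue
        if not rules:
            raise ValueError("a rule must start with '---'")
        key, _, value = line.partition(":")
        rules[-1][key.strip()] = value.strip()
    return rules


def check_rule(rule):
    """Return a list of problems found in one parsed rule (empty if none)."""
    problems = [f"missing '{k}'" for k in REQUIRED_KEYS if not rule.get(k)]
    if rule.get("alarmId") and not rule["alarmId"].startswith("/alarm/"):
        problems.append("alarmId must begin with /alarm/")
    spec = rule.get("rule", "")
    if spec and not (spec.startswith("[") and spec.endswith("]")):
        problems.append("rule must be written in list format")
    return problems


if __name__ == "__main__":
    with open("/etc/cii_ias/alarm-rules.yaml") as f:
        for rule in parse_rules(f.read()):
            print(rule.get("alarmId"), check_rule(rule))
```

Such a check only catches layout mistakes; whether the input datapoints exist is still reported by the alarm loader itself.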
10.4.1.2. Alarm functions¶
These are the available functions for use in the alarm definitions.
Hints
Arguments called “dp” are addresses of OLDB datapoints. Give the absolute path inside the oldb, but without the “cii.oldb://” prefix, i.e. the address shall start with a single slash.
The DP’s oldb-type must be compatible with the function-arg type. E.g., where an alarm function expects a double, avoid passing a String-DP (OldbString), but you can pass an Int-DP (e.g. OldbInt64Std).
* min (double dp, double lowOn)
Alarm activates while dp value is below lowOn
* min (double dp, double lowOn, double lowOff)
Alarm activates when dp value goes below lowOn, deactivates above lowOff.
* max (double dp, double highOn)
Alarm activates while dp value is above highOn
* max (double dp, double highOff, double highOn)
Alarm activates when dp value goes above highOn, deactivates below highOff.
* minmax (double dp, double lowOn, double highOn)
Alarm activates while dp value is below lowOn or above highOn
In other words, this is minmax without hysteresis.
* minmax (double dp, double lowOn, double lowOff, double highOff, double highOn)
Alarm activates when dp value goes below lowOn, deactivates above lowOff.
Analogously for highOn/Off.
In other words, this is minmax with hysteresis.
* bool (bool dp)
Alarm activates while dp value is "true"
* regexp (string dp, string regexp)
Alarm activates while dp value matches the regexp
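To illustrate what hysteresis means for the threshold functions, here is a small stand-alone sketch of the three-argument `max` function. It is not the IAS implementation; the class name `MaxAlarm` is hypothetical, and the handling of values exactly equal to a limit (strict comparisons) is an assumption. The limits are taken from the detector temperature rule in the example file above.

```python
class MaxAlarm:
    """Sketch of 'max (dp, highOff, highOn)': on above highOn, off below highOff."""

    def __init__(self, high_off, high_on):
        self.high_off = high_off
        self.high_on = high_on
        self.active = False

    def update(self, value):
        """Feed a new datapoint value and return the resulting alarm state."""
        if value > self.high_on:        # assumption: strictly above activates
            self.active = True
        elif value < self.high_off:     # assumption: strictly below deactivates
            self.active = False
        # Between highOff and highOn the previous state is kept (hysteresis).
        return self.active


alarm = MaxAlarm(high_off=70.0, high_on=100.0)
states = [alarm.update(v) for v in (56.0, 90.0, 120.0, 90.0, 56.0)]
print(states)  # [False, False, True, True, False]
```

The value 90 appears twice with different results: it does not raise the alarm on the way up, but it does not clear it on the way down either. The `min` function with `lowOn`/`lowOff` mirrors this behaviour at the lower end.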
10.4.2. Inject Definitions¶
The ciiAlarmLoader tool reads the alarm definition file, and creates alarm config as well as other config in the format understood by IAS: supervisor config, converter config, etc. The IAS keeps this information in the IAS configuration database (CDB).
ciiAlarmLoader alarm-rules.yaml . --check-inputs
The alarm loader outputs a message if there are alarm rules whose input-DPs (the datapoint URIs given as arguments to the alarm functions) do not currently exist in the OLDB. These alarm rules cannot work, and the affected alarms will never trigger, until the input-datapoints are created. In real scenarios, the input-datapoints are created by the ECS oldb-loader tool, or by control applications.
Alarm rules whose input-DPs already exist at the time of reloading the CDB (see next step) take effect immediately. For those with missing inputs, there is a delay after the missing input-DPs are created: the alarm system discovers the new inputs within 60 seconds.
Example
To create the input-DPs for the example rules file above, you can use this shell command:
/usr/bin/env python - <<EOF
import elt.oldb, elt.config
# Writing to the OLDB must be enabled explicitly
elt.oldb.CiiOldbGlobal.set_write_enabled(True)
oldb = elt.oldb.CiiOldbFactory.get_instance()
# Input datapoints used by the example rules, with their initial values
for k, v in {
    "/harmoni/fcs1/sensor1/value": 34,
    "/harmoni/fcs1/adc1/trackingfailure": False,
    "/harmoni/dcs1/detector/temperature1": 56.0,
    "/harmoni/dcs1/device/error_string": "None",
}.items():
    try:
        oldb.create_data_point_by_value(elt.config.Uri("cii.oldb://" + k), v)
    except Exception:
        pass  # ignore failures, e.g. when the datapoint already exists
EOF
10.4.3. Request a Reload¶
Assuming the alarm system is already running, request it to re-read its config.
ciiAlarmCtl daemon reload
The command prints a warning if the alarm system is not running, to inform you that nothing will handle your request. If so, see the Prerequisites section (10.3.2) on how to start it.
With this, the definition of alarms is completed, and you can start using them.
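Steps 2 and 3 can be chained in a small script when definitions change often. The sketch below only wraps the two commands shown above; the function names are hypothetical, the second positional argument of ciiAlarmLoader (`.` above) is passed through unchanged, and running from /etc/cii_ias is an assumption based on the standard location of the rules file. It also assumes that both tools report failure through a non-zero exit code, which this manual does not state.

```python
import subprocess


def build_commands(rules_file="alarm-rules.yaml", second_arg="."):
    """Return the loader and reload command lines as argument lists."""
    return [
        ["ciiAlarmLoader", rules_file, second_arg, "--check-inputs"],
        ["ciiAlarmCtl", "daemon", "reload"],
    ]


def inject_and_reload(rules_file="alarm-rules.yaml", second_arg=".",
                      workdir="/etc/cii_ias"):
    """Run the loader, then request a reload; stop at the first failure."""
    for cmd in build_commands(rules_file, second_arg):
        # check=True raises CalledProcessError on a non-zero exit code
        # (assumption: the tools signal errors this way).
        subprocess.run(cmd, cwd=workdir, check=True)


if __name__ == "__main__":
    inject_and_reload()
```

Keeping the command construction separate from the execution makes it possible to print or log the exact command lines before anything is run.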
10.4.4. Monitor Alarms¶
Since the alarms are OLDB datapoints, you can monitor them with the OLDB API, OLDB GUI, or OLDB tools (all described in [6]), for example:
oldb-cli read </absolute/path/of/alarm-dp> 2>/dev/null | tail -n1
oldb-cli subscribe </absolute/path/of/alarm-dp>
Moreover, the Alarm System comes with 2 dedicated tools:
1. ciiAlarmMon (command line executable)
ciiAlarmMon /etc/cii_ias
2. Alarm Monitor service (system service)
http://<host_where_ias_runs>:5602/
The Alarm Monitor service also records all alarm state changes in a (rolling) file
/var/log/elt/ci-alarm-mon.log
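For scripted checks, the oldb-cli read command shown above can be wrapped in Python. This is a minimal sketch with hypothetical helper names; it assumes that the last line printed on standard output has the form `Value: <state>`, which is what the `tail -n1` filter in the command above relies on.

```python
import subprocess


def parse_value(output):
    """Extract the value from oldb-cli output whose last line is 'Value: <x>'."""
    lines = output.strip().splitlines()
    if not lines or not lines[-1].startswith("Value:"):
        raise ValueError(f"unexpected oldb-cli output: {output!r}")
    return lines[-1][len("Value:"):].strip()


def read_datapoint(path):
    """Read one datapoint (alarm or input) and return its value as a string."""
    result = subprocess.run(
        ["oldb-cli", "read", path],
        capture_output=True,   # stderr is captured separately and ignored here
        text=True,
        check=True,
    )
    return parse_value(result.stdout)


if __name__ == "__main__":
    print(read_datapoint("/alarm/harmoni/hitemp/dcs_alarm"))
```

For continuous monitoring, the subscribe command or the dedicated tools listed above are the better choice; polling with read is only suitable for occasional checks.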
10.4.5. Trigger Alarms¶
Since the inputs are OLDB datapoints, you can read/write them with the OLDB API, OLDB GUI, or OLDB tools (all described in [6]), for example:
oldb-cli read </absolute/path/of/input-dp> 2>/dev/null | tail -n1
oldb-cli write </absolute/path/of/input-dp> <value> 2>/dev/null | tail -n1
Example 1
This example uses a built-in demo alarm, so it works on every installation.
oldb-cli read /alarm/sandbox/cii_diag/basic_alarm 2>/dev/null | tail -n1
Value: NOMINAL
oldb-cli read /sandbox/cii_diag/alarm_source_bool 2>/dev/null | tail -n1
Value: false
oldb-cli write /sandbox/cii_diag/alarm_source_bool true 2>/dev/null | tail -n1
Value: true
oldb-cli read /alarm/sandbox/cii_diag/basic_alarm 2>/dev/null | tail -n1
Value: ALARM_PRIORITY2
oldb-cli write /sandbox/cii_diag/alarm_source_bool false 2>/dev/null | tail -n1
Value: false
Example 2
This example uses the alarms demonstrated earlier, so it can only work if you have followed the above examples.
In terminal A:
ciiAlarmMon /etc/cii_ias
In terminal B:
# Current temperature
oldb-cli read /harmoni/dcs1/detector/temperature1 2>/dev/null | tail -n1
Value: 56.0
# Increase to 90 - no alarm
oldb-cli write /harmoni/dcs1/detector/temperature1 90 2>/dev/null | tail -n1
# Increase temperature above limit - alarm state will change
oldb-cli write /harmoni/dcs1/detector/temperature1 120.0 2>/dev/null | tail -n1
# Decrease to 90 - alarm still on
oldb-cli write /harmoni/dcs1/detector/temperature1 90.0 2>/dev/null | tail -n1
# Decrease to normal - alarm state will change
oldb-cli write /harmoni/dcs1/detector/temperature1 56 2>/dev/null | tail -n1
10.5. Advanced Topics¶
10.5.1. Administration¶
10.5.1.1. Installation¶
The CII-IAS-Bridge and the Alarm tools are included in the ELT DevEnv distribution, but need to be post-installed before they can be used. The post-install routine also installs additional third-party libraries that are not included in the ELT DevEnv (these libraries are only needed on hosts where the CII Alarm System is running).
To set up the alarm system on a host, do (as root):
/elt/ciisrv/postinstall/cii-postinstall alarm
10.5.1.2. Auto-start¶
Auto-start the Alarm system on boot (requires root privileges)
sudo cii-services --beta enable alarm
10.5.1.3. Health Checks¶
Check status of Alarm system
cii-services --beta info
The relevant lines in the output are “ias”, “kafka”, “zookeeper”, and “alarm-mon”
10.5.1.4. Log Configuration¶
Redefine log levels (requires root privileges)
# IAS-Bridge
/elt/ciisrv/etc/log/cii-ias-bridge-logging.xml
# Alarm Monitor
/elt/ciisrv/etc/log/cii-alarm-mon-logging.json
10.5.2. Troubleshooting¶
1) Alarm-datapoint has value UNKNOWN (instead of NOMINAL etc.):
Try
ciiAlarmCtl daemon reload
or (as root)
systemctl restart cii_ias
2) To see the logs from the ias processes (requires root privileges):
Use
journalctl -f -u cii_ias -u cii_alarm_mon -u kafka -u kafka-zookeeper
The ias core logs are in
/opt/IasRoot/logs
/var/log/kafka
/var/log/kafka-zookeeper
3) For debugging, it is possible to use the internal tools that come with IAS:
Set up the shell environment as described in section 10.5.4 (Run IAS directly). This enables you to run, e.g.:
ciiAlarmCtl check heartbeat
iasDumpKafkaTopic
Ex.1:
iasDumpKafkaTopic -t plugin
Ex.2:
iasDumpKafkaTopic -t core | grep <AlarmId>
ciiAlarmCtl check cdb
iasCdbChecker -jCdb $ALARM_CDB_ROOT
4) Visit the CII Knowledge Base: https://gitlab.eso.org/ecs/eltsw-docs/-/wikis/KnowledgeBase/CII
10.5.3. Modify CDB directly¶
The IAS offers a number of additional configuration options, but they require an understanding of the internals of the IAS. Before modifying the IAS CDB directly, read the detailed description of the Alarm system architecture [2], and for all details on alarm configuration attributes see [4] and [12].
Table 1 contains a brief explanation of the configuration elements used in IAS. For the proper explanation of the elements, refer to the IAS documentation [2], [3], [4].
Table 1: Alarm system elements
Entity Name | Description |
---|---|
ASCE | An alarm system computing element that runs inside DASU and contains one transfer function. |
DASU | An alarm system element that runs inside Supervisor and contains one or many ASCEs. |
IASIO | An alarm system input/output entity. |
Supervisor | A standalone process that contains one or many DASUs. |
Transfer Function | A transformation function that calculates alarm states. It receives and outputs IASIOs. |
Plugin | A plugin for IAS system that interfaces the IAS system with external systems (in CII case OLDB). |
10.5.3.1. General Settings¶
General settings are configured in CDB/ias.json
Set LogLevel, refresh rate, etc. according to the IAS CDB documentation [12].
10.5.3.2. IASIOs¶
IASIOs are the input and output entities of the Alarm system. To define a new IASIO, go to CDB/IASIO/iasios.json
For the IASIO, write the URI of the input data point (e.g. “/alarmtest/device/motor/input_int_dp”, i.e. the relative path of the input data point “cii.oldb:/alarmtest/device/motor/input_int_dp”) and set the type (e.g. INT).
Create another IASIO for the output alarm and set its id (e.g. “/motor/input_int_dp_alarm”).
The type must be set to “ALARM”. You can define whether the operator can suppress/hide the alarm by setting canShelve to “True”.
Other values for both IASIOs can be left empty.
The fields shortDesc, docURL and emails are optional; they provide additional information on the IASIO. Also note that the fields canShelve, sound, and emails are only applicable to IASIOs of type ALARM.
The template field allows templatization of the IASIO (refer to the IAS documentation on templatization).
10.5.3.3. Transfer Functions¶
Alarm system output IASIOs are calculated by Transfer Functions. Transfer functions are in the IAS domain and must be implemented and built there. IAS already provides a small set of Transfer functions. Please refer to [8] and [9] for details on Transfer functions. To import a new transfer function, go to CDB/TF/tfs.json and add a new entry.
To register a newly implemented transfer function, write the className (e.g. “org.eso.ias.asce.transfer.impls.MinMaxThresholdTF”) and implLang (e.g. “SCALA”).
10.5.3.4. DASUs¶
DASUs are containers for ASCE entities, which perform the alarm calculations using the Transfer functions. First define the DASU, then its ASCE(s) together with the Transfer function to be used.
To define a new DASU, go to CDB/DASU/ and create a new file, or copy an existing one.
Set the name of the DASU file (e.g. “int_alarm_dasu.json”). Leave the other properties empty for the time being. The default logging level will be used and there will be no templatization of the DASU.
Go to CDB/ASCE/ and define a new ASCE file (e.g. “int_alarm_asce.json”).
For the transferFunctionID, choose one of the transfer function names defined earlier (e.g. “org.eso.ias.asce.transfer.impls.MinMaxThresholdTF”).
In the outputId field, enter the output IASIO defined earlier (e.g. “/motor/input_int_dp_alarm”).
For the dasuId, write the name of the DASU defined above (e.g. “int_alarm_dasu”).
For the Inputs, add the IASIO name (e.g. “/alarmtest/device/motor/input_int_dp”) as the (only) input of the ASCE. In the properties, specify the parameters for the transfer function (e.g. “org.eso.ias.tf.minmaxthreshold.highOn” with value 5, “org.eso.ias.tf.minmaxthreshold.highOff” with value 1).
Note: For the MinMaxThresholdTF transfer function, analogous lower limits minmaxthreshold.lowOff and minmaxthreshold.lowOn can be defined. If both upper and lower limits are defined in the same ASCE, the alarm will not distinguish between the input value going above or below the limits. To distinguish between these two cases, one needs to define two ASCE with the same input, one checking for the high value and the other checking for the low value.
The Alarm priority is determined by the Transfer function. Some implementations allow the priority to be configurable, e.g. org.eso.ias.asce.transfer.impls.MinMaxThresholdTF has a property named org.eso.ias.tf.alarm.priority for that reason. Valid values are: SET_CRITICAL, SET_HIGH, SET_MEDIUM (default), SET_LOW.
To deploy the ASCE in the DASU, write the ASCE name (i.e. int_alarm_asce) into the file of the respective DASU (in CDB/DASU/).
To define the output of the DASU, edit the DASU file, and write the outputId as defined in the IASIO file (i.e. “/motor/input_int_dp_alarm”).
Finally, save your configuration changes, and ask the Alarm system to re-read its config, as described in section 10.4.3 (Request a Reload) above.
10.5.3.5. Supervisor¶
To deploy DASUs, a Supervisor must be defined first. A Supervisor is a stand-alone process that hosts DASUs along with their ASCEs, including the required Transfer Function processing.
To define a new Supervisor, go to CDB/Supervisors and create a new file, or duplicate an existing one.
Set the supervisor’s name (e.g. “test_supervisor”), and assign the hostname (e.g. “localhost”).
In the field DASUs to Deploy, write the name of the DASU defined above, and leave templateId and instance empty, meaning the DASU will not be processed as a template.
The template field defines which template to apply, and the instance field defines the instance number to use when processing the template. For more information on templates, refer to the IAS documentation.
Finally, save your configuration changes, and ask the Alarm system to re-read its config, as described in section 10.4.3 (Request a Reload) above.
10.5.3.6. Plugin¶
The OLDB Plugin configuration foremost lists the set of IASIO IDs (each of them corresponding to an OLDB data point) to be monitored by the particular Plugin instance.
To configure how frequently the plugin checks for newly created input-DPs, set the property dpconnRetrySec; the default value is 60.
To define a new Plugin, go to CDB/Plugins/ and create a new file, or duplicate an existing one. Name the new plugin “CiiPlugin” and set the monitored system to “OLDB”. The values field defines the set of data points to monitor.
Add a new monitored value, and select the desired input. If needed, change the “Refresh time” (for how long a value is valid if not updated) or other attributes.
Finally, save your configuration changes, and ask the Alarm system to re-read its config, as described in section 10.4.3 (Request a Reload) above.
10.5.4. Run IAS directly¶
This is not the recommended way, because it is more involved and complicated. But if you cannot use the alarm system as a system service for some reason, it is possible to run it “in user space”.
When you run the alarm system in this way, you have the freedom to store the alarm rules file(s) and CDB outside the standard location, i.e. not in /etc/cii_ias.
Run these commands as eltdev (or other user with write-permissions on the installation folders).
Set-up
mkdir -p $HOME/modulefiles
UPDATES=https://www.eso.org/~eltmgr/CII/latest/install
wget -q -O $HOME/modulefiles/cii_ias.lua $UPDATES/ias/cii_ias.lua-4.0.0
module load cii_ias
Start
ciiAlarmCtl start all+
Check
ciiAlarmCtl is-active all+
Reload
ciiAlarmCtl reload cdb
Stop
ciiAlarmCtl stop all+