10. Alarm

Document ID:
Revision: 2.1
Last modification: March 18, 2024
Status: Released
Repository: https://gitlab.eso.org/cii/info/cii-docs
File: alarm.rst
Project: ELT CII
Owner: Marcus Schilling

Document History

========  ==========  ===================  ==========  ============
Revision  Date        Changed/reviewed     Section(s)  Modification
========  ==========  ===================  ==========  ============
1.0       6.5.2020    jstrnisa/bterpinc    All         Document creation.
1.1       5.7.2020    bterpinc             4           Added plugin tab example
1.2       15.10.2020  msekoranja           All         RIX updates
1.3       17.11.2020  msekoranja           4.3.2.      Replaced ias-webserver sender with iasWebServer Sender
1.4       13.1.2021   jrepinc              4.2         Alarm Configuration GUI updates
1.5       26.1.2021   jrepinc, msekoranja  4.2         Alarm Configuration GUI updates (RIXes PSI 19, PSI 21). Transfer function alarm priority paragraph added (PSI 39).
2.0       14.9.2022   mschilli             all         CII v3: new tooling and workflow for defining and monitoring alarms
2.1       18.03.2024  mschilli             0           CII v4: Public doc
========  ==========  ===================  ==========  ============

Confidentiality

This document is classified as Public.

Scope

This document is a manual for the Alarm system of the ELT Core Integration Infrastructure software.

Audience

This document is aimed at Users and Maintainers of the ELT Core Integration Infrastructure software.

Glossary of Terms

API

Application Programming Interface

CII

Core Integration Infrastructure

CLI

Command Line Interface

CDB

IAS configuration database

DP

Data Point

GUI

Graphical User Interface

ES

ElasticSearch

JSON

JavaScript Object Notation

OLDB

Online Database

URI

Uniform Resource Identifier

SVN

Subversion

IAS

Integrated Alarm System

DASU

Distributed Alarm System Unit

ASCE

Alarm System Computing Element

IASIO

Integrated Alarm System Input/Output

References

  1. ESO, Core Integration Infrastructure Requirements Specification, ESO-192922 Version 7

  2. ESO, Integrated Alarm System Architecture, ESO-293482 Version 2

  3. ESO, Integrated Alarm System Design, ESO-299387 Version 1

  4. ESO, Integrated Alarm System General information, https://github.com/IntegratedAlarmSystem-Group/ias/wiki

  5. https://www.eso.org/~eltmgr/CII/latest/manuals/html/docs/services.html

  6. https://www.eso.org/~eltmgr/CII/latest/manuals/html/docs/oldb.html

  8. https://github.com/IntegratedAlarmSystem-Group/ias/wiki/Transfer-functions-how-to

  9. https://github.com/IntegratedAlarmSystem-Group/ias/wiki/Transfer-function-in-python

  10. https://github.com/IntegratedAlarmSystem-Group/integration-tools/blob/develop/docs/DEPLOYMENT_GUIDE.md

  11. https://github.com/IntegratedAlarmSystem-Group/integration-tools/blob/develop/docs/WEB_APPS_DOCUMENTATION.md

  12. https://github.com/IntegratedAlarmSystem-Group/ias/wiki/ConfigurationDatabase

10.1. Overview

This document is a user manual for the CII Alarm service. It explains how to set up, start, and stop the alarm system (section 10.3), and how to configure, inspect, and address alarms (section 10.4), including concrete working examples.

The advanced topics (section 10.5) are mostly relevant only to administrators.

10.2. Introduction

The CII Alarm System is based on the Integrated Alarm System (IAS).

It operates on rules that observe the OLDB and decide whether an alarm should be raised. The alarm system is therefore fully configuration-based. An application never directly triggers an alarm. An alarm is represented by an OLDB datapoint as well. These datapoints are owned by the Alarm system, and you cannot write to them. You read these datapoints to find out which alarms are active or not.

IAS is software whose main purpose is to collect alarms and monitor points from different sources and to generate alarms for presentation to users, who range from operators in the control room to engineers at their desks. The alarm system is a message-passing facility that routes information about abnormal situations detected in hardware or software to the user. For the IAS architecture and design, please refer to the IAS architecture [2] (required reading for understanding IAS concepts) and design [3] documents.

IAS is based on a pluggable architecture for integrating with external systems. To connect the IAS to the CII, the CII-IAS bridge was developed. The bridge consists of the following components:

  • OLDB Plugin - The long-running process that reads from OLDB and writes (via Kafka) to IAS.

  • OLDB Sink - The long-running process that reads (via Kafka) from IAS and writes to OLDB.

Changes in CII v3:

The following components were retired:

  • CII-OLDB Adaptor

  • CII-IAS Config Exporter

  • Alarm GUI webserver

  • WebServer Sender

  • Alarm Config GUI

The following features are not available in this version of the CII Alarm System:

  • Acknowledgement

  • Shelving

10.3. Prerequisites

This section describes the prerequisites for using the CII Alarm service.

10.3.1. Set-up

An administrator (with root privileges) has to post-install the Alarm system on the host. The procedure is described in section 10.5.1.1 (Installation) below.

10.3.2. Start-Stop

Start the Alarm system service(s)

sudo cii-services start alarm

Stop the Alarm system service(s)

sudo cii-services stop alarm

10.4. Usage

The CII Alarm System operates on rules that observe the OLDB and decide whether an alarm should be raised. An application never directly triggers an alarm. An alarm is represented by an OLDB datapoint as well. These datapoints are owned by the Alarm system, and you cannot write to them. You read these datapoints to find out which alarms are active or not.

The general workflow is this:

  1. You define alarm rules

  • Step 1: You write an alarm definition file

  • Step 2: You pass the definitions to the alarm system

  • Step 3: You request the alarm system to read the new definitions

  2. You monitor and trigger alarms

  • Step 4: You monitor the alarm-datapoints

  • Step 5: You write to the input-datapoints

10.4.1. Write Alarm Definitions

Location

Alarm rules and CDB are stored in /etc/cii_ias/

Create/Edit this file:

/etc/cii_ias/alarm-rules.yaml

Example

---
alarmId: /alarm/harmoni/sensor1_alarm
shortDesc: Demo alarm for numerical value (lowOn, highOn)
rule: [ minmax, /harmoni/fcs1/sensor1/value, 30, 50 ]
---
alarmId: /alarm/harmoni/errorstates/tracking_alarm
shortDesc: Demo alarm with a boolean input value
rule: [ bool, /harmoni/fcs1/adc1/trackingfailure ]
---
alarmId: /alarm/harmoni/hitemp/dcs_alarm
shortDesc: Detector High Temperature (highOff, highOn)
rule: [ max, /harmoni/dcs1/detector/temperature1, 70.0, 100.0 ]
---
alarmId: /alarm/harmoni/device/error_alarm
shortDesc: Demo alarm with string input
rule: [ regexp, /harmoni/dcs1/device/error_string, Error.* ]

10.4.1.1. Format

In the alarm definitions file, each rule consists of 4 lines:

---           #1 three minus signs (required)
alarmId:      #2 rule name. must begin with "/alarm/"
shortDesc:    #3 free text describing the alarm
rule:         #4 the alarm function name and its arguments, in list format
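
The format rules above can be checked mechanically before a file is passed to the alarm system. The following is a minimal sketch, not part of the CII tooling: the names parse_rules and check_rule are hypothetical, and the parser is deliberately simplified (it handles only flat "key: value" lines and the one-line list form shown in the example, and is not a general YAML parser).

```python
REQUIRED_KEYS = ("alarmId", "shortDesc", "rule")


def parse_rules(text):
    """Split a rules file into one dict per '---'-separated rule.

    Simplified on purpose: only flat 'key: value' lines are understood,
    and the values are kept as plain strings.
    """
    rules = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line == "---":
            current = {}
            rules.append(current)
            continue
        if current is None:
            raise ValueError("rule must start with a '---' line: " + line)
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError("expected 'key: value': " + line)
        current[key.strip()] = value.strip()
    return rules


def check_rule(rule):
    """Return a list of problems found in one parsed rule (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if not rule.get(key):
            problems.append("missing " + key)
    alarm_id = rule.get("alarmId", "")
    if alarm_id and not alarm_id.startswith("/alarm/"):
        problems.append("alarmId must begin with /alarm/: " + alarm_id)
    rule_text = rule.get("rule", "")
    if rule_text:
        if not (rule_text.startswith("[") and rule_text.endswith("]")):
            problems.append("rule must be a list: " + rule_text)
        else:
            # Naive split: a regexp argument containing a comma would break it.
            items = [item.strip() for item in rule_text[1:-1].split(",")]
            if len(items) < 2 or not all(items):
                problems.append("rule needs a function name and arguments")
    return problems
```

A typical use would be to read /etc/cii_ias/alarm-rules.yaml, call parse_rules on its content, and print the result of check_rule for each entry. The sketch only covers the format; whether the function name and its arguments are valid is left to the alarm loader.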

10.4.1.2. Alarm functions

These are the available functions for use in the alarm definitions.

Hints

  • Arguments called “dp” are addresses of OLDB datapoints. Give the absolute path inside the oldb, but without the “cii.oldb://” prefix, i.e. the address shall start with a single slash.

  • The DP’s oldb-type must be compatible with the function-arg type. E.g., where an alarm function expects a double, avoid passing a String-DP (OldbString), but you can pass an Int-DP (e.g. OldbInt64Std).

* min (double dp, double lowOn)
  Alarm activates while dp value is below lowOn

* min (double dp, double lowOn, double lowOff)
  Alarm activates when dp value goes below lowOn, deactivates above lowOff.

* max (double dp, double highOn)
  Alarm activates while dp value is above highOn

* max (double dp, double highOff, double highOn)
  Alarm activates when dp value goes above highOn, deactivates below highOff.

* minmax (double dp, double lowOn, double highOn)
  Alarm activates while dp value is below lowOn or above highOn
  In other words, this is minmax without hysteresis.

* minmax (double dp, double lowOn, double lowOff, double highOff, double highOn)
  Alarm activates when dp value goes below lowOn, deactivates above lowOff.
  Analogously for highOn/Off.
  In other words, this is minmax with hysteresis.

* bool (bool dp)
  Alarm activates while dp value is "true"

* regexp (string dp, string regexp)
  Alarm activates while dp value matches the regexp
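
To illustrate what hysteresis means for the three-argument max function, here is a small sketch of the behaviour as described above. It is an illustration only, not the IAS implementation: the class name MaxWithHysteresis is hypothetical, and the handling of values exactly equal to a threshold (strict comparisons) is an assumption.

```python
class MaxWithHysteresis:
    """Tracks the alarm state described for max(dp, highOff, highOn)."""

    def __init__(self, high_off, high_on):
        if high_off > high_on:
            raise ValueError("highOff must not exceed highOn")
        self.high_off = high_off
        self.high_on = high_on
        self.active = False

    def update(self, value):
        """Feed a new datapoint value and return whether the alarm is active."""
        if value > self.high_on:
            self.active = True
        elif value < self.high_off:
            self.active = False
        # Between highOff and highOn the previous state is kept.
        return self.active


# Thresholds of the dcs_alarm rule in the example file: highOff=70, highOn=100
alarm = MaxWithHysteresis(high_off=70.0, high_on=100.0)
states = [alarm.update(v) for v in (56.0, 90.0, 120.0, 90.0, 56.0)]
print(states)  # [False, False, True, True, False]
```

The value 90 appears twice with different results: on the way up the alarm is still off, on the way down it is still on. Without hysteresis (the two-argument form), the state would depend on the current value alone.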

10.4.2. Inject Definitions

The ciiAlarmLoader tool reads the alarm definition file, and creates alarm config as well as other config in the format understood by IAS: supervisor config, converter config, etc. The IAS keeps this information in the IAS configuration database (CDB).

ciiAlarmLoader  alarm-rules.yaml  .  --check-inputs

The alarm loader outputs a message if there are alarm rules whose input-DPs (the datapoint URIs given as arguments to the alarm functions) do not currently exist in the OLDB. These alarm rules cannot work, and the affected alarms will never trigger, until the input-datapoints are created. In real scenarios, the input-datapoints are created by the ECS oldb-loader tool or by control applications.

Alarm rules whose input-DPs already exist when the CDB is reloaded (see next step) function immediately. Rules with missing inputs start working with a certain delay after the missing input-DPs have been created: the alarm system discovers new inputs within 60 seconds.

Example

To create the input-DPs for the example rules file above, you can use this shell command:

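/usr/bin/env python - <<EOF
import elt.config
import elt.oldb

# Allow this process to write to the OLDB
elt.oldb.CiiOldbGlobal.set_write_enabled(True)
oldb = elt.oldb.CiiOldbFactory.get_instance()

# Input datapoints used by the example rules, with their initial values
for k, v in {
    "/harmoni/fcs1/sensor1/value": 34,
    "/harmoni/fcs1/adc1/trackingfailure": False,
    "/harmoni/dcs1/detector/temperature1": 56.0,
    "/harmoni/dcs1/device/error_string": "None",
}.items():
    try:
        oldb.create_data_point_by_value(elt.config.Uri("cii.oldb://" + k), v)
    except Exception:
        # Best effort: skip datapoints that cannot be created
        # (e.g. because they already exist)
        pass
EOF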

10.4.3. Request a Reload

Assuming the alarm system is already running, request it to re-read its config.

ciiAlarmCtl daemon reload

The command prints a warning if the alarm system is not running, to inform you that nobody will handle your request. In that case, see section 10.3 (Prerequisites) on how to start it.

With this, the definition of alarms is completed, and you can start using them.
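
Steps 2 and 3 of the workflow can be scripted. The sketch below wraps the two commands shown above using Python's subprocess module. Several points are assumptions rather than documented behaviour: that the loader is run from /etc/cii_ias (the command above uses relative paths), that both tools signal failure through a non-zero exit status, and the helper names themselves, which are hypothetical.

```python
import subprocess

RULES_DIR = "/etc/cii_ias"  # location of alarm-rules.yaml (section 10.4.1)


def loader_command(rules_file="alarm-rules.yaml", target="."):
    """Build the ciiAlarmLoader call exactly as shown in section 10.4.2.

    'target' is the second positional argument, copied verbatim from
    the documented command line.
    """
    return ["ciiAlarmLoader", rules_file, target, "--check-inputs"]


def reload_command():
    """Build the reload request shown in section 10.4.3."""
    return ["ciiAlarmCtl", "daemon", "reload"]


def apply_rules(run=subprocess.run):
    """Inject the definitions, then ask the alarm system to re-read them.

    Assumption: a failing tool returns a non-zero exit status, in which
    case check=True raises CalledProcessError and the reload is skipped.
    """
    for cmd in (loader_command(), reload_command()):
        run(cmd, cwd=RULES_DIR, check=True)
```

Calling apply_rules() runs both steps in order. The run parameter exists only so that the sequence can be exercised without the CII tools installed.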

10.4.4. Monitor Alarms

Since the alarms are OLDB datapoints, you can monitor them with the OLDB API, OLDB GUI, or OLDB tools (all described in [6]), for example:

oldb-cli read  </absolute/path/of/alarm-dp>  2>/dev/null | tail -n1

oldb-cli subscribe  </absolute/path/of/alarm-dp>

Moreover, the Alarm System comes with two dedicated tools:

1. ciiAlarmMon (command line executable)

ciiAlarmMon /etc/cii_ias

2. Alarm Monitor service (system service)

http://<host_where_ias_runs>:5602/

The Alarm Monitor service also records all alarm state changes in a (rolling) file:

/var/tmp/elt/cii-alarm-mon.log

10.4.5. Trigger Alarms

Since the inputs are OLDB datapoints, you can read/write them with the OLDB API, OLDB GUI, or OLDB tools (all described in [6]), for example:

oldb-cli read  </absolute/path/of/input-dp>   2>/dev/null | tail -n1

oldb-cli write  </absolute/path/of/input-dp>  <value>   2>/dev/null | tail -n1

Example 1

This example uses a built-in demo alarm, so it works on every installation.

oldb-cli read /alarm/sandbox/cii_diag/basic_alarm  2>/dev/null | tail -n1
Value:  NOMINAL

oldb-cli read /sandbox/cii_diag/alarm_source_bool  2>/dev/null | tail -n1
Value:  false

oldb-cli write /sandbox/cii_diag/alarm_source_bool true 2>/dev/null | tail -n1
Value:  true

oldb-cli read /alarm/sandbox/cii_diag/basic_alarm  2>/dev/null | tail -n1
Value:  ALARM_PRIORITY2

oldb-cli write /sandbox/cii_diag/alarm_source_bool false 2>/dev/null | tail -n1
Value:  false
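
The same read can be done from a script by calling oldb-cli and parsing its last output line, as the shell pipeline above does with tail. This is a minimal sketch: the function names are hypothetical, and the "Value:" line format is taken from the example output above rather than from a specification of oldb-cli.

```python
import subprocess


def parse_value(output):
    """Extract the value from the last non-empty line of oldb-cli output."""
    lines = [line for line in output.splitlines() if line.strip()]
    if not lines:
        raise ValueError("no output")
    last = lines[-1].strip()
    prefix = "Value:"
    if not last.startswith(prefix):
        raise ValueError("unexpected output line: " + last)
    return last[len(prefix):].strip()


def read_dp(path):
    """Run 'oldb-cli read <path>' and return the value as a string."""
    result = subprocess.run(
        ["oldb-cli", "read", path],
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,  # same effect as 2>/dev/null
        text=True,
        check=True,
    )
    return parse_value(result.stdout)
```

For example, read_dp("/alarm/sandbox/cii_diag/basic_alarm") would return "NOMINAL" in the initial state of Example 1. The value is returned as text, so the caller decides how to treat states other than NOMINAL.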

Example 2

This example uses the alarms demonstrated earlier, so it can only work if you have followed the above examples.

In terminal A:

ciiAlarmMon /etc/cii_ias

In terminal B:

# Current temperature
oldb-cli read /harmoni/dcs1/detector/temperature1   2>/dev/null | tail -n1
Value:  56.0

# Increase to 90 - no alarm
oldb-cli write /harmoni/dcs1/detector/temperature1 90   2>/dev/null | tail -n1

# Increase temperature above limit - alarm state will change
oldb-cli write /harmoni/dcs1/detector/temperature1 120.0   2>/dev/null | tail -n1

# Decrease to 90 - alarm still on
oldb-cli write /harmoni/dcs1/detector/temperature1 90.0   2>/dev/null | tail -n1

# Decrease to normal - alarm state will change
oldb-cli write /harmoni/dcs1/detector/temperature1 56   2>/dev/null | tail -n1

10.5. Advanced Topics

10.5.1. Administration

10.5.1.1. Installation

The CII-IAS-Bridge and the Alarm tools are included in the ELT DevEnv distribution, but need to be post-installed before they can be used. The post-install routine also installs additional third-party libraries that are not included in the ELT DevEnv (these libraries are only needed on hosts where the CII Alarm System is running).

To set up the alarm system on a host, run (as root):

/elt/ciisrv/postinstall/cii-postinstall alarm

10.5.1.2. Auto-start

Auto-start the Alarm system on boot (requires root privileges):

sudo cii-services --beta enable alarm

10.5.1.3. Health Checks

Check status of Alarm system

cii-services --beta info

The relevant lines in the output are “ias”, “kafka”, “zookeeper”, and “alarm-mon”.

10.5.1.4. Log Configuration

Redefine log levels (requires root privileges) by editing these files:

# IAS-Bridge
/elt/ciisrv/etc/log/cii-ias-bridge-logging.xml

# Alarm Monitor
/elt/ciisrv/etc/log/cii-alarm-mon-logging.json

10.5.1.5. Logs

Inspect logs (requires root privileges):

journalctl -e  -u cii_ias  -u cii_alarm_mon

10.5.2. Troubleshooting

1) Alarm-datapoint has value UNKNOWN (instead of NOMINAL etc.):

Try ciiAlarmCtl daemon reload, or (as root) systemctl restart cii_ias

2) To see the logs from the IAS processes (requires root privileges):

Use journalctl -f  -u cii_ias  -u cii_alarm_mon  -u kafka  -u kafka-zookeeper

The IAS core logs are in /opt/IasRoot/logs, /var/log/kafka, and /var/log/kafka-zookeeper.

3) For debugging, it is possible to use the internal tools that come with IAS:

Set up the shell environment as described in section 10.5.4 (Run IAS directly). This enables you to run, e.g.:

  • ciiAlarmCtl check heartbeat

  • iasDumpKafkaTopic

      • Ex.1: iasDumpKafkaTopic -t plugin

      • Ex.2: iasDumpKafkaTopic -t core | grep <AlarmId>

  • ciiAlarmCtl check cdb

  • iasCdbChecker -jCdb $ALARM_CDB_ROOT

4) Visit the CII Knowledge Base: https://gitlab.eso.org/ecs/eltsw-docs/-/wikis/KnowledgeBase/CII

10.5.3. Modify CDB directly

The IAS offers a number of additional configuration options, but they require an understanding of the internals of the IAS. Before modifying the IAS CDB directly, read the detailed description of the Alarm system architecture [2], and for all details on alarm configuration attributes see [4] and [12].

Table 1 contains a brief explanation of the configuration elements used in IAS. For the proper explanation of the elements, refer to the IAS documentation [2], [3], [4].

Table 1: Alarm system elements

=================  ===========
Entity Name        Description
=================  ===========
ASCE               An alarm system computing element that runs inside DASU and contains one transfer function.
DASU               An alarm system element that runs inside Supervisor and contains one or many ASCEs.
IASIO              An alarm system input/output entity.
Supervisor         A standalone process that contains one or many DASUs.
Transfer Function  A transformation function that calculates alarm states. It receives and outputs IASIOs.
Plugin             A plugin for IAS system that interfaces the IAS system with external systems (in CII case OLDB).
=================  ===========

10.5.3.1. General Settings

General settings are configured in CDB/ias.json

Set LogLevel, refresh rate, etc. according to the IAS CDB documentation [12].

10.5.3.2. IASIOs

IASIOs are input and output entities of the Alarm system. To define a new IASIO, go to CDB/IASIO/iasios.json

For the input IASIO, write the URI of the input data point (e.g. “/alarmtest/device/motor/input_int_dp”, i.e. the path of the input data point “cii.oldb:/alarmtest/device/motor/input_int_dp”) and set the type (e.g. INT).

Create another IASIO for the output alarm and set its id (e.g. “/motor/input_int_dp_alarm”). The type must be set to “ALARM”. You can define whether the operator can suppress/hide the alarm by setting canShelve to “True”.

Other values for both IASIOs can be left empty.

The fields shortDesc, docURL and emails are optional; they provide additional information on the IASIO. Note that the fields canShelve, sound, and emails are only applicable to IASIOs of type ALARM. The template field allows templatization of the IASIO (refer to the IAS documentation on templatization).

10.5.3.3. Transfer Functions

Alarm system output IASIOs are calculated by Transfer Functions. Transfer functions are in the IAS domain and must be implemented and built there. IAS already provides a small set of Transfer functions. Please refer to [8] and [9] for details on Transfer functions. To import a new transfer function, go to CDB/TF/tfs.json and add a new entry.

To register a newly implemented transfer function, write the className (e.g. “org.eso.ias.asce.transfer.impls.MinMaxThresholdTF”) and implLang (e.g. “SCALA”).

10.5.3.4. DASUs

DASUs are containers for ASCE entities, which perform the alarm calculations using Transfer functions. First define the DASU, then its ASCE(s) together with the Transfer function to be used.

To define a new DASU, go to CDB/DASU/ and create a new file, or copy an existing one.

Set the name of the DASU file (e.g. “int_alarm_dasu.json”). Leave the other properties empty for the time being. The default logging level will be used and there will be no templatization of the DASU.

Go to CDB/ASCE/ and define a new ASCE file (e.g. “int_alarm_asce.json”). For the transferFunctionID, choose one of the transfer function names defined earlier (e.g. “org.eso.ias.asce.transfer.impls.MinMaxThresholdTF”). In the outputId field, enter the output IASIO defined earlier (e.g. “/motor/input_int_dp_alarm”). For the dasuId, write the name of the DASU defined above (e.g. “int_alarm_dasu”).

In the inputs, add the IASIO name (e.g. “/alarmtest/device/motor/input_int_dp”) as the (only) input of the ASCE. In the properties, specify the parameters for the transfer function (e.g. “org.eso.ias.tf.minmaxthreshold.highOn” with value 5, “org.eso.ias.tf.minmaxthreshold.highOff” with value 1).

Note: For the MinMaxThresholdTF transfer function, analogous lower limits minmaxthreshold.lowOff and minmaxthreshold.lowOn can be defined. If both upper and lower limits are defined in the same ASCE, the alarm will not distinguish between the input value going above or below the limits. To distinguish between these two cases, define two ASCEs with the same input, one checking for the high value and the other checking for the low value.

The Alarm priority is determined by the Transfer function. Some implementations allow the priority to be configurable, e.g. org.eso.ias.asce.transfer.impls.MinMaxThresholdTF has a property named org.eso.ias.tf.alarm.priority for that reason. Valid values are: SET_CRITICAL, SET_HIGH, SET_MEDIUM (default), SET_LOW.
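
The property names and priority values mentioned above lend themselves to a quick plausibility check before editing the CDB. The following sketch is only an illustration: the function name is hypothetical, the properties are modelled as a plain Python dict rather than in the actual ASCE file layout, the full names of the lowOn/lowOff properties are inferred by analogy with the highOn/highOff names, and the ordering checks on the thresholds are an assumption derived from the hysteresis semantics, not a documented IAS rule.

```python
PREFIX = "org.eso.ias.tf.minmaxthreshold."
PRIORITY_KEY = "org.eso.ias.tf.alarm.priority"
VALID_PRIORITIES = ("SET_CRITICAL", "SET_HIGH", "SET_MEDIUM", "SET_LOW")


def check_minmax_properties(props):
    """Return a list of remarks on MinMaxThresholdTF properties (empty if none)."""
    remarks = []

    # SET_MEDIUM is the default when the priority property is absent.
    priority = props.get(PRIORITY_KEY, "SET_MEDIUM")
    if priority not in VALID_PRIORITIES:
        remarks.append("invalid priority: " + str(priority))

    high_on = props.get(PREFIX + "highOn")
    high_off = props.get(PREFIX + "highOff")
    low_on = props.get(PREFIX + "lowOn")    # name inferred by analogy
    low_off = props.get(PREFIX + "lowOff")  # name inferred by analogy

    # Assumed plausibility rules: the "off" threshold lies on the
    # nominal side of the corresponding "on" threshold.
    if high_on is not None and high_off is not None:
        if float(high_off) > float(high_on):
            remarks.append("highOff must not exceed highOn")
    if low_on is not None and low_off is not None:
        if float(low_off) < float(low_on):
            remarks.append("lowOff must not be below lowOn")

    has_high = high_on is not None or high_off is not None
    has_low = low_on is not None or low_off is not None
    if has_high and has_low:
        remarks.append("high and low limits in one ASCE: "
                       "the alarm will not distinguish the two cases")
    return remarks
```

For the example values given above (highOn 5, highOff 1) the function returns an empty list.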

To deploy the ASCE in the DASU, write the ASCE name (e.g. “int_alarm_asce”) into the file of the respective DASU (in CDB/DASU/).

To define the output of the DASU, edit the DASU file, and write the outputId as defined in the IASIO file (e.g. “/motor/input_int_dp_alarm”).

Finally, save your configuration changes, and ask the Alarm system to re-read its config, as described in section 10.4.3 (Request a Reload) above.

10.5.3.5. Supervisor

To deploy DASUs, first define a Supervisor. A Supervisor is a stand-alone process that hosts DASUs along with their ASCEs, including the required Transfer Function processing.

To define a new Supervisor, go to CDB/Supervisors and create a new file, or duplicate an existing one.

Set the supervisor’s name (e.g. “test_supervisor”), and assign the hostname (e.g. “localhost”). In the field DASUs to Deploy, write the name of the DASU defined above. Leave templateId and instance empty, which means that the DASU will not be processed as a template.

The template field defines which template to apply, and the instance field defines the instance number to use when processing the template. For more information on templates, refer to the IAS documentation.

Finally, save your configuration changes, and ask the Alarm system to re-read its config, as described in section 10.4.3 (Request a Reload) above.

10.5.3.6. Plugin

The OLDB Plugin configuration foremost lists the set of IASIO IDs (each of them corresponding to an OLDB data point) to be monitored by the particular Plugin instance.

To configure how frequently the plugin should check for newly created input-DPs, set the property dpconnRetrySec, the default value is 60.

To define a new Plugin, go to CDB/Plugins/ and create a new file, or duplicate an existing one. Name the new plugin “CiiPlugin” and set the monitored system to “OLDB”. The values define the set of data points to monitor. Add a new monitored value, and select the desired input. If needed, change the “Refresh time” (how long a value remains valid if not updated) or other attributes.

Finally, save your configuration changes, and ask the Alarm system to re-read its config, as described in section 10.4.3 (Request a Reload) above.

10.5.4. Run IAS directly

This is not the recommended way, because it is more involved and complicated. But if you cannot use the alarm system as a system service for some reason, it is possible to run it “in user space”.

When you run the alarm system in this way, you have the freedom to store the alarm rules file(s) and CDB outside the standard location, i.e. not in /etc/cii_ias.

Run these commands as eltdev (or other user with write-permissions on the installation folders).

Set-up

mkdir -p $HOME/modulefiles
UPDATES=https://www.eso.org/~eltmgr/CII/latest/install
wget -q -O $HOME/modulefiles/cii_ias.lua  $UPDATES/ias/cii_ias.lua-4.0.0
module load cii_ias

Start

ciiAlarmCtl  start  all+

Check

ciiAlarmCtl  is-active  all+

Reload

ciiAlarmCtl  reload cdb

Stop

ciiAlarmCtl  stop   all+