System Supervisor

The System Supervisor provides the functionality for supervision and monitoring of a configurable set of subsystems.

System Supervisor.

The main components of the System Supervisor server are:

  • A State Machine engine based on SCXML and implemented in RAD. It contains a set of action and activity classes.

  • A Subsystem Factory class that creates the instances of all subsystem classes at start-up, based on the server configuration.

  • A Facade class that manages the interface between the state machine engine and the subsystem classes.

The System Supervisor uses the Redis database to store run-time information about itself and about the subsystems it monitors. It subscribes to the status information published by the subsystems and, like any other subsystem, publishes its own status as well.

System Supervisor State Machine

The System Supervisor uses a state machine described in SCXML, which is interpreted by the state machine engine provided by the RAD application framework (see the SCXML specification).

System Supervisor State Machine Diagram.

Off –> NotReady, event: Startup

The System Supervisor starts up and goes automatically to NotOperational/NotReady. The main server objects are instantiated, including the basic application that uses the State Machine engine. The System Supervisor reads its own configuration and completes its initialisation. It then connects to the configured subsystems and requests their current status. With this information it computes the estimated state of the system and adapts its own state to the overall state of the subsystems. For example, if all the subsystems are Operational/Idle, the System Supervisor triggers an internal event to go to Operational/Idle. The same applies in any state, so the System Supervisor can also go from Operational/Idle back to NotOperational/NotReady.

NotReady –> Ready, event: Init

The Supervisor dispatches the Init command to all the configured subsystems whose access is enabled. Depending on the replies received from the subsystems, it goes to the Ready substate or remains in the NotReady substate. If at least one command returns with an error or a timeout, the Supervisor does not reach the Ready substate.
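The dispatch rule described for Init (and reused for Enable) can be sketched as follows. This is only an illustration: the Subsystem struct, the dispatchToAll function and the send callback are hypothetical names, not part of the Supervisor API, and the choice to keep dispatching after a first failure is an assumption.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical record; the field mirrors the <subsystem id>::access flag.
struct Subsystem {
    std::string name;
    bool access;
};

// Sends one command to every subsystem whose access is enabled.
// `send` stands in for the real request and returns false on an error
// reply or a timeout. The result is true only if every dispatched
// command succeeded.
bool dispatchToAll(const std::vector<Subsystem>& subsystems,
                   const std::function<bool(const Subsystem&)>& send) {
    bool allOk = true;
    for (const auto& sub : subsystems) {
        if (!sub.access) {
            continue;  // subsystems with access disabled are skipped
        }
        if (!send(sub)) {
            allOk = false;  // assumption: keep going, but remember the failure
        }
    }
    return allOk;
}
```

A caller would move to the Ready substate only when dispatchToAll returns true, which matches the rule that a single error or timeout blocks the transition.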

NotOperational/Ready –> Operational/Idle, event: Enable

The System Supervisor goes through the Enabling transient state. If the configured subsystems are already Operational, the Supervisor does not affect their state and goes immediately to the Operational state. If subsystems are not Operational, the System Supervisor dispatches the Enable request to each of the configured subsystems whose access is enabled. If at least one request fails, it replies with a failure and remains in the NotOperational/Ready state.

Operational –> NotOperational/Ready, event: Disable

The System Supervisor is moved back to the NotOperational/Ready substate. However, if all subsystems are Operational, the System Supervisor remains Operational, since its state reflects the state of the subsystems it coordinates.

NotOperational/Ready –> NotOperational/NotReady, event: Reset

The System Supervisor is moved back to the NotOperational/NotReady substate. The subsystems are not affected by this transition and keep their current state/substate.
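The command-driven transitions listed above can be summarised as a small lookup function. This is a simplified sketch and not the SCXML model itself: the SupState enum and nextState are hypothetical names, transient states such as Enabling are omitted, and the internal events triggered by the status estimation are not modelled.

```cpp
#include <string>

// Hypothetical, flattened view of the states named in this section.
enum class SupState { Off, NotReady, Ready, Idle };

// Returns the target state for a command event. When the event is not
// accepted in the current state, the current state is returned unchanged.
SupState nextState(SupState current, const std::string& event) {
    switch (current) {
        case SupState::Off:
            if (event == "Startup") return SupState::NotReady;
            break;
        case SupState::NotReady:
            if (event == "Init") return SupState::Ready;
            break;
        case SupState::Ready:
            if (event == "Enable") return SupState::Idle;
            if (event == "Reset") return SupState::NotReady;
            break;
        case SupState::Idle:
            if (event == "Disable") return SupState::Ready;
            break;
    }
    return current;
}
```

In the real server the transitions for Init and Enable also depend on the replies of the subsystems, so the function above only shows the nominal path.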

Configuration

System Supervisor Configuration

The server configuration is a file written in YAML format (see the YAML specification). YAML is an easy-to-read format that has been adopted temporarily until the integration with the CII configuration services.

Many resources about YAML can be found on the web. One could also validate the format online, see http://yaml.org/spec/

Note

The server configuration has been ported to CII Config-ng in the IFW version 4.0.

Note

The entry point for the System Supervisor configuration is the file that contains the server configuration.

server::server_id

This is the id associated with the specific server. This id is used to associate all server configuration parameters as well as the prefix for the DB keys.

server::req_endpoint

This is the endpoint for CII MAL request/reply. The server will listen to incoming commands using this endpoint.

server::pub_endpoint

This is the endpoint for CII MAL pub/sub. The server will publish device topics using this endpoint.

server::db_endpoint

This is the endpoint used by the server for connecting to the Redis DB.

server::db_timeout

This is the server timeout for connecting to the Redis DB.

server::scxml

This is the state machine specification file used by the server.

server::dictionaries

This is the vector of dictionaries to be used by the server.

server::oldb_prefix

This is the prefix to be used for the DB. This prefix is meant to identify uniquely a given system, e.g. micado.

server::log_properties

log4cplus property file to be used by the server.

server::mon_timeout

Monitor timeout for establishing connections to the subsystems. This value should rarely be larger than a few seconds. Default is 1000 [ms].

server::req_timeout

General command timeout for sending commands to subsystems.

server::ob_modes

Vector of observation modes supported by the instrument. Each observation mode defines a list of associated subsystems.

server::subsystems

This is the vector of subsystems active in the supervisor configuration. Only subsystems listed here will be managed by the supervisor.

Each subsystem has its own set of configuration parameters:

<subsystem id>::scope

This is the scope of the subsystem. It can be internal or external. Requests are not forwarded to external subsystems; the System Supervisor only monitors them.

<subsystem id>::type

This is the subsystem type class. Normally a subsystem will use the provided class: sup::syssup::common::Generic

<subsystem id>::rr_endpoint

This is the endpoint for the subsystem CII MAL request/reply. The subsystem listens to incoming commands using this endpoint.

<subsystem id>::ps_endpoint

This is the endpoint for the subsystem CII MAL pub/sub. The subsystem publishes its status using this endpoint.

<subsystem id>::access

This is a flag to enable/disable accessibility of a subsystem.

An example of a server configuration is provided below.

!cfg.include config/sup/syssup/server/definitions.yaml:
server: !cfg.type:SysSup
    server_id       : 'sup'
    req_endpoint    : "zpb.rr://127.0.0.1:13082/"
    pub_endpoint    : "zpb.ps://127.0.0.1:13345/"
    db_endpoint     : "127.0.0.1:6379"
    db_timeout      : 2
    scxml           : "sup/syssup/server/sm.xml"
    log_properties  : "config/sup/syssup/server/log_properties.cfg"
    oldb_prefix     : "ins1"
    req_timeout     : 60000
    ob_modes        : [
        {
            name: Engineering,
            subsystems: ['fcs1','dummy1']
        },
        {
            name: Imaging,
            subsystems: ['dummy2']
        }
    ]
    subsystems      : [
        {
            name: 'fcs1',
            scope: internal,
            type: sup::syssup::common::Generic,
            rr_endpoint: "zpb.rr://127.0.0.1:15085/StdCmds",
            ps_endpoint: "zpb.ps://127.0.0.1:15045/",
            access: true
        },
        {
            name: 'dummy1',
            scope: internal,
            type: sup::syssup::common::Generic,
            rr_endpoint: "zpb.rr://127.0.0.1:15086/StdCmds",
            ps_endpoint: "zpb.ps://127.0.0.1:15046/",
            access: false
        },
        {
            name: 'dummy2',
            scope: internal,
            type: sup::syssup::common::Generic,
            rr_endpoint: "zpb.rr://127.0.0.1:15087/StdCmds",
            ps_endpoint: "zpb.ps://127.0.0.1:15047/",
            access: true
        }
    ]

Server configuration

The Supervisor stores the actual values of the server configuration parameters in the Redis DB. This helps to verify whether the configuration has been loaded correctly. For details of the server configuration parameters, see the System Supervisor Configuration section.

Supervisor configuration Redis DB keys

Redis Key

<instrument id>.<server id>.cfg.db_endpoint

<instrument id>.<server id>.cfg.db_timeout

<instrument id>.<server id>.cfg.dictionaries

<instrument id>.<server id>.cfg.req_timeout

<instrument id>.<server id>.cfg.mon_timeout

<instrument id>.<server id>.cfg.filename

<instrument id>.<server id>.cfg.pub_endpoint

<instrument id>.<server id>.cfg.req_endpoint

<instrument id>.<server id>.cfg.scxml

<instrument id>.<server id>.cfg.oldb_prefix

<instrument id>.<server id>.cfg.log_properties

<instrument id>.<server id>.cfg.server_id

<instrument id>.<server id>.cfg.subsystem.<subsystem id>.scope

<instrument id>.<server id>.cfg.subsystem.<subsystem id>.type

<instrument id>.<server id>.cfg.subsystem.<subsystem id>.rr_endpoint

<instrument id>.<server id>.cfg.subsystem.<subsystem id>.ps_endpoint

<instrument id>.<server id>.cfg.subsystem.<subsystem id>.access
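The key layout listed above can be reproduced with a small helper, which is handy when checking the stored configuration by hand. The function names cfgKey and subsystemCfgKey are hypothetical and only mirror the pattern of the listed keys. The helper makes no assumption about which configuration value supplies the instrument id.

```cpp
#include <string>

// Builds "<instrument id>.<server id>.cfg.<parameter>".
std::string cfgKey(const std::string& instrumentId,
                   const std::string& serverId,
                   const std::string& parameter) {
    return instrumentId + "." + serverId + ".cfg." + parameter;
}

// Builds "<instrument id>.<server id>.cfg.subsystem.<subsystem id>.<parameter>".
std::string subsystemCfgKey(const std::string& instrumentId,
                            const std::string& serverId,
                            const std::string& subsystemId,
                            const std::string& parameter) {
    return cfgKey(instrumentId, serverId,
                  "subsystem." + subsystemId + "." + parameter);
}
```

For instance, with an instrument id of ins1 and a server id of sup, cfgKey yields ins1.sup.cfg.db_endpoint for the db_endpoint parameter.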

Server Status

The server stores the string representation of its state and substate into the Redis DB.

Server status Redis DB keys

Redis Key

<instrument id>.<server id>.states.state

<instrument id>.<server id>.states.substate

<instrument id>.<server id>.subsystems.<subsystem id>.states.state

<instrument id>.<server id>.subsystems.<subsystem id>.states.substate

Status Estimation

The estimated state/substate of the overall system is based on the individual subsystem states/substates and according to the following criteria:

Each known state/substate string has an associated code to simplify the estimation. In the case of the state, the estimation is just the minimum state within all managed subsystems. Normally there are only three possible cases: Undetermined, NotOperational and Operational.

In the case of the substate, the estimation is similar. The overall substate is the minimum substate, with the following exception: if at least one of the subsystems is in a transient substate such as SettingUp or Recording, the estimated substate reflects the minimum transient substate. This helps to report the ongoing activities of the managed subsystems.
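The estimation rule can be illustrated with the sketch below. The numeric ranks, the set of substates, the function names, and the handling of unknown strings and of an empty subsystem list are all assumptions made for this example. The actual coding is internal to the Supervisor Facade.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Overall state: the minimum over all subsystems.
// Assumed order: Undetermined < NotOperational < Operational.
std::string estimateState(const std::vector<std::string>& states) {
    static const std::vector<std::string> ordered = {
        "Undetermined", "NotOperational", "Operational"};
    if (states.empty()) return ordered[0];
    std::size_t lowest = ordered.size() - 1;
    for (const auto& s : states) {
        auto it = std::find(ordered.begin(), ordered.end(), s);
        // Assumption: unknown strings count as Undetermined.
        std::size_t rank = (it == ordered.end())
            ? 0 : static_cast<std::size_t>(it - ordered.begin());
        lowest = std::min(lowest, rank);
    }
    return ordered[lowest];
}

struct SubstateCode {
    int rank;        // hypothetical numeric code
    bool transient;  // true for substates that report an ongoing activity
};

// Overall substate: the minimum transient substate if any subsystem is in
// one, otherwise the minimum substate.
std::string estimateSubstate(const std::vector<std::string>& substates) {
    static const std::map<std::string, SubstateCode> table = {
        {"NotReady", {0, false}}, {"Ready", {1, false}}, {"Idle", {2, false}},
        {"SettingUp", {3, true}}, {"Recording", {4, true}}};
    const std::pair<const std::string, SubstateCode>* lowest = nullptr;
    const std::pair<const std::string, SubstateCode>* lowestTransient = nullptr;
    for (const auto& name : substates) {
        auto it = table.find(name);
        if (it == table.end()) return "Undetermined";  // assumption
        if (!lowest || it->second.rank < lowest->second.rank) lowest = &*it;
        if (it->second.transient &&
            (!lowestTransient ||
             it->second.rank < lowestTransient->second.rank)) {
            lowestTransient = &*it;
        }
    }
    if (lowestTransient) return lowestTransient->first;
    return lowest ? lowest->first : std::string("Undetermined");
}
```

With these assumed ranks, subsystems in Idle, Recording and SettingUp give an estimated substate of SettingUp, while Idle and NotReady give NotReady.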

Note

The estimation is done by a virtual method of the Supervisor Facade and it could be replaced by the applications if needed.

Warning

The estimation relies on the subsystems publishing their status according to the defined format.

Commands

The commands currently supported by the server are listed here: List of Commands.

Error Handling

Supervisor commands throw an exception in case of errors or timeouts. Client applications can catch the exceptions and obtain the associated error message with the function getDesc(). This error contains neither the history nor the error stack, but it normally indicates precisely where the error occurred. Since the CII Error service is not yet available, the Supervisor cannot use it.

Note

The specific exceptions depend on the command used.

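try {
    // Commands throw on errors and timeouts.
    auto reply = client->GetState();
} catch (const stdif::ExceptionErr& e) {
    // getDesc() returns the error message carried by the exception.
    RAD_LOG_ERROR() << "Error reply: " << e.getDesc();
}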

Serialization

The System Supervisor uses the CII MAL ZPB (ZeroMQ + Google Protocol Buffers) for serialising commands.

Note

Each command has two parts: a payload and its corresponding reply, see the details in the supif module. The normal replies are plain strings.

Setup Command

The Setup command is intended to produce a change in the run-time configuration.

Since there are no long operations associated with the Setup command, this operation is blocking. The Supervisor executes the action and then sends the reply back to the originator.

The interface definition of the Setup command can be found in module supif.

Warning

The array does not have a fixed size but it has a limit of 100 elements. A limit is needed by the CII XML ICD.

<method name="Setup" returnType="string" throws="ExceptionErr">
    <argument name="payload" type="nonBasic" nonBasicTypeName="SetupElem" arrayDimensions="(100)"/>
</method>

SubsysNames Command

The SubsysNames command reports the subsystems managed by the System Supervisor as a comma-separated list. An example of the output generated by the SubsysNames command is shown below.

$ supClient zpb.rr://134.171.3.48:30519 SubsysNames ""
subsim2, subsim3

SubsysStatus Command

The SubsysStatus command provides information about each subsystem managed by the System Supervisor. An example of the output generated by the SubsysStatus command is shown below.

$ supClient zpb.rr://134.171.3.48:30519 SubsysStatus ""
subsim2.access = true
subsim2.scope = internal
subsim2.connection_status = Connected
subsim2.state = Operational
subsim2.substate = Idle
subsim3.access = true
subsim3.scope = internal
subsim3.connection_status = Connected
subsim3.state = Operational
subsim3.substate = Idle
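A client that needs the SubsysStatus reply in structured form could split it as sketched below. The parser is only an illustration based on the sample output above: parseSubsysStatus and StatusMap are hypothetical names, and the sketch assumes one "<subsystem>.<field> = <value>" entry per line with no dots in the subsystem name.

```cpp
#include <map>
#include <sstream>
#include <string>

// subsystem name -> (field name -> value)
using StatusMap = std::map<std::string, std::map<std::string, std::string>>;

// Parses lines of the form "<subsystem>.<field> = <value>".
// Lines that do not match this shape are ignored.
StatusMap parseSubsysStatus(const std::string& output) {
    StatusMap result;
    std::istringstream lines(output);
    std::string line;
    while (std::getline(lines, line)) {
        const auto eq = line.find(" = ");
        const auto dot = line.find('.');
        if (eq == std::string::npos || dot == std::string::npos || dot > eq) {
            continue;
        }
        const std::string subsystem = line.substr(0, dot);
        const std::string field = line.substr(dot + 1, eq - dot - 1);
        result[subsystem][field] = line.substr(eq + 3);
    }
    return result;
}
```

Applied to the sample reply, the map would hold two entries, subsim2 and subsim3, each with the fields access, scope, connection_status, state and substate.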

Subscriptions

Each subsystem instance created by the factory subscribes to the status of its subsystem. The subscription uses the naming convention below. The System Supervisor relies on this convention to monitor the status of the subsystems.

Subsystem      Parameter    Endpoint
<subsystem>    status       <ps endpoint>/std/status
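The convention can be expressed as a one-line helper that derives the status URI from the configured ps_endpoint. The function name statusTopicUri is hypothetical, and stripping trailing slashes before appending the path is an assumption made here so that endpoints written with or without a final "/" give the same result.

```cpp
#include <string>

// Derives "<ps endpoint>/std/status" from a configured ps_endpoint value.
std::string statusTopicUri(std::string psEndpoint) {
    // Assumption: trailing slashes are not significant and can be dropped.
    while (!psEndpoint.empty() && psEndpoint.back() == '/') {
        psEndpoint.pop_back();
    }
    return psEndpoint + "/std/status";
}
```

For the fcs1 entry of the example configuration this gives zpb.ps://127.0.0.1:15045/std/status.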

Publishing

Like any other subsystem, the System Supervisor publishes its estimated state/substate. This can be used to build a hierarchy of subsystems.

Parameter    Endpoint
status       <ps endpoint>/std/status

Signal Handling

The Supervisor handles the SIGUSR1 signal that Nomad emits to notify changes in the template configuration file at run-time. When the Supervisor receives this signal, it reloads the configuration and reconnects to the subsystems if needed.

Supervisor Handling of Nomad Signals.
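A common POSIX pattern for reacting to SIGUSR1 is sketched below: the handler only sets a flag, and the application checks the flag from its normal processing loop before reloading. This is a generic illustration under that assumption, not the Supervisor implementation. The names g_reloadRequested, onReloadSignal, installReloadHandler and consumeReloadRequest are hypothetical.

```cpp
#include <csignal>

// Set from the signal handler, read from the main loop.
volatile std::sig_atomic_t g_reloadRequested = 0;

// Signal handlers should do as little as possible, so only a flag is set.
void onReloadSignal(int) {
    g_reloadRequested = 1;
}

// Registers the handler for SIGUSR1.
void installReloadHandler() {
    std::signal(SIGUSR1, onReloadSignal);
}

// Returns true once for each pending reload request and clears the flag.
bool consumeReloadRequest() {
    if (!g_reloadRequested) {
        return false;
    }
    g_reloadRequested = 0;
    return true;
}
```

A main loop would call consumeReloadRequest() periodically and, when it returns true, reload the configuration file and reconnect to the subsystems whose settings changed.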

Troubleshooting

Logging

The System Supervisor implements logging levels according to the log4cplus package where the concept is:

ALL < TRACE < DEBUG < INFO < WARN < ERROR < FATAL < OFF

The basic log levels supported by the SysSup for troubleshooting are listed in the table below.

Name     Verbosity    Description
ERROR    very low     Provide logging only in case of errors.
INFO     low          Provide information for the most important actions.
DEBUG    medium       Provide additional information for the developer.
TRACE    very high    Includes all the function tracing.
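The level ordering quoted earlier (ALL < TRACE < DEBUG < INFO < WARN < ERROR < FATAL < OFF) explains the verbosity column: a message is emitted when its level is at or above the configured threshold. The sketch below models that comparison with a hypothetical LogLevel enum. It does not use the log4cplus API.

```cpp
// Levels listed in increasing order, matching the ordering quoted earlier.
enum class LogLevel { All, Trace, Debug, Info, Warn, Error, Fatal, Off };

// True when a message of level `message` passes a logger configured with
// the given `threshold`.
bool isEmitted(LogLevel threshold, LogLevel message) {
    return message >= threshold;
}
```

With a threshold of INFO, ERROR messages pass and DEBUG messages are dropped, whereas a threshold of TRACE lets everything from TRACE upwards through.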

To activate a new logging level, use the SetLogLevel command. See the example below.

$ supClient zpb.rr://134.171.3.48:30519 SetLogLevel "TRACE"

Logging Configuration File

The behavior of the logging can be controlled using a log property file. The SysSup provides a simple property file that defines the basic configuration to get the logging on the console. More complex configurations are possible by providing a custom property file.