System Supervisor

The System Supervisor provides the functionality for supervision and monitoring of a configurable set of subsystems.

System Supervisor.

The main components of the System Supervisor server are:

  • State Machine engine based on SCXML and implemented in RAD. It contains a set of action and activity classes.

  • A Subsystem Factory class that creates the instances of all subsystem classes at start-up, based on the server configuration.

  • A Facade class that manages the interface between the state machine engine and the subsystem classes.

The System Supervisor uses the Redis database to store run-time information about itself and about the subsystems it monitors. It subscribes to the status information published by the subsystems and, like any other subsystem, publishes its own status.

System Supervisor State Machine

The System Supervisor uses a state machine described in SCXML format (SCXML specification), which is interpreted by the state machine engine provided by the RAD application framework.

System Supervisor State Machine Diagram.

Off –> NotReady, event: Startup

The System Supervisor starts up and goes automatically to NotOperational/NotReady. The main server objects are instantiated, including the basic application that uses the state machine engine. The System Supervisor reads its own configuration and completes its initialisation. It then connects to the configured subsystems and requests their current status. With this information it computes the estimated state of the system. The System Supervisor adapts its own state according to the overall state of the subsystems: if all the subsystems are Operational/Idle, the System Supervisor triggers an internal event to go to Operational/Idle. The same applies in any state, so the System Supervisor can also go from Operational/Idle back to NotOperational/NotReady.

NotReady –> Ready, event: Init

The Supervisor dispatches the Init command to all the configured subsystems whose access is enabled. Depending on the replies received from the subsystems, it goes to the Ready substate or remains in the NotReady substate. If at least one command returns with an error or a timeout, the Supervisor does not reach the Ready substate.

NotOperational/Ready –> Operational/Idle, event: Enable

The System Supervisor goes through the Enabling transient state. If the configured subsystems are already Operational, the Supervisor does not affect their state and goes immediately to the Operational state. If the subsystems are not Operational, the System Supervisor dispatches the Enable request to each of the configured subsystems whose access is enabled. If at least one request fails, it replies with a failure and remains in the NotOperational/Ready state.
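The dispatch rule used by Init and Enable can be sketched as follows. This is only an illustration under stated assumptions: the Subsystem struct and the dispatchToAll function are hypothetical names, the actual sending is abstracted behind a callback, and continuing after the first failure is an assumption, since the text only states that a single failure prevents the transition.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified view of a configured subsystem.
struct Subsystem {
    std::string id;
    bool internal;  // scope: internal (true) or external (false)
    bool access;    // the <subsystem id>::access flag
};

// Sends one request to every internal subsystem whose access is enabled.
// Returns true only if every request that was sent succeeded.
// 'send' stands in for the real request/reply call and returns false
// on error or timeout.
inline bool dispatchToAll(const std::vector<Subsystem>& subsystems,
                          const std::function<bool(const Subsystem&)>& send) {
    bool allOk = true;
    for (const auto& s : subsystems) {
        if (!s.internal || !s.access) {
            continue;  // monitored only, no request is forwarded
        }
        if (!send(s)) {
            allOk = false;  // assumption: keep going, report failure at the end
        }
    }
    return allOk;
}
```

A caller would stay in its current substate whenever dispatchToAll returns false.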

Operational –> NotOperational/Ready, event: Disable

The System Supervisor is moved back to the NotOperational/Ready substate. The subsystems are not affected by this transition and keep their current state/substate.

NotOperational/Ready –> NotOperational/NotReady, event: Reset

The System Supervisor is moved back to the NotOperational/NotReady substate. The subsystems are not affected by this transition and keep their current state/substate.
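The transitions described in this section can be summarised as a lookup table. The following is a minimal sketch and not the SCXML model used by the server: the SupState enum and the nextState function are hypothetical names, Operational is represented only by its Idle substate, and the transient Enabling state and the internal events triggered by subsystem status changes are left out.

```cpp
#include <map>
#include <string>
#include <utility>

// Hypothetical flattened view of the supervisor state/substate pairs.
enum class SupState { Off, NotReady, Ready, Idle };

// Returns the target state for a (state, event) pair, assuming the
// request succeeds. Events that are not accepted in the current state
// leave the state unchanged.
inline SupState nextState(SupState current, const std::string& event) {
    static const std::map<std::pair<SupState, std::string>, SupState> table = {
        {{SupState::Off, "Startup"}, SupState::NotReady},
        {{SupState::NotReady, "Init"}, SupState::Ready},
        {{SupState::Ready, "Enable"}, SupState::Idle},
        {{SupState::Idle, "Disable"}, SupState::Ready},
        {{SupState::Ready, "Reset"}, SupState::NotReady},
    };
    const auto it = table.find(std::make_pair(current, event));
    return it == table.end() ? current : it->second;
}
```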

Configuration

System Supervisor Configuration

The server configuration is a file written in YAML format (YAML specification). YAML is an easy-to-read format that has been adopted temporarily until the integration with the CII configuration services.

Many resources about YAML can be found on the web. One could also validate the format online, see http://yaml.org/spec/

Note

The entry point for the System Supervisor configuration is the file that contains the server configuration.

server_id

This is the id associated with the specific server. It groups all the server configuration parameters and is also used as the prefix for the DB keys.

<server id>::req_endpoint

This is the endpoint for CII MAL request/reply. The server will listen to incoming commands using this endpoint.

<server id>::pub_endpoint

This is the endpoint for CII MAL pub/sub. The server will publish device topics using this endpoint.

<server id>::db_endpoint

This is the endpoint used by the server for connecting to the Redis DB.

<server id>::db_timeout

This is the server timeout for connecting to the Redis DB.

<server id>::scxml

This is the state machine specification file used by the server.

<server id>::subsystems

This is the list of subsystems active in the supervisor configuration. Only subsystems listed here will be managed by the supervisor.

<server id>::conntout

Timeout for establishing connections to the subsystems. This value should rarely be larger than a few seconds. Default is 500 [ms].

<server id>::cmdtout

General command timeout for sending commands to subsystems.

Each subsystem has its own set of configuration parameters:

<subsystem id>::scope

This is the scope of the subsystem. It can be internal or external. Requests are not forwarded to external subsystems; the System Supervisor only monitors them.

<subsystem id>::type

This is the subsystem type class. Normally a subsystem uses the provided class: sup::syssup::common::Generic

<subsystem id>::rr_endpoint

This is the endpoint for the subsystem CII MAL request/reply. The subsystem listens to incoming commands using this endpoint.

<subsystem id>::ps_endpoint

This is the endpoint for the subsystem CII MAL pub/sub. The subsystem publishes its status using this endpoint.

<subsystem id>::access

This is a flag to enable/disable accessibility of a subsystem.

An example of a server configuration is provided below.

server_id           : 'ins1.sup'
ins1.sup:
    req_endpoint    : "zpb.rr://127.0.0.1:13082/"
    pub_endpoint    : "zpb.ps://127.0.0.1:13345/"
    db_endpoint     : "127.0.0.1:6379"
    db_timeout      : 2
    scxml           : "sup/syssup/server/sm.xml"

    subsystems      : ['dummy1','dummy2']
    op_modes        : ['day', 'night']
    conntout        : 500
    cmdtout         : 20000


fcs1:
    scope: internal
    type: sup::syssup::common::Generic
    rr_endpoint: "zpb.rr://127.0.0.1:12082/StdCmds"
    ps_endpoint: "zpb.ps://127.0.0.1:12345/"
    access: true

dummy1:
    scope: internal
    type: sup::syssup::common::Generic
    rr_endpoint: "zpb.rr://127.0.0.1:15080/StdCmds"
    ps_endpoint: "zpb.ps://127.0.0.1:15040/"
    access: false

dummy2:
    scope: internal
    type: sup::syssup::common::Generic
    rr_endpoint: "zpb.rr://127.0.0.1:15082/StdCmds"
    ps_endpoint: "zpb.ps://127.0.0.1:15041/"
    access: true

Server configuration

The supervisor stores the actual values of the server configuration parameters in the Redis DB. This helps to verify whether the configuration has been loaded correctly. For details of the server configuration parameters, see the System Supervisor Configuration section.

Supervisor configuration Redis DB keys

Redis Key

<instrument id>.<server id>.cfg.db_endpoint

<instrument id>.<server id>.cfg.db_timeout

<instrument id>.<server id>.cfg.cmdtout

<instrument id>.<server id>.cfg.conntout

<instrument id>.<server id>.cfg.filename

<instrument id>.<server id>.cfg.loglevel

<instrument id>.<server id>.cfg.pub_endpoint

<instrument id>.<server id>.cfg.req_endpoint

<instrument id>.<server id>.cfg.subsystems

<instrument id>.<server id>.cfg.scxml

<instrument id>.<server id>.cfg.<subsystem id>.scope

<instrument id>.<server id>.cfg.<subsystem id>.type

<instrument id>.<server id>.cfg.<subsystem id>.rr_endpoint

<instrument id>.<server id>.cfg.<subsystem id>.ps_endpoint

<instrument id>.<server id>.cfg.<subsystem id>.access

Server Status

The server stores the string representation of its state and substate into the Redis DB.

Server status Redis DB keys

Redis Key

<instrument id>.<server id>.state

<instrument id>.<server id>.substate

<instrument id>.<server id>.estimated_state

<instrument id>.<server id>.estimated_substate

<instrument id>.<server id>.<subsystem id>.state

<instrument id>.<server id>.<subsystem id>.substate
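The key patterns above are plain dot-separated strings, so a client that wants to read them from Redis only has to assemble the name. The helper functions below are hypothetical and not part of the supervisor; the example ids in the usage note assume that the server_id 'ins1.sup' from the configuration example corresponds to instrument id ins1 and server id sup.

```cpp
#include <string>

// Builds "<instrument id>.<server id>.<field>", for example the
// key that holds the supervisor state or substate.
inline std::string serverKey(const std::string& instrumentId,
                             const std::string& serverId,
                             const std::string& field) {
    return instrumentId + "." + serverId + "." + field;
}

// Builds "<instrument id>.<server id>.<subsystem id>.<field>", the
// key that holds the state or substate of one monitored subsystem.
inline std::string subsystemKey(const std::string& instrumentId,
                                const std::string& serverId,
                                const std::string& subsystemId,
                                const std::string& field) {
    return serverKey(instrumentId, serverId, subsystemId + "." + field);
}
```

Under the assumption above, serverKey("ins1", "sup", "estimated_state") yields ins1.sup.estimated_state and subsystemKey("ins1", "sup", "dummy2", "substate") yields ins1.sup.dummy2.substate.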

Status Estimation

The estimated state/substate of the overall system is based on the individual subsystem states/substates and according to the following criteria:

Each known state/substate string has an associated code to simplify the estimation. For the state, the estimation is simply the minimum state among all managed subsystems. Normally there are only three possible cases: Undetermined, NotOperational and Operational.

For the substate, the estimation is similar. The overall substate is the minimum substate, with the following exception: if at least one subsystem is in a transient substate such as SettingUp or Recording, the estimated substate reflects the minimum transient substate. This helps to report the ongoing activities of the managed subsystems.
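The two rules can be sketched as follows. The numeric codes are assumptions made for illustration (the document does not list the real coding), as are the type and function names and the choice of returning Undetermined, or a caller-supplied code, when no subsystem is configured.

```cpp
#include <algorithm>
#include <vector>

// Assumed ordering: a lower code means a "lower" state.
enum class State { Undetermined = 0, NotOperational = 1, Operational = 2 };

// Hypothetical substate representation: a numeric code plus a flag that
// marks transient substates such as SettingUp or Recording.
struct Substate {
    int code;
    bool transient;
};

// The estimated state is the minimum state among all subsystems.
inline State estimateState(const std::vector<State>& states) {
    if (states.empty()) {
        return State::Undetermined;  // assumption for the empty case
    }
    return *std::min_element(states.begin(), states.end());
}

// The estimated substate is the minimum transient substate if any
// subsystem is in a transient substate, otherwise the overall minimum.
inline int estimateSubstate(const std::vector<Substate>& substates,
                            int undeterminedCode = 0) {
    bool found = false;
    bool foundTransient = false;
    int minAll = 0;
    int minTransient = 0;
    for (const auto& s : substates) {
        if (!found || s.code < minAll) {
            minAll = s.code;
            found = true;
        }
        if (s.transient && (!foundTransient || s.code < minTransient)) {
            minTransient = s.code;
            foundTransient = true;
        }
    }
    if (foundTransient) {
        return minTransient;
    }
    return found ? minAll : undeterminedCode;
}
```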

Note

The estimation is done by a virtual method of the Supervisor Facade, which applications can override if needed.

Warning

The estimation relies on the subsystems publishing their status according to the defined format.

Commands

The commands currently supported by the server are listed here: List of Commands.

Error Handling

Supervisor commands throw an exception in case of errors or timeouts. Client applications can catch the exception and obtain the associated error message with the function getDesc(). This message contains neither the history nor the error stack, but it normally indicates precisely where the error occurred. Since the CII Error service is not yet available, the Supervisor cannot use it.

Note

The specific exceptions depend on the command used.

try {
    auto reply = client->GetState();
} catch (const stdif::ExceptionErr& e) {
    // getDesc() returns the error message associated with the exception.
    RAD_LOG_ERROR() << "Error reply (" << e.getDesc() << ").";
}

Serialization

The System Supervisor uses the CII MAL ZPB (ZeroMQ + Google Protocol Buffers) for serialising commands.

Note

Each command has two parts: a payload and its corresponding reply, see the details in the supif module. The normal replies are plain strings.

Setup Command

The Setup command is intended to produce a change in the run-time configuration.

Since no long-running operations are associated with the Setup command, it is blocking. The Supervisor executes the action and then sends the reply back to the originator.

The interface definition of the Setup command can be found in module supif.

Warning

The payload array does not have a fixed size, but it is limited to 100 elements. A limit is required by the CII XML ICD.

<method name="Setup" returnType="string" throws="ExceptionErr">
    <argument name="payload" type="nonBasic" nonBasicTypeName="SetupElem" arrayDimensions="(100)"/>
</method>

SubsysNames Command

The SubsysNames command reports the subsystems managed by the System Supervisor as a comma-separated list. An example of the output generated by the SubsysNames command is shown below.

$ supClient zpb.rr://134.171.3.48:30519 SubsysNames ""
subsim2, subsim3

SubsysStatus Command

The SubsysStatus command provides information about each subsystem managed by the System Supervisor. An example of the output generated by the SubsysStatus command is shown below.

$ supClient zpb.rr://134.171.3.48:30519 SubsysStatus ""
subsim2.access = true
subsim2.scope = internal
subsim2.connection_status = Connected
subsim2.state = Operational
subsim2.substate = Idle
subsim3.access = true
subsim3.scope = internal
subsim3.connection_status = Connected
subsim3.state = Operational
subsim3.substate = Idle
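Output of this kind is easy to post-process on the client side. The sketch below assumes that the reply consists of lines of the form "<subsystem>.<field> = <value>", exactly as printed above; the function name parseSubsysStatus is hypothetical and not part of the supervisor client.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses "<subsystem>.<field> = <value>" lines into a map keyed by
// "<subsystem>.<field>". Lines without the " = " separator are skipped.
inline std::map<std::string, std::string> parseSubsysStatus(const std::string& text) {
    std::map<std::string, std::string> result;
    const std::string separator = " = ";
    std::istringstream input(text);
    std::string line;
    while (std::getline(input, line)) {
        const std::string::size_type pos = line.find(separator);
        if (pos == std::string::npos) {
            continue;
        }
        result[line.substr(0, pos)] = line.substr(pos + separator.size());
    }
    return result;
}
```

With the output shown above, looking up the key subsim2.state in the returned map gives Operational.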

Subscriptions

Each subsystem instance created by the factory subscribes to the status of its subsystem. The subscription uses the naming convention shown in the table. The System Supervisor relies on this convention to monitor the status of the subsystems.

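===========  =========  ========================
Subsystem    Parameter  Endpoint
===========  =========  ========================
<subsystem>  status     <ps endpoint>/std/status
===========  =========  ========================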
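As an illustration of the convention, the status topic can be derived from the configured ps_endpoint. The function below is a hypothetical helper; dropping a trailing slash before appending the suffix is an assumption made here because the endpoints in the configuration example end with "/".

```cpp
#include <string>

// Builds "<ps endpoint>/std/status" from a subsystem ps_endpoint.
// Trailing slashes are removed first so that the result does not
// contain a double slash (assumption, see the lead-in).
inline std::string statusTopic(const std::string& psEndpoint) {
    std::string base = psEndpoint;
    while (!base.empty() && base.back() == '/') {
        base.pop_back();
    }
    return base + "/std/status";
}
```

For the dummy2 entry of the configuration example, statusTopic("zpb.ps://127.0.0.1:15041/") gives zpb.ps://127.0.0.1:15041/std/status.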

Publishing

Like any other subsystem, the System Supervisor publishes its estimated state/substate. This can be used to build a hierarchy of subsystems.

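=========  ========================
Parameter  Endpoint
=========  ========================
status     <ps endpoint>/std/status
=========  ========================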

Signal Handling

The supervisor handles the SIGUSR1 signal that Nomad emits to notify changes in the template configuration file at run-time. When the Supervisor receives this signal, it reloads the configuration and reconnects to the subsystems if needed.

Supervisor Handling of Nomad Signals.
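A common pattern for this kind of reload-on-signal behaviour is to set a flag in the handler and do the actual work in the main loop. The sketch below only illustrates that pattern with the standard library on a POSIX system; it is not the supervisor's implementation, and all names in it are hypothetical.

```cpp
#include <csignal>

// Flag set by the handler and polled by the main loop. A volatile
// std::sig_atomic_t can be written safely from a signal handler.
static volatile std::sig_atomic_t reloadRequested = 0;

// Signal handler: only records that SIGUSR1 arrived.
static void onSigusr1(int /*signum*/) {
    reloadRequested = 1;
}

// Registers the handler for SIGUSR1.
inline void installReloadHandler() {
    std::signal(SIGUSR1, onSigusr1);
}

// Returns true once per pending request and clears the flag. The caller
// would then reload the configuration and reconnect where needed.
inline bool consumeReloadRequest() {
    if (reloadRequested == 0) {
        return false;
    }
    reloadRequested = 0;
    return true;
}
```

Keeping the handler this small avoids calling functions that are not safe inside a signal handler, such as file I/O or memory allocation.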

Troubleshooting

Logging

The System Supervisor implements six logging levels that provide developers with additional information for troubleshooting.

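======  =========  =============================================================
Name    Verbosity  Description
======  =========  =============================================================
ERROR   very low   Provide logging only in case of errors.
INFO    low        Provide information for the most important actions (default).
DEBUG   medium     Provide additional information for the developer.
DEBUG2  high       Includes more details such as Node IDs of OPC-UA attributes.
DEBUG3  high       Includes the logging of each subscription event.
TRACE   very high  Includes all the function tracing.
======  =========  =============================================================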

To activate a different logging level, use the SetLog command. See the example below.

$ supClient zpb.rr://134.171.3.48:30519 SetLog "TRACE"