System Supervisor
The System Supervisor provides the functionality for supervision and monitoring of a configurable set of subsystems.
The main components of the System Supervisor server are:
- A State Machine engine, based on SCXML and implemented in RAD, which contains a set of action and activity classes.
- A Subsystem Factory class that, at start-up, creates the instances of all subsystem classes based on the server configuration.
- A Facade class that manages the interface between the state machine engine and the subsystem classes.
The System Supervisor uses the Redis database to store run-time information about itself and about the subsystems it monitors. It subscribes to the status information published by the subsystems and, like any other subsystem, publishes its own status.
System Supervisor State Machine
The System Supervisor uses a state machine described in SCXML format (see the SCXML specification), which is interpreted by the state machine engine provided by the RAD application framework.
Off –> NotReady, event: Startup
The System Supervisor starts up and automatically goes to NotOperational/NotReady. The main server objects are instantiated, including the basic application that uses the state machine engine. The System Supervisor reads its own configuration and completes its initialisation. It then connects to the configured subsystems and requests their current status, from which it computes the estimated state of the system. The System Supervisor adapts its own state to the overall state of the subsystems: if all the subsystems are Operational/Idle, the System Supervisor triggers an internal event to go to Operational/Idle. This adaptation applies in any state, so the System Supervisor can also go from Operational/Idle back to NotOperational/NotReady.
NotReady –> Ready, event: Init
The Supervisor dispatches the Init command to all configured subsystems whose access is enabled. Depending on the replies received from the subsystems, it goes to the Ready substate or remains in the NotReady substate. If at least one command returns an error or times out, the Supervisor does not reach the Ready substate.
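The Init rule can be sketched as a loop that skips subsystems with access disabled and fails as soon as any reply is not successful, while still sending Init to every remaining subsystem. This is a minimal sketch, not the Facade code: `SubsystemLink`, `CmdResult` and the `sendInit` callback are hypothetical stand-ins for the real subsystem classes and CII MAL calls. Skipping external subsystems is an assumption taken from the `scope` parameter description (requests are not forwarded to them).

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical outcome of one command sent to a subsystem.
enum class CmdResult { Ok, Error, Timeout };

// Hypothetical view of one configured subsystem (not the real Facade API).
struct SubsystemLink {
    std::string name;
    bool access = true;     // <subsystem id>::access
    bool external = false;  // true when <subsystem id>::scope is "external"
};

// Sends Init to every subsystem that accepts requests and returns true only
// if all of them replied successfully, i.e. the Supervisor may enter Ready.
bool DispatchInit(const std::vector<SubsystemLink>& subsystems,
                  const std::function<CmdResult(const std::string&)>& sendInit) {
    bool allOk = true;
    for (const auto& s : subsystems) {
        if (!s.access || s.external) {
            continue;  // access disabled or monitor-only: no request is sent
        }
        // Keep dispatching after a failure so every subsystem receives Init;
        // a single error or timeout is enough to stay in NotReady.
        if (sendInit(s.name) != CmdResult::Ok) {
            allOk = false;
        }
    }
    return allOk;
}
```

With `dummy1` (access false) and `dummy2` (access true) from the example configuration, only `dummy2` receives Init, so only its reply decides the outcome.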
NotOperational/Ready –> Operational/Idle, event: Enable
The System Supervisor goes through the Enabling transient state. If the configured subsystems are already Operational, the Supervisor does not affect their state and goes immediately to the Operational state. Otherwise, the System Supervisor dispatches the Enable request to each configured subsystem whose access is enabled. If at least one request fails, it replies with a failure and remains in the NotOperational/Ready state.
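One reading of the Enable rule is sketched below: subsystems that are already Operational are left untouched, the others (with access enabled) receive Enable, and any failed request keeps the Supervisor in NotOperational/Ready. Whether the real server applies the "already Operational" check per subsystem, as assumed here, or to the set as a whole is not stated above. `SubsystemView`, `HandleEnable` and `sendEnable` are hypothetical names.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical snapshot of one subsystem as seen by the Supervisor.
struct SubsystemView {
    std::string name;
    bool access = true;  // <subsystem id>::access
    std::string state;   // e.g. "Operational" or "NotOperational"
};

// Returns true if the Supervisor may go to Operational/Idle,
// false if it must reply with a failure and stay in NotOperational/Ready.
bool HandleEnable(const std::vector<SubsystemView>& subsystems,
                  const std::function<bool(const std::string&)>& sendEnable) {
    bool ok = true;
    for (const auto& s : subsystems) {
        if (!s.access || s.state == "Operational") {
            continue;  // disabled, or already Operational: state is not affected
        }
        if (!sendEnable(s.name)) {
            ok = false;  // at least one failed request blocks the transition
        }
    }
    return ok;
}
```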
Operational –> NotOperational/Ready, event: Disable
The System Supervisor is moved back to the NotOperational/Ready substate. The subsystems are not affected by this transition and keep their current state/substate.
NotOperational/Ready –> NotOperational/NotReady, event: Reset
The System Supervisor is moved back to the NotOperational/NotReady substate. The subsystems are not affected by this transition and keep their current state/substate.
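The externally triggered transitions listed above can be summarised as a lookup table from (state, event) to the target state reached on success. This is only an illustration: the real behaviour is defined by the SCXML file configured in `<server id>::scxml`, and the sketch deliberately omits the Enabling transient state, failure paths and the internal events raised by the status estimation. `NextState` is a hypothetical helper, and Disable is modelled only from Operational/Idle.

```cpp
#include <map>
#include <optional>
#include <string>
#include <utility>

// Simplified (state, event) -> target table for successful transitions.
// Returns std::nullopt when the event is not accepted in the given state.
std::optional<std::string> NextState(const std::string& state, const std::string& event) {
    static const std::map<std::pair<std::string, std::string>, std::string> table = {
        {{"Off", "Startup"}, "NotOperational/NotReady"},
        {{"NotOperational/NotReady", "Init"}, "NotOperational/Ready"},
        {{"NotOperational/Ready", "Enable"}, "Operational/Idle"},
        {{"Operational/Idle", "Disable"}, "NotOperational/Ready"},
        {{"NotOperational/Ready", "Reset"}, "NotOperational/NotReady"},
    };
    auto it = table.find({state, event});
    if (it == table.end()) {
        return std::nullopt;
    }
    return it->second;
}
```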
Configuration
System Supervisor Configuration
The server configuration is a file written in YAML format (see the YAML specification). YAML is an easy-to-read format that has been adopted temporarily until the integration with the CII configuration services.
The YAML specification is available at http://yaml.org/spec/, and many other resources about YAML, including online validators, can be found on the web.
Note
The entry point for the System Supervisor configuration is the file that contains the server configuration.
server_id
This is the id of the specific server. All server configuration parameters are grouped under this id, and it is also used as the prefix for the DB keys.
<server id>::req_endpoint
This is the endpoint for CII MAL request/reply. The server listens for incoming commands on this endpoint.
<server id>::pub_endpoint
This is the endpoint for CII MAL pub/sub. The server will publish device topics using this endpoint.
<server id>::db_endpoint
This is the endpoint used by the server for connecting to the Redis DB.
<server id>::db_timeout
This is the server timeout for connecting to the Redis DB.
<server id>::scxml
This is the state machine specification file used by the server.
<server id>::subsystems
This is the list of subsystems active in the supervisor configuration. Only subsystems listed here will be managed by the supervisor.
<server id>::conntout
Timeout for establishing connections to the subsystems. This value should rarely be larger than a few seconds. Default is 500 [ms].
<server id>::cmdtout
General command timeout for sending commands to subsystems.
Each subsystem has its own set of configuration parameters:
<subsystem id>::scope
This is the scope of the subsystem. It can be internal or external. Requests are not forwarded to external subsystems; the System Supervisor only monitors them.
<subsystem id>::type
This is the subsystem type class. Normally, subsystems use the provided class: sup::syssup::common::Generic
<subsystem id>::rr_endpoint
This is the endpoint for the subsystem CII MAL request/reply. The subsystem listens for incoming commands on this endpoint.
<subsystem id>::ps_endpoint
This is the endpoint for the subsystem CII MAL pub/sub. The subsystem publishes its status on this endpoint.
<subsystem id>::access
This is a flag to enable/disable accessibility of a subsystem.
An example of a server configuration is provided below.
server_id : 'ins1.sup'
ins1.sup:
  req_endpoint : "zpb.rr://127.0.0.1:13082/"
  pub_endpoint : "zpb.ps://127.0.0.1:13345/"
  db_endpoint : "127.0.0.1:6379"
  db_timeout : 2
  scxml : "sup/syssup/server/sm.xml"
  subsystems : ['dummy1','dummy2']
  op_modes : ['day', 'night']
  conntout : 500
  cmdtout : 20000
fcs1:
  scope: internal
  type: sup::syssup::common::Generic
  rr_endpoint: "zpb.rr://127.0.0.1:12082/StdCmds"
  ps_endpoint: "zpb.ps://127.0.0.1:12345/"
  access: true
dummy1:
  scope: internal
  type: sup::syssup::common::Generic
  rr_endpoint: "zpb.rr://127.0.0.1:15080/StdCmds"
  ps_endpoint: "zpb.ps://127.0.0.1:15040/"
  access: false
dummy2:
  scope: internal
  type: sup::syssup::common::Generic
  rr_endpoint: "zpb.rr://127.0.0.1:15082/StdCmds"
  ps_endpoint: "zpb.ps://127.0.0.1:15041/"
  access: true
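In the example above, `fcs1` has a section but is not listed in `subsystems`, so it is not managed. The reverse mistake (a listed subsystem without a section) is worth catching at load time. The sketch below assumes the YAML has already been parsed (for instance by a YAML library) into a plain map; `SubsystemCfg` and `CheckSubsystems` are hypothetical names, and the checks shown are illustrative rather than the server's actual validation.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one subsystem section after YAML parsing.
struct SubsystemCfg {
    std::string scope;  // "internal" or "external"
    std::string type;   // e.g. "sup::syssup::common::Generic"
    std::string rr_endpoint;
    std::string ps_endpoint;
    bool access = true;
};

// Returns human-readable problems; an empty result means the subsystem part
// of the configuration looks consistent.
std::vector<std::string> CheckSubsystems(const std::vector<std::string>& subsystems,
                                         const std::map<std::string, SubsystemCfg>& sections) {
    std::vector<std::string> problems;
    for (const auto& id : subsystems) {
        auto it = sections.find(id);
        if (it == sections.end()) {
            problems.push_back(id + ": listed in subsystems but has no section");
            continue;
        }
        const SubsystemCfg& cfg = it->second;
        if (cfg.scope != "internal" && cfg.scope != "external") {
            problems.push_back(id + ": scope must be internal or external");
        }
        if (cfg.ps_endpoint.empty()) {
            problems.push_back(id + ": missing ps_endpoint");
        }
    }
    // Sections not listed in subsystems (like fcs1) are ignored on purpose,
    // because only listed subsystems are managed.
    return problems;
}
```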
Server configuration
The Supervisor stores the actual values of the server configuration parameters in the Redis DB. This helps to verify whether the configuration has been loaded correctly. For details of the server configuration parameters, see System Supervisor Configuration.
Redis Key |
---|
<instrument id>.<server id>.cfg.db_endpoint |
<instrument id>.<server id>.cfg.db_timeout |
<instrument id>.<server id>.cfg.cmdtout |
<instrument id>.<server id>.cfg.conntout |
<instrument id>.<server id>.cfg.filename |
<instrument id>.<server id>.cfg.loglevel |
<instrument id>.<server id>.cfg.pub_endpoint |
<instrument id>.<server id>.cfg.req_endpoint |
<instrument id>.<server id>.cfg.subsystems |
<instrument id>.<server id>.cfg.scxml |
<instrument id>.<server id>.cfg.<subsystem id>.scope |
<instrument id>.<server id>.cfg.<subsystem id>.type |
<instrument id>.<server id>.cfg.<subsystem id>.rr_endpoint |
<instrument id>.<server id>.cfg.<subsystem id>.ps_endpoint |
<instrument id>.<server id>.cfg.<subsystem id>.access |
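The configuration keys follow a fixed pattern, so a client or test script can derive them instead of hard-coding them. This minimal sketch assumes, from the `server_id` description, that the `server_id` value (for example `ins1.sup`) supplies the `<instrument id>.<server id>` prefix; `CfgKey` and `SubsystemCfgKey` are hypothetical helpers.

```cpp
#include <string>

// Builds "<prefix>.cfg.<param>"; prefix is assumed to be the server_id
// value, e.g. "ins1.sup" for "<instrument id>.<server id>".
std::string CfgKey(const std::string& prefix, const std::string& param) {
    return prefix + ".cfg." + param;
}

// Builds "<prefix>.cfg.<subsystem id>.<param>" for per-subsystem entries.
std::string SubsystemCfgKey(const std::string& prefix, const std::string& subsystem,
                            const std::string& param) {
    return prefix + ".cfg." + subsystem + "." + param;
}
```

For example, `SubsystemCfgKey("ins1.sup", "dummy1", "access")` yields `ins1.sup.cfg.dummy1.access`.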
Server Status
The server stores the string representation of its state and substate into the Redis DB.
Redis Key |
---|
<instrument id>.<server id>.state |
<instrument id>.<server id>.substate |
<instrument id>.<server id>.estimated_state |
<instrument id>.<server id>.estimated_substate |
<instrument id>.<server id>.<subsystem id>.state |
<instrument id>.<server id>.<subsystem id>.substate |
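To inspect the run-time status, the full list of status keys can be generated from the key prefix and the managed subsystems. As above, the prefix is assumed to be the `server_id` value, and `StatusKeys` is a hypothetical helper.

```cpp
#include <string>
#include <vector>

// Lists the status keys expected for the Supervisor itself, followed by
// the state/substate keys of each managed subsystem.
std::vector<std::string> StatusKeys(const std::string& prefix,
                                    const std::vector<std::string>& subsystems) {
    std::vector<std::string> keys = {
        prefix + ".state", prefix + ".substate",
        prefix + ".estimated_state", prefix + ".estimated_substate",
    };
    for (const auto& id : subsystems) {
        keys.push_back(prefix + "." + id + ".state");
        keys.push_back(prefix + "." + id + ".substate");
    }
    return keys;
}
```

Assuming each entry is stored as a plain string key, a single value can then be read with, for example, `redis-cli -h 127.0.0.1 -p 6379 GET ins1.sup.estimated_state`.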
Status Estimation
The estimated state/substate of the overall system is based on the individual subsystem states/substates, according to the following criteria:
Each known state/substate string is mapped to a code to simplify the estimation. For the state, the estimate is simply the minimum state across all managed subsystems. Normally there are only three possible cases: Undetermined, NotOperational and Operational.
For the substate, the estimation is similar: the overall substate is the minimum substate, with one exception. If at least one subsystem is in a transient substate, such as SettingUp or Recording, the estimated substate is the minimum transient substate. This helps to report the ongoing activities of the managed subsystems.
Note
The estimation is done by a virtual method of the Supervisor Facade, so applications can override it if needed.
Warning
The estimation relies on the subsystems publishing their status in the defined format.
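The estimation criteria can be sketched as follows. The numeric codes and the set of substates are hypothetical: the actual coding lives in the Supervisor Facade and is not listed here, so only the relative order (lower code means "less ready") and the transient flag matter. Treating an unknown state string as Undetermined, and ignoring unknown substates, are further assumptions of this sketch.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical state codes: a lower code means "less ready".
const std::map<std::string, int> kStateCode = {
    {"Undetermined", 0}, {"NotOperational", 1}, {"Operational", 2}};

// Hypothetical substate codes; transient marks activity substates.
struct SubstateInfo {
    int code;
    bool transient;
};
const std::map<std::string, SubstateInfo> kSubstateCode = {
    {"NotReady", {10, false}}, {"Ready", {20, false}}, {"Idle", {30, false}},
    {"SettingUp", {40, true}}, {"Recording", {50, true}}};

struct Status {
    std::string state;
    std::string substate;
};

// Estimated state: the minimum state over all subsystems.
// An empty list or an unknown state string yields "Undetermined".
std::string EstimateState(const std::vector<Status>& subs) {
    std::string result = "Undetermined";
    int minCode = -1;  // -1 means "nothing seen yet"
    for (const auto& s : subs) {
        auto it = kStateCode.find(s.state);
        if (it == kStateCode.end()) {
            return "Undetermined";  // lowest possible state, nothing can be lower
        }
        if (minCode < 0 || it->second < minCode) {
            minCode = it->second;
            result = s.state;
        }
    }
    return result;
}

// Estimated substate: the minimum substate, unless some subsystem is in a
// transient substate, in which case the minimum transient substate is used.
// Unknown substates are ignored; an empty string means nothing was known.
std::string EstimateSubstate(const std::vector<Status>& subs) {
    const SubstateInfo* minAny = nullptr;
    const SubstateInfo* minTransient = nullptr;
    std::string anyName;
    std::string transientName;
    for (const auto& s : subs) {
        auto it = kSubstateCode.find(s.substate);
        if (it == kSubstateCode.end()) {
            continue;
        }
        const SubstateInfo& info = it->second;
        if (!minAny || info.code < minAny->code) {
            minAny = &info;
            anyName = s.substate;
        }
        if (info.transient && (!minTransient || info.code < minTransient->code)) {
            minTransient = &info;
            transientName = s.substate;
        }
    }
    return minTransient ? transientName : anyName;
}
```

With one subsystem in NotOperational/Ready and others Recording and SettingUp, the estimate is NotOperational/SettingUp: the state follows the minimum, while the substate reports the ongoing activity.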
Commands
The commands currently supported by the server are listed here: List of Commands.
Error Handling
Supervisor commands throw an exception in case of errors or timeouts. Client applications can catch the exceptions and obtain the associated error message with the function getDesc(). This error contains neither the history nor the error stack, but it normally indicates precisely where the error occurred. Since the CII Error service is not yet available, the Supervisor cannot use it.
Note
The specific exceptions depend on the command used.
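try {
    // Any Supervisor command may throw on error or timeout.
    auto reply = client->GetState();
} catch (const stdif::ExceptionErr& e) {
    // getDesc() returns the error message (no history or error stack).
    RAD_LOG_ERROR() << "Error reply: " << e.getDesc();
}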
Serialization
The System Supervisor uses CII MAL ZPB (ZeroMQ + Google Protocol Buffers) for serialising commands.
Note
Each command has two parts, a payload and its corresponding reply; see the supif module for details. Normal replies are plain strings.
Setup Command
The Setup command is intended to change the run-time configuration.
Since no long-running operation is associated with the Setup command, the command is blocking: the Supervisor executes the action and then sends the reply back to the originator.
The interface definition of the Setup command can be found in module supif.
Warning
The payload array does not have a fixed size, but it is limited to 100 elements because the CII XML ICD requires a bound.
<method name="Setup" returnType="string" throws="ExceptionErr">
<argument name="payload" type="nonBasic" nonBasicTypeName="SetupElem" arrayDimensions="(100)"/>
</method>
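Because the ICD bounds the payload at 100 elements, a client can reject an oversized payload before sending anything. In this sketch `Elem` stands for the generated `SetupElem` type, whose fields are defined in supif and not reproduced; `CheckSetupPayload` and `kMaxSetupElems` are hypothetical names.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Bound taken from arrayDimensions="(100)" in the Setup method definition.
constexpr std::size_t kMaxSetupElems = 100;

// Throws std::length_error if the payload cannot be carried by the ICD.
template <typename Elem>
void CheckSetupPayload(const std::vector<Elem>& payload) {
    if (payload.size() > kMaxSetupElems) {
        throw std::length_error("Setup payload has " + std::to_string(payload.size()) +
                                " elements, the limit is " +
                                std::to_string(kMaxSetupElems));
    }
}
```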
SubsysNames Command
The SubsysNames command reports the subsystems managed by the System Supervisor as a comma-separated list. An example of the output generated by the SubsysNames command is shown below.
$ supClient zpb.rr://134.171.3.48:30519 SubsysNames ""
subsim2, subsim3
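A client that needs the names as a list can split this reply on commas and trim the surrounding blanks. This sketch assumes the reply string has exactly the format of the example; `ParseSubsysNames` is a hypothetical helper.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits a reply such as "subsim2, subsim3" into {"subsim2", "subsim3"}.
std::vector<std::string> ParseSubsysNames(const std::string& reply) {
    std::vector<std::string> names;
    std::stringstream ss(reply);
    std::string item;
    while (std::getline(ss, item, ',')) {
        const auto first = item.find_first_not_of(" \t\r\n");
        if (first == std::string::npos) {
            continue;  // skip empty entries
        }
        const auto last = item.find_last_not_of(" \t\r\n");
        names.push_back(item.substr(first, last - first + 1));
    }
    return names;
}
```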
SubsysStatus Command
The SubsysStatus command provides information about each subsystem managed by the System Supervisor. An example of the output generated by the SubsysStatus command is shown below.
$ supClient zpb.rr://134.171.3.48:30519 SubsysStatus ""
subsim2.access = true
subsim2.scope = internal
subsim2.connection_status = Connected
subsim2.state = Operational
subsim2.substate = Idle
subsim3.access = true
subsim3.scope = internal
subsim3.connection_status = Connected
subsim3.state = Operational
subsim3.substate = Idle
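The SubsysStatus output is line-oriented (`<subsystem>.<field> = <value>`), so it can be turned into a nested map for scripting or tests. This is a sketch assuming that exact format; splitting at the last dot assumes field names contain no dots, as in the example. `ParseSubsysStatus` is a hypothetical helper.

```cpp
#include <map>
#include <sstream>
#include <string>

// Parses lines "<subsystem>.<field> = <value>" into result[subsystem][field].
// Lines that do not match this shape are skipped.
std::map<std::string, std::map<std::string, std::string>>
ParseSubsysStatus(const std::string& reply) {
    auto trim = [](const std::string& s) {
        const auto b = s.find_first_not_of(" \t\r");
        if (b == std::string::npos) {
            return std::string();
        }
        const auto e = s.find_last_not_of(" \t\r");
        return s.substr(b, e - b + 1);
    };
    std::map<std::string, std::map<std::string, std::string>> result;
    std::istringstream in(reply);
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find('=');
        if (eq == std::string::npos) {
            continue;
        }
        const std::string key = trim(line.substr(0, eq));
        const auto dot = key.rfind('.');  // field names contain no dots
        if (dot == std::string::npos) {
            continue;
        }
        result[key.substr(0, dot)][key.substr(dot + 1)] = trim(line.substr(eq + 1));
    }
    return result;
}
```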
Subscriptions
Each subsystem instance created by the factory subscribes to the status published by its subsystem. The subscription follows the naming convention below, on which the System Supervisor relies to monitor the status of the subsystems.
Subsystem | Parameter | End point |
---|---|---|
<subsystem> | status | <ps endpoint>/std/status |
Publishing
Like any other subsystem, the System Supervisor publishes its estimated state/substate. This can be used to build a hierarchy of subsystems.
Parameter | End point |
---|---|
status | <ps endpoint>/std/status |
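Both subscription and publishing use the `<ps endpoint>/std/status` convention. The configured `ps_endpoint` values in the example end with `/`, so a naive concatenation would produce a double slash. The sketch below assumes the intended URI has a single slash (for example `zpb.ps://127.0.0.1:12345/std/status`); that normalisation, and the name `StatusTopicUri`, are assumptions rather than documented behaviour.

```cpp
#include <string>

// Joins a configured ps_endpoint with the status topic path,
// dropping trailing slashes so the result contains a single '/'.
std::string StatusTopicUri(std::string psEndpoint) {
    while (!psEndpoint.empty() && psEndpoint.back() == '/') {
        psEndpoint.pop_back();
    }
    return psEndpoint + "/std/status";
}
```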
Signal Handling
The Supervisor handles the SIGUSR1 signal that Nomad emits to notify run-time changes in the template configuration file. When the Supervisor receives this signal, it reloads the configuration and reconnects to the affected subsystems if needed.
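A common, signal-safe way to implement such a reload trigger is to let the handler only set a flag and to perform the actual reload from the server's main loop. The sketch below shows that pattern for POSIX systems (SIGUSR1 is not available on Windows); it is not the Supervisor's code, and `InstallReloadHandler` and `ConsumeReloadRequest` are hypothetical hooks.

```cpp
#include <csignal>

// Set asynchronously by the handler; sig_atomic_t is safe to write there.
volatile std::sig_atomic_t g_reloadRequested = 0;

// Signal handler: only records the request, the reload happens elsewhere.
void OnSigusr1(int) {
    g_reloadRequested = 1;
}

// Installs the handler; returns false if installation failed.
bool InstallReloadHandler() {
    return std::signal(SIGUSR1, OnSigusr1) != SIG_ERR;
}

// Polled from the main loop. Returns true once per pending request, after
// which the caller would reload the configuration and reconnect as needed.
bool ConsumeReloadRequest() {
    if (g_reloadRequested) {
        g_reloadRequested = 0;
        return true;
    }
    return false;
}
```

Keeping the handler minimal avoids calling non-async-signal-safe functions (logging, memory allocation, network I/O) from signal context.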
Troubleshooting
Logging
The System Supervisor implements six logging levels that provide the developer with additional information for troubleshooting.
Name | Verbosity | Description |
---|---|---|
ERROR | very low | Provides logging only in case of errors. |
INFO | low | Provides information about the most important actions (default). |
DEBUG | medium | Provides additional information for the developer. |
DEBUG2 | high | Includes more details, such as the Node IDs of OPC-UA attributes. |
DEBUG3 | high | Includes the logging of each subscription event. |
TRACE | very high | Includes tracing of all functions. |
To change the logging level, use the SetLog command, as shown in the example below.
$ supClient zpb.rr://134.171.3.48:30519 SetLog "TRACE"
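A client can validate the level name before sending SetLog and compare levels by verbosity. This sketch orders the levels as in the table; the relative order of DEBUG2 and DEBUG3 (both "high") and case-sensitive matching are assumptions. `LogLevel`, `ParseLogLevel` and `IsEnabled` are hypothetical names.

```cpp
#include <optional>
#include <string>

// The six levels from the table, in order of increasing verbosity.
enum class LogLevel { Error, Info, Debug, Debug2, Debug3, Trace };

// Maps the SetLog argument to a level; std::nullopt for unknown names.
std::optional<LogLevel> ParseLogLevel(const std::string& name) {
    if (name == "ERROR") return LogLevel::Error;
    if (name == "INFO") return LogLevel::Info;
    if (name == "DEBUG") return LogLevel::Debug;
    if (name == "DEBUG2") return LogLevel::Debug2;
    if (name == "DEBUG3") return LogLevel::Debug3;
    if (name == "TRACE") return LogLevel::Trace;
    return std::nullopt;
}

// True if a message at level `msg` is emitted while running at level `current`.
bool IsEnabled(LogLevel current, LogLevel msg) {
    return static_cast<int>(msg) <= static_cast<int>(current);
}
```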