System Supervisor (supSupervisor)
The System Supervisor provides the functionality for supervision and monitoring of a configurable set of subsystems.
The main components of the System Supervisor server are:
A State Machine engine based on SCXML and implemented in RAD. It contains a set of action and activity classes.
A Subsystem Factory class that creates the instances of all subsystem classes at start-up, based on the server configuration.
A Facade class that manages the interface between the state machine engine and the subsystem classes.
The System Supervisor uses the OLDB to store run-time information about itself and about the subsystems it monitors. It subscribes to the status information published by the subsystems and, like any other subsystem, publishes its own status.
Command Line Arguments
Command line argument help is available under the option --help.
--server-id ARG | -i ARG
(string) Server id. If not specified, the one included in the configuration file is used.
--config ARG | -c ARG
(string) Application configuration file.
--log-level ARG | -l ARG
(enum) [default: ERROR] Log level to use. One of ERROR, INFO, DEBUG, TRACE.
--log-prop-file ARG | -l ARG
(string) Log property file.
--req-endpoint ARG | -l ARG
(string) Server MAL Req/Rep endpoint (zpb.rr://<ipaddr>:<port>/).
Environment Variables
$CFGPATH
Used to resolve configuration file paths.
$DATAROOT
Specifies the default root path used as output directory for FITS metadata. Metadata files are stored under $DATAROOT/fcf/<fcs instance>.
System Supervisor State Machine
The System Supervisor uses a state machine described in SCXML format that is interpreted by the state machine engine provided by the rad application framework (see the SCXML specification).
Off --> NotReady, event: Startup
The System Supervisor starts up and goes automatically to NotOperational/NotReady. The main server objects are instantiated, including the basic application that uses the State Machine engine. The System Supervisor reads its own configuration and completes its initialisation. It then connects to the configured subsystems and requests their current status. With this information it computes the estimated state of the system and adapts its own state to the overall state of the subsystems. This means that if all the subsystems are Operational/Idle, the System Supervisor triggers an internal event to go to Operational/Idle. The same applies in any state, so the System Supervisor can also go from Operational/Idle back to NotOperational/NotReady.
NotReady --> Ready, event: Init
The Supervisor dispatches the Init command to all the configured subsystems whose access is enabled. Depending on the replies received from the subsystems, it goes to the Ready substate or remains in the NotReady substate. If at least one command returns with an error or a timeout, the Supervisor does not reach the Ready substate.
NotOperational/Ready --> Operational/Idle, event: Enable
The System Supervisor goes through the Enabling transient state. If the configured subsystems are already Operational, the Supervisor does not affect their state and goes immediately to the Operational state. If subsystems are not Operational, the System Supervisor dispatches the Enable request to each of the configured subsystems whose access is enabled. If at least one request fails, it replies with a failure and remains in the NotOperational/Ready state.
Operational --> NotOperational/Ready, event: Disable
The System Supervisor is moved back to the NotOperational/Ready substate. However, if all subsystems are Operational, the System Supervisor remains Operational, since its state reflects the state of the subsystems it coordinates.
NotOperational/Ready --> NotOperational/NotReady, event: Reset
The System Supervisor is moved back to the NotOperational/NotReady substate. The subsystems are not affected by this transition and keep their current state/substate.
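The command-driven transitions listed above can be summarised as a small lookup function. This is only an illustrative sketch: the real state machine is defined in the SCXML file and interpreted by the RAD engine, and the names SupState and nextState are hypothetical. The sketch models the Operational state by its Idle substate only and leaves out the internal events triggered by subsystem status changes.

```cpp
#include <optional>
#include <string>

// Hypothetical flat view of the states named in the text.
// Idle stands for Operational/Idle; NotReady and Ready are the
// substates of NotOperational.
enum class SupState { Off, NotReady, Ready, Idle };

// Returns the target state for an event, or std::nullopt when the event
// is not accepted in the current state.
std::optional<SupState> nextState(SupState current, const std::string& event) {
    if (current == SupState::Off && event == "Startup") return SupState::NotReady;
    if (current == SupState::NotReady && event == "Init") return SupState::Ready;
    if (current == SupState::Ready && event == "Enable") return SupState::Idle;
    if (current == SupState::Idle && event == "Disable") return SupState::Ready;
    if (current == SupState::Ready && event == "Reset") return SupState::NotReady;
    return std::nullopt;
}
```

The function only describes the nominal target of each event. As explained above, the Supervisor may still end up in a different state when the subsystems report another overall status.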
Configuration
System Supervisor Configuration
The SysSup in version 4.0.0 has been ported to the CII config-ng library. Unlike yaml-cpp, this library allows type information to be defined for the configuration parameters. The System Supervisor includes a predefined set of configuration definitions, which can be found in the syssup/server/resources/config directory.
More information about CII config-ng can be found in the Config-ng manual.
Note
The entry point for the System Supervisor configuration is the file that contains the server configuration.
server::server_id
This is the id associated with the specific server. This id is used to associate all server configuration parameters as well as the prefix for the DB keys.
server::req_endpoint
This is the endpoint for CII MAL request/reply. The server will listen to incoming commands using this endpoint.
server::pub_endpoint
This is the endpoint for CII MAL pub/sub. The server will publish device topics using this endpoint.
server::db_timeout
This is the server timeout for connecting to the OLDB.
server::scxml
This is the state machine specification file used by the server.
server::dictionaries
This is the vector of dictionaries to be used by the server.
server::oldb_prefix
This is the prefix to be used for the DB. This prefix is meant to identify uniquely a given system, e.g. micado.
server::log_properties
log4cplus property file to be used by the server.
server::mon_timeout
Monitor timeout for waiting to establish connections to the subsystems. This value should rarely exceed a few seconds. Default is 1000 [ms].
server::req_timeout
General command timeout for sending commands to subsystems.
server::ob_modes
Vector of observation modes supported by the instrument. Each observation mode defines a list of associated subsystems.
server::subsystems
This is the vector of subsystems active in the supervisor configuration. Only subsystems listed here will be managed by the supervisor.
Each subsystem has its own set of configuration parameters:
<subsystem id>::scope
This is the scope of each subsystem. It can be internal or external. Requests are not forwarded to external subsystems; the System Supervisor only monitors them.
<subsystem id>::type
This is the subsystem type class. Normally a subsystem will use the provided class: sup::syssup::common::Generic
<subsystem id>::rr_endpoint
This is the endpoint for the subsystem CII MAL request/reply. The subsystem listens to incoming commands using this endpoint.
<subsystem id>::ps_endpoint
This is the endpoint for the subsystem CII MAL pub/sub. The subsystem publishes its status using this endpoint.
<subsystem id>::access
This is a flag to enable/disable accessibility of a subsystem.
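The way scope and access decide which subsystems receive forwarded requests can be sketched as follows. The struct SubsystemCfg and the functions isCommandTarget and dispatchToAll are hypothetical names introduced for illustration, not part of the Supervisor API. The sketch also assumes that dispatching continues after a failed request, which the text does not specify.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical mirror of a subset of the per-subsystem parameters.
struct SubsystemCfg {
    std::string name;
    std::string scope;  // "internal" or "external"
    bool access;        // accessibility flag
};

// Requests are forwarded only to internal subsystems whose access is enabled.
bool isCommandTarget(const SubsystemCfg& s) {
    return s.scope == "internal" && s.access;
}

// Sends a request to every command target through the caller-supplied
// `send` callback. Returns false if at least one request fails.
bool dispatchToAll(const std::vector<SubsystemCfg>& subsystems,
                   const std::function<bool(const SubsystemCfg&)>& send) {
    bool allOk = true;
    for (const auto& s : subsystems) {
        if (!isCommandTarget(s)) continue;  // monitored only, nothing is sent
        if (!send(s)) allOk = false;        // assumption: keep going, remember the failure
    }
    return allOk;
}
```

A false return value corresponds to the case described for Init and Enable, where a single failing subsystem prevents the Supervisor from reaching the target state.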
An example of a server configuration is provided below.
!cfg.include config/sup/syssup/server/definitions.yaml:
server: !cfg.type:SysSup
  server_id : 'sup'
  req_endpoint : "zpb.rr://127.0.0.1:13082/"
  pub_endpoint : "zpb.ps://127.0.0.1:13345/"
  db_timeout : 2000
  scxml : "sup/syssup/server/sm.xml"
  log_properties : "config/sup/syssup/server/log_properties.cfg"
  oldb_prefix : "ins1"
  req_timeout : 60000
  ob_modes : [
    {
      name: Engineering,
      subsystems: ['fcs1','dummy1']
    },
    {
      name: Imaging,
      subsystems: ['dummy2']
    }
  ]
  subsystems : [
    {
      name: 'fcs1',
      scope: internal,
      type: sup::syssup::common::Generic,
      rr_endpoint: "zpb.rr://127.0.0.1:15085/StdCmds",
      ps_endpoint: "zpb.ps://127.0.0.1:15045/",
      access: true
    },
    {
      name: 'dummy1',
      scope: internal,
      type: sup::syssup::common::Generic,
      rr_endpoint: "zpb.rr://127.0.0.1:15086/StdCmds",
      ps_endpoint: "zpb.ps://127.0.0.1:15046/",
      access: false
    },
    {
      name: 'dummy2',
      scope: internal,
      type: sup::syssup::common::Generic,
      rr_endpoint: "zpb.rr://127.0.0.1:15087/StdCmds",
      ps_endpoint: "zpb.ps://127.0.0.1:15047/",
      access: true
    }
  ]
Supervisor OLDB
The supervisor stores the actual values of the server configuration parameters in the OLDB. This helps to verify whether the configuration has been loaded correctly. For details of the server configuration parameters, see the System Supervisor Configuration section.
OLDB Key |
---|
<instrument id>/<server id>/cfg/db_timeout |
<instrument id>/<server id>/cfg/db_task_period |
<instrument id>/<server id>/cfg/dictionaries |
<instrument id>/<server id>/cfg/req_timeout |
<instrument id>/<server id>/cfg/mon_timeout |
<instrument id>/<server id>/cfg/filename |
<instrument id>/<server id>/cfg/fits_prefix |
<instrument id>/<server id>/cfg/pub_endpoint |
<instrument id>/<server id>/cfg/req_endpoint |
<instrument id>/<server id>/cfg/scxml |
<instrument id>/<server id>/cfg/oldb_prefix |
<instrument id>/<server id>/cfg/log_properties |
<instrument id>/<server id>/cfg/server_id |
<instrument id>/<server id>/cfg/subsystems/<subsystem id>/scope |
<instrument id>/<server id>/cfg/subsystems/<subsystem id>/type |
<instrument id>/<server id>/cfg/subsystems/<subsystem id>/rr_endpoint |
<instrument id>/<server id>/cfg/subsystems/<subsystem id>/ps_endpoint |
<instrument id>/<server id>/cfg/subsystems/<subsystem id>/access |
Server Status
The server stores the string representation of its state and substate in the OLDB.
OLDB Key |
---|
<instrument id>/<server id>/states/state |
<instrument id>/<server id>/states/substate |
<instrument id>/<server id>/subsystems/<subsystem id>/states/state |
<instrument id>/<server id>/subsystems/<subsystem id>/states/substate |
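The key layouts in the two tables above can be composed with simple string helpers, for example when a client or test wants to read back a value. The function names cfgKey, subsystemCfgKey and subsystemStateKey are hypothetical. The sketch only builds the strings and does not access the OLDB.

```cpp
#include <string>

// Builds "<instrument id>/<server id>/cfg/<parameter>".
std::string cfgKey(const std::string& instrumentId,
                   const std::string& serverId,
                   const std::string& param) {
    return instrumentId + "/" + serverId + "/cfg/" + param;
}

// Builds "<instrument id>/<server id>/cfg/subsystems/<subsystem id>/<parameter>".
std::string subsystemCfgKey(const std::string& instrumentId,
                            const std::string& serverId,
                            const std::string& subsystemId,
                            const std::string& param) {
    return cfgKey(instrumentId, serverId, "subsystems/" + subsystemId + "/" + param);
}

// Builds "<instrument id>/<server id>/subsystems/<subsystem id>/states/<field>",
// where field is "state" or "substate".
std::string subsystemStateKey(const std::string& instrumentId,
                              const std::string& serverId,
                              const std::string& subsystemId,
                              const std::string& field) {
    return instrumentId + "/" + serverId + "/subsystems/" + subsystemId +
           "/states/" + field;
}
```

With illustrative ids ins1 and sup, cfgKey("ins1", "sup", "db_timeout") yields ins1/sup/cfg/db_timeout.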
Status Estimation
The estimated state/substate of the overall system is based on the individual subsystem states/substates, according to the following criteria:
Each known state/substate string has an associated numeric code to simplify the estimation. For the state, the estimate is simply the minimum state among all managed subsystems. Normally there are only three possible cases: Undetermined, NotOperational and Operational.
For the substate, the estimation is similar. The overall substate is the minimum substate, with one exception: if at least one subsystem is in a transient substate such as SettingUp or Recording, the estimated substate reflects the minimum transient substate. This helps to report the ongoing activities of the managed subsystems.
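The estimation rules above can be sketched in a few lines. The numeric codes, the set of substates and all function names below are assumptions made for illustration; the actual coding is internal to the Supervisor Facade. Unknown strings are ranked lowest in this sketch.

```cpp
#include <string>
#include <vector>

// Assumed coding: a lower code means "less operational".
enum class State { Undetermined = 0, NotOperational = 1, Operational = 2 };

State parseState(const std::string& s) {
    if (s == "Operational") return State::Operational;
    if (s == "NotOperational") return State::NotOperational;
    return State::Undetermined;  // anything unknown counts as Undetermined
}

// Overall state: the minimum state among all managed subsystems.
State estimateState(const std::vector<std::string>& states) {
    if (states.empty()) return State::Undetermined;
    State overall = State::Operational;
    for (const auto& s : states) {
        const State st = parseState(s);
        if (st < overall) overall = st;
    }
    return overall;
}

struct SubstateInfo {
    int code;        // assumed ordering code
    bool transient;  // true for substates that report an ongoing activity
};

SubstateInfo substateInfo(const std::string& s) {
    if (s == "NotReady") return {1, false};
    if (s == "Ready") return {2, false};
    if (s == "Idle") return {3, false};
    if (s == "SettingUp") return {4, true};
    if (s == "Recording") return {5, true};
    return {0, false};  // unknown substate
}

// Overall substate: the minimum substate, except that any transient
// substate takes precedence, in which case the minimum transient one wins.
std::string estimateSubstate(const std::vector<std::string>& substates) {
    const std::string* best = nullptr;
    SubstateInfo bestInfo{0, false};
    for (const auto& s : substates) {
        const SubstateInfo info = substateInfo(s);
        const bool better =
            best == nullptr ||
            (info.transient && !bestInfo.transient) ||
            (info.transient == bestInfo.transient && info.code < bestInfo.code);
        if (better) {
            best = &s;
            bestInfo = info;
        }
    }
    return best != nullptr ? *best : std::string("Undetermined");
}
```

For example, with substates Idle, Recording and SettingUp the sketch reports SettingUp, because two subsystems are in a transient substate and SettingUp has the lower assumed code.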
Note
The estimation is done by a virtual method of the Supervisor Facade and it could be replaced by the applications if needed.
Warning
The estimation relies on the fact that subsystems publish their status according to the defined format.
Commands
The commands currently supported by the server are listed here: List of Commands.
Error Handling
Supervisor commands throw an exception in case of errors or timeouts. Client applications can catch the exceptions and obtain the associated error message with the function getDesc(). This message contains neither the history nor the error stack, but it normally indicates precisely where the error occurred. Since the CII Error service is not yet available, the Supervisor cannot use it.
Note
The specific exceptions depend on the command used.
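try {
    // Supervisor commands throw on errors or timeouts; GetState is used as an example.
    auto reply = client->GetState();
} catch (const stdif::ExceptionErr& e) {
    // getDesc() returns the error message carried by the exception.
    RAD_LOG_ERROR() << "Error reply (" << e.getDesc() << ").";
}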
Serialization
The System Supervisor uses the CII MAL ZPB (ZeroMQ + Google Protocol Buffers) for serialising commands.
Note
Each command has two parts: a payload and its corresponding reply, see the details in the supif module. The normal replies are plain strings.
Setup Command
The Setup command is intended to produce a change in the run-time configuration.
Since there are no long-running operations associated with the Setup command, this operation is blocking. The Supervisor executes the action and then sends the reply back to the originator.
The interface definition of the Setup command can be found in module supif.
Warning
The array does not have a fixed size but it has a limit of 100 elements. A limit is needed by the CII XML ICD.
<method name="Setup" returnType="string" throws="ExceptionErr">
<argument name="payload" type="nonBasic" nonBasicTypeName="SetupElem" arrayDimensions="(100)"/>
</method>
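A client may want to check the 100-element limit before sending a Setup request. The helper below is a hypothetical sketch: the name checkSetupPayload and the choice of std::length_error are assumptions, and the element type is left generic instead of using the generated SetupElem class.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Limit taken from arrayDimensions="(100)" in the interface definition.
constexpr std::size_t kMaxSetupElems = 100;

// Throws std::length_error when the payload exceeds the ICD limit.
template <typename Elem>
void checkSetupPayload(const std::vector<Elem>& payload) {
    if (payload.size() > kMaxSetupElems) {
        throw std::length_error("Setup payload exceeds 100 elements");
    }
}
```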
SubsysNames Command
The SubsysNames command reports the subsystems managed by the System Supervisor as a comma-separated list. An example of the output generated by the SubsysNames command is shown below. The URI shall be adapted to the correct values.
$ supClient zpb.rr://134.171.3.48:30519 SubsysNames ""
subsim2, subsim3
SubsysStatus Command
The SubsysStatus command provides information about each subsystem managed by the System Supervisor. An example of the output generated by the SubsysStatus command is shown below.
$ supClient zpb.rr://134.171.3.48:30519 SubsysStatus ""
subsim2.access = true
subsim2.scope = internal
subsim2.connection_status = Connected
subsim2.state = Operational
subsim2.substate = Idle
subsim3.access = true
subsim3.scope = internal
subsim3.connection_status = Connected
subsim3.state = Operational
subsim3.substate = Idle
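If a script or test needs to inspect this output, the "subsystem.field = value" lines can be parsed into a nested map. The function parseSubsysStatus is a hypothetical helper and assumes the textual layout shown above stays exactly as printed by supClient.

```cpp
#include <map>
#include <sstream>
#include <string>

using StatusMap = std::map<std::string, std::map<std::string, std::string>>;

// Parses lines of the form "<subsystem>.<field> = <value>".
// Lines that do not match this layout are skipped.
StatusMap parseSubsysStatus(const std::string& text) {
    StatusMap result;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        const auto eq = line.find(" = ");
        const auto dot = line.find('.');
        if (eq == std::string::npos || dot == std::string::npos || dot > eq) {
            continue;
        }
        const std::string subsystem = line.substr(0, dot);
        const std::string field = line.substr(dot + 1, eq - dot - 1);
        result[subsystem][field] = line.substr(eq + 3);
    }
    return result;
}
```

After parsing the example output, the entry for subsim2 and substate would hold the string Idle.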
Subscriptions
Each subsystem instance created by the factory subscribes to the status of its subsystem. The subscription uses the naming convention below, and the System Supervisor relies on this convention to monitor the status of the subsystems.
Subsystem | Parameter | Endpoint |
---|---|---|
<subsystem> | status | <ps endpoint>/std/status |
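The status topic URI can be derived from a configured ps_endpoint as sketched below. The function statusTopicUri is hypothetical, and trimming trailing slashes is an assumption made so that an endpoint written with a trailing "/" (as in the configuration example) does not produce a double slash.

```cpp
#include <string>

// Builds "<ps endpoint>/std/status" with exactly one slash before "std".
std::string statusTopicUri(std::string psEndpoint) {
    while (!psEndpoint.empty() && psEndpoint.back() == '/') {
        psEndpoint.pop_back();  // drop trailing slashes from the endpoint
    }
    return psEndpoint + "/std/status";
}
```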
Publishing
Like any other subsystem, the System Supervisor publishes its estimated state/substate. This can be used to build a hierarchy of subsystems.
Parameter | Endpoint |
---|---|
status | <ps endpoint>/std/status |
Signal Handling
The Supervisor handles the SIGUSR1 signal emitted by Nomad to notify it of run-time changes in the template configuration file. When the Supervisor receives this signal, it reloads the configuration and reconnects to the subsystems if needed.
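A common POSIX pattern for this kind of notification is to set a flag in the signal handler and perform the reload from the main loop. The sketch below shows that generic pattern only; it is not the Supervisor's actual implementation, and the names g_reloadRequested, installReloadHandler and consumeReloadRequest are hypothetical.

```cpp
#include <csignal>

// Set by the handler and polled by the main loop. The handler does nothing
// else, so it stays async-signal-safe.
volatile std::sig_atomic_t g_reloadRequested = 0;

extern "C" void onSigusr1(int) {
    g_reloadRequested = 1;
}

// Installs the SIGUSR1 handler; returns false if registration failed.
bool installReloadHandler() {
    return std::signal(SIGUSR1, onSigusr1) != SIG_ERR;
}

// Called from the main loop: returns true once per pending notification,
// at which point the application would reload its configuration.
bool consumeReloadRequest() {
    if (g_reloadRequested == 0) return false;
    g_reloadRequested = 0;
    return true;
}
```

Keeping the reload itself out of the handler avoids calling functions that are not safe in signal context, such as file I/O or memory allocation.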
Troubleshooting
Logging
The System Supervisor implements logging levels according to the log4cplus package where the concept is:
ALL < TRACE < DEBUG < INFO < WARN < ERROR < FATAL < OFF
The basic log levels supported by the SysSup for troubleshooting are listed in the table below.
Name | Verbosity | Description |
---|---|---|
ERROR | very low | Provide logging only in case of errors. |
INFO | low | Provide information for the most important actions. |
DEBUG | medium | Provide additional information for the developer. |
TRACE | very high | Includes all the function tracing. |
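The level ordering can be read as a threshold: a message is emitted when its level is at or above the configured one. The enum and function below are a hypothetical illustration of that rule and do not use the log4cplus API.

```cpp
// Ordered as in the text: TRACE < DEBUG < INFO < WARN < ERROR < FATAL.
enum class Level { Trace, Debug, Info, Warn, Error, Fatal };

// A message is emitted when its level is at or above the threshold.
bool isEnabled(Level message, Level threshold) {
    return message >= threshold;
}
```

With the default threshold ERROR, only ERROR and FATAL messages pass, while a threshold of TRACE lets every message through.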
To activate a new logging level, the command SetLogLevel shall be used. See the example below.
$ supClient zpb.rr://134.171.3.48:30519 SetLogLevel "TRACE"
Logging Configuration File
The behavior of the logging can be controlled using a log property file. The SysSup provides a simple property file that defines the basic configuration to get the logging on the console. More complex configurations are possible by providing a custom property file.