RTC Supervisor¶
Note
The RTC Supervisor is NOT a deployment tool. Its role is to provide a single entry point to the RTC for state guiding, monitoring, error recovery and population of the Runtime Configuration Repository.
It responds to a number of the standard commands defined by Stdif and exports global state in the OLDB.
With the introduction of Nomad, some of the functions listed above may be removed from the RTC Supervisor, in particular those related to monitoring.
Introduction¶
The current release of the RTC Supervisor performs the following:
Loading the contents of the Runtime Configuration Repository from the Persistent Configuration Repository using the active Deployment Set during initialisation.
Guiding the state of all SRTC components by forwarding state change requests to them.
Evaluating the overall state of the RTC by monitoring the state of all supervised SRTC components.
Monitoring the liveness of all SRTC components and detecting if any have crashed.
Subsequent releases will add the following:
Providing a simple interface for recovery when one or more components are in error.
Monitoring any error events generated by RTC components and generating an overall error indication.
Implementing a mode switching interface by reloading parts of the Runtime Configuration Repository from the Persistent Configuration Repository.
Providing a means of updating the Persistent Configuration Repository with items which have been changed in the Runtime Configuration Repository.
Acting as a base class to which instrument specific functionality and interfaces can be added.
The RTC Supervisor implementation is divided into a library and a server, with the server implementing the usable component. In addition, there are a number of test programs used as integration tests and an example implementation of some deployment code which allows a set of components defined in a YAML file to be launched.
The library implements the functionality required for communicating with a set of supervised rtcObjects and for sending lists of commands to them via rtcCommandRequest and rtcCommandRequestSeries. The complete configuration of the rtcObjects is managed by the rtcObjectConfig class, which reads its configuration from the Runtime Configuration Repository.
Note
The RTC Supervisor component is currently based on the rtctkExampleComponent structure, i.e. it provides a business logic which implements the various activities. This may well change, and the supervisor may become a simple server implementing the Stdif commands directly.
Launching the Server¶
Because it is based on the rtctkExampleComponent, the RTC Supervisor server program accepts the following command line options:
$ rtctkRtcSupervisor -h
Options:
  -h [ --help ]         print help messages
  -i [ --cid ] arg      component identity
  -s [ --sde ] arg      service discovery endpoint
The name of the RTC Supervisor instance and the service discovery URI (currently only a YAML file) must be provided, as is shown in the following example command to start the process:
$ rtctkRtcSupervisor \
    -i rtc_sup \
    -s file:$INTROOT/run/exampleEndToEnd/fileBased/service_disc.yaml
Populating the Runtime Configuration Repository¶
The contents of the Runtime Configuration Repository are automatically loaded from the Persistent Configuration Repository during the initialising state. Loading takes place before the Runtime Configuration Repository is accessed and before any state guiding, so that the configuration is available before any component tries to access it.
The automatic loading only occurs if the Persistent Configuration Repository is available. Otherwise this step is skipped and the Runtime Configuration Repository is assumed to be populated by some other means, e.g. using rtctkConfigTool to manually populate the repository beforehand (see Populate).
The Persistent Configuration Repository is available when the persistent_repo_endpoint service is configured in the Service Discovery. Currently this simply means that the following example entry should be present in the Service Discovery YAML file (where /persistent_repo should be replaced with the appropriate path):
common:
    persistent_repo_endpoint:
        type: RtcString
        value: file:/persistent_repo
It is also possible to disable the automatic loading of the datapoints into the Runtime Configuration Repository, even if the Persistent Configuration Repository is available, by setting the /disable_populate_runtime_repo boolean datapoint to true in the Persistent Configuration Repository. For example, by running the following command (where /persistent_repo should be replaced with the appropriate path):
$ rtctkConfigTool --persistent-repo-endpoint file:/persistent_repo \
    set persistent /disable_populate_runtime_repo True
By default, the Runtime Configuration Repository is populated by loading the active Deployment Set indicated by the /active_deployment string datapoint in the Persistent Configuration Repository. The details of the configuration layout for Deployment Sets are described in section Configuration Layout.
For certain testing scenarios, it may be easier to perform a simple 1-to-1 copy of the datapoint hierarchy in the Persistent Configuration Repository, rather than applying full handling of Deployment Sets. For such cases, it is possible to force the simple 1-to-1 copy by setting the /simple_populate_runtime_repo boolean datapoint to true in the Persistent Configuration Repository as follows:
$ rtctkConfigTool --persistent-repo-endpoint file:/persistent_repo \
    set persistent /simple_populate_runtime_repo True
Note
When forcing a simple 1-to-1 copy of the datapoints, the interpretation of the contents of the Persistent Configuration Repository is completely different. In such a case, the datapoint hierarchy should follow the layout of the Runtime Configuration Repository instead.
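The rules in this section can be summarised as a small decision function. The following Python sketch is purely illustrative: the function name, its argument and the returned labels are hypothetical and not part of the RTC Toolkit, and the precedence of the disable flag over the simple-copy flag is an assumption. Only the datapoint paths are taken from the text above.

```python
from typing import Mapping, Optional


def choose_populate_mode(persistent_repo: Optional[Mapping[str, object]]) -> str:
    """Return which population strategy applies (illustrative labels only).

    `persistent_repo` is a hypothetical flat mapping of datapoint path to
    value, or None when no persistent_repo_endpoint is configured.
    """
    if persistent_repo is None:
        # Persistent Configuration Repository not available: skip loading.
        return "skip"
    if persistent_repo.get("/disable_populate_runtime_repo", False):
        # Automatic loading explicitly disabled (assumed to take precedence).
        return "skip"
    if persistent_repo.get("/simple_populate_runtime_repo", False):
        # Forced 1-to-1 copy of the datapoint hierarchy.
        return "simple_copy"
    # Default: load the Deployment Set named by /active_deployment.
    return "deployment_set:" + str(persistent_repo["/active_deployment"])
```

With neither flag set, the sketch falls through to the default Deployment Set handling, which mirrors the default behaviour described above.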
State Guiding¶
Currently state guiding is performed from the activities defined by the ExampleComponent. In each activity the AllObjectRequestList() is used to send commands, either in series or in parallel, to the list of supervised components.
The RTC Supervisor implements activities for, and understands, the following commands:
Init
Reset
Recover (currently empty)
Enable
Disable
Note
Run and Idle are not in the above list. The baseline currently is that the sequencer code will be responsible for sending the Run commands to the RTC components in the correct order. This may be re-evaluated.
There are configuration flags available for each of these activities indicating if they should be performed in parallel or series on the list of supervised components.
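The difference between the two dispatch styles can be sketched in Python as follows. This is not the AllObjectRequestList() implementation: the function `send_to_all` and the `send` callable are hypothetical, and the mapping from the configuration flags to the `parallel` argument is deliberately left out because it is not described here.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def send_to_all(components: List[str],
                send: Callable[[str], str],
                parallel: bool) -> Dict[str, str]:
    """Send one command to every component, in series or in parallel.

    `send` is a hypothetical callable that delivers the command to one
    component and returns its reply.
    """
    if not components:
        return {}
    if parallel:
        # One worker per component so that all requests are in flight at once.
        with ThreadPoolExecutor(max_workers=len(components)) as pool:
            replies = list(pool.map(send, components))
    else:
        # Series: the next component is commanded only after the previous reply.
        replies = [send(name) for name in components]
    return dict(zip(components, replies))
```

In series mode a slow component delays all the following ones, whereas in parallel mode the total time is roughly that of the slowest component.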
Commanding¶
To send one of the supported commands you can use the rtctkSendCommand.sh script, which makes use of the rtctkClient program implementing the Stdif client interface. For example:
$ rtctkSendCommand.sh rtc_sup Init
$ rtctkSendCommand.sh rtc_sup Reset
$ rtctkSendCommand.sh rtc_sup Recover
$ rtctkSendCommand.sh rtc_sup Enable
$ rtctkSendCommand.sh rtc_sup Disable
Here rtc_sup is the name which was passed to the RTC Supervisor with the -i flag. The rtctkSendCommand.sh script uses the $REPO_DIR environment variable to locate the service_disc.yaml file, in which it looks up the required URIs.
State Evaluation¶
When the object configuration is built from the Runtime Configuration Repository, a list of publish/subscribe URIs is created, one per supervised component. The business logic creates a StateSubscriber with the list of URIs.
The StateSubscriber’s callback is invoked whenever an event is received, and the RtcObjectConfig::OnStateEventReceived() method is called. This method sets the state attribute of the identified RtcObject in the object list, then evaluates the believed overall state and substate of the system and publishes them to the OLDB.
Typical content of the file-based OLDB when the system is operational and supervising two components, object1 and object2, would be:
rtc_sup:
    global_display_state:
        type: RtcString
        value: On:Operational:Idle
    global_state:
        type: RtcString
        value: operational
    global_substate:
        type: RtcString
        value: idle
    global_error:
        type: RtcBool
        value: false
    global_error_who:
        type: RtcString
        value: ""
    state:
        type: RtcString
        value: "On:Operational:Idle;On:Operational:Update:Idle;"
object1:
    state:
        type: RtcString
        value: "On:Operational:Idle;On:Operational:Update:Idle;"
object2:
    state:
        type: RtcString
        value: "On:Operational:Idle;On:Operational:Update:Idle;"
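The event-driven bookkeeping described in this section can be sketched in Python as follows. The class `StateTracker` is hypothetical, and the evaluation rule it uses is an assumption made for illustration only: a global state is reported when all components agree on the first entry of their state string, and a placeholder is reported otherwise. The rule actually applied by the RTC Supervisor is not described here.

```python
from typing import Dict, Optional


class StateTracker:
    """Hypothetical stand-in for the bookkeeping done on each state event."""

    def __init__(self, names):
        # No state is known for a component until its first event arrives.
        self.states: Dict[str, Optional[str]] = {name: None for name in names}

    def on_state_event(self, name: str, state: str) -> Dict[str, str]:
        """Record the new state of one component and re-evaluate the summary."""
        self.states[name] = state
        return self.evaluate()

    def evaluate(self) -> Dict[str, str]:
        # Use only the first ';'-separated entry, e.g. "On:Operational:Idle".
        firsts = {s.split(";")[0] if s else None for s in self.states.values()}
        if len(firsts) == 1 and None not in firsts:
            display = firsts.pop()
            parts = display.split(":")
            state = parts[1].lower() if len(parts) > 1 else parts[0].lower()
            substate = parts[2].lower() if len(parts) > 2 else ""
        else:
            # Assumed placeholder when components disagree or are unknown.
            display, state, substate = "Unknown", "unknown", "unknown"
        return {"global_display_state": display,
                "global_state": state,
                "global_substate": substate}
```

Fed with the state strings of object1 and object2 from the OLDB example, this sketch yields the same global_display_state, global_state and global_substate values as shown there.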
Asynchronous Detection of Component Failure¶
Asynchronous monitoring is performed by the rtcMonitor class, an instance of which is a member of the rtcServer. The rtcMonitor constructor creates a thread which, while the monitor is active, periodically calls the MonitorCycle() method of the rtcServer.
The rtcServer marks the monitor as being active whenever the state is at least NotOperational:Ready.
The MonitorCycle() method of the rtcServer uses the AllObjectRequestList() to send a GetVersion command with a short timeout to each component. If the command fails, the rtcObject sending the command marks the component as having generated an exception, and no further commands are sent to it.
If a component does fail, the InError method is called to set the error flag and record the name of the component in error.
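A single monitoring cycle as described above can be sketched in Python as follows. The function `monitor_cycle`, the `ping` callable (standing in for a GetVersion request with a short timeout) and the `failed` dictionary are all hypothetical, and the default timeout value is an arbitrary assumption.

```python
from typing import Callable, Dict, List, Optional


def monitor_cycle(components: List[str],
                  ping: Callable[[str, float], None],
                  failed: Dict[str, str],
                  timeout_s: float = 0.5) -> Optional[str]:
    """Ping every healthy component once; return the first one that fails.

    `ping` is a hypothetical callable standing in for sending GetVersion
    with a short timeout; it is expected to raise on failure.
    """
    first_failure = None
    for name in components:
        if name in failed:
            # Already marked as failed: no further commands are sent to it.
            continue
        try:
            ping(name, timeout_s)
        except Exception as exc:
            # Remember why the component failed so it is skipped next time.
            failed[name] = str(exc)
            if first_failure is None:
                first_failure = name
    return first_failure
```

A caller would invoke this function periodically from the monitor thread and, when a name is returned, report it through the error mechanism, which corresponds to the InError call described above.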
Error Notification¶
In general, when the RTC Supervisor notices something has gone wrong it calls the RtcSupervisor::InError method, which updates the OLDB with the error and an indication of the cause.
Mutex Usage¶
A std::mutex is available in the RtcSupervisor class which can be used to globally lock the component. This is used, for example, to prevent the monitor thread from trying to “ping” the components while an activity is active. As new extension points and the ability to add interfaces to the RtcSupervisor are added, programmers will need to make use of this facility.
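The locking pattern can be illustrated with a Python analogue, using threading.Lock in the role of the std::mutex. All names below are hypothetical. Whether the real monitor thread waits for the lock or skips its cycle is not stated here; the sketch assumes it skips.

```python
import threading

# Hypothetical global lock, playing the role of the std::mutex member.
component_lock = threading.Lock()
log = []


def run_activity(name: str) -> None:
    """Hold the lock for the whole activity so the monitor cannot interleave."""
    with component_lock:
        log.append(f"{name}: start")
        log.append(f"{name}: end")


def monitor_ping() -> bool:
    """Ping only if no activity holds the lock; otherwise skip this cycle."""
    if not component_lock.acquire(blocking=False):
        return False
    try:
        log.append("monitor: ping")
        return True
    finally:
        component_lock.release()
```

Any user-supplied extension that commands the supervised components would follow the same pattern as run_activity, taking the lock for the duration of its work.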
Configuration¶
The supervisor has the following static parameters which define whether the associated activities are started in the supervised components in parallel or series.
static:
    init_alone:
        type: RtcBool
        value: true
    enable_alone:
        type: RtcBool
        value: true
    disable_alone:
        type: RtcBool
        value: false
    update_alone:
        type: RtcBool
        value: false
The supervisor needs to get a list of components which are supervised. As an INTERIM MEASURE, these are read from a DEPL table. Any component that implements RTC deployment set start/stop is likely to populate something similar. The functionality regarding the usage of this DEPL table will be revisited.
The RTC Supervisor only reads the object_list attribute; the others are used by the deployment component. The presence of rtc_sup in the object list is optional. It is used by the DEPL component for deployment, and the supervisor skips it if found. If you call your RTC Supervisor something else, you will need to modify the RTC Supervisor to skip this new name: see the RtcObjectsDescription::LoadFromRuntimeRepo() function in the rtcObjectConfig.cpp file.
The following is an example of a YAML file containing the object_list attribute:
object_list:
    type: RtcString
    value: "rtc_sup object1 object2"
rtcSupervisor:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkRtcSupervisor"
object1:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkExampleComponent"
object2:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkExampleComponent"
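How the list of supervised components is derived from object_list can be sketched in Python as follows. The real logic lives in the C++ function RtcObjectsDescription::LoadFromRuntimeRepo(); the function below is a hypothetical illustration which assumes the DEPL table has already been parsed into nested dictionaries.

```python
from typing import Dict, List


def supervised_objects(depl: Dict[str, dict],
                       supervisor_name: str = "rtc_sup") -> List[str]:
    """Return the names in object_list, minus the supervisor itself.

    `depl` is assumed to be the DEPL table already parsed into nested dicts.
    """
    # object_list is a single whitespace-separated string of component names.
    names = depl["object_list"]["value"].split()
    return [name for name in names if name != supervisor_name]
```

The second argument shows why a differently named supervisor instance needs a code change: the name to skip has to match the entry in object_list.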
Deployment and Other Scripts¶
To aid testing of the RTC Supervisor, a simple Python deployment script is provided which accepts a file like DEPL.yaml and deploys each of the components identified in the object_list using the rtctkStartObject.sh wrapper, passing the executable name and the component name.
The rtctkStartObject.sh script acts as a simple wrapper allowing all RTC components to be identified by looking for rtctkStartObject.sh in their command lines. The script launches the object and waits for its completion (no use of Nomad is made).
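The deployment step can be sketched in Python as follows. This is not the provided deployment script: the function name is hypothetical, the argument order (executable name, then component name) is an assumption based on the description above, the host attribute is ignored, and names in object_list that have no entry with an exe attribute are simply skipped.

```python
from typing import Dict, List


def build_start_commands(depl: Dict[str, dict]) -> List[List[str]]:
    """Build one rtctkStartObject.sh argument list per deployable object.

    Assumptions: `depl` is the parsed DEPL.yaml content, the argument order
    is executable then component name, and names without an `exe` entry
    are skipped.
    """
    commands = []
    for name in depl["object_list"]["value"].split():
        entry = depl.get(name)
        if not entry or "exe" not in entry:
            continue
        commands.append(["rtctkStartObject.sh", entry["exe"]["value"], name])
    return commands
```

Each returned list could then be handed to a process launcher such as subprocess.Popen.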
The rtctkRtcSuper_start_components.sh script copies some of the resources into a “run” directory, does some cleanup and uses the deploy mechanism described above.
The rtctkRtcSuper_stop_components.sh script just kills all the rtctkStartComponent instances using killall.
The rtctkRtcSuper_show_oldb.sh script provides a simple way of keeping an eye on the fake OLDB contents.
The rtctkSendCommand.sh script is a simple wrapper for the rtctkClient which passes the SDE file argument. The script assumes it can find this file in the directory identified by $REPO_DIR.
Todo¶
User extensions. Provide a mechanism for users to add their own functionality for e.g. “InError”, “SetMode”, “Run(thing)”.
SetMode. Mode setting by populating parts of the Runtime Configuration Repository is currently not supported.