RTC Supervisor

The RTC Supervisor is NOT a deployment tool. Its role is to provide a single entry point to the RTC for state guiding, monitoring, error recovery and population of the Runtime Configuration Repository.

It responds to a number of the standard commands defined by Stdif and exports global state in the OLDB.

Note

With the introduction of Nomad, some of the functions listed above, in particular those related to monitoring, may be removed from the RTC Supervisor.

Introduction

The current release of the RTC Supervisor performs the following:

  • Loading the contents of the Runtime Configuration Repository from the Persistent Configuration Repository during initialisation.

  • Guiding the state of all SRTC components by forwarding state change requests to them.

  • Evaluating the overall state of the RTC by monitoring the state of all supervised SRTC components.

  • Monitoring the liveness of all SRTC components and detecting if any have crashed.

Subsequent releases are expected to add the following:

  • Providing a simple interface for recovery when one or more components are in error.

  • Monitoring any error events generated by RTC components and generating an overall error indication.

  • Properly handling the Deployment Sets in the Persistent Configuration Repository, rather than simply performing a 1-to-1 copy when loading the contents of the Runtime Configuration Repository.

  • Implementing a mode switching interface by reloading parts of the Runtime Configuration Repository from the Persistent Configuration Repository.

  • Providing a means of updating the Persistent Configuration Repository with items which have been changed in the Runtime Configuration Repository.

  • Acting as a base class to which instrument-specific functionality and interfaces can be added.

The RTC Supervisor implementation is divided into a library and a server, with the server implementing the usable component. In addition, there are a number of test programs used as integration tests and an example implementation of some deployment code, which allows a set of components defined in a YAML file to be launched.

The library implements the functionality required for communicating with a set of supervised rtcObjects and sending commands to them as lists, using rtcCommandRequest and rtcCommandRequestSeries. The complete configuration of the rtcObjects is managed by the rtcObjectConfig class, which reads its configuration from the Runtime Configuration Repository.

The RTC Supervisor component is currently based on the rtctkExampleComponent structure, i.e. it provides a business logic which implements the various activities. THIS MAY WELL CHANGE and become a simple server implementing the Stdif commands directly.

Launching the Server

Being based on the rtctkExampleComponent, the RTC Supervisor server program has the following command line:

rtctkRtcSupervisor -h
Options:
  -h [ --help ]         print help messages
  -i [ --cid ] arg      component identity
  -s [ --sde ] arg      service discovery endpoint

The name of the RTC Supervisor instance and the service discovery URI (currently only a YAML file) must be provided, as is shown in the following example command to start the process:

rtctkRtcSupervisor -i rtc_sup -s file:$INTROOT/run/exampleEndToEnd/fileBased/service_disc.yaml

Populating the Runtime Configuration Repository

The contents of the Runtime Configuration Repository are automatically loaded from the Persistent Configuration Repository during the initialising state. This happens before the Runtime Configuration Repository is accessed and before any state guiding, so that the configuration is available before any component tries to read it.

Note

Currently only a 1-to-1 copy of the datapoints is performed from the Persistent Configuration Repository to the Runtime Configuration Repository and Deployment Sets are not taken into account. This will change in future versions of the RTC Toolkit.

The automatic loading only occurs if the Persistent Configuration Repository is available. Otherwise this step is skipped and the Runtime Configuration Repository is assumed to be populated by some other means, e.g. using rtctkConfigTool to manually populate the repository beforehand (see Populate-from). The Persistent Configuration Repository is available when the persist_repo_endpoint service is configured in the Service Discovery. Currently this simply means that the following example entry should be present in the Service Discovery YAML file (where /persistent_repo should be replaced with the appropriate path):

common:
    persist_repo_endpoint:
        type: RtcString
        value: file:/persistent_repo

It is also possible to disable the automatic loading of the datapoints into the Runtime Configuration Repository, even if the Persistent Configuration Repository is available, by setting the /common/disable_populate_runtime_repo boolean datapoint to true in the Persistent Configuration Repository. This can be done, for example, by running the following command (where /persistent_repo should be replaced with the appropriate path):

rtctkConfigTool --repo file:/persistent_repo --path /common/disable_populate_runtime_repo \
                --set --value 1 --type RtcBool
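The conditions described above can be summarised in a small decision function. This is a minimal sketch in Python, not the toolkit's own code: the function name should_populate_runtime_repo and the use of plain dictionaries to stand in for the Service Discovery file and the Persistent Configuration Repository are assumptions made for illustration.

```python
def should_populate_runtime_repo(service_disc, persistent_repo):
    """Return True if the automatic 1-to-1 copy should be performed.

    service_disc    -- dict mirroring the Service Discovery YAML file
    persistent_repo -- dict mapping datapoint paths to values (hypothetical
                       stand-in for the Persistent Configuration Repository)
    """
    common = service_disc.get("common", {})
    # The Persistent Configuration Repository counts as available only
    # when the persist_repo_endpoint service is configured.
    if "persist_repo_endpoint" not in common:
        return False
    # Loading can still be switched off explicitly with a boolean datapoint.
    disabled = persistent_repo.get("/common/disable_populate_runtime_repo", False)
    return not disabled


service_disc = {
    "common": {
        "persist_repo_endpoint": {"type": "RtcString", "value": "file:/persistent_repo"}
    }
}
print(should_populate_runtime_repo(service_disc, {}))  # True
print(should_populate_runtime_repo(
    service_disc, {"/common/disable_populate_runtime_repo": True}))  # False
```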

State Guiding

Currently, state guiding is performed from the activities defined by the ExampleComponent. In each activity, AllObjectRequestList() is used to send a series of commands, in series or in parallel, to the list of supervised components.

The RTC Supervisor implements activities for, and understands, the following commands:

  • Init

  • Reset

  • Recover (currently empty)

  • Enable

  • Disable

Note that Run and Idle are not in the above list. The current baseline is that the sequencer code will be responsible for sending the Run commands to the RTC components in the correct order. This may be re-evaluated.

There are configuration flags available for each of these activities indicating if they should be performed in parallel or series on the list of supervised components.
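The difference between the two dispatch modes can be sketched as follows. This is an illustration in Python only: send_to_all, fake_send and the in_series argument are hypothetical names, and how the real configuration flags map onto the two modes is not shown here.

```python
from concurrent.futures import ThreadPoolExecutor


def send_to_all(components, command, send, in_series):
    """Send `command` to every component and return {name: reply}.

    send      -- callable (name, command) -> reply; hypothetical stand-in
                 for the real request mechanism
    in_series -- True: one component at a time, in list order;
                 False: all components concurrently
    """
    if in_series:
        return {name: send(name, command) for name in components}
    with ThreadPoolExecutor(max_workers=max(1, len(components))) as pool:
        futures = {name: pool.submit(send, name, command) for name in components}
        # result() blocks until that request has finished and re-raises errors.
        return {name: future.result() for name, future in futures.items()}


def fake_send(name, command):
    # Pretend that every component accepts every command.
    return f"{name}: {command} OK"


replies = send_to_all(["object1", "object2"], "Init", fake_send, in_series=True)
print(replies)
```

In series mode a slow or failing component delays all those after it in the list, whereas in parallel mode the total time is bounded by the slowest component.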

Commanding

To send one of the supported commands, use the rtctkSendCommand script, which makes use of the rtctkClient program implementing the Stdif client interface, as follows:

rtctkSendCommand rtc_sup Init
rtctkSendCommand rtc_sup Reset
rtctkSendCommand rtc_sup Recover
rtctkSendCommand rtc_sup Enable
rtctkSendCommand rtc_sup Disable

Here rtc_sup is the name passed to the rtcSupervisor with the -i flag. The rtctkSendCommand script looks in the directory given by the environment variable $REPO_DIR for the service_disc.yaml file, which it uses to look up the required URIs.

State Evaluation

When the object configuration is built from the Runtime Configuration Repository, a list of publish-subscribe URIs is created, one per supervised component. The business logic creates a StateSubscriber with the list of URIs. The StateSubscriber's callback is called whenever an event is received, and it in turn calls the rtcObjectConfig::OnStateEventReceived() method. This method sets the state attribute of the identified rtcObject in the object list, then evaluates the believed state/substate of the system and publishes it in the OLDB.

Typical contents of the OLDB when the system is operational and supervising two components, object1 and object2, would be:

rtc_sup:
  global_display_state:
    type: RtcString
    value: On.Operational.Idle
  global_state:
    type: RtcString
    value: operational
  global_substate:
    type: RtcString
    value: idle
  global_error:
    type: RtcBool
    value: false
  global_error_who:
    type: RtcString
    value: ""
  state:
    type: RtcString
    value: "On.Operational.Idle On.Operational.Update.Idle "
object1:
  state:
    type: RtcString
    value: "On.Operational.Idle On.Operational.Update.Idle "
object2:
  state:
    type: RtcString
    value: "On.Operational.Idle On.Operational.Update.Idle "
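The update-then-evaluate pattern described in this section can be sketched as below. This is a Python illustration only: the class ObjectStates and the evaluation rule (all components must agree, otherwise the result is "Unknown") are assumptions, since the actual rule used by rtcObjectConfig is not given here.

```python
class ObjectStates:
    """Tracks the last reported state of each supervised component."""

    def __init__(self, names):
        # None means that no state event has been received yet.
        self.states = {name: None for name in names}

    def on_state_event(self, name, state):
        """Record a new state and return the re-evaluated global state."""
        self.states[name] = state
        return self.evaluate()

    def evaluate(self):
        # ASSUMPTION: the system is taken to be in a state only if every
        # component reports exactly that state; otherwise it is "Unknown".
        reported = set(self.states.values())
        if len(reported) == 1 and None not in reported:
            return reported.pop()
        return "Unknown"


objects = ObjectStates(["object1", "object2"])
print(objects.on_state_event("object1", "On.Operational.Idle"))  # Unknown
print(objects.on_state_event("object2", "On.Operational.Idle"))  # On.Operational.Idle
```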

Asynchronous Detection of Component Failure

Asynchronous monitoring is performed by the rtcMonitor class. The rtcServer has a member which is an rtcMonitor. The rtcMonitor's constructor creates a thread which, when the monitor is active, periodically calls the rtcServer's MonitorCycle() method.

The rtcServer marks the monitor as being active whenever the state is at least NotOperational/Ready.

The rtcServer's MonitorCycle() method uses the AllObjectRequestList() to send a GetVersion command with a short timeout to each component. If the command fails, the rtcObject sending the command marks the component as having generated an exception, and no further commands are sent to it.

If a component does fail then the InError method is called to set the error flag and record the name of the component in error.
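A single monitoring pass of this kind can be sketched as follows. This is a Python illustration rather than the rtcServer code: monitor_cycle, fake_ping and the callback arguments are hypothetical names, and the ping callable merely stands in for a GetVersion request with a short timeout.

```python
def monitor_cycle(components, ping, failed, in_error):
    """One monitoring pass over all supervised components.

    ping     -- callable(name) that raises on failure or timeout;
                hypothetical stand-in for a GetVersion request
    failed   -- set of names already marked as failed (updated in place)
    in_error -- callable(name) invoked once for each newly failed component
    """
    for name in components:
        if name in failed:
            continue  # no further commands are sent to a failed component
        try:
            ping(name)
        except Exception:
            failed.add(name)
            in_error(name)


def fake_ping(name):
    # Simulate object2 having crashed.
    if name == "object2":
        raise TimeoutError(f"{name} did not answer GetVersion")


failed, errors = set(), []
monitor_cycle(["object1", "object2"], fake_ping, failed, errors.append)
monitor_cycle(["object1", "object2"], fake_ping, failed, errors.append)
print(sorted(failed), errors)  # ['object2'] ['object2']
```

The second pass does not report object2 again because the component is skipped once it has been marked as failed.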

Error Notification

In general, when the rtcSupervisor notices that something has gone wrong, it calls the rtcSupervisor::InError method, which updates the OLDB with the error and an indication of the cause.

Mutex Usage

A std::mutex is available in the RtcSupervisor class which can be used to globally lock the component.

This is used to prevent, for example, the monitor thread from trying to “ping” the components while an activity is running.

As new extension points and the ability to add interfaces to the RtcSupervisor are added it will be necessary for programmers to make use of this facility.
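The locking pattern can be illustrated with the following sketch, in which Python's threading.Lock plays the role of the std::mutex. The names component_lock and monitor_tick are hypothetical, and the choice to skip a monitoring cycle rather than block while the lock is held is an assumption made for this example.

```python
import threading

component_lock = threading.Lock()  # plays the role of the std::mutex
log = []


def monitor_tick():
    """Ping the components unless an activity currently holds the lock."""
    # ASSUMPTION: the monitor skips this cycle instead of waiting.
    if not component_lock.acquire(blocking=False):
        log.append("monitor skipped")
        return False
    try:
        log.append("monitor ping")
        return True
    finally:
        component_lock.release()


with component_lock:  # simulate an activity in progress
    monitor_tick()    # lock is held, so the cycle is skipped
monitor_tick()        # lock is free, so the components are pinged
print(log)  # ['monitor skipped', 'monitor ping']
```

Any extension code that talks to the supervised components would take the same lock for the duration of its work.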

Configuration

The supervisor has the following static parameters which define whether the associated activities are started in the supervised components in parallel or series.

Listing 1 rtc_sup.yaml
cfg_static:
    init_alone:
        type: RtcBool
        value: true
    enable_alone:
        type: RtcBool
        value: true
    disable_alone:
        type: RtcBool
        value: false
    update_alone:
        type: RtcBool
        value: false

The supervisor needs to get a list of components which are supervised. As an INTERIM MEASURE, these are read from a DEPL table. It is likely that whatever component implements RTC deployment set starting/stopping will populate something similar. Any functionality regarding the usage of this DEPL table will be revisited.

The RTC Supervisor only reads the object_list attribute; the others are used by the deployment component. The presence of rtc_sup in the object list is optional: it is used by the DEPL component for deployment, and the supervisor skips it if found. If you call your rtcSupervisor something else, you will need to modify the rtcSupervisor to skip this new name. Look at the code in the rtcObjectConfig.cpp with the comment

Listing 2 depl.yaml
object_list:
    type: RtcString
    value: "rtc_sup object1 object2"
rtcSupervisor:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkRtcSupervisor"
object1:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkExampleComponent"
object2:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkExampleComponent"
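How the list of supervised components is derived from the object_list attribute can be sketched as follows. This is a Python illustration with a hypothetical function name, supervised_objects, and a dictionary mirroring the relevant part of Listing 2. The real logic lives in rtcObjectConfig.cpp and is not reproduced here.

```python
def supervised_objects(depl, supervisor_name="rtc_sup"):
    """Return the names in object_list, without the supervisor itself."""
    # object_list is a single string of whitespace-separated names.
    names = depl["object_list"]["value"].split()
    return [name for name in names if name != supervisor_name]


depl = {
    "object_list": {"type": "RtcString", "value": "rtc_sup object1 object2"},
}
print(supervised_objects(depl))  # ['object1', 'object2']
```

Passing the name as a parameter, as in this sketch, shows why a supervisor started under a different name would otherwise end up in its own list of supervised components.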

Deployment and Other Scripts

To aid testing of the rtcSupervisor, a simple Python deployment script is provided. It accepts a file like depl.yaml and deploys each of the components identified in the object_list using the rtctkStartObject.sh wrapper, passing the executable name and the component name.

The rtctkStartObject.sh script acts as a simple wrapper, allowing all RTC components to be identified by looking for rtctkStartObject.sh on the command line. The script launches the object and waits for its completion. Nomad is not used.

The rtctkRtcSuper_start_components.sh script copies some of the resources into a “run” directory, does some cleanup and uses the deploy mechanism described above.

The rtctkRtcSuper_stop_components.sh script just kills all the rtctkStartComponent instances using killall.

The rtctkRtcSuper_show_oldb.sh script provides a simple way of keeping an eye on the contents of the fake OLDB.

The rtctkSendCommand.sh script is a simple wrapper for the rtctkClient, passing the SDE file argument. The script assumes it can find this file in the directory identified by $REPO_DIR.

Todo

  • User extensions. Provide a mechanism for users to add their own functionality, e.g. “InError”, “SetMode”, “Run(thing)”.

  • SetMode. Mode setting by populating parts of the Runtime Configuration Repository is currently not supported.