RTC Supervisor

Note

The RTC Supervisor is NOT a deployment tool. Its role is to provide a single entry point to the RTC for state guiding, monitoring, error recovery and population of the Runtime Configuration Repository.

It responds to a number of the standard commands defined by Stdif and exports global state in the OLDB.

With the introduction of Nomad, some of the functions listed above may be removed from the RTC Supervisor, in particular those related to monitoring.

Introduction

The current release of the RTC Supervisor performs the following:

  • Loading the contents of the Runtime Configuration Repository from the Persistent Configuration Repository using the active Deployment Set during initialisation.

  • Guiding the state of all SRTC components by forwarding state change requests to them.

  • Evaluating the overall state of the RTC by monitoring the state of all supervised SRTC components.

  • Monitoring the liveliness of all SRTC components and detecting if any have crashed.

In subsequent releases it will also perform the following:

  • Providing a simple interface for recovery when one or more components are in error.

  • Monitoring any error events generated by RTC components and generating an overall error indication.

  • Implementing a mode switching interface by reloading parts of the Runtime Configuration Repository from the Persistent Configuration Repository.

  • Providing a means of updating the Persistent Configuration Repository with items which have been changed in the Runtime Configuration Repository.

  • Acting as a base class to which instrument specific functionality and interfaces can be added.

The RTC Supervisor implementation is divided into a library and a server, with the server implementing the usable component. In addition, there are a number of test programs used as integration tests, and an example implementation of some deployment code which launches a set of components defined in a YAML file.

The library implements the functionality required for communicating with a set of supervised rtcObjects and for sending commands to them by means of rtcCommandRequest and rtcCommandRequestSeries. The complete configuration of the rtcObjects is managed by the rtcObjectConfig class, which reads its configuration from the Runtime Configuration Repository.

Note

The RTC Supervisor component is currently based on the rtctkExampleComponent structure, i.e. it provides business logic which implements the various activities. This may well change, and the component may become a simple server implementing the Stdif commands directly.

Launching the Server

Because it is based on the rtctkExampleComponent, the RTC Supervisor server program has the following command line:

$ rtctkRtcSupervisor -h
Options:
  -h [ --help ]         print help messages
  -i [ --cid ] arg      component identity
  -s [ --sde ] arg      service discovery endpoint

The name of the RTC Supervisor instance and the service discovery URI (currently only a YAML file) must be provided, as shown in the following example command, which starts the process:

$ rtctkRtcSupervisor \
    -i rtc_sup \
    -s file:$INTROOT/run/exampleEndToEnd/fileBased/service_disc.yaml

Populating the Runtime Configuration Repository

The contents of the Runtime Configuration Repository are automatically loaded from the Persistent Configuration Repository during the initialising state. This happens before any access to the Runtime Configuration Repository and before any state guiding, so that the configuration is available before any component tries to access it.

The automatic loading only occurs if the Persistent Configuration Repository is available. Otherwise this step is skipped and the Runtime Configuration Repository is assumed to be populated by some other means, e.g. by using rtctkConfigTool to populate the repository manually beforehand (see Populate). The Persistent Configuration Repository is available when the persistent_repo_endpoint service is configured in the Service Discovery. Currently this simply means that the following example entry should be present in the Service Discovery YAML file (where /persistent_repo should be replaced with the appropriate path):

common:
    persistent_repo_endpoint:
        type: RtcString
        value: file:/persistent_repo
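
The availability check described above can be sketched as follows. This is only an illustration, not the toolkit's implementation: it assumes the Service Discovery YAML file has already been parsed into nested dictionaries, and the function name persistent_repo_endpoint is hypothetical.

```python
from typing import Any, Mapping, Optional


def persistent_repo_endpoint(service_disc: Mapping[str, Any]) -> Optional[str]:
    """Return the configured persistent repository URI, or None if absent.

    `service_disc` is assumed to be the parsed Service Discovery YAML file.
    This helper is hypothetical and only mirrors the rule stated in the text.
    """
    common = service_disc.get("common") or {}
    entry = common.get("persistent_repo_endpoint")
    if not isinstance(entry, Mapping):
        return None
    value = entry.get("value")
    # An empty or non-string value is treated as "not configured".
    return value if isinstance(value, str) and value else None


# Parsed form of the example entry shown above.
example = {
    "common": {
        "persistent_repo_endpoint": {
            "type": "RtcString",
            "value": "file:/persistent_repo",
        }
    }
}
```

With the example entry the helper returns the URI; with the entry missing it returns None, which corresponds to the case where the automatic loading is skipped.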

It is also possible to disable the automatic loading of the datapoints into the Runtime Configuration Repository, even if the Persistent Configuration Repository is available, by setting the /disable_populate_runtime_repo boolean datapoint to true in the Persistent Configuration Repository. For example, by running the following command (where /persistent_repo should be replaced with the appropriate path):

$ rtctkConfigTool --persistent-repo-endpoint file:/persistent_repo \
        set persistent /disable_populate_runtime_repo True

By default, the Runtime Configuration Repository is populated by loading the active Deployment Set indicated by the /active_deployment string datapoint in the Persistent Configuration Repository. The details of the configuration layout for Deployment Sets are described in section Configuration Layout. For certain testing scenarios, it may be easier to perform a simple 1-to-1 copy of the datapoint hierarchy in the Persistent Configuration Repository, rather than applying full handling of Deployment Sets. For such cases, it is possible to force the simple 1-to-1 copy by setting the /simple_populate_runtime_repo boolean datapoint to true in the Persistent Configuration Repository as follows:

$ rtctkConfigTool --persistent-repo-endpoint file:/persistent_repo \
        set persistent /simple_populate_runtime_repo True

Note

When forcing a simple 1-to-1 copy of the datapoints, the interpretation of the contents of the Persistent Configuration Repository is completely different. In such a case, the datapoint hierarchy should follow the Runtime Configuration Repository instead.
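
The rules in this section can be summarised as a small decision function. The sketch below is hypothetical: the names PopulateMode and choose_populate_mode are invented for illustration, the persistent datapoints are modelled as a plain dictionary, and the precedence of /disable_populate_runtime_repo over /simple_populate_runtime_repo when both are set is an assumption, not something stated above.

```python
from enum import Enum
from typing import Mapping, Optional


class PopulateMode(Enum):
    SKIP = "skip"                  # leave the runtime repository untouched
    SIMPLE_COPY = "simple_copy"    # 1-to-1 copy of the datapoint hierarchy
    DEPLOYMENT_SET = "deployment"  # load the active Deployment Set


def choose_populate_mode(
    persistent_endpoint: Optional[str],
    persistent_datapoints: Mapping[str, object],
) -> PopulateMode:
    """Pick how the Runtime Configuration Repository would be populated."""
    if not persistent_endpoint:
        # No Persistent Configuration Repository configured: nothing to load.
        return PopulateMode.SKIP
    if persistent_datapoints.get("/disable_populate_runtime_repo") is True:
        # Assumption: the disable flag wins over the simple-copy flag.
        return PopulateMode.SKIP
    if persistent_datapoints.get("/simple_populate_runtime_repo") is True:
        return PopulateMode.SIMPLE_COPY
    return PopulateMode.DEPLOYMENT_SET
```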

State Guiding

Currently, state guiding is performed from the activities defined by the ExampleComponent. In each activity, AllObjectRequestList() is used to send a set of commands, either in series or in parallel, to the list of supervised components.

The RTC Supervisor implements activities for and understands the following commands:

  • Init

  • Reset

  • Recover (currently empty)

  • Enable

  • Disable

Note

Run and Idle are not in the above list. The baseline currently is that the sequencer code will be responsible for sending the Run commands to the RTC components in the correct order. This may be re-evaluated.

There are configuration flags available for each of these activities indicating if they should be performed in parallel or series on the list of supervised components.
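
The difference between the two modes can be illustrated with a minimal sketch. It does not use the toolkit's AllObjectRequestList(); the function forward_command, the send callback and fake_send are hypothetical stand-ins, and threads are used here only to demonstrate concurrent dispatch.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def forward_command(
    components: Sequence[str],
    command: str,
    send: Callable[[str, str], str],
    parallel: bool,
) -> Dict[str, str]:
    """Forward `command` to every component and collect the replies."""
    if not components:
        return {}
    if not parallel:
        # Series: each component is commanded only after the previous replied.
        return {name: send(name, command) for name in components}
    # Parallel: all requests are issued concurrently, then replies gathered.
    with ThreadPoolExecutor(max_workers=len(components)) as pool:
        futures = {name: pool.submit(send, name, command) for name in components}
        return {name: future.result() for name, future in futures.items()}


def fake_send(name: str, command: str) -> str:
    """Stand-in for a real request; it only echoes what it was asked."""
    return f"{name}:{command}:OK"
```

Both modes return the same replies for a well-behaved set of components; they differ only in whether one component may start its activity before another has finished.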

Commanding

To send one of the supported commands you can use the rtctkSendCommand.sh script which makes use of the rtctkClient program implementing the Stdif client interface. For example:

$ rtctkSendCommand.sh rtc_sup Init
$ rtctkSendCommand.sh rtc_sup Reset
$ rtctkSendCommand.sh rtc_sup Recover
$ rtctkSendCommand.sh rtc_sup Enable
$ rtctkSendCommand.sh rtc_sup Disable

Here rtc_sup is the name which was passed to the RTC Supervisor with the -i flag. The rtctkSendCommand.sh script looks in the directory given by the $REPO_DIR environment variable for the service_disc.yaml file, in which it looks up the required URIs.
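
The lookup of the service discovery file can be sketched as below. This is a hypothetical Python equivalent of what the shell script is described as doing, not a transcription of the script; the function name and the error handling are assumptions.

```python
import os
from pathlib import Path
from typing import Mapping


def service_discovery_file(env: Mapping[str, str] = os.environ) -> Path:
    """Return the expected location of service_disc.yaml under $REPO_DIR."""
    repo_dir = env.get("REPO_DIR")
    if not repo_dir:
        # Assumption: a missing variable is treated as a hard error.
        raise RuntimeError("REPO_DIR is not set; cannot locate service_disc.yaml")
    return Path(repo_dir) / "service_disc.yaml"
```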

State Evaluation

When the object configuration is built from the Runtime Configuration Repository, a list of publish/subscribe URIs is created, one per supervised component. The business logic creates a StateSubscriber with the list of URIs. The StateSubscriber’s callback is invoked whenever an event is received, and the RtcObjectConfig::OnStateEventReceived() method is called. This sets the state attribute of the identified RtcObject in the object list, then evaluates the believed overall state and substate of the system and publishes them to the OLDB.

Typical contents of the file-based OLDB when the system is operational and supervising two components, object1 and object2, would be:

rtc_sup:
  global_display_state:
    type: RtcString
    value: On:Operational:Idle
  global_state:
    type: RtcString
    value: operational
  global_substate:
    type: RtcString
    value: idle
  global_error:
    type: RtcBool
    value: false
  global_error_who:
    type: RtcString
    value: ""
  state:
    type: RtcString
    value: "On:Operational:Idle;On:Operational:Update:Idle;"
object1:
  state:
    type: RtcString
    value: "On:Operational:Idle;On:Operational:Update:Idle;"
object2:
  state:
    type: RtcString
    value: "On:Operational:Idle;On:Operational:Update:Idle;"
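
The state strings in the listing above can be taken apart with a few lines of code. The sketch below assumes, from the example values only, that ";" terminates each segment of the state string and that ":" separates nested state names within a segment; the function name parse_state is hypothetical.

```python
from typing import List


def parse_state(value: str) -> List[List[str]]:
    """Split a state string into segments, each a list of nested state names."""
    # Empty pieces (e.g. after the trailing ';') are dropped.
    return [segment.split(":") for segment in value.split(";") if segment]


segments = parse_state("On:Operational:Idle;On:Operational:Update:Idle;")
```

For the example value this yields two segments, the first with three levels (On, Operational, Idle) and the second with four (On, Operational, Update, Idle).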

Asynchronous Detection of Component Failure

Asynchronous monitoring is performed by the rtcMonitor class, an instance of which is a member of the rtcServer. The rtcMonitor's constructor creates a thread which, while the monitor is active, periodically calls the rtcServer's MonitorCycle() method.

The rtcServer marks the monitor as being active whenever the state is at least NotOperational:Ready.

The rtcServer's MonitorCycle() method uses AllObjectRequestList() to send a GetVersion command with a short timeout to each component. If the command fails, the rtcObject sending the command marks the component as having generated an exception, and no further commands are sent to it.

If a component does fail, the InError method is called to set the error flag and record the name of the component in error.
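
A single monitoring cycle as described above might look like the following sketch. It is not the rtcServer code: the class MonitorSketch, the ping callback (standing in for a GetVersion request) and the default timeout value are all hypothetical.

```python
from typing import Callable, Dict, List


class MonitorSketch:
    """Hypothetical model of one monitoring cycle over supervised components."""

    def __init__(self, components: List[str], ping: Callable[[str, float], None]):
        self._ping = ping  # stand-in for a GetVersion request; raises on failure
        self._failed: Dict[str, bool] = {name: False for name in components}
        self.global_error = False
        self.global_error_who = ""

    def in_error(self, who: str) -> None:
        """Set the error flag and record which component is in error."""
        self.global_error = True
        self.global_error_who = who

    def monitor_cycle(self, timeout: float = 0.5) -> None:
        for name in list(self._failed):
            if self._failed[name]:
                continue  # a failed component is not commanded again
            try:
                self._ping(name, timeout)
            except Exception:
                self._failed[name] = True
                self.in_error(name)
```

In this sketch a component that fails once is skipped in all later cycles, matching the behaviour described for the rtcObject.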

Error Notification

In general, when the RTC Supervisor notices that something has gone wrong, it calls the RtcSupervisor::InError method, which updates the OLDB with the error and an indication of the cause.

Mutex Usage

A std::mutex is available in the RtcSupervisor class which can be used to globally lock the component.

This is used, for example, to prevent the monitor thread from trying to “ping” the components while an activity is active.

As new extension points and the ability to add interfaces to the RtcSupervisor are added, programmers will need to make use of this facility.
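
The locking pattern can be illustrated with a short sketch. It uses Python's threading.Lock in place of std::mutex, and the class and method names are hypothetical. Whether the real monitor thread blocks or skips a cycle while an activity holds the lock is not stated here; the sketch assumes it skips.

```python
import threading
from typing import List


class GuardedSupervisor:
    """Hypothetical model of a component-wide lock shared by two code paths."""

    def __init__(self) -> None:
        self.lock = threading.Lock()
        self.log: List[str] = []

    def run_activity(self, name: str) -> None:
        # The activity holds the lock for its whole duration.
        with self.lock:
            self.log.append(f"activity:{name}")

    def monitor_cycle(self) -> bool:
        # Assumption: skip this cycle rather than block behind an activity.
        if not self.lock.acquire(blocking=False):
            return False
        try:
            self.log.append("ping")
            return True
        finally:
            self.lock.release()
```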

Configuration

The supervisor has the following static parameters which define whether the associated activities are started in the supervised components in parallel or series.

Listing 10 rtc_sup.yaml
static:
    init_alone:
        type: RtcBool
        value: true
    enable_alone:
        type: RtcBool
        value: true
    disable_alone:
        type: RtcBool
        value: false
    update_alone:
        type: RtcBool
        value: false

The supervisor needs to get a list of components which are supervised. As an INTERIM MEASURE, these are read from a DEPL table. Any component that implements RTC deployment set start/stop is likely to populate something similar. The functionality regarding the usage of this DEPL table will be revisited.

The RTC Supervisor only reads the object_list attribute; the other attributes are used by the deployment component. The presence of rtc_sup in the object list is optional: it is used by the DEPL component for deployment, and the supervisor skips it if found. If you give your RTC Supervisor a different name, you will need to modify the RTC Supervisor to skip that name; see the function RtcObjectsDescription::LoadFromRuntimeRepo() in the rtcObjectConfig.cpp file.

The following is an example of a YAML file containing the object_list attribute:

Listing 11 depl.yaml
object_list:
    type: RtcString
    value: "rtc_sup object1 object2"
rtcSupervisor:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkRtcSupervisor"
object1:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkExampleComponent"
object2:
    host:
        type: RtcString
        value: "localhost"
    exe:
        type: RtcString
        value: "rtctkExampleComponent"
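
How the object_list value in the listing above translates into the set of supervised components can be sketched as follows. The function name is hypothetical and the whitespace-separated format is inferred from the example value; the real logic lives in RtcObjectsDescription::LoadFromRuntimeRepo().

```python
from typing import List


def supervised_components(object_list: str, supervisor_name: str = "rtc_sup") -> List[str]:
    """Split the object list and drop the supervisor's own name if present."""
    return [name for name in object_list.split() if name != supervisor_name]
```

The result is the same whether or not rtc_sup appears in the list, which reflects the statement that its presence is optional.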

Deployment and Other Scripts

To aid testing of the RTC Supervisor, a simple Python deployment script is provided. It accepts a file like depl.yaml and deploys each of the components identified in the object_list using the rtctkStartObject.sh wrapper, passing the executable name and the component name.

The rtctkStartObject.sh script acts as a simple wrapper, so that all RTC components can be identified by looking for rtctkStartObject.sh in the command line. The script launches the object and waits for its completion (no use of Nomad is made).
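
The deployment step can be sketched as the construction of one command line per component. This is not the provided script: the function start_commands is hypothetical, the input is assumed to be the deployment YAML file already parsed into nested dictionaries, and the argument order (executable name, then component name) is an assumption based on the description above.

```python
from typing import Any, Dict, List, Mapping


def start_commands(depl: Mapping[str, Any]) -> Dict[str, List[str]]:
    """Build a wrapper command line for each deployable object."""
    commands: Dict[str, List[str]] = {}
    for name in depl["object_list"]["value"].split():
        entry = depl.get(name)
        if entry is None:
            continue  # no deployment details recorded under this name
        # Assumed argument order: executable first, then component name.
        commands[name] = ["rtctkStartObject.sh", entry["exe"]["value"], name]
    return commands


# Reduced, parsed form of a deployment file with a single component.
example_depl = {
    "object_list": {"type": "RtcString", "value": "object1"},
    "object1": {
        "host": {"type": "RtcString", "value": "localhost"},
        "exe": {"type": "RtcString", "value": "rtctkExampleComponent"},
    },
}
```

Each resulting list could then be handed to a process launcher; the host attribute is ignored in this sketch.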

The rtctkRtcSuper_start_components.sh script copies some of the resources into a “run” directory, does some cleanup and uses the deploy mechanism described above.

The rtctkRtcSuper_stop_components.sh script just kills all the rtctkStartComponent instances using killall.

The rtctkRtcSuper_show_oldb.sh script provides a simple way of keeping an eye on the contents of the fake OLDB.

The rtctkSendCommand.sh script is a simple wrapper for the rtctkClient which passes the SDE file argument. The script assumes it can find this file in the directory identified by $REPO_DIR.

Todo

  • User extensions. Provide a mechanism for users to add their own functionality, e.g. for “InError”, “SetMode”, “Run(thing)”.

  • SetMode. Mode setting by populating parts of the Runtime Configuration Repository is currently not supported.