Distributed Minimalistic SRTC System

The provided Putting Everything Together for a Minimalistic SRTC System (fileBased) example can run as it is on a single machine. This tutorial explains what changes are needed to run the example distributed over multiple machines, in particular over two machines.

The example is based on the fileBased Putting Everything Together for a Minimalistic SRTC System example, and this tutorial concentrates only on the parts that need to be modified.

Prerequisites

  1. Two network-connected machines with the same ELT Development Environment installation.

  2. As this example is based on fileBased Service Discovery, the deployment relies on the file system, and this information needs to be shared between the machines. The easiest way to do this is to share the INTROOT area using a distributed file system such as NFS (see the sketch after this list).

  3. The user eltdev properly configured, and rtctk (and its dependencies) installed in the shared INTROOT area.

  4. In addition, the DDS (multicast) communication between the machines has to be configured correctly.
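As an illustration only, a minimal sketch of how the INTROOT area could be shared via NFS for the IP addresses assumed later in this tutorial; the path /home/eltdev/INTROOT is an assumption and has to be adapted to the actual installation:

# On the Simple SRTC Computation node (134.171.2.88), assuming an NFS server is installed and running:
# add the following line to /etc/exports and re-export
#   /home/eltdev/INTROOT 134.171.2.78(rw,sync,no_root_squash)
sudo exportfs -ra

# On the Simple HRTC Gateway node (134.171.2.78), mount the shared INTROOT area:
sudo mount -t nfs 134.171.2.88:/home/eltdev/INTROOT /home/eltdev/INTROOT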

Setup

The example needs two machines to run:

Node                      Description
Simple SRTC Computation   Machine where supervisor, telemetry subscriber and data task run. Nomad agent runs as server and client.
Simple HRTC Gateway       Machine where telemetry republishing and MUDPI source run. Nomad agent runs as client only.

The following assumed IP addresses are used throughout the tutorial and need to be adapted to your environment.

Node                      IP
Simple SRTC Computation   134.171.2.88
Simple HRTC Gateway       134.171.2.78

The Example Components and Deployment

The same components as in the Putting Everything Together for a Minimalistic SRTC System case are used:

On the Simple HRTC Gateway node run:

Executable                Instance (component) name
rtctkMudpiPublisher       -
rtctkTelRepub             tel_repub_1

On the Simple SRTC Computation node run:

Executable                      Instance (component) name
rtctkExampleTelSub              tel_sub_1
rtctkExampleDataTaskTelemetry   data_task_1
rtctkRtcSupervisor              rtc_sup

Nomad set-up

Two network-connected machines with the same ELT Development Environment installation are required, and we need to set up a Nomad cluster. If the Nomad agent service is already running on either machine, we need to stop it first:

systemctl stop nomad

Change the /opt/nomad/etc/nomad.d/nomad.hcl file on both nodes.

Simple SRTC Computation

On this machine the Nomad agent should run as both server and client. Replace all 127.0.0.1 IP addresses with the address of the Simple SRTC Computation machine, i.e. 134.171.2.88. The agent should treat this machine as a Computation node, which we achieve by adding the following to the client stanza (a fuller sketch of the resulting file is shown after the stanza):

client {
   ...
   meta {
      "node_type" = "Computation"
   }
   ...
}
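For orientation only, a minimal sketch of how the relevant parts of /opt/nomad/etc/nomad.d/nomad.hcl could then look on this node; settings not mentioned above (e.g. data_dir) are assumed to stay as installed, and the exact content of the shipped file may differ:

# Sketch: Nomad agent configuration for the Simple SRTC Computation node
bind_addr = "134.171.2.88"

# This node runs the (single) Nomad server ...
server {
  enabled          = true
  bootstrap_expect = 1
}

# ... and also a client, tagged as a Computation node
client {
  enabled = true
  meta {
    "node_type" = "Computation"
  }
}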

Simple HRTC Gateway

On this machine the Nomad agent should run as a client only and connect to the Nomad agent server running on the Simple SRTC Computation node. Set bind_addr to 134.171.2.78, remove the advertise and server stanzas completely, and set servers in the client stanza to 134.171.2.88:4647. The agent should treat this machine as a HrtcGateway node, which we achieve by adding the following to the client stanza (a fuller sketch of the resulting file is shown after the stanza):

client {
   ...
   servers = ["134.171.2.88:4647"]
   ...
   meta {
      "node_type" = "HrtcGateway"
   }
   ...
}
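Analogously, a minimal sketch of the relevant parts of /opt/nomad/etc/nomad.d/nomad.hcl on this node, with the advertise and server stanzas removed and the remaining settings assumed unchanged:

# Sketch: Nomad agent configuration for the Simple HRTC Gateway node
bind_addr = "134.171.2.78"

# Client only: register with the Nomad server on the Computation node
# and tag this machine as a HrtcGateway node
client {
  enabled = true
  servers = ["134.171.2.88:4647"]
  meta {
    "node_type" = "HrtcGateway"
  }
}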

After modification of the Nomad configuration files on both nodes we need to restart the Nomad agent service as follows:

systemctl start nomad

And check the status:

systemctl status nomad

Note

For details on how to configure and run Nomad please see the Nomad documentation.

Sometimes it is necessary to delete the Nomad persistent files before starting the service. This can be done by executing rm -rf /opt/nomad/var/*, but only when the Nomad agent is stopped!
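Put together, a clean restart of the agent on a node looks as follows:

systemctl stop nomad       # make sure the Nomad agent is stopped
rm -rf /opt/nomad/var/*    # delete the Nomad persistent files
systemctl start nomad      # start the agent with a clean state
systemctl status nomad     # verify that the agent is running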

Note

Since the Nomad agents run under eltdev, the SRTC components are also executed under the same account.

Modifications

The baseline Putting Everything Together for a Minimalistic SRTC System code example for modification can be found in:

_examples/exampleEndToEnd

To run the system distributed over two nodes we need to change the configuration and the script used to run the example.

Configuration

Depending on which machine a particular component runs on, we need to set the proper IP addresses for the fields

  • req_rep_endpoint

  • pub_sub_endpoint

in the config/resource/config/rtctk/exampleEndToEnd/service_disc.yaml file as follows:

common:

    ...

rtc_sup:
    req_rep_endpoint:
        type: RtcString
        value: zpb.rr://134.171.2.88:12081/
    pub_sub_endpoint:
        type: RtcString
        value: zpb.ps://134.171.2.88:12082/
tel_repub_1:
    req_rep_endpoint:
        type: RtcString
        value: zpb.rr://134.171.2.78:12083/
    pub_sub_endpoint:
        type: RtcString
        value: zpb.ps://134.171.2.78:12084/
tel_sub_1:
    req_rep_endpoint:
        type: RtcString
        value: zpb.rr://134.171.2.88:12085/
    pub_sub_endpoint:
        type: RtcString
        value: zpb.ps://134.171.2.88:12086/
data_task_1:
    req_rep_endpoint:
        type: RtcString
        value: zpb.rr://134.171.2.88:12087/
    pub_sub_endpoint:
        type: RtcString
        value: zpb.ps://134.171.2.88:12088/

Nomad job files

For each component we need to create a corresponding Nomad job configuration file.

Telemetry Republisher

Here is an example of a Nomad job file for the Telemetry Republisher component (TelRePub.nomad):

job "TelRePub" {
  datacenters = ["dc1"]
  type        = "batch"

  # Run this job only on the node tagged as HrtcGateway
  constraint {
    attribute = "${meta.node_type}"
    value     = "HrtcGateway"
  }

  group "deploy_tkjob1_group" {
    task "deploy_tkjob1_task" {
      driver = "raw_exec"

      config {
        command = "/bin/bash"

        args = [
          "-l",
          "-c",
          "rtctkTelRepub tel_repub_1 file:$PREFIX/run/exampleEndToEnd/service_disc.yaml",
        ]
      }
    }
  }
}

For the other components we need to rename the job, modify the value in the constraint stanza so that the component is started on the correct node (HrtcGateway in the example above), and modify the command to be executed, which is defined in args (rtctkTelRepub tel_repub_1 file:$PREFIX/run/exampleEndToEnd/service_disc.yaml in the example above).

Telemetry Subscriber

The script rtctkExampleEndToEnd.sh located in scripts/src/, which among other things starts the executables, has to be run on the Simple SRTC Computation node. However, before it is run, some modifications are needed.

The relevant changes in the Telemetry Subscriber component Nomad job file (TelSub.nomad) are:

job "TelSub" {
...
constraint {
    ...
    value = "Computation"
}
    ...
        "rtctkExampleTelSub tel_sub_1 file:$PREFIX/run/exampleEndToEnd/service_disc.yaml",
    ...

Data Task

The relevant changes in the Data Task component Nomad job file (DataTask.nomad) are:

job "DataTask" {
...
constraint {
    ...
    value = "Computation"
}
    ...
    "rtctkExampleDataTaskTelemetry data_task_1 file:$PREFIX/run/exampleEndToEnd/service_disc.yaml",
    ...

RTC Supervisor

The relevant changes in the RTC Supervisor component Nomad job file (RtcSupervisor.nomad) are:

job "RtcSupervisor" {
...
constraint {
    ...
    value = "Computation"
}
    ...
    "rtctkRtcSupervisor rtc_sup file:$PREFIX/run/exampleEndToEnd/service_disc.yaml",
    ...

MUDPI Publisher

The MUDPI Publisher should run on the HRTC Gateway node (HrtcGateway) and the corresponding Nomad job file (MudpiPublisher.nomad) should contain:

job "MudpiPublisher" {
...
constraint {
    ...
    value = "HrtcGateway"
}
    ...
    "rtctkMudpiPublisher -d 100000 -s 36928 192.168.4.53  `rtctkConfigTool --get  --repo file:$PREFIX/run/exampleEndToEnd/runtime_repo --path /tel_repub_1/static/mudpi_receivers/rcv_00/port` `rtctkConfigTool --get  --repo file:$PREFIX/run/exampleEndToEnd/runtime_repo --path /tel_repub_1/static/topics/topic_00/mudpi_topic`"
    ...

Running the Example

Deployment

After the modifications and installation (waf install) we should execute:

rtctkExampleEndToEnd deploy

which copies the modified configuration (as described in the section Configuration) and the other files needed for the deployment into the (shared) $INTROOT/run/exampleEndToEnd area.
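Since the INTROOT area is shared between the machines, the deployed files should now be visible on both nodes; a quick sanity check (assuming $INTROOT points to the shared area):

ls $INTROOT/run/exampleEndToEnd/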

As a next step we need to start the components using Nomad. First we need to export an environment variable that points to the HTTP API of the Nomad agent server (134.171.2.88 in our case) as follows:

export NOMAD_ADDR=http://134.171.2.88:4646

Whether the access to the Nomad agent server works can be checked with:

nomad job status
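It is also worth verifying that both machines have registered with the Nomad server, for example:

nomad server members    # the Simple SRTC Computation node should be listed as server
nomad node status       # both client nodes (Computation and HrtcGateway) should be listed as ready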

Finally, we instruct Nomad to start the components using the job files created as described in section Nomad job files:

nomad job run TelRePub.nomad
nomad job run TelSub.nomad
nomad job run DataTask.nomad
nomad job run RtcSupervisor.nomad
nomad job run MudpiPublisher.nomad
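The status and the logs of an individual job can also be inspected from the command line, for example:

nomad job status TelRePub      # shows the job status and its allocation IDs
nomad alloc logs <alloc-id>    # shows the logs of the given allocation

Here <alloc-id> has to be replaced by one of the allocation IDs reported by nomad job status.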

The status and logs of the jobs can be checked using a web browser pointing to http://134.171.2.88:4646 (the default Nomad HTTP port).

Running

After deployment, the example is run in the same way as explained in the Putting Everything Together for a Minimalistic SRTC System tutorial, by going through the same life-cycle.

Stopping

Components can be stopped using Nomad in the following way:

nomad job stop MudpiPublisher
nomad job stop RtcSupervisor
nomad job stop DataTask
nomad job stop TelSub
nomad job stop TelRePub

And the deployment information can be removed by executing:

rtctkExampleEndToEnd undeploy