Distributed Minimalistic SRTC System¶
The provided Putting Everything Together for a Minimalistic SRTC System (fileBased) example can run as it is on a single machine only.
This tutorial explains the changes needed to run the example distributed over multiple machines, in particular over two machines.
The example is based on the fileBased Putting Everything Together for a Minimalistic SRTC System example, and this tutorial concentrates only
on the parts that need to be modified.
Prerequisites¶
Two network-connected machines with the same ELT Development Environment installation.
As this example is based on fileBased Service Discovery, the deployment relies on the file system, and this information has to be shared between the machines. The easiest way to do this is to share the INTROOT area using a distributed file system such as NFS.
User eltdev properly configured, and rtctk (and its dependencies) installed in the shared INTROOT area.
In addition, the DDS (multicast) communication between the machines has to be configured correctly.
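As a quick sanity check of these prerequisites, something like the following can be run; this is only a minimal sketch, assuming the shared INTROOT is already mounted on both machines and using the IP addresses introduced in the setup below:
# On both machines: verify that the shared INTROOT area is visible
ls $INTROOT

# From the Simple HRTC Gateway: verify that the Simple SRTC Computation node is reachable
ping -c 3 134.171.2.88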
Setup¶
The example needs two machines to run:

Node | Description
---|---
Simple SRTC Computation | Machine where supervisor, telemetry subscriber and data task run. Nomad agent runs as server and client.
Simple HRTC Gateway | Machine where telemetry republishing and MUDPI source run. Nomad agent runs as client only.
The following IP addresses are assumed throughout the tutorial and need to be adapted to your environment accordingly.
Node | IP
---|---
Simple SRTC Computation | 134.171.2.88
Simple HRTC Gateway | 134.171.2.78
The Example Components and deployment¶
The same components as in the Putting Everything Together for a Minimalistic SRTC System case are used.
On the Simple HRTC Gateway node run:

Executable | Instance (component) name
---|---
rtctkMudpiPublisher |
rtctkTelRepub | tel_repub_1

On the Simple SRTC Computation node run:

Executable | Instance (component) name
---|---
rtctkExampleTelSub | tel_sub_1
rtctkExampleDataTaskTelemetry | data_task_1
rtctkRtcSupervisor | rtc_sup
Nomad set-up¶
A Nomad cluster needs to be set up across the two network-connected machines. If the Nomad agent service is already running on either machine, it has to be stopped first:
systemctl stop nomad
Then modify the /opt/nomad/etc/nomad.d/nomad.hcl
file on both nodes.
Simple SRTC Computation¶
On this machine the Nomad agent should run as both server and client.
Replace all 127.0.0.1 IP addresses with the one that corresponds to the Simple SRTC Computation machine, i.e. 134.171.2.88.
The agent should consider this machine a Computation node,
which we achieve by adding the following to the client
stanza:
client {
...
meta {
"node_type" = "Computation"
}
...
}
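For reference, a minimal sketch of how the complete nomad.hcl on this node could look; it assumes Nomad's default ports and a data directory of /opt/nomad/var, and the file shipped with the installation may contain additional settings:
# Sketch of /opt/nomad/etc/nomad.d/nomad.hcl on the Simple SRTC Computation node
data_dir  = "/opt/nomad/var"
bind_addr = "134.171.2.88"

# This node acts as the single Nomad server of the cluster ...
server {
  enabled          = true
  bootstrap_expect = 1
}

# ... and also as a client that accepts jobs constrained to Computation nodes.
client {
  enabled = true
  meta {
    "node_type" = "Computation"
  }
}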
Simple HRTC Gateway¶
On this machine the Nomad agent should run as a client only and connect to the Nomad agent server running on the Simple SRTC Computation node.
Set bind_addr
to 134.171.2.78, completely remove the advertise
and server
stanzas, and set servers
in the client
stanza to 134.171.2.88:4647.
The agent should consider this machine a HrtcGateway node,
which we achieve by adding the following to the client
stanza:
client {
...
servers = ["134.171.2.88:4647"]
...
meta {
"node_type" = "HrtcGateway"
}
...
}
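Correspondingly, a minimal sketch of the complete nomad.hcl on this node (again assuming default ports and the /opt/nomad/var data directory) could look as follows:
# Sketch of /opt/nomad/etc/nomad.d/nomad.hcl on the Simple HRTC Gateway node
data_dir  = "/opt/nomad/var"
bind_addr = "134.171.2.78"

# Client only: no server or advertise stanza; it joins the server on the Computation node.
client {
  enabled = true
  servers = ["134.171.2.88:4647"]
  meta {
    "node_type" = "HrtcGateway"
  }
}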
After modification of the Nomad configuration files on both nodes we need to restart the Nomad agent service as follows:
systemctl start nomad
And check the status:
systemctl status nomad
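Once both agents are up, the cluster membership can be verified from either node, for example (assuming Nomad's default HTTP port 4646 on the server node):
export NOMAD_ADDR=http://134.171.2.88:4646
nomad server members   # should list the Simple SRTC Computation node as server
nomad node status      # should list both machines as ready clients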
Note
For details on how to configure and run Nomad, please see the Nomad documentation.
Sometimes it is necessary to delete Nomad's persistent files before running the service.
This can be done, while the Nomad agent is stopped, by executing: rm -rf /opt/nomad/var/*
Note
Since the Nomad agents run under eltdev, the SRTC components are also executed under the same account.
Modifications¶
The baseline Putting Everything Together for a Minimalistic SRTC System code example for modification can be found in:
_examples/exampleEndToEnd
To run the system distributed over two nodes we need to change the configuration and the script used to run the example.
Configuration¶
Depending on which machine a particular component runs on, we need to set the proper IP addresses for the fields
req_rep_endpoint
pub_sub_endpoint
in the config/resource/config/rtctk/exampleEndToEnd/service_disc.yaml file as follows:
common:
...
rtc_sup:
req_rep_endpoint:
type: RtcString
value: zpb.rr://134.171.2.88:12081/
pub_sub_endpoint:
type: RtcString
value: zpb.ps://134.171.2.88:12082/
tel_repub_1:
req_rep_endpoint:
type: RtcString
value: zpb.rr://134.171.2.78:12083/
pub_sub_endpoint:
type: RtcString
value: zpb.ps://134.171.2.78:12084/
tel_sub_1:
req_rep_endpoint:
type: RtcString
value: zpb.rr://134.171.2.88:12085/
pub_sub_endpoint:
type: RtcString
value: zpb.ps://134.171.2.88:12086/
data_task_1:
req_rep_endpoint:
type: RtcString
value: zpb.rr://134.171.2.88:12087/
pub_sub_endpoint:
type: RtcString
value: zpb.ps://134.171.2.88:12088/
Nomad job files¶
For each component we need to create a corresponding Nomad job configuration file.
Telemetry Republisher¶
Here is an example of a Nomad job file for the Telemetry Republisher component (TelRePub.nomad):
job TelRePub {
datacenters = ["dc1"]
type = "batch"
constraint {
attribute = "${meta.node_type}"
value = "HrtcGateway"
}
group "deploy_tkjob1_group" {
task "deploy_tkjob1_task" {
driver = "raw_exec"
config {
command = "/bin/bash"
args = [
"-l",
"-c",
"rtctkTelRepub tel_repub_1 file:$PREFIX/run/exampleEndToEnd/service_disc.yaml",
]
}
}
}
}
For the other components we need to rename the job;
modify the value
of the constraint
stanza so that
the component is started on the correct node (HrtcGateway in the example above);
and modify the command to be executed, which is defined in args
(in the example above it is: rtctkTelRepub tel_repub_1 file:$PREFIX/run/exampleEndToEnd/service_disc.yaml).
The script rtctkExampleEndToEnd.sh located in scripts/src/, which among other things starts the executables, has to be run on the Simple SRTC Computation node. However, before it is run some modifications are needed.
Telemetry Subscriber¶
The relevant changes to the Telemetry Subscriber component Nomad job file (TelSub.nomad) are:
job TelSub {
...
constraint {
...
value = "Computation"
}
...
"rtctkExampleTelSub tel_sub_1 file:$PREFIX/run/exampleEndToEnd/service_disc.yaml",
...
Data Task¶
The relevant changes to the Data Task component Nomad job file (DataTask.nomad) are:
job DataTask {
...
constraint {
...
value = "Computation"
}
...
"rtctkExampleDataTaskTelemetry data_task_1 file:$PREFIX/run/exampleEndToEnd/service_disc.yaml",
...
RTC Supervisor¶
The relevant changes to the RTC Supervisor component Nomad job file (RtcSupervisor.nomad) are:
job RtcSupervisor {
...
constraint {
...
value = "Computation"
}
...
"rtctkRtcSupervisor rtc_sup file:$PREFIX/run/exampleEndToEnd/service_disc.yaml",
...
MUDPI Publisher¶
The MUDPI Publisher should run on the HRTC Gateway node (HrtcGateway), and the corresponding Nomad job file (MudpiPublisher.nomad) should contain:
job MudpiPublisher {
...
constraint {
...
value = "HrtcGateway"
}
...
"rtctkMudpiPublisher -d 100000 -s 36928 192.168.4.53 `rtctkConfigTool --get --repo file:$PREFIX/run/exampleEndToEnd/runtime_repo --path /tel_repub_1/static/mudpi_receivers/rcv_00/port` `rtctkConfigTool --get --repo file:$PREFIX/run/exampleEndToEnd/runtime_repo --path /tel_repub_1/static/topics/topic_00/mudpi_topic`"
...
Running the Example¶
Deployment¶
After the modifications and installation (waf install) we should execute:
rtctkExampleEndToEnd deploy
which will copy the configuration (modified as described in the Configuration section) and the other files needed for the deployment into the (shared) $INTROOT/run/exampleEndToEnd area.
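As a quick check (assuming the INTROOT area is shared between the two machines as described in the prerequisites), the deployed files should now be visible on both nodes:
# Run on both nodes; both should show the same deployed files
ls $INTROOT/run/exampleEndToEnd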
As the next step we need to start the components using Nomad. First we need to export an environment variable that points to the HTTP API of the Nomad agent server (134.171.2.88 in our case, on Nomad's default HTTP port 4646) as follows:
export NOMAD_ADDR=http://134.171.2.88:4646
Whether access to the Nomad agent server works can be checked with:
nomad job status
Finally, we instruct Nomad to start the components using the job files created as described in the section Nomad job files:
nomad job run TelRePub.nomad
nomad job run TelSub.nomad
nomad job run DataTask.nomad
nomad job run RtcSupervisor.nomad
nomad job run MudpiPublisher.nomad
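Equivalently, assuming all the .nomad files are in the current directory, the jobs can be submitted in a loop:
for job in TelRePub TelSub DataTask RtcSupervisor MudpiPublisher; do
    nomad job run ${job}.nomad
done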
The status and logs of the jobs can also be checked using a web browser pointing to http://134.171.2.88:4646.
Running¶
After deployment, the example should be run in a similar manner to that explained in the Putting Everything Together for a Minimalistic SRTC System tutorial, by going through the same life-cycle.
Stopping¶
The components can be stopped using Nomad in the following way:
nomad job stop MudpiPublisher
nomad job stop RtcSupervisor
nomad job stop DataTask
nomad job stop TelSub
nomad job stop TelRePub
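If the stopped jobs should also be removed from Nomad's job list, the -purge option of nomad job stop can be used, for example:
nomad job stop -purge MudpiPublisher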
And the deployment information can be removed by executing:
rtctkExampleEndToEnd undeploy