Nomad usage in the Toolkit
Nomad has been selected as the mechanism by which processes are launched and monitored. Details of its use are still evolving as this document is released.
The ICS Framework documentation describes the use of Nomad within that subsystem ICS_HW and gives a basic introduction to Nomad.
One significant difference between the use of Nomad in the ICS Framework and in the RTC Toolkit is that the RTC Toolkit does not make use of Nomad Templates for service discovery. Service endpoints are instead defined as described in the section Service Discovery.
Nomad starts processes by means of a job description file. Since RTC Toolkit components use a standard command line format, the creation of Nomad job files for starting components can be partly automated.
The template job file looks like:
job JOBNAME {
  #Specify Nomad datacenters to run job on
  datacenters = ["dc1"]
  type = "batch"
  group "deploy_tkjob1_group" {
    task "deploy_tkjob1_task" {
      driver = "raw_exec"
      config {
        command = "COMMAND"
        args = [
          ARGS
        ]
      }
    }
  }
}
A simple script is provided for substituting the JOBNAME, COMMAND and ARGS placeholders:
% rtctkMakeNomadJob.sh
Usage: rtctkMakeNomadJob.sh <job_name> <exec_name> <args string>
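The internals of rtctkMakeNomadJob.sh are not reproduced in this section. Purely as an illustration of the substitution it performs, a shell function along the following lines could fill the three placeholders of the template. The function name make_nomad_job is hypothetical, and the delivered script may be implemented differently.

```shell
# Hypothetical sketch, NOT the delivered rtctkMakeNomadJob.sh.
# Prints a Nomad job description to stdout, filling in the
# JOBNAME, COMMAND and ARGS placeholders of the template.
make_nomad_job() {
    local job_name="$1"
    local exec_name="$2"
    local args="$3"
    cat <<EOF
job ${job_name} {
  #Specify Nomad datacenters to run job on
  datacenters = ["dc1"]
  type = "batch"
  group "deploy_tkjob1_group" {
    task "deploy_tkjob1_task" {
      driver = "raw_exec"
      config {
        command = "${exec_name}"
        args = [
          ${args}
        ]
      }
    }
  }
}
EOF
}
```

Because the sketch writes to stdout, the result can be redirected to a file of your choice, e.g. make_nomad_job my_job my_exec '"arg1", "arg2"' > my_job.nomad, where all names are placeholders.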
An example use for a telemetry-based data task called rtctkExampleDataTask might be:
rtctkMakeNomadJob.sh rtctkExampleDataTask rtctkExampleDataTaskTelemetry "\"data_task_1\", \"service_disc.yaml\""
As the example shows, the required backslash and quotation escaping is tricky and depends on where the script is called from. Always check the generated file for correctness.
The above command would create a Nomad job file called “rtctkExampleDataTask.nomad” in the working directory containing:
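One way to reduce the escaping burden is to build the args string from separate shell words. The helper below is a minimal sketch; the name nomad_args_string is hypothetical and it is not part of the Toolkit delivery.

```shell
# Hypothetical helper: turns separate shell words into the
# comma-separated, double-quoted list expected in the ARGS slot.
# Caveat: it does not escape double quotes or backslashes that
# appear inside an argument.
nomad_args_string() {
    local out=""
    local arg
    for arg in "$@"; do
        if [ -n "$out" ]; then
            out="$out, "
        fi
        out="$out\"$arg\""
    done
    printf '%s\n' "$out"
}
```

With this helper the third argument can be produced by command substitution, e.g. rtctkMakeNomadJob.sh rtctkExampleDataTask rtctkExampleDataTaskTelemetry "$(nomad_args_string data_task_1 service_disc.yaml)". Checking the generated file is still advisable.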
job rtctkExampleDataTask {
  #Specify Nomad datacenters to run job on
  datacenters = ["dc1"]
  type = "batch"
  group "deploy_tkjob1_group" {
    task "deploy_tkjob1_task" {
      driver = "raw_exec"
      config {
        command = "rtctkExampleDataTaskTelemetry"
        args = [
          "data_task_1", "service_disc.yaml",
        ]
      }
    }
  }
}
After ensuring that the Nomad agent is running and that the file contents appear correct, the resulting job can be started and checked with the following commands:
eltdev@eltrtctk40:rtctk$ nomad job run rtctkExampleDataTask.nomad
eltdev@eltrtctk40:rtctk$ nomad job status
ID Type Priority Status Submit Date
rtctkExampleDataTask batch 50 running 2021-11-01T21:07:42Z
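For use in scripts, the status of a single job can be picked out of such a listing. The following is a small sketch that assumes the column layout shown above (ID, Type, Priority, Status, Submit Date); the function name job_status_from_listing is hypothetical. For more detail on a single job, nomad job status <job_id> can be used instead.

```shell
# Hypothetical helper: reads a `nomad job status` listing on stdin
# and prints the Status column for the given job ID.
# Assumes the header line comes first and Status is the 4th column.
job_status_from_listing() {
    awk -v id="$1" 'NR > 1 && $1 == id { print $4 }'
}
```

A possible invocation is nomad job status | job_status_from_listing rtctkExampleDataTask. Nothing is printed if the job ID is not present in the listing.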
An example shell script (rtctkExampleEndToEndCiiNomad.sh) is included with the Toolkit delivery demonstrating the use of Nomad Job creation, Nomad Job starting and a basic life-cycle.
RTC Toolkit components do not currently rely on any Nomad services, so for debugging purposes and during development, components can be launched directly from the command line. The above Nomad example is equivalent to executing COMMAND with the arguments ARGS as the user who started the Nomad agent, i.e.
eltdev% rtctkExampleDataTaskTelemetry data_task_1 service_disc.yaml &
Support for automatic configuration of multi-node Nomad clusters is still under discussion at ESO.
You can interact with components, whether started by Nomad or from the command line, by sending commands with the command line tool “rtctkClient”. For example, to send an Init command to the component data_task_1, use the following command (substituting your own location for the service_disc.yaml):
eltdev% rtctkClient -i data_task_1 -s file:/home/eltdev/INTROOT/eltsrtc05/run/eltsrtc05/service_disc.yaml -c Init
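When the same command has to go to several components, the call can be wrapped in a loop. The sketch below uses only the -i, -s and -c options shown above; the function name send_to_components and the RTCTK_CLIENT override variable are hypothetical conveniences, not part of the Toolkit.

```shell
# Hypothetical wrapper around rtctkClient.
# Usage: send_to_components <service_disc_uri> <command> <component>...
# RTCTK_CLIENT can be set to another executable (e.g. "echo") to
# preview the calls without sending anything.
send_to_components() {
    local sd_uri="$1"
    local cmd="$2"
    local comp
    shift 2
    for comp in "$@"; do
        # Stop at the first component for which the call fails.
        "${RTCTK_CLIENT:-rtctkClient}" -i "$comp" -s "$sd_uri" -c "$cmd" || return 1
    done
}
```

For instance, send_to_components file:/path/to/service_disc.yaml Init data_task_1 data_task_2 would send Init to two components, where data_task_2 and the path are placeholders.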
To extend this Nomad usage to a multi-node cluster, multiple Nomad agents must be running and the Nomad job file must specify on which machine the job should be executed. All of this is currently under discussion at ESO. An example of multi-node Nomad usage can be found in: Distributed Minimalistic SRTC System.
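Since the multi-node approach is not yet settled, the following is only a sketch of one mechanism that Nomad itself offers for pinning a job to a machine: a constraint block on the node attribute ${attr.unique.hostname}. The helper name nomad_host_constraint is hypothetical, and this is not necessarily the approach the Toolkit will adopt.

```shell
# Hypothetical helper: prints a Nomad constraint block that restricts
# placement to the node whose hostname equals the first argument.
# Single quotes keep ${attr.unique.hostname} literal so that Nomad,
# not the shell, interpolates it.
nomad_host_constraint() {
    printf 'constraint {\n'
    printf '  attribute = "${attr.unique.hostname}"\n'
    printf '  value     = "%s"\n' "$1"
    printf '}\n'
}
```

The printed block, e.g. from nomad_host_constraint eltrtctk40, could be placed inside the job, group or task block of a job file. Whether it matches depends on the hostname the Nomad agent on that node reports.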