Nomad usage in the Toolkit

Nomad has been selected as the mechanism by which processes are launched and monitored. Details of its use are still evolving as this document is released.

The ICS Framework documentation describes the use of Nomad within that subsystem ICS_HW and gives a basic introduction to Nomad.

One significant difference between the use of Nomad in the ICS Framework and in the Toolkit is that the RTC Toolkit does not use Nomad Templates for service discovery. Service endpoints are instead defined as described in the section Service Discovery.

Nomad starts processes by means of a job description file. Since RTC Toolkit components use a standard command line format, the creation of Nomad job files for starting such components can be partly automated.

The template job file looks like:

job JOBNAME {
  #Specify Nomad datacenters to run job on
  datacenters = ["dc1"]

  type = "batch"

  group "deploy_tkjob1_group" {
    task "deploy_tkjob1_task" {
      driver = "raw_exec"
      config {
        # Run the command through a login shell so that the environment variables are set
        command = "/bin/bash"

        args = [
          "-l",
          "-c",
          "COMMAND ARGS"
        ]
      }
    }
  }
}

A simple script is provided for substituting the JOBNAME, COMMAND and ARGS placeholders:

$ rtctkMakeNomadJob.sh
Usage: rtctkMakeNomadJob.sh <job_name> <exec_name> \"<args string>\"
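
The placeholder substitution described above can be pictured with the following minimal sketch. It is not the source of rtctkMakeNomadJob.sh, whose implementation is not shown in this document; the function names make_nomad_job and escape_sed are hypothetical, and the sketch assumes that the template contains the literal placeholders JOBNAME and "COMMAND ARGS" exactly as in the template job file above.

```shell
# Illustrative sketch only: NOT the implementation of rtctkMakeNomadJob.sh.
# make_nomad_job and escape_sed are hypothetical helper names.

# Escape the characters that are special in a sed replacement
# (backslash, ampersand and the | delimiter used below).
escape_sed() {
    printf '%s' "$1" | sed -e 's/[\\&|]/\\&/g'
}

# Usage: make_nomad_job <template_file> <job_name> <exec_name> <args string>
# Writes the filled-in job description to standard output.
make_nomad_job() {
    local template="$1"
    local job cmd args
    job=$(escape_sed "$2")
    cmd=$(escape_sed "$3")
    args=$(escape_sed "$4")
    # Replace the job name and then the "COMMAND ARGS" string passed to bash -c.
    sed -e "s|JOBNAME|${job}|" \
        -e "s|COMMAND ARGS|${cmd} ${args}|" \
        "$template"
}
```

Assuming the template is stored in a file called template.nomad (a name chosen for this sketch), a call such as `make_nomad_job template.nomad data_task_1 rtctkExampleDataTaskTelemetry "-i data_task_1 -s file:/path/to/service_disc.yaml" > data_task_1.nomad` would write the job file. The sketch does not escape double quotes inside the argument string, so arguments containing quotes would still need to be checked by hand.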

An example use for a telemetry-based data task called data_task_1 might be:

$ rtctkMakeNomadJob.sh data_task_1 rtctkExampleDataTaskTelemetry \
      "\"-i\", \"data_task_1\", \"-s\", \"file:/path/to/service_disc.yaml\""

As the example shows, the backslash escaping and quoting required are tricky and depend on where the script is called from, so you should always check the generated file for correctness.

The above command would create a Nomad job file called data_task_1.nomad in the working directory containing:

job data_task_1 {
  #Specify Nomad datacenters to run job on
  datacenters = ["dc1"]

  type = "batch"

  group "deploy_tkjob1_group" {
    task "deploy_tkjob1_task" {
      driver = "raw_exec"

      config {
        command = "/bin/bash"

        args = [
          "-l",
          "-c",
          "rtctkExampleDataTaskTelemetry -i data_task_1 -s file:/path/to/service_disc.yaml"
        ]
      }
    }
  }
}

After ensuring that the Nomad agent is running and that the file contents appear correct, the resulting job can be started and checked with the following commands:

$ nomad job run data_task_1.nomad

$ nomad job status

ID           Type   Priority  Status   Submit Date
data_task_1  batch  50        running  2022-06-03T09:18:20Z
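
When the status check is scripted, the Status column can be extracted from this table. The following is a minimal sketch; job_status_field is a hypothetical helper name, and the sketch assumes the column order shown in the output above (ID, Type, Priority, Status, Submit Date), which may differ between Nomad versions.

```shell
# Hypothetical helper: print the Status column for one job ID.
# Reads the table printed by `nomad job status` on standard input and
# assumes the column order ID, Type, Priority, Status, Submit Date.
# Returns a non-zero exit code if the job ID is not listed.
job_status_field() {
    awk -v id="$1" '$1 == id { print $4; found = 1 } END { exit !found }'
}

# Example using the output captured above; in practice the output of
# `nomad job status` would be piped into the function instead.
job_status_field data_task_1 <<'EOF'
ID           Type   Priority  Status   Submit Date
data_task_1  batch  50        running  2022-06-03T09:18:20Z
EOF
```

For the sample table the function prints `running`.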

An example shell script (rtctkExampleEndToEndCiiNomad.sh) is included with the Toolkit delivery. It demonstrates Nomad job creation, starting Nomad jobs and a basic life-cycle.

RTC Toolkit components do not currently rely on any Nomad services, so for debugging purposes and during development, components can be launched directly from the command line. The above Nomad example is equivalent to executing COMMAND with arguments ARGS as the user who started the Nomad agent, i.e.

$ rtctkExampleDataTaskTelemetry -i data_task_1 -s file:/path/to/service_disc.yaml

Support for automatic configuration of multi-node Nomad clusters is still under discussion at ESO.

You can interact with components, whether started by Nomad or from the command line, by sending commands with the command line tool rtctkClient. For example, to send an Init command to the component data_task_1, use the following command (substituting your own location for service_disc.yaml):

$ rtctkClient \
    -i data_task_1 \
    -s file:/path/to/service_disc.yaml \
    -c Init
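
When the same command has to be sent to several components, the call above can be wrapped in a small helper. This is only a sketch: send_command, RTCTK_CLIENT and SERVICE_DISC are names invented here, and only the -i, -s and -c options are taken from the example above.

```shell
# Hypothetical wrapper around the rtctkClient call shown above.
# RTCTK_CLIENT (client executable) and SERVICE_DISC (service discovery
# location) are variables invented for this sketch.
# Usage: send_command <command> <component> [<component> ...]
send_command() {
    local cmd="$1"
    shift
    local component
    for component in "$@"; do
        # Stop at the first component for which the client reports a failure.
        "${RTCTK_CLIENT:-rtctkClient}" \
            -i "$component" \
            -s "${SERVICE_DISC:-file:/path/to/service_disc.yaml}" \
            -c "$cmd" || return 1
    done
}
```

With this helper, `send_command Init data_task_1` issues the same call as the example above.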

To extend this Nomad usage to a multi-node cluster, it is necessary to run multiple Nomad agents and for the Nomad job file to specify on which machine the job should be executed. All of this is currently under discussion at ESO. An example of multi-node Nomad usage can be found in Distributed Minimalistic SRTC System.