Telemetry Republisher¶
Overview¶
The Telemetry Republisher RTC Component reads telemetry data in MUDPI format originating in the HRTC, and forwards (republishes) it using DDS reliable Multicast to one or more SRTC nodes.
Prerequisites¶
Telemetry data is published via FastDDS, which is provided by the ELT Development Environment. FastDDS QoS profiles have to be provided in telemDataPathDdsQos.xml, installed in $INTROOT/resource/config/rtctk/dds/ or any other location under rtctk/dds/ in $CFGPATH.
A different file name can be provided in the component’s configuration (see the Configuration section below for details).
Note
The FASTRTPS_DEFAULT_PROFILES_FILE environment variable shall not be used (set).
This is particularly problematic if the file that the variable points to contains the same QoS profiles.
The Telemetry Republisher works optimally for MUDPI traffic with an Ethernet MTU of 9000, but it also works with other sizes. It is therefore recommended to first exercise the Telemetry Republisher with the default MTU (as set by the OS), and afterwards change the MTU size for the specific MUDPI traffic.
Customisation¶
The Telemetry Republisher component does not require any compile-time customization. It runs out of the box and only needs to be configured; please refer to section Configuration for more details.
Running¶
The Telemetry Republisher can be started (after deployment) by invoking the command:
$ rtctkTelRepub tel_repub_1 file:$INTROOT/run/exampleTelRepub/service_disc.yaml
Mandatory command line arguments are the component instance name (first) and the service discovery endpoint (second). Service discovery information is retrieved from the specified service registry located in $INTROOT/run/exampleTelRepub/service_disc.yaml.
The component can be stopped either by sending the Exit command or by pressing Ctrl-C.
Commands¶
To initialise the Telemetry Republisher component send Init using the Client application:
$ rtctkClient tel_repub_1 Init \
-s file:$INTROOT/run/exampleTelRepub/service_disc.yaml
As a result the component reads its configuration from the Run-time repository and creates entities for listening to MUDPI traffic on a UDP socket and entities for publishing the DDS agnostic topic. At this stage reading and publishing have not yet started, but subscriptions to the created DDS topic(s) can be established.
This can be seen in the logs. E.g. a log states that the “connection” to a particular DDS topic (TestTopic00), i.e. the corresponding DDS subscriber, comes up:
[10:26:06:580][INFO ][tel_repub_1] on_publication_match (TestTopic00)
When a DDS subscriber “disconnects” from a particular topic (TestTopic00), the Telemetry Republisher named tel_repub_1 logs a message of this kind:
[10:26:06:580][INFO ][tel_repub_1] on_publication_match (TestTopic00)
The Enable and Disable commands are used to transition between the states On:NotOperational:Ready and On:Operational:Idle; the Run command can be applied in the latter.
When the Run command is sent, the republisher starts listening to the MUDPI traffic on the sockets and publishes aggregated topics via DDS. The activity can be checked by looking at the Telemetry Republisher metrics (statistics), see Online Database (OLDB) Data Points. Similar information is also available in regular DEBUG messages, for which the component log level needs to be set to DEBUG. By default a log message is produced every 60 s; it contains the number of received MUDPI frames, the number of frames per sample, the receiving rate, the estimated loop (sample) frequency, and the number of frames and samples skipped so far.
E.g. to set the log level to DEBUG:
$ rtctkClient tel_repub_1 SetLogLevel DEBUG \
-s file:$INTROOT/run/exampleTelRepub/service_disc.yaml
E.g. DEBUG logs for MUDPI topics with TopicId 3 and 2, where the number of frames per sample is 6:
[09:54:28:743][DEBUG][tel_repub_1] [3] MUDPI processor received 13434 frames (6 frames/sample) 990.58/s, estimated loop freq 165.10, errors (0 : 1)
[09:54:29:457][DEBUG][tel_repub_1] [2] MUDPI processor received 15174 frames (6 frames/sample) 971.40/s, estimated loop freq 161.90, errors (0 : 1)
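The statistics in these DEBUG lines can be extracted with a small script, for example to plot them over time. The following is a minimal sketch in Python: the regular expression is modelled only on the two lines quoted above (the exact log layout is an assumption and may differ between versions), and `parse_stats` is a hypothetical helper, not part of the toolkit. In the quoted lines, the frame rate divided by the frames per sample agrees with the logged loop frequency to within rounding, which the sketch checks.

```python
import re

# Pattern modelled on the two DEBUG lines quoted above; the real log layout
# may differ between versions, so treat it as an assumption.
STATS_RE = re.compile(
    r"\[(?P<topic_id>\d+)\] MUDPI processor received (?P<frames>\d+) frames "
    r"\((?P<per_sample>\d+) frames/sample\) (?P<rate>[\d.]+)/s, "
    r"estimated loop freq (?P<loop_freq>[\d.]+)"
)


def parse_stats(line):
    """Return the statistics of one DEBUG line as a dict, or None if no match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    return {
        "topic_id": int(match["topic_id"]),
        "frames": int(match["frames"]),
        "frames_per_sample": int(match["per_sample"]),
        "frame_rate": float(match["rate"]),
        "loop_freq": float(match["loop_freq"]),
    }


line = ("[09:54:28:743][DEBUG][tel_repub_1] [3] MUDPI processor received "
        "13434 frames (6 frames/sample) 990.58/s, estimated loop freq 165.10, "
        "errors (0 : 1)")
stats = parse_stats(line)

# Frame rate divided by frames per sample, to compare with the logged value.
derived = stats["frame_rate"] / stats["frames_per_sample"]
print(f"derived {derived:.2f} Hz, logged {stats['loop_freq']:.2f} Hz")
```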
The listening and publishing is stopped by sending the Idle command.
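The command sequence described in this section can be scripted. The sketch below only builds the `rtctkClient` command lines in the form shown above (component name, command, optional arguments, then `-s` with the service discovery endpoint) and prints them; it does not execute anything. `client_command` and `LIFECYCLE` are hypothetical names introduced for illustration.

```python
def client_command(component, command, service_disc, *args):
    """Build the rtctkClient argument list for one command."""
    return ["rtctkClient", component, command, *args, "-s", service_disc]


# Command order taken from the text above: Init, Enable, Run, then Idle.
LIFECYCLE = ["Init", "Enable", "Run", "Idle"]

service_disc = "file:$INTROOT/run/exampleTelRepub/service_disc.yaml"

for cmd in LIFECYCLE:
    # Printed rather than executed, so $INTROOT is left for the shell to expand.
    print(" ".join(client_command("tel_repub_1", cmd, service_disc)))

# Commands with an argument, such as SetLogLevel, pass it after the command name.
print(" ".join(client_command("tel_repub_1", "SetLogLevel", service_disc, "DEBUG")))
```

To actually run the commands, the argument lists could be passed to `subprocess.run` after expanding `$INTROOT`, for example with `os.path.expandvars`.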
The Exit command stops the component, i.e. the process exits. At this stage some diagnostic messages are logged, including the highest occupancy of the internal queue/buffer as an absolute number and as a percentage.
E.g. for topic TestTopic00 the maximum occupancy was just 10 slots, which is 0.033% of the whole queue size. A high percentage indicates a possible performance problem on the DDS publishing side of the Telemetry Republisher.
[13:06:36:405][DEBUG][tel_repub_1] [TestTopic00] Max TelRepub buffer occupancy 10 (0.03333 %).
Similar diagnostic log messages are produced for DDS publishers.
E.g. the following log message means that the DDS publisher side has not detected any sample drop:
[13:06:36:405][INFO ][tel_repub_1] [TestTopic00] Received: 0 samples. No samples skipped
E.g. the following log message means that 35 samples out of 9345 were dropped. This likely indicates a problem with DDS publishing, a slow DDS subscriber, …
[17:12:45:745][INFO ][tel_repub_1] [TestTopic00] skipped: 35 samples out of 9345 ratio: 0.00374531835206. Last @: 7340
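The figures in these diagnostic messages can be cross-checked with simple arithmetic. The sketch below uses only the numbers quoted in the log lines above. The queue size it derives from the occupancy percentage is an inferred, approximate value (the percentage is rounded in the log), not a documented one.

```python
# Values copied from the log lines above.
skipped, total = 35, 9345
ratio = skipped / total
print(f"skip ratio: {ratio:.14f} ({100 * ratio:.3f} %)")

# Buffer occupancy: 10 slots reported as 0.03333 % of the queue.
# The implied queue size is derived here and is only approximate,
# because the logged percentage is rounded.
occupied, percent = 10, 0.03333
implied_queue_size = occupied / (percent / 100)
print(f"implied queue size: about {implied_queue_size:.0f} slots")
```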
As all the Configuration for the Telemetry Republisher is static, the Update command does not have any effect.
Configuration¶
The configuration for the Telemetry Republisher component is stored, as for other components, in a file in YAML format. The configuration file name has to correspond to the name of the component instance. The configuration contains only a static part, meaning that configuration changes are not taken into account at run time by invoking the Update command. If the configuration is changed, the component should be restarted or reinitialized, i.e. Init should be called.
The configuration can be divided into three groups:
common configuration
receivers configuration
DDS topics configuration
Common Configuration¶
The common configuration part specifies the DDS QoS Profile used to set the QoS of DDS entities such as the DDS participant and DDS publisher, and the network interfaces allowed for DDS.
Configuration Path | Type | Description |
---|---|---|
 | | (optional) DDS QoS Profile to be used for setting QoS of DDS entities like DDS participant and DDS publisher. The specified QoS Profile needs to be contained in the |
 | | (optional) Name of DDS QoS XML file. The file should be found in |
 | | (optional) List of network interfaces to be used by DDS. The interfaces are for the local machine where the component is running. If given, this list will replace any settings under the |
An example of a common configuration block:
static:
    dds_qos_profile:
        type: RtcString
        value: RtcTk_Default_Profile
    dds_qos_file:
        type: RtcString
        value: telemDataPathDdsQos.xml
    dds_interface_white_list:
        type: RtcVectorString
        value:
            - 127.0.0.1
            - 192.168.5.44
            - lo
...
Receivers Configuration¶
The Telemetry Republisher can listen on several receivers, which are specified in the mudpi_receivers section. Each receiver is specified as rcv_NN, where NN is a two-digit index running from 00 to the number of receivers - 1. E.g. for two receivers we have: rcv_00 and rcv_01.
Note
rcv_1 won't work as an index because it has just one digit; the correct form is rcv_01. The index has to start with 00 (not 01). There should be no gaps in the numbering. E.g. rcv_00, rcv_01, rcv_09 will configure just two receivers.
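The indexing rule can be illustrated with a short sketch. It is not the component's implementation; `contiguous_entries` is a hypothetical helper that mimics the behaviour described above by accepting only two-digit indices, starting at 00 and stopping at the first gap. The same rule is described for topic_NN entries, so the prefix is a parameter.

```python
import re


def contiguous_entries(keys, prefix):
    """Return prefix_00, prefix_01, ... in order, stopping at the first gap."""
    pattern = rf"{re.escape(prefix)}_\d{{2}}"
    present = {key for key in keys if re.fullmatch(pattern, key)}
    found = []
    index = 0
    while f"{prefix}_{index:02d}" in present:
        found.append(f"{prefix}_{index:02d}")
        index += 1
    return found


# Gap after rcv_01: only the first two receivers are picked up.
print(contiguous_entries(["rcv_00", "rcv_01", "rcv_09"], "rcv"))
# One-digit index: nothing is picked up.
print(contiguous_entries(["rcv_1"], "rcv"))
# Numbering that starts at 01 instead of 00: nothing is picked up.
print(contiguous_entries(["rcv_01", "rcv_02"], "rcv"))
```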
For each receiver the following needs to be specified:
Configuration Path | Type | Description |
---|---|---|
 | | IP address to listen on. One address corresponds to one receiver, and a receiver can listen on only one NIC. Important: it must be a literal IP address, not, for example, a host name. |
 | | Port to listen to. |
 | | Defines optional NUMA policies for the UDP receiver thread. |
Example configuration for two receivers:
static:
    # ...
    mudpi_receivers:
        rcv_00:
            ip:
                type: RtcString
                value: 127.0.0.1
            port:
                type: RtcInt32
                value: 6000
            thread_policies:
                cpu_affinity:
                    type: RtcString
                    value: "1"
        rcv_01:
            ip:
                type: RtcString
                value: 127.0.0.1
            port:
                type: RtcInt32
                value: 6500
    # ...
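Because the ip datapoint must be a literal IP address rather than a host name, it can be worth checking receiver entries before deploying a configuration. The following sketch uses Python's standard `ipaddress` module; `check_receiver` is a hypothetical helper, and the port range check reflects the general UDP port range rather than any limit stated by the component.

```python
import ipaddress


def check_receiver(ip, port):
    """Return a list of problems found in one receiver entry (empty if none)."""
    problems = []
    try:
        # Raises ValueError for anything that is not a literal IPv4/IPv6 address.
        ipaddress.ip_address(ip)
    except ValueError:
        problems.append(f"ip {ip!r} is not a literal IP address")
    # General UDP port range; an assumption, not a documented component limit.
    if not 1 <= port <= 65535:
        problems.append(f"port {port} is outside 1..65535")
    return problems


print(check_receiver("127.0.0.1", 6000))
print(check_receiver("localhost", 6500))
```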
DDS Topic Configuration¶
As for receivers, the Telemetry Republisher can operate on many (DDS) topics. DDS topics are configured in the topics section.
Each DDS topic is specified as topic_NN, where NN is a two-digit index running from 00 to the number of topics - 1. E.g. for two topics we have: topic_00 and topic_01.
Note
topic_1 won't work as an index because it has just one digit; the correct form is topic_01. The index has to start with 00 (not 01). There should be no gaps in the numbering. E.g. topic_00, topic_01, topic_09 will configure just two topics.
For each topic the following needs to be specified:
Configuration Path | Type | Description |
---|---|---|
 | | Topic name. Important: names should be unique per Telemetry Republisher. |
 | | Mapping to the MUDPI topic id. Important: each MUDPI topic id needs a corresponding DDS topic, i.e. a topic whose configuration defines mudpi_topic with that specific MUDPI topic id. |
 | | Index of the receiver on which the topic specified in mudpi_topic will be received. Important: a receiver with that index needs to be configured in the mudpi_receivers section. |
 | | (optional) Size of the internal queue between the MUDPI receiver and the DDS publisher, in number of topic samples. |
 | | (optional) If specified and different from 0, the topic is generated with the specified frequency. As no corresponding MUDPI topic is needed in this case, mudpi_topic and rcv can be omitted. Important: the frequency should be kept reasonable so that the system does not become too busy. |
 | | Defines optional NUMA policies for the DDS publisher thread. |
Example configuring three topics:
static:
    # ...
    topics:
        topic_00:
            name:
                type: RtcString
                value: "TestTopic00"
            mudpi_topic:
                type: RtcInt32
                value: 0
            queue_size:
                type: RtcInt32
                value: 100
            rcv:
                type: RtcInt32
                value: 0
        topic_01:
            name:
                type: RtcString
                value: "TestTopic01"
            mudpi_topic:
                type: RtcInt32
                value: 1
            rcv:
                type: RtcInt32
                value: 0
        topic_02:
            name:
                type: RtcString
                value: "TestTopic02"
            mudpi_topic:
                type: RtcInt32
                value: 1
            rcv:
                type: RtcInt32
                value: 0
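The constraints marked Important in the topic table (unique names, and an rcv index that refers to a configured receiver) can be checked offline. The sketch below is a hypothetical pre-flight check, not part of the toolkit. It assumes the YAML has already been loaded and simplified into plain dictionaries, with each datapoint reduced to its value.

```python
def check_topics(topics, receiver_count):
    """Return a list of problems in a simplified topics mapping (empty if none)."""
    problems = []
    names_seen = set()
    for key, topic in topics.items():
        name = topic["name"]
        if name in names_seen:
            problems.append(f"{key}: topic name {name!r} is not unique")
        names_seen.add(name)
        # rcv may be omitted for topics generated at a fixed frequency.
        rcv = topic.get("rcv")
        if rcv is not None and not 0 <= rcv < receiver_count:
            problems.append(
                f"{key}: receiver index {rcv} out of range "
                f"(only {receiver_count} receivers configured)"
            )
    return problems


# Simplified view of a configuration: datapoints reduced to their values.
topics = {
    "topic_00": {"name": "TestTopic00", "mudpi_topic": 0, "rcv": 0},
    "topic_01": {"name": "TestTopic01", "mudpi_topic": 1, "rcv": 6},
}
for problem in check_topics(topics, receiver_count=2):
    print(problem)
```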
The configuration can always be inspected using the Configuration Tool, e.g. to check whether a certain configuration datapoint exists.
Errors¶
During initialization, i.e. in On:NotOperational:Initialising, several errors can occur:
If the MUDPI/UDP receiver cannot be created, an error message is logged and the component goes to the Error state.
[18:22:57:234][ERROR][tel_repub_1] Component tel_repub_1 problem to create MUDP receiver part: bind: Cannot assign requested address
[18:22:57:235][ERROR][tel_repub_1] Nested exceptions:
1. ActivityInitialising: failed
2. Component tel_repub_1 problem to create MUDP receiver part
3. bind: Cannot assign requested address
In the above case the problem is binding the UDP socket to a particular IP address.
If a topic refers to a MUDPI receiver that does not exist as its source of MUDPI topic data, we get an error:
[14:22:21:256][ERROR][tel_repub_1] Activity.Initialising: failed, exception: Component tel_repub_1 Receiver Index out of range: 6 not possible to assign MUDPI topic Id: 0
Source file: ../reusableComponents/telRepub/src/telRepubBusinessLogic.cpp
Line no.: 249
Function: CreateDdsPubs
This would work if (at least) 7 receivers, indexed from 0 to 6, were defined in the Configuration and created.
A warning message like:
[18:31:40:082][WARN ][tel_repub_1] mlockall failed: RUNNING WITHOUT LOCKED MEMORY
means that memory could not be locked to prevent swapping, which can reduce performance. Memory can be locked only if the Telemetry Republisher is run as root.
During publishing, i.e. in the On:Operational:Running state, error messages can be logged:
If a sample is lost for a particular MUDPI topic (TopicId: 3), the Telemetry Republisher reports it as follows and continues:
[10:08:21:297][ERROR][tel_repub_1] [3] sampleId: 374758624 (frameId: 1) expected sampleId: 374758626 skipped samples : 1
If the received sample Id is lower than expected, a warning is logged. This can happen when the source of MUDPI data is restarted (without reinitializing the Republisher). The WARN log message looks like:
[11:00:24:309][WARN ][tel_repub_1] [3] sampleId: 0 (frameId: 1) expected sampleId 124. The received sampleId is lower than expected. This can result when the source of MUDPI data is restarted.
… and when it resynchronises we get a message like:
[10:15:36:032][WARN ][tel_repub_1] [3] sampleId: 374762680 frameId: 1 Synched again.
If there is not enough space (i.e. no free slot) in the internal queue/buffer to push the sample, we get an error message like:
[11:12:23:022][tel_repub_1] [9] ERROR: problem again to get free slot in repub buffer for *current* sampleId: 338
This can be a consequence of slow publishing of the DDS topic, which might indicate a problem with the network, the DDS QoS configuration, and/or a slow DDS subscriber.
If there is no mapping between a MUDPI topic Id and a DDS topic, a message like this is logged:
TopicId: 1234 has no mapping to DDS topic
This message means that the topic with Id 1234 has no corresponding mapping. It might be that no DDS topic defined in the Configuration maps to 1234, i.e. no topic has a mudpi_topic datapoint with that value.
A timeout when sending (writing) to a topic is reported with the log:
[08:49:05:104][ERROR][tel_repub_1] [TestTopic2] SampleId: 374896008. DDS write timeout!
If the internal Telemetry Republisher queue overruns, samples are lost for a particular topic (TestTopic1) at a particular SampleId:
[08:49:10:178][DEBUG][tel_repub_1] [TestTopic1] SampleId: 374906361 overrun, so far skipped 1705 samples. Last @: 374906360
In both cases the cause could be slow subscribers or some other DDS problem.
Online Database (OLDB) Data Points¶
Component Metrics¶
Telemetry Republisher uses the Component Metrics Service to write the following metrics to the OLDB:
For each topic, the following performance counter metrics can be found under the path /<component-name>/metrics/counter/<topic-name>/:
OLDB Path | Type | Description |
---|---|---|
 | | MUDPI samples receiving frequency estimate. |
 | | MUDPI frames received. |
 | | MUDPI frame errors. |
 | | MUDPI sample errors. |
 | | DDS samples published. |
Information about Telemetry Republisher threads can be found under:
for DDS publisher threads, one per thread: /<component-name>/metrics/thread/dds_publishers/dpub<topic-name>/ (for simulated DDS publishers: /<component-name>/metrics/thread/dds_publishers/spub<topic-name>/)
for UDP receiver threads: /<component-name>/metrics/thread/udp_receivers/udp_rcv<udp-receiver-idx>/
Note
For details about Data Points please refer to Component Metrics OLDB Data Points.
As the OLDB data path needs to be lowercase, topic names (<topic-name>) are converted to lower case.
Thread names (dpub<topic-name> / spub<topic-name>) are always truncated to 16 characters.
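The naming rules in this note can be sketched as follows. This is an illustration only: `counter_path` and `publisher_thread_name` are hypothetical helpers, and whether the component applies lower-casing and truncation exactly in this way is an assumption based on the note above.

```python
def counter_path(component, topic):
    """OLDB path of the counter metrics of one topic (topic name lower-cased)."""
    return f"/{component}/metrics/counter/{topic.lower()}/"


def publisher_thread_name(topic, simulated=False):
    """Thread name of a DDS publisher, truncated to 16 characters."""
    prefix = "spub" if simulated else "dpub"
    return (prefix + topic)[:16]


print(counter_path("tel_repub_1", "TestTopic00"))
print(publisher_thread_name("TestTopic00"))
# A longer, made-up topic name shows the truncation.
print(publisher_thread_name("LoopDataLongName", simulated=True))
```

Two topic names that share their first 12 characters would produce the same truncated thread name under this rule, which is worth keeping in mind when choosing topic names.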
Limitations and Known Issues¶
The Data Wrangling mechanism is not yet implemented.
The payload size of the agnostic topic is limited to 2560000 bytes.
The performance depends also on the machine where the republisher runs.
In some cases, when a subscriber (Telemetry Subscriber or Generic DDS Subscriber) crashes or has some other problem, the Telemetry Republisher runs into trouble and generates messages like:
[14:59:14:347][ERROR][tel_repub_3] [LoopData07]SampleId: 14702879. DDS write timeout!
...
[14:59:14:568][ERROR][tel_repub_3] [7] sampleId: 14702901 frameId: 1. Again problem to get free slot in repub buffer for *current* sample.
[14:59:14:568][ERROR][tel_repub_3] Processing packet for topicId: 7, failed with error: Queue Overflow
In this case it is safest to restart the Telemetry Republisher.