[ros-users] [Discourse.ros.org] [Next Generation ROS] What should be the correct behavier of a "reliable" transmission? (throughput problem)

EwingKang ros.discourse at gmail.com
Tue Nov 13 09:59:12 UTC 2018



Hello, everyone.

We've been testing ROS2 and DDS lately and came across the following issue.
First of all, I'm using ROS2 ardent with OpenSplice DDS, but I believe the behavior also applies to newer versions and other DDS vendors. Please correct me if I'm wrong.
Test source: https://github.com/EwingKang/adlink_ros2_qos_test
Related article: https://index.ros.org/doc/ros2/About-Quality-of-Service-Settings/

TL;DR: We cannot guarantee the arrival of every message on a topic, even with "reliable" QoS.

Full version:
When running the throughput testing tool (link above), I discovered that the publishing rate is significantly higher than the subscribing rate. In fact, I can publish at over 85,000 Hz with a 4KB payload (that is, 332MB/s), while the subscribing node receives only about 1,000 Hz on average, and that rate is very unstable. Originally I thought the problem was the QoS setting, but it turns out that "reliable" QoS is already the default.
After a weeks-long investigation, we finally discovered the root cause. To our understanding, because the default history QoS is KEEP_LAST, the DDS system discards data that is already in the buffer but not yet taken by the rmw layer, and replaces it with the newest ("LAST") data samples.
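The throughput figure quoted above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes "4KB" means 4096 bytes and "MB" means 2^20 bytes, which is the reading that matches the numbers in this post. The function name is made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Payload throughput in MiB/s for a given message rate and payload size.
// Assumption: 1 MiB = 1024 * 1024 bytes, payload size counted without
// any DDS or transport overhead.
double throughput_mib_per_s(double rate_hz, std::uint64_t payload_bytes) {
  assert(rate_hz >= 0.0);
  return rate_hz * static_cast<double>(payload_bytes) / (1024.0 * 1024.0);
}
```

With `rate_hz = 85000` and `payload_bytes = 4096` this gives a value just above 332, consistent with the publishing figure.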
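The overwrite rule described above can be illustrated with a toy model. This is a sketch of the KEEP_LAST idea only, not the OpenSplice or rmw implementation; the class and function names are hypothetical, and the "burst" parameter simply stands for how many samples arrive between two takes by the reader.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model of a reader-side history cache with KEEP_LAST semantics.
class KeepLastCache {
public:
  explicit KeepLastCache(std::size_t depth) : depth_(depth) { assert(depth > 0); }

  // A sample arrives. This never blocks: if the cache is full, the oldest
  // sample that has not been taken yet is discarded to make room.
  void on_sample(int seq) {
    if (samples_.size() == depth_) {
      samples_.pop_front();
    }
    samples_.push_back(seq);
  }

  // The reader takes the oldest remaining sample, if there is one.
  bool take(int & seq) {
    if (samples_.empty()) {
      return false;
    }
    seq = samples_.front();
    samples_.pop_front();
    return true;
  }

private:
  std::size_t depth_;
  std::deque<int> samples_;
};

// Deliver `burst` samples between consecutive takes and count how many of
// the `total` samples the reader actually gets to see.
std::size_t simulate_keep_last(std::size_t depth, int total, int burst) {
  KeepLastCache cache(depth);
  std::size_t received = 0;
  int seq = 0;
  int out = 0;
  while (seq < total) {
    for (int i = 0; i < burst && seq < total; ++i) {
      cache.on_sample(seq++);
    }
    if (cache.take(out)) {
      ++received;
    }
  }
  while (cache.take(out)) {  // drain what is left after the writer stops
    ++received;
  }
  return received;
}
```

Calling `simulate_keep_last(10, 1000, 85)`, for instance, models a publisher roughly 85 times faster than the reader: only a small fraction of the 1000 samples is ever taken, even though every sample reached the cache.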

However, to achieve throughput measurement, I want every bit of my data to arrive at the destination.
To achieve that, I set the node's history QoS to KEEP_ALL and modified [rmw_publisher.cpp](https://github.com/ros2/rmw_opensplice/blob/master/rmw_opensplice_cpp/src/rmw_publisher.cpp) ([line:153](https://github.com/ros2/rmw_opensplice/blob/master/rmw_opensplice_cpp/src/rmw_publisher.cpp#L153)) and [rmw_subscription.cpp](https://github.com/ros2/rmw_opensplice/blob/master/rmw_opensplice_cpp/src/rmw_subscription.cpp) ([line:155](https://github.com/ros2/rmw_opensplice/blob/master/rmw_opensplice_cpp/src/rmw_subscription.cpp#L155)) with
```
// Cap the sample buffers; without a limit, KEEP_ALL exhausts memory within seconds.
datawriter_qos.resource_limits.max_samples = 100;  // in rmw_publisher.cpp
datareader_qos.resource_limits.max_samples = 100;  // in rmw_subscription.cpp
```
respectively.
So when the reader buffer is full, the writer is blocked by the reader, and thus every sample arrives at the subscribing side. For details of this writer-blocking behavior, I've been studying the Vortex OpenSplice C++ Reference Guide (http://www.prismtech.com/vortex/resources/documentation), p. 347.
With this setting, I get more than 15,500 Hz with publishing and subscribing synchronized, which is more than 60MB/s.
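The back-pressure effect of KEEP_ALL with a bounded buffer can be sketched in the same toy style. This is a single-threaded illustration under stated assumptions, not the OpenSplice behavior itself: a real DDS `write()` blocks, whereas here a refused write simply returns `false` and the writer retries after the reader has taken a sample. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model of KEEP_ALL history with resource_limits.max_samples.
class BoundedKeepAllCache {
public:
  explicit BoundedKeepAllCache(std::size_t max_samples) : max_samples_(max_samples) {
    assert(max_samples > 0);
  }

  // Returns false when the cache is full. Nothing is overwritten; the
  // writer has to wait and retry (this stands in for a blocking write).
  bool try_write(int seq) {
    if (samples_.size() == max_samples_) {
      return false;
    }
    samples_.push_back(seq);
    return true;
  }

  // The reader takes the oldest remaining sample, if there is one.
  bool take(int & seq) {
    if (samples_.empty()) {
      return false;
    }
    seq = samples_.front();
    samples_.pop_front();
    return true;
  }

private:
  std::size_t max_samples_;
  std::deque<int> samples_;
};

struct KeepAllResult {
  std::size_t received;       // samples taken by the reader
  std::size_t writer_stalls;  // times the writer found the cache full
};

// The writer tries to push up to `burst` samples between consecutive takes,
// but stops as soon as the cache refuses one.
KeepAllResult simulate_keep_all(std::size_t max_samples, int total, int burst) {
  BoundedKeepAllCache cache(max_samples);
  std::size_t received = 0;
  std::size_t stalls = 0;
  int seq = 0;
  int out = 0;
  while (seq < total) {
    for (int i = 0; i < burst && seq < total; ++i) {
      if (!cache.try_write(seq)) {
        ++stalls;  // writer is held back until the reader frees a slot
        break;
      }
      ++seq;
    }
    if (cache.take(out)) {
      ++received;
    }
  }
  while (cache.take(out)) {  // drain what is left after the writer stops
    ++received;
  }
  return {received, stalls};
}
```

In this model `simulate_keep_all(100, 1000, 85)` delivers all 1000 samples, at the cost of the writer being held back repeatedly, which mirrors the drop from the unthrottled publishing rate to a rate the subscriber can sustain.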

So my question is: how should I view this result? Yes, DDS did "reliably" send the data over to the reading entity (at least that's what they claim). It's just that we can't poll the datareader fast enough, so the KEEP_LAST setting wipes out the existing data.
However, from the user's standpoint, this might seem a little weird: "reliable" QoS doesn't mean 100% data reception. Should we somehow change the ROS2 definition and behavior? Personally, I would favor defaulting to KEEP_ALL with a default DDS buffer size of, say, 100. Then, when the internal buffer is full, the write call blocks to make sure no data is dropped. This is what should be defined as "reliable" from the perspective of ROS2.

I'd appreciate any opinions, and please correct me if I'm wrong on any point.
Thanks.





---
[Visit Topic](https://discourse.ros.org/t/what-should-be-the-correct-behavier-of-a-reliable-transmission-throughput-problem/6826/1) or reply to this email to respond.



