[ros-users] ROS 2.0 Strategy review

Michael Haberler mail17 at mah.priv.at
Tue Sep 29 11:09:59 UTC 2015


> On 28.09.2015 at 23:08, Brian Gerkey via ros-users <ros-users at lists.ros.org> wrote:
> 
> To further emphasize the difference between DDS and things like ZeroMQ (zmq), and to motivate our decision to go with the former (see also http://design.ros2.org/articles/ros_with_zeromq.html):
> 
> While you might consider zmq a "known thing," and indeed it is widely used in a number of distributed systems, it's insufficient to say, "use zmq."  zmq specifies only the transport part of the system (how sockets are handled), saying nothing about discovery (how participants find each other) or serialization (how data is encoded on the wire).  Adding those features isn't impossible, or even necessarily that difficult.  You could, for example, combine zmq for transport with protobuf for serialization and UDP multicast for discovery [*].  

Or you could combine zeroMQ with Multicast DNS for service discovery. 

We have used that approach with machinekit.io for over two years and it has been rock solid. Unlike UDP broadcast, it can be made to work beyond a single collision domain; that limitation of broadcast is IMO a showstopper, as it does not scale for our purposes.

And - disregarding a 275-line C shim to glue zeroMQ and mDNS together - the rest of the stack is stock Debian packages, and protocols which have an RFC. Done. Reuse whenever you can.



> zmq doesn't support unreliable transport, so you'd also need to add your own solution there (e.g., managing UDP sockets manually).  Still, it's all doable.

I still have to find a hard requirement for unreliable transport in this problem domain _on the middleware side of things_, but see below.

> 
> The problem is not in the effort required to build that system.  It's that you then have to define, document, and defend your custom combination of techniques and protocols.  When it comes time to convince someone to rely on it, you have to make the argument that your bespoke system is reliable, robust, free of nasty corner cases, and ready to be used in serious domains, whether that's a classroom full of undergrads, a government-funded R&D program, or a commercial product.
> 
> That argument can be made and won; after all, ROS today is a custom system combining various protocols and techniques (TCP, UDP, XML/RPC, custom serialization, discovery via a central master, etc.), and yet it is widely used and there are many ROS-based products and services in the marketplace.  But there are many, many more current and future robotics applications where ROS will *not* be chosen, in large part because of its bespoke nature.
> 
> At OSRF, we looked carefully at this issue, considered a wide variety of options, and came to the conclusion that while we could build on things like zmq, we would really be defining and building another custom middleware.  And things would just get more custom as we want to add features like quality-of-service (QoS).

To be candid: this QoS discussion focuses on the wrong locus. It lacks rigor in separation of concerns and finding appropriate solutions for each one.

As for a locus of guarantees, there are really two different domains:

(1) a 'hard RT' domain, where tasks need to finish within a certain time window. If they do not, a missed sample, or an underrun occurs. Example: trajectory playout.
(2) the 'feeder' domain, whose primary purpose in the 'towards RT' direction is to keep queues stuffed such that an underrun cannot happen. Example: feeding segments to a trajectory planner.

Middleware is localized in (2), not (1), as it mediates between hard RT and not-so-hard observers and generators.

For (1) to happen, it is not sufficient to handwave with 'some QoS layer' and expect things to suddenly start working great. The fact is: _the whole stack_ needs to support this very attribute, which includes the OS and networking. So any vehicle in locus (1) is a full stack affair, not a layer attribute. That is why we need RT kernels and time slotted Ethernet protocols: it is about deadline guarantees.

Saying 'we have a QoS layer' plus 'no, there is no supporting stack in place to actually guarantee its deadlines' simply does not hold water.


As for (2): the 'we need QoS' argument is essentially the same as has been put forward against voice and video transmission over packet-switched networks some 20+ years ago, and we did see where that went. 

There is a simple answer to this, and it applies to the ROS middleware domain just as well: it is called "overprovisioning". If you have a stack which does 100k messages a second, and the consumption rate is a few thousand, you can safely forget about the QoS thing. It works great for voice, for video, and will work great for ROS. Just ask the phone companies, YouTube and Netflix.
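
To put rough numbers on the overprovisioning argument, here is a minimal sketch in Python. The 100k messages/s capacity is the figure from the paragraph above; the 3,000 messages/s load is only an assumed stand-in for "a few thousand", and the function name is made up for illustration.

```python
def utilization(capacity_msgs_per_s: float, load_msgs_per_s: float) -> float:
    """Fraction of the transport's capacity consumed by the offered load."""
    if capacity_msgs_per_s <= 0:
        raise ValueError("capacity must be positive")
    return load_msgs_per_s / capacity_msgs_per_s


capacity = 100_000  # messages/s the stack can sustain (figure from the text)
load = 3_000        # messages/s actually consumed (assumed: "a few thousand")

u = utilization(capacity, load)
headroom = capacity / load  # how many times the load fits into the capacity

print(f"utilization: {u:.1%}, headroom: {headroom:.1f}x")
```

With these assumed numbers the transport runs at a few percent of its capacity, which is the sense in which queueing delay stops being a concern for domain (2).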


>  Do you want to support store-and-forward of messages?

That 'requirement' has yet to be shown to be of actual value. It would help to come forward with a use case where this really is needed.

Fact is: at some point you _will_ have to drop messages no matter what, or you run out of space. So this is a question of 'when', not 'if'. And: it adds horrendous complexity to the stack - just look where AMQP went. Brokers, single points of failure, the works.
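
The "when, not if" point can be illustrated with a small Python sketch. It assumes a store-and-forward buffer of fixed capacity that evicts the oldest message when full; the class name and the eviction policy are illustrative assumptions, not taken from any particular middleware.

```python
from collections import deque


class BoundedStore:
    """Store-and-forward buffer with fixed capacity; evicts the oldest message when full."""

    def __init__(self, capacity: int):
        self.buf = deque(maxlen=capacity)
        self.dropped = 0

    def store(self, msg):
        # A full deque with maxlen silently discards its oldest entry on append,
        # so count that loss explicitly before appending.
        if len(self.buf) == self.buf.maxlen:
            self.dropped += 1
        self.buf.append(msg)

    def forward(self):
        """Hand over everything still held, oldest first, and empty the buffer."""
        out = list(self.buf)
        self.buf.clear()
        return out


store = BoundedStore(capacity=3)
for i in range(5):          # the consumer is away while 5 messages arrive
    store.store(i)

delivered = store.forward()
print(delivered, "dropped:", store.dropped)
```

Whatever capacity is chosen, a consumer that stays away long enough forces drops; a larger buffer only moves the point at which that happens.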

I'd be very interested to see how - if at all - that feature will be handled on embedded devices. Suspicion: not at all. And by now you're into 'feature sets', 'supported profiles', the whole yadayada of compatibility problems just because the scope was set too wide to start with.


>  Priority among messages?  

IMO this is looking at the wrong locus again. Implementing message priority is a consumer decision, and can trivially be done with multiple queues and handling those in priority order.
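
A minimal Python sketch of the multiple-queue approach, assuming one in-process queue per priority level (in a real setup each queue might be fed by its own socket, which is not shown here). The class and method names are hypothetical.

```python
from collections import deque


class PriorityConsumer:
    """Consumer-side priority: one FIFO per level, level 0 is served first."""

    def __init__(self, levels: int):
        self.queues = [deque() for _ in range(levels)]

    def enqueue(self, level: int, msg):
        self.queues[level].append(msg)

    def next_message(self):
        # Scan the queues in priority order and take the first available message.
        for q in self.queues:
            if q:
                return q.popleft()
        return None  # nothing pending on any level


consumer = PriorityConsumer(levels=2)
consumer.enqueue(1, "status update")
consumer.enqueue(0, "emergency stop")
consumer.enqueue(1, "log line")

order = [consumer.next_message() for _ in range(3)]
print(order)
```

The middleware only has to deliver messages in order per queue; which queue gets drained first is decided entirely by the consumer.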

There is no scenario I can think of where message priority has to be an attribute of the _middleware_ per se. Requiring this adds complexity without adding value. Got an actual use case where message priority is actually easier on the using code than multiple queues?

I also note the impact of supporting message priority on middleware complexity is significant. Flow control very likely follows suit. And that buys me what?

> Limited-duration delivery retry?  Well, you have to invent your own rules for handling those cases, in addition to writing (and testing and debugging) the code to implement them.
> 
> By contrast, DDS (and its underlying wire protocol, RTPS) is an open, end-to-end middleware specification that is relied upon for serious applications in labs and industries around the world.  And the specification includes extensive QoS settings that give you essentially any kind of behavior between UDP fire-and-forget and TCP retry-forever.

By stating 'extensive QoS settings ..' you just proved my point from above. Without full stack support the QoS argument is bogus when it comes to (1). For (2) it is not needed.

Here is an example:

In the machinekit HAL environment we can run vanilla threads and RT-hardened threads in parallel.

Modulo kernel, jitter on the RT-hardened threads is on the order of 15-50 us. Good enough for (1).
Using vanilla POSIX threads for the same job, jitter goes up to 3 ms. Those threads will be the ones dealing with middleware handling. Good enough for (2).

However the servo cycle is usually <= 1 ms.

Please indicate where twisting a knob in a library will get rid of that 3 ms spike. (hint: not going to happen).

Btw, you do not have to get rid of that spike in the first place if you properly separate concerns: "applying QoS" here is really just solving the wrong problem.
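
One way to read "properly separate concerns" is as a buffering calculation: the RT side consumes one item per servo cycle, and the feeder only has to keep enough items queued to ride out its own worst-case delay. A minimal Python sketch, assuming the 1 ms servo cycle and the 3 ms spike quoted above and treating the spike as a worst-case feeder stall; the function name and the one-item safety margin are illustrative assumptions.

```python
def min_queue_depth(worst_case_stall_us: int, servo_period_us: int, margin: int = 1) -> int:
    """Items that must already be queued so the RT consumer does not underrun
    while the feeder thread is stalled for worst_case_stall_us."""
    if servo_period_us <= 0:
        raise ValueError("servo period must be positive")
    # Ceiling division on integers: items consumed by the RT side during the stall.
    consumed_during_stall = -(-worst_case_stall_us // servo_period_us)
    return consumed_during_stall + margin


# Assumed figures from the text: 3 ms feeder jitter, 1 ms servo cycle.
depth = min_queue_depth(worst_case_stall_us=3000, servo_period_us=1000)
print("minimum queue depth:", depth)
```

Under these assumptions a handful of queued items absorbs the spike, so nothing on the middleware side has to meet the servo deadline itself.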


>  That's exactly what we need for robotics applications.

No, you need solutions for (1) and (2), but they do not need to have the same attributes and stack support. This is drawing the wrong conclusion from what 'realtime' means to each of the domains: it is different things.

The way I approach such issues is: find the minimal set of requirements for each domain, then find an open source project which fulfills those, and which has a rock-solid community which actually takes care of keeping that blob alive (extra cred if money is on the line, like HFT or massive Google-internal use). This is how support outsourcing works: by choosing the right components. It works superbly with protobuf, zeroMQ and Multicast DNS.


>  And there are multiple well-tested implementations of the specification.
> 
> DDS is by no means perfect, but we believe that it's the best foundation for ROS 2.

Well I still have to find those vibrant open source communities flocking around DDS, and how the 'support outsourcing' is going to pan out here.


I think looking at a spec sheet alone for such a decision runs the danger of disregarding a key aspect of open source social dynamics: you might miss the buy-in of a key constituency and - given the lack of interoperability - in essence bifurcate the project, with a good chance of sinking the whole ship while at it:

- an open-source-only variant which remains with ROS1, as DDS adoption is unlikely to happen
- a ROS2/DDS flavor used by folks which face institutional pressures to adopt DDS

If the hope for this project is 'Open Source Robotics', rather than 'a DDS sales channel', I think the decision is ill-advised and flawed technically as well as socially. Changing middleware without a migration path is risky enough. While at it, going to a place where you stand a strong chance of losing users along the way is extra divisive.


I strongly recommend revisiting the ROS2/DDS decision:

- put ROS-on-DDS on hold as just one option of several possibilities
- proceed in parallel exploring alternatives based on actual use cases, actual measurements, and a more rigorous separation of concerns than shown so far
- evaluate the cost of each one along several dimensions: fulfilling actual, proven-to-must-have requirements, integration effort, migration path, and the upside of existing communities and their support contributions, while keeping the factors for community buy-in in sight.

- Michael


> 
> brian.
> 
> [*] We're intimately familiar with this approach, as we've done exactly that in the ignition-transport library, which is being used for communication within Gazebo: http://ignitionrobotics.org/libraries/transport.  It works great for its intended use case, but it has the drawbacks inherent to any custom middleware solution, described above.
> 
> 
> On Sat, Sep 26, 2015 at 6:02 PM, BiggsGeoffrey via ros-users <ros-users at lists.ros.org> wrote:
> I always feel a little sad when someone paints the entire catalogue of OMG specifications with complaints against a technology from several decades ago. To anyone who has heard about how bad CORBA is, I encourage them to try out a recent version (in particular, the C++11 API). It’s not bad and it’s not slow. It’s good at what it was designed for, which is a distributed object system.
> 
> Having said that, DDS is much better for our needs, because it has a different focus.
> 
> Regarding the patents issue; the OMG specifications are open and freely available. There may be patents covering parts of them, but this is the same risk that any software, open or closed, faces in many parts of the world these days. I would be very surprised if someone couldn’t find a part of the zmq source that infringes some unknown, ambiguous software patent.
> 
> As for zmq being a “known thing”, DDS is known, too. It may not be as well-known by those who prefer open source, but for the many, many companies and institutes who use it, it is known and trusted. If it wasn’t meeting their needs, they would be instead pouring those expensive licensing fees into improving zmq or developing their own in-house technology. They don’t keep their experience a secret, either. I have spoken with people from industry who have enjoyed complaining about the problems they’ve had with DDS – and then gone on to mention that their chosen implementer fixed those problems promptly because they wanted to keep getting paid. Sure, in the open-source world we can fix problems we find ourselves (I hope you don’t need safety certification), but this doesn’t mean everyone keeps their experience a secret.
> 
> The OSRF made a sensible choice in choosing an open, standardised protocol with many implementations, both commercial and open-source, available. Anyone who wants to implement the RTPS protocol can do so, while anyone who doesn’t has the choice between open-source implementations (fun fact: at least one of them uses CORBA internally) and commercial implementations. (Not every issue about this has been resolved yet; there are still concerns that Thibault has pointed out with different nodes using different implementations.)
> 
> Good wire protocols are hard, and leaving it up to the experts gives the OSRF more developer time for the robotics things that go on top.
> 
> If you still don’t like or want to use DDS, well, ROS is open source! The OSRF has abstracted DDS behind a messaging API. You can implement a version of the API that uses IPoAC if you like! I would travel to ROSCon just to see that talk.
> 
> Geoff
> 
> 
> From: ros-users <ros-users-bounces at lists.ros.org> on behalf of Linas Vepstas via ros-users <ros-users at lists.ros.org>
> Reply-To: "linasvepstas at gmail.com" <linasvepstas at gmail.com>, User discussions <ros-users at lists.ros.org>
> Date: Sunday, September 27, 2015 at 02:08
> To: Aaron Schiffman <aarondsc at yahoo.com>, User discussions <ros-users at lists.ros.org>
> Subject: Re: [ros-users] ROS 2.0 Strategy review
> 
> Hi Aaron,
> 
> Can you clarify? Do you mean "IP of DDS", or IP of something else?  Are DDS algos patented?  There used to be talk of zero-mq-based ROS, but that seems to have disappeared from the table. 
> 
> My knee-jerk reaction is to be a bit suspicious of OMG-created technologies; they sound great at first, but are often over-wrought (e.g. CORBA).  I'd never even heard a whisper about DDS before yesterday; I'm nervous about adopting a technology that has not yet gained any acceptance at all in the open-source community.  So, for example, whatever one's opinion of zmq might be, positive or negative, it's a "known thing"; many people have used it, there is developer experience, a track record.  There's no such track record for DDS -- the proprietary world seems to be the primary consumer of the thing, and their experience with it is secret, and not shared. We don't actually know how well it works (although I admit it sounds really great, based on the Wikipedia article).
> 
> Anyway: please clarify: IP of what? And who "owns" that IP, who has rights to it?
> 
> -- Linas.
> 
> On Sat, Sep 26, 2015 at 11:03 AM, Aaron Schiffman via ros-users <ros-users at lists.ros.org> wrote:
> It doesn't feel right sharing some of the thoughts about ROS 2.0 that I have held back since Roscon 2014, but here goes:
> The IP ownership and patents of the underlying ROS 2.0 distributed UDP protocol are of concern to me as a third-party protocol implementor. Yes, ROS.org or OSRF may have explicit legal permission to use said protocol, but it is not truly an open/free platform when the public is at the mercy of the IP owner, unless the entire platform is contractually opened up and made free. 
> 
> As a ROS protocol implementor I've personally held off on implementing ROS 2.0 protocols, while waiting to see how it pans out. I am still of the belief that the UDPROS protocol with enhancements can do everything the new protocol can do, but better. That really doesn't matter now though. 
> 
> I appreciate that OSRF took the focus from protocols and put their limited resources to work on tools. In an R&D organization that would be the path I would expect to be the most rewarding, except that I've grown to think of ROS as a rock that the open robotics universe revolves around. I think of it like Linux, as an open operating system, except that ROS is more an open set of design frameworks, like TCP/IP is a standard protocol with many implementors.
> 
> Wish I could be there in Hamburg with you all! The birds of a feather meetings, and the couple hours socializing with drinks were the most influential on my development direction this past year. Watching roscon on YouTube just will not be the same. 
> 
> I am so stoked about this upcoming year in Robotics I can hardly contain myself (probably a good reason for me to not be there in October:)
> 
> God bless Roscon 2015 in Hamburg!
> 
> Aaron
> Sent from Yahoo Mail on Android
> 
> 
> _______________________________________________
> ros-users mailing list
> ros-users at lists.ros.org
> http://lists.ros.org/mailman/listinfo/ros-users
> 


