On Thu, Dec 2, 2010 at 10:45 AM, Tully Foote <tfoote@willowgarage.com> wrote:
> Hi Pat,
>
> On Thu, Dec 2, 2010 at 9:32 AM, Patrick Bouffard
> <bouffard@eecs.berkeley.edu> wrote:
>>
>> Hi Tully,
>>
>> On Wed, Dec 1, 2010 at 4:44 PM, Tully Foote <tfoote@willowgarage.com>
>> wrote:
>> > Hi Patrick,
>> >
>> > onInit() is required to return.  See
>> >
>> > http://ros-users.122217.n3.nabble.com/questions-about-nodelets-td1869563.html#a1871466
>> > for past discussions.
>>
>> Yes, I saw that thread.
>>
>> > It is true that if you want to block n threads you need to have more
>> > than n threads available.  We have not set any rules as to what you can
>> > set the number of threads to, but the rule of >1 thread per worker is not
>> > necessarily what you want if you are trying to optimize # threads vs
>> > # cpu cores; if you thrash 10 threads across 2 cores you can
>> > significantly decrease performance.
>>
>> But how would it compare to having 10 processes instead of 10 threads,
>> and the extra overhead of using the TCP/IP stack to communicate
>> between them? If the performance with nodelets isn't improved over the
>> separate processes option then I guess I don't see why one would go
>> with nodelets at all.
>
> In terms of pure computation, nodelets do not change anything.  However,
> nodelets allow zero-copy transport of data between components.  Depending
> on the type of data and the rate at which you are processing it, the
> overhead of the TCP/IP stack can become prohibitively expensive, which is
> what nodelets are designed to help with.  The places where this starts to
> become noticeable are when you start streaming video or full camera frame
> point clouds at 30 Hz.

I can't imagine it can hurt latency either, could it? I wonder how
much sooner the subscriber callback gets executed in the nodelet setup
vs. regular (same-machine) TCP/UDP connections. It seems like there
must be some constant savings there, regardless of the message size or
frequency. Maybe I'm being too naive about everything that's going on
behind the scenes though?
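
To make the zero-copy point concrete, here is a minimal standalone sketch in plain C++ with no ROS involved. `Frame`, `deliver_in_process`, `deliver_via_bytes` and `same_buffer_seen` are made-up names for illustration only. The in-process path hands the subscriber the publisher's own buffer through a shared pointer, while the socket-like path copies the payload out and back in. It illustrates the idea only and makes no claim about roscpp internals or real latency numbers.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <functional>
#include <memory>
#include <vector>

// Hypothetical message type standing in for an image or point cloud.
struct Frame {
  std::vector<std::uint8_t> data;
};

using FrameConstPtr = std::shared_ptr<const Frame>;
using Callback = std::function<void(const FrameConstPtr&)>;

// In-process hand-off: the subscriber receives the publisher's own buffer.
// No payload bytes are copied; only a reference count changes.
void deliver_in_process(const FrameConstPtr& msg, const Callback& cb) {
  cb(msg);
}

// Socket-like hand-off: the payload is copied into a wire buffer, then
// copied again into a freshly allocated message on the receiving side.
void deliver_via_bytes(const FrameConstPtr& msg, const Callback& cb) {
  std::vector<std::uint8_t> wire(msg->data.size());
  if (!wire.empty()) {
    std::memcpy(wire.data(), msg->data.data(), wire.size());  // "serialize"
  }
  auto received = std::make_shared<Frame>();
  received->data.assign(wire.begin(), wire.end());            // "deserialize"
  cb(received);
}

// Returns true if the callback saw the very same buffer the publisher owns.
bool same_buffer_seen(bool in_process, std::size_t n_bytes) {
  auto msg = std::make_shared<Frame>();
  msg->data.assign(n_bytes, 0x2A);
  const std::uint8_t* published = msg->data.data();
  bool same = false;
  // Compare inside the callback, while both buffers are still alive.
  Callback cb = [&same, published](const FrameConstPtr& m) {
    same = (m->data.data() == published);
  };
  if (in_process) {
    deliver_in_process(msg, cb);
  } else {
    deliver_via_bytes(msg, cb);
  }
  return same;
}
```

With a non-empty payload, `same_buffer_seen(true, 1024)` reports that the callback saw the publisher's buffer, and `same_buffer_seen(false, 1024)` reports that it saw a copy.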

>> > We thought about having separate thread pools for each nodelet
>> > inside the manager but that too starts to get unreasonably complicated,
>> > and adds overhead, plus potentially running out of threads. There ends
>> > up not being any solution which is completely general and addresses all
>> > use cases.
>>
>> That's fair, but I suspect that the use case that I describe is one of
>> the more common ones, and if I'm interpreting what I've seen on the
>> lists correctly, isn't that the direction that ROS is going in general
>> anyway (i.e. where the 'default' building block is a nodelet rather
>> than a node)? At any rate it would be good to see a more extensive
>> example/tutorial for nodelets, that at least addresses the threading
>> issue. It may not cover all the use cases but it would still be an
>> improvement IMHO.
>
> Indeed we do need to improve documentation.  I've opened a ticket to remind
> myself https://code.ros.org/trac/ros-pkg/ticket/4602

Cool. One way to go would be a tutorial on how to transform a node
into a nodelet. It seems to be a fairly simple recipe, modulo the
blocking and threading stuff:
- add the necessary #includes
- get rid of int main()
- subclass nodelet::Nodelet
- move code from constructor to onInit()
- add the PLUGINLIB_DECLARE_CLASS macro
- add the <nodelet> item in the <export> part of the package manifest
- create the .xml file to define the nodelet as a plugin
- make the necessary changes to CMakeLists.txt (comment out a
rosbuild_add_executable, add a rosbuild_add_library)
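
As a rough sketch of the "move code from constructor to onInit()" step and the blocking caveat, the following is plain C++ with a stand-in base class so that it compiles without ROS. `NodeletLike`, `Talker`, `init()` as written here, and `ticks()` are hypothetical names for illustration. In a real package the base class would be nodelet::Nodelet and the class would be exported with PLUGINLIB_DECLARE_CLASS, neither of which is shown. The only point is that onInit() starts the work and returns, with the loop that used to live in main() moved onto its own thread.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Stand-in for nodelet::Nodelet (an assumption so the sketch builds without ROS).
class NodeletLike {
 public:
  virtual ~NodeletLike() = default;
  void init() { onInit(); }   // the manager would call this once after loading

 private:
  virtual void onInit() = 0;  // must return promptly
};

class Talker : public NodeletLike {
 public:
  Talker() = default;  // constructor stays empty; setup moves to onInit()

  ~Talker() override {
    // Stop the background loop before members are destroyed.
    running_ = false;
    if (worker_.joinable()) worker_.join();
  }

  int ticks() const { return ticks_.load(); }

 private:
  void onInit() override {
    // Former constructor / main() setup would go here.
    running_ = true;
    // The blocking loop runs on its own thread so onInit() can return.
    worker_ = std::thread([this] { spin(); });
  }

  void spin() {
    while (running_) {
      ++ticks_;  // placeholder for the node's periodic work
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
  }

  std::atomic<bool> running_{false};
  std::atomic<int> ticks_{0};
  std::thread worker_;
};
```

Calling `init()` on a `Talker` returns right away, and `ticks()` starts increasing shortly afterwards. Whether a dedicated thread or a timer callback is the better fit depends on the node being converted.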

> In general for high
> throughput environments nodelets are likely going to start to dominate due
> to the higher transport efficiency.  But the simplicity of having a separate
> process will mean that for low bandwidth things standalone nodes will likely
> persist.

But does a nodelet run standalone perform basically as well as an
ordinary node? If the overhead of the standalone nodelet is zero in
terms of speed and small in terms of memory footprint (are those valid
assumptions?), then writing a node as a nodelet from the beginning
seems like the way to go, as you can easily combine it with a nodelet
manager down the road without a recompile.

>> > Our decision was that if you're using the multithreaded interface you
>> > know that you have to think about the number of threads available.
>>
>> The nodelet design does seem to encourage on-the-fly nodelet loading
>> and unloading; in my case I might want to take advantage of that as
>> well. Is there a mechanism to change the number of worker threads as
>> nodelets are loaded/unloaded?
>
> There's not a mechanism to do so.  I've ticketed it at
> https://code.ros.org/trac/ros-pkg/ticket/4603

Great. Ideally one would be able to specify a number of worker threads
to add in the nodelet loader arguments. Or, if it makes more sense as
an argument to the Nodelet constructor or something like that, that's
equally good; in fact maybe better, as it's a little less in the
average user's face.
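
To illustrate why the worker count matters once nodelets block, and what "adding a worker thread" could look like, here is a standalone sketch in plain C++. `GrowablePool`, `add_worker`, `submit` and `starves_until_grown` are names made up for this example. This is not the nodelet manager's implementation or a proposed API, just a model of the situation: with n workers and n blocked jobs, one more job waits until the pool grows.

```cpp
#include <chrono>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical fixed-queue thread pool that can be grown at runtime.
class GrowablePool {
 public:
  explicit GrowablePool(std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) add_worker();
  }
  ~GrowablePool() {
    { std::lock_guard<std::mutex> lock(m_); stop_ = true; }
    cv_.notify_all();
    for (auto& w : workers_) w.join();
  }
  // Grow the pool by one thread (call from the owning thread only).
  void add_worker() { workers_.emplace_back([this] { run(); }); }

  // Queue a job in FIFO order; the future becomes ready when it has run.
  std::future<void> submit(std::function<void()> job) {
    auto task = std::make_shared<std::packaged_task<void()>>(std::move(job));
    std::future<void> done = task->get_future();
    { std::lock_guard<std::mutex> lock(m_); queue_.push_back([task] { (*task)(); }); }
    cv_.notify_one();
    return done;
  }

 private:
  void run() {
    for (;;) {
      std::function<void()> job;
      {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return stop_ || !queue_.empty(); });
        if (queue_.empty()) return;  // stop requested and nothing left to do
        job = std::move(queue_.front());
        queue_.pop_front();
      }
      job();
    }
  }
  std::mutex m_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> queue_;
  bool stop_ = false;
  std::vector<std::thread> workers_;
};

// Returns true if a job starved behind n blocked jobs and ran once a worker was added.
bool starves_until_grown(std::size_t n) {
  std::promise<void> release;
  std::shared_future<void> gate = release.get_future().share();
  GrowablePool pool(n);
  std::vector<std::future<void>> blocked;
  for (std::size_t i = 0; i < n; ++i) {
    blocked.push_back(pool.submit([gate] { gate.wait(); }));  // occupies one worker
  }
  std::future<void> extra = pool.submit([] {});
  // Every worker is (or will be) stuck on a blocker, so the extra job cannot run yet.
  bool starved = extra.wait_for(std::chrono::milliseconds(100)) == std::future_status::timeout;
  pool.add_worker();
  bool ran = extra.wait_for(std::chrono::seconds(5)) == std::future_status::ready;
  release.set_value();  // unblock the original jobs so the pool can shut down
  for (auto& b : blocked) b.wait();
  return starved && ran;
}
```

For example, `starves_until_grown(2)` models two blocking nodelets on a two-thread pool: the third job only runs after `add_worker()` is called.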

Cheers,
Pat