In general, nodelets don't guarantee that your CPU usage will go down.
While it often does, if message passing is already a small part of
your usage, nodelets won't help that much.  They will reduce the latency
of the passing, though.


>  * What is a good methodology for measuring the overhead of these messages?
>
>  * Is there some way to set different "command names" for the
> different nodelets so top or ps can identify which is which?
>
>  * How can I run "profile" on a nodelet or a collection of nodelets?
>

You profile nodelets the same way you profile any C++ application: with the
profiler of your choice.  I tend to use Google perftools and/or cachegrind.
There's also gprof and sysprof, and probably many more.


>
>  * Are any enhancements planned for rxgraph to report nodelet
> connections clearly?
>
> If not, I'll open an enhancement ticket. The rostopic and rosnode
> commands seem to report things correctly, so the right information
> must be available somewhere.
>

That information doesn't exist anywhere at the moment.  As far as the ROS
graph is concerned, it's just a single node.


>
> I am guessing that memory allocation for large, high-bandwidth
> messages could be a significant factor. Before, I pre-allocated the
> messages to avoid memory overhead on every cycle. (But, I suppose that
> just pushed the problem down into the publish() implementation.) Now,
> I have to allocate a new message and shared_ptr every time.
>

Don't guess; profile.  Allocation could be a bottleneck, but it's more
likely that filling in the data (or std::vector's zero-filling of primitive
types on resize) is the problem.
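
As a rough illustration of the resize point, here is a minimal sketch that
separates "allocate a message and shared_ptr" from "allocate and size the
payload".  FakeMessage, averageMicros, allocateOnly, allocateAndResize and
payloadIsZeroFilled are all hypothetical names made up for this example, not
ROS APIs, and a crude timer like this is no substitute for a real profiler.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a large message (e.g. an image or a point cloud).
struct FakeMessage {
    std::vector<std::uint8_t> data;
};

// Calls fn() `iterations` times and returns the mean wall-clock time per call
// in microseconds.
template <typename Fn>
double averageMicros(int iterations, Fn fn) {
    assert(iterations > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i) {
        fn();
    }
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::micro>(stop - start).count() / iterations;
}

// Allocates only the message object and its shared_ptr.
std::shared_ptr<FakeMessage> allocateOnly() {
    return std::make_shared<FakeMessage>();
}

// Allocates a message and sizes its payload.  resize() value-initializes the
// new elements, so every byte is written (set to 0) before real data goes in.
std::shared_ptr<FakeMessage> allocateAndResize(std::size_t bytes) {
    auto msg = std::make_shared<FakeMessage>();
    msg->data.resize(bytes);
    return msg;
}

// Returns true if every element of the payload is zero.
bool payloadIsZeroFilled(const FakeMessage& msg) {
    for (std::uint8_t b : msg.data) {
        if (b != 0) return false;
    }
    return true;
}
```

Comparing averageMicros(100, [] { allocateOnly(); }) against
averageMicros(100, [] { allocateAndResize(4 * 1024 * 1024); }) gives a first
hint of where the time goes.  An optimizing compiler may elide some of this
work, so treat the numbers as indicative only.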


>
>  * Should I use the ros_realtime/allocators package in place of
> standard C++ new? Are there examples of this I can study?
>

The allocators package currently only has an aligned allocator, so that
won't help.  What might help is a growable (and shrinkable) version of
lockfree's ObjectPool.  You could probably try the existing ObjectPool if
you're OK with having a fixed-size pool of messages.
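
To show what a fixed-size pool of messages means in practice, here is a
minimal sketch.  It is NOT lockfree's ObjectPool API: FixedPool and
PooledMessage are hypothetical names for this example, and the pool is
neither lock-free nor thread-safe.  It also assumes the pool outlives every
pointer it hands out.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical message type; stands in for a real ROS message.
struct PooledMessage {
    std::vector<unsigned char> data;
};

// Illustrative fixed-size pool.  Objects are constructed once up front and
// recycled, so a returned message keeps whatever payload capacity it had.
template <typename T>
class FixedPool {
public:
    explicit FixedPool(std::size_t capacity) : storage_(capacity) {
        assert(capacity > 0);
        free_.reserve(capacity);
        for (T& obj : storage_) {
            free_.push_back(&obj);
        }
    }

    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns a shared_ptr whose deleter hands the object back to the pool,
    // or an empty pointer when every object is in use.
    std::shared_ptr<T> allocate() {
        if (free_.empty()) {
            return std::shared_ptr<T>();
        }
        T* obj = free_.back();
        free_.pop_back();
        // free_ was reserved to full capacity, so this push_back never reallocates.
        return std::shared_ptr<T>(obj, [this](T* p) { free_.push_back(p); });
    }

    // Number of objects currently free.
    std::size_t available() const { return free_.size(); }

private:
    std::vector<T> storage_;  // never resized after construction, so pointers stay valid
    std::vector<T*> free_;    // objects not currently handed out
};
```

A caller would create FixedPool<PooledMessage> pool(8) once, then call
pool.allocate() each cycle and skip the cycle (or fall back to new) when it
returns an empty pointer.  Note that std::shared_ptr still heap-allocates its
control block here, so this only removes the message allocation itself.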

Josh