In general, nodelets don't guarantee that your CPU usage will go down. While it is often the case, if the message passing itself is already a small part of your usage, nodelets won't help that much. They will help latency of the passing though.

> * What is a good methodology for measuring the overhead of these messages?
>
> * Is there some way to set different "command names" for the
> different nodelets so top or ps can identify which is which?
>
> * How can I run "profile" on a nodelet or a collection of nodelets?

You profile nodelets the same way you profile any C++ application: with the profiler of your choice. I tend to use google perftools and/or cachegrind. There's also gprof and sysprof, and probably many more.

> * Are any enhancements planned for rxgraph to report nodelet
> connections clearly?
>
> If not, I'll open an enhancement ticket. The rostopic and rosnode
> commands seem to report things correctly, so the right information
> must be available somewhere.

That information doesn't exist anywhere at the moment. As far as the ROS graph is concerned, it's just a single node.

> I am guessing that memory allocation for large, high-bandwidth
> messages could be a significant factor. Before, I pre-allocated the
> messages to avoid memory overhead on every cycle. (But, I suppose that
> just pushed the problem down into the publish() implementation.) Now,
> I have to allocate a new message and shared_ptr every time.

Don't guess, profile. Allocation could be a bottleneck, but it's more likely that filling in the data (or std::vector's 0-filling of primitive types on resize) is the problem.

> * Should I use the ros_realtime/allocators package in place of
> standard C++ new? Are there examples of this I can study?

The allocators package currently only has an aligned allocator, so that won't help. What might help is a growable (and shrinkable) version of lockfree's ObjectPool. You could probably try using those if you're OK having a fixed-size pool of messages.

Josh
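
To illustrate the std::vector point: resize() value-initializes new elements, so a large primitive payload gets written once with zeros and then again with the real data. The sketch below contrasts that with reserve() plus push_back(), which writes each element once. The function names and the fill pattern are made up for illustration, and whether this shows up in your numbers is still something only a profiler can tell you.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: sizes the buffer with resize(), which zero-fills
// every new element before we overwrite it with real data.
std::vector<std::uint8_t> fill_with_resize(std::size_t n) {
    std::vector<std::uint8_t> data;
    data.resize(n);  // first pass: n zeros are written
    for (std::size_t i = 0; i < n; ++i) {
        data[i] = static_cast<std::uint8_t>(i & 0xFF);  // second pass
    }
    return data;
}

// Hypothetical helper: reserves capacity and appends, so each element
// is written only once.
std::vector<std::uint8_t> fill_with_reserve(std::size_t n) {
    std::vector<std::uint8_t> data;
    data.reserve(n);  // allocates storage but writes nothing
    for (std::size_t i = 0; i < n; ++i) {
        data.push_back(static_cast<std::uint8_t>(i & 0xFF));
    }
    return data;
}
```

Both helpers produce the same bytes; only the number of passes over the buffer differs. If the data already sits in another buffer, assign(first, last) also copies it in a single pass.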
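
For the fixed-size pool idea, here is a rough sketch of the general technique, assuming C++11 and std::shared_ptr. FixedPool is a hypothetical class, not the lockfree package's ObjectPool API, and it uses a mutex for brevity, so it is not realtime-safe. The point is only that a shared_ptr with a custom deleter can hand a message back to a pool instead of freeing it.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical fixed-size pool; NOT the lockfree package's ObjectPool.
template <typename T>
class FixedPool {
public:
    explicit FixedPool(std::size_t count, const T& prototype = T())
        : storage_(count, prototype) {
        free_.reserve(count);
        for (T& obj : storage_) free_.push_back(&obj);
    }

    // Returns a pooled object, or an empty pointer when the pool is
    // exhausted. The pool must outlive every pointer it hands out.
    std::shared_ptr<T> allocate() {
        T* obj = nullptr;
        {
            std::lock_guard<std::mutex> lock(mutex_);
            if (free_.empty()) return std::shared_ptr<T>();
            obj = free_.back();
            free_.pop_back();
        }
        // The custom deleter returns the object to the pool instead of
        // deleting it.
        return std::shared_ptr<T>(obj, [this](T* p) { release(p); });
    }

    std::size_t available() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return free_.size();
    }

private:
    void release(T* p) {
        std::lock_guard<std::mutex> lock(mutex_);
        free_.push_back(p);  // capacity was reserved, so no reallocation
    }

    std::vector<T> storage_;  // never resized, so addresses stay valid
    std::vector<T*> free_;
    mutable std::mutex mutex_;
};
```

Objects come back from the pool with their previous contents, so a message whose payload vector was already sized keeps its capacity. Note that std::shared_ptr still allocates a control block on every allocate() call; avoiding that as well would need the constructor overload that takes an allocator.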