In general, nodelets should not be overused. If the overhead of nodes (copying and (de)serialization) is really what's
dragging your processing graph down, then sure, nodelets offer a viable alternative. However, this should really
be profiled first for each application.

This isn't entirely true. Adding processes also means adding threads, which increases context switching and scheduling overhead. Keeping everything in a single process gives you more control over how threads are scheduled and how resources are shared. There's a reason game consoles run a single process, and that process generally has as many threads as there are cores. I think in the "ideal" world everything that can be a nodelet should be, and a fully debugged application would be a single process running many nodelets. In practice there are likely exceptions (GUI applications, say), but overall I think it's a reasonable goal.
