[ros-users] How can I be robust to a crashed roscore?

Fri Jun 24 21:06:23 UTC 2011

Hi All,

This didn't feel crisp enough for answers.ros.org, but if people think that
site is a better forum for it than email, I can move my question there.

I'm thinking of using ROS for my non-ROS robot's basestation, and I'm trying
to figure out whether a ROS-based system can satisfy my robustness
requirements.

The basestation will most likely consist of at least 3 computers: a primary
communications & control (C&C) computer, a backup C&C computer, and at least
one non-critical visualization computer.  The idea is for the visualization
machines to receive ROS messages from both the primary and backup C&C
computers.  If the primary C&C computer crashes, the backup C&C computer
takes over all communications and control.  Once the primary C&C machine
reboots or recovers from the crash, it can then retake control of the robot.
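The takeover rule described above can be sketched as a small heartbeat-based arbiter. This is a minimal, hypothetical sketch in Python, not part of ROS: it assumes each C&C machine periodically reports a heartbeat (for example, over a ROS topic), and the machine names "primary"/"backup", the ControlArbiter class, and the 1-second timeout are all illustrative assumptions. The primary wins whenever its heartbeat is fresh, which gives the "retake control after recovery" behavior.

```python
import time
from typing import Dict, Optional


class ControlArbiter:
    """Hypothetical arbiter deciding which C&C machine should command the robot.

    The primary is preferred whenever its heartbeat is fresh; the backup is
    used only while the primary's heartbeat is stale or missing.
    """

    def __init__(self, timeout_s: float = 1.0) -> None:
        self.timeout_s = timeout_s  # assumed staleness threshold, in seconds
        self.last_seen: Dict[str, float] = {}

    def heartbeat(self, machine: str, stamp: Optional[float] = None) -> None:
        """Record the latest heartbeat time for a machine."""
        self.last_seen[machine] = time.monotonic() if stamp is None else stamp

    def _alive(self, machine: str, now: float) -> bool:
        seen = self.last_seen.get(machine)
        return seen is not None and now - seen <= self.timeout_s

    def active(self, now: Optional[float] = None) -> Optional[str]:
        """Return the machine that should be in control, or None if neither is alive."""
        now = time.monotonic() if now is None else now
        if self._alive("primary", now):
            return "primary"
        if self._alive("backup", now):
            return "backup"
        return None


if __name__ == "__main__":
    arb = ControlArbiter(timeout_s=1.0)
    arb.heartbeat("primary", stamp=0.0)
    arb.heartbeat("backup", stamp=0.0)
    print(arb.active(now=0.5))  # primary
    arb.heartbeat("backup", stamp=2.0)  # primary has gone silent
    print(arb.active(now=2.5))  # backup
    arb.heartbeat("primary", stamp=3.0)  # primary has recovered
    print(arb.active(now=3.2))  # primary
```

This only shows the decision rule; a real deployment would also need the two C&C machines to agree on who is in control so that both never command the robot at once.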

Now comes the hard question:  Where should I run my roscore, and what
happens if it crashes?  Assuming that the roscore is running on the primary
C&C machine and this machine crashes, I believe the nodes on the other
machines should keep running just fine (assuming we're not using the
parameter server or negotiating service connections at runtime).  Also, is
there any way that I can restart my roscore and C&C nodes on the primary
machine after the crash?
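To notice that the master has gone away (for example, from a watchdog on the backup machine), one option is to poll the ROS Master's XML-RPC API. This is a minimal sketch in Python, assuming the default ROS_MASTER_URI of http://localhost:11311/ and the Master API's getSystemState(caller_id) call, which returns a (code, statusMessage, systemState) triple with code 1 on success. The caller ID /health_probe, the 1-second timeout, and the helper names are hypothetical. It only detects a dead master; it does not restore the master's registrations after a restart.

```python
import http.client
import xmlrpc.client


class _TimeoutTransport(xmlrpc.client.Transport):
    """XML-RPC transport that applies a socket timeout to each connection."""

    def __init__(self, timeout_s: float) -> None:
        super().__init__()
        self._timeout_s = timeout_s

    def make_connection(self, host):
        conn = super().make_connection(host)
        conn.timeout = self._timeout_s  # used when the socket is opened
        return conn


def master_is_alive(uri: str = "http://localhost:11311/",
                    caller_id: str = "/health_probe",
                    timeout_s: float = 1.0) -> bool:
    """Return True if the ROS Master at `uri` answers getSystemState successfully."""
    proxy = xmlrpc.client.ServerProxy(uri, transport=_TimeoutTransport(timeout_s))
    try:
        code, _status_msg, _system_state = proxy.getSystemState(caller_id)
    except (OSError, http.client.HTTPException, xmlrpc.client.Error,
            ValueError, TypeError):
        # Connection refused, timeout, protocol error, or malformed reply.
        return False
    return code == 1  # Master API return code 1 means SUCCESS


if __name__ == "__main__":
    print("master alive:", master_is_alive())
```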

Maybe this involves patching the ROS Master to store the state of its
connections to disk.  If so, any suggestions as to where to start looking in
the ROS Master code would be appreciated.

Thanks,
Vijay Pradeep