[ros-users] roscore respawn problem

Michael Krainin mkrainin at cs.washington.edu
Fri Jul 2 18:11:43 UTC 2010


You were right that the problem is not rosout dying. In my latest set
of trials, the failure occurred about 200 trials in without rosout
respawning this time. On the other hand, there's no sign of deadlock.

The scenario is that node1 sends a ReadyMessage to node2 telling it
that it is ready for the next image/depth pair. node2 sends an image
and depth map to node3 (depth_to_cloud) to construct a PointCloud from
the depth map to pass back along to node1. The problem seems to be
that the chain of messages gets broken at depth_to_cloud. Below is a
sample of the debug output I'm seeing from depth_to_cloud. This output
continues indefinitely until I kill the depth_to_cloud process. Am I
right to believe that this output explains the behavior I am seeing?
If so, what can I do about it?

Also, I think this output may be responsible for the respawning of
rosout. rosout uses quite a lot of memory while these connection
failure messages are being produced. Could it be trying to hold all of
them in memory, in addition to writing them to the log files, and
eventually running out of memory?

Thanks,
Mike

[roscpp_internal] [2010-07-02 09:45:14,190] [thread 0x7f2e9ba8f910]:
[DEBUG] Connection to publisher [TCPROS connection to [0.0.0.0:25344
on socket 53]] to topic [/rgbd/depth] dropped
[roscpp_internal] [2010-07-02 09:45:14,314] [thread 0x7f2e9a28c910]:
[DEBUG] Retrying connection to [pr-seattle-1:57774] for topic
[/rgbd/image]
[roscpp_internal] [2010-07-02 09:45:14,315] [thread 0x7f2e9a28c910]:
[DEBUG] Resolved publisher host [pr-seattle-1] to [127.0.1.1]
[roscpp_internal] [2010-07-02 09:45:14,315] [thread 0x7f2e9a28c910]:
[DEBUG] Enabling TCP Keepalive on socket [53]
[roscpp_internal] [2010-07-02 09:45:14,315] [thread 0x7f2e9a28c910]:
[DEBUG] Connect succeeded to [pr-seattle-1:57774] on socket [53]
[roscpp_internal] [2010-07-02 09:45:14,315] [thread 0x7f2e9a28c910]:
[DEBUG] recv() failed with error [Connection refused]
[roscpp_internal] [2010-07-02 09:45:14,319] [thread 0x7f2e9ba8f910]:
[DEBUG] Socket [53] received 0/65536 bytes, closing
[roscpp_internal] [2010-07-02 09:45:14,319] [thread 0x7f2e9ba8f910]:
[DEBUG] TCP socket [53] closed
[roscpp_internal] [2010-07-02 09:45:14,319] [thread 0x7f2e9ba8f910]:
[DEBUG] Connection to publisher [TCPROS connection to [0.0.0.0:25344
on socket 53]] to topic [/rgbd/image] dropped
[roscpp_internal] [2010-07-02 09:45:14,350] [thread 0x7f2e9a28c910]:
[DEBUG] Retrying connection to [pr-seattle-1:57774] for topic
[/rgbd/depth]
[roscpp_internal] [2010-07-02 09:45:14,350] [thread 0x7f2e9a28c910]:
[DEBUG] Resolved publisher host [pr-seattle-1] to [127.0.1.1]
[roscpp_internal] [2010-07-02 09:45:14,350] [thread 0x7f2e9a28c910]:
[DEBUG] Enabling TCP Keepalive on socket [53]
[roscpp_internal] [2010-07-02 09:45:14,350] [thread 0x7f2e9a28c910]:
[DEBUG] Connect succeeded to [pr-seattle-1:57774] on socket [53]
[roscpp_internal] [2010-07-02 09:45:14,351] [thread 0x7f2e9a28c910]:
[DEBUG] recv() failed with error [Connection refused]
[roscpp_internal] [2010-07-02 09:45:14,351] [thread 0x7f2e9ba8f910]:
[DEBUG] Socket [53] received 0/65536 bytes, closing
[roscpp_internal] [2010-07-02 09:45:14,351] [thread 0x7f2e9ba8f910]:
[DEBUG] TCP socket [53] closed
[roscpp_internal] [2010-07-02 09:45:14,351] [thread 0x7f2e9ba8f910]:
[DEBUG] Connection to publisher [TCPROS connection to [0.0.0.0:25344
on socket 53]] to topic [/rgbd/depth] dropped
[roscpp_internal] [2010-07-02 09:45:14,365] [thread 0x7f2e9a28c910]:
[DEBUG] Retrying connection to [pr-seattle-1:33946] for topic
[/rgbd/depth]
[roscpp_internal] [2010-07-02 09:45:14,365] [thread 0x7f2e9a28c910]:
[DEBUG] Resolved publisher host [pr-seattle-1] to [127.0.1.1]
[roscpp_internal] [2010-07-02 09:45:14,365] [thread 0x7f2e9a28c910]:
[DEBUG] Enabling TCP Keepalive on socket [53]
[roscpp_internal] [2010-07-02 09:45:14,365] [thread 0x7f2e9a28c910]:
[DEBUG] Connect succeeded to [pr-seattle-1:33946] on socket [53]
[roscpp_internal] [2010-07-02 09:45:14,365] [thread 0x7f2e9a28c910]:
[DEBUG] recv() failed with error [Connection refused]
[roscpp_internal] [2010-07-02 09:45:14,366] [thread 0x7f2e9ba8f910]:
[DEBUG] Socket [53] received 0/65536 bytes, closing
[roscpp_internal] [2010-07-02 09:45:14,366] [thread 0x7f2e9ba8f910]:
[DEBUG] TCP socket [53] closed
[roscpp_internal] [2010-07-02 09:45:14,367] [thread 0x7f2e9ba8f910]:
[DEBUG] Connection to publisher [TCPROS connection to [0.0.0.0:25344
on socket 53]] to topic [/rgbd/depth] dropped
[roscpp_internal] [2010-07-02 09:45:14,463] [thread 0x7f2e9a28c910]:
[DEBUG] Retrying connection to [pr-seattle-1:33946] for topic
[/rgbd/image]
[roscpp_internal] [2010-07-02 09:45:14,463] [thread 0x7f2e9a28c910]:
[DEBUG] Resolved publisher host [pr-seattle-1] to [127.0.1.1]
[roscpp_internal] [2010-07-02 09:45:14,463] [thread 0x7f2e9a28c910]:
[DEBUG] Enabling TCP Keepalive on socket [53]
[roscpp_internal] [2010-07-02 09:45:14,463] [thread 0x7f2e9a28c910]:
[DEBUG] Connect succeeded to [pr-seattle-1:33946] on socket [53]
[roscpp_internal] [2010-07-02 09:45:14,463] [thread 0x7f2e9a28c910]:
[DEBUG] recv() failed with error [Connection refused]
[roscpp_internal] [2010-07-02 09:45:14,464] [thread 0x7f2e9ba8f910]:
[DEBUG] Socket [53] received 0/65536 bytes, closing
[roscpp_internal] [2010-07-02 09:45:14,464] [thread 0x7f2e9ba8f910]:
[DEBUG] TCP socket [53] closed
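
To make the retry loop above easier to read, here is a minimal sketch of
a script that counts the "Retrying connection" entries per publisher
address and topic. The function name count_retries and the SAMPLE
constant are made up for illustration, and SAMPLE holds only three
entries copied from the output above; this is a log-reading aid under
those assumptions, not part of roscpp.

```python
import re
from collections import Counter

# Matches the "Retrying connection" entries once wrapped lines are rejoined.
RETRY_RE = re.compile(
    r"Retrying connection to \[([^\]]+)\] for topic \[([^\]]+)\]"
)


def count_retries(log_text):
    """Count retry attempts per (publisher address, topic) pair.

    Hypothetical helper for illustration; not part of any ROS package.
    """
    # The log entries are wrapped across several lines, so collapse all
    # whitespace (including newlines) to single spaces before matching.
    flat = " ".join(log_text.split())
    return Counter(RETRY_RE.findall(flat))


# Three entries copied from the debug output above.
SAMPLE = """\
[roscpp_internal] [2010-07-02 09:45:14,314] [thread 0x7f2e9a28c910]:
[DEBUG] Retrying connection to [pr-seattle-1:57774] for topic
[/rgbd/image]
[roscpp_internal] [2010-07-02 09:45:14,350] [thread 0x7f2e9a28c910]:
[DEBUG] Retrying connection to [pr-seattle-1:57774] for topic
[/rgbd/depth]
[roscpp_internal] [2010-07-02 09:45:14,365] [thread 0x7f2e9a28c910]:
[DEBUG] Retrying connection to [pr-seattle-1:33946] for topic
[/rgbd/depth]
"""

if __name__ == "__main__":
    for (addr, topic), n in sorted(count_retries(SAMPLE).items()):
        print(f"{addr} {topic}: {n}")
```

Run over the full excerpt, a tally like this shows each of the two
ports (57774 and 33946) being retried for both /rgbd/image and
/rgbd/depth, which may be worth checking against the ports the
publisher is actually listening on.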

> rosout dying shouldn't affect this unless it's somehow deadlocked those nodes... can you attach gdb to them and get traces from all their threads with "thread apply all bt"?
>
> Josh


