[ros-users] Strange segfaults showing up in /var/log/messages

Mon Jan 17 21:06:26 UTC 2011

I'm trying to track down some weird behaviour, and in doing so I
noticed messages like these in my system log:

These two messages show up after I run roslaunch:
Jan 17 12:13:45 pelican1 kernel: [ 9892.379940] nodelet[25142]: segfault at 80000004 ip 00a56a04 sp bf889c78 error 4 in libc-2.11.1.so[9e9000+153000]
Jan 17 12:13:45 pelican1 kernel: [ 9892.695073] asctec_adapter[25231]: segfault at 6019 ip 0039c82a sp bfd4f36c error 4 in libapr-1.so.0.3.8[385000+29000]

Then, when I kill roslaunch with Ctrl+C, these two show up:
Jan 17 12:34:37 pelican1 kernel: [11144.236603] nodelet[26935]: segfault at fffffff4 ip 00208cf4 sp b57c70f0 error 4 in libros.so[110000+14f000]
Jan 17 12:34:38 pelican1 kernel: [11145.062698] asctec_adapter[26871]: segfault at 6019 ip 0065282a sp bfdd952c error 4 in libapr-1.so.0.3.8[63b000+29000]

Going back through my older messages files, I find that these have
been occurring for some time: http://pastebin.com/JJNQtbSa

Stepping back from this particular clue, here is my larger problem:
for some reason my robot system (i.e. the set of nodes spawned by
roslaunch) doesn't work every time I launch it. It seems that whenever
I first launch it after starting a new roscore, it does not work. By
'not work' I mean that there appears to be some kind of deadlock, but
I haven't been able to find any useful messages in any of the logs.
I've tried things like increasing the number of worker threads for my
nodelet managers, to no avail. What is strange is that if I then kill
roslaunch (Ctrl+C) and launch again (with the same roscore still
running), it sometimes works.

There are two systems involved: the robot, an Atom-based machine
running 32-bit Ubuntu Lucid with ROS unstable installed from debs, and
the 'ground station' (a laptop), a Core i7-based machine running 64-bit
Ubuntu Maverick, also with ROS unstable installed from debs. The ground
station is where most of the nodes run and also where roscore runs.

One other thing: I only have this problem when I am using the Kinect.
One difference is that when the Kinect is used, two nodelet managers
are running instead of just one. I do notice that even without the
Kinect I still get some segfault messages in the system log upon
shutdown:

Jan 17 13:03:44 pelican1 kernel: [12891.741308] nodelet[9018]: segfault at bb7e0200 ip 007a88aa sp b61968ec error 4 in libc-2.11.1.so[73d000+153000]
Jan 17 13:03:45 pelican1 kernel: [12892.129662] asctec_adapter[8665]: segfault at 6019 ip 0059882a sp bfaa380c error 4 in libapr-1.so.0.3.8[581000+29000]

So maybe they're a red herring, but it would be nice to understand why
they are there regardless.
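The kernel lines quoted above can be decoded mechanically. The following is a minimal Python sketch. It assumes an x86 kernel, since the robot is an Atom. It also assumes that each `name[start+size]` range is the library's executable mapping starting at file offset 0, which is the usual layout for shared libraries. The sketch subtracts the mapping start from `ip` to get the faulting instruction's offset inside the library, and it decodes `error 4` using the x86 page-fault error-code bits. The helpers `parse` and `decode_error` are hypothetical names for illustration and are not part of any ROS tool.

```python
import re

# The six kernel messages quoted in this post, each rejoined onto one line.
LOG_LINES = [
    "Jan 17 12:13:45 pelican1 kernel: [ 9892.379940] nodelet[25142]: segfault at 80000004 ip 00a56a04 sp bf889c78 error 4 in libc-2.11.1.so[9e9000+153000]",
    "Jan 17 12:13:45 pelican1 kernel: [ 9892.695073] asctec_adapter[25231]: segfault at 6019 ip 0039c82a sp bfd4f36c error 4 in libapr-1.so.0.3.8[385000+29000]",
    "Jan 17 12:34:37 pelican1 kernel: [11144.236603] nodelet[26935]: segfault at fffffff4 ip 00208cf4 sp b57c70f0 error 4 in libros.so[110000+14f000]",
    "Jan 17 12:34:38 pelican1 kernel: [11145.062698] asctec_adapter[26871]: segfault at 6019 ip 0065282a sp bfdd952c error 4 in libapr-1.so.0.3.8[63b000+29000]",
    "Jan 17 13:03:44 pelican1 kernel: [12891.741308] nodelet[9018]: segfault at bb7e0200 ip 007a88aa sp b61968ec error 4 in libc-2.11.1.so[73d000+153000]",
    "Jan 17 13:03:45 pelican1 kernel: [12892.129662] asctec_adapter[8665]: segfault at 6019 ip 0059882a sp bfaa380c error 4 in libapr-1.so.0.3.8[581000+29000]",
]

# "<proc>[<pid>]: segfault at <addr> ip <ip> sp <sp> error <code> in <lib>[<start>+<size>]"
# All numeric fields are printed by the kernel in hex.
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<lib>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_error(code: int) -> str:
    """Decode the low three bits of an x86 page-fault error code (assumes x86)."""
    cause = "protection violation" if code & 1 else "page not present"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse(line: str):
    """Return the decoded fields of one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    return {
        "proc": m["proc"],
        "lib": m["lib"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction inside the library, assuming the
        # reported mapping starts at file offset 0.
        "offset": ip - base,
        "ip_in_mapping": base <= ip < base + size,
        "cause": decode_error(int(m["err"], 16)),
    }


if __name__ == "__main__":
    for line in LOG_LINES:
        info = parse(line)
        print(f"{info['proc']:15} {info['lib']:20} offset {info['offset']:#x} "
              f"fault addr {info['fault_addr']:#x} ({info['cause']})")
```

Run against the six messages quoted in this post, the sketch shows the same offset, 0x1782a, in libapr-1.so.0.3.8 for every asctec_adapter crash, each with fault address 0x6019. That suggests the same code path fails each time. The nodelet crashes land at different offsets in libc and libros. If debug symbols are installed, an offset can be turned into a function name with something like `addr2line -f -e /path/to/libapr-1.so.0.3.8 0x1782a`. The library path in that command is a placeholder.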

Any thoughts on where to look to solve this? I'm very perplexed by the
segfault messages in the system log, because I'd always thought that
segfaults showed up on the console every time, yet these show up
neither on the console nor in the ROS logs.
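One likely reason nothing appears on the console has to do with who prints the message. When a program is run directly from an interactive shell, it is the shell that prints "Segmentation fault" after its child dies from SIGSEGV. The kernel itself only writes the `segfault at ...` line to the kernel log. Nodes started by roslaunch are children of roslaunch rather than of the shell, so a crash reaches the console only if the launcher chooses to report the child's exit status.

The Python sketch below illustrates that mechanism, under stated assumptions. It uses a hypothetical child process that sends itself SIGSEGV with `kill()`. Unlike a real fault, this leaves no kernel log line, but the parent sees the same exit status. This is not roslaunch's actual code.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for a node: a child Python process that sends itself SIGSEGV.
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"


def run_child() -> int:
    """Start the child with its output captured (as a launcher might) and return its status."""
    proc = subprocess.run([sys.executable, "-c", CHILD_CODE], capture_output=True)
    return proc.returncode


def describe(returncode: int) -> str:
    """subprocess reports 'terminated by signal N' as a negative return code -N."""
    if returncode < 0:
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with status {returncode}"


if __name__ == "__main__":
    # Nothing is shown for the crash unless the parent decides to report it here.
    print(describe(run_child()))
```

On Linux this should print `killed by SIGSEGV`. The crash is visible only because the parent inspected the return code and printed it.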

Thanks,
Pat