[ros-users] Race condition in actionlib

Vijay Pradeep vpradeep at willowgarage.com
Fri Apr 1 03:00:55 UTC 2011


Hi Ryan,

I'm sorry that you're having trouble getting actionlib to work.  What
version of the common stack do you have?

Note that it is always possible for some of the ROS messages to be dropped.
If the result message doesn't make it to the action client, then
waitForResult is going to block forever.  I'd suggest adding a timeout on
waitForResult, and then preempting the goal if you end up waiting too long.
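
The bounded-wait-then-preempt pattern suggested above can be sketched as
follows.  This is only an illustration using the C++ standard library, not
actionlib itself: PendingGoal and waitOrCancel are hypothetical names made
up for this sketch.  With a SimpleActionClient, the corresponding calls
would be waitForResult() with a ros::Duration timeout, followed by
cancelGoal() if it returns false.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for an outstanding action goal; NOT an actionlib type.
class PendingGoal {
public:
    // Called when the result message arrives. If the message is dropped,
    // this is never called and an unbounded wait would block forever.
    void deliverResult() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

    // Bounded wait: returns true only if the result arrived within `timeout`.
    bool waitForResult(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return done_; });
    }

    // Give up on the goal (the analogue of preempting it).
    void cancel() {
        std::lock_guard<std::mutex> lock(mutex_);
        cancelled_ = true;
    }

    bool cancelled() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return cancelled_;
    }

private:
    mutable std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
    bool cancelled_ = false;
};

// Waits up to `timeout` for the result; on timeout, cancels the goal and
// returns false so the caller can decide whether to resend it.
bool waitOrCancel(PendingGoal& goal, std::chrono::milliseconds timeout) {
    if (goal.waitForResult(timeout)) {
        return true;
    }
    goal.cancel();
    return false;
}
```

The point of the design is that the caller regains control after a known
interval even when the result never arrives, instead of relying on every
message being delivered.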

> I made each callback print when
> it was called, and I found that quite often, the program would block
> forever when I called waitForServer.
I'm guessing you meant "the program would block forever when I called
*waitForResult*"

I'm using common-1.4.3, and after running your example for ~10 minutes, it
doesn't seem to freeze for me.  I can rerun this test with the exact version
of common that you're using.  Is there anything you can do to make this
minimal example freeze more often? Are there some strategic sleeps you could
add to make it mimic your original app more closely?

If you can get the chores app to freeze, you could try attaching gdb to the
process and seeing where all the threads are (using the "info threads"
command inside gdb).  A bunch of them will be stuck on
pthread_cond_timedwait calls, but I'd be curious if there's a thread stuck
on a lock inside of actionlib.  That would be indicative of a race condition
in actionlib.

Vijay

On Thu, Mar 31, 2011 at 6:55 PM, Ryan Miller <rmiller4589 at gmail.com> wrote:

> Because of a timing issue I can't quite pin down, actionlib
> occasionally appears to exhibit a race condition. I found the problem
> after adding callbacks to my client. I made each callback print when
> it was called, and I found that quite often, the program would block
> forever when I called waitForServer. The problem was that the client's
> active callback was called but the server's execute method was never
> called.
>
> I have reduced the problem into a simple ROS package with a single
> executable that I have attached. After running the node for a while, I
> finally noticed the same problem. Its last output before blocking
> was:
>
> --- snip ---
> Current State: ABORTED
> Aborting.
> Active.
> Done.
> Current State: ABORTED
> Aborting.
> Active.
> --- snip ---
>
> In my actual code, the condition happens extremely frequently, but I
> found I could mitigate the problem by sleeping for one millisecond
> before returning from the server's execute method. (I should warn you
> that in the attached example, the problem almost never occurs).
>
> Is this likely a bug, or might I be doing something wrong? Any
> suggestions would be appreciated. Thanks for the help.
>
> -Ryan


-- 
Vijay Pradeep
Systems Engineer
Willow Garage, Inc.
vpradeep at willowgarage.com