[ros-users] Race condition in actionlib

Ryan Miller rmiller4589 at gmail.com
Fri Apr 1 14:51:20 UTC 2011


Hey Vijay, thanks for helping (and sorry for the double post; I
accidentally forgot to reply all, and I figure some people on the list
may be interested in the email). Yes, I did mean waitForResult, and I
am also using common-1.4.3 (diamondback too, if that's relevant).
Since you're having trouble duplicating the issue, I went back to
create an example closer to what I'm actually doing, and I managed to
get it to crash just as often. I've attached the project.

If it doesn't crash within about 5 seconds, restart it. Once it has
run for more than 5 seconds, it tends to keep running for much longer,
but it usually halts within a second.

Thanks a ton.

-Ryan

On Thu, Mar 31, 2011 at 11:00 PM, Vijay Pradeep
<vpradeep at willowgarage.com> wrote:
> Hi Ryan,
>
> I'm sorry that you're having trouble getting actionlib to work.  What
> version of the common stack do you have?
>
> Note that it is always possible for some of the ROS messages to be dropped.
> If the result message doesn't make it to the action client, then
> waitForResult is going to block forever.  I'd suggest adding a timeout on
> waitForResult, and then preempting the goal if you end up waiting too long.
>
>> I made each callback print when
>> it was called, and I found that quite often, the program would block
>> forever when I called waitForServer.
> I'm guessing you meant "the program would block forever when I called
> waitForResult".
>
> I'm using common-1.4.3, and after running your example for ~10 minutes, it
> doesn't seem to freeze for me.  I can rerun this test with the exact version
> of common that you're using.  Is there anything you can do to make this
> minimal example freeze more often? Are there some strategic sleeps you could
> add to make it mimic your original app more closely?
>
> If you can get the chores app to freeze, you could try attaching gdb to the
> process and seeing where all the threads are (using the "info threads"
> command inside gdb).  A bunch of them will be stuck on
> pthread_cond_timedwait calls, but I'd be curious if there's a thread stuck
> on a lock inside of actionlib.  That would be indicative of a race condition
> in actionlib.
>
> Vijay
>
> On Thu, Mar 31, 2011 at 6:55 PM, Ryan Miller <rmiller4589 at gmail.com> wrote:
>>
>> Because of a timing issue I can't quite pin down, actionlib
>> occasionally appears to exhibit a race condition. I found the problem
>> after adding callbacks to my client. I made each callback print when
>> it was called, and I found that quite often, the program would block
>> forever when I called waitForServer. The problem was that the client's
>> active callback was called but the server's execute method was never
>> called.
>>
>> I have reduced the problem to a simple ROS package with a single
>> executable, which I have attached. After running the node for a while, I
>> finally noticed the same problem. Its last output before blocking
>> was:
>>
>> --- snip ---
>> Current State: ABORTED
>> Aborting.
>> Active.
>> Done.
>> Current State: ABORTED
>> Aborting.
>> Active.
>> --- snip ---
>>
>> In my actual code, the condition happens extremely frequently, but I
>> found I could mitigate the problem by sleeping for one millisecond
>> before returning from the server's execute method. (I should warn you
>> that in the attached example, the problem almost never occurs.)
>>
>> Is this likely a bug, or might I be doing something wrong? Any
>> suggestions would be appreciated. Thanks for the help.
>>
>> -Ryan
>
>
>
> --
> Vijay Pradeep
> Systems Engineer
> Willow Garage, Inc.
> vpradeep at willowgarage.com
>
>
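
For anyone following the thread, here is a rough sketch of the
"wait with a timeout, then preempt" idea Vijay describes above. It
deliberately uses only the C++ standard library so that it compiles
without ROS; FakeGoal and waitForResultOrCancel are hypothetical names
made up for illustration and are not part of actionlib. With the real
SimpleActionClient, the corresponding steps would presumably be a
waitForResult call that is given a ros::Duration, followed by cancelGoal
when it returns false.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <string>

// Hypothetical stand-in for a goal whose result message may be dropped.
struct FakeGoal {
    std::promise<std::string> result_promise;  // fulfilled only if the result arrives
    std::atomic<bool> cancelled{false};        // set when the client gives up
};

// Wait at most `timeout` for the result. If it arrives, copy the final
// state into `state_out` and return true. Otherwise mark the goal as
// cancelled (the analogue of preempting it) and return false instead of
// blocking forever.
bool waitForResultOrCancel(FakeGoal& goal,
                           std::future<std::string>& result,
                           std::chrono::milliseconds timeout,
                           std::string& state_out)
{
    assert(result.valid());  // waiting on an invalid future is undefined behaviour
    if (result.wait_for(timeout) == std::future_status::ready) {
        state_out = result.get();
        return true;
    }
    goal.cancelled = true;
    return false;
}
```

The caller decides what to do after a timeout (log it, resend the goal,
and so on); the only point of the sketch is that a lost result turns
into a false return value rather than a hang.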
-------------- next part --------------
A non-text attachment was scrubbed...
Name: chores.tar.gz
Type: application/x-gzip
Size: 24321 bytes
Desc: not available
URL: <http://lists.ros.org/pipermail/ros-users/attachments/20110401/da017a5e/attachment-0004.bin>