Re: [ros-users] Race condition in actionlib

Attachments:
+ chores.tar.gz (application/x-gzip)
Author: User discussions
Date:  
To: Vijay Pradeep
CC: User discussions
Subject: Re: [ros-users] Race condition in actionlib
Hey Vijay, thanks for helping (and sorry for the double post; I
accidentally forgot to reply all, and I figure some people on the list
may be interested in the email). Yes, I did mean waitForResult, and I
am also using common-1.4.3 (diamondback too, if that's relevant).
Since you're having trouble duplicating the issue, I went back to
create an example closer to what I'm actually doing, and I managed to
get it to crash just as often. I've attached the project.

If it doesn't crash within about 5 seconds, restart it. Once it has run
for more than 5 seconds, it tends to keep running for much longer, but
it usually halts within a second.

Thanks a ton.

-Ryan

On Thu, Mar 31, 2011 at 11:00 PM, Vijay Pradeep
<> wrote:
> Hi Ryan,
>
> I'm sorry that you're having trouble getting actionlib to work.  What
> version of the common stack do you have?
>
> Note that it is always possible for some of the ROS messages to be dropped.
> If the result message doesn't make it to the action client, then
> waitForResult is going to block forever.  I'd suggest adding a timeout on
> waitForResult, and then preempting the goal if you end up waiting too long.
>
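For anyone following along, here is a rough standalone sketch of the
timeout-then-preempt idea above. It deliberately does not use actionlib:
ResultWaiter, runGoal, and Outcome are names made up for illustration, and
the "server" is just a thread that either delivers a result or drops it.
In real client code, I believe the corresponding calls would be
waitForResult with a ros::Duration timeout followed by cancelGoal on the
SimpleActionClient.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the client-side "result arrived" state.
// This is NOT actionlib code; it only models the waiting behaviour.
class ResultWaiter {
public:
    // Called by the "server" side when a result is delivered.
    void deliver() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            done_ = true;
        }
        cv_.notify_all();
    }

    // Waits up to `timeout` for a result. Returns true if one arrived,
    // false if the wait timed out (e.g. the result was dropped).
    bool waitFor(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        // The predicate guards against spurious wakeups and against a
        // result that was delivered before we started waiting.
        return cv_.wait_for(lock, timeout, [this] { return done_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool done_ = false;
};

enum class Outcome { Finished, Preempted };

// Sends one pretend goal and waits for its result with a timeout.
// If `result_is_dropped` is true, the result never arrives, so the
// caller gives up after `timeout` instead of blocking forever.
Outcome runGoal(bool result_is_dropped, std::chrono::milliseconds timeout) {
    ResultWaiter waiter;
    std::thread server([&waiter, result_is_dropped] {
        if (!result_is_dropped) {
            waiter.deliver();
        }
    });
    const bool finished = waiter.waitFor(timeout);
    server.join();
    // On timeout, a real client would preempt (cancel) the goal here.
    return finished ? Outcome::Finished : Outcome::Preempted;
}
```

Calling runGoal(true, std::chrono::milliseconds(50)) models the dropped-result
case: the wait gives up after the timeout and reports Outcome::Preempted
instead of hanging.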
>> I made each callback print when
>> it was called, and I found that quite often, the program would block
>> forever when I called waitForServer.
> I'm guessing you meant "the program would block forever when I called
> waitForResult"
>
> I'm using common-1.4.3, and after running your example for ~10 minutes, it
> doesn't seem to freeze for me.  I can rerun this test with the exact version
> of common that you're using.  Is there anything you can do to make this
> minimal example freeze more often? Are there some strategic sleeps you could
> add to make it mimic your original app more closely?
>
> If you can get the chores app to freeze, you could try attaching gdb to the
> process and seeing where all the threads are (using the "info threads"
> command inside gdb).  A bunch of them will be stuck on
> pthread_cond_timedwait calls, but I'd be curious if there's a thread stuck
> on a lock inside of actionlib.  That would be indicative of a race condition
> in actionlib.
>
> Vijay
>
> On Thu, Mar 31, 2011 at 6:55 PM, Ryan Miller <> wrote:
>>
>> Because of a timing issue I can't quite pin down, actionlib
>> occasionally appears to exhibit a race condition. I found the problem
>> after adding callbacks to my client. I made each callback print when
>> it was called, and I found that quite often, the program would block
>> forever when I called waitForServer. The problem was that the client's
>> active callback was called but the server's execute method was never
>> called.
>>
>> I have reduced the problem into a simple ROS package with a single
>> executable that I have attached. After running the node for a while, I
>> finally noticed the same problem. Its last output before blocking
>> was:
>>
>> --- snip ---
>> Current State: ABORTED
>> Aborting.
>> Active.
>> Done.
>> Current State: ABORTED
>> Aborting.
>> Active.
>> --- snip ---
>>
>> In my actual code, the condition happens extremely frequently, but I
>> found I could mitigate the problem by sleeping for one millisecond
>> before returning from the server's execute method. (I should warn you
>> that in the attached example, the problem almost never occurs).
>>
>> Is this likely a bug, or might I be doing something wrong? Any
>> suggestions would be appreciated. Thanks for the help.
>>
>> -Ryan
>>
>> _______________________________________________
>> ros-users mailing list
>>
>> https://code.ros.org/mailman/listinfo/ros-users
>>
>
>
>
> --
> Vijay Pradeep
> Systems Engineer
> Willow Garage, Inc.
>
>
>