Hey Vijay, thanks for helping (and sorry for the double post; I forgot to reply all, and I figure some people on the list may be interested in the email). Yes, I did mean waitForResult, and I am also using common-1.4.3 (diamondback too, if that's relevant).

Since you're having trouble duplicating the issue, I went back and created an example closer to what I'm actually doing, and I managed to get it to crash just as often. I've attached the project. If it doesn't crash after about 5 seconds, restart it. It seems that if it keeps running for more than 5 seconds, it is more likely to keep running for much longer, but it usually halts within a second.

Thanks a ton.

-Ryan

On Thu, Mar 31, 2011 at 11:00 PM, Vijay Pradeep wrote:
> Hi Ryan,
>
> I'm sorry that you're having trouble getting actionlib to work.  What
> version of the common stack do you have?
>
> Note that it is always possible for some of the ROS messages to be dropped.
> If the result message doesn't make it to the action client, then
> waitForResult is going to block forever.  I'd suggest adding a timeout on
> waitForServer, and then preempting the goal if you end up waiting too long.
>
>> I made each callback print when
>> it was called, and I found that quite often, the program would block
>> forever when I called waitForServer.
> I'm guessing you meant "the program would block forever when I called
> waitForResult"
>
> I'm using common-1.4.3, and after running your example for ~10 minutes, it
> doesn't seem to freeze for me.  I can rerun this test with the exact version
> of common that you're using.  Is there anything you can do to make this
> minimal example freeze more often? Are there some strategic sleeps you could
> add to make it mimic your original app more closely?
>
> If you can get the chores app to freeze, you could try attaching gdb to the
> process and seeing where all the threads are (using the "info threads"
> command inside gdb).
> A bunch of them will be stuck on
> pthread_cond_timedwait calls, but I'd be curious if there's a thread stuck
> on a lock inside of actionlib.  That would be indicative of a race condition
> in actionlib.
>
> Vijay
>
> On Thu, Mar 31, 2011 at 6:55 PM, Ryan Miller wrote:
>>
>> Because of a timing issue I can't quite pin down, actionlib
>> occasionally appears to exhibit a race condition. I found the problem
>> after adding callbacks to my client. I made each callback print when
>> it was called, and I found that quite often, the program would block
>> forever when I called waitForServer. The problem was that the client's
>> active callback was called but the server's execute method was never
>> called.
>>
>> I have reduced the problem to a simple ROS package with a single
>> executable, which I have attached. After running the node for a while, I
>> finally saw the same problem. Its last output before blocking
>> was:
>>
>> --- snip ---
>> Current State: ABORTED
>> Aborting.
>> Active.
>> Done.
>> Current State: ABORTED
>> Aborting.
>> Active.
>> --- snip ---
>>
>> In my actual code, the condition happens extremely frequently, but I
>> found I could mitigate the problem by sleeping for one millisecond
>> before returning from the server's execute method. (I should warn you
>> that in the attached example, the problem almost never occurs.)
>>
>> Is this likely a bug, or might I be doing something wrong? Any
>> suggestions would be appreciated. Thanks for the help.
>>
>> -Ryan
>>
>> _______________________________________________
>> ros-users mailing list
>> ros-users@code.ros.org
>> https://code.ros.org/mailman/listinfo/ros-users
>>
>
> --
> Vijay Pradeep
> Systems Engineer
> Willow Garage, Inc.
> vpradeep@willowgarage.com
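Vijay's suggestion above (bound the wait, then preempt the goal if the result never shows up) can be sketched in self-contained C++ as below. This is only a model built on the standard library, not actionlib code: `PendingGoal` and `waitOrPreempt` are hypothetical names, and the `int` result stands in for a real result message. In actionlib itself, `SimpleActionClient::waitForResult` takes a `ros::Duration` timeout and returns `false` if it expires, and `cancelGoal()` requests preemption of the current goal.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>

// Hypothetical stand-in for a client-side goal handle. This is NOT the
// actionlib API; it only models "a result may or may not ever arrive".
class PendingGoal {
 public:
  // Called (possibly from another thread) when the result message arrives.
  void setResult(int value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      result_ = value;
    }
    cv_.notify_all();
  }

  // Bounded wait: returns true if a result arrived before `timeout` expired.
  // An unbounded wait here would block forever if the result were dropped.
  bool waitForResult(std::chrono::milliseconds timeout) {
    std::unique_lock<std::mutex> lock(mutex_);
    return cv_.wait_for(lock, timeout, [this] { return result_.has_value(); });
  }

  // Records that the client gave up on (preempted) this goal.
  void cancel() {
    std::lock_guard<std::mutex> lock(mutex_);
    cancelled_ = true;
  }

  bool cancelled() const {
    std::lock_guard<std::mutex> lock(mutex_);
    return cancelled_;
  }

 private:
  mutable std::mutex mutex_;
  std::condition_variable cv_;
  std::optional<int> result_;
  bool cancelled_ = false;
};

// Waits up to `timeout` for the result; on timeout, preempts the goal
// instead of blocking forever. Returns true if the goal completed in time.
bool waitOrPreempt(PendingGoal& goal, std::chrono::milliseconds timeout) {
  assert(timeout.count() >= 0);
  if (goal.waitForResult(timeout)) {
    return true;
  }
  goal.cancel();
  return false;
}
```

For example, `waitOrPreempt(goal, std::chrono::seconds(5))` returns `true` if some other thread calls `goal.setResult(...)` within five seconds, and otherwise marks the goal cancelled and returns `false`. A real client would also need to cope with a result that arrives just after the cancel is sent. A bounded wait only keeps the client from hanging on a lost result; it does not explain why the server's execute method was never called.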