Hi Ryan, Here's one more thing to try. Turn on rosconsole debugging (See rosconole configuration ) You're going to want the following two lines in your rosonsole.config: log4j.logger.ros.actionlib=DEBUG log4j.logger.ros.chores=DEBUG You'll get an output that looks like this: [DEBUG] [1301680806.580203868]: Setting the current goal as aborted [DEBUG] [1301680806.580254603]: Setting status to aborted on goal, id: /test_client_server-1118-1301680806.579359151, stamp: 1301680806.58 [DEBUG] [1301680806.580301868]: Publishing result for goal with id: /test_client_server-1118-1301680806.579359151 and stamp: 1301680806.58 [DEBUG] [1301680806.580439068]: Getting status over the wire. [DEBUG] [1301680806.580509625]: Trying to transition to ACTIVE [DEBUG] [1301680806.580553872]: Transitioning CommState from WAITING_FOR_GOAL_ACK to ACTIVE [DEBUG] [1301680806.580603042]: Transitioning SimpleState from [PENDING] to [ACTIVE] Active. [DEBUG] [1301680806.580934461]: Trying to transition to WAITING_FOR_RESULT [DEBUG] [1301680806.580991611]: Transitioning CommState from ACTIVE to WAITING_FOR_RESULT [DEBUG] [1301680806.581051376]: Trying to transition to DONE [DEBUG] [1301680806.581108948]: Transitioning CommState from WAITING_FOR_RESULT to DONE [DEBUG] [1301680806.581164618]: Transitioning SimpleState from [ACTIVE] to [DONE] Done. [ INFO] [1301680806.581243326]: Hogging time or something Current State: ABORTED [DEBUG] [1301680806.581302953]: IN DELETER [DEBUG] [1301680806.581339010]: About to erase CommStateMachine [DEBUG] [1301680806.581377433]: Done erasing CommStateMachine [DEBUG] [1301680806.581414297]: about to start initGoal() [DEBUG] [1301680806.581489440]: Done with initGoal() [DEBUG] [1301680806.581544720]: The action server has received a new goal request [DEBUG] [1301680806.581638097]: A new goal has been recieved by the single goal action server [DEBUG] [1301680806.581705716]: Accepting a new goal [DEBUG] [1301680806.581758609]: Accepting goal, id: /test_client_server-1119-1301680806.581440270, stamp: 1301680806.58 Server called. Aborting goal. Can send me the debug output for the last few cycles when your app freezes? Vijay On Fri, Apr 1, 2011 at 10:05 AM, Vijay Pradeep wrote: > Hi Ryan, > > So I actually ran your original chores app overnight (~10 hours), and it > still didn't freeze. > > Also, right now, I tried running your new chores app about 30 times (for > about 5 seconds on each run), and it still hadn't frozen. I couldn't get > the app to build using your CMakeLists and Makefile, so I rewrote them based > on the standard "roscreate-pkg" versions of these files. I doubt this would > affect the functionality. > > The fact that it freezes close to startup is interesting. My guess would > be that it is having some sort of startup interaction with the AsyncSpinner. > > Instead of: > int main(int argc, char** argv) { > ros::init(argc, argv, "test_client_server"); > > ros::AsyncSpinner spinner(4); > * spinner.start();* > > ServerDude sd; > > ClientGuy client; > client.run(); > > return 0; > } > > Can you try: > int main(int argc, char** argv) { > ros::init(argc, argv, "test_client_server"); > > ros::AsyncSpinner spinner(4); > > ServerDude sd; > > ClientGuy client; > * spinner.start();* > client.run(); > > return 0; > } > > I would really like to get to the bottom of this, but it's hard to do much > on my end if I can't actually get it to halt. > > Vijay > > > On Fri, Apr 1, 2011 at 7:51 AM, Ryan Miller wrote: > >> Hey Vijay, thanks for helping (and sorry for the double post; I >> accidentally forgot to reply all, and I figure some people on the list >> may be interested in the email). Yes, I did mean waitForResult, and I >> am also using common-1.4.3 (diamondback too, if that's relevant). >> Since you're having trouble duplicating the issue, I went back to >> create an example closer to what I'm actually doing, and I managed to >> get it to crash just as often. I've attached the project. >> >> If it doesn't crash after about 5 seconds, restart it. It seems if it >> continues for more than 5 seconds, it is more likely to continue >> running for much longer, but it usually halts within a second. >> >> Thanks a ton. >> >> -Ryan >> >> On Thu, Mar 31, 2011 at 11:00 PM, Vijay Pradeep >> wrote: >> > Hi Ryan, >> > >> > I'm sorry that you're having trouble getting actionlib to work. What >> > version of the common stack do you have? >> > >> > Note that it is always possible for some of the ROS messages to be >> dropped. >> > If the result message doesn't make it to the action client, then >> > waitForResult is going to block forever. I'd suggest adding a timeout >> on >> > waitForServer, and then preempting the goal if you end up waiting too >> long. >> > >> >> I made each callback print when >> >> it was called, and I found that quite often, the program would block >> >> forever when I called waitForServer. >> > I'm guessing you meant "the program would block forever when I called >> > waitForResult" >> > >> > I'm using common-1.4.3, and after running your example for ~10 minutes, >> it >> > doesn't seem to freeze for me. I can rerun this test with the exact >> version >> > of common that you're using. Is there anything you can do to make this >> > minimal example freeze more often? Are there some strategic sleeps you >> could >> > add to make it mimic your original app more closely? >> > >> > If you can get the chores app to freeze, you could try attaching gdb to >> the >> > process and seeing where all the threads are (using the "info threads" >> > command inside gdb). A bunch of them will be stuck on >> > pthread_cond_timedwait calls, but I'd be curious if there's a thread >> stuck >> > on a lock inside of actionlib. That would be indicative of a race >> condition >> > in actionlib. >> > >> > Vijay >> > >> > On Thu, Mar 31, 2011 at 6:55 PM, Ryan Miller >> wrote: >> >> >> >> Because of a timing issue I can't quite pin down, actionlib >> >> occasionally appears to exhibit a race condition. I found the problem >> >> after adding callbacks to my client. I made each callback print when >> >> it was called, and I found that quite often, the program would block >> >> forever when I called waitForServer. The problem was that the client's >> >> active callback was called but the server's execute method was never >> >> called. >> >> >> >> I have reduced the problem into a simple ROS package with a single >> >> executable that I have attached. After running the node for a while, I >> >> finally noticed the same problem. It's last output before blocking >> >> was: >> >> >> >> --- snip --- >> >> Current State: ABORTED >> >> Aborting. >> >> Active. >> >> Done. >> >> Current State: ABORTED >> >> Aborting. >> >> Active. >> >> --- snip --- >> >> >> >> In my actual code, the condition happens extremely frequently, but I >> >> found I could mitigate the problem by sleeping for one millisecond >> >> before returning from the server's execute method. (I should warn you >> >> that in the attached example, the problem almost never occurs). >> >> >> >> Is this likely a bug, or might I doing something wrong? Any >> >> suggestions would be appreciated. Thanks for the help. >> >> >> >> -Ryan >> >> >> >> _______________________________________________ >> >> ros-users mailing list >> >> ros-users@code.ros.org >> >> https://code.ros.org/mailman/listinfo/ros-users >> >> >> > >> > >> > >> > -- >> > Vijay Pradeep >> > Systems Engineer >> > Willow Garage, Inc. >> > vpradeep@willowgarage.com >> > >> > >> > > > > -- > Vijay Pradeep > Systems Engineer > Willow Garage, Inc. > vpradeep@willowgarage.com > > -- Vijay Pradeep Systems Engineer Willow Garage, Inc. vpradeep@willowgarage.com