Sorry, forgot to attach the file.

-Ryan

On Fri, Apr 1, 2011 at 6:01 PM, Ryan Miller wrote:

> Hi Vijay,
>
> A few questions, then. Is it acceptable for messages to be dropped? I thought actionlib used reliable data transfer. In my actual program, I am depending on every message being sent, and I would rather not use delays and risk my robot still locking up.
>
> To give you some perspective, I am using the client to plan a path to a destination, and I am using the server to execute the motor controls to drive on a path it is given. Using a timeout would be fine (it's what I'm doing in my production code, actually). But with this new result, it seems I should now cancel the goal any time waitForResult times out, because I don't know whether the server is just taking a long time or whether its result message was dropped and it's actually finished. What I was planning on doing was letting the motion planner follow the path until it is in imminent danger of colliding with something, in which case I would cancel the goal. Unfortunately, I now have a new problem where I cannot even cancel the goal.
>
> After timing out, the client always thinks its goal is active, and I am unable to cancel the goal. The output I am getting from the attached example is:
>
> --- snip ---
> *[ INFO] [1301694170.491982098]: Sending goal
> [ INFO] [1301694170.492052098]: Waiting for result...
> [ INFO] [1301694170.492138988]: Server is aborting goal.
> [ INFO] [1301694170.492193892]: After waiting for the result, the state is ABORTED
> [ INFO] [1301694170.492232340]: Sending goal
> [ INFO] [1301694170.492279138]: Waiting for result...
> [ INFO] [1301694170.492341676]: Server is aborting goal.
> [ INFO] [1301694170.492388975]: After waiting for the result, the state is ABORTED*
> [ INFO] [1301694170.492443452]: Sending goal
> [ INFO] [1301694170.492543467]: Server is aborting goal.
> [ INFO] [1301694170.492620163]: Waiting for result...
> [ INFO] [1301694171.492789064]: After waiting for the result, the state is ACTIVE
> [ INFO] [1301694171.492909135]: Timeout, cancelling the goal
> *[ INFO] [1301694171.492953475]: After cancelling, the state is ACTIVE*
> *[ INFO] [1301694171.492977808]: The server is not done.
> [ INFO] [1301694172.493114619]: The server is not done.
> [ INFO] [1301694173.493301543]: The server is not done.
> [ INFO] [1301694174.493476758]: The server is not done.*
> ^C
> --- snip ---
> (Normal behaviour is italicised; unexpected behaviour is bold.)
>
> Might this be another problem, or am I misusing SimpleActionClient?
>
> Thanks again,
>
> -Ryan
>
>
> On Fri, Apr 1, 2011 at 4:54 PM, Vijay Pradeep wrote:
>
>> Hi Ryan,
>>
>> Ok, I think we're actually getting somewhere...
>>
>> > When I move the call to spinner.start() to where you suggested, the client gets stuck on its call to waitForServer as follows:
>>
>> Totally my bad. What I meant for you to do was the following:
>> 1) Construct client & server
>> 2) Spin up threads
>> 3) waitForServer
>> 4) Start sending goals
>>
>> To do this, you'll have to move spinner.start() to where I suggested, and also move client_.waitForServer() to the top of run(). Currently, the app is trying to waitForServer without any spin threads running, which is causing it to get stuck. Sorry about that. I'd also suggest trying to run with 1 spinner instead of 4, and see if that helps.
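>>
>> In other words, something like this sketch (assuming ClientGuy::run() now begins with a client_.waitForServer() call):
>>
>> int main(int argc, char** argv) {
>>   ros::init(argc, argv, "test_client_server");
>>
>>   // 1) Construct client & server
>>   ServerDude sd;
>>   ClientGuy client;
>>
>>   // 2) Spin up threads (try 1 spinner instead of 4)
>>   ros::AsyncSpinner spinner(1);
>>   spinner.start();
>>
>>   // 3) run() calls client_.waitForServer() first...
>>   // 4) ...and then starts sending goals
>>   client.run();
>>
>>   return 0;
>> }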
>> Thanks for the gdb thread info. It looks like all the threads we're worried about are happily waiting for inputs in pthread conditions. This makes it seem even more likely that a message is being dropped.
>>
>> Also, I finally got it to freeze on my side, *with* debug on. See output:
>>
>> Done.
>> [ INFO] [1301690082.884716841]: Hogging time or something
>> Current State: ABORTED
>> [DEBUG] [1301690082.884784275]: IN DELETER
>> [DEBUG] [1301690082.884838215]: About to erase CommStateMachine
>> [DEBUG] [1301690082.884891598]: Done erasing CommStateMachine
>> [DEBUG] [1301690082.884948750]: about to start initGoal()
>> [DEBUG] [1301690082.885037423]: The action server has received a new goal request
>> [DEBUG] [1301690082.885152395]: A new goal has been recieved by the single goal action server
>> [DEBUG] [1301690082.885263065]: Accepting a new goal
>> [DEBUG] [1301690082.885330292]: Accepting goal, id: /test_client_server-281-1301690082.884983820, stamp: 1301690082.88
>> Server called. Aborting goal.
>> [DEBUG] [1301690082.885555049]: Setting the current goal as aborted
>> [DEBUG] [1301690082.885618287]: Setting status to aborted on goal, id: /test_client_server-281-1301690082.884983820, stamp: 1301690082.88
>> [DEBUG] [1301690082.885675967]: Publishing result for goal with id: /test_client_server-281-1301690082.884983820 and stamp: 1301690082.88
>> [DEBUG] [1301690082.885743824]: Getting status over the wire.
>> [DEBUG] [1301690082.885956181]: Done with initGoal()
>> [DEBUG] [1301690082.887905041]: Getting status over the wire.
>> [DEBUG] [1301690082.888000669]: Trying to transition to ACTIVE
>> [DEBUG] [1301690082.888064094]: Transitioning CommState from WAITING_FOR_GOAL_ACK to ACTIVE
>> [DEBUG] [1301690082.888130878]: Transitioning SimpleState from [PENDING] to [ACTIVE]
>> Active.
>> [DEBUG] [1301690082.888203846]: Trying to transition to WAITING_FOR_RESULT
>> *[DEBUG] [1301690082.888262411]: Transitioning CommState from ACTIVE to WAITING_FOR_RESULT*
>> [DEBUG] [1301690083.088191441]: Getting status over the wire.
>> [DEBUG] [1301690083.288523198]: Getting status over the wire.
>> [DEBUG] [1301690083.488836856]: Getting status over the wire.
>> [DEBUG] [1301690083.688115542]: Getting status over the wire.
>> [DEBUG] [1301690083.888460218]: Getting status over the wire.
>> [DEBUG] [1301690084.088794991]: Getting status over the wire.
>> [DEBUG] [1301690084.288137860]: Getting status over the wire.
>> [DEBUG] [1301690084.488456451]: Getting status over the wire.
>> [DEBUG] [1301690084.688742829]: Getting status over the wire.
>> [DEBUG] [1301690084.888085527]: Getting status over the wire.
>> [DEBUG] [1301690085.088418107]: Getting status over the wire.
>>
>> On the client side, the goal transitioned to WAITING_FOR_RESULT, which means that the server successfully set the goal to a terminal state (in this case, ABORTED). However, for some reason, the client never gets the result message. With so many messages flying across so quickly, I guess one of the result messages happens to get dropped. This also explains why adding a sleep in the server callback stopped the freezing: the sleep significantly slowed down the messaging rate, preventing the result message from being dropped.
>>
>> It is definitely best practice to always add a timeout to waitForResult(). I've updated the actionlib quickstart guide accordingly.
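>>
>> Concretely, the pattern I have in mind looks roughly like this (just a sketch -- the DoChores action type and the waiting periods are made up; substitute your own):
>>
>> #include <ros/ros.h>
>> #include <actionlib/client/simple_action_client.h>
>> #include <chores/DoChoresAction.h>  // hypothetical action type
>>
>> typedef actionlib::SimpleActionClient<chores::DoChoresAction> Client;
>>
>> void sendGoalWithTimeout(Client& client)
>> {
>>   chores::DoChoresGoal goal;
>>   client.sendGoal(goal);
>>
>>   // Never wait forever: if the result message gets dropped, an
>>   // unbounded waitForResult() blocks for good.
>>   if (!client.waitForResult(ros::Duration(5.0)))
>>   {
>>     ROS_WARN("Timed out waiting for result; cancelling the goal");
>>     client.cancelGoal();
>>     // Give the server a moment to confirm the cancellation.
>>     client.waitForResult(ros::Duration(1.0));
>>   }
>>   ROS_INFO("Final state: %s", client.getState().toString().c_str());
>> }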
>> And, please let me know if any of the spinner tweaks I suggested make a difference.
>>
>> Vijay
>>
>>
>> On Fri, Apr 1, 2011 at 1:07 PM, Ryan Miller wrote:
>>
>>> Hi Vijay,
>>>
>>> When I move the call to spinner.start() to where you suggested, the client gets stuck on its call to waitForServer as follows:
>>>
>>> --- snip ---
>>> $ rosrun chores chores
>>> [DEBUG] [1301684841.767308639]: Spinning up a thread for the SimpleActionClient
>>> [DEBUG] [1301684841.794376567]: goalConnectCallback: Adding [/test_client_server] to goalSubscribers
>>> [DEBUG] [1301684841.794470922]: Goal Subscribers (1 total)
>>>    - /test_client_server
>>> [DEBUG] [1301684841.800129524]: cancelConnectCallback: Adding [/test_client_server] to cancelSubscribers
>>> [DEBUG] [1301684841.800199855]: cancel Subscribers (1 total)
>>>    - /test_client_server
>>> [DEBUG] [1301684841.809336199]: isServerConnected: Didn't receive status yet, so not connected yet
>>> [DEBUG] [1301684841.809441030]: isServerConnected: Didn't receive status yet, so not connected yet
>>> [DEBUG] [1301684842.309665431]: isServerConnected: Didn't receive status yet, so not connected yet
>>> [DEBUG] [1301684842.810034890]: isServerConnected: Didn't receive status yet, so not connected yet
>>> ...
>>> [DEBUG] [1301684842.810034890]: isServerConnected: Didn't receive status yet, so not connected yet
>>> --- snip ---
>>>
>>> Also, it appears I have a heisenbug on my hands: when I turn on rosconsole debugging, the problem seemingly disappears, but when I keep it off, it hangs on almost every run. By the way, the single call to ROS_INFO I placed in the code is critical to making the program halt more often. Remove that, and you'll have more trouble getting the program to halt.
>>>
>>> Is there any other information that might help you replicate this problem? For now, I'll just throw some facts here about the computer I'm using. I am running Ubuntu 10.10 64-bit. I have replicated the problem on a relatively slow laptop and two pretty fast desktops. The specific package information for the version of ros-diamondback-common I am using is:
>>>
>>> --- snip ---
>>> $ dpkg -s ros-diamondback-common
>>> Package: ros-diamondback-common
>>> Status: install ok installed
>>> Priority: optional
>>> Section: unknown
>>> Installed-Size: 34224
>>> Maintainer: Tully Foote tfoote@willowgarage.com
>>> Architecture: amd64
>>> Version: 1.4.3-s1299027665~maverick
>>> Depends: ros-diamondback-common-msgs (= 1.4.0-s1299027564~maverick), ros-diamondback-ros (= 1.4.6-s1298587732~maverick), ros-diamondback-ros-comm (= 1.4.5-s1299027222~maverick), libc6, build-essential, cmake, python-yaml, subversion, uuid-dev, libcppunit-dev
>>> Description: common code for personal robots
>>>  This stack contains tools built on top of ROS core which are commonly used throughout the ROS ecosystem.
>>> Wg-Rosdistro: diamondback
>>> --- snip ---
>>>
>>> Here is some information from gdb from when the program halts. I've printed a list of the running threads as well as the backtraces of a few of the ones in different functions.
>>>
>>> --- snip ---
>>> Current State: ABORTED
>>> Server called. Aborting goal.
>>> Active.
>>> ^C
>>> Program received signal SIGINT, Interrupt.
>>> pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>> 212     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
>>>         in ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S
>>> (gdb) info thread
>>>   12 Thread 0x7fffecea2700 (LWP 9649)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>>   11 Thread 0x7fffed6a3700 (LWP 9646)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>>   10 Thread 0x7fffedea4700 (LWP 9640)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>>   9 Thread 0x7fffee6a5700 (LWP 9639)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>>   8 Thread 0x7fffeeea6700 (LWP 9638)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>>   7 Thread 0x7fffef6a7700 (LWP 9637)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>>   6 Thread 0x7fffefea8700 (LWP 9636)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>>   5 Thread 0x7ffff06a9700 (LWP 9635)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>>   4 Thread 0x7ffff0eaa700 (LWP 9630)  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>>   3 Thread 0x7ffff16ab700 (LWP 9629)  0x00007ffff534e2c3 in select () at ../sysdeps/unix/syscall-template.S:82
>>>   2 Thread 0x7ffff1eac700 (LWP 9628)  0x00007ffff5349203 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=100) at ../sysdeps/unix/sysv/linux/poll.c:87
>>> * 1 Thread 0x7ffff7fcd760 (LWP 9609)  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>> (gdb) bt
>>> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>> #1  0x000000000044c698 in timed_wait<...> (this=0x7fffffffe110, m=..., wait_duration=<value optimized out>) at /usr/include/boost/thread/pthread/condition_variable.hpp:110
>>> #2  boost::condition_variable_any::timed_wait<..., boost::posix_time::milliseconds> (this=0x7fffffffe110, m=..., wait_duration=<value optimized out>) at /usr/include/boost/thread/pthread/condition_variable.hpp:133
>>> #3  0x000000000044d355 in actionlib::SimpleActionClient<...>::waitForResult (this=0x7fffffffe040, timeout=...) at /opt/ros/diamondback/stacks/common/actionlib/include/actionlib/client/simple_action_client.h:567
>>> #4  0x00000000004583f6 in ClientGuy::run (this=0x7fffffffe040) at /home/pesckal/prometheus/ros/chores/src/main.cpp:50
>>> #5  0x000000000042d8d5 in main (argc=1, argv=<value optimized out>) at /home/pesckal/prometheus/ros/chores/src/main.cpp:87
>>> (gdb) thread 2
>>> [Switching to thread 2 (Thread 0x7ffff1eac700 (LWP 9628))]#0  0x00007ffff5349203 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=100) at ../sysdeps/unix/sysv/linux/poll.c:87
>>> 87      ../sysdeps/unix/sysv/linux/poll.c: No such file or directory.
>>>         in ../sysdeps/unix/sysv/linux/poll.c
>>> (gdb) bt
>>> #0  0x00007ffff5349203 in __poll (fds=<value optimized out>, nfds=<value optimized out>, timeout=100) at ../sysdeps/unix/sysv/linux/poll.c:87
>>> #1  0x00007ffff78c169a in ros::PollSet::update (this=0x69c650, poll_timeout=100) at /tmp/buildd/ros-diamondback-ros-comm-1.4.5/debian/ros-diamondback-ros-comm/opt/ros/diamondback/stacks/ros_comm/clients/cpp/roscpp/src/libros/poll_set.cpp:202
>>> #2  0x00007ffff7906778 in ros::PollManager::threadFunc (this=0x69c650) at /tmp/buildd/ros-diamondback-ros-comm-1.4.5/debian/ros-diamondback-ros-comm/opt/ros/diamondback/stacks/ros_comm/clients/cpp/roscpp/src/libros/poll_manager.cpp:95
>>> #3  0x00007ffff6df5230 in thread_proxy () from /usr/lib/libboost_thread.so.1.42.0
>>> #4  0x00007ffff55f9971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
>>> #5  0x00007ffff535592d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #6  0x0000000000000000 in ?? ()
>>> (gdb) thread 3
>>> [Switching to thread 3 (Thread 0x7ffff16ab700 (LWP 9629))]#0  0x00007ffff534e2c3 in select () at ../sysdeps/unix/syscall-template.S:82
>>> 82      ../sysdeps/unix/syscall-template.S: No such file or directory.
>>>         in ../sysdeps/unix/syscall-template.S
>>> (gdb) bt
>>> #0  0x00007ffff534e2c3 in select () at ../sysdeps/unix/syscall-template.S:82
>>> #1  0x00007ffff723313a in XmlRpc::XmlRpcDispatch::work (this=0x69ac18, timeout=<value optimized out>) at /tmp/buildd/ros-diamondback-ros-comm-1.4.5/debian/ros-diamondback-ros-comm/opt/ros/diamondback/stacks/ros_comm/utilities/xmlrpcpp/src/XmlRpcDispatch.cpp:110
>>> #2  0x00007ffff7903de9 in ros::XMLRPCManager::serverThreadFunc (this=0x69abc0) at /tmp/buildd/ros-diamondback-ros-comm-1.4.5/debian/ros-diamondback-ros-comm/opt/ros/diamondback/stacks/ros_comm/clients/cpp/roscpp/src/libros/xmlrpc_manager.cpp:256
>>> #3  0x00007ffff6df5230 in thread_proxy () from /usr/lib/libboost_thread.so.1.42.0
>>> #4  0x00007ffff55f9971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
>>> #5  0x00007ffff535592d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #6  0x0000000000000000 in ?? ()
>>> (gdb) thread 4
>>> [Switching to thread 4 (Thread 0x7ffff0eaa700 (LWP 9630))]#0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>> 162     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S: No such file or directory.
>>>         in ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S
>>> (gdb) bt
>>> #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
>>> #1  0x00007ffff79603b3 in wait (this=<value optimized out>) at /usr/include/boost/thread/pthread/condition_variable.hpp:20
>>> #2  ros::ROSOutAppender::logThread (this=<value optimized out>) at /tmp/buildd/ros-diamondback-ros-comm-1.4.5/debian/ros-diamondback-ros-comm/opt/ros/diamondback/stacks/ros_comm/clients/cpp/roscpp/src/libros/rosout_appender.cpp:135
>>> #3  0x00007ffff6df5230 in thread_proxy () from /usr/lib/libboost_thread.so.1.42.0
>>> #4  0x00007ffff55f9971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
>>> #5  0x00007ffff535592d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #6  0x0000000000000000 in ?? ()
>>> (gdb) thread 5
>>> [Switching to thread 5 (Thread 0x7ffff06a9700 (LWP 9635))]#0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>> 212     ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S: No such file or directory.
>>>         in ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S
>>> (gdb) bt
>>> #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:212
>>> #1  0x00007ffff796885d in bool boost::condition_variable::timed_wait<boost::date_time::subsecond_duration<..., 1000000l> >(boost::unique_lock<boost::mutex>&, boost::date_time::subsecond_duration<..., 1000000l> const&) () from /opt/ros/diamondback/stacks/ros_comm/clients/cpp/roscpp/lib/libros.so
>>> #2  0x00007ffff7964e99 in ros::CallbackQueue::callAvailable (this=0x69c410, timeout=DWARF-2 expression error: DW_OP_reg operations must be used either alone or in conjuction with DW_OP_piece or DW_OP_bit_piece.) at /tmp/buildd/ros-diamondback-ros-comm-1.4.5/debian/ros-diamondback-ros-comm/opt/ros/diamondback/stacks/ros_comm/clients/cpp/roscpp/src/libros/callback_queue.cpp:307
>>> #3  0x00007ffff78f22e1 in ros::internalCallbackQueueThreadFunc () at /tmp/buildd/ros-diamondback-ros-comm-1.4.5/debian/ros-diamondback-ros-comm/opt/ros/diamondback/stacks/ros_comm/clients/cpp/roscpp/src/libros/init.cpp:265
>>> #4  0x00007ffff6df5230 in thread_proxy () from /usr/lib/libboost_thread.so.1.42.0
>>> #5  0x00007ffff55f9971 in start_thread (arg=<value optimized out>) at pthread_create.c:304
>>> #6  0x00007ffff535592d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:112
>>> #7  0x0000000000000000 in ?? ()
>>> (gdb)
>>> --- snip ---
>>>
>>> Also, I'm sorry to hear you had to create a new package to compile the program. As a sanity check, I also created a new package and copied over the cpp and action files, but this did not change the result, as expected. (I've attached the package for reference.)
>>>
>>> I really appreciate the help, thank you.
>>>
>>> -Ryan
>>>
>>>
>>> On Fri, Apr 1, 2011 at 2:03 PM, Vijay Pradeep wrote:
>>> > Hi Ryan,
>>> >
>>> > Here's one more thing to try. Turn on rosconsole debugging (see rosconsole configuration).
>>> >
>>> > You're going to want the following two lines in your rosconsole.config:
>>> > log4j.logger.ros.actionlib=DEBUG
>>> > log4j.logger.ros.chores=DEBUG
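>>> >
>>> > (If the package doesn't have a rosconsole.config yet, a minimal sketch would be something like the file below, with the ROSCONSOLE_CONFIG_FILE environment variable pointed at it before you launch the node:)
>>> >
>>> > # rosconsole.config -- minimal example
>>> > log4j.logger.ros=INFO
>>> > log4j.logger.ros.actionlib=DEBUG
>>> > log4j.logger.ros.chores=DEBUG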
>>> >
>>> > You'll get an output that looks like this:
>>> >
>>> > [DEBUG] [1301680806.580203868]: Setting the current goal as aborted
>>> > [DEBUG] [1301680806.580254603]: Setting status to aborted on goal, id: /test_client_server-1118-1301680806.579359151, stamp: 1301680806.58
>>> > [DEBUG] [1301680806.580301868]: Publishing result for goal with id: /test_client_server-1118-1301680806.579359151 and stamp: 1301680806.58
>>> > [DEBUG] [1301680806.580439068]: Getting status over the wire.
>>> > [DEBUG] [1301680806.580509625]: Trying to transition to ACTIVE
>>> > [DEBUG] [1301680806.580553872]: Transitioning CommState from WAITING_FOR_GOAL_ACK to ACTIVE
>>> > [DEBUG] [1301680806.580603042]: Transitioning SimpleState from [PENDING] to [ACTIVE]
>>> > Active.
>>> > [DEBUG] [1301680806.580934461]: Trying to transition to WAITING_FOR_RESULT
>>> > [DEBUG] [1301680806.580991611]: Transitioning CommState from ACTIVE to WAITING_FOR_RESULT
>>> > [DEBUG] [1301680806.581051376]: Trying to transition to DONE
>>> > [DEBUG] [1301680806.581108948]: Transitioning CommState from WAITING_FOR_RESULT to DONE
>>> > [DEBUG] [1301680806.581164618]: Transitioning SimpleState from [ACTIVE] to [DONE]
>>> > Done.
>>> > [ INFO] [1301680806.581243326]: Hogging time or something
>>> > Current State: ABORTED
>>> > [DEBUG] [1301680806.581302953]: IN DELETER
>>> > [DEBUG] [1301680806.581339010]: About to erase CommStateMachine
>>> > [DEBUG] [1301680806.581377433]: Done erasing CommStateMachine
>>> > [DEBUG] [1301680806.581414297]: about to start initGoal()
>>> > [DEBUG] [1301680806.581489440]: Done with initGoal()
>>> > [DEBUG] [1301680806.581544720]: The action server has received a new goal request
>>> > [DEBUG] [1301680806.581638097]: A new goal has been recieved by the single goal action server
>>> > [DEBUG] [1301680806.581705716]: Accepting a new goal
>>> > [DEBUG] [1301680806.581758609]: Accepting goal, id: /test_client_server-1119-1301680806.581440270, stamp: 1301680806.58
>>> > Server called. Aborting goal.
>>> >
>>> > Can you send me the debug output for the last few cycles when your app freezes?
>>> >
>>> > Vijay
>>> >
>>> > On Fri, Apr 1, 2011 at 10:05 AM, Vijay Pradeep <vpradeep@willowgarage.com> wrote:
>>> >>
>>> >> Hi Ryan,
>>> >>
>>> >> So I actually ran your original chores app overnight (~10 hours), and it still didn't freeze.
>>> >>
>>> >> Also, right now, I tried running your new chores app about 30 times (for about 5 seconds on each run), and it still hadn't frozen. I couldn't get the app to build using your CMakeLists and Makefile, so I rewrote them based on the standard "roscreate-pkg" versions of these files. I doubt this would affect the functionality.
>>> >>
>>> >> The fact that it freezes close to startup is interesting. My guess would be that it is having some sort of startup interaction with the AsyncSpinner.
>>> >>
>>> >> Instead of:
>>> >>
>>> >> int main(int argc, char** argv) {
>>> >>   ros::init(argc, argv, "test_client_server");
>>> >>
>>> >>   ros::AsyncSpinner spinner(4);
>>> >>   spinner.start();
>>> >>
>>> >>   ServerDude sd;
>>> >>
>>> >>   ClientGuy client;
>>> >>   client.run();
>>> >>
>>> >>   return 0;
>>> >> }
>>> >>
>>> >> Can you try:
>>> >>
>>> >> int main(int argc, char** argv) {
>>> >>   ros::init(argc, argv, "test_client_server");
>>> >>
>>> >>   ros::AsyncSpinner spinner(4);
>>> >>
>>> >>   ServerDude sd;
>>> >>
>>> >>   ClientGuy client;
>>> >>   spinner.start();
>>> >>   client.run();
>>> >>
>>> >>   return 0;
>>> >> }
>>> >>
>>> >> I would really like to get to the bottom of this, but it's hard to do much on my end if I can't actually get it to halt.
>>> >>
>>> >> Vijay
>>> >>
>>> >> On Fri, Apr 1, 2011 at 7:51 AM, Ryan Miller wrote:
>>> >>>
>>> >>> Hey Vijay, thanks for helping (and sorry for the double post; I accidentally forgot to reply all, and I figure some people on the list may be interested in the email). Yes, I did mean waitForResult, and I am also using common-1.4.3 (diamondback too, if that's relevant).
>>> >>>
>>> >>> Since you're having trouble duplicating the issue, I went back to create an example closer to what I'm actually doing, and I managed to get it to freeze just as often. I've attached the project.
>>> >>>
>>> >>> If it doesn't freeze after about 5 seconds, restart it. It seems that if it continues for more than 5 seconds, it is likely to keep running for much longer, but it usually halts within a second.
>>> >>>
>>> >>> Thanks a ton.
>>> >>>
>>> >>> -Ryan
>>> >>>
>>> >>> On Thu, Mar 31, 2011 at 11:00 PM, Vijay Pradeep wrote:
>>> >>> > Hi Ryan,
>>> >>> >
>>> >>> > I'm sorry that you're having trouble getting actionlib to work. What version of the common stack do you have?
>>> >>> >
>>> >>> > Note that it is always possible for some of the ROS messages to be dropped. If the result message doesn't make it to the action client, then waitForResult is going to block forever. I'd suggest adding a timeout on waitForResult, and then preempting the goal if you end up waiting too long.
>>> >>> >
>>> >>> >> I made each callback print when it was called, and I found that quite often, the program would block forever when I called waitForServer.
>>> >>> >
>>> >>> > I'm guessing you meant "the program would block forever when I called waitForResult".
>>> >>> >
>>> >>> > I'm using common-1.4.3, and after running your example for ~10 minutes, it doesn't seem to freeze for me. I can rerun this test with the exact version of common that you're using. Is there anything you can do to make this minimal example freeze more often? Are there some strategic sleeps you could add to make it mimic your original app more closely?
>>> >>> >
>>> >>> > If you can get the chores app to freeze, you could try attaching gdb to the process and seeing where all the threads are (using the "info threads" command inside gdb). A bunch of them will be stuck on pthread_cond_timedwait calls, but I'd be curious whether there's a thread stuck on a lock inside of actionlib. That would be indicative of a race condition in actionlib.
>>> >>> >
>>> >>> > Vijay
>>> >>> >
>>> >>> > On Thu, Mar 31, 2011 at 6:55 PM, Ryan Miller <rmiller4589@gmail.com> wrote:
>>> >>> >>
>>> >>> >> Because of a timing issue I can't quite pin down, actionlib occasionally appears to exhibit a race condition. I found the problem after adding callbacks to my client. I made each callback print when it was called, and I found that quite often, the program would block forever when I called waitForServer. The problem was that the client's active callback was called but the server's execute method was never called.
>>> >>> >>
>>> >>> >> I have reduced the problem to a simple ROS package with a single executable, which I have attached. After running the node for a while, I finally noticed the same problem. Its last output before blocking was:
>>> >>> >>
>>> >>> >> --- snip ---
>>> >>> >> Current State: ABORTED
>>> >>> >> Aborting.
>>> >>> >> Active.
>>> >>> >> Done.
>>> >>> >> Current State: ABORTED
>>> >>> >> Aborting.
>>> >>> >> Active.
>>> >>> >> --- snip ---
>>> >>> >>
>>> >>> >> In my actual code, the condition happens extremely frequently, but I found I could mitigate the problem by sleeping for one millisecond before returning from the server's execute method. (I should warn you that in the attached example, the problem almost never occurs.)
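>>> >>> >>
>>> >>> >> (For concreteness, the workaround amounts to a sleep at the end of the execute callback -- a sketch, with chores::DoChoresAction standing in for my real action type:)
>>> >>> >>
>>> >>> >> void execute(const chores::DoChoresGoalConstPtr& goal)
>>> >>> >> {
>>> >>> >>   server_.setAborted();          // publish the terminal state / result
>>> >>> >>   ros::Duration(0.001).sleep();  // the 1 ms sleep that mitigates the hang
>>> >>> >> }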
>>> >>> >>
>>> >>> >> Is this likely a bug, or might I be doing something wrong? Any suggestions would be appreciated. Thanks for the help.
>>> >>> >>
>>> >>> >> -Ryan
>>> >>> >>
>>> >>> >> _______________________________________________
>>> >>> >> ros-users mailing list
>>> >>> >> ros-users@code.ros.org
>>> >>> >> https://code.ros.org/mailman/listinfo/ros-users
>>> >>> >
>>> >>> > --
>>> >>> > Vijay Pradeep
>>> >>> > Systems Engineer
>>> >>> > Willow Garage, Inc.
>>> >>> > vpradeep@willowgarage.com
>>> >>
>>> >> --
>>> >> Vijay Pradeep
>>> >> Systems Engineer
>>> >> Willow Garage, Inc.
>>> >> vpradeep@willowgarage.com
>>> >
>>> > --
>>> > Vijay Pradeep
>>> > Systems Engineer
>>> > Willow Garage, Inc.
>>> > vpradeep@willowgarage.com
>>
>> --
>> Vijay Pradeep
>> Systems Engineer
>> Willow Garage, Inc.
>> vpradeep@willowgarage.com