On Wed, Jan 26, 2011 at 10:53 PM, Blaise Gassend wrote: > I could believe that the polling thread is reaquiring the lock before > the reconfigure thread does. I don't think that there is any guarantee > that a thread that is waiting on a lock will get the lock before the > thread that released the lock reaquires it. I was about to respond with something like, "that's crazy, of course the waiting thread will get the lock." Then I asked Google. This discussion is on point: http://www.mail-archive.com/linux-il@cs.huji.ac.il/msg48835.html Seems that the POSIX spec says that fairness in mutex acquisition is optional, and that the Linux implementation doesn't guarantee it, especially in tight lock/unlock loops. That was certainly news to me, and I'm sure that I've written code in the past that assumed mutex fairness. > Have you tried putting in a short (1ms) usleep in the poll loop? That > would be a quick way to test this theory (though probably not the > cleanest long term solution. There are a few suggestions in the discussion linked above, including setting the thread scheduling policy, and explicitly yielding from within the mutex hoarding thread, similar to Blaise's suggestion. brian. > On Wed, Jan 26, 2011 at 8:48 PM, Jack O'Quin wrote: >> Any hints about this problem would be welcome. It causes serious >> usability issues with the camera1394/driver nodelet. >> >> What I imagine should happen is that the reconfig() method generally >> gets invoked while the poll() thread is holding the lock and waiting >> for data from the camera. The reconfig() should wait on the same lock >> until the image frame is published and the lock released. >> >> Instead it seems to be delayed for many seconds, sometimes 10 or 20, >> which is hundreds of frames at 15 fps. Could there be some starvation, >> convoying or livelock issue at work here? Why would reconfig() not >> gain the lock on the next cycle? >> >> The reconfig() may hold the lock for some time, because some >> parameters require closing and re-opening  the camera. If the poll() >> finds the camera closed, it sleeps for a second (without the lock), >> and then tries again. >> >> As you see, the code is pretty simple. Maybe there's something I am >> missing. Or, maybe there is something about these nodelet threads I >> need to understand better. >> >> On Mon, Jan 24, 2011 at 5:24 PM, Jack O'Quin wrote: >>> I am testing the latest camera1394 trunk version on unstable. This >>> package implements both a camera1394_node driver and a >>> camera1394/driver nodelet. Most of the code is common to the node and >>> nodelet implementations. >>> >>> I noticed that the dynamic reconfigure GUI, which is quite responsive >>> with the node driver, responds very sluggishly with the nodelet >>> implementation (sometimes 10 or more seconds). >>> >>> There are threading differences in these two implementations. The >>> nodelet version spawns a boost thread to poll the device, while the >>> node version runs the poll() and ros::spinOnce() in the same main >>> loop. Due to the separate nodelet poll thread, there is a lock to >>> serialize access to the device between poll() and reconfig() calls: >>> >>> {{{ >>>  /** device poll */ >>>  void Camera1394Driver::poll(void) >>>  { >>>    bool do_sleep = false; >>>    { >>>      // do not run concurrently with reconfig() >>>      boost::mutex::scoped_lock lock(mutex_); >>>      do_sleep = (state_ == Driver::CLOSED); >>>      if (!do_sleep) >>>        { >>>          // driver is open, read the next image still holding lock >>>          sensor_msgs::ImagePtr image(new sensor_msgs::Image); >>>          if (read(image)) >>>            { >>>              publish(image); >>>            } >>>        } >>>    } >>> >>>    if (do_sleep) >>>      { >>>        // device was closed, sleeping avoids busy wait >>>        // (DO NOT hold the lock while sleeping) >>>        cycle_.sleep(); >>>      } >>>  } >>> }}} >>> >>> The reconfig() method holds the same lock until the callback returns. >>> >>> I recently changed this to fix a race condition when reconfig() tries >>> to open the device but fails. But, sluggish performance persists. I >>> experimented with using the MT nodelet interfaces, but that made no >>> observable improvement. >>> >>> I don't understand the nodelet implementation well enough to know what >>> to try. I would appreciate some hints or suggestions for how to fix >>> this. It's a serious usability problem right now. >>> >>> Is there any way to call something like ros::spinOnce() in the poll thread? >> >> -- >>  joq >> _______________________________________________ >> ros-users mailing list >> ros-users@code.ros.org >> https://code.ros.org/mailman/listinfo/ros-users >> > _______________________________________________ > ros-users mailing list > ros-users@code.ros.org > https://code.ros.org/mailman/listinfo/ros-users >