[ros-users] need help: set_parameters service callback very sluggish with nodelet-based camera1394 driver

Brian Gerkey gerkey at willowgarage.com
Thu Jan 27 07:53:34 UTC 2011


On Wed, Jan 26, 2011 at 10:53 PM, Blaise Gassend
<blaise at willowgarage.com> wrote:
> I could believe that the polling thread is reaquiring the lock before
> the reconfigure thread does. I don't think that there is any guarantee
> that a thread that is waiting on a lock will get the lock before the
> thread that released the lock reaquires it.

I was about to respond with something like, "that's crazy, of course
the waiting thread will get the lock."  Then I asked Google.  This
discussion is on point:
  http://www.mail-archive.com/linux-il@cs.huji.ac.il/msg48835.html
Seems that the POSIX spec says that fairness in mutex acquisition is
optional, and that the Linux implementation doesn't guarantee it,
especially in tight lock/unlock loops.    That was certainly news to
me, and I'm sure that I've written code in the past that assumed mutex
fairness.

> Have you tried putting in a short (1ms) usleep in the poll loop? That
> would be a quick way to test this theory (though probably not the
> cleanest long term solution.

There are a few suggestions in the discussion linked above, including
setting the thread scheduling policy, and explicitly yielding from
within the mutex hoarding thread, similar to Blaise's suggestion.

	brian.


> On Wed, Jan 26, 2011 at 8:48 PM, Jack O'Quin <jack.oquin at gmail.com> wrote:
>> Any hints about this problem would be welcome. It causes serious
>> usability issues with the camera1394/driver nodelet.
>>
>> What I imagine should happen is that the reconfig() method generally
>> gets invoked while the poll() thread is holding the lock and waiting
>> for data from the camera. The reconfig() should wait on the same lock
>> until the image frame is published and the lock released.
>>
>> Instead it seems to be delayed for many seconds, sometimes 10 or 20,
>> which is hundreds of frames at 15 fps. Could there be some starvation,
>> convoying or livelock issue at work here? Why would reconfig() not
>> gain the lock on the next cycle?
>>
>> The reconfig() may hold the lock for some time, because some
>> parameters require closing and re-opening  the camera. If the poll()
>> finds the camera closed, it sleeps for a second (without the lock),
>> and then tries again.
>>
>> As you see, the code is pretty simple. Maybe there's something I am
>> missing. Or, maybe there is something about these nodelet threads I
>> need to understand better.
>>
>> On Mon, Jan 24, 2011 at 5:24 PM, Jack O'Quin <jack.oquin at gmail.com> wrote:
>>> I am testing the latest camera1394 trunk version on unstable. This
>>> package implements both a camera1394_node driver and a
>>> camera1394/driver nodelet. Most of the code is common to the node and
>>> nodelet implementations.
>>>
>>> I noticed that the dynamic reconfigure GUI, which is quite responsive
>>> with the node driver, responds very sluggishly with the nodelet
>>> implementation (sometimes 10 or more seconds).
>>>
>>> There are threading differences in these two implementations. The
>>> nodelet version spawns a boost thread to poll the device, while the
>>> node version runs the poll() and ros::spinOnce() in the same main
>>> loop. Due to the separate nodelet poll thread, there is a lock to
>>> serialize access to the device between poll() and reconfig() calls:
>>>
>>> {{{
>>>  /** device poll */
>>>  void Camera1394Driver::poll(void)
>>>  {
>>>    bool do_sleep = false;
>>>    {
>>>      // do not run concurrently with reconfig()
>>>      boost::mutex::scoped_lock lock(mutex_);
>>>      do_sleep = (state_ == Driver::CLOSED);
>>>      if (!do_sleep)
>>>        {
>>>          // driver is open, read the next image still holding lock
>>>          sensor_msgs::ImagePtr image(new sensor_msgs::Image);
>>>          if (read(image))
>>>            {
>>>              publish(image);
>>>            }
>>>        }
>>>    }
>>>
>>>    if (do_sleep)
>>>      {
>>>        // device was closed, sleeping avoids busy wait
>>>        // (DO NOT hold the lock while sleeping)
>>>        cycle_.sleep();
>>>      }
>>>  }
>>> }}}
>>>
>>> The reconfig() method holds the same lock until the callback returns.
>>>
>>> I recently changed this to fix a race condition when reconfig() tries
>>> to open the device but fails. But, sluggish performance persists. I
>>> experimented with using the MT nodelet interfaces, but that made no
>>> observable improvement.
>>>
>>> I don't understand the nodelet implementation well enough to know what
>>> to try. I would appreciate some hints or suggestions for how to fix
>>> this. It's a serious usability problem right now.
>>>
>>> Is there any way to call something like ros::spinOnce() in the poll thread?
>>
>> --
>>  joq
>> _______________________________________________
>> ros-users mailing list
>> ros-users at code.ros.org
>> https://code.ros.org/mailman/listinfo/ros-users
>>
> _______________________________________________
> ros-users mailing list
> ros-users at code.ros.org
> https://code.ros.org/mailman/listinfo/ros-users
>



More information about the ros-users mailing list