[ros-users] socket error

Arjun akarjun at gmail.com
Wed Feb 2 10:25:26 UTC 2011


Hi Ken,
 Thanks for offering to debug this issue remotely for me, but I am glad to
let you know that I just fixed this issue. It turns out that it had
something to do with my graphics driver. I debugged my rosnode step by step
and found that the interrupted system call error occurred
only when openrave was rendering objects loaded in its environment. I have
no concrete explanation as to why this happens, but switching to an older
driver for my graphics card seems to solve this problem (and a few unrelated
ones as well).
 I also came across this very interesting read on the connection between
Unix sockets and the interrupted system call error:
http://www.madore.org/~david/computers/connect-intr.html
It turns out that Unix socket implementations do not all handle an
interrupted connect() the same way, so "maybe" my faulty graphics driver was
taking longer than usual to complete reads/writes, causing the socket call to
return with an error while the connection was still being attempted. Then,
when the code in xmlrpc.py tried to make the connection again, it saw that
this connection already existed and returned an error.
The following debug messages in the log files, which say that some connections
already exist, seem to lend some support to my theory:
rosout-1.log:[roscpp_internal] [2011-01-31 16:31:26,526] [thread 0x7f2d043cc760]: [DEBUG] Publisher update for [/rosout]:  already have these connections:
rosout-1.log:[roscpp_internal] [2011-01-31 16:31:36,397] [thread 0x7f2cfe2d7700]: [DEBUG] Publisher update for [/rosout]: http://luk:51689/,  already have these connections:
rosout-1.log:[roscpp_internal] [2011-01-31 16:31:39,300] [thread 0x7f2cfe2d7700]: [DEBUG] Publisher update for [/rosout]:  already have these connections: http://luk:51689/,
rosout.log:Publisher update for [/rosout]:  already have these connections:
rosout.log:Publisher update for [/rosout]: http://luk:51689/,  already have these connections:
rosout.log:Publisher update for [/rosout]:  already have these connections: http://luk:51689/,
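For reference, the behaviour the linked page discusses is pinned down by POSIX: if a blocking connect() is interrupted by a caught signal, it fails with EINTR but the connection request is not aborted and completes asynchronously, so calling connect() again on the same socket can fail with EALREADY (or EISCONN once it is up). The usual remedy is to wait until the socket becomes writable and then read SO_ERROR. Below is a minimal Python sketch of that remedy; connect_eintr_safe and wait_for_connect are hypothetical helpers, not part of rospy or roslib, and whether this mechanism is what actually happened here is not established in the thread. Because it relies on select(), the same code also finishes a non-blocking connect() that returned EINPROGRESS.

```python
import errno
import os
import select
import socket


def _errno_of(exc):
    # Python 3 (and Python 2 IOError/socket.error) expose .errno;
    # Python 2's select.error only carries the code as args[0].
    code = getattr(exc, "errno", None)
    if code is None and exc.args:
        code = exc.args[0]
    return code


def wait_for_connect(sock, timeout=5.0):
    """Block until a pending connect() on sock finishes, then check the result."""
    while True:
        try:
            _, writable, _ = select.select([], [sock], [], timeout)
            break
        except (select.error, EnvironmentError) as e:
            if _errno_of(e) != errno.EINTR:
                raise
            # Interrupted by a signal again: keep waiting (the timeout restarts).
    if not writable:
        raise socket.timeout("connect still pending after %.1f s" % timeout)
    # SO_ERROR holds the outcome of the asynchronous connection attempt.
    err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
    if err != 0:
        raise socket.error(err, os.strerror(err))


def connect_eintr_safe(sock, address, timeout=5.0):
    """connect() that never reissues connect() after an interruption."""
    try:
        sock.connect(address)
    except EnvironmentError as e:
        if _errno_of(e) not in (errno.EINTR, errno.EINPROGRESS):
            raise
        # The kernel keeps connecting in the background; a second connect()
        # could fail with EALREADY, so wait for completion instead.
        wait_for_connect(sock, timeout)
```

Any other error (for example ECONNREFUSED) is passed straight through to the caller.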

Thanks again for your help Ken and congrats to WG and you on releasing
Diamondback beta!
-Arjun.


On Tue, Feb 1, 2011 at 12:10 AM, Ken Conley <kwc at willowgarage.com> wrote:

> The patched xmlrpc.py is intended to catch interrupted system calls.
> For whatever reason, the exception is not matching the except blocks.
> An aggressive workaround for you would be to just ignore all
> exceptions unless on shutdown, e.g. to change the end to:
>
>          except Exception as e:
>              if self.is_shutdown:
>                  pass
>
> but this isn't an acceptable general solution.  This is the sort of
> problem that could be fixed in <5 mins if I could get to a terminal where
> this is happening, but it is otherwise difficult to offer debug-by-e-mail
> advice.
>
>  - Ken
>
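One plausible reason the exception does not match (an observation from the tracebacks in this thread, not something confirmed on the list): the XML-RPC traceback ends inside SocketServer's select.select() call and reports error: (4, 'Interrupted system call'), which is a select.error rather than a socket.error. In Python 2, select.error derives directly from Exception, not from IOError or socket.error (it only became an alias of OSError in Python 3.3), so neither an IOError branch nor a socket.error branch catches it. The sketch below is a standalone, hypothetical version of the retry loop that treats EINTR the same way whichever of these types carries it; is_eintr and serve_until_shutdown are illustrative names, not roslib APIs, and serve_forever and is_shutdown are passed in as plain callables.

```python
import errno
import select
import socket


def is_eintr(exc):
    """True if exc reports EINTR (an interrupted system call)."""
    # Python 3 and Python 2 IOError/socket.error expose .errno;
    # Python 2's select.error only carries the code as args[0].
    code = getattr(exc, "errno", None)
    if code is None and exc.args:
        code = exc.args[0]
    return code == errno.EINTR


def serve_until_shutdown(serve_forever, is_shutdown):
    """Keep calling serve_forever() until is_shutdown() returns True.

    EINTR from select(), accept() etc. just restarts the loop; any other
    error propagates unless we are already shutting down.
    """
    while not is_shutdown():
        try:
            serve_forever()
        except (select.error, socket.error, EnvironmentError) as e:
            if is_shutdown() or is_eintr(e):
                continue  # the loop condition re-checks is_shutdown()
            raise
```

Inside a server class this might be called as serve_until_shutdown(self.server.serve_forever, lambda: self.is_shutdown). Python 3.5 and later (PEP 475) retry most EINTR failures internally, so a loop like this matters mainly on Python 2.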
>
> On Mon, Jan 31, 2011 at 2:17 PM, Arjun <akarjun at gmail.com> wrote:
> > Hi Ken,
> > I made the changes you asked me to in xmlrpc.py and here's the debug
> > message I get on the console now:
> > Unhandled exception in thread started by <bound method TCPServer.run of <rospy.impl.tcpros_base.TCPServer object at 0x29cf4d0>>
> > Traceback (most recent call last):
> >   File "/home/aarumbak/ros/ros/core/rospy/src/rospy/impl/tcpros_base.py", line 141, in run
> > Creating action server for manipulation/right_arm/put
> > [ERROR] 1296509499.251038: ERROR: error running XML-RPC server:
> > Traceback (most recent call last):
> >   File "/home/aarumbak/ros/ros/core/rospy/src/rospy/impl/msnode.py", line 86, in run
> >     super(ROSNode, self).run()
> >   File "/home/aarumbak/ros/ros/core/roslib/src/roslib/xmlrpc.py", line 246, in run
> >     raise Exception("unhandled exception [%s]"%(str(e)))
> > Exception: unhandled exception [(4, 'Interrupted system call')]
> >     (client_sock, client_addr) = self.server_sock.accept()
> >   File "/usr/lib/python2.6/socket.py", line 197, in accept
> >     sock, addr = self._sock.accept()
> > socket.error: [INFO] 1296509499.252298: Manipulation applet is dying. RIP.
> > [Errno 4] Interrupted system call
> > I've also attached the log file for my rosnode to this email. The exception
> > says that it is an interrupted system call, and it appears to come from the
> > socket accept call. I was wondering if it would make sense to write a signal
> > handler or wrap this with some kind of No_Interrupts macro so that the
> > accept call doesn't get interrupted again. But this would of course be
> > cosmetic and not treat the real cause. I am going to try debugging this
> > more, since getting the software to work is absolutely essential for my
> > project. But if I can't go much further, I will try running this on Ubuntu
> > 10.04, which for some reason does not seem to give this problem. Thanks a
> > lot, and please let me know if there are other things I should try to fix
> > this.
> > -Arjun.
> >
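The "No_Interrupts macro" idea has a well-known C counterpart in glibc's TEMP_FAILURE_RETRY macro, which reissues a system call for as long as it fails with EINTR. A minimal Python sketch of the same wrapper follows; retry_on_eintr is a hypothetical helper, and, as Arjun says, it would only hide the symptom rather than fix whatever is delivering the signals.

```python
import errno
import socket


def retry_on_eintr(func, *args, **kwargs):
    """Call func(*args, **kwargs), reissuing it while it fails with EINTR.

    Safe for calls that can simply be repeated, such as accept() or recv().
    Not for connect(): after EINTR the connection continues asynchronously.
    """
    while True:
        try:
            return func(*args, **kwargs)
        except EnvironmentError as e:  # includes socket.error on Python 2.6+
            code = getattr(e, "errno", None)
            if code is None and e.args:
                code = e.args[0]
            if code != errno.EINTR:
                raise
```

Applied to the call in tcpros_base.py this would read: client_sock, client_addr = retry_on_eintr(self.server_sock.accept).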
> >
> >
> > On Mon, Jan 31, 2011 at 2:59 PM, Ken Conley <kwc at willowgarage.com> wrote:
> >>
> >> Hi Arjun,
> >>
> >> The behavior is a bit weird.  The socket.error should have been caught
> >> in xmlrpc instead of passed on.  In Python 2.6, socket.error should be
> >> a subclass of IOError.
> >>
> >> Can you try adding a block like this to xmlrpc.py and seeing if it
> >> changes the behavior?  It's basically a copy of the IOError branch,
> >> but with the specific socket.error type.
> >>
> >>  - Ken
> >>
> >> Before the final except block, after the IOError block:
> >>
> >>           except socket.error as (errno, errstr):
> >>               # check for interrupted call, which can occur if we're
> >>               # embedded in a program using signals.  All other
> >>               # exceptions break _run.
> >>               if self.is_shutdown:
> >>                   pass
> >>               elif errno != 4:
> >>                   self.is_shutdown = True
> >>                   logger.error("serve forever IOError: %s, %s"%(errno, errstr))
> >>                   raise
> >>
> >>
> >> Also, for more debugging, modify the final except block:
> >>
> >>           except Exception as e:
> >>               if self.is_shutdown:
> >>                   pass
> >>               else:
> >>                   raise Exception("unhandled exception [%s]"%(str(e)))
> >>
> >> On Mon, Jan 31, 2011 at 9:08 AM, Arjun <akarjun at gmail.com> wrote:
> >> > Hi Ken,
> >> > I got the ros core code from the ros1.2 branch again today and rebuilt it.
> >> > It has the code from the patch you sent me and I can also confirm that the
> >> > xmlrpc.py file does end with the code snippet from your email. The debug
> >> > message on the console output now has more information and so does the log
> >> > file for my rosnode. The console debug message for my node now says,
> >> > Traceback (most recent call last):
> >> >   File "/home/aarumbak/ros/ros/core/rospy/src/rospy/impl/tcpros_base.py", line 141, in run
> >> >     (client_sock, client_addr) = self.server_sock.accept()
> >> >   File "/usr/lib/python2.6/socket.py", line 197, in accept
> >> >     sock, addr = self._sock.accept()
> >> > socket.error: [Errno 4] Interrupted system call
> >> > [ERROR] 1296493143.981902: ERROR: error running XML-RPC server:
> >> > Traceback (most recent call last):
> >> >   File "/home/aarumbak/ros/ros/core/rospy/src/rospy/impl/msnode.py", line 86, in run
> >> >     super(ROSNode, self).run()
> >> >   File "/home/aarumbak/ros/ros/core/roslib/src/roslib/xmlrpc.py", line 221, in run
> >> >     self.server.serve_forever()
> >> >   File "/usr/lib/python2.6/SocketServer.py", line 224, in serve_forever
> >> >     r, w, e = select.select([self], [], [], poll_interval)
> >> > error: (4, 'Interrupted system call')
> >> > I have also attached the log files with this email. Thanks a lot!
> >> > -Arjun.
> >> >
> >> >
> >> > On Mon, Jan 31, 2011 at 2:37 AM, Ken Conley <kwc at willowgarage.com>
> >> > wrote:
> >> >>
> >> >> Hi Arjun,
> >> >>
> >> >> The necessary info is missing due to a bug in the call to the logger.
> >> >> Can you try the attached patch?
> >> >>
> >> >> Also, can you confirm that your roslib/src/roslib/xmlrpc.py ends with
> >> >> the code block below?
> >> >>
> >> >> thanks,
> >> >> Ken
> >> >>
> >> >>        while not self.is_shutdown:
> >> >>            try:
> >> >>                self.server.serve_forever()
> >> >>            except IOError as (errno, errstr):
> >> >>                # check for interrupted call, which can occur if we're
> >> >>                # embedded in a program using signals.  All other
> >> >>                # exceptions break _run.
> >> >>                if self.is_shutdown:
> >> >>                    pass
> >> >>                elif errno != 4:
> >> >>                    self.is_shutdown = True
> >> >>                    logger.error("serve forever IOError: %s, %s"%(errno, errstr))
> >> >>                    raise
> >> >>            except:
> >> >>                if self.is_shutdown:
> >> >>                    pass
> >> >>                else:
> >> >>                    raise
> >> >>
> >> >>
> >> >> On Sun, Jan 30, 2011 at 10:10 PM, Arjun <akarjun at gmail.com> wrote:
> >> >> > Hi Ken,
> >> >> > Thanks a lot for offering to take a look. I've attached the log file for
> >> >> > my node and also the master.log file with this email. I mentioned openrave
> >> >> > only because the previous person with the socket error used it as well and
> >> >> > I was wondering if there was some connection there.
> >> >> > -Arjun.
> >> >> >
> >> >> > On Mon, Jan 31, 2011 at 12:29 AM, Ken Conley <kwc at willowgarage.com> wrote:
> >> >> >>
> >> >> >> Hi Arjun,
> >> >> >>
> >> >> >> I would need the log file from your actual node instead.  The rosout
> >> >> >> log file is just for the rosout node.  There should be a log file if
> >> >> >> you "roscd log" and look for your node's name.
> >> >> >>
> >> >> >> Regardless, the patch for xmlrpc.py is not relevant here, as this is a
> >> >> >> different section of code.  The log file would hopefully provide more
> >> >> >> detail as to whether the above error is the cause or just a symptom.
> >> >> >> I haven't used rospy inside of openrave, so I'm not sure I can be of
> >> >> >> much help, though I could think of ways to make the code more robust
> >> >> >> to whatever the problem is.
> >> >> >>
> >> >> >>  - Ken
> >> >> >>
> >> >> >> On Sun, Jan 30, 2011 at 1:46 AM, Arjun <akarjun at gmail.com> wrote:
> >> >> >> > Hi all,
> >> >> >> >  I am getting a socket error when I launch my program (which uses
> >> >> >> > openrave). The launch file launches a single node running on the same
> >> >> >> > machine as the roscore. I am using Ubuntu 10.10 and my Python install is
> >> >> >> > version 2.6. I did look up the archives and found that someone else had
> >> >> >> > mentioned this same problem a couple of weeks ago and Ken Conley had
> >> >> >> > addressed it. I followed the advice from Ken Conley in that thread and
> >> >> >> > changed my .rosinstall file to install from the ros1.2 branch instead,
> >> >> >> > just for the ros stack. This gave me the latest xmlrpc.py file, which I
> >> >> >> > verified against the previous thread, but I still get the same error.
> >> >> >> > Here's the error:
> >> >> >> > Unhandled exception in thread started by <bound method TCPServer.run of <rospy.impl.tcpros_base.TCPServer object at 0x37a0990>>
> >> >> >> > Traceback (most recent call last):
> >> >> >> >   File "/home/aarumbak/ros/ros/core/rospy/src/rospy/impl/tcpros_base.py", line 141, in run
> >> >> >> >     (client_sock, client_addr) = self.server_sock.accept()
> >> >> >> >   File "/usr/lib/python2.6/socket.py", line 197, in accept
> >> >> >> >     sock, addr = self._sock.accept()
> >> >> >> > socket[INFO] 1296375790.973933: Manipulation applet is dying. RIP.
> >> >> >> > .error: [Errno 4] Interrupted system call
> >> >> >> > FYI, we run this software on Ubuntu 10.04 and I've never seen this
> >> >> >> > error there. I got this error on Ubuntu 9.10 before (strangely, the
> >> >> >> > problem went away then) and now in 10.10. Any help would be much
> >> >> >> > appreciated.
> >> >> >> > -Arjun.
> >> >> >> > attachment: relevant log file.
> >> >> >> >
> >> >> >> >
> >> >> >> > _______________________________________________
> >> >> >> > ros-users mailing list
> >> >> >> > ros-users at code.ros.org
> >> >> >> > https://code.ros.org/mailman/listinfo/ros-users
>