[ros-users] roscore not starting -- multiple network interfaces a problem?

Ken Conley kwc at willowgarage.com
Wed Mar 2 21:51:42 UTC 2011


On Sun, Feb 27, 2011 at 1:32 PM, Patrick Bouffard
<bouffard at eecs.berkeley.edu> wrote:
> Thanks Ken,
>
> I had played around with a few combinations of ROS_IP, ROS_HOSTNAME
> and ROS_MASTER_URI last night but I guess I didn't hit on the right
> one. By setting:
>
> export ROS_IP=10.32.43.1
> export ROS_MASTER_URI=http://10.32.43.1:11311
>
> .. roscore starts without a hiccup. I also noticed that it works if I
> set only ROS_MASTER_URI, though there is a pause between
> when it prints out "NODES" and "auto-starting new master". So I'm
> thinking it's best to have both set but I'd like to have a bit more
> clarity on what the difference is.
>
> For the record here's what happened when I tried your test steps:
>
> {{{
> In [1]: import xmlrpclib, os
>
> In [2]: s = xmlrpclib.ServerProxy(os.environ['ROS_MASTER_URI'])
>
> In [3]: s
> Out[3]: <ServerProxy for localhost:11311/RPC2>
>
> In [4]: s.getParam('/', '/rosdistro')
> ^C---------------------------------------------------------------------------
> KeyboardInterrupt                         Traceback (most recent call last)
>
> /home/bouffard/<ipython console> in <module>()
>
> /usr/lib/python2.6/xmlrpclib.pyc in __call__(self, *args)
>   1197         return _Method(self.__send, "%s.%s" % (self.__name, name))
>   1198     def __call__(self, *args):
> -> 1199         return self.__send(self.__name, args)
>   1200
>   1201 ##
>
>
> /usr/lib/python2.6/xmlrpclib.pyc in __request(self, methodname, params)
>   1487             self.__handler,
>   1488             request,
> -> 1489             verbose=self.__verbose
>   1490             )
>   1491
>
> /usr/lib/python2.6/xmlrpclib.pyc in request(self, host, handler,
> request_body, verbose)
>   1233         self.send_host(h, host)
>   1234         self.send_user_agent(h)
> -> 1235         self.send_content(h, request_body)
>   1236
>   1237         errcode, errmsg, headers = h.getreply()
>
> /usr/lib/python2.6/xmlrpclib.pyc in send_content(self, connection, request_body)
>   1347         connection.putheader("Content-Type", "text/xml")
>   1348         connection.putheader("Content-Length", str(len(request_body)))
> -> 1349         connection.endheaders()
>   1350         if request_body:
>   1351             connection.send(request_body)
>
> /usr/lib/python2.6/httplib.pyc in endheaders(self)
>    906             raise CannotSendHeader()
>    907
> --> 908         self._send_output()
>    909
>    910     def request(self, method, url, body=None, headers={}):
>
> /usr/lib/python2.6/httplib.pyc in _send_output(self)
>    778         msg = "\r\n".join(self._buffer)
>    779         del self._buffer[:]
> --> 780         self.send(msg)
>    781
>    782     def putrequest(self, method, url, skip_host=0,
> skip_accept_encoding=0):
>
> /usr/lib/python2.6/httplib.pyc in send(self, str)
>    737         if self.sock is None:
>    738             if self.auto_open:
> --> 739                 self.connect()
>    740             else:
>    741                 raise NotConnected()
>
> /usr/lib/python2.6/httplib.pyc in connect(self)
>    718         """Connect to the host and port specified in __init__."""
>    719         self.sock = socket.create_connection((self.host,self.port),
> --> 720                                              self.timeout)
>    721
>    722         if self._tunnel_host:
>
> /usr/lib/python2.6/socket.pyc in create_connection(address, timeout)
>    552             if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
>    553                 sock.settimeout(timeout)
> --> 554             sock.connect(sa)
>    555             return sock
>    556
>
> /usr/lib/python2.6/socket.pyc in connect(self, *args)
>
> KeyboardInterrupt:
>
> In [5]:
> }}}
>
> One thing that's still a bit confusing to me is this statement in the
> ROS_IP/ROS_HOSTNAME section of the EnvironmentVariables page:
>
> """
> With the exception of 'localhost', it does not affect the actual bound
> address as ROS components bind to all available network interfaces. If
> the value is set to localhost, the ROS component will bind only to the
> loopback interface. This will prevent remote components from being
> able to talk to your local component.
> """
>
> Is this referring only to ROS_HOSTNAME? I was thinking that it would
> apply as well to ROS_IP=127.0.0.1. It might be clearer if each of
> these variables had its own section.

(sorry for the late reply)

For all intents and purposes, the two variables are identical.
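
To make that concrete, here is a rough Python 3 sketch of the rule the
wiki text describes. It is an illustration only, not the actual ROS
code: the helper names advertised_host and bind_address are made up
for this example, and treating any 127.x address like 'localhost' is
an assumption that follows from handling both variables the same way.

```python
import os


def advertised_host(env=None):
    """Hypothetical helper: the override a process would advertise, if any."""
    env = os.environ if env is None else env
    # ROS_HOSTNAME takes precedence when both variables are set.
    return env.get("ROS_HOSTNAME") or env.get("ROS_IP")


def bind_address(env=None):
    """Hypothetical helper: the address a component would bind to."""
    host = advertised_host(env)
    # Assumption: loopback overrides restrict the bind to the loopback
    # interface; every other value leaves the bind on all interfaces.
    if host and (host == "localhost" or host.startswith("127.")):
        return "127.0.0.1"
    return "0.0.0.0"


if __name__ == "__main__":
    print(bind_address())
```

Under those assumptions, ROS_IP=10.32.43.1 changes only the advertised
address, while ROS_IP=127.0.0.1 or ROS_HOSTNAME=localhost also cuts off
remote components.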

> Also, based on what we've seen is there a (low priority, mind you)
> ticket warranted here? Not sure if it would be a defect on roscore or
> perhaps an enhancement to roswtf to give the hint that ROS_MASTER_URI
> (and maybe also ROS_IP/ROS_HOSTNAME) should be set under certain
> conditions. Or even an enhancement to roscore so that if it takes
> longer than some timeout at that stage of startup you get a hint as to
> what to do.

Sure, a ticket with some suggested language would be great.  The
latter one, a timeout, is much trickier.  We used to have more
heuristic timeout code, but it turns out those run into a lot of
problems on embedded platforms.  Ultimately, providing useful advice
on diagnosing network issues is a hard problem.
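
For checking this sort of hang by hand, a test with an explicit
timeout can at least show which step is stuck. What follows is a
minimal Python 3 sketch, not something roscore or roswtf does: the
function name check_master, the 2 second default timeout, and the
fallback to port 11311 when the URI has none are all assumptions made
for the example. It tests only name resolution and the TCP connect,
not the XML-RPC call itself, and the name lookup is not covered by the
timeout.

```python
import os
import socket
from urllib.parse import urlparse


def check_master(uri, timeout=2.0):
    """Hypothetical helper: report how far a connection to `uri` gets."""
    parsed = urlparse(uri)
    host = parsed.hostname
    if host is None:
        return "invalid URI: %r" % (uri,)
    port = parsed.port or 11311  # assumed default when no port is given
    try:
        # Step 1: does the hostname resolve at all?
        addr = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return "cannot resolve %r: %s" % (host, exc)
    try:
        # Step 2: does a TCP connect finish within the timeout?
        with socket.create_connection((addr, port), timeout=timeout):
            return "ok: %s (%s) port %d accepts connections" % (host, addr, port)
    except socket.timeout:
        return "timed out connecting to %s (%s) port %d" % (host, addr, port)
    except OSError as exc:
        return "connect to %s (%s) port %d failed: %s" % (host, addr, port, exc)


if __name__ == "__main__":
    print(check_master(os.environ.get("ROS_MASTER_URI", "http://localhost:11311")))
```

A "timed out" result for a hostname that resolves to an unexpected
address points at the hostname configuration rather than at ROS.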

 - Ken

>
> Cheers,
> Pat
>
>
> On Sun, Feb 27, 2011 at 10:34 AM, Ken Conley <kwc at willowgarage.com> wrote:
>> On Sun, Feb 27, 2011 at 12:59 AM, Patrick Bouffard
>> <bouffard at eecs.berkeley.edu> wrote:
>>> Hi, I've just set up a new Ubuntu 10.10 box that will be running some
>>> ROS nodes, occasionally including roscore. I installed diamondback
>>> from debs this evening. This particular machine has a more complex
>>> networking setup than others I've set up before and I suspect that is
>>> giving me issues with running ROS.
>>>
>>> I'm pretty sure everything is set up as it ought to be in terms of my
>>> .bashrc (just source /opt/ros/diamondback/setup.bash). But when I run
>>> roscore it just hangs. If I wait a while and then press Ctrl+C
>>> once, the following is output:
>>
>> This is saying to me that something is wrong whenever something tries
>> to talk to the host described in the master URI.  The only network
>> call that occurs by this point is a call to check the existing
>> parameter server.
>>
>> Here is a pure Python script you can use to test this behavior:
>>
>> import xmlrpclib, os
>> s = xmlrpclib.ServerProxy(os.environ['ROS_MASTER_URI'])
>> s.getParam('/', '/rosdistro')
>>
>> You can change the os.environ['ROS_MASTER_URI'] to use different
>> hostnames/IP addresses to test the behavior of the network you set up.
>>
>>>
>>> {{{
>>> ^C... logging to
>>> /home/bouffard/.ros/log/ea02b894-424c-11e0-a499-00226bbd5586/roslaunch-lynx-3561.log
>>> Checking log directory for disk usage. This may take awhile.
>>> Press Ctrl-C to interrupt
>>> Done checking log file disk usage. Usage is <1GB.
>>>
>>> started roslaunch server http://lynx:52141/
>>> ros_comm version 1.4.4
>>>
>>> SUMMARY
>>> ========
>>>
>>> PARAMETERS
>>>  * /rosversion
>>>  * /rosdistro
>>>
>>> NODES
>>>
>>> auto-starting new master
>>> process[master]: started with pid [3576]
>>> ROS_MASTER_URI=http://lynx:11311/
>>>
>>> setting /run_id to ea02b894-424c-11e0-a499-00226bbd5586
>>> process[rosout-1]: started with pid [3589]
>>> started core service [/rosout]
>>> }}}
>>>
>>> At this point things seem to be working; roswtf returns no errors or
>>> warnings, I can run, e.g., rxconsole, rostopic list outputs /rosout
>>> and /rosout_agg, etc. But having to hit Ctrl+C is not so great.
>>>
>>> Also, without roscore running, if I run roswtf it also hangs after displaying:
>>>
>>> {{{
>>> bouffard at lynx:~$ roswtf
>>> Loaded plugin tf.tfwtf
>>> No package or stack in context
>>> ================================================================================
>>> Static checks summary:
>>>
>>> No errors or warnings
>>> ================================================================================
>>> }}}
>>>
>>> If I then hit Ctrl+C I get the following traceback:
>>>
>>> {{{
>>> ^CTraceback (most recent call last):
>>>  File "/opt/ros/diamondback/ros/bin/roswtf", line 35, in <module>
>>>    roswtf.roswtf_main()
>>>  File "/opt/ros/diamondback/stacks/ros_comm/utilities/roswtf/src/roswtf/__init__.py",
>>> line 93, in roswtf_main
>>>    _roswtf_main()
>>>  File "/opt/ros/diamondback/stacks/ros_comm/utilities/roswtf/src/roswtf/__init__.py",
>>> line 208, in _roswtf_main
>>>    master = master_online()
>>>  File "/opt/ros/diamondback/stacks/ros_comm/utilities/roswtf/src/roswtf/__init__.py",
>>> line 100, in master_online
>>>    master.getPid('/roswtf')
>>>  File "/usr/lib/python2.6/xmlrpclib.py", line 1199, in __call__
>>>    return self.__send(self.__name, args)
>>>  File "/usr/lib/python2.6/xmlrpclib.py", line 1489, in __request
>>>    verbose=self.__verbose
>>>  File "/usr/lib/python2.6/xmlrpclib.py", line 1235, in request
>>>    self.send_content(h, request_body)
>>>  File "/usr/lib/python2.6/xmlrpclib.py", line 1349, in send_content
>>>    connection.endheaders()
>>>  File "/usr/lib/python2.6/httplib.py", line 908, in endheaders
>>>    self._send_output()
>>>  File "/usr/lib/python2.6/httplib.py", line 780, in _send_output
>>>    self.send(msg)
>>>  File "/usr/lib/python2.6/httplib.py", line 739, in send
>>>    self.connect()
>>>  File "/usr/lib/python2.6/httplib.py", line 720, in connect
>>>    self.timeout)
>>>  File "/usr/lib/python2.6/socket.py", line 554, in create_connection
>>>    sock.connect(sa)
>>>  File "<string>", line 1, in connect
>>> KeyboardInterrupt
>>> bouffard at lynx:~$
>>> }}}
>>
>> My theory is that this is the same pause as described above.  It's
>> hanging in an xmlrpc call to the master (aka Parameter Server).
>>
>>> Just to check that it wasn't something in the latest diamondback
>>> release candidate, I dist-upgrade'd and tried these same commands on
>>> another couple machines (that have been running some version of ROS
>>> for a while and are similarly configured, Ubuntu 10.0, diamondback
>>> debs) with no problems.
>>>
>>> Based on the roswtf traceback and the main weirdness of the current
>>> box being its network config (it has three wired network interfaces),
>>> I'm suspecting it has something to do with that. However, I still see
>>> the same behaviour if I sudo ifdown all the interfaces besides lo.
>>>
>>> I'm not a networking expert but I noticed on the EnvironmentVariables
>>> wiki page: ".. ROS components bind to all available network
>>> interfaces.". Could this have something to do with my issues?
>>
>> You can change this behavior by setting ROS_IP or ROS_HOSTNAME.  Using
>> either tells a particular process to bind to a specific interface.
>> All evidence thus far is that something is wrong with how the 'lynx'
>> hostname is configured.
>>
>>  - Ken
>>
>>> Here's the output of ifconfig -a in case that helps:
>>>
>>> {{{
>>> bouffard at lynx:~$ ifconfig -a
>>> eth1      Link encap:Ethernet  HWaddr xx:xx;xx:xx:xx:xx
>>>          inet addr:128.32.43.208  Bcast:128.32.43.255  Mask:255.255.255.0
>>>          inet6 addr: fe80::218:8bff:fe74:766d/64 Scope:Link
>>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>          RX packets:1066 errors:0 dropped:0 overruns:0 frame:0
>>>          TX packets:504 errors:0 dropped:0 overruns:0 carrier:0
>>>          collisions:0 txqueuelen:1000
>>>          RX bytes:329324 (329.3 KB)  TX bytes:116257 (116.2 KB)
>>>          Interrupt:17
>>>
>>> eth2      Link encap:Ethernet  HWaddr xx:xx;xx:xx:xx:xx
>>>          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>>>          inet6 addr: fe80::e291:f5ff:fe94:cc3/64 Scope:Link
>>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>          RX packets:1264 errors:0 dropped:0 overruns:0 frame:0
>>>          TX packets:1342 errors:0 dropped:0 overruns:0 carrier:0
>>>          collisions:0 txqueuelen:1000
>>>          RX bytes:78005 (78.0 KB)  TX bytes:67277 (67.2 KB)
>>>          Interrupt:17 Base address:0xef00
>>>
>>> eth3      Link encap:Ethernet  HWaddr xx:xx;xx:xx:xx:xx
>>>          inet addr:10.32.43.1  Bcast:10.32.43.255  Mask:255.255.255.0
>>>          inet6 addr: fe80::222:6bff:febd:5586/64 Scope:Link
>>>          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>          RX packets:235591 errors:0 dropped:0 overruns:0 frame:0
>>>          TX packets:451507 errors:0 dropped:0 overruns:0 carrier:0
>>>          collisions:0 txqueuelen:1000
>>>          RX bytes:17233068 (17.2 MB)  TX bytes:629352191 (629.3 MB)
>>>          Interrupt:16 Base address:0x2e00
>>>
>>> lo        Link encap:Local Loopback
>>>          inet addr:127.0.0.1  Mask:255.0.0.0
>>>          inet6 addr: ::1/128 Scope:Host
>>>          UP LOOPBACK RUNNING  MTU:16436  Metric:1
>>>          RX packets:8526 errors:0 dropped:0 overruns:0 frame:0
>>>          TX packets:8526 errors:0 dropped:0 overruns:0 carrier:0
>>>          collisions:0 txqueuelen:0
>>>          RX bytes:817773 (817.7 KB)  TX bytes:817773 (817.7 KB)
>>> }}}
>>>
>>> eth1 is the connection to the internet, eth2 is a crossover cable to
>>> another machine, and eth3 is connected to a private subnet. Iptables is
>>> configured to allow machines on the 10.32.43.x subnet to access the
>>> internet via eth1. It's possible something I did in setting that up
>>> had the side-effect of messing with ROS, as I said I'm no networking
>>> expert. Hopefully one of you is! :)
>>>
>>> Thanks,
>>> Pat
>>> _______________________________________________
>>> ros-users mailing list
>>> ros-users at code.ros.org
>>> https://code.ros.org/mailman/listinfo/ros-users
>>>
>>
>


