[ros-users] roscore not starting -- multiple network interfaces a problem?
Ken Conley
kwc at willowgarage.com
Wed Mar 2 21:51:42 UTC 2011
On Sun, Feb 27, 2011 at 1:32 PM, Patrick Bouffard
<bouffard at eecs.berkeley.edu> wrote:
> Thanks Ken,
>
> I had played around with a few combinations of ROS_IP, ROS_HOSTNAME
> and ROS_MASTER_URI last night but I guess I didn't hit on the right
> one. By setting:
>
> export ROS_IP=10.32.43.1
> export ROS_MASTER_URI=http://10.32.43.1:11311
>
> .. roscore starts without a hiccup. I also noticed that if I set only
> ROS_MASTER_URI, it works too, though there is a pause between
> when it prints out "NODES" and "auto-starting new master". So I'm
> thinking it's best to have both set, but I'd like a bit more
> clarity on what the difference is.
>
> For the record here's what happened when I tried your test steps:
>
> {{{
> In [1]: import xmlrpclib, os
>
> In [2]: s = xmlrpclib.ServerProxy(os.environ['ROS_MASTER_URI'])
>
> In [3]: s
> Out[3]: <ServerProxy for localhost:11311/RPC2>
>
> In [4]: s.getParam('/', '/rosdistro')
> ^C---------------------------------------------------------------------------
> KeyboardInterrupt Traceback (most recent call last)
>
> /home/bouffard/<ipython console> in <module>()
>
> /usr/lib/python2.6/xmlrpclib.pyc in __call__(self, *args)
> 1197 return _Method(self.__send, "%s.%s" % (self.__name, name))
> 1198 def __call__(self, *args):
> -> 1199 return self.__send(self.__name, args)
> 1200
> 1201 ##
>
>
> /usr/lib/python2.6/xmlrpclib.pyc in __request(self, methodname, params)
> 1487 self.__handler,
> 1488 request,
> -> 1489 verbose=self.__verbose
> 1490 )
> 1491
>
> /usr/lib/python2.6/xmlrpclib.pyc in request(self, host, handler,
> request_body, verbose)
> 1233 self.send_host(h, host)
> 1234 self.send_user_agent(h)
> -> 1235 self.send_content(h, request_body)
> 1236
> 1237 errcode, errmsg, headers = h.getreply()
>
> /usr/lib/python2.6/xmlrpclib.pyc in send_content(self, connection, request_body)
> 1347 connection.putheader("Content-Type", "text/xml")
> 1348 connection.putheader("Content-Length", str(len(request_body)))
> -> 1349 connection.endheaders()
> 1350 if request_body:
> 1351 connection.send(request_body)
>
> /usr/lib/python2.6/httplib.pyc in endheaders(self)
> 906 raise CannotSendHeader()
> 907
> --> 908 self._send_output()
> 909
> 910 def request(self, method, url, body=None, headers={}):
>
> /usr/lib/python2.6/httplib.pyc in _send_output(self)
> 778 msg = "\r\n".join(self._buffer)
> 779 del self._buffer[:]
> --> 780 self.send(msg)
> 781
> 782 def putrequest(self, method, url, skip_host=0,
> skip_accept_encoding=0):
>
> /usr/lib/python2.6/httplib.pyc in send(self, str)
> 737 if self.sock is None:
> 738 if self.auto_open:
> --> 739 self.connect()
> 740 else:
> 741 raise NotConnected()
>
> /usr/lib/python2.6/httplib.pyc in connect(self)
> 718 """Connect to the host and port specified in __init__."""
> 719 self.sock = socket.create_connection((self.host,self.port),
> --> 720 self.timeout)
> 721
> 722 if self._tunnel_host:
>
> /usr/lib/python2.6/socket.pyc in create_connection(address, timeout)
> 552 if timeout is not _GLOBAL_DEFAULT_TIMEOUT:
> 553 sock.settimeout(timeout)
> --> 554 sock.connect(sa)
> 555 return sock
> 556
>
> /usr/lib/python2.6/socket.pyc in connect(self, *args)
>
> KeyboardInterrupt:
>
> In [5]:
> }}}
>
> One thing that's still a bit confusing to me is this statement in the
> ROS_IP/ROS_HOSTNAME section of the EnvironmentVariables page:
>
> """
> With the exception of 'localhost', it does not affect the actual bound
> address as ROS components bind to all available network interfaces. If
> the value is set to localhost, the ROS component will bind only to the
> loopback interface. This will prevent remote components from being
> able to talk to your local component.
> """
>
> Is this referring only to ROS_HOSTNAME? I was thinking that it would
> apply as well to ROS_IP=127.0.0.1. It might be clearer if each of
> these variables had its own section.
(Sorry for the late reply.)
For all intents and purposes the two variables are identical, so that
statement applies to ROS_IP as well as ROS_HOSTNAME.
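To make that concrete, here is a minimal Python sketch of the rule quoted
from the EnvironmentVariables page. It is not the actual ROS code: the
helper names are made up for illustration, and treating 127.x.x.x and ::1
the same as 'localhost' is an assumption that follows from the two
variables being interchangeable. Checking ROS_HOSTNAME before ROS_IP
reflects the precedence the wiki page describes, as I understand it.

```python
import os


def advertised_address(env=None):
    """Return the ROS_HOSTNAME/ROS_IP override, or None if neither is set.

    Hypothetical helper: ROS_HOSTNAME is consulted first, then ROS_IP.
    """
    env = os.environ if env is None else env
    return env.get("ROS_HOSTNAME") or env.get("ROS_IP") or None


def binds_loopback_only(address):
    """True if the override names the loopback interface.

    Assumption: any 127.x.x.x address and ::1 count the same as 'localhost'.
    """
    if not address:
        return False
    return (
        address == "localhost"
        or address.startswith("127.")
        or address == "::1"
    )
```

With ROS_IP=10.32.43.1 the second function returns False, i.e. remote
components can still reach the node; with ROS_IP=127.0.0.1 or
ROS_HOSTNAME=localhost it returns True.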
> Also, based on what we've seen is there a (low priority, mind you)
> ticket warranted here? Not sure if it would be a defect on roscore or
> perhaps an enhancement to roswtf to give the hint that ROS_MASTER_URI
> (and maybe also ROS_IP/ROS_HOSTNAME) should be set under certain
> conditions. Or even an enhancement to roscore so that if it takes
> longer than some timeout at that stage of startup you get a hint as to
> what to do.
Sure, a ticket with some suggested language would be great. The
latter suggestion, a timeout, is much trickier. We used to have more
heuristic timeout code, but it turned out to cause a lot of
problems on embedded platforms. Ultimately, providing useful advice
on diagnosing network issues is a hard problem.
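In the meantime, a check like the following can be run by hand before
roscore. It is only a sketch, written for Python 3 (where urlparse lives
in urllib.parse): the function name check_master_uri and the 3-second
default timeout are my own choices, not anything that exists in roscore
or roswtf. It resolves the host in the master URI and tries a plain TCP
connect with a timeout, so a badly configured hostname shows up as a
message instead of a hang. Note that the timeout covers only the connect;
name resolution itself can still block.

```python
import socket
from urllib.parse import urlparse


def check_master_uri(uri, timeout=3.0):
    """Return (ok, message) after resolving and connecting to the master URI."""
    parsed = urlparse(uri)
    host = parsed.hostname
    port = parsed.port or 11311  # default master port, as in the URIs above
    if host is None:
        return False, "could not parse a host from %r" % uri

    # Step 1: name resolution (not covered by the timeout below).
    try:
        addr = socket.gethostbyname(host)
    except socket.gaierror as exc:
        return False, "cannot resolve %s (%s); try an IP in ROS_MASTER_URI" % (host, exc)

    # Step 2: TCP connect with a timeout instead of blocking indefinitely.
    try:
        with socket.create_connection((addr, port), timeout=timeout):
            pass
    except socket.timeout:
        return False, "%s (%s) did not answer within %.1fs; check ROS_IP/ROS_HOSTNAME" % (
            host, addr, timeout)
    except OSError as exc:
        # e.g. connection refused: the host answers but no master is listening.
        return False, "%s (%s) port %d not reachable: %s" % (host, addr, port, exc)
    return True, "%s (%s) port %d accepts connections" % (host, addr, port)
```

Calling check_master_uri(os.environ['ROS_MASTER_URI']) (after importing
os) gives a quick yes/no plus a hint. A "connection refused" result is
expected when no master is running yet; a timeout is the case that
matches the hang described in this thread.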
- Ken
>
> Cheers,
> Pat
>
>
> On Sun, Feb 27, 2011 at 10:34 AM, Ken Conley <kwc at willowgarage.com> wrote:
>> On Sun, Feb 27, 2011 at 12:59 AM, Patrick Bouffard
>> <bouffard at eecs.berkeley.edu> wrote:
>>> Hi, I've just set up a new Ubuntu 10.10 box that will be running some
>>> ROS nodes, occasionally including roscore. I installed diamondback
>>> from debs this evening. This particular machine has a more complex
>>> networking setup than others I've set up before and I suspect that is
>>> giving me issues with running ROS.
>>>
>>> I'm pretty sure everything is set up as it ought to be in terms of my
>>> .bashrc (just source /opt/ros/diamondback/setup.bash). But when I run
>>> roscore it just hangs. If I wait a while and then press Ctrl+C
>>> once, the following is output:
>>
>> This is saying to me that something is wrong whenever something tries
>> to talk to the host described in the master URI. The only network
>> call that occurs by this point is a call to check the existing
>> parameter server.
>>
>> Here is a pure Python script you can use to test this behavior:
>>
>> import xmlrpclib, os
>> s = xmlrpclib.ServerProxy(os.environ['ROS_MASTER_URI'])
>> s.getParam('/', '/rosdistro')
>>
>> You can change the os.environ['ROS_MASTER_URI'] to use different
>> hostnames/IP addresses to test the behavior of the network you set up.
>>
>>>
>>> {{{
>>> ^C... logging to
>>> /home/bouffard/.ros/log/ea02b894-424c-11e0-a499-00226bbd5586/roslaunch-lynx-3561.log
>>> Checking log directory for disk usage. This may take awhile.
>>> Press Ctrl-C to interrupt
>>> Done checking log file disk usage. Usage is <1GB.
>>>
>>> started roslaunch server http://lynx:52141/
>>> ros_comm version 1.4.4
>>>
>>> SUMMARY
>>> ========
>>>
>>> PARAMETERS
>>> * /rosversion
>>> * /rosdistro
>>>
>>> NODES
>>>
>>> auto-starting new master
>>> process[master]: started with pid [3576]
>>> ROS_MASTER_URI=http://lynx:11311/
>>>
>>> setting /run_id to ea02b894-424c-11e0-a499-00226bbd5586
>>> process[rosout-1]: started with pid [3589]
>>> started core service [/rosout]
>>> }}}
>>>
>>> At this point things seem to be working; roswtf returns no errors or
>>> warnings, I can run, e.g., rxconsole, rostopic list outputs /rosout
>>> and /rosout_agg, etc. But having to hit Ctrl+C is not so great.
>>>
>>> Also, without roscore running, if I run roswtf it also hangs after displaying:
>>>
>>> {{{
>>> bouffard at lynx:~$ roswtf
>>> Loaded plugin tf.tfwtf
>>> No package or stack in context
>>> ================================================================================
>>> Static checks summary:
>>>
>>> No errors or warnings
>>> ================================================================================
>>> }}}
>>>
>>> If I then hit Ctrl+C I get the following traceback:
>>>
>>> {{{
>>> ^CTraceback (most recent call last):
>>> File "/opt/ros/diamondback/ros/bin/roswtf", line 35, in <module>
>>> roswtf.roswtf_main()
>>> File "/opt/ros/diamondback/stacks/ros_comm/utilities/roswtf/src/roswtf/__init__.py",
>>> line 93, in roswtf_main
>>> _roswtf_main()
>>> File "/opt/ros/diamondback/stacks/ros_comm/utilities/roswtf/src/roswtf/__init__.py",
>>> line 208, in _roswtf_main
>>> master = master_online()
>>> File "/opt/ros/diamondback/stacks/ros_comm/utilities/roswtf/src/roswtf/__init__.py",
>>> line 100, in master_online
>>> master.getPid('/roswtf')
>>> File "/usr/lib/python2.6/xmlrpclib.py", line 1199, in __call__
>>> return self.__send(self.__name, args)
>>> File "/usr/lib/python2.6/xmlrpclib.py", line 1489, in __request
>>> verbose=self.__verbose
>>> File "/usr/lib/python2.6/xmlrpclib.py", line 1235, in request
>>> self.send_content(h, request_body)
>>> File "/usr/lib/python2.6/xmlrpclib.py", line 1349, in send_content
>>> connection.endheaders()
>>> File "/usr/lib/python2.6/httplib.py", line 908, in endheaders
>>> self._send_output()
>>> File "/usr/lib/python2.6/httplib.py", line 780, in _send_output
>>> self.send(msg)
>>> File "/usr/lib/python2.6/httplib.py", line 739, in send
>>> self.connect()
>>> File "/usr/lib/python2.6/httplib.py", line 720, in connect
>>> self.timeout)
>>> File "/usr/lib/python2.6/socket.py", line 554, in create_connection
>>> sock.connect(sa)
>>> File "<string>", line 1, in connect
>>> KeyboardInterrupt
>>> bouffard at lynx:~$
>>> }}}
>>
>> My theory is that this is the same pause as described above. It's
>> hanging in an xmlrpc call to the master (aka Parameter Server).
>>
>>> Just to check that it wasn't something in the latest diamondback
>>> release candidate, I dist-upgrade'd and tried these same commands on
>>> another couple machines (that have been running some version of ROS
>>> for awhile and are similarly configured, Ubuntu 10.0, diamondback
>>> debs) with no problems.
>>>
>>> Based on the roswtf traceback and the main weirdness of the current
>>> box being its network config (it has three wired network interfaces),
>>> I'm suspecting it has something to do with that. However, I still see
>>> the same behaviour if I sudo ifdown all the interfaces besides lo.
>>>
>>> I'm not a networking expert but I noticed on the EnvironmentVariables
>>> wiki page: ".. ROS components bind to all available network
>>> interfaces.". Could this have something to do with my issues?
>>
>> You can change this behavior by setting ROS_IP or ROS_HOSTNAME. Using
>> either tells a particular process to bind to a specific interface.
>> All evidence thus far is that something is wrong with how the 'lynx'
>> hostname is configured.
>>
>> - Ken
>>
>>> Here's the output of ifconfig -a in case that helps:
>>>
>>> {{{
>>> bouffard at lynx:~$ ifconfig -a
>>> eth1 Link encap:Ethernet HWaddr xx:xx:xx:xx:xx:xx
>>> inet addr:128.32.43.208 Bcast:128.32.43.255 Mask:255.255.255.0
>>> inet6 addr: fe80::218:8bff:fe74:766d/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> RX packets:1066 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:504 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:329324 (329.3 KB) TX bytes:116257 (116.2 KB)
>>> Interrupt:17
>>>
>>> eth2 Link encap:Ethernet HWaddr xx:xx:xx:xx:xx:xx
>>> inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
>>> inet6 addr: fe80::e291:f5ff:fe94:cc3/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> RX packets:1264 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:1342 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:78005 (78.0 KB) TX bytes:67277 (67.2 KB)
>>> Interrupt:17 Base address:0xef00
>>>
>>> eth3 Link encap:Ethernet HWaddr xx:xx:xx:xx:xx:xx
>>> inet addr:10.32.43.1 Bcast:10.32.43.255 Mask:255.255.255.0
>>> inet6 addr: fe80::222:6bff:febd:5586/64 Scope:Link
>>> UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>>> RX packets:235591 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:451507 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:1000
>>> RX bytes:17233068 (17.2 MB) TX bytes:629352191 (629.3 MB)
>>> Interrupt:16 Base address:0x2e00
>>>
>>> lo Link encap:Local Loopback
>>> inet addr:127.0.0.1 Mask:255.0.0.0
>>> inet6 addr: ::1/128 Scope:Host
>>> UP LOOPBACK RUNNING MTU:16436 Metric:1
>>> RX packets:8526 errors:0 dropped:0 overruns:0 frame:0
>>> TX packets:8526 errors:0 dropped:0 overruns:0 carrier:0
>>> collisions:0 txqueuelen:0
>>> RX bytes:817773 (817.7 KB) TX bytes:817773 (817.7 KB)
>>> }}}
>>>
>>> eth1 is the connection to the internet, eth2 is a crossover cable to
>>> another machine, and eth3 is connected to a private subnet. Iptables is
>>> configured to allow machines on the 10.32.43.x subnet to access the
>>> internet via eth1. It's possible something I did in setting that up
>>> had the side effect of messing with ROS; as I said, I'm no networking
>>> expert. Hopefully one of you is! :)
>>>
>>> Thanks,
>>> Pat
>>> _______________________________________________
>>> ros-users mailing list
>>> ros-users at code.ros.org
>>> https://code.ros.org/mailman/listinfo/ros-users
>>>
>>
>