[ros-release] Build Farm Spam and Debugging

William Woodall william at osrfoundation.org
Thu Apr 7 20:33:06 UTC 2016


On Thu, Apr 7, 2016 at 12:42 PM, David Lu!! via ros-release <
ros-release at lists.ros.org> wrote:

> I apologize for using the pejorative term spam. I was using it to
> refer to the large quantities of unwanted and arguably unnecessary
> emails. My ideal build-farm related emails would have the following
> qualities. (This is an ideal, not a list of demands.)
>

It is definitely a well-founded criticism that there are lots of emails from
the build farm that either don't convey information effectively (bad UX) or
seem redundant or unnecessary.


>
> 1) One email per problem.
>   * If my code doesn't compile on any machine, it should email me
> once, even if it means that the build fails for multiple machines.
>   * If I haven't changed anything, I don't need another email 15
> minutes later. Maybe if I haven't fixed it in N days, send another
> email.
>

This would be really hard to do since it would imply that the email sending
code can tell which problems are the same. For example, suppose the same
package fails on Trusty and on Vivid, but each for a different reason.
Should it send you a single email (potentially hiding one of the
issues)? How can it reliably know? I think it's better just to send both at
the risk of being redundant.

Even well designed and developed services like travis-ci and drone.io don't
have this ability. AFAIK, they will send you a failure email for each item
in the matrix that fails.

It would be nice if we could group these failures and send a single email
per package, even if there are multiple failures, but because of the
asynchronous nature of the buildfarm you'd have to wait for all jobs to
finish before letting anyone know about any failure. So you're trading
responsiveness for conciseness, which isn't a clear-cut choice in my
opinion. You could argue that there should be a setting, and that's all
well and good until you realize how complicated implementing that is within
Jenkins. :)
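
To make the trade-off concrete, here is a rough Python sketch of what
grouping would look like. The `Failure` record and `group_by_package` are
hypothetical names for illustration, not anything in ros_buildfarm or
Jenkins. Note that it can only run once every job in the matrix has
reported, which is exactly the responsiveness cost:

```python
from collections import defaultdict
from typing import NamedTuple


class Failure(NamedTuple):
    """One failed job; a hypothetical record, not a Jenkins type."""
    package: str
    platform: str
    reason: str


def group_by_package(failures):
    """Collect failures into one digest per package.

    Every platform's reason is kept, so differing causes are not hidden.
    """
    digests = defaultdict(list)
    for f in failures:
        digests[f.package].append(f"{f.platform}: {f.reason}")
    return dict(digests)


# Same package, two platforms, two different reasons.
failures = [
    Failure("map_msgs", "trusty", "compile error"),
    Failure("map_msgs", "vivid", "missing dependency"),
]
digests = group_by_package(failures)
for package, lines in digests.items():
    print(f"{package} failed on {len(lines)} platform(s):")
    for line in lines:
        print(f"  {line}")
```

The grouping itself is trivial; the hard part is knowing when the list of
failures is complete.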

If a package is failing you'll only get an email once per day per platform
(except in rare cases, like when a bad source deb has been released), at
least in my experience. I suppose those could be further throttled to once
per week, but collapsing them into one email for all platforms is
problematic for the reasons I described above.
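
For what it's worth, the per-platform throttling part is easy to picture.
This is only a sketch of the idea, not how the farm actually does it; the
`Throttle` class and its one-day default are assumptions of mine:

```python
from datetime import datetime, timedelta


class Throttle:
    """Allow at most one email per (package, platform) per interval."""

    def __init__(self, interval=timedelta(days=1)):
        self.interval = interval
        self.last_sent = {}

    def should_send(self, package, platform, now):
        key = (package, platform)
        last = self.last_sent.get(key)
        if last is not None and now - last < self.interval:
            return False  # still inside the quiet period
        self.last_sent[key] = now
        return True


throttle = Throttle()
t0 = datetime(2016, 4, 7, 9, 0)
first = throttle.should_send("map_msgs", "jessie", t0)
repeat = throttle.should_send("map_msgs", "jessie", t0 + timedelta(minutes=15))
other = throttle.should_send("map_msgs", "trusty", t0 + timedelta(minutes=15))
next_day = throttle.should_send("map_msgs", "jessie", t0 + timedelta(days=1))
print(first, repeat, other, next_day)
```

Going from a day to a week is just a different `interval`. Keying on the
package alone would give one email for all platforms, with the problem of
hidden failures described above.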

I sympathize with what you're saying, but unfortunately I don't see that
there are simple solutions to be had.


> 2) Clear Messages
>   * One email subject was "Build failed in Jenkins:
> Ksrc_dJ__map_msgs__debian_jessie__source #96". As someone who has lots
> of experience breaking the build, I can kinda parse this out, but
> there's lots of garbage in there. Something like "[jenkins] map_msgs
> source failed to build on jade (debian) #96" reads nicer.
>

Yeah, I agree that the UX of the emails from the build farm is not great.
Fortunately Jenkins seems to be infinitely flexible, but unfortunately only
by writing Groovy scripts against a poorly documented API :). We sank a lot
of time into the build farm and I believe we actually improved a lot of
stuff, but we didn't get to this part.
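
Just to illustrate the kind of rewrite you're asking for, here it is in
Python rather than Groovy. The four-part, double-underscore layout is
inferred only from the one job name quoted above, and `friendly_subject` is
a made-up helper, so treat it as a sketch. It leaves the `Ksrc_dJ` prefix
undecoded:

```python
def friendly_subject(job_name, build_number):
    """Turn a Jenkins job name into a more readable email subject.

    Assumes the name has four parts joined by double underscores:
    prefix, package, os_name + "_" + os_code_name, and job kind.
    """
    parts = job_name.split("__")
    if len(parts) != 4:
        # Unknown layout: fall back to the raw job name.
        return f"[jenkins] {job_name} failed #{build_number}"
    _prefix, package, platform, kind = parts
    os_name, _, code_name = platform.partition("_")
    return (f"[jenkins] {package} {kind} failed to build on "
            f"{code_name} ({os_name}) #{build_number}")


subject = friendly_subject("Ksrc_dJ__map_msgs__debian_jessie__source", 96)
print(subject)
# [jenkins] map_msgs source failed to build on jessie (debian) #96
```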


>   * At the very bottom of the email and log is the relevant error
> message, but it's never easy for me to pick out what the actual problem
> is on these, whether it was something I did, something OSRF changed on
> the build farm, or some momentary github hiccup.

> 3) Relevant to me
>   * I'm unsure whether this is still an issue in the new build farm,
> but I very rarely care if upstream or downstream packages momentarily
> break.
>

If you think it's hard for you to know whether the problem is yours or the
system's, imagine writing a program to determine that. It's not easy. The
new build farm does lots of things to try to eat temporary internet
problems and try again without mailing you, for example:
https://github.com/ros-infrastructure/ros_buildfarm/blob/19dfc96e3e4924dfd633c8a24fe994c3a1df43e9/scripts/wrapper/apt-get.py#L101-L104
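
The general shape of that kind of wrapper is roughly the following. This is
a simplified sketch and not the code behind the link: `run_with_retries`,
`looks_transient`, and the two hint strings are my own illustration, and a
real list of patterns has to be grown from failures actually seen on the
farm.

```python
import time

# Hypothetical patterns that suggest a temporary network problem.
TRANSIENT_HINTS = ("Failed to fetch", "Hash Sum mismatch")


def looks_transient(output):
    """Guess from the command output whether a failure is temporary."""
    return any(hint in output for hint in TRANSIENT_HINTS)


def run_with_retries(command, attempts=3, delay=0.0):
    """Call `command` (returns (return_code, output)) up to `attempts` times.

    Retries only while the failure looks transient; a real failure is
    returned right away so that it still triggers a notification.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    rc, output = command()
    for _ in range(attempts - 1):
        if rc == 0 or not looks_transient(output):
            break
        time.sleep(delay)
        rc, output = command()
    return rc, output


# A stand-in command that fails twice with a network error, then succeeds.
calls = []


def flaky():
    calls.append(1)
    if len(calls) < 3:
        return 100, "E: Failed to fetch http://example.invalid/pkg.deb"
    return 0, "ok"


result = run_with_retries(flaky)
print(result, len(calls))
```

Anything that doesn't match a known pattern falls through as a failure,
which is where the remaining false positives come from.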

But it's really hard to catch all of them. I get every single failure email
for Jade on the new farm, and I think the new farm sends far fewer false
positives than the old one thanks to Dirk and Tully's persistence in
squashing false positives when they see them. So if you get an email and you
think it's a false positive please look to see if there's a ticket for it
on ros-infrastructure/ros_buildfarm, and if there's not then make one or
email here. Otherwise we'll never catch all of them. I won't promise we can
address every one you report to us, but we will try as we have time.

I think our best chance to reduce the confusion about where the problem lies
is to let the build farm stabilize (since the rewrite) and
asymptotically approach zero false positives over time by fixing false
positives when we can. That's what we're doing at the moment, which will
hopefully minimize the "something OSRF changed on the build farm" and the
"some momentary github hiccup" issues. Sorting out the reason for the rest
of the actual failures can only be solved by knowledge, I think. Maybe
someone should start an FAQ or a build farm troubleshooting guide where
people can jot down common issues that are actual issues and not false
positive failures.


>
> I'm sure none of these is easy to implement, and would require lots of
> intelligence programmed in that doesn't exist yet. However, I
> occasionally consider forwarding all my build farm emails to another
> account which sends me summaries of them just to cut down.
>

You're totally right, and just to be clear I'm not disagreeing with you in
this email, I'm just trying to share some more details for those who are
interested :).


>
> Thanks for fixing the problem and for the offer of future help.
>
> Sincerely,
> David!!
>
> On Thu, Apr 7, 2016 at 2:57 PM, Dirk Thomas via ros-release
> <ros-release at lists.ros.org> wrote:
> > Jackie and I came up with a fix for the Debian jobs:
> > https://github.com/ros-infrastructure/ros_buildfarm/pull/281 It fixes the
> > locale on Debian and afterwards is able to parse and decode the manifest
> > correctly. The job in question has passed now.
> >
> > The emails are sent for a reason and are not really "spam". They are
> > notifying you about a problem with a package you maintain. Please feel
> > free to ask questions on this mailing list if a job is "bugging" you
> > with emails and you are not sure how to address it. That is always
> > better than ignoring the notifications.
> >
> > Thanks,
> > - Dirk
> >
> > On Thu, Apr 7, 2016 at 10:55 AM, Dirk Thomas <dthomas at osrfoundation.org>
> > wrote:
> >>
> >> Before drawing any conclusions please take a close look at the actual
> >> error message:
> >>
> >> ```
> >> Traceback (most recent call last):
> >>   File "/tmp/ros_buildfarm/scripts/release/get_sources.py", line 34, in <module>
> >>     sys.exit(main())
> >>   File "/tmp/ros_buildfarm/scripts/release/get_sources.py", line 30, in main
> >>     args.os_name, args.os_code_name, args.source_dir)
> >>   File "/tmp/ros_buildfarm/ros_buildfarm/sourcedeb_job.py", line 52, in get_sources
> >>     pkg = parse_package(sources_dir)
> >>   File "/usr/lib/python3/dist-packages/catkin_pkg/package.py", line 370, in parse_package
> >>     return parse_package_string(f.read(), filename, warnings=warnings)
> >>   File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
> >>     return codecs.ascii_decode(input, self.errors)[0]
> >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 183: ordinal not in range(128)
> >> ```
> >>
> >> Decoding a non-ASCII character when the codec is set to `ascii` can't
> >> work. Since `!` is a valid ASCII character that can't be the reason. So
> >> it is likely the `é`.
> >>
> >> What needs to change here? Well, the first important fact to note is
> >> that the sourcedeb jobs for the same package pass on all Ubuntu
> >> platforms. So obviously the buildfarm as well as catkin_pkg are able to
> >> handle this.
> >>
> >> Why is it failing for this Debian job then? Debian has just recently
> >> been added and there must be something different. Looking at the console
> >> output again the following warnings just a few lines above should be
> >> enough:
> >>
> >> ```
> >> perl: warning: Setting locale failed.
> >> perl: warning: Please check that your locale settings:
> >> LANGUAGE = (unset),
> >> LC_ALL = (unset),
> >> LANG = "en_US.UTF-8"
> >>     are supported and installed on your system.
> >> perl: warning: Falling back to the standard locale ("C").
> >> perl: warning: Setting locale failed.
> >> perl: warning: Please check that your locale settings:
> >> LANGUAGE = (unset),
> >> LC_ALL = (unset),
> >> LANG = "en_US.UTF-8"
> >>     are supported and installed on your system.
> >> perl: warning: Falling back to the standard locale ("C").
> >> ```
> >>
> >> While the job tries to use UTF-8 it fails to do so on Debian and falls
> >> back to the standard locale "C". And with that locale it is simply
> >> impossible to decode the special character.
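
For anyone following along, the failure mode is easy to reproduce outside
the farm. This sketch assumes the package.xml is stored as UTF-8, which is
consistent with the 0xc3 byte in the traceback; the snippet is mine and not
taken from catkin_pkg or ros_buildfarm:

```python
# The author name from the thread, encoded as UTF-8 (assumed file encoding).
raw = "Stéphane Magnenat".encode("utf-8")

try:
    raw.decode("ascii")
except UnicodeDecodeError as exc:
    # 'é' is the two bytes 0xc3 0xa9 in UTF-8; ASCII stops at 0x7f.
    failing_byte = raw[exc.start]
    print(f"ascii failed on byte {failing_byte:#x} at position {exc.start}")

# Naming the codec explicitly does not depend on the locale at all.
decoded = raw.decode("utf-8")
print(decoded)
```

When a file is opened in text mode without an explicit encoding, Python 3
picks the codec from the locale, so under the "C" fallback it can end up
with `ascii` as in the traceback. The fix that went in corrects the locale;
passing an explicit encoding when reading is another possible approach.
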
> >>
> >> As a conclusion the recent modification to enable Debian jobs
> >> (https://github.com/ros-infrastructure/ros_buildfarm/commit/5106e704ecdb2d55ac513da66ec8d699731bb859#diff-8859c2a0d6adc3dc403698f904632818)
> >> needs fixing since it is not enabling the UTF-8 locale as expected.
> >>
> >> Cheers,
> >> - Dirk
> >>
> >>
> >> On Thu, Apr 7, 2016 at 10:07 AM, Jackie Kay via ros-release
> >> <ros-release at lists.ros.org> wrote:
> >>>
> >>> I also find the build farm spam quite annoying. I get the same
> >>> notification failure whenever anything on Kinetic breaks. :)
> >>>
> >>> Glancing at the package.xml for map_msgs, could it be that the accented
> >>> "é" in the author field (Stéphane Magnenat) is breaking unicode
> >>> parsing, not the exclamation points?
> >>>
> >>> https://github.com/ros-planning/navigation_msgs/blob/jade-devel/map_msgs/package.xml
> >>>
> >>> I will look into sanitizing the characters so that we can support
> >>> package maintainers and authors with unicode characters in their names.
> >>>
> >>>
> >>> On Thu, Apr 7, 2016 at 9:32 AM, Tully Foote via ros-release
> >>> <ros-release at lists.ros.org> wrote:
> >>>>
> >>>> I think the exclamation points in your name are breaking the unicode
> >>>> parsing on the debian target. Looking at the documentation the
> >>>> maintainer field only supports ASCII characters.
> >>>>
> >>>> The debian maintainer field format is defined by RFC822
> >>>> https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Maintainer
> >>>> There's a note about problems with full stop.
> >>>>
> >>>> And RFC822 can be seen here
> >>>> https://www.w3.org/Protocols/rfc822/3_Lexical.html
> >>>>
> >>>> I think it's likely that debian is more strict in its interpretation
> >>>> of the formatting. And the quickest solution will be to remove the
> >>>> characters from the maintainer name. It could be patched in the jessie
> >>>> specific branch in the release repo, but the simplest solution is a
> >>>> rerelease without the characters.
> >>>>
> >>>> You could open an issue on
> >>>> https://github.com/ros-infrastructure/ros_buildfarm and we could look
> >>>> at sanitizing the characters in that field, but developing the correct
> >>>> replacement/substitution policy may be a challenge.
> >>>>
> >>>> B) Sourcedebs are triggered on a 15 minute cycle since people want
> >>>> them to come out quickly. We typically will roll back any releases
> >>>> which fail their sourcedeb jobs since it's rarely a platform specific
> >>>> issue. And if the sourcedeb is not working we might as well pull it
> >>>> from the farm.
> >>>>
> >>>> Tully
> >>>>
> >>>> On Thu, Apr 7, 2016 at 6:27 AM, David Lu!! via ros-release
> >>>> <ros-release at lists.ros.org> wrote:
> >>>>>
> >>>>> I am but a humble navigation code monkey, and am not trained in the
> >>>>> ways of the build farm. Can someone help me interpret what's going on
> >>>>> that's causing
> >>>>> A) A specific build to fail e.g.
> >>>>> http://build.ros.org/job/Ksrc_dJ__map_msgs__debian_jessie__source/87/console
> >>>>> B) So many emails (see attached)
> >>>>>
> >>>>> _______________________________________________
> >>>>> ros-release mailing list
> >>>>> ros-release at lists.ros.org
> >>>>> http://lists.ros.org/mailman/listinfo/ros-release
> >>>>>



-- 
William Woodall
ROS Development Team
william at osrfoundation.org
http://wjwwood.io/