[ros-release] Build Farm Spam and Debugging

David Lu!! davidvlu+ros at gmail.com
Thu Apr 7 20:43:38 UTC 2016


Thanks William. I can definitely relate to how difficult it is to
cover all cases. I went through a lot of that when trying to make
ros_crawl into some sort of usable frontend to the build farm.
(http://metrorobots.com/ros_crawl/users.html#vincent_rabaud_gmail_com)
Unfortunately, ros_crawl was only out for two weeks before the build
farm changed and stopped generating the data I was using to generate
it.

On Thu, Apr 7, 2016 at 4:33 PM, William Woodall
<william at osrfoundation.org> wrote:
> On Thu, Apr 7, 2016 at 12:42 PM, David Lu!! via ros-release
> <ros-release at lists.ros.org> wrote:
>>
>> I apologize for using the pejorative term spam. I was using it to
>> refer to the large quantities of unwanted and arguably unnecessary
>> emails. My ideal build-farm related emails would have the following
>> qualities. (This is an ideal, not a list of demands.)
>
>
> It's definitely a well-founded criticism that there are lots of emails from
> the build farm that either don't convey information effectively (bad UX) or
> seem redundant or unnecessary.
>
>>
>>
>> 1) One email per problem.
>>   * If my code doesn't compile on any machine, it should email me
>> once, even if it means that the build fails for multiple machines.
>>   * If I haven't changed anything, I don't need another email 15
>> minutes later. Maybe if I haven't fixed it in N days, send another
>> email.
>
>
> This would be really hard to do since it would imply that the email sending
> code can tell which problems are the same. For example, suppose the same
> package fails on Trusty and on Vivid, but each for a different reason.
> Should it send you a single email (potentially hiding one of the
> issues)? How can it reliably know? I think it's better just to send both at
> the risk of being redundant.
>
> Even well designed and developed services like travis-ci and drone.io don't
> have this ability. AFAIK, they will send you a failure email for each item
> in the matrix that fails.
>
> It would be nice if we could group these failures and send a single email
> per package, even if there are multiple failures, but because of the
> asynchronous nature of the buildfarm you'd have to wait for all jobs to
> finish to let someone know about any failures. So you're trading off
> responsiveness for conciseness, which isn't a clear thing to choose in my
> opinion. So you could argue that there should be a setting. And that's all
> well and good until you realize how complicated implementing that is within
> Jenkins. :)
>
> If a package is failing you'll only get an email once per day per platform
> (except in rare cases, such as when a bad source deb has been released), at
> least in my experience. I suppose those could be further throttled to once
> per week, but collapsing them into one email for all platforms is
> problematic for the reasons I described above.
>
> I sympathize with what you're saying, but unfortunately I don't see that
> there are simple solutions to be had.
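
To make the difficulty concrete, here is a minimal sketch of grouping failures into "one email per problem". It is not how the build farm works: the function name, the input tuples, the sample log text, and the "last non-empty log line" signature are all assumptions made up for this illustration.

```python
from collections import defaultdict


def group_failures(failures):
    """Group (package, platform, log_text) tuples by a crude signature.

    The signature is the package name plus the last non-empty log line,
    a hypothetical heuristic chosen only for this illustration.
    """
    groups = defaultdict(list)
    for package, platform, log_text in failures:
        lines = [line.strip() for line in log_text.splitlines() if line.strip()]
        last_line = lines[-1] if lines else ""
        groups[(package, last_line)].append(platform)
    return dict(groups)


# Invented sample data: two platforms fail the same way, one differently.
failures = [
    ("map_msgs", "trusty", "compiling...\nerror: missing header foo.h"),
    ("map_msgs", "vivid", "compiling...\nerror: missing header foo.h"),
    ("map_msgs", "jessie", "UnicodeDecodeError: 'ascii' codec can't decode"),
]
emails = group_failures(failures)
```

Here `emails` has two entries, so the Jessie-only problem is not hidden. Real logs carry paths, timestamps and platform-specific noise, so a signature this naive would often split identical problems or merge different ones, which is the reliability concern raised above. It also only works once every job has reported.
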
>
>>
>> 2) Clear Messages
>>   * One email subject was "Build failed in Jenkins:
>> Ksrc_dJ__map_msgs__debian_jessie__source #96". As someone who has lots
>> of experience breaking the build, I can kinda parse this out, but
>> there's lots of garbage in there. Something like "[jenkins] map_msgs
>> source failed to build on jade (debian) #96" reads nicer.
>
>
> Yeah, I agree that the UX of the emails from the build farm is not great.
> Fortunately Jenkins seems to be infinitely flexible, but unfortunately only
> by writing Groovy scripts using a poorly documented API :). We sank a lot of
> time on the build farm and I believe we actually improved a lot of stuff,
> but we didn't get to this part.
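
As a rough illustration of the friendlier subject line, here is a minimal sketch that rewrites the one subject quoted above. The layout `<prefix>__<package>__<os>_<codename>__<kind> #<n>` is inferred from that single example and is an assumption, as is the function name. The ROS distro is not spelled out in the job name, so the sketch reports the OS and code name instead.

```python
def readable_subject(jenkins_subject):
    """Rewrite a Jenkins failure subject into a friendlier one.

    Assumes the layout seen in the single example from this thread:
    "Build failed in Jenkins: <prefix>__<package>__<os>_<codename>__<kind> #<n>"
    Anything that does not match is returned unchanged.
    """
    marker = "Build failed in Jenkins: "
    if not jenkins_subject.startswith(marker):
        return jenkins_subject
    job, _, build = jenkins_subject[len(marker):].rpartition(" #")
    parts = job.split("__")
    if len(parts) != 4 or not build.isdigit():
        return jenkins_subject
    _prefix, package, platform, kind = parts
    os_name, _, code_name = platform.partition("_")
    return "[jenkins] {} {} failed to build on {} {} #{}".format(
        package, kind, os_name, code_name, build)


subject = readable_subject(
    "Build failed in Jenkins: Ksrc_dJ__map_msgs__debian_jessie__source #96")
print(subject)
```

Subjects that do not follow the assumed layout pass through untouched, so a change in job naming would degrade to today's behaviour rather than produce a misleading subject.
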
>
>>
>>   * At the very bottom of the email and log is the relevant error
>> message, but it's never easy for me to pick out what the actual problem
>> is on these, whether it was something I did, something OSRF changed on
>> the build farm, or some momentary github hiccup.
>>
>> 3) Relevant to me
>>   * I'm unsure whether this is still an issue in the new build farm,
>> but I very rarely care if upstream or downstream packages momentarily
>> break.
>
>
> If you think it's hard for you to know when the problem is yours or the
> system's, imagine writing a program to determine that. It's not easy. The new
> build farm does lots of things to try to eat temporary internet problems and
> try again without mailing you, for example:
> https://github.com/ros-infrastructure/ros_buildfarm/blob/19dfc96e3e4924dfd633c8a24fe994c3a1df43e9/scripts/wrapper/apt-get.py#L101-L104
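
The "try again without mailing you" idea can be sketched roughly as follows. This is not the code behind the link: the function name, the list of markers and the retry count are assumptions picked for illustration, and the command is passed in as a callable so the sketch runs without apt-get.

```python
import time

# Hypothetical substrings treated as transient network failures; the real
# wrapper linked above maintains its own list.
TRANSIENT_MARKERS = (
    "Temporary failure resolving",
    "Hash Sum mismatch",
)


def run_with_retries(run, max_tries=3, delay=0.0):
    """Call run() until it succeeds, fails for a non-transient reason,
    or max_tries attempts have been made.

    run is any callable returning (return_code, output_text).
    Returns (return_code, output_text, attempts_made).
    """
    for attempt in range(1, max_tries + 1):
        rc, output = run()
        if rc == 0:
            return rc, output, attempt
        transient = any(marker in output for marker in TRANSIENT_MARKERS)
        if not transient or attempt == max_tries:
            return rc, output, attempt
        time.sleep(delay)


# Simulated command: one transient failure, then success.
results = iter([(100, "Temporary failure resolving 'example.org'"), (0, "ok")])
outcome = run_with_retries(lambda: next(results))
```

Only failures matching a known marker are retried. Anything else is reported on the first attempt, which is why every new kind of hiccup still produces an email until someone adds it to the list.
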
>
> But it's really hard to catch all of them. I get every single failure email
> for Jade on the new farm, and think the new farm sends far fewer false
> positives than the old one thanks to Dirk and Tully's persistence trying to
> squash false positives when they see them. So if you get an email and you
> think it's a false positive please look to see if there's a ticket for it on
> ros-infrastructure/ros_buildfarm, and if there's not then make one or email
> here. Otherwise we'll never catch all of them. I won't promise we can
> address every one you report to us, but we will try as we have time.
>
> I think our best chance to reduce the confusion on where the problem lies is
> to let the build farm stabilize (since the new rewrite) and asymptotically
> approach zero false positives over time by fixing false positives when we
> can. And that's what we're doing at the moment which will hopefully minimize
> the "something OSRF changed on the build farm" and the "some momentary
> github hiccup" issues. Sorting out
> the reason for the rest of the actual failures can only be solved by
> knowledge I think. Maybe someone should start an FAQ or a build farm
> troubleshooting guide where people can jot down common issues that are
> actual issues and not false positive failures.
>
>>
>>
>> I'm sure none of these is easy to implement, and would require lots of
>> intelligence programmed in that doesn't exist yet. However, I
>> occasionally consider forwarding all my build farm emails to another
>> account which sends me summaries of them just to cut down.
>
>
> You're totally right, and just to be clear I'm not disagreeing with you in
> this email, I'm just trying to spread some more details for those who are
> interested :).
>
>>
>>
>> Thanks for fixing the problem and for the offer of future help.
>>
>> Sincerely,
>> David!!
>>
>> On Thu, Apr 7, 2016 at 2:57 PM, Dirk Thomas via ros-release
>> <ros-release at lists.ros.org> wrote:
>> > Jackie and I came up with a fix for the Debian jobs:
>> > https://github.com/ros-infrastructure/ros_buildfarm/pull/281 It fixes the
>> > locale on Debian and afterwards is able to parse and decode the manifest
>> > correctly. The job in question has passed now.
>> >
>> > The emails are sent for a reason and are not really "spam". They are
>> > notifying you about a problem with a package you maintain. Please feel
>> > free to ask questions on this mailing list if a job is "bugging" you with
>> > emails and you are not sure how to address it. That is always better than
>> > ignoring the notifications.
>> >
>> > Thanks,
>> > - Dirk
>> >
>> > On Thu, Apr 7, 2016 at 10:55 AM, Dirk Thomas <dthomas at osrfoundation.org>
>> > wrote:
>> >>
>> >> Before drawing any conclusions please take a close look at the actual
>> >> error message:
>> >>
>> >> ```
>> >> Traceback (most recent call last):
>> >>   File "/tmp/ros_buildfarm/scripts/release/get_sources.py", line 34, in <module>
>> >>     sys.exit(main())
>> >>   File "/tmp/ros_buildfarm/scripts/release/get_sources.py", line 30, in main
>> >>     args.os_name, args.os_code_name, args.source_dir)
>> >>   File "/tmp/ros_buildfarm/ros_buildfarm/sourcedeb_job.py", line 52, in get_sources
>> >>     pkg = parse_package(sources_dir)
>> >>   File "/usr/lib/python3/dist-packages/catkin_pkg/package.py", line 370, in parse_package
>> >>     return parse_package_string(f.read(), filename, warnings=warnings)
>> >>   File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
>> >>     return codecs.ascii_decode(input, self.errors)[0]
>> >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 183: ordinal not in range(128)
>> >> ```
>> >>
>> >> Decoding a non-ASCII character when the codec is set to `ascii` can't
>> >> work. Since `!` is a valid ASCII character that can't be the reason. So
>> >> it is likely the `é`.
>> >>
>> >> What needs to change here? Well, the first important fact to note is
>> >> that the sourcedeb jobs for the same package pass on all Ubuntu
>> >> platforms. So obviously the buildfarm as well as catkin_pkg are able to
>> >> handle this.
>> >> Why is it failing for this Debian job then? Debian has just recently
>> >> been added and there must be something different. Looking at the console
>> >> output again, the following warnings just a few lines above should be
>> >> enough to explain it:
>> >>
>> >> ```
>> >> perl: warning: Setting locale failed.
>> >> perl: warning: Please check that your locale settings:
>> >> LANGUAGE = (unset),
>> >> LC_ALL = (unset),
>> >> LANG = "en_US.UTF-8"
>> >>     are supported and installed on your system.
>> >> perl: warning: Falling back to the standard locale ("C").
>> >> perl: warning: Setting locale failed.
>> >> perl: warning: Please check that your locale settings:
>> >> LANGUAGE = (unset),
>> >> LC_ALL = (unset),
>> >> LANG = "en_US.UTF-8"
>> >>     are supported and installed on your system.
>> >> perl: warning: Falling back to the standard locale ("C").
>> >> ```
>> >>
>> >> While the job tries to use UTF-8 it fails to do so on Debian and falls
>> >> back to the standard locale "C". And with that locale it is simply
>> >> impossible to decode the special character.
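
The failure mode described above can be reproduced without the build farm. This is a minimal sketch, assuming only that the manifest is stored as UTF-8 and contains the author name quoted elsewhere in this thread. It is illustrative and not taken from ros_buildfarm or catkin_pkg, and the helper `try_decode` is made up for the example.

```python
# Illustrative only: reproduces the class of error in the traceback,
# not the actual ros_buildfarm or catkin_pkg code path.

# The author name from package.xml, as the UTF-8 bytes stored on disk.
raw = "Stéphane Magnenat".encode("utf-8")

# In UTF-8, "é" is the two-byte sequence 0xc3 0xa9.
assert raw[2:4] == b"\xc3\xa9"


def try_decode(data, encoding):
    """Return (decoded text, None) on success or (None, error) on failure."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Under the "C" locale, Python 3.4 picks the ascii codec for text files,
# and that codec rejects every byte above 0x7f.
ascii_text, ascii_error = try_decode(raw, "ascii")

# With a working UTF-8 locale (or an explicit encoding) the same bytes decode.
utf8_text, utf8_error = try_decode(raw, "utf-8")

print(ascii_error)
print(utf8_text)
```

The first offending byte is 0xc3, the same byte named in the traceback. The position differs only because this sketch decodes the bare name rather than the whole manifest.
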
>> >>
>> >> As a conclusion the recent modification to enable Debian jobs
>> >>
>> >> (https://github.com/ros-infrastructure/ros_buildfarm/commit/5106e704ecdb2d55ac513da66ec8d699731bb859#diff-8859c2a0d6adc3dc403698f904632818)
>> >> needs fixing since it is not enabling the UTF-8 locale as expected.
>> >>
>> >> Cheers,
>> >> - Dirk
>> >>
>> >>
>> >> On Thu, Apr 7, 2016 at 10:07 AM, Jackie Kay via ros-release
>> >> <ros-release at lists.ros.org> wrote:
>> >>>
>> >>> I also find the build farm spam quite annoying. I get the same
>> >>> notification failure whenever anything on Kinetic breaks. :)
>> >>>
>> >>> Glancing at the package.xml for map_msgs, could it be that the accented
>> >>> "é" in the author field (Stéphane Magnenat) is breaking unicode parsing,
>> >>> not the exclamation points?
>> >>>
>> >>>
>> >>>
>> >>> https://github.com/ros-planning/navigation_msgs/blob/jade-devel/map_msgs/package.xml
>> >>>
>> >>> I will look into sanitizing the characters so that we can support
>> >>> package maintainers and authors with unicode characters in their names.
>> >>>
>> >>>
>> >>> On Thu, Apr 7, 2016 at 9:32 AM, Tully Foote via ros-release
>> >>> <ros-release at lists.ros.org> wrote:
>> >>>>
>> >>>> I think the exclamation points in your name are breaking the unicode
>> >>>> parsing on the debian target. Looking at the documentation the
>> >>>> maintainer field only supports ASCII characters.
>> >>>>
>> >>>> The debian maintainer field format is defined by RFC822
>> >>>>
>> >>>> https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Maintainer
>> >>>> There's a note about problems with full stop.
>> >>>>
>> >>>> And RFC822 can be seen here
>> >>>> https://www.w3.org/Protocols/rfc822/3_Lexical.html
>> >>>>
>> >>>> I think it's likely that debian is more strict in its interpretation
>> >>>> of the formatting. And the quickest solution will be to remove the
>> >>>> characters from the maintainer name. It could either be patched in the
>> >>>> jessie specific branch in the release repo or, as the simplest
>> >>>> solution, fixed by a rerelease without the characters.
>> >>>>
>> >>>> You could open an issue on
>> >>>> https://github.com/ros-infrastructure/ros_buildfarm and we could look
>> >>>> at sanitizing the characters in that field, but developing the correct
>> >>>> replacement/substitution policy may be a challenge.
>> >>>>
>> >>>> B) Sourcedebs are triggered on a 15 minute cycle since people want
>> >>>> them to come out quickly. We typically will roll back any releases
>> >>>> which fail their sourcedeb jobs since it's rarely a platform specific
>> >>>> issue. And if the sourcedeb is not working we might as well pull it
>> >>>> from the farm.
>> >>>>
>> >>>> Tully
>> >>>>
>> >>>> On Thu, Apr 7, 2016 at 6:27 AM, David Lu!! via ros-release
>> >>>> <ros-release at lists.ros.org> wrote:
>> >>>>>
>> >>>>> I am but a humble navigation code monkey, and am not trained in the
>> >>>>> ways of the build farm. Can someone help me interpret what's going on
>> >>>>> that's causing
>> >>>>> A) A specific build to fail e.g.
>> >>>>>
>> >>>>>
>> >>>>> http://build.ros.org/job/Ksrc_dJ__map_msgs__debian_jessie__source/87/console
>> >>>>> B) So many emails (see attached)
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> ros-release mailing list
>> >>>>> ros-release at lists.ros.org
>> >>>>> http://lists.ros.org/mailman/listinfo/ros-release
>> >>>>>
>> >>>>
>> >>>>
>> >>>
>> >>>
>> >>
>> >
>> >
>
>
>
>
> --
> William Woodall
> ROS Development Team
> william at osrfoundation.org
> http://wjwwood.io/

