Thanks William. I can definitely relate to how difficult it is to cover all cases. I went through a lot of that when trying to make ros_crawl into some sort of usable frontend to the build farm (http://metrorobots.com/ros_crawl/users.html#vincent_rabaud_gmail_com). Unfortunately, ros_crawl was only out for two weeks before the build farm changed and stopped generating the data I was using to generate it.

On Thu, Apr 7, 2016 at 4:33 PM, William Woodall wrote:
> On Thu, Apr 7, 2016 at 12:42 PM, David Lu!! via ros-release wrote:
>>
>> I apologize for using the pejorative term spam. I was using it to refer to the large quantities of unwanted and arguably unnecessary emails. My ideal build-farm related emails would have the following qualities. (This is an ideal, not a list of demands.)
>
> It's definitely a well-founded criticism that there are lots of emails from the build farm that either don't convey information effectively (bad UX) or seem redundant or unnecessary.
>
>> 1) One email per problem.
>>    * If my code doesn't compile on any machine, it should email me once, even if it means that the build fails for multiple machines.
>>    * If I haven't changed anything, I don't need another email 15 minutes later. Maybe if I haven't fixed it in N days, send another email.
>
> This would be really hard to do since it would imply that the email sending code can tell which problems are the same. For example, consider the same package fails on Trusty and on Vivid, but each for a different reason. Should it have sent you a single email (potentially hiding one of the issues)? How can it reliably know? I think it's better just to send both at the risk of being redundant.
>
> Even well designed and developed services like travis-ci and drone.io don't have this ability. AFAIK, they will send you a failure email for each item in the matrix that fails.
>
> It would be nice if we could group these failures and send a single email per package, even if there are multiple failures, but because of the asynchronous nature of the buildfarm you'd have to wait for all jobs to finish to let someone know about any failures. So you're trading off responsiveness for conciseness, which isn't a clear thing to choose in my opinion. So you could argue that there should be a setting. And that's all well and good until you realize how complicated implementing that is within Jenkins. :)
>
> If a package is failing you'll only get an email once per day per platform (except in rare cases, like a bad source deb has been released), at least in my experience. I suppose those could be further throttled to once per week, but sorting out the one for all platforms is problematic for the reasons I described above.
>
> I sympathize with what you're saying, but unfortunately I don't see that there are simple solutions to be had.
>
>> 2) Clear Messages
>>    * One email subject was "Build failed in Jenkins: Ksrc_dJ__map_msgs__debian_jessie__source #96". As someone who has lots of experience breaking the build, I can kinda parse this out, but there's lots of garbage in there. Something like "[jenkins] map_msgs source failed to build on jade (debian) #96" reads nicer.
>
> Yeah, I agree that the UX of the emails from the build farm is not great. Fortunately Jenkins seems to be infinitely flexible, but unfortunately only by writing Groovy scripts using a poorly documented API :). We sunk a lot of time on the build farm and I believe we actually improved a lot of stuff, but we didn't get to this part.
>
>>    * At the very bottom of the email and log is the relevant error message, but it's never easy for me to pick out what the actual problem is on these, whether it was something I did, something OSRF changed on the build farm, or some momentary github hiccup.
>>
>> 3) Relevant to me
>>    * I'm unsure whether this is still an issue in the new build farm, but I very rarely care if upstream or downstream packages momentarily break.
>
> If you think it's hard for you to know when the problem is yours or the system's, imagine writing a program to determine that. It's not easy. The new build farm does lots of things to try to eat temporary internet problems and try again without mailing you, for example:
> https://github.com/ros-infrastructure/ros_buildfarm/blob/19dfc96e3e4924dfd633c8a24fe994c3a1df43e9/scripts/wrapper/apt-get.py#L101-L104
>
> But it's really hard to catch all of them. I get every single failure email for Jade on the new farm, and think the new farm sends far fewer false positives than the old one thanks to Dirk and Tully's persistence trying to squash false positives when they see them. So if you get an email and you think it's a false positive please look to see if there's a ticket for it on ros-infrastructure/ros_buildfarm, and if there's not then make one or email here. Otherwise we'll never catch all of them. I won't promise we can address every one you report to us, but we will try as we have time.
>
> I think our best chance to reduce the confusion on where the problem lies is to let the build farm stabilize (since the new rewrite) and asymptotically approach zero false positives over time by fixing false positives when we can. And that's what we're doing at the moment which will hopefully minimize the "something OSRF changed on the build farm" and the "some momentary github hiccup" issues. Sorting out the reason for the rest of the actual failures can only be solved by knowledge I think. Maybe someone should start an FAQ or a build farm troubleshooting guide where people can jot down common issues that are actual issues and not false positive failures.
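To make the "eat temporary problems and try again" idea concrete, here is a rough sketch. It is not the code from the linked apt-get.py wrapper: the function name, the list of error patterns and the retry count are all made up for illustration, and the command is passed in as a callable so the logic can be tried without a network.

```python
import time

# Hypothetical substrings that mark a failure as transient (an assumption,
# not the list the build farm actually uses).
TRANSIENT_PATTERNS = (
    "Temporary failure resolving",
    "Hash Sum mismatch",
    "Connection timed out",
)


def run_with_retries(run_once, max_tries=3, delay_seconds=0.0):
    """Call run_once() until it succeeds or fails for a non-transient reason.

    run_once must return (return_code, output_text).
    Returns (return_code, output_text, tries_used) for the last attempt.
    """
    for attempt in range(1, max_tries + 1):
        return_code, output = run_once()
        if return_code == 0:
            return return_code, output, attempt
        is_transient = any(p in output for p in TRANSIENT_PATTERNS)
        if not is_transient or attempt == max_tries:
            # A real failure, or out of retries: report it to the caller.
            return return_code, output, attempt
        time.sleep(delay_seconds)


# Fake command that fails once with a network error, then succeeds.
attempts = iter([
    (100, "Temporary failure resolving 'archive.example.org'"),
    (0, "done"),
])
retried = run_with_retries(lambda: next(attempts))

# Fake command that fails for a reason no retry will fix.
real_failure = run_with_retries(lambda: (1, "fatal error: foo.h: No such file"))
print(retried, real_failure)
```

Only the second case would need to produce an email. The hard part described above is keeping the pattern list complete, which this sketch does nothing to solve.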
>
>> I'm sure none of these is easy to implement, and would require lots of intelligence programmed in that doesn't exist yet. However, I occasionally consider forwarding all my build farm emails to another account which sends me summaries of them just to cut down.
>
> You're totally right, and just to be clear I'm not disagreeing with you in this email, I'm just trying to spread some more details for those who are interested :).
>
>> Thanks for fixing the problem and for the offer of future help.
>>
>> Sincerely,
>> David!!
>>
>> On Thu, Apr 7, 2016 at 2:57 PM, Dirk Thomas via ros-release wrote:
>> > Jackie and I came up with a fix for the Debian jobs: https://github.com/ros-infrastructure/ros_buildfarm/pull/281 It fixes the locale on Debian and afterwards is able to parse and decode the manifest correctly. The job in question has passed now.
>> >
>> > The emails are sent for a reason and not really "spam". They are notifying you about a problem with a package you maintain. Please feel free to ask questions on this mailing list if a job is "bugging" you with emails and you are not sure how to address it. That is always better than ignoring the notifications.
>> >
>> > Thanks,
>> > - Dirk
>> >
>> > On Thu, Apr 7, 2016 at 10:55 AM, Dirk Thomas wrote:
>> >>
>> >> Before drawing any conclusions please take a close look at the actual error message:
>> >>
>> >> ```
>> >> Traceback (most recent call last):
>> >>   File "/tmp/ros_buildfarm/scripts/release/get_sources.py", line 34, in <module>
>> >>     sys.exit(main())
>> >>   File "/tmp/ros_buildfarm/scripts/release/get_sources.py", line 30, in main
>> >>     args.os_name, args.os_code_name, args.source_dir)
>> >>   File "/tmp/ros_buildfarm/ros_buildfarm/sourcedeb_job.py", line 52, in get_sources
>> >>     pkg = parse_package(sources_dir)
>> >>   File "/usr/lib/python3/dist-packages/catkin_pkg/package.py", line 370, in parse_package
>> >>     return parse_package_string(f.read(), filename, warnings=warnings)
>> >>   File "/usr/lib/python3.4/encodings/ascii.py", line 26, in decode
>> >>     return codecs.ascii_decode(input, self.errors)[0]
>> >> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 183: ordinal not in range(128)
>> >> ```
>> >>
>> >> Decoding a non-ASCII character when the codec is set to `ascii` can't work. Since `!` is a valid ASCII character that can't be the reason. So it is likely the `é`.
>> >>
>> >> What needs to change here? Well, the first important fact to note is that the sourcedeb jobs for the same package pass on all Ubuntu platforms. So obviously the buildfarm as well as catkin_pkg are able to handle this.
>> >>
>> >> Why is it failing for this Debian job then? Debian has just recently been added and there must be something different. Looking at the console output again the following warnings just a few lines above should be enough:
>> >>
>> >> ```
>> >> perl: warning: Setting locale failed.
>> >> perl: warning: Please check that your locale settings:
>> >>     LANGUAGE = (unset),
>> >>     LC_ALL = (unset),
>> >>     LANG = "en_US.UTF-8"
>> >>     are supported and installed on your system.
>> >> perl: warning: Falling back to the standard locale ("C").
>> >> perl: warning: Setting locale failed.
>> >> perl: warning: Please check that your locale settings:
>> >>     LANGUAGE = (unset),
>> >>     LC_ALL = (unset),
>> >>     LANG = "en_US.UTF-8"
>> >>     are supported and installed on your system.
>> >> perl: warning: Falling back to the standard locale ("C").
>> >> ```
>> >>
>> >> While the job tries to use UTF-8 it fails to do so on Debian and falls back to the standard locale "C". And with that locale it is simply impossible to decode the special character.
>> >>
>> >> As a conclusion the recent modification to enable Debian jobs (https://github.com/ros-infrastructure/ros_buildfarm/commit/5106e704ecdb2d55ac513da66ec8d699731bb859#diff-8859c2a0d6adc3dc403698f904632818) needs fixing since it is not enabling the UTF-8 locale as expected.
>> >>
>> >> Cheers,
>> >> - Dirk
>> >>
>> >> On Thu, Apr 7, 2016 at 10:07 AM, Jackie Kay via ros-release wrote:
>> >>>
>> >>> I also find the build farm spam quite annoying. I get the same notification failure whenever anything on Kinetic breaks. :)
>> >>>
>> >>> Glancing at the package.xml for map_msgs, could it be that the accented "é" in the author field (Stéphane Magnenat) is breaking unicode parsing, not the exclamation points?
>> >>>
>> >>> https://github.com/ros-planning/navigation_msgs/blob/jade-devel/map_msgs/package.xml
>> >>>
>> >>> I will look into sanitizing the characters so that we can support package maintainers and authors with unicode characters in their names.
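The failure Dirk describes can be reproduced in a few lines. This is a minimal sketch, not the catkin_pkg code: the byte string is a stand-in for the author entry in package.xml, the helper name is made up, and the byte offset differs from the 183 in the real traceback because the stand-in is much shorter than the real file.

```python
# Stand-in for the author entry in the package.xml, encoded as UTF-8.
# The "é" becomes two bytes, 0xC3 0xA9; everything else is plain ASCII.
raw = "Stéphane Magnenat".encode("utf-8")


def try_decode(data, codec):
    """Return (text, None) on success or (None, error message) on failure."""
    try:
        return data.decode(codec), None
    except UnicodeDecodeError as exc:
        return None, str(exc)


# What effectively happens once the locale has fallen back to "C"
# and Python 3.4 picks ASCII as the default text encoding.
ascii_text, ascii_error = try_decode(raw, "ascii")
print(ascii_error)

# Decoding the same bytes as UTF-8 succeeds.
utf8_text, utf8_error = try_decode(raw, "utf-8")
print(utf8_text)
```

The first print shows the same "'ascii' codec can't decode byte 0xc3" message as the job log. The exclamation points never come into it, since `!` is a single ASCII byte.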
>> >>>
>> >>> On Thu, Apr 7, 2016 at 9:32 AM, Tully Foote via ros-release wrote:
>> >>>>
>> >>>> I think the exclamation points in your name are breaking the unicode parsing on the debian target. Looking at the documentation the maintainer field only supports ASCII characters.
>> >>>>
>> >>>> The debian maintainer field format is defined by RFC822
>> >>>> https://www.debian.org/doc/debian-policy/ch-controlfields.html#s-f-Maintainer
>> >>>> There's a note about problems with full stop.
>> >>>>
>> >>>> And RFC822 can be seen here
>> >>>> https://www.w3.org/Protocols/rfc822/3_Lexical.html
>> >>>>
>> >>>> I think it's likely that debian is more strict in its interpretation of the formatting. And the quickest solution will be to remove the characters from the maintainer name. It could either be patched in the jessie specific branch in the release repo or, simplest of all, rereleased without the characters.
>> >>>>
>> >>>> You could open an issue on https://github.com/ros-infrastructure/ros_buildfarm and we could look at sanitizing the characters in that field, but developing the correct replacement/substitution policy may be a challenge.
>> >>>>
>> >>>> B) Sourcedebs are triggered on a 15 minute cycle since people want them to come out quickly. We typically will roll back any releases which fail their sourcedeb jobs since it's rarely a platform specific issue. And if the sourcedeb is not working we might as well pull it from the farm.
>> >>>>
>> >>>> Tully
>> >>>>
>> >>>> On Thu, Apr 7, 2016 at 6:27 AM, David Lu!! via ros-release wrote:
>> >>>>>
>> >>>>> I am but a humble navigation code monkey, and am not trained in the ways of the build farm.
>> >>>>> Can someone help me interpret what's going on that's causing
>> >>>>> A) A specific build to fail e.g. http://build.ros.org/job/Ksrc_dJ__map_msgs__debian_jessie__source/87/console
>> >>>>> B) So many emails (see attached)
>> >>>>>
>> >>>>> _______________________________________________
>> >>>>> ros-release mailing list
>> >>>>> ros-release@lists.ros.org
>> >>>>> http://lists.ros.org/mailman/listinfo/ros-release
>
> --
> William Woodall
> ROS Development Team
> william@osrfoundation.org
> http://wjwwood.io/
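On the "clear messages" wish from earlier in the thread: a small sketch of turning a job name like the one in the console URL from the original message into a more readable subject. It assumes the name always splits on double underscores into prefix, package, platform and job type, which is inferred from this single example only. The function name is made up, and the prefix is left untranslated because mapping it to a distro name would be a guess.

```python
def readable_subject(job_name, build_number):
    """Build a friendlier email subject from a Jenkins job name.

    Assumes the layout '<prefix>__<package>__<os>_<codename>__<type>',
    which is a guess based on one job name.
    """
    parts = job_name.split("__")
    if len(parts) != 4:
        # Unknown layout: fall back to the raw name rather than guess.
        return "[jenkins] {} failed #{}".format(job_name, build_number)
    _prefix, package, platform, job_type = parts
    os_name, _, os_code_name = platform.partition("_")
    return "[jenkins] {} {} failed to build on {} {} #{}".format(
        package, job_type, os_name, os_code_name, build_number)


subject = readable_subject("Ksrc_dJ__map_msgs__debian_jessie__source", 96)
print(subject)  # [jenkins] map_msgs source failed to build on debian jessie #96
```

Package names with single underscores such as map_msgs survive because only the double underscore is treated as a separator.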