[ros-users] [Discourse.ros.org] [Packaging and Release Management] Need to sync new release of rqt_topic (Indigo, Jade, Kinetic)

Martin Günther ros.discourse at gmail.com
Mon Mar 6 12:40:52 UTC 2017




[quote="dirk-thomas, post:9, topic:1410"]
It seems that the notification system aiming to let the maintainers as well as the ROS distro maintainer know about the problem worked as expected. It would be great if you could describe your point of view. Is there anything which can be done differently in order to avoid similar problems in the future? Either for the infrastructure, the process, or anything else.
[/quote]

I can't speak for @130s, but something similar has happened to me before: I overlooked a valid buildfarm email because it was lost in the noise. Here's a breakdown of the Jenkins "Build Failed" emails I got over the last 23 days:

* 10 emails on 7 separate days caused by `KeyError: "The cache has no package named 'apt-src'"` (I'm still getting those)
* 1 email caused by `Pulling repository docker.io/osrf/ubuntu_armhf \n Could not reach any registry endpoint`
* 1 email caused by `E: Package 'curl' has no installation candidate`
* 0 emails caused by something I did

Also, it's really hard to spot the cause of the error. The emails are just a wall of text without highlighting, so even scrolling to the bottom takes a while, especially on a phone (which is where I do the initial screening of my email most of the time). If you conscientiously spend a full minute trying to figure out whether it's a real build failure, only to find that it isn't in 90% of cases, you get conditioned to simply ignore the emails (at least that's what happens to me), especially because the only thing you can do to fix it is wait until the buildfarm sorts itself out.

So, what could be done?

* Distinguish between "Failed" (your fault) and "Errored" (the buildfarm's fault) states, like Travis does. The easiest way to get this right 100% of the time is probably to split the job into "setup" and "build" sections. If anything fails during setup, it's "Errored"; if it fails during build, it's "Failed".
* Provide a dashboard website, where you can see the status of all your own jobs at a glance.
* Include the number of unsuccessful builds / days since last successful build in the email subject. If something is continuously failing for two weeks, it might not be a buildfarm fluke, but something I can fix. With the current stream of emails alternating between "Build Failed" and "Build is back to normal", it's hard to see when there's one repo that's consistently failing.
* Try to get the number of false positives as close to 0 as possible! Travis has *never* sent me a bogus "Failed" email. Perhaps one approach would be to try to identify "Errored" states heuristically, using something like "the repo hasn't had any commits since the last successful build" or "this is the first failed build after a successful one". Then retry the job a couple of times *before* sending out the email alert.
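
To make this a bit more concrete, here is a rough Python sketch of how the setup/build split, the retry heuristic and the failure count in the subject line could fit together. Everything in it is hypothetical: `JobResult`, `classify`, `probably_infrastructure`, `run_with_retries` and `email_subject` are names I made up for illustration and are not part of Jenkins or the ROS buildfarm.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobResult:
    """Hypothetical outcome of one job run, split into two phases."""
    setup_ok: bool  # environment preparation (docker pull, apt install, ...)
    build_ok: bool  # the actual build and tests of the package


def classify(result: JobResult) -> str:
    """Setup problems are 'errored' (buildfarm), build problems are 'failed' (maintainer)."""
    if not result.setup_ok:
        return "errored"
    if not result.build_ok:
        return "failed"
    return "passed"


def probably_infrastructure(new_commits: int, previous_failures: int) -> bool:
    """Heuristic from the list above: no new commits, or first failure after a success."""
    return new_commits == 0 or previous_failures == 0


def run_with_retries(run_job: Callable[[], JobResult],
                     new_commits: int,
                     previous_failures: int,
                     max_retries: int = 2) -> str:
    """Retry suspicious non-passing runs before anyone gets an email."""
    state = classify(run_job())
    retries = 0
    while (state != "passed"
           and retries < max_retries
           and (state == "errored"
                or probably_infrastructure(new_commits, previous_failures))):
        state = classify(run_job())
        retries += 1
    return state


def email_subject(state: str, unsuccessful_in_a_row: int) -> Optional[str]:
    """Return None when no alert is needed; otherwise include the failure streak."""
    if state == "passed":
        return None
    return f"Build {state.capitalize()} (unsuccessful builds in a row: {unsuccessful_in_a_row})"
```

With something like this, a job that hits a one-off registry hiccup and passes on the retry would produce no email at all, while a repo that has been red for two weeks would show that in the subject line.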






---
[Visit Topic](https://discourse.ros.org/t/need-to-sync-new-release-of-rqt-topic-indigo-jade-kinetic/1410/12) or reply to this email to respond.



