[ros-release] Fwd: Torrents of Emails

Dirk Thomas dthomas at osrfoundation.org
Wed Apr 2 16:32:14 UTC 2014

Please keep in mind that any proposed solution needs to be not only
implemented but also be maintained in the future.
E.g. currently we do not run any mail server at all so introducing that to
the build farm would increase complexity and maintenance effort.
This would also increase the need for infrastructure when running a custom
build farm in the future.

If someone has specific proposals for how the buildfarm should suppress error
messages not relevant to the maintainer and would like to contribute a pull
request which integrates that improved notification, that would be highly
appreciated. With Groovy scripts we are already able to retrigger jobs if
known error cases appear in the console output. With a little bit of extra
work it would be possible to suppress those emails. But then again Jenkins
will also send a notification when the job gets back to stable (and
detecting that reliably as well will again require more effort).
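
To make the idea concrete, here is a minimal sketch of the decision such a
script would have to make. It is written in Python purely for illustration
(the existing scripts are Groovy), and the patterns and the function names
`is_infrastructure_failure` and `decide_action` are hypothetical, not
something the buildfarm provides.

```python
import re

# Hypothetical patterns for infrastructure failures. A real list would have
# to be collected from failures actually observed on the farm.
KNOWN_INFRA_ERRORS = [
    re.compile(r"Could not resolve host", re.IGNORECASE),
    re.compile(r"Connection timed out", re.IGNORECASE),
    re.compile(r"Temporary failure in name resolution", re.IGNORECASE),
]


def is_infrastructure_failure(console_output):
    """Return True if any known infrastructure error appears in the log."""
    return any(p.search(console_output) for p in KNOWN_INFRA_ERRORS)


def decide_action(console_output, build_failed):
    """Return 'none', 'retrigger' or 'notify' for a finished build."""
    if not build_failed:
        return "none"
    if is_infrastructure_failure(console_output):
        # Not the maintainer's fault: retry quietly instead of emailing.
        return "retrigger"
    return "notify"
```

Note that this only covers the failure email. The "back to stable"
notification mentioned above would need separate handling, which this sketch
does not attempt.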

Imo we should aim to address the actual problems: e.g. the farm should not
lose network connectivity that frequently. If this only happened once or
twice per year we would not need to change the notification system at all
(since the maintainers could easily ignore the emails in such rare error
conditions). Another approach would be to identify these fatal conditions
earlier (e.g. using Groovy scripts) and pause the farm automatically before
hundreds of jobs are failing for the same reason.
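
A rough sketch of that second approach, again in Python for illustration
only: count failures that share the same error signature inside a sliding
time window and flag the farm as paused once a threshold is reached. The
class name `FarmCircuitBreaker`, the default threshold and the window length
are all assumptions, and actually pausing Jenkins is left out.

```python
from collections import deque


class FarmCircuitBreaker:
    """Flags the farm as paused when one error signature repeats too often."""

    def __init__(self, threshold=20, window_seconds=600):
        self.threshold = threshold            # assumed value, needs tuning
        self.window_seconds = window_seconds  # assumed value, needs tuning
        self.failures = {}                    # signature -> deque of timestamps
        self.paused = False

    def record_failure(self, signature, timestamp):
        """Record one failed job; return True if the farm should be paused."""
        times = self.failures.setdefault(signature, deque())
        times.append(timestamp)
        # Drop failures that have fallen out of the sliding window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) >= self.threshold:
            self.paused = True
        return self.paused
```

The signature could be the matched known error case from the console
output, so that unrelated compile failures in different packages do not add
up to a pause.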

- Dirk

On Wed, Apr 2, 2014 at 4:35 AM, Mike Purvis
<mpurvis at clearpathrobotics.com> wrote:

> Not sure how much work this would be compared with modifying the plugin,
> but putting all the email through an SMTP relay might make it possible to
> filter by text strings, or simply introduce a 1hr send delay across the
> board, plus some rules like "if there's more than 1000 emails queued up,
> assume a system problem and hold/blackhole all of them."
> M.
> On 2 April 2014 03:30, Tully Foote <tfoote at osrfoundation.org> wrote:
>> Preventing email overload is definitely something we work hard to avoid.
>> We know that if there are too many false positives people will simply
>> ignore them.
>> And we shut down the farm as soon as we diagnosed the issue:
>> http://status.ros.org/  Unfortunately when you run a very parallelized
>> system, if there's a systematic failure, such as the code hosting going
>> down, a lot of jobs fail quickly.
>> One thing from Travis testing: Travis distinguishes between build/test
>> errors and configuration errors. There's a ticket open to add this
>> enhancement: https://github.com/ros-infrastructure/buildfarm/issues/116
>> but unfortunately this is not something that Jenkins differentiates, so it
>> will take a lot of doing to make this happen on top. An approach I could
>> see for this is to customize the emailing plugin and be able to pass it
>> flags earlier in the process to confirm that the configuration and setup
>> have completed successfully. And likewise the actual results should be
>> shown the same way too, with the job aborting instead of failing when the
>> configuration/setup phase fails.
>> Tully
>> On Tue, Apr 1, 2014 at 10:22 PM, Dave Coleman <davetcoleman at gmail.com> wrote:
>>> +1!!
>>>  ----------------------------------------------------------------------
>>>> Message: 1
>>>> Date: Tue, 1 Apr 2014 17:45:05 -0500
>>>> From: "David Lu!!" <davidlu at wustl.edu>
>>>> To: ros-release at code.ros.org
>>>> Subject: [ros-release] Fwd: Torrents of Emails
>>>> Message-ID:
>>>>         <CABd+9SqtEdzDXEEhaHT=2V8Xb7B1ofi2Tq8Lo4SvU3b7z=
>>>> ZGBw at mail.gmail.com>
>>>> Content-Type: text/plain; charset="iso-8859-1"
>>>> So I'd like to start a hopefully constructive discussion of the build
>>>> farm's email practices. I've attached a picture of the onslaught my
>>>> inbox just received. I'm fairly certain none of the builds failing is
>>>> my fault. But it does lead me to some questions.
>>>> 1) Are there settings for email that I haven't set to reduce the
>>>> number of emails I get? Or am I automatically subscribed because I'm a
>>>> maintainer?
>>>> 2) Is there a way to condense the emails? A single email telling me
>>>> which packages I maintain failed to build would be fine, but this seems
>>>> a bit much.
>>>> 3) I've whined about this before, but is there some way we can make
>>>> the error messages more legible? I'm sure for people familiar with the
>>>> build farm, the errors may make sense, but as a maintainer, I have no
>>>> idea what I'm supposed to do. If the answer is nothing, why am I
>>>> getting email?
>>>> I realize accomplishing some of these tasks will likely involve
>>>> substantial amounts of work, but I feel it merits discussion
>>>> nonetheless.
>>>> -David
>>>> -------------- next part --------------
>>>> A non-text attachment was scrubbed...
>>>> Name: Screenshot from 2014-04-01 17:30:36.png
>>>> Type: image/png
>>>> Size: 635975 bytes
>>>> Desc: not available
>>>> URL: <http://lists.ros.org/pipermail/ros-release/attachments/20140401/9ae7884d/attachment.png>
>>>> ------------------------------
>>>> _______________________________________________
>>>> ros-release mailing list
>>>> ros-release at code.ros.org
>>>> http://lists.ros.org/mailman/listinfo/ros-release