Please keep in mind that any proposed solution needs to be not only implemented but also maintained in the future.
E.g. we currently do not run any mail server at all, so introducing one to the build farm would increase complexity and maintenance effort.
It would also increase the infrastructure needed to run a custom build farm in the future.

If someone has specific proposals for how the buildfarm should suppress error messages not relevant to the maintainer and would like to contribute a pull request that integrates the improved notification, that would be highly appreciated. With Groovy scripts we are already able to retrigger jobs if known error cases appear in the console output. With a little bit of extra work it would be possible to suppress those emails. But then again Jenkins will also send a notification when the job gets back to stable (and detecting that reliably as well will again require more effort).
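
As a rough illustration of the suppression logic described above, here is a minimal sketch in Python (standing in for a Groovy script). The error patterns and the function names are assumptions made up for this example, not what the buildfarm actually uses. It also shows why the "back to stable" mail needs extra state: it should only go out if the maintainer was actually told about the failure.

```python
import re

# Hypothetical patterns for infrastructure failures; a real list would be
# collected from errors actually seen in the farm's console output.
KNOWN_INFRA_ERRORS = [
    re.compile(r"Could not resolve host", re.IGNORECASE),
    re.compile(r"Connection timed out", re.IGNORECASE),
    re.compile(r"Temporary failure in name resolution", re.IGNORECASE),
]


def is_infrastructure_failure(console_output):
    """Return True if any known infrastructure error appears in the log."""
    return any(p.search(console_output) for p in KNOWN_INFRA_ERRORS)


def decide(succeeded, console_output, maintainer_saw_failure):
    """Decide what to do with one finished build.

    Returns (notify, retrigger, maintainer_saw_failure), where the last
    value is the state to carry over to the next build of the same job.
    """
    if succeeded:
        # Only send "back to stable" if the failure was reported earlier.
        return (maintainer_saw_failure, False, False)
    if is_infrastructure_failure(console_output):
        # Not the maintainer's problem: stay quiet and try again.
        return (False, True, maintainer_saw_failure)
    # A genuine build/test failure: mail the maintainer.
    return (True, False, True)
```

The third return value is the extra effort mentioned above: without remembering whether the last failure was reported, a suppressed failure would still be followed by a confusing "back to stable" mail.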

Imo we should aim to address the actual problems: e.g. the farm should not lose network connectivity that frequently. If this only happened once or twice per year we would not need to change the notification system at all (since the maintainers could easily ignore the emails in such rare error conditions). Another approach would be to identify these fatal conditions earlier (e.g. using Groovy scripts) and pause the farm automatically before hundreds of jobs fail for the same reason.
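
The "pause the farm automatically" idea could look roughly like the following sketch, again in Python rather than Groovy. The class name, the threshold and the time window are all assumptions for illustration, and the step that actually pauses Jenkins is left out.

```python
from collections import Counter, deque


class FarmCircuitBreaker:
    """Trip once too many jobs fail for the same reason within a time window."""

    def __init__(self, threshold=20, window_seconds=600):
        self.threshold = threshold            # failures with one reason before pausing
        self.window_seconds = window_seconds  # how far back failures are counted
        self.failures = deque()               # (timestamp, reason), oldest first
        self.paused = False

    def record_failure(self, timestamp, reason):
        """Record one failed job; return True if the farm should be paused."""
        self.failures.append((timestamp, reason))
        # Forget failures that have fallen out of the time window.
        while self.failures and self.failures[0][0] <= timestamp - self.window_seconds:
            self.failures.popleft()
        counts = Counter(r for _, r in self.failures)
        if counts[reason] >= self.threshold:
            self.paused = True
        return self.paused
```

Counting per reason rather than overall keeps a handful of unrelated build failures from tripping the breaker, while a burst of identical network errors trips it quickly.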

- Dirk


On Wed, Apr 2, 2014 at 4:35 AM, Mike Purvis <mpurvis@clearpathrobotics.com> wrote:
Not sure how much work this would be compared with modifying the plugin, but putting all the email through an SMTP relay might make it possible to filter by text strings, or simply introduce a 1 hr send delay across the board, plus some rules like "if there are more than 1000 emails queued up, assume a system problem and hold/blackhole all of them."
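
To make the relay rules concrete, here is a small in-memory sketch in Python. It is not an SMTP relay; the class and field names are made up for this example, and it only models the three rules mentioned: filter by text string, delay every message, and hold everything once the queue floods.

```python
from dataclasses import dataclass, field


@dataclass
class DelayedMailQueue:
    delay_seconds: int = 3600      # the "1 hr send delay across the board"
    flood_limit: int = 1000        # more queued mails than this means a system problem
    blocked_strings: tuple = ()    # text strings that cause a mail to be dropped
    queue: list = field(default_factory=list)  # (enqueue_time, message)
    held: bool = False

    def enqueue(self, now, message):
        """Queue a message; return False if it was filtered out."""
        if any(s in message for s in self.blocked_strings):
            return False
        self.queue.append((now, message))
        if len(self.queue) > self.flood_limit:
            # Assume a systematic failure and stop all delivery.
            self.held = True
        return True

    def release(self, now):
        """Return the messages whose delay has elapsed, unless the queue is held."""
        if self.held:
            return []
        ready = [m for t, m in self.queue if now - t >= self.delay_seconds]
        self.queue = [(t, m) for t, m in self.queue if now - t < self.delay_seconds]
        return ready
```

The delay is what makes the flood rule useful: because nothing leaves the queue for an hour, a burst of failures can be detected before the first mail is sent.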

M.


On 2 April 2014 03:30, Tully Foote <tfoote@osrfoundation.org> wrote:
Preventing email overload is definitely something we work hard to avoid. We know that if there are too many false positives people will simply ignore them. 

And we shut down the farm as soon as we diagnosed the issue: http://status.ros.org/  Unfortunately, when you run a very parallelized system and there's a systematic failure, such as the code hosting going down, a lot of jobs fail quickly.

One thing from Travis testing: Travis distinguishes between build/test errors and configuration errors. There's a ticket open to add this enhancement, https://github.com/ros-infrastructure/buildfarm/issues/116, but unfortunately this is not something that Jenkins differentiates, so it will take a lot of doing to make this happen on top. An approach I could see for this is to customize the emailing plugin so that flags can be passed to it earlier in the process to confirm that the configuration and setup have completed successfully. And likewise the actual results should be shown the same way too, with the job aborting instead of failing when the configuration/setup phase fails.
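
A minimal sketch of that distinction, in Python. It assumes the job prints a marker line once configuration and setup have finished; the marker text and the function names are hypothetical. The result strings match Jenkins' existing ABORTED, FAILURE and SUCCESS results.

```python
# Hypothetical marker the job would print once setup has completed.
SETUP_DONE_MARKER = "=== setup complete ==="


def classify_result(console_output, exit_code):
    """Map a finished job to a result, separating setup problems from build problems."""
    if exit_code == 0:
        return "SUCCESS"
    if SETUP_DONE_MARKER not in console_output:
        # The job never got past configuration/setup: abort instead of fail.
        return "ABORTED"
    # Setup succeeded, so the failure comes from the build or the tests.
    return "FAILURE"


def should_mail_maintainer(result):
    """Only real build/test failures are the maintainer's business."""
    return result == "FAILURE"
```

With this split, a code hosting outage shows up as a wave of aborted jobs that the farm admins need to look at, not as failures mailed to every maintainer.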

Tully




On Tue, Apr 1, 2014 at 10:22 PM, Dave Coleman <davetcoleman@gmail.com> wrote:
+1!!
 


----------------------------------------------------------------------

Message: 1
Date: Tue, 1 Apr 2014 17:45:05 -0500
From: "David Lu!!" <davidlu@wustl.edu>
To: ros-release@code.ros.org
Subject: [ros-release] Fwd: Torrents of Emails
Message-ID:
        <CABd+9SqtEdzDXEEhaHT=2V8Xb7B1ofi2Tq8Lo4SvU3b7z=ZGBw@mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"


So I'd like to start a hopefully constructive discussion of the build
farm's email practices. I've attached a picture of the onslaught my
inbox just received. I'm fairly certain none of the builds failing is
my fault. But it does lead me to some questions.

1) Are there settings for email that I haven't set to reduce the
number of emails I get? Or am I automatically subscribed because I'm a
maintainer?
2) Is there a way to condense the emails? If I got a single email
telling me which packages I maintain failed to build, that would be
fine, but this seems a bit much.
3) I've whined about this before, but is there some way we can make
the error messages more legible? I'm sure for people familiar with the
build farm, the errors may make sense, but as a maintainer, I have no
idea what I'm supposed to do. If the answer is nothing, why am I
getting email?

I realize accomplishing some of these tasks will likely involve
substantial amounts of work, but I feel it merits discussion
nonetheless.

-David
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Screenshot from 2014-04-01 17:30:36.png
Type: image/png
Size: 635975 bytes
Desc: not available
URL: <http://lists.ros.org/pipermail/ros-release/attachments/20140401/9ae7884d/attachment.png>

------------------------------


_______________________________________________
ros-release mailing list
ros-release@code.ros.org
http://lists.ros.org/mailman/listinfo/ros-release


End of ros-release Digest, Vol 30, Issue 2
******************************************

