[ros-users] Reinventing the wheel: training and evaluation of algorithms for object detection in images

Tue Nov 23 19:11:22 UTC 2010

Wow! That just looks fantastic!

Indeed, eblearn appeared in my review as one of the most interesting
tools (from both a software and a research point of view).
I have to say the API looks very clean and nice, and the step-by-step
tutorials are exactly what I have in mind.
https://wave.google.com/wave/waveref/googlewave.com/w+yItwe-AnA/~/conv+root/b+yItwe-AnC

Of course, the list you give in your email makes it even clearer how
eblearn can be of interest.

I had never heard of visiongrader but indeed it goes exactly in the
direction I had in mind.

I guess there are three main aspects in which I would like to "depart from" eblearn:

- Arbitrary features.
Indeed, as a researcher, I believe learning features from data and tasks
is better than handcrafting them.
However, I think there is value in providing a set of "hard-coded" features:
  1. so users can compare them;
  2. when speed matters, we may want to have our own hardwired,
     GPU-enabled features.

- Multiple learning algorithms side by side
I am aware that energy-based learning is a very generic and powerful
framework (I have already read some papers and watched some of the online
videos on the topic).
However, I am pretty sure that not every machine learning/detection
method fits inside this box (I am thinking, for instance, of Hough voting
and part-based models).
Being able to plug in "arbitrary" detection algorithms will make
it easier for users to compare different methods.

- Meta training inside instead of outside
To my knowledge/understanding, some learning algorithms work in an
"online fashion": one pass over the data and they are ready to go;
here the data is pushed to the learner.
Other algorithms work more in a "dedicated student" mode, where they
need to review some of the harder cases ("review some chapters of the
book"); here the learner needs to be able to make multiple passes over
the data and request more examples.
From your description, this seems to be done "using outside scripts"; I
would like it to be expressed more explicitly inside the
software/framework rather than outside.
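
A minimal Python sketch of these three points, under toy assumptions:
`FeatureExtractor` is a hypothetical plug-in interface behind which learned
or hard-coded features could be swapped and compared, `Perceptron` stands in
for a push-style online learner, and `train_pull` shows the "dedicated
student" loop living inside the framework, re-reviewing only the examples the
learner still gets wrong. None of these names come from eblearn or sponge.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (raw input, label in {-1, +1})


class FeatureExtractor(ABC):
    """Hypothetical plug-in point: learned or hard-coded features share one interface."""

    @abstractmethod
    def __call__(self, raw: Sequence[float]) -> List[float]:
        ...


class RawFeatures(FeatureExtractor):
    """Trivial hard-coded feature: pass the input through unchanged."""

    def __call__(self, raw: Sequence[float]) -> List[float]:
        return list(raw)


class Perceptron:
    """Push-style online learner: the caller pushes one example at a time."""

    def __init__(self, features: FeatureExtractor, dim: int) -> None:
        self.features = features
        self.w = [0.0] * (dim + 1)  # last weight is the bias

    def _x(self, raw: Sequence[float]) -> List[float]:
        return self.features(raw) + [1.0]  # append constant input for the bias

    def score(self, raw: Sequence[float]) -> float:
        return sum(wi * xi for wi, xi in zip(self.w, self._x(raw)))

    def push(self, raw: Sequence[float], label: int) -> None:
        if label * self.score(raw) <= 0:  # update only on mistakes
            self.w = [wi + label * xi for wi, xi in zip(self.w, self._x(raw))]


def train_pull(learner: Perceptron,
               request_examples: Callable[[], List[Example]],
               max_passes: int = 10) -> Perceptron:
    """Pull-style training: the framework owns the loop, requests the data,
    then keeps re-reviewing only the examples the learner still gets wrong."""
    data = request_examples()
    to_review = data
    for _ in range(max_passes):
        for raw, label in to_review:
            learner.push(raw, label)
        to_review = [(r, y) for r, y in data if y * learner.score(r) <= 0]
        if not to_review:
            break
    return learner
```

The push variant is just `for raw, label in data: learner.push(raw, label)`;
the pull variant receives a callable instead of the data, so the learner side
decides how many passes to make and which examples to revisit.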

All in all, eblearn + visiongrader seem like great tools, with a nice
and well-thought-out API.
I will definitely look in more detail at their internals (I had already
played around with eblearn a few months ago) and use/re-use as
much of the existing code base as possible.

From the response of the community, it seems that there is agreement on:
- There is a problem to be solved
- It would be nice to have a "one-stop shop"

I have already thought about this issue in the past, and I think I have
a decent grasp of the issues.
I will give creating such an open source tool a shot.
I have decided to name it "sponge" (because "it learns like a sponge")
and it will be hosted on GitHub.

https://github.com/rodrigob/sponge

Obviously, right now the project is empty.
I will be working on the design on the GitHub wiki (and the Google Wave
linked there), and hopefully start pushing code in the coming weeks.

Let us hope I get to create code that will be of use to other people.
Best regards,
rodrigob.

On Tue, Nov 23, 2010 at 5:16 PM, Pierre S <pierre.sermanet at gmail.com> wrote:
> yes, here is my reply:
> Hi Rodrigo,
>
> I maintain the eblearn and visiongrader open-source projects and they might
> correspond to what you need. I'll just enumerate some aspects that seem to
> fit your description:
>
> eblearn (http://eblearn.sourceforge.net/)
> - training by minimizing energies (rather than probabilities) is quite
> general and can be applied to lots of tasks (cf. Tutorial on Energy-Based
> Learning, LeCun et al. 2006)
> - it is currently mostly focused on classification/detection, but the design
> is based on generic modules so that anybody can write their own modules and
> use the framework for a new task
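
For readers new to energy-based learning: in the LeCun et al. 2006 tutorial,
inference selects the answer with the lowest energy, Y* = argmin over Y of
E(W, Y, X), and the energies do not need to be normalized. Probabilities are
only needed if you ask for them, via a Gibbs distribution. A minimal Python
sketch; the labels and energy values used with it are made up, and the
function names are hypothetical:

```python
import math
from typing import Dict


def infer(energies: Dict[str, float]) -> str:
    """Energy-based inference: the answer is the label with the lowest energy."""
    return min(energies, key=energies.get)


def gibbs(energies: Dict[str, float], beta: float = 1.0) -> Dict[str, float]:
    """Optional conversion to probabilities (Gibbs distribution):
    P(y|x) = exp(-beta * E(y)) / sum over y' of exp(-beta * E(y'))."""
    e_min = min(energies.values())  # shifting all energies leaves P unchanged
    weights = {y: math.exp(-beta * (e - e_min)) for y, e in energies.items()}
    z = sum(weights.values())  # partition function
    return {y: w / z for y, w in weights.items()}
```

With energies {"pedestrian": 0.3, "background": 1.7}, `infer` returns
"pedestrian", and `gibbs` assigns it the higher probability.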
>
> - modules are usually connected in series but can also be branched and merged
> to create more complex machines
> - one can easily create a new network of machines via configuration files by
> defining the list of modules the machine contains, e.g. pre-processing,
> convolutions, pooling, etc., then feed it to the train executable and then to
> the detection executable, e.g.:
> http://eblearn.svn.sourceforge.net/viewvc/eblearn/trunk/tools/demos/pedestrians/inria/inria_meta.conf?revision=1391&view=markup
> - so currently, training to recognize a new class of images involves taking
> the existing scripts in demos/pedestrians/inria, for example, modifying them a
> bit and feeding them to the training and detection programs.
> - I am considering adding a graphical tool on top to build and connect
> modules graphically, in series or in parallel, to make things even easier.
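
To illustrate modules connected in series versus branched and merged, here is
a toy Python sketch on 1-D signals. It is not eblearn's C++ module API or its
.conf syntax; `serial`, `branch_and_merge`, `conv` and `max_pool2` are
hypothetical names chosen for the example.

```python
from typing import Callable, List, Sequence

Module = Callable[[List[float]], List[float]]


def serial(*modules: Module) -> Module:
    """Chain modules: the output of each one is the input of the next."""
    def run(x: List[float]) -> List[float]:
        for module in modules:
            x = module(x)
        return x
    return run


def branch_and_merge(*branches: Module) -> Module:
    """Feed the same input to every branch and concatenate their outputs."""
    def run(x: List[float]) -> List[float]:
        merged: List[float] = []
        for branch in branches:
            merged.extend(branch(x))
        return merged
    return run


def conv(kernel: Sequence[float]) -> Module:
    """1-D 'valid' cross-correlation (what most CNN code calls convolution)."""
    def run(x: List[float]) -> List[float]:
        n = len(x) - len(kernel) + 1
        return [sum(k * x[i + j] for j, k in enumerate(kernel)) for i in range(n)]
    return run


def max_pool2(x: List[float]) -> List[float]:
    """Non-overlapping max pooling of width 2 (a trailing odd element is kept)."""
    return [max(x[i:i + 2]) for i in range(0, len(x), 2)]
```

For example, `serial(branch_and_merge(conv([1.0, 1.0]), max_pool2), max_pool2)`
applied to `[1.0, 2.0, 3.0, 4.0]` yields `[5.0, 7.0, 4.0]`.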
>
> - the metarun executable is a very handy tool that takes a configuration
> file as input, generates all possible configurations if a variable has
> multiple values (e.g. multiple learning rates), runs all of them (in
> parallel or sequentially) and sends you the results by email (ranking and
> plotting the best solutions). This makes life much easier when you have to
> try several configurations and compare them.
> - similarly, the detection executable is multi-threaded, so that if you
> allow multiple cores, it will process one image on each core.
> - the library can be accelerated with other libraries such as IPP or
> OpenMP.
> - there is also a cluster version of the detector using MPI
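
The configuration expansion that metarun performs can be sketched as a
Cartesian product over the multi-valued variables. This Python sketch assumes
a hypothetical convention (a list value means "try each of these") rather than
metarun's actual file format; `evaluate` is a caller-supplied stand-in for a
full training run, and email reporting is left out.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Tuple

Config = Dict[str, Any]


def expand(config: Config) -> List[Config]:
    """Expand every list-valued variable into all combinations
    (hypothetical convention: a list means "try each of these values")."""
    keys = list(config)
    choices = [v if isinstance(v, list) else [v] for v in config.values()]
    return [dict(zip(keys, combo)) for combo in itertools.product(*choices)]


def run_all(configs: List[Config],
            evaluate: Callable[[Config], float],
            workers: int = 1) -> List[Tuple[Config, float]]:
    """Run every configuration (workers=1 is sequential), best score first."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(evaluate, configs))  # map keeps input order
    return sorted(zip(configs, scores), key=lambda pair: pair[1], reverse=True)
```

Expanding {"eta": [0.01, 0.001], "kernels": [8, 16, 32], "epochs": 20} yields
6 configurations. Threads suit runs that launch an external executable; for
CPU-bound Python code, `ProcessPoolExecutor` is the usual swap.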
>
> - the core libraries are self-contained, don't require any external
> third-party libraries, and are clearly separated from the helper libraries
> such as GUIs and dataset creation tools.
> - eblearn is therefore available on Android using only the lightweight core
> minimum (even stripped of the STL). The Android version currently lacks
> fixed-point arithmetic for speed, but is usable.
> - the GUI is simple to use; here is a small example:
> http://eblearn.sourceforge.net/demos/simple/index.shtml
> - eblearn contains a set of tools to compile and preprocess training data
> easily from an image directory. It can also read PASCAL VOC XML.
> - the matrix data format (.mat) is fully compatible with Lush (an open
> source scripted/compiled framework)
> - therefore eblearn is natively compatible with the MNIST and NORB datasets
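
Reading PASCAL VOC annotations needs only the standard library. A minimal
Python sketch (not eblearn's reader; `parse_voc` is a hypothetical name) that
returns the class name and bounding box of each `<object>` in one annotation
file, optionally skipping objects flagged `difficult`:

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

Box = Tuple[str, int, int, int, int]  # (class name, xmin, ymin, xmax, ymax)


def parse_voc(xml_text: str, skip_difficult: bool = False) -> List[Box]:
    """Read the object boxes of one PASCAL VOC annotation file."""
    root = ET.fromstring(xml_text)
    boxes: List[Box] = []
    for obj in root.findall("object"):
        # A missing <difficult> tag is treated as "not difficult".
        if skip_difficult and obj.findtext("difficult", "0").strip() == "1":
            continue
        bndbox = obj.find("bndbox")
        # Some annotation files store coordinates as floats, so parse via float.
        xmin, ymin, xmax, ymax = (int(float(bndbox.findtext(tag)))
                                  for tag in ("xmin", "ymin", "xmax", "ymax"))
        boxes.append((obj.findtext("name"), xmin, ymin, xmax, ymax))
    return boxes
```

This shape (one annotation in, a list of labelled boxes out) is roughly what a
per-dataset parser for an evaluation tool has to provide.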
>
> - unsupervised training is being ported right now and boosts performance (I
> obtain state-of-the-art results on INRIA pedestrians)
> - the advantage of this framework over traditional hand-crafted features is
> that features can be learned and optimized for a given task
> - it wasn't really designed to contain any hand-crafted features, but you
> are welcome to add your own modules and plug them into the classifiers, or
> even feed them to a second layer of features to be learned. Adding SIFT, for
> example, would just be a matter of writing a SIFT module.
>
> - eblearn is written in C++ and contains some shell and Python scripts for
> meta training
> - some scripts are available to automate bootstrapping and perform several
> passes of training / false positive extraction / data compilation
> - adding a Python interface to eblearn might be a good idea for some people
> - eblearn needs more tutorials right now; I will address this in the
> near future.
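
The bootstrapping passes mentioned above (train, extract false positives,
recompile the data, retrain) follow a loop like this minimal Python sketch.
`train` and `predict` are hypothetical caller-supplied functions, and
`negative_pool` plays the role of images known to contain no object, so
anything the detector fires on there is a false positive:

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Sample = TypeVar("Sample")
Model = TypeVar("Model")


def bootstrap(train: Callable[[Sequence[Sample], Sequence[Sample]], Model],
              predict: Callable[[Model, Sample], bool],
              positives: Sequence[Sample],
              negatives: Sequence[Sample],
              negative_pool: Sequence[Sample],
              max_rounds: int = 5) -> Tuple[Model, List[Sample]]:
    """Train, collect false positives from negative-only data, add them to
    the training set and retrain; stop when no new false positives appear."""
    negatives = list(negatives)
    model = train(positives, negatives)
    for _ in range(max_rounds):
        # Hard negatives: pool samples the current model wrongly accepts.
        hard = [s for s in negative_pool if predict(model, s) and s not in negatives]
        if not hard:
            break
        negatives.extend(hard)
        model = train(positives, negatives)
    return model, negatives
```

`max_rounds` caps the number of passes in case the false positives never run
out.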
>
> visiongrader (http://visiongrader.sourceforge.net/)
>
> - written in Python
> - designed to be a generic evaluation tool
> - one only needs to write a parser for a new dataset; so far it has been
> used to parse the INRIA pedestrian and Caltech pedestrian datasets
> - multiple curves are currently available: DET and ROC
> - multiple matching criteria can be implemented; it currently contains the
> 50% overlap criterion.
> - a visualizer shows bounding boxes overlaid on images and which ones are
> matched or not, and a slider bar lets you vary the detection threshold
> - the visualizer also lets you select and save new bounding boxes in
> different formats (work in progress)
> - it is still a bit rough but usable
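
A minimal Python sketch of overlap-based matching and of one point on a DET
curve (miss rate versus false positives per image). It assumes "50% overlap"
means intersection-over-union of at least 0.5 and uses greedy matching by
decreasing score; visiongrader's exact definitions may differ, and all
function names are hypothetical:

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1
Detection = Tuple[Box, float]            # (box, confidence score)


def area(r: Box) -> float:
    return (r[2] - r[0]) * (r[3] - r[1])


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def match_image(detections: Sequence[Detection], ground_truth: Sequence[Box],
                min_overlap: float = 0.5) -> Tuple[int, int, int]:
    """Greedy matching, highest score first; each ground-truth box can be
    matched once. Returns (true positives, false positives, missed)."""
    matched = [False] * len(ground_truth)
    tp = fp = 0
    for box, _score in sorted(detections, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truth):
            overlap = iou(box, gt)
            if not matched[j] and overlap > best:
                best, best_j = overlap, j
        if best_j >= 0 and best >= min_overlap:
            matched[best_j] = True
            tp += 1
        else:
            fp += 1  # no free ground-truth box overlaps enough (or a duplicate)
    return tp, fp, matched.count(False)


def det_point(images: Sequence[Tuple[Sequence[Detection], Sequence[Box]]],
              threshold: float, min_overlap: float = 0.5) -> Tuple[float, float]:
    """One DET point: (miss rate, false positives per image), keeping only
    detections whose score is >= threshold."""
    tp_total = fp_total = gt_total = 0
    for detections, ground_truth in images:
        kept = [d for d in detections if d[1] >= threshold]
        tp, fp, _missed = match_image(kept, ground_truth, min_overlap)
        tp_total += tp
        fp_total += fp
        gt_total += len(ground_truth)
    miss_rate = 1.0 - tp_total / gt_total if gt_total else 0.0
    return miss_rate, fp_total / len(images)
```

Sweeping `threshold` over the detection scores traces the full DET curve, one
point per threshold, much like moving the visualizer's threshold slider.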
>
> Pierre
> On Tue, Nov 23, 2010 at 11:11 AM, Rodrigo Benenson
> <rodrigo.benenson at gmail.com> wrote:
>>
>> It takes several hours before it gets posted online (every message is
>> inspected manually).
>> Perhaps you could copy and paste it into a direct email so I can see the
>> message today?
>>
>> Thank you very much for your reply.
>> Best regards,
>> rodrigob.
>>
>> On Tue, Nov 23, 2010 at 2:16 PM, Pierre S <pierre.sermanet at gmail.com>
>> wrote:
>> > I replied on yahoogroups but the message does not show up; maybe it
>> > takes a little time. Let me know if you receive it.
>> >
>> > On Tue, Nov 23, 2010 at 6:01 AM, Pierre S <pierre.sermanet at gmail.com>
>> > wrote:
>> >>
>> >> Hi Rodrigo,
>> >> Indeed, it's a good idea; I will reply on the list.
>> >> Thanks.
>> >> Pierre
>> >>
>> >> On Mon, Nov 22, 2010 at 6:13 AM, Rodrigo Benenson
>> >> <rodrigo.benenson at gmail.com> wrote:
>> >>>
>> >> >>> Sorry for the double email.
>> >> >>> As the main contributor to EbLearn, I think you have a say on this
>> >> >>> question (currently active on the OpenCv-users and Ros-users
>> >> >>> mailing lists).
>> >> >>>
>> >> >>> I would be very happy if you could take part in the discussion,
>> >> >>> give your opinion and share your experience on the subject.
>> >> >>>
>> >> >>> I consider EbLearn a "very good example" of what I have in mind,
>> >> >>> but with support for more datasets and for more different
>> >> >>> algorithms.
>> >> >>>
>> >> >>> Best regards,
>> >> >>> rodrigo benenson phd.
>> >>>
>> >>> On Mon, Nov 22, 2010 at 12:00 PM, Rodrigo Benenson
>> >>> <rodrigo.benenson at gmail.com> wrote:
>> >>> > ---------- Forwarded message ----------
>> >>> > To: OpenCV at yahoogroups.com
>> >>> >
>> >>> >
>> >>> > Hello all.
>> >>> >
>> >> >>> > I'm contacting you because I am considering starting a new open
>> >> >>> > source project to solve a specific problem: training and
>> >> >>> > evaluating object detection algorithms.
>> >>> >
>> >>> > Hundreds of students have been there before: "I want to create a
>> >>> > program that detects objects in images".
>> >> >>> > They choose a dataset for training (e.g. INRIA pedestrians), a
>> >> >>> > feature descriptor (e.g. HOG), a machine learning method (e.g.
>> >> >>> > linear SVM), and then write the code to put it all together.
>> >>> >
>> >>> > In the best case they will take bits and pieces from multiple places
>> >>> > and spend a few weeks on the glue code. In the worst case they will
>> >>> > spend months reimplementing existing methods.
>> >>> >
>> >>> > It is time to stop the madness.
>> >> >>> > Training object detectors for images is a specific and
>> >> >>> > well-defined problem.
>> >> >>> > It is time to share our effort and build a reference open source
>> >> >>> > tool to solve this common problem.
>> >> >>> > We should have an open source tool that provides all the common
>> >> >>> > bits and the glue, and allows us to focus on what we really care
>> >> >>> > about: the algorithms.
>> >>> >
>> >> >>> > In some sense OpenCv 2.2 helps a lot with the task; however,
>> >> >>> > OpenCv is meant to be a generic library, not a specific application
>> >> >>> > framework. In that sense it will never provide the desired
>> >> >>> > "install, run, see the graphs coming out" experience.
>> >>> >
>> >> >>> > ROS.org also helps a lot with the task, by providing a generic
>> >> >>> > framework to create and exchange software modules, along with
>> >> >>> > standard tools for message passing, data storage and exploration.
>> >> >>> > However, this framework by itself has a non-negligible learning
>> >> >>> > curve and it is unfamiliar to anyone outside the robotics
>> >> >>> > community.
>> >>> >
>> >> >>> > I currently have my own idea of how things could be. However,
>> >> >>> > before creating "yet another framework", I would like to have your
>> >> >>> > input on the topic.
>> >>> >
>> >>> > I have created a short form to collect your opinions. I would be
>> >>> > very
>> >>> > glad if you could help me go in the right direction by giving your
>> >>> > input.
>> >>> >
>> >>> >
>> >>> >
>> >>> > https://spreadsheets.google.com/viewform?formkey=dFFzaDlLM1liVGNOS2FENnhrc1VWckE6MQ
>> >>> >
>> >>> > The form is anonymous and the results are public.
>> >>> >
>> >> >>> > Based on your opinions and ideas, I will do my best to move
>> >> >>> > forward with a usable open source solution.
>> >>> > Further information will be posted at
>> >>> > https://wave.google.com/wave/waveref/googlewave.com/w+yH-HOCb6H
>> >>> >
>> >>> > Best regards,
>> >>> > rodrigo benenson phd.
>> >>> >
>> >> >>> > PS: If you are interested, do not hesitate to send a message. You
>> >> >>> > can contact me via GitHub as "rodrigob".
>> >>> >
>> >>
>> >
>> >
>
>