Sat May 13 09:57:43 UTC 2017

First off: Thanks for this proposal! A standardized set of vision messages has been sorely missing for years.

Regarding [Detection3D](https://github.com/Kukanani/vision_msgs_proposal/blob/28acc935ddf6ef887fd5b3f5999cd7e14e8ee7e8/msg/Detection3D.msg):

I strongly believe we need a separate Pose for each object hypothesis. When meshes are used to represent the object classes, the Pose specifies the pose of the mesh's reference frame in the `Detection3D.header.frame_id` frame. For example, the reference frame of the following mesh is at the bottom of the mug and in the center of the mug's round part, not at the center of the mesh's bounding box:

<img src="/uploads/ros/original/1X/f102dd3bd238347f63c920557e3baa161ee6da6e.png" width="500" height="500">

Without a Pose for each object class, we cannot express "this object could be either a laptop in its usual orientation, or a book lying flat (i.e., rotated by 90° if your mesh is of a book standing upright)".

My proposal would be to either include an array of Poses in a 3D-specific `CategoryDistribution` message, or (since we'd then have 3 arrays that must be the same size) use an array of `ObjectHypothesis` messages (or whatever we want to call it), each with one `id`, `score` and `Pose`.
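
To make the `ObjectHypothesis` idea a bit more concrete, here is a rough sketch in plain Python (not actual ROS message definitions). The `Pose` class is only a stand-in for `geometry_msgs/Pose`, the `hypotheses` field name, the `best_hypothesis` helper, the integer `id` type and the example values are all my own assumptions for illustration, and the 90° rotation is arbitrarily taken about the x axis:

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Pose:
    # Stand-in for geometry_msgs/Pose (assumption: plain tuples instead of ROS types).
    position: Tuple[float, float, float] = (0.0, 0.0, 0.0)
    # Quaternion as (x, y, z, w); the default is the identity rotation.
    orientation: Tuple[float, float, float, float] = (0.0, 0.0, 0.0, 1.0)


@dataclass
class ObjectHypothesis:
    # One class hypothesis with its own pose, so no parallel arrays are needed.
    id: int       # class id (the type is an assumption)
    score: float  # confidence of this hypothesis
    pose: Pose    # pose of this class's mesh reference frame


@dataclass
class Detection3D:
    frame_id: str  # frame in which all hypothesis poses are expressed
    hypotheses: List[ObjectHypothesis] = field(default_factory=list)


def best_hypothesis(detection: Detection3D) -> ObjectHypothesis:
    """Return the hypothesis with the highest score (hypothetical helper)."""
    return max(detection.hypotheses, key=lambda h: h.score)


# Same physical object, two interpretations with different poses.
half = math.radians(90.0) / 2.0
detection = Detection3D(
    frame_id="camera_frame",  # hypothetical frame name
    hypotheses=[
        # id 1: laptop in its usual orientation.
        ObjectHypothesis(id=1, score=0.6, pose=Pose(position=(1.0, 0.0, 0.5))),
        # id 2: upright book mesh rotated by 90 degrees about x to lie flat.
        ObjectHypothesis(
            id=2,
            score=0.4,
            pose=Pose(
                position=(1.0, 0.0, 0.5),
                orientation=(math.sin(half), 0.0, 0.0, math.cos(half)),
            ),
        ),
    ],
)
```

Because each hypothesis carries its own pose, there is nothing to keep in sync between separate id, score and pose arrays.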


I was also sorry to see that [BoundingBox3D](https://github.com/Kukanani/vision_msgs_proposal/blob/2a1682f322dc08bcaf268d0833dd1fc4758aedaf/msg/BoundingBox3D.msg) was removed. (This was meant to represent a bounding box of the points in `source_cloud`, right?) I've always included this in my own message definitions, and I've found it extremely useful.

On the other hand, this information can be re-computed from the `source_cloud`, so I can live with that (although it's a bit wasteful). Also, other people might prefer to use a `shape_msgs/Mesh bounding_mesh`, like in [object_recognition_msgs/RecognizedObject](https://github.com/wg-perception/object_recognition_msgs/blob/d55acd8aa14fac162e19573ccce6a266b4df23c2/msg/RecognizedObject.msg#L25-L27), or something completely different, and it would overcomplicate the message if we included all possible kinds of extra information.
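
For reference, re-computing a box from the points in `source_cloud` could look roughly like the sketch below. It assumes the cloud has already been converted to a list of `(x, y, z)` tuples and that an axis-aligned box in the cloud's frame is good enough; the removed message may well have been meant as an oriented box, which needs more work. The function name is made up for this example:

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]


def axis_aligned_bounding_box(points: Iterable[Point]) -> Tuple[Point, Point]:
    """Return (center, size) of the axis-aligned box enclosing the points."""
    pts = list(points)
    if not pts:
        raise ValueError("cannot compute a bounding box of an empty cloud")
    # Per-axis extrema over all points.
    mins = tuple(min(p[i] for p in pts) for i in range(3))
    maxs = tuple(max(p[i] for p in pts) for i in range(3))
    # Box center is the midpoint of the extrema; size is their difference.
    center = tuple((lo + hi) / 2.0 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size
```

This is a full pass over the cloud for every consumer that needs the box, which is the wasteful part.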

