[ros-users] actionlib design questions

Wed Sep 15 19:04:26 UTC 2010

On Wed, 15 Sep 2010, Vijay Pradeep wrote:

> > So why are they implemented in two separate code trees? Why do they not
> > share everything they could share?
> actionlib is not designed to be a solve-all FSM language for robotics.  It
> instead defines a very specific protocol for handling the interaction
> between a client that dispatches long running goals, and a server that can
> execute these goals.  There are in fact two closely coupled state machines
> defined inside of actionlib, but these are definitely quite hard-coded to
> govern this specific interaction.  We definitely could have designed
> actionlib to rely on some generic state machine engine to generate its two
> hardcoded state machines.

Makes sense.

> > Because your robot will live on, so somewhere it will continue with
> > something else. Practically speaking, that means that another FSM will
> > take over, in one way or another; but having half a dozen termination
> > states only leads to a combinatorial explosion of how to go on with the
> > next FSM...
> ...
> > But there is a practical difference between mathematical
> > models and usable code! Having multiple terminal states leads to
> > exponential "fan out"... In other words, it makes sense to reduce the
> > number of termination states; a lot of sense.
> 
> There is definitely an argument that the server-side state machine would be
> clearer if there was only one terminal state that also had an exit code
> attached to it.  We decided to fan out these terminal states, as it seemed
> more convenient and understandable on the server-side.  Note that in the
> client side, we actually have condensed all the terminal goal states into an
> exit code, which becomes a part of the WaitingForResult and Done states. 

Again, two unfortunately chosen state names...

> Although fanning and condensing these states are interchangeable, in
> hindsight, it may have been better to be consistent between the server and
> client.

Indeed. But that's what refactoring and new releases are for: to improve
upon previous implementations :-) It's never too late. And judging from the
reaction in this thread, there are plenty of people interested in joining
forces, at least to think together before re-implementing things.
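
To make the "interchangeable" remark concrete, here is a minimal Python
sketch of the two styles. It is only an illustration under my own
assumptions: the enum, the "DONE" label and both helper functions are
hypothetical names, not part of the actionlib API.

```python
from enum import Enum


class FannedTerminal(Enum):
    """Fanned-out style: one terminal state per outcome (illustrative names)."""
    SUCCEEDED = 1
    ABORTED = 2
    PREEMPTED = 3
    REJECTED = 4


def condense(terminal):
    """Condensed style: a single terminal state plus an exit code."""
    return ("DONE", terminal.name)


def fan_out(state, exit_code):
    """Inverse mapping: recover the fanned-out state from the exit code."""
    if state != "DONE":
        raise ValueError("goal has not terminated yet")
    return FannedTerminal[exit_code]
```

Both mappings lose no information, so the choice is about readability
and about how many states the next FSM has to react to.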

> > In addition, (and that's my
> > other comment in the previous post), the _meaning_ of the state machine
> > shown in the ROS documentation is not far away from one single terminal
> > state; it has just not been explained that way, because of the semantic
> > confusion ("error" rather...) of naming states according to the event that
> > led to them. Again, a state must be named according to the activity the
> > state represents. (This is also not "mathematically necessary", but just
> > good practice :-)
> 
> Just to clarify, the server side and client side state machines refer to an
> individual goal, not the actual ActionServer & ActionClient.  Thus, if a
> specific goal has been aborted by the ActionServer, why is it unreasonable
> for the goal to be in an Aborted state?  What would be a better name?

I repeat my motivation/guideline behind state machine design: the name of a
state should reflect what one is _doing_ in that state, not the name of the
_event_ that brought you into that state or that you expect to bring you out
of that state. The latter choices make it impossible to extend the state
machine to a situation where more than one incoming or outgoing event is
needed.

So, "aborted" is not a good name for an _activity_. It _might_ be for an
event. But even in the latter case, it would be semantically much cleaner
to give a name that indicates the _cause_ of the event, and not the
_result_.
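
A minimal Python sketch of this naming guideline follows. All state and
event names are hypothetical and chosen only for illustration; this is
not how actionlib is implemented. States are named after the activity,
events after their cause, and several different events lead into the
same single terminal state.

```python
from enum import Enum


class Activity(Enum):
    """States named after what is being done with the goal."""
    WAITING_FOR_GOAL = "waiting for goal"
    EXECUTING_GOAL = "executing goal"
    IDLING = "idling"  # single terminal state: no more work on this goal


class Event(Enum):
    """Events named after their cause, not their result."""
    GOAL_RECEIVED = "goal received"
    TARGET_REACHED = "target reached"
    CLIENT_REQUESTED_STOP = "client requested stop"
    HARDWARE_FAULT = "hardware fault"


# Transition table: (current activity, event) -> next activity.
TRANSITIONS = {
    (Activity.WAITING_FOR_GOAL, Event.GOAL_RECEIVED): Activity.EXECUTING_GOAL,
    (Activity.WAITING_FOR_GOAL, Event.CLIENT_REQUESTED_STOP): Activity.IDLING,
    (Activity.EXECUTING_GOAL, Event.TARGET_REACHED): Activity.IDLING,
    (Activity.EXECUTING_GOAL, Event.CLIENT_REQUESTED_STOP): Activity.IDLING,
    (Activity.EXECUTING_GOAL, Event.HARDWARE_FAULT): Activity.IDLING,
}


def step(state, event):
    """Return the next activity; events that do not apply leave it unchanged."""
    return TRANSITIONS.get((state, event), state)
```

Three different events bring the goal into IDLING, which is exactly why
that state cannot sensibly be named after any one of them.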

My experience is that the importance of having very clear semantic/naming
conventions (standards...!) cannot be overestimated. Not only for new
people trying to grasp a new piece of software that they encounter, but
also (and especially) for the developers that need/want to
extend/simplify/reuse that piece of software.

> >> .. how can it reply with whether the
> >> goal I queried about succeeded, failed or any other states that indicate
> >> that the server will not be doing any more processing of that goal?
> 
> > The _event_ that brought you into that 'terminal state' already provided
> > that information! If you have to keep it in memory (which is often not a
> > bad thing to do) then you have to store that in the "world model"
> > stat(e)(us), and not waste an FSM state for that purpose.
> 
> By having multiple terminal states, we're trying to give just enough
> information for the client to infer the pertinent parts of how the goal
> moved through the server side state machines.  This can be used to answer
> questions like:
> - Did the ActionServer ever start processing my goal?
> - Did the goal prematurely end because of some error, or did someone ask it
> to stop?
> - Did the goal not start because it was invalid, or because someone canceled
> it before it started?
> All of these questions could be answered by storing all of the server side
> state transitions in some world model, but wouldn't acting on these various
> paths also lead to the same 'fan out' of possible actions that multiple
> terminal states could lead to?

This goal of having a consultable "memory" is very appropriate. But it
should be solved by a "database" of events, not by the state of a
coordinating state machine. Coupling both things leads to unreusable pieces
of code, as the actionlib and SMACH libraries testify, and as does the
fact that ROS basically grows by _adding_ completely new software libraries,
instead of trying to reuse (and hence improve) existing libraries to the
maximal extent possible.
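
As a rough sketch of what such an event "database" could look like,
kept separate from the coordinating state machine: the class, the goal
id and the event names below are my own hypothetical choices, not
anything that exists in actionlib or SMACH.

```python
from dataclasses import dataclass, field


@dataclass
class GoalEventLog:
    """Append-only record of events per goal, kept outside the FSM."""
    events: dict = field(default_factory=dict)  # goal_id -> list of event names

    def record(self, goal_id, event):
        """Append an event to the history of the given goal."""
        self.events.setdefault(goal_id, []).append(event)

    def happened(self, goal_id, event):
        """True if the event was ever recorded for the goal."""
        return event in self.events.get(goal_id, [])

    def last(self, goal_id):
        """Most recent event for the goal, or None if nothing was recorded."""
        history = self.events.get(goal_id, [])
        return history[-1] if history else None


log = GoalEventLog()
log.record("goal-1", "goal_received")
log.record("goal-1", "execution_started")
log.record("goal-1", "client_requested_stop")

# Did the server ever start processing my goal?
started = log.happened("goal-1", "execution_started")
# Did it end because someone asked it to stop?
stopped_on_request = log.last("goal-1") == "client_requested_stop"
```

The questions from the quoted list become queries on the log, and the
state machine itself needs no extra terminal states to answer them.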

> It's also quite possible that answering these questions in a generic way
> isn't useful.  If so, we could condense all the terminal states into one,
> and let the user send back any custom relevant information in the result.
> 
> > - how to apply this to the "resource allocation" problem? Going from
> >  allocation of external resources, to the internal coordination of the
> >  arms, head and platform in the PR2.
> - We are definitely interested in getting a better understanding of how to
> do resource allocation in our system.  We'd love to hear what approaches
> have worked well for you so far.

Short answer: give every resource that has to be shared its own component,
and make it a singleton with an API that allows other components to make
use of its resource services.
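
A minimal sketch of that idea in Python, assuming the simplest possible
policy (one owner at a time, first come first served). The class, the
"right_arm" resource and the client names are hypothetical examples,
not existing ROS or PR2 interfaces.

```python
import threading


class ResourceComponent:
    """Owns one shared resource and grants it to one client at a time."""

    def __init__(self, name):
        self.name = name
        self._lock = threading.Lock()  # protects _owner against concurrent callers
        self._owner = None

    def acquire(self, client):
        """Try to claim the resource; return True on success."""
        with self._lock:
            if self._owner is None:
                self._owner = client
                return True
            return False

    def release(self, client):
        """Release the resource; only the current owner may do so."""
        with self._lock:
            if self._owner != client:
                return False
            self._owner = None
            return True

    def owner(self):
        """Return the current owner, or None if the resource is free."""
        with self._lock:
            return self._owner


# One instance per shared resource, created once at module level.
RIGHT_ARM = ResourceComponent("right_arm")
```

Every client goes through the same acquire/release API, so the
allocation policy lives in one place and can be replaced (priorities,
queues, timeouts) without touching the clients.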

Example from ROS: the point cloud. This is the first (and currently still
only?) example where the resource sharing between different clients and
producers resulted in an unmaintainable situation. I am of the opinion (I
can be wrong!) that this particular problem was solved in a particular way,
without any effort to make it into a reusable template for all resource
sharing problems. But I repeat: I can be wrong, since it is impossible to
keep track of all software within the ROS ecosystem. And that fact also
is not a good omen for maintainability...

Thanks for taking the time to drive this discussion! I think it will lead
somewhere!

Herman