Output and by-reference parameters

by Colin LeMahieu (modified: 2010 Oct 19)

This request is retracted; see http://www.eiffelroom.org/node/467 for a demonstration of an object-oriented way of doing output and by-ref parameters.

I'm trying to lay out a case for adding by-reference parameter passing and/or output parameters. I think this would fill a functional gap in the language, and I hope to show that this is more than just a request for syntactic sugar.

The three main driving factors for this request are:

- Maintaining Command Query Separation (CQS)
- Simplifying void-safety when communicating data between procedures
- Allowing large procedures to be broken down without a performance penalty

Eiffel makes it difficult to communicate information between procedures. The main mechanisms are either to break CQS by setting `Result' in a function that modifies state, or to store the information in object state. The IO_MEDIUM.read_xxx variants are an example:

1) read_xxx is not a pure function; returning what was read would break CQS because the input cursor is advanced.
2) read_xxx is not a pure procedure; it needs to communicate information about what happened in the procedure, namely what was read.
3) read_xxx needs to communicate that information, but the information is not relevant to the state of the IO_MEDIUM across all threads (or processors, in SCOOP terminology). Typically the processor that invoked read_xxx is the only one interested in what was read.

The usual way of dealing with this today is to ignore the drawback in point 3 and write the information that needs to be communicated to object state. The two big issues with this are void-safety and performance.
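As a minimal sketch of the object-state pattern described above, this is roughly how a client reads a line today. It assumes void-safe EiffelBase, where `io' (inherited from ANY) offers `read_line' and `last_string'; the class name READ_DEMO is made up for illustration.

```eiffel
class
	READ_DEMO

create
	make

feature {NONE} -- Initialization

	make
			-- Read one line from standard input and echo it.
		local
			l_line: STRING
		do
				-- Command: advances the input cursor and stores
				-- what was read in the state of `io'.
			io.read_line
				-- Query: fetches the stored result. `twin' takes a copy,
				-- because a later read may reuse the same string object.
			l_line := io.last_string.twin
			io.put_string (l_line)
			io.put_new_line
		end

end
```

The value travels from `read_line' to the caller only through `last_string', which belongs to the shared `io' object rather than to the caller.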

The performance penalty is big, and the only way to fix it is to change the structure of the program. When information from a procedure is written to object state, it has to be written out to main memory and cannot be kept in a CPU register or on the stack, which is typically cached. If you look at the Eiffel compiler code, you can see how this issue was worked around there: by manually inlining features and creating very large procedures, so that locals can be used to hold procedure information. This results in two bad things: large procedures and duplicated code.

The void-safety issue can be solved in only two ways: by using dummy values when procedure information is not set (i.e. dummy values for IO_MEDIUM.last_xxx), or by making all information variables detachable. The dummy value strategy has the drawback that dummy values are hard to make for some objects. A dummy value for STRING_8 may be obvious, but a dummy value for a complex class may not be. The detachable variable strategy has the drawback that every access to IO_MEDIUM.last_xxx needs an object test, even though the attachment could be statically provable if both `read_xxx' and `last_xxx' were inlined.
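The detachable strategy can be sketched as follows. This is only an illustration: READER_DEMO, `read_item' and `last_item' are hypothetical names, and the "read" just stores a constant string.

```eiffel
class
	READER_DEMO

create
	make

feature {NONE} -- Initialization

	make
			-- Read an item and print it.
		do
			read_item
				-- The postcondition of `read_item' says `last_item' is set,
				-- but `last_item' is an attribute, so the compiler still
				-- requires an object test before it can be used as attached.
			if attached last_item as l_item then
				io.put_string (l_item)
				io.put_new_line
			end
		end

feature -- Access

	last_item: detachable STRING
			-- Result of the most recent `read_item'; Void before the first read.

feature -- Basic operations

	read_item
			-- Hypothetical read command; stores its result in `last_item'.
		do
			last_item := "hello"
		ensure
			item_set: last_item /= Void
		end

end
```

Every client pays for the object test, even when a read has just succeeded.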

One aspect of output parameters that makes them particularly difficult is that the call needs to change the attachment status of the argument from detachable to attached. Essentially:

    a_feat
        local
            a: STRING -- `a' is attached
        do
            -- `a' is not yet set
            b_feat (a) -- This would fail current CAPs
            -- `a' is now assigned
        end

    b_feat (input: STRING)
        do
            -- `input' is not assigned
            input := "hello"
            -- Changes from detachable to attached.
            -- Assignment to parameters is not allowed
            -- `input' is now assigned
        end

By-ref parameters would not change the attachment of a parameter.

The other issue is that parameters are not assignable in Eiffel, for good reason; creating a kind of `output' or `byref' parameter would change this.

One syntax option is to separate the parameter block into by-value, by-reference, and by-out sections. The first parameter block functions like existing parameters: they can only be used, not assigned. `passref' parameters can be both used and assigned; however, they're not scratch space, because the parameter is passed by reference. The last parameter block can only be assigned to, and if the parameters are attached, they must be assigned to.

    a_feat
        local
            a: STRING
            b: detachable STRING
            c: STRING
        do
            a := "hello"
            b_feat (a, passref b, passout c)
            -- `a' = "hello"
            -- `b' = "hello" aliased with `a'
            -- `c' = "hello" aliased with `a'
            b_feat (a, passref b, passout c)
            -- `a' = "hello"
            -- `b' = "hello" aliased with `a'
            -- `c' = "hellohello"
        end

    b_feat (one: STRING
            passref two: detachable STRING
            passout three: STRING)
        do
            if attached two as two_l then
                three := one + two_l
            else
                two := one
                three := one
            end
        end

This allows us to not use global state for procedure information as in our `read_xxx' procedures:

    read_string (passout target: STRING_8)
        do
            <read from IO>
            target := <data_read_from_IO>
        end

It also allows procedures to be broken up without breaking CQS and without a performance penalty:

    a_feat
        local
            i: INTEGER
        do
            from
            until
                i > 100_000_000
            loop
                very_big_feat (passref i)
            end
        end

    very_big_feat (passref i: INTEGER)
        local
            j: INTEGER
        do
            -- Lots of operations
            medium_procedure (i, passref j)
            -- Other long operations
            if j > 50 then
                i := i + 1
            else
                i := i + 2
            end
        end

I'm not tied to any particular syntax for solving the problem; I'm interested in whether anyone thinks this issue is worth addressing and, if so, in any critiques of the rough syntax above.

----------------------------

Another option, which would address the performance issue but not the void-safety kluge, would be decorating parameters as "not assigned". This would also allow a CAP in creation procedures that permits mutually recursive references between void-safe objects without the need for `stable' on attributes. If a local reference object was created, never assigned to object state, and only passed as an argument to routines where the parameter was marked as "not assigned", then the object could be allocated on the stack instead of on the heap. If a parameter is marked as "not assigned", it can only be passed as an argument to routines where the corresponding parameter is also "not assigned", and it can never be assigned to an attribute.

Comments

Colin Adams (14 years ago 14/9/2010)
I tackle this in the following way:
- I use a DS_CELL [information-type] as an output argument.
- Preconditions say this cell must be non-void, but its contents must be void.
- Postcondition says the cell's contents must be non-void (in most use cases).
- Colin LeMahieu (14 years ago 14/9/2010)
  This is what I use right now, and I agree it works well. The drawbacks I see are that DS_CELL needs to be allocated through the heap memory allocator, which is a performance hit compared to manual inlining and large procedures, and that it's still a little klugy with void-safety in that one needs to test whether the item is attached. This satisfies the requirements for by-ref except for the memory allocation.
- Colin LeMahieu (14 years ago 16/9/2010)
  Berend and I were hypothesizing on how to do output parameters without language changes. What do you think of this:
  
  Let's take the IO_MEDIUM.read_string example.
  
  If IO_MEDIUM.read_string were actually:
  
      class IO_MEDIUM
      feature
          read_string (target: CELL [STRING])
              do
                  gather_from_io_medium
                  target.put (string_from_io_medium)
              end
      end
  
  change it to
  
      class IO_MEDIUM
      end

      expanded class STRING_FROM_IO_MEDIUM
      feature
          make (source: IO_MEDIUM)
              do
                  gather_from_io_medium
                  item := string_from_io_medium
              end
          item: STRING
      end
  
  This way `item' is attached via the creation procedure CAP and since {STRING_FROM_IO_MEDIUM} is expanded, it gives the performance gain from stack allocation.
Colin Adams (14 years ago 16/9/2010)
You need to complete the sketch

You need to show the new implementation for read_line (and the old one, for that matter). I think I get the idea, but I suspect it won't be thread-safe or re-entrant (not that that matters for read_line, but for general applicability it does).