Output and by-reference parameters
This request is retracted, see http://www.eiffelroom.org/node/467 for a demonstration of an Object Oriented way of doing output and byref parameters.
I'm trying to lay out a case for adding parameter passing by value and/or output parameters. I think this functionality would address a functionality hole in the language and I hope the case is made that this is more than just a syntax-sugar request.
The main three driving factors for this request are:
- Maintaining Command Query Separation
- Simplifying void-safety when communicating data between procedures
- Allowing breaking down of large procedures without performance penalty
Eiffel has difficulty when trying to communicate information between procedures. The main mechanism to communicate information between procedures is to either break CQS and set a `Result' on a function that modifies state, or use object state in order to communicate information between procedures. An example of this is with IO_MEDIUM.read_xxx variants.
1) read_xxx is not a pure function; returning what was read breaks CQS because the input cursor is advanced
2) read_xxx is not a pure procedure; it needs to communicate information about what happened in the procedure namely what was read.
3) read_xxx needs to communicate information about the procedure but this information is not relevant to the state of the IO_MEDIUM across all threads or processors in SCOOP terminology. Typically the processor that invoked the read_xxx is the only processor that's interested in the information that was read.
Typically the current way this is dealt with is by ignoring the drawbacks of Issue 3 and writing the information that needs to be communicated to object state. The two big issues with this are void-safety and performance.
The performance penalty is big and the only way to fix the issue is to structurally change the program. When information from a procedure is written to object state, it needs to be written out to main memory and cannot be stored in a CPU register or on the stack which is typically cached. If one looks at the Eiffel compiler code you can see how this issue was worked around in the compiler, by manually inlining features and creating very large procedures so locals can be used to hold procedure information. This results in two bad things, large procedures and duplicated code.
The void-safety issue can be solved in only two ways, dummy values when procedure information is not set i.e. dummy values for IO_MEDIUM.last_xxx or by making all information variables detachable. The dummy value strategy has a drawback in that it's hard to make dummy values of some objects. Dummy values for STRING_8 may be obvious but a dummy value of a complex class may not be obvious. The detachable variable strategy has the drawback in that all accesses of IO_MEDIUM.last_xxx need to do object tests, even though the attachment could be statically provable if both read_xxx' and
last_xxx' were inlined.
One condition for output parameters that makes it particularly difficult is that it needs to change the attachment value from detachable to attached. Essentially:
The other issue is that parameters are not assignable in Eiffel for good reason, creating a type of `output' or 'byref' parameters would make this different.
One syntax option is to separate the parameter block in to by-value, by-reference, or by-out sections. The first parameter block function as existing parameters, they can only be used and not assigned. `passref' parameters can be both used and assigned however they're not scratch space, the parameter is passed by reference. The last parameter block can only be assigned to and if the parameters are attached, they must be assigned to.
This allows us to not use global state for procedure information as in our `read_xxx' procedures:
And allows breaking up of procedures without breaking CQS and without performance penalty:
I'm not tied to any particular syntax solution to the problem, I'm interested if anyone thinks this issue is worth addressing and if so, any critiques on the above rough syntax.
----------------------------
Another option that would address the performance issue but would not address the void-safe kluge issue, would be decorating parameters as "not assigned". This would also allow a CAP in creation procedures that would allow mutually recursive references of void-safe objects without the need for `stable' on attributes. If a local reference object was created and it was never assigned to object state and only passed as an argument to a routine where the parameter was marked as "not assigned" then the object could be allocated on the stack instead of on the heap. If an parameter is marked as "not assigned" it can only be passed as an argument to routines where the parameter is "not assigned" and never be assigned to an attribute.
I tackle this in the following way:
This is what I use right now and it works well I agree. The drawbacks I see are, DS_CELL needs to be allocated through the heap memory allocator which is a performance hit compared to manual inlining and large procedures and it's still a little klugy with void-safety in that one needs to test if the item is attached. This is satisfies requirements for by-ref except memory allocation.
Berend and I were hypothesizing on how to do output parameters without language changes. What do you think of this:
Lets take the IO_MEDIUM.read_string example.
If IO_MEDIUM.read_string were actually:
change it to
This way `item' is attached via the creation procedure CAP and since {STRING_FROM_IO_MEDIUM} is expanded, it gives the performance gain from stack allocation.
You need to complete the sketch
You need to show the new implementation for read_line (and the old one, for that matter). I think I get the idea, but I suspect it won't be thread-safe re-entrant (not that that matters for read_line, but for general applicability it does).