The five ways to compute something
Ideally, there should only ever be one way to do something, or at least only one obvious way, and that way should be the best one. If you diverge from the obvious way, it is "your gun, your foot, your choice".
But sometimes there are simply many ways to achieve the same goal. This is when we have to step back, look at the options at our disposal, analyse the pros and cons of the different paths ahead, and develop insights and best practices.
In this article, I look at one such case: a class whose only job is to "compute something". Its purpose is not to model some abstract data type, or to have a longer lifespan within the system. Its only purpose is to derive some output values from some input values by computation.
There might be reasons for not implementing this computation in the class that abstracts the data. For example, it might be too problem-specific. Or we want to keep the interface of the class containing the data clean. Or we want to reduce dependencies. It might also be that the computation requires inputs from numerous abstractions and cannot easily be placed in any one of them.
Whatever the reason, we end up with a "calculator" class whose only job is to contain the mathematical function we need.
Example: Statistics on DOUBLE
A good example of such a problem is a class that computes statistics for a given ARRAY[DOUBLE]. We need three statistical values on the array of doubles: minimum, maximum and average.
We do not want to subclass ARRAY[DOUBLE] to include our code, because this would not represent an abstraction by itself.
We might already have an application that uses ARRAY[DOUBLE] everywhere, and we have no control over object creation.
Adding it to ARRAY is not an option either, because not all generic arguments are numbers.
I have identified five different patterns to do such a computation in Eiffel. In the following sections, I will describe each one of these patterns, and list what I perceive as the pros and cons for using this pattern.
Pattern 1 - Functional programming
- PRO: Very simple
- PRO: Can be used by creating an instance or inheriting from STATISTICS
- PRO: Creates very compact code when used
- CONTRA: No benefits of OO
- CONTRA: Option/operand separation not possible
- CONTRA: All values are computed individually
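A minimal sketch of pattern 1 in Eiffel, assuming a class with no attributes whose features are pure functions of their argument. The class name STATISTICS comes from the list above; the feature names and preconditions are my assumptions, and the code relies only on standard EiffelBase features of ARRAY and DOUBLE (REAL_64):

```eiffel
note
	description: "Pattern 1 (sketch): stateless functions on ARRAY [DOUBLE]."

class
	STATISTICS

feature -- Computation

	minimum (a_array: ARRAY [DOUBLE]): DOUBLE
			-- Smallest value in `a_array'.
		require
			not_empty: not a_array.is_empty
		local
			i: INTEGER
		do
			Result := a_array [a_array.lower]
			from
				i := a_array.lower + 1
			until
				i > a_array.upper
			loop
				Result := Result.min (a_array [i])
				i := i + 1
			end
		end

	maximum (a_array: ARRAY [DOUBLE]): DOUBLE
			-- Largest value in `a_array'.
		require
			not_empty: not a_array.is_empty
		local
			i: INTEGER
		do
			Result := a_array [a_array.lower]
			from
				i := a_array.lower + 1
			until
				i > a_array.upper
			loop
				Result := Result.max (a_array [i])
				i := i + 1
			end
		end

	average (a_array: ARRAY [DOUBLE]): DOUBLE
			-- Arithmetic mean of the values in `a_array'.
		require
			not_empty: not a_array.is_empty
		local
			i: INTEGER
		do
			from
				i := a_array.lower
			until
				i > a_array.upper
			loop
				Result := Result + a_array [i]
				i := i + 1
			end
			Result := Result / a_array.count.to_double
		end

end
```

A client either creates an instance and calls `stats.average (values)`, or inherits from STATISTICS and calls `average (values)` directly. Each call walks the whole array again, which is the "computed individually" drawback.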
Pattern 2 - Compute on create
- PRO: Simple structure
- PRO: Values can be computed together
- CONTRA: Option/operand separation not possible
- CONTRA: Unintuitive to have complex computations on creation
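A sketch of pattern 2 under the same assumptions (the class name STATISTICS_ON_CREATE is hypothetical): the creation procedure computes all three values in one pass and stores them in attributes, and no reference to the array is kept afterwards.

```eiffel
note
	description: "Pattern 2 (sketch): all values are computed by the creation procedure."

class
	STATISTICS_ON_CREATE

create
	make

feature {NONE} -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Compute `minimum', `maximum' and `average' of `a_array' in one pass.
		require
			not_empty: not a_array.is_empty
		local
			i: INTEGER
			sum: DOUBLE
		do
			minimum := a_array [a_array.lower]
			maximum := minimum
			from
				i := a_array.lower
			until
				i > a_array.upper
			loop
				minimum := minimum.min (a_array [i])
				maximum := maximum.max (a_array [i])
				sum := sum + a_array [i]
				i := i + 1
			end
			average := sum / a_array.count.to_double
		end

feature -- Access

	minimum: DOUBLE
			-- Smallest value of the array passed to `make'.

	maximum: DOUBLE
			-- Largest value of the array passed to `make'.

	average: DOUBLE
			-- Arithmetic mean of the array passed to `make'.

end
```

Usage is `create stats.make (values)` followed by `stats.average`; there is no point between creation and computation where options could be set.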
Pattern 3 - Compute on demand
- PRO: Well readable OO code
- PRO: Compute when the value is needed
- PRO: Allows the underlying data to change
- CONTRA: All values are computed individually
- CONTRA: Needs to keep a reference to the target (prevents GC)
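A sketch of pattern 3, assuming a hypothetical class STATISTICS_ON_DEMAND: the object only stores a reference to the array, and every query recomputes its value from the current contents. Because the data may change between calls, each query carries its own precondition on `target`.

```eiffel
note
	description: "Pattern 3 (sketch): a view that recomputes each value on every call."

class
	STATISTICS_ON_DEMAND

create
	make

feature {NONE} -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Keep a reference to `a_array'; nothing is computed yet.
		do
			target := a_array
		ensure
			target_set: target = a_array
		end

feature -- Access

	target: ARRAY [DOUBLE]
			-- Data the statistics are computed on; it may change between calls.

	minimum: DOUBLE
			-- Smallest value currently in `target'.
		require
			not_empty: not target.is_empty
		local
			i: INTEGER
		do
			Result := target [target.lower]
			from
				i := target.lower + 1
			until
				i > target.upper
			loop
				Result := Result.min (target [i])
				i := i + 1
			end
		end

	maximum: DOUBLE
			-- Largest value currently in `target'.
		require
			not_empty: not target.is_empty
		local
			i: INTEGER
		do
			Result := target [target.lower]
			from
				i := target.lower + 1
			until
				i > target.upper
			loop
				Result := Result.max (target [i])
				i := i + 1
			end
		end

	average: DOUBLE
			-- Arithmetic mean of the values currently in `target'.
		require
			not_empty: not target.is_empty
		local
			i: INTEGER
		do
			from
				i := target.lower
			until
				i > target.upper
			loop
				Result := Result + target [i]
				i := i + 1
			end
			Result := Result / target.count.to_double
		end

end
```

Since the object holds `target`, the array stays reachable (and cannot be collected) for as long as the view is alive.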
Pattern 4 - Explicit compute
- PRO: Very explicit when the computation is performed
- PRO: Clean, contracted abstraction of the computation
- PRO: Option/operand separation very intuitive
- CONTRA: Verbose when written
- CONTRA: Verbose when used
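A sketch of pattern 4, assuming a hypothetical class STATISTICS_EXPLICIT; the feature names `compute` and `is_computed` are my own. Creation only records the array, the command `compute` does the work in one pass, and the queries are protected by an `is_computed` precondition.

```eiffel
note
	description: "Pattern 4 (sketch): configure, call `compute', then query."

class
	STATISTICS_EXPLICIT

create
	make

feature {NONE} -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Prepare a computation on `a_array' without running it.
		require
			not_empty: not a_array.is_empty
		do
			target := a_array
		ensure
			target_set: target = a_array
			not_computed: not is_computed
		end

feature -- Access

	target: ARRAY [DOUBLE]
			-- Data to compute on.

	minimum: DOUBLE
			-- Smallest value, as of the last `compute'.
		require
			computed: is_computed
		do
			Result := internal_minimum
		end

	maximum: DOUBLE
			-- Largest value, as of the last `compute'.
		require
			computed: is_computed
		do
			Result := internal_maximum
		end

	average: DOUBLE
			-- Arithmetic mean, as of the last `compute'.
		require
			computed: is_computed
		do
			Result := internal_average
		end

feature -- Status report

	is_computed: BOOLEAN
			-- Has `compute' been called?

feature -- Basic operations

	compute
			-- Compute all three values in one pass over `target'.
		require
			not_empty: not target.is_empty
		local
			i: INTEGER
			sum: DOUBLE
		do
			internal_minimum := target [target.lower]
			internal_maximum := internal_minimum
			from
				i := target.lower
			until
				i > target.upper
			loop
				internal_minimum := internal_minimum.min (target [i])
				internal_maximum := internal_maximum.max (target [i])
				sum := sum + target [i]
				i := i + 1
			end
			internal_average := sum / target.count.to_double
			is_computed := True
		ensure
			computed: is_computed
		end

feature {NONE} -- Implementation

	internal_minimum, internal_maximum, internal_average: DOUBLE
			-- Results of the last `compute'.

end
```

A client writes `create stats.make (values)`, then calls `stats.compute` before reading `stats.average`. No options are shown here; any option setters would naturally sit between `make` and `compute`.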
Pattern 5 - Lazy compute
- PRO: Clean, contracted abstraction of the computation
- PRO: Option/operand separation possible
- CONTRA: Hides when the computation is performed
- CONTRA: Extremely verbose when written
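A sketch of pattern 5, assuming a hypothetical class STATISTICS_LAZY: the public queries look like plain values, but the first one called triggers a one-pass computation that fills a private cache. The helper name `compute_if_needed` is my own.

```eiffel
note
	description: "Pattern 5 (sketch): values are computed on first access and cached."

class
	STATISTICS_LAZY

create
	make

feature {NONE} -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Prepare a lazy computation on `a_array'.
		require
			not_empty: not a_array.is_empty
		do
			target := a_array
		end

feature -- Access

	minimum: DOUBLE
			-- Smallest value of `target' (computed on first access).
		do
			compute_if_needed
			Result := internal_minimum
		end

	maximum: DOUBLE
			-- Largest value of `target' (computed on first access).
		do
			compute_if_needed
			Result := internal_maximum
		end

	average: DOUBLE
			-- Arithmetic mean of `target' (computed on first access).
		do
			compute_if_needed
			Result := internal_average
		end

feature {NONE} -- Implementation

	target: ARRAY [DOUBLE]
			-- Data to compute on.

	is_computed: BOOLEAN
			-- Has the cache been filled?

	internal_minimum, internal_maximum, internal_average: DOUBLE
			-- Cached results.

	compute_if_needed
			-- Fill the cache in one pass over `target', unless already done.
		local
			i: INTEGER
			sum: DOUBLE
		do
			if not is_computed then
				internal_minimum := target [target.lower]
				internal_maximum := internal_minimum
				from
					i := target.lower
				until
					i > target.upper
				loop
					internal_minimum := internal_minimum.min (target [i])
					internal_maximum := internal_maximum.max (target [i])
					sum := sum + target [i]
					i := i + 1
				end
				internal_average := sum / target.count.to_double
				is_computed := True
			end
		ensure
			computed: is_computed
		end

end
```

The client never sees when the loop runs, and `minimum` is a query that modifies the object's state.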
Discussion
I have seen all of these patterns used somewhere in real code. From personal experience, I try to use pattern 4 when possible. While it is the most verbose in use, it also makes it very explicit when the computation is performed. The benefit is that option/operand separation feels very natural and is easy to understand from a client perspective (just configure everything and then "push the compute button").
Pattern 1 is nice if the computations are extremely simple and unlikely to change, for example primitive mathematical operations like 'absolute' or 'square_root'.
I have struggled with pattern 2 a few times, because it really does not feel intuitive that complex computations are done on object creation. Also, the inability to use option/operand separation hurts clean code in the long run. I try to avoid it.
Pattern 3 feels like a "view" on top of the data. It is nice if the different computations are independent. However, aliasing creates its own issues and dangers, and it is easy to compute the same value again and again unnecessarily.
Last, but not least, I consider pattern 5 "over-engineered". While it looks beautiful from an OO point of view, it hides the point at which the values are computed, and option/operand separation becomes tricky.
What do you think?
Multithreading
You might want to consider your pattern from the concurrent access point of view as well. I see 3 concerns:
1. Computing at creation or on demand.
This can be solved by adding a simple make_computed creation routine. Because of this, I don't think we should include a separate pattern for computing on creation; all patterns could include this facility.
2. Computing and returning the result, vs. storing the result for further use.
It's a balance between performance and memory consumption, a very common concern.
3. Modifying the object within the query.
Having a command to compute a result and a separate query to get it enables concurrent, shared use of the object (you have to adapt to a specific concurrency model, of course, so this is only a general consideration). The balance is between simplicity and usability (in the most complex environment: concurrent threads).
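A minimal sketch of concerns 1 and 3, assuming a hypothetical class AVERAGE_CALCULATOR that only computes the average to stay short: `make_computed` bundles creation and computation, while `compute` remains a separate command, so the query `average` never modifies the object.

```eiffel
note
	description: "Hypothetical sketch: explicit `compute' plus a `make_computed' shortcut."

class
	AVERAGE_CALCULATOR

create
	make, make_computed

feature {NONE} -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Prepare a computation on `a_array' without running it.
		require
			not_empty: not a_array.is_empty
		do
			target := a_array
		ensure
			not_computed: not is_computed
		end

	make_computed (a_array: ARRAY [DOUBLE])
			-- Prepare a computation on `a_array' and run it immediately.
		require
			not_empty: not a_array.is_empty
		do
			target := a_array
			compute
		ensure
			computed: is_computed
		end

feature -- Access

	target: ARRAY [DOUBLE]
			-- Data to compute on.

	average: DOUBLE
			-- Arithmetic mean of `target', as of the last `compute'.
		require
			computed: is_computed
		do
			Result := internal_average
		end

feature -- Status report

	is_computed: BOOLEAN
			-- Has `compute' been called?

feature -- Basic operations

	compute
			-- Command: compute the average of `target' and store it.
		require
			not_empty: not target.is_empty
		local
			i: INTEGER
			sum: DOUBLE
		do
			from
				i := target.lower
			until
				i > target.upper
			loop
				sum := sum + target [i]
				i := i + 1
			end
			internal_average := sum / target.count.to_double
			is_computed := True
		ensure
			computed: is_computed
		end

feature {NONE} -- Implementation

	internal_average: DOUBLE
			-- Result of the last `compute'.

end
```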
I guess the combination of concerns 2 and 3 means having 4 equally useful patterns. One would use the pattern that best fits their needs.
Your comments on MT are good, though a discussion of this topic would probably be worth an article by itself.
I do not understand merging 2 and 3 - did you mean merging 2 (compute on make) and 4 (explicit compute)?
Sorry, by 2 and 3 I meant concerns in my own post, not patterns in yours. My point was that finding a balance between performance and memory consumption is one concern, and deciding if the computation should be separated from collecting the result is another, and that it would mean possibly 4 different patterns.
But, in fact, the pure functional approach (your pattern #1) works both in single-threaded and multi-threaded environments, so that makes 3 possibilities, not 4.
I have another observation about your own approach: pattern #5 implicitly requires that data must be available before the computation result is requested, i.e. the call to, say, average, has no parameter. It makes sense. However, in pattern #4 you could have "a_array" be a parameter of "compute" and not stored at creation time. Typically, if your computation is explicit, you will perform it as soon as data is available, and therefore use pattern #2 (compute on create).
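A sketch of that variant, assuming a hypothetical class AVERAGE_COMPUTATION (again only the average is shown): the array becomes an argument of `compute`, only the result is stored, and the object does not keep the data alive after the computation.

```eiffel
note
	description: "Hypothetical sketch: pattern 4 with the data passed to `compute'."

class
	AVERAGE_COMPUTATION

feature -- Status report

	is_computed: BOOLEAN
			-- Has `compute' been called at least once?

feature -- Access

	average: DOUBLE
			-- Arithmetic mean of the array passed to the last `compute'.
		require
			computed: is_computed
		do
			Result := internal_average
		end

feature -- Basic operations

	compute (a_array: ARRAY [DOUBLE])
			-- Compute the average of `a_array' and store only the result.
		require
			not_empty: not a_array.is_empty
		local
			i: INTEGER
			sum: DOUBLE
		do
			from
				i := a_array.lower
			until
				i > a_array.upper
			loop
				sum := sum + a_array [i]
				i := i + 1
			end
			internal_average := sum / a_array.count.to_double
			is_computed := True
		ensure
			computed: is_computed
		end

feature {NONE} -- Implementation

	internal_average: DOUBLE
			-- Result of the last `compute'.

end
```

With no creation clause, a client uses `create calc` (the default creation procedure), may set options, and then calls `calc.compute (values)`.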
If the memory consumption and thread-safety are your criteria, then you are perfectly right with your argument.
I would never write an Eiffel application using multi-threading; the language is just not made for that. SCOOP might be a possibility, though. And yes, I know that I have to enable multi-threading for some libraries, but that is not the point.
My arguments are purely on questions of interface design, maintainability and simplicity.
For example, there is a huge difference between pattern 2 and pattern 4. Even if we add a 'make_computed' to pattern 4, it is still much more complex than 2. This is because instances of pattern 4 can always be in one of two states: not computed and computed. This problem does not occur with pattern 2, so the contractual obligations are significantly reduced.
And to be honest, few people write classes with two different modes of operation. Normally, you just want to get the job done and continue.
Simple creation procedure and API
Great article!
For simplicity and readability, I think that the creation procedures should only set up the context within which each of the queries then operates.
I like queries without preconditions on the object state. Pattern 4 makes the API more complex and more error-prone. However, I prefer it over pattern 5.
I prefer pattern 1 in simple cases and pattern 3 in more complex ones.
Note: Pattern 4 could be more compact:
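One way to make pattern 4 more compact, sketched here as an assumption rather than a reconstruction of the original listing: declare the results as attributes with preconditions using the `attribute` keyword, so no separate private attribute and getter function are needed per value. The class name COMPACT_STATISTICS is hypothetical.

```eiffel
note
	description: "Sketch: pattern 4 with results declared as contracted attributes."

class
	COMPACT_STATISTICS

create
	make

feature {NONE} -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Prepare a computation on `a_array'.
		require
			not_empty: not a_array.is_empty
		do
			target := a_array
		end

feature -- Access

	target: ARRAY [DOUBLE]
			-- Data to compute on.

	minimum: DOUBLE
			-- Smallest value, as of the last `compute'.
		require
			computed: is_computed
		attribute
		end

	maximum: DOUBLE
			-- Largest value, as of the last `compute'.
		require
			computed: is_computed
		attribute
		end

	average: DOUBLE
			-- Arithmetic mean, as of the last `compute'.
		require
			computed: is_computed
		attribute
		end

feature -- Status report

	is_computed: BOOLEAN
			-- Has `compute' been called?

feature -- Basic operations

	compute
			-- Compute all three values in one pass and store them.
		require
			not_empty: not target.is_empty
		local
			i: INTEGER
			l_min, l_max, l_sum: DOUBLE
		do
			l_min := target [target.lower]
			l_max := l_min
			from
				i := target.lower
			until
				i > target.upper
			loop
				l_min := l_min.min (target [i])
				l_max := l_max.max (target [i])
				l_sum := l_sum + target [i]
				i := i + 1
			end
				-- Work in locals so that `compute' never reads `minimum',
				-- `maximum' or `average' before `is_computed' holds.
			minimum := l_min
			maximum := l_max
			average := l_sum / target.count.to_double
			is_computed := True
		ensure
			computed: is_computed
		end

end
```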
Thanks for the feedback - good comments.
Hmm - it might be useful to expose 'make' to reinitialize the calculation for a second computation - preventing object creation. But few people do that, so having an extra 'initialize (array)' that is called from make might be better. Undecided on that one ...
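A sketch of the `initialize` variant, assuming a hypothetical class REUSABLE_AVERAGE (only the average is shown): `make` stays secret and delegates to the exported `initialize`, which a client can call again to reuse the object for a second array.

```eiffel
note
	description: "Hypothetical sketch: `make' delegates to an exported `initialize'."

class
	REUSABLE_AVERAGE

create
	make

feature {NONE} -- Initialization

	make (a_array: ARRAY [DOUBLE])
			-- Create for a computation on `a_array'.
		require
			not_empty: not a_array.is_empty
		do
				-- Attach `target' before any call on Current.
			target := a_array
			initialize (a_array)
		end

feature -- Initialization

	initialize (a_array: ARRAY [DOUBLE])
			-- Reset the object for a new computation on `a_array'.
		require
			not_empty: not a_array.is_empty
		do
			target := a_array
			is_computed := False
		ensure
			target_set: target = a_array
			not_computed: not is_computed
		end

feature -- Access

	target: ARRAY [DOUBLE]
			-- Data to compute on.

	average: DOUBLE
			-- Arithmetic mean of `target', as of the last `compute'.
		require
			computed: is_computed
		do
			Result := internal_average
		end

feature -- Status report

	is_computed: BOOLEAN
			-- Has `compute' been called since the last `initialize'?

feature -- Basic operations

	compute
			-- Compute the average of `target' and store it.
		require
			not_empty: not target.is_empty
		local
			i: INTEGER
			sum: DOUBLE
		do
			from
				i := target.lower
			until
				i > target.upper
			loop
				sum := sum + target [i]
				i := i + 1
			end
			internal_average := sum / target.count.to_double
			is_computed := True
		ensure
			computed: is_computed
		end

feature {NONE} -- Implementation

	internal_average: DOUBLE
			-- Result of the last `compute'.

end
```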
While I was coding, I tried to remember the 'attribute' syntax, but kept to the old style. You are right, with 'attribute', it is much cleaner.