The five ways to compute something

by Bernd Schoeller (modified: 2014 Sep 11)

Optimally, there should always only be one way to do something. Or there should be only one obvious way, which should be the best. And if you diverge from that obvious way, it is "your gun, your foot, your choice".

But, sometimes there are just many ways to achieve the same goal. This is when we have to step back and look at the options at our disposal, analyse pros and cons of the different paths ahead, and develop insights and best practices.
Consider a class whose job is to "compute something". Its purpose is not to model some abstract data type, or to have a longer lifespan within the system. Instead, its only purpose is to derive some output values from some input values, using computation.

There might be reasons for not implementing this computation in the class that abstracts the data. For example, it might be too problem specific. Or we want to keep the interface of the class containing the data clean. Or we want to reduce dependencies. It might also be that the computation requires inputs from numerous abstractions, so that we could not easily place it in any one of them.

Whatever the reason, we end up with one "calculator" class whose only job is to contain the mathematical function we need.

Example: Statistics on DOUBLE

A good example of such a problem is a class that computes statistics for a given ARRAY[DOUBLE]. We need three statistical values on the array of doubles: minimum, maximum and average.

We do not want to subclass ARRAY[DOUBLE] to include our code, because this would not represent an abstraction by itself.

We might already have an application that uses ARRAY[DOUBLE] everywhere, and we have no control over object creation.

Adding it to ARRAY is also not an option, because not all generic arguments are numbers.

I have identified five different patterns to do such a computation in Eiffel. In the following sections, I will describe each one of these patterns, and list what I perceive as the pros and cons of using it.

Pattern 1 - Functional programming

    class STATISTICS

    feature -- Support functions

        average (a_array: ARRAY[DOUBLE]): DOUBLE
                -- Average of `a_array'
            do
                ... Compute average ...
            end

        maximum (a_array: ARRAY[DOUBLE]): DOUBLE
                -- Maximum of `a_array'
            do
                ... Compute maximum ...
            end

        minimum (a_array: ARRAY[DOUBLE]): DOUBLE
                -- Minimum of `a_array'
            do
                ... Compute minimum ...
            end

    end

  • PRO: Very simple
  • PRO: Can be used by creating an instance or inheriting from STATISTICS
  • PRO: Creates very compact code when used
  • CONTRA: No benefits of OO
  • CONTRA: Option/operand separation not possible
  • CONTRA: All values are computed individually
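One possible way to fill in the elided bodies of pattern 1 is sketched below. The `not_empty` precondition and the loop-based implementation are assumptions made for illustration and do not come from the original class. The sketch also makes the last CONTRA visible: each function walks the array on its own.

```eiffel
class
    STATISTICS

feature -- Support functions

    average (a_array: ARRAY [DOUBLE]): DOUBLE
            -- Average of `a_array'
        require
            not_empty: not a_array.is_empty -- Assumption, not in the original
        local
            i: INTEGER
            sum: DOUBLE
        do
            from
                i := a_array.lower
            until
                i > a_array.upper
            loop
                sum := sum + a_array [i]
                i := i + 1
            end
            Result := sum / a_array.count.to_double
        end

    maximum (a_array: ARRAY [DOUBLE]): DOUBLE
            -- Maximum of `a_array'
        require
            not_empty: not a_array.is_empty -- Assumption, not in the original
        local
            i: INTEGER
        do
            Result := a_array [a_array.lower]
            from
                i := a_array.lower + 1
            until
                i > a_array.upper
            loop
                Result := Result.max (a_array [i])
                i := i + 1
            end
        end

    minimum (a_array: ARRAY [DOUBLE]): DOUBLE
            -- Minimum of `a_array'
        require
            not_empty: not a_array.is_empty -- Assumption, not in the original
        local
            i: INTEGER
        do
            Result := a_array [a_array.lower]
            from
                i := a_array.lower + 1
            until
                i > a_array.upper
            loop
                Result := Result.min (a_array [i])
                i := i + 1
            end
        end

end
```

A client could then write `(create {STATISTICS}).average (data)`, or inherit from STATISTICS and call `average (data)` directly.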

Pattern 2 - Compute on create

    class STATISTICS

    create
        make

    feature -- Initialization

        make (a_array: ARRAY[DOUBLE])
                -- Create statistics for `a_array' of inputs.
            do
                ... Compute average, minimum, maximum ...
            end

    feature -- Access

        average: DOUBLE
                -- Average value

        maximum: DOUBLE
                -- Maximum value

        minimum: DOUBLE
                -- Minimum value

    end

  • PRO: Simple structure
  • PRO: Values can be computed together
  • CONTRA: Option/operand separation not possible
  • CONTRA: Unintuitive to have complex computations on creation
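To illustrate "values can be computed together", here is a minimal sketch of how `make` might derive all three values in a single pass over the array. The `not_empty` precondition and the local names `i`, `sum` and `v` are assumptions for illustration, not part of the original class.

```eiffel
class
    STATISTICS

create
    make

feature -- Initialization

    make (a_array: ARRAY [DOUBLE])
            -- Create statistics for `a_array' of inputs.
        require
            not_empty: not a_array.is_empty -- Assumption, not in the original
        local
            i: INTEGER
            sum, v: DOUBLE
        do
                -- Start both extremes at the first element.
            minimum := a_array [a_array.lower]
            maximum := minimum
            from
                i := a_array.lower
            until
                i > a_array.upper
            loop
                v := a_array [i]
                sum := sum + v
                minimum := minimum.min (v)
                maximum := maximum.max (v)
                i := i + 1
            end
            average := sum / a_array.count.to_double
        end

feature -- Access

    average: DOUBLE
            -- Average value

    maximum: DOUBLE
            -- Maximum value

    minimum: DOUBLE
            -- Minimum value

end
```

The array is visited once, and the object does not need to keep a reference to it afterwards.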

Pattern 3 - Compute on demand

    class STATISTICS

    create
        make

    feature -- Initialization

        make (a_array: ARRAY[DOUBLE])
                -- Initialize statistics for `a_array' of inputs.
            do
                target := a_array
            ensure
                target_set: target = a_array
            end

    feature -- Access

        target: ARRAY[DOUBLE]
                -- Target of the computation

        average: DOUBLE
                -- Average value of `target'
            do
                ... Compute average ...
            end

        maximum: DOUBLE
                -- Maximum value of `target'
            do
                ... Compute maximum ...
            end

        minimum: DOUBLE
                -- Minimum value of `target'
            do
                ... Compute minimum ...
            end

    end

  • PRO: Well readable OO code
  • PRO: Compute when the value is needed
  • PRO: Allows the underlying data to change
  • CONTRA: All values are computed individually
  • CONTRA: Needs to keep a reference to the target (prevents GC)

Pattern 4 - Explicit compute

    class STATISTICS

    create
        make

    feature -- Initialization

        make (a_array: ARRAY[DOUBLE])
                -- Initialize statistics for `a_array' of inputs.
            do
                target := a_array
                computed := False
            ensure
                target_set: target = a_array
                not_computed: not computed
            end

    feature -- Access

        target: ARRAY[DOUBLE]
                -- Target of the computation

        computed: BOOLEAN
                -- Have the results been computed?

        average: DOUBLE
                -- Average value of `target'
            require
                computed: computed
            do
                Result := internal_average
            end

        maximum: DOUBLE
                -- Maximum value of `target'
            require
                computed: computed
            do
                Result := internal_maximum
            end

        minimum: DOUBLE
                -- Minimum value of `target'
            require
                computed: computed
            do
                Result := internal_minimum
            end

    feature -- Computation

        compute
                -- Compute the statistics
            require
                not_computed: not computed
            do
                ... Compute internal_average, internal_minimum, internal_maximum ...
                computed := True
                target := Void
            ensure
                computed: computed
            end

    feature {NONE} -- Implementation

        internal_average: DOUBLE
                -- Average value

        internal_maximum: DOUBLE
                -- Maximum value

        internal_minimum: DOUBLE
                -- Minimum value

    end

  • PRO: Very explicit when the computation is performed
  • PRO: Clean, contracted abstraction of the computation
  • PRO: Option/operand separation very intuitive
  • CONTRA: Verbose when written
  • CONTRA: Verbose when used

Pattern 5 - Lazy compute

    class STATISTICS

    create
        make

    feature -- Initialization

        make (a_array: ARRAY[DOUBLE])
                -- Initialize statistics for `a_array' of inputs.
            do
                target := a_array
                computed := False
            ensure
                target_set: target = a_array
            end

    feature -- Access

        target: ARRAY[DOUBLE]
                -- Target of the computation

        average: DOUBLE
                -- Average value of `target'
            do
                ensure_computed
                Result := internal_average
            end

        maximum: DOUBLE
                -- Maximum value of `target'
            do
                ensure_computed
                Result := internal_maximum
            end

        minimum: DOUBLE
                -- Minimum value of `target'
            do
                ensure_computed
                Result := internal_minimum
            end

    feature {NONE} -- Implementation

        ensure_computed
                -- Ensure the results are available
            do
                if not computed then
                    ... Compute internal_average, internal_minimum, internal_maximum ...
                    computed := True
                end
            ensure
                computed: computed
            end

        computed: BOOLEAN
                -- Have the results been computed?

        internal_average: DOUBLE
                -- Average value

        internal_maximum: DOUBLE
                -- Maximum value

        internal_minimum: DOUBLE
                -- Minimum value

    end

  • PRO: Clean, contracted abstraction of the computation
  • PRO: Option/operand separation possible
  • CONTRA: Hides when the computation is performed
  • CONTRA: Extremely verbose when written

Discussion

I have seen all of these patterns being used somewhere in real code. From my personal experience, I try to use pattern 4 when possible. While this is the most verbose when used, it also makes it very explicit when the computation is performed. The benefit of this is that option/operand separation feels very natural and easy to understand from a client perspective (just configure everything and then "push the compute button").
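As an illustration of option/operand separation with pattern 4, the sketch below adds one option to the class. The option `use_absolute_values`, its setter, and the `not_empty` precondition are hypothetical and exist only for this example. To stay short, the sketch covers only `average` and `maximum`, and it keeps `target` after computing instead of releasing it.

```eiffel
class
    STATISTICS

create
    make

feature -- Initialization

    make (a_array: ARRAY [DOUBLE])
            -- Initialize statistics for `a_array' (the operand).
        do
            target := a_array
        ensure
            target_set: target = a_array
            not_computed: not computed
        end

feature -- Options

    use_absolute_values: BOOLEAN
            -- Hypothetical option: compute on absolute values?

    set_use_absolute_values (a_flag: BOOLEAN)
            -- Set `use_absolute_values' to `a_flag'.
        require
            not_computed: not computed
        do
            use_absolute_values := a_flag
        ensure
            option_set: use_absolute_values = a_flag
        end

feature -- Access

    target: ARRAY [DOUBLE]
            -- Target of the computation

    computed: BOOLEAN
            -- Have the results been computed?

    average: DOUBLE
            -- Average value of `target'
        require
            computed: computed
        do
            Result := internal_average
        end

    maximum: DOUBLE
            -- Maximum value of `target'
        require
            computed: computed
        do
            Result := internal_maximum
        end

feature -- Computation

    compute
            -- Compute the statistics, honouring the options set so far.
        require
            not_computed: not computed
            not_empty: not target.is_empty -- Assumption, not in the original
        local
            i: INTEGER
            sum, v: DOUBLE
        do
            from
                i := target.lower
            until
                i > target.upper
            loop
                v := target [i]
                if use_absolute_values then
                    v := v.abs
                end
                sum := sum + v
                if i = target.lower or else v > internal_maximum then
                    internal_maximum := v
                end
                i := i + 1
            end
            internal_average := sum / target.count.to_double
            computed := True
        ensure
            computed: computed
        end

feature {NONE} -- Implementation

    internal_average: DOUBLE
            -- Average value

    internal_maximum: DOUBLE
            -- Maximum value

end
```

A client would pass the operand with `create stats.make (data)`, configure with `stats.set_use_absolute_values (True)`, push the button with `stats.compute`, and only then read `stats.average`.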

Pattern 1 is nice if the computations are extremely simple and probably not going to change, for example for primitive mathematical operations like 'absolute' or 'square_root'.

I had to struggle with pattern 2 a few times, because it really does not feel intuitive that complex computations are done on object creation. Also, the inability to use option/operand separation hurts clean code in the long run. I try to avoid it.

Pattern 3 feels like a "view" on top of the data. It is nice if the different computations are independent. Aliasing creates its own issues and dangers, and it is also easy to unnecessarily compute the value again and again.

Last, but not least, I consider pattern 5 "over-engineered". While it looks beautiful from an OO point of view, it hides the point at which the values are computed. Option/operand separation becomes tricky.

What do you think?

Comments
  • David Le Bansais (10 years ago 11/9/2014)

    Multithreading

    You might want to consider your pattern from the concurrent access point of view as well. I see 3 concerns:

      1. Computing at creation or on demand.

         This can be solved with the addition of a simple make_computed creation routine. Because of this, I don't think we should include a separate pattern for computing on creation, all patterns could include this facility.

      2. Computing and returning the result, vs. storing the result for further use.

         It's a balance between performance and memory consumption, a very common concern.

      3. Modifying the object within the query.

         Having a command to compute a result, and a separate query to get it, enables concurrent, shared use of the object (you have to adapt to a specific concurrency model of course, so this is only a general consideration). The balance is between simplicity and usability (in the most complex environment: concurrent threads).

    I guess the combination of concerns 2 and 3 means having 4 equally useful patterns. One would use the pattern that best fits their needs.

    • Bernd Schoeller (10 years ago 11/9/2014)

      Your comments on MT are good, though a discussion on this topic would probably be worth an article by itself.

      I do not understand merging 2 and 3 - did you mean merging 2 (compute on make) and 4 (explicit compute)?

      • David Le Bansais (10 years ago 11/9/2014)

        Sorry, by 2 and 3 I meant concerns in my own post, not patterns in yours. My point was that finding a balance between performance and memory consumption is one concern, and deciding if the computation should be separated from collecting the result is another, and that it would mean possibly 4 different patterns.

        But, in fact, the pure functional approach (your pattern #1) works both in single-threaded and multi-threaded environments, so that makes 3 possibilities, not 4.

        I have another observation about your own approach: pattern #5 implicitly requires that data must be available before the computation result is requested, i.e. the call to, say, average, has no parameter. It makes sense. However, in pattern #4 you could have "a_array" be a parameter of "compute" and not stored at creation time. Typically, if your computation is explicit, you will perform it as soon as data is available, and therefore use pattern #2 compute on create.
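        A minimal sketch of the variant described here, with "a_array" passed to "compute" instead of being stored at creation time. The single-pass loop body and the `not_empty` precondition are assumptions for illustration. The class has no creation procedure, so it relies on default creation.

        ```eiffel
        class
            STATISTICS

        feature -- Access

            computed: BOOLEAN
                    -- Have the results been computed?

            average: DOUBLE
                    -- Average value of the last computed array
                require
                    computed: computed
                do
                    Result := internal_average
                end

            maximum: DOUBLE
                    -- Maximum value of the last computed array
                require
                    computed: computed
                do
                    Result := internal_maximum
                end

            minimum: DOUBLE
                    -- Minimum value of the last computed array
                require
                    computed: computed
                do
                    Result := internal_minimum
                end

        feature -- Computation

            compute (a_array: ARRAY [DOUBLE])
                    -- Compute the statistics of `a_array'.
                require
                    not_empty: not a_array.is_empty -- Assumption for this sketch
                local
                    i: INTEGER
                    sum, v: DOUBLE
                do
                    internal_minimum := a_array [a_array.lower]
                    internal_maximum := internal_minimum
                    from
                        i := a_array.lower
                    until
                        i > a_array.upper
                    loop
                        v := a_array [i]
                        sum := sum + v
                        internal_minimum := internal_minimum.min (v)
                        internal_maximum := internal_maximum.max (v)
                        i := i + 1
                    end
                    internal_average := sum / a_array.count.to_double
                    computed := True
                ensure
                    computed: computed
                end

        feature {NONE} -- Implementation

            internal_average: DOUBLE
                    -- Average value

            internal_maximum: DOUBLE
                    -- Maximum value

            internal_minimum: DOUBLE
                    -- Minimum value

        end
        ```

        A client would write `create stats` followed by `stats.compute (data)`. No reference to the array is kept, and the same object can be reused for another array.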

        • Bernd Schoeller (10 years ago 12/9/2014)

          If the memory consumption and thread-safety are your criteria, then you are perfectly right with your argument.

          I would never write an Eiffel application using multi-threading, the language is just not made for that. SCOOP might be a possibility, though. And yes, I know that I have to enable multi-threading for some libraries, but that is not the point.

          My arguments are purely on questions of interface design, maintainability and simplicity.

          For example, there is a huge difference between pattern 2 and pattern 4. Even if we add a 'make_computed' to pattern 4, it is still much more complex than 2. This is because instances of pattern 4 can always be in one of two states: not computed and computed. This problem does not occur with pattern 2, and contractual obligations are significantly reduced.

          And to be honest, few people write classes with two different modes of operation. Normally, you just want to get the job done and continue.

  • Victorien Elvinger (10 years ago 17/9/2014)

    Simple creation procedure and API

    Great article!

    For simplicity and readability, I think that the creation procedures should only set up the context within which each of the queries then operate.

    I like queries without preconditions on the object state. Pattern 4 makes the API more complex and is more error-prone. However, I prefer it over pattern 5.

    I prefer pattern 1 in simple cases and pattern 3 in more complex ones.

    Note: Pattern 4 could be more compact:

        class STATISTICS

        create
            make

        feature {NONE} -- Initialization

            make (a_array: ARRAY [REAL_32])
                    -- Initialize statistics for `a_array' of inputs.
                do
                    target := a_array
                ensure
                    target_set: target = a_array
                    not_computed: not computed
                end

        feature -- Status report

            computed: BOOLEAN
                    -- Have the results been computed?
                do
                    Result := target = Void
                end

        feature -- Access

            average: REAL_32
                    -- Average value.
                require
                    computed: computed
                attribute
                end

            maximum: REAL_32
                    -- Maximum value.
                require
                    computed: computed
                attribute
                end

            minimum: REAL_32
                    -- Minimum value.
                require
                    computed: computed
                attribute
                end

        feature -- Computation

            compute
                    -- Compute the statistics.
                require
                    not_computed: not computed
                do
                    ... Compute statistics ...
                    target := Void
                ensure
                    computed: computed
                end

        feature {NONE} -- Implementation

            target: detachable ARRAY [REAL_32]
                    -- Target of the computation.

        end

    • Bernd Schoeller (10 years ago 17/9/2014)

      Thanks for the feedback - good comments.

      Hmm - it might be useful to expose 'make' to reinitialize the calculation for a second computation - preventing object creation. But few people do that, so having an extra 'initialize (array)' that is called from make might be better. Undecided on that one ...
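      A rough sketch of the "extra 'initialize (array)' called from make" idea, built on the compact version above. The exported `initialize` routine, the `not_empty` preconditions and the loop in `compute` are assumptions for illustration. The sketch uses DOUBLE as in the article and covers only `average` and `maximum`.

      ```eiffel
      class
          STATISTICS

      create
          make

      feature {NONE} -- Initialization

          make (a_array: ARRAY [DOUBLE])
                  -- Initialize statistics for `a_array' of inputs.
              require
                  not_empty: not a_array.is_empty -- Assumption for this sketch
              do
                  initialize (a_array)
              ensure
                  target_set: target = a_array
                  not_computed: not computed
              end

      feature -- Initialization

          initialize (a_array: ARRAY [DOUBLE])
                  -- Prepare for a (new) computation on `a_array'.
              require
                  not_empty: not a_array.is_empty -- Assumption for this sketch
              do
                  target := a_array
              ensure
                  target_set: target = a_array
                  not_computed: not computed
              end

      feature -- Status report

          computed: BOOLEAN
                  -- Have the results been computed?
              do
                  Result := target = Void
              end

      feature -- Access

          average: DOUBLE
                  -- Average value.
              require
                  computed: computed
              do
                  Result := internal_average
              end

          maximum: DOUBLE
                  -- Maximum value.
              require
                  computed: computed
              do
                  Result := internal_maximum
              end

      feature -- Computation

          compute
                  -- Compute the statistics and release `target'.
              require
                  not_computed: not computed
              local
                  i: INTEGER
                  sum, l_max: DOUBLE
              do
                  if attached target as l_target then
                      l_max := l_target [l_target.lower]
                      from
                          i := l_target.lower
                      until
                          i > l_target.upper
                      loop
                          sum := sum + l_target [i]
                          l_max := l_max.max (l_target [i])
                          i := i + 1
                      end
                      internal_average := sum / l_target.count.to_double
                      internal_maximum := l_max
                  end
                  target := Void
              ensure
                  computed: computed
              end

      feature {NONE} -- Implementation

          target: detachable ARRAY [DOUBLE]
                  -- Target of the computation.

          internal_average: DOUBLE
                  -- Average value

          internal_maximum: DOUBLE
                  -- Maximum value

      end
      ```

      After a first `compute`, a client can call `stats.initialize (other_data)` and `stats.compute` again without creating a new object. Between the two calls `computed` is False, so the old results are not accessible.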

      While I was coding, I tried to remember the 'attribute' syntax, but kept to the old style. You are right, with 'attribute', it is much cleaner.