EiffelBase Data Structures, Lists
Many applications need sequential structures, also called linear structures, in particular lists and circular chains. Apart from three classes describing individual list cells, all the classes involved are descendants of class LINEAR, one of the deferred classes describing general traversal properties and introduced in the chapter that described the general data structure taxonomy. More precisely, all but one of the classes of interest for the present discussion are descendants, direct or indirect, of a class called CHAIN, which describes general sequential structures possessing a cursor as well as insertion properties. The exception is class COUNTABLE_SEQUENCE, which describes infinite structures; all the others describe finite structures.
CHAIN is an heir of SEQUENCE, which describes a more general notion of sequence. SEQUENCE is a descendant of LINEAR. There are two main categories of sequential structures: some, called circular chains, are cyclic; others, called lists, are not. Another distinction exists between dynamic structures, which may be extended at will, and fixed ones, which have a bounded capacity.
In all of the structures under review you may insert two or more occurrences of a given item in such a way that the occurrences are distinguishable. In other words, the structures are bags rather than just sets, although it is possible to use them to implement sets.
Higher Level Traversal Classes
The list and chain classes are characterized, for their traversal properties, as being linear and, more precisely, bilinear. In the traversal hierarchy, the relevant deferred classes are LINEAR and BILINEAR , introduced in the discussion of the general taxonomy.
Linear structures
LINEAR describes sequential structures that may be traversed one way. It introduces in particular the following features, illustrated on the figure below:
- after, a boolean-valued query which determines whether you have moved past the last position (a more precise specification is given below).
- off, a boolean-valued query which is true if and only if there is no item at the current position; for LINEAR this is the same as:
is_empty or after
- item, a query which returns the item at the current position, provided of course there is one, as expressed by the precondition:
not off
- start, a command to move to the first position if any (if is_empty is true the command has no effect).
- forth, a command to advance by one position; the precondition is not after.
- finish, a command to move to the last position; the precondition is:
not is_empty
There is also a procedure
An invariant property of LINEAR structures is that the current position may go off one step past the last item if any, but no further. The precondition of forth, not after, helps ensure this.
Bilinear structures
BILINEAR describes linear structures which may be traversed both ways. It inherits from LINEAR and extends it with two new features which ensure complete symmetry between the two directions of movement:
- before, a boolean-valued query which determines whether you have moved to the left of the first position (a more precise specification is given below).
- back, a command to move backward by one position; the precondition is not before.
For bilinear structures the position can range between 0 (not just 1) and count + 1. Query off is accordingly redefined so as to yield the value of after or before.
Invariant properties for after, before and off
The redefinition of off in BILINEAR shows why LINEAR does not state what might look like the natural invariant:
off = (is_empty or after)
This property, however, would be too constraining. More precisely, it is always true that the right-hand side implies the left-hand side: if a linear structure is either empty or after, then it is off. But the converse is not true, since certain kinds of linear structure, for example bilinear ones, may be off but neither empty nor after.
The actual invariant for class BILINEAR is obtained in three stages. In class TRAVERSABLE the feature off is deferred and a basic property of that feature is expressed by the invariant clause:
empty_constraint: is_empty implies off
In LINEAR, feature off is effected as is_empty or after, and the class adds the invariant clause:
after_constraint: after implies off
Finally BILINEAR, an heir of LINEAR, redefines off to yield before or after and adds the invariant clause:
before_constraint: before implies off
The new implementation of off, after or before, would not guarantee the invariant clause inherited from TRAVERSABLE were it not for another clause introduced in BILINEAR:
empty_property: is_empty implies (after or before)
which indicates that an empty bilinear structure must always be after or before. It cannot be both, as stated by the last new clause:
not_both: not (after and before)
The flat-short form of BILINEAR shows the complete reconstructed invariant:
not_both: not (after and before)
empty_property: is_empty implies (after or before)
before_constraint: before implies off
after_constraint: after implies off
empty_constraint: is_empty implies off
Iteration patterns
A typical traversal of a linear structure lin uses the loop below. For a more general form of this scheme, applicable to circular chains as well as other linear structures, replace lin.off by lin.exhausted.
from
    lin.start
    some_optional_initializing_operation (lin)
until
    lin.off
loop
    lin.some_action (lin.item)
    lin.forth
end
The value of lin.off is always true for an empty structure, so in this case the loop will, correctly, execute only its initialization actions if present.
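As a concrete instance of this scheme, the following sketch sums the items of a small list. The class name TRAVERSAL_DEMO and the feature total are hypothetical, introduced only for illustration; LINKED_LIST stands in for any effective descendant of LINEAR, and the code uses current Eiffel syntax (no is keyword).

```eiffel
class
    TRAVERSAL_DEMO
        -- Hypothetical demo class, not part of EiffelBase.

create
    make

feature -- Initialization

    make
            -- Build a three-item list and traverse it once, front to back.
        local
            lin: LINKED_LIST [INTEGER]
        do
            create lin.make
            lin.extend (1)
            lin.extend (2)
            lin.extend (3)
            from
                lin.start
                total := 0 -- Plays the role of the optional initializing operation.
            until
                lin.off
            loop
                total := total + lin.item
                lin.forth
            end
            print (total.out + "%N")
        end

feature -- Access

    total: INTEGER
            -- Sum of the items visited by the traversal in `make'.

end
```

If lin were left empty, lin.off would hold right after lin.start and the loop body would never run, leaving total at 0.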
This is a very common pattern, which you will find in the library classes themselves (for example has is implemented in this way) and many application clients. The iterator classes corresponding to linear structures (LINEAR_ITERATOR, TWO_WAY_CHAIN_ITERATOR) turn this pattern and several related ones into actual reusable routines.
For bilinear structures there is another traversal mechanism going backward rather than forward; it is the same as above except that finish replaces start and back replaces forth.
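A backward traversal can be sketched in the same way, starting with finish and moving with back. The class name BACKWARD_TRAVERSAL_DEMO and the feature visited are hypothetical; TWO_WAY_LIST is chosen here only because it supports movement in both directions symmetrically.

```eiffel
class
    BACKWARD_TRAVERSAL_DEMO
        -- Hypothetical demo class, not part of EiffelBase.

create
    make

feature -- Initialization

    make
            -- Visit the items of a three-item list from last to first.
        local
            lin: TWO_WAY_LIST [INTEGER]
        do
            create visited.make_empty
            create lin.make
            lin.extend (1)
            lin.extend (2)
            lin.extend (3)
            from
                lin.finish
            until
                lin.off
            loop
                visited.append (lin.item.out)
                lin.back
            end
            print (visited + "%N")
        end

feature -- Access

    visited: STRING
            -- Items in the order visited, concatenated.

end
```

The loop ends when back moves the cursor to the left of the first item, at which point before, and therefore off, becomes true.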
A precise view of after and before
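Getting the specification of after and before right requires some care.
For every one of the structures under discussion there is a notion of current position, which we may call the cursor position even though for the moment the cursor is a virtual notion only. (Actual cursor objects will come later when we combine LINEAR, BILINEAR and other classes from the traversal hierarchy with CURSOR_STRUCTURE and other classes from the collection hierarchy.) The informal definition is that after is true when the cursor is one position past the last item, and before is true when it is one position to the left of the first item. For a bilinear structure the two groups of features should be fully symmetric:
- start, forth, after.
- finish, back, before.
So for an empty list both after and before would have to be true. That convention, however, is not compatible with the simple, symmetric properties
after = (index = count + 1)
before = (index = 0)
which express elementary definitions for after and before in terms of index, the current position, and count, the number of items: for an empty structure count is 0, so index cannot be equal to both 0 and count + 1. Keeping both queries true for an empty structure would require the more awkward definitions
after = (is_empty or (index = count + 1))
before = (is_empty or (index = 0))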
When a structure is created, some initializations will have to be made; the default initializations will usually lead to a value of 0 rather than 1 for index, although this dissymmetry is not apparent in the assertions. Although acceptable, this solution leads to small but unpleasant complications, in particular frequent conditional instructions of the form if after and not is_empty then...
The solution finally retained for the Base libraries uses a different technique, which has turned out to be preferable. The idea is to replace the conceptual picture by one in which there are always two fictitious sentinel items. The two sentinel items are only present conceptually. They are of course not taken into account for the computation of count.
The sentinel items always appear at positions 0 and count + 1, which gives the following properties:
0 <= index
index <= count + 1
before = (index = 0)
after = (index = count + 1)
not (after and before)
The last property given indicates that a structure can never be both after and before.
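The sentinel picture can be observed directly through index, before and after. The sketch below is only an illustration under stated assumptions: the class SENTINEL_DEMO and the feature invariants_observed are hypothetical, and ARRAYED_LIST is used as one effective class whose cursor is a plain integer index.

```eiffel
class
    SENTINEL_DEMO
        -- Hypothetical demo class, not part of EiffelBase.

create
    make

feature -- Initialization

    make
            -- Move the cursor to both sentinel positions and record
            -- whether the expected properties were observed.
        local
            l: ARRAYED_LIST [INTEGER]
            ok: BOOLEAN
        do
            create l.make (2)
                -- Empty structure: cursor on the left sentinel only.
            ok := (l.index = 0) and l.before and not l.after
            l.extend (10)
            l.extend (20)
                -- On the first item: neither before nor after.
            l.start
            ok := ok and (l.index = 1) and not l.before and not l.after
                -- Two steps forward from the first of two items: right sentinel.
            l.forth
            l.forth
            ok := ok and (l.index = l.count + 1) and l.after and not l.before
                -- One step back from the first item: left sentinel.
            l.start
            l.back
            ok := ok and (l.index = 0) and l.before and not l.after
            invariants_observed := ok
        end

feature -- Status report

    invariants_observed: BOOLEAN
            -- Did every position checked in `make' satisfy the sentinel properties?

end
```

Note that the empty list is before but not after, which is exactly what the clause not (after and before) requires.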
Some lessons
This discussion has illustrated some of the important patterns of reasoning that are frequently involved in serious object-oriented design. Among the lessons are four ideas which you may find useful in many different cases. First, consistency is once again the central principle. Throughout the design of a class library we must constantly ask ourselves:
- How do I make my next design decision compatible with the previous ones?
- How do I take my next design decision so that it will be easy - or at least possible - to make future ones compatible with it?
Another frequent concern, partly a consequence of consistency, is symmetry. To mathematicians and physicists, symmetry considerations are often important in guiding the search for a solution to a problem; if the problem exhibits a certain symmetry, a candidate solution will be rejected if it does not satisfy that symmetry. Such was the situation here: since the structure's specification is symmetric with respect to the two possible directions of traversal, so too should the feature design be.
The third lesson is also well-known in mathematics and physics: the usefulness of looking at limit cases. To check that a design is sound it is often useful to examine what becomes of it when it is applied to extreme situations - in particular, as was done in this example, empty structures.
Finally, the only way to make delicate design decisions is to express the issues clearly through assertions, most notably invariants. To analyze the properties under discussion, and weigh the various alternatives, we need the precision of mathematical logic. Once again note that without assertions it would be impossible to build a good library; we would have no way to know precisely what we are talking about.
Sequences And Chains
Still deferred, classes SEQUENCE and CHAIN provide the basis for all list and chain classes, as well as for many trees and for dispensers.
SEQUENCE is constructed with the full extent of the technique described in the discussion of the taxonomy: using multiple inheritance to combine one class each from the access, traversal and storage hierarchy. SEQUENCE indeed has three parents:
- ACTIVE gives the access properties. A sequence is an active structure with a notion of current item. Remember that active structures are a special case of bags.
- BILINEAR , as studied above, indicates that a sequence may be traversed both ways.
- FINITE, from the storage hierarchy, indicates that the class describes finite sequences. (A class COUNTABLE_SEQUENCE is also present, as described below.)
To the features of BILINEAR , SEQUENCE principally adds features for adding, changing and removing items. A few procedures in particular serve to insert items at the end:
- s.put (v) adds v at the end of a sequence s.
- extend and force, at the SEQUENCE level, do the same as put.
- s.append (s1) adds to the end of s the items of s1 (another sequence), preserving their s1 order.
Other procedures work on the current position:
- s.remove removes the item at current position.
- s.replace (v) replaces by v the item at current position.
SEQUENCE, however, does not provide a procedure to insert an item at the current position, since not all implementations of sequences support this possibility; you will find it in descendants of SEQUENCE seen below.
Yet another group of features are based on the first occurrence of a certain item, or on all occurrences:
- s.prune (v) removes the first occurrence of v in s, if any.
- s.prune_all (v) removes all occurrences of v.
These procedures have various abstract preconditions: s.extendible for additions, s.writable for replacements, s.prunable for removals.
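The following sketch exercises these procedures on a small sequence. It is an illustration only: the class SEQUENCE_DEMO and its features final_count and first_item are hypothetical, and LINKED_LIST is used as one effective descendant of SEQUENCE. The calls to start before prune and remove are a precaution in this sketch, so that the search and the removal begin at the first item.

```eiffel
class
    SEQUENCE_DEMO
        -- Hypothetical demo class, not part of EiffelBase.

create
    make

feature -- Initialization

    make
            -- Apply the main SEQUENCE modification procedures in turn.
        local
            s, other: LINKED_LIST [INTEGER]
        do
            create s.make
            s.extend (1)
            s.extend (2)
            s.extend (2)
            s.extend (3)        -- s: 1 2 2 3
            create other.make
            other.extend (4)
            other.extend (5)
            s.append (other)    -- s: 1 2 2 3 4 5
            s.start
            s.replace (10)      -- s: 10 2 2 3 4 5
            s.start
            s.prune (2)         -- s: 10 2 3 4 5
            s.prune_all (2)     -- s: 10 3 4 5
            s.start
            s.remove            -- s: 3 4 5
            final_count := s.count
            first_item := s.first
        end

feature -- Access

    final_count: INTEGER
            -- Number of items left at the end of `make'.

    first_item: INTEGER
            -- First item left at the end of `make'.

end
```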
Chains
Chains are sequences with a few more properties: items may be accessed through their indices, and it is possible to define cursor objects attached to individual items.
Class CHAIN is an heir of SEQUENCE. It gets its access properties from CURSOR_STRUCTURE (which adds the notion of cursor to the features of ACTIVE, already present in SEQUENCE) and is also an heir of INDEXABLE. This ancestry implies in particular the presence of the following features:
- cursor, from CURSOR_STRUCTURE, which makes it possible to keep a reference to an item of the structure.
- i_th and put_i_th from TABLE, via INDEXABLE, which make it possible to access and replace the value of an item given by its integer index.
These features were called
Procedure
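A short sketch of indexed access and cursor objects on a chain follows. The class CHAIN_DEMO and the feature saved_item are hypothetical names; LINKED_LIST is used as one effective descendant of CHAIN, and go_to is the CURSOR_STRUCTURE procedure that returns to a position previously obtained through cursor.

```eiffel
class
    CHAIN_DEMO
        -- Hypothetical demo class, not part of EiffelBase.

create
    make

feature -- Initialization

    make
            -- Replace an item by index, then save and restore a cursor position.
        local
            l: LINKED_LIST [INTEGER]
            c: CURSOR
        do
            create l.make
            l.extend (10)
            l.extend (20)
            l.extend (30)
            l.put_i_th (25, 2)  -- l: 10 25 30
            l.start
            l.forth             -- Cursor on the second item.
            c := l.cursor       -- Remember this position.
            l.finish            -- Move elsewhere.
            l.go_to (c)         -- Come back to the remembered position.
            saved_item := l.item
        end

feature -- Access

    saved_item: INTEGER
            -- Item found at the restored cursor position.

end
```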
Dynamic chains
By default, chains can only be extended at the end. Dynamic chains, described by class DYNAMIC_CHAIN, also support insertions and removals at other positions:
- Procedure put_front adds an item before the first. (As noted, the procedures to add an item after the last are already available in chains.)
- Procedures put_left and put_right add an item at the left and right of the cursor position.
- Procedures remove_left and remove_right remove an item at the left and right of the cursor position.
- Procedures merge_left and merge_right are similar to put_left and put_right but insert another dynamic chain rather than a single item. As the word 'merge' suggests, the merged structure, passed as argument, does not survive the process; it is emptied of its items. To preserve it, perform a twin or copy before the merge operation.
The class also provides implementations of
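The insertion, removal and merge procedures of dynamic chains can be sketched as follows. The class DYNAMIC_CHAIN_DEMO and its features final_count and argument_emptied are hypothetical; LINKED_LIST is used as one effective descendant of DYNAMIC_CHAIN. The comments trace the expected content, with the cursor staying on item 4 from go_i_th onward.

```eiffel
class
    DYNAMIC_CHAIN_DEMO
        -- Hypothetical demo class, not part of EiffelBase.

create
    make

feature -- Initialization

    make
            -- Insert, remove and merge around the cursor position.
        local
            l, other: LINKED_LIST [INTEGER]
        do
            create l.make
            l.extend (2)
            l.extend (4)            -- l: 2 4
            l.put_front (1)         -- l: 1 2 4
            l.go_i_th (3)           -- Cursor on 4.
            l.put_left (3)          -- l: 1 2 3 4
            l.put_right (5)         -- l: 1 2 3 4 5
            l.remove_left           -- l: 1 2 4 5
            l.remove_right          -- l: 1 2 4
            create other.make
            other.extend (7)
            other.extend (8)
            l.merge_right (other)   -- l: 1 2 4 7 8; `other' is emptied.
            final_count := l.count
            argument_emptied := other.is_empty
        end

feature -- Access

    final_count: INTEGER
            -- Number of items in the list at the end of `make'.

    argument_emptied: BOOLEAN
            -- Was the merged list empty after the merge?

end
```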
Lists And Circular Structures
A chain is a finite sequential structure. This property means that items are arranged in a linear order and may be traversed from the first to the last. To do this you may use a loop of the form shown above, based on procedures start and forth.
This property leaves room for several variants. In particular chains may be straight or circular.
- A straight chain, which from now on will be called a list, has a beginning and an end.
- A circular chain, as represented by class CIRCULAR and its descendants, has a much more flexible notion of first item. It is organized so that every item has a successor.
This representation is conceptual only; in fact the implementations of circular chains found in the Base libraries are based on lists, implemented in one of the ways described below (in particular linked and arrayed).
The major originality of circular chains is that, unless the structure is empty, procedure forth is always applicable: from the last item it cycles back to the first. The symmetric property applies to back. The precondition for forth is, as for any linear structure:
not after
Similarly, the precondition for back is:
not before
For lists, after becomes true when the cursor moves past the last item. For circular chains, however, after and before are never true except for an empty structure; this is expressed by the invariant clauses of class CIRCULAR:
not before
For a non-empty circular chain, then, you can circle forever around the items, using forth or back.
Choosing the first item
For a list, the first and last items are fixed, and correspond to specific places in the physical representation.
A circular chain also needs a notion of first item, if only to enable a client to initiate a traversal through procedure start. Similarly, there is a last item - the one just before the first in a cyclic traversal. (If the chain has just one item, it is both first and last.)
For circular chains, however, there is no reason why the first item should always remain the same. One of the benefits that clients may expect from the use of a circular structure is the ability to choose any item as the logical first. Class CIRCULAR offers a procedure for that purpose.
The circular chain classes rely on features such as standard_forth, standard_after and standard_start, as in the following implementation of forth:
forth is
        -- Move cursor to next item, cyclically.
    do
        standard_forth
        if standard_after then
            standard_start
        end
        if isfirst then
            exhausted := True
        end
    end
Traversing a list or circular chain
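With these properties of forth, after and off, a traversal loop written as
from
    lin.start
until
    lin.off
loop
    ...
    lin.forth
end
would not work if lin is a non-empty circular structure: off never becomes true, so the loop would cycle over the items forever.
Using exhausted instead of off solves this problem:
from
    lin.start
    some_optional_initializing_operation (lin)
until
    lin.exhausted
loop
    ...
    lin.some_action (lin.item)
    lin.forth
end
This form is applicable to all linear structures, circular or not, since for non-circular structures exhausted has the same value as off.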
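The sketch below applies one and the same exhausted-based loop to a list and to a circular chain. It is an illustration under stated assumptions: the class EXHAUSTED_DEMO and its features sum, list_sum and ring_sum are hypothetical, and LINKED_LIST and LINKED_CIRCULAR are used as representative effective classes.

```eiffel
class
    EXHAUSTED_DEMO
        -- Hypothetical demo class, not part of EiffelBase.

create
    make

feature -- Initialization

    make
            -- Sum the items of a list and of a circular chain with the same routine.
        local
            list: LINKED_LIST [INTEGER]
            ring: LINKED_CIRCULAR [INTEGER]
        do
            create list.make
            list.extend (1)
            list.extend (2)
            list.extend (3)
            create ring.make
            ring.extend (1)
            ring.extend (2)
            ring.extend (3)
            list_sum := sum (list)
            ring_sum := sum (ring)
        end

feature -- Access

    list_sum: INTEGER
            -- Sum computed over the list.

    ring_sum: INTEGER
            -- Sum computed over the circular chain.

feature -- Basic operations

    sum (lin: LINEAR [INTEGER]): INTEGER
            -- Sum of all items of `lin', each visited exactly once.
        do
            from
                lin.start
            until
                lin.exhausted
            loop
                Result := Result + lin.item
                lin.forth
            end
        end

end
```

Writing sum against LINEAR rather than against a specific class is what lets the same loop serve both structures.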
Dynamic structures
For both lists and circular chains, the most flexible variants, said to be dynamic, allow insertions and deletions at any position.
The corresponding classes are descendants of DYNAMIC_LIST and DYNAMIC_CIRCULAR , themselves heirs of DYNAMIC_CHAIN studied above.
Infinite sequences
Class COUNTABLE_SEQUENCE, built by inheritance from COUNTABLE, LINEAR and ACTIVE, is similar to SEQUENCE but describes infinite rather than finite sequences.
Implementations
We have by now seen the concepts underlying the linear structures of the Base libraries, especially lists and circular chains. Let us look at the techniques used to implement them.
Linked and arrayed implementations
Most of the implementations belong to one of four general categories, better described as two categories with two subcategories each:
- Linked implementations, which may be one-way or two-way.
- Arrayed implementations, which may be resizable or fixed.
A linked implementation uses linked cells, each containing an item and a reference to the next cell. One-way structures are described by classes whose names begin with LINKED_, for example LINKED_LIST . Two-way structures use cells which, in addition to the reference to the next cell, also include a reference to the previous one. Their names begin with TWO_WAY_.
An arrayed implementation uses an array to represent a linear structure. If the array is resizable, the corresponding class name begins with ARRAYED_, for example ARRAYED_LIST; if not, the prefix is FIXED_.
Linked structures
A linked structure requires two classes: one, such as LINKED_LIST, describes the list proper; the other, such as LINKABLE, describes the individual list cells. The figure should help understand the difference; it describes a linked list, but the implementation of linked circular chains is similar.
The instance of type LINKED_LIST shown at the top contains general information about the list, such as the number of items (count). An entity declared as
my_list: LINKED_LIST [SOME_TYPE]
will have as its run-time value (if not void) a reference to such an object, which is really a list header. The actual list content is given by the LINKABLE instances, each of which contains a value of type SOME_TYPE and a reference to the next cell.
Clearly, a header of type LINKED_LIST [SOME_TYPE] will be associated with cells of type LINKABLE [SOME_TYPE].
Features such as active and first are used only for the implementation; they are not exported, and so you will not find them in the flat-short specifications, although the figures show them to illustrate the representation technique.
A similar implementation is used for two-way-linked structures such as two-way lists and two-way circular chains.
Linked cells
The classes describing list cells are descendants of a deferred class called CELL, whose features are:
- item, the contents of the cell.
- put (v: like item), which replaces the contents of the cell by a new value.
Class LINKABLE is an effective descendant of CELL , used for one-way linked structures. It introduces features
cell will be linked. Two-way linked structures use BI_LINKABLE , an heir of LINKABLE which to the above features adds
Caution: Do not confuse the
It may be implemented as
item: G is
        -- Current item
    do
        Result := active.item
    end
using the
One-way and two-way linked chains
If you look at the interfaces of one-way and two-way linked structures, you will notice that they are almost identical. This is because it is possible to implement features such as
Although correct, such an implementation is of course rather inefficient since it requires a traversal of the list. In terms of algorithmic complexity, it is in O (count).
Caution: As a consequence, you should not use one-way linked structures if you need to execute more than occasional
Two-way linked structures, such as those described by TWO_WAY_LIST and TWO_WAY_CIRCULAR , treat the two directions symmetrically, so that
Arrayed chains
Arrayed structures as described by ARRAYED_LIST , FIXED_LIST and ARRAYED_CIRCULAR use arrays for their implementations. A list or circular chain of
An instance of FIXED_LIST , as the name suggests, has a fixed number of items. In particular:
- Query extendible has value false for FIXED_LIST: you may replace existing items, but not add any, even at the end. A FIXED_LIST is created with a certain number of items and retains that number.
- As a result, FIXED_LIST joins the deferred feature count of LIST with the feature count of ARRAY, which satisfies the property count = capacity.
- Query prunable has value false too: it is not possible to remove an item from a fixed list.
In contrast, ARRAYED_LIST has almost the same interface as LINKED_LIST . In particular, it is possible to add items at the end using procedure
Caution: The situation of these features in ARRAYED_LIST is similar to the situation of
Arrayed structures, however, use up less space than linked representations. So they are appropriate for chains on which, except possibly for insertions at the end, few insertion and removal operations or none at all are expected after creation. FIXED_LIST offers few advantages over ARRAYED_LIST . FIXED_LIST may be useful, however, for cases in which the fixed number of items is part of the specification, and any attempt to add more items must be treated as an error. For circular chains only one variant is available, ARRAYED_CIRCULAR , although writing a
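As a small illustration of the resizable variant, the sketch below adds more items to an ARRAYED_LIST than its initial capacity and then reads one back by index. The class ARRAYED_LIST_DEMO and its features final_count, third and still_extendible are hypothetical names used only for this example.

```eiffel
class
    ARRAYED_LIST_DEMO
        -- Hypothetical demo class, not part of EiffelBase.

create
    make

feature -- Initialization

    make
            -- Extend an arrayed list beyond its initial capacity.
        local
            al: ARRAYED_LIST [INTEGER]
        do
            create al.make (2)  -- Room for two items initially; the list starts empty.
            al.extend (1)
            al.extend (2)
            al.extend (3)       -- Exceeds the initial capacity; the list resizes.
            final_count := al.count
            third := al.i_th (3)
            still_extendible := al.extendible
        end

feature -- Access

    final_count: INTEGER
            -- Number of items after the three insertions.

    third: INTEGER
            -- Item at index 3.

    still_extendible: BOOLEAN
            -- Value of `extendible' after the insertions.

end
```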
Multi-arrayed lists
For lists one more variant is available, combining some of the advantages of arrayed and linked implementations: MULTI_ARRAY_LIST. With this implementation a list is divided into a number of blocks. Each block is an array, but the successive arrays are linked.
Sorted Linear Structures
The class COMPARABLE_STRUCT, an heir of BILINEAR, is declared as
deferred class
    COMPARABLE_STRUCT [G -> COMPARABLE]
inherit
    BILINEAR [G]
feature
    ...
As indicated by the constrained generic parameter it describes bilinear structures whose items may be compared by a total order relation.
Caution: The class name COMPARABLE_STRUCT, chosen for brevity's sake, is slightly misleading: it is not the structures that are comparable but their items.
COMPARABLE_STRUCT introduces the features
structure with a total order relation. SORTED_STRUCT , an heir of COMPARABLE_STRUCT , describes structures that can be sorted; it introduces the query sorted and the command sort.
The deferred class PART_SORTED_LIST describes lists whose items are kept ordered in a way that is compatible with a partial order relation defined on them. The class is declared as
deferred class
    PART_SORTED_LIST [G -> COMPARABLE]...
An implementation based on two-way linked lists is available through the effective heir SORTED_TWO_WAY_LIST .
The deferred class SORTED_LIST , which inherits from PART_SORTED_LIST , assumes that the order relation on G is a total order. As a result, the class is able to introduce features
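A sorted list keeps its items in order regardless of the order of insertion, as the following sketch suggests. The class SORTED_LIST_DEMO and its features smallest, middle and largest are hypothetical; SORTED_TWO_WAY_LIST is the effective class mentioned above, and INTEGER is used as an item type that conforms to COMPARABLE.

```eiffel
class
    SORTED_LIST_DEMO
        -- Hypothetical demo class, not part of EiffelBase.

create
    make

feature -- Initialization

    make
            -- Insert items out of order and read them back in sorted order.
        local
            s: SORTED_TWO_WAY_LIST [INTEGER]
        do
            create s.make
            s.extend (3)
            s.extend (1)
            s.extend (2)
            smallest := s.first
            middle := s.i_th (2)
            largest := s.last
        end

feature -- Access

    smallest: INTEGER
            -- First item of the sorted list.

    middle: INTEGER
            -- Second item of the sorted list.

    largest: INTEGER
            -- Last item of the sorted list.

end
```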