Read-only strings

by Colin Adams (modified: 2007 Apr 13)

Why aren't literal strings in Eiffel read-only? Indeed, why isn't class STRING read-only? I think (as do others) that Java gets it right in this respect, with its separation of String and StringBuffer.

All sorts of hard-to-debug problems can arise because there is no read-only variant of STRING. We had one of these occur in the Gobo XML library - the XML parser was corrupted because one of the event filters was editing one of the strings emitted by the parser.
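To make this kind of bug concrete, here is a minimal sketch. It is not the Gobo code: the class name ALIASING_DEMO and the local names are invented for illustration. It only shows that assigning a STRING copies the reference, so an edit made through one variable is visible through the other.

```eiffel
class
    ALIASING_DEMO

create
    make

feature {NONE} -- Initialization

    make
            -- Show that two references to one STRING share edits.
        local
            emitted, filtered: STRING
        do
            emitted := "element"
                -- Reference assignment: no characters are copied.
            filtered := emitted
                -- The "filter" believes it is editing its own string.
            filtered.append ("-edited")
                -- Prints "element-edited": the producer's string changed too.
            print (emitted)
            print ("%N")
        end

end
```

Replacing `filtered := emitted` with `filtered := emitted.twin` avoids the problem, at the price of a copy every time, whether or not the string is ever edited.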

Our work-around for this problem was the following class:

indexing
    description: "STRINGs with copy-on-write semantics"
    library: "Gobo Eiffel String Library"
    copyright: "Copyright (c) 2005, Colin Adams and others"
    license: "MIT License"
    date: "$Date: 2007-01-26 18:55:25 +0000 (Fri, 26 Jan 2007) $"
    revision: "$Revision: 5877 $"

class interface
    ST_COPY_ON_WRITE_STRING

create
    make

feature -- Access

    item: STRING_8
            -- String

    safe_item: STRING_8
            -- Version of item that is safe for editing
        ensure
            safe_to_edit: changed
            same_as_item: Result /= Void and then Result = item

feature -- Element change

    append_character (c: CHARACTER_8)
            -- Append `c' at end.
        ensure
            new_count: item.count = old item.count + 1
            appended: item.item (item.count) = c
            safe_to_edit: changed

    append_string (s: STRING_8)
            -- Append a copy of `s' at end.
        require
            s_not_void: s /= Void
        ensure
            safe_to_edit: changed

    fill_with (c: CHARACTER_8)
            -- Replace every character with `c'.
        ensure
            same_count: old item.count = item.count
            filled: item.occurrences (c) = item.count
            safe_to_edit: changed

    insert_character (c: CHARACTER_8; i: INTEGER_32)
            -- Insert `c' at index `i', shifting characters between
            -- ranks `i' and `count' rightwards.
        require
            valid_insertion_index: 1 <= i and i <= item.count + 1
        ensure
            one_more_character: item.count = old item.count + 1
            inserted: item.item (i) = c
            safe_to_edit: changed

    put (c: CHARACTER_8; i: INTEGER_32)
            -- Replace character at index `i' by `c'
        require
            valid_index: item.valid_index (i)
        ensure
            stable_count: item.count = old item.count
            replaced: item.item (i) = c
            safe_to_edit: changed

invariant
    item_not_void: item /= Void

end -- class ST_COPY_ON_WRITE_STRING

Hm. Now I look at it, it seems that the contract for append_string could be strengthened in the postconditions.

The _8 suffixes are only present because I used the EiffelStudio interface view.

This class can be used in a lot of situations to avoid problems. The basic idea is to share a string until someone actually needs to edit it, and only then make a copy, so that strings are not duplicated needlessly.
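The interface view above does not show the implementation, so the following is only a sketch of how copy-on-write could work, not the actual Gobo code. The class name COW_STRING_SKETCH and the signature of `make' are assumptions; `item', `changed' and `safe_item' are named after the features and assertions in the interface.

```eiffel
class
    COW_STRING_SKETCH

create
    make

feature {NONE} -- Initialization

    make (a_string: STRING)
            -- Share `a_string' without copying it.
            -- (Assumed signature; not shown in the interface view.)
        require
            a_string_not_void: a_string /= Void
        do
            item := a_string
        ensure
            shared: item = a_string
            not_changed: not changed
        end

feature -- Access

    item: STRING
            -- Current string; still shared with the creator
            -- until the first edit

    changed: BOOLEAN
            -- Has `item' been replaced by a private copy?

    safe_item: STRING
            -- Version of `item' that is safe for editing
        do
            if not changed then
                    -- First edit: stop sharing by taking a private copy.
                item := item.twin
                changed := True
            end
            Result := item
        ensure
            safe_to_edit: changed
            same_as_item: Result = item
        end

feature -- Element change

    append_character (c: CHARACTER)
            -- Append `c' at end, copying first if still shared.
        do
            safe_item.append_character (c)
        ensure
            new_count: item.count = old item.count + 1
            safe_to_edit: changed
        end

invariant
    item_not_void: item /= Void

end
```

Readers that never edit pay nothing, because `item' is handed out as-is. The copy happens at most once, on the first call to an editing feature.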

But it would be much better if we could follow the Java line here.

I think it ought to be possible to do this without breaking (much) existing code, by making use of the convert keyword. Rather than having STRING_GENERAL inherit from READ_ONLY_STRING_GENERAL, we have parallel hierarchies, and say that the read-only versions can convert to the existing versions (the creation procedures involved would of course copy the characters of the string).

Then we could change string literals to be of type READ_ONLY_STRING (or rather, one of its aliases), and everything should work just fine.
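As a rough sketch of the convert idea: the class below is hypothetical (READ_ONLY_STRING_SKETCH and all its feature names are invented here). It also uses a conversion query, because a sketch cannot add a conversion creation procedure to STRING itself, which is what the proposal would really need.

```eiffel
class
    READ_ONLY_STRING_SKETCH

create
    make

convert
    to_string: {STRING}

feature {NONE} -- Initialization

    make (a_string: STRING)
            -- Keep a private copy of the characters of `a_string'.
        require
            a_string_not_void: a_string /= Void
        do
            content := a_string.twin
        end

feature -- Access

    count: INTEGER
            -- Number of characters
        do
            Result := content.count
        end

    item (i: INTEGER): CHARACTER
            -- Character at index `i'
        require
            valid_index: 1 <= i and i <= count
        do
            Result := content.item (i)
        end

feature -- Conversion

    to_string: STRING
            -- New mutable string with the same characters
        do
            Result := content.twin
        end

feature {NONE} -- Implementation

    content: STRING
            -- Characters; never exported, so never edited by clients

end
```

With this in place, a client holding `r: READ_ONLY_STRING_SKETCH' and `s: STRING' can write `s := r'. The compiler inserts the call to `to_string', so `s' is a fresh copy and later edits to `s' cannot reach `r'. That implicit copy is the temporary object whose cost is raised in the comments below.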

Comments
  • Peter Gummer (17 years ago 14/4/2007)

    Mutable strings are evil

    I, too, often encounter hard-to-track-down bugs in Eiffel code caused by the fact that STRING is mutable. Every other OO language that I've worked with (Delphi, C#, Java) treats strings as immutable. This is a bit weird, because it means that strings are reference types with some expanded semantics; so the Eiffel approach is more consistent with the rest of the type system. But having a few years of Eiffel development under my belt now, I can safely say that it's not just a prejudice based on what I'm used to: mutable strings are evil!

  • Colin Adams (17 years ago 14/4/2007)

    CONSTANT_STRING

    CONSTANT_STRING_GENERAL, CONSTANT_STRING_8 and CONSTANT_STRING_32 look like better names. Colin Adams

    • Colin Adams (17 years ago 15/4/2007)

      Performance of string comparisons

      The disadvantage of using convert is that the cost of string comparisons, already an expensive operation, is increased by the need to create a temporary object. Colin Adams

  • Paul Bates (17 years ago 17/4/2007)

    I agree

    Immutable STRING variants are something that I've brought up a number of times. I fully agree that Eiffel needs immutable strings. Keep up the comments.

    • Colin Adams (17 years ago 17/4/2007)

      Questionnaire

      It might be worth running a poll on this. Options such as:

      1. Keep the status quo
      2. Have STRING_GENERAL inherit CONSTANT_STRING_GENERAL (breaking all existing Eiffel code)
      3. Adopt my suggestion, despite the performance implications

      Colin Adams

      • Eric Bezault (17 years ago 17/4/2007)

        Inherit and convert

        Can't we have both 2) and 3)? Mutable strings would conform to constant strings and constant strings would convert to mutable strings. Comparison would use constant strings as argument and hence conformance would be involved (no performance implications).

        The problem with STRING_GENERAL inheriting from CONSTANT_STRING_GENERAL is this: although it's a good way for a feature to state that it won't alter strings passed as arguments declared as constant, it does not mean that the string will not be modified by another feature (or by the same feature after an assignment attempt) if its dynamic type is in fact that of a mutable string. So we think that the string passed as CONSTANT_STRING_GENERAL will not change, but it can. In fact it's not surprising. Even if the inheritance is appealing, a mutable string is not a constant string. It's like having RECTANGLE inherit from SQUARE.

        Hmmm, so I guess that in order to be 100% safe we need conversion both ways. For comparison and performance, we probably need a common ancestor to STRING_GENERAL and CONSTANT_STRING_GENERAL. READONLY_STRING_GENERAL? We can only read the content of the string through this interface, but it's not necessarily a constant string. Polymorphically it can be attached to a mutable string whose content can be modified.

        • Colin Adams (17 years ago 17/4/2007)

          Comparison

          I think READABLE_STRING is better than READONLY_STRING_GENERAL (because the dynamic type may also be writable, and I think we will not need _8 and _32 descendants for this class).

          For comparison, there is same_string and is_equal. Same_string can accept a READABLE_STRING, and so avoid a conversion, but what about is_equal? It takes a like Current argument, and so a conversion will be involved. Colin Adams

          • Paul Bates (17 years ago 18/4/2007)

            Taking the discussion elsewhere.

            I'm in the process of creating a Wiki page with rationale and implementation suggestions. I'll post a link when it is in at least a legible draft state.

          • Martin Seiler (17 years ago 18/4/2007)

            I think the signature of is_equal is subject to change anyway.

            like Current will be replaced with ANY IIRC.

            -- mTn-_-|