Introduction
Lately I wrote a small utility to augment the editing capabilities of EiffelStudio. At it's core is a simple finite state machine for parsing a files contents at the line level rather than the word level.
Concept of a finite state machine
In computer science, a finite state machine is a programming abstraction used to model the behaviour of a system that has a finite number of states, an elevator or a coffee machine for example. The machine can transit from one state to another, typically triggering some event or action. This change is refered to as a transition. For example the coffee machine detects that the cup is nearly full and stops the flow of liquid. One of the most common examples of a state machine is a code parser.
State machines and Eiffel agents
Eiffel agents are very useful for writing state machine programs and can be used to model both the state of the machine and the transition action. The following class found in Eiffel-Loop is a very simple state machine that is surprisingly useful for many applications. It traverses a linear sequence of data objects, starting in an initial state and finishing either when the state becomes final or when the end of the line has been reached. To change the state of the machine all you have to do is assign the state attribute another agent with an open operand of type G. This defines both the state of the machine and the action to be taken when in that state.
class
EL_STATE_MACHINE [G]
feature {NONE} -- Initialization
make
do
state := agent final
create tuple
end
feature -- Basic operations
traverse (initial: like state; sequence: LINEAR [G])
--
local
final_state: EL_PROCEDURE
do
create final_state.make (agent final)
from
sequence.start; state := initial
until
sequence.after or final_state.same_procedure (state)
loop
call (sequence.item)
sequence.forth
end
end
feature {NONE} -- Implementation
call (item: G)
-- call state procedure with item
do
tuple.put_reference (item, 1)
state.set_operands (tuple)
state.apply
end
final (item: G)
--
do
end
state: PROCEDURE [like Current, TUPLE [G]]
tuple: TUPLE [G]
end
Some technical points
- The class EL_PROCEDURE was written expressly for the purpose of detecting whether two procedures represent the same routine using only the 'encaps_rout_disp' pointer attribute. But you do need to remember to force a C compilation in EiffelStudio for it to work.
- The operands are set using a class tuple attribute rather than state.call ([item]) in order to reduce garbage collection.
A state machine for line processing
A useful implementation of EL_STATE_MACHINE is the following class which can be used to process lines of text from a file or some other source.
class
EL_PLAIN_TEXT_LINE_STATE_MACHINE
inherit
EL_STATE_MACHINE [EL_ASTRING]
rename
traverse as do_with_lines
end
feature -- Basic operations
do_once_with_file_lines (initial: like state; lines: EL_LINE_SOURCE [FILE])
do
do_with_lines (initial, lines)
lines.close
end
end
A utility example
Here is presented a utility illustrating the use of EL_PLAIN_TEXT_LINE_STATE_MACHINE. It operates on an Eiffel source file expanding a shorthand notation for common coding patterns as follows.
1. Expand feature codes
Expand lines beginning with @f xx where xx is a two letter code representing one of the standard feature group catagories: Access, Element Change, etc. Lines beginning with @f {xx
will be expanded as feature {NONE} -- <group name>
.For example: @f {in
expands as feature {NONE} -- Initialization
.
2. Expand argument assignments
Expand routines containing shorthand for attribute assignments, inserting an assignment line. For example this shorthand codefeature {NONE} -- Initialization
make (output_file_path:@; input_file_path:@)
--
do
endwill be expanded asfeature {NONE} -- Initialization
make (a_output_file_path: like output_file_path; a_input_file_path: like input_file_path)
--
do
output_file_path := a_output_file_path; input_file_path := a_input_file_path
end
3. Expand setter notation
Expand the notation for set attribute procedures so for example this code:feature -- Element change
@set bit_rate_per_channelfeature -- Element change
set_bit_rate_per_channel (a_bit_rate_per_channel: like bit_rate_per_channel)
do
bit_rate_per_channel := a_bit_rate_per_channel
end
4. Sort features alphabetically
Reorder features within a group so the names will appear in alphabetical order in the EiffelStudio feature view.
Feature editor class
This is code for the top level class that achieves all that using the EL_PLAIN_TEXT_LINE_STATE_MACHINE class.class
EIFFEL_FEATURE_EDITOR
inherit
EL_COMMAND
EL_PLAIN_TEXT_LINE_STATE_MACHINE
rename
make as make_machine
redefine
call
end
EL_MODULE_LOG
FEATURE_CONSTANTS
create
make, default_create
feature {EL_COMMAND_LINE_SUB_APPLICATTION} -- Initialization
make (a_source_path: like source_path)
do
make_machine
source_path := a_source_path
create class_header.make (20)
create feature_groups.make (8)
create class_footer.make (1)
create code_line.make_empty
end
feature -- Basic operations
execute
local
input_lines: EL_FILE_LINE_SOURCE
output: EL_PLAIN_TEXT_FILE
output_lines: EL_ARRAYED_LIST [READABLE_STRING_GENERAL]
do
log.enter ("execute")
create input_lines.make_latin_1 (source_path)
do_once_with_file_lines (agent find_first_feature_block, input_lines)
create output_lines.make (class_footer.count + class_header.count + feature_groups.count * 5)
across class_header as line loop
output_lines.extend (line.item)
end
across feature_groups as group loop
across group.item.header as line loop
output_lines.extend (line.item)
end
group.item.features.sort
across group.item.features as l_feature loop
l_feature.item.expand_inserts
across l_feature.item.lines as line loop
output_lines.extend (line.item)
end
end
end
across class_footer as line loop
output_lines.extend (line.item)
end
create output.make_open_write (source_path)
output.set_encoding_from_other (input_lines)
output.put_lines (output_lines)
output.close
log.exit
end
feature {NONE} -- State handlers
find_first_feature_block (line: EL_ASTRING)
do
if line.starts_with (Keyword_feature) then
feature_groups.extend (create {CLASS_FEATURE_GROUP}.make (line))
state := agent find_first_feature
else
class_header.extend (line)
end
end
find_first_feature (line: EL_ASTRING)
-- find first feature in feature group
do
if is_feature_declaration (line) then
feature_groups.last.features.extend (create {CLASS_FEATURE}.make (line))
state := agent find_next_feature
else
feature_groups.last.header.extend (line)
end
end
find_next_feature (line: EL_ASTRING)
-- find next feature in feature group
do
if Footer_start_keywords.has (line) then
fill_class_footer (line)
state := agent fill_class_footer
elseif line.starts_with (Keyword_feature) then
feature_groups.extend (create {CLASS_FEATURE_GROUP}.make (line))
state := agent find_first_feature
elseif is_feature_declaration (line) then
feature_groups.last.features.extend (create {CLASS_FEATURE}.make (line))
state := agent find_next_feature
else
feature_groups.last.features.last.lines.extend (line)
end
end
fill_class_footer (line: EL_ASTRING)
do
class_footer.extend (line)
end
feature {NONE} -- Implementation
call (line: EL_ASTRING)
local
white_space, code, label: EL_ASTRING
parts: EL_ASTRING_LIST
do
line.right_adjust
if line.starts_with (Feature_abbreviation) then
create parts.make_with_words (line)
if parts.first ~ Feature_abbreviation and parts.count = 2 then
label := "Expanded "
label.append (line)
line.wipe_out
line.append_string ("feature ")
code := parts.i_th (2)
if code [1] = '{' then
line.append_string ("{NONE} ")
code.remove_head (1)
end
line.append_string ("-- ")
Feature_catagories.search (code)
if Feature_catagories.found then
line.append (Feature_catagories.found_item)
else
line.append (code)
end
log_or_io.put_labeled_string (label, line)
log_or_io.put_new_line
end
end
code_line := line.twin
code_line.left_adjust
white_space := line.substring (1, line.count - code_line.count)
tab_count := white_space.occurrences ('%T')
Precursor (line)
end
is_feature_declaration (line: EL_ASTRING): BOOLEAN
-- True if line begins declaration of attribute or routine
do
Result := tab_count = 1 and not code_line.is_empty
end
code_line: EL_ASTRING
tab_count: INTEGER
class_header: EL_ASTRING_LIST
class_footer: EL_ASTRING_LIST
feature_groups: EL_ARRAYED_LIST [CLASS_FEATURE_GROUP]
source_path: EL_FILE_PATH
feature {NONE} -- Constants
Feature_abbreviation: EL_ASTRING
once
Result := "@f"
end
Keyword_feature: EL_ASTRING
once
Result := "feature"
end
Keyword_invariant: EL_ASTRING
once
Result := "invariant"
end
Keyword_end: EL_ASTRING
once
Result := "end"
end
Keyword_note: EL_ASTRING
once
Result := "note"
end
Footer_start_keywords: ARRAY [EL_ASTRING]
once
Result := << Keyword_invariant, Keyword_end, Keyword_note >>
Result.compare_objects
end
end
A vCard splitter
This example is used for splitting a monolithic addressbook in vCard file format, into one file per contact. This is to satisfy the requirements of some Nokia phones like the Asha 300.
class
VCF_CONTACT_SPLITTER
inherit
EL_COMMAND
EL_PLAIN_TEXT_LINE_STATE_MACHINE
rename
make as make_machine
end
EL_MODULE_LOG
EL_MODULE_FILE_SYSTEM
create
default_create, make
feature {EL_SUB_APPLICATION} -- Initialization
make (a_vcf_path: EL_FILE_PATH)
do
make_machine
vcf_path := a_vcf_path
create output_dir.make (vcf_path.to_string)
output_dir.remove_extension
File_system.make_directory (output_dir)
create record_lines.make (10)
end
feature -- Basic operations
execute
local
source_lines: EL_FILE_LINE_SOURCE
do
log.enter ("execute")
create source_lines.make (vcf_path)
source_lines.set_encoding (source_lines.Encoding_iso_8859, 1)
do_once_with_file_lines (agent find_begin_record, source_lines)
log.exit
end
feature {NONE} -- State handlers
find_begin_record (line: EL_ASTRING)
do
if line.starts_with ("BEGIN:") then
state := agent find_end_record
find_end_record (line)
end
end
find_end_record (a_line: EL_ASTRING)
local
output_path: EL_FILE_PATH
file: PLAIN_TEXT_FILE
parts: LIST [EL_ASTRING]
first_name, last_name: EL_ASTRING
do
record_lines.extend (a_line)
if a_line.starts_with ("END:") then
output_path := output_dir + record_id
output_path.add_extension ("vcf")
log_or_io.put_path_field ("Writing", output_path)
log_or_io.put_new_line
create file.make_open_write (output_path)
across record_lines as line loop
file.put_string (line.item.to_utf8)
file.put_character ('%R'); file.put_new_line -- Windows new line
end
file.close
record_lines.wipe_out
state := agent find_begin_record
elseif a_line.starts_with ("N:") then
parts := a_line.split (';')
last_name := parts.i_th (1); first_name := parts.i_th (2)
last_name.remove_head (2)
record_lines.finish
record_lines.replace (Name_template #$ [first_name, last_name])
elseif a_line.starts_with ("X-IRMC-LUID:") then
record_id := a_line.substring (a_line.index_of (':', 1) + 1, a_line.count)
end
end
feature {NONE} -- Implementation
vcf_path: EL_FILE_PATH
output_dir: EL_DIR_PATH
record_id: EL_ASTRING
record_lines: ARRAYED_LIST [EL_ASTRING]
feature {NONE} -- Constants
Name_template: EL_ASTRING
once
Result := "N:$S;$S"
end
end
EL_ASTRING
Anyone wondering what the EL_ASTRING class is about should read ISO-8859 is dead, long live ISO-8859. Last year I made the brave (or foolish) decision to use it throughout Eiffel-Loop. So far I am happy with the results but I am aware it may turn some people off using Eiffel-Loop, but I hope I am wrong about this.
I will make news about EL_ASTRING the subject of another article.
Addendum
I feel I should apologise that this is not immediately downloadable from https://github.com/finnianr/eiffel-loop. This is because I use an NTFS partition for my code on Linux to make it easier to do cross platform development. The downside is I need to run a script to copy the source trees into a Linux partition and fix the file permissions etc. I need to fix this broken script before I can share the code. Will post a comment when this is done.
What about UTF-8?
In anticipation of the question, what if the source is encoded as UTF-8?
A. EL_FILE_LINE_SOURCE changes it's encoding if it finds a BOM at the file start. EL_PLAIN_TEXT_FILE encodes accordingly writing a BOM if required.