Let's talk about the encoding library
Do you know about the "encoding" library?
Overview
The library converts text from one encoding to another. It also provides a way to print Unicode text to the console, which ANY.io.put_string (...) does not support by default.
Where can I find it?
- with EiffelStudio: $ISE_LIBRARY/library/encoding
- subversion: https://svn.eiffel.com/eiffelstudio/trunk/Src/library/encoding
The main interfaces
SYSTEM_ENCODINGS
This class provides the most commonly used encodings, such as utf8, utf32, and ISO-8859-1. It also offers a convenient way to get the encoding of the system or of the console.
ENCODING
The main interface for converting text from one encoding to another, through the feature:

	convert_to (a_to_encoding: ENCODING; a_string: READABLE_STRING_GENERAL)
			-- Convert `a_string' from current encoding to `a_to_encoding'.
			-- If either current or `a_to_encoding' is not `is_valid', or an error occurs during conversion,
			-- `last_conversion_successful' is unset.
			-- Conversion result can be retrieved via `last_converted_string' or `last_converted_stream'.
		require
			a_to_encoding_not_void: a_to_encoding /= Void
			a_string_not_void: a_string /= Void
Converting text from one encoding to another
For instance, to convert a text from UTF-32 to ISO-8859-1:

	class TEST

	inherit
		SYSTEM_ENCODINGS

	feature

		test
			local
				s: STRING
			do
					-- Convert from the UTF-32 encoding to ISO-8859-1.
				utf32.convert_to (iso_8859_1, {STRING_32} "my unicode text")
					-- The result is kept by the source encoding object.
				s := utf32.last_converted_string_8
			end

	end
There are a few useful status reports:
- {ENCODING}.last_conversion_successful: BOOLEAN tells whether the conversion went well.
- {ENCODING}.last_conversion_lost_data: BOOLEAN tells whether the last conversion lost data (this can happen, for instance, when converting true Unicode text to ISO-8859-1).
- {ENCODING}.last_converted_string_32: STRING_32 gives the converted text as a Unicode string.
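As a minimal sketch of how these status reports could be combined with the conversion above: the class name STATUS_DEMO and the printed messages are made up for illustration, and the ordering (check success before reading the result) is an assumption about sensible use rather than something the library documentation is quoted on here.

```eiffel
class STATUS_DEMO

inherit
	SYSTEM_ENCODINGS

create
	make

feature {NONE} -- Initialization

	make
			-- Convert a UTF-32 text to ISO-8859-1 and inspect the status reports.
		local
			s: STRING_8
		do
			utf32.convert_to (iso_8859_1, {STRING_32} "my unicode text")
			if utf32.last_conversion_successful then
					-- Read the result only once the conversion is known to have succeeded.
				s := utf32.last_converted_string_8
				if utf32.last_conversion_lost_data then
						-- Some characters had no equivalent in the target encoding.
					io.put_string ("Converted, but some data was lost.%N")
				end
				io.put_string (s)
				io.put_new_line
			else
				io.put_string ("Conversion failed.%N")
			end
		end

end
```

Note that the status queries are called on utf32, the encoding object that performed the conversion, not on the target encoding.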
You can also create a custom ENCODING by passing a code page. The best-known code pages are available via CODE_PAGE_CONSTANTS. Note that you can use the "i18n" library to get a code page dynamically by its name.
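A sketch of creating an ENCODING from a code page might look like the following. It assumes that the creation procedure of ENCODING is make and takes the code page as a string, and that CODE_PAGE_CONSTANTS exposes an iso_8859_1 constant; check both against the library version you use. The class name CUSTOM_ENCODING_DEMO is hypothetical.

```eiffel
class CUSTOM_ENCODING_DEMO

inherit
	SYSTEM_ENCODINGS

create
	make

feature {NONE} -- Initialization

	make
			-- Build an encoding from a code page and use it as a conversion target.
		local
			l_target: ENCODING
		do
				-- Assumption: `make' takes the code page as a string.
				-- This code page gives the same encoding as the predefined `iso_8859_1';
				-- another constant or code page name could be substituted.
			create l_target.make ((create {CODE_PAGE_CONSTANTS}).iso_8859_1)
			if l_target.is_valid then
				utf32.convert_to (l_target, {STRING_32} "my unicode text")
				if utf32.last_conversion_successful then
					io.put_string (utf32.last_converted_string_8)
					io.put_new_line
				end
			else
				io.put_string ("Code page not supported on this platform.%N")
			end
		end

end
```

Checking is_valid first avoids attempting a conversion with a code page that the platform does not know.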
Write unicode into the console
Thanks to the class LOCALIZED_PRINTER, it is possible to output Unicode to the console, using either localized_print (a_str: detachable READABLE_STRING_GENERAL) or localized_print_error (a_str: detachable READABLE_STRING_GENERAL) (to write to stderr). Both assume that `a_str' is a UTF-32 string.
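For illustration, a small program using the two features could be sketched as follows. The class name CONSOLE_DEMO and the sample strings are invented, and inheriting from LOCALIZED_PRINTER is only one way to reach the features (creating an instance is assumed to work as well).

```eiffel
class CONSOLE_DEMO

inherit
	LOCALIZED_PRINTER

create
	make

feature {NONE} -- Initialization

	make
			-- Print Unicode text to standard output and to standard error.
		local
			s: STRING_32
		do
				-- %/233/ is the character code of e-acute, written as an
				-- escape so that this source file stays ASCII-only.
			s := {STRING_32} "caf%/233/%N"
			localized_print (s)
			localized_print_error ({STRING_32} "reported on stderr%N")
		end

end
```

Whether the character is displayed correctly still depends on the console encoding and on the font in use.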
Alternative solutions
Note that EiffelBase includes a UTF_CONVERTER class that is specialized for UTF-* conversions and may be enough for the needs of most applications. The encoding library is still needed for other specific encodings, and also to output Unicode to the console.
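As a rough sketch of the UTF_CONVERTER route, the following round-trips a text through UTF-8. It assumes the features utf_32_string_to_utf_8_string_8 and utf_8_string_8_to_string_32 are present in your version of EiffelBase, and that UTF_CONVERTER is an expanded class so the local needs no creation instruction. The class name UTF_DEMO is hypothetical.

```eiffel
class UTF_DEMO

create
	make

feature {NONE} -- Initialization

	make
			-- Encode a UTF-32 text as UTF-8, then decode it again.
		local
			utf: UTF_CONVERTER
			s32: STRING_32
			s8: STRING_8
		do
			s32 := {STRING_32} "caf%/233/"
				-- Encode the text as UTF-8 bytes stored in a STRING_8.
			s8 := utf.utf_32_string_to_utf_8_string_8 (s32)
				-- Decode the UTF-8 bytes and compare with the original text.
			if utf.utf_8_string_8_to_string_32 (s8).same_string (s32) then
				io.put_string ("Round trip preserved the text.%N")
			end
		end

end
```

No ENCODING object is involved here, which is why this route only covers conversions between UTF-* forms.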
Related library
i18n: the internationalization and localization library
- i18n stands for InternationalizatioN (I + 18 characters + N).
- It provides Internationalization and localization functionalities.
- Please see $ISE_LIBRARY/library/i18n (or subversion https://svn.eiffel.com/eiffelstudio/trunk/Src/library/i18n )
- Documentation: http://dev.eiffel.com/Internationalization/User_guide
- Among other functionalities, it can provide the code page value of an encoding, to be used with the encoding library.