Programming Python (91 page)

Authors: Mark Lutz

Unicode (Internationalized) text support

Finally, because Python 3.X now fully supports Unicode text, this
version of PyEdit does, too: it allows text of arbitrary Unicode
encodings and character sets to be opened and saved in files, viewed
and edited in its GUI, and searched by its directory search utility.
This support is reflected in PyEdit’s user interface in a variety of
ways:

  • Opens must ask the user for an encoding (suggesting the
    platform default) if one is not provided by the client application
    or configuration

  • Saves of new files must ask for an encoding if one is not
    provided by configuration

  • Display and edit must rely on the GUI toolkit’s own support
    for Unicode text

  • Grep directory searches must allow for input of an encoding
    to apply to all files in the tree and skip files that fail to
    decode, as described earlier

The net result is to support Internationalized text which may
differ from the platform’s default encoding. This is particularly
useful for text files fetched over the Internet by email or FTP.
Chapter 14’s PyMailGUI, for example, uses an embedded PyEdit object
to view text attachments of arbitrary origin and encoding. The Grep
utility’s Unicode support was described earlier; the remainder of
this model essentially reduces to file opens and saves, as the next
section describes.
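The Grep behavior listed above (one encoding applied to every file in
the tree, with undecodable files skipped) can be sketched as follows.
This is a minimal illustration, not PyEdit's actual code; the function
name grep_tree and its return shape are assumptions made for the
example.

```python
import os

def grep_tree(root_dir, search_text, encoding):
    """Search every file under root_dir for search_text.

    Returns (matches, skipped): matches is a list of
    (path, line_number, line) tuples; skipped lists files that could
    not be read or decoded with the given encoding.
    """
    matches, skipped = [], []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Read the whole file first, so a decode failure midway
                # cannot leave partial matches behind
                with open(path, encoding=encoding) as f:
                    lines = f.read().splitlines()
            except (UnicodeError, OSError):
                skipped.append(path)      # skip files that fail to decode
                continue
            for number, line in enumerate(lines, start=1):
                if search_text in line:
                    matches.append((path, number, line))
    return matches, skipped
```

A caller might report the skipped list to the user, so that a poor
encoding choice for the tree does not go unnoticed.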

Unicode file and display model

Because strings are always Unicode code-point strings once
they are created in memory, Unicode support really means supporting
arbitrary encodings for text files when they are read and written.
Recall that text can be stored in files in a variety of Unicode
encoding format schemes; strings are decoded from these formats when
read and encoded to them when written. Unless text is always stored
in files using the platform’s default encoding, we need to know
which encoding to use, both to load and to save.
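As a quick reminder of this model, the following sketch encodes one
in-memory string two ways and decodes it back. The sample text is
arbitrary, and using locale.getpreferredencoding to stand in for "the
platform's default" is an assumption of this example.

```python
import locale

text = 'Grüße, мир'                    # str: a sequence of code points

utf8_bytes = text.encode('utf-8')      # same text, two byte layouts
utf16_bytes = text.encode('utf-16')
assert utf8_bytes != utf16_bytes

# Decoding with the matching name restores the original string
assert utf8_bytes.decode('utf-8') == text
assert utf16_bytes.decode('utf-16') == text

# The encoding that open() falls back on when none is given
print(locale.getpreferredencoding(False))
```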

To make this work, PyEdit uses the approaches described in detail
in Chapter 9, which we won’t repeat in full here. In brief, though,
tkinter’s Text widget accepts content as either str or bytes and
always returns it as str. PyEdit maps this interface to and from
Python file objects as follows:

Input files (Open)

Decoding from file bytes to strings in general requires the name of
an encoding type that is compatible with data in the file, and fails
if the two do not agree (e.g., decoding 8-bit data as ASCII). In
some cases, the encoding of the text file to be opened may be
unknown.

To load, PyEdit first tries to open input files in text mode to
read str strings, using an encoding obtained from a variety of
sources: a method argument for a known type (e.g., from headers of
email attachments or source files opened by demos), a user dialog
reply, a configuration module setting, and the platform default.
Whenever prompting users for an open encoding, the dialog is
prefilled with the first choice implied by the configuration file,
as a default and suggestion.

If all these encoding sources fail to decode, the file is opened
in binary mode to read text as bytes without an encoding name,
effectively delegating encoding issues to the Tk GUI library; in
this case, any \r\n end-lines are manually converted to \n on
Windows so they correctly display and save later. Binary mode is
used only as a last resort, to avoid relying on Tk’s policies and
limited character set support for raw bytes.
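The open policy just described might be sketched along these lines.
This is an illustration under stated assumptions rather than PyEdit's
own code: the name load_text is hypothetical, and the caller is assumed
to have already collected candidate encoding names (method argument,
dialog reply, configuration setting, platform default) in priority
order.

```python
import sys

def load_text(path, candidate_encodings):
    """Return (content, encoding_used).

    content is a str when some candidate decodes the file; otherwise
    it is raw bytes and encoding_used is None.
    """
    for encoding in candidate_encodings:
        if not encoding:
            continue                     # source unavailable or disabled
        try:
            with open(path, 'r', encoding=encoding) as f:
                return f.read(), encoding
        except (UnicodeError, LookupError):
            pass                         # wrong or unknown name: try next
    # Last resort: binary mode, leaving decoding to the display layer
    with open(path, 'rb') as f:
        data = f.read()
    if sys.platform.startswith('win'):
        data = data.replace(b'\r\n', b'\n')   # text mode would map these
    return data, None
```

Returning the encoding that worked lets a caller remember it for later
saves of the same file.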

Text processing

The tkinter Text widget returns its content on request as str
strings, regardless of whether str or bytes were inserted. Because
of that, all text processing of content fetched from the GUI is
conducted in terms of str Unicode strings here.

Output files (Save, Save As)

Encoding from strings to file bytes is generally more
flexible than decoding and need not use the same encoding from
which the string was decoded, but can also fail if the chosen
scheme is too narrow to handle the string’s content (e.g.,
encoding 8-bit text to ASCII).

To save, PyEdit opens output files in text mode to perform end-line
mappings and Unicode encoding of str content. An encoding name is
again fetched from one of a variety of sources: the same encoding
used when the file was first opened or saved (if any), a user dialog
reply, a configuration module setting, and the platform default.
Unlike opens, save dialogs that prompt for encodings are prefilled
with the known encoding, if there is one, as a suggestion;
otherwise, the dialog is prefilled with the next configured choice
as a default, as for opens.
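A comparable sketch for saves follows. As before, save_text is a
hypothetical name, and the list of candidate encodings is assumed to
have been assembled by the caller from the sources named above. Testing
the encoding before opening the file is a design choice of this
example, made so that a failed attempt cannot truncate an existing
file.

```python
def save_text(path, content, candidate_encodings):
    """Write str content with the first candidate able to encode it.

    Returns the encoding name used, or None if every candidate failed
    (in which case the file is left untouched).
    """
    for encoding in candidate_encodings:
        if not encoding:
            continue                     # source unavailable or disabled
        try:
            content.encode(encoding)     # check before touching the file
        except (UnicodeError, LookupError):
            continue                     # too narrow or unknown: try next
        # Text mode performs both end-line mapping and encoding
        with open(path, 'w', encoding=encoding) as f:
            f.write(content)
        return encoding
    return None
```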

The user input dialog on opens and saves is the only GUI
implication of these policies; other options are selected in
configuration module assignments. Since it’s impossible to predict
all possible use case scenarios, PyEdit takes a liberal approach: it
supports all conceivable modes, and allows the way it obtains file
encodings to be heavily tailored by users in the package’s own
textConfig module. It attempts one encoding name source after
another, if enabled in textConfig, until it finds an encoding that
works. This aims to provide maximum flexibility in the face of an
uncertain Unicode world.

For example, subject to settings in the configuration file,
saves reuse the encoding used for the file when it was opened or
initially saved, if known. Both new files begun from scratch (with
New or manual text inserts) and files opened in binary mode as a
last resort have no known encoding until saved, but files previously
opened as text do. Also subject to configuration file settings, we
may prompt users for an encoding on Save As (and possibly Save)
because they may have a preference for new files they create. We
also may prompt when opening an existing file, because this requires
its current encoding; although the user may not always know what
this is (e.g., files fetched over the Internet), the user may wish
to provide it in others. Rather than choosing a course of action in
such cases, we rely on user configuration.

All of this is really relevant only to PyEdit clients that
request an initial file load or allow files to be opened and saved
in the GUI. Because content can be inserted as str or bytes, clients
can always open and read input files themselves prior to creating a
text editor object and insert the text manually for viewing.
Moreover, clients can fetch content manually and save in any fashion
preferred. Such a manual approach might prove useful if PyEdit’s
policies are undesirable for a given context. Since the Text widget
always returns content as a str, the rest of this program is
unaffected by the data type of text inserted.

Keep in mind that these policies are still subject to the
Unicode support and constraints of the underlying Tk GUI toolkit, as
well as Python’s tkinter interface to it. Although PyEdit allows
text to be loaded and saved in arbitrary Unicode encodings, it
cannot guarantee that the GUI library will display such text as you
wish. That is, even if we get the Unicode story right on the Python
side of the fence, we’re still at the mercy of other software layers
which are beyond the scope of this book. Tk seems to be robust
across a wide range of character sets if we pass it already decoded
Python str Unicode strings (see the Internationalization support in
Chapter 14’s PyMailGUI for samples), but your mileage might vary.

Unicode options and choices

Also keep in mind that the Unicode policies adopted in PyEdit
reflect the use cases of its sole current user, and have not been
broadly tested for ergonomics and generality; as a book example,
this doesn’t enjoy the built-in test environment of open source
projects. Other schemes and source orderings might work well, too,
and it’s impossible to guess the preferences of every user in every
context. For instance:

  • It’s not clear if user prompts should be attempted before
    configuration settings, or vice-versa.

  • Perhaps we also should always ask the user for an encoding
    as a last resort, irrespective of configuration settings.

  • For saves, we could also try to guess an encoding to apply
    to the str content (e.g., try UTF-8, Latin-1, and other common
    types), but our guess may not be what the user has in mind.

  • It’s likely that users will wish to save a file in the
    same encoding with which it was first opened, or initially saved
    if started from scratch. PyEdit provides support to do so, or
    else the GUI might ask for a given file’s encoding more than
    once. However, because some users might also want to use Save
    again to overwrite the same file with a different encoding, this
    can be disabled in the configuration module. The latter role
    might sound like a Save As, but the next bullet explains why it
    may not.

  • Similarly, it’s not obvious if Save As should also reuse
    the encoding used when the file was first opened or initially
    saved or ask for a new one—is this a new file entirely, or a
    copy of the prior text with its known encoding under a new name?
    Because of such ambiguities, we allow the known-encoding memory
    feature to be disabled for Save As, or for both Save and Save As
    in the configuration module. As shipped, it is enabled for Save
    only, not Save As. In all cases, save encoding prompt dialogs
    are prefilled with a known encoding name as a default.

  • The ordering of choice seems debatable in general. For
    instance, perhaps Save As should fall back on the known encoding
    if not asking the user; as is, if configured to not ask and not
    use a known encoding, this operation will fall back on saving
    per an encoding in the configuration file or the platform
    default (e.g., UTF-8), which may be less than ideal for email
    parts of known encodings.

And so on. Because such user interface choices require wider
use to resolve well, the general and partly heuristic policy here is
to support every option for illustration purposes in this book, and
rely on user configuration settings to resolve choices. In practice,
though, such wide flexibility may turn out to be overkill; most
users probably just require one of the policies supported
here.

It may also prove better to allow Unicode policies to be
selected in the GUI itself, instead of coded in a configuration
module. For instance, perhaps every Open, Save, and Save As should
allow a Unicode encoding selection, which defaults to the last known
encoding, if any. Implementing this as a pull-down encoding list or
entry field in the Save and Open dialogs would avoid an extra pop up
and achieve much the same flexibility.

In PyEdit’s current implementation, enabling user prompts in
the configuration file for both opens and saves will have much the
same effect, and at least based upon use cases I’ve encountered to
date, that is probably the best policy to adopt for most
contexts.

Hence, as shipped:

  • Open uses a passed-in encoding, if any, or else prompts
    for an encoding name first

  • Save reuses a known encoding if it has one, and otherwise
    prompts for new file saves

  • Save As always prompts for an encoding name first for the
    new file

  • Grep allows an encoding to be input in its dialog to apply
    to the full tree searched
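To make the Open ordering concrete, here is one way the sequence of
encoding sources could be expressed. Every name in it is an assumption
made for illustration: the function, its parameters, and the two
settings (opens_ask_user, opens_encoding) are stand-ins and are not
taken from the real textConfig module. A generator is used so that the
user is prompted only if an earlier source has already failed.

```python
import locale

def open_encoding_sources(passed_in=None, ask_user=None,
                          opens_ask_user=True, opens_encoding=''):
    """Yield encoding names to try on Open, most preferred first.

    ask_user is a callable taking a suggested default and returning
    the user's reply (or None on cancel). All names are hypothetical.
    """
    platform_default = locale.getpreferredencoding(False)
    if passed_in:
        yield passed_in                  # supplied by the client application
    if opens_ask_user and ask_user:
        reply = ask_user(opens_encoding or platform_default)
        if reply:
            yield reply                  # user's dialog reply
    if opens_encoding:
        yield opens_encoding             # configuration module setting
    yield platform_default               # final fallback
```

The names produced can be tried one at a time until a decode succeeds;
turning off opens_ask_user removes the dialog from the sequence
entirely.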

On the other hand, because the platform default will probably
work silently without extra GUI complexity for the vast majority of
users anyhow, the textConfig setting can prevent the pop ups
altogether and fall back on an
explicit encoding or platform default. Ultimately, structuring
encoding selection well requires the sort of broad user experience
and feedback which is outside this book’s scope, not the guesses of
a single developer. As always, feel free to tailor as you
like.

See the test subdirectory in the examples for a few Unicode text
files to experiment with opening and saving, in conjunction with
textConfig changes. As suggested when we saw Figures 11-5 and 11-6,
this directory contains files that use International character sets,
saved in different encodings. For instance, file email-part--koi8-r
there is formatted per the Russian encoding koi8-r, and
email-part--koi8-r--utf8 is the same file saved in UTF-8 encoding
format; the latter works well in Notepad on Windows, but the former
will only display properly when giving an explicit encoding name to
PyEdit.
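The difference between the two test files can be reproduced in
miniature without them. The sample word below is an arbitrary stand-in
for the files' content; the point is only that koi8-r bytes need their
own encoding name to decode properly.

```python
text = 'Привет'                         # arbitrary Russian sample text

koi8_bytes = text.encode('koi8-r')      # one byte per Cyrillic letter
utf8_bytes = text.encode('utf-8')       # two bytes per Cyrillic letter
assert (len(koi8_bytes), len(utf8_bytes)) == (6, 12)

# The koi8-r bytes are not valid UTF-8, so a UTF-8 open fails outright
try:
    koi8_bytes.decode('utf-8')
    decoded_as_utf8 = True
except UnicodeDecodeError:
    decoded_as_utf8 = False
assert not decoded_as_utf8

# Latin-1 accepts any byte, so the wrong name yields garbage silently
assert koi8_bytes.decode('latin-1') != text

# Only the matching encoding name restores the text
assert koi8_bytes.decode('koi8-r') == text
```

The Latin-1 case is the more troublesome one in practice: no exception
is raised, so an editor has no way to notice that the wrong encoding
name was supplied.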

Better yet, make a few Unicode files yourself, by changing
textConfig to hardcode encodings or always ask for encodings; thanks
largely to Python 3.X’s Unicode support, PyEdit allows you to save
and load in whatever encoding you wish.

More on Quit checks: The <Destroy> event revisited

Before we get to the code, one of version 2.1’s changes merits a few
additional words, because it illustrates the fundamentals of tkinter
window closure in a realistic context. We learned in Chapter 8 that
tkinter also has a <Destroy> event for the bind method which is run
when windows and widgets are destroyed. Although we could bind this
event on PyEdit windows or their text widgets to catch destroys on
program exit, this won’t quite help with the use case here. Scripts
cannot generally do anything GUI-related in this event’s callback,
because the GUI is being torn down. In particular, both testing a
text widget for modifications and fetching its content in a
<Destroy> handler can fail with an exception. Popping up a save
verification dialog at this point may act oddly, too: it only shows
up after some of the window’s widgets may have already been erased
(including the text widget whose contents the user may wish to
inspect and save!), and it might sometimes refuse to go away
altogether.

As also mentioned in Chapter 8, running a quit method call does not
trigger any <Destroy> events, but does trigger a fatal Python error
message on exit. To use destroy events at all, PyEdit would have to
be redesigned to close windows on Quit requests with the destroy
method only, and rely on the Tk root window destruction protocol for
exits; immediate shutdowns would be unsupported, or require tools
such as sys.exit. Since <Destroy> doesn’t allow GUI operations
anyhow, this change is unwarranted. Code after mainloop won’t help
here either, because mainloop is called outside PyEdit’s code, and
this is far too late to detect text changes and save in any event
(pun nearly accidental).

In other words, <Destroy> won’t help: it doesn’t support the goal
of verifying saves on window closes, and it doesn’t address the
issue of quit and destroy calls run for widgets outside the scope of
PyEdit window classes. Because of such complications, PyEdit instead
relies on checking for changes in each individual window before it
is closed, and for changes in its cross-process window list before
quits in any of its main windows. Applications that follow its
expected window model check for changes automatically. Applications
that embed a PyEdit as a component of a larger GUI, or use it in
other ways that are outside PyEdit’s control, are responsible for
testing for edit changes on closes if they should be saved, before
the PyEdit object or its widgets are destroyed.
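The check-before-close approach can be illustrated with a small
sketch. It is not PyEdit's implementation: the function names are
hypothetical, and the decision logic is split out from the GUI wiring
only so that it can be exercised without a display. The key point is
that the test runs in the window-manager close handler, while the text
widget still exists.

```python
def may_close(text_modified, confirm_discard):
    """Return True when a window can be closed without losing edits."""
    return (not text_modified) or bool(confirm_discard())

def make_close_handler(window, text, confirm_discard):
    """Build a WM_DELETE_WINDOW callback that checks for changes first."""
    def on_close():
        # Widgets are still alive here, so edit_modified() is safe
        if may_close(text.edit_modified(), confirm_discard):
            window.destroy()
    return on_close

def demo():
    """Open one window wired to the handler above (needs a display)."""
    from tkinter import Tk, Text
    from tkinter.messagebox import askyesno
    root = Tk()
    text = Text(root, undo=1, autoseparators=1)
    text.pack()
    ask = lambda: askyesno('Close', 'Text has changed: discard changes?')
    root.protocol('WM_DELETE_WINDOW', make_close_handler(root, text, ask))
    root.mainloop()
```

Calling demo() opens the window; typing some text and then clicking
the window's close button should bring up the confirmation before
anything is destroyed.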

To experiment with the <Destroy> event’s behavior yourself, see
file destroyer.py in the book examples package; it simulates what
PyEdit would need to do on <Destroy>. Here is the crucial subset of
its code, with comments that explain behavior:

from tkinter import *                        # Tk, Text, Button, END, mainloop
from tkinter.messagebox import askyesno

def onDeleteRequest():
    print('Got wm delete')                   # on window X: can cancel destroy
    root.destroy()                           # triggers <Destroy>

def doRootDestroy(event):
    print('Got event <Destroy>')             # called for each widget in root
    if event.widget == text:
        print('for text')
        print(text.edit_modified())          # <= Tcl error: invalid widget
        ans = askyesno('Save stuff?', 'Save?')        # <= may behave badly
        if ans: print(text.get('1.0', END+'-1c'))     # <= Tcl error: invalid widget

root = Tk()
text = Text(root, undo=1, autoseparators=1)
text.pack()
root.bind('<Destroy>', doRootDestroy)                 # for root and children
root.protocol('WM_DELETE_WINDOW', onDeleteRequest)    # on window X button

Button(root, text='Destroy', command=root.destroy).pack()   # triggers <Destroy>
Button(root, text='Quit', command=root.quit).pack()         # <= fatal Python error,
mainloop()                                                  # no <Destroy> on quit()

See the code listings in the next section for more on all of the
above. Also be sure to see the main file’s documentation string for a
list of suggested enhancements and open issues (noted under “TBD”).
PyEdit is largely designed to work according to my preferences, but
it’s open to customization for yours.
