Programming Python (5 page)

Authors: Mark Lutz

[2] No, I’m serious. In the Python classes I teach, I had for many
years regularly used the name “Bob Smith,” age 40.5, and jobs
“developer” and “manager” as a supposedly fictitious database
record—until a class in Chicago, where I met a student named Bob
Smith, who was 40.5 and was a developer and manager. The world is
stranger than it seems.

Step 2: Storing Records Persistently

So far, we’ve settled on a dictionary-based representation for our
database of records, and we’ve reviewed some Python data structure
concepts along the way. As mentioned, though, the objects we’ve seen so
far are temporary—they live in memory and they go away as soon as we exit
Python or the Python program that created them. To make our people
persistent, they need to be stored in a file of some sort.

Using Formatted Files

One way to keep our data around between program runs is to write all
the data out to a simple text file, in a formatted way. Provided the
saving and loading tools agree on the format selected, we’re free to use
any custom scheme we like.

Test data script

So that we don’t have to keep working interactively, let’s first
write a script that initializes the data we are going to store (if
you’ve done any Python work in the past, you know that the interactive
prompt tends to become tedious once you leave the realm of simple
one-liners). Example 1-1 creates the sort of records and database
dictionary we’ve been working with so far, but because it is a module,
we can import it repeatedly without having to retype the code each
time. In a sense, this module is a database itself, but its program
code format doesn’t support automatic or end-user updates as is.

Example 1-1. PP4E\Preview\initdata.py

# initialize data to be stored in files, pickles, shelves
# records
bob = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}
sue = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'hdw'}
tom = {'name': 'Tom', 'age': 50, 'pay': 0, 'job': None}
# database
db = {}
db['bob'] = bob
db['sue'] = sue
db['tom'] = tom
if __name__ == '__main__':       # when run as a script
    for key in db:
        print(key, '=>\n ', db[key])

As usual, the __name__ test at the bottom of Example 1-1 is
true only when this file is run, not when it is imported. When run as
a top-level script (e.g., from a command line, via an icon click, or
within the IDLE GUI), the file’s self-test code under this test dumps
the database’s contents to the standard output stream (remember,
that’s what print function-call statements do by default).
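
To see the __name__ test at work without touching the book's files, here
is a minimal sketch. The one-line module name_demo.py is hypothetical and
is written to a temporary folder; runpy.run_path stands in for launching
the file from a command line, and importlib.import_module performs a
normal import.

```python
import importlib
import os
import runpy
import sys
import tempfile

# Hypothetical one-line module that records how it was loaded.
source = "mode = 'script' if __name__ == '__main__' else 'import'\n"

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'name_demo.py')
with open(path, 'w') as f:
    f.write(source)

# Run the file as a top-level script: __name__ is '__main__'.
as_script = runpy.run_path(path, run_name='__main__')['mode']

# Import the same file as a module: __name__ is 'name_demo'.
sys.path.insert(0, folder)
as_import = importlib.import_module('name_demo').mode

print(as_script, as_import)
```

The same file takes different branches depending on how it is loaded,
which is why self-test code under the test stays silent on imports.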

Here is the script in action being run from a system command
line on Windows. Type the following command in a Command Prompt window
after a cd to the directory where the file is stored, and use a similar
console window on other types of computers:

...\PP4E\Preview> python initdata.py
bob =>
  {'job': 'dev', 'pay': 30000, 'age': 42, 'name': 'Bob Smith'}
sue =>
  {'job': 'hdw', 'pay': 40000, 'age': 45, 'name': 'Sue Jones'}
tom =>
  {'job': None, 'pay': 0, 'age': 50, 'name': 'Tom'}

File name conventions

Since this is our first source file (a.k.a. “script”), here are three
usage notes for this book’s examples:

  • The text ...\PP4E\Preview> in the first line of the preceding
    example listing stands for your operating system’s prompt, which
    can vary per platform; you type just the text that follows this
    prompt (python initdata.py).

  • As in all examples in this book, the system prompt also gives
    the directory in the downloadable book examples package where this
    command should be run. When running this script using a
    command line in a system shell, make sure the shell’s current
    working directory is PP4E\Preview. This can matter for
    examples that use files in the working directory.

  • Similarly, the label that precedes every example file’s code
    listing tells you where the source file resides in the examples
    package. Per the Example 1-1 listing label shown earlier, this
    script’s full filename is PP4E\Preview\initdata.py in the
    examples tree.

We’ll use these conventions throughout the book; see the Preface
for more on getting the examples if you wish to work along. I
occasionally give more of the directory path in system prompts when
it’s useful to provide the extra execution context, especially in the
system part of the book (e.g., a “C:\” prefix from Windows or more
directory names).

Script start-up pointers

I gave pointers for using the interactive prompt earlier. Now
that we’ve started running script files, here are also a few quick
startup pointers for using Python scripts in general:

  • On some platforms, you may need to type the full directory
    path to the Python program on your machine; if Python isn’t on
    your system path setting on Windows, for example, replace
    python in the command with C:\Python31\python (this assumes
    you’re using Python 3.1).

  • On most Windows systems you also don’t need to type python
    on the command line at all; just type the file’s name to run it,
    since Python is registered to open “.py” script files.

  • You can also run this file inside Python’s standard IDLE GUI
    (open the file and use the Run menu in the text edit window), and
    in similar ways from any of the available third-party Python IDEs
    (e.g., Komodo, Eclipse, NetBeans, and the Wing IDE).

  • If you click the program’s file icon to launch it on
    Windows, be sure to add an input() call to the bottom of the
    script to keep the output window up. On other systems, icon
    clicks may require a #! line at the top and executable
    permission via a chmod command.

I’ll assume here that you’re able to run Python code one way or
another. Again, if you’re stuck, see other books such as Learning
Python for the full story on launching Python programs.

Data format script

Now, all we have to do is store all of this in-memory data in a
file. There are a variety of ways to accomplish this; one of the most
basic is to write one piece of data at a time, with separators between
each that we can use when reloading to break the data apart.
Example 1-2 shows one way to code this idea.

Example 1-2. PP4E\Preview\make_db_file.py

"""
Save in-memory database object to a file with custom formatting;
assume 'endrec.', 'enddb.', and '=>' are not used in the data;
assume db is dict of dict; warning: eval can be dangerous - it
runs strings as code; could also eval() record dict all at once;
could also dbfile.write(key + '\n') vs print(key, file=dbfile);
"""
dbfilename = 'people-file'
ENDDB = 'enddb.'
ENDREC = 'endrec.'
RECSEP = '=>'

def storeDbase(db, dbfilename=dbfilename):
    "formatted dump of database to flat file"
    dbfile = open(dbfilename, 'w')
    for key in db:
        print(key, file=dbfile)
        for (name, value) in db[key].items():
            print(name + RECSEP + repr(value), file=dbfile)
        print(ENDREC, file=dbfile)
    print(ENDDB, file=dbfile)
    dbfile.close()

def loadDbase(dbfilename=dbfilename):
    "parse data to reconstruct database"
    dbfile = open(dbfilename)
    import sys
    sys.stdin = dbfile
    db = {}
    key = input()
    while key != ENDDB:
        rec = {}
        field = input()
        while field != ENDREC:
            name, value = field.split(RECSEP)
            rec[name] = eval(value)
            field = input()
        db[key] = rec
        key = input()
    return db

if __name__ == '__main__':
    from initdata import db
    storeDbase(db)

This is a somewhat complex program, partly because it has both
saving and loading logic and partly because it does its job the hard
way; as we’ll see in a moment, there are better ways to get objects
into files than by manually formatting and parsing them. For simple
tasks, though, this does work; running Example 1-2 as a script writes the
database out to a flat file. It has no printed output, but we can
inspect the database file interactively after this script is run,
either within IDLE or from a console window where you’re running these
examples (as is, the database file shows up in the current working
directory):

...\PP4E\Preview> python make_db_file.py
...\PP4E\Preview> python
>>> for line in open('people-file'):
...     print(line, end='')
...
bob
job=>'dev'
pay=>30000
age=>42
name=>'Bob Smith'
endrec.
sue
job=>'hdw'
pay=>40000
age=>45
name=>'Sue Jones'
endrec.
tom
job=>None
pay=>0
age=>50
name=>'Tom'
endrec.
enddb.

This file is simply our database’s content with added
formatting. Its data originates from the test data initialization
module we wrote in Example 1-1 because that is the module from which
Example 1-2’s self-test code imports its data. In practice,
Example 1-2 itself could be imported and used to store a variety of
databases and files.
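
One caveat if you do import Example 1-2 elsewhere: its loadDbase reads by
reassigning sys.stdin, and that redirection stays in place after the
call. As an alternative sketch (not the book's code), the same file
format can be parsed by iterating over the file object directly. The
names store_dbase and load_dbase_direct and the demo filename are
hypothetical, and eval is kept only to mirror Example 1-2.

```python
import os
import tempfile

ENDDB = 'enddb.'
ENDREC = 'endrec.'
RECSEP = '=>'

def store_dbase(db, filename):
    """Write db (a dict of dicts) in the Example 1-2 text format."""
    with open(filename, 'w') as dbfile:
        for key, rec in db.items():
            print(key, file=dbfile)
            for name, value in rec.items():
                print(name + RECSEP + repr(value), file=dbfile)
            print(ENDREC, file=dbfile)
        print(ENDDB, file=dbfile)

def load_dbase_direct(filename):
    """Parse the same format without touching sys.stdin."""
    db = {}
    with open(filename) as dbfile:
        lines = (line.rstrip('\n') for line in dbfile)
        for key in lines:                 # each pass starts at a record key
            if key == ENDDB:
                break
            rec = {}
            for field in lines:           # consume fields up to the record end
                if field == ENDREC:
                    break
                name, value = field.split(RECSEP)
                rec[name] = eval(value)   # same trust assumption as Example 1-2
            db[key] = rec
    return db

sample = {'bob': {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'},
          'tom': {'name': 'Tom', 'age': 50, 'pay': 0, 'job': None}}
path = os.path.join(tempfile.mkdtemp(), 'people-file-demo')
store_dbase(sample, path)
reloaded = load_dbase_direct(path)
print(reloaded == sample)
```

Both loops draw from one shared line generator, so the inner loop leaves
the outer loop positioned at the next record key.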

Notice how data to be written is formatted with the as-code repr
call and is re-created with the eval call, which treats strings
as Python code. That allows us to store and re-create things like the
None object, but it is potentially unsafe; you shouldn’t use eval if
you can’t be sure that the database won’t contain malicious code. For
our purposes, however, there’s probably no cause for alarm.
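
If the data might not be trustworthy, one option the examples here do
not use is the standard library's ast.literal_eval, which accepts only
Python literals (numbers, strings, None, and containers of them) and
refuses function calls. The sketch below assumes the same name=>value
line layout as Example 1-2; parse_field is a hypothetical helper.

```python
import ast

def parse_field(field, sep='=>'):
    """Split one 'name=>repr(value)' line and rebuild the value from its literal."""
    name, text = field.split(sep, 1)
    return name, ast.literal_eval(text)

print(parse_field("job=>None"))
print(parse_field("pay=>30000"))
print(parse_field("name=>'Bob Smith'"))

# A value that is code rather than a literal is rejected, not run.
try:
    parse_field("pay=>__import__('os').getcwd()")
except ValueError as exc:
    print('rejected:', type(exc).__name__)
```

The trade-off is that only literal values survive the round trip, which
is enough for the records in this chapter.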

Utility scripts

To test further, Example 1-3 reloads the database
from a file each time it is run.

Example 1-3. PP4E\Preview\dump_db_file.py

from make_db_file import loadDbase
db = loadDbase()
for key in db:
    print(key, '=>\n ', db[key])
print(db['sue']['name'])

And Example 1-4 makes changes by loading, updating, and storing again.

Example 1-4. PP4E\Preview\update_db_file.py

from make_db_file import loadDbase, storeDbase
db = loadDbase()
db['sue']['pay'] *= 1.10
db['tom']['name'] = 'Tom Tom'
storeDbase(db)

Here are the dump script and the update script in action at a
system command line; both Sue’s pay and Tom’s name change between
script runs. The main point to notice is that the data stays around
after each script exits—our objects have become persistent simply
because they are mapped to and from text files:

...\PP4E\Preview> python dump_db_file.py
bob =>
  {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
  {'pay': 40000, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
  {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones

...\PP4E\Preview> python update_db_file.py

...\PP4E\Preview> python dump_db_file.py
bob =>
  {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
  {'pay': 44000.0, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
  {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom Tom'}
Sue Jones

As is, we’ll have to write Python code in scripts or at the
interactive command line for each specific database update we need to
perform (later in this chapter, we’ll do better by providing
generalized console, GUI, and web-based interfaces instead). But at a
basic level, our text file is a database of records. As we’ll learn in
the next section, though, it turns out that we’ve just done a lot of
pointless work.

Using Pickle Files

The formatted text
file scheme of the prior section works, but it has some
major limitations. For one thing, it has to read the entire database
from the file just to fetch one record, and it must write the entire
database back to the file after each set of updates. Although storing
one record’s text per file would work around this limitation, it would
also complicate the program further.

For another thing, the text file approach assumes that the data
separators it writes out to the file will not appear in the data to be
stored: if the characters => happen to appear in the data, for
example, the scheme will fail. We
might work around this by generating XML text to represent records in
the text file, using Python’s XML parsing tools, which we’ll meet later
in this text, to reload; XML tags would avoid collisions with actual
data’s text, but creating and parsing XML would complicate the program
substantially too.
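
The separator problem is easy to reproduce. This short sketch builds one
line the way storeDbase in Example 1-2 would and then splits it the way
loadDbase does; the field name and value are made up for illustration.

```python
RECSEP = '=>'

value = 'a => b'                      # hypothetical field value containing the separator
line = 'note' + RECSEP + repr(value)  # what the storing code would write
print(line)

parts = line.split(RECSEP)            # the loader's split now yields three pieces
print(len(parts))

try:
    name, text = line.split(RECSEP)   # the two-name unpacking used when loading
except ValueError as exc:
    print('load fails:', exc)
```

Splitting only on the first separator would rescue this particular case,
but a record key equal to 'enddb.' would still stop the loader early, so
the scheme stays fragile.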

Perhaps worst of all, the formatted text file scheme is already
complex without being general: it is tied to the
dictionary-of-dictionaries structure, and it can’t handle anything else
without being greatly expanded. It would be nice if a general tool
existed that could translate any sort of Python data to a format that
could be saved in a file in a single step.

That is exactly what the Python pickle module is designed to do. The
pickle module translates an in-memory Python object into a serialized
byte stream, a string of bytes that can be written to any file-like
object. The pickle module also knows how to reconstruct the original
object in memory, given the serialized byte stream: we get back an
object equal to the one we stored. In a sense, the pickle module
replaces proprietary data formats; its serialized format is general and
efficient enough for any program. With pickle, there is no need to
manually translate objects to data when storing them persistently, and
no need to manually parse a complex format to get them back. Pickling
is similar in spirit to XML representations, but it’s both more
Python-specific and much simpler to code.
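
Before turning to files, the round trip can be tried entirely in memory
with pickle.dumps and pickle.loads. This is a minimal sketch using one of
the chapter's sample records; it shows that the serialized form is a
bytes object and that loading rebuilds an equal but separate object.

```python
import pickle

record = {'name': 'Bob Smith', 'age': 42, 'pay': 30000, 'job': 'dev'}

data = pickle.dumps(record)      # serialize to a bytes object
print(type(data).__name__)

copy = pickle.loads(data)        # rebuild an object from the bytes
print(copy == record)            # same value
print(copy is record)            # but a distinct object in memory
```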

The net effect is that pickling allows us to store and fetch
native Python objects as they are and in a single step; we use normal
Python syntax to process pickled records. Despite what it does, the
pickle module is remarkably easy to use. Example 1-5 shows how to
store our records in a flat file, using pickle.

Example 1-5. PP4E\Preview\make_db_pickle.py

from initdata import db
import pickle
dbfile = open('people-pickle', 'wb') # use binary mode files in 3.X
pickle.dump(db, dbfile) # data is bytes, not str
dbfile.close()

When run, this script stores the entire database (the dictionary
of dictionaries defined in Example 1-1) to a flat file named
people-pickle in the current working directory. The pickle module
handles the work of converting the object to a byte string.
Example 1-6 shows how to access the pickled database after it has been
created; we simply open the file and pass its content back to pickle to
remake the object from its serialized bytes.

Example 1-6. PP4E\Preview\dump_db_pickle.py

import pickle
dbfile = open('people-pickle', 'rb') # use binary mode files in 3.X
db = pickle.load(dbfile)
for key in db:
    print(key, '=>\n ', db[key])
print(db['sue']['name'])

Here are these two scripts at work, at the system command line
again; naturally, they can also be run in IDLE, and you can open and
inspect the pickle file by running the same sort of code interactively
as well:

...\PP4E\Preview> python make_db_pickle.py

...\PP4E\Preview> python dump_db_pickle.py
bob =>
  {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
  {'pay': 40000, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
  {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom'}
Sue Jones

Updating with a pickle file is similar to a manually formatted
file, except that Python is doing all of the formatting work for us.
Example 1-7 shows how.

Example 1-7. PP4E\Preview\update_db_pickle.py

import pickle
dbfile = open('people-pickle', 'rb')
db = pickle.load(dbfile)
dbfile.close()
db['sue']['pay'] *= 1.10
db['tom']['name'] = 'Tom Tom'
dbfile = open('people-pickle', 'wb')
pickle.dump(db, dbfile)
dbfile.close()

Notice how the entire database is written back to the file after
the records are changed in memory, just as for the manually formatted
approach; this might become slow for very large databases, but we’ll
ignore this for the moment. Here are our update and dump scripts in
action—as in the prior section, Sue’s pay and Tom’s name change between
scripts because they are written back to a file (this time, a pickle
file):

...\PP4E\Preview> python update_db_pickle.py

...\PP4E\Preview> python dump_db_pickle.py
bob =>
  {'pay': 30000, 'job': 'dev', 'age': 42, 'name': 'Bob Smith'}
sue =>
  {'pay': 44000.0, 'job': 'hdw', 'age': 45, 'name': 'Sue Jones'}
tom =>
  {'pay': 0, 'job': None, 'age': 50, 'name': 'Tom Tom'}
Sue Jones
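
The load, change, and store cycle of Example 1-7 can also be wrapped in a
function that uses with statements, so that files are closed even if an
update raises an exception. This is only a sketch: update_pickle,
give_raise, and the demo filename are hypothetical names, and the demo
writes to a temporary folder rather than the people-pickle file.

```python
import os
import pickle
import tempfile

def update_pickle(path, change):
    """Load the pickled object at path, apply change(db), and write it back."""
    with open(path, 'rb') as dbfile:
        db = pickle.load(dbfile)
    change(db)
    with open(path, 'wb') as dbfile:     # the whole database is rewritten
        pickle.dump(db, dbfile)
    return db

def give_raise(db):
    db['sue']['pay'] *= 1.10
    db['tom']['name'] = 'Tom Tom'

path = os.path.join(tempfile.mkdtemp(), 'people-pickle-demo')
with open(path, 'wb') as dbfile:
    pickle.dump({'sue': {'pay': 40000}, 'tom': {'name': 'Tom'}}, dbfile)

update_pickle(path, give_raise)
with open(path, 'rb') as dbfile:
    stored = pickle.load(dbfile)
print(stored['tom']['name'])
```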

As we’ll learn in Chapter 17, the Python pickling system supports
nearly arbitrary object types: lists, dictionaries, class instances,
nested structures, and more. There, we’ll also learn about the
pickler’s text and binary storage protocols; as of Python 3, all
protocols use bytes objects to represent pickled data, which in turn
requires pickle files to be opened in binary mode for all protocols.
As we’ll see later in this chapter, the pickler and its data format
also underlie shelves and ZODB databases, and pickled class instances
provide both data and behavior for objects stored.
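
As a quick check of the point about protocols, the sketch below pickles a
small nested structure (made up for illustration) under every protocol
number the running Python supports, from 0 up to pickle.HIGHEST_PROTOCOL.

```python
import pickle

nested = {'people': [{'name': 'Sue Jones', 'jobs': ('dev', 'mgr')}], 'count': 1}

for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    data = pickle.dumps(nested, protocol=protocol)
    assert isinstance(data, bytes)         # bytes even for the text-like protocol 0
    assert pickle.loads(data) == nested    # the protocol is detected on load
    print(protocol, len(data))
```

The sizes printed vary by protocol and Python version; the loading call
is the same in every case.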

In fact, pickling is more general than these examples may imply.
Because they accept any object that provides an interface compatible
with files, pickling and unpickling may be used to transfer native
Python objects to a variety of media. Using a network socket, for
instance, allows us to ship pickled Python objects across a network and
provides an alternative to larger protocols such as SOAP and
XML-RPC.
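
To illustrate the file-compatible interface without a real network, this
sketch uses io.BytesIO as an in-memory stand-in for the file or socket
wrapper; a socket's makefile result could be passed to the same calls.
Keep in mind that unpickling can run code embedded in the data, so, as
with eval, it is best reserved for sources you trust.

```python
import io
import pickle

record = {'name': 'Sue Jones', 'age': 45, 'pay': 40000, 'job': 'hdw'}

stream = io.BytesIO()            # in-memory stand-in for a file or socket wrapper
pickle.dump(record, stream)      # the pickler only needs a write() method
stream.seek(0)                   # rewind, as if the receiver now reads the bytes
received = pickle.load(stream)   # the unpickler needs read() and readline()
print(received == record)
```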
