Programming Python

Authors: Mark Lutz


[47] We’ll see three more getfile programs before we leave Internet
scripting. The next chapter’s getfile.py fetches a file with the
higher-level FTP interface instead of using raw socket calls, and its
http-getfile scripts fetch files over the HTTP protocol. Later, Chapter 15
presents a server-side getfile.py CGI script that transfers file contents
over the HTTP port in response to a request made in a web browser client
(files are sent as the output of a CGI script). All four of the download
schemes presented in this text ultimately use sockets, but only the
version here makes that use explicit.

Chapter 13. Client-Side Scripting
“Socket to Me!”

The preceding chapter introduced Internet fundamentals and explored
sockets—the underlying communications mechanism over which bytes flow on
the Net. In this chapter, we climb the encapsulation hierarchy one level
and shift our focus to Python tools that support the client-side
interfaces of common Internet protocols.

We talked about the Internet’s higher-level protocols in the
abstract at the start of the preceding chapter, and you should probably
review that material if you skipped over it the first time around. In
short, protocols define the structure of the conversations that take place
to accomplish most of the Internet tasks we’re all familiar with—reading
email, transferring files by FTP, fetching web pages, and so on.

At the most basic level, all of these protocol dialogs happen over
sockets using fixed and standard message structures and ports, so in some
sense this chapter builds upon the last. But as we’ll see, Python’s
protocol modules hide most of the underlying details: scripts generally
need to deal only with simple objects and methods, and Python automates
the socket and messaging logic required by the protocol.

In this chapter, we’ll concentrate on the FTP and email protocol
modules in Python, and we’ll peek at a few others along the way (NNTP
news, HTTP web pages, and so on). Because it is so prevalent, we will
especially focus on email in much of this chapter, as well as in the two
to follow—we’ll use tools and techniques introduced here in the larger
PyMailGUI and PyMailCGI client and server-side programs of Chapters 14
and 16.

All of the tools employed in examples here are in the standard
Python library and come with the Python system. All of the examples here
are also designed to run on the client side of a network connection—these
scripts connect to an already running server to request interaction and
can be run from a basic PC or other client device (they require only a
server to converse with). And as usual, all the code here is also designed
to teach us something about Python programming in general—we’ll refactor
FTP examples and package email code to show object-oriented programming
(OOP) in action.

In the next chapter, we’ll look at a complete client-side program
example before moving on to explore scripts designed to be run on the
server side instead. Python programs can also produce pages on a web
server, and there is support in the Python world for implementing the
server side of things like HTTP, email, and FTP. For now, let’s focus on
the client. [48]

[48] There is also support in the Python world for other technologies
that some might classify as “client-side scripting,” too, such as
Jython/Java applets; XML-RPC and SOAP web services; and Rich Internet
Application tools like Flex, Silverlight, pyjamas, and AJAX. These were
all introduced early in Chapter 12. Such tools are generally bound up with
the notion of web-based interactions: they either extend the functionality
of a web browser running on a client machine, or simplify web server
access in clients. We’ll study browser-based techniques in Chapters 15 and
16; here, client-side scripting means the client side of common Internet
protocols such as FTP and email, independent of the Web or web browsers.
At the bottom, web browsers are really just desktop GUI applications that
make use of client-side protocols, including those we’ll study here, such
as HTTP and FTP. See Chapter 12 as well as the end of this chapter for
more on other client-side techniques.

FTP: Transferring Files over the Net

As we saw in the preceding chapter, sockets see plenty of action on the
Net. For instance, the last chapter’s getfile example allowed us to
transfer entire files between machines. In practice, though, higher-level
protocols are behind much of what happens on the Net. Protocols run on top
of sockets, but they hide much of the complexity of the network scripting
examples of the prior chapter.

FTP, the File Transfer Protocol, is one of the more commonly used
Internet protocols. It defines a higher-level conversation model that is
based on exchanging command strings and file contents over sockets. By
using FTP, we can accomplish the same task as the prior chapter’s getfile
script, but the interface is simpler, standard, and more general: FTP lets
us ask for files from any server machine that supports FTP, without
requiring that it run our custom getfile script. FTP also supports more
advanced operations such as uploading files to the server, getting remote
directory listings, and more.

Really, FTP runs on top of two sockets: one for passing control
commands between client and server (port 21), and another for transferring
bytes. By using a two-socket model, FTP avoids the possibility of
deadlocks (i.e., transfers on the data socket do not block dialogs on the
control socket). Ultimately, though, Python’s ftplib support module allows
us to upload and download files at a remote server machine by FTP, without
dealing in raw socket calls or FTP protocol details.

Transferring Files with ftplib

Because the Python FTP interface is so easy to use, let’s jump right
into a realistic example. The script in Example 13-1 automatically fetches
(a.k.a. “downloads”) and opens a remote file with Python. More
specifically, this Python script does the following:

  1. Downloads an image file (by default) from a remote FTP site

  2. Opens the downloaded file with a utility we wrote in Example 6-23,
    in Chapter 6
The download portion will run on any machine with Python and an
Internet connection, though you’ll probably want to change the script’s
settings so it accesses a server and file of your own. The opening part
works if your playfile.py supports your platform; see Chapter 6 for
details, and change as needed.

Example 13-1. PP4E\Internet\Ftp\getone.py

#!/usr/local/bin/python
"""
A Python script to download and play a media file by FTP. Uses ftplib, the ftp
protocol handler which uses sockets. Ftp runs on 2 sockets (one for data, one
for control--on ports 20 and 21) and imposes message text formats, but Python's
ftplib module hides most of this protocol's details. Change for your site/file.
"""
import os, sys
from getpass import getpass                   # hidden password input
from ftplib import FTP                        # socket-based FTP tools

nonpassive = False                            # force active mode FTP for server?
filename   = 'monkeys.jpg'                    # file to be downloaded
dirname    = '.'                              # remote directory to fetch from
sitename   = 'ftp.rmi.net'                    # FTP site to contact
userinfo   = ('lutz', getpass('Pswd?'))       # use () for anonymous
if len(sys.argv) > 1: filename = sys.argv[1]  # filename on command line?

print('Connecting...')
connection = FTP(sitename)                    # connect to FTP site
connection.login(*userinfo)                   # default is anonymous login
connection.cwd(dirname)                       # change to the remote directory
if nonpassive:                                # force active FTP if server requires
    connection.set_pasv(False)

print('Downloading...')
localfile = open(filename, 'wb')              # local file to store download
# xfer 1k at a time to localfile
connection.retrbinary('RETR ' + filename, localfile.write, 1024)
connection.quit()
localfile.close()

if input('Open file?') in ['Y', 'y']:
    from PP4E.System.Media.playfile import playfile
    playfile(filename)

Most of the FTP protocol details are encapsulated by the Python ftplib
module imported here. This script uses some of the simplest interfaces in
ftplib (we’ll see others later in this chapter), but they are
representative of the module in general.

To open a connection to a remote (or local) FTP server, create an
instance of the ftplib.FTP object, passing in the string name (domain or
IP style) of the machine you wish to connect to:

connection = FTP(sitename)                  # connect to ftp site

Assuming this call doesn’t throw an exception, the resulting FTP
object exports methods that correspond to the usual FTP operations. In
fact, Python scripts act much like typical FTP client programs—just
replace commands you would normally type or select with method
calls:

connection.login(*userinfo)                 # default is anonymous login
connection.cwd(dirname)                     # change to the remote directory

Once connected, we log in and change to the remote directory from
which we want to fetch a file. The login method allows us to pass in a
username and password as additional optional arguments to specify an
account login; by default, it performs anonymous FTP. Notice the use of
the nonpassive flag in this script:

if nonpassive:                              # force active FTP if server requires
    connection.set_pasv(False)

If this flag is set to True, the script will transfer the file in
active FTP mode rather than the default passive mode. We’ll finesse the
details of the difference here (it has to do with which end of the dialog
chooses port numbers for the transfer), but if you have trouble doing
transfers with any of the FTP scripts in this chapter, try using active
mode as a first step. In Python 2.1 and later, passive FTP mode is on by
default. Now, open a local file to receive the file’s content, and fetch
the file:

localfile = open(filename, 'wb')
connection.retrbinary('RETR ' + filename, localfile.write, 1024)

Once we’re in the target remote directory, we simply call the
retrbinary method to download the target server file in binary mode. The
retrbinary call will take a while to complete, since it must download a
big file. It gets three arguments:

  • An FTP command string; here, the string RETR filename, which is the
    standard format for FTP retrievals.

  • A function or method to which Python passes each chunk of the
    downloaded file’s bytes; here, the write method of a newly created and
    opened local file.

  • A size for those chunks of bytes; here, 1,024 bytes are
    downloaded at a time, but the default is reasonable if this argument
    is omitted.
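
The second argument can be any callable that accepts a chunk of bytes,
not only a file’s write method. The following is a minimal sketch of that
idea and is not part of the book’s examples: ProgressWriter and
feed_chunks are hypothetical names, and feed_chunks only imitates the way
retrbinary hands blocks to its callback, so the code can be tried without
a server. A real transfer may deliver blocks smaller than the requested
size.

```python
import io

class ProgressWriter:
    """Callable that stores each chunk and keeps a running byte count."""
    def __init__(self, outfile):
        self.outfile = outfile
        self.chunks = 0
        self.total = 0

    def __call__(self, chunk):
        self.outfile.write(chunk)             # same job as localfile.write
        self.chunks += 1
        self.total += len(chunk)

def feed_chunks(data, callback, blocksize=1024):
    """Hypothetical stand-in for retrbinary: pass data to callback in blocks."""
    for start in range(0, len(data), blocksize):
        callback(data[start:start + blocksize])

payload = bytes(range(256)) * 10              # 2,560 bytes of sample data
buffer = io.BytesIO()                         # plays the role of the local file
writer = ProgressWriter(buffer)
feed_chunks(payload, writer, 1024)
print(writer.chunks, 'chunks,', writer.total, 'bytes')

# With a live, logged-in connection the same object could be passed as:
#     connection.retrbinary('RETR ' + filename, writer, 1024)
```

Because the callback is an ordinary object, it can also update a GUI
progress bar or compute a checksum while the bytes arrive.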

Because this script opens a local file (the object assigned to
localfile) under the same name as the remote file being fetched, and
passes its write method to the FTP retrieval method, the remote file’s
contents will automatically appear in a local, client-side file after the
download is finished.

Observe how this file is opened in wb binary output mode. If this
script is run on Windows we want to avoid automatically expanding any \n
bytes into \r\n byte sequences; as we saw in Chapter 4, this happens
automatically on Windows when writing files opened in w text mode. We also
want to avoid Unicode issues in Python 3.X: as we also saw in Chapter 4,
strings are encoded when written in text mode and this isn’t appropriate
for binary data such as images. A text-mode file would also not allow for
the bytes strings passed to write by the FTP library’s retrbinary in any
event, so wb is effectively required here (more on output file modes
later).
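
The text-mode restriction is easy to check locally. This small sketch
assumes Python 3.X and uses a temporary directory plus a made-up byte
string in place of a real download chunk; it shows that a text-mode file
rejects bytes outright, while a binary-mode file stores them unchanged.

```python
import os, tempfile

chunk = b'\x89PNG\r\n\x1a\n'                  # sample bytes, as a download might deliver
path = os.path.join(tempfile.mkdtemp(), 'sample.bin')

# Text mode: write() expects str, so a bytes chunk is rejected.
with open(path, 'w') as textfile:
    try:
        textfile.write(chunk)
        text_mode_ok = True
    except TypeError:
        text_mode_ok = False

# Binary mode: bytes are stored exactly as received, with no
# end-of-line translation and no Unicode encoding step.
with open(path, 'wb') as binfile:
    binfile.write(chunk)

with open(path, 'rb') as check:
    stored = check.read()

print(text_mode_ok, stored == chunk)
```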

Finally, we call the FTP quit method to break the connection with the
server and manually close the local file to force it to be complete before
it is further processed (it’s not impossible that parts of the file are
still held in buffers before the close call):

connection.quit()
localfile.close()
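
In recent Python 3.X releases, a with statement can take over the manual
close: the file is flushed and closed when the block exits, even if the
transfer raises an exception. The sketch below is a variation under
stated assumptions, not the book’s code: fetch_to_file is a hypothetical
helper, and it expects an already connected and logged-in ftplib.FTP
object (or any object with a compatible retrbinary method).

```python
def fetch_to_file(connection, remotename, localname, blocksize=1024):
    """Download remotename through connection into localname in binary mode."""
    with open(localname, 'wb') as localfile:  # closed and flushed on block exit
        connection.retrbinary('RETR ' + remotename, localfile.write, blocksize)
    return localname

# Possible use with a real server; the site name here is a placeholder:
#     from ftplib import FTP
#     with FTP('ftp.example.com') as connection:   # connection is quit on exit
#         connection.login()                       # anonymous login
#         fetch_to_file(connection, 'monkeys.jpg', 'monkeys.jpg')
```

Taking the connection as an argument keeps the helper independent of any
one site or login scheme, and lets it be exercised with a stand-in object.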

And that’s all there is to it: all the FTP, socket, and networking
details are hidden behind the ftplib interface module. Here is this script
in action on a Windows 7 machine; after the download, the image file pops
up in a Windows picture viewer on my laptop, as captured in Figure 13-1.
Change the server and file assignments in this script to test on your own,
and be sure your PYTHONPATH environment variable includes the PP4E root’s
container, as we’re importing across directories on the examples tree
here:

C:\...\PP4E\Internet\Ftp> python getone.py
Pswd?
Connecting...
Downloading...
Open file?y

Figure 13-1. Image file downloaded by FTP and opened locally

Notice how the standard Python getpass.getpass is used to ask for an FTP
password. Like the input built-in function, this call prompts for and
reads a line of text from the console user; unlike input, getpass does not
echo typed characters on the screen at all (see the moreplus stream
redirection example of Chapter 3 for related tools). This is handy for
protecting things like passwords from potentially prying eyes. Be careful,
though: after issuing a warning, the IDLE GUI echoes the password anyhow!

The main thing to notice is that this otherwise typical Python
script fetches information from an arbitrarily remote FTP site and
machine. Given an Internet link, any information published by an FTP
server on the Net can be fetched by and incorporated into Python scripts
using interfaces such as these.

Using urllib to Download Files

In fact, FTP is just one way to transfer information across the Net, and
there are more general tools in the Python library to accomplish the prior
script’s download. Perhaps the most straightforward is the Python
urllib.request module: given an Internet address string (a URL, or Uniform
Resource Locator), this module opens a connection to the specified server
and returns a file-like object ready to be read with normal file object
method calls (e.g., read, readline).

We can use such a higher-level interface to download anything with an
address on the Web: files published by FTP sites (using URLs that start
with ftp://); web pages and output of scripts that live on remote servers
(using http:// URLs); and even local files (using file:// URLs). For
instance, the script in Example 13-2 does the same as the one in
Example 13-1, but it uses the general urllib.request module to fetch the
file, instead of the protocol-specific ftplib.

Example 13-2. PP4E\Internet\Ftp\getone-urllib.py

#!/usr/local/bin/python
"""
A Python script to download a file by FTP by its URL string; use higher-level
urllib instead of ftplib to fetch file; urllib supports FTP, HTTP, client-side
HTTPS, and local files, and handles proxies, redirects, cookies, and more;
urllib also allows downloads of html pages, images, text, etc.; see also
Python html/xml parsers for web pages fetched by urllib in Chapter 19;
"""
import os, getpass
from urllib.request import urlopen            # socket-based web tools

filename = 'monkeys.jpg'                      # remote/local filename
password = getpass.getpass('Pswd?')

remoteaddr = 'ftp://lutz:%s@ftp.rmi.net/%s;type=i' % (password, filename)
print('Downloading', remoteaddr)

# this works too:
# urllib.request.urlretrieve(remoteaddr, filename)

remotefile = urlopen(remoteaddr)              # returns input file-like object
localfile = open(filename, 'wb')              # where to store data locally
localfile.write(remotefile.read())
localfile.close()
remotefile.close()

Note how we use a binary mode output file again; urllib fetches return
byte strings, even for HTTP web pages. Don’t sweat the details of the URL
string used here; it is fairly complex, and we’ll explain its structure
and that of URLs in general in Chapter 15. We’ll also use urllib again in
this and later chapters to fetch web pages, format generated URL strings,
and get the output of remote scripts on the Web.
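
Example 13-2 loads the whole download into memory with a single read
call. For larger files, one alternative is to copy in blocks. The sketch
below is an assumption-based variation rather than the book’s code: it
points urlopen at a file:// URL for a temporary sample file so it can run
without a server, and uses the standard shutil.copyfileobj to move the
bytes 1,024 at a time. The same loop should apply to ftp:// or http://
addresses.

```python
import os, shutil, tempfile
from pathlib import Path
from urllib.request import urlopen

workdir = tempfile.mkdtemp()
source = Path(workdir, 'source.bin')
source.write_bytes(b'\x00\x01binary data\r\n' * 1000)   # sample "remote" file

remoteaddr = source.as_uri()                  # a file:// URL for the sample
localname = os.path.join(workdir, 'copy.bin')

# Both objects are closed automatically when the with block exits.
with urlopen(remoteaddr) as remotefile, open(localname, 'wb') as localfile:
    shutil.copyfileobj(remotefile, localfile, 1024)     # read/write in blocks

print(os.path.getsize(localname), 'bytes copied')
```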

Technically speaking, urllib.request supports a variety of Internet
protocols (HTTP, FTP, and local files). Unlike ftplib, urllib.request is
generally used for reading remote objects, not for writing or uploading
them (though the HTTP and FTP protocols support file uploads too). As with
ftplib, retrievals must generally be run in threads if blocking is a
concern. But the basic interface shown in this script is straightforward.
The call:

remotefile = urllib.request.urlopen(remoteaddr)  # returns input file-like object

contacts the server named in the remoteaddr URL string and returns a
file-like object connected to its download stream (here, an FTP-based
socket). Calling this file’s read method pulls down the file’s contents,
which are written to a local client-side file. An even simpler interface:

urllib.request.urlretrieve(remoteaddr, filename)

also does the work of opening a local file and writing the
downloaded bytes into it—things we do manually in the script as coded.
This comes in handy if we want to download a file, but it is less useful
if we want to process its data immediately.
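
As a quick way to try urlretrieve without a server, the sketch below
fetches a temporary local file through a file:// URL. The sample file,
the report function, and the progress list are illustrative assumptions;
reporthook is the optional urlretrieve argument that is called with a
block number, a block size, and the total size as the transfer proceeds.
Note that current library documentation lists urlretrieve among its
legacy interfaces.

```python
import os, tempfile
from pathlib import Path
from urllib.request import urlretrieve

workdir = tempfile.mkdtemp()
source = Path(workdir, 'source.txt')
source.write_bytes(b'spam\n' * 100)           # sample file to "download"

progress = []
def report(blocknum, blocksize, totalsize):   # called as the transfer proceeds
    progress.append((blocknum, blocksize, totalsize))

localname = os.path.join(workdir, 'copy.txt')
savedname, headers = urlretrieve(source.as_uri(), localname, reporthook=report)

print(savedname == localname, len(progress) > 0)
```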

Either way, the end result is the same: the desired server file
shows up on the client machine. The output is similar to the original
version, but we don’t try to automatically open this time (I’ve changed
the password in the URL here to protect the innocent):

C:\...\PP4E\Internet\Ftp> getone-urllib.py
Pswd?
Downloading ftp://lutz:xxxxxx@ftp.rmi.net/monkeys.jpg;type=i

C:\...\PP4E\Internet\Ftp> fc monkeys.jpg test\monkeys.jpg
FC: no differences encountered

C:\...\PP4E\Internet\Ftp> start monkeys.jpg

For more urllib download examples, see the section on HTTP later in this
chapter, and the server-side examples in Chapter 15. As we’ll see in
Chapter 15, in bigger terms, tools like the urllib.request.urlopen
function allow scripts to both download remote files and invoke programs
that are located on a remote server machine, and so serve as useful tools
for testing and using web sites in Python scripts. In Chapter 15, we’ll
also see that urllib.parse includes tools for formatting (escaping) URL
strings for safe transmission.
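
As a small preview of that escaping, the sketch below applies
urllib.parse.quote to a password before embedding it in an FTP URL, so
that characters such as @ or / are not mistaken for URL punctuation. The
password is made up for illustration, and the account and site names are
reused from Example 13-1; urlsplit is used only to show that the parts
can be recovered afterward.

```python
from urllib.parse import quote, unquote, urlsplit

password = 'p@ss/w:rd'                        # hypothetical password
escaped = quote(password, safe='')            # escape every reserved character
remoteaddr = 'ftp://lutz:%s@ftp.rmi.net/%s;type=i' % (escaped, 'monkeys.jpg')
print(remoteaddr)

parts = urlsplit(remoteaddr)                  # split the URL back into pieces
print(parts.hostname, parts.username, unquote(parts.password))
```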
