Programming Python (103 page)

Authors: Mark Lutz

Python’s Internet Library Modules

If all of this sounds horribly complex, cheer up: Python’s standard
protocol modules handle all the details. For example, the Python
library’s ftplib module manages all the socket and message-level
handshaking implied by the FTP protocol. Scripts that import ftplib have
access to a much higher-level interface for FTPing files and can be
largely ignorant of both the underlying FTP protocol and the sockets over
which it runs.[44]

In fact, each supported protocol is represented in Python’s standard
library by either a module package of the same name as the protocol or by
a module file with a name of the form xxxlib.py, where xxx is replaced by
the protocol’s name. The last column in Table 12-1 gives the module name
for some standard protocol modules. For instance, FTP is supported by the
module file ftplib.py and HTTP by package http.*. Moreover, within the
protocol modules, the top-level interface object is often the name of the
protocol. So, for instance, to start an FTP session in a Python script,
you run import ftplib and pass appropriate parameters in a call to
ftplib.FTP; for Telnet, create a telnetlib.Telnet instance.
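The naming pattern just described can be sketched without touching the network. This is only an illustration: the host name www.example.com is a placeholder, and the objects are created without connecting (ftplib.FTP and smtplib.SMTP open no connection when no host is passed, and http.client.HTTPConnection connects lazily on the first request).

```python
import ftplib
import smtplib
import http.client

# No host argument: these constructors open no connection, so the
# sketch runs offline. local_hostname avoids a DNS lookup in smtplib.
ftp = ftplib.FTP()                                # ftplib  -> FTP
smtp = smtplib.SMTP(local_hostname='localhost')   # smtplib -> SMTP

# Placeholder host; nothing is sent until a request is made.
web = http.client.HTTPConnection('www.example.com')

for obj in (ftp, smtp, web):
    # Print the module name next to its top-level interface object
    print(type(obj).__module__, type(obj).__name__)

# The higher-level interface is available as ordinary methods
print(hasattr(ftp, 'login'), hasattr(ftp, 'retrbinary'))
```

A real session would pass a server name and credentials, for example to ftplib.FTP and its login method; those details are left out here on purpose.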

In addition to the protocol implementation modules in Table 12-1,
Python’s standard library also contains modules for fetching replies from
web servers for a web page request (urllib.request), parsing and handling
data once it has been transferred over sockets or protocols (html.parser,
the email.* and xml.* packages), and more. Table 12-2 lists some of the
more commonly used modules in this category.

Table 12-2. Common Internet-related standard modules

Python modules                  Utility
------------------------------  --------------------------------------------
socket, ssl                     Network and IPC communications support
                                (TCP/IP, UDP, etc.), plus SSL secure
                                sockets wrapper
cgi                             Server-side CGI script support: parse input
                                stream, escape HTML text, and so on
urllib.request                  Fetch web pages from their addresses (URLs)
urllib.parse                    Parse URL string into components, escape
                                URL text
http.client, ftplib, nntplib    HTTP (web), FTP (file transfer), and NNTP
                                (news) client protocol modules
http.cookies, http.cookiejar    HTTP cookies support (data stored on clients
                                by website request, server- and client-side
                                support)
poplib, imaplib, smtplib        POP, IMAP (mail fetch), and SMTP (mail send)
                                protocol modules
telnetlib                       Telnet protocol module
html.parser, xml.*              Parse web page contents (HTML and XML
                                documents)
xdrlib, socket                  Encode binary data portably for transmission
struct, pickle                  Encode Python objects as packed binary data
                                or serialized byte strings for transmission
email.*                         Parse and compose email messages with
                                headers, attachments, and encodings
mailbox                         Process on-disk mailboxes and their messages
mimetypes                       Guess file content types from names and
                                extensions from types
uu, binhex, base64, binascii,   Encode and decode binary (or other) data
quopri, email.*                 transmitted as text (automatic in email
                                package)
socketserver                    Framework for general Net servers
http.server                     Basic HTTP server implementation, with
                                request handlers for simple and CGI-aware
                                servers
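As a small taste of one table entry, the following sketch uses urllib.parse to split a URL into components and to escape URL text. The URL itself is made up for illustration; only documented urllib.parse calls are used.

```python
from urllib.parse import urlparse, parse_qs, quote, unquote

# Hypothetical URL, used only to show the component names
url = 'http://www.example.com:8080/cgi-bin/test.py?name=Bob&job=dev#top'
parts = urlparse(url)

print(parts.scheme)             # http
print(parts.hostname)           # www.example.com
print(parts.port)               # 8080
print(parts.path)               # /cgi-bin/test.py
print(parse_qs(parts.query))    # {'name': ['Bob'], 'job': ['dev']}
print(parts.fragment)           # top

# Escape text so it can be embedded in a URL, and reverse the escape
escaped = quote('a b&c=d')
print(escaped)                  # a%20b%26c%3Dd
print(unquote(escaped))         # a b&c=d
```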

We will meet many of the modules in this table in the next few
chapters of this book, but not all of them. Moreover, there are
additional Internet modules in Python not shown here. The modules
demonstrated in this book will be representative, but as always, be sure
to see Python’s standard Library Reference Manual for more complete and
up-to-date lists and
details.

More on Protocol Standards

If you want the full story on protocols and ports, at this writing you
can find a comprehensive list of all ports reserved for protocols or
registered as used by various common systems by searching the web pages
maintained by the Internet Engineering Task Force (IETF) and the Internet
Assigned Numbers Authority (IANA). The IETF is the organization
responsible for maintaining Internet protocols and standards. The IANA is
the central coordinator for the assignment of unique parameter values for
Internet protocols. Another standards body, the W3C (the World Wide Web
Consortium), also maintains relevant documents. See these web pages for
more details:

http://www.ietf.org

http://www.iana.org/numbers.html

http://www.iana.org/assignments/port-numbers

http://www.w3.org

It’s not impossible that more recent repositories for standard
protocol specifications will arise during this book’s shelf life, but
the IETF website will likely be the main authority for some time to
come. If you do look, though, be warned that the details are, well,
detailed. Because Python’s protocol modules hide most of the socket
and messaging complexity documented in the protocol standards, you
usually don’t need to memorize these documents to get web work done
with Python.

[43] Some books also use the term protocol to refer to lower-level
transport schemes such as TCP. In this book, we use protocol to refer to
higher-level structures built on top of sockets; see a networking text if
you are curious about what happens at lower levels.

[44] Since Python is an open source system, you can read the source code
of the ftplib module if you are curious about how the underlying protocol
actually works. See the ftplib.py file in the standard library source
directory on your machine. Its code is complex (since it must format
messages and manage two sockets), but with the other standard Internet
protocol modules, it is a good example of low-level socket programming.

Socket Programming

Now that we’ve seen
how sockets figure into the Internet picture, let’s move on
to explore the tools that Python provides for programming sockets with
Python scripts. This section shows you how to use the Python socket
interface to perform low-level network communications. In later chapters,
we will instead use one of the higher-level protocol modules that hide
underlying sockets. Python’s socket interfaces can be used directly,
though, to implement custom network dialogs and to access standard
protocols manually.

As previewed in Chapter 5, the basic socket interface in Python is the
standard library’s socket module. Like the os POSIX module, Python’s
socket module is just a thin wrapper (interface layer) over the
underlying C library’s socket calls. Like Python files, it’s also
object-based: methods of a socket object implemented by this module call
out to the corresponding C library’s operations after data conversions.
For instance, the C library’s send and recv function calls become methods
of socket objects in Python.

Python’s socket module supports socket programming on any machine that
supports BSD-style sockets (Windows, Macs, Linux, Unix, and so on) and so
provides a portable socket interface. In addition, this module supports
all commonly used socket types (TCP/IP, UDP, datagram, and Unix domain)
and can be used as both a network interface API and a general IPC
mechanism between processes running on the same machine.
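To make the socket-type and IPC points concrete, here is a minimal sketch. It assumes a Python 3 release in which socket.socketpair is available on your platform, and for brevity it keeps both ends of the pair in one process; in real IPC the two ends would normally belong to different processes.

```python
import socket

# Different constant combinations select different socket types
tcp = socket.socket(socket.AF_INET, socket.SOCK_STREAM)   # TCP/IP stream
udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)    # UDP datagram
print(tcp.type == socket.SOCK_STREAM, udp.type == socket.SOCK_DGRAM)
tcp.close()
udp.close()

# socketpair returns two socket objects that are already connected
# to each other on the local machine
near_end, far_end = socket.socketpair()
near_end.sendall(b'ping')
request = far_end.recv(1024)            # bytes sent by the other end
far_end.sendall(b'pong: ' + request)
reply = near_end.recv(1024)
print(reply)

near_end.close()
far_end.close()
```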

From a functional perspective, sockets are a programmer’s device for
transferring bytes between programs, possibly running on different
computers. Although sockets themselves transfer only byte strings, we can
also transfer Python objects through them by using Python’s pickle
module. Because this module converts Python objects such as lists,
dictionaries, and class instances to and from byte strings, it provides
the extra step needed to ship higher-level objects through sockets when
required.

Python’s struct module can also be used to format Python objects as
packed binary data byte strings for transmission, but is generally
limited in scope to objects that map to types in the C programming
language. The pickle module supports transmission of larger objects, such
as dictionaries and class instances. For other tasks, including most
standard Internet protocols, simpler formatted byte strings suffice.
We’ll learn more about pickle later in this chapter and book.

Beyond basic data communication tasks, the socket module also includes a
variety of more advanced tools. For instance, it has calls for the
following and more:

  • Converting integers to and from standard network byte ordering
    (ntohl, htonl)

  • Querying machine name and address (gethostname, gethostbyname)

  • Wrapping socket objects in a file object interface (sockobj.makefile)

  • Making socket calls nonblocking (sockobj.setblocking)

  • Setting socket timeouts (sockobj.settimeout)
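A few of the tools in this list can be tried in isolation. The following is only a sketch; the integer value and the 2.5-second timeout are arbitrary choices for illustration.

```python
import socket

# Byte-order conversion: host order -> network order -> host order
value = 0x12345678
network_order = socket.htonl(value)
round_trip = socket.ntohl(network_order)
print(round_trip == value)              # True on any platform

# Name of the machine this script runs on (output varies by machine)
print(socket.gethostname())

# Timeouts and nonblocking mode are per-socket settings
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2.5)                    # blocking calls give up after 2.5s
timeout_set = sock.gettimeout()
sock.setblocking(False)                 # same as settimeout(0.0)
timeout_nonblocking = sock.gettimeout()
print(timeout_set, timeout_nonblocking)
sock.close()
```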

Provided your Python was compiled with Secure Sockets Layer (SSL)
support, the ssl standard library module also supports encrypted
transfers with its ssl.wrap_socket call (removed in Python 3.12; current
releases provide the same service through the wrap_socket method of an
ssl.SSLContext object). This call wraps a socket object in SSL logic,
which is used in turn by other standard library modules to support the
HTTPS secure website protocol (http.client and urllib.request), secure
email transfers (poplib and smtplib), and more. We’ll meet some of these
other modules later in this part of the book, but we won’t study all of
the socket module’s advanced features in this text; see the Python
library manual for usage details omitted here.
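As a rough sketch of socket wrapping, assuming a current Python 3 release in which wrapping goes through an ssl.SSLContext object: the helper name open_tls_connection and its default port are hypothetical, and the function is defined but not called here, so the example runs without network access.

```python
import socket
import ssl

def open_tls_connection(host, port=443):
    """Hypothetical helper: connect to host and wrap the socket in TLS."""
    context = ssl.create_default_context()
    raw = socket.create_connection((host, port), timeout=10)
    # server_hostname lets the context check the server's certificate
    return context.wrap_socket(raw, server_hostname=host)

# The default client context verifies certificates and host names
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)   # True
print(context.check_hostname)                     # True
```

The object returned by wrap_socket supports the same send and recv style calls as a plain socket, which is what lets modules such as http.client layer HTTPS on top of it.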

Socket Basics

Although we won’t get into advanced socket use in this chapter, basic
socket transfers are remarkably easy to code in Python. To create a
connection between machines, Python programs import the socket module,
create a socket object, and call the object’s methods to establish
connections and send and receive data.

Sockets are inherently bidirectional in nature, and socket object methods
map directly to socket calls in the C library. For example, the script in
Example 12-1 implements a program that simply listens for a connection on
a socket and echoes back over a socket whatever it receives through that
socket, adding Echo=> string prefixes.

Example 12-1. PP4E\Internet\Sockets\echo-server.py

"""
Server side: open a TCP/IP socket on a port, listen for a message from
a client, and send an echo reply; this is a simple one-shot listen/reply
conversation per client, but it goes into an infinite loop to listen for
more clients as long as this server script runs; the client may run on
a remote machine, or on same computer if it uses 'localhost' for server
"""
from socket import *                    # get socket constructor and constants
myHost = ''                             # '' = all available interfaces on host
myPort = 50007                          # listen on a non-reserved port number

sockobj = socket(AF_INET, SOCK_STREAM)       # make a TCP socket object
sockobj.bind((myHost, myPort))               # bind it to server port number
sockobj.listen(5)                            # listen, allow 5 pending connects

while True:                                  # listen until process killed
    connection, address = sockobj.accept()   # wait for next client connect
    print('Server connected by', address)    # connection is a new socket
    while True:
        data = connection.recv(1024)         # read next line on client socket
        if not data: break                   # send a reply line to the client
        connection.send(b'Echo=>' + data)    # until eof when socket closed
    connection.close()

As mentioned earlier, we usually call programs like this that listen for
incoming connections servers because they provide a service that can be
accessed at a given machine and port on the Internet. Programs that
connect to such a server to access its service are generally called
clients. Example 12-2 shows a simple client implemented in Python.

Example 12-2. PP4E\Internet\Sockets\echo-client.py

"""
Client side: use sockets to send data to the server, and print server's
reply to each message line; 'localhost' means that the server is running
on the same machine as the client, which lets us test client and server
on one machine; to test over the Internet, run a server on a remote
machine, and set serverHost or argv[1] to machine's domain name or IP addr;
Python sockets are a portable BSD socket interface, with object methods
for the standard socket calls available in the system's C library;
"""
import sys
from socket import *              # portable socket interface plus constants
serverHost = 'localhost'          # server name, or: 'starship.python.net'
serverPort = 50007                # non-reserved port used by the server

message = [b'Hello network world']          # default text to send to server
                                            # requires bytes: b'' or str.encode()
if len(sys.argv) > 1:
    serverHost = sys.argv[1]                # server from cmd line arg 1
    if len(sys.argv) > 2:                   # text from cmd line args 2..n
        message = (x.encode() for x in sys.argv[2:])

sockobj = socket(AF_INET, SOCK_STREAM)      # make a TCP/IP socket object
sockobj.connect((serverHost, serverPort))   # connect to server machine + port

for line in message:
    sockobj.send(line)                      # send line to server over socket
    data = sockobj.recv(1024)               # receive line from server: up to 1k
    print('Client received:', data)         # bytes are quoted, was `x`, repr(x)

sockobj.close()                             # close socket to send eof to server

Server socket calls

Before we see these
programs in action, let’s take a minute to explain how
this client and server do their stuff. Both are fairly simple examples
of socket scripts, but they illustrate the common call patterns of
most socket-based programs. In fact, this is boilerplate code: most
connected socket programs generally make the same socket calls that
our two scripts do, so let’s step through the important points of
these scripts line by line.

Programs such as Example 12-1 that provide services for other programs
with sockets generally start out by following this sequence of calls:

sockobj = socket(AF_INET, SOCK_STREAM)

Uses the Python socket module to create a TCP socket object. The names
AF_INET and SOCK_STREAM are preassigned variables defined by and imported
from the socket module; using them in combination means “create a TCP/IP
socket,” the standard communication device for the Internet. More
specifically, AF_INET means the IP address protocol, and SOCK_STREAM
means the TCP transfer protocol. The AF_INET/SOCK_STREAM combination is
the default because it is so common, but it’s typical to make this
explicit.

If you use other names in this call, you can instead create things like
UDP connectionless sockets (use SOCK_DGRAM second) and Unix domain
sockets on the local machine (use AF_UNIX first), but we won’t do so in
this book. See the Python library manual for details on these and other
socket module options. Using other socket types is mostly a matter of
using different forms of boilerplate code.

sockobj.bind((myHost, myPort))

Associates the socket object with an address; for IP addresses, we pass a
server machine name and port number on that machine. This is where the
server identifies the machine and port associated with the socket. In
server programs, the hostname is typically an empty string (''), which
means the machine that the script runs on (formally, all available local
and remote interfaces on the machine), and the port is a number outside
the range 0 to 1023 (which is reserved for standard protocols, described
earlier).

Note that each unique socket dialog you support must have its own port
number; if you try to open a socket on a port already in use, Python will
raise an exception. Also notice the nested parentheses in this call: for
the AF_INET address protocol socket here, we pass the host/port socket
address to bind as a two-item tuple object (pass a string for AF_UNIX).
Technically, bind takes a single address value whose form is appropriate
for the type of socket created.
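The port-in-use exception is easy to provoke on purpose. The sketch below makes two assumptions that are not part of the examples in this chapter: it binds to 'localhost' rather than '', and it passes port 0 so that the operating system picks a free port for the first socket.

```python
import socket

first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
first.bind(('localhost', 0))            # port 0: let the OS choose a free port
first.listen(5)
host, port = first.getsockname()        # the (host, port) tuple actually bound

second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    second.bind((host, port))           # same address: already in use
    clash = False
except OSError as exc:                  # socket errors are OSError in 3.3+
    clash = True
    print('bind failed:', exc)
finally:
    second.close()
    first.close()

print(clash)
```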

sockobj.listen(5)

Starts listening for incoming client connections and allows for a backlog
of up to five pending requests. The value passed sets the number of
incoming client requests queued by the operating system before new
requests are denied (which happens only if a server isn’t fast enough to
process requests before the queues fill up). A value of 5 is usually
enough for most socket-based programs; the value must be at least 1.

At this point, the server is ready to accept connection requests from
client programs running on remote machines (or the same machine) and
falls into an infinite loop, while True (or the equivalent while 1 for
older Pythons and ex-C programmers), waiting for them to arrive:

connection, address = sockobj.accept()

Waits for the next client connection request to occur; when it does, the
accept call returns a brand-new socket object over which data can be
transferred from and to the connected client. Connections are accepted on
sockobj, but communication with a client happens on connection, the new
socket. This call actually returns a two-item tuple; address is the
connecting client’s Internet address. We can call accept more than one
time, to service multiple client connections; that’s why each call
returns a new, distinct socket for talking to a particular client.

Once we have a client connection, we fall into another loop to
receive data from the client in blocks of up to 1,024 bytes at a time,
and echo each block back to the client:

data = connection.recv(1024)

Reads at most 1,024 more bytes of the next message sent from a client
(i.e., coming across the network or IPC connection), and returns it to
the script as a byte string. We get back an empty byte string when the
client has finished: end-of-file is triggered when the client closes its
end of the socket.

connection.send(b'Echo=>' + data)

Sends the latest byte string data block back to the client program,
prepending the string 'Echo=>' to it first. The client program can then
recv what we send here: the next reply line. Technically this call sends
as much data as possible, and returns the number of bytes actually sent.
To be fully robust, some programs may need to resend unsent portions or
use connection.sendall to force all bytes to be sent.

connection.close()

Shuts down the connection with this particular
client.
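The partial-send behavior noted for send above can be handled with a short loop. The helper name send_fully is hypothetical; it is a sketch of roughly what sendall does for you, exercised here over a local socket pair rather than a real client connection.

```python
import socket

def send_fully(sock, data):
    """Hypothetical helper: call send until every byte has been written."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])      # may write only part of the data
        if sent == 0:
            raise ConnectionError('socket connection broken')
        total += sent
    return total

left, right = socket.socketpair()
payload = b'x' * 1000
count = send_fully(left, payload)           # manual resend loop
left.sendall(b'done')                       # built-in equivalent
left.close()                                # signals end-of-file to reader

received = b''
while True:
    block = right.recv(1024)                # empty bytes means eof
    if not block:
        break
    received += block
right.close()

print(count, len(received))
```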

Transferring byte strings and objects

So far we’ve seen calls used to transfer data in a server, but what is it
that is actually shipped through a socket? As we learned in Chapter 5,
sockets by themselves always deal in binary byte strings, not text. To
your scripts, this means you must send and will receive bytes strings,
not str, though you can convert to and from text as needed with
bytes.decode and str.encode methods. In our scripts, we use b'...' bytes
literals to satisfy socket data requirements. In other contexts, tools
such as the struct and pickle modules return the byte strings we need
automatically, so no extra steps are needed.
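The bytes requirement can be seen directly in a short experiment. This sketch uses a local socket pair in place of a real client and server, and relies on the default UTF-8 encoding of str.encode and bytes.decode.

```python
import socket

left, right = socket.socketpair()

# Passing str instead of bytes is rejected before anything is sent
try:
    left.send('spam')
    rejected = False
except TypeError:
    rejected = True

text = 'Hello network world'
left.sendall(text.encode())             # str -> bytes (UTF-8 by default)
data = right.recv(1024)                 # bytes come out the other end
decoded = data.decode()                 # bytes -> str

left.close()
right.close()
print(rejected, data, decoded)
```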

For example, although the socket model is limited to transferring byte
strings, you can send and receive nearly arbitrary Python objects with
the standard library pickle object serialization module. Its dumps and
loads calls convert Python objects to and from byte strings, ready for
direct socket transfer:

>>> import pickle
>>> x = pickle.dumps([99, 100])     # on sending end... convert to byte strings
>>> x                               # string passed to send, returned by recv
b'\x80\x03]q\x00(KcKde.'
>>> pickle.loads(x)                 # on receiving end... convert back to object
[99, 100]

For simpler types that correspond to those in the C language, the struct
module provides the byte-string conversion we need as well:

>>> import struct
>>> x = struct.pack('>ii', 99, 100)     # convert simpler types for transmission
>>> x
b'\x00\x00\x00c\x00\x00\x00d'
>>> struct.unpack('>ii', x)
(99, 100)

When converted this way, Python native objects become candidates for
socket-based transfers. See Chapter 4 for more on struct. We previewed
pickle and object serialization in Chapter 1, but we’ll learn more about
it and its few pickleability constraints when we explore data persistence
in Chapter 17.
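One way to combine the two modules, offered only as a sketch: pickle an object, then use struct to prefix the pickle with its length so that the receiver knows how many bytes to read. The function names and the 4-byte big-endian header are assumptions made for this illustration, not a format used elsewhere in this book. Note also that unpickling data from an untrusted sender is unsafe.

```python
import pickle
import socket
import struct

def send_object(sock, obj):
    """Hypothetical helper: send a pickled object with a 4-byte size header."""
    payload = pickle.dumps(obj)
    header = struct.pack('>I', len(payload))    # unsigned int, network order
    sock.sendall(header + payload)

def recv_exactly(sock, count):
    """Read exactly count bytes, since recv may return fewer."""
    chunks = []
    while count > 0:
        chunk = sock.recv(count)
        if not chunk:
            raise EOFError('socket closed before message was complete')
        chunks.append(chunk)
        count -= len(chunk)
    return b''.join(chunks)

def recv_object(sock):
    """Hypothetical helper: read the size header, then the pickle itself."""
    (size,) = struct.unpack('>I', recv_exactly(sock, 4))
    return pickle.loads(recv_exactly(sock, size))

left, right = socket.socketpair()
record = {'name': 'Bob', 'scores': [99, 100]}
send_object(left, record)
copy = recv_object(right)
left.close()
right.close()
print(copy == record, copy is record)           # True False
```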

In fact there are a variety of ways to extend the basic socket transfer
model. For instance, much like os.fdopen and open for the file
descriptors we studied in Chapter 4, the socket.makefile method allows
you to wrap sockets in text-mode file objects that handle text encodings
for you automatically. This call also allows you to specify nondefault
Unicode encodings and end-line behaviors in text mode with extra
arguments in 3.X just like the open built-in function. Because its result
mimics file interfaces, the socket.makefile call additionally allows the
pickle module’s file-based calls to transfer objects over sockets
implicitly. We’ll see more on socket file wrappers later in this chapter.
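A minimal sketch of both uses of makefile follows, again over a local socket pair standing in for a client and server. The explicit flush calls matter because the file wrappers buffer their output; the choice of UTF-8 and the variable names are illustrative only.

```python
import pickle
import socket

left, right = socket.socketpair()

# Text mode: the wrappers encode and decode str for us
writer = left.makefile('w', encoding='utf-8', newline='\n')
reader = right.makefile('r', encoding='utf-8')
writer.write('Hello network world\n')
writer.flush()                          # push buffered text onto the socket
line = reader.readline()

# Binary mode: pickle's file-based calls work on the wrappers directly.
# These are created only after the text line was read, so the two
# readers do not compete for buffered data.
bin_writer = left.makefile('wb')
bin_reader = right.makefile('rb')
pickle.dump([99, 100], bin_writer)
bin_writer.flush()
obj = pickle.load(bin_reader)

for stream in (writer, reader, bin_writer, bin_reader, left, right):
    stream.close()
print(repr(line), obj)
```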

For our simpler scripts here, hardcoded byte strings and direct socket
calls do the job. After talking with a given connected client, the server
in Example 12-1 goes back to its infinite loop and waits for the next
client connection request. Let’s move on to see what happens on the other
side of the fence.

Client socket calls

The actual socket-related calls in client programs like the one shown in
Example 12-2 are even simpler; in fact, half of that script is
preparation logic. The main thing to keep in mind is that the client and
server must specify the same port number when opening their sockets and
the client must identify the machine on which the server is running; in
our scripts, server and client agree to use port number 50007 for their
conversation, outside the standard protocol range. Here are the client’s
socket calls:

sockobj = socket(AF_INET, SOCK_STREAM)

Creates a Python socket object in the client program, just like the
server.

sockobj.connect((serverHost, serverPort))

Opens a connection to the machine and port on which the server program is
listening for client connections. This is where the client specifies the
string name of the service to be contacted. In the client, we can either
specify the name of the remote machine as a domain name (e.g.,
starship.python.net) or numeric IP address. We can also give the server
name as localhost (or the equivalent IP address 127.0.0.1) to specify
that the server program is running on the same machine as the client;
that comes in handy for debugging servers without having to connect to
the Net. And again, the client’s port number must match the server’s
exactly. Note the nested parentheses again: just as in server bind calls,
we really pass the server’s host/port address to connect in a tuple
object.

Once the client establishes a connection to the server, it falls
into a loop, sending a message one line at a time and printing
whatever the server sends back after each line is sent:

sockobj.send(line)

Transfers the next byte-string message line to the server over the
socket. Notice that the default list of lines contains bytes strings
(b'...'). Just as on the server, data passed through the socket must be a
byte string, though it can be the result of a manual str.encode encoding
call or an object conversion with pickle or struct if desired. When lines
to be sent are given as command-line arguments instead, they must be
converted from str to bytes; the client arranges this by encoding in a
generator expression (a call map(str.encode, sys.argv[2:]) would have the
same effect).

data = sockobj.recv(1024)

Reads the next reply line sent by the server program. Technically, this
reads up to 1,024 bytes of the next reply message and returns it as a
byte string.

sockobj.close()

Closes the connection with the server, sending it the end-of-file signal.
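The two ways of encoding command-line text mentioned for the send call can be compared directly. In this sketch the list args is a stand-in for sys.argv[2:], so the example does not depend on how the script is launched.

```python
args = ['spam', 'Spam', 'SPAM']                  # stand-in for sys.argv[2:]

from_genexp = list(x.encode() for x in args)     # generator expression
from_map = list(map(str.encode, args))           # equivalent map call

print(from_genexp)                               # [b'spam', b'Spam', b'SPAM']
print(from_genexp == from_map)                   # True
```

Both forms produce their results on demand and can be traversed only once, which is all the client's single for loop requires.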

And that’s it. The server exchanges one or more lines of text
with each client that connects. The operating system takes care of
locating remote machines, routing bytes sent between programs and
possibly across the Internet, and (with TCP) making sure that our
messages arrive intact. That involves a lot of processing, too—our
strings may ultimately travel around the world, crossing phone wires,
satellite links, and more along the way. But we can be happily
ignorant of what goes on beneath the socket call layer when
programming in
Python.
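To watch the whole exchange in one place, the following sketch runs a one-client version of the echo server in a thread and talks to it from the same script. Several details are assumptions made so that the demo is self-contained: the function name serve_one_client, the use of port 0 to get a free port instead of 50007, sendall in place of send, and a shutdown call so the client can signal end-of-file before reading the reply.

```python
import socket
import threading

def serve_one_client(listener):
    """Hypothetical one-shot server: echo blocks back until client eof."""
    connection, address = listener.accept()
    while True:
        data = connection.recv(1024)
        if not data:                        # empty bytes: client is done
            break
        connection.sendall(b'Echo=>' + data)
    connection.close()

listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(('127.0.0.1', 0))             # port 0: OS picks a free port
listener.listen(5)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_one_client, args=(listener,))
server.start()

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(('127.0.0.1', port))         # same port the server bound
client.sendall(b'Hello network world')
client.shutdown(socket.SHUT_WR)             # send eof, keep reading replies

reply = b''
while True:
    block = client.recv(1024)               # read until the server closes
    if not block:
        break
    reply += block

client.close()
server.join()
listener.close()
print(reply)
```

On a single machine the reply normally arrives as one Echo=> block, but TCP does not preserve message boundaries, which is why the client here reads until end-of-file instead of assuming one recv per send.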
