If all of this sounds horribly complex, cheer up: Python’s standard protocol modules handle all the details. For example, the Python library’s ftplib module manages all the socket and message-level handshaking implied by the FTP protocol. Scripts that import ftplib have access to a much higher-level interface for FTPing files and can be largely ignorant of both the underlying FTP protocol and the sockets over which it runs.[44]
In fact, each supported protocol is represented in Python’s standard library by either a module package of the same name as the protocol or by a module file with a name of the form xxxlib.py, where xxx is replaced by the protocol’s name. The last column in Table 12-1 gives the module name for some standard protocol modules. For instance, FTP is supported by the module file ftplib.py and HTTP by package http.*. Moreover, within the protocol modules, the top-level interface object is often the name of the protocol. So, for instance, to start an FTP session in a Python script, you run import ftplib and pass appropriate parameters in a call to ftplib.FTP; for Telnet, create a telnetlib.Telnet instance.
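The naming convention can be seen directly by importing protocol modules by name. The following is a small sketch, assuming a hand-built PROTOCOL_MODULES table and a hypothetical interface_for helper (neither is part of the standard library). telnetlib is left out because it was removed from the standard library in Python 3.13.

```python
import importlib

# Protocol name -> (module name, top-level interface class name).
# Illustrative pairs only, not an exhaustive registry.
PROTOCOL_MODULES = {
    'FTP':  ('ftplib', 'FTP'),
    'POP3': ('poplib', 'POP3'),
    'SMTP': ('smtplib', 'SMTP'),
    'IMAP': ('imaplib', 'IMAP4'),
    'HTTP': ('http.client', 'HTTPConnection'),
}

def interface_for(protocol):
    """Import the module for a protocol and return its top-level class."""
    modname, classname = PROTOCOL_MODULES[protocol]
    module = importlib.import_module(modname)   # e.g., import ftplib
    return getattr(module, classname)           # e.g., ftplib.FTP

if __name__ == '__main__':
    for proto in PROTOCOL_MODULES:
        cls = interface_for(proto)
        print(proto, '->', cls.__module__ + '.' + cls.__name__)
```

Nothing here touches the network: ftplib.FTP called with no host argument simply creates an unconnected session object, and connecting and logging in are separate steps.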
In addition to the protocol implementation modules in Table 12-1, Python’s standard library also contains modules for fetching replies from web servers for a web page request (urllib.request), parsing and handling data once it has been transferred over sockets or protocols (html.parser, the email.* and xml.* packages), and more. Table 12-2 lists some of the more commonly used modules in this category.
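Table 12-2. Common Internet-related standard modules
Python modules | Utility |
---|---|
socket, ssl | Network and IPC communications support |
cgi | Server-side CGI script support |
urllib.request | Fetch web pages from their addresses (URLs) |
urllib.parse | Parse URL strings into components, escape URL text |
http.client, ftplib, nntplib | HTTP (web), FTP (file transfer), and NNTP (news) client protocol modules |
http.cookies, http.cookiejar | HTTP cookies support (server- and client-side) |
poplib, imaplib, smtplib | POP, IMAP (mail fetch), and SMTP (mail send) protocol modules |
telnetlib | Telnet protocol module |
html.parser, xml.* | Parse web page contents (HTML and XML documents) |
xdrlib, socket | Encode binary data portably for transmission |
struct, pickle | Encode Python objects as packed binary data or serialized byte strings for transmission |
email.* | Parse and compose email messages with headers, attachments, and encodings |
mailbox | Process on-disk mailboxes and their messages |
mimetypes | Guess file content types from names, and extensions from types |
uu, binhex, base64, binascii, quopri, email.* | Encode and decode binary (or other) data transmitted as text |
socketserver | Framework for general network servers |
http.server | Basic HTTP server implementation, with request handlers |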
We will meet many of the modules in this table in the next few
chapters of this book, but not all of them. Moreover, there are
additional Internet modules in Python not shown here. The modules
demonstrated in this book will be representative, but as always, be sure
to see Python’s standard Library Reference Manual for more complete and
up-to-date lists and
details.
More on Protocol Standards
If you want the full story on protocols and ports, at this writing you can find a comprehensive list of all ports reserved for protocols or registered as used by various common systems by searching the web pages maintained by the Internet Engineering Task Force (IETF) and the Internet Assigned Numbers Authority (IANA). The IETF is the organization responsible for maintaining Internet protocols and standards. The IANA is the central coordinator for the assignment of unique parameter values for Internet protocols. Another standards body, the W3C (World Wide Web Consortium), also maintains relevant documents. See these web pages for more details:
http://www.iana.org/numbers.html
http://www.iana.org/assignments/port-numbers
It’s not impossible that more recent repositories for standard
protocol specifications will arise during this book’s shelf life, but
the IETF website will likely be the main authority for some time to
come. If you do look, though, be warned that the details are, well,
detailed. Because Python’s protocol modules hide most of the socket
and messaging complexity documented in the protocol standards, you
usually don’t need to memorize these documents to get web work done
with Python.
[43] Some books also use the term protocol to refer to lower-level transport schemes such as TCP. In this book, we use protocol to refer to higher-level structures built on top of sockets; see a networking text if you are curious about what happens at lower levels.
[44] Since Python is an open source system, you can read the source code of the ftplib module if you are curious about how the underlying protocol actually works. See the ftplib.py file in the standard source library directory on your machine. Its code is complex (since it must format messages and manage two sockets), but, like the other standard Internet protocol modules, it is a good example of low-level socket programming.
Now that we’ve seen
how sockets figure into the Internet picture, let’s move on
to explore the tools that Python provides for programming sockets with
Python scripts. This section shows you how to use the Python socket
interface to perform low-level network communications. In later chapters,
we will instead use one of the higher-level protocol modules that hide
underlying sockets. Python’s socket interfaces can be used directly,
though, to implement custom network dialogs and to access standard
protocols manually.
As previewed in Chapter 5, the basic socket interface in Python is the standard library’s socket module. Like the os POSIX module, Python’s socket module is just a thin wrapper (interface layer) over the underlying C library’s socket calls. Like Python files, it’s also object-based: methods of a socket object implemented by this module call out to the corresponding C library’s operations after data conversions. For instance, the C library’s send and recv function calls become methods of socket objects in Python.
Python’s socket module supports socket programming on any machine that supports BSD-style sockets (Windows, Macs, Linux, Unix, and so on) and so provides a portable socket interface. In addition, this module supports all commonly used socket types (TCP/IP streams, UDP datagrams, and Unix domain sockets) and can be used as both a network interface API and a general IPC mechanism between processes running on the same machine.
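To see sockets acting purely as a byte-transfer device between two endpoints on one machine, socket.socketpair can stand in for two cooperating programs. This is a minimal sketch; the ipc_roundtrip name is hypothetical, and a real IPC setup would put each end in a different process.

```python
import socket

def ipc_roundtrip(payload):
    """Send bytes through a connected socket pair and read them back.

    socket.socketpair() returns two already-connected sockets (AF_UNIX
    where available, else AF_INET on loopback), a convenient stand-in
    for two programs talking on the same machine.
    """
    left, right = socket.socketpair()
    try:
        left.sendall(payload)            # one end writes bytes...
        left.shutdown(socket.SHUT_WR)    # ...then signals end-of-file
        chunks = []
        while True:
            chunk = right.recv(1024)     # the other end reads bytes
            if not chunk:                # b'' means the peer closed its side
                break
            chunks.append(chunk)
        return b''.join(chunks)
    finally:
        left.close()
        right.close()

if __name__ == '__main__':
    print(ipc_roundtrip(b'Hello IPC world'))
```

The reading loop keeps calling recv until it gets an empty byte string, since a single recv is not guaranteed to return everything the other side sent.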
From a functional perspective, sockets are a programmer’s device for transferring bytes between programs, possibly running on different computers. Although sockets themselves transfer only byte strings, we can also transfer Python objects through them by using Python’s pickle module. Because this module converts Python objects such as lists, dictionaries, and class instances to and from byte strings, it provides the extra step needed to ship higher-level objects through sockets when required.
Python’s struct module can also be used to format Python objects as packed binary data byte strings for transmission, but is generally limited in scope to objects that map to types in the C programming language. The pickle module supports transmission of larger objects, such as dictionaries and class instances. For other tasks, including most standard Internet protocols, simpler formatted byte strings suffice. We’ll learn more about pickle later in this chapter and book.
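The difference in scope between the two modules shows up quickly in practice. This sketch uses an arbitrary sample record and format string; it shows struct handling C-like values, refusing a dictionary, and pickle serializing that same dictionary to bytes ready for a socket.

```python
import pickle
import struct

record = {'name': 'Bob', 'scores': [99, 100]}

# struct packs fixed C-like types named by a format string:
# '>id' = big-endian 4-byte int followed by an 8-byte double
packed = struct.pack('>id', 42, 3.5)
print(len(packed), struct.unpack('>id', packed))

# struct has no format code for a dict...
try:
    struct.pack('>i', record)
    struct_ok = True
except struct.error:
    struct_ok = False
print('struct can pack a dict:', struct_ok)

# ...but pickle serializes the whole object to bytes
data = pickle.dumps(record)
print(pickle.loads(data) == record)
```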
Beyond basic data communication tasks, the socket module also includes a variety of more advanced tools. For instance, it has calls for the following and more:
Converting integers to and from standard network byte order (ntohl, htonl)
Querying machine name and address (gethostname, gethostbyname)
Wrapping socket objects in a file object interface (sockobj.makefile)
Making socket calls nonblocking (sockobj.setblocking)
Setting socket timeouts (sockobj.settimeout)
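Several of these calls can be tried without any network at all. This is a short sketch; the sample value and the 2.5-second timeout are arbitrary, and the host name printed depends entirely on your machine.

```python
import socket

# Byte order: convert a 32-bit integer to network (big-endian) order and back
value = 0x12345678
net = socket.htonl(value)
back = socket.ntohl(net)
print(hex(net), hex(back))

# Machine name and address queries (results depend on local setup)
name = socket.gethostname()
addr = socket.gethostbyname('127.0.0.1')   # a numeric address passes through
print(name, addr)

# Timeouts and nonblocking mode on a socket object
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(2.5)             # blocking calls now give up after 2.5 seconds
timeout_set = sock.gettimeout()
sock.setblocking(False)          # equivalent to settimeout(0.0)
timeout_nonblocking = sock.gettimeout()
sock.close()
print(timeout_set, timeout_nonblocking)
```

On a big-endian machine htonl returns its argument unchanged; on little-endian hardware it swaps the bytes. Either way, ntohl undoes it.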
Provided your Python was compiled with Secure Sockets Layer (SSL) support, the ssl standard library module also supports encrypted transfers with its ssl.wrap_socket call (in current Pythons, use the wrap_socket method of an ssl.SSLContext object instead; the module-level function was deprecated in 3.7 and removed in 3.12). This call wraps a socket object in SSL logic, which is used in turn by other standard library modules to support the HTTPS secure website protocol (http.client and urllib.request), secure email transfers (poplib and smtplib), and more. We’ll meet some of these other modules later in this part of the book, but we won’t study all of the socket module’s advanced features in this text; see the Python library manual for usage details omitted here.
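In current Pythons, the usual way to wrap a client socket is through an SSLContext. This is a sketch, assuming a hypothetical open_tls_connection helper and placeholder host/port values; the connecting function is defined but not called here because it needs network access.

```python
import socket
import ssl

def make_client_context():
    """Build a TLS client context that verifies certificates and host names."""
    return ssl.create_default_context()

def open_tls_connection(host, port=443, timeout=10):
    """Connect to host:port over TCP, then wrap the socket in TLS.

    host and port stand for any TLS server you can reach; this needs
    network access, so it is defined here but not called.
    """
    context = make_client_context()
    raw = socket.create_connection((host, port), timeout=timeout)
    try:
        # server_hostname enables SNI and host name checking
        return context.wrap_socket(raw, server_hostname=host)
    except Exception:
        raw.close()
        raise

if __name__ == '__main__':
    ctx = make_client_context()
    print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)
```

Once connected, the returned object supports the same sendall and recv calls as a plain socket, with encryption handled underneath.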
Although we won’t get into advanced socket use in this chapter, basic socket transfers are remarkably easy to code in Python. To create a connection between machines, Python programs import the socket module, create a socket object, and call the object’s methods to establish connections and send and receive data.
Sockets are inherently bidirectional in nature, and socket object methods map directly to socket calls in the C library. For example, the script in Example 12-1 implements a program that simply listens for a connection on a socket and echoes back over a socket whatever it receives through that socket, adding Echo=> string prefixes.
Example 12-1. PP4E\Internet\Sockets\echo-server.py
"""
Server side: open a TCP/IP socket on a port, listen for a message from
a client, and send an echo reply; this is a simple one-shot listen/reply
conversation per client, but it goes into an infinite loop to listen for
more clients as long as this server script runs; the client may run on
a remote machine, or on same computer if it uses 'localhost' for server
"""
from socket import * # get socket constructor and constants
myHost = '' # '' = all available interfaces on host
myPort = 50007 # listen on a non-reserved port number
sockobj = socket(AF_INET, SOCK_STREAM) # make a TCP socket object
sockobj.bind((myHost, myPort)) # bind it to server port number
sockobj.listen(5) # listen, allow 5 pending connects
while True: # listen until process killed
connection, address = sockobj.accept() # wait for next client connect
print('Server connected by', address) # connection is a new socket
while True:
data = connection.recv(1024) # read next line on client socket
if not data: break # send a reply line to the client
connection.send(b'Echo=>' + data) # until eof when socket closed
connection.close()
As mentioned earlier, programs like this that listen for incoming connections are usually called servers because they provide a service that can be accessed at a given machine and port on the Internet. Programs that connect to such a server to access its service are generally called clients. Example 12-2 shows a simple client implemented in Python.
Example 12-2. PP4E\Internet\Sockets\echo-client.py
"""
Client side: use sockets to send data to the server, and print server's
reply to each message line; 'localhost' means that the server is running
on the same machine as the client, which lets us test client and server
on one machine; to test over the Internet, run a server on a remote
machine, and set serverHost or argv[1] to machine's domain name or IP addr;
Python sockets are a portable BSD socket interface, with object methods
for the standard socket calls available in the system's C library;
"""
import sys
from socket import * # portable socket interface plus constants
serverHost = 'localhost' # server name, or: 'starship.python.net'
serverPort = 50007 # non-reserved port used by the server
message = [b'Hello network world'] # default text to send to server
# requires bytes: b'' or str,encode()
if len(sys.argv) > 1:
serverHost = sys.argv[1] # server from cmd line arg 1
if len(sys.argv) > 2: # text from cmd line args 2..n
message = (x.encode() for x in sys.argv[2:])
sockobj = socket(AF_INET, SOCK_STREAM) # make a TCP/IP socket object
sockobj.connect((serverHost, serverPort)) # connect to server machine + port
for line in message:
sockobj.send(line) # send line to server over socket
data = sockobj.recv(1024) # receive line from server: up to 1k
print('Client received:', data) # bytes are quoted, was `x`, repr(x)
sockobj.close() # close socket to send eof to server
Before we see these
programs in action, let’s take a minute to explain how
this client and server do their stuff. Both are fairly simple examples
of socket scripts, but they illustrate the common call patterns of
most socket-based programs. In fact, this is boilerplate code: most
connected socket programs generally make the same socket calls that
our two scripts do, so let’s step through the important points of
these scripts line by line.
Programs such as
Example 12-1
that provide services for
other programs with sockets generally start out by following this
sequence of calls:
sockobj = socket(AF_INET, SOCK_STREAM)
Uses the Python socket module to create a TCP socket object. The names AF_INET and SOCK_STREAM are preassigned variables defined by and imported from the socket module; using them in combination means “create a TCP/IP socket,” the standard communication device for the Internet. More specifically, AF_INET means the IP (version 4) address family, and SOCK_STREAM means the TCP transfer protocol. The AF_INET/SOCK_STREAM combination is the default because it is so common, but it’s typical to make this explicit.
If you use other names in this call, you can instead create things like UDP connectionless sockets (use SOCK_DGRAM second) and Unix domain sockets on the local machine (use AF_UNIX first), but we won’t do so in this book. See the Python library manual for details on these and other socket module options. Using other socket types is mostly a matter of using different forms of boilerplate code.
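For comparison, a UDP exchange swaps SOCK_STREAM for SOCK_DGRAM and drops the connection step entirely. This is a minimal sketch on the local machine, assuming a hypothetical udp_send_and_receive helper and an OS-chosen port (port 0) rather than a fixed one.

```python
import socket

def udp_send_and_receive(message):
    """Send one UDP datagram on the local machine and read it back."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)   # UDP socket
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.settimeout(5)                 # don't hang if nothing arrives
        receiver.bind(('127.0.0.1', 0))        # port 0: OS picks a free port
        sender.sendto(message, receiver.getsockname())   # no connect() needed
        data, source = receiver.recvfrom(1024)  # one whole datagram per call
        return data, source
    finally:
        sender.close()
        receiver.close()

if __name__ == '__main__':
    data, source = udp_send_and_receive(b'Hello datagram world')
    print(data, 'from', source)
```

Unlike TCP, UDP gives no delivery or ordering guarantee; on the loopback interface used here, loss is very unlikely, but real network code must tolerate it.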
sockobj.bind((myHost, myPort))
Associates the socket object with an address: for IP addresses, we pass a server machine name and port number on that machine. This is where the server identifies the machine and port associated with the socket. In server programs, the hostname is typically an empty string (“”), which means the machine that the script runs on (formally, all available local and remote interfaces on the machine), and the port is a number outside the range 0 to 1023 (which is reserved for standard protocols, described earlier).
Note that each unique socket dialog you support must have its own port number; if you try to open a socket on a port already in use, Python will raise an exception. Also notice the nested parentheses in this call: for the AF_INET address protocol socket here, we pass the host/port socket address to bind as a two-item tuple object (pass a string for AF_UNIX). Technically, bind takes an address whose form depends on the address family of the socket created.
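Both points, the tuple-form address and the exception for a port already in use, can be checked locally. This sketch uses a hypothetical bind_twice helper and binds to port 0 so the OS picks a free port instead of assuming 50007 is available.

```python
import socket

def bind_twice():
    """Bind one socket, then try to bind a second to the same address."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind(('127.0.0.1', 0))       # (host, port) tuple; 0 = any free port
        first.listen(1)
        address = first.getsockname()      # the (host, port) actually bound
        try:
            second.bind(address)           # same port: refused by the OS
            refused = False
        except OSError:                    # typically "Address already in use"
            refused = True
        return address, refused
    finally:
        second.close()
        first.close()

if __name__ == '__main__':
    print(bind_twice())
```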
sockobj.listen(5)
Starts listening for incoming client connections and allows for a backlog of up to five pending requests. The value passed sets the number of incoming client requests queued by the operating system before new requests are denied (which happens only if a server isn’t fast enough to process requests before the queues fill up). A value of 5 is usually enough for most socket-based programs. In current Pythons the value must be at least 0 (lower values are treated as 0), and since Python 3.5 the argument is optional.
At this point, the server is ready to accept connection requests from client programs running on remote machines (or the same machine) and falls into an infinite loop (while True, or the equivalent while 1 for older Pythons and ex-C programmers), waiting for them to arrive:
connection, address = sockobj.accept()
Waits for the next client connection request to occur; when it does, the accept call returns a brand-new socket object over which data can be transferred from and to the connected client. Connections are accepted on sockobj, but communication with a client happens on connection, the new socket. This call actually returns a two-item tuple; address is the connecting client’s Internet address. We can call accept more than one time, to service multiple client connections; that’s why each call returns a new, distinct socket for talking to a particular client.
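The listen/accept pair can be watched in a single process by connecting to our own listening socket. This is a sketch with a hypothetical accept_demo helper; it checks that accept hands back a socket other than the listener, and that the reported address is the client’s own (host, port).

```python
import socket

def accept_demo():
    """Accept one loopback connection and inspect what accept() returns."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.bind(('127.0.0.1', 0))
        server.listen(5)                        # queue up to 5 pending connects
        client.connect(server.getsockname())    # completes via the backlog queue
        connection, address = server.accept()   # new socket + client address
        with connection:
            return (connection is not server,
                    address == client.getsockname())
    finally:
        client.close()
        server.close()

if __name__ == '__main__':
    print(accept_demo())
```

The client’s connect can finish before accept is called because the operating system completes the handshake and parks the connection in the listen backlog.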
Once we have a client connection, we fall into another loop to
receive data from the client in blocks of up to 1,024 bytes at a time,
and echo each block back to the client:
data = connection.recv(1024)
Reads at most 1,024 more bytes of the next message sent
from a client (i.e., coming across the network or IPC
connection), and returns it to the script as a byte string. We
get back an empty byte string when the client has
finished—end-of-file is triggered when the client closes its end
of the socket.
connection.send(b'Echo=>' + data)
Sends the latest byte string data block back to the client program, prepending the string 'Echo=>' to it first. The client program can then recv what we send here: the next reply line. Technically this call sends as much data as possible, and returns the number of bytes actually sent. To be fully robust, some programs may need to resend unsent portions or use connection.sendall to force all bytes to be sent.
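Resending unsent portions amounts to looping on send until the running count reaches the data length. The send_all function below is a hypothetical helper, a sketch of what sendall does for you, tried here over a local socket pair.

```python
import socket

def send_all(sock, data):
    """Loop on send() until every byte is written, like sock.sendall(data)."""
    total = 0
    while total < len(data):
        sent = sock.send(data[total:])   # may write only part of the data
        if sent == 0:
            raise RuntimeError('socket connection broken')
        total += sent
    return total                          # number of bytes written

if __name__ == '__main__':
    a, b = socket.socketpair()
    with a, b:
        print(send_all(a, b'Echo=>spam'))   # 10
        print(b.recv(1024))
        print(a.sendall(b'more'))           # None: sendall returns nothing
```

One design difference: sendall reports no count, since on success everything was sent, and raises an exception on failure.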
connection.close()
Shuts down the connection with this particular
client.
So far we’ve seen calls used to transfer data in a server, but what is it that is actually shipped through a socket? As we learned in Chapter 5, sockets by themselves always deal in binary byte strings, not text. To your scripts, this means you must send and will receive bytes strings, not str, though you can convert to and from text as needed with bytes.decode and str.encode methods. In our scripts, we use b'...' bytes literals to satisfy socket data requirements. In other contexts, tools such as the struct and pickle modules return the byte strings we need automatically, so no extra steps are needed.
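Here is what the bytes-only rule looks like in practice, sketched over a local socket pair with an arbitrary sample string: passing a str raises TypeError, while encoding before sending and decoding after receiving round-trips the text.

```python
import socket

left, right = socket.socketpair()
try:
    left.send('plain text')        # str is not accepted by send
    rejected = False
except TypeError:
    rejected = True

text = 'naïve café'
left.sendall(text.encode('utf-8'))            # str -> bytes before sending
received = right.recv(1024).decode('utf-8')   # bytes -> str after receiving
left.close()
right.close()
print(rejected, received == text)
```

Both ends must agree on the encoding (UTF-8 here); the socket itself knows nothing about text.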
For example, although the socket model is limited to transferring byte strings, you can send and receive nearly arbitrary Python objects with the standard library pickle object serialization module. Its dumps and loads calls convert Python objects to and from byte strings, ready for direct socket transfer:
>>> import pickle
>>> x = pickle.dumps([99, 100])     # on sending end... convert to byte strings
>>> x                               # string passed to send, returned by recv (exact bytes vary by pickle protocol)
b'\x80\x03]q\x00(KcKde.'
>>> pickle.loads(x)                 # on receiving end... convert back to object
[99, 100]
For simpler types that correspond to those in the C language, the struct module provides the byte-string conversion we need as well:
>>> import struct
>>> x = struct.pack('>ii', 99, 100)     # convert simpler types for transmission
>>> x
b'\x00\x00\x00c\x00\x00\x00d'
>>> struct.unpack('>ii', x)
(99, 100)
When converted this way, Python native objects become candidates for socket-based transfers. See Chapter 4 for more on struct. We previewed pickle and object serialization in Chapter 1, but we’ll learn more about it and its few pickleability constraints when we explore data persistence in Chapter 17.
In fact, there are a variety of ways to extend the basic socket transfer model. For instance, much like os.fdopen and open for the file descriptors we studied in Chapter 4, the socket.makefile method allows you to wrap sockets in text-mode file objects that handle text encodings for you automatically. This call also allows you to specify nondefault Unicode encodings and end-line behaviors in text mode with extra arguments in 3.X, just like the open built-in function. Because its result mimics file interfaces, the socket.makefile call additionally allows the pickle module’s file-based calls to transfer objects over sockets implicitly. We’ll see more on socket file wrappers later in this chapter.
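A quick sketch of both uses over a local socket pair follows. The sample line and dictionary are arbitrary, and note that pickle’s dump and load need the binary ('wb'/'rb') form of the wrapper.

```python
import pickle
import socket

a, b = socket.socketpair()

# Text mode: the file wrappers encode and decode str for us
writer = a.makefile('w', encoding='utf-8', newline='\n')
reader = b.makefile('r', encoding='utf-8', newline='\n')
writer.write('Hello file world\n')
writer.flush()                        # push buffered text into the socket
line = reader.readline()

# Binary mode: pickle.dump/load work directly on socket files
bwriter = a.makefile('wb')
breader = b.makefile('rb')
pickle.dump({'spam': [1, 2, 3]}, bwriter)
bwriter.flush()
obj = pickle.load(breader)

for f in (writer, reader, bwriter, breader):
    f.close()
a.close()
b.close()
print(repr(line), obj)
```

The flush calls matter: file wrappers buffer output, so nothing reaches the socket until the buffer is flushed or the file is closed.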
For our simpler scripts here, hardcoded byte strings and direct
socket calls do the job. After talking with a given connected client,
the server in
Example 12-1
goes
back to its infinite loop and waits for the next client connection
request. Let’s move on to see what happens on the other side of the
fence.
The actual socket-related
calls in client programs like the one shown in
Example 12-2
are even simpler; in
fact, half of that script is preparation logic. The main thing to keep
in mind is that the client and server must specify the same port
number when opening their sockets and the client must identify the
machine on which the server is running; in our scripts, server and
client agree to use port number 50007 for their conversation, outside
the standard protocol range. Here are the client’s socket
calls:
sockobj = socket(AF_INET, SOCK_STREAM)
Creates a Python socket object in the client program, just
like the server.
sockobj.connect((serverHost, serverPort))
Opens a connection to the machine and port on which the server program is listening for client connections. This is where the client specifies the server to be contacted, by host name and port number. In the client, we can specify the name of the remote machine either as a domain name (e.g., starship.python.net) or as a numeric IP address. We can also give the server name as localhost (or the equivalent IP address 127.0.0.1) to specify that the server program is running on the same machine as the client; that comes in handy for debugging servers without having to connect to the Net. And again, the client’s port number must match the server’s exactly. Note the nested parentheses again: just as in server bind calls, we really pass the server’s host/port address to connect in a tuple object.
Once the client establishes a connection to the server, it falls
into a loop, sending a message one line at a time and printing
whatever the server sends back after each line is sent:
sockobj.send(line)
Transfers the next byte-string message line to the server over the socket. Notice that the default list of lines contains bytes strings (b'...'). Just as on the server, data passed through the socket must be a byte string, though it can be the result of a manual str.encode encoding call or an object conversion with pickle or struct if desired. When lines to be sent are given as command-line arguments instead, they must be converted from str to bytes; the client arranges this by encoding in a generator expression (a call map(str.encode, sys.argv[2:]) would have the same effect).
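The equivalence of the two encoding forms is easy to confirm. In this sketch, a literal list stands in for sys.argv[2:] and encode_args is a hypothetical helper. It also shows that both forms are one-shot iterators, which is fine for the client because its for loop walks the messages only once.

```python
def encode_args(args):
    """Turn command-line str arguments into bytes, as the client does."""
    return (x.encode() for x in args)      # generator expression form

args = ['Hello', 'network', 'world']       # stand-in for sys.argv[2:]
via_generator = list(encode_args(args))
via_map = list(map(str.encode, args))      # the equivalent map() form
print(via_generator == via_map)

# Both forms are one-shot iterators: a second pass yields nothing
once = map(str.encode, args)
first_pass = list(once)
second_pass = list(once)
print(first_pass, second_pass)
```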
data = sockobj.recv(1024)
Reads the next reply line sent by the server program.
Technically, this reads up to 1,024 bytes of the next reply
message and returns it as a byte string.
sockobj.close()
Closes the
connection with the server, sending it the
end-of-file signal.
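To watch the whole dialog without two console windows, both sides can run in one process. This sketch puts the server in a thread and uses an OS-chosen port rather than 50007. The serve_one_client and run_echo_session names are hypothetical, and like the book’s scripts it assumes each short reply arrives in a single recv call.

```python
import socket
import threading

def serve_one_client(listener):
    """Echo one client's data, like the inner loop of Example 12-1."""
    connection, address = listener.accept()
    with connection:
        while True:
            data = connection.recv(1024)
            if not data:                        # client closed its end: EOF
                break
            connection.sendall(b'Echo=>' + data)

def run_echo_session(lines):
    """Run an echo server thread on a free local port and talk to it."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(('127.0.0.1', 0))             # port 0: OS picks a free port
    listener.listen(5)
    server = threading.Thread(target=serve_one_client, args=(listener,))
    server.start()

    replies = []
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5)                        # avoid hanging forever
    try:
        client.connect(listener.getsockname())  # same host and port as server
        for line in lines:
            client.sendall(line)
            replies.append(client.recv(1024))   # assumes one reply per recv
    finally:
        client.close()                          # sends end-of-file to server
        server.join()
        listener.close()
    return replies

if __name__ == '__main__':
    for reply in run_echo_session([b'Hello', b'network world']):
        print('Client received:', reply)
```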
And that’s it. The server exchanges one or more lines of text
with each client that connects. The operating system takes care of
locating remote machines, routing bytes sent between programs and
possibly across the Internet, and (with TCP) making sure that our
messages arrive intact. That involves a lot of processing, too—our
strings may ultimately travel around the world, crossing phone wires,
satellite links, and more along the way. But we can be happily
ignorant of what goes on beneath the socket call layer when
programming in
Python.