Programming Python (149 page)

Authors: Mark Lutz

What’s a Server-Side CGI Script?

Simply put, CGI scripts implement much of the interaction you typically
experience on the Web. They are a standard and widely used mechanism for
programming web-based systems and website interaction, and they underlie
most of the larger web development models.

There are other ways to add interactive behavior to websites with
Python, both on the client and the server. We briefly met some such
alternatives near the start of Chapter 12. For instance, client-side
solutions include Jython applets, RIAs such as Silverlight and pyjamas,
Active Scripting on Windows, and the emerging HTML 5 standard. On the
server side, there are a variety of additional technologies that build on
the basic CGI model, such as Python Server Pages, and web frameworks such
as Django, App Engine, CherryPy, and Zope, many of which utilize the MVC
programming model.

By and large, though, CGI server-side scripts are used to program
much of the activity on the Web, whether it’s programmed directly or
partly automated by frameworks and tools. CGI scripting is perhaps the
most primitive approach to implementing websites, and it does not by
itself offer the tools that are often built into larger frameworks such as
state retention, database interfaces, and reply templating. CGI scripts,
however, are in many ways the simplest technique for server-side
scripting. As a result, they are an ideal way to get started with
programming on the server side of the Web. Especially for simpler sites
that do not require enterprise-level tools, CGI is sufficient, and it can
be augmented with additional libraries as needed.

The Script Behind the Curtain

Formally speaking, CGI scripts
are programs that run on a server machine and adhere to
the Common Gateway Interface—a model for browser/server communications,
from which CGI scripts take their name. CGI is an application protocol
that web servers use to transfer input data and results between web
browsers and other clients and server-side scripts. Perhaps a more
useful way to understand CGI, though, is in terms of the interaction it
implies.

Most people take this interaction for granted when browsing the
Web and pressing buttons in web pages, but a lot is going on behind the
scenes of every transaction on the Web. From the perspective of a user,
it’s a fairly familiar and simple process:

Submission

When you visit a website to search, purchase a product, or
submit information online, you generally fill in a form in your
web browser, press a button to submit your information, and begin
waiting for a reply.

Response

Assuming all is well with both your Internet connection and
the computer you are contacting, you eventually get a reply in the
form of a new web page. It may be a simple acknowledgment (e.g.,
“Thanks for your order”) or a new form that must be filled out and
submitted again.

And, believe it or not, that simple model is what makes most of
the Web hum. But internally, it’s a bit more complex. In fact, a subtle
client/server socket-based architecture is at work: your web browser
running on your computer is the client, and the computer you contact over
the Web is the server. Let’s examine the interaction scenario again, with
all the gory details that users usually never see:

Submission

When you fill out a form page in a web browser and press a
submission button, behind the scenes your web browser sends your
information across the Internet to the server machine specified as
its receiver. The server machine is usually a remote computer that
lives somewhere else in both cyberspace and reality. It is named
in the URL accessed—the Internet address string that appears at
the top of your browser. The target server and file can be named
in a URL you type explicitly, but more typically they are
specified in the HTML that defines the submission page
itself, either in a hyperlink or in the “action” attribute of the input
form’s HTML.

However the server is specified, the browser running on your
computer ultimately sends your information to the server as bytes
over a socket, using techniques we saw in the last three chapters.
On the server machine, a program called an HTTP server runs
perpetually, listening on a socket for incoming connection requests
and data from browsers and other clients, usually on port number 80.

Processing

When your information shows up at the server machine, the
HTTP server program notices it first and decides how to handle the
request. If the requested URL names a simple web page (e.g., a URL
ending in .html), the HTTP server opens the named HTML file on the
server machine and sends its text back to the browser over a socket.
On the client, the browser reads the HTML and uses it to construct
the next page you see.

But if the URL requested by the browser names an executable
program instead (e.g., a URL ending in .cgi or .py), the HTTP server
starts the named program on the server machine to process the
request and redirects the incoming browser data to the spawned
program’s stdin input stream, environment variables, and
command-line arguments. That program started by the server is
usually a CGI script: a program run on the remote server machine
somewhere in cyberspace, usually not on your computer. The CGI
script is responsible for handling the request from this point on;
it may store your information in a database, perform a search,
charge your credit card, and so on.

Response

Ultimately, the CGI script prints HTML, along with a few
header lines, to generate a new response page in your browser.
When a CGI script is started, the HTTP server takes care to
connect the script’s stdout standard output stream to a socket that
the browser is listening to. As a result, HTML code printed by the
CGI script is sent over the Internet, back to your browser, to
produce a new page. The HTML printed back by the CGI script works
just as if it had been stored and read from an HTML file; it can
define a simple response page or a brand-new form coded to collect
additional information. Because it is generated by a script, it may
include information dynamically determined per request.

In other words, CGI scripts are something like callback handlers
for requests generated by web browsers that require a program to be run
dynamically. They are automatically run on the server machine in response
to actions in a browser. Although CGI scripts ultimately receive and send
standard structured messages over sockets, CGI is more like a higher-level
procedural convention for sending and receiving information between a
browser and a server.
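
To make the processing and response steps more concrete, here is a
minimal sketch that plays the role of the HTTP server for a single
request: it starts a tiny stand-in script as a separate program, hands
it the request data in an environment variable, and collects whatever
the script prints. The script text, the run_like_a_server name, and the
sample input are all illustrative assumptions; QUERY_STRING and
REQUEST_METHOD are variable names taken from the CGI specification, not
from this section. A real server forwards the script's output to the
browser's socket instead of capturing it.

```python
import os
import subprocess
import sys

# A stand-in CGI script, kept in a string so the example is self-contained.
# It reads its input from an environment variable and prints a reply:
# a header line, a blank line, and then HTML.
CGI_SCRIPT = r'''
import html, os
query = os.environ.get("QUERY_STRING", "")
print("Content-type: text/html")
print()
print("<h1>Query was: %s</h1>" % html.escape(query))
'''

def run_like_a_server(query_string):
    """Start the script roughly as an HTTP server would; return its output."""
    env = dict(os.environ)                 # inherit the normal environment
    env["REQUEST_METHOD"] = "GET"          # request details go in the environment
    env["QUERY_STRING"] = query_string
    result = subprocess.run(
        [sys.executable, "-c", CGI_SCRIPT],
        env=env,
        capture_output=True,               # a real server sends this to the client socket
        text=True,
        check=True,
    )
    return result.stdout

if __name__ == "__main__":
    print(run_like_a_server("user=Bob"))
```

The captured text is what the browser would receive after the server's
own status line and headers: the script's header line, a blank line, and
the HTML that defines the reply page.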

Writing CGI Scripts in Python

If all of this
sounds complicated, relax—Python, as well as the resident
HTTP server, automates most of the tricky bits. CGI scripts are written
as fairly autonomous programs, and they assume that startup tasks have
already been accomplished. The HTTP web server program, not the CGI
script, implements the server side of the HTTP protocol itself.
Moreover, Python’s library modules automatically dissect information
sent up from the browser and give it to the CGI script in an easily
digested form. The upshot is that CGI scripts may focus on application
details like processing input data and producing a result page.

As mentioned earlier, in the context of CGI scripts, the stdin and
stdout streams are automatically tied to sockets connected to the
browser. In addition, the HTTP server passes some browser information to
the CGI script in the form of shell environment variables, and possibly
command-line arguments. To CGI programmers, that means:

  • Input data sent from the browser to the server shows up as a
    stream of bytes in the stdin input stream, along with shell
    environment variables.

  • Output is sent back from the server to the client by simply
    printing properly formatted HTML to the stdout output stream.
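
The two bullets above can be sketched as a pair of functions, shown
here without any help from the library modules discussed next. This is
only an illustration: the function names are made up for this example,
and the environment variable names (REQUEST_METHOD, QUERY_STRING, and
CONTENT_LENGTH) come from the CGI specification, not from this section.
The streams are passed in as arguments so that the logic can be tried
without a web server.

```python
import html
import io
import os
import sys

def read_inputs(environ, stdin):
    """Return the raw input text a CGI script is handed for one request."""
    if environ.get("REQUEST_METHOD", "GET") == "POST":
        # Posted form data arrives as bytes on stdin; CONTENT_LENGTH
        # tells the script how many bytes to read.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("utf-8")
    # Data appended to the URL arrives in an environment variable.
    return environ.get("QUERY_STRING", "")

def write_reply(raw_inputs, stdout):
    """Print a header line, a blank line, and then the HTML reply."""
    stdout.write("Content-type: text/html\n\n")
    stdout.write("<p>Raw input: %s</p>\n" % html.escape(raw_inputs))

if __name__ == "__main__":
    # In a real CGI script, the server supplies these streams.
    write_reply(read_inputs(os.environ, sys.stdin.buffer), sys.stdout)
```

Note that the raw input is still a single encoded string at this point;
splitting it into individual fields is the parsing task that the
standard library takes over.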

The most complex parts of this scheme include parsing all the
input information sent up from the browser and formatting information in
the reply sent back. Happily, Python’s standard library largely
automates both tasks:

Input

With the Python cgi module, input typed into a web browser form
or appended to a URL string shows up as values in a dictionary-like
object in Python CGI scripts. Python parses the data itself and
gives us an object with one key:value pair per input sent by the
browser that is fully independent of transmission style (roughly,
by fill-in form or by direct URL).
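
As a rough illustration of the kind of result this parsing produces,
the sketch below uses urllib.parse.parse_qs rather than the cgi module
itself. That substitution is deliberate: the cgi module was deprecated
in Python 3.11 and removed in Python 3.13, so code that imports it will
not run on the newest Pythons. The parse_inputs name and the sample
input string are assumptions made for this example, and the plain
dictionary returned here is simpler than the object the cgi module
provides.

```python
from urllib.parse import parse_qs

def parse_inputs(raw):
    """Turn 'name=value&name=value' text into a plain dictionary."""
    parsed = parse_qs(raw, keep_blank_values=True)   # each value is a list
    # Keep the last value given for each name; a field that is repeated
    # in the input would need the whole list instead.
    return {name: values[-1] for name, values in parsed.items()}

print(parse_inputs("user=Bob+Smith&age=40"))
# prints {'user': 'Bob Smith', 'age': '40'}
```

The same encoded text is produced whether the inputs were typed into a
form or appended to a URL by hand, which is why the parsed result does
not depend on the transmission style.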

Output

The cgi module also has tools for automatically escaping strings
so that they are legal to use in HTML (e.g., replacing embedded <,
>, and & characters with HTML escape codes). Module urllib.parse
provides additional tools for formatting text inserted into
generated URL strings (e.g., adding %XX and + escapes).
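
Here is a short sketch of both kinds of escaping applied to the same
piece of text. It uses html.escape for the HTML side, because the
escape function that once lived in the cgi module was removed in Python
3.8 in favor of html.escape. The greeting_link function and the greet.py
script name are hypothetical, chosen only for this example.

```python
import html
from urllib.parse import quote_plus

def greeting_link(name):
    """Build an HTML hyperlink whose URL and label both contain user text."""
    url = "/cgi-bin/greet.py?user=" + quote_plus(name)   # URL escapes: %XX and +
    label = html.escape(name)                            # HTML escapes: &lt; &gt; &amp;
    return '<a href="%s">%s</a>' % (html.escape(url), label)

print(greeting_link("Bob & <Sons>"))
# prints <a href="/cgi-bin/greet.py?user=Bob+%26+%3CSons%3E">Bob &amp; &lt;Sons&gt;</a>
```

The two escaping schemes are not interchangeable: text placed inside a
URL needs URL escapes, and any text placed in the page, including the
finished URL, needs HTML escapes.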

We’ll study both of these interfaces in detail later in this
chapter. For now, keep in mind that although any language can be used to
write CGI scripts, Python’s standard modules and language attributes
make it a snap.

Perhaps less happily, CGI scripts are also intimately tied to the
syntax of HTML, since they must generate it to create a reply page. In
fact, it can be said that Python CGI scripts embed HTML, which is an
entirely distinct language in its own right.[57] As we’ll also see, the
fact that CGI scripts create a user interface by printing HTML syntax
means that we have to take special care with the text we insert into a
web page’s code (e.g., escaping HTML operators). Worse, CGI scripts
require at least a cursory knowledge of HTML forms, since that is where
the inputs and target script’s address are typically specified.

This book won’t teach HTML in depth; if you find yourself puzzled
by some of the arcane syntax of the HTML generated by scripts here, you
should glance at an HTML introduction, such as HTML & XHTML: The
Definitive Guide. Also keep in mind that higher-level tools and
frameworks can sometimes hide the details of HTML generation from Python
programmers, albeit at the cost of any new complexity inherent in the
framework itself. With HTMLgen and similar packages, for instance, it’s
possible to deal in Python objects, not HTML syntax, though you must
learn this system’s API as well.

[57] Interestingly, in Chapter 12 we briefly introduced other systems
that take the opposite route: embedding Python code or calls in HTML. The
server-side templating languages in Zope, PSP, and other web frameworks
use this model, running the embedded Python code to produce part of a
reply page. Because Python is embedded, these systems must run special
servers to evaluate the embedded tags. Because Python CGI scripts embed
HTML in Python instead, they can be run as standalone programs directly,
though they must be launched by a CGI-capable web server.

Running Server-Side Examples

Like GUIs, web-based
systems are highly interactive, and the best way to get a
feel for some of these examples is to test-drive them live. Before we get
into some code, let’s get set up to run the examples we’re going to
see.

Running CGI-based programs
requires three pieces of software:

  • The client, to submit requests: a browser or script

  • The web server that receives the request

  • The CGI script, which is run by the server to process the
    request

We’ll be writing CGI scripts as we move along, and any web browser
can be used as a client (e.g., Firefox, Safari, Chrome, or Internet
Explorer). As we’ll see later, Python’s urllib.request module can also
serve as a web client in scripts we write. The only missing piece here is
the intermediate web server.
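
As a preview of the script-as-client option, the following sketch
builds a URL and fetches the reply text with urllib.request. The
build_url and fetch helpers are names invented for this example, and
the fetch call only works while some web server is listening at the
named host and port, so it is left for you to call by hand.

```python
import urllib.request
from urllib.parse import urlencode

def build_url(server, path, port=80, **inputs):
    """Assemble an http:// URL, adding a port and query inputs if given."""
    host = server if port == 80 else "%s:%d" % (server, port)
    url = "http://%s/%s" % (host, path.lstrip("/"))
    if inputs:
        url += "?" + urlencode(inputs)        # applies URL escapes to the inputs
    return url

def fetch(url, timeout=10):
    """Send a request as a browser would and return the reply text."""
    with urllib.request.urlopen(url, timeout=timeout) as reply:
        return reply.read().decode("utf-8", errors="replace")

print(build_url("localhost", "tutor0.html"))
# prints http://localhost/tutor0.html
```

With a server running, a call such as
fetch(build_url("localhost", "tutor0.html")) returns the same HTML text
that a browser would receive and render for that address.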

Web Server Options

There are a variety of approaches to running web servers. For
example, the open source Apache system provides a complete,
production-grade web server, and its mod_python extension discussed later
runs Python scripts quickly. Provided you are willing to install and
configure it, it is a complete solution, which you can run on a machine
of your own. Apache usage is beyond our present scope here, though.

If you have access to an account on a web server machine that runs
Python 3.X, you can also install the HTML and script files we’ll see
there. For the second edition of this book, for instance, all the web
examples were uploaded to an account I had on the “starship” Python
server, and were accessed with URLs of this form:

http://starship.python.net/~lutz/PyInternetDemos.html

If you go this route, replace starship.python.net/~lutz with the
names of your own server and account directory path. The downside of
using a remote server account is that changing code is more involved: you
will have to either work on the server machine itself or transfer code
back and forth on changes. Moreover, you need access to such a server in
the first place, and server configuration details can vary widely. On the
starship machine, for example, Python CGI scripts were required to have
a .cgi filename extension, executable permission, and the Unix #! line at
the top to point the shell to Python.

Finding a server that supports Python 3.X used by this book’s
examples might prove a stumbling block for some time to come as well;
neither of my own ISPs had it installed when I wrote this chapter in
mid-2010, though it’s possible to find commercial ISPs today that do.
Naturally, this may change over time.

Running a Local Web Server

To keep things simple, this edition is taking a different approach.
All the examples will be run using a simple web server coded in Python
itself. Moreover, the web server will be run on the same local machine as
the web browser client. This way, all you have to do to run the
server-side examples is start the web server script and use “localhost”
as the server name in all the URLs you will submit or code (see Chapter 12
if you’ve forgotten why this name means the local machine). For example,
to view a web page, use a URL of this form in the address field of your
web browser:

http://localhost/tutor0.html

This also avoids some of the complexity of per-server differences,
and it makes changing the code simple—it can be edited on the local
machine directly.

For this book’s examples, we’ll use the web server in Example 15-1.
This is essentially the same script introduced in Chapter 1, augmented
slightly to allow the working directory and port number to be passed in
as command-line arguments (we’ll also run this in the root directory of
a larger example in the next chapter). We won’t go into details on all
the modules and classes Example 15-1 uses here; see the Python library
manual. But as described in Chapter 1, this script implements an HTTP web
server, which:

  • Listens for incoming socket requests from clients on the
    machine it is run on and the port number specified in the script or
    command line (which defaults to 80, the standard HTTP port)

  • Serves up HTML pages from the webdir directory specified in the
    script or command line (which defaults to the directory it is
    launched from)

  • Runs Python CGI scripts that are located in the cgi-bin (or htbin)
    subdirectory of the webdir directory, with a .py filename extension

See Chapter 1 for additional background on this web server’s
operation.

Example 15-1. PP4E\Internet\Web\webserver.py

"""
Implement an HTTP web server in Python which knows how to serve HTML
pages and run server-side CGI scripts coded in Python; this is not
a production-grade server (e.g., no HTTPS, slow script launch/run on
some platforms), but suffices for testing, especially on localhost;
Serves files and scripts from the current working dir and port 80 by
default, unless these options are specified in command-line arguments;
Python CGI scripts must be stored in webdir\cgi-bin or webdir\htbin;
more than one of this server may be running on the same machine to serve
from different directories, as long as they listen on different ports;
"""
import os, sys
from http.server import HTTPServer, CGIHTTPRequestHandler
webdir = '.' # where your HTML files and cgi-bin script directory live
port = 80 # http://servername/ if 80, else use http://servername:xxxx/
if len(sys.argv) > 1: webdir = sys.argv[1] # command-line args
if len(sys.argv) > 2: port = int(sys.argv[2]) # else default ., 80
print('webdir "%s", port %s' % (webdir, port))
os.chdir(webdir) # run in HTML root dir
srvraddr = ('', port) # my hostname, portnumber
srvrobj = HTTPServer(srvraddr, CGIHTTPRequestHandler)
srvrobj.serve_forever() # serve clients till exit

To start the server to run this chapter’s examples, simply run
this script from the directory the script’s file is located in, with no
command-line arguments. For instance, from a DOS command line:

C:\...\PP4E\Internet\Web> webserver.py
webdir ".", port 80

On Windows, you can simply click its icon and keep the console
window open, or launch it from a DOS command prompt. On Unix it can be
run from a command line in the background, or in its own terminal
window. Some platforms may also require you to have administrator
privileges to run servers on reserved ports, such as the Web’s port 80;
if this includes your machine, either run the server with the required
permissions, or run on an alternate port number (more on port numbers
later in this chapter).
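
If you are not sure whether your account is allowed to use port 80, a
quick check like the following sketch can help you pick a port before
starting the server. The can_bind and choose_port names and the list of
fallback port numbers are assumptions made for this example, not part
of the book's server script.

```python
import socket

def can_bind(port, host=""):
    """Return True if this process is able to listen on the given port."""
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            sock.bind((host, port))
        return True
    except OSError:        # covers both "permission denied" and "already in use"
        return False

def choose_port(candidates=(80, 8000, 8080), probe=can_bind):
    """Return the first candidate port that the probe accepts."""
    for port in candidates:
        if probe(port):
            return port
    raise RuntimeError("none of the candidate ports is available")

if __name__ == "__main__":
    print("port 80 usable here:", can_bind(80))
```

If the check picks a port other than 80, pass both the directory and
the port to the server script on the command line (for example,
webserver.py . 8000) and include the port number in your URLs, as in
http://localhost:8000/tutor0.html.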

By default, while running locally this way, the script serves up
HTML pages requested on “localhost” from the directory it lives in or is
launched from, and runs Python CGI scripts from the cgi-bin subdirectory
located there; change its webdir variable or pass in a command-line
argument to point it to a different directory. Because of this structure,
in the examples distribution HTML files are in the same directory as the
web server script and CGI scripts are located in the cgi-bin
subdirectory. In other words, to visit web pages and run scripts, we’ll
be using URLs of these forms, respectively:

http://localhost/somepage.html
http://localhost/cgi-bin/somescript.py

Both map to the directory that contains the web server script
(PP4E\Internet\Web) by default. Again, to run the examples on a different
server machine of your own, simply replace the “localhost” and
“localhost/cgi-bin” parts of these addresses with your server name and
directory path details (more on URLs later in this chapter); with this
address change the examples work the same, but requests are routed across
a network to the server, instead of being routed between programs running
on the same local machine.
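
The distinction between the two URL forms can be pictured with a small
sketch. It is only an illustration of the directory convention
described here, not the code that the http.server module actually
runs, and the classify function name is made up for this example.

```python
import posixpath
from urllib.parse import urlsplit

CGI_DIRS = ("/cgi-bin", "/htbin")      # script directories named in the text

def classify(url):
    """Return ('script' or 'page', file path relative to the web root)."""
    path = posixpath.normpath(urlsplit(url).path)      # drop server name and inputs
    is_script = any(path.startswith(d + "/") for d in CGI_DIRS)
    return ("script" if is_script else "page", path.lstrip("/"))

print(classify("http://localhost/somepage.html"))
# prints ('page', 'somepage.html')
print(classify("http://localhost/cgi-bin/somescript.py"))
# prints ('script', 'cgi-bin/somescript.py')
```

In both cases the path that remains is resolved relative to the
directory the server runs in, which is why the HTML files and the
cgi-bin subdirectory sit alongside the server script.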

The server in Example 15-1 is by no means a production-grade web
server, but it can be used to experiment with this book’s examples and is
viable as a way to test your CGI scripts locally with server name
“localhost” before deploying them on a real remote server. If you wish to
install and run the examples under a different web server, you’ll want to
extrapolate the examples for your context. Things like server names and
pathnames in URLs, as well as CGI script filename extensions and other
conventions, can vary widely; consult your server’s documentation for
more details. For this chapter and the next, we’ll assume that you have
the webserver.py script running locally.

The Server-Side Examples Root Page

To confirm that you are set up to run the examples, start the web
server script in Example 15-1 and type the following URL in the address
field at the top of your web browser:

http://localhost/PyInternetDemos.html

This address loads a launcher page with links to this chapter’s
example files (see the examples distribution for this page’s HTML source
code, which is not listed in this book). The launcher page itself
appears as in Figure 15-1, shown displayed in the Internet Explorer web
browser on Windows 7 (it looks similar on other browsers and platforms).
Each major example has a link on this page, which runs when clicked.

Figure 15-1. The PyInternetDemos launcher page

It’s possible to open some of the examples by clicking on their
HTML file directly in your system’s file explorer GUI. However, the CGI
scripts ultimately invoked by some of the example links must be run by a
web server. If you click to browse such pages directly, your browser
will likely display the scripts’ source code, instead of running it. To
run scripts, too, be sure to open the HTML pages by typing their
“localhost” URL address into your browser’s address field.

Eventually, you probably will want to start using a more powerful
web server, so we will study additional CGI installation details later
in this chapter. You may also wish to review our prior exploration of
custom server options in Chapter 12 (Apache and mod_python are a popular
option). Such details can be safely skipped or skimmed if you will not be
installing on another server right away. For now, we’ll run locally.

Viewing Server-Side Examples and Output

The source code of examples in this part of the book is listed in
the text and included in the book’s examples distribution package. In
all cases, if you wish to view the source code of an HTML file, or the
HTML generated by a Python CGI script, you can also simply select your
browser’s View Source menu option while the corresponding web page is
displayed.

Keep in mind, though, that your browser’s View Source option lets
you see the output of a server-side script after it has run, but not the
source code of the script itself. There is no automatic way to view the
Python source code of the CGI scripts themselves, short of finding them
in this book or in its examples distribution.

To address this issue, later in this chapter we’ll also write a
CGI-based program called getfile, which allows the source code of any
file on this book’s website (HTML, CGI script, and so on) to be
downloaded and viewed. Simply type the desired file’s name into a web
page form referenced by the getfile.html link on the Internet demos
launcher page of Figure 15-1, or add it to the end of an explicitly typed
URL as a parameter like the following; replace tutor5.py at the end with
the name of the script whose code you wish to view, and omit the cgi-bin
component at the end to view HTML files instead:

http://localhost/cgi-bin/getfile.py?filename=cgi-bin\tutor5.py

In response, the server will ship back the text of the named file
to your browser. This process requires explicit interface steps, though,
and much more knowledge of URLs than we’ve gained thus far; to learn how
and why this magic line works, let’s move on to the next
section.
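
As a small preview, the following sketch takes the URL above apart
with urllib.parse to show which piece names the server, which names
the script, and which carries the input. The split_request function
name is an assumption made for this example; how the getfile script
itself handles its input is covered later in the chapter.

```python
from urllib.parse import urlsplit, parse_qs

def split_request(url):
    """Separate a URL into its server, script path, and input parts."""
    parts = urlsplit(url)
    return {
        "server": parts.netloc,            # machine to contact
        "script": parts.path,              # file or program on that machine
        "inputs": parse_qs(parts.query),   # name=value inputs after the ?
    }

url = r"http://localhost/cgi-bin/getfile.py?filename=cgi-bin\tutor5.py"
for part, value in split_request(url).items():
    print(part, "=>", value)
```

The text after the ? character is the same kind of name=value input
that a form would send; here it names the file whose source code the
script should return.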
