Programming Python (153 page)

Authors: Mark Lutz

Testing outside browsers with the module urllib.request

Once we understand how to send inputs to forms as query string
parameters at the end of URLs like this, the Python urllib.request
module we met in Chapters 1 and 13 becomes even more useful. Recall
that this module allows us to fetch the reply generated for any URL
address. When the URL names a simple HTML file, we simply download its
contents. But when it names a CGI script, the effect is to run the
remote script and fetch its output. This notion opens the door to web
services, which generate useful XML in response to input parameters;
in simpler roles, this allows us to test remote scripts.

For example, we can trigger the script in Example 15-8 directly,
without either going through the tutor3.html web page or typing a URL
in a browser’s address field:

C:\...\PP4E\Internet\Web> python
>>> from urllib.request import urlopen
>>> reply = urlopen('http://localhost/cgi-bin/tutor3.py?user=Brian').read()
>>> reply
b'<TITLE>tutor3.py</TITLE>\n<H1>Greetings</H1>\n<HR>\n<P>Hello, Brian.</P>\n<HR>\n'
>>> print(reply.decode())
<TITLE>tutor3.py</TITLE>
<H1>Greetings</H1>
<HR>
<P>Hello, Brian.</P>
<HR>

>>> url = 'http://localhost/cgi-bin/tutor3.py'
>>> conn = urlopen(url)
>>> reply = conn.read()
>>> print(reply.decode())
<TITLE>tutor3.py</TITLE>
<H1>Greetings</H1>
<HR>
<P>Who are you?</P>
<HR>




Recall from Chapter 13 that urllib.request.urlopen gives us a file
object connected to the generated reply stream. Reading from this file
returns the HTML that would normally be intercepted by a web browser
and rendered into a reply page. The reply comes off the underlying
socket as bytes in 3.X, but can be decoded to str strings as needed.

When fetched directly this way, the HTML reply can be parsed with
Python text processing tools, including string methods like split and
find, the re pattern-matching module, or the html.parser HTML parsing
module (all tools we’ll explore in Chapter 19). Extracting text from
the reply like this is sometimes informally called screen scraping: a
way to use website content in other programs. Screen scraping is an
alternative to more complex web services frameworks, though a brittle
one: small changes in the page’s format can often break scrapers that
rely on it. The reply text can also be simply inspected;
urllib.request allows us to test CGI scripts from the Python
interactive prompt or other scripts, instead of a browser.
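
As a rough illustration of screen scraping with html.parser, the sketch
below pulls the text out of a reply like the one fetched above. The
TextExtractor class, the extract_text function, and the sample reply
bytes are hypothetical names and data made up for this example; a real
reply would come from urlopen(url).read().

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the non-blank text found between HTML tags."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:                      # skip whitespace-only runs between tags
            self.chunks.append(text)

def extract_text(reply_bytes):
    """Decode a reply and return its text pieces in document order."""
    parser = TextExtractor()
    parser.feed(reply_bytes.decode())
    parser.close()
    return parser.chunks

# Hypothetical reply, shaped like the output of a simple CGI script
sample = (b'<title>tutor3.py</title>\n<h1>Greetings</h1>\n<hr>\n'
          b'<p>Hello, Brian.</p>\n<hr>\n')
print(extract_text(sample))
```

Because the extractor depends only on where text appears, not on which
tags surround it, it tolerates some markup changes, but it is still as
brittle as any scraper if the page's wording or ordering changes.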

More generally, this technique allows us to use a server-side script
as a sort of function call. For instance, a client-side GUI can call
the CGI script and parse the generated reply page. Similarly, a CGI
script that updates a database may be invoked programmatically with
urllib.request, outside the context of an input form page. This also
opens the door to automated regression testing of CGI scripts: we can
invoke scripts on any remote machine, and compare their reply text to
the expected output. [60] We’ll see urllib.request in action again in
later examples.
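
One way such a regression check might look is sketched below. The
fetch, check_reply, and fake_fetch names are hypothetical, and
fake_fetch is only a stand-in so that the example runs without a web
server; in a real test, the default fetch would contact the script's
URL.

```python
from urllib.request import urlopen

def fetch(url):
    """Run the remote script and return its reply as text."""
    with urlopen(url) as conn:
        return conn.read().decode()

def check_reply(url, expected, fetcher=fetch):
    """Return True if the reply matches, ignoring outer whitespace."""
    return fetcher(url).strip() == expected.strip()

def fake_fetch(url):
    """Stand-in for a live server (an assumption for this sketch)."""
    return 'Hello, Brian.\n' if 'user=Brian' in url else 'Who are you?\n'

base = 'http://localhost/cgi-bin/tutor3.py'
print(check_reply(base + '?user=Brian', 'Hello, Brian.', fake_fetch))
print(check_reply(base, 'Hello, Brian.', fake_fetch))
```

Passing the fetcher in as an argument keeps the comparison logic
testable on its own, apart from any network access.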

Before we move on, here are a few advanced urllib.request usage notes.
First, this module also supports proxies, alternative transmission
modes, the client side of secure HTTPS, cookies, redirections, and
more. For instance, proxies are supported transparently with
environment variables or system settings, or by using ProxyHandler
objects in this module (see its documentation for details and
examples).
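
For the explicit route, a minimal sketch of a ProxyHandler setup
follows. The proxy address is a made-up placeholder, not a real
server, and the final open call is left commented out because it would
need both the proxy and the target script to be running.

```python
from urllib.request import ProxyHandler, build_opener

# Hypothetical proxy address: substitute your own proxy's host and port
proxy = ProxyHandler({'http': 'http://proxy.example.com:8080'})
opener = build_opener(proxy)

# Requests made through this opener would be routed via the proxy:
# reply = opener.open('http://localhost/cgi-bin/tutor3.py?user=Brian').read()
print(proxy.proxies)
```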

Moreover, although it normally doesn’t make a difference to Python
scripts, it is possible to send parameters in both the get and the
post submission modes described earlier with urllib.request. The get
mode, with parameters in the query string at the end of a URL as shown
in the prior listing, is used by default. To invoke post, pass the
URL-encoded parameters as a separate argument, encoded to bytes:

>>> from urllib.request import urlopen
>>> from urllib.parse import urlencode
>>> params = urlencode({'user': 'Brian'})
>>> params
'user=Brian'
>>>
>>> data = params.encode('ascii')          # post data must be bytes, not str
>>> print(urlopen('http://localhost/cgi-bin/tutor3.py', data).read().decode())
<TITLE>tutor3.py</TITLE>
<H1>Greetings</H1>
<HR>
<P>Hello, Brian.</P>
<HR>
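
To see which mode a request will use without contacting a server, we
can build Request objects and ask them. This is only a sketch; the URL
is the local test address used in this section, and nothing is sent
over the network here.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = 'http://localhost/cgi-bin/tutor3.py'
params = urlencode({'user': 'Brian'})

get_request = Request(url + '?' + params)                  # parameters in URL
post_request = Request(url, data=params.encode('ascii'))   # parameters in body

print(get_request.get_method())     # GET
print(post_request.get_method())    # POST
```

Supplying a data argument is what switches the request from get to
post; the same rule applies when data is passed to urlopen directly.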




Finally, if your web application depends on client-side cookies
(discussed later), urllib.request supports these too: its
HTTPCookieProcessor handler uses Python’s standard library cookie
support to store cookies locally and later return them to the server.
The module also supports redirection, authentication, and more; the
client side of secure HTTP transmissions (HTTPS) is supported if your
computer has secure sockets support available (most do). See the
Python library manual for details. We’ll explore cookies later in this
chapter, and introduce secure HTTPS in the next.
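
A cookie-aware client might be set up roughly as follows. This is a
sketch using the standard library's http.cookiejar module; the open
call is commented out because it assumes a running server that sets
cookies, which this section's scripts do not.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, build_opener

jar = CookieJar()                                 # in-memory cookie store
opener = build_opener(HTTPCookieProcessor(jar))   # opener that uses the jar

# Cookies set by a reply would be saved in jar, and sent back on later
# requests made through the same opener:
# reply = opener.open('http://localhost/cgi-bin/tutor3.py').read()
print(len(jar))     # number of cookies stored so far
```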

Using Tables to Lay Out Forms

Now let’s move on to something a bit more realistic. In most CGI
applications, input pages are composed of multiple fields. When there
is more than one, input labels and fields are typically laid out in a
table, to give the form a well-structured appearance. The HTML file in
Example 15-9 defines a form with two input fields.

Example 15-9. PP4E\Internet\Web\tutor4.html


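<html>
<title>CGI 101</title>
<body>
<h1>A second user interaction: tables</h1>
<hr>
<form method=POST action="cgi-bin/tutor4.py">
    <table>
    <TR>
        <TH align=right>Enter your name:
        <TD><input type=text name=user>
    <TR>
        <TH align=right>Enter your age:
        <TD><input type=text name=age>
    <TR>
        <TD colspan=2 align=center>
        <input type=submit value="Send">
    </table>
</form>
</body></html>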






The <TH> tag defines a column like <TD>, but also tags it as a header
column, which generally means it is rendered in a bold font. By
placing the input fields and labels in a table like this, we get an
input page like that shown in Figure 15-10. Labels and inputs are
automatically lined up vertically in columns, much as they were by the
tkinter GUI geometry managers we met earlier in this book.

Figure 15-10. A form laid out with table tags

When this form’s Submit button (labeled “Send” by the page’s HTML) is
pressed, it causes the script in Example 15-10 to be executed on the
server machine, with the inputs typed by the user.

Example 15-10. PP4E\Internet\Web\cgi-bin\tutor4.py

#!/usr/bin/python
"""
runs on the server, reads form input, prints HTML;
URL http://server-name/cgi-bin/tutor4.py
"""
import cgi, sys
sys.stderr = sys.stdout # errors to browser
form = cgi.FieldStorage() # parse form data
print('Content-type: text/html\n') # plus blank line
# class dummy:                                   # uncomment to test offline
#     def __init__(self, s): self.value = s
# form = {'user': dummy('bob'), 'age': dummy('10')}
html = """
<TITLE>tutor4.py</TITLE>
<H1>Greetings</H1>
<HR>
<H4>%s</H4>
<H4>%s</H4>
<H4>%s</H4>
<HR>"""
if 'user' not in form:
    line1 = 'Who are you?'
else:
    line1 = 'Hello, %s.' % form['user'].value

line2 = "You're talking to a %s server." % sys.platform

line3 = ""
if 'age' in form:
    try:
        line3 = "Your age squared is %d!" % (int(form['age'].value) ** 2)
    except:                                      # any conversion error
        line3 = "Sorry, I can't compute %s ** 2." % form['age'].value

print(html % (line1, line2, line3))

The table layout comes from the HTML file, not from this Python CGI
script. In fact, this script doesn’t do much new: it uses string
formatting to plug input values into the response page’s HTML
triple-quoted template string as before, this time with one line per
input field. When this script is run by submitting the input form
page, its output produces the new reply page shown in Figure 15-11.

Figure 15-11. Reply page generated by tutor4.py

As usual, we can pass parameters to this CGI script at the end of a
URL, too. Figure 15-12 shows the page we get when passing a user and
age explicitly in this URL:

http://localhost/cgi-bin/tutor4.py?user=Joe+Blow&age=30

Figure 15-12. Reply page from tutor4.py for parameters in URL

Notice that we have two parameters after the ? this time; we separate
them with &. Also note that we’ve specified a blank space in the user
value with +. This is a common URL encoding convention. On the server
side, the + is automatically replaced with a space again. It’s also
part of the standard escape rule for URL strings, which we’ll revisit
later.
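
The encoding conventions just described can be tried out with the
standard urllib.parse module, without running a server. The sketch
below builds the same query string as the URL above and then decodes
it again, roughly as a server-side parser would.

```python
from urllib.parse import urlencode, parse_qs, quote_plus, unquote_plus

# Build a query string: fields are joined with &, spaces become +
query = urlencode({'user': 'Joe Blow', 'age': 30})
print(query)                        # user=Joe+Blow&age=30

# Decode it again: + turns back into a space, values arrive as strings
print(parse_qs(query))              # {'user': ['Joe Blow'], 'age': ['30']}

# The same rules applied to single values
print(quote_plus('a&b c'))          # a%26b+c
print(unquote_plus('Joe+Blow'))     # Joe Blow
```

Note that a literal & inside a value must itself be escaped (here as
%26), or it would be taken as a separator between parameters.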

Although Example 15-10 doesn’t introduce much that is new about CGI
itself, it does highlight a few new coding tricks worth noting,
especially regarding CGI script debugging and security. Let’s take a
quick look.

Converting strings in CGI scripts

Just for fun, the script echoes back the name of the server platform
by fetching sys.platform along with the square of the age input field.
Notice that the age input’s value must be converted to an integer with
the built-in int function; in the CGI world, all inputs arrive as
strings. We could also convert to an integer with the built-in eval
function. Conversion (and other) errors are trapped gracefully in a
try statement to yield an error line, instead of letting our script
die.

Warning

But you should never use eval to convert strings that were sent over
the Internet, like the age field in this example, unless you can be
absolutely sure that the string does not contain even potentially
malicious code. For instance, if this example were available on the
general Internet, it’s not impossible that someone could type a value
into the age field (or append an age parameter to the URL) with a
value that invokes a system shell command. Given the appropriate
context and process permissions, when passed to eval, such a string
might delete all the files in your server script directory, or worse!

Unless you run CGI scripts in processes with limited permissions and
machine access, strings read off the Web can be dangerous to run as
code in CGI scripting. You should never pass them to dynamic coding
tools like eval and exec, or to tools that run arbitrary shell
commands such as os.popen and os.system, unless you can be sure that
they are safe. Always use simpler tools for numeric conversion like
int and float, which recognize only numbers, not arbitrary Python
code.
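
To make the difference concrete, here is a small sketch of the safe
approach. The age_line function is a hypothetical helper written for
this example; it mirrors the age logic of Example 15-10 but catches
only the ValueError that int raises for non-numeric text.

```python
def age_line(text):
    """Square an age given as a string, without ever running it as code."""
    try:
        return "Your age squared is %d!" % (int(text) ** 2)
    except ValueError:
        # int() rejects anything that is not an integer literal
        return "Sorry, I can't compute %s ** 2." % text

print(age_line('30'))
print(age_line("__import__('os').getcwd()"))    # rejected, never executed
```

The second input is valid Python code that eval would happily run;
int simply refuses it, so the worst outcome is an error line in the
reply.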

Debugging CGI scripts

Errors happen, even in the brave new world of the Internet. Generally
speaking, debugging CGI scripts can be much more difficult than
debugging programs that run on your local machine. Not only do errors
occur on a remote machine, but scripts generally won’t run without the
context implied by the CGI model. The script in Example 15-10
demonstrates the following two common debugging tricks:

Error message trapping

This script assigns sys.stderr to sys.stdout so that Python error
messages wind up being displayed in the response page in the browser.
Normally, Python error messages are written to stderr, which generally
causes them to show up in the web server’s console window or logfile.
To route them to the browser, we must make stderr reference the same
file object as stdout (which is connected to the browser in CGI
scripts). If we don’t do this assignment, Python errors, including
program errors in our script, never show up in the browser.

Test case mock-up

The dummy class definition, commented out in this final version, was
used to debug the script before it was installed on the Net. Besides
not seeing stderr messages by default, CGI scripts also assume an
enclosing context that does not exist if they are tested outside the
CGI environment. For instance, if run from the system command line,
this script has no form input data. Uncomment this code to test from
the system command line. The dummy class masquerades as a parsed form
field object, and form is assigned a dictionary containing two form
field objects. The net effect is that form will be plug-and-play
compatible with the result of a cgi.FieldStorage call. As usual in
Python, object interfaces, not datatypes, are all we must adhere to.
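
The mock-up idea can be sketched as a standalone program. Here, the
Dummy class and the build_lines function are hypothetical names for
this example: build_lines repeats the input logic of Example 15-10 but
accepts any object that supports the in test and indexing, so a plain
dictionary of Dummy objects can stand in for a parsed form.

```python
import sys

class Dummy:
    """Mimic a parsed form field: just an object with a .value."""
    def __init__(self, s):
        self.value = s

def build_lines(form):
    """Compute the three reply lines from a form-like mapping."""
    if 'user' not in form:
        line1 = 'Who are you?'
    else:
        line1 = 'Hello, %s.' % form['user'].value
    line2 = "You're talking to a %s server." % sys.platform
    line3 = ""
    if 'age' in form:
        try:
            line3 = "Your age squared is %d!" % (int(form['age'].value) ** 2)
        except ValueError:
            line3 = "Sorry, I can't compute %s ** 2." % form['age'].value
    return line1, line2, line3

# Test cases run from the command line, with no web server involved
mock_form = {'user': Dummy('bob'), 'age': Dummy('10')}
print(build_lines(mock_form))
print(build_lines({}))
```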

Here are a few general tips for debugging your server-side CGI scripts:

Run the script from the command line

It probably won’t generate HTML as is, but running it standalone will
detect any syntax errors in your code. Recall that a Python command
line can run source code files regardless of their extension: for
example, python somescript.cgi works fine.

Assign sys.stderr to sys.stdout as early as possible in your script

This will generally make the text of Python error messages and stack
dumps appear in your client browser when accessing the script, instead
of the web server’s console window or logs. Short of wading through
server logs or manual exception handling, this may be the only way to
see the text of error messages after your script aborts.

Mock up inputs to simulate the enclosing CGI context

For instance, define classes that mimic the CGI inputs interface (as
done with the dummy class in this script) to view the script’s output
for various test cases by running it from the system command line.
[61] Setting environment variables to mimic form or URL inputs
sometimes helps, too (we’ll see how later in this chapter).

Call utilities to display CGI context in the browser

The cgi module includes utility functions that send a formatted dump
of CGI environment variables and input values to the browser, to view
in a reply page. For instance, cgi.print_form(form) prints all the
input parameters sent from the client, and cgi.test() prints
environment variables, the form, the directory, and more. Sometimes
this is enough to resolve connection or input problems. We’ll use some
of these in the webmail case study in the next chapter.

Show exceptions you catch, print tracebacks

If you catch an exception that Python raises, the Python error message
won’t be printed to stderr (that is normal behavior). In such cases,
it’s up to your script to display the exception’s name and value in
the response page; exception details are available in the built-in sys
module, from sys.exc_info(). In addition, Python’s traceback module
can be used to manually generate stack traces on your reply page for
errors; tracebacks show source-code lines active when an exception
occurred. We’ll use this later in the error page in PyMailCGI
(Chapter 16).

Add debugging prints

You can always insert tracing print statements in your code, just as
in normal Python programs. Be sure you print the content-type header
line first, though, or your prints may not show up on the reply page.
In the worst case, you can also generate debugging and trace messages
by opening and writing to a local text file on the server; provided
you access that file later, this avoids having to format the trace
messages according to HTML reply stream conventions.

Run it live

Of course, once your script is at least half working, your best bet is
likely to start running it live on the server, with real inputs coming
from a browser. Running a server locally on your machine, as we’re
doing in this chapter, can help by making changes go faster as you
test.
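
As one illustration of the exception-display tip above, the following
sketch catches an error and formats its details for a reply page,
using sys.exc_info and the traceback module. The error_page function
and its HTML layout are assumptions made up for this example, not the
code used later in PyMailCGI.

```python
import html
import sys
import traceback

def error_page():
    """Build an HTML fragment describing the exception being handled."""
    exc_type, exc_value = sys.exc_info()[:2]
    trace = traceback.format_exc()           # full stack trace as text
    return '<h1>Error</h1>\n<p>%s: %s</p>\n<pre>%s</pre>' % (
        exc_type.__name__,
        html.escape(str(exc_value)),         # keep error text from being
        html.escape(trace))                  # treated as markup

print('Content-type: text/html\n')           # header first, then blank line
try:
    int('spam')                              # force a conversion error
except Exception:
    page = error_page()
    print(page)
```

Escaping the message and traceback matters because error text often
echoes user input, which could otherwise inject tags into the page.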
