The getfile script lets us
view server files on the client, but in some sense, it is
a general-purpose file download tool. Although not as direct as fetching
a file by FTP or over raw sockets, it serves similar purposes. Users of
the script can either cut-and-paste the displayed code right off the web
page or use their browser’s View Source option to view and cut. As
described earlier, scripts that contact the script with urllib
can also extract the file’s text with Python’s HTML parser module.
But what about going the other way—uploading a file from the
client machine to the server? For instance, suppose you are writing a
web-based email system, and you need a way to allow users to upload mail
attachments. This is not an entirely hypothetical scenario; we will
actually implement this idea in the next chapter, when we develop the
PyMailCGI webmail site.
As we saw in Chapter 13, uploads are
easy enough to accomplish with a client-side script that uses Python’s
FTP support module. Yet such a solution doesn’t really apply in the
context of a web browser; we can’t usually ask all of our program’s
clients to start up a Python FTP script in another window to accomplish
an upload. Moreover, there is no simple way for the server-side script
to request the upload explicitly, unless an FTP server happens to be
running on the client machine (not at all the usual case). Users can
email files separately, but this can be inconvenient, especially for
email attachments.
So is there no way to write a web-based program that lets its
users upload files to a common server? In fact, there is, though it has
more to do with HTML than with Python itself. HTML <input>
tags also support a type=file option, which produces an input
field, along with a button that pops up a file-selection dialog. The
name of the client-side file to be uploaded can either be typed into the
control or selected with the pop-up dialog. To demonstrate, the HTML
file in Example 15-29 defines a
page that allows any client-side file to be selected and uploaded to the
server-side script named in the form’s action option.
Example 15-29. PP4E\Internet\Web\putfile.html
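<html><title>Putfile: upload page</title>
<body>
<form enctype="multipart/form-data"
      method=post
      action="cgi-bin/putfile.py">
<p><input type=file name=clientfile>
<p><input type=submit value=Upload>
</form>
</body></html>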
One constraint worth noting: forms that use file type inputs should
also specify a multipart/form-data encoding type and the post
submission method, as shown in this file; get-style URLs don’t
work for uploading files (adding their contents to the end of the URL
doesn’t make sense). When we visit this HTML file, the page shown in
Figure 15-33 is delivered. Pressing its
Browse button opens a standard file-selection dialog, while Upload sends
the file.
Figure 15-33. File upload selection page
On the client side, when we press this page’s Upload button, the
browser opens and reads the selected file and packages its contents with
the rest of the form’s input fields (if any). When this information
reaches the server, the Python script named in the form action
tag is run as always, as listed in Example 15-30.
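To make the browser's packaging step more concrete, the following is a minimal sketch of how a multipart/form-data request body might be assembled by hand. The function name encode_multipart, the extra note field, and the file name spam.txt are illustrative assumptions, not part of the book's examples; only the clientfile control name comes from the upload form.

```python
import uuid

def encode_multipart(fields, files):
    """Build a multipart/form-data body (illustrative sketch only).

    fields: dict of name -> str value (ordinary inputs)
    files:  dict of name -> (filename, bytes content) (type=file inputs)
    Returns (content_type_header_value, body_bytes).
    """
    boundary = uuid.uuid4().hex                  # separator unlikely to occur in data
    parts = []
    for name, value in fields.items():
        parts.append(
            ('--%s\r\n'
             'Content-Disposition: form-data; name="%s"\r\n'
             '\r\n'
             '%s\r\n' % (boundary, name, value)).encode('utf-8'))
    for name, (filename, content) in files.items():
        header = ('--%s\r\n'
                  'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
                  'Content-Type: application/octet-stream\r\n'
                  '\r\n' % (boundary, name, filename)).encode('utf-8')
        parts.append(header + content + b'\r\n')
    parts.append(('--%s--\r\n' % boundary).encode('utf-8'))   # closing marker
    return 'multipart/form-data; boundary=%s' % boundary, b''.join(parts)

# hypothetical form data: one ordinary field plus the file control
ctype, body = encode_multipart({'note': 'hello'},
                               {'clientfile': ('spam.txt', b'line1\nline2\n')})
print(ctype)
print(body.decode('utf-8'))
```

A client script could, in principle, send such a body with urllib.request.Request(url, data=body, headers={'Content-Type': ctype}), though in the browser scenario described here all of this happens automatically when Upload is pressed.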
Example 15-30. PP4E\Internet\Web\cgi-bin\putfile.py
#!/usr/bin/python
"""
##################################################################################
extract file uploaded by HTTP from web browser; users visit putfile.html to
get the upload form page, which then triggers this script on server; this is
very powerful, and very dangerous: you will usually want to check the filename,
etc; this may only work if file or dir is writable: a Unix 'chmod 777 uploads'
may suffice; file pathnames may arrive in client's path format: handle here;

caveat: could open output file in text mode to write receiving platform's line
ends since file content always str from the cgi module, but this is a temporary
solution anyhow--the cgi module doesn't handle binary file uploads in 3.1 at all;
##################################################################################
"""

import cgi, os, sys
import posixpath, ntpath, macpath      # for client paths
debugmode    = False                   # True=print form info
loadtextauto = False                   # True=read file at once
uploaddir    = './uploads'             # dir to store files

sys.stderr = sys.stdout                # show error msgs
form = cgi.FieldStorage()              # parse form data
print("Content-type: text/html\n")     # with blank line
if debugmode: cgi.print_form(form)     # print form fields

# html templates

html = """
<html><title>Putfile response page</title>
<body>
<h1>Putfile response page</h1>
%s
</body></html>"""

goodhtml = html % """
<p>Your file, '%s', has been saved on the server as '%s'.
<p>An echo of the file's contents received and saved appears below.
</p><hr>
<p><pre>%s</pre>
</p><hr>
"""

# process form data

def splitpath(origpath):                              # get file at end
    for pathmodule in [posixpath, ntpath, macpath]:   # try all clients
        basename = pathmodule.split(origpath)[1]      # may be any server
        if basename != origpath:
            return basename                           # lets spaces pass
    return origpath                                   # failed or no dirs

def saveonserver(fileinfo):                           # use file input form data
    basename = splitpath(fileinfo.filename)           # name without dir path
    srvrname = os.path.join(uploaddir, basename)      # store in a dir if set
    srvrfile = open(srvrname, 'wb')                   # always write bytes here
    if loadtextauto:
        filetext = fileinfo.value                     # reads text into string
        if isinstance(filetext, str):                 # Python 3.1 hack
            filedata = filetext.encode()
        else:                                         # already bytes: write as is
            filedata = filetext
            filetext = filedata.decode()              # echo needs a str
        srvrfile.write(filedata)                      # save in server file
    else:                                             # else read line by line
        numlines, filetext = 0, ''                    # e.g., for huge files
        while True:                                   # content always str here
            line = fileinfo.file.readline()           # or for loop and iterator
            if not line: break
            if isinstance(line, str):                 # Python 3.1 hack
                line = line.encode()
            srvrfile.write(line)
            filetext += line.decode()                 # ditto
            numlines += 1
        filetext = ('[Lines=%d]\n' % numlines) + filetext
    srvrfile.close()
    os.chmod(srvrname, 0o666)          # make writable: owned by 'nobody'
    return filetext, srvrname

def main():
    if 'clientfile' not in form:
        print(html % 'Error: no file was received')
    elif not form['clientfile'].filename:
        print(html % 'Error: filename is missing')
    else:
        fileinfo = form['clientfile']
        try:
            filetext, srvrname = saveonserver(fileinfo)
        except:
            errmsg = '<h2>Error</h2><p>%s<p>%s' % tuple(sys.exc_info()[:2])
            print(html % errmsg)
        else:
            print(goodhtml % (cgi.escape(fileinfo.filename),
                              cgi.escape(srvrname),
                              cgi.escape(filetext)))

main()
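The splitpath function in this listing handles filenames that arrive in the client platform's path format by trying each platform's path module in turn. As a standalone sketch of the same idea, the version below uses only posixpath and ntpath, because macpath was removed from the standard library in Python 3.8; the function name client_basename and the sample paths are made up for illustration.

```python
import ntpath
import posixpath

def client_basename(origpath):
    """Return the final component of a path sent in any client's format."""
    for pathmodule in (posixpath, ntpath):       # try each client convention
        basename = pathmodule.split(origpath)[1]
        if basename != origpath:                 # a directory part was stripped
            return basename
    return origpath                              # no directory part found

# hypothetical client-side paths
print(client_basename(r'C:\Users\bob\notes.txt'))
print(client_basename('/home/bob/notes.txt'))
print(client_basename('notes.txt'))
```

Because each module's split function recognizes only its own separators, a Windows path passes through posixpath unchanged and is then split by ntpath on the next pass.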
Within this script, the Python-specific interfaces for handling
uploaded files are employed. They aren’t very new, really; the file
comes into the script as an entry in the parsed form object returned by
cgi.FieldStorage, as usual; its key is clientfile, the input control’s
name in the HTML page’s code.
This time, though, the entry has additional attributes for the
file’s name on the client. Moreover, accessing the value
attribute of an uploaded file input
object will automatically read the file’s contents all at once into a
string on the server. For very large files, we can instead read line by
line (or in chunks of bytes) to avoid overflowing memory space.
Internally, Python’s cgi module
stores uploaded files in temporary files automatically; reading them in
our script simply reads from that temporary file. If they are very
large, though, they may be too long to store as a single string in
memory all at once.
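The script in Example 15-30 reads line by line; the alternative mentioned above, reading in chunks of bytes, might look like the following sketch. The function name copy_in_chunks and the 64 KB default block size are arbitrary choices, and in-memory io.BytesIO objects stand in for the uploaded file and the server-side output file.

```python
import io

def copy_in_chunks(source, target, chunksize=64 * 1024):
    """Copy a binary file-like object to another, one block at a time."""
    total = 0
    while True:
        chunk = source.read(chunksize)           # at most chunksize bytes
        if not chunk:                            # empty result means end of file
            break
        target.write(chunk)
        total += len(chunk)
    return total                                 # number of bytes copied

upload = io.BytesIO(b'x' * 200000)               # stand-in for the uploaded file
saved = io.BytesIO()                             # stand-in for the server file
print(copy_in_chunks(upload, saved))
```

Only one block is held in memory at a time, regardless of the upload's size. The standard library's shutil.copyfileobj performs a similar loop and may be preferable when no per-chunk processing is needed.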
For illustration purposes, the script implements either scheme:
based on the setting of the loadtextauto
global variable, it either asks
for the file contents as a string or reads it line by line. In general,
the CGI module gives us back objects with the following attributes for
file upload controls:
filename
    The name of the file as specified on the client
file
    A file object from which the uploaded file’s contents can be read
value
    The contents of the uploaded file (read from the file on
    attribute access)
Additional attributes are not used by our script. File uploads are
a third kind of input field object; as we’ve also seen, the value
attribute is a string for simple input fields, and we may receive
a list of objects for multiple-selection controls.
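As a rough illustration of telling these three kinds of entries apart, the sketch below classifies an entry by the attributes just listed. It deliberately avoids the cgi module itself: the SimpleNamespace objects are hypothetical stand-ins that mimic only the filename, file, and value attributes, and describe_field is an invented helper name.

```python
import io
from types import SimpleNamespace

def describe_field(entry):
    """Classify a parsed form entry by the attributes described above."""
    if isinstance(entry, list):                  # multiple-selection control
        return 'list of %d values' % len(entry)
    if getattr(entry, 'filename', None):         # file upload control
        return 'upload named %r' % entry.filename
    return 'simple value %r' % entry.value       # ordinary input field

# hypothetical stand-ins, not real cgi.FieldStorage entries
simple = SimpleNamespace(filename=None, value='Bob')
upload = SimpleNamespace(filename='spam.txt', file=io.StringIO('text'), value='text')
multi = [SimpleNamespace(filename=None, value='a'),
         SimpleNamespace(filename=None, value='b')]

for entry in (simple, upload, multi):
    print(describe_field(entry))
```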
For uploads to be saved on the server, CGI scripts (run by the
user “nobody” on some servers) must have write access to the enclosing
directory if the file doesn’t yet exist, or to the file itself if it
does. To help isolate uploads, the script stores all uploads in whatever
server directory is named in the uploaddir
global. On one Linux server, I had
to give this directory a mode of 777 (universal read/write/execute
permissions) with chmod to make
uploads work in general. This is a nonissue with the local web server
used in this chapter, but your mileage may vary; be sure to check
permissions if this script fails.
The script also calls os.chmod
to set the permission on the server file such that it can be read and
written by everyone. If it is created anew by an upload, the file’s
owner will be “nobody” on some servers, which means anyone out in
cyberspace can view and upload the file. On one Linux server, though,
the file will also be writable only by the user “nobody” by default,
which might be inconvenient when it comes time to change that file
outside the Web (naturally, the degree of pain can vary per file
operation).
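The permission handling described in the last two paragraphs can be tried out safely in a scratch directory. The sketch below is one way to do so, under the assumption of a Unix-like system; the helper names prepare_upload_dir and save_world_writable are invented for this illustration, and a temporary directory stands in for the server's real uploads directory.

```python
import os
import stat
import tempfile

def prepare_upload_dir(path):
    """Create the upload directory if needed; report whether we can add files."""
    os.makedirs(path, exist_ok=True)
    return os.access(path, os.W_OK | os.X_OK)    # write + search permission

def save_world_writable(path, data):
    """Write bytes to path, then open its permissions to every user."""
    with open(path, 'wb') as out:
        out.write(data)
    os.chmod(path, 0o666)                        # read/write for owner, group, others
    return stat.S_IMODE(os.stat(path).st_mode)   # permission bits now in effect

with tempfile.TemporaryDirectory() as tmp:       # throwaway directory for the demo
    uploads = os.path.join(tmp, 'uploads')
    writable = prepare_upload_dir(uploads)
    mode = save_world_writable(os.path.join(uploads, 'spam.txt'), b'data')
    print(writable, oct(mode))
```

Checking os.access before saving gives a clearer error than a failed open, although the open call can still fail for other reasons and should remain inside a try statement, as it is in the script.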
Isolating client-side file uploads by placing them in a single
directory on the server helps minimize security risks: existing files
can’t be overwritten arbitrarily. But it may require you to copy files
on the server after they are uploaded, and it still doesn’t prevent
all security risks: mischievous clients can still upload
huge files, which we would need to trap with additional logic not
present in this script as is. Such traps may be needed only in scripts
open to the Internet at large.
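One possible form for such a trap is sketched below: the save loop counts bytes as it copies and gives up once a cap is exceeded, removing the partial file. This is not part of putfile.py; the exception class UploadTooLarge, the function save_with_limit, and the size limits used in the demo are all assumptions made for illustration.

```python
import io
import os
import tempfile

class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured size cap."""

def save_with_limit(source, path, maxbytes, chunksize=8192):
    """Copy source to path, aborting and removing the file past maxbytes."""
    total = 0
    try:
        with open(path, 'wb') as out:
            while True:
                chunk = source.read(chunksize)
                if not chunk:                    # end of upload
                    break
                total += len(chunk)
                if total > maxbytes:             # stop before writing the excess
                    raise UploadTooLarge('more than %d bytes' % maxbytes)
                out.write(chunk)
    except UploadTooLarge:
        os.remove(path)                          # don't leave a partial file behind
        raise
    return total

with tempfile.TemporaryDirectory() as tmp:       # throwaway directory for the demo
    target = os.path.join(tmp, 'big.bin')
    try:
        save_with_limit(io.BytesIO(b'x' * 50000), target, maxbytes=10000)
    except UploadTooLarge as exc:
        print('rejected:', exc)
    print('file kept?', os.path.exists(target))
```

Note that a cap applied at this stage limits only what the script saves; the server may already have received the full request by the time the script runs, so limits in the web server's own configuration are a useful complement.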
If both client and server do their parts, the CGI script presents
us with the response page shown in Figure 15-34, after it has stored the contents of
the client file in a new or existing file on the server. For
verification, the response gives the client and server file paths, as
well as an echo of the uploaded file, with a line count in line-by-line
reader mode.
Notice that this echo display assumes that the file’s content is
text. It turns out that this is a safe assumption to make, because the
cgi module always returns file content as str strings, not bytes.
Less happily, this also stems from the
fact that binary file uploads are not supported in the cgi
module in 3.1 (more on this limitation in
an upcoming note).
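If uploaded content could arrive as either str or bytes, the echo step would need to normalize it before display. The sketch below shows one way that might be done; echo_text is an invented helper, the UTF-8 default is an assumption, and html.escape is used here because it is the standard library's escaping function in later Python releases (the cgi.escape call in the listing was removed in Python 3.8).

```python
import html

def echo_text(content, encoding='utf-8'):
    """Return content as HTML-safe text, whether it arrived as str or bytes."""
    if isinstance(content, bytes):
        # undecodable bytes become replacement characters instead of raising
        content = content.decode(encoding, errors='replace')
    return html.escape(content)                  # neutralize <, >, & and quotes

print(echo_text('if a < b: print("ok")'))
print(echo_text(b'\xff\xfe binary-ish data'))
```

Decoding with errors='replace' keeps the reply page from failing on binary data, at the cost of showing placeholder characters; the saved file itself is unaffected, because only the echoed copy is decoded.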
Figure 15-34. Putfile response page