Author: TJ O'Connor
On Microsoft operating systems, the Recycle Bin is a special folder that contains deleted files. When a user deletes files via Windows Explorer, the operating system places the files in this special folder, marking them for deletion but not actually removing them. On Windows 98 and earlier systems with a FAT file system, the Recycle Bin resides in the C:\Recycled\ directory. Operating systems that support NTFS, including Windows NT, 2000, and XP, store the Recycle Bin in the C:\Recycler\ directory. Windows Vista and 7 store the directory at C:\$Recycle.Bin.
To allow our script to remain independent of the operating system, let’s write a function to test each of the possible candidate directories and return the first one that exists on the system.
import os

def returnDir():
    dirs = ['C:\\Recycler\\', 'C:\\Recycled\\', 'C:\\$Recycle.Bin\\']
    for recycleDir in dirs:
        if os.path.isdir(recycleDir):
            return recycleDir
    return None
After discovering the Recycle Bin directory, we need to inspect its contents. Notice the two subdirectories in the listing: both contain the string S-1-5-21-1275210071-1715567821-725345543- and terminate with 1005 or 500. Each such string is a user SID, corresponding to a unique user account on the machine.
C:\RECYCLER>dir /a
 Volume in drive C has no label.
 Volume Serial Number is 882A-6E93

 Directory of C:\RECYCLER

04/12/2011  09:24 AM    <DIR>          .
04/12/2011  09:24 AM    <DIR>          ..
04/12/2011  09:56 AM    <DIR>          S-1-5-21-1275210071-1715567821-725345543-1005
04/12/2011  09:20 AM    <DIR>          S-1-5-21-1275210071-1715567821-725345543-500
               0 File(s)              0 bytes
               4 Dir(s)  30,700,670,976 bytes free
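The SID string itself encodes useful information. A SID has the form S-&lt;revision&gt;-&lt;authority&gt;-&lt;subauthorities&gt;, and its final component, the relative identifier (RID), identifies the account: the well-known RID 500 always belongs to the built-in Administrator account, while ordinary local users typically receive RIDs of 1000 and above. A minimal sketch (the function name is our own) that splits a SID string into its parts:

```python
def parse_sid(sid):
    """Split a Windows SID string into revision, authority, and RID."""
    parts = sid.split('-')
    # parts[0] is the literal 'S'; parts[1] the revision; parts[2] the
    # identifier authority; the remaining pieces are subauthorities,
    # the last of which is the RID.
    return {'revision': int(parts[1]),
            'authority': int(parts[2]),
            'subauthorities': [int(p) for p in parts[3:-1]],
            'rid': int(parts[-1])}

# RID 500 marks the built-in Administrator; 1005 an ordinary user.
admin = parse_sid('S-1-5-21-1275210071-1715567821-725345543-500')
user = parse_sid('S-1-5-21-1275210071-1715567821-725345543-1005')
```

This is why, even before consulting the registry, an examiner can tell that one of the two Recycle Bin subdirectories above belongs to the Administrator account.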
We can use the Windows registry to translate this SID into an exact username. Inspecting the registry key HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\&lt;SID&gt; translates the SID S-1-5-21-1275210071-1715567821-725345543-1005 directly to the username "alex".
C:\RECYCLER>reg query "HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\S-1-5-21-1275210071-1715567821-725345543-1005" /v ProfileImagePath

! REG.EXE VERSION 3.0

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\ProfileList\S-1-5-21-1275210071-1715567821-725345543-1005
    ProfileImagePath    REG_EXPAND_SZ    %SystemDrive%\Documents and Settings\alex
As we will want to know who deleted which files in the Recycle Bin, let's write a small function to translate each SID into a username. This will allow us to print more useful output when we recover deleted items from the Recycle Bin. The function opens the registry, queries the value of the ProfileImagePath key, and returns the name located after the last backslash in the user's profile path.
from _winreg import *

def sid2user(sid):
    try:
        key = OpenKey(HKEY_LOCAL_MACHINE,
            "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\ProfileList"
            + '\\' + sid)
        (value, type) = QueryValueEx(key, 'ProfileImagePath')
        user = value.split('\\')[-1]
        return user
    except:
        return sid
Finally, we will put all of our code together to create a script that will print the deleted files still in the Recycle Bin.
import os
from _winreg import *

def sid2user(sid):
    try:
        key = OpenKey(HKEY_LOCAL_MACHINE,
            "SOFTWARE\\Microsoft\\Windows NT\\CurrentVersion\\ProfileList"
            + '\\' + sid)
        (value, type) = QueryValueEx(key, 'ProfileImagePath')
        user = value.split('\\')[-1]
        return user
    except:
        return sid

def returnDir():
    dirs = ['C:\\Recycler\\', 'C:\\Recycled\\', 'C:\\$Recycle.Bin\\']
    for recycleDir in dirs:
        if os.path.isdir(recycleDir):
            return recycleDir
    return None

def findRecycled(recycleDir):
    dirList = os.listdir(recycleDir)
    for sid in dirList:
        files = os.listdir(recycleDir + sid)
        user = sid2user(sid)
        print '\n[*] Listing Files For User: ' + str(user)
        for file in files:
            print '[+] Found File: ' + str(file)

def main():
    recycledDir = returnDir()
    if recycledDir is None:
        print '[-] Recycle Bin directory not found.'
        exit(0)
    findRecycled(recycledDir)

if __name__ == '__main__':
    main()
Running our code inside a target, we see that the script discovers two users, alex and Administrator, and lists the files contained in each user's Recycle Bin. In the next section, we will examine a method for inspecting some of the content inside those files that may prove useful in an investigation.
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.
C:\>python dumpRecycleBin.py

[*] Listing Files For User: alex
[+] Found File: Notes_on_removing_MetaData.pdf
[+] Found File: ANONOPS_The_Press_Release.pdf

[*] Listing Files For User: Administrator
[+] Found File: 192.168.13.1-router-config.txt
[+] Found File: Room_Combinations.xls

C:\Documents and Settings\john\Desktop>
In this section, we will write scripts to extract metadata from files. Metadata, which is not clearly visible in a file's contents, can exist in documents, spreadsheets, images, audio, and video file types. The authoring application may store details such as the file's authors, creation and modification times, revisions, and comments. For example, a camera phone may imprint the GPS location of a photo, and Microsoft Word may record the author of a document. While checking every individual file appears an arduous task, we can automate it with Python.
From The Trenches
Anonymous’ Metadata Fail
On December 10, 2010, the hacker group Anonymous posted a press release outlining the motivations behind a recent attack named Operation Payback (Prefect, 2010). Angry with the companies that had dropped support for the Web site WikiLeaks, Anonymous called for retaliation by performing a distributed denial of service (DDoS) attack against some of the parties involved. The hacker posted the press release unsigned and without attribution. Distributed as a Portable Document Format (PDF) file, the press release contained metadata. In addition to the program used to create the document, the PDF metadata contained the name of the author, Mr. Alex Tapanaris. Within days, Greek police arrested Mr. Tapanaris (Leyden, 2010).
Let's use Python to quickly recreate the forensic investigation of a document that proved useful in the arrest of a member of the hacker group Anonymous. Wired.com still mirrors the document ANONOPS_The_Press_Release.pdf. We can start by downloading the document using the wget utility.
forensic:~# wget http://www.wired.com/images_blogs/threatlevel/2010/12/ANONOPS_The_Press_Release.pdf
--2012-01-19 11:43:36-- http://www.wired.com/images_blogs/threatlevel/2010/12/ANONOPS_The_Press_Release.pdf
Resolving www.wired.com... 64.145.92.35, 64.145.92.34
Connecting to www.wired.com|64.145.92.35|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 70214 (69K) [application/pdf]
Saving to: 'ANONOPS_The_Press_Release.pdf'
100%[==================================================================================>] 70,214 364K/s in 0.2s
2012-01-19 11:43:39 (364 KB/s) - 'ANONOPS_The_Press_Release.pdf' saved [70214/70214]
pyPdf, an excellent third-party utility for managing PDF documents, is available for download from http://pybrary.net/pyPdf/. It offers the ability to extract document information and to split, merge, crop, encrypt, and decrypt documents. To extract metadata, we use the method .getDocumentInfo(). This method returns a dictionary that maps each metadata element to its value; iterating through it prints out the entire metadata of the PDF document.
from pyPdf import PdfFileReader

def printMeta(fileName):
    pdfFile = PdfFileReader(file(fileName, 'rb'))
    docInfo = pdfFile.getDocumentInfo()
    print '[*] PDF MetaData For: ' + str(fileName)
    for metaItem in docInfo:
        print '[+] ' + metaItem + ':' + docInfo[metaItem]
Adding an option parser to identify a specific file, we have a tool that can identify the metadata embedded in a PDF document. Similarly, we can modify our script to test for specific metadata, such as a specific user. Certainly, it might be helpful for Greek law enforcement officials to search for files that also list Alex Tapanaris as the author.
import optparse
from pyPdf import PdfFileReader

def printMeta(fileName):
    pdfFile = PdfFileReader(file(fileName, 'rb'))
    docInfo = pdfFile.getDocumentInfo()
    print '[*] PDF MetaData For: ' + str(fileName)
    for metaItem in docInfo:
        print '[+] ' + metaItem + ':' + docInfo[metaItem]

def main():
    parser = optparse.OptionParser('usage: %prog -F <PDF file name>')
    parser.add_option('-F', dest='fileName', type='string',
        help='specify PDF file name')
    (options, args) = parser.parse_args()
    fileName = options.fileName
    if fileName == None:
        print parser.usage
        exit(0)
    else:
        printMeta(fileName)

if __name__ == '__main__':
    main()
Running our pdfReader script against the Anonymous Press Release, we see the same metadata that led Greek authorities to arrest Mr. Tapanaris.
forensic:~# python pdfRead.py -F ANONOPS_The_Press_Release.pdf
[*] PDF MetaData For: ANONOPS_The_Press_Release.pdf
[+] /Author:Alex Tapanaris
[+] /Producer:OpenOffice.org 3.2
[+] /Creator:Writer
[+] /CreationDate:D:20101210031827+02'00'
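The modification suggested above, searching a set of files for a particular author, can be sketched even without pyPdf: when a PDF's document information dictionary is stored uncompressed, the author appears in the raw bytes as an entry such as /Author (name). The following sketch (Python 3; the function names are our own) greps a directory of PDFs for a given author. Note that it is a rough heuristic and will miss documents whose metadata is compressed or encoded differently.

```python
import os
import re

def pdf_author(pdf_bytes):
    """Return the /Author string from raw PDF bytes, or None.

    Only handles an uncompressed, parenthesized /Author entry."""
    match = re.search(rb'/Author\s*\(([^)]*)\)', pdf_bytes)
    return match.group(1).decode('latin-1') if match else None

def find_by_author(directory, author):
    """List the .pdf files in directory whose /Author entry matches."""
    hits = []
    for name in os.listdir(directory):
        if not name.lower().endswith('.pdf'):
            continue
        with open(os.path.join(directory, name), 'rb') as handle:
            if pdf_author(handle.read()) == author:
                hits.append(name)
    return sorted(hits)
```

Pointed at a seized directory of documents, such a scan would quickly surface every file claiming Alex Tapanaris as its author.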
The exchangeable image file format (Exif) standard defines how image and audio files are stored. Devices such as digital cameras, smartphones, and scanners use this standard to save audio or image files. The Exif standard contains several tags useful for a forensic investigation. Phil Harvey wrote a tool aptly named exiftool (available from http://www.sno.phy.queensu.ca/~phil/exiftool/) that can parse these tags. Examining all the Exif tags in a photo could produce several pages of information, so let's examine a snipped version of the output. Notice that the Exif tags contain the camera model name iPhone 4S as well as the GPS latitude and longitude coordinates of the actual image. Such information can prove helpful in organizing images; for example, the Mac OS X application iPhoto uses the location information to arrange photos neatly on a world map. However, this information also has plenty of malicious uses. Imagine a soldier placing Exif-tagged photos on a blog or a Web site: the enemy could download entire sets of photos and know all of that soldier's movements in seconds. In the following section, we will build a script to connect to a Web site, download all the images on the site, and then check them for Exif metadata.
investigator$ exiftool photo.JPG
ExifTool Version Number : 8.76
File Name : photo.JPG
Directory : /home/investigator/photo.JPG
File Size : 1626 kB
File Modification Date/Time : 2012:02:01 08:25:37-07:00
File Permissions : rw-r--r--
File Type : JPEG
MIME Type : image/jpeg
Exif Byte Order : Big-endian (Motorola, MM)
Make : Apple
Camera Model Name : iPhone 4S
Orientation : Rotate 90 CW
<..SNIPPED..>
GPS Altitude : 10 m Above Sea Level
GPS Latitude : 89 deg 59' 59.97" N
GPS Longitude : 36 deg 26' 58.57" W
<..SNIPPED..>
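Mapping tools usually want decimal degrees rather than the degrees/minutes/seconds form exiftool prints above. The conversion is decimal = degrees + minutes/60 + seconds/3600, negated for a south or west reference; a small sketch:

```python
def dms_to_decimal(degrees, minutes, seconds, ref):
    """Convert degrees/minutes/seconds plus an N/S/E/W reference
    to signed decimal degrees."""
    decimal = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative.
    return -decimal if ref in ('S', 'W') else decimal

# The snipped exiftool output above: 36 deg 26' 58.57" W
longitude = dms_to_decimal(36, 26, 58.57, 'W')
```

Feeding the resulting decimal pair into any mapping service pinpoints where the photo was taken.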
Available from http://www.crummy.com/software/BeautifulSoup/, Beautiful Soup allows us to quickly parse HTML and XML documents. Leonard Richardson released the latest version of Beautiful Soup on May 29, 2012. To update to the latest version on Backtrack, use easy_install to fetch and install the beautifulsoup4 library.
investigator:∼# easy_install beautifulsoup4
Searching for beautifulsoup4
Reading http://pypi.python.org/simple/beautifulsoup4/
<..SNIPPED..>
Installed /usr/local/lib/python2.6/dist-packages/beautifulsoup4-4.1.0-py2.6.egg
Processing dependencies for beautifulsoup4
Finished processing dependencies for beautifulsoup4
In this section, we will use Beautiful Soup to scrape an HTML document for all the images it contains. Notice that we use the urllib2 library to open and read the contents of the document. Next, we create a Beautiful Soup object, a parse tree containing the different objects of the HTML document. From that object, we extract all the image tags using the method .findAll('img'). This method returns an array of all the image tags, which our function returns.
import urllib2
from bs4 import BeautifulSoup

def findImages(url):
    print '[+] Finding images on ' + url
    urlContent = urllib2.urlopen(url).read()
    soup = BeautifulSoup(urlContent)
    imgTags = soup.findAll('img')
    return imgTags
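Beautiful Soup is convenient, but the same img-tag extraction can be sketched with nothing beyond the standard library's HTML parser (shown here in Python 3; the listings in this chapter use Python 2):

```python
from html.parser import HTMLParser

class ImageTagParser(HTMLParser):
    """Collect the src attribute of every <img> tag encountered."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == 'img':
            for name, value in attrs:
                if name == 'src':
                    self.sources.append(value)

def find_image_sources(html):
    """Return the src attributes of all <img> tags in an HTML string."""
    parser = ImageTagParser()
    parser.feed(html)
    return parser.sources
```

This avoids a third-party dependency at the cost of Beautiful Soup's tolerance for badly broken markup.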
Next, we need to download each image from the site in order to examine them in a separate function. To download an image, we will use the functionality included in the urllib2, urlparse, and os libraries. First, we will extract the source address from the image tag. Next, we will read the binary contents of the image into a variable. Finally, we will open a file in write-binary mode and write the contents of the image to the file.
import urllib2
from urlparse import urlsplit
from os.path import basename

def downloadImage(imgTag):
    try:
        print '[+] Downloading image...'
        imgSrc = imgTag['src']
        imgContent = urllib2.urlopen(imgSrc).read()
        imgFileName = basename(urlsplit(imgSrc)[2])
        imgFile = open(imgFileName, 'wb')
        imgFile.write(imgContent)
        imgFile.close()
        return imgFileName
    except:
        return ''
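One caveat in downloadImage: it assumes each src attribute holds an absolute URL. Pages frequently use relative paths such as images/photo.jpg, which urlopen cannot fetch directly; those first need to be resolved against the page's own address with urljoin. A sketch of that fix (Python 3 module path shown; in Python 2 the same function lives in the urlparse module):

```python
from urllib.parse import urljoin

def resolve_image_url(page_url, img_src):
    """Resolve a possibly relative <img> src against the page URL."""
    return urljoin(page_url, img_src)

# Relative paths are resolved against the page address:
resolve_image_url('http://example.com/gallery/index.html', 'photo.jpg')
# -> 'http://example.com/gallery/photo.jpg'
# Absolute URLs pass through unchanged:
resolve_image_url('http://example.com/gallery/', 'http://cdn.example.com/a.jpg')
# -> 'http://cdn.example.com/a.jpg'
```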
To test the contents of an image file for Exif metadata, we will process the file using the Python Imaging Library (PIL). Available from http://www.pythonware.com/products/pil/, PIL adds image-processing capabilities to Python and allows us to quickly extract metadata, including geolocation information. To test a file, we open it as a PIL Image and call the method _getexif(). Next, we parse the Exif data into a dictionary indexed by metadata type. With the dictionary complete, we check whether it contains an entry for the Exif tag GPSInfo. If it does, we know the image contains GPS metadata and we print a message to the screen.
from PIL import Image
from PIL.ExifTags import TAGS

def testForExif(imgFileName):
    try:
        exifData = {}
        imgFile = Image.open(imgFileName)
        info = imgFile._getexif()
        if info:
            for (tag, value) in info.items():
                decoded = TAGS.get(tag, tag)
                exifData[decoded] = value
            exifGPS = exifData['GPSInfo']
            if exifGPS:
                print '[*] ' + imgFileName + \
                    ' contains GPS MetaData'
    except:
        pass
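The GPSInfo value itself is a dictionary keyed by numeric Exif GPS tags (1 = GPSLatitudeRef, 2 = GPSLatitude, 3 = GPSLongitudeRef, 4 = GPSLongitude), with each coordinate stored as three numerator/denominator rational pairs. A sketch (Python 3; the sample dictionary below is fabricated for illustration) that turns such a structure into signed decimal degrees:

```python
def rational(pair):
    """A PIL Exif rational is a (numerator, denominator) tuple."""
    num, den = pair
    return num / den

def gps_to_decimal(gps_info):
    """Convert an Exif GPSInfo dictionary to (latitude, longitude).

    Numeric keys follow the Exif GPS IFD: 1 and 3 hold the N/S and
    E/W reference letters, 2 and 4 the coordinate rationals."""
    lat_ref, lat = gps_info[1], gps_info[2]
    lon_ref, lon = gps_info[3], gps_info[4]

    def decimal(dms, ref):
        value = (rational(dms[0]) + rational(dms[1]) / 60
                 + rational(dms[2]) / 3600)
        return -value if ref in ('S', 'W') else value

    return decimal(lat, lat_ref), decimal(lon, lon_ref)

# Fabricated sample: 38 deg 53' 23.0" N, 77 deg 0' 27.0" W
sample = {1: 'N', 2: ((38, 1), (53, 1), (230, 10)),
          3: 'W', 4: ((77, 1), (0, 1), (270, 10))}
```

A natural extension of testForExif would call such a helper and print the decoded coordinates instead of just flagging that GPS metadata exists.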
Wrapping everything together, our script is now able to connect to a URL address, parse and download all the images files, and test each file for Exif metadata. Notice that in the main function, we first fetch a list of all the images on the site. Then, for each image in the array, we will download the file and test it for GPS metadata.
import urllib2
import optparse
from urlparse import urlsplit
from os.path import basename
from bs4 import BeautifulSoup
from PIL import Image
from PIL.ExifTags import TAGS

def findImages(url):
    print '[+] Finding images on ' + url
    urlContent = urllib2.urlopen(url).read()
    soup = BeautifulSoup(urlContent)
    imgTags = soup.findAll('img')
    return imgTags

def downloadImage(imgTag):
    try:
        print '[+] Downloading image...'
        imgSrc = imgTag['src']
        imgContent = urllib2.urlopen(imgSrc).read()
        imgFileName = basename(urlsplit(imgSrc)[2])
        imgFile = open(imgFileName, 'wb')
        imgFile.write(imgContent)
        imgFile.close()
        return imgFileName
    except:
        return ''

def testForExif(imgFileName):
    try:
        exifData = {}
        imgFile = Image.open(imgFileName)
        info = imgFile._getexif()
        if info:
            for (tag, value) in info.items():
                decoded = TAGS.get(tag, tag)
                exifData[decoded] = value
            exifGPS = exifData['GPSInfo']
            if exifGPS:
                print '[*] ' + imgFileName + \
                    ' contains GPS MetaData'
    except:
        pass

def main():
    parser = optparse.OptionParser('usage: %prog -u <target url>')
    parser.add_option('-u', dest='url', type='string',
        help='specify url address')
    (options, args) = parser.parse_args()
    url = options.url
    if url == None:
        print parser.usage
        exit(0)
    else:
        imgTags = findImages(url)
        for imgTag in imgTags:
            imgFileName = downloadImage(imgTag)
            testForExif(imgFileName)

if __name__ == '__main__':
    main()
Testing the newly created script against a target address, we see that one of the images on the target contains GPS metadata. While this technique can be used offensively for reconnaissance against individuals, we can also use the script in a completely benign way: to identify our own vulnerabilities before attackers do.
forensics:~# python exifFetch.py -u http://www.flickr.com/photos/dvids/4999001925/sizes/o
[+] Finding images on http://www.flickr.com/photos/dvids/4999001925/sizes/o
[+] Downloading image...
[+] Downloading image...
[+] Downloading image...
[+] Downloading image...
[+] Downloading image...
[*] 4999001925_ab6da92710_o.jpg contains GPS MetaData
[+] Downloading image...
[+] Downloading image...
[+] Downloading image...
[+] Downloading image...