Name
directory-walk.py
Language
Python
Licence
GPL
What does it do?
It walks a directory structure and produces two files: the first
lists all the unique directories under that directory and its
children; the second lists the filenames in that directory and its
children.
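As a rough sketch of the same idea (not the script's own code, which
uses a hand-written recursive function), the two lists could be gathered
with the standard library's os.walk. The function name catalogue and the
returned tuple layout are assumptions made for illustration only. Note
that os.walk reports every non-directory entry as a file, so the results
may differ slightly from the script's for special files or broken links.

```python
import os

def catalogue(root):
    """Return (unique_dirs, file_rows) for everything under root.

    Paths are reported relative to the parent of root, so walking
    /some/where/walk yields names such as 'walk' and 'walk/subdir1'.
    """
    root = os.path.abspath(root)
    parent = os.path.dirname(root)
    unique_dirs, file_rows = [], []
    # os.walk does not descend into symlinked directories by default.
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk entering dot directories.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        rel = os.path.relpath(dirpath, parent)
        unique_dirs.append(rel)
        for name in sorted(filenames):
            if not name.startswith("."):
                file_rows.append((rel, name))
    return unique_dirs, file_rows
```

Calling catalogue("/absolute/path/of/directory/to/walk") would return the
directory list and the (directory, file) pairs that the two output files
are meant to hold.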
Bugs
- As it stands, it has one fairly serious bug that I recognise: it
cannot handle circular references created by links within the
directory structure. Because of the recursion, it may just keep
running and fill up memory until it bjorks (a valid technical term,
I'm led to believe). The solution may be as simple as checking each
file and, if it's a link to another directory, saying "Nup...". There
is a check in there at the moment, but I'm not sure I have it
completely covered.
- Another is that if it encounters a file with a comma in its name, it
will not handle it correctly in the output file. I don't really know
if it's a thing to worry about, though. (Although rcs, a simple
revision control program that I use for these scripts, puts a ,v at
the end of the file name. Oh good, I've broken this already ...) It
may be solved simply by quoting the file name in the output.
- I've noticed I've missed a check to see whether we can actually
write the files to the current directory. This normally shouldn't be
a problem, because you will usually be running it from outside the
target directory tree, in a directory to which you have write access.
- The files it produces are comma-delimited *.csv files,
but the strings are not quoted. Be aware of this when dealing
with the output files. In fact, considering the bug above, go
ahead and modify it anyway to quote the file names.
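On the quoting points above, a small sketch may help. The script writes
its rows through csv.writer with the 'excel' dialect, and that dialect's
default quoting mode (csv.QUOTE_MINIMAL) quotes any field that contains a
comma, so it is worth checking the real output before changing anything.
Passing csv.QUOTE_ALL quotes every field instead. The helper name
rows_to_csv and the sample row are made up for illustration, and the
sketch assumes Python 3.

```python
import csv
import io

def rows_to_csv(rows, quoting=csv.QUOTE_MINIMAL):
    """Render rows as CSV text using the 'excel' dialect."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, dialect="excel", quoting=quoting)
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical row with an rcs-style file name that contains a comma.
rows = [["walk/subdir1", "notes.txt,v"]]

minimal = rows_to_csv(rows)                # quotes only where needed
quoted = rows_to_csv(rows, csv.QUOTE_ALL)  # quotes every field

print(minimal, end="")
print(quoted, end="")
```

Either way the comma stays inside a quoted field, so a CSV reader should
recover the original file name.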
Comments
I wrote this program to catalogue a friend's media drives so I could
put that information into a searchable database for them. Did its job!
This was the first time that I've used recursion in a program.
Download it here
Version 1.0
TODO
import os, sys
import csv
from optparse import OptionParser
optParser = OptionParser()
usage = """%prog [options] /absolute/path/of/directory/to/walk
This will produce two files named from the subdirectory of the input
directory (in our example above, the 'walk' directory):
1) The first file is called 'walk-uniqueDirs.csv' and will contain a
list of unique directories within the directory 'walk', one per line.
For example, the file contents may be something like:
walk
walk/subdir1
walk/subdir2
...
2) The second file will be called 'walk-files.csv' and will contain a
comma-delimited list of directories and the files they contain, one file per
line, for example:
walk,stuff.txt
walk/subdir1,other.txt
walk/subdir1,who.txt
walk/subdir2,another.txt
...
Of course, this will have problems if the filename has a comma in it ...
"""
optParser.set_usage(usage)
optParser.add_option("-d", "--dot-files", action="store_true",
                     dest="includeDotFiles", default=False,
                     help="Include dot files and dot directories [default is no]")
optParser.add_option("-o", "--output-directory", action="store",
                     dest="fnBase", metavar="OUTPUT_DIRECTORY", default="",
                     help="Where to write the output files. Default is the current directory.")
(options, args) = optParser.parse_args()
if len(args) != 1:
    # error() prints the usage message and exits, so no sys.exit() is needed
    optParser.error("Directory missing or too many arguments")
uniqueDirs = []
outputList = []
dirToWalk = args[0]
dirToWalkLength = len(dirToWalk)
dirName = ""
dirBase = ""
dirBaseLength = 0
# Strip a single trailing slash so the last path component can be found
lastFSlashIndex = dirToWalk.rfind("/")
if lastFSlashIndex == len(dirToWalk) - 1:
    dirToWalk = dirToWalk[:-1]
    dirToWalkLength = len(dirToWalk)
    lastFSlashIndex = dirToWalk.rfind("/")
# Split into the parent path (dirBase) and the directory's own name (dirName)
if lastFSlashIndex > -1:
    dirName = dirToWalk[lastFSlashIndex + 1 :]
    dirBase = dirToWalk[: lastFSlashIndex + 1]
    dirBaseLength = len(dirBase)
else:
    dirName = dirToWalk
fnBase = options.fnBase
if len(fnBase) > 0:
    if os.path.exists(fnBase):
        # Make sure the output directory ends with a slash
        if fnBase.rfind("/") != len(fnBase) - 1:
            fnBase += "/"
    else:
        print("Output directory " + fnBase + " doesn't exist - aborting...")
        sys.exit(1)
def writeData(pathName, fileName):
    print(pathName + "," + fileName + "\n")
def walkDir(dirPath):
    if os.path.exists(dirPath):
        print("Entering " + dirPath + "...")
        # Record the directory relative to the parent of the walked directory
        uniqueDirs.append([dirPath[dirBaseLength:]])
        fileList = os.listdir(dirPath)
        for fn in fileList:
            if options.includeDotFiles or not fn.startswith("."):
                fullFileName = dirPath + "/" + fn
                if os.path.isdir(fullFileName):
                    # Skip symlinked directories to avoid circular references
                    if not os.path.islink(fullFileName):
                        walkDir(fullFileName)
                elif os.path.isfile(fullFileName):
                    outputList.append([dirPath[dirBaseLength:], fn])
if os.path.isabs(dirToWalk):
    if os.path.exists(dirToWalk):
        walkDir(dirToWalk)
    else:
        print("Directory " + args[0] + " doesn't exist - aborting...")
        sys.exit(1)
else:
    print("Directory " + args[0] + " is not absolute - aborting...")
    sys.exit(1)
# Write the list of unique directories, one per row
dirListFile = open(fnBase + dirName + "-uniqueDirs.csv", "w")
dirListWriter = csv.writer(dirListFile, dialect='excel')
for dn in uniqueDirs:
    dirListWriter.writerow(dn)
dirListFile.close()
# Write the (directory, file name) pairs, one per row
fileListFile = open(fnBase + dirName + "-files.csv", "w")
fileWriter = csv.writer(fileListFile, dialect='excel')
for fn in outputList:
    fileWriter.writerow(fn)
fileListFile.close()
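
The circular-reference worry listed under Bugs could also be approached
differently from the script's own check, which simply refuses to enter
symlinked directories. The sketch below is an alternative, not the
script's method: it follows directory links but remembers the real path
of every directory it has visited, so a link that points back up the
tree is skipped the second time round. The function name
walk_without_cycles is hypothetical, and an explicit stack is used in
place of recursion.

```python
import os

def walk_without_cycles(root):
    """Yield (directory, filenames) pairs, following directory symlinks
    but visiting each real directory at most once."""
    seen = set()
    stack = [root]
    while stack:
        current = stack.pop()
        real = os.path.realpath(current)
        if real in seen:
            continue  # already visited via another path: break the cycle
        seen.add(real)
        files = []
        for name in sorted(os.listdir(current)):
            full = os.path.join(current, name)
            if os.path.isdir(full):
                stack.append(full)
            elif os.path.isfile(full):
                files.append(name)
        yield current, files
```

Because each real directory is entered only once, the loop ends even
when a link inside the tree points back at one of its own ancestors.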