Name

directory-walk.py

Language

Python

Licence

GPL

What does it do?

It walks a directory tree and produces two files: the first lists every unique directory at or below the starting directory, and the second lists every file found in those directories, alongside the directory that contains it.
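As a rough illustration of the two lists described above, here is a minimal sketch built on the standard library's os.walk rather than the script's own recursive function. It assumes Python 3, and the function name catalogue is hypothetical; it is not part of directory-walk.py.

```python
import os


def catalogue(root):
    """Return (directories, files) found at or below root.

    directories is a list of directory paths and files is a list of
    (directory, filename) pairs. Paths are relative to root's parent,
    so the first directory listed is the name of root itself.
    """
    root = os.path.abspath(root)
    base = os.path.dirname(root)
    directories = []
    files = []
    # os.walk does not descend into symlinked directories by default.
    for dir_path, dir_names, file_names in os.walk(root):
        dir_names.sort()  # sorting in place makes the walk order predictable
        rel = os.path.relpath(dir_path, base)
        directories.append(rel)
        for name in sorted(file_names):
            files.append((rel, name))
    return directories, files
```

Calling catalogue("/absolute/path/of/directory/to/walk") would give one list suitable for the unique-directories file and one for the files file.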

Bugs

  1. As it stands, it has one fairly serious bug that I recognise: it cannot handle circular references created by links within the directory structure. Because of the recursion, it may just keep running and fill up memory until it bjorks (a valid technical term, I'm led to believe). The solution may be as simple as checking each entry and, if it's a link to another directory, saying Nup.... There is a check in there at the moment, but I'm not sure I have it completely covered.
  2. Another is that if it encounters a file with a comma in its name, it will not handle it in the output file. Don't really know if it's a thing to worry about, though. (Although RCS, a simple revision control program that I use for these scripts, puts a ,v at the end of the file name. Oh good, I've broken this already ...) May be solved simply by quoting the file name in the output.
  3. I've noticed I've missed out a check to see whether we can actually write the files to the output directory. This normally shouldn't be a problem, because you will be running it from outside the target directory tree, usually in a directory to which you have write access.
  4. The files it produces are comma-delimited *.csv files, but the strings are not quoted. Be aware of this when dealing with the output files. In fact, considering bug 2 above, go ahead and modify it anyway to quote the file names.
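The bugs above could be approached along the following lines. This is only a sketch, assuming Python 3, and the names walk_once, rows_to_csv and can_write_to are hypothetical helpers that do not exist in the script. The first remembers the resolved path of every directory it has visited, so a link that points back up the tree is skipped instead of being followed forever. The second asks the csv module to quote every field; with its default setting the module quotes only fields that contain special characters such as the delimiter. The third checks that the output directory is writable.

```python
import csv
import io
import os


def walk_once(root):
    """Yield each directory at or below root once, even if links form a loop."""
    seen = set()
    stack = [root]  # an explicit stack instead of recursion
    while stack:
        current = stack.pop()
        real = os.path.realpath(current)  # resolve any symlinks in the path
        if real in seen:
            continue  # already visited, possibly through a link
        seen.add(real)
        yield current
        for name in sorted(os.listdir(current), reverse=True):
            child = os.path.join(current, name)
            if os.path.isdir(child):
                stack.append(child)


def rows_to_csv(rows):
    """Return rows as CSV text with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, dialect="excel", quoting=csv.QUOTE_ALL)
    writer.writerows(rows)
    return buffer.getvalue()


def can_write_to(directory):
    """Return True if directory exists and the current user may write to it."""
    return os.path.isdir(directory) and os.access(directory, os.W_OK)
```

With every field quoted, a row such as ["walk", "file,v"] is written as "walk","file,v" and a comma in a file name no longer splits the row into extra columns.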

Comments

I wrote this program to catalogue a friend's media drives so I could put that information into a searchable database for them. Did its job!

This was the first time that I've used recursion in a program.

Version 1.0

#!/usr/bin/python
# -*- coding: utf-8 -*-

# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program. If not, see <http://www.gnu.org/licenses/>.

# This program should take a directory as the first command line argument and
# then "walk" that directory, producing two text files with the same base name
# as the initial directory, one that is a listing of the unique directories in
# that directory, and another that is a comma-delimited list of all files in
# that directory with their subdirectories, one per line. That is: the text file
# should have the form:
#
# "directory/sub/directories","filename"
#
# TODO: this has a recursive call that will make it vulnerable to infinite loops
# from soft/hard links linking to a directory in their own path.

# Copyright (c) 2012 Stewart Park <phoheo at gmail dot com>

import os, sys
import csv
from optparse import OptionParser

optParser = OptionParser()

# Usage
usage = """%prog [options] /absolute/path/of/directory/to/walk

This will produce two files named from the subdirectory of the input directory
(in our example above, the 'walk' directory):

1) The first file is called 'walk-uniqueDirs.csv' and will contain a list of
   unique directories within the directory 'walk', one per line. For example,
   the file contents may be something like:

   walk
   walk/subdir1
   walk/subdir2
   ...

2) The second file will be called 'walk-files.csv' and will contain a
   comma-delimited list of directories and the files they contain, one file
   per line, for example:

   walk,stuff.txt
   walk/subdir1,other.txt
   walk/subdir1,who.txt
   walk/subdir2,another.txt
   ...

Of course, this will have problems if the filename has a comma in it ...
"""
optParser.set_usage(usage)

# Options
optParser.add_option("-d", "--dot-files", action="store_true",
                     dest="includeDotFiles", default=False,
                     help="Include dot files and dot directories [default is no]")
optParser.add_option("-o", "--output-directory", action="store",
                     dest="fnBase", metavar="OUTPUT_DIRECTORY", default="",
                     help="Where to write the output files. Default is the current directory.")

(options, args) = optParser.parse_args()

if len(args) != 1:
    # Usage
    optParser.error("Directory missing or too many arguments")
    sys.exit()

uniqueDirs = []
outputList = []

# Grab the name of the directory to use as part of the output file names
dirToWalk = args[0]
dirToWalkLength = len(dirToWalk)
dirName = ""
dirBase = ""
dirBaseLength = 0

# Find the last occurrence of the forward slash
lastFSlashIndex = dirToWalk.rfind("/")
if lastFSlashIndex == len(dirToWalk) - 1:
    # User has put the slash as the last character - remove it
    dirToWalk = dirToWalk[:-1]
    dirToWalkLength = len(dirToWalk)
    lastFSlashIndex = dirToWalk.rfind("/")

if lastFSlashIndex > -1:
    # Found a slash
    dirName = dirToWalk[lastFSlashIndex + 1 :]
    dirBase = dirToWalk[: lastFSlashIndex + 1]
    dirBaseLength = len(dirBase)
else:
    dirName = dirToWalk

# The directory to which the output files will be written
fnBase = options.fnBase

# Make sure that the output file name base exists and ends with a
# slash, if it is set
if len(fnBase) > 0:
    # Test to see if it actually exists
    if os.path.exists(fnBase):
        # Does it end with a slash?
        if fnBase.rfind("/") != len(fnBase) - 1:
            # Add a slash
            fnBase += "/"
    else:
        # Directory doesn't exist
        print("Output directory " + fnBase + " doesn't exist - aborting...")
        sys.exit(0)

def writeData(pathName, fileName):
    "Do nothing yet"
    print pathName + "," + fileName + "\n"

def walkDir(dirPath):
    "Walks a particular directory, writeData if it's a file, recursive call if it's a directory"
    if os.path.exists(dirPath):
        print "Entering " + dirPath + "..."
        uniqueDirs.append([dirPath[dirBaseLength : ]])
        fileList = os.listdir(dirPath)
        for fn in fileList:
            if options.includeDotFiles or fn.find(".") != 0:
                # If includeDotFiles is false, check that the file name doesn't start with .
                fullFileName = dirPath + "/" + fn
                if os.path.isdir(fullFileName):
                    # This could cause an infinite loop problem if symlinked to
                    # another directory
                    if not os.path.islink(fullFileName):
                        walkDir(fullFileName)
                elif os.path.isfile(fullFileName):
                    outputList.append([dirPath[dirBaseLength : ], fn])

# Main program

# Grab whatever's on the command line
if os.path.isabs(dirToWalk):
    if os.path.exists(dirToWalk):
        walkDir(dirToWalk)
    else:
        print "Directory " + args[0] + " doesn't exist - aborting..."
        sys.exit()
else:
    print "Directory " + args[0] + " is not absolute - aborting..."
    sys.exit()

# Print out the files from the two lists we got
dirListFile = open(fnBase + dirName + "-uniqueDirs.csv", "w")
dirListWriter = csv.writer(dirListFile, dialect='excel')
for dn in uniqueDirs:
    dirListWriter.writerow(dn)
dirListFile.close()

# Files found
fileListFile = open(fnBase + dirName + "-files.csv", "w")
fileWriter = csv.writer(fileListFile, dialect='excel')
for fn in outputList:
    fileWriter.writerow(fn)
fileListFile.close()