Os Path Walk



Join joins any number of path elements into a single path, separating them with an OS specific Separator. Empty elements are ignored. The result is Cleaned. However, if the argument list is empty or all its elements are empty, Join returns an empty string. On Windows, the result will only be a UNC path if the first non-empty element is a UNC path. The basic idea here is that folder will be an absolute normalized path if the argument to os.walk is one and os.path.join will produce an absolute normalized path if any of the arguments is an absolute path and all the following arguments are normalized.

Have a mess of files to read into Python? Maybe you downloaded Kaiko trade data, with unpredictable sub-directories and file names, from Penn+Box. Or maybe you’ve dropped TXT, PDF, and PY files into a single working directory that you’d rather not reorganize. A simple script will find the files you need, listing their names and paths for easy processing.

Walk

Because this process involves exploring our operating system’s file structure, we begin by importing the os module into our Python environment:

This module, on top of a standard Python installation, should address any dependencies in our upcoming file-listing code.

Walk

Let’s define our file-listing function. We can name it, unimaginatively, list_files and give it two arguments, filepath and filetype:

filepath will tell the function where to start looking for files. This argument will take a file path string in your operating system’s format. (Be sure to encode or escape characters as appropriate.) When the function runs, it will assume this base directory contains all of the files and/or subfolders we need it to check.

filetype will tell the function what kind of file to find. This argument will take a file extension in string format (e.g.: '.csv' or '.TXT').

Within our function, we’ll need to store any relevant file paths our script finds. Let’s create an empty list for this purpose:

Python Walk Directory Tree

Practically speaking, our function will find each file within filepath, check whether its file extension matches a given filetype, and add relevant results to paths. We begin this iterative process with a for loop to find and examine each file:

In this configuration, os.walk() finds each file and path in filepath and generates a 3-tuple (a type of 3-item list) with components we will refer to as root, dirs, and files.

Because files lists all file names within a path, our function will iterate through each individual file name. Iterating again involves another for loop:

Within the file-level loop, our function can examine various aspects of each file. You may want to customize this section if your application has other requirements. For now, we’ll focus on checking files for a matching file extension.

Because comparing strings is case-sensitive while file extensions are not, we use the lower() method to convert both file and filetype to lower-case strings (file.lower() and filetype.lower(), respectively). This avoids confusion due to mismatched capitalization.

In turn, the endswith() method will compare the end of our lower case file (where the file extension lives) to the lower case filetype, returning True for a match or False otherwise.

Python Os Path Walk

Python walk directory tree

We include our Boolean (True/False) result in an if statement so that only a matching file type (True outcome) triggers the next stage of our function.

Python Os Walk

If the file’s extension matches, we want to add file and its location to paths, our list of relevant file paths. os.path.join() will combine the root file path and file name to construct a complete address our operating system can reference. The append() method will add this complete file address to our list of paths:

Our sets of loops will iterate through our folders and files, dutifully developing our paths list. In order to make this list available outside of our function, we need one final line:

Altogether, our code should read as follows:

Os Path Walk Python

Calling the list_files function—after you’ve run the above—and saving the resulting file locations list as an object might look something like this:

Os.path.walk Deprecated

Now that your code can find files it needs, you can focus on merging data, analyzing text, or conducting whatever research you imagine.