CSE 124 Linux Directories and File Paths
2017 October 31: Linux Directories and File Paths

Introduction

We’ve gotten a number of questions about how directories and file paths work in Linux systems. This short writeup should clarify how file paths work in Linux, which will help you understand how to work with file system calls for the homeworks and projects.

Unix file system

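Linux file systems such as ext4 follow the classic Unix file system design, whose core ideas date back to early Unix in the 1970s and have changed surprisingly little since. A Unix file system contains two primary components: inodes and data blocks. A superblock at the start of the file system stores filesystem-wide options and overall state information, such as the block size and the number of free inodes and blocks. The root directory is not stored in the superblock itself; it is an ordinary directory inode at a fixed, well-known inode number (inode 2 on ext2, ext3, and ext4).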

An inode represents an “entity” in the file system. The two kinds of entities we care about here are files and directories (Unix also has a few special kinds, such as symbolic links and device files).

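A file inode contains metadata about the file, including its permissions, owner, size, and timestamps. Notably, it does not contain the file’s name: names are stored in directories, as described next. Most importantly, the inode contains an array of pointers to the data blocks that hold the file’s actual contents.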
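A C program can read most of this metadata with the POSIX stat() system call. The sketch below assumes a POSIX system; print_inode_info is a hypothetical helper name, not part of any course code. Notice that struct stat has no field for the file’s name.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: print some of the metadata stored in the inode
 * that `path` refers to. Returns 0 on success, -1 if stat() fails. */
int print_inode_info(const char *path) {
    struct stat st;
    if (stat(path, &st) != 0) {
        perror("stat");
        return -1;
    }
    printf("inode number: %lu\n", (unsigned long) st.st_ino);
    printf("size (bytes): %lld\n", (long long) st.st_size);
    printf("owner uid:    %lu\n", (unsigned long) st.st_uid);
    printf("permissions:  %o\n", (unsigned) (st.st_mode & 0777));
    printf("is directory: %s\n", S_ISDIR(st.st_mode) ? "yes" : "no");
    /* There is no st_name field: the name lives in the parent directory. */
    return 0;
}
```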

A directory inode contains the same kinds of metadata (permissions, owner, size, and so on), and it also points to data blocks. What makes it a directory is what those data blocks hold: a list of directory entries, each pairing a name with the inode number of a file or subdirectory inside this directory. When you run the ls command on a directory, the names it prints are the names stored in these entries.
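The directory entries themselves can be read with opendir() and readdir(). In this sketch (list_entries is a hypothetical helper, assuming a POSIX system), each struct dirent provides a name (d_name) and an inode number (d_ino), which is exactly the name-to-inode pairing described above.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>

/* Hypothetical helper: print each entry in `dirpath` as
 * "inode-number  name". Returns the number of entries, or -1 on error. */
int list_entries(const char *dirpath) {
    DIR *dir = opendir(dirpath);
    if (dir == NULL) {
        perror("opendir");
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Each entry is just a (name, inode number) pair. */
        printf("%10lu  %s\n", (unsigned long) entry->d_ino, entry->d_name);
        count++;
    }
    closedir(dir);
    return count;
}
```

Unlike a plain ls, this also prints the . and .. entries, which come up again below.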

The directory tree

If you think about the data structure of inodes and how they point to one another, you might realize that it’s just a more complicated version of a very basic data structure: an n-ary tree. This is why the entire file system is often referred to as the directory tree!

Every Unix file system can be viewed as a directory tree that starts at the root directory, which is denoted by /.

Above is a sample directory tree. The root directory has four subdirectories that it points to; only the contents of two of these directories are shown here.

The bin directory contains a file called echo, similar to what you’d find on a real Linux system: bin contains most of the default programs you use all the time, such as ls and ping.

The usr directory contains two subdirectories, john and mary. We see that mary has just one file and no directories, but john has both a file and a directory. That directory, hw1, contains just one file.
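Since the file system is a tree, it can be walked recursively like any other tree. This is a minimal sketch assuming a POSIX system and paths shorter than 4096 bytes; print_tree is a hypothetical name. Run on a directory containing only hw1/main.c, it prints hw1 with main.c indented beneath it and returns 2.

```c
#define _XOPEN_SOURCE 700
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper: print the tree below `path`, indenting each level
 * by two spaces. Returns the number of entries printed, or -1 if `path`
 * cannot be opened as a directory. */
int print_tree(const char *path, int depth) {
    DIR *dir = opendir(path);
    if (dir == NULL) {
        return -1;
    }
    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip "." and ".." so the walk doesn't loop forever. */
        if (strcmp(entry->d_name, ".") == 0 || strcmp(entry->d_name, "..") == 0) {
            continue;
        }
        printf("%*s%s\n", depth * 2, "", entry->d_name);
        count++;
        /* The child's path is the parent's path joined with '/'. */
        char child[4096];
        snprintf(child, sizeof child, "%s/%s", path, entry->d_name);
        struct stat st;
        /* lstat() does not follow symbolic links, so a link to a
         * directory is printed but not descended into. */
        if (lstat(child, &st) == 0 && S_ISDIR(st.st_mode)) {
            int sub = print_tree(child, depth + 1);
            if (sub > 0) {
                count += sub;
            }
        }
    }
    closedir(dir);
    return count;
}
```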

Absolute file paths

Every one of the files and directories shown above has an absolute file path (also referred to as the full file path), which represents the full path through the directory tree.

Absolute file paths always begin with /, denoting that we wish to start from the root directory and work our way down. Each directory along the way is separated from the next by a /, until we reach the destination.

For example, the hw1 directory above has the absolute path /usr/john/hw1 as we have to traverse usr and john before we can get to hw1. So, if we want to list the files in hw1, we would type the following:

ls /usr/john/hw1
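Building an absolute path is just string concatenation: start with / and join each directory name with /. The helper below, build_abs_path, is a hypothetical sketch that works only on strings and never checks whether the path actually exists.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: join directory names into an absolute path,
 * e.g. {"usr", "john", "hw1"} -> "/usr/john/hw1".
 * Returns 0 on success, -1 if `out` is too small. */
int build_abs_path(const char *parts[], size_t n, char *out, size_t outlen) {
    if (outlen < 2) {
        return -1;
    }
    strcpy(out, "/");               /* every absolute path starts at the root */
    size_t len = 1;
    for (size_t i = 0; i < n; i++) {
        size_t part_len = strlen(parts[i]);
        /* room for an optional separator, the name, and the NUL byte */
        size_t need = len + (i > 0 ? 1 : 0) + part_len + 1;
        if (need > outlen) {
            return -1;
        }
        if (i > 0) {
            out[len++] = '/';       /* separator between directory names */
        }
        memcpy(out + len, parts[i], part_len);
        len += part_len;
        out[len] = '\0';
    }
    return 0;
}
```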

Relative file paths and the current working directory

At this point John might point out to you that he can just list his hw1 directory by typing the following into his terminal:

[john@mycomputer]:~:1$ ls hw1
main.c

John tells his project partner Mary to run this command to see the code John wrote, but Mary comes back to John with the following error:

[mary@mycomputer]:~:1$ ls hw1
ls: cannot access hw1: No such file or directory

This is because John used a relative file path, and he and Mary have different current working directories.

The current working directory is a location in the directory tree that the operating system keeps track of for every process, including your terminal’s shell. (The pwd command used below stands for “print working directory.”) A relative file path is any path that does not begin with a /.

When a system call or program is given a relative file path, the kernel resolves it starting from the current working directory, which has the same effect as joining the current working directory and the relative path into an absolute path. You can print the current working directory of your terminal with the following command:

[john@mycomputer]:~:2$ pwd
/usr/john

John’s current working directory is /usr/john, so when he types ls hw1, the absolute path used is /usr/john/hw1, which exists. Let’s see what Mary’s current working directory is:

[mary@mycomputer]:~:2$ pwd
/usr/mary

So when Mary typed ls hw1, the absolute path is /usr/mary/hw1, which doesn’t exist.
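The joining rule can be sketched in C with getcwd(). make_absolute is a hypothetical helper and purely a string operation, assuming a POSIX system: the kernel does its own resolution internally, and this sketch leaves any . or .. components in place.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: turn `path` into an absolute path by prefixing the
 * current working directory when `path` is relative.
 * Returns 0 on success, -1 on error or if `out` is too small. */
int make_absolute(const char *path, char *out, size_t outlen) {
    if (path[0] == '/') {
        /* Already absolute: use it unchanged. */
        int n = snprintf(out, outlen, "%s", path);
        return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
    }
    char cwd[4096];
    if (getcwd(cwd, sizeof cwd) == NULL) {
        return -1;
    }
    /* Avoid a doubled slash when the working directory is the root "/". */
    const char *sep = (strcmp(cwd, "/") == 0) ? "" : "/";
    int n = snprintf(out, outlen, "%s%s%s", cwd, sep, path);
    return (n >= 0 && (size_t) n < outlen) ? 0 : -1;
}
```

Run with /usr/john as the working directory, make_absolute("hw1", ...) would produce /usr/john/hw1; with /usr/mary it would produce /usr/mary/hw1.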

You can change the current working directory of your terminal with the cd command. As with any other command, you can give it either a relative or an absolute file path.
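Under the hood, cd is built on the chdir() system call. cd has to be a shell builtin rather than a separate program, because a process can only change its own working directory, not its parent’s. The hypothetical change_dir helper below is a rough sketch of the core of what cd does, assuming a POSIX system.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: change this process's working directory to `path`
 * (relative or absolute), then write the new working directory into
 * `out`. Returns 0 on success, -1 on failure. */
int change_dir(const char *path, char *out, size_t outlen) {
    if (chdir(path) != 0) {
        perror("chdir");
        return -1;
    }
    return getcwd(out, outlen) != NULL ? 0 : -1;
}
```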

Special characters

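There are several special characters commonly used in the terminal that make working with paths easier. Two of them, ~ and $VAR, are expanded by the shell before the command even runs. The other two, . and .., are actual entries present in every directory, so they are followed when the path is resolved and can appear anywhere in an absolute or relative path.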

  • ..: Refers to the parent directory, so it cancels out the directory before it in the path. This is usually used with cd to move up one directory: if Mary typed cd .. then the path would be /usr/mary/.., which resolves to /usr. (At the root, .. refers to / itself.)
  • .: Refers to the current directory, so it adds nothing to the path. If Mary typed ls /usr/john/./hw1, this would resolve to just /usr/john/hw1. It is most commonly used to run executable scripts or programs in the current working directory. When a command name contains no /, the shell searches only the directories listed in the PATH environment variable, which usually does not include the current directory. So typing my_script.sh gives a “command not found” error even if the script is in your current working directory, but typing ./my_script.sh works.
  • ~: Expands to your home directory. Every user has their own home directory, and a new terminal starts with its current working directory set to your home directory. This is why John and Mary had different working directories, despite both having just opened a new terminal!
  • $VAR: Evaluates to the currently set value of VAR in the environment. These are referred to as environment variables, which are essentially global variables for your terminal. A common one used for the course is $PUBLIC, which is set to the absolute path of the public directory for the course on ieng6. You can set environment variables with the export command and remove them with the unset command.
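The effect of . and .. on an absolute path can be sketched as a purely textual clean-up: drop every ., and let every .. remove the component before it. This is an approximation, not what the kernel literally does: the kernel follows the real .. entry of each directory, so the results can differ when symbolic links are involved. POSIX realpath() performs the real, file-system-aware resolution. normalize_path is a hypothetical helper.

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lexically simplify an absolute path by dropping "."
 * components and letting ".." remove the previous component, e.g.
 * "/usr/mary/.." -> "/usr". Does not touch the file system.
 * Returns 0, or -1 if `path` is not absolute or `out` is too small. */
int normalize_path(const char *path, char *out, size_t outlen) {
    if (path[0] != '/' || outlen < 2) {
        return -1;
    }
    size_t len = 0;                 /* current length of `out` */
    const char *p = path;
    while (*p != '\0') {
        while (*p == '/') p++;      /* skip one or more separators */
        const char *start = p;
        while (*p != '\0' && *p != '/') p++;
        size_t n = (size_t) (p - start);
        if (n == 0 || (n == 1 && start[0] == '.')) {
            continue;               /* empty or "." component: no-op */
        }
        if (n == 2 && start[0] == '.' && start[1] == '.') {
            /* ".." removes the last component; at the root it stays at "/". */
            while (len > 0 && out[len - 1] != '/') len--;
            if (len > 0) len--;     /* also drop the '/' before it */
            continue;
        }
        if (len + 1 + n + 1 > outlen) {
            return -1;              /* no room for '/', name, and NUL */
        }
        out[len++] = '/';
        memcpy(out + len, start, n);
        len += n;
    }
    if (len == 0) {
        out[len++] = '/';           /* everything cancelled out: the root */
    }
    out[len] = '\0';
    return 0;
}
```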

Relative paths as program arguments

Up until now we’ve been assuming that a relative file path is turned into an absolute file path on the command line, so the program always receives an absolute path. This actually isn’t true: relative file paths are passed to the program as command line arguments without being converted to absolute paths. After all, the shell doesn’t know whether your program will even treat an argument as a path!

Programs inherit the current working directory of the terminal they are run from.

This means that if you give a relative file path to a program, any system calls that use it will resolve it against the program’s current working directory, which starts out as the terminal’s. This is independent of the path used to execute the program: for example, if John compiled main.c and ran ./main from /usr/john/hw1, then the current working directory of the program would be /usr/john/hw1. But if John’s current working directory is /usr/john and he runs ./hw1/main, then the program’s working directory will be /usr/john.
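The sketch below mimics what a program does with a path taken from argv: try_open (a hypothetical helper) hands the string to fopen() unchanged, so a relative path is resolved against whatever the process’s current working directory happens to be. It assumes a POSIX system.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: treat `arg` the way a program treats a path from
 * argv. fopen() receives the string as-is; if it is relative, the kernel
 * resolves it against this process's current working directory.
 * Returns 0 if the file could be opened for reading, -1 otherwise. */
int try_open(const char *arg) {
    char cwd[4096];
    if (getcwd(cwd, sizeof cwd) != NULL) {
        printf("cwd: %s, argument: %s\n", cwd, arg);
    }
    FILE *f = fopen(arg, "r");
    if (f == NULL) {
        perror(arg);
        return -1;
    }
    fclose(f);
    return 0;
}
```

Calling try_open(argv[1]) from a main() and running it from two different directories with the same relative argument can succeed in one and fail in the other, just like John’s and Mary’s ls hw1.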

A program can change its own working directory with the chdir() system call, but usually it’s best to just be aware of how relative paths interact with your program and plan accordingly.
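One way to plan accordingly, sketched here under the assumption of a POSIX system: convert any relative path arguments to absolute paths with realpath() before calling chdir(), so they still refer to the same files afterwards. absolutize_then_chdir is a hypothetical helper, and realpath() only succeeds for paths that already exist.

```c
#define _XOPEN_SOURCE 700
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: resolve `arg` to an absolute path with realpath()
 * before changing directory to `new_dir`, so the path keeps working after
 * the chdir(). Returns a malloc'd absolute path (caller frees), or NULL
 * on error. */
char *absolutize_then_chdir(const char *arg, const char *new_dir) {
    char *abs_path = realpath(arg, NULL);   /* NULL: realpath allocates */
    if (abs_path == NULL) {
        perror("realpath");
        return NULL;
    }
    if (chdir(new_dir) != 0) {
        perror("chdir");
        free(abs_path);
        return NULL;
    }
    return abs_path;
}
```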