CSE 120: Homework #4 Solutions

Spring 2023

  1. On a Unix-style file system, how many disk read operations are required to open and read the first block of the file "/usr/include/X11/Xlib.h"? Assume (1) that the superblock is in memory, but nothing else; (2) that all directories and inodes are one block in size; (3) that once a block is read, it is cached in memory (and does not need to be read again); (4) that inode timestamps (like last access) do not need to be updated (so the inode does not need to be written after access).

    Read inode for "/"
    Read data block for "/"
    Read inode for "usr"
    Read data block for "usr"
    Read inode for "include"
    Read data block for "include"
    Read inode for "X11"
    Read data block for "X11"
    Read inode for "Xlib.h"
    Read data block for "Xlib.h"
    
    10 reads
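
    The count generalizes: a cold-cache open costs two reads (inode plus data block) for the root, and two more for each remaining path component, where the last component's "data block" is the first data block of the file itself. A minimal Python sketch of the count (the function is illustrative, not part of the original solution):

      def reads_to_open(path):
          # Cold cache; every directory and inode is one block in size.
          components = path.strip("/").split("/")   # ["usr", "include", "X11", "Xlib.h"]
          return 2 + 2 * len(components)            # 2 for "/", then 2 per component

      print(reads_to_open("/usr/include/X11/Xlib.h"))   # -> 10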
    

  2. Consider a UNIX-style inode with 10 direct pointers, one single-indirect pointer, and one double-indirect pointer only. Assume that the block size is 4KB (including indirect blocks), and that the size of a pointer is 4 bytes. How large a file can be indexed using such an inode?

    We have 10 direct pointers that each point directly to a 4K data block, so we can point to 10 * 4K = 40K with the direct pointers.

    We have one single-indirect pointer that points to a 4K disk block (an "index block") which contains nothing but pointers to data blocks. Since each disk block pointer is 4 bytes, the index block can hold 4K / 4 = 1K pointers to data blocks. So, the single-indirect pointer indirectly points to 1K * 4K = 4M (2^10 * 2^12 = 2^22 bytes).

    We have one double-indirect pointer that points to an index block that contains 1K pointers to more index blocks. These leaf index blocks each contain 1K pointers to data blocks. So, the double-indirect pointer indirectly points to 1K * 1K * 4K = 4G (2^10 * 2^10 * 2^12 = 2^32 bytes).

    Adding everything up, the maximum file size is 40K + 4M + 4G.
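
    As a sanity check, the same arithmetic in Python (constants taken from the problem statement):

      BLOCK = 4 * 1024                # 4KB block size
      PTRS_PER_BLOCK = BLOCK // 4     # 4-byte pointers -> 1K pointers per index block

      direct          = 10 * BLOCK                   # 40K
      single_indirect = PTRS_PER_BLOCK * BLOCK       # 4M
      double_indirect = PTRS_PER_BLOCK**2 * BLOCK    # 4G

      print(direct + single_indirect + double_indirect)   # 4299202560 bytes = 4G + 4M + 40K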

  3. The original Berkeley Fast File System increased the Unix file system block size from 512 bytes to 4096 bytes. Concerned about internal fragmentation, the FFS also introduced the ability to end a file with a small fragment. A disk block could be broken up into small fixed-size fragments, each of which could be used to store the ends of different files. For instance, a file of size 5000 bytes would need two blocks (8192 bytes) to store on disk, resulting in 3192 bytes lost to internal fragmentation. With 1024-byte fragments, though, the file could be stored with one full-sized block (4096 bytes) and one 1024-byte fragment, requiring 5120 bytes on disk to store the file and reducing internal fragmentation to just 120 bytes. The tradeoff is that managing fragments increases the complexity of the file system implementation.

    1. My laptop has 2^20 files on it. Assume the disk block size is 4KB and the average amount of internal fragmentation is 2KB per file. How much storage is wasted due to internal fragmentation in the file system on my laptop?

      2^20 files * 2KB fragmentation per file = 2^20 * (2 * 2^10) = 2^30 * 2 = 2^31 bytes = 2GB (any notation is fine)

    2. Assume that, with fragments, the average amount of internal fragmentation goes down to 256 bytes per file. How much storage is wasted due to internal fragmentation when using fragments?

      2^20 files * 256 bytes of fragmentation per file = 2^20 * 2^8 = 2^28 bytes = 256MB (any notation is fine; both computations are sketched in code at the end of this question)

    3. Assume that you would receive the same benefits for your laptop. Would you want to use fragments to save space? Why or why not?

      There are many possible answers, all depending on the reasons for answering "yes" or "no". For example, your laptop disk may always be nearly filled to capacity, so an additional 1.75GB of disk space would be very useful. Alternatively, your laptop may have a large disk and you may never reach capacity, so an additional 1.75GB does not affect your use. The question is primarily about providing a reason one way or the other.
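
    For reference, the arithmetic for parts 1 and 2 as a quick Python check:

      files = 2**20

      waste_without = files * 2 * 1024   # 2KB average waste per file  -> 2**31 bytes (2GB)
      waste_with    = files * 256        # 256B average waste per file -> 2**28 bytes (256MB)

      print(waste_without, waste_with, waste_without - waste_with)
      # 2147483648 268435456 1879048192  -- i.e., fragments save 1.75GB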

  4. Consider a file archival system, like the programs zip or tar. Such systems copy files into a backup file and restore files from the backup. For example, from the zip documentation:
    The zip program puts one or more compressed files into a single zip archive, along with information about the files (name, path, date, time of last modification, protection, and check information to verify file integrity).

    When a file is restored, it is given the same name, time of last modification, protection, and so on. If desired, it can even be put into the same directory in which it was originally located.

    Can zip restore the file into the same inode as well? Briefly explain your answer.

    In general, the archive program cannot guarantee that the file will be restored to the same inode.

    User programs like zip and tar use the logical file system (filenames and directories), while inodes are part of the physical file system (superblock and inodes). The logical file system is built on top of the physical file system, and, in general, the kernel only provides a syscall interface for the logical file system.

    So if an archive program wanted to guarantee that a file is restored to its original inode, it would have to write directly to the hard disk (bypassing the file system entirely), since there is no syscall interface to write to a particular inode. This means that the archive program would need root access, and it would need to understand the layout of the physical file system on the hard disk.
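
    To make the asymmetry concrete: the syscall interface exposes a file's inode number read-only through stat(2), but has no counterpart for choosing an inode on creation. A small Python illustration (the path is arbitrary):

      import os

      st = os.stat("/usr/include/stdio.h")
      print(st.st_ino)   # stat(2) reports the file's inode number...

      # ...but no call like create_at_inode(path, ino) exists, so a restore
      # can only recreate the file and accept whatever inode the file
      # system happens to allocate.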

    In particular, writing a file to its original inode becomes very difficult if the original inode is in use by some other file f. The archive program would have to relocate f to another inode, which is hard because the archive program would have to search the disk for all references to f's old inode, and make them all point to f's new inode. This is extremely difficult if the file system is mounted and there are other running processes that use the file system.

  5. [Silberschatz] Consider a system that supports 5000 users. Suppose that you want to allow 4990 of these users to be able to access one file.
    1. How would you specify this protection scheme in Unix?

      The Unix file protection system allows users to specify protection at the granularity of a single user (owner), a group of users (group), and all users (other). To specify this policy on Unix, you would create a group with the 4990 users who are able to access the file and give the file the appropriate group permissions (e.g., read and write).
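
      As a sketch, assuming a group named "readers" has already been created (e.g., with groupadd) and populated with the 4990 users (the group and file names here are made up for illustration):

        import os, shutil

        shutil.chown("shared.dat", group="readers")   # give the file to the group
        os.chmod("shared.dat", 0o640)                 # owner: rw, group: r, other: none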

    2. Could you suggest another protection scheme that can be used more effectively for this purpose than the scheme provided by Unix?

      If we are not constrained by what Unix provides, more compact approaches would involve some way to specify who does not have access. One approach would be to extend the Unix group model to be able to list users who are not in the group. With this mechanism, you would create a group as before, but list the 10 users who should not get access. The file would then get the same group permissions, but the group would be interpreted differently.

      Another approach would be to support negative permissions on a file. In this approach, you would create a group with the 10 users who should not access the file, and then assign group permissions on the file as negative permissions (e.g., users in this group should not be able to access the file).

      A third approach would be to have very flexible access control lists where you can associate an arbitrary number of access control entries (not just "owner", "group", "other"). In such a system, you could create access control entries for each of the 10 users specifying that they did not have access to the file, and then have a wildcard entry specifying that every other user (who does not specifically match the first 10) can access the file. (For those who have ever created a .htaccess file for Apache, you likely have done something like this with a "deny from all" or "allow from all".)

      Note that this is technically a violation of the principle of "permission, not exclusion". Interestingly, it is still a useful method for addressing this situation because it makes it much easier to express protection intent; in this case, the "ease of use" principle outweighs "permission, not exclusion".

  6. [Silberschatz] How does a file cache help improve performance? Why do systems not use much larger caches if they are so useful?

    File caches improve performance by converting what would be disk I/O operations (read, write) into memory operations (reading from memory, writing into memory). Operations that can be served from the cache (e.g., one user accessing "/usr/include/stdio.h" and then another user accessing it soon after) run at the speed of memory rather than the speed of disk. Since disk I/O is much slower than memory, reducing the number of disk I/Os substantially improves performance.
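
    As a back-of-the-envelope illustration, using assumed latencies of roughly 100ns for memory and 10ms for a disk read (the numbers are illustrative, not measured):

      mem_ns, disk_ns = 100, 10_000_000    # assumed latencies

      for hit_rate in (0.0, 0.5, 0.9, 0.99):
          avg_ns = hit_rate * mem_ns + (1 - hit_rate) * disk_ns
          print(f"hit rate {hit_rate:.0%}: average access time {avg_ns / 1e6:.3f} ms")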

    But there is always a tradeoff. In this case, the file buffer cache uses physical memory to store the file data. As a result, it reduces the amount of physical memory that the OS can use for mapping virtual pages. As more physical memory is used for the file buffer cache, less is used to store virtual pages and processes may start to page fault more. At some point, growing the file buffer cache will result in more disk I/O (for handling page faults) than the file buffer cache saves in caching file data. This situation reduces performance overall, which we want to avoid.

  7. Consider a program that executes a loop that issues a read I/O to a storage device and waits I milliseconds for the I/O to complete, and then computes on the data returned for X milliseconds, and then repeats.

    For various values of I and X, compute the percentage of time that the program spends waiting for I/O and fill in the following table. If I and X are both 1 ms, for example, then the program spends 50% of its time waiting for I/O.

    Each entry below is the percentage of time spent waiting for I/O, which is I / (I + X):

                                    X = 100 ms   X = 10 ms   X = 1 ms   X = 0.1 ms
    I = 25 ms     (cloud storage)   20.0%        71.4%       96.2%      99.6%
    I = 5 ms      (HDD)             4.76%        33.3%       83.3%      98.0%
    I = 0.1 ms    (SSD)             0.0999%      0.990%      9.09%      50.0%
    I = 0.0005 ms (NVM)             0.00050%     0.0050%     0.0500%    0.498%
    I = 0.0001 ms (RAM)             0.0001%      0.0010%     0.0100%    0.0999%
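
    A short Python script to regenerate the table from the relation above (its rounding differs slightly from the hand-filled entries):

      io_ms  = [25, 5, 0.1, 0.0005, 0.0001]    # I: cloud storage, HDD, SSD, NVM, RAM
      cpu_ms = [100, 10, 1, 0.1]               # X

      for i in io_ms:
          row = "  ".join(f"{100 * i / (i + x):.4g}%" for x in cpu_ms)
          print(f"I = {i} ms: {row}")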