CSE80 -- Lecture 4, Apr 25 -- Buffered I/O


Buffered I/O Libraries in C/C++

I/O access in C and C++ is normally done through the standard I/O library:
For C: #include <stdio.h>
For C++: #include <iostream.h>

I'll discuss only the stdio library for C programs, but the comments about buffering apply to the iostream package for C++ as well.

All C (or C++) programs that use stdio get, as part of including stdio.h, the declarations

FILE *stdin, *stdout, *stderr;

stdin, stdout, and stderr are buffered I/O streams. They are predefined in the C/C++ library and do I/O to descriptors 0, 1, and 2 respectively. Each of them is a pointer to a structure called FILE. The FILE structure contains buffer memory and the I/O descriptor associated with the stream -- so when you printf to the stdout stream, you are writing into the stdout buffer. The contents of the buffer are written out to the stdout descriptor (1) when the buffer fills.

Note: be careful about the distinction between stdout as a stdio stream and stdout as an I/O descriptor. When the name stdout is used in the context of stdio, it refers to this buffered stream; when the name is used in the context of operating-system-level I/O descriptors or system calls, it refers to descriptor 1. Output done with the stdio library via the stdout structure will eventually use the write system call on the stdout descriptor, but the two are separate ideas.
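The sketch below illustrates the distinction on a Unix system (it uses the POSIX calls write() and fileno(); STDOUT_FILENO is just descriptor 1). The same text can be sent either through the buffered stdout stream or directly to descriptor 1; fileno() reports the descriptor that a stream wraps.

#include <stdio.h>      /* buffered streams: stdout, printf, fflush */
#include <unistd.h>     /* POSIX: write(), STDOUT_FILENO */

int main(void)
{
    const char msg[] = "via descriptor 1 directly\n";

    /* Goes into the stdout buffer first; reaches descriptor 1 only when
       the buffer is flushed. */
    printf("via the stdout stream (which wraps descriptor %d)\n",
           fileno(stdout));

    /* Bypasses stdio entirely: one write system call on descriptor 1. */
    write(STDOUT_FILENO, msg, sizeof msg - 1);

    /* Force the stdio buffer out now rather than at program exit. */
    fflush(stdout);
    return 0;
}

If you redirect this program's output to a file, the write() line may come out before the printf line, because the printf text sits in the stdout buffer until the fflush call.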

getchar() and getc() are two I/O routines predefined in the standard C/C++ library. Both are used to read a single character. The prototypes are:

int getchar();
int getc(FILE *stream);
getchar() reads a character from stdin, while getc() reads a character from the I/O stream specified by stream. If stdin is used as the argument to getc(), getc() acts exactly the same as getchar() -- it reads a character from standard input (stdin).
"i = getchar()"  is equivalent to "i = getc(stdin)"

Buffered I/O

I/O typically is buffered for efficiency. The reason is that making the read and write system calls on a per-character basis is expensive (system calls involve a lot of machinery when crossing the user-program to operating-system-kernel boundary). So, instead of doing character-at-a-time I/O, the system is designed to do I/O many characters at a time. Standard libraries such as stdio provide the abstraction of reading or writing one character at a time: the library reads many characters from the kernel at once and stores them in a buffer, and the user of the library reads from that buffer one character at a time. When the buffer is exhausted, new data is read in in one big chunk. Similarly, the library user can write into a stdio-managed buffer one character at a time, with the entire buffer written out only when it becomes full. (This strategy is not unique to Unix; all operating systems include mechanisms to buffer I/O.)
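To make this concrete, here is a rough sketch of how a buffered character-read routine could be built on top of the read system call. This is not the actual stdio source; my_getc and BUFSIZE are made-up names for illustration, and the sketch handles only descriptor 0.

#include <unistd.h>     /* read() -- POSIX system call */

#define BUFSIZE 4096

static char buf[BUFSIZE];           /* the buffer the library manages */
static char *next = buf;            /* next unread character in the buffer */
static int  nleft = 0;              /* characters remaining in the buffer */

/* Return the next character from descriptor 0, or -1 at end of input. */
int my_getc(void)
{
    if (nleft == 0) {                       /* buffer exhausted: refill it */
        nleft = read(0, buf, BUFSIZE);      /* one big system call */
        if (nleft <= 0)
            return -1;                      /* end of file or error */
        next = buf;
    }
    nleft--;
    return (unsigned char)*next++;          /* hand back one character */
}

int main(void)                              /* tiny driver: echo the input */
{
    int c;
    char ch;

    while ((c = my_getc()) != -1) {
        ch = (char)c;
        write(1, &ch, 1);                   /* unbuffered echo, for illustration */
    }
    return 0;
}

The point is that read is called only once every BUFSIZE characters; all the other calls to my_getc are satisfied from memory.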

When we are debugging or examining a program, it is often important to know whether the correct data is being written as commands/functions are executed. However, confusion often arises due to the buffered nature of I/O. As mentioned above, a FILE * points to a structure that contains a data buffer. Output streams such as stdout keep data in the output buffer before it is written out to the descriptor. This timing difference is normally insignificant, but if you are debugging a program, sometimes you don't see output exactly when you would otherwise expect it to appear. (The code that you expected to run has executed, but nothing shows up -- because the output data is still held in the buffer inside the I/O stream.)
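The fragment below (a made-up example) shows the symptom and the usual cure, fflush(): without the fflush call, the first message might not appear until the program finishes -- or never, if the program crashes while the text is still sitting in the buffer.

#include <stdio.h>
#include <unistd.h>     /* sleep() */

int main(void)
{
    printf("about to do the slow part...");     /* no newline: may stay buffered */
    fflush(stdout);             /* force the message out so we see it now */

    sleep(10);                  /* stands in for a long computation (or a crash) */

    printf(" done\n");
    return 0;                   /* stdio flushes remaining buffers at exit */
}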

  • When a program is started under UNIX, three I/O descriptors (stdin, stdout, stderr) are available by convention. These have the values 0, 1, and 2 respectively.
  • When the stdio library initializes, it determines what kind of object stdout refers to. If stdout is a tty device (i.e. a terminal), it is line-buffered. This means the output of a program is written out one line at a time -- the stdio code flushes the output buffer whenever it encounters a newline character (a.k.a. linefeed). If stdout is a pipe or a file, stdio makes stdout fully buffered -- the library will not write the data out until the buffer is full and it has to make room for further output. (A small program demonstrating the difference appears after the experiment below.)
  • For many standard I/O implementations, whenever a program performs a read from stdin, stdout's output buffer is automatically flushed. This means that the input prompt of a program will be displayed before the program reads from input. (The user sees the prompt first, then inputs data.)
  • stderr is normally not buffered at all; any error message is written to the terminal immediately.
  • One way to see I/O buffering in action is to do the following experiment using the faucet and drip programs in my bin (/a/sdcc8/u/disk03/cs80s/cs80s/bin). The faucet program acts as an output server using networking primitives. It does nothing until a drip program tells it to print something. The faucet program normally flushes its output every time it prints (it calls fflush(stdout);), but if you give it the -B flag it will leave its standard output buffering alone.

    To do this experiment, open up two shell windows. (xterm& from one window will create another one.) Then type in one window:

    % faucet -B
    
    and type in the other window:
    % drip a line of text
    % drip next line
    % drip -x
    
    and observe what happens (the -x flag is used by the drip program to tell faucet to exit). Next, fire up faucet thus:
    % faucet -B | cat
    
    This means faucet's standard output refers to a pipe rather than a terminal device. Repeat the sequence of drip commands and observe the different behavior.
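
    If you just want to see line buffering versus full buffering without faucet and drip, the following stand-alone program (my own example, using only standard calls) shows the same effect. Run it with its output going to the terminal and then piped through cat, and notice when each line actually appears.

    #include <stdio.h>
    #include <unistd.h>     /* isatty(), fileno(), sleep() */

    int main(void)
    {
        int i;

        /* stderr is unbuffered, so this report always appears immediately. */
        fprintf(stderr, "stdout is %s\n",
                isatty(fileno(stdout)) ? "a terminal" : "a pipe or a file");

        for (i = 1; i <= 5; i++) {
            printf("line %d\n", i);     /* flushed per line on a terminal;
                                           held in the buffer when piped */
            sleep(1);
        }
        return 0;                       /* remaining buffered output is flushed at exit */
    }

    Run directly, the lines appear one per second; piped through cat, they all appear at once after about five seconds, because stdout is fully buffered when it refers to a pipe.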
