CSE80 -- Lecture 4, Apr 25 -- An Example of a Shell Script


The following is a useful shell script for those who like comic strips. United Media is a company that publishes comics on the Web (as well as serving as an agency through which comic strip authors may sell their work to newspapers). Their Web site carries advertisements, which help to defray the expense of running the site. The Web site is updated daily, but the comics lag the newspaper publication by one week, so as to give people a reason to actually buy the newspapers (in addition to the news). United Media includes a random number as part of the daily comic's URL to prevent people from pointing their browser (via a bookmark, say) to the comic strip directly and thus avoiding the advertisements. To view the comic strip, one must follow their links by reading their introductory web page.

At least that's the theory.

This shell script automatically fetches the introductory page, extracts the link for the current comic, and then fetches the comic and displays it with the xv program. It internally uses a filter-mode Web browser, url_fetch, that I wrote some time ago. The ASCII terminal Web browser lynx may be used with the -source flag in most cases where url_fetch is used.

A few things to notice: 1. United Media always adds the newest link at the top of the list. 2. Every comic strip URL contains the word "archive", since the strips are placed in a two-week archive. If you don't know HTML or need a quick-and-dirty review, you should read the HTML overview.

#!/bin/sh

PATH=/u/disk03/cs80s/cs80s/bin:/u/disk03/cs80s/cs80s/bin.`/u/disk03/cs80s/cs80s/bin/sys`:/usr/bin:/bin:/usr/local/bin:/usr/local/GNU/bin:/usr/local/X11/bin:/usr/X11R6/bin:$PATH
export PATH

#
# United media adds a random suffix in the URL to make people's "private"
# comics pages break -- to force people to see their advertisers' messages
# in the "official" Dilbert comics intro page that they provide.
#
# We don't need no steenkin' adverts.
#

dilbert=http://www.unitedmedia.com/comics/dilbert
tmpf="${TMPDIR-/usr/tmp}/dilbert.$$.html"
trap 'rm "$tmpf"; exit' 0 1 2 3 15

if url_fetch "$dilbert/" > "$tmpf"
then
	suffix=`html_url < "$tmpf" | grep archive | head -1`
	url_fetch "$dilbert/$suffix" | xv ${1+"$@"} -
fi


You can fetch the script via the browser, or just use the path ../cse80/bin/dilbert from your home directory.

${TMPDIR-/usr/tmp}
The - in this context means that if the environment variable TMPDIR is not set, /usr/tmp is substituted; otherwise the value of TMPDIR is used.
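
To see the difference between an unset and a set variable, here is a minimal sketch. DEMO_DIR is a made-up variable name used only for illustration. The last assignment also shows the related ":-" form, which substitutes the default when the variable is set but empty as well.

```shell
#!/bin/sh
# DEMO_DIR is a hypothetical variable, used only for this demonstration.
unset DEMO_DIR
unset_case="${DEMO_DIR-/usr/tmp}"     # unset: the default is substituted

DEMO_DIR=/var/tmp
set_case="${DEMO_DIR-/usr/tmp}"       # set: the variable's own value is used

DEMO_DIR=
empty_case="${DEMO_DIR-/usr/tmp}"     # set but empty: "-" keeps the empty value
colon_case="${DEMO_DIR:-/usr/tmp}"    # ":-" treats an empty value like unset

echo "unset:      $unset_case"
echo "set:        $set_case"
echo "empty (-):  [$empty_case]"
echo "empty (:-): $colon_case"
```

In the dilbert script the plain "-" form is enough, since the only concern is whether the user has set TMPDIR at all.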
trap
The trap Bourne shell command tells the shell what to do when it receives specified signals. The first parameter ('rm "$tmpf"; exit') contains the commands to be executed when one of the specified signals is received. The remaining parameters are the numbers of the signals that will be caught and that will invoke the commands in the first argument. The first argument is read twice: once when the shell executes the trap command, and again when one of the listed signals is caught. If there is a shell or environment variable embedded in the (single-quoted) argument, it will be expanded in the second read. This allows the commands to use the value of shell/environment variables at the time the signal is caught. In the case of this script, the only variable used is tmpf, and its value does not change after initialization.

An example where it does change is the following script:

$ cat sig-example
#!/bin/sh
stty all
trap 'echo "signal: $v"' 1 2
v=1
while :
do
        sleep 2
        v=`expr $v + 1`
        echo "v = $v"
done
You can run this script, hit your interrupt character (usually control-C) a few times, and observe the output. To terminate the script, use the suspend character (usually control-Z) to suspend the job and then type kill %1 (if it is job 1; shells with job control print the job number when you hit control-Z).
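
The "read twice" behaviour can also be observed without a keyboard. The following is a minimal sketch, assuming a POSIX-style sh, in which the script sends signals to itself using kill and $$ (the shell's own process ID). The variable names late and early are made up for the illustration, and the signals are given by name (HUP, USR1) rather than by number because the number of USR1 differs between systems.

```shell
#!/bin/sh
v=old
# Single quotes: $v is left alone now and expanded when the signal is handled.
trap 'late="$v"' HUP
# Double quotes: $v is expanded right here, so the stored command is early="old".
trap "early=\"$v\"" USR1
v=new

kill -HUP $$      # the handler runs and sees the current value of v
kill -USR1 $$     # the handler runs the command that was fixed earlier

echo "single-quoted handler saw: $late"
echo "double-quoted handler saw: $early"
```

The single-quoted handler should report new and the double-quoted one old, which is why the trap in the dilbert script is written with single quotes.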

Signals are asynchronous notifications of special events. Every signal is associated with a unique integer that is predefined in UNIX systems, to ensure uniformity throughout the system and compatibility across different systems. The C header file that defines these numbers can be included in a C/C++ program by:

#include <signal.h>

A signal can originate from the kernel (e.g., SIGIO, SIGSEGV), be sent from a process to itself using the kill system call, kill(2), be sent from another process (likewise using kill(2)), or be sent by the terminal driver on behalf of the user (e.g., control-C). From the shell, a signal is sent with the kill command. The syntax is:

$ kill -SIGNAL pid
where pid is the process ID of the process. There are many different signals predefined in UNIX systems. The following is a brief summary of some common signals.

Common Signals

Signal  Symbolic Name  Usage
1       SIGHUP         The hang-up signal tells programs that the terminal has disconnected from the host. For example, you might use a modem to log on to a UNIX machine from a home computer or dialup terminal. Suppose you are in the middle of reading your email using pine. If the phone line suddenly disconnects, all of your processes, including pine, will receive a SIGHUP signal from the system. The default action for most programs is to exit. However, programs like pine catch the SIGHUP signal and perform graceful cleanup, e.g., saving your mail files.
2       SIGINT         Interrupt signal. This signal is generated from your terminal when you hit control-C. (Control-C is the default; it can be set to another character using stty.) An interrupt causes programs to terminate by default.
3       SIGQUIT        The quit signal is similar to SIGINT. In addition to causing the program to quit, it also forces the running program to dump core (leaving a file called core in the current directory). "core" is a file that contains an image of the memory used by the program right before it aborted. It is often used by programmers, who may interrupt a program at any time and examine its memory. The default key combination for generating a SIGQUIT signal is CTRL-\ on most UNIX systems.
9       SIGKILL        This is another commonly used signal. SIGKILL kills a process immediately. It cannot be caught by the target process: the process dies instantly upon receiving SIGKILL, which means you cannot use the trap command with this signal. You can use it to kill a program that ignores all other signals. Syntax: kill -9 pid.
15      SIGTERM        Termination signal. This signal instructs a process to terminate itself. It is the default signal sent by kill when no signal is specified (i.e., $ kill pid). When a UNIX system is instructed to shut down, it first sends SIGTERM to all processes and waits for them to terminate. It then sends SIGKILL to forcefully kill any remaining processes.
0       N/A            Signal 0 is not a real signal, and it is not defined in the signal.h mentioned above. It is a shell-specific pseudo-signal that represents the exit of the script, so you cannot catch it in C/C++. It allows you to specify an action that will be taken when the script exits. For example, you can specify actions to be taken when the script receives an interrupt signal or a hangup signal, or when it simply exits normally.
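
As an illustration of the signal 0 entry, here is a minimal sketch of an exit action that cleans up a scratch file. It runs the work in a subshell so that the effect can be checked afterwards from the parent. The name demo_tmp and the file name are assumptions made for this example, not part of the dilbert script.

```shell
#!/bin/sh
# demo_tmp is a hypothetical scratch file name for this illustration.
demo_tmp="${TMPDIR-/tmp}/trap0-demo.$$"

(
    # Inside this subshell, remove the scratch file whenever the subshell exits.
    trap 'rm -f "$demo_tmp"' 0
    echo "scratch data" > "$demo_tmp"
    [ -f "$demo_tmp" ] && echo "scratch file exists inside the subshell"
    exit 0
)

# Back in the parent shell: the subshell's exit action has already run.
if [ -f "$demo_tmp" ]; then leftover=yes; else leftover=no; fi
echo "file left behind: $leftover"
```

The sketch uses rm -f rather than plain rm so that the exit action stays quiet if the file was never created or has already been removed.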

Note: You may use a symbolic name in place of any signal number. For example:

$ kill -1 1234

is equivalent to:

$ kill -SIGHUP 1234
where 1234 is an arbitrary process ID. Note: You may use these symbolic names in C/C++ too.
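
To try kill safely, one can signal a process started just for that purpose. The following is a minimal sketch, assuming a POSIX-style sh; it uses the short name TERM (without the SIG prefix), which the POSIX kill command accepts, and a background sleep as a stand-in for a real program.

```shell
#!/bin/sh
sleep 30 &            # a harmless background process to practise on
pid=$!

kill -TERM "$pid"     # by name; SIGTERM is number 15, so this matches kill -15

# wait reports how the process ended; a process killed by a signal
# yields a status above 128 (128 plus the signal number in common shells).
status=0
wait "$pid" || status=$?
echo "exit status of the killed process: $status"
```

Running kill -l prints the signal names known to your system, which is a quick way to check a name before using it.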

  • url_fetch and html_url:

    url_fetch fetches the specified HTML (http://www.ncsa.uiuc.edu/demoweb/html-rimer.html) page or file and writes it to standard output.
    html_url is an HTML parser that reads HTML from standard input (stdin), extracts the URLs (Uniform Resource Locators) embedded in it, and prints them one per line to standard output (stdout).

  • suffix=`html_url < "$tmpf" | grep archive | head -1`

    This line of code is a little complicated. It creates a new variable, suffix, and initializes it to the output of the commands in the back quotes. Everything on the right side of "=" is wrapped in ` ` (back quotes). The shell replaces the quoted string with the output of the commands inside the quotes.

    Let's take a look at what's inside the back quotes:

    • "$tmpf" (which contains the name of the temporary file written by the earlier url_fetch) is expanded first. The file is then opened for reading, and the standard input of html_url is made to refer to it. html_url parses its input and sends all URLs found to the next program in the pipeline, which is grep.
    • grep archive then examines the output of html_url line by line, and prints only the lines that contain the word "archive".
    • head -1 reads its standard input and outputs only the first line. Since its input comes from grep, only the first URL containing the word archive is printed. This output is stored in the shell variable suffix.
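
The pipeline can be tried without url_fetch or html_url. The following is a minimal sketch in which a made-up list of URLs stands in for the output of html_url; the file names in it are invented for the example. It writes head -n 1, which is the POSIX spelling of head -1.

```shell
#!/bin/sh
# Made-up stand-in for the output of html_url: one URL per line.
urls='images/banner.gif
archive/dilbert-new.gif
archive/dilbert-old.gif
index.html'

# Same shape as the line in the script: filter, keep the first match, capture.
suffix=`printf '%s\n' "$urls" | grep archive | head -n 1`
echo "suffix = $suffix"
```

Because the newest link comes first in the list, the first matching line is the current comic.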
  • xv ${1+"$@"} -

    xv is an X Window System program that can display a variety of files (pictures and text files). The "-" at the end instructs xv to read its input from stdin. In this case, stdin is the piped output of url_fetch. As we mentioned in the previous class, $1 represents the first command line argument of the shell script. The "+" here (by analogy with the "-" above) means: if $1 is set, use "$@" instead; if $1 is not set, the expression expands to nothing.

    The notation $@ expands into the list of all arguments to the shell script. There is a crucial distinction, however, when the $@ is quoted:

    Variable  Equivalent Notation        Effect
    "$*"      "$1 $2 $3 ... $N"          one argument
    "$@"      "$1" "$2" "$3" ... "$N"    many arguments

    where N is the number of arguments to the script. Note the difference between "$*" and "$@": both refer to all the arguments (flags and other options) specified on the command line. However, when they are quoted, "$*" expands to a single argument, whereas "$@" expands to a series of arguments. Thus, if one wants to pass arguments from one program to another, "$@" should be used.

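    ${1+"$@"} simply passes all arguments given on the command line to the xv program. This is useful for passing options to xv. The ${1+...} guard is there because some older Bourne shells expand "$@" to a single empty argument when the script is given no arguments.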
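
The difference can be checked by counting arguments. The following is a minimal sketch; count_args and demo are hypothetical helper functions written for this illustration, and the arguments passed to demo are just sample values.

```shell
#!/bin/sh
# count_args and demo are hypothetical helpers for this illustration.
count_args() { echo "$#"; }

demo() {
    star=`count_args "$*"`          # all arguments joined into one word
    at=`count_args "$@"`            # each argument kept as its own word
    guarded=`count_args ${1+"$@"}`  # the form used in the dilbert script
}

demo "two words" -geometry 640x480
with_args="$star $at $guarded"
echo "three arguments: star=$star at=$at guarded=$guarded"

demo
no_args="$star $guarded"
echo "no arguments:    star=$star guarded=$guarded"
```

With three arguments, "$*" counts as one word while "$@" and the guarded form count as three; with no arguments, "$*" still produces one (empty) word while the guarded form produces none.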

    bsy@cse.ucsd.edu, last updated Thu May 23 13:04:13 PDT 1996.
