CSE80 -- Lecture 4, Apr 25 -- Intro to HTML Language

HTML stands for HyperText Markup Language. It is used for creating hypertext on the World Wide Web. For example, the class notes you are reading right now are written in HTML. Your browser, e.g., Mosaic or Netscape, retrieves the HTML document and displays it according to the embedded HTML mark-up. HTML is an application of the Standard Generalized Markup Language (SGML).

HTML is a language used to mark the structural or functional parts of a document -- parts such as paragraphs, lists, headings, block quotations, etc. It is intended as a semantic markup language, unlike PostScript, which is a page layout language and permits a complete specification of what text or figure goes where on the page. The idea of having semantic markup is that document processors may automatically go through documents and process them, e.g., extract headers to create tables of contents or tables of figures, lay out the document for European A4 paper rather than US 8 1/2 by 11 paper, or completely change the look of a document by changing the style in which each structural element is rendered.
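
As a rough illustration of the table-of-contents idea, here is a minimal Python sketch built on the standard library's html.parser module. The HeadingCollector class, the table_of_contents function, and the sample mark-up are assumptions made up for this example; they are not part of the course material.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect (level, text) pairs for <h1> through <h6> elements."""

    HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headings = []   # finished (level, text) pairs
        self._level = None   # level of the heading being read, if any
        self._parts = []     # text fragments of that heading

    def handle_starttag(self, tag, attrs):
        # The parser hands us tag names already converted to lower case.
        if tag in self.HEADING_TAGS:
            self._level = int(tag[1])
            self._parts = []

    def handle_data(self, data):
        if self._level is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag in self.HEADING_TAGS and self._level is not None:
            self.headings.append((self._level, "".join(self._parts).strip()))
            self._level = None


def table_of_contents(html_text):
    """Return the headings of html_text in document order."""
    collector = HeadingCollector()
    collector.feed(html_text)
    collector.close()
    return collector.headings


# Hypothetical sample document.
sample = "<H1>Intro</H1><p>Some text.<h2>Mark Up Elements</h2><p>More text."
for level, text in table_of_contents(sample):
    print("  " * (level - 1) + text)
```

Because the processor looks only at the structural mark-up, the same function works no matter how a browser would choose to draw the headings.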

The appearance of an HTML document is determined by the browser, and at least theoretically the browser may be configured to render the document in arbitrarily different styles based on the document structure as determined from the mark-up, e.g., various justification methods for headers, the fonts used for headers, the look of page headers and footers, etc. (Most browsers are not as configurable as they might be.)

An HTML document consists of text and tags used to convey the data of a document and to mark its structure. For example, here is a sample HTML document:

<title>An Example HTML Document</title>
This is an example HTML document.

This is an example link: <a href="URL">some_text</a>

This is an example in-line picture: <img src="URL">

HTML tags are case-insensitive.

Mark Up Elements

The < and > symbols (less-than and greater-than symbols, a.k.a. angle brackets) in an HTML document are used to separate the HTML tags from the document text -- all HTML tags are enclosed by the angle brackets. The tags delimit elements, which identify the document's structure. In the above example, the title of the document, "An Example HTML Document," is identified using the Title element, which is delimited by the start tag, <title>, and the end tag, </title>.

Elements such as the Paragraph element, <P>, can be delimited by just a start tag; the end tag, </P>, is optional. (It is not semantically well defined to have text after an ending </p> tag and before the next starting <p> tag; most browsers try to be forgiving and just treat that text as another paragraph.)
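
To see start tags, end tags, and case insensitivity in action, the following Python sketch logs the tags that the standard library's html.parser module reports for a small fragment. The TagLogger class, the tag_events function, and the sample fragment are hypothetical names and data chosen for this example.

```python
from html.parser import HTMLParser


class TagLogger(HTMLParser):
    """Record each start and end tag in the order the parser sees it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def tag_events(html_text):
    """Return the list of ("start" | "end", tag name) pairs in html_text."""
    logger = TagLogger()
    logger.feed(html_text)
    logger.close()
    return logger.events


# The first paragraph has no end tag; the second one does.
print(tag_events("<P>First paragraph<p>Second paragraph</P>"))
```

The parser reports every tag name in lower case regardless of how it was typed, and it does not complain about the first paragraph's missing end tag.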

HTML provides the anchor tag for linking information together. The anchor tag is how hyperlinks are set up from one document to the next and is critical to the function of the Web. Its syntax is

<A HREF="URL"> some_text </A>
It starts with A, followed by HREF="URL", where A stands for Anchor and HREF stands for Hypertext Reference. The URL is the Universal (Uniform) Resource Locator, which is a pointer to another document. The final </A> marks the end of the anchor. For example, http://www.ucsd.edu/ is a valid URL, and it points to the UCSD home page. If the HTML was <a href="http://www.ucsd.edu">UCSD Home</a>, then UCSD Home is the anchor for the hyperlink, and in a graphical browser, simply clicking on UCSD Home makes the browser fetch and display the contents stored at the specified URL, in this case http://www.ucsd.edu/.
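
As a sketch of how a program might find the hyperlinks in a document, the Python below collects each anchor's URL and text using the standard library's html.parser module. The LinkCollector class and the extract_links function are hypothetical names introduced for this example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, anchor text) pairs for <a href="..."> elements."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # URL of the anchor being read, if any
        self._text = []     # text fragments inside that anchor

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive in lower case.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_links(html_text):
    """Return the (URL, anchor text) pairs found in html_text."""
    collector = LinkCollector()
    collector.feed(html_text)
    collector.close()
    return collector.links


print(extract_links('<a href="http://www.ucsd.edu">UCSD Home</a>'))
```

Writing the tag as <A HREF="..."> gives the same result, since tag and attribute names are case-insensitive.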

In addition to simply linking documents together, HTML also provides the ability to display graphics in-line within a document. This is done with the IMG tag, whose syntax is

<IMG SRC="URL">
It starts with IMG, followed by SRC (source). SRC points to a URL. The URL can also be a file on the local disk, in which case you can simply use the directory path of the graphic file as the URL, prefixed by file://.

Objects referred to by URLs do not have to be HTML documents. They may be other types of data, such as audio or video data. Browsers may be configured to display this data directly or, more often, to invoke an external "helper" program to display it. Which helper program to use is determined by the type of the data -- when an object is transferred using the Hypertext Transfer Protocol (HTTP), the server sends the object type as part of the transfer protocol; when another protocol is used, browsers typically determine the object type from standard suffixes (e.g., .txt, .ps, .jpg, etc.) that occur as part of the object path.
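
The type-detection rule described above can be sketched in Python as follows. The SUFFIX_TYPES table, the object_type function, and the ftp.example.edu host are assumptions invented for this illustration; a real browser keeps a much larger, configurable table.

```python
import posixpath
from urllib.parse import urlsplit

# Hypothetical suffix table, for illustration only.
SUFFIX_TYPES = {
    ".txt": "text/plain",
    ".ps": "application/postscript",
    ".jpg": "image/jpeg",
    ".html": "text/html",
}


def object_type(url, http_content_type=None):
    """Return the type reported by the server, else guess from the suffix."""
    if http_content_type:
        # With HTTP, the server states the type as part of the transfer.
        return http_content_type
    # Otherwise fall back on the suffix of the path part of the URL.
    suffix = posixpath.splitext(urlsplit(url).path)[1].lower()
    return SUFFIX_TYPES.get(suffix)  # None when the suffix is unknown


print(object_type("http://www.ucsd.edu/", "text/html"))
print(object_type("ftp://ftp.example.edu/pub/paper.ps"))
```

The returned type is what a browser would then use to decide whether to display the object itself or hand it to a helper program.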

More on Universal Resource Locators

The basis for referring to resources on the Web is the Uniform Resource Locator, or URL. A URL consists of a string of characters that uniquely identifies a resource. The basic syntax is as follows:
protocol://host[:port]/path
where the square brackets denote optional parts of the URL syntax.
protocol: The protocol used to retrieve or send information, such as FTP, HTTP (HyperText Transfer Protocol), NNTP, Gopher, Telnet, and others.
host: The name of the computer on which the resource resides. It may be omitted if the resource is on the same computer as the current HTML document.
port: The TCP port at which the remote server is listening for incoming service requests.
path: The location of a resource on the computer host.
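
To take a URL apart into the four parts described above (protocol, host, port, and path), one option is the standard library's urllib.parse module, as in the Python sketch below. The url_parts function, the port number 8080, and the file names are assumptions chosen for the example.

```python
from urllib.parse import urljoin, urlsplit


def url_parts(url):
    """Split a URL into its protocol, host, port, and path."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,   # None when the URL does not name a port
        "path": parts.path,
    }


print(url_parts("http://www.ucsd.edu/"))
print(url_parts("http://www.ucsd.edu:8080/index.html"))

# A link that leaves out the protocol and host is resolved against the
# URL of the document that contains it.
print(urljoin("http://www.ucsd.edu/docs/index.html", "notes.html"))
```

When the port is left out, the client falls back on the default port for the protocol, which is 80 for HTTP.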

Some example URLs

The URL found in our dilbert script
click to telnet to sdcc8
An HTML Primer

Most browsers will show you the URL in a status area (for Netscape, it is near the bottom of the window) as your mouse moves over a hyperlink, so you can tell which URL your browser will be sent to if you activate the hyperlink.

For more information on HTML and how to create your own site, you can look at various on-line references. Look at Yahoo for pointers to Web authoring advice and tools.

Creating your own web page

If you are interested in creating your own HTML document, type "help web" at the shell prompt, and choose options 5 and 6 to learn how to create your own WWW Home Page. Since these home pages reside on the school computer, people all over the world will be able to access them 24 hours a day. The address (URL) of your home page (after you have created it) is:
where cs80xx is your login.

bsy@cse.ucsd.edu, last updated Thu May 23 13:04:10 PDT 1996.
