CSE134A LECTURE NOTES

May 9, 2001
 
 

ANNOUNCEMENTS

The midterm is next Wednesday, May 16.
 
 

WEB SITE ARCHITECTURE

Today's lecture, with part of last Monday's, is based on A Blueprint for Building Web Sites Using the Microsoft Windows DNA Platform, Draft Version .9,  Microsoft Corporation, January 2000.

On Monday we talked about scalability, manageability, and availability.

Security is also an important issue, with two aspects: protecting servers from attack and protecting data from theft.  Security is based on multiple separate security zones.  Each zone is protected by a firewall, i.e. a network packet filter.  Typically there are at least three zones: the public Internet, a public-facing DMZ, and a private zone with sensitive data.

The overall design we discussed consists of loosely-connected tiers of replicated, task-focused servers.  Application complexity is managed by specialization: different servers perform different functions.

Manageability and scalability are often achieved by outsourcing, i.e. remote hosting.  A specialized company installs the servers near a main Internet access point.
 
 

FRONT-END SERVERS

Front-end servers have no long-term state; these can be cloned.  Each may have its own copy of content, i.e. HTML, PHP, etc.

Load-balancing software/hardware spreads requests across multiple front-end servers, and includes failure detection.  Several different load-balancing techniques exist.
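
As an illustration only, here is a minimal sketch of one possible load-balancing technique: round-robin rotation that skips servers marked as failed.  The class name, method names, and server names are hypothetical, and real load balancers detect failures themselves (e.g. with periodic health checks) rather than being told.

```python
class RoundRobinBalancer:
    """Rotate requests across cloned front-end servers, skipping failed ones."""

    def __init__(self, servers):
        self.servers = list(servers)      # fixed rotation order
        self.healthy = set(self.servers)  # servers currently believed to be up
        self._next = 0                    # index of the next server to try

    def mark_down(self, server):
        # Called when failure detection decides a server is not responding.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Called when a previously failed server responds again.
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once per call, in rotation order.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy front-end servers")
```

Because the front-end servers hold no long-term state, any healthy clone can serve any request, which is what makes such a simple rotation workable.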

Session management stores state information in clients and a backend server, not in the frontend servers.  Client state can easily be distributed across multiple state servers.

This requires a data-dependent routing layer, which maps logical data onto a physical partition.  This software runs on each frontend server.  Load-balancing is stateful but not adaptive.
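
A data-dependent routing layer can be sketched as a fixed function from a logical key (say, a user ID) to a physical partition.  The version below assumes hash-based assignment, which is only one possible mapping; the partition names and the function name are hypothetical.

```python
import hashlib

# Hypothetical names for the physical partitions holding client state.
PARTITIONS = ["state-db-0", "state-db-1", "state-db-2"]


def partition_for(key, partitions=PARTITIONS):
    """Map a logical key onto one physical partition.

    The mapping depends only on the key and the partition list, so every
    front-end server running this code routes the same key to the same place.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(partitions)
    return partitions[index]
```

A mapping like this does not adapt to load: a given key always goes to the same partition, however busy that partition is.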

Availability/security issues: prevent a script failure from crashing the web server, prevent failure on one server from being repeated on identical servers.

Usability issue: provide limited functionality even if some backends are unavailable, e.g. send mail even if the user can't read mail; show the catalog even if the user can't place an order.
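
The catalog example might look like the following sketch, in which the page is still rendered when the ordering backend is down.  The exception class, function names, and the two callables passed in are all hypothetical stand-ins for real backend calls.

```python
class BackendUnavailable(Exception):
    """Raised by a (hypothetical) backend client when its server is down."""


def render_catalog_page(fetch_catalog, check_order_service):
    """Build the catalog page, disabling ordering if that backend is down."""
    items = fetch_catalog()  # the catalog backend is assumed to be reachable
    try:
        check_order_service()
        ordering_enabled = True
    except BackendUnavailable:
        # Degrade gracefully: still show the catalog, but hide the order button.
        ordering_enabled = False
    return {"items": items, "ordering_enabled": ordering_enabled}
```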

SSL sessions are segregated from regular HTTP sessions.  SSL servers have hardware for encryption.

All frontends
 
 

BACK-END SERVERS

Persistent content is divided across multiple back-end servers.  Fault tolerance is harder for servers that must maintain state.  Failover clustering assumes that different servers can access the same or replicated disk drives.  A group of servers that share storage is called a partition.

Low-end, static sites store content in a file system.  Higher-end, dynamic sites use a relational database.

The most complex sites use other applications, encapsulated as objects, e.g. Enterprise JavaBeans.  Other applications include legacy databases, existing enterprise software (e.g. for manufacturing planning), and external ad servers.

Allocating data to partitions is difficult.  The objective is to avoid hot spots.  We need tools to split and merge partitions.

A large multiprocessor system can replace multiple partitions, but is usually more expensive.

Remote replication can be online or offline.  Staging content.  Database replication plus log shipping.
 
 

SECURITY DOMAINS

Objectives: Security domains are regions with restricted and monitored communication.  Domains may be geographical, organizational, by server type, or by data type.  Domains may be nested but preferably not overlapping.

Firewalls inspect every packet coming into (or out of) a domain.  A packet filter looks at IP addresses and port numbers.
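
A packet filter of this kind can be sketched as an ordered rule list matched against the source address and destination port, with everything else denied.  The rules, addresses, and function name below are made up for illustration and do not come from the Microsoft document.

```python
from ipaddress import ip_address, ip_network

# Hypothetical rules: (source network, destination port, action).
RULES = [
    (ip_network("0.0.0.0/0"), 80, "allow"),   # HTTP from anywhere
    (ip_network("0.0.0.0/0"), 443, "allow"),  # HTTPS from anywhere
    (ip_network("10.0.0.0/8"), 22, "allow"),  # SSH from the internal network only
]


def filter_packet(src_ip, dst_port, rules=RULES):
    """Return the action of the first matching rule, or "deny" if none match."""
    src = ip_address(src_ip)
    for network, port, action in rules:
        if src in network and dst_port == port:
            return action
    return "deny"  # default-deny: anything not explicitly allowed is dropped
```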

Management involves consoles, monitors, and agents (controllers).  Preferably, management is done with a physically separate network, so each host has two network interface cards (at least).  Logging can be a heavy network load.
 
 

SITE TOPOLOGY

Each domain has its own network.  The management network overlays the other networks.  Each domain may also have an internal management network.

Clients benefit from multiple ISPs through a feature of standard Internet Domain Name Servers (DNS): round-robin behavior.  If an IP address does not respond, the client just has to hit "reload."
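
The round-robin behavior and the "reload" step can be simulated with the sketch below.  It is a toy model, not a real resolver: the class, the function, and the `is_reachable` callback are hypothetical, and the addresses are documentation-only examples.

```python
from collections import deque


class RoundRobinDNS:
    """Toy name server that rotates its address list on every query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # the next query sees a different first address
        return answer


def fetch_with_reload(dns, is_reachable, max_reloads=3):
    """Try the first address of each answer; "reload" by resolving again."""
    for _ in range(max_reloads):
        address = dns.resolve()[0]
        if is_reachable(address):
            return address
    return None  # every attempt hit an unresponsive address
```

If the site has one address per ISP and one ISP is down, a reload gets the next address in the rotation, which belongs to a different ISP.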

All frontends at one ISP respond to the same IP address, which is handled by a load-balancer.

Each frontend in the DMZ has an OS hardened for security.  Firewalls separate it from the Internet and from the internal network.
 
 



Copyright (c) by Charles Elkan, 2001.