On Monday we talked about scalability, manageability, and availability.
Security is also an important issue, with two aspects: protecting servers from attack and protecting data from theft. Security is based on multiple separate security zones, each protected by a firewall, i.e. a network packet filter. Typically there are at least three zones: the public Internet, a public-facing DMZ, and a private zone holding sensitive data.
The overall design we discussed consists of loosely-connected tiers of replicated, task-focused servers. Application complexity is managed by specialization: different servers perform different functions.
Manageability and scalability are often achieved by outsourcing, i.e.
remote hosting. A specialized company installs the servers near a
main Internet access point.
Load-balancing software/hardware spreads requests across multiple front-end servers, and includes failure detection. Several different load-balancing techniques exist.
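As a minimal sketch of one such technique, the following shows round-robin selection that skips frontends marked as failed. The class name, method names, and server labels are hypothetical, and real failure detection (e.g. periodic health probes) is assumed to happen elsewhere and to call mark_down/mark_up.

```python
class RoundRobinBalancer:
    """Round-robin choice among frontends, skipping ones marked as down."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to consider

    def mark_down(self, server):
        # Called by a (hypothetical) health checker when a probe fails.
        self.healthy.discard(server)

    def mark_up(self, server):
        # Called when a previously failed server answers probes again.
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        # Try each server at most once, starting where the last pick stopped.
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy frontend available")


balancer = RoundRobinBalancer(["fe1", "fe2", "fe3"])
first_four = [balancer.pick() for _ in range(4)]
balancer.mark_down("fe2")
after_failure = [balancer.pick() for _ in range(3)]
```

Because no session state lives on the frontends, any healthy frontend can serve any request, which is what makes this simple rotation acceptable.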
Session management stores state information in clients and a backend server, not in the frontend servers. Client state can easily be distributed across multiple state servers.
This requires a data-dependent routing layer, which maps logical data onto a physical partition. This software runs on each frontend server. Load-balancing is stateful but not adaptive.
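A data-dependent routing layer might look roughly like the sketch below. It assumes a two-step mapping that the notes do not spell out: a stable hash sends each logical key to a bucket, and a fixed table sends each bucket to a physical state server. The bucket count, server names, and function names are all hypothetical.

```python
import hashlib

NUM_BUCKETS = 8  # hypothetical number of logical buckets

# Fixed table from logical bucket to physical partition (assumed names).
# The table is static: it does not react to load, matching "not adaptive".
PARTITION_MAP = {
    bucket: f"state-server-{bucket % 2}" for bucket in range(NUM_BUCKETS)
}


def bucket_for(key: str) -> int:
    # Use a stable hash so every frontend computes the same bucket;
    # Python's built-in hash() is randomized per process for strings.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def route(key: str, partition_map=PARTITION_MAP) -> str:
    # Every frontend runs this same function, so a given client's state
    # is always looked up on the same state server.
    return partition_map[bucket_for(key)]


server_for_user = route("user-42")
```

With this layout, moving a bucket to a different server only requires changing one entry in the table, provided the updated table is distributed to every frontend.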
Availability/security issues: prevent a script failure from crashing the web server, prevent failure on one server from being repeated on identical servers.
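One common way to address the first point is to run each script in its own process, so that a crash or hang ends that process rather than the server. The sketch below illustrates the idea only; the function name, the status codes chosen, and the timeout value are assumptions, not something the notes prescribe.

```python
import subprocess
import sys


def run_script_isolated(source: str, timeout: float = 5.0):
    """Run a script in a child process and return (status, body)."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # A hung script is killed; the server process keeps running.
        return 504, "script timed out"
    if result.returncode != 0:
        # A crashing script only takes down its own process.
        return 500, "script failed"
    return 200, result.stdout


ok_status, ok_body = run_script_isolated("print('hello')")
bad_status, _ = run_script_isolated("raise SystemExit(3)")
```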
Usability issue: provide limited functionality even if some backends are unavailable, e.g. let users send mail even if they cannot read it, or see the catalog even if they cannot place an order.
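The catalog example can be sketched as a page handler that treats each backend independently. The exception type, function names, and page fields here are hypothetical; the point is only that a failure in the order backend disables ordering without hiding the catalog.

```python
class BackendUnavailable(Exception):
    """Raised by a backend client when its server cannot be reached."""


def render_catalog_page(fetch_catalog, check_order_service):
    # The catalog is the core of the page, so its failure is not masked.
    page = {"catalog": fetch_catalog()}
    try:
        check_order_service()
        page["ordering"] = "enabled"
    except BackendUnavailable:
        # Degrade gracefully: still show the catalog, disable ordering.
        page["ordering"] = "temporarily unavailable"
    return page


def order_service_down():
    raise BackendUnavailable("order backend unreachable")


degraded_page = render_catalog_page(lambda: ["book", "lamp"], order_service_down)
normal_page = render_catalog_page(lambda: ["book", "lamp"], lambda: None)
```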
SSL sessions are segregated from regular HTTP sessions. SSL servers have hardware for encryption.
Low-end, static sites store content in a file system. Higher-end, dynamic sites use a relational database.
The most complex sites use other applications, encapsulated as objects, e.g. Enterprise JavaBeans. Other applications include legacy databases, existing enterprise software (e.g. for manufacturing planning), and external ad servers.
Allocating data to partitions is difficult. The objective is to avoid hot spots. We need tools to split and merge partitions.
A large multiprocessor system can replace multiple partitions, but is usually more expensive.
Remote replication can be online or offline. Staging content.
Database replication plus log shipping.
Firewalls inspect every packet coming into (or out of) a domain. A packet filter looks at IP addresses and port numbers.
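A packet filter of this kind can be sketched as an ordered rule list where the first matching rule wins. The rule format, the default-deny policy, and the addresses (taken from the ranges reserved for documentation) are assumptions for illustration, not a description of any particular firewall product.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str               # "allow" or "deny"
    source: str               # source network in CIDR notation
    dest: str                 # destination network in CIDR notation
    dest_port: Optional[int]  # None matches any port


def decide(rules, src_ip, dst_ip, dst_port, default="deny"):
    # First matching rule wins; unmatched packets get the default action.
    for rule in rules:
        if (ip_address(src_ip) in ip_network(rule.source)
                and ip_address(dst_ip) in ip_network(rule.dest)
                and rule.dest_port in (None, dst_port)):
            return rule.action
    return default


# Hypothetical policy: the Internet may reach the DMZ only on web ports.
DMZ_RULES = [
    Rule("allow", "0.0.0.0/0", "192.0.2.0/24", 80),
    Rule("allow", "0.0.0.0/0", "192.0.2.0/24", 443),
]

web_request = decide(DMZ_RULES, "203.0.113.7", "192.0.2.10", 80)
ssh_attempt = decide(DMZ_RULES, "203.0.113.7", "192.0.2.10", 22)
```

A second rule list of the same shape, placed between the DMZ and the private zone, would correspond to the inner firewall.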
Management involves consoles, monitors, and agents (controllers).
Preferably, management is done with a physically separate network, so each
host has two network interface cards (at least). Logging can be a
heavy network load.
Clients benefit from multiple ISPs through a feature of standard Internet Domain Name Servers (DNS): round-robin behavior. If an IP address does not respond, the client just has to hit "reload."
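The round-robin behavior can be illustrated with a toy model: a name server that rotates its answer list on every query, and a client that tries the returned addresses in order. The class and function names are hypothetical, the addresses come from the ranges reserved for documentation, and real resolvers and browsers add caching that this sketch ignores.

```python
from collections import deque


class RoundRobinDNS:
    """Toy name server: returns all addresses, rotated on each query."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # the next query starts one address later
        return answer


def first_responsive(addresses, try_connect):
    # Stand-in for the user hitting "reload": move on to the next address
    # whenever the current one does not respond.
    for address in addresses:
        if try_connect(address):
            return address
    return None


dns = RoundRobinDNS(["198.51.100.1", "203.0.113.1"])  # one address per ISP
first_answer = dns.resolve()
second_answer = dns.resolve()

# Simulate the first ISP being unreachable.
reached = first_responsive(first_answer, lambda addr: addr == "203.0.113.1")
```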
All frontends at one ISP respond to the same IP address, which is handled by a load-balancer.
Each frontend in the DMZ has an OS hardened for security. Firewalls
separate it from the Internet and from the internal network.