Lecture 1: Administrivia and Introduction

Procedural Stuff

Welcome to CSE 291 Distributed Systems!

The procedural stuff can be found on-line. If you didn't make it to class, be particularly careful to get a copy of the syllabus from this Web site and give it a read.

The Area We Call "Systems" -- And the Funny Creatures We Call "Systems People"

When I'm hanging out in the "Real world", people often ask me about my job. I usually explain that I am a teacher. Everyone understands what a teacher does. We talk for a living. Beyond that, I'm safe. Everyone knows, "Those who can, do. Those who can't, teach."

When people ask me what I teach, I tell them, "Computer Science". Oddly enough, they only hear the first word, "Computer". Sorry, y'all, I don't do windows. You'll need IT for that. This brings me to two questions: "What is the area we call Computer Systems?" and "How does Distributed Systems fit in?"

When I explain my area of interest to everyday folks, I like to tell them that in "Systems" we view the computing landscape as if it were the air traffic system or the system of highways and roadways. There is a bunch of work that needs to get done, a bunch of resources that need to be used to get it done, and a whole lot of management to make it work.

And, like air and auto traffic, computer systems are most interesting when they scale to the point of scarcity and when bad things happen. We care about how our roadways and airways perform during rush hour, in the rain, when there is a big game, and, by the way, bad things happen to otherwise good drivers along the way. In other words, our problem space is characterized by scarcity, failure, interaction, and scale.

Distributed Systems, In Particular

"Systems people" come in all shapes and sizes. They are interested in such problems as operating systems, embedded systems, networks, databases, and distributed systems. This quarter, we are focusing mostly on "Distributed systems", though we'll touch on some areas of networks, and monolithic databases and operating systems.

Distributed systems arise when the execution of user work involves managing state that is connected somewhat weakly. In other words, distributed systems generally involve organizing resources connected via a network that has more latency, less bandwidth, and/or a higher error rate than can be safely ignored.

This is a different class of problems, for example, than when the limiting factors might include processing, storage, memory, or other units of work. There is tremendous complexity in scheduling processes to make efficient use of scarce processors, managing virtual memory, or processing information from large attached data stores, as might occur in monolithic operating systems or databases. It is also a different class of problems than managing the fabric itself, as is the case with networks.

Exploring the Model

When I've taught Operating Systems, I've begun with a picture that looks like the one below. If you didn't take OS, please don't worry -- everything on the picture, almost, should be familiar to you. It contains the insides of a computer: memory and memory controllers, storage devices and their controllers, processors, and the bus that ties them all together.

This time, however, the bus isn't magical. It isn't a fast, reliable, predictable communication channel that always works and maintains low latency and high bandwidth. Instead, it is a simple, cheap, far-reaching commodity network that may become slow and bogged down and/or lose things outright. It might become partitioned. And, it might not deliver messages in the same order that they were sent.
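
To get a feel for what that means in practice, here is a minimal sketch of such a channel, assuming a toy model in which each message is lost independently and the survivors arrive in arbitrary order. The function name unreliable_send and its drop_rate and seed parameters are made up for illustration; they are not part of any real networking library.

```python
import random


def unreliable_send(messages, drop_rate=0.2, seed=0):
    """Deliver messages over a simulated lossy, reordering channel.

    drop_rate and seed are hypothetical knobs for this toy model.
    """
    rng = random.Random(seed)
    # Each message is independently lost with probability drop_rate.
    delivered = [m for m in messages if rng.random() >= drop_rate]
    # Whatever survives may arrive in any order.
    rng.shuffle(delivered)
    return delivered


sent = list(range(10))
received = unreliable_send(sent)
print("sent:    ", sent)
print("received:", received)
```

The receiver sees only the received list. It cannot tell from that list alone which messages were lost, or whether a missing message is lost or merely late, which is exactly the kind of uncertainty the rest of the quarter deals with.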

To reinforce the idea that this is a commodity network, like the Internet, I added a few smartphones to the picture this time. Remember, the network isn't necessarily wired -- and all of the components aren't necessarily of the same type.

Furthermore, there is no global clock or hardware support for synchronization. And, to make things worse, the processors aren't necessarily reliable, and neither is the RAM or anything else. For those who are familiar with them, snoopy caches aren't practical, either.

In other words, all of the components are independent, unreliable devices connected by an unreliable, slow, narrow, and disorganized network.

What's the Good News?

The bottom line is that, despite the failure, uncertainty, and lack of specialized hardware support, we can build and effectively use systems that are an order of magnitude more powerful. In fact, we can do this while providing a more available, more robust, more convenient solution. This quarter, we'll learn how.

A Brief History

Distributed Systems, as we know them and love them, trace their origins to the early 1980s. Or, if you'd like to dig a little deeper, the 1970s (DECnet, SNA, DSA, and friends). We didn't have global scale computing in the 1970s and 1980s, but we did have enterprise-level computing, networking existed (Ethernet was invented in 1973), computers were being clustered locally for performance and scale, and personal computing was arriving. It was clear that computers were becoming smaller scale, lower cost, and more ubiquitous. And, it was apparent that networks were becoming faster, wider, and cheaper.

To oversimplify things a bit, this led to the question, "Can we build a global (or, at least, super large) scale distributed computer?" And, loosely speaking, much of the distributed systems work done from the early 1980s through the early 1990s was centered around the goal of managing resources at this scale through what can (again, taking some liberties) be described as "The Great Distributed Operating System In The Sky."

The goal was to invent a software management layer that enabled users to harness the distributed resources (processing, memory, storage, users, etc.) just like a normal operating system does for the resources within a single computer. The goal was to make the distributed nature of the resources "transparent" to the user, such that the user didn't have to know or care that the resources were distributed, despite the limitations of scale and communications.

A lot of good things came out of that era. CMU's Andrew project effectively invented distributed file systems and the techniques that are nearly universal today for implementing them. MIT's Project Athena practically invented modern authentication in the form of Kerberos, which remains in use today. Distributed transactions, replication schemes, etc, etc, etc were all children of this era.

But, it all came to a crashing end in the early 1990s. The dream of "The Great Distributed Operating System In The Sky" was not, and is not, to be. The bottom line is this: We can't have a general purpose distributed shared memory, because the communication required to build one is too expensive. We'll talk about it shortly, when we talk about "atomic commit protocols".

But, one way to think of it is this. At the most primitive level, we normally enforce mutual exclusion and other synchronization via shared memory. But, in distributed systems, we need synchronization to construct a consistent shared memory. Chicken or egg?
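
To make the single-machine half of that concrete, here is a minimal sketch, assuming Python threads stand in for processors that share one memory. The names counter, lock, add_many, and run are hypothetical and chosen only for this illustration.

```python
import threading

counter = 0               # shared state, visible to every thread
lock = threading.Lock()   # the lock itself also lives in shared memory


def add_many(n):
    """Increment the shared counter n times, one critical section each."""
    global counter
    for _ in range(n):
        with lock:        # mutual exclusion: one thread at a time in here
            counter += 1


def run(num_threads=4, n=10000):
    """Reset the counter, run the workers to completion, return the total."""
    global counter
    counter = 0
    threads = [threading.Thread(target=add_many, args=(n,))
               for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


total = run()
print(total)
```

The lock works only because every thread can read and write the same lock object in the same memory. Put the workers on separate machines and there is no such shared object to lean on, so the synchronization has to be built out of messages instead.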

For a while we looked at "relaxing" the memory model to meet most needs in an efficient way -- but that didn't really work. So, by the early 1990s, this phase was over. Instead, the focus became "Middleware" -- using specific software layers to construct specific solutions to specific classes of problems, rather than trying to make one size fit all.

Distributed systems research died down for a while. Then it got red-hot with the "Dot Com Boom" of the late 90s and early 2000s. As Internet-based commerce exploded, infrastructure needed to be distributed to scale up, ensure robustness -- and meet the needs of users distributed around the globe. Then, the dramatic growth and increasing value of data generated a focus on "Data Intensive Scalable Computing (DISC)", i.e. the scaling of the processing of data.

And, this growth continues to be fueled by ubiquitous computing in the form of mobile devices, "The Internet of Things", the back-end infrastructure needed to support this, energy-aware computing, and more. It is an exciting time to study distributed systems!