CSE 223B Term Project
Each group should submit a 1-2 page write-up describing the progress you
have made to date, and a detailed set of deliverables you plan to submit
at the end of the term. In particular, you should be concrete about:
- The particular software artifact you intend to complete.
- A high-level description of the design of your system.
- Any existing software packages you are using as a basis for your
- Your evaluation plan for this artifact. How will you measure success?
What will you use for a testbed? Do you need any resources from us?
This is also a great point to notify us of any challenges you have
encountered so far, any modifications to your originally proposed goals, or
anything else you'd like to bring to our attention. As a
reminder, you will be submitting an 8-10 page report at the end of the
term in the same style as the research papers we've read in class
describing your system, and any text you write for this checkpoint will be
a great start toward that goal.
I have listed a number of possible project ideas. By no means are you
limited to these ideas; in fact, I encourage you to come up with some
of your own. Even if you do choose to go with one of these, keep in
mind they're only starting points. Each of the ideas on this list will
require considerable fleshing out and refining in order to turn it
into a reasonable project proposal.
I realize you are at different stages of your graduate careers
and are looking for different things out of a course project. While
all of the ideas below could likely result in a suitable class
project, they vary in their ambition and scope. You do NOT need
to conduct original research for the class project---implementing
something real is just fine. Those of you with
aspirations of possibly publishing your work might think about
tackling something more open-ended, however.
Some of the projects require access to resources in my research
group. Please contact me if you're interested in pursuing one of them
so I can make sure we have enough resources to go around.
I also have credits for various Amazon Web Services (e.g., EC2, S3,
Elastic MapReduce, CloudFront, DynamoDB, RDS) that I can provide for use
in this course, so feel free to think big.
The following topics were suggested by other faculty members in the
department. I suspect there could be interesting 223B projects here,
but I do not know the details of their work. Those of you interested in learning more about them should
contact the faculty indicated.
- Implement a distributed shared memory system along the lines of IVY or
TreadMarks. You could implement it at the user level like TreadMarks so
that you could run it on EC2.
- Build a collaborative editing environment like Google Docs or
Etherpad. You could probably use your Lab 3 as a good starting point.
- Use FUSE to implement a distributed file system with consistency
semantics of your choosing.
- Measure the performance of memcached on interesting workloads,
identify sources of poor performance, and improve things.
- In a recent talk, Larry Carter proposed using Cartesian coordinates to
manage the amount of memory movement in algorithms, as memory costs now
frequently dominate computation. This is especially true in distributed
computations. Consider applying his technique to MapReduce.
The idea is to quantify the tradeoff between different ways of
moving data to/from computation.
- Extend any of the papers we've read to fix problems they left
- Practical differential privacy for federated querying.
Ensuring differential privacy means that a database must eventually be
retired (i.e., clients use up their entire query budget). Professor
Kamalika Chaudhuri and Research Scientist Ken Yocum are interested in
exploring databases that instead retire values and add new rows to get
around this problem. This might help when the database(s) are federated,
so no single entity manages the entire system.
- Stream processing in Pig. Ken Yocum's group has extended
Pig to support incremental queries. The current implementation is
efficient because it generates deltas between operators. However, if
deltas are dropped in the network, then processing cannot continue. He is
interested in testing some simple policies to recover from loss events,
like occasionally sending more than a delta and rolling back in the
event of a loss.
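To make the last idea concrete, here is a minimal sketch of one possible loss-recovery policy for a delta stream. It is not the actual Pig extension: the names Message, Sender, Receiver, and snapshot_every are hypothetical, and the state is assumed to be a simple table of per-key counts. The sender occasionally sends a full snapshot instead of a delta; a receiver that detects a gap in sequence numbers rolls back to its last snapshot and ignores deltas until the next snapshot arrives.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Message:
    seq: int                    # position of this message in the stream
    payload: Dict[str, int]     # per-key changes (delta) or full counts (snapshot)
    is_snapshot: bool = False


@dataclass
class Sender:
    """Hypothetical upstream operator: sends deltas, plus a full snapshot
    every `snapshot_every` messages (the "more than a delta" case)."""
    snapshot_every: int = 3
    state: Dict[str, int] = field(default_factory=dict)
    seq: int = 0

    def emit(self, delta: Dict[str, int]) -> Message:
        for key, change in delta.items():
            self.state[key] = self.state.get(key, 0) + change
        self.seq += 1
        if self.seq % self.snapshot_every == 0:
            return Message(self.seq, dict(self.state), is_snapshot=True)
        return Message(self.seq, dict(delta))


@dataclass
class Receiver:
    """Hypothetical downstream operator: applies deltas in order and
    rolls back to the last snapshot when it detects a lost message."""
    state: Dict[str, int] = field(default_factory=dict)
    checkpoint: Dict[str, int] = field(default_factory=dict)
    last_seq: int = 0
    stalled: bool = False       # True while waiting for the next snapshot

    def receive(self, msg: Message) -> None:
        if msg.is_snapshot:
            # A snapshot replaces the state outright and ends any stall.
            self.state = dict(msg.payload)
            self.checkpoint = dict(msg.payload)
            self.last_seq = msg.seq
            self.stalled = False
            return
        if self.stalled:
            return              # deltas are useless until a snapshot arrives
        if msg.seq != self.last_seq + 1:
            # Gap in sequence numbers: a delta was lost, so roll back.
            self.state = dict(self.checkpoint)
            self.stalled = True
            return
        for key, change in msg.payload.items():
            self.state[key] = self.state.get(key, 0) + change
        self.last_seq = msg.seq


if __name__ == "__main__":
    sender, receiver = Sender(snapshot_every=3), Receiver()
    deltas = [{"a": 1}, {"b": 2}, {"a": 1}, {"a": 5}, {"b": 1}, {"c": 1}]
    for delta in deltas:
        msg = sender.emit(delta)
        if msg.seq == 4:        # simulate the network dropping message 4
            continue
        receiver.receive(msg)
        print(msg.seq, receiver.stalled, receiver.state)
```

The snapshot interval is the knob a project would evaluate: a smaller interval shortens the time a receiver stays stalled after a loss, at the cost of sending more full state over the network.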
Last updated: Mon May 20 14:33:11 -0700 2013