Database Systems: Advanced Topics and Implementation
CSE232B Spring 2016


Welcome!

For announcements regarding the course, please sign up to our Piazza forum.

Milestone 1 of the project will be due in week 7. Stay tuned for the exact date.


Schedule


Mon

Tue

Wed

Thu

Fri

Lecture


9:30-10:50
CSE 2154


9:30-10:50
CSE 2154


Instructor Office Hours



9:00-10:00
CSE 3238


TA Office Hours




11:00-12:00
CSE 3232

Staff

  • Instructor: Alin Deutsch, deutsch at cs dot ucsd dot edu, office CSE 3238
  • TA: Rana Alotaibi, ralotaib at eng dot ucsd dot edu

Prerequisites

  • CSE 232A, or the instructor's permission, which is granted only if the following prerequisites are fulfilled:
  • Java (CSE 8B or CSE 11, or equivalent)
  • SQL (CSE 132A or equivalent)

Class Project

You may team up with at most one partner for the class project. Use the Piazza forum to advertise if you are looking for a partner.

Our class project is the construction of an XQuery processor. We consider a subset/modification of XML’s data model, XQuery, and XQuery’s type system, as described in this note. The processor receives an XQuery, parses it into an abstract tree representation, optimizes it, and finally executes the optimized plan.

  • Milestone 1 (Naïve Evaluation) [due in week 7]: A straightforward query execution engine receives the simplified XQuery and an input XML file, and evaluates the query using a recursive evaluation routine. Given an XQuery expression (path, concatenation, element creation, etc.) and a list of input nodes, the routine produces a list of output nodes. For the XQuery parser, we recommend the jjtree tool provided with the javacc (Java Compiler Compiler) software, available for download here. Provided with a grammar, jjtree generates a parser which automatically constructs abstract syntax trees of its input expressions.
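
The recursive evaluation routine described above could be sketched roughly as follows. This is only an illustration, not the grammar or semantics from the course note: the `Expr`, `ChildStep`, `Seq`, and `Concat` types are hypothetical stand-ins for whatever AST your jjtree grammar produces, and the sketch skips details such as duplicate elimination.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class NaiveEvalSketch {
    // Hypothetical AST node types (assumptions, not taken from the course note).
    interface Expr {}

    static final class ChildStep implements Expr {   // child step: /tag
        final String tag;
        ChildStep(String tag) { this.tag = tag; }
    }

    static final class Seq implements Expr {         // path composition: first/second
        final Expr first, second;
        Seq(Expr first, Expr second) { this.first = first; this.second = second; }
    }

    static final class Concat implements Expr {      // concatenation: left, right
        final Expr left, right;
        Concat(Expr left, Expr right) { this.left = left; this.right = right; }
    }

    // Recursive evaluation: an expression plus a list of input nodes yields a list of output nodes.
    static List<Node> eval(Expr e, List<Node> input) {
        List<Node> out = new ArrayList<>();
        if (e instanceof ChildStep) {
            String tag = ((ChildStep) e).tag;
            for (Node n : input) {
                NodeList kids = n.getChildNodes();
                for (int i = 0; i < kids.getLength(); i++) {
                    Node k = kids.item(i);
                    if (k.getNodeType() == Node.ELEMENT_NODE && k.getNodeName().equals(tag)) {
                        out.add(k);
                    }
                }
            }
        } else if (e instanceof Seq) {
            Seq s = (Seq) e;
            // Evaluate the first part, then feed its output to the second part.
            out = eval(s.second, eval(s.first, input));
        } else if (e instanceof Concat) {
            Concat c = (Concat) e;
            out.addAll(eval(c.left, input));
            out.addAll(eval(c.right, input));
        } else {
            throw new IllegalArgumentException("unsupported expression: " + e);
        }
        return out;
    }

    // Evaluates /PLAY/ACT/TITLE over a tiny in-memory document and returns the text of the results.
    static List<String> demo() {
        String xml = "<PLAY><ACT><TITLE>ACT I</TITLE></ACT><ACT><TITLE>ACT II</TITLE></ACT></PLAY>";
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Expr query = new Seq(new ChildStep("PLAY"),
                    new Seq(new ChildStep("ACT"), new ChildStep("TITLE")));
            List<Node> roots = new ArrayList<>();
            roots.add(doc);                           // start from the document node
            List<String> titles = new ArrayList<>();
            for (Node n : eval(query, roots)) {
                titles.add(n.getTextContent());
            }
            return titles;
        } catch (Exception ex) {
            throw new RuntimeException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());                   // prints [ACT I, ACT II]
    }
}
```

A real engine would add one case per expression kind in the note (element creation, FOR/LET/WHERE, and so on) in the same recursive style.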

  • Milestone 2 (Efficient Evaluation): Implement a join operator as defined in Section 7 of this note. Implement an algorithm which detects when the computation of the FOR and WHERE clauses can be implemented using the join operator. You may assume that the input XQueries to be optimized are in the simplified "Core" syntax given in the note. You do not need to normalize your queries to this form first.
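
As a rough illustration of what a join operator might look like, here is a generic hash-join sketch. It does not reproduce the definition in Section 7 of the note. It assumes, purely for illustration, that a tuple is a map from attribute names to string values and that the join condition is equality on two attribute lists of equal length. The names `join`, `key`, and `tuple` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class HashJoinSketch {
    // Joins two tuple lists on leftKeys[i] = rightKeys[i] (assumed semantics, see lead-in).
    static List<Map<String, String>> join(List<Map<String, String>> left,
                                          List<Map<String, String>> right,
                                          List<String> leftKeys,
                                          List<String> rightKeys) {
        // Build phase: index the right input by its join-key values.
        Map<List<String>, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> r : right) {
            index.computeIfAbsent(key(r, rightKeys), k -> new ArrayList<>()).add(r);
        }
        // Probe phase: look up each left tuple and merge it with every match.
        List<Map<String, String>> out = new ArrayList<>();
        for (Map<String, String> l : left) {
            List<Map<String, String>> matches = index.get(key(l, leftKeys));
            if (matches == null) {
                continue;
            }
            for (Map<String, String> r : matches) {
                Map<String, String> merged = new LinkedHashMap<>(l);
                merged.putAll(r);
                out.add(merged);
            }
        }
        return out;
    }

    // Extracts the join-key values of a tuple, in attribute order.
    static List<String> key(Map<String, String> tuple, List<String> attrs) {
        List<String> k = new ArrayList<>();
        for (String a : attrs) {
            k.add(tuple.get(a));
        }
        return k;
    }

    // Convenience constructor: tuple("a", "1", "x", "p") builds {a=1, x=p}.
    static Map<String, String> tuple(String... kv) {
        Map<String, String> t = new LinkedHashMap<>();
        for (int i = 0; i + 1 < kv.length; i += 2) {
            t.put(kv[i], kv[i + 1]);
        }
        return t;
    }

    public static void main(String[] args) {
        List<Map<String, String>> left = Arrays.asList(tuple("a", "1", "x", "p"), tuple("a", "2", "x", "q"));
        List<Map<String, String>> right = Arrays.asList(tuple("b", "2", "y", "r"), tuple("b", "3", "y", "s"));
        System.out.println(join(left, right, Arrays.asList("a"), Arrays.asList("b")));
    }
}
```

Compared with evaluating nested FOR loops and filtering with WHERE, which examines every pair of tuples, the hash-based version only touches pairs that agree on the join key.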


To access XML files, you can use the standard DOM interface. There are a number of XML DOM parser implementations. The Java distribution includes one (see documentation here). As an alternative, the Xerces-J project from Apache is quite mature and stable.

The W3C specification of DOM is here.
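
For orientation, a minimal sketch of loading XML through the DOM parser bundled with Java might look like the following. The class and method names (`DomLoadSketch`, `parse`, `countElements`) are made up for this example, and the input is an in-memory string so the sketch stays self-contained.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

public class DomLoadSketch {
    // Parses an XML string into a DOM Document using the JDK's built-in parser.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    // Counts all elements with the given tag name anywhere in the document.
    static int countElements(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    public static void main(String[] args) {
        Document doc = parse("<PLAY><ACT><SCENE/><SCENE/></ACT><ACT><SCENE/></ACT></PLAY>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints PLAY
        System.out.println(countElements(doc, "SCENE"));            // prints 3
    }
}
```

To read from disk instead, `DocumentBuilder` also offers `parse(File)`, so the same builder can be pointed at the downloaded XML file.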


Test cases for project Phase I. For the data, download Shakespeare's play, Julius Caesar, in XML form (the associated DTD is here). Queries can be found here.


Presentation Topics

Teams of two students each will give a 20-minute talk in class, presenting a research paper from the general field of XML-based data integration. Here are a few suggestions (you are welcome to propose new ones after consulting with me).




Presentation Schedule

 

Pick a slot here.


Time your presentation to last at most 20 minutes in total (each team member covers about half of this time). Do not exceed this time: we want to allow time for discussion, questions, and interruptions. Think of the presentations as catalysts for (hopefully intense and unruly) group discussions covering both the material and the critique of the presentation.

Grading

The project constitutes 80% of the final grade. The remaining 20% is earned for the presentation and class participation.

Reading List

Formal XPath Semantics note

A brief informal XQuery tutorial (much briefer and more readable than the W3C standard below)


Textbook material (from the warmly recommended textbook "Web Data Management"):

  • XQuery Advanced Topics slides.


The complete XQuery and XML Schema documentation (the WWW Consortium standards):