(click your browser's refresh button to ensure that you have the most recent version)Note: To download and install Python version 2.6 on your home machines see this page. Remember that this is only to enable you to play with the assignment at home: The final version turned in must work on the ACS Linux machines. While you can use MacOS or Windows to begin working with Python, the code you turn in must be that required for the Linux environment.
University rules on integrity of scholarship will be strictly enforced. By completing this assignment, you implicitly agree to abide by the UCSD Policy on Integrity of Scholarship described beginning on page 68 of the Academic Regulations section (PDF) of the 2007-2008 General Catalog, in particular, "all academic work will be done by the student to whom it is assigned, without unauthorized aid of any kind."
You are expected to do your own work on this assignment; there are no group projects in this course. You may (and are encouraged to) engage in general discussions with your classmates regarding the assignment, but specific details of a solution, including the solution itself, must always be your own work. Incidents that violate the University's rules on integrity of scholarship will be taken seriously: In addition to receiving a zero (0) on the assignment, students may also face other penalties, up to and including, expulsion from the University. Should you have any doubt about the moral and/or ethical implications of an activity associated with the completion of this assignment, please see the instructors.
Code for all programming assignments should be well documented. A working program with no comments will receive only partial credit. Documentation entails providing documentation strings for all methods, classes, packages, etc., and comments throughout the code to explain the program logic. Comments in Python are preceded by # and extend to the end of the line. Documentation strings are strings in the first line of a function, method, etc., and are accessible using help(foo), where foo is the name of the method, class, etc. It is understood that some of the exercises in this programming assignment require extremely little code and will not require extensive comments.
While few programming assignments pretend to mimic the "real" world, they may, nevertheless, contain some of the ambiguity that exists outside the classroom. If, for example, an assignment is amenable to differing interpretations, such that more than one algorithm may implement a correct solution to the assignment, it is incumbent upon the programmer to document not only the functionality of the algorithm (and more broadly his/her interpretation of the program requirements), but to articulate clearly the reasoning behind a particular choice of solution.
Your solutions to this assignment will be stored in separate files under a directory called pa7_solution/, inside which you will place the files: ff.py and sg.py. These two files listed are the versions of the corresponding supplied files that you will have modified. There should be no other files in the directory.
After creating and populating the directory as described above, create a zip file called <LastName>_<FirstName>_pa7.zip by going into the directory pa7_solution and executing the UNIX shell command:
zip <LastName>_<FirstName>_pa7.zip ff.py sg.py
Once you've created the zip file with your solutions, you will use the turnin program to submit this file for grading by going into the directory pa7_solution/ and executing the UNIX shell command:
turnin -c cs130w -p pa7 <LastName>_<FirstName>_pa7.zip
The turnin program will provide you with a confirmation of the submission process; make sure that the size of the file indicated by turnin matches the size of your zip file. See the ACS Web page on turnin for more information on the operation of the program.
map (k1,v1) -> list(k2,v2) reduce (k2,list(v2)) -> v2Download the files ff.py, sg.py, huckleberry, and gulliver. You will complete the function definitions wherever you see raise Exception("not implemented"). Some additional books to play with.
The intermediate results for each of the stages will be saved in files output/map1-c, output/reduce1-c, output/map2-c, and output/reduce2-c, where c ranges over alphabetic characters.
% mkdir output
Here's how you can use C-style format strings and all their glory in Python:
>>> import ff
>>> ff.splitFile("huckleberry", 1000)
The concatenation of these chunks should be exactly equal to the original file. Here are some sanity checks you can run from the shell:
>>> "hello %s %.5d" % ("world", 10)
'hello world 00010'
% wc -l huckleberry
% wc -l output/huckleberry-*
% cat output/huckleberry-* > tmp
% diff huckleberry tmp
% rm tmp
>>> l = ff.map1(["huckleberry","56"], ["YOU don't know about me without you have read a book by the name of The"])
>>> for x in l:
... print x
(['you', 'dont', 'know', 'about', 'me'], ['1', 'huckleberry', '56'])
(['dont', 'know', 'about', 'me', 'without'], ['1', 'huckleberry', '56'])
(['know', 'about', 'me', 'without', 'you'], ['1', 'huckleberry', '56'])
(['about', 'me', 'without', 'you', 'have'], ['1', 'huckleberry', '56'])
(['me', 'without', 'you', 'have', 'read'], ['1', 'huckleberry', '56'])
(['without', 'you', 'have', 'read', 'a'], ['1', 'huckleberry', '56'])
(['you', 'have', 'read', 'a', 'book'], ['1', 'huckleberry', '56'])
(['have', 'read', 'a', 'book', 'by'], ['1', 'huckleberry', '56'])
(['read', 'a', 'book', 'by', 'the'], ['1', 'huckleberry', '56'])
(['a', 'book', 'by', 'the', 'name'], ['1', 'huckleberry', '56'])
(['book', 'by', 'the', 'name', 'of'], ['1', 'huckleberry', '56'])
(['by', 'the', 'name', 'of', 'the'], ['1', 'huckleberry', '56'])
>>> l = ff.map1(["blah","1"], ["one two three four five six seven 8ight 9ine 10en 11even twelve thirteen fourteen fifteen sixteen"])
>>> for x in l:
... print x
(['one', 'two', 'three', 'four', 'five'], ['1', 'blah', '1'])
(['two', 'three', 'four', 'five', 'six'], ['1', 'blah', '1'])
(['three', 'four', 'five', 'six', 'seven'], ['1', 'blah', '1'])
(['twelve', 'thirteen', 'fourteen', 'fifteen', 'sixteen'], ['1', 'blah', '1'])
>>> ff.reduce1(['away', 'in', 'the', 'night', 'and'], [['1', 'huckleberry', '2472'], ['1', 'huckleberry', '10928']])
['2', 'huckleberry', '2472']
>>> ff.map2(['cabin', 'again', 'on', 'bread', 'and'], ['1', 'huckleberry', '11154'])
[(['cabin'], ['again', 'on', 'bread', 'and', '1', 'huckleberry', '11154'])]
>>> ff.reduce2(['underneath'], [['i', 'didnt', 'know', 'how', '1', 'huckleberry', '202'], ['the', 'picture', 'it', 'said', '2', 'huckleberry', '3916']])
['the', 'picture', 'it', 'said', '2', 'huckleberry', '3916']
If there is no entry for a given word, mostCommonFragment returns None. If you ever need a fragment for a word but there is none, you should return None from genSentence. Otherwise, you should piece together the c fragments into a single string.
>>> import sg
>>> sg.runBothPhases(1000, ["huckleberry"])
['boy', 'says', 'the', 'old', '2', 'huckleberry', '3781']
>>> sg.genSentence("my", 1)
'my boy says the old'
>>> sg.genSentence("my", 2)
'my boy says the old rags and my sugar'
>>> sg.genSentence("my", 4)
'my boy says the old rags and my sugar there was and all the time and never'
>>> sg.genSentence("no", 4)
'no use to try to put in the time i didnt want to put in the time'
>>> sg.runBothPhases(1000, ["huckleberry", "gulliver"])
>>> sg.genSentence("my", 4)
'my body by the help me tow the raft nine logs fast together with the minute descriptions'