SOFTWARE ENGINEERING What is Software Engineering? - It's the systematic application of knowledge (scientific, mathematical, and rules of thumb) to fashion useful computer software artifacts - To be useful, something must work well, but not cost too much. This is what I call ``Making do with what you've got'' - These two conflicting desires require balancing tradeoffs + Prioritize requirements + Leave out or reduce requirements that cost too much - Not considering these issues carefully leads to serious problems, which we characterize as *failures* + Software is unreliable: modularize, measure, test + Users don't like functionality: elicit requirements, specify, prototype + New features are demanded that cannot be easily added: modularize - Engineering is a ``noble'' discipline, dedicated to making the world a better place for its inhabitants. + With this dedication come many responsibilities + In particular, every effort must be made to construct, first of all, safe artifacts + Secondarily, the issues of functionality and cost must be met + Unfortunately, the ineffable qualities of aesthetics are often forgotten, adding a hidden cost to these artifacts - Some common engineering principles + Systematicization of knowledge - rules of thumb, checklists, procedures, formulae - much of it domain-specific (e.g., compilers, air traffic control) + Use of tools to save labor + Teamwork - cooperating with other people to complete a large task + Design for separation of concerns, reuse (e.g., object-orientation) + Avoiding failure through planning (specify, design, and test) - In this class, I will try to emphasize these issues to help make you better engineers Success requires mastery of Why, What, When, How, Who + Specification: The ``What'' - Abstract description of a system to be used for analysis & communication + negotiate features + completeness, consistency + manager to programmers, etc. + choice of language is important and very flexible - Your specification are the handouts + Mixture of English, grammar rules, and error/type statements - Specifications, by their very nature are incomplete and ambiguous + keeps it short and simple, allows analysis + you have to fill in the blanks + Design and Implementation: The ``How'' - Choice of program structure, data structures, algorithms - In some sense, the next level of specification + materials and composition, not so much behavior - Used to effectively divide and incrementalize work, manage performance + Modules, classes, files, layers, subsystems + Testing: Checking that the ``How'' achieves the ``What'' - Ensuring that you've implemented what you thought you did - Any program description can be ``tested'' + only implementation can be tested by compiling and running it - Typically first test ``units'' then ``system'' + System errors are hard to debug unless sure components are right + Exponential number of system states; need to reduce search space - Can use domain structure to guide testing, minimize number of cases + In compilers, syntax-directed testing + For each piece of grammar, write a test case for each semantically distinguishable case - e.g., need cases for binary operators versus unary operators - for binary operators, test each operator, test with pos and neg vals + Testing is ``complete'' when have ``covered'' grammar and subcases - Also structural testing + boundary cases, statement coverage, branch coverage, off-by-one - Monkey testing: random or destructive inputs + Management: The ``Who'' and the ``When'' - Teamwork (See Web site) + Divide work in ways that minimize communication (simple interfaces) - class, file, subsystem boundaries; use .h files - unit test + Maintain relationships: create win-win situations - meet regularly, divide work according to skills and interests - Scheduling: plan the flight, fly the plan + Establish milestones to allow measuring progress (syntax directed) + Incrementally construct, always runnable Modularity: Most important design idea in software engineering - Designing a software system is hard because large and frequently changes - Several difficult problems in software addressed by modularity + large systems can only be built be several people working in parallel + understanding a big system is difficult + adding new features leads to changing of existing software + costly to resolve problems that have already been solved - All of these assisted by ``separation of concerns''. If a system can be divided into several relatively independent components, then + programmers can work on components in parallel without misunderstanding + system can be understood in parts, rather than all at once + changes are isolated to a few components + component can be reused independent of rest of system - So how do we design ``independent'' components? Basic idea is ``information hiding'' + identify a design decision that might change sometime in the future (due to performance, functionality, error, etc.) + a module is defined to hide this information inside it - all code directly relating to the decision is put inside the module - functions are defined that provide access to the code such that the functionality of the code it provided, but the design decision is hidden + interface is insensitive to change (i.e., design decision can change without requiring changes to interface) + interface is easy to understand + sometimes these can conflict + modules can be defined in C using .h files and prototypes, or in C++ using classes - The most typical design decision is how data is represented + is the dictionary implemented as a tree, list, array, or hash table? - client of the dictionary doesn't care: correct, fast function that is easy to use - designer cares about ease of implementation, flexibility, reuse + other decisions may be worth hiding: is sorted array sorted when created, or only sorted as elements are accessed? (heap allows incremental, array does not) - Example: symbol table module (class) + three possible implementations: list, array, hash table + if don't hide in module, hundreds of lines of code change when change from one implementation to another. // Vector implementation - requires searching and casting. class SymbolTable { Vector table; public void insert(SymbolObject ste) { table.add(ste); } public SymbolObject lookup(String id) { Iterator entries = table.iterator(); SymbolObject entry; while (entries.hasNext()) { entry = (SymbolObject) entries.next(); // check if id matches... } } ... } // Array implementation - requires searching. class SymbolTable { SymbolObject table[100]; next = 0; public void insert(SymbolObject ste) { table[next++] = ste; } public SymbolObject lookup(String id) { SymbolObject entry; for (int i = 0 ; i < 100 ; i++) { entry = table[i]; // check if id matches... } ... } // HashMap implementation - requires casting. class SymbolTable { HashMap table; public void insert(SymbolObject ste) { table.put(ste.id(), ste); } public SymbolObject lookup(String id) { return (SymbolObject) table.get(id) } ... } + Note that the decision of how the symbol table is represented and accessed has been replaced by a decision about an interface: insert, delete, openScope, closeScope, etc. Although it is not hard to change table internals, it is costly to change the interface. Suppose, for example, that insert was now to take a ``kind'' value (e.g., variable, procedure, type, etc.)? + Although SymbolTable is easy, interface design is hard in general - you don't know what will be easy for others to use - you don't always know how it will be used - you don't always know how it will be extended in the future General hints - Understand the specification - Start early - Make sure compiler is always runnable: extend in runnable increments - Test! You are responsible for testing your own compiler - Modularize wisely + consider separating actions from parsing + make sure symbol table is abstract + ``Program to the interface'' - Work cooperatively with partner: interfaces are basis for communication