CIS 223
Assignment 5-S97
Simple Assembler Project

Reading: Complete all readings for Sections A through C in the Course Syllabus.

Topics and Concepts: This is a group project intended to emphasize group work techniques. The project focuses on the use of C++ classes and template classes and on the concepts of encapsulation, information hiding, limiting the need to know, and separation of concerns as discussed in class.

Project: Write a One-Pass Assembler for a subset of the FACET Assembler Language (for the FACET Computer) as described in the Anthology material (Sections 10 and 11). You are to handle only a subset of the opcodes in the language. Other limitations will be placed on the language as well in order to make this project doable within a month. THE PROGRAM FOR THE ASSEMBLER IS TO BE WRITTEN LARGELY IN C++ BUT USING STANDARD C I/O RATHER THAN C++ STREAM I/O.

You will be given an overview of the FACET computer in class but will be expected to sort out the rest of the details that you need from the material in the Anthology.

Topics and Concepts: Group work on a relatively large project -- including definition of group and individual responsibilities and the importance of regular communication; use of state transition diagrams to model the behavior of a program system; thorough and precise definitions of component interfaces; use of electronic communication media (e-mail, phone, ftp, etc.); data driven analysis, decomposition, and design of a larger project; separate component and integration testing; incremental programming (or programming by iterative enhancement). The focus will be on the precise definition of the steps required to translate a single instruction from FACET Assembly Language to FACET Machine Language.

I will provide you with some project milestones. We will attempt to solve this problem one step at a time. Considerable planning and a commitment to documentation and verbal and written communication will be required if you expect to complete this project in a timely manner. You will find it useful to first carefully analyze the problem specification (as given in the Anthology) and then work on the structural design of your Assembler. Only after this has been done should you even begin to think about the C++ implementation.

Phase 1: Problem Analysis (up-front work in the problem domain)

The material provided in class combined with that in the Anthology is expected to provide a fairly complete specification of the problem to be solved. Wherever the specifications are found to be incomplete or ambiguous, you will be expected to ask questions and get detailed answers from the project sponsor (client) -- that's me.

You will have to read and understand the specifications to the point where you are able to carry out a sample translation of a short program on paper. To do this, you will need to identify, very early in the game, the new data types you will want to use in the translation process. These types will help you build an extended platform of types which you can use in writing the Assembler. Among other things, you will want to design and implement classes that model the following 5 entities of the FACET Assembler world:

the input file, with its fixed form layout as described in the specifications;
the output file, as it must appear to be processed by the FACET simulator;
an error handler;
a symbol table;
an opcode table.

We will provide for you a hash table template class that models table management using hashing with chaining (that is, it contains the code for one possible implementation of hashing with chaining). The chains are implemented using link nodes, and the code for a template class for a link node is also provided. Finally, we have also provided the code for a sample user data node and a driver program to test the 2 template classes and the data node class.

You will want to use the hash table template (with link nodes) to define two new data types -- one to model a lookup table (the symbol table) of legal FACET symbols (identifiers); and the other to model a lookup table (the opcode table) of legal FACET opcodes. To do this, you will first have to define two simple data types, one to model a legal FACET symbol and the other to model a legal FACET opcode. These data types will be defined by classes (symbol_node and opcode_node, respectively) that are very similar to the hash_node in the code provided.
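
As a rough illustration of the idea (not the code distributed for the course), a chained hash table template and a symbol node might look like the sketch below. The names chain_table, link_node, and key(), the table size of 31, the hash function, and the fields of symbol_node are all assumptions made for this example; the provided template's actual interface may well differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical link node: one data item plus a pointer to the next node in the chain.
template <class T>
struct link_node {
    T data;
    link_node* next;
    link_node(const T& d, link_node* n) : data(d), next(n) {}
};

// Hypothetical chained hash table. T must supply key(), returning a C string.
template <class T, std::size_t SIZE = 31>
class chain_table {
public:
    chain_table() { for (std::size_t i = 0; i < SIZE; ++i) bucket[i] = nullptr; }
    ~chain_table() {
        for (std::size_t i = 0; i < SIZE; ++i)
            while (bucket[i]) {
                link_node<T>* dead = bucket[i];
                bucket[i] = dead->next;
                delete dead;
            }
    }
    chain_table(const chain_table&) = delete;            // copying disabled in this sketch
    chain_table& operator=(const chain_table&) = delete;

    // State-changing operation: add an item at the head of its chain.
    void insert(const T& item) {
        std::size_t h = hash(item.key());
        bucket[h] = new link_node<T>(item, bucket[h]);
    }
    // Query operation: pointer to the matching item, or nullptr if absent.
    T* find(const char* key) {
        for (link_node<T>* p = bucket[hash(key)]; p; p = p->next)
            if (std::strcmp(p->data.key(), key) == 0) return &p->data;
        return nullptr;
    }
private:
    // Simple multiplicative string hash (an assumption, chosen for brevity).
    static std::size_t hash(const char* s) {
        std::size_t h = 0;
        while (*s) h = h * 31 + static_cast<unsigned char>(*s++);
        return h % SIZE;
    }
    link_node<T>* bucket[SIZE];
};

// Hypothetical symbol node; the 6-character limit is the one quoted in this handout.
class symbol_node {
public:
    symbol_node(const char* n, int v, bool d) : value(v), defined(d) {
        assert(n != nullptr && std::strlen(n) <= 6);
        std::strncpy(name_, n, 6);
        name_[6] = '\0';
    }
    const char* key() const { return name_; }
    int value;      // numeric value associated with the symbol
    bool defined;   // has the symbol been defined yet?
private:
    char name_[7];  // up to 6 characters plus the terminating '\0'
};
```

With these definitions, the declaration chain_table<symbol_node> would give one possible symbol table type; an opcode table could be built the same way from an opcode_node class that also supplies key().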

The sample translation that you carry out by hand should clearly show that you understand how the objects of these classes are manipulated. As a result of this process, you should be able to produce the following information for each of the five classes listed above and any others you might decide to use:

A) A listing of the problem domain entities (at least the five entities listed above) that have to be manipulated as the Assembler does its work.
B) For each entity identified in A), a list of the attributes (characteristics) of and the operations (methods) on that entity. The operations will fit into one of two categories -- those that
- change the state of the entity (for example, changing the state of an entry in the symbol table from undefined to defined), and those that
- ask a question (query) about the state of an entity (for example, a query as to whether or not a symbol table entry is defined).
The attributes and operations together will provide a static model of your system, illustrating the structure of each problem domain entity and its relationship to other entities at any given point in time. Initially, our static model will be described mostly in English, at a fairly high level of abstraction. Eventually, we will translate this more abstract model into a C++ class using the data types and operations at our disposal in the language.
As an example of what is required here, consider the following (incomplete) description of the symbol table:
```
    Entity: Symbol Table
    Attributes:
	Name of a symbol (max 6 characters, must start with a letter, etc);
	Numeric value associated with the symbol;
	Number of times the symbol is referenced (used) in the program, and
	A pointer to a linked list in which each entry contains the number of
		the line of an Assembly Code instruction which references the
		symbol;
	Flag (Boolean value) to indicate if symbol is defined;
	etc.
    Operations:
	Initialize the table (part of the constructor?);
	Determine if a symbol is in the table or not (query operation);
	Store the value of a symbol in the table;
	Add an entry for a new symbol to the table;
	etc.
```
C) An informal state transition diagram providing a dynamic model of the states and related events for the translation of a single instruction. This diagram should illustrate the entire sequence of possible events that take place during the translation of an instruction.
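
Once a state transition diagram exists on paper, it can be carried into code almost mechanically as a pair of enumerations and a transition function. The sketch below shows the general shape only. The state names, the two events, and the order of the steps are placeholders invented for this example; they are not taken from the FACET specification, and your own diagram will almost certainly have more states and events.

```cpp
#include <cassert>

// Hypothetical states for translating one instruction.
enum State {
    READ_LINE, CHECK_LABEL, LOOKUP_OPCODE, EVAL_OPERAND,
    EMIT_CODE, REPORT_ERROR, DONE
};

// Hypothetical events: each step either succeeds or detects an error.
enum Event { STEP_OK, STEP_FAILED };

// Transition function: given the current state and the event that just
// occurred, return the next state.
State next_state(State s, Event e) {
    switch (s) {
        case READ_LINE:     return e == STEP_OK ? CHECK_LABEL   : REPORT_ERROR;
        case CHECK_LABEL:   return e == STEP_OK ? LOOKUP_OPCODE : REPORT_ERROR;
        case LOOKUP_OPCODE: return e == STEP_OK ? EVAL_OPERAND  : REPORT_ERROR;
        case EVAL_OPERAND:  return e == STEP_OK ? EMIT_CODE     : REPORT_ERROR;
        case EMIT_CODE:     return DONE;   // code written; instruction finished
        case REPORT_ERROR:  return DONE;   // diagnostic written; instruction finished
        case DONE:          return DONE;   // terminal state
    }
    return DONE;  // not reached; keeps compilers quiet
}

// Walk the machine from READ_LINE, feeding it the given events in order
// (STEP_OK is assumed once the list runs out), and report whether the
// EMIT_CODE state was visited before the machine reached DONE.
bool emits_code(const Event* events, int count) {
    assert(count >= 0);
    State s = READ_LINE;
    bool emitted = false;
    for (int i = 0; s != DONE; ++i) {
        if (s == EMIT_CODE) emitted = true;
        Event e = (i < count) ? events[i] : STEP_OK;
        s = next_state(s, e);
    }
    return emitted;
}
```

Keeping every transition in one function makes it easy to compare the code against the diagram arrow by arrow.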

To Be Turned In: The English lists as illustrated above and a state transition diagram.

Due: Within one week of the initial assignment of the project.

Phase 2: Assembler Design (work in the program domain)

In defining the primary components of your Assembler, the main focus should be on the data type models constructed as part of Phase 1. At this point your group should begin to think about partitioning the set of all of these components into 3 subsets, and allocating each subset to a member of the group. Your group needs to begin to map out a description of the responsibilities of each member.

The mapping of each entity model to a C++ class can now begin.

1. A model of each entity attribute should be constructed using either the built-in C++ data types or other user-defined data types.
2. Additional detail should be added to the description of each operation. The detail should include:
- the name of the operation;
- a brief description of the purpose of the operation;
- a list of arguments with a brief description of the purpose of each argument along with its IN, OUT, or INOUT designation.
3. A list of possible exceptions (special situations or errors that can occur for the objects of your type) and the actions to be taken for each. You may never actually handle all of the exceptions that you note, but every exception that comes to mind should be noted.
4. A list of any assumptions that you need to make along the way. In our case these assumptions will usually be of one of two kinds:
- an assumption related to a decision to defer handling certain features of the FACET language until another day;
- an assumption that involves a clarification of the Assembler process.
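
One way to keep this documentation next to the code it describes is a comment block above each function declaration. The sketch below is only an example of the layout. The operation check_symbol_name, the Status codes, and the running counter are hypothetical, and the two rules it checks are simply the ones quoted in the symbol table example in Phase 1.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical status codes for the operation documented below.
enum Status { NAME_OK, ERR_TOO_LONG, ERR_BAD_FIRST_CHAR };

// Operation:   check_symbol_name
// Purpose:     Decide whether a string is a legal symbol name, and keep a
//              running count of how many names have been checked.
// Arguments:
//   name       IN     candidate symbol name (null-terminated C string)
//   status     OUT    set to NAME_OK or to the code of the first rule violated
//   checked    INOUT  number of names examined so far; incremented by 1
// Returns:     true if the name is legal, false otherwise.
// Exceptions:  empty name or non-letter first character -> ERR_BAD_FIRST_CHAR;
//              more than 6 characters                   -> ERR_TOO_LONG.
// Assumptions: only the two rules above are checked; any further rules in the
//              specification are deferred.
bool check_symbol_name(const char* name, Status& status, int& checked) {
    assert(name != nullptr);
    ++checked;
    if (!std::isalpha(static_cast<unsigned char>(name[0])))
        status = ERR_BAD_FIRST_CHAR;
    else if (std::strlen(name) > 6)
        status = ERR_TOO_LONG;
    else
        status = NAME_OK;
    return status == NAME_OK;
}
```

In this layout IN arguments are passed by value or as pointers to const, while OUT and INOUT arguments are passed by reference, so the designation in the comment can be checked against the signature.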

NOTE: Chapter 12 (pp. 625-629) of the Friedman/Koffman text (First Edition) contains a small example illustrating how some of the above information might be documented. It is advisable to put the material that you develop for the Assembler on the computer so that it can be updated as needed and e-mailed to your group members and/or printed. All team members should try to adopt a similar format for describing their types.

To Be Turned In: Complete printouts of the documentation provided for items 1 - 4 listed above. This material should be on the computer.

Phase 3: Component Implementation and Testing

Each member of your team is responsible for implementing, compiling, and testing the classes designed in Phase 2. Each member of the team should test a class that someone else designed. I will not insist on this, but it might be a useful exercise. To test your class, you will need to write a special driver program to test the constructor and all other functions of the class. You should also update the lists from Phase 2 and star (use an asterisk, *) any changes in these lists.

NOTE: You should NOT try to code any class all at one time. Rather begin by coding a small "nucleus" of attributes and methods. Get these compiled and tested, and then incrementally add more attributes and methods and test these. Repeat this process until you have a complete class as described in your Phase 2 lists. You may even wish to complete Phase 4 (below) before adding additional features to your components.
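
As a sketch of what a first "nucleus" and its driver might look like, the example below codes only two attributes of an opcode node and tests them with Standard C I/O. The class layout, the 7-character mnemonic limit, the helper check, and the function run_driver are assumptions for illustration, and the mnemonics and codes used in the tests are made-up values rather than real FACET opcodes.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Nucleus of a hypothetical opcode_node: just a mnemonic and a numeric code.
// More attributes and methods would be added, and tested, in later increments.
class opcode_node {
public:
    opcode_node(const char* m, int c) : code_(c) {
        assert(m != nullptr);
        std::strncpy(mnemonic_, m, sizeof mnemonic_ - 1);  // truncate if too long
        mnemonic_[sizeof mnemonic_ - 1] = '\0';
    }
    const char* mnemonic() const { return mnemonic_; }     // query operation
    int code() const { return code_; }                     // query operation
private:
    char mnemonic_[8];   // assumed limit: 7 characters plus '\0'
    int code_;
};

// Print one test result; return 1 on failure so the caller can total failures.
static int check(const char* what, bool passed) {
    std::printf("%-30s %s\n", what, passed ? "PASS" : "FAIL");
    return passed ? 0 : 1;
}

// Driver body: exercise the constructor and every member function.
int run_driver() {
    int failures = 0;

    opcode_node op("XYZ", 7);              // made-up mnemonic and code
    failures += check("constructor stores mnemonic",
                      std::strcmp(op.mnemonic(), "XYZ") == 0);
    failures += check("constructor stores code", op.code() == 7);

    opcode_node longop("ABCDEFGHIJ", 1);   // deliberately too long
    failures += check("long mnemonic is truncated",
                      std::strlen(longop.mnemonic()) == 7);

    std::printf("%d failure(s)\n", failures);
    return failures;
}
```

A real driver program would call run_driver() from main and use the failure count as its exit status. Each time the class grows, add the matching checks to the driver before moving on.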

To Be Turned In: Your driver programs, the code for each class implemented, and your revised lists from Phase 2.

Phase 4: Integration Testing

Integrate all components developed in Phases 1-3 into a single program, and test this program. You should keep a log of ALL major problems encountered during integration. Turn in your final code, your log, and a reasonably informative trace of the execution of your program.

To Be Turned In: The output from your system integration test and your Problem Log. A "live" demonstration of your Assembler in action would be appreciated.

Notes:

I. The final output from your Assembler should consist of (in order):

a listing of the opcode table after its initial construction;
a listing of the source code lines read in; if there are any errors in a line, the number of the error and an English phrase indicating what is wrong should be listed directly beneath the instruction in error;
a listing of machine code lines generated;
a listing of any errors detected after the FACET Assembler program has been processed and printed;
a dump of the symbol table at the end of the Assembly process.

Finally -- the machine code file that you produce should be read into the FACET interpreter and processed and the output from this test should also be turned in.

II. You should have two levels of diagnostics: WARNING and FATAL.

III. The symbol and opcode tables will be generated as instances of the hash table template class described in the Anthology and stored away for your use on the CIS 223 board.

IV. The opcodes that you are responsible for handling are listed on pp. 53, 91, 93, and 94 of Section 11 of the Anthology. You can ignore the overflow indicator opcodes; you are not responsible for them.

V. First test your Assembler with a correct version of the program on p. 98 (Section 11). Then test it again with the same program but with numerous errors, carefully selected to completely exercise the error handling capability of your Assembler.
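
To make Notes I and II concrete, the sketch below shows one possible shape for an error handler that distinguishes the two diagnostic levels and writes the error number and an English phrase using Standard C I/O. The class name, the message layout, the error numbers used in any call, and the rule that a single FATAL diagnostic blocks machine code output are all assumptions made for this example; the handout does not specify them.

```cpp
#include <cassert>
#include <cstdio>

// The two diagnostic levels named in Note II.
enum Severity { WARNING, FATAL };

// Hypothetical error handler: counts diagnostics and prints each one so that
// it can appear directly beneath the source line in the listing.
class error_handler {
public:
    explicit error_handler(std::FILE* out)
        : out_(out), warnings_(0), fatals_(0) {
        assert(out != nullptr);
    }

    // State-changing operation: record and print one diagnostic.
    void report(Severity sev, int line, int number, const char* text) {
        if (sev == FATAL) ++fatals_; else ++warnings_;
        std::fprintf(out_, "    %s %d (line %d): %s\n",
                     sev == FATAL ? "FATAL" : "WARNING", number, line, text);
    }

    // Query operations.
    int warnings() const { return warnings_; }
    int fatals() const { return fatals_; }
    // Assumed policy: machine code is produced only if nothing FATAL occurred.
    bool can_emit_code() const { return fatals_ == 0; }

private:
    std::FILE* out_;   // listing file, or stdout during testing
    int warnings_;
    int fatals_;
};
```

A call such as report(WARNING, 3, 101, "example warning") would print one indented line to the listing; the totals kept by the handler could then feed the end-of-assembly error summary required in Note I.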