Chapter 13 We've talked about data files that contain documents (such as a word document), worksheets (excel document). This chapter discusses structured data files or database files - files that are organized in a uniform format. Examples: ¥ a file of employees for a company - name, address, job classification, hourly pay - for each employee. ¥ A telephone directory - name, address, phone number - for each entry. Tasks associated with maintaining and accessing the data in a data file are called data management. Each database file has a file structure that describes the way data are stored in the file. A database file is a collection of records. Each record contains the data for a particular entity - a person, place, or thing about which you want to store data. A record is subdivided into fields. The data contained in a field is called a data item. Example: Record for an employee entity: Field names data items Last name: Smith______ First name: William____ Address: ___________ Zip code: phone: Gender: SocNum JObCode DeptCode HourlyWage Exemptions Birthdate A field can be fixed-length or variable length. A fixed-length field has a maximum limit on the number of characters it can store. A data item with fewer characters is "padded" with blanks to fill the field. For example, if the last name field has a maximum length of 10, Smith would be stored as Smithbbbbb where each b is a blank. You can store last names with fewer than 10 characters but you cannot store names with more than 10 characters. A variable-length field expands to fit the data you store in it. Less common than fixed-length fields. Each field of a database file is assigned a data type which indicates the kind of data that can be stored in it. Numeric data types: real numbers and integers. Real numbers can contain decimal points, integers cannot. integer - exemptions real number - hourly wage Other data types: strings (last name, first name, address, zip code, etc. ) a string is a sequence of letter and/or digit characters. character data type- single characters - gender (F or M) - middle initial date data type - enables you to store and manipulate dates - 10/23/1996 is equivalent to October 23, 1996. Logical data type - used to store true/false or yes/no data. A memo data type or memo field - variable length text field in which you store a message or note. One reason we prefer fixed length fields is it gives us an exact count of the number of characters in a record. We can add up the characters in each field to come up with the record size (# of characters in a record). The record type is a general description of a record which includes its field names, data types, and field lengths. A record occurrence is the data stored for a particular entity in the database. A flat file is a data file in which all the records are the same record type. A database can be a collection of flat files, each containing records of different types. In a flat file, you only can access information stored in the records of that file. In a database, you can manipulate data items in one file with data items from another file. The individual files can be consolidated or related so that they can be used as one unit - the database. Example: Records in an employee file contains hourly wage rate and social security numbers of employees. Records in a time card file contains pay period, hours worked, and employee social security numbers. If these are two files in a database, we can form a relationship between records in different files based on items they have in common. For example, we can form a relationship between records in each file that have the same social security number. We can create an expanded record for each employee that contains the hours worked and pay period as two additional data items. Weekly pay is hourlywage * hours worked. Such a data base is called a relational data base. A data structure diagram shows the relationships between record types. Three kinds of relationships: one-to-one, one-to-many, and many-to-many. one-to-one means there is one record of one type corresponding to a record of another type. For example, each employee record corresponds to one record in a file of social security records. __________ __________ __________ | | | | | | __________ __________ __________ | | \ | / | 1 to 1 | 1 to many | many-to-many | / | \ / | \ ___________ ___________ __________ | | | | | | ___________ __________ ___________ one-to-many, a record of one type corresponds to many records of another type. each employee record corresponds to many time card records (one per pay period for an employee). each job record (description of a particular job entity) corresponds to many employees (many employees have same kind of job). many-to-many - a record of one type can correspond to many records of antoher type and vice-versa. a record of type department may correspond to many records of type job (i.e., many different kinds of jobs are in a department). Also, a particular job record may correspond to many different department records in a company (e.g.., many departments have a need for secretaries). We will consider a data base that is used to store information about a hospital. There is a file of employees - contains employee records: file of jobs - each job record has a job description file of departments - each department record has a list of its staff File Employee Lastname Firstname Socnum Job code pay rate Dept code File Timecard Pay period Hours Socnum File Job File Department Job title Job code pay rate Dept name Dept code Office Phone Four kinds of data base models: hierarchical data base: a parent record can be linked to multiple child records, but a child record can have only one parent. The relationships between records are established by creating physical links between a parent record and its children records (1-to-1 and one-to-many relationships). The links are established when the database is formed. Inflexible - cannot change the database because links are built in. Cannot define new relationships between records. department /|\ /|\ employees jobs /|\ time cards Network model similar to hierarchical model. Physical links connect records. Links cannot be changes. Unlike a hierarchical model, a child record (called a member) can have multiple parents (called owners). department jobs /|\ /|\ employee /|\ time cards Relational data base: Links are not preestablished. Each file is preceived as a table with the fields being columns (called attributes). Each row (called a tuple) is a record of the file. If two files have a column in common, their records can be related and data from the two tables can be combined. Table Employee Table Timecard Lastname Firstname Socnum Pay rate Dept code Pay period Hours Socnum Ang Susan 1234 10.00 Medrec 5/26/96 20 1234 Roberts Joe 3456 20.00 Emerg 5/26/96 40 3456 Jones Sam 5122 15.00 Admit 6/2/96 30 1234 Relation formed between Time card table and employee table yields a Combined table (1 record corresponding to each time card record) last First Socnum rate dept code pay period hours Ang Susan 1234 10.00 Medrec 5/26/96 20 Roberts Joe 3456 20.00 Emerg 5/26/96 40 Ang Susan 1234 10.00 Medrec 6/2/96 30 Can use this to figure the pay for each employee for each pay period - record 1 - pay is 10 x $20 or $200. Object-oriented database - organizes data into a hierarchical class of objects defined by attributes. Objects can be manipulated using methods. Objects that share common characteristics are grouped into a class. Each box represents a class. Employee Classes Hourly Employee Salaried Employee Classes Nurse Clerk Manager Physician Objects Employees have attributes like name, address, birthdate, socnum, job code Salaried emploee have additional attributes: Annualsalary Hourly employee have additional attribute : HourlyWage A nurse is a particular object of the class Hourly employee Kinds of Database software: 1. Custom software Software that is developed from scratch to manage your files. Too expensive and time-consuming. Not practical today. 2. File management software: Software that helps you create file records and manage those records. It can only deal with one file at a time and cannot recognize relationships between different files it helps you create. Less expensive that a data base management system. Functions: 1. Define the file structure 2. Enter and edit records 3. Sort records (rearrange their order). 4. search for records that satisfy a condition (all recordds for employees who have job code RN2, for example). (answer a query) 5. Define and Print reports. 3. Database Management system (DBMS) Application software that helps you manage the data in more than one file at a time. Has same functions as file management system but lets you define relationships among files. It also provides a simplified programming language that lets you write your own custom programs to perform tasks that are not "built in". Object Oriented data base management system Use special software that enables you to define classes, define objects, and define special methods. Some general methods like searching and sorting are already built in. Setting up a Database 1. Design the file structure Determine what data needs to be collected. List the information available as well as additonal information that you need to produce screen output or printed reports. Each piece of information is a candidate for a data field. Video store Example: primary customer name, address, phone, credit card #, other family members authorized to use account, list of movies rented, status of each movie - returned/not-returned, date rented, price of each rental. Next decide on the data type for each field - character, numeric, date, logical, memo. Next select format for each field. For example, a numeric field containing a rental price could have a field $9.99. A numeric field storing weekly pay rate may have a format such as $9,999.99. If you type in 345.67 it appears as as $ 345.67. If you use the format XX/XX/XX for a date and type in 102696, it appears as 10/26/96. For unformatted fields, you specify the field length (number of characters). For numeric fields you can specify a range of allowable values and system will perform a range check on each value entered. Next, you want to decide how to group the fields into file records. For our video shop example, it would be inefficient to group the movie info and customer info into the same file. For each file, determine whether the records have a unique identifier or key field. For files that need to be related, make sure they share a field. Eliminate redundant information. Customer file - primary key is phone number Phone number Primary user name Family member 1 Family member 2 Family member3 address credit card #, expiration date, account balance File of Active Rentals - primary key is movie id movie id, Phone number, date of rental Required Operations: Enter a new record in the active rental file each time a video is rented. Delete a record from the file of rentals when it is returned. Movie File - primary key is movie id movie id, title, in/out, location in store, rental category, date of last rental, count of total rentals, count of rentals in current period, vendor id Note- date of rental appears in both the movied file and the file of active rentals - delete it from file of active rentals. Required Operations - When a movie is rented, change in/out status to out, enter today's date as date of last rental, increment counts, retrieve rental category so cost of rental can be computed. When a movie is returned, change in/out status to in. Check rental category to determine if it is overdue. If so, update account balance by proper amount. Vendor file vendor id, phone number, address, contact name Rental category file - primary key is rental category - small file rental category price days of rental Possible tradeoff - put price and days of rental in each movie record. Extra field for each movie record would be required. Save space this way. Easier to update - just change one entry if price changes. Entering Records: When you specify the database structure, you create a blank form that will hold data. You can then enter each record as prompted by the system. The records are numbered in sequence as you enter them. Updating or editing records - you can usually change a record or delete it. Specify the record number and then type in your changes. Or enter a query - search request - and the computer will find the record and allow you to modify it. Searching the database - done by queries. In customre file, Retrieve the record with phone number __________ . You can see if a particular person is listed as an eligible borrower under that account. In rental file, retrieve all records for phone number ___________ . Lists the active rentals for a particular borrower. Is a particular movie available for rental? In movie file, retrieve all records for that title. See whether in/out status is out for all of them. Global update for a file in a database. At end of each period, you will want to reset the count of rentals in the current period to 0. Reorganizing records - You can select a field to act as a sort key to rearrange the records in sequence by the data in the sort key. If the records are physically moved, this is called a sort. Sorting can be a time consuming process. You can also create one or more smaller index files, each of which allows the user to view the records in a different sequence. The index file stores a record # sequence for a particular arrangement of data. As an example, lets say we have the following data in a movie file whose records are in order by movie id number: record # movie id title # rentals per period total rentals 1 1112 30 100 2 1113 50 50 3 1114 20 250 4 1115 35 70 The records in a file are numbered sequentially. Index file for movie file indexed by movie id (same as record number sequence): movie id Record number 1112 1 1113 2 1114 3 1115 4 Index file for movie file indexed by rental count for current period # of rentals record # 50 2 35 4 30 1 20 3 Index file for movie file idexed by total rental count 250 3 100 1 70 4 50 2 To list the records as ordered in an indexed file, DBMS simply pulls them out and displays them in sequence by their position in the indexed file, not their actual sequence in the file. Designing Effective Output A database report is the formatted output of some or all of the data from one or more files of a database. The DBMS report generator helps you create a report template which specifies the report title, the column headings, which field to display under each column, which fields to subtotal, and the report format. Once the template is designed, you can produce a printed or displayed report at any time. The report lists the actual file data at the time you generate it. For example, for our hospital database, you can specify an employee report with the format: page # today's date Temple University Hospital Employees Last Name First name Social security number Gender Job code Different kinds of reports: Detail report: Lists transactions in the sequence that they occur. For example, a report showing all movie rentals (date of rental, movie id, customer id) for a given period would be a transaction report. A list of all the customers and account transactions for a bank teller would be a transaction report. Generally fairly long and not particularly useful. Used to trace down an error or discrepancy. Summary Report: condensed version of a detail report - usually produced for a manager to aid in making decisions. For bank transactions - might summarize total number of customers processed by a teller during each hour of the day, total amount of deposits entered, total amount of witdrawals. Control Break report: a summary report which shows subtotals by category. For example, for a hospital employees report, the employees might be grouped by department. For each department, the total hours worked and gross pay could be subtotaled. There would also be a final total for the whole hospital. Admitting Depatment Last Name First Name Hours worked Wages Subtotal: ____________ _______ Emergency Department Last Name First Name Hours worked Wages Subtotal ___________ ________ Final Total ___________ ________ Exception report: Lists records that contain data outside of the normal ranges or limits. For example, all teller transactions over $1000 during the day. Or all movie rental customers whose credit cards will expire in the next 30 days, or who have a negative balance in excess of $50. The purpose of an exception report is to point out where an action needs to be taken - send out notices to update credit card expiration date or to initiate collection of money owed. Queries Queries are often formed as simple conditions (value True or False) using symbols like >, >=, <, <=, =, and <>. For example, to locate all employees who earn more than 10.00 an hour: Find Hour wage > 10.00. To locate all employees who are turning 65 this year: FIND birth year = 1931. The DBMS displays the records of employees for which the search condition is True. Sometimes more complex criteria are needed. You might want to find all employees who are in the admitting department who are female: FIND department = 'admitting' AND gender = 'F' The Boolean operator AND is true if both conditions it combines are true. There is another operator OR which is more inclusive. It is true if either condition it combines is true. FIND birth year <= 1931 OR hours worked < 20 Displays records of employees who are 65 or who work less than 20 hours per week. FIND hours >= 10 AND hours <= 20 - Displays employees who work between 10 hours and 20 hours a week. FIND (hours >= 10 AND hours <= 20) OR hour wages < 8.00 - Displays employees who work between 10 and 20 hours. Also displays employees who earn less than 8.00 an hour. A third Boolean operator NOT is also used. Not is used to indicate that a condition should be False in order for a record to be displayed. FIND NOT (department = 'emergency') AND birth year <= 1931 Displays employees who are 65 or over and in any department except emergency. How about FIND NOT (department = 'emergency' AND birth year <= 1931) Finds employees who do not satisfy the criteria - in emergency and 65 or over -- employees who are in any department except emergency or who are younger than 65. Elliot Koffman, Professor CIS Department Temple University