Data storage is considered by some to be the heart of an information system. First, the data have to be available when the user wants to use them. Second, the data must be accurate and consistent (they must possess integrity). Beyond these requirements, the objectives of database design include efficient storage of data as well as efficient updating and retrieval. Finally, it is necessary that information retrieval be purposeful. The information obtained from the stored data must be in a form useful for managing, planning, controlling, or decision making.
There are two approaches to the storage of data in a computer-based system. The first is to store the data in individual files, each unique to a particular application. The second approach involves building a database. A database is a formally defined and centrally controlled store of data intended for use in many different applications.
Individual files are often designed only with immediate needs in mind. When it later becomes important to query the system for a combination of attributes, those attributes may be contained in separate files or may not even exist. Databases are planned so that data are organized for efficient storage and effective retrieval. Data warehouses are very large databases that store summarized data relating to a specific subject so that queries are answered very efficiently.
How to store data is often an important decision in the design of an information system. The choice is between storing data in individual files, one file for each application, and developing a database that can be shared by many users for a variety of applications as the need arises.
An understanding of data storage requires a grasp of three realms: reality, data, and metadata. An entity is any object or event for which we are willing to collect and store data. Attributes are the actual characteristics of these entities. Data items can have values and can be organized into records that can be accessed by a key. Metadata describe the data and can contain restrictions about the value of a data item (such as numeric only).
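The relationship among records, keys, and metadata can be illustrated with a minimal Python sketch. The `Customer` entity, its field names, and the rules in `CUSTOMER_METADATA` are all hypothetical and chosen only for illustration; a real system would normally keep such definitions in a data dictionary or in the database schema.

```python
from dataclasses import dataclass

# Metadata: describes each data item (type and restrictions), not the values.
# The field names and rules below are hypothetical.
CUSTOMER_METADATA = {
    "customer_number": {"type": str, "numeric_only": True, "max_length": 6},
    "name": {"type": str, "numeric_only": False, "max_length": 30},
}


@dataclass
class Customer:
    """One record describing a customer entity; each field is an attribute."""
    customer_number: str
    name: str


def validate(record, metadata):
    """Return a list of metadata violations for one record (empty if valid)."""
    problems = []
    for field_name, rules in metadata.items():
        value = getattr(record, field_name)
        if not isinstance(value, rules["type"]):
            problems.append(f"{field_name}: wrong type")
            continue
        if len(value) > rules["max_length"]:
            problems.append(f"{field_name}: too long")
        if rules["numeric_only"] and not value.isdigit():
            problems.append(f"{field_name}: must be numeric only")
    return problems


# Valid records are stored so that they can be accessed by key.
customers_by_key = {}
for rec in (Customer("001234", "Ada Lopez"), Customer("001235", "Ben Ito")):
    if not validate(rec, CUSTOMER_METADATA):
        customers_by_key[rec.customer_number] = rec

# A record that breaks the "numeric only" restriction is rejected.
bad_problems = validate(Customer("12A4", "Cy Orr"), CUSTOMER_METADATA)
```

Here the dictionary key plays the role of the record key, and the metadata is kept apart from the data values it constrains.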
Examples of conventional files include master files, table files, transaction files, work files, and report files. Databases typically are constructed with a relational structure. Legacy systems can have hierarchical or network structures, however.
Normalization is the process that takes user views and transforms them into less complex structures called normalized relations. There are three steps in the normalization process. First, all repeating groups are removed. Second, all partial dependencies are removed. Finally, the transitive dependencies are taken out. After these three steps are completed, the result is a set of relations in third normal form (3NF). The entity-relationship diagram may be used to determine the keys required for a record or a database relation. The three guidelines to follow when designing master tables or database relations are that
- each separate data entity should create a master table (do not combine two distinct entities within one table);
- a specific data field should exist only on one master table; and
- each master table or database relation should have programs to Create, Read, Update, and Delete.
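The three normalization steps described above can be sketched on a small example. The order data, the attribute names, and the choice of plain Python dictionaries in place of database tables are all assumptions made for illustration; this is one possible decomposition, not a prescribed one.

```python
# Hypothetical user view: one entry per order, with a repeating group of items.
user_view = [
    {"order_id": 1, "customer_id": "C1", "customer_name": "Ada Lopez",
     "items": [("P1", "Widget", 2), ("P2", "Gasket", 5)]},
    {"order_id": 2, "customer_id": "C1", "customer_name": "Ada Lopez",
     "items": [("P1", "Widget", 1)]},
]

# Step 1 (1NF): remove the repeating group by emitting one flat row per item.
# The key of the flat relation is (order_id, product_id).
first_nf = [
    {"order_id": o["order_id"], "customer_id": o["customer_id"],
     "customer_name": o["customer_name"],
     "product_id": pid, "product_name": pname, "quantity": qty}
    for o in user_view
    for pid, pname, qty in o["items"]
]

# Step 2 (2NF): remove partial dependencies, that is, attributes that depend
# on only part of the key. product_name depends on product_id alone, and the
# customer attributes depend on order_id alone.
order_lines = {(r["order_id"], r["product_id"]): r["quantity"] for r in first_nf}
products = {r["product_id"]: r["product_name"] for r in first_nf}
orders_2nf = {r["order_id"]: (r["customer_id"], r["customer_name"])
              for r in first_nf}

# Step 3 (3NF): remove the transitive dependency. customer_name depends on
# customer_id, which is itself a non-key attribute of the order relation.
orders = {oid: cid for oid, (cid, _name) in orders_2nf.items()}
customers = {cid: name for cid, name in orders_2nf.values()}
```

After the third step each fact is stored once: the customer's name appears only in `customers`, and each product name only in `products`, which matches the guideline that a specific data field should exist on only one master table.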
The process of retrieving data may involve as many as eight steps:
- choosing a relation,
- joining two relations together,
- projecting (choosing) columns,
- selecting relevant rows,
- deriving new attributes,
- sorting or indexing rows,
- calculating totals and performance measures, and finally
- presenting the results to the user.
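The eight retrieval steps listed above can be traced in a short sketch that uses Python's standard `sqlite3` module with an in-memory database. The `customer` and `sale` tables, their columns, and the sample rows are hypothetical; the SQL comments mark which step each clause corresponds to.

```python
import sqlite3

# Build a small in-memory database with two hypothetical relations.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, customer_id TEXT,
                       quantity INTEGER, unit_price REAL);
    INSERT INTO customer VALUES ('C1', 'Ada Lopez'), ('C2', 'Ben Ito');
    INSERT INTO sale VALUES (1, 'C1', 2, 10.0), (2, 'C2', 1, 99.0),
                            (3, 'C1', 5, 4.0), (4, 'C2', 3, 1.0);
""")

query = """
    SELECT c.name,                                        -- project columns
           s.quantity * s.unit_price AS line_total        -- derive a new attribute
    FROM sale AS s                                        -- choose a relation
    JOIN customer AS c ON c.customer_id = s.customer_id   -- join two relations
    WHERE s.quantity * s.unit_price >= 20                 -- select relevant rows
    ORDER BY line_total DESC                              -- sort rows
"""
rows = conn.execute(query).fetchall()

# Calculate a total over the retrieved rows.
grand_total = sum(line_total for _name, line_total in rows)

# Present the results to the user.
for name, line_total in rows:
    print(f"{name:<12}{line_total:>8.2f}")
print(f"{'Total':<12}{grand_total:>8.2f}")

conn.close()
```

Not every query needs all eight steps; a simple lookup, for instance, might involve only choosing a relation, selecting rows, and presenting the result.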
Denormalization is a process that takes the logical data model and transforms it into a physical model that is efficient for tasks that are most needed. Data warehouses differ from traditional databases in many ways; one is that they store denormalized data, which is organized around subjects. Data warehouses allow easy access via data mining software, called siftware, which searches for patterns and identifies relationships not imagined by human decision makers.
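As a rough illustration of denormalization, the sketch below copies customer attributes into every sales row so that a frequent query can be answered without a join. The relations, attribute names, and the "sales by region" query are hypothetical, and plain Python structures stand in for database tables.

```python
# Normalized relations (hypothetical data): customer facts are stored once.
customers = {
    "C1": {"name": "Ada Lopez", "region": "West"},
    "C2": {"name": "Ben Ito", "region": "East"},
}
sales = [
    {"sale_id": 1, "customer_id": "C1", "amount": 20.0},
    {"sale_id": 2, "customer_id": "C2", "amount": 99.0},
    {"sale_id": 3, "customer_id": "C1", "amount": 20.0},
]

# Denormalize: merge the customer attributes into every sale row, so the
# most frequent query needs no join at read time.
sales_wide = [{**sale, **customers[sale["customer_id"]]} for sale in sales]

# The frequent query, answered from the single wide table.
totals_by_region = {}
for row in sales_wide:
    region = row["region"]
    totals_by_region[region] = totals_by_region.get(region, 0.0) + row["amount"]
```

The cost of this design is redundancy: a customer's name and region are now repeated on every sale row, so any change to them must be applied in several places.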
Data mining involves using a database for more selective targeting of customers. Assuming that past behavior is a good predictor for future purchases, companies collect data about a person from past credit card purchases, driver’s license applications, warranty cards, and so on. Data mining can be powerful, but it may be costly and it needs to be coordinated. In addition, it may infringe on consumer privacy or even a person’s civil rights.
Once you have mastered the material in this chapter you will be able to:
- Understand database concepts.
- Use normalization to efficiently store data in a database.
- Use databases for presenting data.
- Understand the concept of data warehouses.
- Comprehend the usefulness of publishing databases to the Web.