A database is a collection of data (or facts) that are logically organized and can easily be searched or manipulated. The term "database" nearly always refers to such a collection in electronic form, which is stored on and can be searched by computer. A database is in essence one or multiple computer files that encode data in a highly structured format.
A database management system (DBMS) is the category of computer software programs used for creating, organizing, retrieving, analyzing, and sorting information in computer-based databases. Such software is often informally referred to as "database software." The database and the DBMS, however, are distinct, just as a text document is distinct from the word processing program used to create and modify it.
Companies have many uses for accurate, comprehensive databases. Firms commonly maintain databases of clients, vendors, employees, inventory, supplies, product orders, and service requests, among other things. A database system that can handle all the relevant attributes of a type of data and provide the desired methods for analyzing the data is an essential management tool for all but the smallest of businesses.
Technically, there are forms of databases that predate the computer age. During the first half of the 20th century, companies kept large numeric databases on punched cards, and the data was retrieved and sorted by mechanical tabulating equipment. However, the term "database"—sometimes written as two words—did not come into usage until the 1960s, and today is only used to refer to computer databases.
Computer databases, in turn, predate true DBMS, because in the 1960s most computers stored data sequentially on magnetic tape. This precluded quick access to data, which requires random access, as is possible with a spinning computer disk. Early database systems, which were developed for mainframe computers, could handle only a single data file and were oriented toward specific data-processing functions. For example, one DBMS would be used for maintaining accounting records and an entirely different DBMS would be used for tracking inventory. Companies and researchers began experimenting with systems that would process transactions as they occurred rather than in daily batches. Later, DBMS were developed that could handle multiple functions and different files, usually in a "hierarchical" database structure. Unfortunately, these databases, which made up the bulk of the commercial market through the 1970s, were difficult to redesign and usually tedious to navigate. The development of the "relational" model largely alleviated these shortcomings. This model, developed throughout the 1970s, based its programs on "abstract" models of the data that were independent of the specific database design, so programs were largely unaffected when a database was redesigned. It also offered users a far less convoluted and cumbersome way to navigate the data.
Instead of being designed for a specific industry or task, subsequent generations of DBMS have offered more flexibility and customization, including the ability to perform additional programming. DBMS tailored to specific applications—such as scientific data, text retrieval, image data, spatial and geographical information, and many others—continue to abound. Some of the more common general-purpose DBMS for the large and complex data processing needs of corporations include Oracle, INFORMIX, Microsoft, Ingres, and Sybase SQL-Server, all of which run on UNIX-based workstations and minicomputers, and DB2 for IBM mainframe computers.
DBMS became commercially available for the personal computer in 1981 with the introduction of dBASE II, a program originally developed in 1976 based on the relational model. This and subsequent versions, dBASE III and dBASE IV, were the most popular DBMS for personal computers in the 1980s. By the 1990s, numerous DBMS for DOS, Windows, and the Macintosh operating systems were competing on the market, but none dominated. Some of the more common programs were Paradox, FoxBASE and its successor FoxPro, Microsoft Access, FileMaker Pro, DataEase, and Lotus Approach. These software packages typically cost several hundred dollars.
Units of data within a database are generally called "records." Each record is unique and is further broken down into a limited number of "fields," which describe attributes of the record. For example, in an employee database, a record exists for each employee, and the fields within each record may designate each employee's name, title, salary, date of hire, telephone extension, supervisor's name, and so forth. The fields may or may not be unique to the record, but at least one must be unique for the record to be unique. The fields may contain fixed or variable information, and they may contain either text or numbers. Figures in value- and date-type fields can be used for computations when the DBMS is used to analyze the data. Fields can even contain pictures, video clips, or sound if the DBMS and the computer hardware are capable of handling such multimedia data. Records with the same set of field classifications are usually kept within one file. In a business database, sets of records often exist both for concrete things, such as clients or vendors, and for activities, such as orders, payments, and production statistics.
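The record-and-field structure described above can be sketched in a few lines of Python. This is only an illustration: the class name `EmployeeRecord`, the field names, and the sample employees are all hypothetical, and a real DBMS would store the records in a file rather than in a list in memory.

```python
from dataclasses import dataclass, fields
from datetime import date


@dataclass
class EmployeeRecord:
    """One record; each attribute below is one field of that record."""
    employee_id: int      # the unique field that makes each record unique
    name: str
    title: str
    salary: float         # value-type field, usable in computations
    date_of_hire: date    # date-type field, usable in computations
    extension: str
    supervisor: str       # not unique: several employees may share a supervisor


# Records with the same set of fields are kept together (here, one list).
employees = [
    EmployeeRecord(1, "Ana Ruiz", "Analyst", 52000.0, date(1995, 3, 1), "x204", "Lee Chan"),
    EmployeeRecord(2, "Sam Ortiz", "Analyst", 48000.0, date(1997, 9, 15), "x117", "Lee Chan"),
]

field_names = [f.name for f in fields(EmployeeRecord)]

# Computations on value- and date-type fields.
total_payroll = sum(e.salary for e in employees)
as_of = date(1999, 3, 1)  # hypothetical reporting date
years_of_service = (as_of - employees[0].date_of_hire).days // 365
```

Here `employee_id` plays the role of the unique field, while `title` and `supervisor` repeat across records without making the records themselves ambiguous.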
While many DBMS are specially tailored to specific industry applications and can only be customized by a programmer, often DBMS users have the ability to design at least some attributes of the fields or records and specify how fields, records, and files relate to each other. In companies or organizations that maintain complex databases, a database designer or database administrator position is generally created specifically for this task. Designing databases is also a major activity of computer consulting services.
Once a database is designed, records are created through data entry, which may be manual or computer-assisted, for example through bar-code scanning. Records can be added, deleted, or modified immediately, that is, nearly in real time. Alternatively, when large volumes of records require modification, they can be updated by a computer operating in batch mode at a specified time after multiple users have entered their requests for changes. Such batch-mode processing typically takes place after the business day is over in order to record the day's sales or shipments.
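The difference between immediate and batch-mode updating can be illustrated with a small Python sketch. The inventory figures and the function names `apply_change`, `queue_change`, and `run_batch` are hypothetical; the sketch only shows the order in which changes take effect, not how any particular DBMS implements batch processing.

```python
# Hypothetical stock levels, keyed by product name.
inventory = {"widget": 100, "gadget": 40}

# Change requests entered by users during the day, held until the batch run.
pending = []


def apply_change(stock, product, delta):
    """Apply a single change immediately (the near-real-time case)."""
    stock[product] = stock.get(product, 0) + delta


def queue_change(product, delta):
    """Record a change request without touching the stock levels yet."""
    pending.append((product, delta))


def run_batch(stock, requests):
    """Apply every queued request in one pass, then empty the queue."""
    applied = 0
    for product, delta in requests:
        apply_change(stock, product, delta)
        applied += 1
    requests.clear()
    return applied


apply_change(inventory, "widget", -3)   # takes effect at once
queue_change("gadget", -5)              # the next three wait for the batch run
queue_change("gadget", -2)
queue_change("widget", 20)

before_batch = dict(inventory)          # snapshot: queued changes not yet visible
applied = run_batch(inventory, pending)  # e.g. after the business day is over
```

Until `run_batch` is called, the queued requests are invisible to anyone reading `inventory`, which is the trade-off batch processing accepts in exchange for handling large volumes in one pass.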
While all databases include records and fields in one form or another, DBMS vary in how they treat the relationships between records and files. The two best-known categories of DBMS structures, or models, are flat-file and relational. Flat-file systems treat a file as a two-dimensional table, with a row for each record and a column for each field, and they are limited in their ability to analyze data from more than one file. Some of the simpler flat-file programs, usually called "file managers" instead of DBMS, can only open and analyze records in one file at a time. Relational DBMS, on the other hand, can analyze data from multiple files with complete flexibility of relationship between records of the multiple files. Other DBMS models can relate data in more than one file, but only in restricted relationships. Hierarchical DBMS relate records from different files in a one-way, many-to-one tree structure with a number of levels, in which each "child" record has only one "parent," and variables are restricted accordingly. Another type, network DBMS, can relate records bidirectionally. The ability to relate records reduces the redundancy of data and makes it unnecessary to update multiple records when data in a single related record changes. A typical example of a relationship between data in two files is the linking of a record in a purchase order file to a record in the customer file based on a single unique field, such as a customer identification number.
The latest model is the object-oriented database, in which units of data are treated as abstract objects. Thus, the operations and functions are not dependent on the database application. A particular advantage of the object-oriented database is its ability to create new objects in terms of previously defined objects. This model is most suited for databases containing a combination of media, such as text, sound, and pictures. By the early 1990s, relational DBMS had become the most popular category for new DBMS purchases among businesses.
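The linking of a purchase order file to a customer file through a customer identification number can be sketched with Python's built-in `sqlite3` module, which provides a small relational DBMS. The table names, column names, and sample rows are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- the single unique linking field
    name        TEXT NOT NULL,
    city        TEXT NOT NULL
);
CREATE TABLE purchase_orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
    amount      REAL NOT NULL
);
INSERT INTO customers VALUES (1, 'Acme Supply', 'Dayton'), (2, 'Birch & Co.', 'Tulsa');
INSERT INTO purchase_orders VALUES (101, 1, 250.0), (102, 1, 75.5), (103, 2, 40.0);
""")

# The customer's city is stored once, so one update is enough;
# no purchase order record has to be touched.
conn.execute("UPDATE customers SET city = 'Akron' WHERE customer_id = 1")

# Relate the two files through the shared customer_id field.
rows = conn.execute("""
    SELECT o.order_id, c.name, c.city, o.amount
    FROM purchase_orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
```

Both orders for customer 1 show the new city after a single update, which is the reduction in redundancy that related files make possible. In a flat file, the city would have to be repeated, and corrected, in every order record.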
Once a database is created, the DBMS can be used to select records that meet user requirements based on the information contained (or not contained) in their fields. For example, in using an inventory database, the user can check the availability of a product that meets certain criteria (such as style, color, and additional features), each of which is defined in a field. A retrieval request may be made for a single, specific record or for multiple records. An example of a request for multiple records in a customer database would be for all those customers whose invoices are past due. The user would, in this case, request records in which more than, say, 30 days have passed between the date the invoice was recorded as sent and today's date, and in which the "date of payment receipt" field is blank.
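The past-due request described above might be written as follows using Python's built-in `sqlite3` module. The `invoices` table, its column names, the sample rows, and the fixed value of "today" are all hypothetical, chosen so that the result is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE invoices (
    invoice_id INTEGER PRIMARY KEY,
    customer   TEXT NOT NULL,
    date_sent  TEXT NOT NULL,   -- ISO 8601 date the invoice was recorded as sent
    date_paid  TEXT             -- NULL (blank) until payment is received
);
INSERT INTO invoices VALUES
    (1, 'Acme Supply',  '1999-01-05', NULL),
    (2, 'Birch & Co.',  '1999-02-20', NULL),
    (3, 'Cole Freight', '1999-01-10', '1999-01-25');
""")

today = "1999-03-01"  # fixed here for reproducibility; normally the current date

# Select invoices sent more than 30 days ago whose payment field is blank.
past_due = conn.execute("""
    SELECT invoice_id, customer
    FROM invoices
    WHERE julianday(?) - julianday(date_sent) > 30
      AND date_paid IS NULL
    ORDER BY invoice_id
""", (today,)).fetchall()
```

Only the first invoice qualifies: the second was sent fewer than 30 days before the chosen date, and the third has a payment date recorded.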
Different DBMS offer different methods of entering commands or "queries" to retrieve information. The most common query command format is Structured Query Language (SQL), in large part because it allows several users on a network to access a database simultaneously. Some DBMS offer the choice of query by command, through menus, or by example forms.
In addition to data retrieval, DBMS allow the user to sort data in the fields by any criteria. This could involve all records in a database or, more practically, those that meet specified selection criteria. For example, records can be selected from a sales database of all salespeople who sold over a certain total dollar amount, and that list then can be sorted to rank the salespeople by amount sold.
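Selecting and then ranking salespeople, as in the example above, can be sketched in plain Python. The sales figures, the names, and the `threshold` value are invented for illustration.

```python
from collections import defaultdict

# Hypothetical sales records: (salesperson, dollar amount of one sale).
sales = [
    ("Ruiz", 1200.0), ("Ortiz", 300.0), ("Chan", 800.0),
    ("Ruiz", 450.0), ("Chan", 150.0), ("Ortiz", 100.0),
]
threshold = 500.0  # assumed cutoff for "a certain total dollar amount"

# Step 1: total the amount sold by each salesperson.
totals = defaultdict(float)
for person, amount in sales:
    totals[person] += amount

# Step 2: select only those whose total exceeds the threshold.
selected = {person: total for person, total in totals.items() if total > threshold}

# Step 3: sort the selected list to rank salespeople by amount sold.
ranking = sorted(selected.items(), key=lambda item: item[1], reverse=True)
```

Selection happens before sorting, so the ranking contains only the salespeople who met the criterion.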
Finally, DBMS software allows for generation of various printed or electronic reports from the selected data. One of the most common formats of a database report is a table based on a list of sorted records with selected fields displayed. Data from individual records can also be automatically merged into templates, or empty fields of specific forms or attributes. Additionally, mailing labels can be created by printing data from name and address fields. Some DBMS also incorporate additional software features, such as spreadsheet, word processing, and communications functions, permitting further manipulation of information retrieved from the database.
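Merging record data into a template and printing a simple tabular report can be sketched with Python's standard `string.Template` class. The customer records, field names, and column widths are assumptions for illustration; commercial DBMS report generators are far more elaborate.

```python
from string import Template

# Hypothetical customer records retrieved from a database.
records = [
    {"name": "Sam Ortiz", "street": "9 Oak Ave.", "city": "Tulsa",
     "state": "OK", "zip": "74103", "balance": 40.0},
    {"name": "Ana Ruiz", "street": "12 Elm St.", "city": "Dayton",
     "state": "OH", "zip": "45402", "balance": 125.5},
]

# Mailing label: a template whose empty slots are filled from each record.
label = Template("$name\n$street\n$city, $state $zip")
labels = [label.substitute(record) for record in records]

# Tabular report: sorted records with only selected fields displayed.
report_lines = [f"{'Name':<12}{'Balance':>10}"]
for record in sorted(records, key=lambda r: r["name"]):
    report_lines.append(f"{record['name']:<12}{record['balance']:>10.2f}")

report = "\n".join(report_lines)
```

The same records feed both outputs, so the label and the report differ only in which fields are selected and how they are laid out.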
Databases and DBMS are used on all kinds of computer systems, many of which permit multiple users to access a database simultaneously. On mainframe and minicomputer/midrange systems, users access the database through multiple terminals. DBMS are also increasingly being used on client-server computer networks of personal computers or workstations, including over corporate intranets using a Web browser interface. The database and the DBMS server software reside on one computer that acts as the server, and other copies of the DBMS software are on each of the client computers linked to the server. Finally, there are distributed databases, in which a database is physically stored in two or more computers at different locations yet managed by a single DBMS through copies of the software at each location.
Databases are used in all kinds of businesses for all types of functions: in sales to compile information about clients and potential clients and to keep track of client correspondence; in accounting to keep track of accounts payable and receivable; in purchasing to choose suppliers and their goods and to place orders; in manufacturing to keep track of supplies of components or raw materials; in shipping and receiving to keep track of orders and shipments; in marketing to track records of advertisers and prospective advertising outlets; and in human resource management to maintain records of employees and match resumes of applicants to job openings. The same DBMS may be used to manage all such tasks in the same organization.
Customer databases are especially important to service industries for maintaining ongoing customer relations. Financial institutions, such as banks, stock brokerages, and insurance companies, rely on DBMS to keep track of customers' financial accounts. Utilities, such as telephone and electric companies, also keep databases of customers, tracking usage of utility service, varied rates, and billing information. Maintenance and repair services, such as those repairing office equipment or appliances, keep service and repair records for each customer. Specialized, and often very simple, DBMS are popular for keeping track of clients or contacts.
For certain brokers or agents, database systems are especially crucial for selecting the goods or services to be sold. One of the first large-scale DBMS for business was the Sabre airline reservation system introduced in 1964. It contains data on the flights and seats of most commercial airlines, permitting a coordination of reservations. Both airline booking departments and travel agents depend upon Sabre and other computer reservations systems, such as Apollo. More recently, the Sabre database has powered scores of Internet-based travel services that allow travelers to search for their own itineraries and fares. Travel agents also use other computer reservation systems to book hotel rooms in major hotels. Other businesspeople who use databases in their business include real estate agents, who keep databases of properties for sale that can be searched by the attributes desired by individual customers. Car dealers and brokers also use databases for locating the various makes and models of cars and their accessories. Real estate agency or car dealer networks may use distributed databases to share information on the properties or cars each broker has locally.
Finally, for some companies, the production or management of a database is their primary business. They conduct their business by selling access to their databases to other companies or the public. The types of databases provided by database vendors include literature retrieval databases (citations and full text of articles or reports), numeric databases (such as stock quotes), individual credit histories, directories, maps and other graphics, and employment listings. Some of the better known text-information database vendors are Dialog, LEXIS-NEXIS, Dow Jones News/Retrieval, ORBIT/Questel, and Chemical Abstracts Service. Corporations subscribe to these database services to research information about their market, competitors, or emerging technologies. For most subscription databases, the software used by the customer for searching the databases is not the same as the DBMS used by the company for creating and updating the database. The software used by the client only permits retrieval and possibly sorting and printing of the data, and it typically has an easy-to-use graphical user interface. Such database software is called "search software." The client may remotely access the vendor's database through communications software and a modem or a network connection, especially over the Internet, or the database vendor may distribute the database with search software on disk, typically a CD-ROM.
The fastest-emerging area of the database-management industry involves the use of parallel database systems to mine data over a server. This entails utilizing several general-purpose database systems in parallel to approximate the functions of highly specialized, task- or industry-specific databases in lieu of going through the steps of designing a special-purpose, highly focused system that may well be quickly outmoded by developing technology. In addition, many researchers have been working toward the perfection of a system that seamlessly merges the object-oriented and relational models. Developers are beginning to hit on a solution that incorporates the SQL search language and the platform-independent Java programming language. This type of database affords a great deal of the visual attractiveness and user-friendliness of Internet applications alongside standard DBMS capabilities.
[ Heather Behn Hedden ]