Metrics and measurement comprises two major bodies of knowledge, both concerned with the quantitative assessment of performance using test statistics. The first area encompasses the test measurement (primarily education-oriented), opinion survey, and market research survey industries. The second area is software engineering, in which the technical specifications and execution of computer programs are appraised.
In the education industry, student intelligence measurement is a key concern for several reasons, including resource scarcity and pressure to keep up with a world that is becoming increasingly technological. The market research industry finds corporations striving to reach their customers through advertising and direct contact, as well as through niche marketing, whereby firms reach consumers with offerings tailored to select preferences. This increased focus has driven firms to deepen their knowledge of their customers and to produce goods or provide services that meet specific consumer demands. The research methodologies used for measuring student learning and for measuring consumer preferences and profiles are similar.
The reporting and interpretation of tests and surveys are becoming more a matter of presentation than of content. For example, if a simple spelling test were administered and the number of words correctly spelled was 67, what would this raw score mean? By itself, not much. Questions that would arise include: How many words were on the test? How many difficult and easy words were there? What is the age and education of the person being tested? Raw scores (absolute standards) typically need to be transformed into a test statistic for the purposes of measurement. Likewise, a survey designed to poll people on their preference for capital punishment will generate varying results depending on who is asked, the circumstances under which the question is asked, and so on. For example, the time period surrounding an emotionally charged event, such as the murder of a child, may cause sentiment in favor of capital punishment to rise. The way capital punishment is carried out (hanging, electric chair, lethal injection, firing squad) can affect public opinion as well. Researching consumer tastes will produce different results depending on the consumer group, season of the year, state of the economy, and so on.
The units of measurement are the test (or survey) scores. There are two major types of scores: norm-referenced scores and content-referenced scores. Norm-referenced scores—which include same-group (inter-individual) and growth (intra-individual) norms—measure the raw scores relative to a statistical norm. Same-group norms measure performance relative to a reference group—the benchmark. Growth norms are used to interpret scores over time, and are relevant when observing an individual or group that is changing because of training, aging, disease progression, and so on.
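A same-group norm can be sketched as a simple transformation of a raw score into a standard (z) score and percentile rank relative to a reference group. The figures below are hypothetical, and the percentile rank assumes the reference group is approximately normally distributed:

```python
# Minimal sketch of norm-referenced scoring (hypothetical data).
# A raw score is transformed into a z-score and percentile rank
# relative to a reference (benchmark) group.
from statistics import mean, pstdev, NormalDist

reference_scores = [52, 61, 58, 70, 65, 55, 63, 67, 60, 59]  # benchmark group
raw_score = 67

mu = mean(reference_scores)
sigma = pstdev(reference_scores)
z = (raw_score - mu) / sigma                # standard score
percentile = NormalDist().cdf(z) * 100      # percentile rank under a normal model

print(f"z-score: {z:.2f}, percentile: {percentile:.0f}")
```

The same raw score of 67 would yield a different z-score and percentile against a different reference group, which is the point of norm-referencing: the interpretation depends on the benchmark, not the raw count.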
Content-referenced scores provide insight into how specific test questions were answered. Examples of metrics that provide this information are raw, percent correct, and criterion (e.g., pass/fail, healthy/sick, satisfied/unsatisfied) scores.
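Content-referenced scoring can be sketched just as briefly. Using the earlier spelling-test example, with a hypothetical test length and an assumed mastery cutoff, the raw score becomes a percent-correct score and a pass/fail criterion score:

```python
# Minimal sketch of content-referenced scoring (hypothetical spelling test).
# The raw score is reported as percent correct and as a criterion
# (pass/fail) score against an assumed mastery cutoff.
items_on_test = 80          # assumed number of words on the test
raw_score = 67              # words spelled correctly
cutoff = 0.75               # assumed mastery criterion (75% correct)

percent_correct = raw_score / items_on_test * 100
criterion = "pass" if raw_score / items_on_test >= cutoff else "fail"

print(f"{percent_correct:.1f}% correct -> {criterion}")
```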
Of these two types of scores, norm-referenced scores have, in general, become much more popular than content-referenced scores. Of course, which is most appropriate depends on the situation.
Another rationale for using scales and norms to interpret tests (and surveys) is that the number of questions answered correctly is not necessarily a meaningful measure. First, a raw count assumes each unit (question) is of equal value. Second, the appropriate interpretation may depend on the test's purpose (e.g., certifying minimal knowledge or ability). One assumes the test (or survey) results are transferable to other situations (or places) and are therefore generalizable. For example, passing a competency test, such as a driver's license test, denotes competence; that is, the ability to perform a specific task, such as driving, adequately. Similarly, a marketing survey conducted in San Diego, California, that finds a preference for red-painted cars is more useful if it holds true for people living in Chicago, Atlanta, and elsewhere.
Metrics that may be used to report and interpret test scores include:
The impact of computers on business and society increased exponentially in the latter part of the 20th century, and methods of measuring the quality of software programs are continually being developed. Many of the metrics used to measure and distinguish software quality focus on conformance to explicit development standards, functional and performance requirements, and features expected of all software. Software quality measures can be classified into those that can be measured directly (e.g., errors per billion calculations) and those that can only be measured indirectly (e.g., maintainability).
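A directly measured quality variable can be illustrated with defect density, a common software metric that normalizes the defect count by program size. The figures below are hypothetical:

```python
# Minimal sketch of a directly measured software-quality metric
# (hypothetical figures): defect density, i.e., defects found per
# thousand lines of code (KLOC).
defects_found = 42
lines_of_code = 12_500

defect_density = defects_found / (lines_of_code / 1000)  # defects per KLOC
print(f"{defect_density:.2f} defects/KLOC")
```

An indirect measure such as maintainability cannot be computed this way; it must be inferred from observable proxies (e.g., time to repair reported defects).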
Important quality factors include the following measures:
The metrics to grade (on a scale of 0 [low] to [high]) the previous measures are as follows:
As both software and hardware engineering evolve, new metrics and measurements will be created to evaluate performance and quality.
The purposes of measuring these metrics are to help evaluate the software models, to indicate the complexity of the procedural designs and source code, and to aid in the design of additional tests. This measurement process follows five steps:
In these ways, metrics and measurement can provide a quantifiable gauge of software quality, helping the software engineer improve the product.
[ Raymond A. K. Cox ]