Screening Fundamentals
Abstract
While researchers have established the value of screening for breast cancer with mammography, with and without clinical breast examination, age-specific analyses have led to differing opinions about the age at which breast cancer screening should begin and the interval at which it should be repeated. This article therefore provides a detailed, age-specific evaluation of mammography screening by assessing the severity of breast cancer, the effectiveness of earlier versus later treatment, and the accuracy and reliability of mammography. Data from previous randomized trials and other sources are used to evaluate these criteria. The results indicate that screening programs must have high levels of participation, achieve acceptable sensitivity (85%) and specificity (90%), adopt age-specific screening intervals, and consider how disease stage influences diagnosis. In addition, as others have noted, the following benchmarks can be used to evaluate screening programs: (1) more than 50% of screen-detected cancers should be smaller than 15 mm; (2) 30% or more of grade 3 cancers detected on screening should be smaller than 15 mm; and (3) more than 70% of cancers detected on screening should be node negative.

As a disease control strategy and policy, the goal of breast cancer screening is to reduce morbidity and mortality by distinguishing individuals in an asymptomatic population who are likely to have breast cancer from those who are not (1). The emphasis on likelihood is important and inherent in the concept of screening. A person identified by a screening test as likely to have a disease is then referred for further diagnostic testing to determine whether he or she does in fact have the disease and therefore needs treatment.
The emphasis on likelihood is also important because screening tests and programs have inherent limitations, as the criteria described below make clear. Although the majority of screening test interpretations are correct, some individuals will inevitably be incorrectly identified as possibly having the disease (a “false positive”), and screening will fail to identify some who do have the disease (a “false negative”). The advantage of screening an asymptomatic population is that the test can identify preclinical disease with sufficient lead time, that is, the interval by which diagnosis is advanced ahead of the expected onset of symptoms, to potentially alter the natural, and more adverse, course of the disease. To be an effective disease control strategy, a screening program should meet fundamental criteria in three areas: (1) characteristics of the disease; (2) the effectiveness of earlier versus later treatment; and (3) characteristics of the screening test, specifically its accuracy and reliability, but also its costs and acceptability to the target population (2). It would be ideal if there were conventional benchmarks for these criteria, either alone or considered together, but this is not the case. Further, decisions about screening are more easily reached if the evidence for the effectiveness of earlier versus later treatment, or for test performance, derives from well-designed randomized clinical trials, since observational studies are subject to well-known biases that complicate the interpretation of end results (2). When such evidence is lacking, decision makers are confronted with two alternatives: await data from a well-designed randomized clinical trial, or attempt to draw inferences from the data at hand. The interplay between standard evaluative criteria for screening, evidence-based medicine, and the existing evidence has been at the heart of the debate over the efficacy and value of mammography screening for women ages 40-49.
Prior to 1995, no individual trial or meta-analysis had demonstrated a statistically significant reduction in breast cancer deaths among women ages 40-49 who received an invitation to mammography screening (7). Although a number of U.S. organizations at that time recommended that women ages 40-49 undergo mammography every one to two years, this recommendation was based on indirect evidence that mammography benefits this age group (3,4). Other organizations did not endorse screening women ages 40-49, primarily because none of the trials up to that point had shown a statistically significant reduction in deaths in this age group (5,6). In 1995, however, Smart and colleagues published a meta-analysis showing a statistically significant 24% reduction in breast cancer deaths among women ages 40-49 when all population-based trials of mammography screening were combined (7). More recent results reveal statistically significant mortality reductions for all trials combined, and for two individual Swedish trials (8,9,10). Thus, breast cancer screening for women ages 40-49 has now met standard norms of evidence, and screening for women in their forties is endorsed by both the American Cancer Society and the National Cancer Institute. It is nonetheless important to evaluate the criteria listed above carefully and to compare the performance of screening among women in their forties and fifties according to these criteria. The remainder of this article is devoted to such an evaluation.

Disease Burden

To justify screening large numbers of healthy people, the disease should represent a significant public health burden. This burden may be a function of any one or a combination of three measures: morbidity, mortality, and premature mortality. By most accounts, breast cancer meets these criteria. Breast cancer is the most common malignancy diagnosed among women, and the second leading cause of cancer death among women.
The American Cancer Society estimates that in 1997, 180,200 women will be diagnosed with invasive breast cancer, 36,400 women will be diagnosed with ductal carcinoma in situ (DCIS), and 43,900 women will die from this disease (11). Breast cancer is also a leading cause of premature mortality among women, and the leading cause of premature mortality from cancer (12). On average, a woman who dies of breast cancer loses 19.4 years of life she might otherwise have lived (13). In fact, the decision to include women aged 40 and older in the Health Insurance Plan of Greater New York randomized trial of breast cancer screening was based on the observation that women diagnosed with breast cancer at ages 40-49 contributed 34% of the total years of potential life lost due to breast cancer (14). The emphasis on incidence rather than deaths for women in their forties is important here, since a significant proportion of the deaths among women diagnosed in their forties will occur after age 50 and would therefore be missed if deaths were tallied only by age at death. On the basis of these early study design decisions, subsequent studies and the screening guidelines of some organizations have also set age 40 as the earliest age at which screening should begin.

The incidence of breast cancer increases with age. Breast cancer is uncommonly diagnosed before age 25, and incidence begins to increase measurably thereafter. Between ages 40 and 49, an estimated 1 in 66 women (1.52%) will be diagnosed with breast cancer during that 10-year period; annual age-specific incidence rates are 122.6 per 100,000 women ages 40-44 and 199.5 per 100,000 women ages 45-49. Between ages 50 and 59, an estimated 1 in 40 women (2.48%) will be diagnosed with breast cancer in that 10-year period; annual age-specific rates are 237.1 per 100,000 women ages 50-54 and 280.0 per 100,000 women ages 55-59 (15).
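The 10-year risks quoted above can be roughly reproduced from the annual rates. The sketch below is a simplified illustration, not the method used to derive the published figures: it assumes a constant annual rate within each 5-year band and ignores competing mortality (deaths from other causes), so it should land slightly above the published 1.52% and 2.48%. The function name `cumulative_risk` is hypothetical.

```python
import math


def cumulative_risk(rates_per_100k, years_per_band=5):
    """Approximate cumulative risk of diagnosis across consecutive age bands.

    Assumes a constant annual incidence rate within each band and ignores
    competing mortality, so the result slightly overstates life-table risk.
    """
    # Cumulative hazard = sum of (annual rate x years spent in the band)
    cumulative_hazard = sum(r / 100_000 * years_per_band for r in rates_per_100k)
    return 1.0 - math.exp(-cumulative_hazard)


# Annual age-specific rates per 100,000 women, as quoted in the text (15)
risk_40s = cumulative_risk([122.6, 199.5])  # ages 40-44, 45-49
risk_50s = cumulative_risk([237.1, 280.0])  # ages 50-54, 55-59

for label, risk in (("40-49", risk_40s), ("50-59", risk_50s)):
    print(f"ages {label}: {risk:.2%} (about 1 in {round(1 / risk)})")
```

Under these assumptions the sketch gives about 1.60% (roughly 1 in 63) for ages 40-49 and about 2.55% (roughly 1 in 39) for ages 50-59. The small excess over the published figures is consistent with omitting competing mortality; the ratio between the two decades, roughly 1.6, is similar either way.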
Because of population aging (in particular, the maturation of the postwar birth cohort), nearly the same number of breast cancer cases is expected to be diagnosed in 1997 among women aged 40-49 as among women aged 50-59 (32,600 vs. 33,000), even though breast cancer rates among the younger women are lower (16). In recent years, the estimated number of women diagnosed in their forties actually exceeded the estimated number of new cases among women aged 50-59 (17). These measures of disease burden, taken individually or comparatively, allow one to reasonably conclude that breast cancer is an important health problem for women in their forties. Although incidence is lower among women ages 40-49 than among women in their fifties, incidence and the associated measures of disease burden in each age group are sufficiently high to justify disease control efforts.

Earlier versus Later Treatment

Beyond disease burden, the disease must also meet certain criteria related to its preclinical phase (1). First, the preclinical condition should reasonably predict the probability of progression to clinical symptoms if left untreated. The preclinical condition may be invasive disease or an important disease precursor. Diagnosis of invasive disease before the onset of symptoms is the goal of breast cancer screening. However, controversy has arisen over the increasing rate of DCIS diagnosis resulting from greater participation in mammography, especially among younger women, on the grounds that not all DCIS will progress to invasive disease. Clearly, some DCIS does progress, but knowledge is limited as to the proportion that will evolve into an invasive tumor. DCIS is believed to be a precursor to invasive disease for several reasons. First, it is often found in the adjacent margins of excised tumors.
Second, invasive breast cancer has been shown to develop in a proportion of untreated cases (biopsy only) of lesions originally diagnosed as benign and subsequently determined to be low-grade DCIS. In one study, breast cancer developed in 9 of 28 such patients, and five of the nine died of the disease (18). In some of the cases that did not develop breast cancer, the entire lesion may have been removed at biopsy and thus effectively treated. Third, incomplete excision of DCIS has been associated with a greater probability of subsequent invasive recurrence in the same area of the breast (19). Nevertheless, the possibility that a significant proportion of DCIS may never progress to invasive disease has led to concerns about overtreatment, highlighted recently by Ernster and colleagues (20). Indeed, a growing clinical appreciation of the heterogeneity of DCIS has prompted efforts to identify prognostic factors for DCIS and to define the range of treatments, some less and some more aggressive, that are appropriate for the histologic characteristics of the disease (21,22,23). Given the current state of knowledge, reducing overtreatment of non-invasive and minimally invasive disease is a high priority. However, a diagnosis of DCIS should not be considered a “cost” of a screening program, insofar as DCIS is the non-invasive condition with the highest probability of progression to invasive disease and thus, today, requires treatment. Nor should it be considered a cost selectively for women ages 40-49, since women ages 50-59 show a similar proportion of tumors diagnosed as DCIS (24,25). For individuals or for the population at risk, we lack the knowledge to tailor screening schedules so as to detect only lesions of “known” significance.
It is therefore important to consider the relative importance of a diagnosis of DCIS in a screening program apart from the issue of overtreatment, especially since the latter can be addressed through professional education.

A second criterion is that the disease should have a detectable preclinical phase of sufficient duration, estimated as the mean sojourn time (1,26). The sojourn time is the duration of the detectable preclinical phase, and thus the maximum lead time that screening can achieve; it is the basis for establishing screening intervals within which beneficial lead times are attainable (26). The sojourn time must be long enough to assure a reasonable prevalence of detectable disease and to offer the opportunity for detection at a point when medical intervention can alter the natural history of the disease. It is therefore axiomatic that screening intervals be shorter than the estimated mean sojourn time. The mean breast cancer sojourn time has been estimated at 1.7 years for women aged 40-49, compared with 3.3 to 3.8 years for women aged 50-69 (27). This difference has raised concern that most existing trials screened women ages 40-49 at intervals too wide to provide the full potential benefit of an early detection program (24,27). Thus, the absence of a larger reduction in deaths, and the longer follow-up required to observe a benefit in individual trials and meta-analyses, has been attributed in large part to the failure of two-year screening intervals to adequately reduce the rate of advanced disease in women aged 40-49 (28). Finally, there should be sufficient evidence that treatment of early-stage disease offers significant benefits compared with treatment at a later stage.
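The sojourn-time argument above can be illustrated by estimating what share of cancers would surface clinically between two screens. This is a sketch under simplifying assumptions that are not taken from the cited studies: sojourn times are exponentially distributed, screening sensitivity is 100%, and the population is in steady state. The cited estimates (27) come from a more elaborate Markov-chain model, and the function name `interval_cancer_fraction` is hypothetical.

```python
import math


def interval_cancer_fraction(interval_years, mean_sojourn_years):
    """Expected fraction of incident cancers that surface between screens.

    Simplifying assumptions (not the cited Markov model): exponential sojourn
    times with mean M, 100% screening sensitivity, cancers entering the
    preclinical phase uniformly in time, steady state. A cancer entering the
    preclinical phase u years after a screen surfaces before the next screen
    (at time T) with probability 1 - exp(-(T - u) / M); averaging over u in
    [0, T] gives 1 - (M / T) * (1 - exp(-T / M)).
    """
    t, m = interval_years, mean_sojourn_years
    return 1.0 - (m / t) * (1.0 - math.exp(-t / m))


# (label, screening interval in years, mean sojourn time in years from the text)
scenarios = [
    ("ages 40-49, 2-year interval", 2.0, 1.7),
    ("ages 40-49, 1-year interval", 1.0, 1.7),
    ("ages 50-69, 2-year interval (M = 3.3)", 2.0, 3.3),
    ("ages 50-69, 2-year interval (M = 3.8)", 2.0, 3.8),
]
for label, t, m in scenarios:
    print(f"{label}: {interval_cancer_fraction(t, m):.0%} interval cancers")
```

Under these assumptions, about 41% of cancers in women in their forties screened every two years would be expected to surface between screens, compared with about 24% under annual screening and roughly 22% to 25% for women aged 50-69 screened every two years. These are model outputs, not observed rates, but they show why an interval that suits older women may leave many cancers in younger women to surface clinically before the next screen.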
The benefits of treating breast cancer at earlier rather than later stages are well established, although, on the basis of the mortality reductions observed in the trials, the evidence has historically been stronger for women aged 50 and older than for women aged 40-49 (29,30,31). However, since diagnosis at more favorable stages has been the basis for the observed mortality reductions in the trials, and analyses have shown similar long-term survival for women ages 40-49 and women ages 50 and older when grouped by similar prognostic factors, benefits have been inferred for breast cancer detected by mammography in younger women (31,32). Further, longer-term follow-up of trial data has revealed incremental benefits from screening among women randomized in their forties, eventually showing statistically significant reductions in deaths after an average 12-year follow-up (8).

Characteristics of the Screening Test

Provided that the disease in question meets the characteristics described above, the test must meet acceptable criteria for accuracy and reliability. In other words, it must do a reasonably good job of distinguishing those who probably have the disease from those who probably do not. The conventional performance measures are the cancer detection rate, sensitivity, specificity, and positive predictive value. These measures are defined by end results in the context of a breast cancer screening program. By convention, the basic measurements for calculating these outcome measures are as follows. A true positive (TP) can be defined as breast cancer diagnosed within one year after a biopsy recommendation following an abnormal mammogram. A true negative (TN) can be defined as no evidence of breast cancer within one year of a normal mammogram. A false negative (FN) can be defined as a cancer diagnosed within one year of a normal mammogram.
Finally, a false positive (FP) can be defined in several ways, each relevant to a particular focus of evaluation in a screening program, and each based on the criterion that there is no evidence of breast cancer within one year after the positive finding. First, the false positive rate can be based on cases recalled for additional imaging evaluation after an abnormal screening mammogram. An alternative measure is based on cases referred to biopsy or surgical consultation after an abnormal mammogram. A third definition considers only those who have actually undergone biopsy after an abnormal mammogram. Each successive false positive measure thus reflects further progression into the diagnostic process (33).

Sensitivity is a measure of the probability of detecting a cancer when a cancer exists, that is, the proportion of patients found to have cancer within one year of screening who were identified as having an abnormality at the time of screening. Sensitivity is estimated as TP/(TP + FN). Specificity is a measure of the probability of correctly identifying an individual as not having cancer when no cancer exists, that is, the proportion of patients not found to have cancer within one year of screening whose screening examination was normal. Specificity is estimated as TN/(TN + FP). The positive predictive value (PPV) varies according to the definition of a false positive result; it is the proportion of cases correctly identified as having cancer among all cases identified as positive under each of the three definitions listed above (33). In other words, PPV is given by TP/(TP + FP). The goal of a screening program is to achieve uniformly high sensitivity and specificity; the relative importance of accuracy on either measure depends on the consequences and severity of an error, both for the individual and for the cost of the screening program.
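The definitions above reduce to simple ratios. The sketch below applies them to a hypothetical one-year audit of 10,000 screening examinations; every count is invented for illustration, and the names `ScreeningAudit` and `stage_ppv` are hypothetical. False positives in the audit object follow the first (recall) definition; the loop then recomputes PPV against the narrower biopsy-recommended and biopsy-performed groups, showing how the same cancers yield a higher PPV at each later step of the diagnostic process.

```python
from dataclasses import dataclass


@dataclass
class ScreeningAudit:
    """Hypothetical one-year audit counts (false positives = recall definition)."""
    true_positive: int   # cancer within one year after an abnormal screen
    false_negative: int  # cancer within one year after a normal screen
    true_negative: int   # no cancer within one year after a normal screen
    false_positive: int  # no cancer within one year after an abnormal screen

    def sensitivity(self) -> float:
        return self.true_positive / (self.true_positive + self.false_negative)

    def specificity(self) -> float:
        return self.true_negative / (self.true_negative + self.false_positive)

    def ppv(self) -> float:
        return self.true_positive / (self.true_positive + self.false_positive)


def stage_ppv(cancers_found: int, cases_at_stage: int) -> float:
    """PPV for one false-positive definition: cancers / all cases at that stage."""
    return cancers_found / cases_at_stage


# Invented counts for illustration only (sums to 10,000 screens)
audit = ScreeningAudit(true_positive=40, false_negative=6,
                       true_negative=9_354, false_positive=600)
print(f"sensitivity {audit.sensitivity():.1%}")
print(f"specificity {audit.specificity():.1%}")
print(f"PPV (recall definition) {audit.ppv():.1%}")

# The same 40 cancers, counted against progressively narrower positive groups
stages = {"recalled": 640, "biopsy recommended": 130, "biopsy performed": 110}
for stage, n in stages.items():
    print(f"PPV ({stage}): {stage_ppv(40, n):.1%}")
```

Because the false-positive count changes with the definition chosen, the resulting specificity and PPV are only comparable across programs that use the same definition.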
From a measurement standpoint, the sensitivity and specificity of mammography are influenced by a number of factors, including quality control of the screening examination, interpretation thresholds, and the screening interval. Thus, any assessment of existing estimates must consider the characteristics of the screening program from which they derive (34). For this reason, constant monitoring of the performance of a screening program is essential to distinguish those aspects of sensitivity and specificity that are inherent in the interplay between the disease and the technology at hand from those that may be influenced by improvements in technique and operation.

How does screening women ages 40-49 compare with screening women ages 50-59 according to these criteria? The Agency for Health Care Policy and Research (AHCPR) Clinical Practice Guideline No. 13, Quality Determinants of Mammography, included performance measures to help mammography facilities evaluate medical audit data (33). According to the AHCPR guideline, if measurable, sensitivity should exceed 85%, specificity should exceed 90%, positive predictive value based on abnormal screening examinations should be between 5% and 10%, and positive predictive value when biopsy is recommended should be between 25% and 40%. Data from established screening programs in the United States typically reveal that the efficiency of screening improves somewhat with age; this is especially true for positive predictive value measures, since they depend on the underlying prevalence of disease (35). However, in these series and in data reported elsewhere, screening performance for women ages 40-49 and 50-59 approximated these performance measures and was more similar than dissimilar (35,36,37,38,39).
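The dependence of PPV on prevalence follows from Bayes' theorem: PPV = (sensitivity x prevalence) / [sensitivity x prevalence + (1 - specificity) x (1 - prevalence)]. The sketch below holds sensitivity and specificity fixed at the AHCPR minimum targets (85% and 90%) and varies the prevalence of detectable cancer among women screened. The three prevalence values are hypothetical, chosen only to show the direction of the effect rather than to represent any age group, and the function name `ppv_from_prevalence` is illustrative.

```python
def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sensitivity * prevalence                   # P(test positive and cancer)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(test positive and no cancer)
    return true_pos / (true_pos + false_pos)


SENSITIVITY = 0.85  # AHCPR minimum target
SPECIFICITY = 0.90  # AHCPR minimum target

# Hypothetical prevalences of detectable cancer per screen (illustrative only)
for prevalence in (0.002, 0.004, 0.008):
    ppv = ppv_from_prevalence(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}")
```

With these inputs, PPV rises from under 2% to over 6% as prevalence quadruples, although the test itself performs identically. A lower PPV among women with lower underlying incidence therefore does not, by itself, indicate poorer mammographic accuracy.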
Further, in a University of California, San Francisco (UCSF) program, sensitivity declined substantially as the screening interval increased (36). Existing measures of sensitivity from trials and other studies must therefore be interpreted in light of not only the accuracy of interpretation but also the width of the screening interval. This is especially true when comparing sensitivity data for women aged 40-49 with those for older women, since women aged 50 and older are estimated to have a considerably longer mean sojourn time, one more closely matched to the average screening intervals in the trials (24,27,28). Moreover, data from UCSF and Albuquerque presented at the 1997 National Institutes of Health Consensus Development Conference on Breast Cancer Screening for Women Ages 40-49 showed similar performance for women ages 40-49 and 50-59 with respect to tumor size, nodal involvement, and the rate of advanced cancers (36,37). Other published reports have shown similar comparative performance (38,39). Still, Tabar and colleagues have argued that these conventional measures lack the precision needed to fully assess the performance of a breast cancer screening program, and the argument is compelling in light of the varying end results and measures of sensitivity observed in previous studies (3,40,41). Mammographic sensitivity is not simply a measure of test accuracy. Underlying the measurement of sensitivity are disease prevalence, the characteristics of the population being screened, image quality, interpretive skill, screening intervals, and the threshold for intervention. Further, since breast cancer is a heterogeneous disease, similar measures of sensitivity offer no assurance that the same mix of cancers is being detected at favorable prognostic stages. For these reasons, Tabar et al. recommend the following benchmarks for evaluating the performance of a screening program: (1) more than 50% of screen-detected cancers should be smaller than 15 mm; (2) 30% or more of grade 3 cancers detected on screening should be smaller than 15 mm; and (3) more than 70% of cancers detected on screening should be node negative (40). In addition, high rates of participation are required, and participants should adhere as closely as possible to a recommended interval. Above all, the goal of a breast cancer screening program is a significant reduction in the rate of advanced disease below what would be expected in the absence of screening.

Conclusion

As noted above, the decision to screen is based on factors related to the importance of the disease as a public health problem and the ability of a screening test and program to meet acceptable levels of performance. Population-based screening is generally thought to be justified if the disease is important and the screening test is judged to meet accepted criteria for accuracy, efficacy, and practicality. Although these criteria are commonly applied as an evaluative template, there are no specific thresholds by which decisions to offer or not offer screening can be made. A screening test may fail to meet any one of these criteria and therefore be deemed not useful; for example, it may have low sensitivity, or lower sensitivity than an alternative test. However, these criteria may also be evaluated collectively, since the benefits, costs, and consequences they entail, considered together, may vary in important ways according to the population, disease, and test under scrutiny. Still, the same data may lead to different conclusions about the value of screening in a population, and decisions to recommend or not recommend screening may be more complicated when the underlying evidence is more inferential than direct.
However, once the decision to screen has been reached, it is critical that screening programs be carefully monitored and that their results be used to improve performance. In general, a breast cancer screening program must have high levels of participation and must achieve acceptable levels of performance in terms of sensitivity and specificity. More fundamentally, for screening to be effective, the program must reduce the incidence rate of advanced breast cancer in the population.

References

(1) Morrison AS. Screening in Chronic Disease. New York: Oxford University Press, 1992.
(2) Cole P, Morrison AS. Basic issues in population screening for cancer. J Natl Cancer Inst 1980;64:1263-72.
(3) Fletcher SW, Black W, Harris R, Rimer BK, Shapiro S. Report of the international workshop for screening for breast cancer. J Natl Cancer Inst 1993;85:1644-5.
(4) Dodd GD. American Cancer Society guidelines on screening for breast cancer: an overview. CA 1992;42:177-80.
(5) Smith RA. Breast cancer screening guidelines. Women's Health Issues 1992;2:212-17.
(6) U.S. Preventive Services Task Force. Guide to Clinical Preventive Services, 2nd ed. Baltimore: Williams & Wilkins, 1996.
(7) Smart CR, Hendrick RE, Rutledge JH, Smith RA. Benefit of mammography screening in women aged 40-49: current evidence from randomized controlled trials. Cancer 1995;75:1619-26.
(8) Hendrick RE, Smith RA, Rutledge JH, Smart CR. Benefit of mammography screening in women ages 40-49: current evidence from randomized clinical trials. Presented at the NIH Consensus Development Conference on Breast Cancer Screening for Women Ages 40-49, 1997 January 21-23; Bethesda (MD).
(9) Bjurstam N, Björneld L, Duffy SW. The Gothenburg breast screening trial: results from 11 years' follow-up. Presented at the NIH Consensus Development Conference on Breast Cancer Screening for Women Ages 40-49, 1997 January 21-23; Bethesda (MD).
(10) Andersson I. The Malmö mammographic screening trial: update on results and a harm-benefit analysis. Presented at the NIH Consensus Development Conference on Breast Cancer Screening for Women Ages 40-49, 1997 January 21-23; Bethesda (MD).
(11) American Cancer Society. Facts and Figures. Atlanta: American Cancer Society, 1997.
(12) CDC. Premature mortality due to breast cancer: United States, 1984. Morbidity and Mortality Weekly Report 1987;36:736-9.
(13) National Cancer Institute. Stat Bite: Average years of life lost from cancer. J Natl Cancer Inst 1995;87:956.
(14) Shapiro S, Venet W, Strax P, Venet L. Periodic Screening for Breast Cancer: The Health Insurance Plan Project and Its Sequelae, 1963-1986. Baltimore: Johns Hopkins Press, 1988.
(15) Ries LAG, Kosary C, Hankey BF, Harras A, Miller BA, Edwards BK. SEER Cancer Statistics Review, 1973-1993: Tables and Graphs, National Cancer Institute. Bethesda (MD), 1996.
(16) American Cancer Society. Breast Cancer Facts and Figures, 1997. Atlanta: American Cancer Society, 1997.
(17) Smith RA. Epidemiology of breast cancer. In: Kopans DB, Mendelson EB, editors. Syllabus: A Categorical Course in Breast Imaging. Chicago: Radiological Society of North America, 1995.
(18) Page D, Dupont W, Rogers L, Jensen R, Schuyler P. Continued local recurrence of carcinoma 15-25 years after a diagnosis of low-grade ductal carcinoma in situ of the breast treated only by biopsy. Cancer 1995;76:1197-200.
(19) Frykberg ER, Bland KI. Management of in situ and minimally invasive breast carcinoma. World J Surg 1994;18:45-57.
(20) Ernster VL, Barclay J, Kerlikowske K, Grady D, Henderson IC. Incidence and treatment for ductal carcinoma in situ of the breast. JAMA 1996;275:913-8.
(21) Lagios M. Duct carcinoma in situ. Surgical Clinics of North America 1990;70:853-71.
(22) Silverstein M, Craig P, Waisman J, Lewinsky B, Colburn W, Poller D. A prognostic index for ductal carcinoma in situ of the breast. Cancer 1996;77:2267-74.
(23) Schnitt S, Harris J, Smith B. Developing a prognostic index for ductal carcinoma in situ of the breast: are we there yet? Cancer 1996;77:2189-92.
(24) Tabar L, Fagerberg G, Chen RH, Duffy SW, Smart CR, Gad A, Smith RA. Efficacy of breast screening by age. New results from the Swedish two-county trial. Cancer 1995;75:2412-19.
(25) Smart CR, Byrne C, Smith RA, Garfinkel L, Letton AH, Dodd GD, Beahrs OH. Twenty-year follow-up of the breast cancers diagnosed during the breast cancer detection demonstration project. CA 1997;47:134-49.
(26) Walter SD, Day NE. Estimation of the duration of the preclinical state using data. Am J Epidemiol 1983;118:865-86.
(27) Duffy SW, Chen HH, Tabar L, Day NE. Estimation of mean sojourn time in breast cancer screening using a Markov-chain model of both entry to and exit from the preclinical detectable phase. Statistics in Medicine 1995;14:1531-43.
(28) Breast cancer screening with mammography in women aged 40-49 years. Report of the Organizing Committee and Collaborators, Falun Meeting, Falun, Sweden (1996 March 21-22). Int J Cancer 1996;68:693-9.
(29) Hurley SF, Kaldor JM. The benefits and risks of mammographic screening for breast cancer. Epidemiologic Reviews 1992;14:101-30.
(30) Nystrom L, Rutqvist LE, Wall S, Lindgren A, Lindqvist M, Ryden S, et al. Breast cancer screening with mammography: overview of Swedish randomised trials [published erratum appears in Lancet 1993;342:1372]. Lancet 1993;341:973-8.
(31) Tabar L, Duffy SW, Burhenne LW. New Swedish breast cancer detection results for women aged 40-49. Cancer 1993;72:1437-48.
(32) Ries LAG, Henson DE, Harras A. Survival from breast cancer according to tumor size and nodal status. Surgical Oncology Clinics of North America 1994;3:35-50.
(33) Bassett LW, Hendrick RE, Bassford TL, et al. Quality Determinants of Mammography. Clinical Practice Guideline No. 13. AHCPR Publication No. 95-0632. Rockville (MD): AHCPR, DHHS, PHS, 1994.
(34) Chen HH, Duffy SW. A Markov-chain method to estimate the tumor progression rate from preclinical to clinical phase, sensitivity and positive predictive value for mammography in breast cancer screening. The Statistician 1996;45:1-11.
(35) Kerlikowske K, Grady D, Barclay J, Sickles EA, Eaton A, Ernster V. Positive predictive value of screening mammography by age and family history of breast cancer. JAMA 1993;270:2444-50.
(36) Sickles EA. Screening outcomes: clinical experience with service screening using modern mammography. In: Program and Abstracts. NIH Consensus Development Conference: Breast Cancer Screening for Women Ages 40-49. Bethesda (MD): National Institutes of Health, 1997.
(37) Linver MN. Mammography outcomes in a practice setting by age: prognostic factors, sensitivity, and positive biopsy rate. In: Program and Abstracts. NIH Consensus Development Conference: Breast Cancer Screening for Women Ages 40-49. Bethesda (MD): National Institutes of Health, 1997.
(38) Curpen BN, Sickles EA, Sollitto RA, Ominsky SH, Galvin HB, Frankel SD. The comparative value of mammographic screening for women 40-49 years old versus women 50-64 years old. AJR 1995;164:1099-103.
(39) Thurfjell EL, Lindgren JAA. Breast cancer survival rates with mammographic screening: similar favorable survival rates for women younger and those older than 50 years. Radiology 1996;201:421-6.
(40) Tabar L, Fagerberg G, Duffy SW, Day NE, Gad A, Grotoft O. Update of the Swedish two-county program of mammographic screening for breast cancer. Radiologic Clinics of North America 1992;30:187-210.
(41) Kerlikowske KM. Outcomes of modern screening mammography. In: Program and Abstracts. NIH Consensus Development Conference: Breast Cancer Screening for Women Ages 40-49. Bethesda (MD): National Institutes of Health, 1997.
(42) Linver MN, Osuch JR, Brenner RJ, Smith RA. The mammography audit: a primer for the Mammography Quality Standards Act (MQSA). AJR 1995;165:19-25.
Oxford University Press