Developing an Algorithm to Identify History of Cancer Using Electronic Medical Records

Introduction/Objective: The objective of this study was to develop an algorithm to identify Kaiser Permanente Colorado (KPCO) members with a history of cancer.
Background: Tumor registries are used with high precision to identify incident cancer, but are not designed to capture prevalent cancer within a population. We sought to identify a cohort of adults with no history of cancer, and thus, we could not rely solely on the tumor registry.
Methods: We included all KPCO members between the ages of 40–75 years who were continuously enrolled during 2013 (N=201,787). Data from the tumor registry, chemotherapy files, and inpatient and outpatient claims were used to create an algorithm to identify members with a high likelihood of cancer. We validated the algorithm using chart review and calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for occurrence of cancer.
Findings: The final version of the algorithm achieved a sensitivity of 100% and specificity of 84.6% for identifying cancer. If we relied on the tumor registry alone, 47% of those with a history of cancer would have been missed.
Discussion: Using the tumor registry alone to identify a cohort of patients with prior cancer is not sufficient. In the final version of the algorithm, the sensitivity and PPV were improved when a diagnosis code for cancer was required to accompany oncology visits or chemotherapy administration.
Conclusion: EMR data can be used effectively in combination with data from the tumor registry to identify health plan members with a history of cancer.
Acknowledgements: This work was funded by the National Cancer Institute (Contracts HHSN261201400644P and HHSN261201300460P). We thank LeeAnn Rohm, MSW and Kate Burniece, BS for completing the chart abstraction and Erica Blum-Barnett, MS for manuscript preparation.
Keywords: algorithm, electronic health record, cancer
Disciplines: Databases and Information Systems | Health Information Technology | Oncology | Theory and Algorithms
Creative Commons License: This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
This empirical research is available at EDM Forum Community: http://repository.edm-forum.org/egems/vol4/iss1/5

Christina L. Clarke, MS; Heather S. Feigelson, PhD, MPH
Kaiser Permanente Institute for Health Research
Published by EDM Forum Community, 2016. eGEMs (Generating Evidence & Methods to improve patient outcomes), Vol. 4 [2016], Iss. 1, Art. 5. DOI: 10.13063/2327-9214.1209

Introduction and Objectives

Tumor registries are considered the “gold standard” for identifying incident cancer; that is, identifying new cancer cases within a defined population within a defined period. The case finding procedures used by tumor registrars do not focus on the capture of prevalent cancer, or on indications of a history of cancer. We are interested in developing a prospective cohort to study the development of incident cancer over time. For such studies, it is important to begin with a population with no prior cancer history. In order to identify all cancers, the tumor registry may not be the only data source. The objective of this study was to develop an algorithm that used data available from the electronic medical record (EMR) that could identify individuals with a history of cancer, which included data sources to supplement the tumor registry.

Background

Health systems that capture clinical data in an EMR system find many ways to use these data to improve health care and answer important scientific questions.[2-7] Kaiser Permanente Colorado (KPCO) is an integrated health plan with EMR data collected over several decades, including information such as medication use, medical conditions, laboratory test results, disease onset, and subsequent treatment. When paired with a tumor registry, the EMR can be a powerful tool for conducting studies of cancer incidence and prognosis; however, EMR data are not without limitations. In particular, events occurring outside of the health plan are not well captured in most EMRs.

KPCO maintains a tumor registry dating back to 1989, and it is used reliably to identify incident cancer diagnosed and treated within KPCO.[1,2] Cancers diagnosed and treated outside of KPCO, usually prior to KPCO membership, are not followed in the tumor registry. Recently diagnosed cases may also be missed, because the tumor registry has a lag time of about 12 months. We sought to identify a cohort of adults from our current KPCO member population who had never been diagnosed with cancer. To accomplish this, we used our electronic data systems to develop an algorithm to identify individuals with a history of cancer, and validated the algorithm using manual chart reviews.

Methods

KPCO maintains an EMR for each of its members. Data from the EMR are collected into a virtual data warehouse (VDW), which contains content areas such as pharmacy (including chemotherapy), inpatient and outpatient claims, enrollment, and patient demographics.[8-10] We included all KPCO members between the ages of 40–75 years who were continuously enrolled during 2013. We used administrative and EMR data including the tumor registry, chemotherapy files, and inpatient and outpatient claims to identify those with a high likelihood of prior cancer. This project was reviewed and approved by the KPCO institutional review board (IRB). The requirement for informed consent was waived.

Our goal was to identify individuals with any prior cancer, with the exception of nonmelanoma skin cancer. Thus, for the initial algorithm, we cast a wide net to capture any possible incidence or history of cancer. Patients who ever had a behavior code indicating an in situ or invasive cancer from the tumor registry were flagged as having a history of cancer. We flagged anyone who ever had an inpatient or outpatient claim with an International Classification of Diseases, Ninth Revision (ICD-9) code indicating cancer. We also flagged any patient who had at least three visits in the oncology department on separate days, or at least two records on separate days of receiving a chemotherapeutic drug (specific codes are provided in Table 1), dating back to the beginning of our EMR in 1998.

Table 1. ICD-9 Codes Used to Flag Patients with a History or Incidence of Cancer

Source: Tumor registry
Version of algorithm: 1 and 2
Code type: Behavior
Code/logic: In situ or invasive

Source: Inpatient and outpatient claims
Version of algorithm: 1 and 2
Code type: ICD-9
Code/logic: Any code between 140 and 239 or V10.x, excluding: 173.x, 199.1, 209.4x, 209.5x, 209.6x, 210.x-229.x, 232.x, 233.1, 238.2, 238.4, 238.7x, 238.9, 239, 239.1-239.5, 239.8-239.9 or V10.83

Source: Inpatient and outpatient claims to oncology
Version of algorithm: 1 only
Code type: Encounters
Code/logic: ≥3 visits on separate days to Oncology

Source: Inpatient and outpatient claims indicating chemotherapy was given at least twice
Version of algorithm: 1 only
Code type and code/logic:
ICD-9: 17.70, 99.25, 99.28, V58.11, V07.3, V07.39
DRG: 410, 492 if between years 1998 and 2007, or 837-839, 846-848 if between years 2008 and 2013
HCPCS: A9600, A9604, C1086, C1166, C1167, C1178, C8953, C8954, C8955, C9004, C9012, C9110, C9205, C9207, C9213, C9214, C9215, C9235, C9257, C9262, C9265, C9414, C9415, C9417, C9418, C9419, C9420, C9421, C9422, C9423, C9424, C9425, C9426, C9427, C9429, C9431, C9432, C9433, C9434, C9437, C9438, C9440, G0355, G0357, G0358, G0359, G0360, G0361, G0362, G8372, G8373, G8374, G9021, G9022, G9023, G9024, G9025, G9026, G9027, G9028, G9029, G9030, G9031, G9032, J0490, J0594, J0894, J1094, J1100, J1190, J1457, J2323, J3262, J7150, J7527, J8510, J8520, J8521, J8530, J8540, J8560, J8561, J8562, J8565, J8600, J8610, J8700, J8705, J8999, J9000, J9001, J9002, J9010, J9015, J9017, J9019, J9020, J9025, J9027, J9033, J9035, J9040, J9041, J9042, J9045, J9050, J9055, J9060, J9062, J9065, J9070, J9080, J9090, J9091, J9092, J9093, J9094, J9095, J9096, J9097, J9098, J9100, J9110, J9120, J9130, J9140, J9150, J9151, J9165, J9170, J9171, J9178, J9180, J9181, J9182, J9185, J9190, J9200, J9201, J9206, J9207, J9208, J9211, J9230, J9245, J9250, J9260, J9261, J9263, J9264, J9265, J9266, J9268, J9270, J9280, J9290, J9291, J9293, J9300, J9302, J9303, J9305, J9307, J9310, J9315, J9320, J9328, J9330, J9340, J9350, J9351, J9355, J9357, J9360, J9370, J9375, J9380, J9390, J9600, J9999, Q0083, Q0084, Q0085, Q2017, Q2024, Q2049, S0087, S0088, S0115, S0116, S0172, S0176, S0178, S0182, S5019, S5020, S9329, S9330, S9331
CPT-4: 0519F, 36640, 4180F, 61517, 96400, 96401, 96402, 96405, 96406, 96408, 96409, 96410, 96411, 96412, 96413, 96414, 96415, 96416, 96417, 96420, 96422, 96423, 96425, 96440, 96445, 96446, 96450, 96542, 96545, 96549, C9287, J9043, J9179
Revenue codes: 331, 332, 335

To test the algorithm, we conducted manual chart reviews, powered on specificity.[11-12] We randomly selected 297 charts from KPCO members who had any utilization in 2013. Of these, 69 patients were identified by the algorithm as having a history of cancer, and 228 patients were classified by the algorithm as cancer free. Cancer risk increases with age, so for the chart review we oversampled those older than the median age using a ratio of 2:1. We used a ratio of approximately 4:1 of individuals flagged as cancer free to those with cancer. Sampling 228 cancer-free individuals could detect an 80 percent specificity with a 95 percent confidence interval (CI) of 75 percent to 85 percent. Finally, we excluded cases from chart review that were flagged by the tumor registry, as we consider the tumor registry to be a validated source for identifying incident cancer, and our aim was to develop a method to identify cancers not included in the tumor registry. We used PROC SURVEYSELECT (SAS 9.2) to obtain a weighted random sample fitting the above criteria.

Each chart was fully reviewed to find any mention of cancer, or to confirm there was no history of cancer, using all notes in the chart dating back to the beginning of a patient’s enrollment, or to the beginning of the EMR in 1998 (up to 15 years of medical utilization and history). The chart reviewer first looked for the exact date of diagnosis for patients who had been flagged with a history of cancer. If a diagnosis for cancer was not found, the reviewer then examined the chart from the administrative diagnosis date in the EMR going forward in time for a mention of cancer. Patients who were not identified as having cancer were fully reviewed from the start of the EMR forward. We did not review medical record data prior to 1998 (available in paper charts), as it is unlikely that information about cancer history would only be recorded prior to 1998 and not captured in EMRs spanning 1998–2013.

We calculated sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for occurrence of cancer, and used the chart review results to identify specific codes or logic patterns that could improve the performance of the algorithm.[13] CIs were calculated using the efficient-score method corrected for continuity.[11] Based on the results of the first chart review, we refined the algorithm, then conducted a second review of 200 novel charts, using the same sampling criteria specified above, except that we selected those flagged as not having cancer in a ratio of 2:1 (137 no indication of cancer, 63 flagged with a history of cancer), and recalculated the aforementioned statistics. The second chart review of 137 charts for those who were flagged as cancer free could detect an 80 percent specificity with a 95 percent CI of 73 percent to 87 percent.

Findings

A total of 201,787 members met our initial inclusion criteria. The median age of this cohort as of January 1, 2013 was 56 years, 53.8 percent were female, and the average continuous enrollment time including 2013 was 8.75 years (standard deviation = 6.35 years). The initial algorithm flagged 25,824 (12.8 percent) people as having a history of cancer. Table 2 describes the number of people with a history of cancer identified by each “flag” specified in the initial algorithm. As we would expect, a large proportion of cases (n=11,410; 44.2 percent) were included in the tumor registry. Another 8,906 (34.5 percent) were identified with only a diagnosis code of cancer; 2,348 (9.1 percent) had only a receipt of chemotherapy; and having at least three visits to oncology accounted for 1,570 (6.1 percent) cases. The remaining 1,590 (6.2 percent) cases were a combination of the aforementioned categories.

Table 2. Results from Initial Algorithm Indicating How Patients Were Flagged as Having Cancer

Inclusion criteria* | Number of subjects | % of group (not in / in tumor registry) | % of total

NOT IN TUMOR REGISTRY (N = 14,414 cases)
Chemotherapy only | 2,348 | 16.3% | 9.1%
Diagnosis only | 8,906 | 61.8% | 34.5%
Oncology visits only | 1,570 | 10.9% | 6.1%
Diagnosis and Chemotherapy | 262 | 1.8% | 1.0%
Oncology and Chemotherapy | 324 | 2.2% | 1.3%
Oncology and Diagnosis | 773 | 5.4% | 3.0%
Oncology, Diagnosis and Chemotherapy | 231 | 1.6% | 0.9%

TUMOR REGISTRY (N = 11,410 cases)
Chemotherapy only | 5 | 0.0% | 0.0%
Diagnosis only | 5,319 | 46.6% | 20.6%
Oncology visits only | 60 | 0.5% | 0.2%
Diagnosis and Chemotherapy | 306 | 2.7% | 1.2%
Oncology and Chemotherapy | 11 | 0.1% | 0.0%
Oncology and Diagnosis | 2,587 | 22.7% | 10.0%
Oncology, Diagnosis and Chemotherapy | 2,910 | 25.5% | 11.3%
Tumor Registry Alone | 212 | 1.9% | 0.8%

*Note: Patients were flagged as having a history of cancer if they were in the tumor registry, had at least one diagnosis of cancer, ≥3 visits to oncology, or ≥2 records of chemotherapy.

Using manual chart review as the gold standard, the sensitivity, specificity and NPV for a history of cancer from the first iteration of the algorithm (algorithm V1) were all relatively high (92.3 percent, 87.2 percent, and 98.7 percent, respectively); however, the PPV was low (52.2 percent) (Figure 1). Of the 69 patients identified by the algorithm as having cancer, 8 (11.6 percent) were identified by oncology visits only, 10 (14.5 percent) were identified by chemotherapy visits only, and 5 (7.3 percent) had both chemotherapy and oncology visits, but did not have any diagnosis of cancer. Of those 23 patients, only 1 was confirmed to have cancer through chart review. We reviewed these 23 further to determine why these patients had encounters consistent with cancer treatment, but no evidence in the medical record indicated a cancer diagnosis. This review revealed that patients may be seen in oncology several times for suspected tumors, or they may undergo chemotherapy for noncancer-related conditions such as idiopathic thrombocytopenic purpura (ITP).

Figure 1. Performance of Each Algorithm for Identifying Individuals with a History of Cancer
Legend: For each table, the results from the tumor registry (the gold standard) are shown in the columns, and the results from the algorithm are shown in the rows. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV), and their associated 95 percent confidence intervals, are shown for each round of chart review. Confidence intervals were calculated using the efficient-score method corrected for continuity. Panel A shows results from chart review 1, algorithm version 1. Panel B shows results from chart review 1, algorithm version 2. Panel C shows results from chart review 2, algorithm version 2.

Based on this first chart review, we revised the algorithm (algorithm V2) to require either (1) diagnosis of cancer, or (2) inclusion in the tumor registry. Visits to oncology or receipt of chemotherapy alone were not sufficient to flag a record as a cancer case. This revision eliminated 4,242 (16.4 percent) of cases originally suspected as having cancer; and 46 of the 297 patients were flagged, in the revision, as having cancer. The resulting specificity and PPV improved (95.7 percent and 76.1 percent, respectively), while the sensitivity and NPV were slightly reduced to 89.7 percent and 98.4 percent, respectively (Figure 1). We then ran the revised algorithm on the full data set and conducted a second manual review on a new sample of charts. This second version of the algorithm had a sensitivity of 100 percent, specificity of 84.6 percent, NPV of 100 percent, and PPV of 60.3 percent on the second chart review, a marked improvement from version one (Figure 1).

Using the second version of the algorithm, most cases (19,278; 89.3 percent) were identified as having cancers prior to 2013. Nearly half of the cancers identified were not in the tumor registry (N=10,172; 47 percent), and of those, 9,241 (91 percent) were identified prior to 2013. The top cancer indications that were not in the tumor registry prior to 2013 were “history of breast cancer,” “diagnosis of stage 1 breast cancer,” “history of malignant melanoma,” “history of prostate cancer,” and “diagnosis of prostate cancer,” which accounted for 37 percent of diagnoses prior to 2013 not in the tumor registry. Of those with prior cancer, 60 percent were women. Members flagged as having cancer were significantly older than those without; the median age was 64 years and 56 years, respectively (p < 0.0001). Figure 2 shows the age distribution of the cancer and noncancer populations.

Figure 2. Age Distribution of Those with and Without a History of Cancer, as Defined by the Algorithm, of 201,787 KPCO Members Enrolled in 2013 and Ages 40–75 years
[Chart: x-axis, age (40 to 75 years); y-axis, percent of members (0.00% to 5.00%); series, patients without a history of cancer and patients with a history of cancer.]
Note: The distribution of individuals with cancer is skewed to older ages, which is expected.
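The two versions of the flagging rule described in the Methods and Findings can be sketched in a few lines. This is a minimal illustration, not the authors' implementation (the sample was drawn in SAS, and the study's code is not shown in the text). The `MemberEvidence` record and its field names are hypothetical; they assume that the code lists in Table 1 have already been applied upstream to summarize each member's registry status, diagnosis codes, oncology visit days, and chemotherapy record days.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MemberEvidence:
    """Per-member evidence summary (hypothetical structure and field names)."""
    in_tumor_registry: bool    # in situ or invasive behavior code in the registry
    has_cancer_dx_code: bool   # any qualifying ICD-9 diagnosis code from Table 1
    oncology_visit_days: int   # number of separate days with an oncology visit
    chemo_record_days: int     # number of separate days with a chemotherapy record


def flag_v1(m: MemberEvidence) -> bool:
    """Version 1: cast a wide net; any single source of evidence flags the member."""
    return (
        m.in_tumor_registry
        or m.has_cancer_dx_code
        or m.oncology_visit_days >= 3
        or m.chemo_record_days >= 2
    )


def flag_v2(m: MemberEvidence) -> bool:
    """Version 2: require a cancer diagnosis code or inclusion in the tumor registry.

    Oncology visits or chemotherapy alone no longer flag a member.
    """
    return m.in_tumor_registry or m.has_cancer_dx_code


# Illustrative member: repeated infusions (for example, for ITP) but no cancer diagnosis.
infusion_only = MemberEvidence(
    in_tumor_registry=False,
    has_cancer_dx_code=False,
    oncology_visit_days=1,
    chemo_record_days=4,
)
print(flag_v1(infusion_only), flag_v2(infusion_only))
```

Under these assumptions, the infusion-only member is flagged by version 1 but not by version 2, which is the kind of false positive the first chart review uncovered.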
Discussion

We developed an algorithm with high sensitivity (100 percent) and specificity (84.6 percent) to identify patients with a history of cancer using administrative data routinely captured in an EMR. Our final algorithm identified 10.7 percent of all health plan members ages 40–75 years with a history of cancer; nearly half (47 percent) of these cases would have been missed if we relied on the tumor registry alone. Our intent was to create an algorithm to identify anyone with a history of cancer, so that those individuals could be excluded from an analytic cohort, leaving only cancer-free individuals. Because cancer is a rare disease, the remaining cohort of cancer-free individuals would be large, and we were thus willing to accept a somewhat lower specificity.

In the first version of the algorithm, chart review revealed that receipt of chemotherapy or visits to oncology alone contributed to an unacceptably low PPV (52.2 percent). In particular, infused therapies are not always administered for treatment of cancer; we identified patients receiving infusions for ITP or other conditions being mistakenly flagged as cancer cases. In the final version of the algorithm, the sensitivity and PPV were improved when a diagnosis code for cancer was required to accompany oncology visits or chemotherapy administration.

The PPV of this algorithm was modest (60.3 percent), a reflection of both the specificity and the prevalence of cancer in our study population. When the prevalence of a disease is low, the PPV will not be close to 1, even if both sensitivity and specificity are high. In the final version of the algorithm, our false positive rate was 15 percent; thus, in our sample of 21,582 people flagged as having cancer, we excluded 3,331 people who did not have a history of cancer. The remaining sample (n=180,205) was sufficiently large to create a cancer-free cohort. However, for other applications, this false positive rate may be unacceptably high, and one may wish to improve the specificity of this algorithm.

Our methods have other limitations that should be considered. First, we limited our cohort to adults who are ages 40–75 years; the algorithm may perform differently in other age groups. Second, it is likely that we still included people with prior cancer, in that we depend on the patients to report prior cancer to their medical providers, and for those providers to record the information using a diagnosis code. We did not review medical records prior to 1998 (available in paper charts). Although unlikely, it is possible that information about cancer history would only be recorded prior to 1998 and not captured in EMRs spanning the next 15 years. In such instances, we would have erroneously classified people with a history of cancer as “cancer free.” People who have little interaction with the medical system, or who are newer members of the health plan, are more likely to have an incomplete medical history. Given that the average length of KPCO membership in this population was 8.75 years, the magnitude of this error is likely small.

We have not validated this algorithm in other data systems; however, this algorithm and these methods should be generalizable to other health plans with EMR systems. We used codes such as ICD-9 that are readily available in other systems (Table 1). KPCO has a well-developed and validated tumor registry; organizations without a tumor registry, or with a less complete registry, could find this algorithm useful to identify all cancer cases (not just prior cancers) for research purposes. This algorithm is useful for identifying a cancer-free study population that may be desirable for any number of research questions, for example, to study conditions such as heart failure that may be more common among those with a history of cancer. Our algorithm may not perform as well in other EMR systems where the data are not as complete as those at KPCO. The KPCO EMR system dates back to 1998, and the accuracy of our data has been validated in previous studies.[16-19] This algorithm, like other algorithms, depends on complete and accurate data; additional validation may be required when applying it to EMR data that are not as well developed as the KPCO EMR.

Conclusion

We developed an algorithm with high sensitivity (100 percent) and specificity (84.6 percent) to identify patients with a history of cancer using administrative data routinely captured in an EMR. It is not sufficient to rely on the tumor registry alone to capture those with a history of cancer. Casting a wide net will help ensure that anyone with a history of cancer, whether diagnosed recently or prior to joining the health plan, will be identified. This algorithm could be applied to other health plans with similar data coding systems.

Acknowledgments

This work was funded by the National Cancer Institute (Contracts HHSN261201400644P and HHSN261201300460P). We thank LeeAnn Rohm, MSW and Kate Burniece, BS for completing the chart abstraction and Erica Blum-Barnett, MS for manuscript preparation.

References

1. Thoburn KK, German RR, Lewis M, et al. Case completeness and data accuracy in the Centers for Disease Control and Prevention’s National Program of Cancer Registries. Cancer. 2007;109(8):1607-1616.
2. Bowles EJA, Feigelson HS, Barney T, et al. Improving quality of breast cancer surgery through development of a national breast cancer surgical outcomes (BRCASO) research database. BMC Cancer. 2012;12:136.
3. Wagner EH, Greene SM, Hart G, et al. Building a research consortium of large health systems: the Cancer Research Network. J Natl Cancer Inst Monogr. 2005;(35):3-11.
4. Kahn MG, Raebel MA, Glanz JM, et al. A Pragmatic Framework for Single-site and Multisite Data Quality Assessment in Electronic Health Record-based Clinical Research. Med Care. 2012;50(Suppl):S21-S29.
5. Platt R, Davis R, Finkelstein J, et al. Multicenter epidemiologic and health services research on therapeutics in the HMO Research Network Center for Education and Research on Therapeutics. Pharmacoepidemiol Drug Saf. 2001;10:373-377.
6. Go AS, Magid DJ, Wells B, et al. The Cardiovascular Research Network: a new paradigm for cardiovascular quality and outcomes research. Circ Cardiovasc Qual Outcomes. 2008;1:138-147.
7. Baggs J, Gee J, Lewis E, et al. The Vaccine Safety Datalink: a model for monitoring immunization safety. Pediatrics. 2011;127 Suppl 1:S45-S53.
8. Ross TR, Ng D, Brown JS, et al. The HMO Research Network Virtual Data Warehouse: A public data model to support collaboration. EGEMS (Wash DC). 2014 Mar 24;2(1):1049. doi:10.13063/2327-9214.1049. eCollection 2014.
9. Ritzwoller DP, Carroll N, Delate T, et al. Validation of electronic data on chemotherapy and hormone therapy use in HMOs. Med Care. 2013;51:e67-e73.
10. Hornbrook MC, Hart G, Ellis JL, et al. Building a virtual cancer research organization. J Natl Cancer Inst Monogr. 2005:12-25.
11. Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. Third Edition. New York: John Wiley & Sons; 2003.
12. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in Medicine. 1998;17(8):857-872.
13. Gordis L. Epidemiology. Fourth Edition. Philadelphia: Elsevier Saunders; 2009.
14. Altman DG, Bland JM. Diagnostic tests 2: predictive values. BMJ. 1994;309:102.
15. Bowles EJA, Wellman R, Feigelson HS, et al. Risk of heart failure in breast cancer patients after anthracycline and trastuzumab treatment: a retrospective cohort study. Journal of the National Cancer Institute. 2012;104(17):1293-1305.
16. Delate T, Bowles EJA, Pardee R, et al. Validity of eight integrated healthcare delivery organizations’ administrative clinical data to capture breast cancer chemotherapy exposure. Cancer Epidemiol Biomarkers Prev. 2012;21(4):673-680.
17. Ritzwoller DP, Carroll N, Delate T, et al. Validation of Electronic Data on Chemotherapy and Hormone Therapy Use in HMOs. Med Care. 2013;51(10):e67-e73.
18. Andrade SE, Moore Simas TA, Boudreau D, et al. Validation of algorithms to ascertain clinical conditions and medical procedures used during pregnancy. Pharmacoepidemiol Drug Saf. 2011 Nov;20(11):1168-1176.
19. Bowles EJA, Tuzzio L, Ritzwoller DP, et al. Accuracy and complexities of using automated clinical data for capturing chemotherapy administrations: implications for future research. Med Care. 2009;47:1091-1097.
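Supplementary sketch: the validation statistics reported in the Findings and the Discussion's point about PPV and prevalence can be reproduced with a short script. The 2×2 counts below are not stated in the text; they are back-calculated here from the reported sample sizes and percentages (for example, 38 of 63 flagged charts confirmed gives the reported PPV of 60.3 percent) and should be read as an assumption. The interval function implements the continuity-corrected score interval as given by Newcombe (reference 12); whether it matches the authors' exact computation is also an assumption.

```python
import math


def diagnostic_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, PPV, and NPV from 2x2 counts (chart review as truth)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


def score_interval_cc(successes: int, n: int, z: float = 1.959964) -> tuple:
    """Score (Wilson) interval with continuity correction for a single proportion."""
    p = successes / n
    q = 1.0 - p
    denom = 2.0 * (n + z * z)
    if successes == 0:
        lower = 0.0
    else:
        root = math.sqrt(z * z - 2.0 - 1.0 / n + 4.0 * p * (n * q + 1.0))
        lower = max(0.0, (2.0 * n * p + z * z - 1.0 - z * root) / denom)
    if successes == n:
        upper = 1.0
    else:
        root = math.sqrt(z * z + 2.0 - 1.0 / n + 4.0 * p * (n * q - 1.0))
        upper = min(1.0, (2.0 * n * p + z * z + 1.0 + z * root) / denom)
    return lower, upper


def ppv_from_prevalence(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV implied by sensitivity, specificity, and prevalence (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Back-calculated counts (assumption): chart review 1 with algorithm V1,
# and chart review 2 with algorithm V2.
review1_v1 = diagnostic_metrics(tp=36, fp=33, fn=3, tn=225)
review2_v2 = diagnostic_metrics(tp=38, fp=25, fn=0, tn=137)

for name, m in (("review 1, V1", review1_v1), ("review 2, V2", review2_v2)):
    print(name, {k: round(v, 3) for k, v in m.items()})

# 95 percent interval for the V2 specificity (137 of 162 cancer-free charts not flagged).
print(score_interval_cc(137, 162))

# PPV falls as prevalence falls, even with sensitivity fixed at 100 percent.
for prev in (0.19, 0.10, 0.05):
    print(prev, round(ppv_from_prevalence(1.0, 137 / 162, prev), 3))
```

With the prevalence of the second chart review sample (38 of 200 charts), the Bayes' rule calculation returns the same PPV as the direct count, which illustrates why a PPV measured in an enriched sample would not carry over unchanged to a population with a different prevalence.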

eGEMs , Volume 4 (1) – Apr 13, 2016


Publisher
Pubmed Central
ISSN
2327-9214
eISSN
2327-9214
DOI
10.13063/2327-9214.1209
Publisher site
See Article on Publisher Site

Abstract

Introduction/Objective: The objective of this study was to develop an algorithm to identify Kaiser Permanente Colorado (KPCO) members with a history of cancer. Background: Tumor registries are used with high precision to identify incident cancer, but are not designed to capture prevalent cancer within a population. We sought to identify a cohort of adults with no history of cancer, and thus, we could not rely solely on the tumor registry. Methods: We included all KPCO members between the ages of 40-75 years who were continuously enrolled during 2013 (N=201,787). Data from the tumor registry, chemotherapy files, inpatient and outpatient claims were used to create an algorithm to identify members with a high likelihood of cancer. We validated the algorithm using chart review and calculated sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) for occurrence of cancer. Findings: The final version of the algorithm achieved a sensitivity of 100% and specificity of 84.6% for identifying cancer. If we relied on the tumor registry alone, 47% of those with a history of cancer would have been missed. Discussion: Using the tumor registry alone to identify a cohort of patients with prior cancer is not suffic ient. In the final version of the algorithm, the sensitivity and PPV were improved when a diagnosis code for cancer was required to accompany oncology visits or chemotherapy administration. Conclusion: EMR data can be used effectively in combination with data from the tumor registry to identify health plan members with a history of cancer. Acknowledgements This work was funded by the National Cancer Institute (Contracts HHSN261201400644P and HHSN261201300460P). We thank LeeAnn Rohm, MSW and Kate Burniece, BS for completing the chart abstraction and Erica Blum-Barnett, MS for manuscript preparation. 
Keywords algorithm, electronic health record, cancer Disciplines Databases and Information Systems | Health Information Technology | Oncology | Theory and Algorithms Creative Commons License This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License. This empirical research is available at EDM Forum Community: http://repository.edm-forum.org/egems/vol4/iss1/5 Clarke and Feigelson: Algorithm to Identify History of Cancer Using Electronic Medical Records eGEMs Generating Evidence & Methods to improve patient outcomes Developing an Algorithm to Identify History of Cancer Using Electronic Medical Records Christina L. Clarke, MS; Heather S. Feigelson, PhD, MPH ABSTRACT Introduction/Objective: The objective of this study was to develop an algorithm to identify Kaiser Permanente Colorado (KPCO) members with a history of cancer. Background: Tumor registries are used with high precision to identify incident cancer, but are not designed to capture prevalent cancer within a population. We sought to identify a cohort of adults with no history of cancer; thus, we could not rely solely on the tumor registry. Methods: We included all KPCO members between the ages of 40–75 years who were continuously ROOHGGXULQJ 1 outpatient claims were used to create an algorithm to identify members with a high likelihood of cancer. H H : value (PPV) and negative predictive value (NPV) for occurrence of cancer. Findings:Y  F percent for identifying cancer. If we relied on the tumor registry alone, 47 percent of those with a history of cancer would have been missed. Discussion: Using the tumor registry alone to identify a cohort of patients with prior cancer is not YR code for cancer was required to accompany oncology visits or chemotherapy administration. Conclusion: Electronic medical record (EMR) data can be used effectively in combination with data from the tumor registry to identify health plan members with a history of cancer. 
Kaiser Permanente Institute for Health Research

Published by EDM Forum Community, 2016. eGEMs (Generating Evidence & Methods to improve patient outcomes), Vol. 4 [2016], Iss. 1, Art. 5. DOI: 10.13063/2327-9214.1209

Introduction and Objectives

Tumor registries are considered the "gold standard" for identifying incident cancer; that is, identifying new cancer cases within a defined population within a defined period. The case finding procedures used by tumor registrars do not focus on the capture of prevalent cancer, or on indications of a history of cancer. We are interested in developing a prospective cohort to study the development of incident cancer over time. For such studies, it is important to begin with a population with no prior cancer history. In order to identify all cancers, the tumor registry may not be the only data source. The objective of this study was to develop an algorithm that used data available from the electronic medical record (EMR) that could identify individuals with a history of cancer, which included data sources to supplement the tumor registry.

Background

Health systems that capture clinical data in an EMR system find many ways to use these data to improve health care and answer important scientific questions.[2-7] Kaiser Permanente Colorado (KPCO) is an integrated health plan with EMR data collected over several decades, including information such as medication use, medical conditions, laboratory test results, disease onset, and subsequent treatment. When paired with a tumor registry, the EMR can be a powerful tool for conducting studies of cancer incidence and prognosis; however, EMR data are not without limitations. In particular, events occurring outside of the health plan are not well captured in most EMRs.

KPCO maintains a tumor registry dating back to 1989, and it is used reliably to identify incident cancer diagnosed and treated within KPCO.[1,2] Cancers diagnosed and treated outside of KPCO, usually prior to KPCO membership, are not followed in the tumor registry. Recently diagnosed cases may also be missed, because the tumor registry has a lag time of about 12 months. We sought to identify a cohort of adults from our current KPCO member population who had never been diagnosed with cancer. To accomplish this, we used our electronic data systems to develop an algorithm to identify individuals with a history of cancer, and validated the algorithm using manual chart reviews.

Methods

KPCO maintains an EMR for each of its members. Data from the EMR are collected into a virtual data warehouse (VDW), which contains content areas such as pharmacy (including chemotherapy), inpatient and outpatient claims, enrollment, and patient demographics.[8-10] We included all KPCO members between the ages of 40–75 years who were continuously enrolled during 2013. We used administrative and EMR data including the tumor registry, chemotherapy files, and inpatient and outpatient claims to identify those with a high likelihood of prior cancer. This project was reviewed and approved by the KPCO institutional review board (IRB). The requirement for informed consent was waived.

Our goal was to identify individuals with any prior cancer, with the exception of nonmelanoma skin cancer. Thus, for the initial algorithm, we cast a wide net to capture any possible incidence or history of cancer. Patients who ever had a behavior code indicating an in situ or invasive cancer from the tumor registry were flagged as having a history of cancer. We flagged anyone who ever had an inpatient or outpatient claim with an International Classification of Diseases, Ninth Revision (ICD-9) code indicating cancer. We also flagged any patient who had at least three visits in the oncology department on separate days, or at least two records on separate days of receiving a chemotherapeutic drug (specific codes are provided in Table 1), dating back to the beginning of our EMR in 1998.

Table 1. ICD-9 Codes Used to Flag Patients with a History or Incidence of Cancer

Source: Tumor registry
Version of algorithm: 1 and 2
Code type: Behavior
Code/logic: In situ or invasive

Source: Inpatient and outpatient claims
Version of algorithm: 1 and 2
Code type: ICD-9
Code/logic: Any code between 140 and 239 or V10.x, excluding: 173.x, 199.1, 209.4x, 209.5x, 209.6x, 210.x-229.x, 232.x, 233.1, 238.2, 238.4, 238.7x, 238.9, 239, 239.1-239.5, 239.8-239.9 or V10.83

Source: Inpatient and outpatient claims to oncology
Version of algorithm: 1 only
Code type: Encounters
Code/logic: ≥3 visits on separate days to Oncology

Source: Inpatient and outpatient claims indicating chemotherapy was given at least twice
Version of algorithm: 1 only
Code type and code/logic:
ICD-9: 17.70, 99.25, 99.28, V58.11, V07.3, V07.39
DRG: 410, 492 if between years 1998 and 2007, or 837-839, 846-848 if between years 2008 and 2013
HCPCS: A9600, A9604, C1086, C1166, C1167, C1178, C8953, C8954, C8955, C9004, C9012, C9110, C9205, C9207, C9213, C9214, C9215, C9235, C9257, C9262, C9265, C9414, C9415, C9417, C9418, C9419, C9420, C9421, C9422, C9423, C9424, C9425, C9426, C9427, C9429, C9431, C9432, C9433, C9434, C9437, C9438, C9440, G0355, G0357, G0358, G0359, G0360, G0361, G0362, G8372, G8373, G8374, G9021, G9022, G9023, G9024, G9025, G9026, G9027, G9028, G9029, G9030, G9031, G9032, J0490, J0594, J0894, J1094, J1100, J1190, J1457, J2323, J3262, J7150, J7527, J8510, J8520, J8521, J8530, J8540, J8560, J8561, J8562, J8565, J8600, J8610, J8700, J8705, J8999, J9000, J9001, J9002, J9010, J9015, J9017, J9019, J9020, J9025, J9027, J9033, J9035, J9040, J9041, J9042, J9045, J9050, J9055, J9060, J9062, J9065, J9070, J9080, J9090, J9091, J9092, J9093, J9094, J9095, J9096, J9097, J9098, J9100, J9110, J9120, J9130, J9140, J9150, J9151, J9165, J9170, J9171, J9178, J9180, J9181, J9182, J9185, J9190, J9200, J9201, J9206, J9207, J9208, J9211, J9230, J9245, J9250, J9260, J9261, J9263, J9264, J9265, J9266, J9268, J9270, J9280, J9290, J9291, J9293, J9300, J9302, J9303, J9305, J9307, J9310, J9315, J9320, J9328, J9330, J9340, J9350, J9351, J9355, J9357, J9360, J9370, J9375, J9380, J9390, J9600, J9999, Q0083, Q0084, Q0085, Q2017, Q2024, Q2049, S0087, S0088, S0115, S0116, S0172, S0176, S0178, S0182, S5019, S5020, S9329, S9330, S9331
CPT-4: 0519F, 36640, 4180F, 61517, 96400, 96401, 96402, 96405, 96406, 96408, 96409, 96410, 96411, 96412, 96413, 96414, 96415, 96416, 96417, 96420, 96422, 96423, 96425, 96440, 96445, 96446, 96450, 96542, 96545, 96549, C9287, J9043, J9179
Revenue Codes: 331, 332, 335

To test the algorithm, we conducted manual chart reviews, powered on specificity.[11-12] We randomly selected 297 charts from KPCO members who had any utilization in 2013. Of these, 69 patients were identified by the algorithm as having a history of cancer, and 228 patients were classified by the algorithm as cancer free. Cancer risk increases with age, so for the chart review we oversampled those older than the median age using a ratio of 2:1. We used a ratio of approximately 4:1 of individuals flagged as cancer free to those with cancer. Sampling 228 cancer-free individuals could detect an 80 percent specificity with a 95 percent confidence interval (CI) of 75 percent to 85 percent. Finally, we excluded cases from chart review that were flagged by the tumor registry, as we consider the tumor registry to be a validated source for identifying incident cancer, and our aim was to develop a method to identify cancers not included in the tumor registry. We used PROC SURVEYSELECT (SAS 9.2) to obtain a weighted random sample fitting the above criteria.

Each chart was fully reviewed to find any mention of cancer, or to confirm there was no history of cancer, using all notes in the chart dating back to the beginning of a patient's enrollment, or to the beginning of the EMR in 1998 (up to 15 years of medical utilization and history). The chart reviewer first looked for the exact date of diagnosis for patients who had been flagged with a history of cancer. If a diagnosis for cancer was not found, the reviewer then examined the chart from the administrative diagnosis date in the EMR going forward in time for a mention of cancer. Patients who were not identified as having cancer were fully reviewed from the start of the EMR forward. We did not review medical record data prior to 1998 (available in paper charts), as it is unlikely that information about cancer history would only be recorded prior to 1998 and not captured in EMRs spanning 1998–2013.

We calculated sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) for occurrence of cancer, and used the chart review results to identify specific codes or logic patterns that could improve the performance of the algorithm.[13] CIs were calculated using the efficient-score method corrected for continuity.[11] Based on the results of the first chart review, we refined the algorithm, then conducted a second review of 200 novel charts, using the same sampling criteria specified above, except that we selected those flagged as not having cancer in a ratio of 2:1 (137 no indication of cancer, 63 flagged with a history of cancer), and recalculated the aforementioned statistics. The second chart review of 137 charts for those who were flagged as cancer free could detect an 80 percent specificity with a 95 percent CI of 73 percent to 87 percent.

Findings

A total of 201,787 members met our initial inclusion criteria. The median age of this cohort as of January 1, 2013 was 56 years, 53.8 percent were female, and the average continuous enrollment time including 2013 was 8.75 years (standard deviation = 6.35 years). The initial algorithm flagged 25,824 (12.8 percent) people as having a history of cancer. Table 2 describes the number of people with a history of cancer identified by each "flag" specified in the initial algorithm. As we would expect, a large proportion of cases (n=11,410; 44.2 percent) were included in the tumor registry. Another 8,906 (34.5 percent) were identified with only a diagnosis code of cancer; 2,348 (9.1 percent) had only a receipt of chemotherapy; and having at least three visits to oncology accounted for 1,570 (6.1 percent) cases. The remaining 1,590 (6.2 percent) cases were a combination of the aforementioned categories.

Table 2.
Results from Initial Algorithm Indicating How Patients Were Flagged as Having Cancer

INCLUSION CRITERIA* | NUMBER OF SUBJECTS | % NOT IN TUMOR REGISTRY/IN TUMOR REGISTRY | % OF TOTAL
NOT IN TUMOR REGISTRY (N=14,414 CASES)
Chemotherapy only | 2,348 | 16.3% | 9.1%
Diagnosis only | 8,906 | 61.8% | 34.5%
Oncology visits only | 1,570 | 10.9% | 6.1%
Diagnosis and Chemotherapy | 262 | 1.8% | 1.0%
Oncology and Chemotherapy | 324 | 2.2% | 1.3%
Oncology and Diagnosis | 773 | 5.4% | 3.0%
Oncology, Diagnosis and Chemotherapy | 231 | 1.6% | 0.9%
TUMOR REGISTRY (N=11,410 CASES)
Chemotherapy only | 5 | 0.0% | 0.0%
Diagnosis only | 5,319 | 46.6% | 20.6%
Oncology visits only | 60 | 0.5% | 0.2%
Diagnosis and Chemotherapy | 306 | 2.7% | 1.2%
Oncology and Chemotherapy | 11 | 0.1% | 0.0%
Oncology and Diagnosis | 2,587 | 22.7% | 10.0%
Oncology, Diagnosis and Chemotherapy | 2,910 | 25.5% | 11.3%
Tumor Registry Alone | 212 | 1.9% | 0.8%

*Note: Patients were flagged as having a history of cancer if they were in the tumor registry, had at least one diagnosis of cancer, had three or more visits to oncology, or had two or more records of chemotherapy.

Using manual chart review as the gold standard, the sensitivity, specificity and NPV for a history of cancer from the first iteration of the algorithm (algorithm V1) were all relatively high (92.3 percent, 87.2 percent, and 98.7 percent, respectively); however, the PPV was low (52.2 percent) (Figure 1). Of the 69 patients identified by the algorithm as having cancer, 8 (11.6 percent) were identified by oncology visits only, 10 (14.5 percent) were identified by chemotherapy visits only, and 5 (7.3 percent) had both chemotherapy and oncology visits, but did not have any diagnosis of cancer. Of those 23 patients, only 1 was confirmed to have cancer through chart review. We reviewed these 23 further to determine why these patients had encounters consistent with cancer treatment, but no evidence in the medical record indicated a cancer diagnosis. This review revealed that patients may be seen in oncology several times for suspected tumors, or they may undergo chemotherapy for noncancer-related conditions such as idiopathic thrombocytopenic purpura (ITP).

Figure 1. Performance of Each Algorithm for Identifying Individuals with a History of Cancer

Legend: For each table, the results from the tumor registry (the gold standard) are shown in the columns, and the results from the algorithm are shown in the rows. Sensitivity, specificity, positive predictive value (PPV) and negative predictive value (NPV) and their associated 95 percent confidence intervals are shown for each round of chart review. Confidence intervals were calculated using the efficient score method corrected for continuity. Panel A shows results from chart review 1, algorithm version 1. Panel B shows results from chart review 1, algorithm version 2. Panel C shows results from chart review 2, algorithm version 2.

Based on this first chart review, we revised the algorithm (algorithm V2) to require either (1) diagnosis of cancer, or (2) inclusion in the tumor registry. Visits to oncology or receipt of chemotherapy alone were not sufficient to flag a record as a cancer case. This revision eliminated 4,242 (16.4 percent) of cases originally suspected as having cancer; and 46 of the 297 patients were flagged, in the revision, as having cancer. The resulting specificity and PPV improved (95.7 percent and 76.1 percent, respectively), while the sensitivity and NPV were slightly reduced to 89.7 percent and 98.4 percent, respectively (Figure 1). We then ran the revised algorithm on the full data set and conducted a second manual review on a new sample of charts. This second version of the algorithm had a sensitivity of 100 percent, specificity of 84.6 percent, NPV of 100 percent, and PPV of 60.3 percent on the second chart review, a marked improvement from version one (Figure 1).

Using the second version of the algorithm, most cases (19,278; 89.3 percent) were identified as having cancers prior to 2013. Nearly half of the cancers identified were not in the tumor registry (N=10,172; 47 percent), and of those, 9,241 (91 percent) were identified prior to 2013. The top cancer indications that were not in the tumor registry prior to 2013 were "history of breast cancer," "diagnosis of stage 1 breast cancer," "history of malignant melanoma," "history of prostate cancer," and "diagnosis of prostate cancer," which accounted for 37 percent of diagnoses prior to 2013 not in the tumor registry. Of those with prior cancer, 60 percent were women. Members flagged as having cancer were significantly older than those without; the median age was 64 years and 56 years, respectively (p < 0.0001). Figure 2 shows the age distribution of the cancer and noncancer populations.

Figure 2. Age Distribution of Those with and Without a History of Cancer, as Defined by the Algorithm, of 201,787 KPCO Members Enrolled in 2013 and Ages 40–75 years

[Chart: percentage of members (vertical axis, 0.00% to 5.00%) by age (horizontal axis, 40 to 75), with one series for patients without a history of cancer and one for patients with a history of cancer.]

Note: The distribution of individuals with cancer is skewed to older ages, which is expected.
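The two versions of the flagging rule and the four validation statistics can be sketched in a few lines of Python. This is a minimal illustration, not the authors' SAS implementation: the `Member` field names are hypothetical (they are not VDW column names), and the 2x2 counts are back-calculated here from the reported sample sizes and percentages rather than read from Figure 1, so they should be treated as an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    """Per-member signals; field names are hypothetical, for illustration only."""
    in_tumor_registry: bool    # in situ or invasive behavior code in the registry
    has_cancer_dx_code: bool   # qualifying ICD-9 diagnosis code on any claim
    oncology_visit_days: int   # distinct days with an oncology visit
    chemo_days: int            # distinct days with a chemotherapy record


def flag_v1(m: Member) -> bool:
    """Initial 'wide net' rule: any one of the four signals flags the member."""
    return (m.in_tumor_registry or m.has_cancer_dx_code
            or m.oncology_visit_days >= 3 or m.chemo_days >= 2)


def flag_v2(m: Member) -> bool:
    """Revised rule: oncology visits or chemotherapy alone no longer suffice."""
    return m.in_tumor_registry or m.has_cancer_dx_code


@dataclass(frozen=True)
class Confusion:
    """2x2 table with chart review as the reference standard."""
    tp: int
    fp: int
    fn: int
    tn: int


def metrics(c: Confusion) -> dict:
    """Sensitivity, specificity, PPV and NPV as proportions."""
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "npv": c.tn / (c.tn + c.fn),
    }


def ppv_from_prevalence(sens: float, spec: float, prev: float) -> float:
    """PPV implied by sensitivity, specificity and prevalence (Bayes' rule)."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


# ASSUMPTION: counts back-calculated from the reported percentages and the
# numbers of flagged/unflagged charts; they are not taken from Figure 1.
REVIEW1_V1 = Confusion(tp=36, fp=33, fn=3, tn=225)   # 69 flagged, 228 not
REVIEW1_V2 = Confusion(tp=35, fp=11, fn=4, tn=247)   # 46 flagged, 251 not
REVIEW2_V2 = Confusion(tp=38, fp=25, fn=0, tn=137)   # 63 flagged, 137 not

for name, table in [("review 1, V1", REVIEW1_V1),
                    ("review 1, V2", REVIEW1_V2),
                    ("review 2, V2", REVIEW2_V2)]:
    print(name, {k: round(100 * v, 1) for k, v in metrics(table).items()})
```

A member with chemotherapy records but no cancer diagnosis (for example, infusions for ITP) is flagged by `flag_v1` but not by `flag_v2`, which is the change that raised the PPV. `ppv_from_prevalence` makes explicit why the PPV stays modest when specificity is below 100 percent and most sampled charts are cancer free.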
Discussion

We developed an algorithm with high sensitivity (100 percent) and specificity (84.6 percent) to identify patients with a history of cancer using administrative data routinely captured in an EMR. Our final algorithm identified 10.7 percent of all health plan members ages 40–75 years with a history of cancer; nearly half (47 percent) of these cases would have been missed if we relied on the tumor registry alone. Our intent was to create an algorithm to identify anyone with a history of cancer, so that those individuals could be excluded from an analytic cohort, leaving only cancer-free individuals. Because cancer is a rare disease, the remaining cohort of cancer-free individuals would be large, and we were thus willing to accept a somewhat lower specificity.

In the first version of the algorithm, chart review revealed that receipt of chemotherapy or visits to oncology alone contributed to an unacceptably low PPV (52.2 percent). In particular, infused therapies are not always administered for treatment of cancer; we identified patients receiving infusions for ITP or other conditions being mistakenly flagged as cancer cases. In the final version of the algorithm, the sensitivity and PPV were improved when a diagnosis code for cancer was required to accompany oncology visits or chemotherapy administration.

The PPV of this algorithm was modest (60.3 percent), a reflection of both the specificity and the prevalence of cancer in our study population. When the prevalence of a disease is low, the PPV will not be close to 1, even if both sensitivity and specificity are high. In the final version of the algorithm, our false positive rate was 15 percent; thus, in our sample of 21,582 people flagged as having cancer, we excluded 3,331 people who did not have a history of cancer. The remaining sample (n=180,205) was sufficiently large to create a cancer-free cohort. However, for other applications, this false positive rate may be unacceptably high, and one may wish to improve the specificity of this algorithm.

Our methods have other limitations that should be considered. First, we limited our cohort to adults who are ages 40–75 years; the algorithm may perform differently in other age groups. Second, it is likely that we still included people with prior cancer, in that we depend on the patients to report prior cancer to their medical providers, and for those providers to record the information using a diagnosis code. We did not review medical records prior to 1998 (available in paper charts). Although unlikely, it is possible that information about cancer history would only be recorded prior to 1998 and not captured in EMRs spanning the next 15 years. In such instances, we would have erroneously classified people with a history of cancer as "cancer free." People who have little interaction with the medical system, or who are newer members of the health plan, are more likely to have an incomplete medical history. Given that the average length of KPCO membership in this population was 8.75 years, the magnitude of this error is likely small.

We have not validated this algorithm in other data systems; however, this algorithm and these methods should be generalizable to other health plans with EMR systems. We used codes such as ICD-9 that are readily available in other systems (Table 1). KPCO has a well-developed and validated tumor registry; organizations without a tumor registry, or with a less complete registry, could find this algorithm useful to identify all cancer cases (not just prior cancers) for research purposes. This algorithm is useful for identifying a cancer-free study population that may be desirable for any number of research questions, for example, to study conditions such as heart failure that may be more common among those with a history of cancer. Our algorithm may not perform as well in other EMR systems where the data are not as complete as those at KPCO. The KPCO EMR system dates back to 1998, and the accuracy of our data has been validated in previous studies.[16-19] This algorithm, like other algorithms, depends on complete and accurate data; additional validation may be required when applying it to EMR data that are not as well developed as the KPCO EMR.

Conclusion

We developed an algorithm with high sensitivity (100 percent) and specificity (84.6 percent) to identify patients with a history of cancer using administrative data routinely captured in an EMR. It is not sufficient to rely on the tumor registry alone to capture those with a history of cancer. Casting a wide net will help ensure that anyone with a history of cancer, whether diagnosed recently or prior to joining the health plan, will be identified. This algorithm could be applied to other health plans with similar data coding systems.

Acknowledgments

This work was funded by the National Cancer Institute (Contracts HHSN261201400644P and HHSN261201300460P). We thank LeeAnn Rohm, MSW and Kate Burniece, BS for completing the chart abstraction and Erica Blum-Barnett, MS for manuscript preparation.

References

1. Thoburn KK, German RR, Lewis M, et al. Case completeness and data accuracy in the Centers for Disease Control and Prevention's National Program of Cancer Registries. Cancer. 2007;109(8):1607-1616.
2. Bowles EJA, Feigelson HS, Barney T, et al. Improving quality of breast cancer surgery through development of a national breast cancer surgical outcomes (BRCASO) research database. BMC Cancer. 2012;12:136.
3. Wagner EH, Greene SM, Hart G, et al. Building a research consortium of large health systems: the Cancer Research Network. J Natl Cancer Inst Monogr. 2005;(35):3-11.
4. Kahn MG, Raebel MA, Glanz JM, et al. A Pragmatic Framework for Single-site and Multisite Data Quality Assessment in Electronic Health Record-based Clinical Research. Med Care. 2012;50(Suppl):S21-S29.
5. Platt R, Davis R, Finkelstein J, et al. Multicenter epidemiologic and health services research on therapeutics in the HMO Research Network Center for Education and Research on Therapeutics. Pharmacoepidemiol Drug Saf. 2001;10:373-377.
6. Go AS, Magid DJ, Wells B, et al. The Cardiovascular Research Network: a new paradigm for cardiovascular quality and outcomes research. Circ Cardiovasc Qual Outcomes. 2008;1:138-147.
7. Baggs J, Gee J, Lewis E, et al. The Vaccine Safety Datalink: a model for monitoring immunization safety. Pediatrics. 2011;127 Suppl 1:S45-S53.
8. Ross TR, Ng D, Brown JS, et al. The HMO Research Network Virtual Data Warehouse: A public data model to support collaboration. EGEMS (Wash DC). 2014 Mar 24;2(1):1049. doi:10.13063/2327-9214.1049. eCollection 2014.
9. Ritzwoller DP, Carroll N, Delate T, et al. Validation of electronic data on chemotherapy and hormone therapy use in HMOs. Med Care. 2013;51:e67-e73.
10. Hornbrook MC, Hart G, Ellis JL, et al. Building a virtual cancer research organization. J Natl Cancer Inst Monogr. 2005:12-25.
11. Fleiss JL, Levin B, Paik MC. Statistical Methods for Rates and Proportions. Third Edition. New York: John Wiley & Sons, 2003.
12. Newcombe RG. Two-sided confidence intervals for the single proportion: comparison of seven methods. Statistics in Medicine. 1998;17(8):857-872.
13. Gordis L. Epidemiology. Fourth Edition. Philadelphia: Elsevier Saunders, 2009.
14. Altman DG, Bland JM. Diagnostic tests 2: predictive values. BMJ. 1994;309:102.
15. Bowles EJA, Wellman R, Feigelson HS, et al. Risk of heart failure in breast cancer patients after anthracycline and trastuzumab treatment: a retrospective cohort study. Journal of the National Cancer Institute. 2012;104(17):1293-1305.
16. Delate T, Bowles EJA, Pardee R, et al. Validity of eight integrated healthcare delivery organizations' administrative clinical data to capture breast cancer chemotherapy exposure. Cancer Epidemiol Biomarkers Prev. 2012;21(4):673-80.
17. Ritzwoller DP, Carroll N, Delate T, et al. Validation of Electronic Data on Chemotherapy and Hormone Therapy Use in HMOs. Med Care. 2013;51(10):e67-73.
18. Andrade SE, Moore Simas TA, Boudreau D, et al. Validation of algorithms to ascertain clinical conditions and medical procedures used during pregnancy. Pharmacoepidemiol Drug Saf. 2011 Nov;20(11):1168-76.
19. Bowles EJA, Tuzzio L, Ritzwoller DP, et al. Accuracy and complexities of using automated clinical data for capturing chemotherapy administrations: implications for future research. Med Care. 2009;47:1091-1097.

Journal

eGEMs, PubMed Central

Published: Apr 13, 2016
