Evidence review for diagnostic accuracy of endoscopic surveillance techniques
Evidence review D
NICE Guideline, No. 231
1. Diagnostic accuracy of endoscopic surveillance techniques
1.1. Review question
What is the diagnostic accuracy of different endoscopic surveillance techniques including high resolution endoscopy and chromoendoscopy?
1.1.1. Introduction
Different techniques of endoscopic surveillance are currently used within clinical practice. It is not known how accurate those techniques are in comparison to what is held as the gold standard or reference for endoscopic surveillance (high resolution white light endoscopy).
1.1.2. Methods and process
This evidence review was developed using the methods and process described in Developing NICE guidelines: the manual. Methods specific to this review question are described in the review protocol in appendix A and the methods document.
Declarations of interest were recorded according to NICE’s conflicts of interest policy.
1.1.3. Summary of the protocol
For full details see the review protocol in Appendix A.
Table 1
PICO characteristics of review question.
1.1.4. Diagnostic evidence
1.1.4.1. Included studies
15 diagnostic accuracy studies were included in the review [1–9, 11–16]; these are summarised in Table 2 below. Evidence from these studies is summarised in the clinical evidence summary below in Appendix C, and references are listed in section 1.1.13 References.
The aim of the studies was to assess diagnostic test accuracy in identifying Barrett’s oesophagus with dysplasia or cancer, low grade dysplasia, high grade intraepithelial dysplasia/ neoplasia/ cancer, T1a or T1b neoplasia.
12 studies provided information on the diagnostic accuracy of chromoendoscopy techniques and 1 study provided information on the diagnostic accuracy of endoscopic brushing (brush biopsy). 2 studies provided information on the diagnostic accuracy of artificial intelligence (AI): one looked at convolutional neural networks and one looked at narrow-band imaging + AI and white-light imaging + AI.
No evidence was identified for the diagnostic accuracy of trans-nasal endoscopy.
Meta-analysis was not conducted because where two or more studies examined the diagnostic accuracy of the same index test, they looked at different target conditions (e.g. high grade dysplasia or low-grade dysplasia), or reported location based analysis while other studies reported per patient based analysis. Thus, results from these studies are presented individually on a per-study basis. Where studies provided insufficient information to extract 2×2 table data (true positives, true negatives, false positives, false negatives) this has been highlighted for each study in Table 3 and sensitivity and specificity measures were extracted as reported in the paper. Where confidence intervals were not available to assess imprecision in the effect measures, evidence quality was downgraded by 1 increment. Evidence was downgraded for indirectness where studies included a mixed population of people with and without known Barrett’s oesophagus. Evidence was also downgraded for indirectness where there was a lack of clarity around the quality of endoscopy as a reference standard, or where histology was used as a reference standard with white-light endoscopy results provided separately to those of the index test.
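As an illustration of the 2×2 extraction described above, the following minimal sketch derives sensitivity and specificity from true positive, false positive, false negative and true negative counts. The counts are hypothetical and are not taken from any included study. The Wilson score interval is an assumption chosen for illustration; the interval method used in the review is not stated in this section.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a diagnostic 2x2 table (index test vs reference standard)."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%).

    Assumption: the review does not state its interval method here.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def sensitivity(t: TwoByTwo) -> float:
    """Proportion of reference-positive cases that the index test detects."""
    return t.tp / (t.tp + t.fn)


def specificity(t: TwoByTwo) -> float:
    """Proportion of reference-negative cases that the index test rules out."""
    return t.tn / (t.tn + t.fp)


# Hypothetical counts for illustration only (not from any included study).
example = TwoByTwo(tp=18, fp=12, fn=2, tn=68)
sens, spec = sensitivity(example), specificity(example)
sens_ci = wilson_interval(example.tp, example.tp + example.fn)
spec_ci = wilson_interval(example.tn, example.tn + example.fp)
print(f"sensitivity {sens:.2f} (95% CI {sens_ci[0]:.2f} to {sens_ci[1]:.2f})")
print(f"specificity {spec:.2f} (95% CI {spec_ci[0]:.2f} to {spec_ci[1]:.2f})")
```

When a paper reports only sensitivity and specificity without the underlying counts, an interval of this kind cannot be recomputed, which is consistent with the downgrading for imprecision described above.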
The majority of studies were of cross-sectional design, 5 studies being prospective and 3 studies being retrospective. There were also 5 randomised cross-over studies and 2 prospective randomised controlled trials included in the review.
It was noted that, in the literature, high-resolution white light endoscopy is also referred to as high-definition white-light endoscopy. It has been extracted as reported in the studies, but the terms are used interchangeably within the evidence report, with high-resolution white light endoscopy primarily used in the committee’s discussion of the evidence.
See also the study selection flow chart in Appendix C, sensitivity and specificity forest plots in Appendix E, and study evidence tables in Appendix D.
1.1.4.2. Excluded studies
See the excluded studies list in Appendix G.
1.1.5. Summary of studies included in the diagnostic evidence
Table 2
Summary of studies included in the evidence review.
See Appendix D for full evidence tables.
1.1.6. Summary of the diagnostic evidence
Clinical decision thresholds were set at 0.9 for sensitivity and 0.8 for specificity, above which a test would be recommended, and at 0.6 for sensitivity and 0.5 for specificity, below which a test is of no clinical use.
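The thresholds can be read as two cut-offs per measure. The sketch below places a point estimate relative to them. The function name and category labels are hypothetical, and treating an estimate exactly equal to a threshold as meeting it is an assumption. The sketch looks at point estimates only and ignores evidence quality and imprecision, both of which the committee also weighed.

```python
# Thresholds as stated in the review: (upper, lower) for each measure.
THRESHOLDS = {
    "sensitivity": (0.9, 0.6),
    "specificity": (0.8, 0.5),
}


def classify(measure: str, estimate: float) -> str:
    """Place a point estimate relative to the upper and lower thresholds.

    Assumption: an estimate equal to the upper threshold counts as meeting it.
    """
    if not 0.0 <= estimate <= 1.0:
        raise ValueError("estimate must be between 0 and 1")
    upper, lower = THRESHOLDS[measure]
    if estimate >= upper:
        return "meets upper threshold"
    if estimate < lower:
        return "below lower threshold"
    return "between thresholds"


# 0.92 and 0.99 are the AI sensitivities quoted later in this review;
# 0.55 is a hypothetical value used only to show the lowest category.
for value in (0.92, 0.99, 0.55):
    print(value, classify("sensitivity", value))
```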
Table 3
Clinical evidence summary: diagnostic test accuracy for chromoendoscopy.
Table 4
Clinical evidence summary: diagnostic test accuracy for endoscopic brushing.
Table 5
Clinical evidence summary: diagnostic test accuracy for artificial intelligence.
1.1.7. Economic evidence
1.1.7.1. Included studies
No health economic studies were included.
1.1.7.2. Excluded studies
No relevant health economic studies were excluded due to assessment of limited applicability or methodological limitations.
See also the health economic study selection flow chart in Appendix F.
1.1.8. Summary of included economic evidence
There was no economic evidence found.
1.1.9. Economic model
This area was given medium priority for new cost-effectiveness analysis.
1.1.10. Unit costs
Relevant unit costs are provided below to aid consideration of cost effectiveness.
1.1.11. The committee’s discussion and interpretation of the evidence
1.1.11.1. The outcomes that matter most
The committee considered the diagnostic measures of sensitivity and specificity of the index tests for diagnosing dysplasia and early cancer. Sensitivity was deemed the most important measure in this review, and was prioritised for decision making, because the committee agreed the most important outcome is to diagnose dysplasia, which is associated with a significant risk of progression to cancer. Clinical decision thresholds were set by the committee at 0.9 for sensitivity and 0.8 for specificity, above which a test would be recommended, and at 0.6 for sensitivity and 0.5 for specificity, below which a test is of no clinical use. The committee agreed that the default values of 0.9 and 0.8 that are widely used for decision making across clinical guidelines were also applicable to people with Barrett’s oesophagus and that these were high enough to ensure almost all cases of dysplasia are detected and that the majority of non-cases are correctly identified as such.
1.1.11.2. The quality of the evidence
Chromoendoscopy
12 studies were included for the diagnostic accuracy of chromoendoscopy. 3 studies (1 RCT and 2 observational prospective studies) were for confocal laser endomicroscopy, including outcomes of high-grade neoplasia/dysplasia and carcinoma, intramucosal or oesophageal cancer. One of these studies also examined the diagnostic accuracy of high-resolution white-light endoscopy (with biopsy as the reference standard) separately. One multi-centre RCT looked at the diagnostic accuracy of high-resolution white light endoscopy combined with endoscope-based confocal laser endomicroscopy with targeted biopsies (HDWLE+CLE+TB) to detect neoplasia, reporting on the diagnostic accuracy of high-resolution white light endoscopy alone with random biopsies (HDWLE+RB) separately.
Evidence on autofluorescence for detecting intestinal metaplasia with columnar and goblet cells, and for detecting low- or high-grade dysplasia or cancer, was available from 1 retrospective study. There was evidence from one RCT on the accuracy of autofluorescence imaging-guided probe-based confocal laser endomicroscopy and molecular biomarkers (3-biomarker panel) (AFI-guided pCLE) to detect dysplasia and high-grade dysplasia, with the diagnostic accuracy of high-resolution white-light endoscopy given separately.
Evidence on methylene blue staining was available from 3 studies (1 retrospective, 1 prospective and 1 cross over RCT), and related to the detection of dysplasia or carcinoma, oesophageal cancer or intestinal metaplasia with columnar and goblet cells.
Evidence on narrow-band imaging was available from 4 studies (2 prospective, 1 RCT, 1 cross-over RCT) and related to the detection of high-grade dysplasia and oesophageal cancer or intramucosal cancer, low grade dysplasia or indefinite for dysplasia findings.
There was evidence from one cross-over RCT on endoscopic tri-modal imaging (incorporating high-resolution endoscopy, autofluorescence and narrow-band imaging) for the detection of high-grade dysplasia and early carcinoma, with standard video endoscopy used as the reference standard, and from one cross-over RCT on acetic acid-targeted biopsies for detecting low- or high-grade dysplasia or cancer.
Evidence for sensitivity and specificity for different chromoendoscopy techniques was mostly of low and very low quality. Moderate quality evidence was available for specificity of probe-based confocal laser endomicroscopy in one study, both sensitivity and specificity of narrow-band imaging in one study, and sensitivity of acetic acid-targeted biopsies from one study. High quality evidence from one study was available for both sensitivity and specificity of narrow-band imaging, endoscopic tri-modal imaging, and specificity of acetic acid-targeted biopsies. Evidence was mostly downgraded for indirectness and for imprecision in the effect measures. Indirectness arose where the reference standard was histology or biopsy, with results for the protocol reference standard (white-light imaging) given separately; where the reference standard was ‘standard endoscopy’, the quality of which was not specified; where the population included people with oesophagitis (one study); or where diagnostic accuracy was not limited to detection of dysplasia but also included metaplasia (one study). Evidence was occasionally downgraded for risk of bias, owing to a lack of blinding in the interpretation of each test or a lack of detail about the interpretation of the index test and reference standard results. Overall, evidence for chromoendoscopy techniques was derived from studies including 35 to 192 participants, with results of 2 studies based on 874 to 1190 locations, and with standard endoscopy or biopsy from the white light imaging reported as the reference standard.
Endoscopic brushing
Clinical evidence for the diagnostic accuracy of endoscopic brushing to detect Barrett’s metaplasia, indefinite for dysplasia, dysplasia and inadequate (no Barrett’s oesophagus) findings was available from one prospective study. The evidence was of low quality for sensitivity and very low quality for specificity and was downgraded due to risk of bias and indirectness, with specificity also downgraded for imprecision in the effect measure. The study included 151 people with forceps biopsy used as the reference standard.
Artificial intelligence
Clinical evidence for the diagnostic accuracy of artificial intelligence (AI) was available from 2 retrospective studies. One study looked at the diagnostic accuracy of convolutional neural networks to detect T1a or T1b neoplasia and the other study looked at narrow-band imaging + AI and white-light imaging + AI to detect high-grade dysplasia, both using histology as the reference standard. The quality of the evidence for sensitivity and specificity ranged from very low to low for narrow-band imaging and white-light imaging combined with AI but was moderate for convolutional neural networks. Evidence was downgraded mostly for indirectness (due to AI combined with another technique for analysis of previously captured images, histology being the reference standard and results from white light endoscopy and narrow-band imaging given separately in one study, and AI not being used immediately during endoscopy in the other study) and occasionally for risk of bias and imprecision based on the width of the confidence intervals around the effect estimate. The two studies included 100 and 116 people, with results of the former study corresponding to 458 images obtained from those people.
1.1.11.3. Benefits and harms
Chromoendoscopy
The majority of the evidence for the diagnostic accuracy of different chromoendoscopy techniques suggested that neither sensitivity nor specificity met the clinical thresholds of 0.9 for sensitivity and 0.8 for specificity that the committee had set, above which a test would be recommended. Specificity evidence for probe-based confocal laser endomicroscopy met or exceeded the clinical threshold, but the committee noted that this was not the case for sensitivity, which was prioritised for decision making. Sensitivity and specificity of high-resolution white light endoscopy combined with confocal laser endomicroscopy with targeted biopsies to detect Barrett’s oesophagus neoplasia exceeded clinical thresholds, but the committee noted this was supported by one study and the evidence was of low quality. The committee also noted the limited availability of this equipment within endoscopy services and the need for longer procedural time compared to standard endoscopy. It was also noted that where sensitivity and specificity of narrow-band imaging exceeded the clinical thresholds set for decision making, results were based on only one true positive case and the measure was imprecise. This was also the case for acetic acid-targeted biopsies, where diagnostic accuracy results were based on two true positive and 172 negative cases, resulting in imprecise estimates.
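To illustrate why an estimate resting on one or two true positive cases is imprecise, the sketch below computes the lower limit of an exact (Clopper-Pearson) 95% confidence interval for sensitivity when every diseased case is detected. The function name is hypothetical, the choice of the Clopper-Pearson method is an assumption (the review's interval method is not stated here), and the larger sample sizes are hypothetical values added for contrast.

```python
def exact_lower_limit_all_detected(n_diseased: int, alpha: float = 0.05) -> float:
    """Clopper-Pearson lower limit for sensitivity when all n cases test positive.

    With n successes out of n, the two-sided exact interval runs from
    (alpha / 2) ** (1 / n) up to 1.0, so only the lower limit is returned.
    """
    if n_diseased < 1:
        raise ValueError("need at least one diseased case")
    return (alpha / 2) ** (1 / n_diseased)


# 1 and 2 mirror the true positive counts mentioned in the text, assuming no
# false negatives; 20 and 100 are hypothetical sample sizes for contrast.
for n in (1, 2, 20, 100):
    lower = exact_lower_limit_all_detected(n)
    print(f"{n} of {n} detected: sensitivity 1.00, lower 95% limit {lower:.3f}")
```

Under these assumptions, an observed sensitivity of 1.0 from a single case is compatible with a true sensitivity well below the lower threshold of 0.6, so a point estimate above 0.9 does not by itself show that the threshold is met.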
Sensitivity and specificity of chromoendoscopy with methylene blue staining for detecting Barrett’s oesophagus with oesophageal cancer in one study also exceeded the clinical thresholds set by the committee. However, the committee noted that the evidence for sensitivity was of very low quality and was not supported by sensitivity or specificity evidence for methylene blue staining available from two other studies. The committee noted the diagnostic accuracy of methylene blue staining met clinical thresholds in relation to detecting oesophageal cancer, whereas lower sensitivity and specificity were shown in detecting dysplasia. The committee agreed this was in line with their clinical experience and emphasised that high- and low-grade dysplasia are more difficult to detect than cancer: dysplastic lesions are flat, which makes them easier to miss, whereas cancer is often nodular. Hence image-enhanced techniques are required to detect lesions that may be missed by standard endoscopy.
Endoscopic brushing
Evidence for the diagnostic accuracy of endoscopic brushing showed sensitivity and specificity did not meet the clinical thresholds for decision making. The committee noted the evidence came from a single prospective study and was of low quality.
Artificial Intelligence
Evidence for the diagnostic accuracy of artificial intelligence (AI) showed high sensitivity and specificity for both narrow-band imaging combined with AI and white-light imaging combined with AI, with both exceeding clinical thresholds of 0.9 and 0.8 respectively for detecting high-grade dysplasia. The committee noted that sensitivity of white-light imaging combined with AI was higher than that of narrow-band imaging combined with AI (0.99 and 0.92 respectively), with the effect estimate for narrow-band imaging + AI being imprecise. The committee also noted that AI is currently not fully developed in the field of Barrett’s oesophagus: the algorithms are still in development and are not available for wider use.
Overall
Overall, the committee agreed the current evidence was limited both in quality, with the majority of the evidence graded very low to low, and in quantity, with a limited number of small studies available for each surveillance technique, the characteristics of which did not allow for a meta-analysis of findings. They acknowledged that, on the basis of the evidence available, it was not possible to make a recommendation for any of the newer technologies such as AI, pCLE (which is currently not used outside a research context) and volumetric laser endomicroscopy or endoscopic brushing (both used in the USA but not the UK), and that further research is needed. Therefore, the committee made a research recommendation to assess the utility of image enhanced endoscopy in surveillance of Barrett’s oesophagus, including narrow band imaging, acetic acid and artificial intelligence.
No evidence was identified for trans-nasal endoscopy. The committee agreed, based on their clinical experience, that trans-nasal endoscopy is unlikely to be better than standard endoscopy, given the lower quality of white light imaging and smaller size of biopsy forceps compared to conventional trans-oral endoscopy. They agreed not to make a recommendation for future research on trans-nasal endoscopy.
The committee decided to make a recommendation for surveillance of Barrett’s oesophagus using white light endoscopy with Seattle protocol biopsies based on their clinical experience and in recognition that this reflects the current standard of care for endoscopic surveillance for Barrett’s oesophagus. Seattle protocol biopsies entail 4 biopsies in different oesophageal quadrants taken every 2 centimetres within the Barrett’s oesophagus. Random biopsies are advised as dysplasia is often invisible on white light endoscopy.
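As a rough arithmetic illustration of the biopsy burden this protocol implies, the sketch below counts biopsies for a given segment length. The function name is hypothetical. Counting one biopsy level per 2 cm, rounded up, is an assumption about the counting convention, not clinical guidance, and the segment lengths are hypothetical.

```python
from math import ceil

BIOPSIES_PER_LEVEL = 4   # one per quadrant, as described in the text
LEVEL_SPACING_CM = 2.0   # a biopsy level every 2 cm, as described in the text


def seattle_biopsy_count(segment_length_cm: float) -> int:
    """Estimate the number of biopsies for a Barrett's segment.

    Assumption: the number of levels is the segment length divided by the
    spacing, rounded up, so any segment has at least one level.
    """
    if segment_length_cm <= 0:
        raise ValueError("segment length must be positive")
    levels = ceil(segment_length_cm / LEVEL_SPACING_CM)
    return levels * BIOPSIES_PER_LEVEL


# Hypothetical segment lengths in centimetres.
for length in (2, 5, 10):
    print(f"{length} cm segment: about {seattle_biopsy_count(length)} biopsies")
```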
See also evidence review C on endoscopic surveillance using white light endoscopy.
1.1.11.4. Cost effectiveness and resource use
No published economic evaluations were found. In the absence of suitable clinical evidence, cost-effectiveness modelling was not feasible, since a model would require good evidence of clinical effectiveness.
Standard white light endoscopy for surveillance of Barrett’s oesophagus is commonly available in the NHS. The committee’s decision to continue to recommend its use is unlikely to have an impact on resource use and ensures that patients continue to receive current standard of care. However, it should be noted that uptake of endoscopic surveillance in the NHS is currently sub-optimal and any changes in practice may result in subsequent changes in resource use.
The committee also made a research recommendation to assess the utility of image enhanced endoscopy for surveillance. If such techniques were to be recommended in future, it would be expected to cause a significant increase in resource use because of up-front staff training, an increase in costs associated with the new technologies and an increase in staff time for some procedures such as chromoendoscopy. However, the additional costs may be offset if there were evidence of increased diagnostic accuracy with the new technologies and a reduced need for biopsies.
1.1.12. Recommendations supported by this evidence review
This evidence review supports recommendations 1.3.1 to 1.3.5 and the research recommendation on endoscopic surveillance techniques.
1.1.13. References
1. Anandasabapathy S, Sontag S, Graham DY, Frist S, Bratton J, Harpaz N et al. Computer-assisted brush-biopsy analysis for the detection of dysplasia in a high-risk Barrett’s esophagus surveillance population. Digestive Diseases and Sciences. 2011; 56(3):761–766 [PubMed: 20978843]
2. Bajbouj M, Vieth M, Rösch T, Miehlke S, Becker V, Anders M et al. Probe-based confocal laser endomicroscopy compared with standard four-quadrant biopsy for evaluation of neoplasia in Barrett’s esophagus. Endoscopy. 2010; 42(6):435–440 [PubMed: 20506064]
3. Canto MI, Anandasabapathy S, Brugge W, Falk GW, Dunbar KB, Zhang Z et al. In vivo endomicroscopy improves detection of Barrett’s esophagus-related neoplasia: a multicenter international randomized controlled trial (with video). Gastrointestinal Endoscopy. 2014; 79(2):211–221 [PMC free article: PMC4668117] [PubMed: 24219822]
4. Curvers WL, Alvarez Herrero L, Wallace MB, Wong Kee Song LM, Ragunath K, Wolfsen HC et al. Endoscopic tri-modal imaging is more effective than standard endoscopy in identifying early-stage neoplasia in Barrett’s esophagus. Gastroenterology. 2010; 139(4):1106–1114 [PubMed: 20600033]
5. Ebigbo A, Mendel R, Ruckert T, Schuster L, Probst A, Manzeneder J et al. Endoscopic prediction of submucosal invasion in Barrett’s cancer with the use of artificial intelligence: A pilot study. Endoscopy. 2021; 53(9):878–883 [PubMed: 33197942]
6. Egger K, Werner M, Meining A, Ott R, Allescher HD, Hofler H et al. Biopsy surveillance is still necessary in patients with Barrett’s oesophagus despite new endoscopic imaging techniques. Gut. 2003; 52(1):18–23 [PMC free article: PMC1773515] [PubMed: 12477753]
7. Hashimoto R, Requa J, Dao T, Ninh A, Tran E, Mai D et al. Artificial intelligence using convolutional neural networks for real-time detection of early esophageal neoplasia in Barrett’s esophagus (with video). Gastrointestinal Endoscopy. 2020; 91(6):1264–1271 [PubMed: 31930967]
8. Jayasekera C, Taylor ACF, Desmond PV, MacRae F, Williams R. Added value of narrow band imaging and confocal laser endomicroscopy in detecting Barrett’s esophagus neoplasia. Endoscopy. 2012; 44(12):1089–1095 [PubMed: 23188660]
9. Longcroft-Wheaton G, Fogg C, Chedgy F, Kandiah K, Murray L, Dewey A et al. A feasibility trial of acetic acid-targeted biopsies versus nontargeted quadrantic biopsies during Barrett’s surveillance: the ABBA trial. Endoscopy. 2020; 52(1):29–36 [PubMed: 31618768]
10. National Institute for Health and Care Excellence. Developing NICE guidelines: the manual [updated January 2022]. London. National Institute for Health and Care Excellence, 2014. Available from: http://www.nice.org.uk/article/PMG20/chapter/1%20Introduction%20and%20overview
11. Ormeci N, Savas B, Coban S, Palabiyikoglu M, Ensari A, Kuzu I et al. The usefulness of chromoendoscopy with methylene blue in Barrett’s metaplasia and early esophageal carcinoma. Surgical Endoscopy. 2008; 22(3):693–700 [PubMed: 17704887]
12. Pascarenco OD, Coros MF, Pascarenco G, Boeriu AM, Drasovean SC, Onisor DM et al. A preliminary feasibility study: Narrow-band imaging targeted versus standard white light endoscopy non-targeted biopsies in a surveillance Barrett’s population. Digestive and Liver Disease. 2016; 48(9):1048–1053 [PubMed: 27246796]
13. Ragunath K, Krasner N, Raman VS, Haqqani MT, Cheung WY. A randomized, prospective cross-over trial comparing methylene blue-directed biopsy and conventional random biopsy for detecting intestinal metaplasia and dysplasia in Barrett’s esophagus. Endoscopy. 2003; 35(12):998–1003 [PubMed: 14648410]
14. Sharma P, Hawes RH, Bansal A, Gupta N, Curvers W, Rastogi A et al. Standard endoscopy with random biopsies versus narrow band imaging targeted biopsies in Barrett’s oesophagus: A prospective, international, randomised controlled trial. Gut. 2013; 62(1):15–21 [PubMed: 22315471]
15. Sharma P, Meining AR, Coron E, Lightdale CJ, Wolfsen HC, Bansal A et al. Realtime increased detection of neoplastic tissue in Barrett’s esophagus with probe-based confocal laser endomicroscopy: Final results of an international multicenter, prospective, randomized, controlled trial. Gastrointestinal Endoscopy. 2011; 74(3):465–472 [PMC free article: PMC3629729] [PubMed: 21741642]
16. Vithayathil M, Modolell I, Ortiz-Fernandez-Sordo J, Oukrif D, Pappas A, Januszewicz W et al. Image-enhanced endoscopy and molecular biomarkers vs Seattle protocol to diagnose dysplasia in Barrett’s esophagus. Clinical Gastroenterology and Hepatology. 2022; doi: 10.1016/j.cgh.2022.01.060 [PubMed: 35183768]
Appendices
Appendix A. Review protocols
Appendix B. Literature search strategies
The literature searches for this review are detailed below and complied with the methodology outlined in Developing NICE guidelines: the manual [10].
For more information, please see the Methodology review published as part of the accompanying documents for this guideline.
B.1. Clinical search literature search strategy
B.2. Health Economics literature search strategy
Appendix C. Diagnostic evidence study selection
Appendix D. Diagnostic evidence
Appendix E. Sensitivity and specificity forest plots
E.1. Chromoendoscopy
E.2. Endoscopic brushing
E.3. Artificial intelligence
Appendix F. Economic evidence study selection
Appendix G. Excluded studies
Clinical studies
Table 8
Studies excluded from the clinical review.
Health Economic studies
Published health economic studies that met the inclusion criteria (relevant population, comparators, economic study design, published 2006 or later and not from non-OECD country or USA) but that were excluded following appraisal of applicability and methodological quality are listed below. See the health economic protocol for more details.
None.
Appendix H. Research recommendations
Endoscopic surveillance
What is the diagnostic accuracy of different endoscopic surveillance techniques including high resolution endoscopy and chromoendoscopy for use in adults?
Why this is important
Chromoendoscopy, electronic imaging and, more recently, artificial intelligence have all shown considerable promise in enriched patient populations, but their utility in a surveillance population is unclear. For image enhanced endoscopy based surveillance protocols to be implemented, robust data are needed from high quality, fully powered studies in a low risk Barrett’s surveillance population (i.e. patients who have no history of previous dysplasia or cancer).
Large scale studies in patients undergoing endoscopic surveillance are therefore recommended for assessing clinical and cost effectiveness of image enhanced endoscopy in surveillance of Barrett’s oesophagus. Narrow band imaging, acetic acid and artificial intelligence are considered as most appropriate for clinical trials.
Rationale for research recommendation
Modified PICO table
Final version
Evidence review underpinning recommendations 1.3.1 to 1.3.5 and a research recommendation in the NICE guideline
Disclaimer: The recommendations in this guideline represent the view of NICE, arrived at after careful consideration of the evidence available. When exercising their judgement, professionals are expected to take this guideline fully into account, alongside the individual needs, preferences and values of their patients or service users. The recommendations in this guideline are not mandatory and the guideline does not override the responsibility of healthcare professionals to make decisions appropriate to the circumstances of the individual patient, in consultation with the patient and/or their carer or guardian.
Local commissioners and/or providers have a responsibility to enable the guideline to be applied when individual health professionals and their patients or service users wish to use it. They should do so in the context of local and national priorities for funding and developing services, and in light of their duties to have due regard to the need to eliminate unlawful discrimination, to advance equality of opportunity and to reduce health inequalities. Nothing in this guideline should be interpreted in a way that would be inconsistent with compliance with those duties.
NICE guidelines cover health and care in England. Decisions on how they apply in other UK countries are made by ministers in the Welsh Government, Scottish Government, and Northern Ireland Executive. All NICE guidance is subject to regular review and may be updated or withdrawn.