Risk assessment tools for predicting surgical outcomes of patients who undergo elective abdominal aortic aneurysm repair
Evidence review H
NICE Guideline, No. 156
Review question
What is the accuracy of available risk assessment tools in predicting poor and good surgical outcomes in people with unruptured abdominal aortic aneurysms?
Introduction
Various multifactorial risk models have been developed that aim to facilitate decision making before abdominal aortic aneurysm (AAA) repair; however, there is no consensus as to which tools should be used and when they should be used. This review question aims to determine which assessment tools are accurate in predicting surgical outcomes after elective AAA repair and might therefore inform patients in their decision to undergo surgery for an unruptured AAA.
PICO table
Methods and process
This evidence review was developed using the methods and process described in Developing NICE guidelines: the manual. Methods specific to this review question are described in the review protocol in Appendix A.
Declarations of interest were recorded according to NICE’s 2014 conflicts of interest policy.
A single broad search was used to identify all studies that examine the diagnosis, surveillance or monitoring of AAAs. This was a ‘bulk’ search that covered multiple review questions. The database was sifted to identify all studies that met the criteria detailed in Table 1. The relevant review protocol can be found in Appendix A.
Table 1
Inclusion criteria.
Cohort studies in which multivariate models were used to assess the accuracy of risk assessment tools (risk prediction models) for predicting peri- and postoperative outcomes of patients undergoing EVAR or open repair procedures were considered for inclusion. Prospective and retrospective cohort studies with sample sizes greater than 500 participants were included.
The included studies all reported the area under the curve (AUC) of receiver operating characteristic (ROC) curves for each model. A ROC curve plots the sensitivity of a model against 1 minus its specificity (the false-positive rate) across the full range of possible threshold scores. Accuracy, in terms of being able to discriminate between cases and non-cases, is then measured by the AUC. The committee interpreted AUCs in accordance with thresholds suggested by Hosmer and Lemeshow (2000). An AUC of 1 represents a perfect prediction; an area less than 0.6 represents a worthless prediction (equivalent to ‘chance’). An AUC value between 0.6 and 0.69 indicates poor model discrimination. Values of 0.7 to 0.79 indicate acceptable model discrimination; values of 0.8 to 0.89 indicate excellent discrimination, and values greater than 0.9 indicate outstanding discrimination.
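As an illustration only, the interpretation bands described above can be written as a small Python sketch. The function names `auc_from_scores` and `classify_auc` are hypothetical and do not come from the review or the included studies. The sketch assumes AUCs are reported to 2 decimal places, treats a value of exactly 0.9 as ‘outstanding’ (the text above does not assign 0.9 to a band), and computes the AUC as the proportion of case/non-case pairs in which the case receives the higher risk score, which is one standard way of obtaining the area under a ROC curve.

```python
def auc_from_scores(scores, outcomes):
    """Estimate the AUC as the share of case/non-case pairs ranked correctly.

    scores:   risk scores produced by a model (higher = higher predicted risk)
    outcomes: 1 for a case (event occurred), 0 for a non-case
    Tied pairs count as half a correct ranking.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


def classify_auc(auc):
    """Map an AUC to the descriptive band used in this review.

    Assumption: exactly 0.9 is treated as 'outstanding'.
    """
    if not 0.0 <= auc <= 1.0:
        raise ValueError("AUC must lie between 0 and 1")
    if auc < 0.6:
        return "worthless"
    if auc < 0.7:
        return "poor"
    if auc < 0.8:
        return "acceptable"
    if auc < 0.9:
        return "excellent"
    return "outstanding"


if __name__ == "__main__":
    # Hypothetical risk scores for 4 patients, 2 of whom had the event.
    auc = auc_from_scores([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
    print(auc, classify_auc(auc))
```

In this made-up example, 3 of the 4 case/non-case pairs are ranked correctly, so the AUC is 0.75 and falls in the ‘acceptable’ band.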
It was not appropriate to pool AUCs from identified studies due to dissimilar definitions of outcome, factors, and mix of confounders between studies. Where a model was examined in two or more studies, we have reported the individual AUC with 95% CIs reported by each study, and a summary median and range of AUCs for the study sample. Where a model was examined in a single study we have reported the AUC with the reported 95% CIs.
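The summary described above (a median and range rather than a pooled estimate) can be sketched in Python as follows. The function name `summarise_aucs` and the AUC values in the example are hypothetical and are not taken from the included studies.

```python
from statistics import median


def summarise_aucs(aucs):
    """Summarise AUCs reported by several studies of the same model.

    Returns the median and the range (minimum, maximum); no pooling
    or weighting by study size is attempted.
    """
    if not aucs:
        raise ValueError("at least one AUC is required")
    return {
        "n_studies": len(aucs),
        "median": median(aucs),
        "min": min(aucs),
        "max": max(aucs),
    }


if __name__ == "__main__":
    # Hypothetical AUCs for one model evaluated in three studies.
    print(summarise_aucs([0.64, 0.70, 0.75]))
```

A median and range make no assumption that the studies estimate a common underlying AUC, which suits a situation where outcome definitions and confounders differ between studies.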
Studies were excluded if they:
- were case-control or cross-sectional studies
- were not in English
- were not full reports of the study (for example, published only as an abstract)
- were not peer-reviewed.
Clinical evidence
Included studies
From an initial database of 16,274 abstracts, 66 were identified as being potentially relevant. Following full-text review of these articles, 10 studies were included. These included 4 prospective cohort studies and 6 retrospective cohort studies.
An update literature search was performed and provided by Cochrane in December 2017. The search found a total of 2,180 abstracts, of which 5 full manuscripts were ordered. Upon review of the full manuscripts, none of the studies met the inclusion criteria for this review question.
Excluded studies
The list of papers excluded at full-text review, with reasons, is given in Appendix G.
Summary of clinical studies included in the evidence review
Table 2
Included studies.
See Appendix D for full evidence tables.
Quality assessment of clinical studies included in the evidence review
The GRADE working group has not published criteria for assessing imprecision in relation to AUC statistics. For the current review, the AUC classification categories referred to above were used. Minimal important difference (MID) levels of 0.7 and 0.8 were chosen for the assessment of imprecision, to be applied to the range of AUCs reported across contributing studies (or to the 95% confidence interval where a model was evaluated by a single study). When evidence on the prognostic utility of a risk assessment tool was obtained from a single study, the evidence was downgraded one level if the 95% CI around an AUC crossed one MID (0.7 or 0.8), or two levels if the 95% CI crossed both MIDs. When evidence on the prognostic utility of a risk assessment tool was obtained from more than one study, the evidence was downgraded one level if the AUC range crossed one MID (0.7 or 0.8), or two levels if the AUC range crossed both MIDs.
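The downgrading rule for imprecision described above can be expressed as a short Python sketch. The function name `imprecision_downgrade` is hypothetical. The sketch assumes that an interval ‘crosses’ an MID when the MID lies strictly between its lower and upper bounds; the review does not state how an interval that merely touches an MID should be handled.

```python
MIDS = (0.7, 0.8)  # minimal important differences chosen for this review


def imprecision_downgrade(lower, upper, mids=MIDS):
    """Return the number of levels (0, 1 or 2) to downgrade for imprecision.

    lower, upper: bounds of the 95% CI (single study) or of the range of
                  AUCs (two or more studies).
    Assumption: an MID is 'crossed' only if it lies strictly inside the
    interval.
    """
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return sum(1 for mid in mids if lower < mid < upper)


if __name__ == "__main__":
    print(imprecision_downgrade(0.56, 0.63))  # crosses neither MID
    print(imprecision_downgrade(0.65, 0.75))  # crosses 0.7 only
    print(imprecision_downgrade(0.65, 0.85))  # crosses both 0.7 and 0.8
```

Under these assumptions the three calls give 0, 1 and 2 levels of downgrading respectively.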
See Appendix E for full modified GRADE tables.
Economic evidence
Included studies
A literature search was conducted jointly for all review questions by applying standard health economic filters to a clinical search for AAA. This search returned a total of 5,173 citations. Following review of all titles and abstracts, no studies were identified as being potentially relevant to risk factors associated with AAA expansion or rupture. No full texts were retrieved, and no studies were included as economic evidence.
An update search was conducted in December 2017 to identify any relevant health economic analyses published during guideline development. The search found 814 abstracts, none of which were considered relevant to this review question. As a result, no additional studies were included.
Excluded studies
No studies were retrieved for full-text review.
Economic model
This review question does not lend itself to economic evaluation, and was not prioritised by the committee for economic modelling. As such, no economic model was developed for this review question.
Evidence statements
An area under the curve (AUC) of 1 represents a perfect prediction; an area less than 0.6 represents a worthless prediction (equivalent to ‘chance’). An AUC value between 0.6 and 0.69 indicates poor model discrimination. Values of 0.7 to 0.79 indicate acceptable model discrimination; values of 0.8 to 0.89 indicate excellent discrimination, and values greater than 0.9 indicate outstanding discrimination.
30-day mortality
People undergoing EVAR or open repair
Very low- to low-quality evidence from 4 cohort studies, including up to 8,271 people with unruptured AAA, indicated that the Comorbidity Severity Score (CSS), Glasgow Aneurysm Scale (GAS), modified Leiden score and the Vascular Governance North West (VGNW) risk model had acceptable discriminatory power at predicting 30-day mortality after EVAR or open surgical repair.
People undergoing EVAR
Very low-quality evidence from 1 cohort study, including 862 people with unruptured AAA, indicated that the modified Leiden score had acceptable discriminatory power at predicting 30-day mortality after EVAR. Very low-quality evidence from 2 cohort studies, including up to 6,360 people with unruptured AAA, indicated that the CSS and the GAS had poor discriminatory power at predicting 30-day mortality after EVAR.
People undergoing open repair
Very low-quality evidence from 1 cohort study, including up to 862 people with unruptured AAA, indicated that the CSS and the modified Leiden score had acceptable discriminatory power at predicting 30-day mortality after open surgical repair. Very low-quality evidence from 2 cohort studies, including 2,773 people with unruptured AAA, indicated that the GAS had poor discriminatory power at predicting 30-day mortality after open surgical repair.
In-hospital mortality
People undergoing EVAR or open repair
Moderate-quality evidence from 1 cohort study, including up to 1,124 people with unruptured AAA, indicated that the British Aneurysm Repair (BAR) score had excellent discriminatory power at predicting in-hospital mortality after EVAR or open surgical repair.
Very low- to moderate-quality evidence from 4 cohort studies, including up to 19,140 people with unruptured AAA, indicated that the Medicare tool, Physiological and Operative Severity Score for enUmeration of Mortality (POSSUM tool) and the VGNW risk model had acceptable discriminatory power at predicting in-hospital mortality after EVAR or open surgical repair.
Very low- to moderate-quality evidence from 3 cohort studies, including up to 15,322 people with unruptured AAA, indicated that the GAS, Vascular-POSSUM tool and the Vascular Biochemical and Haematological Outcome Model (VBHOM) had poor discriminatory power at predicting in-hospital mortality after EVAR or open surgical repair.
People undergoing EVAR
Low-quality evidence from 1 cohort study, including up to 1,124 people with unruptured AAA, indicated that the British Aneurysm Repair (BAR) score had acceptable discriminatory power and the Medicare tool had poor discriminatory power at predicting in-hospital mortality after EVAR.
Low-quality evidence from the same study indicated that the VGNW risk model had a discriminatory power no better than chance at predicting in-hospital mortality after EVAR.
People undergoing open repair
Moderate-quality evidence from 1 cohort study, including up to 1,124 people with unruptured AAA, indicated that the British Aneurysm Repair (BAR) score had acceptable discriminatory power while the Medicare tool and the VGNW risk model had poor discriminatory power at predicting in-hospital mortality after open surgical repair.
Mortality after 1 year in people undergoing EVAR or open repair
Very low-quality evidence from 1 retrospective cohort study, including 1,096 patients with unruptured AAA, indicated that the Carlisle calculator had acceptable discriminatory power at predicting mortality at 1 and 2 years. Very low-quality evidence from the same study indicated that the Carlisle calculator had poor discriminatory power at predicting mortality at 3, 4 and 5 years.
Postoperative morbidity
Low-quality evidence from 1 retrospective cohort study, including 1,911 patients with unruptured AAA, indicated that the GAS had poor discriminatory power at predicting cardiac complications (the types of complication were not specified) after open surgical repair. Moderate-quality evidence from the same study indicated that the GAS had poor discriminatory power at predicting severe postoperative complications (including cardiac, cerebrovascular, renal, pulmonary venous, and peripheral arterial complications, as well as sepsis) after open surgical repair.
Length of stay
Moderate-quality evidence from 1 retrospective cohort study, including 1,911 patients with unruptured AAA, indicated that the GAS had poor discriminatory power at predicting prolonged length of stay (longer than 5 days) in intensive care after open surgical repair.
The committee’s discussion of the evidence
Interpreting the evidence
The outcomes that matter most
The committee agreed that the outcomes which matter most are mortality and complications that occur within 30 days of surgery. The committee considered that these outcomes were more important than long-term outcomes because their clinical experience highlighted that patients undergoing AAA surgery are at risk of experiencing more serious complications soon after surgery.
The quality of the evidence
The committee only considered studies where a pre-existing risk assessment tool was tested on a validation cohort. Studies in which risk assessment tools were developed using a derivation cohort and tested on the same cohort were not considered in this review. This was because these types of studies only assessed the internal validity of risk models (the degree to which errors have been minimised within a study). The committee believed that it was more important to evaluate the external validity (the degree to which a study’s findings are generalisable to wider populations and other settings) of risk models as it enabled them to determine the prognostic utility of the tools.
The committee noted that investigators from the majority of included studies collected data from national or international disease registries. It was considered that this type of approach to data collection may have introduced bias to findings due to an inability to accurately record and assess confounding. One study in particular (Giles et al., 2009) was considered to be at high risk of bias because investigators assessed codes from a health insurance provider database to ascertain the presence of risk factors, and subsequently used the data to calculate risk scores.
Benefits and harms
The committee concluded that the majority of assessed risk assessment tools had poor-to-acceptable discriminatory power, as summary estimates of AUCs across included studies ranged from 0.65 to 0.75. They contrasted this with equivalent statistics for other predictive tools, such as QRISK2, which is recommended by NICE for predicting cardiovascular disease (CG181) on the basis of AUCs between 0.77 and 0.84; these would be classified as acceptable-to-excellent discrimination using the rules of thumb adopted here. The committee noted that one study by Grant et al. (2014) suggested that the British Aneurysm Repair (BAR) score had excellent discriminatory power at predicting in-hospital mortality in a heterogeneous group of patients who underwent endovascular or open surgical repair (AUC of 0.83). Upon examination of a treatment-specific subgroup analysis, the BAR score had acceptable discriminatory power at predicting in-hospital mortality in patients who only underwent endovascular repair (AUC of 0.75). The same was observed for patients who only underwent open repair (AUC of 0.70). In light of the variation between the overall and treatment-specific AUCs, the committee had little confidence in the discriminatory power of the BAR score at predicting in-hospital mortality. Overall, the committee considered the AAA tools assessed in this review to have insufficient discriminatory power for predicting postoperative outcomes of patients undergoing elective AAA surgery. There was little confidence about the clinical utility of the assessment tools, as the committee could not see how using tools with AUCs of around 0.70 would lead to appropriate decisions about patient management and prognostic outcomes.
The committee considered that use of risk assessment tools with insufficient discriminatory power could have potentially harmful effects on patient care. This is because such tools could result in the decision to operate on a patient who shouldn’t be operated on, or vice versa. The committee discussed decision-making without the use of risk assessment tools. They noted that most of the clinical data used to derive risk assessment tools are commonly collected and are already available before surgery. They agreed that individual variables (as opposed to risk models) can still be useful for making judgments of an individual’s risk of postoperative morbidity and mortality.
Cost effectiveness and resource use
The committee considered that the recommendations were unlikely to have an impact on costs or resource use within the NHS as risk assessment tools are not routinely used outside the context of research.
Other factors the committee took into account
The committee did not want to preclude development of tools for assessing postoperative outcomes of AAA surgery. Thus, the committee chose to specify individual risk assessment tools that should not be used rather than state that all risk assessment tools should not be used.
The committee decided against making a research recommendation because extensive research into risk assessment tools for AAA surgery has already been performed over recent decades and further research in this area is unlikely to be viewed as a priority.
Appendices
Appendix A. Review protocols
Review protocol for risk assessment tools for predicting surgical outcomes of patients who undergo elective AAA repair
Table
i) Prospective observational studies using multivariate analysis; population >500
ii) Prospective observational studies using smaller populations (>200) will be considered if insufficient evidence is identified
Appendix B. Literature search strategies
Clinical search literature search strategy
Main searches
Bibliographic databases searched for the guideline
- Cumulative Index to Nursing and Allied Health Literature - CINAHL (EBSCO)
- Cochrane Database of Systematic Reviews – CDSR (Wiley)
- Cochrane Central Register of Controlled Trials – CENTRAL (Wiley)
- Database of Abstracts of Reviews of Effects – DARE (Wiley)
- Health Technology Assessment Database – HTA (Wiley)
- EMBASE (Ovid)
- MEDLINE (Ovid)
- MEDLINE Epub Ahead of Print (Ovid)
- MEDLINE In-Process (Ovid)
Identification of evidence for review questions
The searches were conducted between November 2015 and October 2017 for 31 review questions (RQ). In collaboration with Cochrane, the evidence for several review questions was identified by an update of an existing Cochrane review. Review questions in this category are indicated below. Where review questions had a broader scope, supplementary searches were undertaken by NICE.
Searches were re-run in December 2017.
Where appropriate, study design filters (either designed in-house or by McMaster) were used to limit the retrieval to, for example, randomised controlled trials. Details of the study design filters used can be found in section 4.
Search strategy review question 9
Table
Medline Strategy, searched 29th September 2016
Database: 1946 to September Week 3 2016
Health Economics literature search strategy
Sources searched to identify economic evaluations
- NHS Economic Evaluation Database – NHS EED (Wiley) last updated Dec 2014
- Health Technology Assessment Database – HTA (Wiley) last updated Oct 2016
- Embase (Ovid)
- MEDLINE (Ovid)
- MEDLINE In-Process (Ovid)
Search filters to retrieve economic evaluations and quality of life papers were appended to the population and intervention terms to identify relevant evidence. Searches were not undertaken for qualitative RQs. For social care topic questions additional terms were added. Searches were re-run in September 2017 where the filters were added to the population terms.
Health economics search strategy
Appendix D. Clinical evidence tables
Appendix E. GRADE tables
An area under the curve (AUC) of 1 represents a perfect prediction; an area less than 0.6 represents a worthless prediction (equivalent to ‘chance’). An AUC value between 0.6 and 0.69 indicates poor model discrimination. Values of 0.7 to 0.79 indicate acceptable model discrimination; values of 0.8 to 0.89 indicate excellent discrimination, and values greater than 0.9 indicate outstanding discrimination.
30-day mortality
Table
0.69 (Not reported) 0.74 (Not reported)
In-hospital mortality
Table
0.60 (0.56, 0.63) 0.69 (Not reported)
Mortality after 1 year
Postoperative morbidity
Length of stay
Appendix G. Excluded studies
Clinical studies
Economic studies
No full text papers were retrieved. All studies were excluded at review of titles and abstracts.
Appendix H. Glossary
- Abdominal Aortic Aneurysm (AAA)
A localised bulge in the abdominal aorta (the major blood vessel that supplies blood to the lower half of the body including the abdomen, pelvis and lower limbs) caused by weakening of the aortic wall. It is defined as an aortic diameter greater than 3 cm or a diameter more than 50% larger than the normal width of a healthy aorta. The clinical relevance of AAA is that the condition may lead to a life-threatening rupture of the affected artery. Abdominal aortic aneurysms are generally characterised by their shape, size and cause:
- Infrarenal AAA: an aneurysm located in the lower segment of the abdominal aorta below the kidneys.
- Juxtarenal AAA: a type of infrarenal aneurysm that extends to, and sometimes includes, the lower margin of renal artery origins.
- Suprarenal AAA: an aneurysm involving the aorta below the diaphragm and above the renal arteries, involving some or all of the visceral aortic segment and hence the origins of the renal, superior mesenteric, and celiac arteries. It may extend down to the aortic bifurcation.
- Abdominal compartment syndrome
Abdominal compartment syndrome occurs when the pressure within the abdominal cavity increases above 20 mm Hg (intra-abdominal hypertension). In the context of a ruptured AAA this is due to the mass effect of a volume of blood within or behind the abdominal cavity. The increased abdominal pressure reduces blood flow to abdominal organs and impairs pulmonary, cardiovascular, renal, and gastro-intestinal function. This can cause multiple organ dysfunction and eventually lead to death.
- Cardiopulmonary exercise testing
Cardiopulmonary exercise testing (CPET, sometimes also called CPX testing) is a non-invasive approach used to assess how the body performs before and during exercise. During CPET, the patient performs exercise on a stationary bicycle while breathing through a mouthpiece. Each breath is measured to assess the performance of the lungs and cardiovascular system. A heart tracing device (electrocardiogram) will also record the heart’s electrical activity before, during and after exercise.
- Device migration
Migration can occur after device implantation when there is any movement or displacement of a stent-graft from its original position relative to the aorta or renal arteries. The risk of migration increases with time and can result in the loss of device fixation. Device migration may not need further treatment but should be monitored as it can lead to complications such as aneurysm rupture or endoleak.
- Endoleak
An endoleak is the persistence of blood flow outside an endovascular stent-graft but within the aneurysm sac in which the graft is placed.
- Type I – Perigraft (at the proximal or distal seal zones): This form of endoleak is caused by blood flowing into the aneurysm because of an incomplete or ineffective seal at either end of an endograft. The blood flow creates pressure within the sac and significantly increases the risk of sac enlargement and rupture. As a result, Type I endoleaks typically require urgent attention.
- Type II – Retrograde or collateral (mesenteric, lumbar, renal accessory): These endoleaks are the most common type of endoleak. They occur when blood bleeds into the sac from small side branches of the aorta. They are generally considered benign because they are usually at low pressure and tend to resolve spontaneously over time without any need for intervention. Treatment of the endoleak is indicated if the aneurysm sac continues to expand.
- Type III – Midgraft (fabric tear, graft dislocation, graft disintegration): These endoleaks occur when blood flows into the aneurysm sac through defects in the endograft (such as graft fractures, misaligned graft joints and holes in the graft fabric). Similarly to Type I endoleak, a Type III endoleak results in systemic blood pressure within the aneurysm sac that increases the risk of rupture. Therefore, Type III endoleaks typically require urgent attention.
- Type IV – Graft porosity: These endoleaks often occur soon after AAA repair and are associated with the porosity of certain graft materials. They are caused by blood flowing through the graft fabric into the aneurysm sac. They do not usually require treatment and tend to resolve within a few days of graft placement.
- Type V – Endotension: A Type V endoleak is a phenomenon in which there is continued sac expansion without radiographic evidence of a leak site. It is a poorly understood abnormality. One theory is that it is caused by pulsation of the graft wall, with transmission of the pulse wave through the aneurysm sac to the native aneurysm wall. Alternatively, it may be due to intermittent leaks which are not apparent at imaging. It can be difficult to identify and treat any cause.
- Endovascular aneurysm repair
Endovascular aneurysm repair (EVAR) is a technique that involves placing a stent-graft prosthesis within an aneurysm. The stent-graft is inserted through a small incision in the femoral artery in the groin, then delivered to the site of the aneurysm using catheters and guidewires and placed in position under X-ray guidance.
- Conventional EVAR refers to placement of an endovascular stent-graft in an AAA where the anatomy of the aneurysm is such that the ‘instructions for use’ of that particular device are adhered to. Instructions for use define tolerances for AAA anatomy that the device manufacturer considers appropriate for that device. Common limitations on AAA anatomy are infrarenal neck length (usually >10mm), diameter (usually ≤30mm) and neck angle relative to the main body of the AAA.
- Complex EVAR refers to a number of endovascular strategies that have been developed to address the challenges of aortic proximal neck fixation associated with complicated aneurysm anatomies like those seen in juxtarenal and suprarenal AAAs. These strategies include using conventional infrarenal aortic stent grafts outside their ‘instructions for use’, using physician-modified endografts, utilisation of customised fenestrated endografts, and employing snorkel or chimney approaches with parallel covered stents.
- Goal directed therapy
Goal directed therapy refers to a method of fluid administration that relies on minimally invasive cardiac output monitoring to tailor fluid administration to a maximal cardiac output or other reliable markers of cardiac function such as stroke volume variation or pulse pressure variation.
- Post processing technique
For the purpose of this review, a post-processing technique refers to a software package that is used to augment imaging obtained from CT scans (which are conventionally presented as axial images) to provide additional 2- or 3-dimensional imaging and data relating to an aneurysm’s size, position and anatomy.
- Permissive hypotension
Permissive hypotension (also known as hypotensive resuscitation and restrictive volume resuscitation) is a method of fluid administration commonly used in people with haemorrhage after trauma. The basic principle of the technique is to maintain haemostasis (the stopping of blood flow) by keeping a person’s blood pressure within a lower than normal range. In theory, a lower blood pressure means that blood loss will be slower, and more easily controlled by the pressure of internal self-tamponade and clot formation.
- Remote ischemic preconditioning
Remote ischemic preconditioning is a procedure that aims to reduce damage (ischaemic injury) that may occur from a restriction in the blood supply to tissues during surgery. The technique aims to trigger the body’s natural protective functions. It is sometimes performed before surgery and involves repeated, temporary cessation of blood flow to a limb to create ischemia (lack of oxygen and glucose) in the tissue. In theory, this “conditioning” activates physiological pathways that render the heart muscle resistant to subsequent prolonged periods of ischaemia.
- Tranexamic acid
Tranexamic acid is an antifibrinolytic agent (medication that promotes blood clotting) that can be used to prevent, stop or reduce unwanted bleeding. It is often used to reduce the need for blood transfusion in adults having surgery, in trauma and in massive obstetric haemorrhage.
Final
Methods, evidence and recommendations
This evidence review was developed by the NICE Guideline Updates Team
Disclaimer: The recommendations in this guideline represent the view of NICE, arrived at after careful consideration of the evidence available. When exercising their judgement, professionals are expected to take this guideline fully into account, alongside the individual needs, preferences and values of their patients or service users. The recommendations in this guideline are not mandatory and the guideline does not override the responsibility of healthcare professionals to make decisions appropriate to the circumstances of the individual patient, in consultation with the patient and/or their carer or guardian.
Local commissioners and/or providers have a responsibility to enable the guideline to be applied when individual health professionals and their patients or service users wish to use it. They should do so in the context of local and national priorities for funding and developing services, and in light of their duties to have due regard to the need to eliminate unlawful discrimination, to advance equality of opportunity and to reduce health inequalities. Nothing in this guideline should be interpreted in a way that would be inconsistent with compliance with those duties.
NICE guidelines cover health and care in England. Decisions on how they apply in other UK countries are made by ministers in the Welsh Government, Scottish Government, and Northern Ireland Executive. All NICE guidance is subject to regular review and may be updated or withdrawn.