Physicist Ian Paddick compiles areas of potential confounding factors in study comparing mets patients treated at two centers using different therapy modalities
A recently published paper in the journal Radiotherapy and Oncology (Sebastian NT, et al.) suggests that patients treated with Leksell Gamma Knife® are nearly four times more likely to develop radionecrosis than patients treated with linac-based radiosurgery. In a personal review of the 2,699-lesion study, Ian Paddick, MSc (Consultant Physicist, National Hospital for Neurology and Neurosurgery), analyzes in more detail some of the potential confounding factors that may have led to the investigators’ conclusions.
These include: patients being treated at different hospitals with different patient care protocols; differences in patient baseline characteristics; the apparent disregard of dose in propensity score matching; the number of patients experiencing Grade 3 toxicity and above, as opposed to milder grades (“10 times more Gamma Knife patients had Grade 1 CNS toxicity than linac”); and the patient follow-up period. Here is the review by Ian Paddick:
Linear accelerator-based radiosurgery is associated with lower incidence of radionecrosis compared with gamma knife for treatment of multiple brain metastases. Nikhil T. Sebastian et al.
Radiotherapy and Oncology 147 (2020) 136–143. (An earlier version of this study was presented at ASTRO 2019.)
In this study, patients harboring 2,699 lesions (LINAC: 1,014, Gamma Knife: 1,685) from two separate institutions (Ohio State University and Wake Forest) were retrospectively examined to investigate potential differences in survival and radionecrosis.
The headline statistics in the abstract are that Gamma Knife patients are more likely to develop radionecrosis and that the Hazard Ratio (HR) was 3.83. Loosely, this means that at any given point in follow-up the risk of developing radionecrosis was 3.83 times higher in the Gamma Knife group. This value is associated with a 95% confidence interval of 1.66 to 8.84, i.e. the statisticians can be 95% confident that the true HR lies within this range. The p value is 0.002, which means that, if there were no real difference between the two groups, there would be only a 0.2% chance of the data randomly producing an effect this large.
These statistics are overwhelmingly confident, so is this effect real? The statistics say so, but we have to be careful about bias (an effect that creates statistical differences but can be explained by another reason, e.g. expensive cars tend to be faster, but this is partly because sports cars tend to be more expensive), confounding factors (e.g. Ferraris tend to be red, therefore red cars are faster), and statistical short-cuts (the authors wanting to obtain a p value of <0.05, and making approximations to make this happen).
The statistics used in this study took account of bias and confounding effects by using a “propensity score-matched analysis”, whereby similar patients in each arm were grouped together and compared. This is appropriate.
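The headline figures quoted above can be cross-checked against each other. The following is a minimal sketch, assuming the 95% interval is a Wald-type interval that is symmetric on the log scale (an assumption made here for illustration; the review does not say how the interval was derived). The function name `p_from_hazard_ratio` is hypothetical.

```python
import math


def p_from_hazard_ratio(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided p value implied by a hazard ratio and its 95% CI.

    ASSUMPTION: the interval is symmetric on the log scale (Wald-type).
    This is a consistency check, not the authors' actual calculation.
    """
    # Standard error of log(HR), recovered from the width of the interval.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    # Test statistic against the null hypothesis HR = 1 (log HR = 0).
    z = math.log(hr) / se
    # Two-sided tail probability of a standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p


# Figures quoted from the abstract: HR 3.83, 95% CI 1.66 to 8.84.
se, z, p = p_from_hazard_ratio(3.83, 1.66, 8.84)

# If the interval is symmetric on the log scale, the HR should sit at the
# geometric mean of the two limits.
ci_midpoint = math.sqrt(1.66 * 8.84)

print(f"log-scale SE = {se:.3f}, z = {z:.2f}, two-sided p = {p:.4f}")
print(f"geometric midpoint of the interval = {ci_midpoint:.2f}")
```

Under that assumption the three reported numbers hang together: the implied p value rounds to 0.002 and the HR sits at the geometric midpoint of the interval. This only shows that the arithmetic is self-consistent; it says nothing about whether bias or confounding produced the effect.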
Patients treated with each device were treated at entirely different hospitals. These hospitals would have different patient care protocols, which can create bias. We know that radionecrosis is affected by the prescription dose given to the target, to the volume treated and whether the patient receives concurrent chemotherapy. Furthermore, the definition of radionecrosis varies from center to center, and will depend on imaging techniques as well as the clinician’s understanding of the disease. What one clinician calls pseudoprogression, another will call radionecrosis.
Gamma Knife patients were older, more likely to be male, had a lower performance status, had more lesions, and were more likely to have concurrent cytotoxic chemotherapy. However, they did have smaller target volumes. The authors admit that “There was a statistically significant difference in all baseline patient characteristics for the treatment cohorts”. However, the propensity score-matched analysis should overcome these issues, as long as there are enough patients in each cohort.
Linac treatments used a single-isocenter multitarget (SIMT) technique with a 2–3 mm margin, and were delivered between 2015 and 2018. Volumes were treated with either 18–24 Gy in a single fraction, 21–27 Gy in 3 fractions, or 25–30 Gy in 5 fractions, typically used for target volumes <2 cm, ≥2 cm, and ≥3 cm, respectively. 95% of the PTV received 100% of the prescribed dose.
Gamma Knife treatments were delivered from 2009 to 2018. The paper states that treatments used Models B/C prior to 2009 and Perfexion after 2009, yet the methods also state that no patient was treated prior to 2009, and Table 1 gives the range as 2011 to 2018. This appears to be inconsistent.
For Gamma Knife, treatment was prescribed to the 50% isodose line. No margin was used. Prescription doses followed the RTOG 90-05 guidelines, but lesions were most commonly treated with 20 Gy (incidentally, not an RTOG 90-05 dose). In line with most US centers, but not stated, 100% of the target volume probably received the prescribed dose. To treat small mets conformally, isodoses higher than 50% should ideally be used; for example, a 7 mm spherical met would be most conformally treated with an 8 mm collimator to a higher isodose. There is no evidence that this happened.
When comparing single fraction and fractionated treatments, it’s important to compare the Biological Effective Dose (BED) of the treatments (simplified BEDs in the table below). This clearly shows that the linac treatments had a lower BED. We would expect lower BED treatments to have a lower rate of radionecrosis (which was found). However, we would expect higher BED treatments to give better local control (which was not looked at) and longer survival (which was found, but was not statistically significant).
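To illustrate the BED comparison, here is a minimal sketch using the standard linear-quadratic formula, BED = n × d × (1 + d / (α/β)), where n is the number of fractions and d the dose per fraction. The α/β value of 10 Gy is an assumption chosen for illustration (a value commonly quoted for tumor); the review does not state which value its simplified BEDs use, so these numbers should not be read as a reproduction of the author's table. The schedules are the prescription ranges quoted above.

```python
def bed(total_dose_gy, fractions, alpha_beta_gy):
    """Biological Effective Dose from the linear-quadratic model.

    BED = n * d * (1 + d / (alpha/beta)), with d = total dose / n.
    """
    dose_per_fraction = total_dose_gy / fractions
    return total_dose_gy * (1 + dose_per_fraction / alpha_beta_gy)


# ASSUMPTION: alpha/beta = 10 Gy, for illustration only.
ALPHA_BETA_GY = 10.0

# (label, total dose in Gy, number of fractions), taken from the
# prescription ranges described in the review.
schedules = [
    ("Gamma Knife, 20 Gy x 1 (most common)", 20, 1),
    ("Linac, 18 Gy x 1", 18, 1),
    ("Linac, 24 Gy x 1", 24, 1),
    ("Linac, 21 Gy in 3 fractions", 21, 3),
    ("Linac, 27 Gy in 3 fractions", 27, 3),
    ("Linac, 25 Gy in 5 fractions", 25, 5),
    ("Linac, 30 Gy in 5 fractions", 30, 5),
]

for label, total_dose, n in schedules:
    value = bed(total_dose, n, ALPHA_BETA_GY)
    print(f"{label:40s} BED = {value:5.1f} Gy")
```

With this assumed α/β, every fractionated linac schedule comes out below the 20 Gy single-fraction Gamma Knife prescription, which is in line with the argument above. The absolute values depend strongly on the α/β chosen, and the validity of the linear-quadratic model at the large doses per fraction used in radiosurgery is debated, so only the ordering should be taken from this sketch.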
As many linac treatments were given as a single fraction, these were also compared with Gamma Knife treatments, to eliminate this bias. However, one can argue that lesions treated with a single fraction in the linac cohort were likely to be small lesions in non-eloquent locations (as fractionation was standard in this group and single fractions were reserved for ‘safe’ targets). This may actually reintroduce bias into the results.
Survival appeared to be higher for Gamma Knife patients, though this was not statistically significant. However, had survival at 30 months been chosen, the difference may have reached statistical significance, as the curves appear to diverge markedly beyond one year. For example, at three years, overall survival with Gamma Knife was 35 percent compared to 20 percent in the LINAC cohort. Logically, the longer a patient lives, the more likely they are to develop radionecrosis, so this could also introduce bias to the study.
Radiation necrosis was defined as either histopathologic evidence of necrosis (after surgical removal) or any new enhancement or progression of pre-existing enhancement of treated lesions on MRI that stabilized on consecutive MRI scans and did not require local salvage therapy. To quote, “we did not distinguish between ‘pseudoprogression’ and ‘radionecrosis’”.
In total, there were 7 and 26 lesions with radionecrosis for LINAC and Gamma Knife respectively. For LINAC, grade 1, 2, and 3 radionecrosis occurred in 1 (0.2%), 1 (0.2%), and 5 (0.9%) lesions respectively. For Gamma Knife, grade 1, 2, 3, and 4 radionecrosis occurred in 10 (1.8%), 8 (1.4%), 1 (0.2%), and 7 (1.2%) lesions, respectively. In patients with radionecrosis, the median time to radionecrosis was 2.3 months (IQR 2.1–6.6 months) for LINAC and 6.0 months (2.4–9.4 months) for Gamma Knife.
Analysis after matched pairing is most important as it should eliminate bias and confounding variables. Radionecrosis varied according to grade, as follows:
The primary endpoint of the study was radionecrosis of any grade. Grade 1 is extremely mild, while Grades 3 and 4 are very serious. Grades were pooled together because there were not enough cases to be analyzed individually. 10 times more Gamma Knife patients had Grade 1 CNS toxicity than linac. Furthermore, because it is so mild, whether a Grade 1 CNS toxicity is recorded at all can simply come down to the amount of follow-up the patient receives.
To summarize, if all grades are included, linac has a 1.3% incidence of radionecrosis and Gamma Knife 4.6%. If you look only at ‘serious’ radionecrosis (Grade 3 and above), the incidences are 0.9% and 1.4% respectively, a difference that is not statistically significant.
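The effect of pooling grades can be made explicit by tallying the lesion counts quoted above. This is a minimal sketch; the helper name `pooled` is hypothetical, and the zero for linac Grade 4 is an assumption based on no Grade 4 events being listed for that group. Only counts are compared, because the matched-cohort denominators are not given in this review.

```python
# Lesions with radionecrosis by grade, as quoted in the review.
# ASSUMPTION: linac Grade 4 = 0, since no Grade 4 events are listed for linac.
counts = {
    "linac": {1: 1, 2: 1, 3: 5, 4: 0},
    "gamma_knife": {1: 10, 2: 8, 3: 1, 4: 7},
}


def pooled(by_grade, min_grade=1):
    """Total number of events at or above a given grade."""
    return sum(n for grade, n in by_grade.items() if grade >= min_grade)


for device, by_grade in counts.items():
    any_grade = pooled(by_grade)          # the study's primary endpoint
    serious = pooled(by_grade, min_grade=3)  # Grade 3 and above only
    mild = any_grade - serious            # Grades 1 and 2
    print(f"{device:12s} any grade: {any_grade:2d}  "
          f"grade 3+: {serious:2d}  grade 1-2: {mild:2d}")
```

The any-grade totals match the 7 and 26 lesions reported, while Grade 3 and above gives 5 versus 8. Most of the gap between the two devices therefore sits in Grades 1 and 2, which is the point being made about the choice of endpoint.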
- Gamma Knife and linac patients were treated differently, i.e. at different doses and at different hospitals.
- There was a statistically significant difference in all baseline patient characteristics for the treatment cohorts, making comparison difficult.
- Information on the number of prior SRS courses (previous Gamma Knife sessions) was not available for Gamma Knife treatments, i.e. patients may have had multiple SRS sessions and this was not taken into account. The authors admit that this is a limiting factor of the study.
- ‘Propensity score matching’ was used to eliminate differences between the two groups. However, it’s not clear that dose was used, so single fraction Gamma Knife treatments were being compared with fractionated linac treatments (with a lower BED) for the main comparison. For another comparison, single fractions only were compared, but no details were given on the matching criteria.
- The 12 Gy volume was the global (total) 12 Gy volume for linac but the individual per-target 12 Gy volume for Gamma Knife: “The V12 was calculated on a per-target basis for Gamma Knife plans and a per-plan basis for LINAC”. This could create bias against Gamma Knife. We know that the 12 Gy volume is directly linked to the incidence of radionecrosis, so not having comparable 12 Gy volumes for both groups is a major disadvantage.
- Many trials (e.g. RTOG 90-05) sensibly look at Grade 3 toxicity and above. If this study had used the same criterion, there would be no significant difference between the two groups (0.9% vs 1.4% for linac and Gamma Knife respectively).
- Gamma Knife had potentially 10 years of follow up, vs 4 years for the linac group. Patients with longer follow up are more likely to have radionecrosis diagnosed, though radionecrosis occurring at such a late point is rare.
The authors have attempted a very difficult study using retrospective data, where bias can so easily confound the results. Despite their good work, there are many unanswered questions. As the authors state, “these findings should be validated in an independent cohort”.
Figure 1. Incidence of tumor control and complications with increasing dose. Note how a small increase in dose/BED can increase complications by a small amount, but it will also increase tumor control at the same time. This does not mean that GK is inferior – it means that a lower dose should be used. This is a fundamental error in many dose comparison studies.