It is STILL necessary to validate each individual IMRT treatment plan with dosimetric measurements before delivery

Abstract

Almost a decade ago, we published a Point/Counterpoint debate on the need for validation measurements for each individual IMRT patient [Med. Phys. 30, 2271–2273 (2003)]. Now, after many more years of experience with this modality, the necessity for such patient-specific measurements has been questioned, and this is the topic discussed in this month's Point/Counterpoint debate.

Ultimately, all the Quality Control/Quality Assurance (QC/QA) we do is supposed to assure that we are delivering dose to the patient in the manner and amount planned.1 With regard to IMRT, there has been debate over whether or not we have the proper tools to test this in a meaningful way.2,3 Hopefully, systems-analysis tools will be developed to address this, and we will train ourselves to think in terms of statistical process analysis. In that light, should we do patient-specific tests now? I would say yes, even if better tools need to be developed for the future. It may well be that the leaf motions and subsequent fluence and dose maps generated by treatment planning system (TPS) algorithms have predictable patterns, but I am aware of no one who has successfully discovered or characterized them. Nor are there any treatment planning systems that can model all the parts of a linac and how their behavior changes with use. Until these and related questions are addressed, I can see no a priori means to determine by a standard test suite whether or not a particular dose map can be successfully delivered by a particular linac with a particular confidence level. Is such a failure likely? Actually, it seems that with today's delivery systems, the answer may be no, but failures do happen. The crux of the biscuit is what could happen if failures do occur. Given that we often have high dose gradients around critical structures, there is a very real risk of underdosing the cancer and/or overdosing the organs at risk. One may claim that such errors wash out with daily setup variations, organ motion, and such.
Perhaps, but new tools for immobilization, gating, etc., are reducing these variations,4 which means that a systematic error, such as a bad dose delivery, could be significant. The hope is that such dose delivery errors will be detected and corrected at plan QA. However, even if our current tests are insensitive to small but significant errors,2,3 basic failures that can be devastating (such as missing data, mechanical breakdown, and software failure) or even of lesser consequence (such as beam intersecting patient support assembly parts), can be found at this stage, before the patient is placed on the treatment couch. Remember, an error with 200 MU delivered to a planned mostly open region (as in 3DCRT) is one thing; such an error with 1250 MU to a planned mostly blocked region is potentially fatal. I do hope that a better program based on process and failure mode analysis will be developed, but I believe per-patient QA is necessary for the foreseeable future, if only as a means to catch gross errors. It is not the most desirable system but, for now, it is all we have.

“IMRT treatment” today can mean a widely varying combination of delivery systems (TomoTherapy, CyberKnife, Gamma Knife, and MLC-based linac IMRT by several vendors), treatment planning systems, and record and verify systems for data transfer. Factoring in the installed software versions of these systems results in thousands of technology combinations with different probabilities for major failure. Choosing a single patient-specific measurement performed before the course of several weeks of treatment may be a good way to spend QA time and resources, but it may also create a false sense of safety with other, more severe failure modes being overlooked. The first failure mode we have to think about is the treatment planning and delivery software. If the software fails to save the plan correctly, what is the default fallback?
For some software, it is an open field with full MU per field, while other software would not allow plan delivery at all. Second, which files are used to execute the IMRT QA measurements? If it is a copy of the treatment delivery file, we have to ask ourselves what, exactly, are we testing? A more thorough MLC QA may be more appropriate in such instances.1 If our validation measurements are based on the delivery files, how, if at all, do we ensure that these files do not get corrupted sometime between the measurement and the end of treatment six weeks later? Lastly, for patients where we deliver dose to very inhomogeneous areas such as lung,5 using advanced respiratory motion compensation (e.g., gating, Calypso, or Synchrony tracking),4 what, exactly, are we verifying in a homogeneous, static phantom, with sometimes all fields delivered from the AP direction?6 As in 3D-conformal treatments, which can also fail to be delivered correctly with lethal consequences (e.g., by omission of physical wedges or malfunctions of the enhanced dynamic wedges), and where there are no current recommendations for patient-specific measurements, there may be equally or even more effective methods for patient-specific QA than a measurement. Even doing a patient-specific measurement for IMRT did not prevent a large fraction of institutions from failing the RTOG credentialing process using the RPC phantoms on their first attempt.7 The lesson learned here is that it is not only doing a measurement per se, but also application of the correct experimental setup, execution, and analysis of the measurement, that will ensure that it is indeed a safeguard and not just an exercise to satisfy “compliance.” In conclusion, we should consider a dosimetric measurement to validate an IMRT plan as one among many QA tools available to verify a safe delivery.
Depending on the combination of equipment, software, and other QC measurements, it should not be the only method considered to assure safety, unless otherwise proven to us by a thorough failure modes and effects analysis (FMEA).8 For the future, we should strive to implement systems that can measure the daily delivered dose distribution to patients.9

Dr. Dieterich makes many valid points, similar to some of my own. What she does not say is why we no longer need to perform patient-specific QA in the current state of affairs. In fact, she states that it is “one of many tools,” implying that it is still a valid test, at least in some situations. As both of us state, directly or indirectly, the whole QA mentality needs to change to a systems/failure mode based framework. We are not the first to have made this argument, as some of my references show. It is also true, as we both note, that the patient-specific tests we do today have serious limitations and omissions that can lead to a false sense of security. Yet this is not an argument to cease testing, but rather an argument to know our tests and their weaknesses, hopefully while pursuing better methods. She notes that there are almost limitless combinations of technology, making meaningful test design seem almost impossible. Again, this is not an argument to cease testing, but rather an argument to design one's tests around the local conditions, preferably based on a more general template. 3D treatment errors can be devastating, but the delivery is static, and the dose maps are at least known to be deliverable if commissioning is done properly. In vivo verification is encouraged, if not required. Each IMRT delivery is dynamic and, at least in principle, unique. Dr. Dieterich also implies that the patient-specific IMRT tests are meaningless if the TPS, test devices, and methods are not properly commissioned and used. This is true, but I believe that this is a separate issue outside the scope of this debate.
All in all, it would seem we agree that the current state of affairs is at best undesirable. However, I still believe that current patient-specific QA measurements are better than nothing until such time as different methods and tools become available. Thus, we should still do them.

My opponent states that we should do patient-specific tests. I agree. However, his subsequent arguments fail to justify a measurement as the ONLY patient-specific test that can ensure patient safety. On the contrary, the literature he provides, specifically Refs. 2 and 3, argues that a measurement will only provide QA for one of many branches of the fault tree. Even more significantly, many of the concerns that are cited, such as the behavior of the linac or missing data, cannot be addressed by a patient-specific measurement alone. IMRT is not just about measuring fluence maps. Indeed, as Dr. Sherouse pointed out in the reference Mr. Smith cites,2 what these fluence maps mean and how they should be measured and interpreted in a meaningful way is anything but clear. Yes, we do have to verify the fluence maps in order to satisfy the requirements of the Current Procedural Terminology (CPT) code, but this code does not specify how to do it: By measurement or independent calculation? So, let us focus beyond CPT codes on the question of what keeps us safe. A measurement that is not well thought through, which does not cover all branches of the fault tree, which does not verify by means of the measurement setup the actual delivery conditions, and which does not verify the high dose gradient areas of IMRT plans very accurately, does not keep us safe. Image guidance and respiratory motion compensation may, at first glance, imply a higher delivery accuracy, but they also carry the potential to introduce a much larger inaccuracy than a minor error in the fluence map if they, and the QA of these techniques, are not correctly implemented.
We do not have to wait for an FMEA analysis to be done to critically think about all components of an IMRT plan delivered on a specific device/software combination in order to make an informed decision if a patient-specific measurement, an alternative approach, or a combination of both, is the most prudent way to ensure safe treatment delivery.
