A randomized controlled trial often requires long-term follow-up to address outcomes such as overdiagnosis resulting from screening (Figure 1 (a)). An alternative is a computer simulation approach that models the underlying natural history of the disease and then assesses how screening alters that process, using the big data generated by non-randomized service screening. However, determining the extent of overdiagnosis in population-based colorectal cancer (CRC) service screening with a stool-based test is often difficult, because predicting the trajectory of the counterfactual group under screening, analogous to the control group in a randomized trial, involves a complex and hidden time-related natural disease process. More importantly, the most intractable prediction concerns separating the progressive pre-clinical screen-detectable phase (PCDP) from the over-detected state as both evolve over time; neither would have been identified (Figure 1 (b)) or available (Figure 1 (c)) had screening not been administered. Without a carefully constructed design other than the randomized controlled trial, combined with a statistical machine-learning algorithm, it is very hard to distinguish lead-time-gained cases from over-detected cases using big data on time-stamped screening rounds while allowing for the sensitivity of the fecal immunochemical test (FIT) and competing mortality. The purpose of this study is to provide an unbiased estimate of the proportion of overdiagnosis caused by FIT in population-based service screening programs for CRC, using the digital twin design in conjunction with a Markov algorithm.
Therefore, the time-stamped natural history of the disease, which embeds information on overdiagnosis, was first constructed in order to learn the transition parameters that quantify the pathways of non-progressive and progressive screen-detected cases, calibrated for test sensitivity and accounting for competing mortality (Figure 1 (c) and (d)). A series of Markov transition algorithms was then built to train these transition parameters on big data from CRC screening of 5,417,699 Taiwanese individuals aged 50-69 years, collected by Taiwan's FIT screening service between 2004 and 2014. Following the digital twin design, whose parallel-universe structure emulates a randomized controlled trial, the screened twin, which mirrors the control group that did not undergo screening, was virtually recreated by applying the trained parameters to predict CRC cases containing the hidden over-detected state, both without and with consideration of adenoma (Figure 1 (c) and (d)). The extent of CRC overdiagnosis associated with FIT screening was determined by comparing these expected cases with real-world data on observed CRCs, using an equation defined as the ratio of the predicted CRCs generated from the screened twin to the observed CRCs of the comparison group, minus 1. The comparison group was derived from the pre-screening period, with adjustment for the increasing incidence trend.
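As a rough illustration of the two computational ingredients described above, the following minimal sketch projects a cohort through a discrete-time Markov chain and then applies the stated ratio (predicted CRCs from the screened twin divided by observed CRCs in the comparison group, minus 1). The state labels, the annual transition probabilities, and the function names are hypothetical placeholders chosen for illustration; they are not the state space, parameter estimates, or implementation used in the study.

```python
from typing import List

# Hypothetical state labels (assumption): the text names the progressive PCDP
# and the over-detected state, but does not list the full state space.
STATES = ["disease_free", "progressive_pcdp", "over_detected",
          "clinical_crc", "other_death"]

# Hypothetical annual transition probabilities (assumption, NOT study estimates).
# Row i gives the probabilities of moving from state i to each state; rows sum to 1.
P = [
    [0.980, 0.004, 0.001, 0.000, 0.015],  # disease_free
    [0.000, 0.735, 0.000, 0.250, 0.015],  # progressive_pcdp
    [0.000, 0.000, 0.985, 0.000, 0.015],  # over_detected (never progresses)
    [0.000, 0.000, 0.000, 1.000, 0.000],  # clinical_crc (absorbing)
    [0.000, 0.000, 0.000, 0.000, 1.000],  # other_death (absorbing, competing mortality)
]


def step(dist: List[float], matrix: List[List[float]]) -> List[float]:
    """Advance a state distribution by one cycle: new[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def project(dist: List[float], matrix: List[List[float]], cycles: int) -> List[float]:
    """Apply the transition matrix repeatedly for the given number of cycles."""
    for _ in range(cycles):
        dist = step(dist, matrix)
    return dist


def overdiagnosis_proportion(predicted_screened_twin: float,
                             observed_comparison: float) -> float:
    """Ratio of predicted CRCs (screened twin) to observed CRCs (comparison group), minus 1."""
    if observed_comparison <= 0:
        raise ValueError("observed_comparison must be positive")
    return predicted_screened_twin / observed_comparison - 1.0


if __name__ == "__main__":
    # Everyone starts disease-free; project ten annual cycles.
    final = project([1.0, 0.0, 0.0, 0.0, 0.0], P, 10)
    for name, share in zip(STATES, final):
        print(f"{name}: {share:.4f}")
    # Made-up counts, purely to show the arithmetic of the ratio.
    print(f"overdiagnosis: {overdiagnosis_proportion(1100.0, 1000.0):.2%}")
```

In this sketch the over-detected state has no path to clinical CRC, which is one simple way to encode non-progressive cases; test sensitivity and the adjustment for incidence trend mentioned above are deliberately left out.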
Figure 1. Randomized controlled trial (RCT) and digital twin design for overdiagnosis in population-based screening.
The degree of overdiagnosis for invasive CRCs caused by FIT screening, without considering adenoma, is 4.16% (95% confidence interval: 2.61-5.78%). The corresponding figure rises to 9.90% (95% confidence interval: 8.41-11.42%) when high-grade dysplasia (HGD) is taken into account, and further to 15.83% (95% confidence interval: 15.23-16.46%) when removed adenomas are taken into account. From the practical perspective of screening, the small percentage of overdiagnosis modelled by the digital twin approach supports the view that population-based FIT service screening does not cause a significant amount of harm. From the methodological perspective, in addition to estimating overdiagnosis without requiring a randomized controlled trial, the digital twin design can be very powerful for evaluating a series of precision screening strategies tailored to reduce overdiagnosis and mortality. This can be done by expanding the currently proposed Markov algorithms into a Markov decision network, transforming a one-size-fits-all screening policy into individually tailored screening strategies.