Discussion
We performed an extensive study of the impact of nSES on ACN risk prediction. Our study indicated that ADI was significantly associated with ACN in EAs, and adding ADI to the ACN prediction model for EAs effectively improved the calibration accuracy across area deprivation subgroups.
nSES reflects area or geographical disparities in healthcare availability, affordability, acceptability and accessibility, and it may influence cancer outcomes among residents.37 Our study was motivated by previous studies that found an association between neighbourhood socioeconomic disadvantage and CRC risk3 9 and by a study that used nSES to improve cardiovascular risk prediction.12 We found that, without ADI, the prediction model for EAs overpredicted ACN risk for those living in affluent neighbourhoods and underpredicted risk for those living in disadvantaged neighbourhoods. Adding ADI to the risk prediction model corrected this discrepancy in calibration across area deprivation groups. Unlike in EA patients, ACN prevalence in AA patients did not show a significant increasing trend with ADI. Additionally, the association of ADI with ACN in AAs was non-significant, regardless of whether the random effect of census tract was considered in our analyses. Consequently, including ADI in the prediction model yielded limited improvement in calibration accuracy across area deprivation groups in AAs. This limitation may stem from a non-linear relationship between ADI and observed ACN risk among AAs. Specifically, ACN risk appears higher for patients in the second quartile of ADI than for those in the other three quartiles (figure 1B). Thus, the linearity assumption of the logistic regression risk model (a linear effect of ADI on the log-odds scale) may not adequately capture the complexity of this relationship. In addition, the AA participants in this study were mainly from much more disadvantaged neighbourhoods in Cleveland than the EA participants (table 1; mean ADI 53.11 for AAs vs 28.09 for EAs, P=2.37×10⁻²⁴⁰).
Ladabaum et al found a similar racial difference in the association between nSES and CRC incidence: the association was significant among non-Hispanic whites but considerably smaller among Asian/Pacific Islanders.38 Moreover, ADI explains only a small portion of census tract-level variation in ACN risk in AAs, indicating the presence of other environmental risk factors not captured by nSES.
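The calibration pattern described above (overprediction in affluent neighbourhoods, underprediction in disadvantaged ones) can be made concrete by comparing observed and mean predicted risk within each deprivation group. The sketch below is illustrative only: the function name `calibration_by_group` and the toy data are hypothetical, and it reports a simple observed/expected (O/E) ratio per subgroup rather than the formal calibration test used in this study.

```python
from collections import defaultdict
from typing import Dict, Sequence, Tuple


def calibration_by_group(
    y: Sequence[int], p: Sequence[float], group: Sequence[str]
) -> Dict[str, Tuple[float, float, float]]:
    """Per subgroup: (observed event rate, mean predicted risk, O/E ratio).

    O/E > 1: the model underpredicts risk in that subgroup.
    O/E < 1: the model overpredicts risk in that subgroup.
    """
    events: Dict[str, float] = defaultdict(float)
    expected: Dict[str, float] = defaultdict(float)
    n: Dict[str, int] = defaultdict(int)
    for yi, pi, gi in zip(y, p, group):
        events[gi] += yi      # observed events
        expected[gi] += pi    # sum of predicted risks = expected events
        n[gi] += 1
    return {
        g: (events[g] / n[g], expected[g] / n[g], events[g] / expected[g])
        for g in n
    }


# Hypothetical toy data: a model predicts 25% risk for everyone,
# but the true event rate differs by deprivation group.
y = [1, 0, 0, 0, 0, 0, 0, 0] + [1, 1, 1, 0, 0, 0, 0, 0]
p = [0.25] * 16
adi_group = ["affluent"] * 8 + ["deprived"] * 8

for g, (obs, exp_, oe) in calibration_by_group(y, p, adi_group).items():
    print(f"{g}: observed={obs:.3f} predicted={exp_:.3f} O/E={oe:.2f}")
# affluent: observed=0.125 predicted=0.250 O/E=0.50
# deprived: observed=0.375 predicted=0.250 O/E=1.50
```

In this toy case the single model overpredicts in the affluent group (O/E<1) and underpredicts in the deprived group (O/E>1), the same qualitative pattern reported for the EA model without ADI.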
The finding of an nSES disparity in ACN risk prediction is intriguing. nSES reflects the social determinants of health: the conditions to which people are exposed over their life course, which are shaped by the distribution of money, power and resources.39 nSES can significantly affect health outcomes by altering the conditions that individuals or populations face, including their exposure to risk factors, their access to healthful behaviours and healthcare resources, and their ability to manage their health effectively.40 The observed nSES disparity in ACN risk may be partly attributable to the higher prevalence of adverse health behaviours in low-SES populations.9
We found that EAs and AAs differed in ACN risk and risk factors, and this racial disparity led us to develop race-specific ACN prediction models. In our study, AAs had a higher prevalence of ACN than EAs, consistent with other studies that found racial/ethnic disparities in the risk of ACN or colorectal adenoma.2 41 In univariable analysis, age, years of smoking and diabetes showed weak associations with ACN in both EAs and AAs (P<0.10), although none of them was significantly associated with ACN risk in both racial populations. However, calcium and BMI were significantly associated with ACN only in EAs. Other studies have found similar racial differences; for example, two studies found that BMI and waist circumference were significantly associated with colon adenoma only in whites,42 43 which is consistent with our results. Such differences in association might reflect racial differences in underlying ACN exposures and their impact on outcome.
The prediction model fitted to the entire population (model 1) predicted well in EAs but performed worse in the AA population. Conversely, the race-specific prediction models developed for EAs and AAs, with fewer predictors, showed better prediction accuracy within their respective racial/ethnic groups than the overall model. Notably, no ACN prediction model specifically tailored for AAs has been reported to date. The model reported by Schroy et al22 was developed in a diverse population comprising mainly whites and blacks, with some Hispanics. Similar to our model 1, its prediction accuracy was lower in blacks than in whites. To bridge this gap, we developed an ACN prediction model tailored specifically for the AA population (model 3), which showed the highest prediction accuracy for AAs of the three models we developed.
It is difficult for a single ACN prediction model to fit multiple races/ethnicities. Model 2 for EAs did not predict well in AAs (online supplemental table 5; C-statistic=0.608, calibration P=0.018), and model 3 for AAs did not predict well in EAs (C-statistic=0.586, calibration P=0.369). A number of ACN risk prediction models have been developed in populations in Western and Asian countries,19–24 and a recent study found that a CRC prediction model may also have the potential to predict ACN risk.44 However, four published ACN prediction models developed in various populations (EAs, multiple races or Asians)20 22–24 showed lower discrimination and very poor calibration in our population (C-statistics <0.61 and calibration P values ≤0.01 for our EA, AA and entire populations; online supplemental table 6). These results may further indicate the limitations of current ACN prediction models and the complexity of generalising them to other races or populations. Other studies have also reported limited generalisability of ACN prediction to external populations of the same or different races.45 Such limitations might result from differences in population exposures, such as socioeconomic and lifestyle risk factors, from differences in tumour molecular mechanisms, or from differences in sample ascertainment criteria. Alternatively, the limited generalisability could be due to model overfitting.
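The C-statistics quoted above summarise discrimination when a model is applied to another population; a value of 0.5 means no better than chance, which helps in reading figures such as 0.586 and 0.608. The paper does not state which software or estimator produced them, so the following is only a hedged sketch of the standard pairwise definition, with made-up risks; the function name `c_statistic` is hypothetical, and the calibration P values are not reproduced here.

```python
from typing import Sequence


def c_statistic(y: Sequence[int], p: Sequence[float]) -> float:
    """Concordance (C) statistic for a binary outcome.

    The fraction of (case, non-case) pairs in which the case has the higher
    predicted risk, counting ties as 1/2; equal to the area under the ROC curve.
    """
    cases = [pi for yi, pi in zip(y, p) if yi == 1]
    controls = [pi for yi, pi in zip(y, p) if yi == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    score = 0.0
    for pc in cases:
        for pn in controls:
            # 1 for a concordant pair, 1/2 for a tie, 0 for a discordant pair
            score += 1.0 if pc > pn else 0.5 if pc == pn else 0.0
    return score / (len(cases) * len(controls))


# Hypothetical example: risks from a model applied to an external group.
y = [1, 0, 1, 0, 0]
p = [0.9, 0.3, 0.4, 0.5, 0.1]
print(c_statistic(y, p))  # 5 of 6 case/non-case pairs are concordant, about 0.833
```

The quadratic pair loop keeps the definition transparent; rank-based formulas give the same value more efficiently for large samples.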
To obtain robust and stable prediction models with our relatively limited sample size, we drew 1000 bootstrap samples and performed model fitting with backward selection in each, obtaining 1000 best-fitting models, each retaining only predictors with multivariable P values <0.1. The predictors that were selected most stably, that is, those that appeared most frequently across the 1000 best-fitting models, were chosen as the final predictors. Therefore, although a predictor may have a multivariable P value >0.1 in the original dataset (as shown in table 2), its multivariable P value was <0.1 in each bootstrap sample in which it was selected. Such bootstrap sampling combined with an automated selection procedure has been used to obtain stable models, and the variables it identifies have been shown to be important predictors.29 Traditional stepwise selection methods, such as backward elimination, forward selection and bidirectional elimination, have well-known disadvantages, especially when the sample size is small: they have limited power to select true variables, tend to include noise variables in the final model, and may yield biased coefficients and overfitting, and consequently unstable models.46 Bootstrap variable selection methods have been shown to effectively mitigate model instability.29 47 In our study, we used a significance level of P=0.10 as the stopping cut-off for variable selection. This threshold lies between the conventional 0.05 and 0.157,48 the P value that corresponds to the popular AIC criterion;49 a more lenient threshold helps to avoid missing true predictors and produces better predictive performance,50 although it may result in a less parsimonious model.
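The bootstrap selection procedure described above can be sketched as follows. This is a hedged reading of the text, not the authors' code: it assumes an unpenalised logistic model with Wald P values, backward elimination that removes the least significant predictor until all P values are <0.10, and a hypothetical 50% selection-frequency cutoff standing in for "most frequently appeared". All function names and the synthetic data are illustrative, and only 50 resamples are drawn to keep the demo fast (the study used 1000). The constant `AIC_EQUIVALENT_P` checks the 0.157 figure: for a term with 1 degree of freedom, AIC favours inclusion when the likelihood-ratio chi-square exceeds 2, and P(χ²₁ > 2) = erfc(1) ≈ 0.157.

```python
import math
import random
from collections import Counter
from typing import List, Set

# AIC keeps a 1-df term when its likelihood-ratio chi-square exceeds 2, and
# P(chi2_1 > 2) = P(|Z| > sqrt(2)) = erfc(1), which is about 0.157.
AIC_EQUIVALENT_P = math.erfc(1.0)


def invert(a: List[List[float]]) -> List[List[float]]:
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(a)
    m = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[piv][col]) < 1e-12:
            raise ValueError("singular information matrix")
        m[col], m[piv] = m[piv], m[col]
        pv = m[col][col]
        m[col] = [v / pv for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * z for x, z in zip(m[r], m[col])]
    return [row[n:] for row in m]


def _info_score(X, y, beta):
    """Fisher information X'WX and score X'(y - mu) at coefficients beta."""
    k = len(beta)
    info = [[0.0] * k for _ in range(k)]
    score = [0.0] * k
    for xi, yi in zip(X, y):
        eta = max(-30.0, min(30.0, sum(b * v for b, v in zip(beta, xi))))
        mu = 1.0 / (1.0 + math.exp(-eta))
        w = mu * (1.0 - mu)
        for a in range(k):
            score[a] += (yi - mu) * xi[a]
            for c in range(k):
                info[a][c] += w * xi[a] * xi[c]
    return info, score


def fit_logistic(X, y, max_iter=50):
    """Newton-Raphson logistic fit; X includes an intercept column.
    Returns (coefficients, two-sided Wald P values)."""
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        info, score = _info_score(X, y, beta)
        cov = invert(info)
        step = [sum(r[c] * score[c] for c in range(len(score))) for r in cov]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-8:
            break
    cov = invert(_info_score(X, y, beta)[0])  # covariance at the final estimate
    pvals = [math.erfc(abs(b) / math.sqrt(cov[i][i]) / math.sqrt(2.0))
             for i, b in enumerate(beta)]
    return beta, pvals


def backward_select(rows, y, names, alpha=0.10) -> Set[str]:
    """Drop the predictor with the largest Wald P until all P values < alpha."""
    kept = list(range(len(names)))
    while kept:
        X = [[1.0] + [r[j] for j in kept] for r in rows]
        _, pvals = fit_logistic(X, y)
        worst = max(range(len(kept)), key=lambda i: pvals[i + 1])  # index 0 = intercept
        if pvals[worst + 1] < alpha:
            break
        kept.pop(worst)
    return {names[j] for j in kept}


def bootstrap_selection(rows, y, names, n_boot=1000, alpha=0.10, seed=0):
    """Fraction of bootstrap resamples in which each predictor survives selection."""
    rng = random.Random(seed)
    n = len(rows)
    counts: Counter = Counter()
    for _ in range(n_boot):
        idx = rng.choices(range(n), k=n)  # resample rows with replacement
        counts.update(backward_select([rows[i] for i in idx], [y[i] for i in idx], names, alpha))
    return {name: counts[name] / n_boot for name in names}


# Synthetic demo data (hypothetical): x1 drives the outcome, x2 is pure noise.
gen = random.Random(42)
rows, y = [], []
for _ in range(300):
    x1, x2 = gen.gauss(0, 1), gen.gauss(0, 1)
    rows.append([x1, x2])
    y.append(1 if gen.random() < 1 / (1 + math.exp(1.0 - 1.5 * x1)) else 0)

freq = bootstrap_selection(rows, y, ["x1", "x2"], n_boot=50)
final = [name for name, f in freq.items() if f >= 0.5]  # hypothetical 50% cutoff
print(round(AIC_EQUIVALENT_P, 3), freq, final)
```

Wald P values are used here only for simplicity; likelihood-ratio P values are a common alternative within backward elimination, and real analyses would normally rely on established statistical software.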
Limitations
Our sample size is relatively limited and did not allow for split-sample validation. Independent validation of our ACN prediction models is needed.
Strengths
Our study represents the first effort to incorporate nSES into ACN risk prediction. The results from this study strongly suggest that failure to take into account neighbourhood deprivation in ACN risk prediction may lead to biased risk assessment and further worsen health disparities for those living in disadvantaged communities. This insight may help to guide future research and potentially influence targeted prevention policies for colorectal neoplasia.