Research on Ground Settlement Prediction in Subway Tunnel Construction Based on Large Language Model Data Augmentation
-
Abstract
Surface settlement induced by shield tunneling poses significant risks to the surrounding environment and to infrastructure safety. Because monitoring data are limited, traditional prediction methods often suffer from insufficient accuracy. To address this issue, a data augmentation strategy based on a large language model (LLM) is proposed, in which DeepSeek is used to generate high-quality shield tunneling construction data to expand the dataset. A two-stage modeling framework is then adopted: the best-performing machine learning model is first identified, and genetic programming is then employed to derive an interpretable analytical formula for settlement prediction. Finally, random forest quantile regression is introduced to quantify the uncertainty of the predictions. The results demonstrate that models trained with LLM-augmented data achieve higher prediction accuracy and stability than those trained on the original data alone. In particular, the random forest model trained on the augmented data achieves a root mean square error of 0.18, a mean absolute error of 0.08, and a coefficient of determination of 0.86 on the test set. The derived analytical formula yields mean absolute percentage errors of 3.6% and 6.5% on the training and test sets, respectively, indicating good generalization performance and physical interpretability. Moreover, at a nominal 90% confidence level, the prediction interval coverage probability of the quantile regression reaches 95.5%, and both the comprehensive interval evaluation metric and the interval average score show favorable performance.
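For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how they are commonly computed. It uses the standard textbook definitions, which may differ in detail from those used in this study. The function names and the four-point data set are hypothetical and chosen only for illustration; they are not results from the paper.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent (requires nonzero y_true)."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

def picp(y_true, lower, upper):
    """Prediction interval coverage probability: the fraction of
    observations that fall inside their [lower, upper] interval."""
    hits = sum(1 for t, lo, hi in zip(y_true, lower, upper) if lo <= t <= hi)
    return hits / len(y_true)

# Hypothetical settlement values (illustrative only, not data from the study).
y_true = [1.0, 2.0, 3.0, 4.0]   # observed
y_pred = [1.1, 1.9, 3.2, 3.8]   # point predictions
lower = [0.8, 1.5, 3.1, 3.5]    # lower interval bounds (e.g. 5% quantile)
upper = [1.4, 2.3, 3.6, 4.2]    # upper interval bounds (e.g. 95% quantile)

if __name__ == "__main__":
    print("RMSE:", rmse(y_true, y_pred))
    print("MAE: ", mae(y_true, y_pred))
    print("R2:  ", r2(y_true, y_pred))
    print("MAPE:", mape(y_true, y_pred))
    print("PICP:", picp(y_true, lower, upper))
```

A PICP above the nominal level (for example 95.5% against a 90% target) indicates that the intervals cover at least as many observations as intended. It is usually read together with an interval-width measure, because very wide intervals reach high coverage trivially.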
-
-