Research on Ground Settlement Prediction in Subway Tunnel Construction Based on Large Language Model Data Augmentation

Abstract: Surface settlement induced by shield tunneling in subway construction has a major impact on the safety of the surrounding environment and infrastructure. Because monitoring data are available only in small samples, traditional prediction methods often lack accuracy. To address this, a data augmentation strategy based on a large language model (LLM) is first proposed, in which DeepSeek is used to generate high-quality shield construction data and expand the dataset. A two-stage modeling approach is then adopted: the best-performing machine learning model is selected first, and genetic programming is then used to fit an interpretable analytical formula for settlement prediction. Finally, random forest quantile regression is introduced to assess the uncertainty of the predictions. The results show that models trained on LLM-augmented data outperform models trained only on the original data in both prediction accuracy and stability. In particular, for the random forest model trained on the augmented data, the test-set root mean square error falls to 0.18, the mean absolute error falls to 0.08, and the coefficient of determination rises to 0.86. The derived analytical formula gives mean absolute percentage errors of 3.6% and 6.5% on the training and test sets, respectively, indicating good generalization and physical interpretability. At the 90% confidence level, the prediction intervals from quantile regression cover 95.5% of the observations, and both the comprehensive interval evaluation metric and the interval average score perform well.
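For readers unfamiliar with the evaluation metrics named in the abstract, the following is a minimal sketch of how they are conventionally computed. It uses the standard textbook definitions, which are assumed here rather than taken from the paper; the function names `point_metrics` and `interval_coverage` and the sample values are hypothetical and for illustration only.

```python
import math


def point_metrics(y_true, y_pred):
    """Return RMSE, MAE, R^2 and MAPE (%) using their standard definitions.

    Assumption: y_true contains no zeros, since MAPE divides by each
    observed value.
    """
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
    }


def interval_coverage(y_true, lower, upper):
    """Fraction of observations that fall inside [lower, upper]."""
    inside = [lo <= t <= hi for t, lo, hi in zip(y_true, lower, upper)]
    return sum(inside) / len(inside)


if __name__ == "__main__":
    # Hypothetical values, not data from the study.
    observed = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.0, 2.0, 3.0, 6.0]
    print(point_metrics(observed, predicted))
    print(interval_coverage(observed, [0.0] * 4, [2.0] * 4))
```

A coverage value above the nominal level (for example, 95.5% against a nominal 90%) means the intervals contain more observations than required, which is why interval width is usually assessed alongside coverage.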

     
