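A quality control method for class-imbalanced ocean data based on a combination of multiple models
Authors: SONG Wei (1), ZHANG Guiqing (1), XIE Jingrong (1), DONG Mingmei (2), YUE Xinyang (2), YANG Yang (2)
Affiliations: 1. College of Information Technology, Shanghai Ocean University, Shanghai 201306;
2. National Marine Data and Information Service, Tianjin 300171
Keywords: quality control; marine meteorological data; ensemble learning; class imbalance
Classification number: P731.11
Year, volume, issue (pages): 2024, Vol. 41, No. 3 (pp. 61-70)
Abstract: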
This paper proposes a two-layer quality control framework for ocean data based on a combination of multiple models. Several common classification algorithms serve as base learners that produce preliminary predictions of the data quality labels, and a Voting or Stacking strategy then determines the final quality flag of each observation. To address class imbalance, an adaptive undersampling strategy reduces the imbalance ratio of the data, and the Focal Loss function improves the model's ability to recognize hard-to-classify samples. The method is validated on sea surface temperature and air temperature data from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS). For the very rare class of erroneous samples, the F1 scores (the harmonic mean of precision and recall) obtained with the Voting and Stacking methods reach 0.9806 and 0.9812 on sea surface temperature data, and 0.9985 and 0.9983 on air temperature data.
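The three ingredients named in the abstract (combining base-learner predictions by voting, Focal Loss, and the F1 score on the rare class) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the toy labels, and the encoding "1 = erroneous observation" are assumptions made for illustration, and `gamma=2.0` and `alpha=0.25` are commonly used Focal Loss defaults rather than values reported in the abstract.

```python
import math
from collections import Counter


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for a single sample.

    p: predicted probability of class 1; y: true label (0 or 1).
    gamma and alpha are assumed defaults, not values from the paper.
    """
    p_t = p if y == 1 else 1.0 - p            # probability given to the true class
    alpha_t = alpha if y == 1 else 1.0 - alpha
    p_t = max(p_t, 1e-12)                     # avoid log(0)
    # (1 - p_t)^gamma down-weights samples that are already well classified
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


def majority_vote(base_predictions):
    """Hard voting over per-model label lists of equal length."""
    n_samples = len(base_predictions[0])
    voted = []
    for i in range(n_samples):
        votes = Counter(preds[i] for preds in base_predictions)
        voted.append(votes.most_common(1)[0][0])
    return voted


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data (hypothetical): 1 marks an erroneous observation, the rare class.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
model_a = [0, 0, 0, 0, 0, 1, 1, 1]   # one false alarm
model_b = [0, 0, 0, 0, 0, 0, 1, 0]   # one missed error
model_c = [0, 0, 0, 0, 1, 0, 1, 1]   # one false alarm

voted = majority_vote([model_a, model_b, model_c])
print("F1 of model_a:", f1_score(y_true, model_a))
print("F1 after voting:", f1_score(y_true, voted))
print("focal loss, easy vs hard sample:", focal_loss(0.9, 1), focal_loss(0.1, 1))
```

In this toy case each base model makes a different mistake, so the majority vote recovers every label. A Stacking variant would instead train a second-layer model on the base-learner outputs, and the adaptive undersampling step is not sketched here because the abstract does not describe it in enough detail.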
Editorial Office of Marine Forecasts. Address: 8 Dahuisi Road, Haidian District, Beijing
Tel: 010-62105776
Submission website: http://www.hyyb.org.cn
Email: bjb@nmefc.cn