ISSN 1009-6248  CN 61-1149/P  Bimonthly

Supervised by: China Geological Survey

Sponsored by: Xi'an Center of China Geological Survey
Geological Society of China

    WEI Dongqi, JIANG Baode, ZHANG Jingya. Research on Content Storage Method of Unstructured Geological Data[J]. Northwestern Geology,2021,54(4): 266-273.


    Research on Content Storage Method of Unstructured Geological Data


       

      Abstract: Geological work has entered the era of big data. However, unstructured data that record geoscience information, such as reports and maps, are still grouped in simple ways and stored in file systems, which produces many data sets with complex internal structures. This arrangement cannot adequately express the rich geoscience information carried by unstructured data. It also makes it hard to express the complex relationships between pieces of information and hinders the discovery of deep knowledge that spans data sets. To address this problem, this paper proposes a multi-granularity content tree model and a data modeling method that supports evolution. With these features, the model can split data content at different scales and locate information precisely. It can also extend the dimensions used to describe a data subject's features as that subject requires, so that the information contained in the data is discovered progressively and relationships between pieces of information are established. Given the characteristics of geological big data, a persistence scheme for the data model is designed with HBase at its core, so that the data can be analyzed and processed with tools from the big data technology stack. Finally, an example of modeling geological result data is given. In this example, unstructured data such as documents and maps are split and reconstructed with the content entity as the smallest unit, which yields good content organization and information expression.
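The abstract names three ideas without showing their shape:

- a multi-granularity content tree;
- feature dimensions that can be extended as modeling evolves;
- HBase-backed persistence.

The Python sketch below is a minimal illustration under stated assumptions. It is not the authors' model. The following details are all hypothetical:

- the granularity levels (`dataset`, `document`, `section`, `paragraph`);
- the column families `meta`, `content` and `feature`;
- the slash-separated row-key scheme.

The sketch builds a small content tree and adds a feature dimension to one node after the fact. It then flattens the tree into (row key, column, value) cells that follow HBase's row key / column family / qualifier layout. An in-memory list stands in for a real HBase table.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Iterator, List, Tuple

# All names here (levels, column families, row-key scheme) are hypothetical.
# Granularity levels, ordered from coarse to fine.
LEVELS = ("dataset", "document", "section", "paragraph")

Cell = Tuple[str, str, str]  # (row_key, "family:qualifier", value)


@dataclass
class ContentNode:
    node_id: str
    level: str
    text: str = ""
    # Open-ended feature dimensions: new keys can be added as modeling evolves.
    features: Dict[str, str] = field(default_factory=dict)
    children: List["ContentNode"] = field(default_factory=list)

    def add_child(self, child: "ContentNode") -> "ContentNode":
        # A child must sit at a strictly finer granularity than its parent.
        if LEVELS.index(child.level) <= LEVELS.index(self.level):
            raise ValueError(f"{child.level!r} is not finer than {self.level!r}")
        self.children.append(child)
        return child

    def extend_feature(self, name: str, value: str) -> None:
        # "Evolution": attach a new descriptive dimension without a schema change.
        self.features[name] = value


def to_cells(node: ContentNode, parent_key: str = "") -> Iterator[Cell]:
    """Flatten a content tree into HBase-style cells.

    Row keys are slash-separated paths, so every descendant of a node shares
    that node's key as a prefix and can be fetched with one prefix scan.
    """
    row_key = f"{parent_key}/{node.node_id}" if parent_key else node.node_id
    yield row_key, "meta:level", node.level
    if node.text:
        yield row_key, "content:text", node.text
    for name, value in sorted(node.features.items()):
        yield row_key, f"feature:{name}", value
    for child in node.children:
        yield from to_cells(child, row_key)


def prefix_scan(cells: Iterable[Cell], prefix: str) -> List[Cell]:
    # Mimic a row-key prefix scan: the node itself plus all of its descendants.
    return [c for c in cells if c[0] == prefix or c[0].startswith(prefix + "/")]


# Usage with hypothetical sample content: a report split into a section
# and a paragraph-level content entity.
report = ContentNode("rpt001", "document", features={"type": "report"})
section = report.add_child(ContentNode("s01", "section", "Regional geology"))
para = section.add_child(
    ContentNode("p01", "paragraph", "Granite intrudes the Silurian strata.")
)
para.extend_feature("rock_type", "granite")  # dimension added after the fact

cells = sorted(to_cells(report))  # HBase likewise keeps rows sorted by row key
for row_key, column, value in prefix_scan(cells, "rpt001/s01"):
    print(row_key, column, value)
```

Putting the tree path in the row key is one possible design choice. HBase stores rows sorted by row key, so a node and all of its descendants occupy a contiguous key range. Retrieving content at any granularity is then a single range scan. Writing these cells to an actual HBase table would require an HBase client, which this sketch deliberately leaves out.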

       
