The reciprocal conversion of environmental data for customer information support

Contents

Slide 2

Report Structure

Roshydromet & Unified State Data Fund (USDF).
USDF data.
Main objectives.
DDL as the USDF data storage format, with examples.
The first version of the reciprocal data conversion system.
Description of some algorithms and subsystems.
Current results & Conclusion.

Slide 3

Roshydromet & Unified State Data Fund

Roshydromet observation network

Unified State Data Fund (USDF)

DHS №26

DHS - Department of Hydrometeorological Service

DHS №1

. . .

Roshydromet Research Institutes

RIHMI-WDC

Processed data + Observation data

Observation data

Observation data

Slide 4

USDF data

USDF data can be considered Big Data because they meet the "3V" characteristics: volume, velocity, and variety.
To provide long-term storage that preserves the hierarchical structure of environmental data obtained from observation networks, RIHMI-WDC developed a specialized data format, DDL (Hydrometeorological Data Description Language).
Data in the DDL format consist of a set of files: one file describing the data structure and one or more files containing the data themselves.

Slide 5

Main objectives

Because primary observation data are of the greatest interest (and can be considered Big Data), and taking their specifics into account, it is necessary to create:
A single technology for storing all types of data, verifying them (for completeness and reliability), and providing USDF data to consumers in the format they need to solve their problems.
A technology for the formation and storage of meta descriptions (FSMD) that describe the content of data files and archives (file collections). A meta description is information about the internal content and data state of each file.
A technology for mutual conversion of USDF data (between the DDL format and other formats widely used by consumers).

This report is dedicated to the system for mutual data conversion with control over the adequacy of the conversion performed, or, more precisely, to its first version.

Slide 6

General hierarchical structure of the USDF data in the DDL format

Slide 7

DDL part of meteorological data

1) Description of the data header
RECORDS;
LNG ДЛЗАП B(2) PC(4);
MIT НУЛИ B(2) PC(4);
KEY(I) ГОД B(2) PC(4); // Year
KEY(I) МЕСЯЦ B(1) PC(2); // Month
KEY(U) СТАНЦИЯ B(4) PC(7);
MRC(I) ТИПЗАП B(1) PC(1); // Record type (1-3)

2) Part of the record CONST description
RBODY(1) CONST ; // Passport data
MIT НАИМЕНСТ A(20) PA(20) NA;
MIT КООРДНОМ B(4) PC(7) NA; // Station coordinate number
MIT НОМУПРАВ B(1) PC(2) NA; // DHS number
MIT НОМЧАСП B(1) PC(2) NA; // Time zone number
MIT ПРГЕОРАС B(1) PC(1);
MIT КОЛСРОК B(1) PC(1) NA; // Number of observation times

3) Part of the record TPOCHV description
RBODY(3) TPOCHV ; //
KEY(I) ДЕНЬ B(1) PC(2);
CNT СЧГРОГП B(1) PC(1); //
CNT СЧГРЕСП1 B(1) PC(1); //
CNT СЧГРЕСП2 B(1) PC(1); //
MIT СНЕПВЫСТ B(2) PC(4); //
CHA(СНЕПВЫСТ) Q B(1) PC(1) NA;
GRV(СЧГРОГП ) ТЕМПОГ;
    IND(1) ПРНАЛИЧ PC(1);
    GRP SROKG; // -- Nested group
        IND(4) ГЛУБИНЫ PC(1) ;
        MIT ТЕМПОГСТ B(2) PC(5,1) D(1); //
        CHA(ТЕМПОГСТ) Q B(1) PC(1) NA;
    END SROKG ;
END ТЕМПОГ;
END TPOCHV;
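
To make the shape of these declarations easier to see, here is a minimal Python sketch that splits a single DDL description line into its visible parts: the leading keyword, an optional argument in parentheses, the element name, the remaining attribute tokens, and the trailing comment. The `Declaration` class and the `parse_declaration` function are hypothetical names introduced for illustration. The sketch assigns no meaning to keywords such as MIT, KEY, or CNT, since their semantics are not spelled out on the slide, and it is not the parser used in the system.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Declaration:
    keyword: str             # leading keyword, e.g. MIT, KEY, CNT
    argument: Optional[str]  # text in parentheses after the keyword, if any
    name: str                # element name, kept exactly as written
    attributes: List[str]    # remaining tokens, e.g. ["B(2)", "PC(4)"]
    comment: str             # text after "//", possibly empty


# keyword, optional "(argument)", element name, then the rest of the statement
_DECL_RE = re.compile(r"([A-Z]+)(?:\(([^)]*)\))?\s+(\S+)\s*(.*)$")


def parse_declaration(line: str) -> Optional[Declaration]:
    """Split one description line; return None if it has no element name."""
    code, _, comment = line.partition("//")
    code = code.strip().rstrip(";").strip()
    match = _DECL_RE.match(code)
    if match is None:  # e.g. "RECORDS;" consists of a keyword only
        return None
    keyword, argument, name, rest = match.groups()
    return Declaration(
        keyword=keyword,
        argument=argument.strip() if argument is not None else None,
        name=name,
        attributes=rest.split(),
        comment=comment.strip(),
    )
```

Applied to the line `KEY(I) ГОД B(2) PC(4); // Год`, the sketch yields the keyword `KEY`, the argument `I`, the name `ГОД`, and the attribute tokens `B(2)` and `PC(4)`.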

Slide 8

System for mutual data conversion

The DDL format is convenient for accumulating and storing the large arrays of data that make up the USDF, but it is impractical as a format for delivering data to consumers because it is highly specific, used only within the agency, and difficult for consumers to work with.
Studies have shown that the formats most in demand for supplying consumers with USDF data are netCDF, XML, CSV, and relational databases.

Data in DDL format

netCDF

Relational database

CSV

XML

Slide 9

System structure (see the annual report)

Slide 10

Program interface

Slide 11

DDL -> RDB conversion algorithm

1) BAT file formation for relational database creation: the text of a BAT file containing a script that creates a database in the PostgreSQL DBMS is generated automatically.
2) Parsing a file with a DDL description: the DDL description is parsed to obtain and save the structure of the data in DDL format.
3) Creating the relational database structure: tables are created with their fields and links.
4) Converting data from a data file or files: each record is read sequentially and its contents are converted into the relational database tables.

Slide 12

DDL -> RDB conversion stages

1) Automatic generation of the text of a BAT file containing a script that creates a database in the PostgreSQL DBMS.
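
As an illustration of this stage, the following Python sketch builds the text of such a BAT file. It assumes that the script calls the standard PostgreSQL command-line tools `createdb` and `psql`. The actual script contents are not shown in the report, so the function name `make_bat_text`, the database name, the user name, and the schema file name are all hypothetical.

```python
def make_bat_text(db_name, user, schema_file, host="localhost", port=5432):
    """Return the text of a Windows BAT file that creates a PostgreSQL
    database and then loads a schema script into it.

    All argument values are supplied by the caller; nothing here is taken
    from the real system.
    """
    lines = [
        "@echo off",
        # createdb is the PostgreSQL utility that creates a new database
        f"createdb -h {host} -p {port} -U {user} {db_name}",
        # psql -f executes the SQL statements stored in a file
        f"psql -h {host} -p {port} -U {user} -d {db_name} -f {schema_file}",
    ]
    # BAT files conventionally use CRLF line endings
    return "\r\n".join(lines) + "\r\n"
```

For example, `make_bat_text("usdf_meteo", "postgres", "schema.sql")` returns three lines of text that can be written to a `.bat` file and run on a machine where the PostgreSQL client tools are on the PATH.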

Slide 13

DDL -> RDB conversion stages

2) Parsing of the DDL description in order to obtain and save the structure of the data in DDL format.

Slide 14

DDL -> RDB conversion stages

3) Relational database structure generation: tables with their fields, and the relationships between tables, are created from the results of parsing the DDL.
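
A minimal sketch of this stage, under stated assumptions: each parsed record or group becomes one table, nested parts refer to their parent through a foreign key, and every table gets a surrogate `id` column. The table names RECORDS and TPOCHV and the field names come from the slides. The `TableSpec` class, the `id` and `parent_id` columns, and the INTEGER column types are assumptions made for illustration, not the mapping used by the system.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TableSpec:
    name: str                       # table name, e.g. a DDL record name
    columns: List[Tuple[str, str]]  # (column name, SQL type) pairs
    parent: Optional[str] = None    # name of the parent table, if nested


def quote(identifier: str) -> str:
    """Quote an SQL identifier so that non-ASCII names are accepted."""
    return '"' + identifier.replace('"', '""') + '"'


def create_table_sql(spec: TableSpec) -> str:
    """Build a CREATE TABLE statement for one parsed record or group."""
    parts = ['"id" INTEGER PRIMARY KEY']  # assumed surrogate key
    if spec.parent is not None:
        # assumed link from a nested part to the record that contains it
        parts.append(
            f'"parent_id" INTEGER NOT NULL REFERENCES {quote(spec.parent)} ("id")'
        )
    parts += [f"{quote(name)} {sql_type}" for name, sql_type in spec.columns]
    return f"CREATE TABLE {quote(spec.name)} (\n    " + ",\n    ".join(parts) + "\n);"


# Hypothetical specifications built from the field names on the slides.
records = TableSpec("RECORDS", [("ГОД", "INTEGER"), ("МЕСЯЦ", "INTEGER"),
                                ("СТАНЦИЯ", "INTEGER"), ("ТИПЗАП", "INTEGER")])
tpochv = TableSpec("TPOCHV", [("ДЕНЬ", "INTEGER"), ("СНЕПВЫСТ", "INTEGER")],
                   parent="RECORDS")
```

Calling `create_table_sql(tpochv)` produces a statement in which `parent_id` references `"RECORDS" ("id")`, which is one simple way to keep the hierarchy of the DDL description in a relational schema.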

Slide 15

DDL -> RDB conversion stages

4) Converting data from a data file or files.

Slide 16

Program interface

Slide 17

Conversion results

Example of data from SYTKI table

Example of data from RECORDS table

Slide 18

Program interface

Slide 19

RDB -> DDL conversion algorithm

1) Reading and saving the relational database structure: a connection is established with the specified relational database, and SQL commands are used to obtain its structure.
2) Parsing of the description code on which the relational database is based: the data structure is parsed and saved for further conversion.
3) Comparison of information about both data structures: information about the structure of the relational database and of the data in DDL format is compared and combined.
4) Data conversion: the relational database is converted to a file in the DDL format.
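
The first step, reading the structure of an existing relational database with SQL commands, can be sketched as follows. To keep the example self-contained, Python's built-in `sqlite3` module stands in for PostgreSQL. On PostgreSQL the same information would instead come from queries against the `information_schema` views. The function names and the demonstration schema are hypothetical.

```python
import sqlite3


def read_structure(conn):
    """Return {table: {"columns": [(name, type), ...], "parents": [table, ...]}}."""
    structure = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        columns = [(row[1], row[2])
                   for row in conn.execute(f'PRAGMA table_info("{table}")')]
        # PRAGMA foreign_key_list rows: (id, seq, referenced table, from, to, ...)
        parents = [row[2]
                   for row in conn.execute(f'PRAGMA foreign_key_list("{table}")')]
        structure[table] = {"columns": columns, "parents": parents}
    return structure


def demo_connection():
    """Create a small in-memory database with a parent and a child table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE "RECORDS" (
            "id" INTEGER PRIMARY KEY, "ГОД" INTEGER, "МЕСЯЦ" INTEGER
        );
        CREATE TABLE "TPOCHV" (
            "id" INTEGER PRIMARY KEY,
            "parent_id" INTEGER REFERENCES "RECORDS" ("id"),
            "ДЕНЬ" INTEGER
        );
    """)
    return conn
```

The dictionary returned by `read_structure(demo_connection())` lists the columns of each table and records that TPOCHV refers to RECORDS, which is the kind of information that would then be compared against the DDL description.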

Slide 20

Methods of adequacy control

The adequacy control subsystem includes the following methods:
"Loop": after the conversion is completed, the reverse conversion is performed, and the results are compared;
Comparison of the results of equivalent data queries;
Comparison of relationships between data in different models.
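
The "Loop" method can be illustrated with a short Python sketch: data are converted forward, converted back, and compared record by record with the original. CSV is used as the target format only because it needs nothing beyond the standard library. The field names, the sample values, and the function names are hypothetical and do not come from the system described here.

```python
import csv
import io

FIELDS = ["year", "month", "station", "value"]  # hypothetical record layout


def to_csv(records):
    """Forward conversion: list of dicts -> CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def from_csv(text):
    """Reverse conversion: CSV text -> list of dicts with restored types."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"year": int(row["year"]), "month": int(row["month"]),
         "station": int(row["station"]), "value": float(row["value"])}
        for row in reader
    ]


def loop_check(records, forward, backward):
    """Convert, convert back, and return a list of mismatches (empty if none)."""
    restored = backward(forward(records))
    if len(restored) != len(records):
        return [("record count", len(records), len(restored))]
    return [(index, original, copy)
            for index, (original, copy) in enumerate(zip(records, restored))
            if original != copy]
```

An empty result from `loop_check(records, to_csv, from_csv)` means the round trip reproduced every record. A non-empty result points to the records that the conversion changed.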

Slide 21

Conversion results

Slide 22

Adequacy of results

Slide 23

Adequacy of results

Slide 24

Adequacy of results
