Persistent Identifier
|
doi:10.18738/T8/OOTALX |
Publication Date
|
2022-07-14 |
Title
| Replication Data for: Synchronic Curation for Assessing Reuse and Integration Fitness of Multiple Data Collections |
Author
| Esteva, Maria (University of Texas at Austin)
Xu, Weijia (University of Texas at Austin)
Simone, Nevan (University of Texas at Austin)
Nagpal, Kartik (University of Texas at Austin)
Gupta, Amit (University of Texas at Austin)
Jah, Moriba (University of Texas at Austin) |
Point of Contact
|
Esteva, Maria (University of Texas at Austin) |
Description
| The dataset in this publication demonstrates the implementation and capabilities of Synchronic Curation (SC) in ASTRIAGraph. SC is a framework for curating multiple large datasets for integration and reuse in research applications. Data-driven applications often require data integrated from different large and continuously updated collections. These collections may present gaps and overlaps, or may conflict with or complement each other. Thus, a curation need is to continuously assess whether data are fit for integration and reuse. The SC framework involves processing steps to map different collections to a unifying data model that represents research problems in a scientific area as well as the collections' provenance. Data points from the collections that are integrated into the system are mapped to the data model, and a unified data dictionary is maintained centrally and expanded as needed. The data model is implemented in a graph database where collections are continuously ingested and queried. SC includes a collection analysis and comparison module to track collection updates and to identify gaps, changes, and irregularities within and across collections. Users can query the database or explore comparison results through an interactive graph. We present three files: 1) The Synchronic Curation data model's state in ASTRIAGraph up to the date of this publication. The data model includes labeled classes, identified by domain scientists as comprising research problems in the space, and their corresponding properties. Classes and properties are defined according to a unified data dictionary maintained by the ASTRIAGraph team. Some terms/labels and definitions are extracted from the Unified Astronomy Thesaurus. The names of the collections that are ingested into ASTRIAGraph are also included in the data model, as well as the relationships between their data points and the classes and properties to which they have been mapped.
2) Schema for comparing data fields of two versions of the collection of the United Nations Office for Outer Space Affairs (UNOOSA) Space Object Register. 3) Matrix with the final tally of the comparison of the two versions. The results can be accessed via web-based interactive graphs whose URLs are noted in the metadata. Originally developed for ASTRIAGraph, SC can be applied to other areas of knowledge. It is especially useful for very large and frequently updated datasets. This dataset can be used to learn about the methodology used to process the data for SC and to replicate results. |
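The field comparison and tally described in items 2) and 3) can be illustrated with a minimal Python sketch. It is not the ASTRIAGraph implementation: the function name `compare_versions`, the record layout, the field names, the `OBJ-*` identifiers, and the tally categories are all assumptions made up for illustration, and the published JSON files may be structured differently.

```python
from collections import Counter


def compare_versions(version_a, version_b, key):
    """Tally, per common field, how records shared by two versions compare.

    Hypothetical helper: assumes each version is a list of dicts and that
    `key` names an identifier present in every record.
    """
    # Index each version by the shared identifier.
    index_a = {rec[key]: rec for rec in version_a}
    index_b = {rec[key]: rec for rec in version_b}

    # Only fields that occur in both versions can be compared.
    fields_a = set().union(*(rec.keys() for rec in version_a))
    fields_b = set().union(*(rec.keys() for rec in version_b))
    common_fields = sorted((fields_a & fields_b) - {key})

    tally = {field: Counter() for field in common_fields}
    for rec_id in index_a.keys() & index_b.keys():
        rec_a, rec_b = index_a[rec_id], index_b[rec_id]
        for field in common_fields:
            val_a, val_b = rec_a.get(field), rec_b.get(field)
            if val_a is None and val_b is None:
                tally[field]["both_missing"] += 1
            elif val_a is None or val_b is None:
                tally[field]["one_missing"] += 1
            elif val_a == val_b:
                tally[field]["match"] += 1
            else:
                tally[field]["mismatch"] += 1

    # Records present in only one version indicate gaps between versions.
    coverage = {
        "only_in_a": sorted(index_a.keys() - index_b.keys()),
        "only_in_b": sorted(index_b.keys() - index_a.keys()),
    }
    return {field: dict(counts) for field, counts in tally.items()}, coverage


# Made-up records standing in for two versions of one collection.
version_a = [
    {"id": "OBJ-1", "state": "A", "launch_date": "2001-01-01"},
    {"id": "OBJ-2", "state": "B", "launch_date": None},
    {"id": "OBJ-3", "state": "C", "launch_date": "2003-03-03"},
]
version_b = [
    {"id": "OBJ-1", "state": "A", "launch_date": "2001-01-01", "status": "x"},
    {"id": "OBJ-2", "state": "Bee", "launch_date": "2002-02-02", "status": "y"},
    {"id": "OBJ-4", "state": "D", "launch_date": "2004-04-04", "status": "z"},
]

tally, coverage = compare_versions(version_a, version_b, key="id")
print(tally)
print(coverage)
```

In this sketch the per-field tally plays the role of the comparison matrix, while the coverage lists record which identifiers appear in only one version.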
Subject
| Engineering; Other |
Keyword
| Data Curation
Data Management
Aerospace Engineering
Space Sustainability
Data Modeling |
Related Publication
| Maria Esteva, Weijia Xu, Nevan Simone, Kartik Nagpal, Amit Gupta, Moriba Jah "Synchronic Curation for Assessing Reuse and Integration Fitness of Multiple Data Collections" in International Digital Curation Conference, 2022. doi: https://doi.org/10.5281/zenodo.6641885
Maria Esteva, Weijia Xu, Nevan Simone, Amit Gupta, Moriba Jah "Modeling Data Curation to Scientific Inquiry: A Case Study for Multimodal Data Integration" in ACM/IEEE Joint Conference on Digital Libraries, 2020. doi: https://doi.org/10.1145/3383583.3398539 |
Production Date
| 2022-01-03 |
Production Location
| Austin, TX |
Depositor
| Esteva, Maria |
Deposit Date
| 2022-06-27 |
Data Type
| Data model and data comparison in JSON format |
Software
| AstriaGraph, Version: http://astria.tacc.utexas.edu/AstriaGraph/ |
Related Material
| Synchronic Curation: Data Model, an interactive visualization based on the data model dataset published here. See http://astriaservices.tacc.utexas.edu/liveschema; Synchronic Curation: Data Analysis and Comparison Module, an interactive visualization based on the dataset published here. http://astriaservices.tacc.utexas.edu/liveschema |
Other Reference
| Unified Astronomy Thesaurus, American Astronomical Society https://astrothesaurus.org/ |
Data Source
| The analysis and comparison module dataset compares two versions of the collection of the United Nations Register of Objects Launched into Outer Space. They are: the Outer Space Objects Index (https://www.unoosa.org/oosa/osoindex/index.jspx?lf_id= ) and the Space Object Registry ( https://www.unoosa.org/oosa/en/spaceobjectregister/national-registries/index.html). The Common_field_comparison |
Origin of Historical Sources
| Both versions of the collection of the United Nations Register of Objects Launched into Outer Space are publicly available and belong to the United Nations Office for Outer Space Affairs (UNOOSA). |
Characteristic of Sources
| They are different versions of the same information, but their granularity, processing methods, and processing times differ. Thus, they present differences. |
Documentation and Access to Sources
| Information about the provenance of the UNOOSA datasets can be found at: https://www.unoosa.org/oosa/en/spaceobjectregister/index.html Information about the process by which the AstriaGraph_dataModel dataset is generated can be found in both of the Related Publications. |