Data Integration

Innovation: Integration of Geovisual and Humanities Databases

All of our work will be made possible by our plan to integrate Geovisual and Humanities databases. Unlike other efforts that focus on manuscript transcription or GIS interpretation, our work will integrate a humanities-oriented database (historical details, events, and relationships) with a geovisual database (GIS and 3D modeling). FileMaker Pro is well suited to our humanities-oriented database because (1) it uses the open ODBC and XML standards to exchange data and (2) its relational databases can exchange static and live data with SQL data sources. Further, FileMaker Pro’s relatively user-friendly tools will encourage our non-technical humanities scholars to adopt more robust data analysis tools in their scholarship.

Although the backbone of our data management scheme will rely on traditional relational databases, with the assistance of Dr. Kantabutra, we will begin testing a new data organization process known as Intentionally-Linked Entities (ILE). ILE replaces the rigid “indexing of things” with a more flexible approach that links “entities” (things, usually nouns) and “entity sets” (e.g., the entity set of “municipalities”) through “pointers,” which point in both directions. ILE also allows any number of relationships to be represented as relationship objects. Each relationship object relates entities that have “roles” to play in that particular relationship. This aspect of ILE makes it simple (and efficient) for users to navigate among the stored entities involved in a relationship, and it empowers scholars to explore more effectively the interconnections among individuals and groups defined by kinship, faith, and office.
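
The ILE ideas described above (entities, entity sets, two-way pointers, and relationship objects with roles) can be illustrated with a minimal Python sketch. This is an assumption-laden illustration, not the project's or Dr. Kantabutra's implementation: the class names, the `related` helper, the one-entity-per-role simplification, and the placeholder names "Person A" and "Person B" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Entity:
    """A stored thing (usually a noun), e.g. a person or a municipality."""
    name: str
    # Back-pointers, so navigation works from the entity outward as well.
    entity_sets: list = field(default_factory=list, repr=False)
    relationships: list = field(default_factory=list, repr=False)


@dataclass(eq=False)
class EntitySet:
    """A named collection of entities, e.g. the entity set of municipalities."""
    name: str
    members: list = field(default_factory=list, repr=False)

    def add(self, entity):
        # Link in both directions: set -> entity and entity -> set.
        self.members.append(entity)
        entity.entity_sets.append(self)


@dataclass(eq=False)
class Relationship:
    """A relationship object mapping role names to the entities playing them.

    Assumption: one entity per role, to keep the sketch short.
    """
    kind: str
    roles: dict = field(default_factory=dict, repr=False)

    def assign(self, role, entity):
        # Link in both directions: relationship -> entity and entity -> relationship.
        self.roles[role] = entity
        entity.relationships.append(self)


def related(entity):
    """Return (relationship kind, role, name) for each other entity linked to `entity`."""
    found = []
    for rel in entity.relationships:
        for role, other in rel.roles.items():
            if other is not entity:
                found.append((rel.kind, role, other.name))
    return found


# Placeholder data only; these are not records from the project.
people = EntitySet("people")
parent = Entity("Person A")
child = Entity("Person B")
people.add(parent)
people.add(child)

kinship = Relationship("kinship")
kinship.assign("parent", parent)
kinship.assign("child", child)

print(related(child))  # [('kinship', 'parent', 'Person A')]
```

Because every link is stored at both ends, a query that starts from a person reaches that person's relationships and co-participants without scanning an index, which is the navigational property the paragraph above attributes to ILE.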

Data Management Plan

Dr. Martinez at UCCS and Dr. Schinazi at ETH-Zurich will be responsible for overseeing the successful collection, preservation, and dissemination of project information. In his capacity as a database expert, Dr. Kantabutra will advise on all efforts to integrate our data processes and storage. Generally speaking, this project deals with data of two types: spatial (i.e., location-based) and non-spatial (e.g., properties of different places and things in space). Microsoft Access, FileMaker Pro, SQL, and ESRI’s ArcSDE geodatabase will be used to support the storage and management of spatial and non-spatial data. Our storage and management scheme includes three components:

  • an internal project data warehouse that holds our Transcription and Recording Database and our integrative Geodatabase, which contains all transcription-historical, geovisual, and 3D modeling data,
  • an Internet-accessible transcription tool that feeds our researchers’ work into the main project databases, and
  • a public project website that presents our interactive Virtual Plasencia model and disseminates our iBook, PDF/ebook, and NEH white paper publications. The backend of the website will consist of the transcription database and the geodatabase. The frontend of the website will present the digital world (both Virtual Plasencia and the digital historical documents).
Copyright 2014. Revealing Cooperation and Conflict Project.

Expected Data and Data Format

The project will generate a comprehensive collection of information, including:

  • Digital photographs of primary source manuscripts in JPEG and TIFF formats.
  • Digital photographs of buildings, public spaces, and roads in JPEG and TIFF formats.
  • Digital maps and architectural blueprints stored in CAD (vector) and TIFF (raster) formats.
  • Electronic transcriptions of primary source manuscripts in text format.
  • Relational databases (Transcription and Recording Database and Geovisual Database) in SQL that hold familial, economic, social, religious, and political event details.
  • Geovisual 3D models (in .3ds, .skp, and .dxf formats) for 30 percent of the walled city of Plasencia, including a network of 30 streets interlinking 50 prominent buildings and public spaces.
  • Software code and scripts that underpin the project’s FileMaker Pro and custom software applications. For the virtual world, these will be developed mainly in JavaScript and C#.
  • iBook and PDF/ebook publications, as well as the NEH White Paper.
  • Project website that hosts Virtual Plasencia and all publications.
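
To illustrate how a relational database of the kind listed above might hold event details, the following is a minimal sketch only. It swaps in Python's built-in sqlite3 module in place of the project's FileMaker Pro and SQL systems, and the table names, column names, and placeholder rows are hypothetical assumptions rather than the project's actual schema or data. The five event categories are the ones named in the list above.

```python
import sqlite3

# Hypothetical schema: people, events, and a linking table that records
# the role each person plays in each event.
SCHEMA = """
CREATE TABLE person (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE event (
    id       INTEGER PRIMARY KEY,
    category TEXT NOT NULL CHECK (category IN
             ('familial', 'economic', 'social', 'religious', 'political')),
    summary  TEXT NOT NULL
);
CREATE TABLE participation (
    person_id INTEGER NOT NULL REFERENCES person(id),
    event_id  INTEGER NOT NULL REFERENCES event(id),
    role      TEXT NOT NULL,
    PRIMARY KEY (person_id, event_id, role)
);
"""


def open_database(path=":memory:"):
    """Create a database with the sketch schema and foreign keys enforced."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def events_for(conn, person_name):
    """Return (category, role, summary) rows for one person, ordered by event id."""
    return conn.execute(
        "SELECT e.category, pa.role, e.summary "
        "FROM participation AS pa "
        "JOIN person AS p ON p.id = pa.person_id "
        "JOIN event AS e ON e.id = pa.event_id "
        "WHERE p.name = ? ORDER BY e.id",
        (person_name,),
    ).fetchall()


# Placeholder rows only; these are not records from the project.
conn = open_database()
conn.execute("INSERT INTO person (id, name) VALUES (1, 'Person A')")
conn.execute(
    "INSERT INTO event (id, category, summary) "
    "VALUES (1, 'economic', 'placeholder property lease')"
)
conn.execute("INSERT INTO participation VALUES (1, 1, 'lessee')")

print(events_for(conn, "Person A"))
# [('economic', 'lessee', 'placeholder property lease')]
```

The CHECK constraint keeps every event within the five stated categories, and the linking table lets one event involve several people in different roles.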

Some of the particular data formats (and storage processes) that we will utilize are:

  • Digital map repository: To organize the different maps used in the project, we will create a digital map repository stored on a secure, dedicated server at ETH. All paper maps will be digitized, georeferenced, and catalogued according to type, scale, projection, and corresponding historical period. Maps will be identified following a standardized labeling system (location_year_projection_type). Raster-to-vector and vector-to-raster conversion will be applied to maps depending on their use and the desired level of user interaction. These maps will be stored as individual ArcGIS projects, and team members will be able to download the projects directly or work with the individual raster or vector files.
  • Virtual Plasencia (3D modeling): We will create a Geovisual Design Document (GDD) to organize the design and launch of Virtual Plasencia. A GDD is a blueprint used to coordinate the different elements and materials that go into building a computer or video game. It is a reference document and a communication tool for the members of the development team. The GDD will be a live document stored on an ETH server along with all elements that relate to the game (i.e., 3D models, architectural blueprints, photographs, gaming scripts).
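
The standardized map labeling system (location_year_projection_type) mentioned above could be applied consistently with a small helper such as the following. This is a sketch under stated assumptions: the function names, the lowercase alphanumeric normalization, and the example values are hypothetical, and the source specifies only the order of the four fields.

```python
from typing import NamedTuple


class MapLabel(NamedTuple):
    """The four fields of a label, in the order location_year_projection_type."""
    location: str
    year: int
    projection: str
    map_type: str


def _slug(text):
    # Assumption: lowercase and keep only letters and digits, so that "_"
    # remains an unambiguous separator between the four fields.
    return "".join(ch for ch in text.lower() if ch.isalnum())


def build_label(location, year, projection, map_type):
    """Compose a label in the form location_year_projection_type."""
    return "_".join([_slug(location), str(int(year)), _slug(projection), _slug(map_type)])


def parse_label(label):
    """Split a label back into its four fields, rejecting malformed labels."""
    parts = label.split("_")
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields separated by '_', got {len(parts)}: {label!r}")
    location, year, projection, map_type = parts
    return MapLabel(location, int(year), projection, map_type)


# Placeholder values for illustration; not an actual catalogue entry.
label = build_label("Plasencia", 1450, "UTM 30N", "street plan")
print(label)               # plasencia_1450_utm30n_streetplan
print(parse_label(label))
```

Normalizing each field before joining means a label can always be split back into its parts, which helps when maps are catalogued by type, projection, and historical period.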

Data Format and Dissemination

Digital data will be stored on a secure server at ETH. Using a previously established secure Microsoft SharePoint sharing environment, all data will be made available to project collaborators. With the support of the Information Technology (IT) team, an appropriate access level will be determined and granted to each collaborator. Microsoft SharePoint will be configured to adhere to the ISO 15489 standard for records management and to the COBIT (Control Objectives for Information and Related Technologies) guidelines for data governance. The ETH IT team has extensive experience setting up collaborative environments for research purposes. All data will be the shared intellectual property of the participating scholars and institutions. Materials will also be publicly accessible via our public project website and other dissemination strategies.

Period of Data Retention

UCCS and ETH-Zurich will guarantee the hosting of their respective public project websites and project data for a period of no less than ten years, and the PIs intend to host these sites on a perpetual basis. In the event that neither UCCS nor ETH-Zurich can permanently host the sites, the Project Director will seek another public access hosting provider.

Data Storage and Preservation of Access

The primary repositories for all data will be UCCS and ETH-Zurich servers and websites, which will be properly indexed and searchable via the respective campuses’ official university websites. Both universities’ data storage systems are equipped with automated and redundant backup services. These systems run a full backup on a regular schedule according to university policies.

Data Protection

Data used in this project are already in the public domain. All personal data collected using the transcription tool will be anonymized and will not be made available for public access.