Importing Data

From ARK

Revision as of 14:38, 20 September 2010

As of the v1.0 release of ARK there is a set of data import tools within ARK. These tools speed up and improve the process of creating a concordance map between the source data and an ARK database. Nevertheless, using the tools requires a working knowledge of ARK's data structures. If you are unable to import your data or would like training in the ARK import tools, please contact L - P : Archaeology (http://www.lparchaeology.com/cms/about-lp/contact), who provide both training and custom ARK installations.

Overview

The ARK import tools are designed to import table-based data, either a series of tables from a relational database or a spreadsheet. The tools map the fields in the data onto fields within the ARK data structure. The data can then be inspected before running an import.

The tools provide some join functionality and are intended to reduce the need to pre-process data where possible. Inevitably, however, some pre-processing will still be needed before import.

Concordance Maps

Each instance of ARK can contain multiple concordance maps. A concordance map holds the information (mappings) that tells the import tools how a particular element of the source data should be reworked for import into the ARK structure. A concordance map maps both 'structure' and 'data'.

Structure Mapping

This is the key part of each concordance map: it contains the mappings for each field in the source data that will be imported into the ARK database. Because of the way ARK data is structured, many source data fields are often not needed for import.

Once mapped, a field can be imported and re-imported using the same mapping. Future versions of ARK may support this as a method for dynamically updating the data over the web.

As a simple example, a field in the source data such as 'Description' (a column in the source database) would be mapped to an ARK txttype ready for import.
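To illustrate the idea, here is a minimal Python sketch of a single structure mapping. It is not ARK's own code: the FieldMapping class, the apply_mapping function, the 'txt'/'desc' class and type names and the shape of the output dictionaries are all hypothetical, and skipping blank values is a choice made for the sketch. What it does show is that a mapping looks at one source column and ignores the rest of the row.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldMapping:
    """Hypothetical structure mapping: one source column -> one ARK class/type."""
    source_table: str   # e.g. "import_contexts"
    source_column: str  # e.g. "Description"
    ark_class: str      # assumed class name, e.g. "txt"
    ark_type: str       # assumed txttype name, e.g. "desc"


def apply_mapping(mapping, rows, uid_column):
    """Build one fragment per row for the mapped column only.

    Other columns in the row are ignored, and rows whose value is blank
    are skipped rather than imported as empty text.
    """
    fragments = []
    for row in rows:
        value = (row.get(mapping.source_column) or "").strip()
        if not value:
            continue
        fragments.append({
            "uid": row[uid_column],  # used to track the row, not imported as data
            "class": mapping.ark_class,
            "type": mapping.ark_type,
            "value": value,
        })
    return fragments


rows = [
    {"uid": 1, "Description": "Dark silty clay", "Notes": "ignored"},
    {"uid": 2, "Description": "   ", "Notes": "also ignored"},
]
desc_map = FieldMapping("import_contexts", "Description", "txt", "desc")
print(apply_mapping(desc_map, rows, "uid"))
```

Running it produces a single fragment for row 1; row 2 is skipped because its description is blank, and the Notes column is never read.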

Data Mapping

Data mapping maps certain terms or values in the source database to terms or values in the ARK database. For example, a term such as 'castle' in the source database can be mapped to a term such as 'defensive structure' in the ARK database.

When importing from controlled lists, ARK makes new data mappings on the fly as it finds each new term within the list.

By manually creating a data mapping, it is possible to alter the way this data is imported without the need for pre-processing. This is especially important in the case of dynamically re-importing data.
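The behaviour described here, where manual mappings take precedence and unseen terms from a controlled list are added on the fly, can be sketched as a simple lookup table. This is a hypothetical Python illustration, not the import tools' implementation: the DataMap class is invented here, and normalising terms by trimming and lower-casing them is an assumption.

```python
class DataMap:
    """Hypothetical source-term -> ARK-term lookup for one controlled list."""

    def __init__(self, manual=None):
        # Manual mappings are stored under a normalised key so they win
        # over anything created automatically.
        self.terms = {self._key(k): v for k, v in (manual or {}).items()}
        self.created = []  # terms added on the fly, kept for later review

    @staticmethod
    def _key(term):
        # Assumed normalisation: ignore surrounding spaces and case.
        return term.strip().lower()

    def lookup(self, source_term):
        key = self._key(source_term)
        if key not in self.terms:
            # New term found in the list: map it to itself for now.
            self.terms[key] = key
            self.created.append(key)
        return self.terms[key]


monument_map = DataMap({"castle": "defensive structure"})
print(monument_map.lookup("Castle"))  # manual mapping applies: defensive structure
print(monument_map.lookup("ditch "))  # new term, mapped to itself: ditch
print(monument_map.created)           # ['ditch']
```

Because the manual entries live in the same table, editing them changes what a re-import produces without touching the source data.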

How to import data

This section gives an overview of how to import data into ARK using the built-in import tools.

Source Data

The first task is to take a close look at the data you are about to import. The import tools will not magically correct problems in the data: if you put junk in, you will get junk out.
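One way to take that close look is to profile a CSV export of each table before it goes anywhere near the server. The profile_csv function below is a hypothetical helper, sketched with Python's standard csv module; it counts blank cells and values with stray leading or trailing spaces in each column. Which problems actually matter for your data is up to you.

```python
import csv
import io


def profile_csv(text):
    """Count blank cells and untrimmed values in each column of a CSV export."""
    reader = csv.DictReader(io.StringIO(text))
    names = reader.fieldnames or []
    report = {name: {"blank": 0, "untrimmed": 0} for name in names}
    for row in reader:
        for name in names:
            value = row[name] or ""  # short rows give None for missing cells
            if not value.strip():
                report[name]["blank"] += 1
            elif value != value.strip():
                report[name]["untrimmed"] += 1
    return report


sample = "context,description\n1,Dark clay\n2, \n3,Pit fill \n"
print(profile_csv(sample))
```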

At this stage the tools will only read data from a source database on the same MySQL server. There is no reason the tools could not be adapted to work with another database such as Postgres, but for now they will not read data from outside the server. Your next task is therefore to get your data onto the server. You can either create tables in the target (ARK) database or set up a separate 'source database'.

The easiest way to get data onto the server is probably to save each table as a comma-separated values (CSV) file and import it to the server using phpMyAdmin.

Be sure to prefix source database tables with "import_"; this indicates to the import tools that you wish to map data in this table. Lookup tables and other supplementary tables that will not be imported directly do not need this prefix.
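The rule is simply a name prefix, so it is easy to check which of your tables would be picked up. The tables_to_map function below is a hypothetical Python helper, not part of ARK, that applies the same test to a list of table names.

```python
IMPORT_PREFIX = "import_"


def tables_to_map(table_names):
    """Return the tables that carry the import_ prefix, in sorted order."""
    return sorted(name for name in table_names if name.startswith(IMPORT_PREFIX))


print(tables_to_map(["import_contexts", "lut_period", "import_finds", "contexts"]))
```

To list candidate tables directly in MySQL, SHOW TABLES FROM your_source_db LIKE 'import\_%' does the same job (your_source_db is a placeholder). The underscore is escaped because _ is a single-character wildcard in LIKE patterns.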

As a rule, give your columns sensible names and avoid special characters. Try to make the names unique and memorable.
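What counts as a sensible name is a judgement call. As one possible rule, the hypothetical check below (a Python sketch, not part of ARK) accepts only letters, digits and underscores, not starting with a digit and no longer than 64 characters (MySQL's limit for column names). It also flags names that differ only by case, since MySQL column names are not case-sensitive.

```python
import re

# Assumed rule for a "sensible" column name; adjust to suit your data.
SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,63}")


def check_column_names(names):
    """Return a list of problems found in one table's column names."""
    problems = []
    seen = set()
    for name in names:
        if not SAFE_NAME.fullmatch(name):
            problems.append(f"unsafe name: {name!r}")
        key = name.lower()  # MySQL treats column names case-insensitively
        if key in seen:
            problems.append(f"duplicate name: {name!r}")
        seen.add(key)
    return problems


print(check_column_names(["uid", "Description", "find no.", "description"]))
```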

It is often simplest to create the table (and its columns) in the database first and then import the data into it, although this is up to you.
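If you take the create-first route, a statement like the one generated below could be run from phpMyAdmin's SQL tab before importing the CSV. This is a hypothetical Python helper; making every source column TEXT, and adding an auto-increment uid column to serve as the unique row ID that each import_ table needs, are assumptions you may want to change.

```python
def create_table_sql(table, columns):
    """Build a MySQL CREATE TABLE statement for a source table to be mapped.

    Assumes the column names have already been checked (no backticks or
    other special characters). Every source column is stored as TEXT, and
    an auto-increment `uid` column is added as the unique row ID.
    """
    if not table.startswith("import_"):
        raise ValueError("tables to be mapped should be prefixed 'import_'")
    if "uid" in (c.lower() for c in columns):
        raise ValueError("the source data already has a 'uid' column")
    lines = ["  `uid` INT NOT NULL AUTO_INCREMENT PRIMARY KEY"]
    lines += [f"  `{name}` TEXT" for name in columns]
    return f"CREATE TABLE `{table}` (\n" + ",\n".join(lines) + "\n);"


print(create_table_sql("import_contexts", ["context", "description"]))
```

If your key column is already unique for every row, you could drop the uid line and use that column as the row ID instead.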

Once you have imported your source data to the server, it is time to start mapping your databases.

  • UID Column - Any table that you prefix 'import_' and intend to import data from MUST have a column containing a unique ID for each row of the table. The tools use this column to loop over the data; it is not imported in any way, but it is essential. This can in fact be the column containing your 'key' data, although in certain circumstances it is desirable to manually create a new UID column on the source table specifically for the purpose of importing the data. See below for further information.
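Before mapping, it is worth confirming that the chosen UID column really is filled in and unique for every row. The sketch below is a hypothetical Python check over rows held as dictionaries (for example, as read from a CSV export); it is not part of the ARK tools.

```python
def check_uid_column(rows, uid_column):
    """Return (indexes of rows with no UID, UID values that repeat)."""
    seen = set()
    missing, duplicates = [], []
    for index, row in enumerate(rows):
        value = row.get(uid_column)
        if value is None or str(value).strip() == "":
            missing.append(index)
        elif value in seen:
            duplicates.append(value)
        else:
            seen.add(value)
    return missing, duplicates


sample_rows = [{"uid": 1}, {"uid": 2}, {"uid": 2}, {"uid": ""}, {}]
print(check_uid_column(sample_rows, "uid"))
```

Both lists should be empty before the table is used for an import.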

Concordance Map - CMAP

If you have not done so already, you will need to create a new Concordance Map. To do this, use the tool available in the left panel of the import tools home page.

Fill in the required fields carefully (check spellings!) and save the new map to the database.

  • Nickname - A mnemonic: this can be anything you like that will help you remember the CMAP. Do NOT use spaces or special characters here.
  • Description - A text field that you can use to describe this CMAP for future reference. This field accepts UTF-8 characters.
  • Source DB - The exact name of the source database on this server (this may be the same as the ARK database).
  • Target Site Code - This is a default site code for the import. More complex options for this can be set in the structure map (which override this setting).
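These rules are easy to check before saving. The hypothetical Cmap class below mirrors the four form fields in Python. The character set allowed for nicknames (letters, digits, underscore and hyphen) is an assumption rather than ARK's documented rule, and ARK's own form may apply different checks.

```python
import re
from dataclasses import dataclass


@dataclass
class Cmap:
    """Hypothetical in-memory copy of the fields on the new-CMAP form."""
    nickname: str
    description: str  # free text, UTF-8 allowed, so not checked here
    source_db: str
    target_site_code: str

    def problems(self, databases_on_server):
        found = []
        # Assumed nickname rule: no spaces or special characters.
        if not re.fullmatch(r"[A-Za-z0-9_-]+", self.nickname):
            found.append("nickname must not contain spaces or special characters")
        # The source DB must be named exactly as it exists on this server.
        if self.source_db not in databases_on_server:
            found.append(f"source DB {self.source_db!r} not found on this server")
        if not self.target_site_code.strip():
            found.append("target site code is required")
        return found


cmap = Cmap("ctx_2010", "Contexts from the 2010 season", "ark_source", "ABC10")
print(cmap.problems({"ark", "ark_source"}))
```

The nickname, database names and site code used here are made up for the example.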

You can edit this map afterwards using the built-in edit CMAP tool.

CMAP Structure

Once you have set up a CMAP, you can begin mapping fields in the source DB. The key thing here is to treat each and every field (column) in the source data as an independent entity. The import tools look only at a single column at a time; they do not try to recreate your relational database beyond a few specific functions.

The recommended method is to look at each column in turn to decide if it will be imported into ARK or not. This is not as simple as it sounds so take time to analyse each column.
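A quick per-column summary can help with that analysis. The sketch below is a hypothetical Python helper that looks at one column in isolation, as the import tools do, and reports how many rows are filled in and how many distinct values appear. As a rough rule of thumb (an assumption, not ARK guidance), a column with a handful of repeated values may suit a controlled list, while a column of mostly unique free text may suit a txttype.

```python
from collections import Counter


def summarise_column(rows, column, top=3):
    """Summarise one source column without reference to any other column."""
    values = [(row.get(column) or "").strip() for row in rows]
    filled = [v for v in values if v]
    counts = Counter(filled)
    return {
        "rows": len(values),
        "filled": len(filled),
        "distinct": len(counts),
        "most_common": counts.most_common(top),
    }


period_rows = [{"period": "Roman"}, {"period": "Roman"}, {"period": "Medieval"}, {"period": ""}]
print(summarise_column(period_rows, "period"))
```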