ARK Data Storage

From ARK
Jump to: navigation, search

This page set out some of major principles of how data is held in and handled by Ark. More specific practical issues related to the MySQL tables or the spatial tables can be found on those pages.

The two most fundamental characteristics of the Ark data model are:

  1. All data is data. Spatial data and text data are handled seamlessly
  2. Ark is modular, not all modules are inter compatible but all must be compatible with the core

Table Types

There are a number of different table types within the ark database structure.

  1. tbl - this is a standard table that is generally the focus of the subject. i.e. cxt_tbl_cxt is within the module cxt and is the table about contexts
  2. lut - this is a LookUp Table - it is generally used as a dropdown list for various items. i.e lut_findtype is a lookup table of findtypes. These tables are slowly being weeded out and replaced by attributes (see below).
  3. xmi - this is a Cross Module Itermediate table. Effectively it allows item_keys from different modules to be realted to each other. So for instance, a plan may depict many different contexts or a photo may depict different contexts AND different special finds (in-situ).

Data Classes

In Ark we only hold 6 classes of data:

  1. Text
  2. Dates
  3. Actions
  4. Actors
  5. Attributes
  6. Spans

In order to make calls to data classes don't write out sql long hand in scripts, make use of the data abstraction functions. See DB calls in ARK for details.


All text is held as fragments in the cor_tbl_txt this the items of text can have attributes attached to them such as author credits by means ot the cor_tbl_textattr


Attributes are considered to be effectively properties of a item key. This means things like the 'colour' of a context or the 'resolution' of a digital scan.

Attributes are made up of a name/value pair. Therefore the Munsell colour GLEY 7/5 is 'GLEY 7/5' of type 'Munsell Colour'. In the same way 'Excellent' is a value that can be applied to the name 'Stratigraphic Reliability'.

The structure is as follows. cor_tbl_attribute. This is the table that links the attribute with the item key. cor_lut_attribute is the table that lists the attirbutes (Excellent, Good, etc.) alongside its type id. cor_lut_attributetype is the table that elucidates the type_id so type_id 4 is 'Stratigraphic Reliability'.


Spans are a conceptual object that need to be used when a simple 'date' is not enough. There can be many different type of 'spans', these include Temperature, Percentage, Date Ranges, etc. They can also be slightly more conceptual such as the idea of the temporal vector. Effectively this is a 'span' between two contexts. A span has an explicit direction (i.e. From and To) - therefore a temporal vector has a From (context 341) and a To (context 345). The nature of the span implies the nature of the relationship between the from and to. Therefore if context 341 is cut by context 345 then the span is a 'Cut By' and the implication is that 345 CUTS 341 as well as the explicit statement 341 IS CUT BY 345.

Spans therefore have a type, a beginning and an end. A spantype has certain properties and can also limit what can be inserted in the From/To fields of the span. For instance, if the spantype is 'Percentage' the values that can be inserted in the From/To fields of the particular span have to be between 0 and 100. If the spantype is 'Years BP' then the values have to be above 0.

This all works in the database in the following way:

cor_tbl_span - holds all of the information about the instance of the span itself. It has the same structure as other cor_tbls, i.e itemvalue itemkey, etc. It also has spantype (links to cor_lut_spantype) and beg and end (the fields for the values for From and To).

cor_lut_spantype - has the alias(es) and id for the span type. It also has two special fields one is called 'meta' - this is for an explanation of the type of span, outlining what it is conceptually and also what it should and shouldn't contain. There is also a 'evaluation' field - this should contain parameters for the evalaution of whatever is put into the beg and end fields of cor_tbl_span. This evalaution will be run before any update of that table to ensure that the inputted values are valid. It is currently soncieved that evaluation will consist of a self-contained PHP script. This gives us the freedom to evlaute things as widely ranging as 'This number must be greater than 0' to 'This context code does not exist'.

For the Messenger conversation log in which this was thought up

THIS IS AN ADDITION AND MAY NOT BE IMPLEMENTED - Spans can now also be used to link together fragments of data or indeed data descriptors. This means that they can be used to build complicated hierarchies of pottery types for instance by linking parent and child types. so to create amphora->type45->type45a you would create 2 spans - one linking amphora to type45 and one linking type 45 to type 45a. This means that you can then draw hierarchical trees of data-types which can be enforced (if necessary) in the front-end but can also be searched so that if you search for an amphora your results include all of the type45as even tho they are not directly linked to amphora (just implicit link through type45).


The Ark system deals with the concept of an event abstractly. As an event can be considered to be a mixture of a date (either a date range or an absolute date) and an actor (theoretically this can be a non-human actor). Ark deals with this in the following way:

Let's take the example of issuing context number 341. We have a table that contains datetypes such as 'Issued on' for contexts. We also have a a table of action types such as 'Issued to'. Therefore the whole context number issue event is actually already stored in the database - its just not all in one table. It can however be joined by a simple SQL statement. Therefore to list all of the properties of the Context Issue Event we just have to:

 SELECT, cor_tbl_people.initials 
 FROM cor_tbl_date, cor_tbl_action, cor_tbl_people
 WHERE cor_tbl_date.itemkey = 'cxt_cd'
 AND cor_tbl_date.itemvalue = 'FBN05_341' (i.e. the desired context number)
 AND cor_tbl_date.datetype = 1 (i.e. 'issued on')
 AND cor_tbl_action.itemkey = 'cxt_cd'
 AND cor_tbl_action.itemvalue = 'FBN05_341'
 AND cor_tbl_action.actiontype = '1' (i.e. issued to')
 AND cor_tbl_action.people = (to make the link to get the actor's initials)

Spatial Data

!NB - as of v1.0 this is now deprecated, as all spatial data storage has been separated from the core data structure of ARK.

In order to dynamically build up the mapfile we include a number of different tables within the database structure (see conversation). These are the basic settings for a unlimited number of maps (i.e. sizes/outputs/etc) which is held in cor_tbl_mapfile. This includes the sections of the mapfile that are not related to any layer specific stuff, just the desired projection, size, and some path information.

As each GIS layer is modular (and we don't want the mapping to break if a module isn;t loaded) the actual layer information is organised per module. Therefore each module has a mod_tbl_maplayers. This table contains all of the different elements that go into to making the layer declaration within the mapfile (i.e. colour, outlinecolour, etc.). Also it states where the data is and what type of data it is. This is very important as different types of data have different connection strings. This links to cor_lut_maplayertype - which is a lut of all of the different types - so that the php builder script knows how to contrcut the connection.

In addition to this there is a mod_tbl_mapexpressions which contains layername, and the expressions need to query out colour and label information - should you want to colour different types of data different colours.

Finally this is done per the map id in the cor_tbl_mapfile - as for different types of map you may want different colours, or available layers, etc.

The coding notes for this are in Mapserver

How to retrieve two records for the same table and present them in one result set

Here's the SQL - this gives us the x and y coordinates of (a fasti site) and is a good example of using a nested query to get two items of the same class related to the same itemkey/itemval:

SELECT q1.fst_cd,q1.number as easting,q2.number as northing FROM 
 (SELECT a.fst_cd,b.number,b.numbertype FROM fst_tbl_fst AS a, cor_tbl_number AS b 
  WHERE a.fst_cd=b.itemvalue AND b.numbertype = 3) AS q1, 
 (SELECT a.fst_cd, b.number,b.numbertype FROM fst_tbl_fst AS a, cor_tbl_number AS b 
  WHERE a.fst_cd=b.itemvalue AND b.numbertype = 4) AS q2 
WHERE q1.fst_cd = q2.fst_cd