ARK2/Database

From ARK
Revision as of 19:20, 17 November 2016 by John Layt (talk | contribs) (Chains)

Jump to: navigation, search

DataBase

Database Abstraction

Currently, PDO is used to directly access only MySQL databases, and DB access statements are widely spread through the code base and manually coded. While PDO abstracts the connection, it doesn't abstract the SQL dialect so adding support for other databases such as Postgres or SQLite would require considerable work. It also makes migration to proper transaction support and performance improvements difficult, and is a security risk due to programmer error. A Database Abstraction Layer (DAL) can abstract away the differences in SQL between database systems, and also provide Query Builders, Schema Management, and Migration tools to address the other issues. Most are built on PDO and can seamlessly integrate with legacy code to make for an easier migration path.

Longer term, full OO code, most frameworks, and many components use an ORM to map relational data to objects. A key part of choosing a framework or component eco-system is the ORM it uses. Most ORMs however use the Active Record pattern which cannot map onto the existing ARK data model. ARK would require a Data Mapper ORM to access the legacy database structure. While using multiple ORMs would be possible, it is not recommended due to performance overhead and potential contention.

Doctrine ORM is the only PHP Data Mapper available, and is built on the Doctrine DBAL DAL. Doctrine is widely use and under active development, being the main ORM for the Symfony eco-system as well as many independent components. DBAL also provides the full set of required Drivers, Query Builder, Schema Management and Migration tools to abstract access to the required databases.

Automated database creation and upgrade will be implemented using Doctrine Migrations:

  • The core database schema will be defined in doctrine-xml (possibly initially reversed engineered from existing v1.x database). This schema will be stored in the build tools folder and will not be deployed.
  • A custom Doctrine\DBAL\Migrations\Provider\SchemaProvider class will use the doctrine-xml schema to generate the full schema for new databases.
  • A custom Doctrine\DBAL\Migrations\Provider\SchemaProvider class will use the doctrine-xml schema to generate schema diff to automatically create the Doctrine Migration class required for each version upgrade.
  • The admin console will provide the standard migration tools (diff perhaps only from build console?)
  • The ARK install script will call the doctrine tools to create the database
  • If outstanding migrations are found after an upgrade, then the site will go into maintenance mode

The creation of tables required for custom modules will be provided via API and not via the Migrations. This will be part of the data schema creation code and not the core database code.

Multiple database connections within an ARK will be supported, with the database to be used by any data model passed in using dependency injection. This will improve the code used for import, export and migration. It will also allow for separate databases for the admin (user, config, etc) and the data which may assist with AaaS and Multi-tenancy.

Database security will be tightly controlled to prevent security breaches in the AaaS model. There will be four levels of database account:

  • Database Admin - Full admin account, credentials never stored on webserver, asked for by install script
  • AaaS Admin - Limited admin account for AaaS client admin, only allowed to create new database and add required users, credentials never stored on webserver, passed in to install script by portal script
  • ARK Admin - Limited admin account for ARK instance admin (create module tables only?), created by install script, credentials never stored on server, requested from user when needed (create modules, etc)
  • ARK User - Limited user account for read/write data access only, created by install script, credentials stored on webserver

ID Generation

A number of possible strategies exist for the allocation of Item numbers, which must be performed in a way abstracted from the database server used, and allowing for the use of offline mobile devices.


Chains

Chains are a technique in ARK for storing hierarchical tree data in relation form. This is done using an adjacency table method. This is a problem in ARK2 for a number of reasons:

  • Knowledge of when data is stored in chains is held solely in the subform code that creates or reads the chain, which causes issues when the schema will need to represent the data structure without the subform
  • Chains are an internal implementation detail, their existence should not 'leak' into the schema or api for external clients, they must be free to choose their own storage solution
  • Access to chains can be slow and inefficient, especially walking down a tree when you don't know how 'wide' it is (i.e. what data fragments it has as descendants).

The data schema will not represent data as chains, trees or graphs. Instead the schema will merely represent the inherent hierarchical structure of the data using groups/objects/lists. The data persistence layer will know to persist those repeating groups in its model. In the case of the current ARK SQL database this means repeating groups will need to be stored as chains.

A number of other techniques exist to make reading/writing of trees faster, such as Nested Sets. The Closure Table technique has been chosen for a number of reasons:

  • Is a simple extension to the existing Adjacency Table method
  • Is fast to both read and write
  • An entire tree can be read with a single SQL query
  • Supports storing DAGs, i.e. Harris Matrix

The proposed implementation will be:

  • Chains may be renamed as Graphs?
  • Nodes and Edges are stored separately
  • The Nodes in a Graph are either Items or Fragments, but not both
  • The Nodes continue to be stored in their current tables
  • Edges are stored in a new ark_data_graph table
  • Item Graphs store their itemkey/itemvalue Edges in the ark_data_graph table
  • Fragment Graphs store their table/id Edges in the ark_data_graph table
  • Descendent Fragment Nodes will no longer be keyed by their direct parent table/id, but will instead be keyed by their root itemkey/itemvalue. This allows faster access to all nodes, and means the nodes never need to be updated whenever the graph is altered.
  • Root Fragment Nodes may need to carry a flag to indicate they are a root, otherwise a lookup is required every time on the graph table.

New code will be needed to create and maintain graphs in place of the current chain code. Migration code will be required to build the graph for existing data and update the nodes.

References: