Difference between revisions of "ARK2"

The primary aims of ARK2 are:
* Full code re-write to modern standards using modern components
* Separate the ARK Database backend from the ARK Web frontend
* Implement a modern RESTful API to allow other frontends and apps to access and update the ARK Database
* Simplify the setup and configuration of ARK by moving the config into the database and providing online config tools
* Improve the overall performance and data integrity of ARK
* Make it possible to provide an ARK hosting service
 
=== Features ===

Modern frontend
* HTML5
* Bootstrap 3 based
* Twig templates
* Config driven page views

Modern backend
* Full RESTful API to access and update all ARK data
* Database abstraction and Object Relational Mapping
* Config driven data schema
* Controlled Vocabularies and Linked Open Data
* Field-level data access control
* Data Workflows in conjunction with User Authorisation control

Front Controller model
* URL paths independent from source code paths for greater security and flexibility
* Most pages generated using common page layout code from config and data stored in database
* Page roles allow for switching of generated page based on user role, module, etc
* Local custom pages separated from core source and configurable by page role

User Authentication
* Token based
* Internal Authentication via password
* External Authentication via OAuth2 providers (Facebook, Google, etc)

User Authorisation
* Role Based Access Control (RBAC) using hierarchical Roles and Permissions structure
 
 
 
== Design ==
 
 
 
High level design decisions for ARK2.
 
 
 
=== Technical Standards ===
 
 
 
ARK will only actively support platforms that are actively supported by their maintainers. ARK may work on earlier versions but this is not guaranteed.
 
 
 
* HTML5 will be used
 
* Browser support restricted to those supported by Bootstrap 3
 
* PHP: v5.6 is the minimum supported version (5.6 is in security support, 7.0 in active support, see http://php.net/supported-versions.php); v7 will also be supported.
 
* MySQL/MariaDB v5.5 (lowest supported MySQL)
 
* PostgreSQL and SQLite support will be provided via a database abstraction layer, but will not initially be officially supported
 
* mod_rewrite will be required
 
* All files will be UTF-8 encoded with UNIX (LF) line endings
 
 
 
=== Development Standards ===
 
 
 
The [http://www.php-fig.org/psr/ PHP-FIG standards]  will be used:
 
* PSR-1 and PSR-2 Coding Standards
 
* PSR-3 Logging Interface for interchangeable logging objects
 
* PSR-4 Auto-Loading Standard
 
* PSR-7 HTTP Message Interface for interchangeable Request/Response objects
 
 
 
PSR-3 and PSR-7 allow mixing and matching of component libraries from different vendors, and support future-proofing by allowing switching between libraries with minimal code changes.
 
 
 
PSR-4 will be used for packaging, namespacing and auto-loading of OO code. A good series of articles explaining PSR-4, and modern development and packaging in general, can be found at the following:
 
* http://culttt.com/2013/01/07/what-is-php-composer/
 
* http://culttt.com/2014/03/12/build-php-package/
 
* http://culttt.com/2014/05/07/create-psr-4-php-package/
 
 
 
In consequence:
 
* [https://getcomposer.org/ Composer] will be required for dependency management and PSR-4 auto-loading
 
* All new external libraries will be installed by Composer under vendor/ and not libs/
 
* All new OO classes will be namespaced under LPArchaeology\ARK\
 
* All new OO code will be under src/ and not php/ (this will also clearly separate new code from old)
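
The namespace and folder rules above imply a simple mapping from class name to file. The following is a minimal sketch of that PSR-4 style lookup, written in Python purely for illustration; in practice Composer's autoloader does this work. The class name used in the example is hypothetical.

```python
from pathlib import PurePosixPath

# Assumed mapping, taken from the rules above: the namespace prefix
# LPArchaeology\ARK\ is rooted at the src/ folder.
PREFIX = "LPArchaeology\\ARK\\"
BASE_DIR = "src"


def psr4_path(class_name: str) -> str:
    """Return the file a PSR-4 autoloader would load for a class name."""
    name = class_name.lstrip("\\")
    if not name.startswith(PREFIX):
        raise ValueError(f"{class_name} is outside the {PREFIX} namespace")
    # Each namespace separator after the prefix becomes a folder separator.
    relative = name[len(PREFIX):].replace("\\", "/")
    return str(PurePosixPath(BASE_DIR) / f"{relative}.php")


# Hypothetical class name, for illustration only.
example = psr4_path("LPArchaeology\\ARK\\Database\\Connection")
print(example)
```

Under these assumptions a class in a sub-namespace resolves to a file in the matching sub-folder of src/, which is what keeps new code clearly separated from the legacy php/ folder.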
 
 
 
Components will be carefully chosen to be well supported, stable, and interchangeable wherever possible.
 
 
 
=== Database Abstraction ===
 
 
 
Currently, PDO is used to directly access only MySQL databases, and DB access statements are widely spread through the code base and manually coded. While PDO abstracts the connection, it doesn't abstract the SQL dialect so adding support for other databases such as Postgres or SQLite would require considerable work. It also makes migration to proper transaction support and performance improvements difficult, and is a security risk due to programmer error. A Database Abstraction Layer (DAL) can abstract away the differences in SQL between database systems, and also provide Query Builders, Schema Management, and Migration tools to address the other issues. Most are built on PDO and can seamlessly integrate with legacy code to make for an easier migration path.
 
 
 
Longer term, full OO code, most frameworks, and many components use an ORM to map relational data to objects. A key part of choosing a framework or component eco-system is the ORM it uses. Most ORMs however use the Active Record pattern which cannot map onto the existing ARK data model. ARK would require a Data Mapper ORM to access the legacy database structure. While using multiple ORMs would be possible, it is not recommended due to performance overhead and potential contention.
 
 
 
[http://www.doctrine-project.org/ Doctrine ORM] is the only PHP Data Mapper available, and is built on the [http://doctrine-orm.readthedocs.io/projects/doctrine-dbal/en/latest/ Doctrine DBAL] DAL. Doctrine is widely used and under active development, being the main ORM for the Symfony eco-system as well as many independent components. DBAL also provides the full set of required Drivers, Query Builder, Schema Management and Migration tools to abstract access to the required databases.
 
 
 
Automated database creation and upgrade will be implemented using Doctrine Migrations:
 
* The core database schema will be defined in [http://andrewembler.com/2015/05/describe-parse-and-utilize-database-schemas-doctrine-xml/ doctrine-xml] (possibly initially reverse engineered from an existing v1.x database). This schema will be stored in the build tools folder and will not be deployed.
 
* A custom Doctrine\DBAL\Migrations\Provider\SchemaProvider class will use the doctrine-xml schema to generate the full schema for new databases.
 
* A custom Doctrine\DBAL\Migrations\Provider\SchemaProvider class will use the doctrine-xml schema to generate a schema diff, which automatically creates the Doctrine Migration class required for each version upgrade.
 
* The admin console will provide the standard migration tools (diff perhaps only from build console?)
 
* The ARK install script will call the doctrine tools to create the database
 
* If outstanding migrations are found after an upgrade, then the site will go into maintenance mode
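
The last rule above can be sketched as a simple comparison between the migrations shipped with the code and those already recorded as applied. This is an illustrative Python sketch only, not the Doctrine Migrations API; the version identifiers are hypothetical and are assumed to sort chronologically.

```python
def outstanding_migrations(available, applied):
    """Return versions shipped with the code but not yet applied, oldest first."""
    done = set(applied)
    return [version for version in sorted(available) if version not in done]


def site_mode(available, applied):
    """The site only serves requests normally when nothing is outstanding."""
    return "maintenance" if outstanding_migrations(available, applied) else "live"


# Hypothetical version identifiers, for illustration only.
shipped = ["20160301", "20160101", "20160201"]
print(site_mode(shipped, applied=["20160101"]))
print(site_mode(shipped, applied=shipped))
```

The same check could back both the admin console and the upgrade tools, so that a code update without its data update never serves live requests.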
 
 
 
The creation of tables required for custom modules will be provided via API and not via the Migrations. This will be part of the data schema creation code and not the core database code.
 
 
 
Multiple database connections within an ARK will be supported, with the database to be used by any data model passed in using dependency injection. This will improve the code used for import, export and migration. It will also allow for separate databases for the admin (user, config, etc) and the data which may assist with AaaS and Multi-tenancy.
 
 
 
Database security will be tightly controlled to prevent security breaches in the AaaS model. There will be four levels of database account:
 
* Database Admin - Full admin account, credentials never stored on webserver, asked for by install script
 
* AaaS Admin - Limited admin account for AaaS client admin, only allowed to create new database and add required users, credentials never stored on webserver, passed in to install script by portal script
 
* ARK Admin - Limited admin account for ARK instance admin (create module tables only?), created by install script, credentials never stored on server, requested from user when needed (create modules, etc)
 
* ARK User - Limited user account for read/write data access only, created by install script, credentials stored on webserver
 
 
 
=== Multi-Tenancy / Multi-Site / Multi-Config ===
 
 
 
A number of architectural issues surround Multi-Tenancy, Multi-Site and Multi-Config in an ARK instance. These primarily affect how a hosted ARK service will be run, but also how a standalone organisation will manage their ARK instances.
 
 
 
* An ARK instance is here defined as a combination of ARK users and the ARK site data they are able to access, usually under a single project/brand/organisation.
 
* A database is defined as a combination of a database user and the tables it can access, not the database server instance which can hold multiple databases.
 
* Multi-tenancy is the ability to have multiple ARK instances in a single ARK install.
 
* Multi-site is the ability to have multiple sites within an ARK instance.
 
* Multi-config is the ability to have multiple ARK schemas within an ARK instance, i.e. different sites having a different config.
 
 
 
Choosing an architecture involves a series of trade-offs around ease-of-development versus ease-of-maintenance. The simplest solution is the current structure, where an ARK instance has a single tenant with a single config across multiple sites. There are problems with this however:
 
* Each instance requires a separate code install, database and URL
 
* If a single organisation wants multiple ARK schemas (say trench-based rural and a full urban SCR) they must run separate ARK instances for each schema, meaning users must remember which instance has which sites and maintain separate user IDs, and the apps using the API must know this as well.
 
* Making significant upgrades to an organisation's config requires a separate ARK install
 
* Scaling up to hundreds of instances creates hundreds of installs and hundreds of databases, which will make support difficult and expensive even with automation
 
 
 
At the opposite extreme is an architecture where a single ARK install supports multiple tenants, sites and configs in a single database. While this solves the above issues by greatly simplifying maintenance there are a number of issues here too:
 
* Code and SQL are significantly more complicated, joins especially become difficult
 
* Key bloat on all tables, as fields are required for tenant and site, which may affect performance
 
* Table bloat with all data being in a single set of tables which may affect performance
 
* Back up and archive is an issue as the data for different tenants needs to be separated, probably requiring custom code instead of standard tools
 
* Security is an issue with data access control now occurring in the app code
 
* A single tenant can overload the server and take all tenants down
 
* Distributing load across servers becomes difficult if not impossible
 
* Upgrading an install means all site configs must be upgraded too, you cannot leave a site on an old version
 
* Existing code and data would make ARK1 migration far more complex
 
 
 
A half-way house model would be to allow a single install to have multiple tenants, but each tenant has its own database instance:
 
* A simple key structure is kept, keeping the code simple
 
* Each tenant's data is kept separate, solving the size, security and backup issues
 
* Load can be easily distributed by moving a tenant to another server, simply by moving their database and/or redirecting their URL
 
* Code maintenance is kept simple, but database management becomes more complex again
 
* Upgrading an install will still require upgrading all sites
 
 
 
Note: A practical limitation is imposed by MySQL and SQLite support which only allow a single 'namespace' per database, unlike PostgreSQL and others which support multiple 'namespaces' which would allow each tenant to have separate sets of tables within the same database.
 
 
 
The strongest case can be made for supporting Multi-Config, primarily as a means of allowing larger clients to host all their data inside a single install with a single set of users (including LP ourselves). This has several implications however:
 
* It raises Site Code from an attribute of an item in a module, to being a key at a higher level than the modules themselves, i.e. the modules available will change depending on the Site Code
 
* As a consequence it substantially changes the API to add the site code above the module
 
* It may make searching across site codes difficult
 
* ???
 
 
 
The full combination would allow a hosted ARK solution as follows:
 
* Lowest price tier (£5) / mass market / community dig type sites are hosted in a single multi-tenant install, only allowed a single site/config, may not allow own domain?
 
* Upgrade to lowest tier (£10) still in single multi-tenant install, but allowed say 5 sites/configs, maybe allow own domain?
 
* Next tier(s) (£15/£20/£25?) gives separate install, probably in own virtual host, own domain, with unlimited sites/configs?
 
* Possible top-tier for large-scale sites with guaranteed support contract
 
 
 
This would keep the maintenance burden on the lowest-profit sites to a minimum, while encouraging up-sells as and when needed.
 
 
 
Install management could be simplified by developing a set of built-in tools.
 
* Installs using git, run git pull to upgrade
 
* Doctrine migrations enable auto data updates
 
* Auto-check function for new releases and notify admin
 
* Admin panel to put site into maintenance mode, run code update, run data update
 
 
 
Splitting database roles may assist in this:
 
* User database - allows Multi-tenant to choose if shared users for all/some tenants, or any tenant to have own users
 
* Config database - The ARK configuration, schemas, forms, etc, allows multi-tenant to share all configs with all/some tenants, or any tenant to have own set
 
* Data database - The ARK data, each tenant will have their own database
 
 
 
The framework will manage three separate database connection variables, but where the database roles are shared by a database instance then the connection objects will be the same.
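
A minimal sketch of this sharing rule follows, in Python for illustration only. The registry, settings fields and database names are all hypothetical; the point shown is that two roles configured with the same settings receive the same connection object, while a tenant's data role receives its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionSettings:
    """Hypothetical settings identifying one database."""
    host: str
    database: str
    user: str


class Connection:
    """Stand-in for a real database connection object."""

    def __init__(self, settings: ConnectionSettings):
        self.settings = settings


class ConnectionRegistry:
    """Hands out exactly one Connection per distinct settings value."""

    def __init__(self):
        self._connections = {}

    def connect(self, settings: ConnectionSettings) -> Connection:
        if settings not in self._connections:
            self._connections[settings] = Connection(settings)
        return self._connections[settings]


registry = ConnectionRegistry()
# Assumed layout: user and config roles share a database, data has its own.
admin_settings = ConnectionSettings("localhost", "ark_admin", "ark_user")
data_settings = ConnectionSettings("localhost", "ark_data_tenant", "ark_user")

user_db = registry.connect(admin_settings)
config_db = registry.connect(admin_settings)
data_db = registry.connect(data_settings)
print(user_db is config_db, user_db is data_db)
```

Code that receives the three connections by dependency injection does not need to know whether they are shared, which keeps the single-database and split-database deployments on the same code path.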
 
 
 
=== Framework ===
 
 
 
It is proposed to implement a new RESTful Request/Route/Response skeleton using a Front Controller model and token-based security, based on an external micro-framework and components adhering to the PSR standards and managed via Composer. This will reduce the amount of code maintained internally, update the code-base to modern web-app design principles, and provide a degree of future-proofing by allowing switching of components.
 
 
 
Choosing a full framework such as Symfony or Zend at this point would force refactoring all of the model and view code at the same time, but by initially building our own light-weight controller framework using PSR-compliant components we can migrate the model and view later. Once all parts are migrated, a full framework could be considered if required. A full framework would also impose a heavy overhead and steeper learning curve, albeit with less code required to be written.
 
 
 
The ARK root folder will contain only the index.php file which will act as a dispatcher, receiving all Requests, matching the Route and dispatching them to the correct Controller. Each ARK page type and the API will have a Controller to read the model and construct the view before returning the Response. This will allow future flexibility for new request formats while still being able to support persistent legacy links. It will also allow for database config and user auth driven routing, e.g. one install may only expose the RESTful API, while others may only expose read-only pages.
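
The dispatcher described above can be sketched as a single entry point that matches a Route and calls its Controller. This Python sketch is illustrative only and is not the API of any of the candidate micro-frameworks; the route patterns and controller names are assumptions.

```python
import re

ROUTES = []


def route(method, pattern):
    """Register a controller for an HTTP method and a URL pattern."""
    def register(controller):
        ROUTES.append((method, re.compile(f"^{pattern}$"), controller))
        return controller
    return register


@route("GET", r"/api/v2/sites/(?P<site>[^/]+)/(?P<module>[^/]+)/(?P<item>[^/]+)")
def api_item(site, module, item):
    # A real controller would read the model and build the view here.
    return 200, f"item {item} of {module} in {site}"


@route("GET", r"/sites/(?P<site>[^/]+)")
def site_page(site):
    return 200, f"page for site {site}"


def dispatch(method, path):
    """Single entry point: match the Route, then call its Controller."""
    for route_method, pattern, controller in ROUTES:
        match = pattern.match(path)
        if match and route_method == method:
            return controller(**match.groupdict())
    # Simplification: unknown paths and unsupported methods both give 404.
    return 404, "not found"


print(dispatch("GET", "/api/v2/sites/MNO12/contexts/1"))
```

Because the route table is plain data, an install could in principle build it from database config and user roles, registering only the API routes or only the read-only page routes.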
 
 
 
A number of criteria will be applied in selection:
 
* Must be standards compliant
 
* Must be well supported with a solid development history
 
* Must be well documented
 
* Must be widely used and supported
 
* Must have a strong community, small one-person efforts will only be considered if they are the de-facto standard
 
* Any database access must use Doctrine DBAL or ORM
 
 
 
Options for micro-frameworks or component eco-systems include:
 
* [http://www.slimframework.com/ Slim] - PSR7 based with minimal features, requires integrating more external components, but more flexible and future-proof
 
* [http://silex.sensiolabs.org/ Silex] - HTTPFoundation based and built on [http://symfony.com/ Symfony] components, far less work to start with but less flexible as a result
 
* [https://zendframework.github.io/zend-expressive/ Zend Expressive] or components joined by [https://zendframework.github.io/zend-stratigility/ Zend Stratigility] - PSR7 based, falls between other two in terms of effort, but limited in choice of components currently integrated
 
 
 
Frameworks or User Management skeletons considered but rejected include:
 
* Zend [https://apigility.org/ APIgility] - Automated API generation built on Zend2
 
* [https://lumen.laravel.com/ Lumen] based on [https://www.laravel.com/ Laravel] components - requires an Active Record ORM
 
* [http://usercake.com/ UserCake] - Very basic user management skeleton, no repo, not worth looking at
 
* [http://www.userapplepie.com/ User Apple Pie] - a UserCake fork using own Nova Framework, probably support issues
 
* [http://www.userfrosting.com/ UserFrosting] - a UserCake fork with RBAC user management, using Slim2, SBAdmin2, use for ideas
 
 
 
Significant and reliable sources of components include:
 
* [https://packagist.org/ Packagist], the Composer index
 
* [http://symfony.com/components Symfony]
 
* [http://thephpleague.com/#packages PHP League]
 
* [https://zendframework.github.io/ Zend]
 
* [http://docs.sylius.org/en/latest/components/index.html Sylius], components from an e-commerce platform based on Symfony
 
* [http://auraphp.com/ Aura]
 
 
 
=== Security ===
 
 
 
ARK currently uses PEAR LiveUser for user authentication and authorisation, but this hasn't been updated since 2010. It is a security risk, and also lacks many features like federated login. The ARK API currently uses plain text user and password in the request URL which is insecure. ARK2 will require a new security solution, especially for the API calls from client apps.
 
 
 
Requirements
 
* User Authentication
 
** Token-based
 
** Local user database for stand-alone/internal use
 
** Via OAuth and OpenID authentication services (Google, Facebook, etc)
 
* User Authorisation
 
** Role-Based Access Control (RBAC) model based on Users/Roles/Permissions
 
* API authentication via token and secure login
 
** HTTPS will be required
 
** Use Let's Encrypt to obtain SSL certificates
 
* Anonymous/Unauthenticated User access as optional Role for both Web and API
 
* A migration path from LiveUser must be provided.
 
 
 
Any solution chosen will work best when integrated with the other framework components chosen and should be implemented in parallel as it is highly dependent on the Request/Response/Routing/Session components used.
 
 
 
The Symfony Framework provides a very powerful Security component, but not a simple all-in-one solution meeting our requirements. Combining a number of external components may be able to meet our requirements, at the cost of more custom code required.
 
* Use Symfony\Security\Guard to manage the Authentication process
 
* Use League\OAuth2-Client or Opauth or HWIOAuthBundle for external OAuth2 authentication
 
* Use League\OAuth2-Server or FOSOAuthServerBundle for OAuth2 server for API
 
* Use Sylius\RBAC or FOSUserBundle for User/Role management
 
 
 
The combination of HWIOAuthBundle / FOSOAuthServerBundle / FOSUserBundle is widely supported and more 'native' to Symfony, but requires the use of the full framework, bundles, Doctrine ORM, and YAML-based config. The alternatives are built as stand-alone interoperable PSR components and will provide greater future flexibility and a gentler migration path, but will require more work to integrate.
 
 
 
Alternatives such as Sentinel, which provides all the required features in a single integrated component, would require choosing a different component ecosystem, such as Laravel.
 
 
 
Possible packages:
 
* [https://cartalyst.com/manual/sentinel/2.0 Sentinel] - Full combined package, but requires Laravel ORM, extensions like OAuth are for-pay
 
* Sylius [http://docs.sylius.org/en/latest/components/Rbac/index.html RBAC]
 
* PHP League [http://oauth2-client.thephpleague.com/ OAuth2 Client] and [http://oauth2.thephpleague.com/ OAuth2 Server] (PSR7 based)
 
* https://github.com/knpuniversity/oauth2-client-bundle uses PHP League OAuth2 Client, requires framework but fork?
 
* https://github.com/gigablah/silex-oauth uses other OAuth library but crib code?
 
 
 
OAuth2 Servers:
 
* PHP League [http://oauth2.thephpleague.com/ OAuth2 Server] - PSR7 based
 
* Friends of Symfony [https://github.com/FriendsOfSymfony/oauth2-php OAuth2 Server] - HTTPFoundation based
 
* [https://github.com/bshaffer/oauth2-server-php bshaffer OAuth2 server] - Has HTTPFoundation wrapper, example Silex app
 
 
 
Further reading:
 
* http://benedictroeser.de/2015/12/using-symfony-guard-component-without-the-whole-framework/
 
* https://github.com/chrootLogin/silex-userprovider/tree/nextgen
 
* http://www.jasongrimes.org/2014/09/simple-user-management-in-silex/
 
* http://loige.co/symfony-security-authentication-made-simple
 
 
 
==== Solution ====
 
 
 
Chosen components:
 
 
 
* User Manager: https://github.com/chrootLogin/silex-userprovider ported to Silex2 and upgraded
 
* RBAC: [http://docs.sylius.org/en/latest/components/Rbac/index.html Sylius RBAC] with custom Voter
 
* OAuth2 Client: TBC
 
* OAuth2 Server: TBC
 
 
 
User Manager modifications needed:
 
* Silex2 port
 
* Admin settings
 
* Add user screen
 
* Invite emails
 
* Guard?
 
* Console add user optional role
 
* Console enable/disable user
 
 
 
RBAC:
 
* Service provider
 
* Voter
 
* Forms
 
* User Manager override classes
 
 
 
OAuth:
 
* ???
 
 
 
Defined Roles:
 
* System Admin - Admin rights for system install, i.e. config, etc. Not inherited.
 
* ARK Admin - Admin rights for ARK instance, i.e. users, etc. Inherits Supervisor, User.
 
* ARK Supervisor - Site supervisor rights, i.e. checking, mod changes, etc.
 
* ARK User - General user rights, i.e. data entry.
 
* Anon User
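
The hierarchical role structure can be sketched as follows, in Python for illustration only and not using the Sylius RBAC API. The inheritance shown is only what the role list above states (ARK Admin inherits Supervisor and User, System Admin is not inherited); the role keys and permission names are hypothetical.

```python
# Inheritance as stated in the role list above; anything not stated is left out.
INHERITS = {
    "sysadmin": [],
    "admin": ["supervisor", "user"],
    "supervisor": [],
    "user": [],
    "anon": [],
}

# Hypothetical permission names, for illustration only.
PERMISSIONS = {
    "sysadmin": {"system.config"},
    "admin": {"users.manage"},
    "supervisor": {"items.check"},
    "user": {"items.edit"},
    "anon": {"items.view"},
}


def effective_permissions(role, seen=None):
    """Collect a role's own permissions plus those of every role it inherits."""
    seen = set() if seen is None else seen
    if role in seen:  # guard against accidental inheritance cycles
        return set()
    seen.add(role)
    granted = set(PERMISSIONS.get(role, set()))
    for parent in INHERITS.get(role, []):
        granted |= effective_permissions(parent, seen)
    return granted


def is_granted(role, permission):
    """Return True when the role holds the permission directly or by inheritance."""
    return permission in effective_permissions(role)


print(sorted(effective_permissions("admin")))
```

A custom Voter would make essentially this check per request, and keeping System Admin outside the hierarchy means install-level rights never leak into an ARK instance role.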
 
 
 
=== REST / HATEOAS / Hypermedia ===
 
 
 
An evolution of the ARK data model and API to try to realise the [http://ark.lparchaeology.com/hypertext/ full ARK vision] will be based around the Hypermedia concepts of [https://en.wikipedia.org/wiki/Representational_state_transfer REST] and [https://en.wikipedia.org/wiki/HATEOAS HATEOAS] as developed by Roy Fielding in his doctoral thesis in 2000. These concepts include resources, relationships, state, and discoverability, and are closely related to the Semantic Web. ARK modules will evolve to represent Resources that can be linked together in the current flat relationship structure, or organised into configurable hierarchies such as the default Site/Module/Item mostly used by ARK instances. These concepts will be most easily exposed through a RESTful API.
 
 
 
A RESTful API will be implemented using best practices which are outlined in the following articles:
 
* http://www.vinaysahni.com/best-practices-for-a-pragmatic-restful-api
 
* http://blogs.mulesoft.com/dev/api-best-practices-series-intro/
 
 
 
In particular, the following rules will be applied:
 
* Full [http://timelessrepo.com/haters-gonna-hateoas HATEOAS REST implementation], i.e. Level 3 (hypermedia controls), the fourth and highest level of the Richardson Maturity Model
 
* JSON will be the only format supported
 
* The [http://jsonapi.org/ JSON API] standard will be used to construct the response body, with [http://json-schema.org/ JSON Schema] standard defining the structure of the data payload.
 
* API versioning will be used to version the resource path structure, error messaging, and other API infrastructure. The actual data formats will be controlled by the JSON schema which will be available via a standard end-point.
 
* Authenticated access will only be available using HTTPS, API tokens, and OAuth2
 
* Read-only unauthenticated unencrypted access will be supported only if explicitly enabled
 
* Translation keys will be used, with the client downloading a translation catalogue for their required language
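
As a rough illustration of the JSON API and HATEOAS rules above, the following Python sketch builds a response body for a single item, with links that let a client discover related resources. The attribute names, the base URL and the link layout are assumptions, not a settled ARK format.

```python
import json


def item_document(base_url, site, module, item, attributes):
    """Build a JSON API style document for one item, including hypermedia links."""
    site_url = f"{base_url}/sites/{site}"
    item_url = f"{site_url}/{module}/{item}"
    return {
        "data": {
            "type": module,
            "id": str(item),
            "attributes": attributes,
            # The relationship link lets a client discover the parent site.
            "relationships": {"site": {"links": {"related": site_url}}},
            "links": {"self": item_url},
        },
        "links": {"self": item_url},
    }


# Hypothetical attribute, for illustration only.
document = item_document(
    "https://www.lparchaeology.com/ark/api/v2",
    "MNO12", "contexts", 1, {"description": "Fill of pit"},
)
print(json.dumps(document, indent=2))
```

The data payload under attributes is the part that a JSON Schema would describe, while the surrounding envelope is fixed by the JSON API standard.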
 
 
 
The general API URL structure will be as follows:
 
* /api/v2/sites/<site>/<module>/<item>
 
 
 
e.g. for Context MNO12_1 the resource will be www.lparchaeology.com/ark/api/v2/sites/MNO12/contexts/1
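
The path structure above can be built and parsed mechanically. The following Python sketch is illustrative only; the helper names are hypothetical, and it assumes that a path may stop at the site or module level.

```python
API_ROOT = "/api/v2"


def resource_path(site, module=None, item=None):
    """Build the path for a site, a module within it, or a single item."""
    parts = [API_ROOT, "sites", site]
    if module is not None:
        parts.append(module)
        if item is not None:
            parts.append(str(item))
    return "/".join(parts)


def parse_resource_path(path):
    """Split a resource path back into its site, module and item parts."""
    prefix = f"{API_ROOT}/sites/"
    if not path.startswith(prefix):
        raise ValueError(f"not a site resource path: {path}")
    segments = path[len(prefix):].strip("/").split("/")
    if len(segments) > 3 or "" in segments:
        raise ValueError(f"malformed resource path: {path}")
    return dict(zip(("site", "module", "item"), segments))


print(resource_path("MNO12", "contexts", 1))
print(parse_resource_path("/api/v2/sites/MNO12/contexts/1"))
```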
 
 
 
A question remains over the structure for hosted ARK installs:
 
* www.arkhive.org/api/v2/arks/lamas/sites/ABC12/cxt/1 - Allows easy exposure of all instances for LD consumption
 
* lamas.arkhive.org/api/v2/sites/MNO12/cxt/1 - Easier for migration to stand-alone ARK, more 'private'
 
* www.arkhive.org/lamas/api/v2/sites/MNO12/cxt/1 - As for 2.
 
 
 
The following HTTP actions will be supported:
 
* GET - fetch resource
 
* POST - insert new resource with next available id, i.e. insert a new item with next item number
 
* PUT - insert or update resource with a specified id, i.e. insert a new item or update an existing item with a set item number
 
* PATCH - update part of a resource, i.e. update a single field or group
 
* DELETE - delete a resource
 
* OPTIONS - What HTTP verbs the current authenticated API user can perform on a resource
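
The difference between POST, PUT and PATCH described above can be shown with a small in-memory stand-in for one module's items. This Python sketch is illustrative only; in particular, treating "next available id" as the highest existing number plus one is an assumption.

```python
class ItemCollection:
    """In-memory stand-in for the items of one module on one site."""

    def __init__(self):
        self._items = {}

    def get(self, item_id):
        """GET: fetch a resource."""
        return dict(self._items[item_id])

    def post(self, fields):
        """POST: insert with the next available id and return that id."""
        item_id = max(self._items, default=0) + 1
        self._items[item_id] = dict(fields)
        return item_id

    def put(self, item_id, fields):
        """PUT: insert or fully replace the item; True means it was created."""
        created = item_id not in self._items
        self._items[item_id] = dict(fields)
        return created

    def patch(self, item_id, fields):
        """PATCH: update only the supplied fields of an existing item."""
        self._items[item_id].update(fields)

    def delete(self, item_id):
        """DELETE: remove the resource."""
        del self._items[item_id]


contexts = ItemCollection()
first = contexts.post({"type": "fill"})            # server picks the number
contexts.put(5, {"type": "cut", "notes": "pit"})   # client picks the number
contexts.patch(5, {"notes": "recut pit"})          # single field changed
print(first, contexts.get(5))
```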
 
 
 
The following other endpoints will be supported:
 
* /filters - The global saved filters
 
* /filters/123 - The filter definition
 
* /filters/123/items - The filter result set
 
* /users - The users
 
* /users/jlayt - The user details
 
* /users/jlayt/filters - returns the list of user filters, etc as per filters endpoint
 
* /actors - The global actors
 
* /actors/123 - The actor details
 
 
 
The following query values will be supported:
 
* ?field1=value1&field2=value2 - basic search in fields (use /filters for advanced search)
 
* ?sort=field1,field2 - sorts results by fields
 
* ?fields=field1,field2 - return selected fields
 
* ?page=3&per_page=100 - pagination of results
 
* ?q=text - free-text search
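
A sketch of how these query values might be separated on the server follows, in Python for illustration only. The default page and page size are assumptions, as is the rule that any parameter not reserved for sorting, field selection, pagination or free-text search is treated as a basic field search.

```python
from urllib.parse import parse_qsl

RESERVED = {"sort", "fields", "page", "per_page", "q"}


def parse_query(query_string):
    """Split a query string into field filters and the reserved controls."""
    params = dict(parse_qsl(query_string))

    def as_list(key):
        # Comma-separated values become a list; a missing key gives [].
        return [value for value in params.get(key, "").split(",") if value]

    return {
        "filters": {k: v for k, v in params.items() if k not in RESERVED},
        "sort": as_list("sort"),
        "fields": as_list("fields"),
        "page": int(params.get("page", 1)),           # assumed default
        "per_page": int(params.get("per_page", 100)),  # assumed default
        "q": params.get("q"),
    }


print(parse_query("field1=value1&sort=field1,field2&page=3&per_page=100"))
```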
 
 
 
Notes:
 
* Updating a resource will require some kind of timestamp or last update key to prevent overwriting subsequent changes
 
* All security / OPTIONS / anon access will be controlled by user roles
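
The first note describes what is usually called optimistic concurrency control. The following Python sketch is illustrative only; it uses a version counter as the "last update key", which is one possible choice rather than a decided design, and the class names are hypothetical.

```python
class StaleUpdateError(Exception):
    """Raised when the client's copy is older than the stored resource."""


class VersionedStore:
    """Keeps a version number beside each resource and checks it on update."""

    def __init__(self):
        self._rows = {}  # id -> (version, fields)

    def create(self, item_id, fields):
        self._rows[item_id] = (1, dict(fields))
        return 1

    def read(self, item_id):
        version, fields = self._rows[item_id]
        return version, dict(fields)

    def update(self, item_id, expected_version, fields):
        """Apply the change only if nobody else has updated the resource since."""
        current_version, current = self._rows[item_id]
        if expected_version != current_version:
            raise StaleUpdateError(
                f"expected version {expected_version}, found {current_version}"
            )
        self._rows[item_id] = (current_version + 1, {**current, **fields})
        return current_version + 1


store = VersionedStore()
store.create(1, {"notes": "pit"})
version, _ = store.read(1)
store.update(1, version, {"notes": "recut pit"})   # succeeds, version is now 2
try:
    store.update(1, version, {"notes": "ditch"})   # stale copy is rejected
except StaleUpdateError as error:
    print("rejected:", error)
```

Over HTTP the same key could travel in the resource body or in a header, with a rejected update reported to the client so that it can re-fetch and retry.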
 
 
 
JSON Schema Implementations:
 
* JSON Schema validation via PHP League (simple, complete, extendable) or Justin Rainbow (most widely used)
 
 
 
JSON-API Implementations:
 
* [http://fractal.thephpleague.com/ PHP League Fractal] library to serialize as JSON-API - nice, multi-format, but missing some needed features
 
* https://github.com/woohoolabs/yin - Very nice but PSR-7 and JSON Schema validation uses Rainbow, 5 contrib, 29 releases, last update 2 weeks
 
* https://github.com/tobscure/json-api - Good, clean, complete standard, agnostic, but no extras/schema/header handling. 17 contrib, 5 releases, last update 1 month, 2nd popularity rank
 
* https://github.com/neomerx/json-api - 7 contribs, 42 releases, last update 1 week
 
* https://github.com/nilportugues/php-json-api - 8 contribs, 63 releases, last update 3 weeks
 
* https://github.com/nilportugues/symfony-jsonapi - Symfony 2 bundle, last update 2 weeks
 
* https://github.com/mauro-moreno/silex-jsonapi - Silex provider for nilportugues, last update 2 months
 
* https://github.com/lode/jsonapi - Simple, 2 contrib, 10 releases, last update 3 weeks
 
* PHP Client: https://github.com/Art4/json-api-client
 
 
 
Not considered:
 
* https://github.com/matthiasbayer/json-api - 1 contrib, last update 1 year
 
* https://github.com/matthiasbayer/BayerJsonApiBundle - Symfony2 bundle, 1 contrib, last update 1 year
 
* https://github.com/rstgroup/json-api-php - 3 contrib, last update 2 years
 
* https://github.com/GoIntegro/hateoas - Deep Symfony integration, would be ideal but is driven by Doctrine ORM map and RAML
 
 
 
=== Frontend ===
 
 
 
The frontend will be migrated to [http://getbootstrap.com/ Bootstrap], [https://jquery.com/ jQuery], and [http://twig.sensiolabs.org/ Twig], the most popular and well-supported frontend UI component and template systems. This will allow for easier customisation of ARK's appearance by third parties.
 
 
 
Bootstrap 3 supports both [http://lesscss.org/ Less] and [http://sass-lang.com/ Sass] templates to generate the Bootstrap CSS. Customising the appearance of Bootstrap (such as colour) usually requires modifying template variables and rebuilding the CSS. Bootstrap 4 (currently in alpha) switches to only using Sass for its templates. We should therefore use the Sass version of Bootstrap 3 when building our own custom version of Bootstrap. Build tools will be provided to automate the customisation process.
 
 
 
The use of Twig templates for page layout will help separate the model and view code and allow third parties to easily modify the layout without having to alter the core code. Each Twig template will document the API contract it has with the data model, i.e. what variables are available to be used in the template.
 
 
 
The use of the Silex/Symfony Forms module will be considered. This provides dynamic form generation and validation with a Bootstrap theme.
 
 
 
There will be separation between the ARK Admin frontend and the ARK Web frontend. The required ARK Admin frontend will be static and consistent across all ARKs, but can be modified for site specific requirements if needed (i.e. adding extra user data fields). The optional ARK Web frontend will be the dynamic generated data-driven side, configurable for every ARK. This separation will allow for ARK to run as a pure database/API backend server with basic admin and auth frontend provided without the user having to configure or enable any of the web frontend.
 
 
 
The ARK Admin frontend will provide the core UI elements for the site, i.e. the Nav Bar and Nav Menu. An initial template will be inspired by [http://startbootstrap.com/template-overviews/sb-admin-2/ SB Admin 2] (Test [http://blackrockdigital.github.io/startbootstrap-sb-admin-2/pages/index.html here]) and [https://almsaeedstudio.com/ AdminLTE] (Test [https://almsaeedstudio.com/preview here]), but greatly simplified and converted to Twig templates.
 
 
 
* http://www.helloerik.com/the-subtle-magic-behind-why-the-bootstrap-3-grid-works
 
 
 
=== Migration ===
 
 
 
A migration process from ARK 1 to ARK 2 will be provided.
 
 
 
Data migration. Existing tables will need to change from MyISAM to InnoDB. Changing tables in place carries a risk of data loss if the migration fails part way, and attempting to restart a failed migration is also prone to error. To protect users' data, a new database will be created with new tables and the data copied across. Should migration fail, users will easily be able to roll back to their old install, or keep retrying the migration until it succeeds. In effect the ARK init script will be run, followed by the migration script.
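
The copy-across approach can be sketched as follows. This Python example uses in-memory SQLite databases purely as stand-ins for the old and new MySQL databases, and the table and its contents are hypothetical. The old database is only ever read, and the copy into the new database runs in one transaction, so a failure leaves both in a known state.

```python
import sqlite3


def migrate(old, new, tables):
    """Copy every row of the named tables from old to new; old is only read."""
    with new:  # commits on success, rolls the whole copy back on any error
        for table in tables:
            rows = old.execute(f"SELECT * FROM {table}").fetchall()
            if not rows:
                continue
            marks = ", ".join("?" * len(rows[0]))
            new.executemany(f"INSERT INTO {table} VALUES ({marks})", rows)


# Stand-in for the existing ARK 1 database (hypothetical table).
old = sqlite3.connect(":memory:")
old.execute("CREATE TABLE cxt (id INTEGER PRIMARY KEY, label TEXT)")
old.executemany("INSERT INTO cxt VALUES (?, ?)", [(1, "MNO12_1"), (2, "MNO12_2")])
old.commit()

# Stand-in for the new database; the init script creates the empty tables.
new = sqlite3.connect(":memory:")
new.execute("CREATE TABLE cxt (id INTEGER PRIMARY KEY, label TEXT)")

migrate(old, new, ["cxt"])
print(new.execute("SELECT COUNT(*) FROM cxt").fetchone()[0])
```

A real migration script would also transform the data to the new schema during the copy; the sketch only shows why copying is safer to retry than changing tables in place.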
 
 
 
User migration. Users will be migrated from LiveUser to the new RBAC system. This will require a compatible default user config.
 
 
 
Config migration. A config migration script will be provided, but may require adapting for individual ARKs.
 
 
 
=== Build Tooling ===
 
 
 
Build tooling is required for a number of reasons:
 
* Bootstrap can be customised most easily by changing variables used in the Sass templates, which then requires a build step to compile them into CSS
 
* Production deployment is more efficient if CSS and JS are stripped, merged and minified, while development is easier if a source map is generated for the original code
 
* Bower component management downloads the entire package, not just the assets required, so an extra step is needed to copy just the required assets into the web root folder
 
* All the steps required for packaging and release management can be automated, e.g. clean, compile, tag, package, etc
 
* The build tooling for the default ARK bootstrap and twig theme can be generalised to allow clients to build and deploy their own customised themes with minimal effort
 
 
 
The build tooling will be as follows:
 
* All build tooling will be isolated in the /build/ folder and will be excluded from any release packages or production deployments
 
* Nothing in the /build/ folder may be depended on by any code outside the /build/ folder or required for running ARK itself
 
* [https://nodejs.org/en/ Node], [https://www.npmjs.com/ npm], [http://bower.io/ Bower] and [http://gulpjs.com/ Gulp] will be used to run the tooling (Bower requires Node/npm to be installed, so we may as well use its full power)
 
* Tooling should be cross-platform (Gulp provides this as opposed to bash scripts)
 
* Gulp will not be required as a global install, instead tasks will be aliased through Node scripts, e.g. 'npm run build' will call 'gulp build'
 
* Running tasks will only work inside the /build/ folder, trying to run outside the build folder should fail gracefully
 
 
 
=== Sysadmin Console ===
 
 
 
A sysadmin console will be provided for use on the command line. This will provide a number of tools:
 
* Database administration, such as creation, migration, backup, etc
 
* MultiArk installs
 
* ARK wide alerts
 
* Maintenance mode (immediate and scheduled)
 
* Upgrade tools
 
* etc
 
 
 
Equivalents for some of these functions will be provided in an ARK Sysadmin panel (separate to the ARK Admin panel):
 
* ARK wide alerts
 
* Maintenance mode (immediate and scheduled)
 
* etc
 
 
 
=== File Structure ===
 
 
 
The following file structure will be used, based on the default Silex and Composer structure.
 
* The web root will be in /web/
 
* The ARK source code will be in /src/ organised by source type (php, js)
 
* Composer installs external PHP packages into /vendor/
 
* npm installs Node packages into /build/node_modules/
 
* Bower will be configured to install external packages into /build/vendor/
 
* The ARK and custom theme assets will be in /build/themes/<name>/ organised by source type (js, sass, etc)
 
* Compiled theme bundles will be written into /web/themes/<name>/
 
* Custom code will be in ???
 
* Packaging for release will not include the /build/ or /test/ directories
 
 
 
/
 
|- .gitignore
 
|- composer.json
 
|- composer.lock
 
|- .git/
 
|- arks/
 
  |- arks.json
 
  |- <ark>/
 
    |- config/
 
      |- database.json
 
    |- data/
 
    |- schema/
 
      |- data-schema.json
 
    |- themes/ ???
 
|- bin/
 
|- build/
 
  |- .bowerrc
 
  |- bower.json
 
  |- gulpfile.js
 
  |- package.json
 
  |- assets/
 
    |- <name>/
 
      |- css/
 
      |- fonts/
 
      |- img/
 
      |- js/
 
      |- less/
 
      |- scss/
 
      |- twig/
 
  |- node_modules/
 
  |- schema/
 
      |- conf.xml
 
      |- core.xml
 
      |- spatial.xml
 
      |- user.xml
 
      |- <data-schema>.json
 
  |- vendor/
 
|- l10n/
 
|- src/
 
  |- js/
 
  |- php/
 
|- vendor/
 
|- tests/
 
|- var/
 
  |- cache/
 
  |- logs/
 
|- web/
 
  |- index.php
 
  |- fonts/
 
  |- themes/
 
    |- <name>/
 
      |- styles/
 
      |- images/
 
      |- scripts/
 
      |- templates/
 
 
 
=== Translations / Localisation ===
 
 
 
A key to providing Ark-As-A-Service will be translating the user interface and schemas into as many languages as possible to maximise the potential user base. ARK does not have the tools to make this process easy to perform or manage, and it would be a waste of resources to build them. It is recommended to use one of the existing online open-source translation projects to crowd-source the translations. This will allow interested parties and potential clients to translate ARK for themselves and to grow a local community to support ARK in their country.
 
 
 
Changes will be made to the translation process to bring ARK into line with industry best practices and tooling, allowing for common features such as correct plural forms.
 
* Translations will be performed using the Symfony Translation package
 
* Translations will be stored and transmitted in the XLIFF file format
 
* Translations will use keys (similar to current, but expanded)
 
* Translations will be split in a number of catalogues:
 
** Core interface
 
** User interface
 
** Admin interface
 
** Schema (one for each created)
 
 
 
In particular:
 
* Markup table will be dropped entirely
 
* Aliases will be kept as table-driven translations using a custom catalogue loader
 
* A custom Address Book catalogue loader will be used to provide names via the translation system
 
 
 
 
 
Potential online portals include:
 
* [https://www.transifex.com Transifex] - Market leader, but now closed source, offers free hosting to open source projects, large community, great features and automated workflow, API, Github integration, etc.
 
* [http://zanata.org/ Zanata] - A Red Hat open-source project (in response to Transifex going closed) with free hosting and large existing community, or can be self-hosted (JBoss based), great features and automated workflow, API, Github integration, etc. No RTL?
 
* [https://weblate.org/en/features/ Weblate] - An open-source project with free and paid-for hosting, but no central community, or can be self-hosted, good features, API, Github integration, etc. Django based.
 
* [http://translationproject.org/html/welcome.html Translation Project] - An FSF open-source project with free hosting and large existing community, but very basic features and manual workflow.
 
* [http://pootle.translatehouse.org/index.html Pootle] - An open-source project, self-hosted, Django based
 
 
 
Zanata would seem the preferred option for a hosted service, but Weblate might be better if self-hosting.
 
 
 
A number of options exist for automating extraction of Symfony translations, or for allowing interactive editing inside the admin panel or profiler panel:
 
* https://jolicode.com/blog/translation-workflow-with-symfony2
 
* https://github.com/deanc/silex-web-translator
 
* https://github.com/Aecf/TranslatorToolBundle
 
* https://github.com/instaclick/TranslationEditorBundle
 
* https://github.com/lexik/LexikTranslationBundle
 
* https://github.com/manuelj555/ManuelTranslationBundle
 
 
 
None of these quite match our requirements, but may be adapted to achieve our workflow.
 
 
 
=== Chains ===
 
 
 
Chains are a technique in ARK for storing hierarchical tree data in relational form. This is done using an adjacency table method. This is a problem in ARK2 for a number of reasons:
 
* Knowledge of when data is stored in chains is held solely in the subform code that creates or reads the chain, which causes issues now that the schema needs to represent the data structure without the subform
 
* Chains are an internal implementation detail; their existence should not 'leak' into the schema or API for external clients, which must be free to choose their own storage solution
 
* Access to chains can be slow and inefficient, especially walking down a tree when you don't know how 'wide' it is (i.e. what data fragments it has as descendants).
 
 
 
The data schema will not represent data as chains, trees or graphs. Instead the schema will merely represent the inherent hierarchical structure of the data using groups/objects/lists. The data persistence layer will know to persist those repeating groups in its model. In the case of the current ARK SQL database this means repeating groups will need to be stored as chains.
 
 
 
A number of other techniques exist to make reading/writing of trees faster, such as Nested Sets. The Closure Table technique has been chosen for a number of reasons:
 
* Is a simple extension to the existing Adjacency Table method
 
* Is fast to both read and write
 
* An entire tree can be read with a single SQL query
 
* Supports storing DAGs, i.e. Harris Matrix
 
 
 
The proposed implementation will be:
 
* Chains may be renamed as Graphs?
 
* Nodes and Edges are stored separately
 
* The Nodes in a Graph are either Items or Fragments, but not both
 
* The Nodes continue to be stored in their current tables
 
* Edges are stored in a new ark_data_graph table
 
* Item Graphs store their itemkey/itemvalue Edges in the ark_data_graph table
 
* Fragment Graphs store their table/id Edges in the ark_data_graph table
 
* Descendant Fragment Nodes will no longer be keyed by their direct parent table/id, but will instead be keyed by their root itemkey/itemvalue. This allows faster access to all nodes, and means the nodes never need to be updated whenever the graph is altered.
 
* Root Fragment Nodes may need to carry a flag to indicate they are a root, otherwise a lookup is required every time on the graph table.
 
 
 
New code will be needed to create and maintain graphs in place of the current chain code. Migration code will be required to build the graph for existing data and update the nodes.
 
 
 
References:
 
* SQL Anti-patterns book
 
* http://people.apache.org/~dongsheng/horak/100309_dag_structures_sql.pdf
 
* http://timwi.blogspot.co.uk/2010/03/query-tree-structures-and-dags-in-sql.html
 
* http://dirtsimple.org/2010/11/simplest-way-to-do-tree-based-queries.html
 
* http://mikehillyer.com/articles/managing-hierarchical-data-in-mysql/
 
* https://github.com/Atlantic18/DoctrineExtensions/blob/master/doc/tree.md
 
* https://github.com/EspadaV8/ClosureTable
 
 
 
=== Site Module ===
 
 
 
A new core module for Sites will be added to support recording of metadata about a site. The data schema will be configurable the same as other modules, but certain default fields will be enforced. This will be a special case module that does not appear in the modules for a site.
 
 
 
=== Address Book / Actors / Users ===
 
 
 
Currently actors are allocated at a site level. This may not scale very well to a major corporate install, for example LP or WCC, with a staff of 30-50 and hundreds of sites. Repeatedly allocating new actor IDs for each site could result in a user possibly having hundreds of actor IDs which could make reporting harder, site creation more effort, and is not very RESTful.
 
 
 
It is proposed instead to rename the Address Book module as the Actors module and add a new global level of actors in addition to site-specific actors. Users will be in the global list, and allocated to sites with roles. A choice can be made between global roles that apply across all sites and site-specific roles. Non-user global actors can also be defined and can be referred to without allocation. Local site actors can be created.
 
 
 
It is also proposed to have the Actors module be the only source of actors for the Action dataclass, simplifying the storage and access.
 
 
 
=== Action Logging / Activity Streams / Gamification ===
 
 
 
All user actions and events in ARK will be logged, enabling a standard Activity Stream at user level that can support gamification. The actual gamification is considered outside the scope of core ARK2. Some inspiration for the implementation can be taken from Event Sourcing, but a full implementation will not be attempted.
 
 
 
* http://martinfowler.com/eaaDev/EventSourcing.html
 
* http://activitystrea.ms/
 
* https://www.w3.org/TR/activitystreams-core/
 
* https://github.com/barnabywalters/php-activitystreams
 
* https://github.com/redpanda/ActivityStreams
 
* http://jwage.com/post/54480997504/building-activity-streams
 
* https://dev.playlyfe.com/guides/getting-started.html
 
 
 
=== Command Bus Architecture / Application Services ===
 
 
 
A Command Bus / Event Bus architecture will be implemented to support execution or queuing of synchronous and asynchronous Commands and Events. Console Commands will be wrappers around Commands that parse input from the command line.
 
 
 
* http://php-and-symfony.matthiasnoback.nl/2015/01/a-wave-of-command-buses/
 
* https://github.com/SimpleBus/
 
* http://tactician.thephpleague.com/
 
* http://culttt.com/2014/11/10/creating-using-command-bus/
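
A minimal sketch of the Command Bus idea follows, written in Python for illustration rather than with either of the PHP libraries linked above. All names here (<code>CommandBus</code>, <code>RenameSite</code>, <code>console_rename</code>) are hypothetical. It shows synchronous dispatch, a simple queue standing in for asynchronous execution, and a console wrapper that only parses input before handing a Command to the bus.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class RenameSite:
    """Hypothetical command: plain data describing what should happen."""
    site: str
    new_name: str


class CommandBus:
    """Routes each command to the single handler registered for its type."""

    def __init__(self) -> None:
        self._handlers = {}
        self._queue = deque()

    def register(self, command_type, handler) -> None:
        self._handlers[command_type] = handler

    def dispatch(self, command):
        """Execute a command immediately (synchronous)."""
        return self._handlers[type(command)](command)

    def enqueue(self, command) -> None:
        """Defer a command; a real system would hand it to a job queue."""
        self._queue.append(command)

    def drain(self) -> list:
        """Execute all deferred commands in the order they were queued."""
        results = []
        while self._queue:
            results.append(self.dispatch(self._queue.popleft()))
        return results


def console_rename(bus: CommandBus, argv: list[str]):
    """Console wrapper: parse command-line arguments, then dispatch."""
    site, new_name = argv
    return bus.dispatch(RenameSite(site, new_name))
```

Because the console wrapper contains no business logic, the same handler serves the web frontend, the API and the command line.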
 
 
 
=== Workflow Management ===
 
 
 
Stretch goal. Packages exist to define workflows using state machines.
 
 
 
Possible workflow scenarios:
 
* Post-ex
 
* Sample taking
 
* Avalon jobs and documents
 
* Data checking
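
As an illustration of defining a workflow as a state machine, the sketch below models a data checking scenario in Python. The class <code>Workflow</code>, the state names and the actions are all assumptions made for the example; no particular workflow package is implied.

```python
class Workflow:
    """Hypothetical finite state machine driven by a transition table."""

    def __init__(self, initial: str, transitions: dict) -> None:
        self.state = initial
        # Maps (current state, action) to the next state.
        self._transitions = transitions

    def can(self, action: str) -> bool:
        """Report whether `action` is allowed from the current state."""
        return (self.state, action) in self._transitions

    def apply(self, action: str) -> str:
        """Perform `action`, or raise if it is not allowed here."""
        if not self.can(action):
            raise ValueError(f"cannot {action!r} from state {self.state!r}")
        self.state = self._transitions[(self.state, action)]
        return self.state


# Assumed states and actions for a data checking workflow.
DATA_CHECKING = {
    ("recorded", "submit"): "awaiting_check",
    ("awaiting_check", "approve"): "checked",
    ("awaiting_check", "reject"): "recorded",
}
```

Keeping the transitions in a plain table means a workflow could be held in config rather than in code.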
 
 
 
=== CMS / Blog / Project Websites ===
 
 
 
Either provide an out-of-the-box WordPress integration and host client websites, or add features to ARK to provide the basic project website features (home page, blog, contact us, calendar).
 
 
 
* https://github.com/bolt/bolt CMS built using Silex/Symfony
 
 
 
=== Errors ===
 
 
 
Standardised error codes will be used, based on the JSON API standard format. Error codes will be stored in a database table and exported via the API, with actual error messages translated via the standard methods, and with detailed debug/help available on the ARK wiki via standardised links. Fatal errors will be thrown as exceptions, with the Controller responsible for catching the error and reporting it in the appropriate manner, i.e. via web or API. Non-fatal errors such as validation errors can be batched before return. While numeric error codes are convenient, they make code hard to read and debugging tracebacks harder, so all error conditions should be explicitly commented in the code, and error codes should be as unique as possible.
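
The error handling described above might look roughly like the following Python sketch. The exception class <code>ArkError</code>, the numeric codes and the validated fields are invented for the example; the response shape uses the <code>errors</code> array and the <code>status</code>, <code>code</code>, <code>title</code> and <code>detail</code> members of a JSON API error object.

```python
import json


class ArkError(Exception):
    """Hypothetical error carrying the members of a JSON API error object."""

    def __init__(self, code: str, status: int, title: str, detail: str = ""):
        super().__init__(title)
        self.code, self.status = code, status
        self.title, self.detail = title, detail

    def to_object(self) -> dict:
        return {"status": str(self.status), "code": self.code,
                "title": self.title, "detail": self.detail}


def validate(record: dict) -> list[ArkError]:
    """Batch non-fatal validation errors instead of stopping at the first."""
    errors = []
    for field in ("site", "name"):
        if not record.get(field):
            # 1001: a required field is missing (invented code)
            errors.append(ArkError("1001", 422, "Required field missing", field))
    return errors


def error_body(errors: list[ArkError]) -> str:
    return json.dumps({"errors": [e.to_object() for e in errors]})


def controller(record: dict) -> tuple[int, str]:
    """API controller: catches fatal errors and reports them as JSON."""
    try:
        errors = validate(record)
        if errors:
            return 422, error_body(errors)
        if record["site"] == "missing":
            # 2001: the requested site does not exist (invented code)
            raise ArkError("2001", 404, "Site not found", record["site"])
        return 200, json.dumps({"data": record})
    except ArkError as err:
        return err.status, error_body([err])
```

A web controller would catch the same exception but render it through a template instead of returning JSON.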
 
 
 
=== Matrix / Graph ===
 
 
 
PHP:
 
* https://github.com/clue/graph
 
* https://github.com/graphp/algorithms
 
* https://github.com/cpettitt/graphlib/wiki/API-Reference
 
 
 
== Branding / Community ==
 
 
 
Two potential issues suggest an evolution of the ARK brand is required:
 
* Building a development and support community may be easier if ARK is branded as a stand-alone project, rather than seen as owned/controlled by LP Archaeology
 
* Extending use of ARK and Hosted ARK to areas outside archaeology may be held back by emphasising the archaeology aspect in the branding
 
 
 
Branding as something like 'The ARK Project, sponsored by LP Archaeology', and coining a bacronym like 'ARK Recording Kit' might solve these issues.
 
 
 
The Hosted ARK would need a separate identity to the development project to keep the Open Source / Commercial split clear. A simple .com vs .org difference is probably not clear enough. Examples include:
 
* Salesforce.com vs Force.com
 
* Mediawiki vs Wikia
 
* Wordpress .com vs .org
 
 
 
Branding would thus consist of three parts:
 
* The project
 
* The products
 
* The service
 
 
 
The branding would need to be distinctive but consistent to make it clear they are part of a cohesive whole.
 
 
 
Words with Ark / Arc in them (but not arch) for possible project or theme names:
 
* Archive / ARKhive
 
* Arctic (very white/light theme?)
 
* Arcadia
 
* [https://en.wikipedia.org/wiki/Arkose Arkose] (type of sediment)
 
* Arkaeology (available in .com, .org, .net!)
 
* Arcade
 
* Archaic / Arcane / Arcanum / Arcana (more apt for ARK v1 ;-)
 
* Arcuate / Arcuated (Arc/bow shaped)
 
* Architrave
 
* Architect
 
* Archipelago
 
* Archosaur
 
* Arktivity...
 
 
 
== Development Environment ==
 
 
 
To develop ARK requires the following tools to be installed:
 
* Git
 
* Composer
 
* Node.js (for frontend/theme development only)
 
 
 
On OS X, while you can install the requirements via packages, we recommend using Homebrew as it makes life easier:

* Install Xcode and ensure the Command Line Tools are installed

* Install Homebrew
 
* 'brew install composer node php-cs-fixer'
 
 
 
While text editors / IDEs are a deeply personal choice, we recommend using Atom as it is a cross-platform Open Source editor with powerful plugins to support the tools used by ARK.
 
 
 
== Changes ==
 
 
 
Details of changes made in ARK2.
 
 
 
=== Code Repository ===
 
 
 
Development of ARK 2.0 is occurring in the open on GitHub https://github.com/lparchaeology/ark2
 
 
 
=== Configuration ===
 
 
 
Significant changes to the configuration of ARK are being made to move from PHP file based configuration to database based configuration. This section will document these changes.
 
 
 
* The config/ folder will contain all user-editable php files required, all other config will be in the database
 
* The env_settings.php file is replaced by server.php and paths.php
 
* server.php contains the settings for the database connection and root server path and should be the only file requiring editing for a default ARK install
 
* paths.php contains the settings for the server file paths and should not need editing
 
* To set up an ARK, copy the config folder from php/arkdb/config to the root folder and edit as required

* preflight_checks.php now defaults to off, so needs to be enabled before running, and then deleted from config afterwards
 
* settings.php has moved from config/ to php/settings/ and no longer requires user editing, all settings are now held in the database and should be configured per the instructions
 
 
 
=== Database ===
 
 
 
* Configuration has been moved to the database
 
* A new ADO class wrapping PDO has been created to provide all database access for the new OO config classes
 
* db_functions.php has been cleaned up to move repeated code into new routines:
 
** dbTimestamp() returns a timestamp
 
** dbRunAddQuery() inserts a single row into a table
 
** dbUpdateSingleIdRow() updates a single ID table row
 
** dbUpdateAllRows() updates all rows matching a given key
 
* All DB functions have been moved into db_functions.php and use the new DB routines so they no longer create SQL themselves
 
 
 
=== Globals ===
 
 
 
Globals are being progressively removed and replaced where possible by access to config objects.
 
 
 
A number of config global variables have been renamed for consistency:
 
* Any var ending in _dir is an absolute filesystem directory path
 
* Any var ending in _path is a URL path relative to the hostname and always starts with a '/'
 
* Neither var ever ends in a separator
 
* $ark_server_path -> $ark_root_dir
 
* $ark_dir -> $ark_root_path and no longer ends in a /
 
* $registered_files_host -> $registered_files_path
 
* $phMagickDir -> $phmagick_file
 
* ark_web_maptemp_dir -> ark_maptemp_path
 
* $ark_lib_dir and $ark_lib_path point to the library folder
 
* $skins_dir and $skins_path point to the skins folder
 
* $skin_dir and $skin_path point to the current skin folder
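
The naming rules above can be expressed as two small normalising functions. This is a Python sketch for illustration only; the function names are hypothetical, POSIX-style separators are assumed, and the handling of a bare root (which cannot both start with '/' and have no trailing separator) is a choice made for the example.

```python
def normalise_dir(value: str) -> str:
    """Normalise a *_dir value: absolute filesystem path, no trailing separator."""
    if not value.startswith("/"):
        raise ValueError(f"{value!r} is not an absolute directory path")
    # The filesystem root itself is kept as "/".
    return value.rstrip("/") or "/"


def normalise_path(value: str) -> str:
    """Normalise a *_path value: URL path starting with '/', no trailing '/'."""
    # A bare root becomes "/", the one case where start and end coincide.
    return "/" + value.strip("/")
```

For example, <code>normalise_path("ark/skins/")</code> gives <code>/ark/skins</code>, and <code>normalise_dir("/var/www/ark/")</code> gives <code>/var/www/ark</code>.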
 
 
 
A number of config global variables have been renamed for clarity:
 
* $mode -> $search_mode
 
* $ftx_mode -> $search_ftx_mode
 
 
 
A number of config global variables have been deleted as they are not used:
 
* $default_year
 
* $conf_non_search_words
 
* $conf_langs
 
* $loaded_map_modules
 
* $default_output_mode
 
  
The logging globals have been changed:

* $log, $conf_log_add, $conf_log_edt, $conf_log_del are deleted

* $log_ins, $log_upd and $log_del are used instead

A number of globals have been replaced by PHP5 constants:

* $fs_path_sep has been replaced with PATH_SEPARATOR

* $fs_slash has been replaced with DIRECTORY_SEPARATOR

A number of config globals have been removed as they are provided through alternative means:

* $conf_pages

* $conf_media_browser ($default_media_browser holds subform_id for now)

== Documentation ==

Details of ARK2 can be found in the following sections:

* [[ARK2/Design|Design]] - High Level Design Decisions
* [[ARK2/Technical|Technical]] - Technical details, Development tools and procedures
* [[ARK2/The_ARK_Way|The ARK Way]] - Web Development - The ARK2 Way
* [[ARK2/Install|Install]] - Installation Instructions
* [[ARK2/Architecture|Architecture]] - System Architecture
* [[ARK2/Database|Database]] - Database / ORM details
* [[ARK2/Cache|Cache]] - Cache details
* [[ARK2/Model|Model]] - Data model / Schema
* [[ARK2/View|View]] - Views on the Data Model
* [[ARK2/Spatial|Spatial]] - Spatial data
* [[ARK2/Vocabulary|Vocabulary]] - Controlled Vocabularies
* [[ARK2/Files|File Management]] - File, Image, and other Media Management
* [[ARK2/Localization|Localization]] - Internationalization, Localization, and Translation
* [[ARK2/Security|Security]] - Security, Authentication, Authorisation, User Management
* [[ARK2/API|API]] - REST API implementation
* [[ARK2/Frontend|Frontend]] - Web Frontend
* [[ARK2/Templates|Templates]] - Using Twig for the Web Frontend
* [[ARK2/Console|Console]] - Admin Consoles
* [[ARK2/Admin|Admin]] - Admin Frontend
* [[ARK2/Branding|Branding]] - Branding of the ARK Project, Products, and Service

Latest revision as of 12:34, 7 August 2018

This page details the progress on development of ARK 2.0

Aims

The primary aims of ARK2 are:

  • Full code re-write to modern standards using modern components
  • Separate the ARK Database backend from the ARK Web frontend
  • Implement a modern REST API to allow other frontends and apps to access and update the ARK Database
  • Simplify the setup and configuration of ARK by moving the config into the database and providing online config tools
  • Improve the overall performance and data integrity of ARK
  • Make it possible to provide an ARK hosting service

Modern frontend

  • HTML5
  • Bootstrap based
  • Twig templates
  • Front Controller model
  • Config driven pages views

Modern backend

  • Full REST API to access and update all ARK data
  • Database abstraction and Object Relational Mapping
  • Config driven data schema
  • Controlled Vocabularies and Linked Open Data
  • User Authentication via internal user/password and external OAuth2 providers (Facebook, Google, etc)
  • User Authorisation via Role Based Access Control (RBAC) using hierarchical Roles and Permissions structure
  • Field-level data access control
  • Data Workflows in conjunction with User Authorisation control

Documentation

Details of ARK2 can be found in the following sections: