Difference between revisions of "Importing Text Data"


Latest revision as of 12:17, 3 April 2018

ARK textfile importer user guide

Since v1.1.2, ARK has had a tool for importing text-based files in either JSON or CSV format. The file can be filtered so that only certain records are uploaded. The import can create new records or add to existing ones.

Step 1. Identify datafile

Figure 1: File Picker
Specify the location of the file.
The import page of ARK in 1.1.2 includes a function for importing data directly from a text file or a data stream available online. Currently the file must be available on a web server (Fig. 1). Enter the location of the file: either a relative path from the ARK root, e.g. data/uploads/your_data.csv, or a remote web address, e.g. http://www.example.com/ARK/data?format=json.

Click Submit.

Depending on the size of the file, it may take some time to load into your browser. A spinner will appear while this is happening. Once the file is loaded, it can be manipulated very quickly.
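The two kinds of location described in Step 1 can be told apart as in the Python sketch below. This is an illustration only: ARK's own handling is not documented here, and the `classify_location` helper, the acceptance of https, and the rejection of absolute paths are all assumptions.

```python
from urllib.parse import urlparse


def classify_location(location):
    """Return 'remote' for a web address, 'relative' for a path from the ARK root.

    Hypothetical helper, not part of ARK.
    """
    scheme = urlparse(location).scheme
    if scheme in ("http", "https"):
        return "remote"
    # Assumption: a relative path has no scheme and no leading slash.
    if scheme == "" and not location.startswith("/"):
        return "relative"
    raise ValueError(f"unsupported location: {location!r}")


local_kind = classify_location("data/uploads/your_data.csv")
remote_kind = classify_location("http://www.example.com/ARK/data?format=json")
```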

Step 2. Determine 'Root' of the data structure.

The data you have will likely include information that relates to several ARK items. Within your data structure there will be a level that contains all of these item representations, in the same way as rows in a table. In a CSV file, where there is a single record per row, the root will be 'root>items'. In a more complicated JSON file (for example, one with file-level metadata held in the root as well as the items) the records may be contained in an object within the root object. JSON allows for extensible representation, so each 'row' in the JSON file may not include all the same fields. It may be necessary to use the filter tools to import only objects that meet certain criteria.

The example shown here has many items in the items object in root. The first available column is used to identify each of the contained objects. In this case it is an ARK ID, but it could be anything, and it will not be unique if it is not unique in your structure, for example if field 1 in your table is a site code or context type.
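As a rough illustration of what 'root>items' refers to, the Python sketch below models a JSON file with file-level metadata and an items object, then follows a 'root>items' style path to list the identifiers at that level. The sample data, its field names, and the `resolve_path` helper are assumptions made for this example; they are not taken from ARK.

```python
import json

# Hypothetical sample: file-level metadata in the root plus an 'items' object.
SAMPLE = """
{
  "metadata": {"exported_by": "example"},
  "items": {
    "METSUR_152": {"ARK_id": "METSUR_152", "type": "context"},
    "METSUR_153": {"ARK_id": "METSUR_153"}
  }
}
"""


def resolve_path(data, path):
    """Follow a 'root>a>b' style path through nested dictionaries."""
    parts = [part.strip() for part in path.split(">")]
    if parts[0] != "root":
        raise ValueError("path must start at 'root'")
    node = data
    for key in parts[1:]:
        node = node[key]
    return node


data = json.loads(SAMPLE)
items = resolve_path(data, "root>items")
identifiers = list(items)  # the identifiers held at this level
```

Note that the second item has no 'type' field: each 'row' in a JSON file need not carry the same fields.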

Figure 2: Objects in 'root>items'

In order to define this level you will need to navigate to it. The path at the top of the page describes where in the data structure you currently are. The large panel below the advanced options shows the identifiers held in the current level. At the root level of a CSV file the first column will be used as an identifier. Clicking on an identifier will open the data attached to it for viewing (Fig. 2). To go up one level, use the button at the end of the list of objects; alternatively, click any part of the location at the top of the page to navigate to that level. When the main window would contain more than 20 items, it is truncated and only the first 20 are shown.
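The way a CSV file presents itself in the viewer can be sketched as follows: each row becomes an object identified by the value in its first column, and only the first 20 identifiers are listed. This is a minimal Python model under stated assumptions; `csv_to_items`, `visible_identifiers`, and the sample column names are hypothetical and do not describe ARK's internals.

```python
import csv
import io

MAX_SHOWN = 20  # the guide states that only the first 20 items are shown


def csv_to_items(text):
    """Pair each CSV row with the value of its first column as identifier."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # A list of pairs is used because identifiers need not be unique.
    items = [(row[0], dict(zip(header, row))) for row in reader]
    return {"items": items}


def visible_identifiers(items):
    """Return the identifiers that would be listed, truncated to MAX_SHOWN."""
    return [identifier for identifier, _ in items][:MAX_SHOWN]


rows = ["ark_id,type"] + [f"METSUR_{n},context" for n in range(1, 26)]
root = csv_to_items("\n".join(rows))
shown = visible_identifiers(root["items"])
```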

Use the 'this is root' button in the navigation panel at the top of the page when the root is displayed in the main panel.

Step 3. Determine ARK IDs

Figure 3: Root and ARK ID specified

Using the navigation method above, find the path to the ARK ID within your data. This will often be as simple as 'root>items>item1>Ark_id'. It does not matter which object you use to define the ARK ID. As long as the objects are structured consistently, the ARK IDs will be retrieved automatically based on the root defined in the step above. For instance, the importer will import the ARK ID for METSUR_152 from root > items > METSUR_152 > ARK_id, then the ARK ID for METSUR_153 from root > items > METSUR_153 > ARK_id, and so on.

If you do not have ARK IDs in your data, refer to the advanced options. Otherwise, click the 'This is ARK ID' button below the identifier that contains the ARK ID.
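The idea that one example path is enough to find every ARK ID can be sketched in Python as below. The sketch strips the root prefix and the example item's identifier from the path, then applies what remains to every object under the root. The `ark_ids` and `split_path` helpers and the sample data are assumptions for illustration, not ARK's actual implementation.

```python
def split_path(path):
    """Split a 'root > a > b' style path into its parts."""
    return [part.strip() for part in path.split(">")]


def ark_ids(data, root_path, example_id_path):
    """Collect the ARK ID of every object under the root (hypothetical helper)."""
    root_parts = split_path(root_path)
    id_parts = split_path(example_id_path)
    # Drop the root prefix and the example item's own identifier;
    # what remains is the path to the ID inside any one item.
    relative = id_parts[len(root_parts) + 1:]
    node = data
    for key in root_parts[1:]:  # skip the leading 'root'
        node = node[key]
    result = {}
    for identifier, obj in node.items():
        value = obj
        for key in relative:
            value = value[key]
        result[identifier] = value
    return result


data = {
    "items": {
        "METSUR_152": {"ARK_id": "METSUR_152"},
        "METSUR_153": {"ARK_id": "METSUR_153"},
    }
}
ids = ark_ids(data, "root>items", "root > items > METSUR_152 > ARK_id")
```

Because only the part of the path below the item is reused, it makes no difference which item is chosen as the example, provided all items share the same structure.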

Step 4. Test Import

Figure 4: Example of dry run

Click 'Import this' below the identifier that you would like to import. If you have completed the steps above a new panel will appear below the viewer. Confirm that this looks how you expect – this is what will be imported into the database.

The final panel allows you to define which records will be imported. A type must be specified. If you are importing a file it may be necessary to do a separate import for each type, because different types have different fields attached to them. The drop-down menus respond to the ones above them, presenting only the options available, so complete them in order; otherwise the result may be unexpected behaviour.

Step 5.

When you are happy that you are importing the correct data into the correct field, click 'Submit'. You will be shown the results of your import. Click 'Import Json' to repeat the process for any other fields you wish to import.

Advanced Options

Ste_cd
The site code that imported items will have if one is not specified in the data. The ARK default is used if this is not set.
No site code
This check box removes site codes from the imported itemvalue, for use with chains.
Regular Expression
This is used to extract the itemvalue from the ARK ID column. By default it grabs the first run of numbers in the field.
Start at Arbitrary Number
If you wish to create a sequence of numbers starting from a given point (including 1), you will need to select this option and specify the number to start on. A unique 'ARK ID' must still be specified in the data, but it will not be used.
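The Regular Expression and Start at Arbitrary Number options can be illustrated with the short Python sketch below. The pattern `\d+` is an assumed equivalent of "the first run of numbers"; the actual default expression used by ARK is not given in this guide, and both helper functions are hypothetical.

```python
import re

# Assumption: "first run of numbers" behaves like this pattern.
DEFAULT_PATTERN = r"\d+"


def extract_itemvalue(ark_id, pattern=DEFAULT_PATTERN):
    """Return the first match of the pattern in the ARK ID, or None."""
    match = re.search(pattern, ark_id)
    return match.group(0) if match else None


def sequential_itemvalues(ark_id_list, start):
    """Number records consecutively from 'start', ignoring the IDs' own numbers."""
    return {ark_id: start + offset for offset, ark_id in enumerate(ark_id_list)}


extracted = extract_itemvalue("METSUR_152")
renumbered = sequential_itemvalues(["METSUR_152", "METSUR_153"], 1)
```

In the second case the ARK IDs serve only to tell the records apart, which is why they must still be unique even though their numbers are discarded.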