Importing a Batch of Files Using a Data File
The Therefore™ Document Loader can be used to import a batch of files to Therefore™ where the index data is contained in an accompanying data file. The following tutorial breaks down the import process with different steps depending on the data file type.
-
Before a profile can be created, a suitable category is required in Therefore™ to which the imported documents can be saved. Plan the category's index fields carefully at this stage, since good indexing makes documents easier to find and process later.
-
Start the Document Loader and under Profile click New.
Note: Profiles can also be created and managed in the Therefore™ Solution Designer.
-
The Profile dialog will open. Enter the Profile name. The next step is to set up a data extractor to get the index data from the data file. Click the Data extractor browse button, then follow the instructions below that match the type of file containing the data:
Data file type is XML
-
Select 'XML Data Extractor' from the drop-down.
-
The 'Data Extractor' dialog will open with a preview of the XML file. Set the XML tag that defines the document here. All information for one document is contained between the opening and closing entries of this tag. For example, if the tag is 'Invoice', the information is contained between <Invoice> and </Invoice>.
-
The data extractor can now extract all the index field tags which will later be used to map the category field assignments.
-
Before proceeding, identify which tag in the XML defines the reference files that make up the content of the document. Make a note of this tag for use in the next step of the profile design. This is not required in situations where the XML data file itself is the file that should be saved.
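The role of the document tag and the reference-file tag can be illustrated outside Therefore™ with a short Python sketch. This is not how the XML Data Extractor is implemented; it only models the idea that one document tag encloses the index fields and any number of referenced files. The sample data and the tag names 'Invoices', 'InvoiceNo' and 'FileName' are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical data file; all tag names and values are illustrative only.
SAMPLE = """<Invoices>
  <Invoice>
    <InvoiceNo>1001</InvoiceNo>
    <FileName>1001.pdf</FileName>
    <FileName>1001-annex.pdf</FileName>
  </Invoice>
  <Invoice>
    <InvoiceNo>1002</InvoiceNo>
    <FileName>1002.pdf</FileName>
  </Invoice>
</Invoices>"""


def split_documents(xml_text, document_tag, file_tag):
    """Return one entry per document tag, separating index fields from files."""
    root = ET.fromstring(xml_text)
    documents = []
    for node in root.iter(document_tag):
        fields = {}
        files = []
        for child in node:
            value = (child.text or "").strip()
            if child.tag == file_tag:
                files.append(value)       # a document may reference several files
            else:
                fields[child.tag] = value  # everything else is index data
        documents.append({"fields": fields, "files": files})
    return documents


docs = split_documents(SAMPLE, "Invoice", "FileName")
for doc in docs:
    print(doc["fields"], doc["files"])
```

Running a check like this against a real data file before building the profile can help confirm which tag delimits a document and which tag carries the file references.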
Data file type is text based (e.g. TXT, CSV, DAT)
-
Select 'Text-Line Data Extractor'.
-
The 'Data Extractor' dialog will open with a preview of the text file. Specify the 'Delimiter'. The part numbers and their values will then be listed automatically.
-
Enter names for the part numbers (index data fields) that are to be extracted. Depending on which field should be used to identify separate documents, select 'On change' under 'Document break'. Quotes can also be defined depending on the settings in the text file.
-
Once all index data fields have been defined, name which part defines the reference files making up the content of the document. Click OK to close the extractor.
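The combination of delimiter, quotes and an 'On change' document break can be pictured with the following Python sketch. It is only a model of the behaviour described above, not the Text-Line Data Extractor itself. The sample lines, the semicolon delimiter and the part names 'InvoiceNo', 'Supplier' and 'FileName' are assumptions.

```python
import csv
import io

# Hypothetical data file: one line per referenced file, semicolon-delimited.
SAMPLE = '''"1001";"ACME";"1001.pdf"
"1001";"ACME";"1001-annex.pdf"
"1002";"Globex";"1002.pdf"
'''


def group_lines(text, delimiter, quote, names, break_field, file_field):
    """Split lines into named parts and start a new document when break_field changes."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter, quotechar=quote)
    documents = []
    previous = None
    for parts in reader:
        if not parts:
            continue  # skip empty lines
        row = dict(zip(names, parts))
        if not documents or row[break_field] != previous:
            documents.append({"fields": row, "files": []})
            previous = row[break_field]
        documents[-1]["files"].append(row[file_field])
    return documents


grouped = group_lines(
    SAMPLE, ";", '"', ["InvoiceNo", "Supplier", "FileName"], "InvoiceNo", "FileName"
)
for doc in grouped:
    print(doc["fields"]["InvoiceNo"], doc["files"])
```

In this sketch, consecutive lines with the same value in the break field are treated as one document with several referenced files, which is why the data file should be sorted by that field.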
After an extractor has been configured, proceed with the steps below.
-
Click the Script browse button and enter the following script:
FilesToSave = ExtractList("FileName")
where 'FileName' is the name of the tag defining the reference files making up the content of the document. The command 'ExtractList' is used because one document may contain multiple referenced files.
-
Select the category where the documents should be saved. If auto-append has been configured in the category properties, an auto-append mode different from the category default can be specified. The category's fields can now be matched to the index fields identified during the steps above. Click the drop-down list in front of each category field and select its respective index field.
-
Test the profile by clicking 'Test' and selecting a test file. Click OK to save the indexing profile.
-
Specify the location of the data file for import as well as the log file. Select the relevant profile and click 'Process' to begin importing files.
-
Once the import has completed, start the Therefore™ Navigator and verify that the import finished without issue.
Handling date formats
Where the date format in the documents to be imported differs from the date format of the operating system on which the Document Loader is running, the 'ToDate' function can be used. For example, if the documents contain dates in the format DD.MM.YYYY and the system uses another format, enter the following in the assignment column of the Indexing Profile configuration:
ToDate(Extract("Invoice Date"), "DD.MM.YYYY")
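The reason an explicit format is needed can be shown with a small Python analogy. This is not the Therefore™ 'ToDate' function. It is a sketch using Python's datetime module, where the format code "%d.%m.%Y" is assumed to correspond to DD.MM.YYYY. The helper name to_date and the sample value are hypothetical.

```python
from datetime import datetime


def to_date(text, fmt="%d.%m.%Y"):
    """Parse a date string using an explicit format instead of the system default."""
    return datetime.strptime(text.strip(), fmt).date()


# The same string read with day-first and month-first formats gives different dates.
day_first = to_date("03.04.2021")
month_first = to_date("03.04.2021", "%m.%d.%Y")

print(day_first.isoformat())    # 2021-04-03
print(month_first.isoformat())  # 2021-03-04
```

A value such as 03.04.2021 parses without error under either interpretation, so a format mismatch can silently produce wrong index data rather than a failed import.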
Line items
Importing line items requires a script. Below is an example of an XML data file with a table called 'myTable' that has two columns: 'Text' and 'Number'.
<myTable>
<Text> Text1 </Text>
<Number> 1 </Number>
</myTable>
<myTable>
<Text> Text2 </Text>
<Number> 2 </Number>
</myTable>
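To make the structure of this data explicit, the following Python sketch reads the fragment above into rows. It is not an indexing-profile script and does not use any Therefore™ functions. It only shows that each 'myTable' element corresponds to one line item with a 'Text' and a 'Number' value. The wrapper element 'root' is an assumption added because the fragment has no single enclosing tag.

```python
import xml.etree.ElementTree as ET

FRAGMENT = """
<myTable>
<Text> Text1 </Text>
<Number> 1 </Number>
</myTable>
<myTable>
<Text> Text2 </Text>
<Number> 2 </Number>
</myTable>
"""


def read_rows(fragment, table_tag="myTable"):
    """Return one (text, number) tuple per table element in the fragment."""
    # Wrap the fragment so that it parses as a single well-formed document.
    root = ET.fromstring("<root>" + fragment + "</root>")
    rows = []
    for table in root.findall(table_tag):
        text = (table.findtext("Text") or "").strip()
        number = int((table.findtext("Number") or "0").strip())
        rows.append((text, number))
    return rows


rows = read_rows(FRAGMENT)
print(rows)  # [('Text1', 1), ('Text2', 2)]
```

Note that the values in the sample contain surrounding spaces, so they are trimmed here before use.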
Extracting information from PDF contents
-
Select 'PDF Data Extractor'.
-
The 'Data Extractor' dialog will open with a preview of the PDF file. Use the mouse cursor to draw an area around the required data.
-
A new index item dialog will open with the position in the file already filled in. Give this item a name and select 'On change' under 'Document break' if this item is to be used as the document break indicator. Click OK to save. Repeat this step to add further index data as required.
-
Define a second indexing profile which will process the data file. Select the respective data extractor depending on the type of data file created. For this example, a .TXT file is used containing the names and paths of the PDF files to be imported.
-
Add the following script to this same indexing profile:
CopyExecuteProfile "PDF Import Test Profile", Extract("PDF file")
FilesToSave = Extract("PDF file")
Here, 'PDF Import Test Profile' is the name of the PDF data extraction profile created earlier, and 'PDF file' is the name of the field defined in the 'Text-Line Data Extractor' as the document break. With this script, the profile first calls the PDF extractor profile, then extracts and saves the data for each processed PDF file.
-
Note that the 'Script' field in the indexing profile will appear in red, indicating the script is invalid. This can be ignored as the script will execute correctly.
-
In the Therefore™ Document Loader, select the data file and choose the profile with the Text-Line Data Extractor. Click Process to start importing the files.
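The .TXT data file listing the PDF files, as used in this example, could be produced with a small helper such as the Python sketch below. The layout (file name and full path separated by a semicolon), the UTF-8 encoding and the function name write_pdf_list are all assumptions. Adjust them to match the delimiter and parts defined in the Text-Line Data Extractor.

```python
import tempfile
from pathlib import Path


def write_pdf_list(folder, list_file):
    """Write one 'name;full path' line per PDF in folder and return the count."""
    pdfs = sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() == ".pdf"
    )
    with open(list_file, "w", encoding="utf-8", newline="") as handle:
        for pdf in pdfs:
            handle.write(f"{pdf.name};{pdf.resolve()}\r\n")
    return len(pdfs)


# Demonstration in a temporary folder so that no real files are touched.
with tempfile.TemporaryDirectory() as folder:
    for name in ("invoice1.pdf", "invoice2.pdf", "notes.txt"):
        Path(folder, name).write_bytes(b"")
    target = Path(folder, "import_list.dat")
    count = write_pdf_list(folder, target)
    print(count)
    print(target.read_text(encoding="utf-8"))
```

Only files with a .pdf extension are listed, and the list is sorted so that repeated runs produce the same order.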
See also:
Sample Scripts for Indexing Profiles