Reader

From DISMARC Help

A reader is part of an Importer

So-called RecordReaders transform the native data into XML. RecordReaders are configured via the ImporterConfig, which applies certain settings before you can get an iterator for a parsed file. What does that mean? Well, let's say that you have a really long text file that has to be broken at each line starting with BEGIN RECORD. You would need quite a lot of memory to read the whole file, especially when it also has to be converted into a different character encoding. That's why you tell the RecordReader to break at each of the BEGIN RECORD lines, hand the record over to the Mapper, and continue reading the file as soon as the Mapper has finished converting the record. This behaviour is called an Iterator because we go through the file one record at a time. The RecordReaders create these RecordIterators for each file you've uploaded to the application.
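
The record-at-a-time idea can be sketched in a few lines of Python. This is only an illustration of the iterator behaviour described above, not the application's own code: the function name `iter_records` and the sample data are assumptions, and only the BEGIN RECORD marker comes from the example.

```python
import io
from typing import Iterator, TextIO


def iter_records(stream: TextIO, marker: str = "BEGIN RECORD") -> Iterator[str]:
    """Yield one record at a time (hypothetical helper, not the DISMARC API)."""
    current = []
    for line in stream:
        # A marker line closes the record collected so far and opens a new one.
        if line.startswith(marker) and current:
            yield "".join(current)
            current = []
        current.append(line)
    # The last record has no following marker, so emit it at end of file.
    if current:
        yield "".join(current)


# Made-up sample data; a real import would pass an open file instead.
sample = io.StringIO("BEGIN RECORD\ntitle=First\nBEGIN RECORD\ntitle=Second\n")
for record in iter_records(sample):
    print(repr(record))
```

With a real file, something like `open(path, encoding="latin-1")` could be passed as the stream. Python then decodes the file line by line, so only the current record is held in memory while it is being converted.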

So far, the application can read the following native data formats and generate XML from them.

1 CsvRecordReader
2 ExcelXMLRecordReader
3 Iso2709RecordReader
4 XmlRecordReader
5 SequentialRecordReader
6 SeparatedSequentialRecordReader
7 MysqlDatabaseRecordReader
8 MarcTextRecordReader
9 OaiRecordReader

CsvRecordReader

The CsvRecordReader iterates through each line of the supplied file and returns XML with one element per column. It takes the following settings:

Enclosure: What character surrounds the value of a field in the CSV
Escape: What character escapes the following character (which is needed to escape the enclosure for example)
Delimiter: What character is used to split the line into the fields
FirstLineIsHeader: Whether the first line is a header line. If it is, its values are used as attributes in the XML output
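
As a rough sketch, the four settings map quite directly onto Python's standard `csv` module. The function name `iter_csv_records`, the element names `record` and `field`, and the `name`/`index` attributes are assumptions made for illustration; the actual XML layout produced by the CsvRecordReader may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Iterator, TextIO


def iter_csv_records(stream: TextIO, delimiter: str = ",", enclosure: str = '"',
                     escape: str = "\\",
                     first_line_is_header: bool = True) -> Iterator[ET.Element]:
    """Yield one XML element per CSV line (hypothetical output layout)."""
    reader = csv.reader(stream, delimiter=delimiter, quotechar=enclosure,
                        escapechar=escape, doublequote=False)
    # With FirstLineIsHeader set, the first line only supplies attribute values.
    header = next(reader, None) if first_line_is_header else None
    for row in reader:
        record = ET.Element("record")
        for index, value in enumerate(row):
            field = ET.SubElement(record, "field")
            if header is not None and index < len(header):
                field.set("name", header[index])
            else:
                field.set("index", str(index))
            field.text = value
        yield record


# Made-up sample: ';' as Delimiter, '"' as Enclosure, '\' as Escape.
sample = io.StringIO('name;note\nAda;"say \\"hi\\""\n')
for record in iter_csv_records(sample, delimiter=";"):
    print(ET.tostring(record, encoding="unicode"))
```

The escape character is what lets the enclosure itself appear inside a value, as in the second column of the sample line.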

ExcelXMLRecordReader

Since Office 2003, Excel has been able to save XML files. These files are read just like CSV files, but this Reader can also handle multiple tables.

FirstLineIsHeader: Whether the first line is a header line. If it is, its values are used as attributes in the XML output
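
The following sketch shows how rows from several tables could be read from an Excel 2003 XML file (SpreadsheetML). The function name `iter_excel_rows` and the sample workbook are assumptions, and the sketch ignores sparse rows that use the `ss:Index` attribute; it is not the ExcelXMLRecordReader itself.

```python
import xml.etree.ElementTree as ET

SS = "urn:schemas-microsoft-com:office:spreadsheet"  # SpreadsheetML namespace
NS = {"ss": SS}

# Made-up workbook with two worksheets, each holding one table.
SAMPLE = (
    f'<Workbook xmlns="{SS}" xmlns:ss="{SS}">'
    '<Worksheet ss:Name="Sheet1"><Table>'
    '<Row><Cell><Data ss:Type="String">title</Data></Cell></Row>'
    '<Row><Cell><Data ss:Type="String">First</Data></Cell></Row>'
    '</Table></Worksheet>'
    '<Worksheet ss:Name="Sheet2"><Table>'
    '<Row><Cell><Data ss:Type="String">title</Data></Cell></Row>'
    '<Row><Cell><Data ss:Type="String">Second</Data></Cell></Row>'
    '</Table></Worksheet>'
    '</Workbook>'
)


def iter_excel_rows(xml_text, first_line_is_header=True):
    """Yield (sheet name, row) pairs for every worksheet (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    for sheet in root.findall("ss:Worksheet", NS):
        sheet_name = sheet.get(f"{{{SS}}}Name")
        header = None
        for number, row in enumerate(sheet.findall("ss:Table/ss:Row", NS)):
            values = [cell.findtext("ss:Data", default="", namespaces=NS)
                      for cell in row.findall("ss:Cell", NS)]
            # Each table has its own header line when FirstLineIsHeader is set.
            if first_line_is_header and number == 0:
                header = values
                continue
            yield sheet_name, (dict(zip(header, values)) if header else values)


for sheet_name, row in iter_excel_rows(SAMPLE):
    print(sheet_name, row)
```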

Iso2709RecordReader

This Reader doesn't take any settings because it follows the ISO 2709 standard. It creates MARC-XML from the text file.
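
To show why no settings are needed, here is a minimal sketch of how a single ISO 2709 record can be decoded: the leader and directory describe where every field sits. It assumes the MARC 21 directory layout (3-digit tag, 4-digit length, 5-digit start position), leaves out the MARC-XML namespace, and uses a hand-built sample record. The function name `iso2709_to_marcxml` is hypothetical.

```python
import xml.etree.ElementTree as ET

SUBFIELD = b"\x1f"  # subfield delimiter; fields end with 0x1e, records with 0x1d

# Hand-built sample record: leader, two directory entries, two fields.
SAMPLE = (b"00066nam a2200049   4500"
          b"001000600000" b"245001000006" b"\x1e"
          b"12345\x1e" b"10\x1faTitle\x1e" b"\x1d")


def iso2709_to_marcxml(raw: bytes, encoding: str = "utf-8") -> ET.Element:
    """Convert one raw record to a MARC-XML-like element (namespace omitted)."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])        # base address of the data area
    directory = raw[24:base - 1]     # skip the terminator that closes the directory
    record = ET.Element("record")
    ET.SubElement(record, "leader").text = leader
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        data = raw[base + start:base + start + length - 1]  # drop field terminator
        if tag < "010":
            # Control fields (001-009) carry plain data without subfields.
            field = ET.SubElement(record, "controlfield", {"tag": tag})
            field.text = data.decode(encoding)
            continue
        parts = data.split(SUBFIELD)
        indicators = parts[0].decode(encoding)
        field = ET.SubElement(record, "datafield", {
            "tag": tag, "ind1": indicators[0:1], "ind2": indicators[1:2]})
        for part in parts[1:]:
            if part:  # skip empty chunks between doubled delimiters
                sub = ET.SubElement(field, "subfield", {"code": chr(part[0])})
                sub.text = part[1:].decode(encoding)
    return record


print(ET.tostring(iso2709_to_marcxml(SAMPLE), encoding="unicode"))
```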

XmlRecordReader

This Reader splits the XML at a given XPath. The following settings are needed:

Path: The XPath that returns each of the record nodes
Namespace: For XPath to work, you may need to specify some namespaces. Use <prefix>=<namespace> for this value. You can use this setting multiple times.
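
How the two settings work together can be sketched with Python's `xml.etree.ElementTree`, which supports a limited subset of XPath. The helper names, the `ex` prefix, and the `http://example.org/ns` namespace are made up for this example and are not taken from the application.

```python
import xml.etree.ElementTree as ET

# Made-up document; the namespace URI is a placeholder.
SAMPLE = (
    '<collection xmlns:ex="http://example.org/ns">'
    '<ex:record><ex:title>First</ex:title></ex:record>'
    '<ex:record><ex:title>Second</ex:title></ex:record>'
    '</collection>'
)


def parse_namespaces(settings):
    """Turn ['ex=http://example.org/ns'] into {'ex': 'http://example.org/ns'}."""
    namespaces = {}
    for setting in settings:
        # Split at the first '=' only, since the URI itself may contain '='.
        prefix, _, uri = setting.partition("=")
        namespaces[prefix.strip()] = uri.strip()
    return namespaces


def iter_xml_records(xml_text, path, namespace_settings=()):
    """Yield every node matched by the Path setting (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    yield from root.findall(path, parse_namespaces(namespace_settings))


settings = ["ex=http://example.org/ns"]  # the Namespace setting, used once here
for node in iter_xml_records(SAMPLE, ".//ex:record", settings):
    print(ET.tostring(node, encoding="unicode"))
```

Without the Namespace setting, the prefix `ex` in the path cannot be resolved, which is why the setting may have to be given once per prefix used in the Path.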

SequentialRecordReader

SeparatedSequentialRecordReader

MysqlDatabaseRecordReader

This reader is configured using an XML file, which is supplied as the import file.

MarcTextRecordReader

OaiRecordReader


Category: Dismarc portal

