Talend Exchange is where the Talend community shares items related to Talend open source products such as Data Integration, Data Quality, and Master Data Management. Contribution is open to any user; no specific validation is needed. As soon as you have a forum account, you automatically get a Talend Exchange account.
About: tTikaExtractor uses the Apache Tika parser to extract information from many different formats (HTML, PDF, DOC, ODT, image, audio, video, ...). See http://tika.apache.org/1.0/formats.html for more information about the available parsers.
Revision 0.1, 292 downloads, released on 2012-01-25
Compatible with: 5.4.0, 5.3.0, 5.2.3, 5.0.0
In the attached image you use a tRowGenerator after parsing the Word doc. What is the setup of the tRowGenerator and the other components?
Need more insight to work out how to use this.
Very useful indeed.
But the Apache Tika project is currently at version 1.5, while the current tTikaExtractor still uses 1.0.
I upgraded it manually...