#1 2010-11-04 12:16:25

archenroot
Member
169 posts

[resolved] Split File with more than one data block into more files

Hello,

I have a file with the following content:
A01
F0301
R08
A35
R56
B879
V65656
A75
Z54254
E878987

I need to create the following files based on a condition (first character of the line == "A"), saving that line and everything below it until the next line that starts with "A", so the output files will look like this:
File 01.txt>
A01
F0301
R08

File 35.txt>
A35
R56
B879
V65656

File 75.txt>
A75
Z54254
E878987

The first character defines the TYPE of the row, and each row type is fixed-length, so I wanted to use tFileInputMSPositional, but I spent some time on it and could not get it to work the way I need.

Thank you very much for your help.

Best regards,

archenroot

Last edited by archenroot (2010-11-10 14:19:02)


Emperor wants to control outer space Yoda wants to explore inner space that's the fundamental difference between good and bad sides of the Force


#2 2010-11-04 15:07:56

tchd
Member
79 posts

Re: [resolved] Split File with more than one data block into more files

Hi archenroot,

This is a really interesting problem, so I thought I would give it a try.  Here's my solution:

- Read the full line, but change the line delimiter to "A". This collects a group of rows into a single field (including the newlines).
- Using tJavaRow, create an output flow containing the fixed line (with the "A" added back in) and the file number:
               - Add the "A" back to the front of the line
               - Remove the trailing "\n"
               - Take the first two characters from the front of the input line (they become the file name)
- I then use tFlowToIterate to create an iteration for each "line"
- tRowGenerator grabs the line from the globalMap, i.e. ((String)globalMap.get("row3.line"))
- Finally, I use the global file number in the file name of a tFileOutputDelimited
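
Outside Talend, the string handling in the steps above can be sketched in plain Java. This is only an illustration: the class and method names (SplitOnMarker, split) are made up, and it assumes "\n" line endings, at least two characters after each "A", and that "A" never occurs anywhere except at the start of a block.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SplitOnMarker {
    // Hypothetical helper: splits the content on every "A" and returns
    // file number -> block text, in order of appearance.
    static Map<String, String> split(String content) {
        Map<String, String> blocks = new LinkedHashMap<>();
        for (String chunk : content.split("A")) {
            if (chunk.isEmpty()) {
                continue; // the text before the first "A" is empty
            }
            // Add the "A" back to the front of the group
            String block = "A" + chunk;
            // Remove a trailing "\n", if there is one
            if (block.endsWith("\n")) {
                block = block.substring(0, block.length() - 1);
            }
            // The two characters after the "A" give the file number
            blocks.put(block.substring(1, 3), block);
        }
        return blocks;
    }
}
```

For the sample file from the first post this yields three entries with the keys 01, 35 and 75. Any "A" inside a data row would also be treated as a separator, so the sketch only fits data where that cannot happen.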

Pictures below

Regards,
Rick

Last edited by tchd (2010-11-04 15:09:31)


#3 2010-11-05 11:39:20

archenroot
Member
169 posts

Re: [resolved] Split File with more than one data block into more files

Hello TCHD,

Thank you very much for a really nice, working scenario. There is one exception, though, which I didn't mention the first time.

The problem is that the row descriptor (delimiter) character, "A" in this case, though it could be anything, can also appear inside the content of a data block. I made a mistake when I created the sample data. To be more precise, the data could look like this:
A01
F030A
RA8
A35
R5A
B879
V6565A
A75
Z54254

This makes it difficult to work with again. I will try to come up with a solution and then post it with screenshots. If you have any idea how to solve this, please let me know.

Thank you very much again.

Best regards,

archenroot



#4 2010-11-05 12:09:35

tchd
Member
79 posts

Re: [resolved] Split File with more than one data block into more files

Hi Archenroot

My first thought is that you could preprocess the file, replacing every "A" in the first position of a row with a character (or string) that doesn't exist in the data, e.g. $$, and then use that as the delimiter in the tFileInputFullRow.

It does mean that you will have to make a couple of passes over the file, so it depends on how performant you need it to be.
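
As a rough illustration of the preprocessing idea in plain Java (not a Talend component), a multiline regular expression can replace only an "A" that sits at the start of a line. The names LineStartMarker, mark and blocks are made up for this sketch, and it assumes that "$$" never occurs in the data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineStartMarker {
    // Assumed marker: must be a string that never occurs in the data
    static final String MARKER = "$$";

    // Pass 1: replace an "A" at the start of any line with the marker.
    // (?m) makes ^ match after every line terminator, not only at the
    // start of the input; quoteReplacement is needed because "$" is
    // special in replacement strings.
    static String mark(String content) {
        return content.replaceAll("(?m)^A", Matcher.quoteReplacement(MARKER));
    }

    // Pass 2: split on the marker and put the "A" back on each block.
    static List<String> blocks(String content) {
        List<String> result = new ArrayList<>();
        for (String chunk : mark(content).split(Pattern.quote(MARKER))) {
            if (!chunk.isEmpty()) {
                result.add("A" + chunk);
            }
        }
        return result;
    }
}
```

An "A" inside a row (as in F030A or RA8) is left alone, and a file with a single block simply comes back as one block.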

Regards,
Rick


#5 2010-11-05 12:19:58

archenroot
Member
169 posts

Re: [resolved] Split File with more than one data block into more files

Hm, good idea. There is plenty of performance to spare, so I will prepare the scenario. Thanks a lot.

Best regards,

archenroot



#6 2010-11-05 12:21:58

janhess
Member
1305 posts

Re: [resolved] Split File with more than one data block into more files

How about adding a character such as '|' before the 'A' at the start of a line, writing the result to a file, and then processing that file using '|A' as the row separator? That way you can read in a whole group of data and extract the first Ann field using the newline as the string delimiter.

Oops - didn't see the above posts while I was creating this.

Last edited by janhess (2010-11-05 12:23:22)


#7 2010-11-05 12:43:43

tchd
Member
79 posts

Re: [resolved] Split File with more than one data block into more files

Hi Archenroot,

Another option is to use "\nA" as the delimiter rather than simply using "A".  It's certainly more exact.

This does require changes to tJavaRow because the first "A" record would not have the \n before it, so wouldn't be removed.

Here's the amended code:

// Add the "A" back to the start of the group, unless this is the first
// group, which still starts with its own "A"
String outline = input_row.line.startsWith("A")
        ? input_row.line
        : "A" + input_row.line;

// Remove any trailing line terminator ("\n" or "\r"), if present
while (outline.endsWith("\n") || outline.endsWith("\r")) {
    outline = outline.substring(0, outline.length() - 1);
}
output_row.line = outline;

// Extract the file number (the two characters after the "A")
output_row.file_number = outline.substring(1, 3);

Regards,
Rick

Last edited by tchd (2010-11-05 12:44:47)


#8 2010-11-10 12:24:37

archenroot
Member
169 posts

Re: [resolved] Split File with more than one data block into more files

Hi,

Well, this is a nice solution, but in my case I have to preprocess the files, replacing the "A" character with some custom delimiter, and then read the files again. That is because some of the input files contain only one "A" type record, and in that case "\nA" does not work as a row delimiter.

I will post the whole solution with screenshots later today.

Anyway, it is a very nice solution when there is always more than one "A" data record in each file.

Best regards,

archenroot

Last edited by archenroot (2010-11-10 12:47:53)



#9 2010-11-10 14:07:48

archenroot
Member
169 posts

Re: [resolved] Split File with more than one data block into more files

So I have finally got it done, and it works just fine.
An enhancement of the whole scenario is that it looks for the files from Linux on a Windows share.

It can now process files like the sample below, which has the same structure as the one at the beginning of this discussion, where "A" as the first character of a line (or any other character) marks the beginning of a data block:
A654654321asdfasdf654    - datablock 1
VAVAADFSAFFSDFSDF       - datablock 1
A654654654654654A         - datablock 2
SDFSDFDSFSdf                 - datablock 2

It also handles the simplest possible case, where there is only one data block in the file:
A654654321asdfasdf654
VAVAADFSAFFSDFSDF

And that's it. Thank you very much to all of you who discussed this issue with me.

Best regards,

archenroot :-)

Last edited by archenroot (2010-11-10 14:13:26)



#10 2010-11-10 18:26:47

tchd
Member
79 posts

Re: [resolved] Split File with more than one data block into more files

Hi Archenroot,

Glad to help. 

I've been thinking about the "twin pass of the input file" problem with this job and wondered whether I could do it with a more Java-oriented approach that performs only a single pass (I've seen something similar in other tools). My rough(!!) Windows prototype seems to work. If you find that your current solution does not perform well enough, perhaps this might be useful.

So the flow is now tFileInputFullRow --> tJavaRow --> tFileOutputDelimited.

Below is the code for a tJavaRow that performs the same process. It will need error checking, non-fixed file paths, potentially an O/S check for the command, etc. to make it truly usable.

It also creates a dummy file, since the tFileOutputDelimited requires a file name (which I set to dummy.txt).


String NewFile="";

//if row = A then:
if (input_row.line.substring(0,1).equals("A")) {

    //    close the current file (if this is the first row then it closes the dummy file)
    outtFileOutputDelimited_1.close();
   
    //    Find the new file name from the row
    NewFile=input_row.line.substring(1);
   
    //    Create a new file
    Process p=Runtime.getRuntime().exec("cmd /c copy /y nul c:\\tInput\\"+NewFile+".txt");
    p.waitFor();

    //    Reallocate the talend file objects
    outtFileOutputDelimited_1 = new java.io.BufferedWriter(
                        new java.io.OutputStreamWriter(
                        new java.io.FileOutputStream(
                        "C:/tInput/"+NewFile+".txt", false),"ISO-8859-15"));
    filetFileOutputDelimited_1 = new java.io.File("C:/tInput/"+NewFile+".txt");   

}

// Copy row from input to output as normal
output_row.line=input_row.line;
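
For comparison, the same single-pass idea can be sketched in plain Java outside Talend. This is only an illustration under assumptions, not the job above: the class and method names (SinglePassSplitter, split) are made up, the file name is taken from the two characters after the "A" as in the first post, and rows before the first "A" row are skipped.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

class SinglePassSplitter {
    // Hypothetical helper: reads `input` once and writes each block to
    // <outDir>/<NN>.txt, where NN is the two characters after the "A".
    static void split(Path input, Path outDir, Charset charset) throws IOException {
        BufferedWriter out = null;
        try (BufferedReader in = Files.newBufferedReader(input, charset)) {
            String line;
            while ((line = in.readLine()) != null) {
                // A row starting with "A" (and long enough to carry a
                // file number) closes the current file and opens a new one
                if (line.startsWith("A") && line.length() >= 3) {
                    if (out != null) {
                        out.close();
                    }
                    Path target = outDir.resolve(line.substring(1, 3) + ".txt");
                    out = Files.newBufferedWriter(target, charset);
                }
                // Copy the row to the current file, if one is open
                if (out != null) {
                    out.write(line);
                    out.newLine();
                }
            }
        } finally {
            if (out != null) {
                out.close();
            }
        }
    }
}
```

A call could look like SinglePassSplitter.split(Paths.get("C:/tInput/input.txt"), Paths.get("C:/tInput"), Charset.forName("ISO-8859-15")), where input.txt is a hypothetical file name. Opening the writer creates the file, so no separate command is needed for that, and two blocks with the same number would overwrite each other.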

Regards,
Rick

