Wednesday, March 7, 2012

Different row delimiters in same connection manager

I have a situation where two CSV dmited types are read in. One file row type could be CRLF while another could be CR. Seeing how the only difference between these file types is the row delimiter, I would like to use one conn manager. Is it possible? Ideas?

Try this:
create a string variable Delimiter.
In Connection Manager window select your csv connection and look at its properties.
Open Expressions form and add an expression to the property RowDelimiter setting it to the variable User::Delimiter
Then you should be able to progamatically switch from one delimiter to another (toggling the variable value) before starting to load from the csv.
(i cannot try now this solution, take is as a hint)
|||

Unfortunately, this solution will not work. The RowDelimiter property is used only initially when no flat file connection columns are defined. After the columns get created the actual row delimiter is a column delimiter of the last column. You cannot do the same trick with the column delimiter because it is not expressionable.

Are you only concerned about the redundancy (with the two connection managers), or you had in mind more elegant data flow (with a multi flat file connection and one flat file source in a loop, perhaps)? I would like to find out more about your motivation to merge those two connection managers.

Thanks,

|||I am looking mainly to cut down the redundancy. There are 100+ inbound conns that need to support CRLF and CR returns, and it would end up being way too redundant to create a seperate manager for each return.|||

I can settle with just looking for CR for row delimiter. So, is there a way to maybe pad (or ignore) any LF's?

|||You can try that. The additional LF character would wrap up to the start of the first column of the next row. You should be able to clean it up using the Derived Column transform. However, I suspect it will make a headache with the incomplete last row, which would consist of a single LF character.

The solution I would go with is to have a Script Task before the Data Flow and preprocess files in it -- to unify the row delimiters.

Thanks,|||

Yah, I thought about doing this, however, won't this drain performance by having to essentially read/write the file twice (keeping in mind that some of these files can be rather large in size)? How would you handle the pre-processing in script?

Thanks.

|||

It will certainly have some performance impact. You should probably do some tests to verify how acceptable the cost is for the added manageability of your packages.

In the script, I would look for CR character and if the next one is not LF I would insert it. If you can be sure about the files with expected initial format ({CR}{LF}), you can skip those.

Thanks,

No comments:

Post a Comment