Saturday, 14 September 2013

Python script using regex (re) to remove extra newlines

Hi guys!
I have a tab-delimited text file that may have some values containing
newlines, like this:
col1 col2 col3
row1 val1 "Some text
containing newlines. Yup, possibly
more than one... val3
row2 val4 val5 val6
Note:
The number of rows or columns may vary.
Any value may be text or a number, and may or may not contain newlines.
I am trying to write a small Python script using re in order to:
get rid of extra newlines (but preserve the original ones, i.e. at the end
of each row)
enclose every single value in double quotes
It would be great to have it in a form like this:
def normalize_format(data, delimiter='\t'):
    data = re.sub(_DESIRED_REGEX_, r'"\1"', data)
    return data
where data is the whole file contents as a single string and
_DESIRED_REGEX_ is the pattern I have not been able to figure out.
Using re is not mandatory, but a short and elegant solution would be
appreciated :)
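
One possible way to approach this, as a rough sketch rather than a definitive answer: a single re.sub with one pattern cannot easily tell an embedded newline from a row terminator, so the version below matches whole records instead. It relies on assumptions the question does not state: every record has the same number of fields, the first line is a complete record with no embedded newlines (used to infer the field count), the last field of a record never contains a newline, the delimiter is a single character, and values contain no double quotes. The num_columns parameter is a hypothetical addition to the signature for overriding the inferred count.

```python
import re


def normalize_format(data, delimiter='\t', num_columns=None):
    """Quote every field and replace newlines inside fields with a space."""
    data = data.replace('\r\n', '\n')
    if num_columns is None:
        # Assumption: the first line is a complete record.
        num_columns = data.split('\n', 1)[0].count(delimiter) + 1
    if num_columns < 2:
        raise ValueError('need at least two columns to tell rows apart')

    d = re.escape(delimiter)
    inner = '[^' + d + ']*' + d      # a field that may span lines, plus its delimiter
    last = '[^' + d + r'\n]*'        # the final field stops at the first newline
    record = re.compile(
        '((?:' + inner + '){' + str(num_columns - 1) + '}' + last + r')\n?'
    )

    rows = []
    for match in record.finditer(data):
        fields = match.group(1).split(delimiter)
        # Collapse each embedded newline (and surrounding whitespace) to one space.
        fields = [re.sub(r'\s*\n\s*', ' ', f) for f in fields]
        rows.append(delimiter.join('"' + f + '"' for f in fields))
    return '\n'.join(rows)
```

Calling normalize_format(open(path).read()) on a file shaped like the sample should give one line per record with every value quoted. Text that does not fit the assumed record shape (for example a trailing fragment with too few delimiters) is silently skipped, so this is only suitable when the column count is reliable.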
