I have a very large text file in which some entries are missing. The pattern is consistent: the first line of each “section” contains the correct entries, and every line after that initial line is missing them. I’m trying to fill in every line that lacks these entries with the information from the section’s initial line, until a new “initial information line” is found. From there on I continue with the newly found data.
I have built a solution in bash with the help of sed, but it is very, very slow and takes hours to complete. I suspect the delay comes from reading the file line by line, processing each line in bash and writing it to a new file. My guess is that a single sed script (run with -f) that remembers the current values and works through the whole file in one pass could speed up the process dramatically, but I’m not an expert in these advanced uses of sed. I am open to other suggestions or tools too, as long as they can be called from a bash script, since this is part of an automation.
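A single-pass sed script is possible by using the hold space as the "variable": each complete line saves its "first","second" prefix there, and each line whose first two fields are empty gets that prefix appended (G) and moved to the front. This is a minimal sketch, assuming that a missing line always starts with "","", (both fields empty), that field values contain no double quotes, and that the first line is a header to drop; the function name fill_down_sed is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads the file on stdin, writes the result to stdout,
# using one sed process for the whole file.
# Assumes missing lines start with "","", and values contain no double quotes.
fill_down_sed() {
    sed -E '
        # drop the header line
        1d
        # both fields missing: append the saved prefix, then move it to the front
        /^"","",/ {
            G
            s/^"",""(.*)\n(.*)$/\2\1/
            b
        }
        # complete line: keep it as-is, but save its "first","second" prefix
        h
        s/^("[^"]*","[^"]*").*/\1/
        x
    '
}
```

Usage would be fill_down_sed < inputfile > outputfile. The same commands (without the surrounding function and quotes) could also be saved to a file and run with sed -E -f, which is the -f variant mentioned above.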
The example input file:
{"Initial line with more information like headers, unimportant, really only one line"
"Alpha","OldTheme","Some more text"
"","","Another rest text"
"","","Yet another text"
"Yadda","NewTheme","Crazy Text"
"","","More crazy text"
The expected result:
"Alpha","OldTheme","Some more text"
"Alpha","OldTheme","Another rest text"
"Alpha","OldTheme","Yet another text"
"Yadda","NewTheme","Crazy Text"
"Yadda","NewTheme","More crazy text"
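The transformation from the example input to the expected result is a "fill down" of the first two fields. awk (a swapped-in tool, not something the question uses) can do this in one pass by splitting on the three-character separator "," and reassigning a field only when it is empty. This is a sketch under the assumption that "," never appears inside a field value and that the first line is the header to skip; the function name fill_down_awk is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: reads the file on stdin, writes the result to stdout.
# Splits each line on "," and assumes that separator never occurs inside a value.
fill_down_awk() {
    awk -F'","' -v OFS='","' '
        NR == 1 { next }    # skip the header line
        {
            # $1 is the opening quote plus the first value, e.g. "Alpha;
            # it is a lone quote character when the value is missing
            if ($1 != "\"") { first = $1 } else { $1 = first }
            # $2 is the bare second value, empty when missing
            if ($2 != "") { second = $2 } else { $2 = second }
            # assigning a field rebuilds the line with OFS, restoring the quotes
            print
        }'
}
```

Usage would be fill_down_awk < inputfile > outputfile. Complete lines are printed unchanged, because no field is assigned for them.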
And here’s my working (but very slow) bash script:
#!/bin/bash
# Fill empty first/second fields with the values from the last line that had them.
first=0
while IFS= read -r line; do
    # skip the header line
    if [ "${first}" -eq 0 ]; then
        first=1; continue
    fi
    # everything from the first "," separator on: ","Theme","Rest"
    partline=$(echo "${line}" | grep -o '",".*')
    newinitial=$(echo "${line}" | sed 's/",".*//; s/^"//')
    if [ -n "${newinitial}" ]; then
        initial=${newinitial}
    fi
    newtheme=$(echo "${partline}" | sed 's/^","//; s/",".*//')
    if [ -n "${newtheme}" ]; then
        theme=${newtheme}
    fi
    # everything after the theme field: ","Rest"
    restline=$(echo "${partline}" | sed 's/^","//' | grep -o '",".*')
    printf '"%s","%s"%s\n' "${initial}" "${theme}" "${restline}"
done <inputfile >outputfile
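A likely reason for the slowness is that the script above runs four command substitutions for every line, each forking a subshell plus grep and/or sed processes. The same field logic can be expressed with bash parameter expansion alone, so no extra processes are started per line. This is a sketch, assuming "," never occurs inside a field value and the first line is a header; the function name fill_down_bash is hypothetical. It should be far faster than the original, though still slower than a single sed or awk process on a very large file.

```shell
#!/bin/bash
# Hypothetical helper: reads the file on stdin, writes the result to stdout.
# Splits fields with parameter expansion instead of echo/grep/sed subshells.
# Assumes "," never occurs inside a field value.
fill_down_bash() {
    local line f1 f2 rest first="" second=""
    IFS= read -r line || return 0      # skip the header line
    while IFS= read -r line; do
        f1=${line%%'","'*}             # opening quote + first value, e.g. "Alpha
        rest=${line#*'","'}            # everything after the first separator
        f2=${rest%%'","'*}             # second value, empty when missing
        rest=${rest#*'","'}            # remaining text including the closing quote
        if [ "$f1" != '"' ]; then first=$f1; fi
        if [ -n "$f2" ]; then second=$f2; fi
        printf '%s","%s","%s\n' "$first" "$second" "$rest"
    done
}
```

Usage would be fill_down_bash < inputfile > outputfile.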