Now that the TEC and PAO transcript data is in pipe-delimited CSV format, I can start using batch cleansing techniques to turn the raw OCR output (CSV) into second-phase cleaned CSV. These processes are all automated, with no manual intervention. Once again, I did this on purpose to keep the whole chain of steps automated, from the original OCR steps through to the cleaned output CSV, in case I ever need to go back and change one of my earlier OCR settings. Any manual changes to the data would be wiped out by re-exporting from any of the earlier steps.
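
To make the idea of an automated, repeatable cleansing pass concrete, here is a minimal sketch in Python. It is not the actual tooling used for this project: the function names (`clean_field`, `clean_rows`), the three-column sample layout, and the single whitespace rule are all assumptions chosen for illustration. The point is only that every fix lives in code, so the pass can be re-run whenever the OCR export is regenerated.

```python
import csv
import io
import re


def clean_field(value: str) -> str:
    """Hypothetical cleanup rule: collapse runs of whitespace and trim the ends."""
    return re.sub(r"\s+", " ", value).strip()


def clean_rows(src, dst) -> int:
    """Read pipe-delimited rows from src, write cleaned rows to dst.

    Returns the number of rows processed. No row is edited by hand, so the
    whole pass can be repeated after any upstream re-export.
    """
    reader = csv.reader(src, delimiter="|")
    writer = csv.writer(dst, delimiter="|", lineterminator="\n")
    count = 0
    for row in reader:
        writer.writerow([clean_field(field) for field in row])
        count += 1
    return count


# Made-up sample rows (page|line|text); the real column layout is an assumption.
raw = io.StringIO("1| 12 |  Q.   Where were   you? \n1|13|A.  At home.\n")
cleaned = io.StringIO()
row_count = clean_rows(raw, cleaned)
print(cleaned.getvalue(), end="")
```

In practice `src` and `dst` would be file handles for the raw and second-phase CSV files, and further rules would be added as extra functions rather than as manual edits to the data.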