Now that the TEC and PAO transcript data is in pipe-delimited CSV format, I can start using batch cleansing techniques to turn the raw OCR output (CSV) into a second-phase cleaned CSV. Every one of these processes is automated, with no manual intervention. Once again, this is deliberate: keeping the entire chain of steps automated, from the original OCR through to the cleaned output CSV, means I can go back and change one of my earlier OCR settings (or any other earlier setting) and simply re-run everything. Any manual changes to the data would be wiped out as soon as one of the earlier steps was re-exported.
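As an illustration only, here is a minimal sketch of what one fully automated cleansing pass over a pipe-delimited CSV could look like. It assumes there is no header row and uses a single hypothetical rule (trim each field and collapse runs of whitespace, then drop rows that end up empty); the function names `clean_field`, `clean_rows` and `clean_csv` are made up for this example and are not the actual rules or code used on the transcripts.

```python
import csv
import io
import re
from typing import Iterable, List, TextIO

# Hypothetical cleansing rule: any run of whitespace (spaces, tabs, newlines)
# is treated as OCR noise and collapsed to a single space.
_MULTISPACE = re.compile(r"\s+")


def clean_field(value: str) -> str:
    """Collapse internal whitespace runs to one space and trim both ends."""
    return _MULTISPACE.sub(" ", value).strip()


def clean_rows(rows: Iterable[List[str]]) -> List[List[str]]:
    """Clean every field, then drop rows whose fields are all empty."""
    cleaned = []
    for row in rows:
        fields = [clean_field(f) for f in row]
        if any(fields):  # keep the row only if at least one field has text
            cleaned.append(fields)
    return cleaned


def clean_csv(src: TextIO, dst: TextIO) -> int:
    """Read pipe-delimited CSV from src and write the cleaned rows to dst.

    No manual edits happen here, so the pass can simply be re-run whenever
    an earlier OCR or export step changes. Returns the number of rows written.
    """
    reader = csv.reader(src, delimiter="|")
    writer = csv.writer(dst, delimiter="|", lineterminator="\n")
    rows = clean_rows(reader)
    writer.writerows(rows)
    return len(rows)


if __name__ == "__main__":
    # Hypothetical sample input: one noisy row and one row that is blank after cleaning.
    raw = io.StringIO("00 00 01 00|CDR|  Sample   text. \n | | \n")
    out = io.StringIO()
    print(clean_csv(raw, out))
    print(out.getvalue(), end="")
```

Because each rule is a pure function of its input, running the pass twice gives the same result as running it once, which is what makes it safe to regenerate the cleaned CSV from scratch after any upstream change.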
Monthly Archives: April 2012
Digitizing Apollo 17 Part 4 – Technical vs Public Affairs Office
There are two different air-to-ground transcripts associated with the Apollo 17 mission: the TEC (Technical Air-to-Ground) transcript and the PAO (Public Affairs Office) transcript. Both were transcribed separately to typewritten pages in 1972, even though their content overlaps by 80%.