Ben Feist

Digitizing Apollo 17 Part 2 – Transcript Restoration, A Beginning

Posted on March 1, 2012 by Feist in Project Apollo 17

If you’re interested in this project, have a quick flip through the PDF of the technical air-to-ground mission audio transcript (the TEC transcript) to get an idea of what the source material is like. The raw PDF document (55 MB) was published courtesy of Stephen Garber (NASA HQ) and Glen Swanson (JSC). These transcripts were originally typed in 1972 by NASA typists.

“NASA Public Affairs employed legions of typists stationed in telephone booth-sized rooms whose single job was converting voice to paper. Armed with reel-to-reel tape players, electric typewriters, and reams of paper, these individuals hammered out transcripts within hours of when the astronauts first spoke the words.” – Glen E. Swanson

The PDF from NASA was digitized so that you can select text for copy/paste etc. This was probably done some time in the last decade. However, the way it was done makes the result almost entirely useless for extracting the information for digital manipulation. NASA can’t be blamed for this; it was probably the native OCR (optical character recognition) function within Adobe Acrobat that did such a poor job. Plus, I’m sure digitizing the information wasn’t the primary purpose of turning the original typewritten pages into a PDF.

The cover page of the 2500 page technical air-to-ground mission audio transcript

For example, take this excerpt of the TEC transcript:

When the content is selected and copied from the PDF it turns into this:

APOLLO 17 AIR-T0-GROUND VOICE TRANSCRIPTION
00 00 00 03 CDR Roger. The clock has started. We have yaw.
00 00 00 12 CDR Roger; tower. Yaw's complete. We're into roll, Bob.
00 00 00 17 CC Roger, Geno. Looking great. Thrust good on all
five engines.
CDR Okay, babe. It 's looking good here.
00 00 00 21 CDR Roll is complete. We are pitching.
SC Wow woozle I

It doesn’t look too bad at first, but upon closer inspection you can see that there are a few problems:

  • There are many OCR errors: the letter O is a 0 in many cases, the exclamation mark is read as an “I”, and there are spacing errors.
  • The line wrapping of the 3rd line contains a hard carriage return that puts the remainder of the line into the timecode column.
  • There is no delimiter between timecode, speaker, and verbiage other than a space, but when mixed with the hard wrapping there’s no immediately evident way to automate the separation of the information. It would be yet another huge manual effort to clean it up.
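To make the wrapping and delimiter problems concrete, here is a minimal Python sketch of a heuristic parser for the excerpt above. It is not the approach used in this project; the names `parse_transcript` and `EXCERPT` are hypothetical, and it assumes that a timecode is always four two-digit groups, that a speaker code is two to four capital letters, and that any other line is a hard-wrapped continuation of the previous utterance.

```python
import re

# Assumed layout: optional timecode (four two-digit groups), a speaker code in
# capitals, then the spoken text. These patterns are guesses from the excerpt.
TIMECODE = r"(?P<timecode>\d{2} \d{2} \d{2} \d{2})"
SPEAKER = r"(?P<speaker>[A-Z]{2,4})"
UTTERANCE = re.compile(rf"^(?:{TIMECODE} )?{SPEAKER} (?P<text>.+)$")

EXCERPT = """APOLLO 17 AIR-T0-GROUND VOICE TRANSCRIPTION
00 00 00 03 CDR Roger. The clock has started. We have yaw.
00 00 00 12 CDR Roger; tower. Yaw's complete. We're into roll, Bob.
00 00 00 17 CC Roger, Geno. Looking great. Thrust good on all
five engines.
CDR Okay, babe. It 's looking good here.
00 00 00 21 CDR Roll is complete. We are pitching.
SC Wow woozle I
"""


def parse_transcript(raw):
    """Return a list of dicts with 'timecode', 'speaker' and 'text' keys."""
    records = []
    for line in raw.splitlines():
        line = line.strip()
        if not line:
            continue
        match = UTTERANCE.match(line)
        if match:
            # 'timecode' is None when the line starts directly with a speaker.
            records.append(match.groupdict())
        elif records:
            # No speaker code at the start: treat the line as a hard-wrapped
            # continuation and glue it back onto the previous utterance.
            records[-1]["text"] += " " + line
        # Lines before the first utterance (the page title here) are dropped.
    return records


if __name__ == "__main__":
    for record in parse_transcript(EXCERPT):
        print(record["timecode"], record["speaker"], record["text"], sep=" | ")
```

The heuristic handles this excerpt, but it is fragile in the ways described above: a wrapped line that happens to begin with a short capitalized word would be mistaken for a new speaker, a page title in the middle of the text would be glued onto the preceding utterance, and an OCR error inside a timecode (an O for a 0, say) would stop the timecode from being recognized at all.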

You’ll also notice that the 4th utterance (“Okay, babe.”) has no timecode. In fact, throughout the Apollo 17 transcripts there are thousands of these missing timecodes. This appears to be an issue unique to the Apollo 17 transcripts. One person suggested that NASA may have gotten lazy, knowing that Apollo 17 was to be the last flight and that there would be no next flight to apply its lessons to. Whatever the reason, it makes the resulting restoration effort even more difficult.
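A rough way to gauge the size of the missing-timecode problem is to flag every line that starts with a speaker code but no timecode. The Python sketch below does that under the same guessed layout as the excerpt (four two-digit groups for a timecode, two to four capital letters for a speaker); `find_untimed` is a hypothetical helper, not part of this project. It also reports the last timecode seen, which is only a lower bound on when the line was spoken, assuming the transcript is in chronological order.

```python
import re

# Guessed patterns, based only on the excerpt shown earlier.
TIMECODE = re.compile(r"^\d{2} \d{2} \d{2} \d{2} ")
SPEAKER = re.compile(r"^[A-Z]{2,4} ")


def find_untimed(lines):
    """Return (line_number, last_seen_timecode, line) for untimed utterances."""
    untimed = []
    last_seen = None
    for number, line in enumerate(lines, start=1):
        timecode = TIMECODE.match(line)
        if timecode:
            last_seen = timecode.group().strip()
        elif SPEAKER.match(line):
            # Starts with a speaker code but has no timecode in front of it.
            untimed.append((number, last_seen, line))
        # Anything else (wrapped text, page titles) is ignored here.
    return untimed


if __name__ == "__main__":
    sample = [
        "00 00 00 17 CC Roger, Geno. Looking great. Thrust good on all",
        "five engines.",
        "CDR Okay, babe. It 's looking good here.",
        "00 00 00 21 CDR Roll is complete. We are pitching.",
        "SC Wow woozle I",
    ]
    for number, last_seen, line in find_untimed(sample):
        print(f"line {number}: no timecode (at or after {last_seen}): {line}")
```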

The Internet Archive includes high-resolution JP2 (JPEG 2000) images of the transcripts. JP2 is possibly the least helpful image format ever invented. It’s not widely supported and in my opinion is a dubious choice for archiving content. I wrote a batch job in Photoshop to convert all 2,460 JP2 pages to PNG format. PNG is a lossless format, so the conversion itself discards nothing from the decoded JP2 images. This folder of PNGs will serve as the input to the next step: re-OCRing every page of the Apollo 17 TEC transcript.
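The conversion here was done with a Photoshop batch job. For readers without Photoshop, the same kind of batch conversion could be sketched in Python with the third-party Pillow library, assuming a Pillow build that includes JPEG 2000 (OpenJPEG) support. The function names and the flat folder layout are assumptions made for illustration.

```python
from pathlib import Path


def plan_conversions(src_dir, dst_dir):
    """Pair every .jp2 file in src_dir with a same-named .png path in dst_dir."""
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    return [
        (src, dst_dir / (src.stem + ".png"))
        for src in sorted(src_dir.glob("*.jp2"))
    ]


def convert_all(src_dir, dst_dir):
    """Convert each planned pair to PNG and return the number of pages written."""
    # Imported here so the planning step works even without Pillow installed.
    from PIL import Image  # third-party: pip install Pillow

    Path(dst_dir).mkdir(parents=True, exist_ok=True)
    pairs = plan_conversions(src_dir, dst_dir)
    for src, dst in pairs:
        with Image.open(src) as page:
            # PNG is lossless, so the decoded pixels are stored unchanged.
            page.save(dst, format="PNG")
    return len(pairs)
```

Calling `convert_all("jp2_pages", "png_pages")` (both folder names are placeholders) would write one PNG per JP2 page. Splitting the planning from the conversion makes it easy to check the file list before committing to a long batch run.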

