As an experiment, I recently recorded a lunch and learn session at work. The other R&D offices post theirs online, so I thought I’d do the same. On my work machine, my choice of tools is limited. As I researched the subject, I learned some good techniques and wanted to share.
Our office has a GoPro, but its microphone doesn’t record sound well, so I brought in my decade-old SD camcorder. The raw footage formed a “minimal viable product”. I pointed the camera at the slideshow, made sure the speaker Paul was in the frame, and recorded for 50 minutes. The slides are legible, and you can make out what Paul is saying.
Aside from compressing it, I felt there were a couple of small things I could do to enhance the quality of the video....
In retrospect, I wish I had a fill light on Paul. The white slides washed him out, and the age of the camera meant that the camera couldn’t cope. The colors were washed out, and Paul is a shadowy figure in the corner of the frame. The camcorder uses a proprietary .MOD file format and the raw footage was 1.8 GB, too bulky to share.
The projector made a noticeable whirring that ran throughout his talk. And, I wanted to preface the video with a title screen.
My work computer is a Mac, but I don’t have carte blanche to install anything I want. But I can install programs, so I downloaded all the tools I needed via Homebrew. I used FFmpeg for video, SoX for audio, and ImageMagick for pictures.
The first thing I needed to do was discover what a .MOD file is. I learned that a .MOD file is just a camcorder manufacturer specific MPEG-2 container. I was able to just rename it to an .mpg extension and proceed.
Step 1: Clean audio noise
I found a site that outlines how to use SoX to eliminate noise. I customized this procedure for my workflow.
First, I split the audio and video streams into 2 separate files:
- The video stream:
ffmpeg -i input.mp4 -qscale 0 -an tmpvid.mp4
- The audio stream:
ffmpeg -i input.mp4 -qscale 0 tmpaud.wav
I started operating on the audio stream using SoX. The goal was to filter out
the background noise. In addition to the sox
command itself, SoX distributes
some helper scripts like play
, which assumes some command-line arguments. I
used play tmpaud.wav
to listen and find a section of background noise.
As I narrowed in on a section of silence in the audio, I used a different form
on the command to make sure that I was only getting silence, play tmpaud.wav
trim 96.0 1.5
. The first number is the playback start position, and the second
number is the duration, both given in seconds.
Then, I used FFmpeg to extract that portion of the audio, where -ss
is the
time offset from beginning in (hh:mm:ss.ms) format, and -t
is the duration in
seconds.
Once I had a sample of the background noise, I could use SoX to create a “profile”, via:
In this case, the profile will be used to filter out the noise, like so:
The tutorial page suggests a sensitivity threshold in a range of .21-.3, but I found that even .21 introduced some attenuation. However, .15 left enough of Paul’s voice that he was intelligible when he was talking quietly.
This step makes a huge difference in watchability.
Each time I generated a new tmpaud-clean.wav
file, I listened to it with
play tmpaud-clean.wav
. When I was satisfied with the results, I used FFmpeg
to merge (remux) the two streams back together:
The q:a
and q:v
arguments tell FFmpeg to preserve the quality settings of
the input streams.
Step 2: Create a title banner video
I created the image using PowerPoint, which I already had installed. I used a
branded title page to put the title of Paul’s talk and his contact information.
I exported that slide to a JPEG of size 720x405, title.jpg
.
In order to combine this with my presentation, I need to convert the still picture into a video file that contains a video track and a blank audio track.
The first task was to create an audio file with X seconds of silence. The
easiest way I found is to have SoX generate the silence, and then use a script
to put it in a WAV file format. I found a person who wrote a Perl script to
generate silence. It’s called like
this: perl silence.pl 5 silence.wav
to yield 5 seconds of silence. His script
is in the public domain, so I’m reproducing the contents of silence.pl here:
I ran into a snag with the next step. As I said, PowerPoint exports in 720x405,
whereas the camcorder records in 720x404. That one extra line caused me a
headache because of a known bug in the H.264 encoder, producing the error
“height not divisible by 2”. To fix this, I found out I could add this argument
to any FFmpeg command: -vf 'scale=trunc(iw/2)*2:trunc(ih/2)*2'
. In a nutshell,
this calculation reads the height and width from the input file and rounds them
to the nearest even value.
Because of the H.264 bug mentioned above, I used ImageMagick to resize my slide export to a height with an even number:
Then, I combined the video and audio streams into a video file with FFmpeg:
For this command to work, the silence.wav
file needs to be at least as long as
the loop time (specified here by ‘-t 5’).
Step 3: Put these two videos that are different sizes together
This step one took a while to figure out. Once I used convert to make the file sizes the same as shown above, I still couldn’t proceed because of this recurring error:
Input link in1:v0 parameters (size 718x404, SAR 3232:3231) do not match the corresponding output link in0:v0 parameters (718x404, SAR 1:1)
It turns out that for concat to work, the videos not only need to be the same size, but also have the same values for a couple of technical parameters like screen aspect ration (SAR). After some experimentation, I learned to fixed the size and SAR of the second video on the fly to equal the first with the following command:
Let me unpack that filter_complex
argument. Sections are semicolon delimited.
The first section takes the second source’s first (and only) video segment
1:v:0
, scales it, and sets its SAR, naming the resulting stream “1v”
(which I named arbitrarily).
Stream 1v gets piped to the next operation, which does the concatenation. I specify source 0, then source 1, then say I’m going to concatenate them (2 sources into 1 video, 1 audio). Since this is a pipe, I need to terminate the pipe. This is handled by the map operations, which name the results ‘v’ and ‘a’ (also arbitrary names).
Step 4: Reduce file size
The resulting file was still really large, so I chose to compress it.
The “slow” preset is a good compromise between speed and quality. The result was about 10% the size of the raw MPEG2 file.
Step 5: Change file modification time
This one is the easiest of all.
This will make the file’s modification time May 13, 2016 at 11:00am. And, with that, I distributed the talk to my colleagues in other offices!