
Tuesday, January 5, 2021

Crossfade Transition Between Two Videos with ffmpeg

You may have noticed the video effect where one scene slowly fades out while the next fades in, so that they seem to grow into each other. The open-source tool ffmpeg can generate such crossfade transitions (I used version 4.2.4 for Linux). Creating such a transition is slow, because ffmpeg works on pixel level and therefore needs to demux and re-mux everything. Note that a new filter called xfade, which simplifies this, is available from version 4.3 on.
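For reference, with the xfade filter the whole transition can be expressed in one line. This is only a sketch, not what the script below does: the input names in1.MP4/in2.MP4 and the offset of 3 seconds (duration of the first video minus the fade time) are hypothetical, and the command is merely echoed here, not executed:

```shell
# Sketch for ffmpeg >= 4.3 (not executed); in1.MP4, in2.MP4 and offset=3 are hypothetical
xfadeFilter="xfade=transition=fade:duration=2:offset=3"
echo "ffmpeg -i in1.MP4 -i in2.MP4 -filter_complex \"$xfadeFilter\" out.MP4"
# prints: ffmpeg -i in1.MP4 -i in2.MP4 -filter_complex "xfade=transition=fade:duration=2:offset=3" out.MP4
```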

In this blog post I will present a UNIX shell script that joins two videos with a crossfade transition, and I will try to explain the -filter_complex language a little. ffmpeg is not a graphical video editor; you need to cope with complex command lines.

Transition Nature

A crossfade transition makes the result video shorter than the sum of the two parts, because the fade-out of the first video overlaps the fade-in of the second. So if both videos last 5 seconds and the transition is set to 2 seconds, the result will be 5 + 5 - 2 = 8 seconds long, not 10.
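The resulting length can be checked with a one-line calculation (values from the example above):

```shell
# durations of the two clips and of the fade, in seconds
dur1=5
dur2=5
fade=2
echo "$dur1 $dur2 $fade" | awk '{ print $1 + $2 - $3 }'
# prints 8
```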

Shell Script

Below is the complete source code of my shell script that joins two videos with a crossfade transition.

# default definitions

fadeSeconds=2    # number of seconds the transition will last
outputVideo=output.MP4    # file path of the result video

syntax() {
    echo "SYNTAX: $0 firstVideo.MP4 secondVideo.MP4 [outputVideoFilename [fadeSeconds]]" >&2
    echo "    Concatenates firstVideo.MP4 and secondVideo.MP4" >&2
    echo "    with a crossfade-transition of $fadeSeconds seconds to file $outputVideo" >&2
    exit 1
}

# evaluate command arguments

[ -z "$1" -o ! -f "$1" -o -z "$2" -o ! -f "$2" ] && {
    echo "ERROR: make sure both input files exist: first=$1, second=$2" >&2
    syntax
}
inputVideo1=$1
inputVideo2=$2

[ -n "$3" ] && outputVideo=$3

[ -n "$4" ] && {    # test for number
    case $4 in
    \.*|*[!0-9\.]*|*\.*\.*)    # leading dot, non-digit character, or more than one dot
        echo "ERROR: not a number of seconds: $4" >&2
        syntax
        ;;
    *)    # is number with optional dot
        fadeSeconds=$4
        ;;
    esac
}

# analyze input videos

streamProperty()    {    # $1 = property to display, $2 = stream identifier, $3 = video file
    ffprobe -v error -select_streams "$2" -show_entries stream="$1" -of default=noprint_wrappers=1:nokey=1 "$3"
}

pixelFormat1=$(streamProperty pix_fmt v:0 "$inputVideo1")
duration1=$(streamProperty duration v:0 "$inputVideo1")
width1=$(streamProperty width v:0 "$inputVideo1")
height1=$(streamProperty height v:0 "$inputVideo1")
widthXHeight1=${width1}x${height1}

pixelFormat2=$(streamProperty pix_fmt v:0 "$inputVideo2")
duration2=$(streamProperty duration v:0 "$inputVideo2")
width2=$(streamProperty width v:0 "$inputVideo2")
height2=$(streamProperty height v:0 "$inputVideo2")
widthXHeight2=${width2}x${height2}

# exit if fade time is bigger than one of the video durations
validateFadeSeconds()    {
    echo "$1" | awk -v fade="$fadeSeconds" '{ if ($1 > fade) print "true"; else print "false" }'
}
[ "$(validateFadeSeconds "$duration1")" != "true" -o "$(validateFadeSeconds "$duration2")" != "true" ] && {
    echo "Both video durations ($duration1, $duration2) must be bigger than the fade time ($fadeSeconds)!" >&2
    exit 2
}

# exit if videos are of different format
[ "$pixelFormat1" != "$pixelFormat2" -o "$widthXHeight1" != "$widthXHeight2" ] && {
    echo "Videos can not be combined, pixelformats: $pixelFormat1 - $pixelFormat2, dimensions: $widthXHeight1 - $widthXHeight2" >&2
    exit 3
}
echo "widthXHeight = $widthXHeight1, pixelFormat = $pixelFormat1"

# calculate the time when fade-out of first video starts
startFadeOut=$(echo "$duration1" | awk -v fade="$fadeSeconds" '{ print $1 - fade }')
echo "fadeSeconds = $fadeSeconds, duration1 = $duration1, startFadeOut = $startFadeOut, duration2 = $duration2"

# join videos
ffmpeg -v error -y \
    -i "$inputVideo1" -i "$inputVideo2" \
    -filter_complex "\
        [0:v] fade=t=out:st=$startFadeOut:d=$fadeSeconds:alpha=1 [video1];\
        [1:v] fade=t=in: st=0:            d=$fadeSeconds:alpha=1, setpts=PTS-STARTPTS+$startFadeOut/TB [video2];\
        [video1][video2] overlay [resultVideo];\
        [0:a][1:a] acrossfade=d=$fadeSeconds:overlap=1 [resultAudio]" \
    -map "[resultVideo]" \
    -map "[resultAudio]" \
    "$outputVideo" || exit $?
    
echo "Created file $outputVideo"

I won't explain the whole script, since most of it is just argument checking to prevent usage errors, as any robust script needs. Instead I want to focus on the ffmpeg -filter_complex call that creates the transition, and on how to read the contained filtergraph. It starts on line 75 of the script.

ffmpeg Command

Here is the part I want to document. The lines of the command are joined by trailing backslashes ("\"), the newline escape of the UNIX shell. The line breaks keep the ffmpeg command readable, as do the blanks inside the -filter_complex filtergraph. When executed, the whole command is expanded to a single line, with the $variables substituted into it.

75  ffmpeg -v error -y \
76      -i "$inputVideo1" -i "$inputVideo2" \
77      -filter_complex "\
78          [0:v] fade=t=out:st=$startFadeOut:d=$fadeSeconds:alpha=1 [video1];\
79          [1:v] fade=t=in: st=0:            d=$fadeSeconds:alpha=1, setpts=PTS-STARTPTS+$startFadeOut/TB [video2];\
80          [video1][video2] overlay [resultVideo];\
81          [0:a][1:a] acrossfade=d=$fadeSeconds:overlap=1 [resultAudio]" \
82      -map "[resultVideo]" \
83      -map "[resultAudio]" \
84      "$outputVideo" || exit $?

Line 75 starts ffmpeg with some common options: -v error reduces the logging output to errors, and -y makes ffmpeg overwrite an existing output file without interactive confirmation.

Line 76: the -i options list the two input video files.

Line 77 opens the filtergraph specification with the -filter_complex option. The graph is enclosed in double quotes because it may contain shell metacharacters; nevertheless the shell still expands $variables inside double quotes.
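A quick way to see what ffmpeg will actually receive is to echo the quoted filtergraph. With hypothetical values standing in for the script's computed ones, the shell expands the $variables and leaves everything else untouched:

```shell
# Hypothetical values standing in for the script's computed ones
startFadeOut=3
fadeSeconds=2
echo "[0:v] fade=t=out:st=$startFadeOut:d=$fadeSeconds:alpha=1 [video1]"
# prints: [0:v] fade=t=out:st=3:d=2:alpha=1 [video1]
```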

Line 78 references the first video stream inside $inputVideo1 as [0:v]. Read [0:v] as "stream of type video (v) from the first (0) input file"; this is called a stream specifier. The stream is pushed into a filter called "fade"; its parameters follow after the "=":
The "t" parameter is the fade's "type", in this case a fade-out.
The "st" parameter gives the "start time" at which the filter is applied.
The "d" parameter gives the "duration" of the fade-out.
The "alpha" parameter "1" restricts the fade to the transparency (alpha) channel, so the frames become transparent instead of black.
Finally the filtered stream is labeled [video1], an arbitrary internal name.

Line 79 references the first video stream inside $inputVideo2 as [1:v]. It does the same as line 78, but sets the fade type to "in", starting at the beginning ("0"). After the ",", the "setpts" filter takes over. It shifts the timestamps of all frames so that the second video starts exactly when the fade-out of the first begins.
Inside this calculation, "PTS" is the presentation timestamp of the current frame, "STARTPTS" is the presentation timestamp of the video's first frame, and "TB" is the time base of the timestamps (for example 1/90000 second); dividing by it converts the shift from seconds into timestamp units.
The resulting stream is named [video2].
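The division by TB can be made concrete with a small calculation. Assuming a time base of 1/90000 (a common value; the actual value depends on the container), a shift of 3 seconds becomes 3 / (1/90000) = 270000 timestamp units:

```shell
# Assumed time base 1/90000; a 3-second shift expressed in timestamp units
startFadeOut=3
tbDen=90000
echo "$startFadeOut $tbDen" | awk '{ printf "%d\n", $1 * $2 }'
# prints 270000
```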

Line 80 takes the streams [video1] and [video2] and combines them with the "overlay" filter. The result is labeled [resultVideo]. The video part is now ready to be mapped into the output file, but the audio is still missing.

Line 81 references the first audio stream inside $inputVideo1 as [0:a] and the one inside $inputVideo2 as [1:a]. It joins them using the "acrossfade" filter ("a" stands for audio).
The duration of the crossfade is given by the "d" parameter.
The "overlap" (or "o") parameter makes the two streams overlap. It is not strictly necessary, because overlapping is the default, but defaults sometimes change.
The audio result is labeled [resultAudio].

Lines 82 and 83 map the streams labeled [resultVideo] and [resultAudio] into the output file, in that order, meaning video will be the first stream in the output and audio the second.

Line 84 names the output file. If the ffmpeg command fails, the script exits with ffmpeg's exit code; the "||" operator executes its right-hand side only when the preceding command failed.
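The "|| exit $?" idiom can be tried in isolation. Here "false" stands in for a failing ffmpeg call, and a subshell keeps the current shell alive:

```shell
# 'false' fails with exit code 1; '|| exit $?' then terminates the (sub)shell with that code
( false || exit $? )
echo "propagated exit code: $?"
# prints: propagated exit code: 1
```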

Conclusion

ffmpeg bears all the hallmarks of hackware, but it is surprisingly comprehensive and flexible. What is missing is use-case-oriented documentation. Filtergraph specifications are hard to read, so errors inside them are difficult to find. Two months after my last blog posts about ffmpeg, it took me two days to get back into the tool and work out how to generate crossfade transitions including audio.

There are lots of forum entries about simple video manipulations, but for transitions I was more or less left alone with the tool documentation. I also noticed a kind of "garbage symptom" that you often find in CSS forum entries: developers put lots of code into their examples that is actually not needed, but you have to understand that code before you can tell whether it is meaningless. In the case of ffmpeg this really takes time.

I won't use crossfade transitions in my private video production, because the conversion takes too much time. I turned to ffmpeg because I wanted to see video-cut results quickly; for everything else, OpenShot is a sufficient graphical video editor.



