When using my video-scripts I came across a subtle problem with awk pipe statements. If you forget to close a pipe-command, the next one may deliver wrong results under certain circumstances (which makes this problem subtle).
This is about GNU Awk 5.0.1, API: 2.0 (GNU MPFR 4.0.2, GNU MP 6.2.0) on Ubuntu 20.04.1 with Linux kernel 5.4.0-59.
AWK is an ancestor of Perl and a great tool for quick data interpretation, better than Perl because it is simpler. But, like most scripting languages, it has peculiarities that may cause undetected bugs. In my case a video length was reported to be too short to contain a given timestamp, which was not true, so I had to check the responsible awk script.
AWK Pipe Statement
Example:
    file = "a.txt"
    sizeCommand = "stat --printf='%s' " file
    sizeCommand | getline size
    print "Size of " file " is " size
This code fetches the size of the file a.txt through an external command that is piped into getline to read the first line of its output.
The GNU documentation puts a close() immediately after the pipe statement. So the correct form would be:
    ....
    sizeCommand | getline size
    close(sizeCommand)
    ....
→ If you forget the close(), you may experience strange results!
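One way to make the close() hard to forget is to wrap both steps in a small function. The following is only a minimal sketch under a few assumptions: the helper name first_line is made up for this example, the shell lines around the awk program just create a sample a.txt so the snippet runs on its own, and stat -c %s stands in for stat --printf='%s' to avoid nested quotes inside the shell-quoted awk program (both report the size in bytes).

```sh
# Create the sample file (one byte, no trailing newline).
printf 'a' > a.txt

result=$(awk '
# first_line() is a hypothetical helper, not part of awk itself.
# It returns the first output line of cmd and always closes the pipe.
# line is a local variable; it stays empty if cmd prints nothing.
function first_line(cmd,    line) {
    cmd | getline line
    close(cmd)
    return line
}
BEGIN {
    file = "a.txt"
    size = first_line("stat -c %s " file)
    print "Size of " file " is " size
}')
echo "$result"
```

Because the pipe is closed inside the function, calling it again with the same command string starts the command afresh.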
Problem Reproduction
Here is a reproduction of what I encountered. You need two text files:
- a.txt containing the single character 'a' (size = 1), and
- ab.txt containing 'ab' (size = 2).
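The two files can be created from the shell, for example as sketched here. printf is used rather than echo because echo would append a newline and make each file one byte larger than the sizes the script expects. The wc -c call is just a quick check and not part of the original setup.

```sh
# printf writes exactly the given characters, without a trailing newline.
printf 'a'  > a.txt
printf 'ab' > ab.txt

# Show the sizes in bytes: 1 for a.txt and 2 for ab.txt.
wc -c a.txt ab.txt
```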
Then put the following AWK script, pipe-statement-problem.awk, into the same directory and make it executable:
    #!/usr/bin/awk -f

    BEGIN {
      files[0] = "a.txt"
      files[1] = "ab.txt"
      files[2] = "a.txt"
      sizes[0] = 1
      sizes[1] = 2
      sizes[2] = 1

      for (i in files) {
        externalCommand = "stat --printf='%s' " files[i]
        externalCommand | getline size

        print "size of " files[i] " = " size
        if (size != sizes[i])
          print "ERROR in size of " files[i] ", should be " sizes[i]
      }
    }
The shebang #!/usr/bin/awk -f in the first line tells the system to run the script with /usr/bin/awk.
As no data are processed by this script, everything happens in the BEGIN rule, which is executed on script start.
The script builds two arrays, one for file names and one for the expected sizes of these files.
The for-loop goes through all files in the array, which are a.txt, ab.txt, and again a.txt, and fetches their sizes.
An ERROR message is printed if the size doesn't match what was expected.
This script doesn't make any practical sense, but it is something like a unit test for the pipe-statement.
Output is ('$' is the UNIX command prompt):
$ pipe-statement-problem.awk
size of a.txt = 1
size of ab.txt = 2
size of a.txt = 2
ERROR in size of a.txt, should be 1
The error happens when the pipe for file a.txt is executed a second time.
For some reason awk then delivers the size of file ab.txt, which is the one that preceded this pipe-statement.
Fix: Close Any Pipe!
The bug can be fixed by inserting a close() immediately after the pipe-statement, on line 14:
    #!/usr/bin/awk -f

    BEGIN {
      files[0] = "a.txt"
      files[1] = "ab.txt"
      files[2] = "a.txt"
      sizes[0] = 1
      sizes[1] = 2
      sizes[2] = 1

      for (i in files) {
        externalCommand = "stat --printf='%s' " files[i]
        externalCommand | getline size
        close(externalCommand) # MUST close any of such statements!

        print "size of " files[i] " = " size
        if (size != sizes[i])
          print "ERROR in size of " files[i] ", should be " sizes[i]
      }
    }
Running the fixed script you see:
$ pipe-statement-problem.awk
size of a.txt = 1
size of ab.txt = 2
size of a.txt = 1
This is the right output.
Now the size of file a.txt has been read correctly.
Conclusion
When I got to know the AWK pipe-statement, I didn't even know that you can (or must) close it.
The resulting problems may stay undetected for a long time, because there is no warning and no error message; the result of getline is simply wrong.
I didn't find out why the result is always that of the preceding pipe, and why it happens only when repeating a pipe that was already executed once.
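A likely explanation follows from how getline is documented: a command used with getline is started only the first time, and every later getline with the same command string reads the next record from the pipe that is still open. stat prints a single record, so the second read for a.txt hits end-of-file, getline returns 0, and the variable keeps whatever value the previous pipe left in it. The sketch below makes this visible by printing the return value of getline. The variable names are chosen for this example, and stat -c %s stands in for stat --printf='%s' to avoid nested quotes inside the shell-quoted awk program.

```sh
printf 'a'  > a.txt
printf 'ab' > ab.txt

result=$(awk '
BEGIN {
    n = split("a.txt ab.txt a.txt", files, " ")
    for (i = 1; i <= n; i++) {
        cmd = "stat -c %s " files[i]
        # getline returns 1 for a record, 0 at end-of-file, -1 on error.
        status = (cmd | getline size)
        print files[i] ": getline returned " status ", size = " size
        # No close(cmd): the pipe for a.txt is still open on the third pass.
    }
}')
echo "$result"
```

On the third pass getline reports 0 and size still holds 2, the value read for ab.txt. Testing the return value, for example with if ((cmd | getline size) <= 0), turns the silent failure into something the script can report.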