fritzthecat-blog

Friday, 11 October 2024

Upgrade to Ubuntu 24.04 with awk Problem

It started with mouse problems. The pointer wasn't where I expected it to be, movements were delayed or ignored, and the pointer jumped to some edge or corner unexpectedly. At some point the mouse stopped moving altogether and the pointer disappeared; I had to plug the cable into another USB port to be able to continue. But after some time the same happened on the new USB port. I saw messages like the following in the kernel log of my Ubuntu 22.04 system:

$ dmesg -wT    # prints kernel messages with readable timestamps and waits for new ones
...
[Mo Okt  7 20:51:32 2024] usb 1-1.3: USB disconnect, device number 17
[Mo Okt  7 20:51:32 2024] usb 1-1.3: new low-speed USB device number 18 using ehci-pci
[Mo Okt  7 20:51:33 2024] usb 1-1.3: New USB device found, idVendor=045e, idProduct=0040, bcdDevice= 3.00
[Mo Okt  7 20:51:33 2024] usb 1-1.3: New USB device strings: Mfr=1, Product=3, SerialNumber=0
[Mo Okt  7 20:51:33 2024] usb 1-1.3: Product: Microsoft 3-Button Mouse with IntelliEye(TM)
[Mo Okt  7 20:51:33 2024] usb 1-1.3: Manufacturer: Microsoft
[Mo Okt  7 20:51:33 2024] input: Microsoft Microsoft 3-Button Mouse with IntelliEye(TM) as /devices/pci0000:00/0000:00:1d.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0040.000E/input/input27
[Mo Okt  7 20:51:33 2024] hid-generic 0003:045E:0040.000E: input,hidraw1: USB HID v1.10 Mouse [Microsoft Microsoft 3-Button Mouse with IntelliEye(TM)] on usb-0000:00:1d.0-1.3/input0
[Mo Okt  7 20:51:34 2024] usb 1-1.3: USB disconnect, device number 18
[Mo Okt  7 20:51:34 2024] usb 1-1.3: new low-speed USB device number 19 using ehci-pci
[Mo Okt  7 20:51:34 2024] usb 1-1.3: device descriptor read/64, error -32
[Mo Okt  7 20:51:34 2024] usb 1-1.3: New USB device found, idVendor=045e, idProduct=0040, bcdDevice= 3.00
[Mo Okt  7 20:51:34 2024] usb 1-1.3: New USB device strings: Mfr=1, Product=3, SerialNumber=0
[Mo Okt  7 20:51:34 2024] usb 1-1.3: Product: Microsoft 3-Button Mouse with IntelliEye(TM)
[Mo Okt  7 20:51:34 2024] usb 1-1.3: Manufacturer: Microsoft
[Mo Okt  7 20:51:35 2024] input: Microsoft Microsoft 3-Button Mouse with IntelliEye(TM) as /devices/pci0000:00/0000:00:1d.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0040.000F/input/input28
[Mo Okt  7 20:51:35 2024] hid-generic 0003:045E:0040.000F: input,hidraw1: USB HID v1.10 Mouse [Microsoft Microsoft 3-Button Mouse with IntelliEye(TM)] on usb-0000:00:1d.0-1.3/input0
[Mo Okt  7 20:53:09 2024] usb 1-1.3: reset low-speed USB device number 19 using ehci-pci
[Mo Okt  7 20:54:27 2024] usb 1-1.3: USB disconnect, device number 19
[Mo Okt  7 20:54:27 2024] usb 1-1.3: new low-speed USB device number 20 using ehci-pci
[Mo Okt  7 20:54:27 2024] usb 1-1.3: New USB device found, idVendor=045e, idProduct=0040, bcdDevice= 3.00
[Mo Okt  7 20:54:27 2024] usb 1-1.3: New USB device strings: Mfr=1, Product=3, SerialNumber=0
[Mo Okt  7 20:54:27 2024] usb 1-1.3: Product: Microsoft 3-Button Mouse with IntelliEye(TM)
[Mo Okt  7 20:54:27 2024] usb 1-1.3: Manufacturer: Microsoft
[Mo Okt  7 20:54:27 2024] input: Microsoft Microsoft 3-Button Mouse with IntelliEye(TM) as /devices/pci0000:00/0000:00:1d.0/usb1/1-1/1-1.3/1-1.3:1.0/0003:045E:0040.0010/input/input29
[Mo Okt  7 20:54:27 2024] hid-generic 0003:045E:0040.0010: input,hidraw1: USB HID v1.10 Mouse [Microsoft Microsoft 3-Button Mouse with IntelliEye(TM)] on usb-0000:00:1d.0-1.3/input0
...

I think this means that some interrupt takes too long and the kernel decides to reset the device. Is this a Linux kernel fault or a hardware fault? Hints on the web suggested that this happens with a broken mouse. The next day I bought another mouse, and when I tried it out the problem was gone. So it was a mouse fault. Anyway, I decided to upgrade my Ubuntu to 24.04.
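
To see how often a device drops off the bus, the disconnect events in a kernel log like the one above can be counted per USB port. This is a minimal sketch, not something I ran at the time; the function name count_usb_disconnects is made up for illustration, and it assumes the message format shown in the log excerpt.

```shell
# Count "USB disconnect" events per USB port.
# Reads kernel log text on stdin, prints "<count> <port>" lines.
count_usb_disconnects() {
    grep 'USB disconnect' \
        | sed -E 's/.*usb ([0-9.-]+): USB disconnect.*/\1/' \
        | sort \
        | uniq -c
}

# Possible usage (reading the kernel log may require sudo):
#   sudo dmesg -T | count_usb_disconnects
```

A port that shows many disconnects within a few minutes points to the device or cable on that port rather than to a one-off glitch.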

The invitation dialog to upgrade had appeared several times already, and I started with an update using the Software Updater user interface. In that GUI there was no invitation to upgrade. After a reboot the GUI finally told me that there were no more updates, but that I could upgrade. Generally I don't like that distinction: why not do everything through updates? However, I clicked "Upgrade" and, after some time, was told that this was not possible and that I should do a "partial upgrade" instead. I clicked it, but nothing happened for minutes. I did another reboot and repeated what I had done before; now a dialog told me that there was not enough disk space for the upgrade. There was 20% free on my 24 GB root partition.
The cleanup took me several hours. The following command helped a lot:

$ sudo journalctl --vacuum-time=3d   # deletes archived journal files, keeps only the last 3 days

Also very useful is the Disk Usage Analyzer user interface, where you can delete files and folders directly through a context menu. Clearing the caches of web browsers is also a good idea. In my Eclipse installation there was a complete copy of a Node.js environment in the "language server" plugin; I hope Eclipse won't miss it.
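
Before retrying an upgrade, it may help to check how much space is actually free on the root partition. The following is a small sketch and was not part of my cleanup; the function name free_percent is made up for illustration, and it assumes the POSIX output format of "df -P" and a device name without spaces.

```shell
# Print the approximate percentage of free space on the filesystem
# that contains the given path, derived from the "Capacity" column of df -P.
free_percent() {
    used=$(df -P "$1" | tail -n 1 | tr -s ' ' | cut -d ' ' -f 5 | tr -d '%')
    echo $((100 - used))
}

echo "Free on /: $(free_percent /)%"
```

On the command line, "journalctl --disk-usage" shows how much space the journal occupies, and "sudo apt-get clean" empties the cache of downloaded packages.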

I returned to the "partial upgrade" and now it ran. It took endless downloads and hours of package installs. Finally, after a reboot, everything worked. The Ubuntu desktop looked the same as in 22.04; only the main menu at the top right has changed its layout and now has more icon buttons than text items. Unfortunately there are no tooltips for the icon buttons, so you have to try each one to see what it does. The gear wheel leads to "Settings".

But this was just the start of a not so entertaining afternoon that lasted late into the night. I have a self-written tool for editing music scores, and it did not work any more. It is built on top of awk, and the execution of the awk script crashed without a visible error message. I narrowed the bug down and finally saw the following message:

free(): double free detected in tcache 2

Looking on the web for an explanation, I found out that this message appears when C code releases the same memory range more than once (manual memory management is one of the big weaknesses of the C language). That means awk itself crashed while executing my 2000 lines of awk code. Normally a core dump is generated in such cases, but I saw nothing in the current directory. Core dumps are now sent to Ubuntu via the apport utility, the web said. As this was not useful for my troubleshooting, I tried to find the file in system directories. The core dump itself is an unreadable binary file in /var/lib/apport/coredump, but under Ubuntu you can look into the file /var/log/apport.log. Because this file can be big, you should open it with vi:

$ vi /var/log/apport.log
:q!

The Page-Down and Page-Up keys should work; the ":q!" command terminates the editor without saving unintended changes. But this wasn't useful, I just found another confusing ERROR message:

is_closing_session(): no DBUS_SESSION_BUS_ADDRESS in environment

I did not follow this up, as the DBUS_SESSION_BUS_ADDRESS variable was set correctly in the environment. So there was nothing I could do. Awk 5.1.2 crashes with my big script, period. Can that be on a brand-new Linux release? I remembered that there was an issue with awk several years ago, and that I had to put gawk in place of awk. Checking this, I found out that they are the same now on Ubuntu 24.04:

$ ls -la /usr/bin/*awk
lrwxrwxrwx 1 root root     21 Jän 11  2015 /usr/bin/awk -> /etc/alternatives/awk
-rwxr-xr-x 1 root root 739840 Mär 31  2024 /usr/bin/gawk
-rwxr-xr-x 1 root root 170768 Apr  8  2024 /usr/bin/mawk
lrwxrwxrwx 1 root root     22 Jän 11  2015 /usr/bin/nawk -> /etc/alternatives/nawk

$ ls -la /etc/alternatives/*awk
lrwxrwxrwx 1 root root 13 Jän 14  2015 /etc/alternatives/awk -> /usr/bin/gawk
lrwxrwxrwx 1 root root 13 Jän 14  2015 /etc/alternatives/nawk -> /usr/bin/gawk

So I tried mawk, as it is the only alternative to the crashing awk and gawk. Bang, my script suddenly worked perfectly. This mawk is version 1.3.4, and it has not changed since 2020. Read about this "dialect problem" of awk on the web.
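
A script that depends on one particular awk implementation can choose it explicitly instead of relying on whatever /usr/bin/awk points to. The following is a minimal sketch under that assumption; pick_awk and the script name score-editor.awk are hypothetical names, not taken from my tool.

```shell
# Print the first command from the argument list that exists on PATH.
# Returns non-zero if none of the candidates is installed.
pick_awk() {
    for candidate in "$@"; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Prefer mawk, fall back to whatever "awk" resolves to.
AWK=$(pick_awk mawk awk) || echo "no awk implementation found" >&2

# Hypothetical call of a script with the chosen interpreter:
#   "$AWK" -f score-editor.awk input.txt
```

On Debian-based systems the system-wide default can be inspected with "update-alternatives --display awk" and changed with "sudo update-alternatives --config awk", but picking the interpreter inside the calling script keeps the choice local to that one tool.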

Awk is part of the UNIX operating system; it is very old and has been heavily used. It is the worst nightmare for software users to find out that they have to hunt a bug that an operating-system programmer has caused. Looks like awk is near the end of its life cycle. But the real problem lies in the fact that so many different programming languages and environments are used (C, C++, C#, Java, .NET, Python, Rust, Perl, ...) and that they even spawn dialects like gawk, nawk and mawk, and even these dialects have versions that seem not to be backward compatible. The Tower of Babel unfolds :-(