Linux Notes: Real world problems and solutions

  1. The information presented here is intended for educational use by qualified computer technologists.
  2. The information presented here is provided free of charge, as-is, with no warranty of any kind.
  3. Edit: 2024-06-02

Real-world Linux Problems

1) We cannot install or update via YUM

We employ two CentOS-7 platforms: one for development and one for production (comment: two platforms may not be enough when using Linux; see the IBM-Managed warning after this section). The recommended approach is to first install (or update) software on the development box. If testing over the next few days (to weeks) proves that everything is working properly, we repeat the procedure on the production box. This also keeps both platforms more or less in sync.

I wanted to install the tree utility, so I logged onto our DVLP platform and entered this command:

sudo yum install tree

... which worked properly.

Then I repeated this command on our PROD platform, where it failed with numerous errors associated with the file /usr/libexec/urlgrabber-ext-down (a Python script). Worse, we could not execute firewall-cmd or most yum commands, including yum check-update. Investigating further, I noticed that someone had installed python3 and then updated the symbolic link so that typing the python command pulls up python3 rather than python2 (most Linux administration utilities in 2018 required Python 2.7).

There are only two ways out of this problem (remember that this is an active business system).

  1. modify the first line (the shebang or sharp-bang line) of the broken system scripts from #!/usr/bin/python to #!/usr/bin/python2
    -or-
  2. modify the symbolic link for "python" so it points back at the symbolic link "python2" (which already points to "python2.7"). This will restore the system to its previous functionality but could break something if newer customer scripts require python3 to be the default. So before modifying the symbolic link, change the shebang line of those customer scripts from:
    #!/usr/bin/python to #!/usr/bin/python3

Tip: if this is an emergency, make minimal changes. For now, just modify the scripts for whatever is broken (e.g. yum or firewall-cmd). But you will eventually need to put everything back into a pristine state. If your stuff needs python3, then you need to rely upon the shebang line. I have no idea why the Linux maintainers didn't do this for their own scripts requiring Python 2. They broke their own golden rule.

# 1) partial example of a system with two versions of python
# 2) notice that "python" is pointing to "python2"
# 3) notice that "python2" is pointing to "python2.7"
# 4) utilities requiring python2 (like yum and firewall-cmd)
#    should say so on the shebang line of those scripts
#
$ cd /usr/bin
$ ls pytho* -la
lrwxrwxrwx. 1 root root     7 Jan 12 15:25 python -> python2
lrwxrwxrwx. 1 root root     9 Dec 20  2016 python2 -> python2.7
-rwxr-xr-x. 1 root root  7136 Nov  5  2016 python2.7
lrwxrwxrwx. 1 root root     9 Apr 12  2017 python3 -> python3.4
-rwxr-xr-x. 2 root root 11312 Jan 17  2017 python3.4
...snip...
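A minimal sketch of option 1, assuming bash and GNU sed (both standard on CentOS-7). The function name fix_shebang is hypothetical: it rewrites line 1 only when it is the bare python shebang (with or without a space after "#!"), keeps a .bak copy, and leaves every other file alone. Option 2 is shown as a comment. Keep in mind that a system script owned by an RPM will likely be overwritten again on its next package update.

```shell
#!/usr/bin/env bash
# fix_shebang FILE [INTERPRETER]
#   Rewrites line 1 of FILE from '#!/usr/bin/python' (or '#! /usr/bin/python')
#   to '#!INTERPRETER' (default /usr/bin/python2). Backup saved as FILE.bak.
#   '#!' is always single-quoted so interactive history expansion cannot mangle it.
fix_shebang() {
    local file="$1"
    local new="${2:-/usr/bin/python2}"
    if head -n 1 "$file" | grep -qE '^#! ?/usr/bin/python$'; then
        cp -p "$file" "$file.bak" || return 1
        sed -i -E '1s|^#! ?/usr/bin/python$|#!'"$new"'|' "$file"
        echo "fixed:   $file"
    else
        echo "skipped: $file (not a bare python shebang)"
    fi
}

# usage (as root):
#   fix_shebang /usr/libexec/urlgrabber-ext-down          # option 1, system script
#   fix_shebang /home/app/report.py /usr/bin/python3      # hypothetical customer script
# option 2 (as root): point "python" back at "python2"
#   ln -sfn python2 /usr/bin/python
```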

2) Never use a graphical console to update Linux

I have experienced several instances where updating software through the graphical interface fails for some reason, then breaks the graphical interface (or the whole system). It should not surprise anyone that updating gnome-session, or any of its dependencies, might disturb the very session that is running yum or rpm.

For this reason I recommend doing updates from the command line over the network.

However, if you must work from the console device and want to move to a non-graphical console session before running yum or rpm, then try one of the following keystrokes:

key press     description                                 notes
CTRL ALT F1   switch to terminal 1 (graphical interface)  only graphical when runlevel >= 5
CTRL ALT F2   switch to terminal 2 (/dev/tty2)            text only
CTRL ALT F3   switch to terminal 3 (/dev/tty3)            text only
CTRL ALT F4   switch to terminal 4 (/dev/tty4)            text only
CTRL ALT F5   switch to terminal 5 (/dev/tty5)            text only
CTRL ALT F6   switch to terminal 6 (/dev/tty6)            text only

The only other way to safely disable graphics is to lower the runlevel of your system to 3 (but only do this if you are certain that you won't kill some process currently needed by your customers).

update: Even though CentOS-7 does not use "/etc/inittab" (the notes contained within that file say to do everything with systemctl), the following commands worked for me from the console as well as over a network connection:

$ runlevel	# display current run level
runlevel N 5
$ init 3	# console #1 switches over to text mode
$ runlevel
runlevel 5 3
$ init 5	# console #1 switches back to graphics mode
$ runlevel
runlevel 3 5
Caveat: never init to a number below 3 over the network, because that will kill networking, which means you WILL NOT be able to restore the runlevel remotely.
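Since the CentOS-7 notes recommend systemctl, here is a hedged sketch of the systemd equivalents. The helper runlevel_to_target is hypothetical; its mapping follows the runlevelN.target aliases systemd ships on CentOS-7 (runlevels 2, 3 and 4 all map to multi-user.target, and runlevel 1 maps to rescue.target, which is the one that stops networking). You can confirm the aliases on your own box with ls -l /usr/lib/systemd/system/runlevel*.target.

```shell
#!/usr/bin/env bash
# runlevel_to_target N : print the systemd target that stands in for SysV runlevel N
runlevel_to_target() {
    case "$1" in
        0)     echo poweroff.target ;;
        1)     echo rescue.target ;;      # single user: networking is stopped
        2|3|4) echo multi-user.target ;;  # text mode, networking up
        5)     echo graphical.target ;;   # multi-user plus the GUI
        6)     echo reboot.target ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# usage (as root):
#   systemctl get-default                              # boot-time default target
#   systemctl isolate "$(runlevel_to_target 3)"        # like: init 3
#   systemctl isolate "$(runlevel_to_target 5)"        # like: init 5
#   systemctl set-default "$(runlevel_to_target 3)"    # make text mode the boot default
```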

3) Using Windows to access a Linux remote

The self-help blogs really fall down on this one because the only secure way to do this is to tunnel X sessions over SSH. But whenever anyone on a self-help blog asks how to do this using only SSH, some idiot will chime in with a procedure based on VNC, RealVNC, TigerVNC or Vino, all of which are insecure.
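A minimal sketch of X11-over-SSH, with the server and client commands as comments. The host name prod-host and the xclock test program are placeholders, and the package name xorg-x11-xauth is my assumption for CentOS-7 (sshd needs an xauth binary to set up forwarding). On Windows, PuTTY's "Enable X11 forwarding" option paired with Xming plays the role of ssh -X. The helper x11_forwarding_active is hypothetical; it relies on sshd's defaults (X11UseLocalhost yes, X11DisplayOffset 10), under which a forwarded session gets DISPLAY=localhost:10.0 or similar.

```shell
#!/usr/bin/env bash
# Server side (CentOS-7, as root):
#   grep -i '^X11Forwarding' /etc/ssh/sshd_config   # should say: X11Forwarding yes
#   yum install xorg-x11-xauth                      # assumption: package providing xauth
#   systemctl restart sshd
# Client side (Linux, or a Cygwin/X terminal on Windows):
#   ssh -X neil@prod-host xclock                    # untrusted X11 forwarding
#   ssh -Y neil@prod-host                           # trusted forwarding, fewer X restrictions

# x11_forwarding_active : run on the server inside the ssh session
x11_forwarding_active() {
    case "${DISPLAY:-}" in
        localhost:*) return 0 ;;   # display proxied by sshd
        *)           return 1 ;;   # local display (":0") or none at all
    esac
}

if x11_forwarding_active; then
    echo "X11 is being tunnelled over ssh (DISPLAY=$DISPLAY)"
else
    echo "no ssh X11 tunnel detected (DISPLAY=${DISPLAY:-unset})"
fi
```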

To make matters worse, setting up a remote graphical session is almost impossible (at least in certain circumstances like Windows -> Red-Hat/CentOS) because GNOME3 contains 3-D extensions not found in Windows clients. The best way out of this is to set up Red-Hat/CentOS on a machine at the client end, then use it to connect to the desired Linux platform.
 
comment: some conspiracy-minded people think this change was deliberately made to stop support professionals from using Windows as their default platform to support all others. They might be correct.

Xming

Cygwin and Cygwin/X

4) Recovering a failed YUM update (2018-01-xx)

5) a recent YUM update broke our development box (2018-01-xx)

Caveat: the procedure just given will only fix the OpenSSL CLI. Note that msodbcsql will still be broken because that software calls routines in the shared libraries. To fix msodbcsql you (supposedly) need to do one of the following:
  1. fully install an older version of OpenSSL (libraries and all) in a secondary location then ensure all scripts invoking sqlcmd look there
    • I built an older version of OpenSSL from source code then installed it in /opt/oldopenssl
    • all scripts starting sqlcmd first define LD_LIBRARY_PATH to point to /opt/oldopenssl/lib
    • although strace proves that msodbcsql is first looking in a secondary location, msodbcsql still does not work
  2. completely replace the new version of OpenSSL (libraries and all) with an older version
    • playing with yum downgrade openssl* has not yet worked but I think I may be close
  3. reinstall the previous OS (CentOS-7.2 in this case)
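A sketch of the wrapper idea from option 1. The helper prepend_ld_path and the wrapper name are hypothetical; /opt/oldopenssl/lib comes from the text. The note above reports that msodbcsql still failed with this approach, so treat this as the thing that was tried, not a fix. The ldd line only shows libraries resolved as direct dependencies; libraries opened later at runtime via dlopen() will not appear there, which is why the glibc LD_DEBUG=libs trace (or strace) tells you more.

```shell
#!/usr/bin/env bash
# prepend_ld_path DIR : put DIR in front of LD_LIBRARY_PATH without leaving
# a stray ':' (an empty entry means "current directory" to the loader)
prepend_ld_path() {
    export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# body of a wrapper such as sqlcmd_oldssl.sh (hypothetical name):
#   prepend_ld_path /opt/oldopenssl/lib
#   ldd "$(command -v sqlcmd)" | grep -E 'libssl|libcrypto'    # direct dependencies only
#   LD_DEBUG=libs sqlcmd '-?' 2>&1 | grep -E 'libssl|libcrypto' # also shows dlopen() loads
#   exec sqlcmd "$@"
```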

6) a recent update broke our production box (2018-06-xx)

One of our developers was experiencing problems developing a new LDAP-based application, so he invoked YUM to update LDAP on our production box. The big problem here is that the update was done carelessly (without reading all the release notes). The LDAP update also updated OpenSSL for the whole system, so now we can no longer connect to that older Microsoft platform in Montreal (see the earlier OpenSSL note).

It now appears that we will need to install a third (older) CentOS platform whose only purpose would be to reach through to the older Microsoft platform. This platform would need to be modified so that it could never be updated.
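One way to keep a box from ever picking up a new OpenSSL (and to block accidents like the LDAP update above) is yum's exclude= setting in the [main] section of /etc/yum.conf; the yum-plugin-versionlock package is an alternative. A sketch, assuming GNU sed; the function name pin_packages is hypothetical and the pattern openssl* is only an example. An exclude line blocks installs and updates of matching packages until it is removed or deliberately overridden with --disableexcludes=main.

```shell
#!/usr/bin/env bash
# pin_packages [CONF] [PATTERN] : add "exclude=PATTERN" right after the [main]
# header of a yum.conf-style file (default /etc/yum.conf). A backup is kept.
pin_packages() {
    local conf="${1:-/etc/yum.conf}"
    local pattern="${2:-openssl*}"
    if grep -q '^exclude=' "$conf"; then
        echo "an exclude= line already exists in $conf; edit it by hand" >&2
        return 1
    fi
    cp -p "$conf" "$conf.bak" || return 1
    sed -i "/^\[main\]/a exclude=${pattern}" "$conf"   # GNU sed one-line append
}

# usage (as root):
#   pin_packages /etc/yum.conf 'openssl*'
#   yum check-update                         # openssl packages are no longer offered
#   yum --disableexcludes=main update        # deliberate one-off override
# alternative:
#   yum install yum-plugin-versionlock && yum versionlock add openssl openssl-libs
```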

7) Something is overwriting file "/etc/resolv.conf"

This problem is so weird that I'll stick to bullet points.

8) Our main console is totally dead (2019-08-xx)

We are running CentOS-7.2 on two HP-ML370-g5 servers (one PROD, one DVLP), and they have been running for 30 and 24 months respectively without a reboot. These are older hardware platforms, so I have been preparing to cut the whole thing over to two newer servers (HP-DL385p-gen8) next month. I just noticed I can't access the console on PROD.

command       result                    expected
CTRL ALT F1   screen turns solid blue   should be GUI mode
CTRL ALT F2   screen turns solid green  should be text mode
CTRL ALT F3   screen turns solid green  should be text mode

I should mention that we can do anything else we want via a remote ssh terminal session over the network. In fact, the customers are unaware that anything is wrong.

I've tried everything (short of rebooting) including replacing the monitor and restarting various services (eg. "systemctl restart gdm.service") but it seems that the VGA port is locked up somehow.

SUGGESTION: every system admin must ensure that every system has at least one external network port configured -AND- that the firewall has been configured to permit ssh2 connections, so you will be able to manage your platform if your VGA console is FUBARed.
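On CentOS-7 with firewalld, the suggestion above comes down to a few root commands, shown as comments below. The helper service_list_has_ssh is hypothetical: it checks a space-separated service list such as the output of firewall-cmd --list-services, which covers only the default zone unless you add --zone=.

```shell
#!/usr/bin/env bash
# As root, permit ssh through firewalld and make sure sshd starts at boot:
#   firewall-cmd --permanent --add-service=ssh   # persists across reboots
#   firewall-cmd --reload                        # apply the permanent config now
#   systemctl enable sshd
#   systemctl start sshd

# service_list_has_ssh "LIST" : succeed if the word "ssh" appears in LIST
service_list_has_ssh() {
    local svc
    for svc in $1; do          # unquoted on purpose: split LIST into words
        [ "$svc" = "ssh" ] && return 0
    done
    return 1
}

# usage:
#   if service_list_has_ssh "$(firewall-cmd --list-services)"; then
#       echo "ssh is permitted in the default zone"
#   fi
```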

9) The old system won't reboot but some files are needed (2019-09-xx)

This is a continuation of item 8 after a one-month delay. Okay, so the good news is this: we have acquired a replacement server and copied all necessary files to it. Since everything appeared to be running properly on the new server, I finished the day by rebooting the old server to see if my VGA port was still broken. The VGA port was not defective, but now the system only boots part way, then drops into emergency text mode offering a few useless suggestions before presenting a root password prompt. Some time later I got a call saying "we missed some files on the old server". Oops! So I tried rebooting again:

How To Mount a USB stick (thumb drive) without GUI support
(also works with a 1-TB drive on a USB cable)
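A minimal sketch of the usual command-line steps, assuming you are root and the stick carries a filesystem the running kernel already understands (FAT32 or ext4, for example). The device /dev/sdb1, the mount point /mnt/usb, the copied path, and the helper names are placeholders. USB devices show up under /dev/disk/by-id/ with a "usb-" prefix, which is a quick way to spot the right one.

```shell
#!/usr/bin/env bash
# 1) find the device:   ls -l /dev/disk/by-id/ | grep usb
#                       lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT
# In emergency mode "/" may be read-only; if mkdir fails, first try:
#   mount -o remount,rw /

mount_usb() {
    local dev="$1" mnt="${2:-/mnt/usb}"
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    mkdir -p "$mnt" || return 1
    mount "$dev" "$mnt" || return 1
    echo "mounted $dev on $mnt"
}

unmount_usb() {
    sync                          # flush pending writes before pulling the stick
    umount "${1:-/mnt/usb}"
}

# usage:
#   mount_usb /dev/sdb1
#   cp -a /home/neil/missed-files /mnt/usb/     # hypothetical path
#   unmount_usb
```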

10) Now it will boot after this fix (2019)

This is a continuation of items 8 and 9. I messed around with the old server (~30 minutes each day), following the on-screen suggestions after the GUI drops back to text mode during boot. Here is one of the messages presented to me:

Error initializing authority: Could not connect: No such file or directory (g-io-error-quark, 1)

I began Googling various pieces of the above phrase, including "(g-io-error-quark, 1)", which took me to an article at askubuntu.com (even though this is a CentOS problem). That article implicates erroneous entries in "/etc/fstab". Apparently any mount failure during boot is considered fatal, even though the basic root directory ( "/" ) is in good shape. So I used a text editor to disable the last line of "/etc/fstab", then rebooted. The system came right up.

p.s. that one line I disabled in fstab was pointing at a disk which had been unmounted and deslotted shortly after the first boot 30 months ago.
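The same fix can be done without an interactive editor, which helps when the console is barely usable. A sketch, with a hypothetical function name and a placeholder mount point: it comments out every active fstab entry whose mount point (second field) matches, after saving a backup. A gentler alternative for disks that come and go is the nofail mount option, which (per systemd.mount) keeps a missing device from blocking the boot.

```shell
#!/usr/bin/env bash
# fstab_disable_mount FSTAB MOUNTPOINT
#   Prefixes '#' to every active line of FSTAB whose 2nd field is MOUNTPOINT.
#   The original file is saved as FSTAB.bak first.
fstab_disable_mount() {
    local fstab="$1" mnt="$2"
    cp -p "$fstab" "$fstab.bak" || return 1
    awk -v m="$mnt" '
        /^[[:space:]]*#/ { print; next }      # already a comment: keep as-is
        $2 == m          { print "#" $0; next }
                         { print }
    ' "$fstab.bak" > "$fstab"
}

# usage (as root, from the emergency shell; "/" may need: mount -o remount,rw /):
#   fstab_disable_mount /etc/fstab /data        # "/data" is a placeholder
#   systemctl daemon-reload                      # regenerate mount units from fstab
# alternative: keep the entry but let boot continue without it
#   /dev/sdc1   /data   xfs   defaults,nofail   0 0
```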

Caveat: I have seen one situation where log files under "/var/log" had filled the associated partition (some files, like wtmp and others under "/var/log/gdm", can grow forever if your system hasn't been rebooted for a while). Type "df -h" to inspect disk free space. If the partition is nearly full and this is an emergency, you might consider deleting some of the larger log files before you reboot. Use this command to display files larger than 2 MB:

find /var/log -size +2M -exec ls -la {} \; 
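A variant that sorts the matches so the biggest offenders come first, assuming GNU find (for -printf), which CentOS ships. The function name largest_logs is hypothetical. One caution: deleting a log file that a running daemon still holds open does not free the space until that process closes it, so truncating in place (truncate -s 0 FILE) is often the safer emergency move.

```shell
#!/usr/bin/env bash
# largest_logs [DIR] [COUNT] : list files over 2 MB under DIR (default /var/log),
# largest first, as "size-in-bytes<TAB>path"
largest_logs() {
    local dir="${1:-/var/log}" count="${2:-20}"
    find "$dir" -type f -size +2M -printf '%s\t%p\n' 2>/dev/null \
        | sort -rn \
        | head -n "$count"
}

# usage (as root so every file is readable):
#   df -h /var/log
#   largest_logs /var/log 10
#   truncate -s 0 /var/log/some-huge.log      # placeholder name; frees space at once
```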

11) Doing better backups for faster system recovery (2019-09-xx)

New Block Diagram

 PROD  (Linux)    DVLP  (Linux)    OpenVMS  OpenVMS
+-------------+  +-------------+  +-------+ +------+
| primary     |  | primary     |  | PROD  | | DVLP | Level-1
+-------------+  +-------------+  +-------+ +------+

+-------------+  +-------------+
| local copy  |  | local copy  |                     Level-2
+-------------+  +-------------+

+-------------+  +-------------+
| remote copy |  | remote copy |                     Level-3
+-------------+  +-------------+ 

12) Python3 caching is currently broken on most Linux distros running SELinux (2019-10-xx)

caveat: this problem covers web applications using Python3 directly (i.e. when not using Django or WSGI)

update: on 2020-05-13 this patch was available from the CentOS repositories: libselinux-python3.x86_64 (version 2.5-15.el7)

13) FUBAR with USB Audio (2019-11-xx)

test #1

test #2

comments:

14) One LVM volume is too big, the other too small

Preliminary Steps

Next Steps

15) Yum is failing to initialize on one system of six (2020-09-xx)

I am running six servers (one PROD, one DVLP, two local shadows, two remote shadows), but YUM is failing to initialize on the oldest unit (both PROD and DVLP were built in 2018 as CentOS-7.5, then YUM-updated to CentOS-7.6).

Now inspect the following (pay attention to the error messages, especially the last one just before the final prompt):
[root@kawc0f /]# yum makecache fast
Loaded plugins: fastestmirror, langpacks
Determining fastest mirrors
epel/x86_64/metalink | 16 kB 00:00:00 Could not retrieve mirrorlist ...
... https://mirrors.iuscommunity.org/mirrorlist?repo=ius-centos7&arch=x86_64&protocol=http
error was 14: HTTPS Error 404 - Not Found
             
One of the configured repositories failed (Unknown), and yum doesn't have enough cached
data to continue. At this point the only safe thing yum can do is fail. There are a few
ways to work "fix" this:
    1. Contact the upstream for the repository and get them to fix the problem.
    2. Reconfigure the baseurl/etc. for the repository, to point to a working upstream. This is
       most often useful if you are using a newer distribution release than is supported by the
       repository (and the packages for the previous distribution release still work).
    3. Run the command with the repository temporarily disabled
           yum --disablerepo=<repoid> ...
    4. Disable the repository permanently, so yum won't use it by default. Yum will then just
       ignore the repository until you permanently enable it again or use --enablerepo for
       temporary usage:
           yum-config-manager --disable <repoid>
       or
           subscription-manager repos --disable=<repoid>
    5. Configure the failing repository to be skipped, if it is unavailable. Note that yum will
       try to contact the repo. when it runs most commands, so will have to try and fail each
       time (and thus. yum will be be much slower). If it is a very temporary problem though,
       this is often a nice compromise:
           yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: ius/x86_64
[root@kawc0f /]#

Now Google the quoted phrase "Cannot find a valid baseurl for repo: ius/x86_64", which returns hits like this:
https://github.com/iusrepo/announce/issues/18
where we learn that many of the repositories have moved from the ".org" domain to the ".io" domain. Apparently support for ".org" ended in April-2020.
Back in the day you would need to execute something like this:
yum install -y https://centos7.iuscommunity.org/ius-release.rpm
but this fixed my problem:
yum install -y https://repo.ius.io/ius-release-el7.rpm
Caveat: the ius-release repo was installed by a newbie in order to install python36 on CentOS7. Here is a better way to solve this problem:
sudo yum erase ius-release
sudo yum install epel-release
sudo yum clean all
sudo yum makecache
because the epel-release repo is much more up-to-date
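Before (or instead of) reinstalling anything, it helps to find every repo file still pointing at the retired domain. A sketch with a hypothetical function name; the yum commands in the comments are standard, and the --disablerepo form is workaround 3 from yum's own message above.

```shell
#!/usr/bin/env bash
# find_stale_repos [DIR] : list .repo files still referencing the retired
# iuscommunity.org domain (default DIR is /etc/yum.repos.d)
find_stale_repos() {
    local dir="${1:-/etc/yum.repos.d}"
    grep -l 'iuscommunity\.org' "$dir"/*.repo 2>/dev/null
}

# usage:
#   find_stale_repos                         # which files need attention?
#   yum repolist all                         # repo ids and enabled/disabled state
#   yum --disablerepo=ius makecache fast     # get going again while you decide
```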

16) The future of CentOS (the invisible hand of the market?)

Comments:

17) All four GUI consoles are locked up

caveat: all work here was done from a non-GUI session (usually a network connection)

going forward

18) Finally solved a slow-response problem (2021-06-18)

19) Updating an old installation in stages (2021-10-xx)

caveat: this problem is ongoing (so the following proposed solution is untested)

20) rpm hack to get mpack + munpack on CentOS-7

21) procmail problems with SELinux on CentOS-7

22) cannot resolve host names after a CentOS-7 upgrade

23) yum errors on two of four platforms (2022-06-06)

I have four identical platforms, all running CentOS-7 (last updated 4 months ago). This is what I see on two of them when I execute 'yum update'.
Note that 'yum clean' fails the same way:

[root@kawc3v ~]# yum clean
error: rpmdb: BDB0113 Thread/process 23380/139868776736832 failed: BDB1507 Thread died in Berkeley DB library
error: db5 error(-30973) from dbenv->failchk: BDB0087 DB_RUNRECOVERY: Fatal error, run database recovery
error: cannot open Packages index using db5 -  (-30973)
error: cannot open Packages database in /var/lib/rpm
CRITICAL:yum.main:

And here's the fix:

mv /var/lib/rpm/__db* /tmp/
rpm --rebuilddb
yum clean all       

comment: 'rpm --rebuilddb' is undocumented under 'man rpm' so look for it under 'man rpmdb' and use it sparingly
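The same fix with a little more caution, as a sketch: it refuses to run while yum or rpm appears to be active (a best-effort pgrep check), and moves the Berkeley DB environment files (__db.001 and friends, which hold locks and shared-memory regions rather than package data) into a timestamped directory instead of a shared /tmp. The function name and backup location are hypothetical choices.

```shell
#!/usr/bin/env bash
# rpmdb_move_locks [DBDIR] [BACKUPDIR]
rpmdb_move_locks() {
    local dbdir="${1:-/var/lib/rpm}"
    local backup="${2:-/root/rpmdb-locks-$(date +%Y%m%d-%H%M%S)}"
    local f moved=0
    # do not touch the database while a package manager is using it
    if pgrep -x yum >/dev/null 2>&1 || pgrep -x rpm >/dev/null 2>&1; then
        echo "yum or rpm is still running; wait for it to finish" >&2
        return 1
    fi
    mkdir -p "$backup" || return 1
    for f in "$dbdir"/__db*; do
        [ -e "$f" ] || continue          # glob matched nothing
        mv "$f" "$backup"/ && moved=$((moved + 1))
    done
    echo "moved $moved file(s) to $backup"
}

# full sequence (as root), mirroring the fix above:
#   cp -a /var/lib/rpm /var/lib/rpm.bak-$(date +%F)   # optional safety copy
#   rpmdb_move_locks
#   rpm --rebuilddb
#   yum clean all
```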

24) Run an old 32-bit program on WINE (2022-08-06)

Okay so this is a weird problem which might not affect too many others.

25) Can't restart Apache with new certificate files (2022-08-27)

26) Cannot install packages into python-3.6.8 (2022-08-29)

Facts: Here are two scripts for two different python implementations on the same platform:
#!/usr/bin/bash
# title  : pip36_install_p1.sh
# author : Neil Rieck
# created: 2022-09-08
python3.6 -m pip install "${1}" \
 --index-url https://pypi.org/simple \
 --trusted-host pypi.org \
 --trusted-host files.pythonhosted.org \
 --proxy http://neilrieck.net:8083

#!/usr/bin/bash
# title  : pip39_install_p1.sh
# author : Neil Rieck
# created: 2022-09-08
python3.9 -m pip install "${1}" \
 --trusted-host pypi.org \
 --trusted-host files.pythonhosted.org \
 --proxy https://neilrieck.net:8083
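Instead of repeating the proxy and trusted-host flags on every call, pip can read them from a configuration file. A sketch, assuming pip's documented per-user location ~/.config/pip/pip.conf (or /etc/pip.conf for a site-wide default that also covers root). The helper name is hypothetical, and the default proxy is the http:// one from the python3.6 script, so pass the https:// form if your proxy really needs it as in the python3.9 script.

```shell
#!/usr/bin/env bash
# write_pip_conf [FILE] [PROXY] : create a pip config file holding the options
# the two scripts above pass on the command line
write_pip_conf() {
    local file="${1:-$HOME/.config/pip/pip.conf}"
    local proxy="${2:-http://neilrieck.net:8083}"
    mkdir -p "$(dirname "$file")" || return 1
    cat > "$file" <<EOF
[global]
proxy = ${proxy}
index-url = https://pypi.org/simple
trusted-host = pypi.org files.pythonhosted.org
EOF
}

# usage:
#   write_pip_conf                          # per-user file
#   python3.9 -m pip install paramiko       # proxy and trusted hosts now implied
```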
 

27) Need a second version of python3 (2022-09-10)

Today I need to write a Python SFTP program based upon pysftp (which is based upon paramiko), but I cannot install paramiko on python-3.6.8 (I am seeing deprecated-library alerts indicating that 3.6.8 is too old).

27a) Installing python-3.8.13

CentOS-7.9 doesn't offer a newer python via rpm, so I tried an easy install of python-3.8.13 as seen here:

1) sudo yum install centos-release-scl
2) sudo yum list rh-python3\*
3) sudo yum install rh-python38
4) scl enable rh-python38 bash
5) python3 --version
Python 3.8.13

Notes:

  1. most yum installs place python under /usr/bin
  2. this yum install placed python38 under /opt
    • which is why you need to execute step-4 every time any process wants to use this version of python-3.8.13
    • I never tried executing the interpreter directly from /opt (as you might do from within a shebang) but it could work
  3. I could not install paramiko in this version of python so I used yum to remove it

27b) Building python-3.9.13 from source code

This works -AND- I can now install paramiko

========================================================================================
title  : python3.x_build_on_centos7.txt
author : Neil Rieck
created: 2022-08-31
edit   : 2022-09-08 
links  :
1) https://www.python.org/downloads/source
2) https://docs.python.org/3/using/unix.html      (<<< how to build)
3) https://docs.python.org/3/using/configure.html (<<< how to build)
notes  :
1) this is an experimental build of python-3.9.13 on CentOS-7
2) Hopefully pip for python-3.9.13 works better with our corporate proxy server
3) python-3.9.13 allows me to install paramiko while python-3.6.8 does not 
========================================================================================
file linkage BEFORE the install:

[neil@kawc4n ~]$ ls -lad /usr/bin/python*
lrwxrwxrwx. 1 root root     7 Jun  5 08:56 /usr/bin/python -> python2
lrwxrwxrwx. 1 root root     9 Jun  5 08:56 /usr/bin/python2 -> python2.7
-rwxr-xr-x. 1 root root  7144 Nov 16  2020 /usr/bin/python2.7
-rwxr-xr-x. 1 root root  1835 Nov 16  2020 /usr/bin/python2.7-config
lrwxrwxrwx. 1 root root    16 Jun  5 08:56 /usr/bin/python2-config -> python2.7-config
lrwxrwxrwx. 1 root root     9 Jun  7 07:13 /usr/bin/python3 -> python3.6
-rwxr-xr-x. 2 root root 11328 Nov 16  2020 /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11328 Nov 16  2020 /usr/bin/python3.6m
lrwxrwxrwx. 1 root root    14 Jun  5 08:56 /usr/bin/python-config -> python2-config
[neil@kawc4n ~]$

file linkage AFTER the install:

[neil@kawc4n ~]$ ls -lad /usr/bin/python*
lrwxrwxrwx. 1 root root     7 Jun  5 08:56 /usr/bin/python -> python2
lrwxrwxrwx. 1 root root     9 Jun  5 08:56 /usr/bin/python2 -> python2.7
-rwxr-xr-x. 1 root root  7144 Nov 16  2020 /usr/bin/python2.7
-rwxr-xr-x. 1 root root  1835 Nov 16  2020 /usr/bin/python2.7-config
lrwxrwxrwx. 1 root root    16 Jun  5 08:56 /usr/bin/python2-config -> python2.7-config
lrwxrwxrwx. 1 root root     9 Jun  7 07:13 /usr/bin/python3 -> python3.6
-rwxr-xr-x. 2 root root 11328 Nov 16  2020 /usr/bin/python3.6
-rwxr-xr-x. 2 root root 11328 Nov 16  2020 /usr/bin/python3.6m
-rwxr-xr-x. 1 root root 16328 Sep  7 16:09 /usr/bin/python3.9
-rwxr-xr-x. 1 root root  3073 Sep  7 16:12 /usr/bin/python3.9-config
lrwxrwxrwx. 1 root root    14 Jun  5 08:56 /usr/bin/python-config -> python2-config
[neil@kawc4n ~]$ 
========================================================================================
steps:

1 ) sudo yum install epel-release
2 ) sudo yum update
3 ) sudo yum -y groupinstall "Development Tools"
4a) sudo yum -y install openssl-devel bzip2-devel libffi-devel xz-devel	# required
4b) sudo yum -y install openssl11-devel	# required >= python3.10
5 ) gcc --version
6a) # get source file via pc <----------------------------------------------+- pick one
    visit: https://www.python.org/downloads/                                |
    then download file Python-3.9.13.tgz to your pc                         |
    then ftp it to the server                                               |
6b) # get source file via wget <--------------------------------------------+
    https_proxy=https://neilrieck.net:8083 \
    wget https://www.python.org/ftp/python/3.9.13/Python-3.9.13.tgz
7 ) tar xvf Python-3.9.13.tgz
8 ) cd Python-3.9*/
9a) # recipe-a (generates a large binary) <---------------------------------+- pick one
    # caveat: the default destination is: '/usr/local/bin/python3.9'        |
    #         but the next line will put it in: '/usr/bin/python3.9'        |
    ./configure --prefix=/usr --enable-optimizations | tee nsr_39_step1.txt |
9b) # recipe-b (generates a small binary) <---------------------------------+
    # caveat: the default destination is: '/usr/local/bin/python3.9'
    #         but the next line will put it in: '/usr/bin/python3.9'
    sudo ./configure --prefix=/usr --enable-optimizations --enable-shared \
      | tee nsr_39_step1.txt
    # sudo ldconfig (do this step after the make step)
10) # caveat: altinstall allows multiple versions of python to coexist
    #         install does not allow multiple versions of python to coexist
    #         so type carefully
    sudo make altinstall | tee nsr_39_step2.txt
11) # optional (required if you enabled shared libraries)
    sudo ldconfig
=========================================================================
20) # test executable
    python3.9
    exit()
21) # display our packages
    python3.9 -m pip list --trusted-host pypi.org
    Package    Version
    ---------- -------
    pip        22.0.4
    setuptools 58.1.0
22) # upgrade pip
    sudo python3.9 -m pip install --upgrade pip \
      --trusted-host pypi.org \
      --trusted-host files.pythonhosted.org \
      --proxy https://neilrieck.net:8083
23) # display our packages (again)
    python3.9 -m pip list --trusted-host pypi.org
    Package    Version
    ---------- -------
    pip        22.2.2   <<<
    setuptools 58.1.0
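A quick way to confirm that step 4a took effect is to import the modules that depend on those -devel packages: when a library is missing at build time, the Python build skips the matching module and the import fails later. A sketch with a hypothetical function name; the module-to-package pairing in the comment is my reading of CPython's build dependencies.

```shell
#!/usr/bin/env bash
# check_python_build [INTERPRETER]
#   ssl  <- openssl-devel      bz2    <- bzip2-devel
#   lzma <- xz-devel           ctypes <- libffi-devel
check_python_build() {
    local py="${1:-python3.9}" mod status=0
    for mod in ssl bz2 lzma ctypes; do
        if "$py" -c "import $mod" >/dev/null 2>&1; then
            echo "ok      $mod"
        else
            echo "MISSING $mod"
            status=1
        fi
    done
    # report the OpenSSL version the ssl module was linked against
    "$py" -c 'import ssl; print(ssl.OPENSSL_VERSION)' 2>/dev/null
    return "$status"
}

# usage: check_python_build /usr/bin/python3.9
```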

28) DNF tweak for Rocky Linux (2022-11-21)

For CentOS-7 systems sitting behind a corporate proxy server, you only need to add one line to file /etc/yum.conf
proxy=http://neilrieck.net:8083
For Rocky-8 systems sitting behind a corporate proxy server, you need to add two lines to file /etc/dnf/dnf.conf
proxy=http://neilrieck.net:8083
sslverify=false 

caveats:

29) Upgrading Rocky Linux 8 to Rocky Linux 9

30) Moving off anything tied to IBM

31) Cannot update AlmaLinux 8 via dnf

Caveat: this problem is not specific to AlmaLinux. On the day of this problem, it would also have affected Debian, Fedora, Kali, Ubuntu and possibly anyone using epel-release. BTW, I am sitting behind a corporate firewall and am working through a proxy server. The proxy server is security-enhanced via Blue Coat and other stuff.

Problem:
legend:
<ur> = user response
<sr> = system response
=======================================
<ur> sudo dnf clean all
<sr> ...stuff was deleted...
<ur> sudo dnf makecache
<sr> AlmaLinux 8 - BaseOS  48 kB/s | 14 kB  00:00
     Error: Failed to download metadata for repo 'baseos': repomd.xml parser error: Parse error at line: 235 (Opening and ending tag mismatch: meta line 0 and head)
<ur> sudo tail -100 /var/log/dnf.librepo.log
<sr> ...100 lines are displayed...
2024-03-12T10:09:37-0400 INFO Downloading: https://mirrors.almalinux.org/mirrorlist/8/baseos
2024-03-12T10:09:37-0400 INFO Downloading: http://mirror.accuris.ca/almalinux/8.9/BaseOS/x86_64/os/repodata/repomd.xml
2024-03-12T10:09:37-0400 WARNING WARNING: Repomd xml parser: Unknown element "html"
<ur> [[[ interpret the results ]]]
<ur> attempt to fetch the file via "wget" or "curl"
<sr> received an html page from my employer's proxy server indicating blocked access to the destination
<ur> attempt to fetch the file via a browser
<sr> Destination site has been black-listed due to malware (was an error from Symantec)
Solution 1:
<ur> find /etc -iname almalinux\*repo		# where are the dnf repos? (could be dnf or yum)
<sr> ...
/etc/yum.repos.d/almalinux.repo # is one entry of many
...
<ur> cd /etc/yum.repos.d/ # change default directory
<sr> [root@123456 yum.repos.d]#
<ur> [[[ make backup copies of everything you intend to modify ]]]
vim almalinux.repo # edit the desired file(s)
[[[ comment out all lines beginning with "mirrorlist" ]]]
[[[ uncomment all associated lines beginning with "# baseurl" ]]]
[[[ consider changing each baseurl to a nearby mirror:
I used this nearby-to-me site:
https://mirror.csclub.uwaterloo.ca/almalinux/ ]]]
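The manual edit in Solution 1 can be scripted with GNU sed, as a sketch. The function name is hypothetical, and so is the assumption that the commented-out AlmaLinux baseurl lines start with https://repo.almalinux.org/almalinux/; check your own file first. The backup lands next to the original as almalinux.repo.bak, which dnf ignores because it only reads *.repo files.

```shell
#!/usr/bin/env bash
# switch_to_baseurl REPOFILE [MIRROR]
#   1) comments out every "mirrorlist=" line
#   2) uncomments every "# baseurl=" / "#baseurl=" line
#   3) optionally swaps the assumed default prefix
#      https://repo.almalinux.org/almalinux/ for MIRROR
# A backup copy is saved as REPOFILE.bak first.
switch_to_baseurl() {
    local repo="$1" mirror="$2"
    cp -p "$repo" "$repo.bak" || return 1
    sed -i -e 's|^mirrorlist=|#mirrorlist=|' \
           -e 's|^# *baseurl=|baseurl=|' "$repo"
    if [ -n "$mirror" ]; then
        sed -i "s|^baseurl=https://repo\.almalinux\.org/almalinux/|baseurl=${mirror%/}/|" "$repo"
    fi
}

# usage (as root):
#   cd /etc/yum.repos.d
#   switch_to_baseurl almalinux.repo https://mirror.csclub.uwaterloo.ca/almalinux/
#   dnf clean all && dnf makecache
```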

32) Do not jump to EL9 too soon (2024-05-30)

Moved to my Enterprise Linux page.


Neil Rieck
Waterloo, Ontario, Canada.