Linux Notes: Real-world problems

  1. The information presented here is intended for educational use by qualified computer technologists.
  2. The information presented here is provided free of charge, as-is, with no warranty of any kind.
Edit: 2021-11-19

Real-world Linux Problems

1) We cannot install or update via YUM

We have two CentOS-7 platforms: one for development and one for production (comment: two platforms may not be enough when using Linux; see the IBM-Managed warning after this section). The recommended approach is to first install (or update) software on the development box. If testing over the next few days (to a few weeks) proves that everything is working properly then we repeat the procedure on the production box. This also keeps both platforms more-or-less in sync.

I wanted to install the tree utility so I logged onto our DVLP platform where I entered this command:

sudo yum install tree

... which worked properly.

Then I repeated this command on our PROD platform, where it failed with numerous errors associated with the file /usr/libexec/urlgrabber-ext-down, which is a Python script. What was worse was this: you could not execute "firewall-cmd" or most yum commands, including "yum check-update". Investigating further, I noticed that someone had installed python3 then updated the symbolic link so that typing "python" pulls up python3 rather than python2 (most Linux administrator utilities in 2018 require Python 2.7 and this has not changed in 2019 or 2020).

There are only two ways out of this problem (remember that this is an active business system).

  1. modify the first line (the shebang or sharp-bang line) of broken system scripts from this "#!/usr/bin/python" to this "#!/usr/bin/python2"
    -or-
  2. modify the symbolic link for "python" and point it back at the symbolic link "python2" which probably points to "python2.7". This will restore the system to its previous functionality but could break something if newer customer scripts required python3 to be the default. So before modifying the symbolic link, modify the shebang line of customer scripts like this:
    "#!/usr/bin/python" to this "#!/usr/bin/python3"

Tip: if this is an emergency, just make minimal changes. For now, just modify the scripts for whatever is broken (e.g. yum or firewall-cmd). But you will eventually need to put everything back to a pristine state. If your stuff needs python3 then you need to rely upon the shebang line. I have no idea why the Linux maintainers didn't do this for their scripts requiring Python2. They broke their own golden rule.

# 1 ) partial example of a system with two versions of python
# 2a) notice that "python" is pointing to "python2"
# 2b) notice that "python2" is pointing to "python2.7"
# 3 ) utilities requiring python2 (like yum and firewall-cmd)
#     should say so on the shebang line of those scripts
#
$ cd /usr/bin
$ ls pytho* -la
lrwxrwxrwx. 1 root root     7 Jan 12 15:25 python -> python2
lrwxrwxrwx. 1 root root     9 Dec 20  2016 python2 -> python2.7
-rwxr-xr-x. 1 root root  7136 Nov  5  2016 python2.7
lrwxrwxrwx. 1 root root     9 Apr 12  2017 python3 -> python3.4
-rwxr-xr-x. 2 root root 11312 Jan 17  2017 python3.4
...snip...
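
Before editing anything, it helps to know which scripts still carry the generic shebang. The following is a minimal sketch, not a vetted tool: the function name list_generic_python_shebangs is made up for this example, and it only scans the single directory you pass in (it does not recurse).

```shell
# List files in one directory whose first line is the generic python shebang.
# Assumption: the function name and its single directory argument are
# illustrative only; nothing here is part of yum or CentOS.
list_generic_python_shebangs() {
    dir=$1
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        first=
        # read just the first line of the file
        IFS= read -r first < "$f" || true
        case $first in
            '#!/usr/bin/python'|'#!/usr/bin/python '*)
                printf '%s\n' "$f" ;;
        esac
    done
}
```

Running "list_generic_python_shebangs /usr/bin" (and again for /usr/libexec) should show the candidates for either fix described above. Scripts that already say python2 or python3 are deliberately not listed.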

2) Never use a graphical console to update Linux

I have experienced several instances where updating software through the graphical interface fails for some reason then breaks the graphical interface (or the whole system). It should not surprise anyone that updating gnome-session, or any of its dependencies, might disturb the very session that is running yum or rpm.

So if you are on the graphical system console (which is almost always a VGA monitor) and want to move to a non-graphical console before running yum or rpm, try one of the following keystrokes:

key press     description                                      notes
CTRL ALT F1   switch to terminal 1 (the graphical interface)   only auto-graphical when runlevel >= 5
CTRL ALT F2   switch to terminal 2 (/dev/tty2)                 text only
CTRL ALT F3   switch to terminal 3 (/dev/tty3)                 text only
CTRL ALT F4   switch to terminal 4 (/dev/tty4)                 text only
CTRL ALT F5   switch to terminal 5 (/dev/tty5)                 text only
CTRL ALT F6   switch to terminal 6 (/dev/tty6)                 text only

The only other way to safely disable graphics is to lower the runlevel of your system to 3 (but only do this if you are certain that you won't kill some process currently needed by your customers). Alternatively, use ssh to log into your system via the network then execute yum in that session.

update: Even though CentOS-7 does not use "/etc/inittab", and the text notes contained within say to do everything with systemctl, the following commands worked for me from the console as well as a network connection:

$ runlevel	# display current run level
runlevel N 5
$ init 3	# console #1 switches over to text mode
$ runlevel
runlevel 5 3
$ init 5	# console #1 switches back to graphics mode
$ runlevel
runlevel 3 5

Caveat:
never init to a number below 3 over the network because that will kill the network so you WILL NOT be able to restore runlevel remotely
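
For anyone who prefers the systemctl route, the classic runlevels correspond to standard systemd targets. The sketch below only prints the equivalent "systemctl isolate" command and never runs it; the two function names are invented for this example.

```shell
# Map a classic SysV runlevel to the systemd target that replaces it.
# Assumption: the helper names are hypothetical; the target names are the
# standard systemd ones.
runlevel_to_target() {
    case $1 in
        0)     echo poweroff.target ;;
        1)     echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5)     echo graphical.target ;;
        6)     echo reboot.target ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# Print (do not execute) the systemctl command matching "init N".
show_isolate_command() {
    target=$(runlevel_to_target "$1") || return 1
    echo "systemctl isolate $target"
}
```

For example, "show_isolate_command 3" prints the command that should behave like "init 3". The same caveat applies: isolating a target below multi-user over the network will cut off your own session.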

3) Using Windows to access a Linux remote

The self-help blogs really fall down on this one because the only secure way to do this is to tunnel x-sessions over SSH. But whenever anyone on a self-help blog asks how to do this only using SSH, some idiot will chime in with a procedure on how to do it using VNC, RealVNC, TigerVNC or Vino which are all insecure.

To make matters worse, setting up a remote graphical session is almost impossible (at least under certain circumstances like Windows -> Red-Hat/CentOS) because GNOME3 contains 3-d extensions not found in Windows clients. The best way out of this is to set up Red-Hat/CentOS on a machine at the client end then use it to connect to the desired Linux platform.
 
comment: some conspiracy-minded people think this change was deliberately done to stop support professionals from using Windows as their default platform to support all others. They might be correct.

Xming

  • xming is a simple tool which is used in conjunction with a terminal emulator like: Tera Term or PuTTY

CygWin and CygWin/X

  • documentation: https://x.cygwin.com/docs/
    • https://x.cygwin.com/docs/ug/setup.html (jump to chapter 2.15)
    • x11 documentation tends to only speak of servers at each end (not true client server as one would normally think).
    • On the client side you must do this:
      • start an x11-server
      • connect through to the far-end
      • start a client over there which will open one, or more, x11 sessions back to your local server
  • Caveat: do not install cygwin/x without first reading the build instructions (a full build will not produce what you want and will waste your time and bandwidth)
  • connecting:
    action	: start your local x-server (on the start menu)
    	: task will appear on your horizontal task bar
    
    action	: start xterm
    	: it is one of the items associated with the x-server icon on the task bar
    
    type	: ssh -X name@fully-qualified-domain-name 
    	  (replacing "-X" with "-Y" is even better)
    action	: wait for the password prompt then enter it
    
    type	: xterm &
    action	: a new window should open
    
    type	: /gnome-weather &
    action	: a graphical weather app will pop up	
    
    type	: /gnome-session --disable-acceleration-check &
    action	: a new window manager should open (but does not)
    	: 1) the switch shown is only available in gnome starting with version 3.16
    	: 2) without it you may see something like this: 
    		"Oh no! Something has gone wrong.
    		A problem has occurred and the system can't recover.
    		Please log out and try again."
    	: 3) gnome3 requires advanced graphics so may never work this way;
    		try a desktop other than gnome 

4) Recovering a failed YUM update (2018-01-xx)

  • I recently had a yum update fail on the graphical interface (see note #2 above regarding why you should never do this)
  • YUM was in the middle of updating ~ 1200 items when the GUI collapsed. I waited 8-hours then rebooted. The system came up then attempted to start a graphics console which failed (could see an arrow cursor and nothing else)
  • I typed CTRL ALT F2 and was able to login as root (from here I reran YUM)

5) a recent YUM update broke our development box (2018-01-xx)

  • we use sqlcmd from msodbcsql (ms odbc sql) to access an old database (SQL Server 2005) running on an old OS (Windows Server 2003). The platform is located in another city and province; they never installed mandatory patches; I have no faith in them upgrading any time soon; we have been able to connect for 14-months.
  • a recent yum update on our development platform broke msodbcsql (the production platform continues to work)
  • checking logs on our production platform shows that we can only connect to the Windows box using the sslv2 protocol ???
  • testing on PROD
    # notes:
    # 1) well-known port 1433 is reserved for Microsoft SQL Server
    # 2) SQL Server 2005 appears to support ssl2 but nothing higher
    # 3) contrary to popular belief, as soon as you specify a username
    #    and password in your ODBC connect string (via sqlcmd) then
    #    the initial handshake will be encrypted
    #
    # this passes
    #
    openssl s_client -debug -state -connect ip-address:1433 -ssl2
    #
    # this fails
    #
    openssl s_client -debug -state -connect ip-address:1433 -ssl3
  • Platform Differences
    Platform      CentOS       Notes                      OpenSSL version   OpenSSL Notes
    Production    CentOS 7.3   (built 14-months ago)      OpenSSL-1.0.1e    ssl2 is supported
    Development   CentOS 7.4   (yum updated 2018-01-29)   OpenSSL-1.0.2k    ssl2 has been disabled prior to build
  • The simplest way out (at this time) is to build a fully-functional new version of OpenSSL-1.0.2k (in a local folder) then copy the binary to "/usr/bin" after renaming the old version (just being paranoid here). A good friend provided me with this link on the Ubuntu site (a Debian flavor) which seems to work properly on CentOS-7
  • https://wiki.openssl.org/index.php/Compilation_and_Installation
  • https://askubuntu.com/questions/893155/simple-way-of-enabling-sslv2-and-sslv3-in-openssl
  • Building a new version of OpenSSL (only in your own folder for now)
     
    wget https://openssl.org/source/openssl-1.0.2k.tar.gz
    tar -xvf openssl-1.0.2k.tar.gz
    cd openssl-1.0.2k/
    # --prefix will make sure that make install copies the files locally instead of system-wide
    # --openssldir will make sure that the binary will look in the regular system location for openssl.cnf
    # no-shared builds a mostly static binary
    ./config --prefix=`pwd`/local --openssldir=/usr/lib/ssl enable-ssl2 enable-ssl3 no-shared
    make depend
    make
    #
    # these next two steps are not required if openssl-1.0.2k already exists on your system.
    #
    make -i install
    sudo cp local/bin/openssl /usr/local/bin/
    #
    # test the newly created binary like so:
    ./apps/openssl s_client -debug -state -connect ip-address:1433 -ssl2
    # ...remembering that many other destinations will no longer accept ssl2
    # then rename the old binary (paranoid):
    mv /usr/bin/openssl /usr/bin/openssl-old
    # then copy the new binary:
    cp ./apps/openssl /usr/bin/openssl
    #
  • Optional steps:
    1. for proper sslv23 handshaking (especially true when you only have an older ODBC connect string with no way to specify ssl parameters) you need to also include the switch no-tls-1-2-client
    2. building with the no-shared switch is necessary for testing your binary in a non-standard location but will result in the program being ~ 6 times larger (around 3.4 MB). Changing to the shared switch will result in a binary size of ~ 600 KB.
Caveat: the procedure just given will only fix the OpenSSL CLI. Note that msodbcsql will still be broken because that software calls routines in the shared libraries. To fix msodbcsql you (supposedly) need to do one of the following:
  1. fully install an older version of OpenSSL (libraries and all) in a secondary location then ensure all scripts invoking sqlcmd look there
    • I built an older version of OpenSSL from source code then installed it in /opt/oldopenssl
    • all scripts starting sqlcmd first define LD_LIBRARY_PATH to point to /opt/oldopenssl/lib
    • although strace proves that msodbcsql is first looking in a secondary location, msodbcsql still does not work
  2. completely replace the new version of OpenSSL (libraries and all) with an older version
    • playing with yum downgrade openssl* has not yet worked but I think I may be close
  3. reinstall the previous OS (CentOS-7.2 in this case)

6) a recent update broke our production box (2018-06-xx)

One of our developers was experiencing problems developing a new LDAP-based application, so he invoked YUM to update LDAP on our production box. The big problem here is that the update was done in a careless way (without reading all the release notes). The LDAP update also updated OpenSSL for the whole system, so now we can no longer connect to that older Microsoft platform in Montreal (see item 5 above).

It now appears that we will need to install a third (older) CentOS platform whose only purpose would be to reach through to the older Microsoft platform. This platform would need to be modified so that it could never be updated.

7) Something is overwriting file "/etc/resolv.conf"

This problem is so weird that I'll stick to bullet points

  • Last month I set up a new CentOS-7 platform for use in a project we will turn up in Feb of 2019
  • I logged onto the console then used a GUI session to setup several network connections which included three corporate DNS references.
  • From this point on, logging into that platform was slow (10 second delays). This included using the "su" command once you were logged in.
  • One of my peers in a city 100km away discovered a typo in file "/etc/resolv.conf" which he fixed using a non-GUI login. The delays disappeared.
  • However, just logging into the console with my GUI session, or rebooting the box, caused the manual repair to be overwritten (10 second delays were back)
  • One needs to remember that Linux began its life as a personal computing platform and many programmers who work in this ecosystem still see it that way. This means there are all kinds of special hooks put in to support the GUI user.
  • If you drop this text "centos networkmanager overwrites resolv.conf" into a Google search then you will get a bunch of hits like this:
    https://ma.ttias.be/centos-7-networkmanager-keeps-overwriting-etcresolv-conf/
    where we learn that this has been going on since CentOS-6.
  • Apparently you are not supposed to enter DNS addresses into the GUI dialog for each NIC. If you do, the Network Manager will continue to copy this information from the NIC config files then overwrite "/etc/resolv.conf"
  • There are two ways you can fix this problem:
    1. use an editor to modify the appropriate network settings file(s) which is not recommended in case you make a typo
      • just "/etc/sysconfig/network-scripts/ifcfg-eno1" in my case
    2. use the GUI to remove all DNS references from all your active network configs then use an editor to edit "/etc/resolv.conf".
      • My file now looks like this:
        # NOT Generated by NetworkManager
        # /etc/resolv.conf
        options timeout:1
        options attempts:1
        options rotate
        options no-check-names
        search on.bell.ca
        nameserver 142.182.48.71
        nameserver 142.182.48.105
        #nameserver 142.113.87.152
      • you might consider making a copy of this file like so:
        cd /etc
        cp resolv.conf resolv.conf-copy
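
To see which NIC config files still carry DNS settings that Network Manager might copy back into "/etc/resolv.conf", something like the following can help. It is only a sketch: the function name list_ifcfg_dns is invented here, and it assumes the usual DNS1=, DNS2=, ... and PEERDNS= keys found in ifcfg files.

```shell
# Show DNS-related lines in the ifcfg-* files of one directory.
# Assumption: the function name is hypothetical; the default directory is the
# one mentioned in the text above.
list_ifcfg_dns() {
    dir=${1:-/etc/sysconfig/network-scripts}
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue
        # -H prefixes each hit with the file name
        grep -H -E '^(DNS[0-9]+|PEERDNS)=' "$f" || true
    done
    return 0
}
```

Calling "list_ifcfg_dns" with no argument scans "/etc/sysconfig/network-scripts". An empty result suggests that no per-NIC DNS entries remain.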

8) Our console device is totally dead (2019-08-xx)

We are running CentOS-7.2 on two HP-ML370-g5 servers (one PROD, one DVLP) which have been running for 30 and 24 months respectively without a reboot. These are older hardware platforms so I have been preparing to cut the whole thing over to two newer servers (HP-DL385p-gen8) next month. I just noticed I can't access the console on PROD.

command       result                     notes
CTRL ALT F1   screen turns solid blue    should be GUI mode
CTRL ALT F2   screen turns solid green   should be text mode
CTRL ALT F3   screen turns solid green   should be text mode

I need to point out that we can do anything else we want via a remote ssh terminal session over the network. In fact, the customers are unaware of the fact that anything is wrong.

I've tried everything (short of rebooting) including replacing the monitor and restarting various services (eg. "systemctl restart gdm.service") but it seems that the VGA port is locked up somehow.

SUGGESTION: every system admin must ensure that every system has at least one external network port configured -AND- that the firewall has been configured to permit ssh2 connections so you will be able to manage your platform if your VGA console is FUBARed.

9) The old system won't reboot but some files are needed (2019-09-xx)

This is a continuation of item-8 after a 1 month delay. Okay so the good news is this: we have acquired a replacement server and copied all necessary files to it. Since everything appeared to be running properly on the new server, I finished the day by rebooting the old server to see if my VGA port was still broken. The VGA port was not defective but now the system only boots part way then drops into emergency text mode, offering a few useless suggestions before presenting a root password prompt. Sometime later I got a call saying "we missed some files on the old server". Oops! So I tried rebooting again:

  • booting begins normally
  • I can see a nice solid gray GUI screen with a spinning white cursor so this was not a hardware problem
  • then the console crapped back to "text-only mode" with a prompt to choose between logging in as root or just continuing the boot process
    • I ran a few logs which did not help so I typed "exit" to allow the boot to continue
  • the console flipped to colored confetti in GUI mode with a red-orange spinning cursor; this would be okay if I could login over the network (I thought)
  • then the console crapped back to "text-only mode" with a prompt indicating to choose between logging in as root or just continuing the process
    • this system is not yet up; we cannot connect via the network
    • I can see the file system including the files which were missed
  • since this was an emergency, and my files were visible, I decided to try copying to a USB stick (a.k.a. thumb drive) but had never tried this before from the command line (it happens automatically in GUI mode).

How To Mount a USB stick (thumb drive) without GUI support
p.s. this also works with a one terabyte drive on a USB cable

  • before you install a thumb drive, first inspect the contents of /dev like so:
    cd /dev
    ls -ls sd* # most storage devices come up as sd letter number
  • you will see an "sd" device for every hard drive:
    sda   name of disk #1
    sda1  name of partition #1 (if one exists)
    sda2  name of partition #2 (if one exists)
    sda3  name of partition #3 (if one exists)
    sdb   name of disk #2
    sdb1  name of partition #1 (if one exists)
  • So on a system with only one hard drive, it is likely that inserting a USB stick will cause Linux to discover the device as "sdb" and its partition (if there is one) as "sdb1"
  • The following steps assume you are inserting a USB stick that was formatted as FAT32 via Windows then was discovered by Linux as sdb/sdb1
    mkdir -p /media/usb
    mount -t vfat /dev/sdb1 /media/usb
    ls -la /media/usb
    -----------------------------------
    cd /home/neil
    cp * /media/usb
    -----------------------------------
    umount /media/usb
  • for many operations (like: "cp -t") it may make more sense to first reformat the USB stick (or USB hard-drive) with a Linux file system like ext4
    Note: just perform one of "format whole device" or
          "format partition" then move to "common" 
    
    <<< format whole device (deletes any partitions) >>>
    mkfs.ext4 /dev/sdb	# uses the whole device
    mount /dev/sdb /media/usb
    ----------------------------
    <<< format partition #1 >>>
    mkfs.ext4 /dev/sdb1	# only format partition #1
    mount /dev/sdb1 /media/usb
    ----------------------------
    <<< common >>>
    set -ve			# verbose (echo each line), stop on error
    ls -la /media/usb
    rsync -a /etc /media/usb/etc
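
Guessing that the stick came up as "sdb" is risky on a box with several drives. One way to be sure is to snapshot the sd* names before and after inserting the device, then compare. This is a rough sketch with invented function names; the snapshot file paths are whatever you choose.

```shell
# Record the sd* names found in a directory (normally /dev) into a file.
# Assumption: both function names are hypothetical.
snapshot_sd_devices() {
    # $1 = directory to scan, $2 = snapshot file to write
    ( cd "$1" && ls -1d sd* 2>/dev/null ) | sort > "$2"
}

# Print the names present in the second snapshot but not in the first.
new_sd_devices() {
    # $1 = snapshot taken before inserting, $2 = snapshot taken after
    comm -13 "$1" "$2"
}
```

Typical use: run "snapshot_sd_devices /dev /tmp/before", insert the stick, wait a few seconds, run "snapshot_sd_devices /dev /tmp/after", then "new_sd_devices /tmp/before /tmp/after". Whatever it prints is what you should mount or format.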

10) Now it will boot after this fix (2019)

This is a continuation of items-8-9. I messed around with the old server (~ 30 minutes each day) following the on-screen suggestions after the GUI drops back to text mode during boot. Here is one of the messages presented to me:

Error initializing authority: Could not connect: No such file or directory (g-io-error-quark, 1)

I began Googling various pieces of the above phrase including "(g-io-error-quark, 1)" which took me to this link at askubuntu.com (even though this is a CentOS problem). That article implicates erroneous entries in "/etc/fstab". Apparently any mount failure during boot is considered fatal even though the basic root directory ( "/" ) is in good shape. So I used a text editor to disable my last line of "/etc/fstab" then rebooted. The system came right up.

p.s. that one line I disabled in fstab was pointing at a disk which had been unmounted and deslotted shortly after the first boot 30 months ago.
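
A stale entry like this can be spotted before the next reboot. The sketch below (the function name is made up) lists "/etc/fstab" entries whose "/dev/..." device node no longer exists. It does not check UUID= or LABEL= entries, so treat an empty result as a hint rather than proof.

```shell
# Print "device mountpoint" for fstab entries whose /dev/... node is missing.
# Assumption: the function name is hypothetical; UUID= and LABEL= entries
# are skipped, not verified.
list_missing_fstab_devices() {
    fstab=${1:-/etc/fstab}
    while read -r device mountpoint rest || [ -n "$device" ]; do
        case $device in
            ''|'#'*) continue ;;    # skip blank lines and comments
            /dev/*)
                [ -e "$device" ] || printf '%s %s\n' "$device" "$mountpoint" ;;
        esac
    done < "$fstab"
}
```

Any line it reports should be either commented out or, if the disk is expected to come and go, given the "nofail" mount option so that a missing device does not stop the boot.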

Caveat: I have seen one situation where log files written to files under "/var/log" had filled the associated partition (some files like wtmp and others under "/var/log/gdm" can grow forever if your system hasn't been rebooted for a while). Type "df -h" to inspect disk free space. If near full, and if an emergency, you might consider deleting some of the larger log files before you reboot. Use this command to display files larger than 2MB

find /var/log -size +2M -exec ls -la {} \; 

 

11) Doing better backups for faster system recovery (2019-09-xx)

  • If your business requires 100% uptime then you will probably need to go to some sort of cloud-based solution
    • you could completely outsource to companies like Amazon Web Services ( https://aws.amazon.com ) which will make you a short term hero but eventually put you out of work
    • or you could build your own cloud solution using products like OpenStack
  • If you are not ready to jump to a cloud then you should harden your existing stuff
  • I thought I was doing due diligence when it comes to doing backups (and restores) but my real-world items 8-10 (above) prove that I was not
  • For most businesses in 2019, I should not need to mention that hardware is now relatively inexpensive -AND- operating systems almost free (at least this is the case for the open-source variety) so you could install more stand-by platforms if you can't afford to lose your primary system for too long (say 15 minutes).
  • Prior to 2019-09-xx we only ran two systems, PROD (which is our production platform) and DVLP (which is our development and qualification platform). We were doing daily backups to magnetic media on a 14-day rotation but here you can see the big problem: losing PROD means one of the following:
    • recover the platform from magnetic media (will take a very long time)
    • build a new platform from optical media then apply changes from magnetic media
    • apply changes from magnetic media to DVLP then divert your customers there (but then you would still need to develop a plan to go back)
    • or a fourth option described next
  • Since then, I have installed four more systems (two local; two very remote) and now use rsync to copy changes (twice a day) to backup locations from which we can do rapid restores by just copying

New Block Diagram

   PROD (Linux)          DVLP (Linux)       4 OpenVMS systems 
+-----------------+  +-----------------+  +-------------------+
| primary         |  | primary         |  | 4 OpenVMS systems |
+-----------------+  +-----------------+  +-------------------+

+-----------------+  +-----------------+
| local stand by  |  | local stand by  |
+-----------------+  +-----------------+

+-----------------+  +-----------------+
| remote stand by |  | remote stand by |
+-----------------+  +-----------------+ 
  • primary employs rsync to copy to local stand by (same data facility) several times a day
  • primary employs rsync to copy to remote stand by (a different city more than 100-km away) several times a day
  • All Linux systems are currently running CentOS-7.7 with Apache and MariaDB
    • having a local stand by can provide peace of mind when you wonder if the next YUM update might break something
    • unlike Amazon or Alibaba, these systems do very little between 21:00 and 8:00
    • this scheme is also useful when migrating to newer server hardware
  • The box labeled "4 OpenVMS systems" represents four OpenVMS platforms
    • these machines used to do daily backups to tape which were delivered off site (M-F, excluding holidays)
    • Now, these machines copy their backups into a folder on "DVLP Linux primary" which are then rsync'd to local standby and remote standby every day

12) Python3 caching is currently broken on most Linux distros running SELinux (2019-10-xx)

caveat: this problem covers web applications using Python3 directly (i.e. when not using Django or WSGI)

  • first off, read up on Python3 caching (compiled bytecode is saved as .pyc files under a __pycache__ folder next to the script)
  • now imagine an Apache process running Python3 script /var/www/cgi-bin/file (without a file extension)
  • Until the problem is fixed you only have a few options:
    1. place SELinux in permissive mode.
      1. this isn't as bad as it sounds provided this is done temporarily; test your web-services via Apache then ensure that Apache has compiled-cached all the Python3 scripts; then shift SELinux back into enforcing mode
      2. remember to do this every time you update any Python3 scripts -or- do major Python3 upgrades (like from 3.6 to 3.8)
    2. inspect the suggestions SELinux has written to /var/log/messages then craft your own temporary fix (these are always written, even when SELinux is in permissive mode)
    3. live with the problem (Python3 will run like Python2) but remember that every transaction may consume an additional 10-15 ms since you will always be compiling but never caching

update: as of 2020-04-30 I have not seen any movement on this problem for CentOS-7.7 but I have heard rumors of a beta RPM for CentOS-8.
update: on 2020-05-13 this patch was available from the CentOS repositories: libselinux-python3.x86_64 (ver2.5-15.el7)
comment: perhaps it is not unreasonable to wait 8 months for a patch on unsupported software (CentOS). Although "Cent" supposedly means "community enterprise", companies requiring faster service would be advised to move to RHEL along with a support agreement.

13) FUBAR with USB Audio (2019-11-xx)

test #1

  • build a CentOS-7.7 system using recipe "Server with GUI"
  • log into the GUI console with any priv account other than root
    • insert any USB audio device (I tried these two):
      • Logitech S150 Digital USB Stereo Speakers
      • Ugreen USB to Audio
    • audio testing from any software app (including Gnome audio settings) produces no audio
  • now log out then back in as root
    •  audio testing now works properly

test #2

  • build a CentOS-7.7 system using recipe "Gnome Desktop"
  • log into the GUI console with any priv account other than root
    • insert any USB audio device
    • audio testing works

comments:

  • USB storage devices (thumb-drives, hard drives, and DVD/CD drives) always work properly (they are owned by the GUI session logged into the console) so I wonder why this doesn't always happen with USB Audio devices. Perhaps this could be fixed by making entries in one of the sudo files under /etc
  • After you get control of your audio device from Gnome Audio Settings, audio streaming from internet radio stations only works properly from Google Chrome (78) but not Firefox (68)
  • installing Google Chrome:
  • Firefox Linux updates:
    • Firefox version 78.11.0 now works (tested 2021-06-30 with CentOS-7.9 and Rocky Linux-8.4)

14) One LVM volume is too big, the other too small

  • Our production CentOS systems have been up and running for 560 days with not too many difficulties
  • Our primary PROD and DVLP platforms are implemented on HP DL385p_gen8 servers with 8-drives configured as a single RAID-60 volume with 1-TB of space
  • I was a Linux newbie when I installed CentOS-7 on these machines so went with the suggested partitioning and LVM (a big mistake). This means that the LVM representing slash (a.k.a. root) is sitting at 50-GB whilst the LVM representing slash-home is sitting at 950-GB (this would be okay if we were in a university with a lot of interactive users; but we only have 3 interactive accounts and 6 SAMBA shares)
  • The problem here is that our MariaDB database (an alternate fork of MySQL) has grown to the point that we've only got 30% free space on the root LVM
    caveat: I was a little wiser when I set up the 4 backup systems (2 local, 2 remote). On these machines I instructed the installer to do a 50-50 split between root and slash-home.
  • According to this document it should be easy to free up space on one LVM then apply it to the other LVM while the system is running
    https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/assembly_modifying-logical-volume-size-configuring-and-managing-logical-volumes
    but we have a problem. Apparently you cannot use lvreduce on an xfs formatted volume. Note that xfs is what I see when I type either one of these commands:
    command               explanation
    df -Th                display disk free space in human units with file system Type
    mount | grep mapper   display mount points (1)
    mount | grep centos   display mount points (2)
  • Also ignore all internet advice claiming you can reduce the size of an xfs volume because it is not possible. "xfs" is really fast because it was only made to grow.
  • At this point we've got three or four options:
    1. shutdown the application then take the database offline; move the database from the root LVM to the slash-home LVM; restart the application
      note: I've got a few example procedures which involve about 12 steps when SELinux is present
    2. Backup the LVM volume associated with "/dev/mapper/centos-home" then delete it. Create a smaller version then restore your backup into it
      note: the application does not need to be taken off line
      #   step                                        comments
      1   a) log on as root                           interactive non-root users require resources in "/home" which we want to dismount
          b) make sure no interactive users           (so you must not log into a non-root account then use "su" or "sudo")
             are logged in
          c) stop-disable any cron jobs which
             may need "/home"
          d) stop the SAMBA service                   command: sudo service smb stop
      2   mkdir /hack                                 these two lines can only be used if you have sufficient space on "/"
          rsync -aX   /home   /hack                   need "-X" to ensure we also get meta data (includes SELinux stuff)
          ---
          du -a /home                                 check the src file count
          du -a /hack/home                            verify the dst file count (do they match?)
      3   umount  /dev/mapper/centos-home             un-mount this volume
      4   lvremove  /dev/mapper/centos-home           delete the volume (danger; you will pass the point of no return)
      5   lvcreate -L 400GB -n home centos            create a smaller replacement LVM (keep 100 GB free for some unforeseen need)
      6   mkfs.xfs -f /dev/centos/home                format the volume with xfs
                                                      note1: here the "-f" switch means "force"
                                                      note2: normally I would have used this command "mkfs -t xfs /dev/centos/home"
                                                             but it does not support "-f"
      7   mount -a                                    mount everything in "/etc/fstab" which is not mounted
                                                      alternatively use this command: "mount  /dev/mapper/centos-home  /home"
      8   lvextend -r -L500GB  /dev/mapper/centos-root   extend this volume to 500 GB; use "-r" to resize
                                                      without "-r" you will only see the new size in "lvs"
                                                      with "-r" you will also see the new size in "df -h"
      9   rsync -aX   /hack/home   /                  restore the contents of the LVM associated with /home
          du -a /hack/home                            check src file count
          du -a /home                                 verify dst file count (do they match?)
      10  start the SAMBA service                     is it running?
                                                      command: sudo service smb start
                                                      command: sudo service smb status
    3. Both our LVM volumes are sitting on the same RAID-60 volume so having two LVM volumes is surely redundant. We will back up the LVM associated with "/home" then delete it via lvremove. We also need to delete the "/home" mount point (it was created with the "-p" switch) via rmdir. Now just restore the backup into "/home"
      Caveat: remember to edit "/etc/fstab" then disable the line where "/dev/mapper/centos-home" was mounted as "/home". Why?
      The system may not properly boot to a functional state without this step. See tip-10 above
      note: the application does not need to be taken offline
    4. shut down the application then take the database offline; take the production server offline; take the production backup server offline; swap node names and IP addresses; bring everything back up
      note: this will always be the default action if any of the above operations fail for whatever reason
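The "do they match?" checks in option 2 (steps 2 and 9) could be scripted before running lvremove. This is only a minimal sketch: it counts entries with find rather than the "du -a" used above, and the function names count_entries and same_entry_count are hypothetical helpers, not part of the original procedure.

```shell
#!/bin/sh
# count_entries DIR: number of files, directories and links below DIR
# (counts NUL terminators so odd file names cannot skew the result)
count_entries() {
    find "$1" -mindepth 1 -print0 | tr -dc '\000' | wc -c | tr -d ' '
}

# same_entry_count SRC DST: succeed only when both trees hold the same number of entries
same_entry_count() {
    [ "$(count_entries "$1")" -eq "$(count_entries "$2")" ]
}

# Example with the paths from option 2 (run as root):
#   same_entry_count /home /hack/home && echo "counts match" || echo "MISMATCH: do not run lvremove"
```

A matching count does not prove the contents are identical; it only catches the obvious case of a truncated copy.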
Preliminary Steps
  • I downloaded a copy of Oracle VM VirtualBox from here ( https://www.oracle.com/virtualization/virtualbox/ ) then installed it on my PC
  • I next created a Red-Hat virtual machine (make sure your virtual hard disk is at least 100-GB in size for this 2-LVM experiment) then installed CentOS-7.7 into it with the default options (about 50-GB was assigned to each LVM)
    • Testing Option-2
      • preliminary tests worked as I can log in with my non-root user account
  • I next installed CentOS-7.7 into a surplus server (HP DL385p_gen8 with single 1-TB disk volume) in my lab
    • Testing Option-2 (attempt 1)
      • preliminary tests worked as I can log in with my non-root user account
      • secondary tests failed (I cannot start my Chrome browser or listen to any audio from NPR radio stations); this appears to be an SELinux problem since the event logs are full of those kinds of messages
        • logged back on as root then ran this SELinux command which (I think) fixed my account problems because my Chrome browser now works properly
          restorecon -rv /home
          but it would have been better to not rely on something like this which is done after-the-fact
        • I realized that I should have been using the "-X" switch in all my rsync operations so I updated the table above
      • I will repeat the total Option-2 experiment tomorrow
    • Testing Option-2 (attempt 2)
      • preliminary tests worked as I can log in with my non-root user account
      • secondary tests passed as my non-root account can use the browser with streaming audio (yay!)

Next Steps

  • time to try this on our DVLP box (I need to do it on a Saturday when no interactive users are locking files in slash-home)
  • 2020-08-15: executed option-2 (above) on node "kawc4n" (DVLP); appears to be 100% successful; the whole procedure took a little less than 2-hours because I had to restore 205-GB over a 1-Gb/s Ethernet
  • 2020-08-22: executed option-2 (above) on node "kawc0f" (PROD); appears to be 100% successful; the whole procedure took a little less than 1-hour because the contents of slash-home were only 5-GB so I first performed an rsync backup to a local folder
  • conversions complete!
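For planning a similar maintenance window, the line-rate floor for a network restore is simple arithmetic. This is a rough sketch with a hypothetical helper name (min_transfer_seconds); it assumes decimal gigabytes and ignores protocol overhead, disk speed and small-file costs, so real restores take longer.

```shell
#!/bin/sh
# min_transfer_seconds SIZE_GB LINK_GBPS
# 1 byte = 8 bits, so SIZE_GB gigabytes is SIZE_GB * 8 gigabits
min_transfer_seconds() {
    echo $(( $1 * 8 / $2 ))
}

min_transfer_seconds 205 1    # prints 1640 (about 27 minutes at full line rate)
```

The 205-GB restore on DVLP therefore could not have finished in much under half an hour, which is consistent with the whole procedure taking a little less than 2 hours.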

15) Yum is failing to initialize on one system of six (2020-09-xx)

I am running 6 servers (one PROD, one DVLP, two local shadows, two remote shadows) but YUM is failing to initialize on the oldest unit (both PROD and DVLP were built in 2018 as CentOS-7.5 then YUM updated to CentOS-7.6)

Now inspect the following (pay attention to the red text - especially the last one just before the final prompt)
[root@kawc0f /]# yum makecache fast
Loaded plugins: fastestmirror, langpacks
Determining fastest mirrors
epel/x86_64/metalink                                    |  16 kB  00:00:00
Could not retrieve mirrorlist ... ... https://mirrors.iuscommunity.org/mirrorlist?repo=ius-centos7&arch=x86_64&protocol=http error was
14: HTTPS Error 404 - Not Found

 One of the configured repositories failed (Unknown), and yum doesn't have enough cached
 data to continue. At this point the only safe thing yum can do is fail. There are a few
 ways to work "fix" this:

     1. Contact the upstream for the repository and get them to fix the problem.

     2. Reconfigure the baseurl/etc. for the repository, to point to a working upstream. This is
        most often useful if you are using a newer distribution release than is supported by the
        repository (and the packages for the previous distribution release still work).

     3. Run the command with the repository temporarily disabled
            yum --disablerepo=<repoid> ...

     4. Disable the repository permanently, so yum won't use it by default. Yum will then just
        ignore the repository until you permanently enable it again or use --enablerepo for
        temporary usage:

            yum-config-manager --disable <repoid>
        or
            subscription-manager repos --disable=<repoid>

     5. Configure the failing repository to be skipped, if it is unavailable. Note that yum will
        try to contact the repo. when it runs most commands, so will have to try and fail each
        time (and thus. yum will be be much slower). If it is a very temporary problem though,
        this is often a nice compromise:

            yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true

Cannot find a valid baseurl for repo: ius/x86_64
[root@kawc0f /]#

Now Google the red quoted phrase "Cannot find a valid baseurl for repo: ius/x86_64" which returns hits like this:
https://github.com/iusrepo/announce/issues/18
where we learn that many of the repositories have moved from the ".org" domain to the ".io" domain. Apparently support for ".org" ended in April-2020.
Back in the day you would need to execute something like this:
yum install -y https://centos7.iuscommunity.org/ius-release.rpm

but this fixed my problem:
yum install -y https://repo.ius.io/ius-release-el7.rpm
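Before or after installing the new release package, it may help to see which repo files still point at the retired domain. A minimal sketch, assuming the repo definitions live in files ending in ".repo"; the function name stale_ius_repos is hypothetical.

```shell
#!/bin/sh
# stale_ius_repos DIR: list the .repo files in DIR that still mention iuscommunity.org
# (prints nothing and returns non-zero when no file matches)
stale_ius_repos() {
    grep -l 'iuscommunity\.org' "$1"/*.repo 2>/dev/null
}

# Example: stale_ius_repos /etc/yum.repos.d
```

An empty result after the fix suggests no leftover definitions are still pointing at the old ".org" mirrors.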

16) The future of CentOS (the invisible hand of the market?)

Comments:
  • Scientific Linux (SL) first appeared in 2004 and was popular among scientists working at Fermilab, CERN and DESY, to name just three of many. The success of CentOS convinced many organizations to switch from Scientific Linux to CentOS. For example, CERN (the home of the LHC) began favoring CentOS in 2015 although SL was still supported at CERN as of Dec-2020. Fermilab announced in 2019 that no new major versions of Scientific Linux would be developed (this was before IBM completed its acquisition of Red Hat).
  • When it comes to Linux, my employer uses RHEL for all customer-facing production platforms and CentOS for everything else (this includes application development, user acceptance, hands-on Linux training, etc.). CentOS was also being used as an on-ramp for driving UNIX projects onto RHEL platforms. It appears that IBM has thrown a monkey-wrench into those plans. I have no idea what the future holds but history can be instructive. Recall that when Michael Widenius and others didn't like where SUN was taking MySQL, they created MariaDB (that decision seems prescient after Oracle acquired SUN; then promised the EU not to kill MySQL; then slowed MySQL bug fixes for more than a year until they noticed that the Linux community was preferentially installing MariaDB).

17) All four GUI consoles are locked up

caveat: all work here was done from a non-GUI session (usually a network connection)

  • Due to the COVID-19 pandemic, I only take a trip into the office once a week (usually Friday afternoons) just to ensure the environment is secure.
  • We are running four CentOS servers (all: HP DL385p Gen8; one PROD; one DVLP; a hot standby for each)
  • On my Friday afternoon routines I always check the drive LEDs but never check the consoles which are usually dark (in energy-saving mode)
  • During my walk-through today:
    • I noticed PROD-console was not dark AND contained a solid charcoal-colored background with some green OK messages which (I think) came from systemctl. I was able to get a non-GUI login prompt after hitting the CTRL-ALT-F2 combo (F3 and F4 worked as well)
    • I noticed DVLP-console was not dark AND contained a solid-black background with some white text messages which (I think) came from dmesg. I was able to do the CTRL-ALT-F2 thing here as well
    • The two hot standby units only presented text-prompts (no GUI).
      • Logging onto them then typing startx did not bring up a GUI. Typing systemctl get-default returned "graphical.target". Typing systemctl isolate graphical.target did nothing.
    • All units had not been running for more than 400 days.
  • Step-1 (hot standby units)
    • I tried both yum check-update and yum update but nothing was offered (perhaps these Linux instances were too old)
    • since no humans were logged into these, I used yum upgrade to bring them up from CentOS-7.7 to CentOS-7.9 then rebooted
    • Still no GUI on the consoles (even after logging in then typing startx) so I typed this:
      1. yum groups install "GNOME Desktop" (why wasn't this stuff updated during the OS upgrade?)
      2. systemctl isolate multi-user.target (equivalent to setting runlevel=3 on UNIX boxes)
      3. systemctl isolate graphical.target (equivalent to setting runlevel=5 on UNIX boxes)
    • Now the GUI auto-magically appeared on both consoles.
  • Step-2 (DVLP and PROD)
    • People were logged on here so a reboot was not possible.
    • So I repeated steps 2-3 above which moved TTY1 out of GUI mode but could not put it back.
    • Next, I repeated steps 1-3 which brought back the console GUI.

going forward

  • We employ graphical consoles for the odd time that we prefer to do something quickly (like making changes to the software firewall or reconfiguring a NIC). But this is a data center where console devices usually do not exist. On top of that, it is becoming apparent to me that GUI consoles are more trouble than they are worth so I'm going to permanently move these systems from graphical.target to multi-user.target with the hope that a simple startx will be all that is required for occasional graphical support at the console:
    systemctl isolate multi-user.target        # this command makes immediate changes
    systemctl set-default multi-user.target    # this command will affect the next reboot
  • caveat: remember that you will need to log out twice. Once from GUI mode then once from text mode
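The runlevel equivalents mentioned in Step-1 can be written down as a small lookup, which is handy in scripts that must report the old-style number. A minimal sketch covering only the common targets; the function name target_to_runlevel is hypothetical.

```shell
#!/bin/sh
# target_to_runlevel TARGET: print the SysV runlevel that corresponds to a systemd target
target_to_runlevel() {
    case "$1" in
        poweroff.target)   echo 0 ;;
        rescue.target)     echo 1 ;;
        multi-user.target) echo 3 ;;
        graphical.target)  echo 5 ;;
        reboot.target)     echo 6 ;;
        *) echo "unknown target: $1" >&2; return 1 ;;
    esac
}

# Example: target_to_runlevel "$(systemctl get-default)"
```

After the set-default command above, the example would be expected to report 3 rather than 5.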

18) Finally solved a slow-response problem (2021-06-18)

  • I have been running six CentOS systems for the past few years (I employ RSYNC multiple times a day to keep local-copy and remote-copy systems reasonably up-to-date)
    humans here      no humans here      no humans here
    PROD             local-copy          remote-copy
    DVLP             local-copy          remote-copy
  • DVLP-remote-copy has never worked properly from an interactive point of view although RSYNC jobs to it are fine
    • symptoms:
      • after typing either "su -" or "sudo" I only see a password prompt after 5-10 seconds
      • typing "top -d 0.5" refreshes the display every 3-4 seconds rather than twice a second
      • lots of canary messages from rtkit-daemon in /var/log/messages like this one:
        Jun 4 16:46:00 bfdc0e rtkit-daemon[1021]: The canary thread is apparently starving.Taking action.
    • solution
      • for a time I thought the canary messages were associated with a bad USB device so I had someone unplug the console mouse and keyboard but this did nothing (the remote location is 160 km away)
      • for a time I thought it might be a BIOS problem since all my working machines employed a BIOS from 2014 whilst this one was from 2016. HP's BIOS release notes contain a lot of references to timing issues associated with AMD processors so I decided to hack (er, play)
        • I typed the 'lscpu' command so I could see which cores were where
        • next, I disabled all cores associated with the second CPU and this fixed my slow-response problems.
        • use the man pages to learn how to disable CPU cores or review the lines in this BASH script: cpu_control.sh
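The contents of cpu_control.sh are not reproduced here, so the following is only a sketch of one way to do it, not that script. It assumes the standard sysfs hotplug files (/sys/devices/system/cpu/cpuN/online) and the parseable output of "lscpu -p=CPU,SOCKET"; the function names are hypothetical. It prints the commands rather than running them.

```shell
#!/bin/sh
# cpus_on_socket SOCKET: read "lscpu -p=CPU,SOCKET" output on stdin and
# print the logical CPU numbers that belong to SOCKET (one per line)
cpus_on_socket() {
    awk -F, -v s="$1" '!/^#/ && $2 == s { print $1 }'
}

# offline_commands SOCKET: print (do not run) one sysfs write per logical CPU
offline_commands() {
    cpus_on_socket "$1" | while read -r cpu; do
        echo "echo 0 > /sys/devices/system/cpu/cpu${cpu}/online"
    done
}

# Example (as root, review the output before running any of it):
#   lscpu -p=CPU,SOCKET | offline_commands 1
```

Writing 1 to the same file brings a core back online, and the change does not survive a reboot, which makes this a fairly safe experiment on a remote machine.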

19) Updating an old installation in stages (2021-10-xx)

caveat: this problem is ongoing (so the following proposed solution is untested)

  • I have two old systems running CentOS-7.3 which appear to be too old to upgrade via yum
    (I have never seen this happen before so do not know (yet) what is going on; but I am seeing a lot of errors mentioning PROTECTED MULTILIB VERSIONS)
  • these two commands fail as yum tries to update directly to CentOS-7.9
    yum upgrade
    yum upgrade --skip-broken
  • I am going to attempt to update this system in stages as described here: https://digitolle.wordpress.com/2017/10/26/how-to-upgrade-centos-to-a-specific-version/
    where we will modify this file: /etc/yum.repos.d/CentOS-Base.repo
  • steps:
    1. first visit this site to see what you need: https://vault.centos.org/ (when updating to 7.4 you need to specify 7.4.1708)
    2. modify yum config files to only point at the centos vault:
      su -
      cd /etc/yum.repos.d
      cp CentOS-Base.repo CentOS-Base.repo-old
      vi CentOS-Base.repo
      # just comment the mirror list AND uncomment the baseurl in four places
      [base]
      ...
      #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=os
      baseurl=http://vault.centos.org/$releasever/os/$basearch/
      ...
       
      [updates]
      ...
      #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=updates
      baseurl=http://vault.centos.org/$releasever/updates/$basearch/
      ...
       
      [extras]
      ...
      #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=extras
      baseurl=http://vault.centos.org/$releasever/extras/$basearch/
      ...
       
      [centosplus]
      ...
      #mirrorlist=http://mirrorlist.centos.org/?release=$releasever&arch=$basearch&repo=centosplus
      baseurl=http://vault.centos.org/$releasever/centosplus/$basearch/
      ...
    3. now type this:
      yum upgrade --releasever=7.4.1708
    4. reboot then repeat all these steps for each required version (the yum config file will be clobbered each time)
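Because the repo file is clobbered after each stage, the manual vi edit in step 2 has to be repeated every time, so it may be worth scripting. A minimal sketch, assuming GNU sed and assuming the stock file carries commented lines of the form "#baseurl=http://mirror.centos.org/centos/..." (check your own file first); the function name point_repo_at_vault is hypothetical.

```shell
#!/bin/sh
# point_repo_at_vault FILE: keep a copy as FILE-old, then edit FILE in place:
#   - comment out every mirrorlist line
#   - uncomment every baseurl line and point it at vault.centos.org
point_repo_at_vault() {
    cp "$1" "$1-old" &&
    sed -i \
        -e 's|^mirrorlist=|#mirrorlist=|' \
        -e 's|^#baseurl=http://mirror.centos.org/centos/|baseurl=http://vault.centos.org/|' \
        "$1"
}

# Example (as root):
#   point_repo_at_vault /etc/yum.repos.d/CentOS-Base.repo
#   yum upgrade --releasever=7.4.1708
```

Like the rest of this section, this is untested against a real CentOS-7.3 system.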

20) rpm hack to get mpack + munpack on CentOS-7

  • I am attempting to move an inbound email processing interface from OpenVMS to CentOS-7
  • The old application is dependent upon munpack (mime-unpack) which is not available on CentOS-7 but was available on CentOS-6, UNIX and Windows
  • To make matters worse, if you drop these two quoted words: "centos 7" "munpack" into a Google search, you will be referred to a Red Hat developer site where you are instructed to use uudecode which can only be used on pieces of an email after it has been pulled apart (so uudecode does not come close to being a drop-in replacement for munpack)
  • Another promising program is ripmime but what do you do if you really want to stick with munpack?
  • I decided to first look at the application source files which were first published in UNIX. Then I looked at the CentOS-6 rpm file to see if it would be easier to modify it for use with CentOS-7. And that is when I discovered two binary executable files which can be used as-is:
    • I tested them on my CentOS-7 and they worked without any complaint
    • even though the command "munpack -?" only shows switches '-f' and '-q', the program supports '-t' (text try harder) which means this implementation is really version 1.6-2b which is seen on some OpenVMS systems
  • Here are my raw notes:
    caveat: ('el6' means 'enterprise linux 6' so the rpm is for CentOS-6 or RHEL-6):
    ======================================================================================
    title  : mpack_notes.txt
    author : Neil Rieck
    created: 2021-11-17
    edit   : 2021-11-18 
    notes  : the 'munpack' utility is also found here
    platform: CentOS-7
    stanzas:
    1) playing with file mpack-1.6.tar.gz              (mpack + munpack for UNIX + Windows)
    2) playing with file mpack-1.6-2.el6.rf.x86_64.rpm (mpack + munpack for CentOS-6)
    ======================================================================================
    1) mpack-1.6.tar.gz
    
    tar -tvf mpack-1.6.tar.gz		# list(test) verbosely (f=mpack-1.6.tar.gz)
    tar -xvf mpack-1.6.tar.gz		# extract verbosely (f=mpack-1.6.tar.gz)
    					# note: creates folder 'mpack-1.6' 
    tar -xvf mpack-1.6.tar.gz -C yada	# place output in folder 'yada'
    ======================================================================================
    2) mpack-1.6-2.el6.rf.x86_64.rpm
    
    mkdir                            mpack_rpm_hack		# create directory
    cp mpack-1.6-2.el6.rf.x86_64.rpm mpack_rpm_hack		# copy file to folder
    cd                               mpack_rpm_hack		# move into folder
    rpm2cpio mpack-1.6-2.el6.rf.x86_64.rpm | cpio -idmv	# extract contents
    tree --charset="ascii"					# see the mess
    	.
    	|-- mpack-1.6-2.el6.rf.x86_64.rpm
    	`-- usr
    	    |-- bin
    	    |   |-- mpack
    	    |   `-- munpack
    	    `-- share
    	        |-- doc
    	        |   `-- mpack-1.6
    	        |       |-- Changes
    	        |       |-- INSTALL
    	        |       |-- README.mac
    	        |       `-- README.unix
    	        `-- man
    	            `-- man1
    	                |-- mpack.1.gz
    	                `-- munpack.1.gz
    
    	7 directories, 9 files
    	[neil@kawc4n mpack_rpm_hack]$
    ./usr/bin/munpack -? # test the binary as-is
    munpack version 1.6 # yay!
    usage: munpack [-f] [-q] [-C directory] [files...]

Neil Rieck
Waterloo, Ontario, Canada.