Linux Notes: Real-world problems
- The information presented here is intended for educational use by qualified computer technologists.
- The information presented here is provided free of charge, as-is, with no warranty of any kind.
- Edit: 2022-03-10
Real-world Linux Problems
1) We cannot install or update via YUM
We have two CentOS-7 platforms: one for development and one for production (comment: two platforms may
not be enough when using Linux; see the IBM-Managed warning after this
section). The recommended approach is to first install (or update) software on the development box. If testing over the next few
days (or few weeks) proves that everything is working properly, then we repeat the procedure on the production box. This
also keeps both platforms more-or-less in sync.
I wanted to install the tree utility so I logged onto our DVLP
platform where I entered this command:
sudo yum install tree
... which worked properly.
Then I repeated this command on our PROD platform, where it failed with numerous errors associated with file /usr/libexec/urlgrabber-ext-down,
which is a Python script. What was worse was this: you could not execute "firewall-cmd" or most yum
commands, including "yum check-update". Investigating further, I noticed that someone had installed
python3 then updated the symbolic link so that typing a python command pulls up python3 rather than python2 (most Linux administrator utilities in 2018 require Python 2.7, and this has not changed in 2019 or
2020).
There are only two ways out of this problem (remember that this is an active business system).
- modify the first line (the shebang or sharp-bang line) of each broken system script from
"#!/usr/bin/python" to "#!/usr/bin/python2"
-or-
- modify the symbolic link for "python" and point it back at the symbolic link "python2", which probably points to "python2.7".
This will restore the system to its previous functionality but could break something if newer customer scripts require
python3 to be the default. So before modifying the symbolic link, change the shebang line of those customer scripts from
"#!/usr/bin/python" to "#!/usr/bin/python3"
Tip: if this is an emergency, just make minimal changes. For now, just modify the
scripts for whatever is broken (eg. yum or firewall-cmd). But you will eventually need to put everything back to a pristine
state. If your stuff needs python3 then you need to rely upon the shebang line. I have no idea why the Linux maintainers didn't do this
for their scripts requiring Python2. They broke their own golden rule.
# 1 ) partial example of a system with two versions of python
# 2a) notice that "python" is pointing to "python2"
# 2b) notice that "python2" is pointing to "python2.7"
# 3 ) utilities requiring python2 (like yum and firewall-cmd)
# should say so on the shebang line of those scripts
#
$ cd /usr/bin
$ ls pytho* -la
lrwxrwxrwx. 1 root root 7 Jan 12 15:25 python -> python2
lrwxrwxrwx. 1 root root 9 Dec 20 2016 python2 -> python2.7
-rwxr-xr-x. 1 root root 7136 Nov 5 2016 python2.7
lrwxrwxrwx. 1 root root 9 Apr 12 2017 python3 -> python3.4
-rwxr-xr-x. 2 root root 11312 Jan 17 2017 python3.4
...snip...
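The two repairs described above (find the scripts that still use the bare shebang, then pin them to an explicit interpreter) can be sketched with ordinary shell tools. This is only a minimal sketch: the function names list_bare_python_shebangs and pin_shebang are made up for illustration, and the ".orig" backup suffix is my own assumption, not anything yum or CentOS provides.

```shell
# List files in a directory whose first line is the bare
# "#!/usr/bin/python" shebang (with or without trailing arguments).
list_bare_python_shebangs() (
    dir="$1"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        if head -n 1 "$f" 2>/dev/null | grep -Eq '^#! */usr/bin/python( |$)'; then
            printf '%s\n' "$f"
        fi
    done
)

# Pin one script to an explicit interpreter, keeping a backup copy
# named <script>.orig (hypothetical naming convention).
# Example: pin_shebang /usr/bin/yum /usr/bin/python2
pin_shebang() (
    script="$1"
    interp="$2"
    cp -p "$script" "$script.orig" || exit 1
    # Only line 1 is touched; any arguments after the interpreter are kept.
    sed -e "1s|^#! */usr/bin/python\$|#!$interp|" \
        -e "1s|^#! */usr/bin/python |#!$interp |" \
        "$script.orig" > "$script"
)
```

Run list_bare_python_shebangs against a directory such as /usr/bin or /usr/libexec first, review the list, and only then call pin_shebang (as root) on the scripts that are actually broken. "readlink -f /usr/bin/python" shows where the "python" link currently ends up.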
2) Never use a graphical console to update Linux
I have experienced several instances where updating software through the graphical interface fails for some reason and then breaks
the graphical interface (or the whole system). It should not surprise anyone that updating gnome-session, or any of its
dependencies, might disturb the very session that is running yum or rpm.
So if you are on the graphical system console (which is almost always a VGA monitor) and want to move to a non-graphical console
session before running yum or rpm, try one of the following keystrokes:
key press   | description                                    | notes
CTRL ALT F1 | switch to terminal 1 (the graphical interface) | only auto-graphical when runlevel >= 5
CTRL ALT F2 | switch to terminal 2 (/dev/tty2)               | text only
CTRL ALT F3 | switch to terminal 3 (/dev/tty3)               | text only
CTRL ALT F4 | switch to terminal 4 (/dev/tty4)               | text only
CTRL ALT F5 | switch to terminal 5 (/dev/tty5)               | text only
CTRL ALT F6 | switch to terminal 6 (/dev/tty6)               | text only
The only other way to safely disable graphics is to lower the runlevel of your system to 3 (but only do this if you are
certain that you won't kill some process currently needed by your customers). Alternatively, use ssh to log into your system via
the network then execute yum in that session.
update: Even though CentOS-7 does not use "/etc/inittab", and the text notes contained within say to do
everything with systemctl, the following commands worked for me from the console as well as a network
connection:
$ runlevel # display current run level
runlevel N 5
$ init 3 # console #1 switches over to text mode
$ runlevel
runlevel 5 3
$ init 5 # console #1 switches back to graphics mode
$ runlevel
runlevel 3 5
Caveat: never init to a number below 3 over the network because that will kill the network, so you WILL NOT be able to
restore the runlevel remotely.
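For anyone who prefers the systemctl route mentioned in the "/etc/inittab" notes, the old runlevel numbers map onto systemd targets. The helper below is a minimal sketch of that mapping; the function name runlevel_to_target is made up for illustration, and the systemctl lines are left as comments because they change system state.

```shell
# Map a SysV runlevel number to the systemd target that replaces it.
runlevel_to_target() {
    case "$1" in
        0)     echo poweroff.target ;;
        1)     echo rescue.target ;;
        2|3|4) echo multi-user.target ;;
        5)     echo graphical.target ;;
        6)     echo reboot.target ;;
        *)     echo "unknown runlevel: $1" >&2; return 1 ;;
    esac
}

# Usage (as root; commented out so nothing changes state by accident):
#   systemctl get-default                          # target used at boot
#   systemctl isolate "$(runlevel_to_target 3)"    # same effect as "init 3"
#   systemctl isolate "$(runlevel_to_target 5)"    # same effect as "init 5"
```

The same caveat applies: do not isolate rescue.target over the network.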
3) Using Windows to access a Linux remote
The self-help blogs really fall down on this one because the only secure way to do this is to tunnel
x-sessions over SSH. But whenever anyone on a self-help blog asks how to do this only using SSH, some idiot will chime in with a
procedure on how to do it using VNC, RealVNC, TigerVNC or Vino which are all insecure.
To make matters worse, setting up a remote graphical session is almost impossible (at least under certain circumstances like
Windows -> Red-Hat/CentOS) because GNOME3 contains 3-d extensions not found in Windows clients. The
best way out of this is to set up Red-Hat/CentOS on a machine at the client end then use it to connect to the desired Linux
platform.
comment: some conspiracy-minded people think this change was deliberately done to stop support
professionals from using Windows as their default platform to support all others. They might be correct.
Xming
- xming is a simple tool which is used in conjunction with a terminal
emulator like: Tera Term or PuTTY
CygWin and CygWin/X
4) Recovering a failed YUM update (2018-01-xx)
- I recently had a yum update fail on the graphical interface (see note #2 above regarding why you should never do this)
- YUM was in the middle of updating ~ 1200 items when the GUI collapsed. I waited 8 hours then rebooted. The system came up
then attempted to start a graphics console, which failed (I could see an arrow cursor and nothing else)
- I pressed CTRL ALT F2 and was able to log in as root (from here I reran YUM)
5) a recent YUM update broke our development box (2018-01-xx)
- we use sqlcmd from msodbcsql (ms odbc sql) to access an old database (SQL Server 2005) running on an old OS (Windows Server 2003). The platform is located
in another city and province; they never installed mandatory patches; I have no faith in them upgrading any time soon; we have
been able to connect for 14 months.
- a recent yum update on our development platform broke msodbcsql (the
production platform continues to work)
- checking logs on our production platform shows that we can connect to the Windows box only by using the sslv2
protocol ???
- testing on PROD
# notes:
# 1) well-known port 1433 is reserved for Microsoft SQL Server
# 2) SQL Server 2005 appears to support ssl2 but nothing higher
# 3) contrary to popular belief, as soon as you specify a username
# and password in your ODBC connect string (via sqlcmd) then
# the initial handshake will be encrypted
#
# this passes
#
openssl s_client -debug -state -connect ip-address:1433 -ssl2
#
# this fails
#
openssl s_client -debug -state -connect ip-address:1433 -ssl3
- Platform Differences
Platform    | CentOS Notes                        | OpenSSL version | OpenSSL Notes
Production  | CentOS 7.3 (built 14-months ago)    | OpenSSL-1.0.1e  | ssl2 is supported
Development | CentOS 7.4 (yum updated 2018-01-29) | OpenSSL-1.0.2k  | ssl2 has been disabled prior to build
- The simplest way out (at this time) is to build a fully-functional new version of OpenSSL-1.0.2k (in a local folder)
then copy the binary to "/usr/bin" after renaming the old version (just being paranoid here). A good friend provided me with
the second link below, from the Ask Ubuntu site (Ubuntu is a Debian flavor), and the procedure seems to work properly on CentOS-7
- https://wiki.openssl.org/index.php/Compilation_and_Installation
- https://askubuntu.com/questions/893155/simple-way-of-enabling-sslv2-and-sslv3-in-openssl
- Building a new version of OpenSSL (only in your own folder for now)
wget https://openssl.org/source/openssl-1.0.2k.tar.gz
tar -xvf openssl-1.0.2k.tar.gz
cd openssl-1.0.2k/
# --prefix will make sure that make install copies the files locally instead of system-wide
# --openssldir will make sure that the binary will look in the regular system location for openssl.cnf
# no-shared builds a mostly static binary
./config --prefix=`pwd`/local --openssldir=/usr/lib/ssl enable-ssl2 enable-ssl3 no-shared
make depend
make
#
# these next two steps are not required if openssl-1.0.2k already exists on your system.
#
make -i install
sudo cp local/bin/openssl /usr/local/bin/
#
# test the newly created binary like so: ./apps/openssl s_client -debug -state -connect ip-address:1433 -ssl2
# ...remembering that many other destinations will no longer accept ssl2
# then rename the old binary (paranoid): mv /usr/bin/openssl /usr/bin/openssl-old
# then copy the new binary: cp ./apps/openssl /usr/bin/openssl
#
- Optional steps:
- for proper sslv23 handshaking (especially true when you only have an older ODBC connect string
with no way to specify ssl parameters) you need to also include the switch no-tls-1-2-client
- building with the no-shared switch is necessary for testing your binary in a non-standard
location but will result in the program being ~ 6 times larger (around 3.4 MB). Changing to the shared switch
will result in a binary size of ~ 600 KB.
Caveat: the procedure just given will only fix
the OpenSSL CLI. Note that msodbcsql will still be broken because that software calls routines in the shared libraries. To fix
msodbcsql you (supposedly) need to do one of the following:
- fully install an older version of OpenSSL (libraries and all) in a secondary location then ensure all scripts invoking
sqlcmd look there
- I built an older version of OpenSSL from source code then installed it in /opt/oldopenssl
- all scripts starting sqlcmd first define LD_LIBRARY_PATH to point to /opt/oldopenssl/lib
- although strace proves that msodbcsql is first looking in a secondary location, msodbcsql still does
not work
- completely replace the new version of OpenSSL (libraries and all) with an older version
- playing with yum downgrade openssl* has not yet worked but I think I may be close
- reinstall the previous OS (CentOS-7.2 in this case)
6) a recent update broke our production box (2018-06-xx)
One of our developers was experiencing problems developing a new LDAP-based application. So he invoked YUM to update LDAP on
our production box. The big problem here is that the update was done in a careless way (without reading all the release notes).
The LDAP update also updated OpenSSL for the whole system, so now we can no longer connect to that older Microsoft platform in
Montreal (see item 5 above).
It now appears that we will need to install a third (older) CentOS platform whose only purpose would be to reach through to the
older Microsoft platform. This platform would need to be modified so that it could never be updated.
7) Something is overwriting file "/etc/resolv.conf"
This problem is so weird that I'll stick to bullet points
- Last month I set up a new CentOS-7 platform for use in a project we will turn up in Feb of 2019
- I logged onto the console then used a GUI session to set up several network connections which included three corporate DNS
references.
- From this point on, logging into that platform was slow (10 second delays). This included using the "su" command once you
were logged in.
- One of my peers in a city 100km away discovered a typo in file "/etc/resolv.conf" which he fixed
using a non-GUI login. The delays disappeared.
- However, just logging into the console with my GUI session, or rebooting the box, caused the manual repair to be overwritten
(10 second delays were back)
- One needs to remember that Linux began its life as a personal computing platform and many programmers who work in this
ecosystem still see it that way. This means there are all kinds of special hooks put in to support the GUI user.
- If you drop this text "centos networkmanager overwrites resolv.conf" into a Google search then you will get a bunch of hits
like this:
https://ma.ttias.be/centos-7-networkmanager-keeps-overwriting-etcresolv-conf/
where we learn that this has been going on since CentOS-6.
- Apparently you are not supposed to enter DNS addresses into the GUI dialog for each NIC. If you do, the Network Manager will
continue to copy this information from the NIC config files then overwrite "/etc/resolv.conf"
- There are two ways you can fix this problem:
- use an editor to modify the appropriate network settings file(s) which is not recommended in case you make a typo
- just "/etc/sysconfig/network-scripts/ifcfg-eno1" in my case
- use the GUI to remove all DNS references from all your active network configs then use an editor to edit
"/etc/resolv.conf".
- My file now looks like this:
# NOT Generated by NetworkManager
# /etc/resolv.conf
options timeout:1
options attempts:1
options rotate
options no-check-names
search on.bell.ca
nameserver 142.182.48.71
nameserver 142.182.48.105
#nameserver 142.113.87.152
- you might consider making a copy of this file like so:
cd /etc
cp resolv.conf resolv.conf-copy
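With a saved copy in place, it is easy to check whether something has overwritten the live file again. A minimal sketch follows; the function names resolv_unchanged and count_nameservers are made up for illustration.

```shell
# Return 0 when the live file still matches the saved copy, 1 otherwise.
resolv_unchanged() {
    cmp -s "$1" "$2"
}

# Count the active (uncommented) nameserver lines in a resolv.conf-style file.
count_nameservers() {
    grep -c '^nameserver[[:space:]]' "$1"
}

# Usage:
#   resolv_unchanged /etc/resolv.conf /etc/resolv.conf-copy \
#       || echo "resolv.conf was overwritten"
#   count_nameservers /etc/resolv.conf
```

NetworkManager also documents a "dns=none" setting for the [main] section of /etc/NetworkManager/NetworkManager.conf which is meant to stop it from managing "/etc/resolv.conf"; that is an alternative which was not tested in these notes.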
8) Our console device is totally dead (2019-08-xx)
We are running CentOS-7.2 on two HP-ML370-g5 servers (one PROD, one DVLP) which have been running for 30 and 24 months
respectively without a reboot. These are older hardware platforms so I have been preparing to cut the whole thing over to two
newer servers (HP-DL385p-gen8) next month. I just noticed I can't access the console on PROD.
command     | result                   | expected
CTRL ALT F1 | screen turns solid blue  | should be GUI mode
CTRL ALT F2 | screen turns solid green | should be text mode
CTRL ALT F3 | screen turns solid green | should be text mode
I need to point out that we can do anything else we want via a remote ssh terminal session over the network. In fact, the
customers are unaware of the fact that anything is wrong.
I've tried everything (short of rebooting) including replacing the monitor and restarting various services (eg. "systemctl
restart gdm.service") but it seems that the VGA port is locked up somehow.
SUGGESTION: every system admin must ensure that every system has at least one external network port
configured -AND- that the firewall has been configured to permit ssh2 connections so you will be able to manage your platform if your
VGA console is FUBARed.
9) The old system won't reboot but some files are needed (2019-09-xx)
This is a continuation of item-8 after a 1 month delay. Okay so the good news is this: we have acquired a replacement server
and copied all necessary files to it. Since everything appeared to be running properly on the new server, I finished the day by
rebooting the old server to see if my VGA port was still broken. The VGA port was not defective but now the
system only boots part way then drops into emergency text mode offering a few useless suggestions before presenting a root password
prompt. Some time later I got a call saying "we missed some files on the old server". Oops! So I tried rebooting again:
- booting begins normally
- I can see a nice solid gray GUI screen with a spinning white cursor so this was not a hardware problem
- then the console crapped back to "text-only mode" with a prompt to choose between logging in as root or just continuing the
boot process
- I ran a few logs which did not help so I typed "exit" to allow the boot to continue
- the console flipped to colored confetti in GUI mode with a red-orange spinning cursor; this would be okay if I could login
over the network (I thought)
- then the console crapped back to "text-only mode" with a prompt indicating to choose between logging in as root or just
continuing the process
- this system is not yet up; we cannot connect via the network
- I can see the file system including the files which were missed
- since this was an emergency, and my files were visible, I decided to try copying to a USB stick (a.k.a. thumb drive) but had
never tried this before from the command line (it happens automatically in GUI mode).
How To Mount a USB stick (thumb drive) without GUI support
p.s. this also works with a one terabyte drive on a USB cable
- before you install a thumb drive, first inspect the contents of /dev like so:
cd /dev
ls -ls sd* # most storage devices come up as sd letter number
- you will see an "sd" device for every hard drive:
sda  | name of disk #1
sda1 | name of partition #1 (if one exists)
sda2 | name of partition #2 (if one exists)
sda3 | name of partition #3 (if one exists)
sdb  | name of disk #2
sdb1 | name of partition #1 (if one exists)
- So on a system with only one hard drive, it is likely that inserting a USB stick will cause Linux to discover the device as
"sdb" and its partition (if there is one) as "sdb1"
- The following steps assume you are inserting a USB stick that was formatted as FAT32 via Windows then was discovered by
Linux as sdb/sdb1
mkdir -p /media/usb
mount -t vfat /dev/sdb1 /media/usb
ls -la /media/usb
-----------------------------------
cd /home/neil
cp * /media/usb
-----------------------------------
umount /media/usb
- for many operations (like: "cp -t") it may make more sense to first reformat the USB stick (or USB hard-drive) with a Linux
file system like ext4
Note: perform just one of "format whole device" or
"format partition" then move to "common"
<<< format whole device (deletes any partitions) >>>
mkfs.ext4 /dev/sdb # uses the whole device
mount /dev/sdb /media/usb
----------------------------
<<< format partition #1 >>>
mkfs.ext4 /dev/sdb1 # only format partition #1
mount /dev/sdb1 /media/usb
----------------------------
<<< common >>>
set -ve # -v echoes each line as it is read; -e stops on the first error
ls -la /media/usb
rsync -a /etc/ /media/usb/etc/ # trailing slashes: copy the contents of /etc into /media/usb/etc
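Before running any of the mount or mkfs commands above, it is worth confirming which device name the USB stick really received, because formatting the wrong "sd" device is unrecoverable. One way is to list "/dev/sd*" before and after inserting the stick and compare the two listings. The helper below is a minimal sketch; the name new_devices and the temporary file names are made up for illustration.

```shell
# Print the names present in the "after" listing but not in the "before" one.
new_devices() (
    before="$1"
    after="$2"
    # -v -x -F -f: drop every line of $after that appears verbatim in $before
    grep -v -x -F -f "$before" "$after"
)

# Usage:
#   ls /dev/sd* > /tmp/before      # before inserting the stick
#   ls /dev/sd* > /tmp/after       # after inserting the stick
#   new_devices /tmp/before /tmp/after
```

If util-linux is installed, "lsblk -o NAME,SIZE,FSTYPE,MOUNTPOINT" gives the same information in one view, including the file system type on each partition.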
10) Now it will boot after this fix (2019)
This is a continuation of items 8 and 9. I messed around with the old server (~ 30 minutes each day) following the on-screen
suggestions after the GUI drops back to text mode during boot. Here is one of the messages presented to me:
Error initializing authority: Could not connect: No such file or directory (g-io-error-quark, 1)
I began Googling various pieces of the above phrase including "(g-io-error-quark, 1)" which took me to this
link at askubuntu.com (even though this is a CentOS problem). That article implicates erroneous entries in "/etc/fstab".
Apparently any mount failure during boot is considered fatal even though the basic root directory ( "/" ) is in good shape. So I
used a text editor to disable my last line of "/etc/fstab" then rebooted. The system came right up.
p.s. that one line I disabled in fstab was pointing at a disk which had been unmounted and deslotted shortly after the first boot
30 months ago.
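Disabling an fstab line from the emergency shell can also be done without a full-screen editor. The function below is a minimal sketch which comments out the entry for one mount point and keeps a ".bak" copy; the name disable_fstab_entry, the ".bak" suffix, and the "/data" mount point in the usage line are all hypothetical (these notes do not say which mount point was involved).

```shell
# Comment out the fstab entry whose second field (mount point) equals $2.
# A backup of the original file is kept as <file>.bak.
disable_fstab_entry() (
    fstab="$1"
    mnt="$2"
    cp -p "$fstab" "$fstab.bak" || exit 1
    awk -v m="$mnt" '
        $0 !~ /^[[:space:]]*#/ && $2 == m { print "#" $0; next }
        { print }
    ' "$fstab.bak" > "$fstab"
)

# Usage (as root):
#   disable_fstab_entry /etc/fstab /data
```

For disks that may legitimately be absent, adding the "nofail" mount option to the entry is another way to keep a missing device from stopping the boot.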
Caveat: I have seen one situation where log files written to files under "/var/log" had filled the
associated partition (some files like wtmp and others under "/var/log/gdm" can grow forever if your system hasn't been rebooted
for a while). Type "df -h" to inspect disk free space. If the partition is nearly full, and this is an emergency, you might
consider deleting some of the larger log files before you reboot. Use this command to display files larger than 2 MB:
find /var/log -size +2M -exec ls -la {} \;
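The "is it nearly full?" check can be scripted so that it can run from cron long before the next reboot. This is a minimal sketch which relies on the POSIX output format of "df -P"; the function names and the default threshold of 90 percent are my own assumptions.

```shell
# Print the "capacity" percentage (digits only) of the file system holding $1.
usage_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Return 0 (true) when usage is at or above the threshold (default 90).
nearly_full() {
    [ "$(usage_percent "$1")" -ge "${2:-90}" ]
}

# Usage:
#   nearly_full /var/log && echo "/var/log is nearly full"
```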
11) Doing better backups for faster system recovery (2019-09-xx)
- If your business requires 100% uptime then you will probably need to go to some sort of cloud-based
solution
- you could completely outsource to companies like Amazon Web Services ( https://aws.amazon.com
) which will make you a short-term hero but eventually put you out of work
- or you could build your own cloud solution using products like OpenStack
- If you are not ready to jump to a cloud then you should harden your existing stuff
- I thought I was doing due diligence when it comes to doing backups (and restores) but my real-world items 8-10 (above)
prove that I was not
- For most businesses in 2019, I should not need to mention that hardware is now relatively inexpensive -AND- operating
systems almost free (at least this is the case for the open-source variety) so you could install more stand-by platforms if
you can't afford to lose your primary system for too long (say 15 minutes).
- Prior to 2019-09-xx we only ran two systems, PROD (which is our production platform) and DVLP (which is our development
and qualification platform). We were doing daily backups to magnetic media on a 14-day rotation but here you can see the big
problem: losing PROD means one of the following:
- recover the platform from magnetic media (will take a very long time)
- build a new platform from optical media then apply changes from magnetic media
- apply changes from magnetic media to DVLP then divert your customers there (but then you would still need to develop a
plan to go back)
- or a fourth option described next
- Since then, I have installed four more systems (two local; two very remote) and now use rsync to copy changes
(twice a day) to backup locations from which we can do rapid restores by just copying.
New Block Diagram
PROD (Linux) DVLP (Linux) 4 OpenVMS systems
+-----------------+ +-----------------+ +-------------------+
| primary | | primary | | 4 OpenVMS systems |
+-----------------+ +-----------------+ +-------------------+
+-----------------+ +-----------------+
| local stand by | | local stand by |
+-----------------+ +-----------------+
+-----------------+ +-----------------+
| remote stand by | | remote stand by |
+-----------------+ +-----------------+
- primary employs rsync to copy to local stand by (same data
facility) several times a day
- primary employs rsync to copy to remote stand by (a different
city more than 100-km away) several times a day
- All Linux systems are currently running CentOS-7.7 with Apache and MariaDB
- having a local stand by can provide peace of mind when you wonder if the next YUM update might break
something
- unlike Amazon or Alibaba, these systems
do very little between 21:00 and 8:00
- this scheme is also useful when migrating to newer server hardware
- The box labeled "4 OpenVMS systems" represents four OpenVMS platforms
- these machines used to do daily backups to tape which were delivered off site (M-F, excluding holidays)
- Now, these machines copy their backups into a folder on "DVLP Linux primary" which are then rsync'd
to local standby and remote standby every day
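The primary-to-standby copy described above boils down to one rsync call per directory tree, run from cron. The function below is a minimal sketch and not the actual script used on these systems; the name sync_to_standby, the DRY_RUN switch, the host name "standby1" and the paths in the usage lines are all hypothetical.

```shell
# Mirror one directory tree to a stand-by location.
#   -a  archive mode (permissions, owners, times, symlinks, recursion)
#   -X  extended attributes (this carries the SELinux labels)
# A trailing slash on the source copies its contents rather than the
# directory itself. Set DRY_RUN=1 to print the command instead of running it.
sync_to_standby() (
    src="${1%/}/"
    dest="${2%/}/"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'rsync -aX %s %s\n' "$src" "$dest"
    else
        rsync -aX "$src" "$dest"
    fi
)

# Usage:
#   DRY_RUN=1 sync_to_standby /var/www standby1:/backup/www   # show only
#   sync_to_standby /var/www standby1:/backup/www             # real copy
# Example crontab line for a twice-a-day run (script path is hypothetical):
#   0 6,18 * * * /usr/local/sbin/sync_standby.sh
```

Note that this sketch never deletes anything on the stand-by side; whether to add rsync's --delete option depends on whether the stand-by should be an exact mirror or an accumulating backup.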
12) Python3 caching is currently broken on most Linux distros running SELinux (2019-10-xx)
caveat: this problem covers web applications using Python3 directly (i.e. when not using
Django or WSGI)
- first off, read up on how Python3 caches compiled bytecode in "__pycache__" folders
- now imagine an Apache process running Python3 script /var/www/cgi-bin/file (without a file
extension)
- /var/www/cgi-bin/file is a two-line Python script
- line-1 is a shebang telling the OS to use Python3
- line-2 contains the Python command "import file"
- Python3 will one-time compile /var/www/cgi-bin/file.py, saving the result as /var/www/cgi-bin/__pycache__/version_file.pyc,
which will fail if SELinux is running (no privs to write back)
- according to RPM finder, many Linux distros have already provided Python3 bindings for SELinux,
but CentOS and RHEL have not done so yet (they know about the problem and are working on it; just visit https://bugs.centos.org
or https://bugzilla.redhat.com )
so until then, issue one of these yum commands once a week:
- yum list *selinux*python3*
-or-
- yum list *python3*selinux*
or you can build your own solution using sources found at github
- Until the problem is fixed you only have a few options:
- place SELinux in permissive mode.
- this isn't as bad as it sounds provided this is done temporarily; test your web-services via Apache then ensure that
Apache has compiled-cached all the Python3 scripts; then shift SELinux back into enforcing mode
- remember to do this every time you update any Python3 scripts -or- do major Python3 upgrades (like from 3.6 to
3.8)
- inspect /var/log/messages for the suggestions SELinux has written there then craft your
own temporary fix (these suggestions are always logged; even when SELinux is in permissive mode)
- live with the problem (Python3 will run like Python2) but remember that every transaction may consume an additional
10-15 ms since you will be always compiling but never caching
update: as of 2020-04-30 I have not seen any movement on this problem for
CentOS-7.7 but I have heard rumors of a beta RPM for CentOS-8.
update: on 2020-05-13 this patch was available from the CentOS repositories: libselinux-python3.x86_64
(ver2.5-15.el7)
comment: perhaps it is not unreasonable to wait 8 months for a patch on unsupported software (CentOS).
Although "Cent" supposedly means "community enterprise", companies requiring faster service would be advised to move to RHEL
along with a support agreement.
13) FUBAR with USB Audio (2019-11-xx)
test #1
- build a CentOS-7.7 system using recipe "Server with GUI"
- log into the GUI console with any priv account other than root
- insert any USB audio device (I tried these two):
- Logitech S150 Digital USB Stereo Speakers
- Ugreen USB to Audio
- audio testing from any software app (including Gnome audio settings) produces no audio
- now log out then back in as root
- audio testing now works properly
test #2
- build a CentOS-7.7 system using recipe "Gnome Desktop"
- log into the GUI console with any priv account other than root
- insert any USB audio device
- audio testing works
comments:
- USB storage devices (thumb-drives, hard drives, and DVD/CD drives) always work properly (they are owned by the GUI session
logged into the console) so I wonder why this doesn't always happen with USB Audio devices. Perhaps this could be fixed
by making entries into one of the sudo files under /etc
- After you get control of your audio device from Gnome Audio Settings, audio streaming from internet
radio stations only works properly from Google Chrome (78) but not Firefox (68)
- installing Google Chrome:
- Firefox Linux updates:
- Firefox version 78.11.0 now works (tested 2021-06-30 with CentOS-7.9 and Rocky Linux-8.4)
14) One LVM volume is too big, the other too small
- Our production CentOS systems have been up and running for 560 days with not too many difficulties
- Our primary PROD and DVLP platforms are implemented on HP DL385p_gen8 servers with 8-drives configured as a single RAID-60
volume with 1-TB of space
- I was a Linux newbie when I installed CentOS-7 on these machines so I went with the suggested partitioning and LVM (a
big mistake). This means that the LVM representing slash (a.k.a. root) is sitting at 50-GB whilst the LVM
representing slash-home is sitting at 950-GB (this would be okay if we were in a university with a lot of interactive users;
but we only have 3 interactive accounts and 6 SAMBA shares)
- The problem here is that our MariaDB database (an alternate fork of MySQL) has grown to the point that we've only got 30%
free space on the root LVM
caveat: I was a little wiser when I set up the 4 backup systems (2 local, 2 remote). On these machines
I instructed the installer to do a 50-50 split between root and slash-home.
- According to this document it should be easy to free up space on one LVM then apply it to the other LVM while
the system is running
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_logical_volumes/assembly_modifying-logical-volume-size-configuring-and-managing-logical-volumes
but we have a problem. Apparently you cannot use lvreduce on an xfs formatted volume. Note that xfs is what
I see when I type either one of these commands:
command               | explanation
"df -Th"              | display disk free space in human units with file system Type
"mount | grep mapper" | display mount points (1)
"mount | grep centos" | display mount points (2)
- Also ignore all internet advice claiming you can reduce the size of an xfs volume because it is not possible. "xfs" is
really fast because it was only made to grow.
- At this point we've got three or four options:
- shutdown the application then take the database offline; move the database from the root LVM to
the slash-home LVM; restart the application
note: I've got a few example procedures which involve about 12 steps when SELinux is present
- Backup the LVM volume associated with "/dev/mapper/centos-home" then delete it. Create a smaller version then restore
your backup into it
note: the application does not need to be taken off line
#  | step                                                 | comments
1  | a) log on as root                                    | interactive non-root users require resources in "/home" which we want to dismount
   | b) make sure no interactive users are logged in      | (so you must not log into a non-root account then use "su" or "sudo")
   | c) stop-disable any cron jobs which may need "/home" |
   | d) stop the SAMBA service                            | command: sudo service smb stop
2  | mkdir /hack                                          | these two demo lines can only be used if you have sufficient space on "/"
   | rsync -aX /home /hack                                | need "-X" to ensure we also get meta data (includes SELinux stuff)
   | ---                                                  | ----
   | du -a /home                                          | check the src file count
   | du -a /hack/home                                     | verify the dst file count (do they match?)
3  | umount /dev/mapper/centos-home                       | un-mount this volume
4  | lvremove /dev/mapper/centos-home                     | delete the volume (danger; you will pass the point of no return)
5  | lvcreate -L 400GB -n home centos                     | create a smaller replacement LVM (keep 100 GB free for some unforeseen need)
6  | mkfs.xfs -f /dev/centos/home                         | format the volume with xfs
   |                                                      | note1: here the "-f" switch means "force"
   |                                                      | note2: normally I would have used this command "mkfs -t xfs /dev/centos/home"
   |                                                      | but it does not support "-f"
7  | mount -a                                             | mount everything in "/etc/fstab" which is not mounted
   |                                                      | alternatively use this command: "mount /dev/mapper/centos-home /home"
8  | lvextend -r -L500GB /dev/mapper/centos-root          | extend this volume to 500 GB; use "-r" to resize the file system as well
   |                                                      | without "-r" you will only see the new size in "lvs"
   |                                                      | with "-r" you will also see the new size in "df -h"
9  | rsync -aX /hack/home /                               | restore the contents of the LVM associated with /home
   | du -a /hack/home                                     | check src file count
   | du -a /home                                          | verify dst file count (do they match?)
10 | start the SAMBA service                              | command: sudo service smb start
   | is it running?                                       | command: sudo service smb status
- Both our LVM volumes are sitting on the same RAID-60 volume so having two LVM volumes is surely redundant. We will
backup the LVM associated with "/home" then delete it via lvremove. We also need to delete the "/home" mount point (it was
created with the "-p" switch) via rmdir. Now just restore the backup into "/home"
Caveat: remember to edit "/etc/fstab" then disable the line where "/dev/mapper/centos-home" was mounted as "/home". Why?
The system may not properly boot to a functional state without this step. See tip-10 above
note: the application does not need to be taken off line
- shut down the application then take the database offline; take the production server offline; take the production backup
server offline; swap node names and IP addresses; bring everything back up
note: this will always be the default action if any of the above operations fail for whatever reason
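Steps 2 and 9 of the option-2 table ask whether the source and destination file counts match, but "du -a" only lists entries and leaves the counting to the reader. A minimal sketch of an explicit comparison follows; the function names count_entries and same_entry_count are made up for illustration, and file names containing newlines would throw the count off.

```shell
# Count every entry (files, directories, links) at and below a directory.
count_entries() {
    find "$1" | wc -l | tr -d ' '
}

# Return 0 when both trees hold the same number of entries.
same_entry_count() {
    [ "$(count_entries "$1")" = "$(count_entries "$2")" ]
}

# Usage (step 2, before unmounting anything):
#   same_entry_count /home /hack/home && echo "counts match"
```

A matching count does not prove that the contents are identical, so treat it as a quick sanity check before the point of no return in step 4 and not as a replacement for a real backup.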
Preliminary Steps
- I downloaded a copy of Oracle VM VirtualBox from here ( https://www.oracle.com/virtualization/virtualbox/
) then installed it on my PC
- I next created a Red-Hat virtual machine (make sure your virtual hard disk is at least 100-GB in size for this 2-LVM
experiment) then installed CentOS-7.7 into it with the default options (about 50-GB was assigned to each LVM)
- Testing Option-2
- preliminary tests worked as I can log in with my non-root user account
- I next installed CentOS-7.7 into a surplus server (HP DL385p_gen8 with single 1-TB disk volume) in my lab
- Testing Option-2 (attempt 1)
- preliminary tests worked as I can log in with my non-root user account
- secondary tests failed (I cannot start my Chrome browser or listen to any audio from NPR radio stations); this
appears to be an SELinux problem since the event logs are full of those kinds of messages
- logged back on as root then ran this SELinux command which (I think) fixed my account problems because my Chrome
browser now works properly
restorecon -rv /home
but it would have been better to not rely on something like this which is done after-the-fact
- I realized that I should have been using the "-X" switch in all my rsync operations so I updated the table above
- I will repeat the total Option-2 experiment tomorrow
- Testing Option-2 (attempt 2)
- preliminary tests worked as I can log in with my non-root user account
- secondary tests passed as my non-root account can use the browser with streaming audio (yay!)
Next Steps
- time to try this on our DVLP box (I need to do it on a Saturday when no interactive users are locking files in slash-home)
- 2020-08-15: executed option-2 (above) on node "kawc4n" (DVLP); appears to be 100% successful; the
whole procedure took a little less than 2-hours because I had to restore 205-GB over a 1-Gb/s Ethernet
- 2020-08-22: executed option-2 (above) on node "kawc0f" (PROD); appears to be 100% successful; the
whole procedure took a little less than 1-hour because the contents of slash-home was only 5-GB so I first performed an rsync
backup to a local folder
- conversions complete!
15) Yum is failing to initialize on one system of six (2020-09-xx)
I am running 6 servers (one PROD, one DVLP, two local shadows, two remote shadows) but YUM is failing to initialize on the oldest
unit (both PROD and DVLP were built in 2018 as CentOS-7.5 then YUM updated to CentOS-7.6)
Now inspect the following (pay attention to the red text - especially the last one just before the final prompt)
[root@kawc0f /]#
yum makecache fast
Loaded plugins: fastestmirror, langpacks
Determining fastest mirrors
epel/x86_64/metalink | 16 kB 00:00:00
Could not retrieve mirrorlist ...
...
https://mirrors.iuscommunity.org/mirrorlist?repo=ius-centos7&arch=x86_64&protocol=http
error was 14: HTTPS Error 404 - Not Found
One of the configured repositories failed (Unknown), and yum doesn't have enough cached
data to continue. At this point the only safe thing yum can do is fail. There are a few
ways to work "fix" this:
1. Contact the upstream for the repository and get them to fix the problem.
2. Reconfigure the baseurl/etc. for the repository, to point to a working upstream. This is
most often useful if you are using a newer distribution release than is supported by the
repository (and the packages for the previous distribution release still work).
3. Run the command with the repository temporarily disabled yum --disablerepo=<repoid> ...
4. Disable the repository permanently, so yum won't use it by default. Yum will then just
ignore the repository until you permanently enable it again or use --enablerepo for
temporary usage:
yum-config-manager --disable <repoid> or subscription-manager repos --disable=<repoid>
5. Configure the failing repository to be skipped, if it is unavailable. Note that yum will
try to contact the repo. when it runs most commands, so will have to try and fail each
time (and thus. yum will be be much slower). If it is a very temporary problem though,
this is often a nice compromise:
yum-config-manager --save --setopt=<repoid>.skip_if_unavailable=true
Cannot find a valid baseurl for repo: ius/x86_64
[root@kawc0f /]#
Now Google the red quoted phrase "Cannot find a valid baseurl for repo: ius/x86_64" which returns hits like this:
https://github.com/iusrepo/announce/issues/18
where we learn that many of the repositories have moved from the ".org" domain to the ".io" domain. Apparently support for ".org"
ended in April-2020.
Back in the day you would need to execute something like this:
yum install -y https://centos7.iuscommunity.org/ius-release.rpm
but this fixed my problem:
yum install -y https://repo.ius.io/ius-release-el7.rpm
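Before (or after) installing the new release package it can be useful to see which repo definition files still point at the retired domain. A minimal sketch, assuming the definitions live in "/etc/yum.repos.d" as on a stock CentOS-7 box; the function name find_repo_refs is hypothetical.

```shell
#!/bin/sh
# List the repo definition files that mention a given host name.
# $1 = directory holding *.repo files, $2 = host name to look for
find_repo_refs() {
    grep -l -F -- "$2" "$1"/*.repo 2>/dev/null
}

# Example:
#   find_repo_refs /etc/yum.repos.d iuscommunity.org
```

No output means no file references the old host.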
16) The future of CentOS (the invisible hand of the market?)
- According to Wikipedia, the CentOS Linux distro project began in 2004
- In 2014 the CentOS project was informally merged with RHEL (Red Hat)
- Red Hat became a subsidiary of IBM on July 9, 2019
- On 2020-12-07 we learned from this blog post "future-is-centos-stream"
that IBM: Red Hat has decided to break up the current relationship between CentOS and RHEL
Articles:
With IBM at the helm, some have suggested that Red Hat be renamed Blue Hat
Comments:
- Scientific Linux (SL) first appeared in 2004 and was popular among scientists working at FermiLab, CERN and DESY, to name
just three of many. With the success of CentOS many organizations were convinced to replace Scientific Linux with CentOS.
For example CERN (the home of the LHC) began favoring CentOS in 2015 although SL is still supported at CERN as of Dec-2020.
FermiLab announced in 2019 that there would be no Scientific Linux 8 (this was before Red Hat was acquired by IBM).
- When it comes to Linux, my employer uses RHEL for all customer-facing production platforms and CentOS for everything else
(this includes everything from application development, user acceptance, hands-on Linux training, etc). CentOS was also being
used as an on-ramp for driving UNIX projects onto RHEL platforms. It appears that IBM has thrown a monkey-wrench into those
plans. I have no idea what the future holds but history can be instructive. Recall that when Michael Widenius and others
didn't like where SUN was taking MySQL, they created MariaDB (that decision seems prescient after Oracle acquired SUN; then
promised the EU not to kill MySQL; then slowed MySQL bug fixes for more than a year until they noticed that the Linux
community was preferentially installing MariaDB).
17) All four GUI consoles are locked up
caveat: all work here was done from a non-GUI session (usually a network connection)
- Due to the COVID-19 pandemic, I only take a trip into the office once a week (usually Friday afternoons) just to ensure the
environment is secure.
- We are running four CentOS servers (all: HP DL385p Gen8; one PROD; one DVLP; a hot standby for each)
- On my Friday afternoon routines I always check the drive LEDs but never check the consoles which are usually dark (in
energy-saving mode)
- During my walk-through today:
- I noticed PROD-console was not dark AND contained a solid charcoal-colored background with some green OK messages which (I
think) came from systemctl. I was able to get a non-GUI login prompt after hitting the CTRL-ALT-F2 combo (F3 and F4
worked as well)
- I noticed DVLP-console was not dark AND contained a solid-black background with some white text messages which (I think) came
from dmesg. I was able to do the CTRL-ALT-F2 thing here as well
- The two hot standby units only presented text-prompts (no GUI).
- Logging onto them then typing startx did not bring up a GUI. Typing systemctl get-default returned
"graphical.target". Typing systemctl isolate graphical.target did nothing.
- All units had not been running for more than 400 days.
- Step-1 (hot standby units)
- I tried both yum check-update and yum update but nothing was offered (perhaps these Linux instances were
too old)
- since no humans were logged into these, I used yum upgrade to bring them up from CentOS-7.7 to CentOS-7.9 then
rebooted
- Still no GUI on the consoles (even after logging in then typing startx) so I typed this:
- yum groups install "GNOME Desktop" (why wasn't this stuff updated during the OS upgrade?)
- systemctl isolate multi-user.target (equivalent to setting runlevel=3 on UNIX boxes)
- systemctl isolate graphical.target (equivalent to setting runlevel=5 on UNIX boxes)
- Now the GUI auto-magically appeared on both consoles.
- Step-2 (DVLP and PROD)
- People logged on here so a reboot was not possible.
- So I repeated steps 2-3 above which moved TTY1 out of GUI mode but could not put it back.
- Next, I repeated steps 1-3 which brought back the console GUI.
going forward
- We employ graphical consoles for the odd time that we prefer to do something quickly (like making changes to the software
firewall or reconfiguring a NIC). But this is a data center where console devices usually do not exist. On top of that, it is
becoming apparent to me that GUI consoles are more trouble than they are worth so I'm going to permanently move these systems
from graphical.target to multi-user.target with the hope that a simple startx will be all that is
required for occasional graphical support at the console:
systemctl isolate multi-user.target |
this temporary command makes immediate changes |
systemctl set-default multi-user.target |
this permanent command will affect the next reboot |
- caveat: remember that you will need to log out twice. Once from GUI mode then once from text mode
18) Finally solved a slow-response problem (2021-06-18)
- I have been running six CentOS systems for the past few years (I run rsync multiple times a day to keep local-copy and
remote-copy systems reasonably up-to-date)
humans here | no humans here | no humans here |
PROD | local-copy | remote-copy |
DVLP | local-copy | remote-copy |
- DVLP-remote-copy has never worked properly from an interactive point of view although rsync jobs to
it are fine
- symptoms:
- solution
- for a time I thought the canary messages were associated with a bad USB device so I had someone unplug the console
mouse and keyboard but this did nothing (the remote location is 160 km away)
- for a time I thought it might be a BIOS problem since all my working machines employed a BIOS from 2014 whilst this
one was from 2016. HP's BIOS release notes contain a lot of references to timing issues associated with AMD processors
so I decided to hack (er, play)
- I typed the 'lscpu' command so I could see which cores were where
- next, I disabled all cores associated with the second CPU and this fixed my slow-response problems.
- use the man pages to learn how to disable CPU cores or review the lines in this BASH script: cpu_control.sh
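For readers without access to cpu_control.sh, here is a minimal sketch of the general technique (it is not a copy of that script). Linux exposes a control file for each logical CPU at "/sys/devices/system/cpu/cpuN/online"; writing 0 takes the CPU offline and writing 1 brings it back. The function names and the CPU_SYSFS variable are hypothetical, the commands need root, and the change does not survive a reboot.

```shell
#!/bin/sh
# Location of the per-CPU control files (can be overridden for testing).
CPU_SYSFS=${CPU_SYSFS:-/sys/devices/system/cpu}

# Take one logical CPU offline (0) or bring it back online (1).
# $1 = logical CPU number, $2 = 0 or 1
set_cpu_state() {
    cpu=$1
    state=$2
    ctl="$CPU_SYSFS/cpu$cpu/online"
    if [ ! -w "$ctl" ]; then
        echo "cpu$cpu: no writable control file (cpu0 often has none)" >&2
        return 1
    fi
    echo "$state" > "$ctl"
}

# Apply one state to an inclusive range of logical CPUs.
# $1 = first CPU, $2 = last CPU, $3 = 0 or 1
set_cpu_range() {
    n=$1
    while [ "$n" -le "$2" ]; do
        set_cpu_state "$n" "$3" || return 1
        n=$((n + 1))
    done
}

# Example (assumes lscpu reported CPUs 8-15 on the second socket):
#   set_cpu_range 8 15 0
```

Use the output of lscpu to decide which logical CPU numbers belong to the second processor; the range 8-15 above is only an illustration.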
19) Updating an old installation in stages (2021-10-xx)
caveat: this problem is ongoing (so the following proposed solution is untested)
20) rpm hack to get mpack + munpack on CentOS-7
- I am attempting to move an inbound email processing interface from OpenVMS to CentOS-7
- The old application is dependent upon munpack (mime-unpack) which is not available on CentOS-7 but was available on
CentOS-6, UNIX and Windows
- To make matters worse, if you drop these two quoted words: "centos 7" "munpack" into a Google search, you will be
referred to a Red Hat developer site where you are instructed to use uudecode which can only be used on pieces of an
email after it has been pulled apart (so uudecode does not come close to being a drop-in replacement for munpack)
- Another promising program is ripmime but what do you do if you really want to stick with munpack?
- I decided to first look at the application source files which were first published in UNIX. Then I looked at the CentOS-6
rpm file to see if it would be easier to modify it for use with CentOS-7. And that is when I discovered two binary executable
files which can be used as-is:
- I tested them on my CentOS-7 and they worked without any complaint
- even though the command "munpack -?" only shows switches '-f' and '-q', the program supports '-t' (text try
harder) which means this implementation is really version 1.6-2b which is seen on some OpenVMS systems
- Here are my raw notes:
caveat: ('el6' means 'enterprise linux 6' so the rpm is for CentOS-6 or RHEL-6):
======================================================================================
title : mpack_notes.txt
author : Neil Rieck
created: 2021-11-17
edit : 2021-11-18
notes : the 'munpack' utility is also found here
platform: CentOS-7
stanzas:
1) playing with file mpack-1.6.tar.gz (mpack + munpack for UNIX + Windows)
2) playing with file mpack-1.6-2.el6.rf.x86_64.rpm (mpack + munpack for CentOS-6)
======================================================================================
1) mpack-1.6.tar.gz
tar -tvf mpack-1.6.tar.gz # list(test) verbosely (f=mpack-1.6.tar.gz)
tar -xvf mpack-1.6.tar.gz # extract verbosely (f=mpack-1.6.tar.gz)
# note: creates folder 'mpack-1.6'
tar -xvf mpack-1.6.tar.gz -C yada # place output in folder 'yada'
======================================================================================
2) mpack-1.6-2.el6.rf.x86_64.rpm
mkdir mpack_rpm_hack # create directory
cp mpack-1.6-2.el6.rf.x86_64.rpm mpack_rpm_hack # copy file to folder
cd mpack_rpm_hack # move into folder
rpm2cpio mpack-1.6-2.el6.rf.x86_64.rpm | cpio -idmv # extract contents
tree --charset="ascii" # see the mess
.
|-- mpack-1.6-2.el6.rf.x86_64.rpm
`-- usr
|-- bin
| |-- mpack
| `-- munpack
`-- share
|-- doc
| `-- mpack-1.6
| |-- Changes
| |-- INSTALL
| |-- README.mac
| `-- README.unix
`-- man
`-- man1
|-- mpack.1.gz
`-- munpack.1.gz
7 directories, 9 files
[neil@kawc4n mpack_rpm_hack]$
./usr/bin/munpack -? # test the binary as-is
munpack version 1.6 # yay!
usage: munpack [-f] [-q] [-C directory] [files...]
21) procmail problems with SELinux on CentOS-7
- I am using procmail v3.22 which is very old
- I just wrote an application where procmailrc starts a Python3 app which:
- opens a connection to a relational database (MySQL - MariaDB) on port 3306
- then sends an email reply on port 25
- The application does not work with SELinux in "enforcing mode" but does work in "permissive mode" (obviously)
- I am seeing messages in /var/log/messages from setroubleshoot indicating problems with:
- some scripts not being executable (as far as SELinux is concerned)
- ports 3306 and 25 being blocked (as far as SELinux is concerned)
- I have installed optional modules so that "man procmail_selinux" works but it now appears that I will need to write my
own SELinux module for this application
- here are two interim solutions: link
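One common way to start a local SELinux module is the pair of commands that setroubleshoot itself usually suggests: collect the logged denials with ausearch, turn them into a module with audit2allow, then load it with semodule. The sketch below only prints those commands so they can be reviewed before anything is run. The function name selinux_module_cmds and the default module name "my-procmail" are hypothetical, and on CentOS-7 audit2allow is assumed to come from the policycoreutils-python package.

```shell
#!/bin/sh
# Print (do not run) the commands that build and load a local policy
# module from the denials logged for one command name.
# $1 = command name as it appears in the audit log, $2 = module name (optional)
selinux_module_cmds() {
    comm=$1
    module=${2:-my-$comm}
    echo "ausearch -c '$comm' --raw | audit2allow -M $module"
    echo "semodule -i $module.pp"
}

# Example:
#   selinux_module_cmds procmail
```

audit2allow -M also writes a readable ".te" file; it is worth reading that file before loading the module because it will allow everything that was denied, including things you may not want.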
22) cannot resolve host names after a CentOS-7 upgrade
- CAVEAT: this problem is specific to my personal "Virtual Private Server" hosted by IONOS
- Since this is just my hobby site, I had not done a "yum update" for over two years.
- So I typed "sudo yum update" (which brought me up to CentOS-7.9) then rebooted.
- But now I cannot do any more updates because this system cannot resolve host names.
- I first checked file "/etc/resolv.conf" which was blank (oops!)
- steps:
<sr> [my-root-prompt]
<ur> ifconfig
<sr> ens192: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
inet 74.208.23.87 netmask 255.255.255.255 broadcast 74.208.23.87
inet6 fe80::250:56ff:fe0a:4fab prefixlen 64 scopeid 0x20<link>
ether 00:50:56:0a:4f:ab txqueuelen 1000 (Ethernet)
RX packets 1822 bytes 254353 (248.3 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 1214 bytes 2218180 (2.1 MiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
[my-root-prompt]
<ur> ip addr
<sr> ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group
default qlen 1000
link/ether 00:50:56:0a:4f:ab brd ff:ff:ff:ff:ff:ff
inet 74.208.23.87/32 brd 74.208.23.87 scope global dynamic ens192
valid_lft 41794sec preferred_lft 41794sec
inet6 fe80::250:56ff:fe0a:4fab/64 scope link
valid_lft forever preferred_lft forever
[my-root-prompt]
<ur> cd /etc/sysconfig/network-scripts/
<sr> [my-root-prompt]
<ur> ls -l *ens192*
<sr> -rw-r--r-- 1 root root 148 Mar 5 08:00 ifcfg-ens192
[my-root-prompt]
<ur> cat ifcfg-ens192
<sr> BOOTPROTO=dhcp (note: since DHCP we need DNS info from the provider)
DEVICE=ens192
DHCPV6C=yes
DHCPV6C_OPTIONS="-nw"
IPV6_AUTOCONF=yes
IPV6INIT=yes
NM_CONTROLLED=no
ONBOOT=yes
TYPE=Ethernet
[my-root-prompt]
<ur> vim ifcfg-ens192 (edit desired file with vim or nano)
PEERDNS=yes (add this line)
<sr> [my-root-prompt]
<ur> systemctl restart network.service
<sr> [my-root-prompt]
<ur> nslookup ibm.com
<sr> Server: 212.227.123.16
Address: 212.227.123.16#53
Non-authoritative answer: (success)
Name: ibm.com
Address: 23.35.139.245
Name: ibm.com
Address: 2600:1407:21:282::3831
Name: ibm.com
Address: 2600:1407:21:28f::3831
[my-root-prompt]
----------------------------------------- optional
<ur> cd /etc
<sr> [my-root-prompt]
<ur> cp resolv.conf resolv.conf-copy
[my-root-prompt]
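A quick way to confirm the fix (or to catch the blank file after a future update) is to count the active nameserver lines in the resolver file. A minimal sketch; the function name count_nameservers is hypothetical.

```shell
#!/bin/sh
# Count the active "nameserver" lines in a resolver configuration file.
# $1 = file to inspect (defaults to /etc/resolv.conf)
count_nameservers() {
    grep -c '^[[:space:]]*nameserver[[:space:]]' "${1:-/etc/resolv.conf}"
}

# Example:
#   [ "$(count_nameservers)" -gt 0 ] || echo "no DNS servers configured"
```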
Neil Rieck
Waterloo, Ontario, Canada.