Python Notes: Compiling + Caching
- The information presented here is intended for educational use.
- The information presented here is provided free of charge, as-is, with no warranty of any kind.
- Edit: 2023-08-20
Introduction via Java
Python and Java are both BYTECODE implementations that run on virtual machines, so first a few words about Java.
A very high-level overview of Java
- Java executable code is implemented in BYTECODE (rather than binary) as described here:
- Java source code (file has a .java extension) cannot be run as-is, so it must first be compiled into
BYTECODE (resultant file has a .class extension) before execution by the JVM (Java Virtual
Machine), which is typically written in C++ (sometimes C)
||write a hello-world program using Java syntax
||invoke the java compiler:
1) read "hello-world.java"
2) write "hello-world.class"
||invoke the JVM (Java Virtual Machine)
- Java programs are generally run in one of three ways
||on your desktop
||run client-side by a JVM (Java Virtual Machine) plugin inside your browser
||run server-side by a JVM built into a specialized server like Apache
Tomcat; output is sent to the client (usually a browser)
- Working with servlets can be a little more automated.
- You publish your source code inside a JAR (java archive) file. See this axis2
demo for more details
- Every time someone executes your Java application, the server compares the date-time stamp of the JAR file with the
date-time stamp of the CLASS file (if it exists)
- If the JAR file is newer than the CLASS file (or if the CLASS file doesn't exist), the server will invoke a JIT
(just-in-time) Java compiler to generate a new CLASS file before running it for the first time
- so the first request will experience a short delay
- subsequent requests will experience a much faster response
Now on to Python3
- what follows are some personal observations while working with Python-3.6.8 on CentOS-7.x
- what we refer to as a python interpreter is really a compiler plus a virtual machine: source is compiled
on demand to BYTECODE, which the PVM (Python Virtual Machine) then interprets
- when a python script is presented to the PVM, it is always recompiled to BYTECODE in memory,
executed, then the BYTECODE is discarded
- every time the PVM (Python Virtual Machine) encounters an import statement in your python
program, the PVM will look for the associated python source file (compare the Java servlet description above)
- Assuming the python statement was import wizbang123 then the PVM will first look for filename wizbang123.py
in the folder containing the running script
- If not found then the PVM will search the Python libraries stored elsewhere (usually a common system-wide location
serviced via the pip3 command)
- If found then PVM-3.6.8 will also look for __pycache__/wizbang123.cpython-36.pyc
just under the location of the imported python file
- verbalized as "dunder pycache dunder slash filename"
- the extension ".pyc" means "python compiled"
- the PVM name-and-version (cpython-36 in this instance) is jammed between the filename and the extension. If you
update to cpython-38 then all this auto-check and auto-compile stuff will be invoked again; the old cpython-36 files
hang around until you delete them
- the PVM is called cpython because it was written in "C" (other implementations exist; PyPy, for example, is written
in a subset of Python)
- If the ".pyc" file was not found (or is out of date compared with the source file) then the PVM will compile
the wizbang123.py file then attempt to save the BYTECODE results to the "__pycache__" sub-directory
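The lookup described above can be checked from Python itself. This is a minimal sketch, assuming the standard-library helper importlib.util.cache_from_source; the module name wizbang123 and the folder are just the illustrative names used in these notes, and the file does not need to exist for the path to be computed.

```python
import importlib.util
import sys

def describe_cache(source_path):
    """Return the .pyc path the import system would use for source_path."""
    return importlib.util.cache_from_source(source_path)

# wizbang123.py is the hypothetical module name used in the notes above
pyc_path = describe_cache("/var/www/cgi-bin/wizbang123.py")

# The cache tag is the name-and-version string placed before ".pyc"
print(sys.implementation.cache_tag)
print(pyc_path)
```

On Python 3.6 the tag would be cpython-36; running the same lines under a newer interpreter shows why an upgrade triggers a fresh round of compiling.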
Problem 1: file accessibility
- many of our Python3 programs are served up directly by Apache via directory: "/var/www/cgi-bin/"
- depending upon your Apache config, sub-directories under cgi-bin (including "__pycache__") may not be accessible from
Apache, so you may need to modify file "/etc/httpd/conf/httpd.conf", creating a "<Directory>" configuration for the cache
(which must be readable and writable)
- some python programmers will first test their code "in Apache space" which means that sub-directory "__pycache__" may be
created using the programmer's "owner" and "protection bits". This means that the PVM (when run by Apache) may not be able to:
- open the sub-directory
- read the cached ".pyc" file
- write an updated ".pyc" file
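The three checks above can be sketched in a few lines of Python. This is only an illustration: cache_dir_report is a hypothetical helper name, and the result reflects the user who runs it, so it is most useful when run as the Apache user (assumed here to be "apache", as on a stock CentOS install).

```python
import os
import stat

def cache_dir_report(script_dir):
    """Summarize whether the current user can use script_dir/__pycache__."""
    cache_dir = os.path.join(script_dir, "__pycache__")
    if not os.path.isdir(cache_dir):
        # The PVM would have to create the directory, so the parent must be writable
        return {"exists": False,
                "can_create": os.access(script_dir, os.W_OK | os.X_OK)}
    info = os.stat(cache_dir)
    return {
        "exists": True,
        "owner_uid": info.st_uid,                    # who created the cache
        "mode": stat.filemode(info.st_mode),         # e.g. drwxr-xr-x
        "can_open": os.access(cache_dir, os.X_OK),   # open the sub-directory
        "can_read": os.access(cache_dir, os.R_OK),   # read cached .pyc files
        "can_write": os.access(cache_dir, os.W_OK),  # write updated .pyc files
    }

print(cache_dir_report("/var/www/cgi-bin"))
```

If "can_write" comes back False for the Apache user, the PVM still runs the program but silently skips saving the BYTECODE, so every request pays the compile cost again.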
Problem 2: no attempt to save the compiled code (so it will be recompiled every time, forever)
- the following demo (which could be much larger) will never be cache-checked (or compiled-to-cache) when executed
interactively or served-up by Apache.
# title : /var/www/cgi-bin/name2.py
# author: Neil Rieck
# edit : 2019-10-19
import name3 # name3.py will be cache-checked and/or compiled
import name4 # name4.py will be cache-checked and/or compiled
import name5 # name5.py will be cache-checked and/or compiled
def main():
    pass # code that uses the imported libraries goes here
if __name__ == "__main__":
    main() # only runs when name2.py is executed directly
- however, this work-around will force "name2.py" (the previous script) to be cache-checked and/or compiled-to-cache
# title : /var/www/cgi-bin/name1
# author: Neil Rieck
# edit : 2019-10-19
import name2 # name2.py will be cache-checked and/or compiled
#name2.main() # optional (but recommended) step
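The difference between a top-level script and an imported module can be observed with a small experiment. This is a sketch under stated assumptions: demo_lib.py and demo_main.py are hypothetical stand-ins for name2.py and name1, written to a temporary folder, and the script is launched as a separate process the way a shell or Apache would launch it.

```python
import os
import subprocess
import sys
import tempfile

def cached_names(folder):
    """Return the sorted file names found in folder/__pycache__ (empty if none)."""
    cache_dir = os.path.join(folder, "__pycache__")
    return sorted(os.listdir(cache_dir)) if os.path.isdir(cache_dir) else []

def run_demo():
    with tempfile.TemporaryDirectory() as folder:
        # demo_lib.py plays the role of name2.py; demo_main.py plays name1
        with open(os.path.join(folder, "demo_lib.py"), "w") as f:
            f.write("def main():\n    return 'hello'\n")
        script = os.path.join(folder, "demo_main.py")
        with open(script, "w") as f:
            f.write("import demo_lib\ndemo_lib.main()\n")
        # Make sure caching is not switched off or redirected by the environment
        env = dict(os.environ)
        env.pop("PYTHONDONTWRITEBYTECODE", None)
        env.pop("PYTHONPYCACHEPREFIX", None)
        subprocess.run([sys.executable, script], cwd=folder, env=env, check=True)
        return cached_names(folder)

names = run_demo()
print(names)  # the imported module is cached; the top-level script is not
```

The listing should contain a ".pyc" for demo_lib only, which is why moving the bulk of the code into an imported module (and keeping the top-level script tiny) is the work-around.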
Problem 3: SELinux
- If you are running RHEL or CentOS then it is highly likely that you are also running
SELinux (Security Enhanced Linux) which may be preventing the PVM from making any changes to anything under
- Start with: tail /var/log/messages (SELinux will write error messages here along with suggestions)
- While debugging your Python3 caching problem, I recommend that you "temporarily" disable SELinux like so:
- type "sestatus" to check the current state.
- If the mode is "enforcing" then type "sudo setenforce 0" then type "sestatus" to see that the mode is now "permissive"
- If your Python caching problem is now fixed then you will need to issue some "semanage" commands for
"/var/www/cgi-bin/__pycache__" described here
- Remember to re-enable SELinux afterwards (type "sudo setenforce 1"). The "setenforce" change is only temporary and
will not survive a reboot.
Python3 allows you to compile
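One way to compile ahead of time is sketched here, assuming the standard-library py_compile module is the kind of tool meant. The helper name precompile and the file demo_script.py are hypothetical and only serve the illustration.

```python
import os
import py_compile
import tempfile

def precompile(source_path):
    """Compile source_path into its default __pycache__ location; return the .pyc path."""
    # doraise=True raises py_compile.PyCompileError on a syntax error
    # instead of only printing a message
    return py_compile.compile(source_path, doraise=True)

with tempfile.TemporaryDirectory() as folder:
    source = os.path.join(folder, "demo_script.py")
    with open(source, "w") as f:
        f.write("print('hello')\n")
    pyc_path = precompile(source)
    pyc_exists = os.path.isfile(pyc_path)
    print(pyc_path)
```

Pre-compiling modules as a user who can write to the folder means the Apache user only needs read access to "__pycache__". A whole folder can be handled the same way from the command line with "python3 -m compileall". A top-level script is still never loaded from the cache, so this helps imported modules only.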
Waterloo, Ontario, Canada.