Python Notes: Compiling + Caching

  1. The information presented here is intended for educational use.
  2. The information presented here is provided free of charge, as-is, with no warranty of any kind.
  3. Edit: 2023-08-20

Introduction via Java

Python and Java both compile source code to BYTECODE that runs on a virtual machine, so first a few words about Java.

A very high-level overview of Java

  • Java executable code is implemented in BYTECODE (rather than binary) as described here:
  • Java source code (file has a .java extension) cannot be run as-is, so it must first be compiled into BYTECODE (resultant file has a .class extension) before execution by the JVM (Java Virtual Machine). The JVM itself is a native program; HotSpot, the most widely used JVM, is written mostly in C++.
    command  file name         description
    edit     hello-world.java  write a hello-world program using Java syntax
    javac    hello-world.java  invoke the java compiler:
                               1) read "hello-world.java"
                               2) write "hello-world.class"
    java     hello-world       invoke the JVM (Java Virtual Machine)
                               read-then-execute "hello-world.class"
  • Java programs are generally implemented in three ways

    type         description
    interactive  run on your desktop
    applet       run client-side by a JVM (Java Virtual Machine) plugin inside your browser
    servlet      run server-side by a JVM built into a specialized server like Apache Tomcat; output is sent to the client (usually a browser)
  • Working with servlets can be a little more automated.
    • You publish your source code inside a JAR (java archive) file. See this axis2 demo for more details
    • Every time someone executes your Java application, the server compares the date-time stamp of the JAR file with the date-time stamp of the CLASS file (if it exists)
    • If the JAR file is newer than the CLASS file (or if the CLASS file doesn't exist), the server will invoke a JIT (just-in-time) Java compiler to generate a new CLASS file before running it for the first time
      • so the first request will experience a short delay
      • subsequent requests will experience a much faster response

Now on to Python3

Overview:

  • what follows are some personal observations while working with Python-3.6.8 on CentOS-7.x
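  • what we refer to as a python interpreter is really a compiler plus an interpreter: source code is first compiled to BYTECODE, which the PVM (Python Virtual Machine) then executes. CPython-3.6 does this compiling on demand but has no JIT (just-in-time) compiler that produces machine code.
  • when a python script is presented to the PVM, it is always recompiled to BYTECODE in memory and executed; the BYTECODE is then discarded rather than cached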
  • every time the PVM (Python Virtual Machine) encounters an import statement in your python program, the PVM will look for the associated python source file (see Java servlet description 10-lines above)
    • Assuming the python statement was import wizbang123 then the PVM will look for filename wizbang123.py, starting in the folder that contains the running script (or the current folder when working interactively)
      • If not found then PVM will search Python libraries stored elsewhere (usually a common system-wide location serviced via the pip3 command)
      • If found then PVM-3.6.8 will also look for __pycache__/wizbang123.cpython-36.pyc just under the location of the imported python file
        • verbalized as "dunder pycache dunder slash filename"
        • the extension ".pyc" means "python compiled"
        • the PVM name-and-version (cpython-36 in this instance) is jammed between the filename and the extension. If you update to cpython-38 then all this auto-check and auto-compile stuff will be invoked again; the old files hang around forever
        • the PVM is called cpython because it is written in "C" (other implementations exist; PyPy, for example, is written in a subset of Python)
      • If the ".pyc" file was not found then PVM will compile the wizbang123.py file then attempt to save the BYTECODE results to the "__pycache__" sub-directory
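
The cache location described above can be inspected from Python itself. What follows is only a minimal sketch: it assumes the standard library call importlib.util.cache_from_source, and the helper name describe_cache and the file name wizbang123.py are purely illustrative.

```python
import importlib.util
import os
import sys


def describe_cache(source_path):
    """Return the .pyc path the running PVM would use for source_path.

    describe_cache is a hypothetical helper name used only in this sketch.
    """
    cache_path = importlib.util.cache_from_source(source_path)
    return {
        # name-and-version tag placed between filename and extension,
        # for example "cpython-36" on Python-3.6.8
        "cache_tag": sys.implementation.cache_tag,
        "cache_path": cache_path,
        "cache_exists": os.path.exists(cache_path),
    }


if __name__ == "__main__":
    for key, value in describe_cache("wizbang123.py").items():
        print(key, "=", value)
```

Running this in the folder that holds wizbang123.py shows whether a ".pyc" file for the current PVM version is already present. Note that if the PVM cannot save the ".pyc" file it simply carries on without complaint, so a missing cache file is easy to overlook.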

Problem 1: file accessibility

  • many of our Python3 programs are served up directly by Apache via directory: "/var/www/cgi-bin/"
  • depending upon your Apache config, any sub-directories under cgi-bin (including "__pycache__") will not be accessible from Apache so you need to modify file "/etc/httpd/conf/httpd.conf" creating a "<Directory" configuration for the cache (must be readable and writable)
  • some python programmers will first test their code "in Apache space" which means that sub-directory "__pycache__" may be created using the programmer's "owner" and "protection bits". This means that the PVM (when run by Apache) may not be able to:
    • open the sub-directory
    • read the cached ".pyc" file
    • write an updated ".pyc" file
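
The three checks listed above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive tool: check_cache_access is a hypothetical helper name, and the result only reflects the user who runs it, so it would need to be run as the account Apache uses.

```python
import os


def check_cache_access(script_dir):
    """Report what the current user may do with script_dir/__pycache__.

    check_cache_access is a hypothetical helper name used only in this sketch.
    """
    cache_dir = os.path.join(script_dir, "__pycache__")
    exists = os.path.isdir(cache_dir)
    return {
        "cache_dir": cache_dir,
        "exists": exists,
        # X_OK on a directory means the user may enter (open) it
        "can_open": exists and os.access(cache_dir, os.X_OK),
        # R_OK on a directory means the user may list its contents
        "can_read": exists and os.access(cache_dir, os.R_OK),
        # W_OK on a directory is needed to create or replace ".pyc" files
        "can_write": exists and os.access(cache_dir, os.W_OK),
    }


if __name__ == "__main__":
    for key, value in check_cache_access("/var/www/cgi-bin").items():
        print(key, "=", value)
```

On CentOS the Apache account is usually named "apache" (an assumption; check your own httpd.conf), so one way to use the sketch is to save it to a file and run it with "sudo -u apache python3" followed by that file name.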

Problem 2: no attempt to save the compiled code (so the script is recompiled on every run)

  • the following demo (which could be much larger) will never be cache-checked (or compiled-to-cache) when executed interactively or served-up by Apache.
    #!/usr/bin/python3
    # title : /var/www/cgi-bin/name2.py
    # author: Neil Rieck
    # edit  : 2019-10-19
    # ==================
    import name3  # name3.py will be cache-checked and/or compiled
    import name4  # name4.py will be cache-checked and/or compiled
    import name5  # name5.py will be cache-checked and/or compiled
    #
    def main():
        #
        # code that uses the imported libraries goes here
        #
        pass  # placeholder so the empty function body is valid Python
    #
    if __name__ == "__main__":
        main()
    #
  • however, this work-around will force "name2.py" (the previous script) to be cache-checked and/or compiled-to-cache
    #!/usr/bin/python3
    # title : /var/www/cgi-bin/name1
    # author: Neil Rieck
    # edit  : 2019-10-19
    # ==================
    import name2	# name2.py will be cache-checked and/or compiled
    #name2.main()	# optional (but recommended) step: uncomment to run the program, because the import alone only compiles and caches name2.py
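
The effect of the work-around can be confirmed by importing a module and then checking where its BYTECODE was cached. The following is a minimal sketch; import_and_report is a hypothetical helper name, and the folder and module names passed to it are whatever you are testing.

```python
import importlib
import os
import sys


def import_and_report(module_name, folder):
    """Import module_name from folder and report where its BYTECODE was cached.

    import_and_report is a hypothetical helper name used only in this sketch.
    Returns a tuple: (expected ".pyc" path, True if that file now exists).
    """
    if folder not in sys.path:
        sys.path.insert(0, folder)
    # make sure files created after the PVM started are noticed
    importlib.invalidate_caches()
    module = importlib.import_module(module_name)
    cached = module.__spec__.cached
    return cached, bool(cached) and os.path.exists(cached)


if __name__ == "__main__" and os.path.exists("/var/www/cgi-bin/name2.py"):
    print(import_and_report("name2", "/var/www/cgi-bin"))
```

If the second value comes back False then the PVM was unable to write the cache file, which points back to the permission issues in Problem 1 or to SELinux in Problem 3. Caching is also skipped when the environment variable PYTHONDONTWRITEBYTECODE is set.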

Problem 3: SELinux

  • If you are running RHEL or CentOS then it is highly likely that you are also running SELinux (Security Enhanced Linux) which may be preventing the PVM from making any changes to anything under "/var/www/cgi-bin/"
  • Start with: tail /var/log/messages (SELinux will write error messages here along with suggestions)
  • While debugging your Python3 caching problem, I recommend that you "temporarily" switch SELinux to permissive mode like so:
    • type "sestatus" to check the current state.
    • If the mode is "enforcing" then type "sudo setenforce 0" then type "sestatus" to see that the mode is now "permissive"
  • If your Python caching problem is now fixed then you will need to issue some "semanage" commands for "/var/www/cgi-bin/__pycache__" described here
  • Remember to re-enable SELinux by typing "sudo setenforce 1". The "setenforce" change is temporary and will not survive a reboot, so the mode reverts to whatever is configured at boot.

Python3 allows you to compile manually

  • It is not clear whether manual compiling was developed to deal with situations like SELinux, but you might wish to use it when publishing a python3 program on a troublesome system. Remember to do the manual compile, then ensure Apache has the privs to read the cached file as well as sub-directory "__pycache__". In fact, everything under "/var/www" should be owned by Apache.
    method #1
    =========
    python3
    import py_compile
    py_compile.compile("name1.py")
    exit()
    
    method #2
    =========
    python3 -m py_compile name1.py
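
When publishing is scripted, the manual compile can be wrapped in a small function. This is a minimal sketch that assumes only the standard library call py_compile.compile; the wrapper name precompile is hypothetical.

```python
import os
import py_compile


def precompile(source_path):
    """Compile source_path into its cache location and return the ".pyc" path.

    precompile is a hypothetical wrapper name used only in this sketch;
    py_compile.compile is the standard library call doing the work.
    """
    # doraise=True turns a compile error into a PyCompileError exception
    # instead of only printing a message to stderr
    return py_compile.compile(source_path, doraise=True)


if __name__ == "__main__" and os.path.exists("name1.py"):
    print("wrote", precompile("name1.py"))
```

The returned path is the file whose owner and protection bits Apache must be able to read. To compile every ".py" file in a folder in one step, the standard library also offers "python3 -m compileall" followed by the folder name.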

 Neil Rieck
 Waterloo, Ontario, Canada.