OpenVMS Notes: Apache HTTPd

  1. The information presented here is intended for educational use by qualified OpenVMS technologists.
  2. The information presented here is provided free of charge, as-is, with no warranty of any kind.
Edit: 2019-04-30 (updated <FilesMatch>)

Introduction

  1. Apache got its name, so the popular story goes, by being "a patched" NCSA web server. Back in the early 1990s, system managers would download the public domain NCSA web server code, then apply patches to it so that it worked properly. After a period of years it made more sense to download the patched product as a complete kit, so the not-for-profit Apache Software Foundation was created. Apache is open source, which is the same paradigm that makes Linux and MariaDB so powerful and popular:
    "You take the best computer people on the planet and let them collaborate in a world wide public forum (the internet) to produce a product that is better than any commercial variety"

    According to www.netcraft.com, approximately 65% of the web servers in the world were based upon Apache when this was written.

  2. OpenVMS packages from Compaq and HP/HPE (Alpha and Itanium only)
    Package    Apache   OpenSSL   Notes
    CSWS-1.2   1.3.20   0.9.5a    minimum OS: OpenVMS-7.2 Alpha
    CSWS-1.3   1.3.26   0.9.6b    higher levels of OpenSSL with patches
    SWS-2.0    2.0.47   0.9.6g    minimum OS: OpenVMS-7.3 Alpha; higher levels of OpenSSL with patches
    SWS-2.1    2.0.52   0.9.7d    higher levels of OpenSSL with patches
    SWS-2.2    2.0.63   0.9.8h    higher levels of OpenSSL with patches
    Notes:
    Notes:
    • Compaq's port of Apache HTTPd for OpenVMS Alpha was called CSWS (Compaq Secure Web Server) pronounced "C-Swiss"
    • After HP merged with Compaq, CSWS was unofficially rebranded to SWS (pronounced "Swiss")
       
  3. HTTPd offerings from other sources
    1. WASD HTTPd is free from VSM Software Services in Australia (VAX, Alpha, and Itanium)
    2. Purveyor is sold by www.process.com (VAX and Alpha; it has been dropped from their web site but you can still buy it with "no support")
    3. OSU DECthreads HTTPd (VAX and Alpha; it is no longer available from Ohio State University but can still be found on the FREEWARE Disks)
  4. Caveat: do not even think about using HTTPS with items 2 and 3 (Purveyor and OSU).

Do all your experiments on Windows

  1. If you are overly cautious and do not want to mess up your OpenVMS system, download Apache HTTPd for Windows and install it on your PC.
    • Be sure to first shut down Microsoft IIS (Internet Information Services)
    • While IIS is much easier to set up and configure, Apache is more feature-rich
       
  2. As with the previous item, if you are going to play with JavaServer Pages (JSP) or Java Servlets, then you should download and install Apache Tomcat for Windows. Tomcat is a standalone server which normally runs on port 8080 and doesn't require Apache HTTPd. If HTTPd is installed, be sure to configure the Tomcat connector (e.g. mod_jk), which allows Apache HTTPd to communicate with Tomcat over a back channel.
     
  3. PHP is not an Apache product, but Windows versions are available from the PHP project.
  4. Many people prefer to experiment with this all-in-one offering: XAMPP

Problems With CGI (common gateway interface)

While quite a bit of Apache documentation exists, CGI information is sparse, though some can be found in the official Apache HTTPd documentation.

Be careful when Googling the phrase CGI because you might accidentally end up at www.cgi.com which happens to be the name of a computer services company headquartered in Montreal, Quebec, Canada.

CGI Tip-1 (evolution from beginner to expert)

Many older VMS programmers begin developing CGI-based web applications using this inefficient model:

  1. Triggered by a browser event, Apache accesses a document in one of the script directories (this is usually a DCL script; ignore PHP and server-side Java for now)
     
  2. The DCL script attempts to detect REQUEST_METHOD (or WWW_REQUEST_METHOD if not Apache) which should be "GET" or "POST".
     
  3. If the script detects "GET" or "POST", it calls an external program to read the CGI data, which is then translated into either "DCL symbols" or "process-level logical names"
     
  4. The script now runs the desired web application.
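To make step 3 concrete, here is a minimal C sketch of the kind of work an external CGI reader performs. It is not the author's read_html_apache.exe: the function names (cgi_method_has_data, url_decode, cgi_get_field) are hypothetical, and the sketch assumes standard application/x-www-form-urlencoded data (QUERY_STRING for GET, the request body for POST). The VMS-specific step of turning each value into a DCL symbol (for example with LIB$SET_SYMBOL) is left out.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 when the REQUEST_METHOD value is "GET" or "POST" (the same test
   the DCL script makes); the caller passes getenv("REQUEST_METHOD"). */
int cgi_method_has_data(const char *method)
{
    return method != NULL &&
           (strcmp(method, "GET") == 0 || strcmp(method, "POST") == 0);
}

/* Converts one hex digit to its value, or returns -1. */
static int hex_value(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    c = tolower(c);
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

/* Decodes len bytes of form-urlencoded text: '+' becomes a space and %XX
   becomes byte 0xXX. Output is NUL-terminated and truncated to fit
   dst_size. Returns the number of bytes written (excluding the NUL). */
size_t url_decode(char *dst, size_t dst_size, const char *src, size_t len)
{
    size_t out = 0;
    if (dst_size == 0) return 0;
    for (size_t i = 0; i < len && out + 1 < dst_size; i++) {
        int hi = -1, lo = -1;
        if (src[i] == '%' && i + 2 < len) {
            hi = hex_value((unsigned char)src[i + 1]);
            lo = hex_value((unsigned char)src[i + 2]);
        }
        if (src[i] == '+') {
            dst[out++] = ' ';
        } else if (hi >= 0 && lo >= 0) {
            dst[out++] = (char)(hi * 16 + lo);
            i += 2;                      /* skip the two hex digits */
        } else {
            dst[out++] = src[i];         /* ordinary character (or stray %) */
        }
    }
    dst[out] = '\0';
    return out;
}

/* Looks up field `name` in "a=1&b=x%20y" style data and decodes its value
   into out. Returns 1 if found, 0 otherwise. Names are compared undecoded. */
int cgi_get_field(const char *data, const char *name, char *out, size_t out_size)
{
    size_t name_len = strlen(name);
    const char *p = data;
    while (p != NULL && *p != '\0') {
        const char *amp = strchr(p, '&');
        size_t pair_len = amp ? (size_t)(amp - p) : strlen(p);
        if (pair_len > name_len && p[name_len] == '=' &&
            strncmp(p, name, name_len) == 0) {
            url_decode(out, out_size, p + name_len + 1, pair_len - name_len - 1);
            return 1;
        }
        p = amp ? amp + 1 : NULL;        /* move to the next name=value pair */
    }
    return 0;
}
```

A caller would pass getenv("REQUEST_METHOD") to cgi_method_has_data(), then hand getenv("QUERY_STRING") or the POST body to cgi_get_field() once per expected field name.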

Quick Blurb about HTTP + HTML

Your server-side program should send something like this back to Apache:

	Status: 200
	Content-Type: text/html

	<!DOCTYPE html>
	<html>
	<head>
	<title>bla bla bla</title>
	</head>
	<body>bla bla bla
	</body>
	</html>

Begin with the CGI header lines (mostly HTTP headers), then send a blank line before sending the HTML.
Apache will translate "Status: 200" into an HTTP status line such as "HTTP/1.1 200 OK".
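The same response can be produced from a compiled CGI program. This is a small C sketch (cgi_format_response is a hypothetical helper, not part of any Apache API) that builds the header lines, the blank separator line and the document in one buffer. It also refuses a content type containing a line break, since that would let a caller inject extra header lines.

```c
#include <stdio.h>
#include <string.h>

/* Formats a complete CGI response into buf: header lines, the mandatory
   blank line, then the document. Returns the length the full response
   needs (as snprintf does; a value >= size means buf was too small), or -1
   if content_type contains a line break. */
int cgi_format_response(char *buf, size_t size, int status,
                        const char *content_type, const char *body)
{
    if (strpbrk(content_type, "\r\n") != NULL) return -1;
    return snprintf(buf, size,
                    "Status: %d\n"         /* Apache turns this into the HTTP status line */
                    "Content-Type: %s\n"   /* MIME type of the document that follows */
                    "\n"                   /* blank line ends the CGI header */
                    "%s",                  /* the document itself */
                    status, content_type, body);
}
```

A CGI image would then write the buffer with fputs(buf, stdout).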

Example DCL-based CGI script, which must be located in one of Apache's script directories
(this script invokes two executable images)

$  set noon					! do not stop on errors
$  say :== write sys$output			!
$  debug = f$trnlnm("NEIL$DEBUG","LNM$SYSTEM_TABLE")
$  if (debug .eqs. "Y")				! if CGI debugging is desired
$  then						!
$    say "Status: 200"				! start of document header
$    say "Content-Type: text/html"		! mime type declaration
$    say ""					! end of document header
$    say "<html><head></head>"			! start of HTML
$    say "<body><pre>"				!
$  endif
$! temps = "''WWW_REQUEST_METHOD'"		! purveyor method
$  temps = "''REQUEST_METHOD'"			! apache method
$  if (temps .eqs. "POST") .or.  -
      (temps .eqs. "GET")			!  
$  then						!
$     run csmis$exe:read_html_apache.exe	! create DCL symbols (and sets $status to 1 if ok)
$  endif					!
$  run csmis$exe:desired_application.exe	! this is our desired web application
$  if debug .eqs. "Y"				!
$  then						!
$    say "</pre>"				!
$    say "</body></html>"			!
$  endif					!

As your platform begins to serve hundreds of hits per hour, you can reduce overhead by moving the "symbol reading logic" from "read_html_apache.exe" into your application.

$  set noon					! do not stop on errors
$  say :== write sys$output			!
$  debug = f$trnlnm("NEIL$DEBUG","LNM$SYSTEM_TABLE")
$  if (debug .eqs. "Y")				! if CGI debugging is desired
$  then						!
$    say "Status: 200"				! start of document header
$    say "Content-Type: text/html"		! mime type declaration
$    say ""					! end of document header
$    say "<html><head></head>"			! start of HTML
$    say "<body><pre>"				!
$  endif					!
$  run csmis$exe:desired_application.exe	! this is our desired application
$  if debug .eqs. "Y"				!
$  then						!
$    say "</pre>"				!
$    say "</body></html>"			!
$  endif					!

or this:

$  set noon					! do not stop on errors
$  run csmis$exe:desired_application.exe	! this is our desired application (should return 1)
$  rc = f$integer($STATUS)			!
$  if ((rc .and. 7) .neq. 1)			! severity: 0=warning 1=success 2=error 3=info 4=fatal
$  then						!
$    say :== write sys$output			!
$    say "Status: 500"				! start of document header
$    say "Content-Type: text/html"		! mime type declaration
$    say ""					! end of document header
$    say "<html><head></head>"			! start of HTML
$    say "<body><pre>"				!
$    say "Script: 2001"				!
$    say "error: ",rc				!
$    say "</pre></body></html>"			!
$  endif					!

or this:

$ run csmis$exe:desired_application.exe		! note: "csmis$exe" is a directory on my platform

The reason for trimming the script down this far is that your ultimate goal is to run the binary application directly from a specially configured Apache directory. This can be done by enabling one or more directories to run executable binaries. In the NCSA web server, some directories were enabled to run applications by default. In Apache, scripting and running applications are disabled by default for security reasons; you turn them on by creating or modifying <Directory> declarations in a file named APACHE$COMMON:[000000.CONF]HTTPD.CONF
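The severity test in the script that returns "Status: 500" relies on the layout of an OpenVMS condition value: the low three bits hold the severity. The C sketch below (all names are hypothetical) spells that out. Note one consequence: "(rc .and. 7) .neq. 1" also reports an informational status (severity 3) as an error, whereas DCL's usual "if .not. rc" test only looks at the low bit and accepts it. Which behaviour you want is a judgment call.

```c
/* OpenVMS condition values carry their severity in the low three bits. */
enum vms_severity {
    VMS_WARNING = 0,
    VMS_SUCCESS = 1,
    VMS_ERROR   = 2,
    VMS_INFO    = 3,
    VMS_SEVERE  = 4     /* "fatal"; values 5-7 are reserved */
};

/* Extracts the severity field, the DCL equivalent of (rc .and. 7). */
int vms_severity(unsigned long status)
{
    return (int)(status & 7UL);
}

/* Low bit set = success or informational; DCL's "if .not. rc" tests this bit. */
int vms_status_ok(unsigned long status)
{
    return (int)(status & 1UL);
}

/* Mirrors the script's test "(rc .and. 7) .neq. 1": anything that is not
   exactly SUCCESS, including INFO, is treated as a failure. */
int script_reports_error(unsigned long status)
{
    return vms_severity(status) != VMS_SUCCESS;
}
```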

CGI Tip-2 (debugging)

If you're familiar with writing CGIs for either Purveyor (Process Software Corporation) or OSU DECthreads, then you'll probably recognize the "show symbol" and "show logical" lines in the following CGI script.

$  set noon				! do not stop on errors
$  say :== write sys$output		!
$  say "Status: 200"			! start of document header
$  say "Content-Type: text/html"	!
$  say ""				! end of document header
$  say "<html><head></head>"		! start of HTML
$  say "<body><pre>"			!
$  show symbol /local/all		! show web-server process-level local symbols
$  show symbol /global/all		! show web-server process-level global symbols 
$  show logical/proc/job		! show web-server job-level logical names 
$! temps = "''WWW_REQUEST_METHOD'"	! purveyor method
$  temps = "''REQUEST_METHOD'"		! apache method
$  if (temps .eqs. "POST") .or.  -
      (temps .eqs. "GET")  
$  then
$     run csmis$exe:read_html_apache.exe ! create DCL symbols (and sets $status to 1 if ok)
$  endif
$  show symbol /local/all		! show all process-level local symbols
$  show symbol /global/all		! show all process-level global symbols 
$  show logical/proc/job		! show all job-level logical names 
$  say "</pre></body></html>"		! end of HTML

The program just listed will not work properly with CSWS, because the interface has been locked down to limit hacking. This means that the "show symbol /all" statements won't display any of the environment variables passed to the CGI by the server (but you will see the FORM variables created by your application). To view server-created variables you must explicitly request them by name, like this:

$ show symbol/local  SYMBOL-NAME
$ show symbol/global SYMBOL-NAME

... which means you need to know their names ahead of time. Inspect the contents of script "TEST-CGI-VMS.COM" to see what I mean. Note that this incomplete file is missing environment variables AUTH_TYPE, HTTP_COOKIE, REMOTE_USER (and probably a few more).
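Because the server-created variables must be requested by name, a debugging CGI can simply carry its own list of names. Here is a minimal C sketch, assuming a compiled CGI image. The list holds the CGI/1.1 meta-variables from RFC 3875 plus a few HTTP_* header variables, and format_cgi_vars is a hypothetical helper. On OpenVMS, my understanding is that the C RTL getenv() falls back to logical names and DCL symbols when a name is not in the environment array, but verify that against the C RTL documentation for your version.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* CGI/1.1 meta-variables from RFC 3875, followed by a few HTTP_* header
   variables (HTTP_COOKIE is one the author found missing). */
const char *const cgi_var_names[] = {
    "AUTH_TYPE", "CONTENT_LENGTH", "CONTENT_TYPE", "GATEWAY_INTERFACE",
    "PATH_INFO", "PATH_TRANSLATED", "QUERY_STRING", "REMOTE_ADDR",
    "REMOTE_HOST", "REMOTE_IDENT", "REMOTE_USER", "REQUEST_METHOD",
    "SCRIPT_NAME", "SERVER_NAME", "SERVER_PORT", "SERVER_PROTOCOL",
    "SERVER_SOFTWARE",
    "HTTP_COOKIE", "HTTP_USER_AGENT", "HTTP_REFERER"
};
const size_t cgi_var_count = sizeof cgi_var_names / sizeof cgi_var_names[0];

/* Writes "NAME=value\n" for each requested name into buf, looking each one
   up explicitly by name with getenv(). Undefined names are reported as
   "(not defined)". Output is truncated to fit; returns strlen(buf). */
size_t format_cgi_vars(char *buf, size_t size,
                       const char *const *names, size_t count)
{
    size_t used = 0;
    if (size == 0) return 0;
    buf[0] = '\0';
    for (size_t i = 0; i < count && used < size; i++) {
        const char *value = getenv(names[i]);
        int n = snprintf(buf + used, size - used, "%s=%s\n", names[i],
                         value != NULL ? value : "(not defined)");
        if (n < 0) break;
        used += (size_t)n;              /* may pass size when truncated */
    }
    return used < size ? used : size - 1;
}
```

To dump the whole list, call format_cgi_vars(buf, sizeof buf, cgi_var_names, cgi_var_count) and write buf between <pre> and </pre>.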

CGI Tip-3 (controlling the server with logical names)

Starting CSWS with "certain system-level logical names" will modify Apache's operation. Put these declarations in script sys$manager:SYSTARTUP_VMS.COM just before you invoke @sys$startup:APACHE$STARTUP.COM

  1. Inspect APACHE$CGI_MODE in the table below. You will need to set it to "1" or "2" if you want to process cookies larger than 970 characters (pretty much standard fare these days). I always use "1", but I needed to write a stand-alone function to detect when the symbol overflows into a multi-item logical name.
     
  2. I don't think that APACHE$SHOW_CGI_SYMBOL actually does a "wildcard show symbol" operation; it seems to run program TEST-CGI-VMS.EXE, which was compiled from TEST-CGI-VMS.C, which means that environment variables are probably missing from this program too.
     
  3. If you're porting CGI from Purveyor to CSWS, you might want to define logical APACHE$PREFIX_DCL_CGI_SYMBOLS_WWW as "YES" so that the environment variable names don't need to be changed.

This table was copied from Compaq's "V2.1-1 Installation and Configuration Guide" (more logical names were added with 2.2)

Table 3-5 User Defined Logical Names
Logical Name Description
APACHE$BG_PIPE_BUFFER_SIZE
(New in Version 2.0)
System logical name that is used to set the socket pipe buffer size for exec functions.
If this logical is not set, the default is 32767.
APACHE$CGI_BYPASS_OWNER_CHECK
(Obsolete in V2.x)
If defined to any value, this logical name causes the Secure Web Server to bypass the file owner check
of the CGI script file. The default is to enforce the owner check on CGI script files for security purposes.
APACHE$CGI_MODE
System logical name that controls how CGI environment variables are defined in the executing CGI process. There are three different options. Note that only one option is available at a time.
0 Default. Environment variables are defined as local symbols and are truncated at 970 characters (a DEC C limitation).
1 Environment variables are defined as local symbols unless they are greater than 970 characters. If the environment value is greater than 970 characters, it is defined as a multi-item logical.
2 Environment variables are defined as logicals. If the environment value is greater than 255 characters, it is defined as a multi-item logical.
APACHE$CREATE_SYMBOLS_GLOBAL
If defined, this system logical name causes CGI environment symbols to be defined globally. They are defined locally by default.
APACHE$DAV_DBM_TYPE
(New in Version 2.0)
Used to define the desired DBM organization to use for MOD_DAV.
The valid options for this logical are: GDBM, SDBM, VDBM. If this logical is not set, the default is VDBM.
APACHE$DEBUG_DCL_CGI
If defined, this system logical name enables APACHE$VERIFY_DCL_CGI and APACHE$SHOW_CGI_SYMBOL.
APACHE$DL_CASE
(New in Version 2.0)
System logical name that controls how Apache will locate shareable entry points.
There are four different options. Note that only one option is available at a time.
1 Entry points are located using upper case search.
2 Entry points are located using mixed case search.
3 Default. Entry points are located using upper case search, then mixed case search.
4 Entry points are located using mixed case search, then upper case search.
APACHE$DL_FORCE_UPPERCASE
(Obsolete in Version 2.x)
If defined to be true (1, T, or Y), this system logical name forces case-sensitive dynamic image activation symbol lookups. By default, symbol lookups are first done in a case-sensitive manner and then, if failed, a second attempt is made using case-insensitive symbol lookups. This fallback behavior can be disabled with APACHE$DL_NO_UPPERCASE_FALLBACK.
APACHE$DL_NO_UPPERCASE_FALLBACK
(Obsolete in Version 2.x)
If defined to be true (1, T, or Y), this system logical name disables case-insensitive symbol name lookups whenever case-sensitive lookups fail. See APACHE$DL_FORCE_UPPERCASE
APACHE$FIXBG
(Obsolete in Version 2.x)
System executive mode logical name pointing to installed, shareable images. Not intended to be modified by the user. Replaced by APACHE$SET_CCL.EXE
APACHE$FLIP_CCL
(New in Version 2.0)
Used by APACHE$SET_CCL.EXE, which replaces APACHE$FIXBG.EXE
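APACHE$INPUT
Used by CGI programs for PUT/POST methods of reading the input stream. (I once wondered whether the HP author meant "GET/POST", but PUT and POST are the methods that send a request body on the input stream; GET data arrives in QUERY_STRING instead.)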
APACHE$MB_PIPE_BUFFER_SIZE
(New in Version 2.0)
Used to set the mailbox pipe buffer size for exec functions. If this logical is not set, the default is 4096.
APACHE$PLV_ENABLE_<username>
(Obsolete in Version 2.x)
System executive mode logical name defined during startup and used to control access to the services provided by the APACHE$PRIVILEGED image. Not intended to be modified by the user.
APACHE$PLV_LOGICAL
(Obsolete in Version 2.x)
System executive mode logical name defined during startup and used to control access to the services provided by the APACHE$PRIVILEGED image. Not intended to be modified by the user.
APACHE$PREFIX_DCL_CGI_SYMBOLS_WWW
If defined, this system logical name prefixes all CGI environment variable symbols with "WWW_". By default, no prefix is used.
APACHE$PRIVILEGED
(Obsolete in Version 2.x)
System executive mode logical name pointing to installed, shareable images. Not intended to be modified by the user.
APACHE$READDIR_NO_DOT_FILES
(New in Version 2.0)
Used to disable the simulating of dot files when processing directories. There is no default value.
APACHE$READDIR_NO_NULL_TYPE
(New in Version 2.0)
Used to disable the elimination of the null type which contains a single dot when processing directories.
There is no default value.
APACHE$READDIR_NO_UNIX_OPEN
(New in Version 2.0)
Used to disable the processing of unix files when processing directories. There is no default value.
APACHE$SET_CCL
(New in Version 2.0)
Used by APACHE$SET_CCL.EXE, which replaces APACHE$FIXBG.EXE
APACHE$SHOW_CGI_SYMBOL
If defined, this system logical name provides information for troubleshooting the CGI environment by dumping all of the symbols and logicals (job/process) for a given CGI. Use with APACHE$DEBUG_DCL_CGI.
APACHE$SSL_DBM_TYPE
(New in Version 2.0)
Used to define the desired DBM organization to use for MOD_SSL.
The valid options for this logical are: GDBM, SDBM, VDBM. If this logical is not set, the default is VDBM.
APACHE$SPL_DISABLED
(New in Version 2.0)
Used to determine whether Shared Process Logging is to be disabled. There is no default value.
APACHE$SPL_MAX_BUFFERS
(New in Version 2.0)
Used to determine the maximum buffer quota for each Shared Process Logging mailbox.
If this logical is not set, the default is 10.
APACHE$SPL_MAX_MESSAGE
(New in Version 2.0)
Used to determine the maximum message size for each Shared Process Logging mailbox.
If this logical is not set, the default is 1024.
APACHE$SPL_FLUSH_INTERVAL
(New in Version 2.0)
Used to determine the maximum message count per Shared Process Logging file before data is flushed to disk.
If this logical is not set, the default is 256.
APACHE$USE_CUSTOM_STAT
(New in Version 2.0)
System logical name that is used to indicate that the custom apache stat function should be used rather than the run-time stat function.
APACHE$USER_HOME_PATH_UPPERCASE
(Obsolete in Version 2.x)
If defined to be true (1, T, or Y), this system logical name uppercases device and directory components for user home directories when matching pathnames in <DIRECTORY> containers. This provides backward compatibility for sites that specify these components in uppercase within <DIRECTORY> containers. See the UserDir directive in the Modules and Directives section for more information.
APACHE$VERIFY_DCL_CGI
If defined, this system logical name provides information for troubleshooting DCL command procedure CGIs by forcing a SET VERIFY before executing any DCL CGI. Use with APACHE$DEBUG_DCL_CGI.
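The APACHE$CGI_MODE rules above are easy to misread, so here is a C sketch that encodes them. Given a mode and a value length, it predicts where CSWS will put the value, and it reassembles a multi-item logical from its pieces. This is not the author's overflow-detection function: the names are hypothetical, the 970 limit comes from the table, and 255 characters is the OpenVMS limit for one logical name equivalence string. The item count also assumes CSWS fills each item to the full 255 characters before starting the next. Retrieving the individual items on VMS (for example with SYS$TRNLNM and the LNM$_INDEX item code) is not shown.

```c
#include <stddef.h>
#include <string.h>

/* Where CSWS puts one CGI environment value, per the APACHE$CGI_MODE table. */
enum cgi_storage {
    CGI_LOCAL_SYMBOL,        /* fits in a DCL symbol */
    CGI_TRUNCATED_SYMBOL,    /* mode 0: value cut off at 970 characters */
    CGI_LOGICAL_NAME,        /* mode 2: single-item logical name */
    CGI_MULTI_ITEM_LOGICAL   /* value split across several equivalence strings */
};

#define CGI_SYMBOL_LIMIT 970  /* from the table above */
#define CGI_LOGICAL_ITEM 255  /* maximum length of one equivalence string */

/* Predicts the storage used for a value of len characters. Modes other
   than 1 and 2 are treated as the documented default, mode 0. */
enum cgi_storage cgi_storage_for(int mode, size_t len)
{
    switch (mode) {
    case 1:
        return len > CGI_SYMBOL_LIMIT ? CGI_MULTI_ITEM_LOGICAL : CGI_LOCAL_SYMBOL;
    case 2:
        return len > CGI_LOGICAL_ITEM ? CGI_MULTI_ITEM_LOGICAL : CGI_LOGICAL_NAME;
    default:
        return len > CGI_SYMBOL_LIMIT ? CGI_TRUNCATED_SYMBOL : CGI_LOCAL_SYMBOL;
    }
}

/* Number of equivalence strings a multi-item logical needs (assumes each
   item is filled to CGI_LOGICAL_ITEM characters before the next starts). */
size_t cgi_logical_items(size_t len)
{
    return (len + CGI_LOGICAL_ITEM - 1) / CGI_LOGICAL_ITEM;
}

/* Reassembles a value from its equivalence strings, in index order.
   Truncates to fit dst; returns the length of the joined string. */
size_t cgi_join_items(char *dst, size_t size, const char *const *items, size_t n)
{
    size_t used = 0;
    if (size == 0) return 0;
    for (size_t i = 0; i < n; i++) {
        size_t len = strlen(items[i]);
        if (len > size - 1 - used) len = size - 1 - used;  /* keep room for NUL */
        memcpy(dst + used, items[i], len);
        used += len;
    }
    dst[used] = '\0';
    return used;
}
```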

CGI Tip-4 (authorization problems)

I was recently working on a CSWS application for my employer to do the following:

  1. upon accessing a given web page, prompt the user for a username and password and then validate it against SYSUAF in OpenVMS
     
  2. display the web page, allowing the user to fill out a form
     
  3. upon clicking submit we would run a DCL script to:
    1. determine the REQUEST_METHOD
    2. determine the USERNAME previously entered
    3. run a VMS-BASIC program to handle the request then respond with some HTML

First off, I enabled these two lines in HTTPD.CONF

LoadModule auth_module         modules/mod_auth.exe
LoadModule auth_openvms_module modules/mod_auth_openvms.exe 

...then restarted the server. Everything seemed to work as expected, except that I couldn't get any USERNAME info to show up in the DCL script. Furthermore, the DCL version of the CGI debugging script "TEST-CGI-VMS.COM" seems to be missing some environment variables like REMOTE_USER. To make matters worse, the otherwise excellent book OpenVMS with Apache, OSU, and WASD states that the variable HTTP_AUTHORIZATION should be available in all servers, including CSWS.

I was looking for this environment variable as a sign that the base64-encoded username and password (Basic access authentication) were making it through to my CGI. I have since discovered that neither CSWS nor "Apache on UNIX" passes HTTP_AUTHORIZATION to a CGI program by default. I suppose this is so some evil programmer couldn't harvest usernames and passwords.
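To see why the server withholds that header, it helps to look at what it carries. In Basic authentication the browser sends "Authorization: Basic " followed by base64("username:password"), and base64 is an encoding, not encryption. The C sketch below (basic_auth_user and its helpers are hypothetical names) recovers the username from such a header value; anything that can see the header can recover the password the same way. For CGI programs, REMOTE_USER remains the supported way to learn who logged in.

```c
#include <stddef.h>
#include <string.h>

/* Maps one base64 character to its 6-bit value, or returns -1. */
static int b64_value(int c)
{
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

/* Decodes base64 text up to the first '=' (padding) or end of string.
   Returns the number of bytes written to dst, or -1 on bad input/overflow. */
static long b64_decode(const char *src, unsigned char *dst, size_t dst_size)
{
    unsigned long bits = 0;
    int nbits = 0;
    size_t out = 0;
    for (; *src != '\0' && *src != '='; src++) {
        int v = b64_value((unsigned char)*src);
        if (v < 0) return -1;
        bits = (bits << 6) | (unsigned long)v;   /* append 6 bits */
        nbits += 6;
        if (nbits >= 8) {                        /* a full byte is ready */
            nbits -= 8;
            if (out >= dst_size) return -1;
            dst[out++] = (unsigned char)((bits >> nbits) & 0xFFUL);
            bits &= (1UL << nbits) - 1UL;        /* keep only leftover bits */
        }
    }
    return (long)out;
}

/* Extracts the user name from an Authorization header value such as
   "Basic dXNlcjpwYXNz". Only the exact scheme spelling "Basic " is
   accepted here (a simplification). Returns 1 on success, 0 otherwise. */
int basic_auth_user(const char *header, char *user, size_t user_size)
{
    static const char prefix[] = "Basic ";
    unsigned char decoded[256];
    long n;
    char *colon;
    size_t len;

    if (strncmp(header, prefix, sizeof prefix - 1) != 0) return 0;
    n = b64_decode(header + sizeof prefix - 1, decoded, sizeof decoded - 1);
    if (n < 0) return 0;
    decoded[n] = '\0';                           /* decoded is "user:password" */
    colon = strchr((char *)decoded, ':');
    if (colon == NULL) return 0;
    len = (size_t)(colon - (char *)decoded);
    if (len + 1 > user_size) return 0;
    memcpy(user, decoded, len);
    user[len] = '\0';
    return 1;
}
```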

CGI Tip-4 (preparing for authorization)

According to some helpful folks at Compaq (now HP), in order to get authorization information like REMOTE_USER transferred to the CGI you must do the following:

  1. Add the "AuthAuthoritative On" directive to the ".HTACCESS" file which protects the directory containing the protected files (web pages and/or scripts)
     
  2. Since both the document directory and scripts directory require identical authentication data, copy the document ".HTACCESS" file to the scripts directory which will contain the CGI program requiring authentication info (this meant creating a "protected_scripts" directory because we have other CGI programs which do not require authentication).
     
  3. Modify file APACHE$COMMON:[000000.CONF]HTTPD.CONF to use either "AllowOverride AuthConfig" or "AllowOverride All" for both the document directory and the protected scripts directory. Failure to do this means that many "Auth" directives in ".HTACCESS" will be ignored by the server. However, "AuthOpenVMSUser" and "AuthOpenVMSAuthoritative" aren't affected, so seeing a username and password dialog doesn't mean that everything is working.
    ServerRoot "/apache$root"
    DocumentRoot "/apache$common/main"
    LoadModule auth_openvms_module /apache$common/modules/mod_auth_openvms.exe_alpha
    
    ScriptAlias /cgi-bin/ "/apache$root/cgi-bin/"
    ScriptAlias /scripts/ "/apache$root/scripts/"
    ScriptAlias /bell_private_scripts/ "/apache$root/bell_private_scripts/"
    ScriptAlias /ics_private_scripts/ "/apache$root/ics_private_scripts/"
    
    Alias /bell_private/ "/apache$root/bell_private/"
    <Directory "/apache$root/bell_private">
        Options FollowSymLinks
        # enable .HTACCESS
        AllowOverride All
        Order allow,deny
        Allow from all
    </Directory>
    
    Alias /ics_private/ "/apache$root/ics_private/"
    <Directory "/apache$root/ics_private">
        Options Indexes FollowSymLinks Multiviews
        # enable .HTACCESS
        AllowOverride All
        Order allow,deny
        Allow from all
    </Directory>
    
    <Directory "/apache$root/bell_private_scripts">
        # enable .HTACCESS
        AllowOverride AuthConfig
        Options ExecCGI
        Order allow,deny
        Allow from all
    </Directory>
    
    <Directory "/apache$root/ics_private_scripts">
        # enable .HTACCESS
        AllowOverride AuthConfig
        Options ExecCGI
        Order allow,deny
        Allow from all
    </Directory>
  4. File ".HTACCESS" after modification
    1. you can have different files in different directories
    2. since the file name begins with a period, it will not appear in any "file-view" in a browser
    AuthType Basic
    AuthAuthoritative On
    AuthName "ICSIS Bell-ATS Authentication"
    AuthOpenVMSUser On
    AuthOpenVMSAuthoritative On
    require valid-user
  5. Apache 2.0 Update: For security and performance reasons, the Apache 2.0 documentation recommends that you place these directives between <Directory> statements in APACHE$COMMON:[000000.CONF]HTTPD.CONF rather than in ".HTACCESS" files (see the official Apache documentation on .htaccess files).
    #
    # enable scripting (ScriptAlias) for these three URL prefixes
    ScriptAlias /cgi-bin/                   "/apache$documents/cgi-bin/"
    ScriptAlias /scripts/                   "/apache$documents/scripts/"
    ScriptAlias /ics_private_scripts/       "/apache$documents/ics_private_scripts/"
    #
    <Directory "/apache$documents/cgi-bin">
        AllowOverride None
        Options None
        Order allow,deny
        Allow from all
    </Directory>
    #
    <Directory "/apache$documents/scripts">
        AllowOverride None
        Options None
        Order allow,deny
        Allow from all
    </Directory>
    #
    <Directory "/apache$documents/ics_private_scripts">
        # enable authorization (.HTACCESS may use AuthConfig directives)
        AllowOverride AuthConfig
        Options None
        Order allow,deny
        Allow from all
    </Directory>
    #
  6. Some more recent (2015-03-xx) experiments writing my own MOD_AUTH routines are described on a separate page.

CGI Tip-5 (dealing with large uploads)

Most form data sent to Apache will be less than or equal to 32,767 bytes. But if you are supporting file-upload like so:

<form method="post" enctype="multipart/form-data" action="/scripts/upload_test_neil">
  <input type="text"   name="textline">
  <input type="file"   name="datafile">
  <input type="submit" name="Send">
</form>
...then you might experience uploads larger than 32,767 bytes. Okay, so what's the big deal? Well, if your CGI program was written in C, then it would be no big deal to read the Apache symbol "CONTENT_LENGTH", then malloc that amount (if you have enough memory). Next, you would call fopen(fname,"rb"), then read the whole amount in one operation.

This is not possible if your CGI program was written in BASIC, since that language limits strings to a maximum size of 32,767 bytes. This means you would need to do multiple reads until you have extracted CONTENT_LENGTH bytes.
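The multiple-read loop is sketched below in C, for consistency with the other examples on this page, with the per-read size capped at 32,767 bytes to mirror the BASIC limit. read_cgi_body is a hypothetical name. It assumes the caller has already parsed CONTENT_LENGTH and allocated a buffer of that size, and that the body is read from stdin (APACHE$INPUT on CSWS, according to the logical-name table above).

```c
#include <stdio.h>

#define CGI_MAX_CHUNK 32767u   /* per-read cap, mirroring the BASIC limit */

/* Reads up to content_length bytes from in into dst, never asking fread()
   for more than `chunk` bytes at a time. Returns the number of bytes read;
   a short count means end-of-file or a read error. */
size_t read_cgi_body(FILE *in, char *dst, size_t content_length, size_t chunk)
{
    size_t total = 0;
    if (chunk == 0 || chunk > CGI_MAX_CHUNK) chunk = CGI_MAX_CHUNK;
    while (total < content_length) {
        size_t want = content_length - total;
        size_t got;
        if (want > chunk) want = chunk;
        got = fread(dst + total, 1, want, in);
        if (got == 0) break;            /* EOF or read error: stop early */
        total += got;
    }
    return total;
}
```

In a CGI image the length would typically come from strtoul(getenv("CONTENT_LENGTH"), NULL, 10), with the result of read_cgi_body() compared against it to detect a truncated upload.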


CSWS Installation Tips

Installation Tip #1 (using TELNET to test a new installation)

Testing with HEAD

Telnet www.bellics.net 80					<<<--- type this then hit <enter> (or "telnet 127.0.0.1 80")
HEAD / HTTP/1.0							<<<--- type this then hit <enter>
								<<<--- hit <enter> (a blank line to end the HTTP request header)
HTTP/1.1 200 OK							<<<--- start of the HTTP response header
Date: Mon, 08 Jun 2009 20:17:47 GMT				<<<--- server's current time stamp
Server: Apache/2.0.52 (OpenVMS) mod_ssl/2.0.52 OpenSSL/0.9.7d	<<<--- web server flavor and version; ssl version
Last-Modified: Thu, 13 Aug 2009 16:59:51 GMT			<<<--- web page time stamp (for receiver's caching logic)
Accept-Ranges: bytes						<<<--- server accepts byte-range requests
Connection: close						<<<--- one request and one response
Content-Type: text/html						<<<--- the following document is HTML formatted
								<<<--- notice the blank line to end the HTTP response header

Testing with "GET and HTTP/1.0"

Telnet www.bellics.net 80					<<<--- type this then hit <enter> (or "telnet 127.0.0.1 80")
GET / HTTP/1.0							<<<--- type this then hit <enter>
								<<<--- hit <enter> (a blank line to end the HTTP request header)
HTTP/1.1 200 OK							<<<--- start of the HTTP response header
Date: Mon, 08 Jun 2009 20:17:47 GMT                     	<<<--- server's current time stamp
Server: Apache/2.0.52 (OpenVMS) mod_ssl/2.0.52 OpenSSL/0.9.7d	<<<--- web server flavor and version; ssl version
Last-Modified: Thu, 13 Aug 2009 16:59:51 GMT			<<<--- web page time stamp (for receiver's caching logic)
Accept-Ranges: bytes						<<<--- server accepts byte-range requests
Content-Length: 982						<<<--- the HTML content block is 982 bytes
Connection: close						<<<--- one request and one response
Content-Type: text/html						<<<--- the following document is HTML formatted
								<<<--- notice the blank line to end the HTTP response header
<html>								<<<--- start of HTML content (the web page)
<head>
<title>Integrated Convergence Support Information System</title>

Command Notes:
line notes
1 telnet to "www.bellics.net" using TCP/IP port 80 (telnet defaults to port 23)
2 GET will pull back the whole web page; use HEAD or OPTIONS to only pull back server data
  "/" requests the server's default document found in the root directory; you could have also entered something like: "/default.htm" or "/login.html" or "/scripts/whatever"
  HTTP/1.0 indicates we do not want a persistent connection etc. (keep things really simple in this demo)
3 a blank line indicates the end of the sender's HTTP request header

Response Notes:
line data description
1 HTTP/1.1 I am able to support HTTP version 1.1 (persistent connections, etc.)
  200 OK HTTP status message indicating that everything went as planned
2 Date ... server's current date + time, always expressed in GMT (RFC 1123 format)
3 Server ... Server software and installed security modules
4 Last-Modified last modified date of the file I am sending you (for your cache)
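If you script these tests instead of typing them, the first response line is the one to check. Here is a small C sketch (parse_status_line is a hypothetical helper) that splits a status line like the one above into version, status code and reason phrase.

```c
#include <stdio.h>
#include <string.h>

/* Parses an HTTP status line such as "HTTP/1.1 200 OK". On success stores
   the protocol version, the status code and a pointer to the reason phrase
   ("OK"), and returns 1; returns 0 if the line is not a status line. */
int parse_status_line(const char *line, int *major, int *minor,
                      int *code, const char **reason)
{
    const char *sp;
    if (sscanf(line, "HTTP/%d.%d %d", major, minor, code) != 3) return 0;
    if (*code < 100 || *code > 599) return 0;       /* not an HTTP status code */
    sp = strchr(line, ' ');                          /* space after HTTP/x.y */
    sp = sp != NULL ? strchr(sp + 1, ' ') : NULL;    /* space after the code */
    *reason = sp != NULL ? sp + 1 : "";
    return 1;
}
```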

Testing with "GET and HTTP/1.1"

Telnet www.bellics.net 80					<<<--- type this then hit <enter> (or "telnet 127.0.0.1 80")
GET / HTTP/1.1							<<<--- type this then hit <enter>
host: www.bellics.net						<<<--- host is mandatory with HTTP/1.1
accept: text/html						<<<--- optional: says "I can process HTML documents"
connection: close						<<<--- optional: return a page then close (do not persist)
								<<<--- hit <enter> (a blank line to end the HTTP request header)
HTTP/1.1 200 OK							<<<--- start of the HTTP response header
Date: Mon, 08 Jun 2009 20:17:47 GMT                     	<<<--- server's current time stamp
Server: Apache/2.0.52 (OpenVMS) mod_ssl/2.0.52 OpenSSL/0.9.7d	<<<--- web server flavor and version; ssl version
Last-Modified: Thu, 13 Aug 2009 16:59:51 GMT            	<<<--- web page time stamp (for receiver's caching logic)
Accept-Ranges: bytes						<<<--- server accepts byte-range requests
Content-Length: 982						<<<--- the HTML content block is 982 bytes
Connection: close						<<<--- one request and one response
Content-Type: text/html						<<<--- the following document is HTML formatted
								<<<--- notice the blank line to end the HTTP response header
<html>								<<<--- start of HTML content (the web page)
<head>
<title>Integrated Convergence Support Information System</title>
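The same HTTP/1.1 request can be generated by a program instead of typed into telnet. Here is a C sketch (build_get_request is a hypothetical helper; sending it over a socket is not shown). HTTP requires each header line to end in CR LF, which telnet normally sends when you press <enter>, and the request must include a Host header. The Accept header is the one that tells the server which media types the client can handle.

```c
#include <stdio.h>
#include <string.h>

/* Builds a minimal HTTP/1.1 GET request for path on host. Returns the
   snprintf result (>= size means buf was too small), or -1 if an argument
   contains characters that would break the request line or headers. */
int build_get_request(char *buf, size_t size, const char *host, const char *path)
{
    if (strpbrk(host, "\r\n ") != NULL || strpbrk(path, "\r\n ") != NULL)
        return -1;
    return snprintf(buf, size,
                    "GET %s HTTP/1.1\r\n"     /* request line */
                    "Host: %s\r\n"            /* mandatory in HTTP/1.1 */
                    "Accept: text/html\r\n"   /* media types we can handle */
                    "Connection: close\r\n"   /* one response, then disconnect */
                    "\r\n",                   /* blank line ends the header */
                    path, host);
}
```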

Testing through a proxy server (just to show how it is done)

Caveats:

  1. a proxy server is a device employed by large enterprises to protect an intranet from the outside world (the real world internet).
  2. a proxy server is not the same as a router/firewall appliance which usually employs NAT (network address translation)
Legend:
<ur>	user response
<sr>	system response
--------------------------------------------------------------------------------
<sr>	$									! default DCL prompt
<ur>	Telnet 192.168.210.220 80						! connect to proxy server on port 80
<sr>	%TCPWARE_TELNET-I-TRYING, trying concealed.ca,http (192.168.210.220,80) ...
<sr>	%TCPWARE_TELNET-I-ESCCHR, escape (attention) character is "^\"
<ur>	CONNECT www.bellics.com:80 HTTP/1.1					! connect to node on port 80 using HTTP/1.1
<ur>										! blank line ends HTTP CONNECT header
<sr>	HTTP/1.1 200 Connection established					! proxy has connected
<ur>	GET / HTTP/1.1								!
<ur>	host: www.bellics.com							!
<ur>	accept: text/html							!
<ur>	connection: close							!
<ur>										! blank line ends HTTP request header
<sr>	HTTP/1.1 200 OK								! start of HTTP response header
Date: Mon, 08 Jun 2009 20:17:47 GMT
Server: Apache/2.0.52 (OpenVMS) mod_ssl/2.0.52 OpenSSL/0.9.7d
Last-Modified: Thu, 13 Aug 2009 16:59:51 GMT
Accept-Ranges: bytes
Content-Length: 982
Connection: close
Content-Type: text/html

<html> ! start of HTTP payload
<head>
<title>Integrated Convergence Support Information System</title>
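The proxy exchange follows a simple rule: send CONNECT host:port, wait for a 2xx status line and the blank line that follows it, then treat the connection as a raw tunnel and send the ordinary GET request through it. Here is a C sketch (both function names are hypothetical). Unlike the typed session above, it also sends a Host header, which HTTP/1.1 expects on every request.

```c
#include <stdio.h>
#include <string.h>

/* Builds the CONNECT request that asks a proxy to open a tunnel to
   host:port. Returns the snprintf result, or -1 for an unusable argument. */
int build_connect_request(char *buf, size_t size, const char *host, int port)
{
    if (port < 1 || port > 65535 || strpbrk(host, "\r\n :/") != NULL)
        return -1;
    return snprintf(buf, size,
                    "CONNECT %s:%d HTTP/1.1\r\n"
                    "Host: %s:%d\r\n"
                    "\r\n",                    /* blank line ends the CONNECT header */
                    host, port, host, port);
}

/* Returns 1 if the proxy's status line reports a 2xx code (tunnel open),
   e.g. "HTTP/1.1 200 Connection established"; 0 otherwise. */
int proxy_tunnel_open(const char *status_line)
{
    int major, minor, code;
    if (sscanf(status_line, "HTTP/%d.%d %d", &major, &minor, &code) != 3)
        return 0;
    return code >= 200 && code <= 299;
}
```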

Installation Tip #2 (ODS-5 and the system disk)

Installation Tip #3 (ODS-5 + SSL)

Compaq states that an ODS-5 volume is not required for a non-Java installation of CSWS, but I've found that the online documentation for "mod ssl User" will be corrupt due to the presence of filenames with multiple dots (which are supported by the optional ODS-5 but not by the standard ODS-2). What's worse, you won't see any error messages during installation to an ODS-2 disk. Because I thought that other things could also have been compromised, I decided to only install to ODS-5 volumes.

ODS-5 and DCL

This paragraph has nothing to do with the web servers but I decided to mention it here anyway. If you've enabled ODS-5 then you might notice a few strange changes:

If a file is created by an application program written in a high level language (e.g. BASIC, C, C++, etc.) and the file name was defined using lower case characters, and the file doesn't yet exist, then when the file is created you will see a lower case name in the associated directory. If the file already exists, a new file of the same name and location will match the case of the original file.

This DCL command (in effect by default) will make your interactive session work the ODS-2 way on an ODS-5 system.

$set proc/parse_style=traditional/case_lookup=blind

This command will allow you to make your process case-sensitive (so you can rename a file changing its case):

$set proc/parse_style=extended/case_lookup=sensitive

CAVEAT: If your system has been running for years in case-blind mode, then it would be a really bad idea to place this case-sensitive entry into the system file "SYS$MANAGER:SYLOGIN.COM". However, it is a completely different situation if your system is new AND all users will be running in case-sensitive mode. Consult the HP OpenVMS System Manager's Manual, Volume 1: Essentials before you make any changes affecting more than one account. I have encountered situations where a user couldn't even log off.

CSWS-2.2 (based upon Apache 2.0.63 and OpenSSL 0.9.8h)

HPQ says this is only a maintenance release but I wanted it anyway. Why?

SWS-2.0 Woes

New "Served" File Format

HP released SWS-2.0 (which is based upon Apache 2.0.47) in December of 2003. One shocking change is that all served-up web pages (text files) must first be converted to STREAM_LF before they can be used. Several reasons were given in the newsgroup comp.os.vms (archived at www.deja.com), including:

  1. more efficient server operation (since the server wouldn't need to convert from RMS text records to stream in real time)
  2. easier to compute byte count when doing stream/chunk operations (all I/O is now filtered in Apache-2.x)
  3. less work for HP when porting Apache releases to OpenVMS
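Here is a minimal C sketch of what "converting from RMS text records to stream" involves. It assumes the usual on-disk layout of an RMS variable-length (VAR) record file: a 2-byte little-endian byte count in front of each record, with odd-length records padded to an even boundary. It also assumes the buffer was already read in binary form. The function name var_to_stmlf is hypothetical. On OpenVMS itself you would normally let the CONVERT utility do this job.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (assumes the classic RMS VAR layout):
 * each record = 2-byte little-endian length word + data, padded to an
 * even byte boundary. Output is STREAM_LF: record data followed by '\n'.
 * Returns bytes written to out, or (size_t)-1 if the input is truncated
 * or out is too small.
 */
size_t var_to_stmlf(const unsigned char *in, size_t in_len,
                    char *out, size_t out_cap)
{
    size_t i = 0, o = 0;

    assert(in != NULL && out != NULL);
    while (i + 2 <= in_len) {
        size_t rec_len = (size_t)in[i] | ((size_t)in[i + 1] << 8);

        i += 2;                              /* skip the length word */
        if (i + rec_len > in_len || o + rec_len + 1 > out_cap)
            return (size_t)-1;
        memcpy(out + o, in + i, rec_len);    /* copy the record text */
        o += rec_len;
        out[o++] = '\n';                     /* STREAM_LF terminator */
        i += rec_len + (rec_len & 1);        /* skip pad byte after odd-length records */
    }
    return o;
}
```

Every time a VAR-format page is served, this kind of work has to happen again. Converting the file once to STREAM_LF avoids that repeated cost.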

IMHO, they should have created an Apache plug-in (mod_rms ?) to allow backward compatibility when desired. A good workaround for this restriction involves setting up PHP to pick up and process unconverted HTML files.

SWS-2.1

My Apache Tweaks

What's up with favicon.ico ?

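Analyzing Apache log files will show a huge number of requests for a missing file named favicon.ico. Browsers ask for /favicon.ico on their own, looking for the small icon shown in tabs and bookmarks. Once you place a favicon.ico file in your document root, make sure Apache serves it with the proper MIME type: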

#
# file: [.conf]httpd.conf
#

{ ...snip... }

# need this next line for favicon.ico (NSR - 2012-08-09)
#
AddType image/x-icon .ico

Don't let SSL and IE kill your system

#
# file: [.conf]ssl.conf
#

{ ...snip... }

#
#	this old Apache declaration is no longer 100% true (why treat all IE browsers badly?)
#
#SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown
#
#	experimental hack to speed up SSL for IE users - NSR (2012-08-13)
#
#BrowserMatch "MSIE [1-4]" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0
#BrowserMatch "MSIE [5-9]" ssl-unclean-shutdown
#
#	experimental hack to speed up SSL for IE users - NSR (2012-08-14)
#
#	note:	"MSIE 1" (above) didn't support SSL so it is superfluous
#		Meanwhile, "MSIE [1-4]" would probably also match MSIE 10 when it finally appears
#
BrowserMatch ".*MSIE [2-5].*" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0

Browser Caching (caveat: test these tips on a captive system before putting into production)

Our intranet site employs AJAX and certain apps continually (every minute) pull back three little gifs. What is worse is this: these gifs are sent over the encrypted channel (which adds additional overhead) and the problem is multiplied by the fact that more than 120 employees are using the app at the same time. You can't control how a user sets up his browser "cache-wise", but the following tweak has greatly reduced the problem on my system:

Caveats:
  1. I have removed the 2012 hack which didn't work well in 2013 after most of our clients upgraded their browsers from IE7 to IE8
    1. why use such old browsers when Microsoft is just introducing IE11? Many of our corporate HR systems were built with IE8 dependencies so our employees work in a locked-down environment unable to upgrade.
    2. I THINK a very good case could be made for having our employees use IE10 or IE11 in IE7 compatibility mode. For example, many people do not know that IE8 can only utilize ONE CORE on a multicore platform. All modern versions of IE, Chrome and Firefox can utilize multiple cores in 2014 including IE10 in IE7 compatibility mode (it is only compatible in the way it renders HTML; not the bugs or limitations).
  2. I have removed the 2013 hack (ruthless caching) because I have found a better way in web ecosystems where THE MAJORITY OF PEOPLE are using browsers less than 5 years old (IE8 was released in 2009)
    • newer browsers prefer header responses with CACHE-CONTROL
    • older browsers prefer header responses with EXPIRES
    • ruthless caching employed both EXPIRES and CACHE-CONTROL, which left it up to the browser to decide which one to honor (HTTP/1.1 says a Cache-Control max-age takes precedence over Expires). There is quite a bit of controversy about how different browsers actually do this, with many pundits thinking that browsers wrongly choose EXPIRES over CACHE-CONTROL
    • the 2014 hack which follows only employs header responses with CACHE-CONTROL and a reduced FileETag. If nothing else, just removing the EXPIRES line from the header of each web page component will slightly reduce the processing overhead of both client and server. Savings of a million lines per day at the server end would be typical.
  3. The following tweak has been tested to work as-is with Apache/2.0.63 on OpenVMS
    1. Primary testing:
      1. I no longer test with Wireshark, which was useless for testing encrypted connections over HTTPS
      2. I use the Network Profiler built into Firefox (click on the monkey-wrench icon; click CUSTOMIZE if you do not see it)
      3. I use the Network Profiler built into Firebug (a Firefox add-on). Once you get used to the time-colored responses this tool is hard to give up.
      4. I also use the optional YSlow profiler which can be installed into Firebug
        • caveat: even though I have removed EXPIRES on my site, YSlow still reports an "A" on the EXPIRES test provided a CACHE-CONTROL statement exists with a minimum time entry of 72 hours. This is normal behavior because EXPIRES is less important this side of 2009
      5. I also use the Network Profiler built into Chrome
      6. I also use the Network Profiler built into IE10 (good) and IE11 (better)
        • caveat: when the IE Network Profiler comes up, the third button (Always Refresh From Server) is depressed by default which will always force the browser to bypass local cache. You will think that caching is not working when it actually is
    2. Secondary testing:
      1. hopefully you are rotating your Apache logs every day. If so, make changes in Apache config then allow the system to run for a day before using $SEARCH (a VMS app) to collect stats. You want to compare the ratio of 200 responses ("here is the file you requested") to 304 responses ("use the file in your cache").
        • $search *log*.* " 200 " /noout/stats
        • $search *log*.* " 304 " /noout/stats
      2. On my system I collected three numbers:
        • number of 200 messages divided by total messages
        • number of 304 messages divided by total messages
        • number of 200 messages divided by 304 messages
        and could see a large block of messages moving from "category 200" to "category 304"
    3. If you are running Apache on OpenVMS in case-insensitive mode (which is typical) then you might think that "file.ext" is the same as "FILE.EXT". While this is true on the server, the browser caches by case-sensitive URL so it is always best if your URLs are down-cased or camel-cased. If you are setting up a new instance of Apache on OpenVMS then I recommend always using case-sensitive mode (the norm in the UNIX world where this technology was first developed)
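The 200-versus-304 comparison described in the secondary-testing notes can also be done with a small C program instead of $SEARCH. This is a sketch that assumes your access log uses the Common (or Combined) Log Format. The names clf_status, tally and report are hypothetical. Unlike a plain text search for " 200 ", it reads only the status field. A line whose byte count happens to be 200 is therefore not miscounted.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: pull the HTTP status out of a CLF line such as
   10.0.0.1 - - [01/Jan/2014:00:00:01 -0500] "GET /a.gif HTTP/1.1" 304 -
   Returns the status (100-599) or -1 if the line does not look like CLF. */
int clf_status(const char *line)
{
    const char *rq_start, *rq_end, *field;
    char *end;
    long status;

    assert(line != NULL);
    rq_start = strchr(line, '"');              /* opening quote of the request */
    if (rq_start == NULL)
        return -1;
    rq_end = strchr(rq_start + 1, '"');        /* closing quote of the request */
    if (rq_end == NULL || rq_end[1] != ' ')
        return -1;
    field = rq_end + 2;                        /* status follows one space */
    status = strtol(field, &end, 10);
    if (end == field || *end != ' ' || status < 100 || status > 599)
        return -1;
    return (int)status;
}

struct cache_stats { long total, s200, s304; };

/* Count one log line (lines that are not CLF are ignored). */
void tally(struct cache_stats *st, const char *line)
{
    int s = clf_status(line);

    if (s < 0)
        return;
    st->total++;
    if (s == 200)
        st->s200++;
    else if (s == 304)
        st->s304++;
}

/* Print the three ratios described in the text. */
void report(const struct cache_stats *st)
{
    if (st->total == 0) {
        puts("no CLF lines found");
        return;
    }
    printf("200 / total = %.3f\n", (double)st->s200 / (double)st->total);
    printf("304 / total = %.3f\n", (double)st->s304 / (double)st->total);
    if (st->s304 > 0)
        printf("200 / 304   = %.3f\n", (double)st->s200 / (double)st->s304);
}
```

Call tally() for each line read from a rotated log with fgets(), then call report() once at the end. After a caching change, the 304 share should grow at the expense of the 200 share.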
#
# file: [.conf]httpd.conf
#

[...snip...]

#----------------------------------------------------------------------------------------
# 2014 tweak to reduce data sent to the browser
#
#	send less text to the browser (repeated for every page component)
#
# (default)         send: Server: Apache/2.0.63 (OpenVMS) mod_ssl/2.0.63 OpenSSL/0.9.8w
# ServerTokens OS   send: Server: Apache/2.0.63 (OpenVMS)
# ServerTokens Min  send: Server: Apache/2.0.63
# ServerTokens Prod send: Server: Apache
#
#----------------------------------------------------------------------------------------
#ServerTokens	Min
ServerTokens	Prod

#----------------------------------------------------------------------------------------
#	consider sending less caching information to the browser
#	note: ETag is not sent with dynamic content
#
#	FileETag All		sends something like this:	"8f0101a-ec2-2de84100"
#	FileETag MTime Size	sends something like this:	"ec2-2de84100"
#----------------------------------------------------------------------------------------
FileETag	All

#----------------------------------------------------------------------------------------
# force browsers to cache more stuff but don't go crazy
#
# notes:
# 1) verified with Apache/2.0.63 on OpenVMS
# 2) Expires directives were removed so don't load mod_expires
# 3) Header directives require mod_headers
# 4) Reduce the size of each response header as much as possible
# 5) typing CTRL-R or clicking page reload will force browsers to do cache revalidation
#----------------------------------------------------------------------------------------
#	browsers after IE7 prefer Cache-Control rather than Expires
#	(Apache does not allow a comment on the same line as a directive)
#ExpiresActive	On
#ExpiresDefault	now
#
#	note: when enabled, this next line blocks IE8 from downloading DOCX files
# Header set Cache-Control "no-cache, max-age=0, s-maxage=0"
#
#	this next line is for all documents (dynamic and static) so...
#	always use a small value (good) or disable the directive (better)
#
# Header set Cache-Control "public, max-age=0"
#----------------------------------------------------------------------
#	this will override (if a static file) something set above
#----------------------------------------------------------------------
# a-block
<FilesMatch "\.(ico|gif|jpg|jpeg|png|pdf)$">
Header unset Cache-Control
Header set Cache-Control "max-age=86400, public"
#	note: the next line must not be used in production
# Header set MyHeader "triggered Neil's a-block"
</FilesMatch>
#----------------------------------------------------------------------
#	this will override (if a static file) something set above
#----------------------------------------------------------------------
# b-block
<FilesMatch "\.(htm|html|js|css)$">
Header unset Cache-Control
Header set Cache-Control "max-age=3600, public"
#	note: the next line must not be used in production
# Header set MyHeader "triggered Neil's b-block"
</FilesMatch>
#----------------------------------------------------------------------
#	this will override (if a static file) something set above
#----------------------------------------------------------------------
# c-block
# comments:
# 1) we are trying to better support the client-server model for clients around the world
#    (some of the clients to this Canadian system are in India; others are across Canada)
# 2) really big files that don't change much should be cached for 24-hours
# 3) experiments with regex wildcarding
#    regex reference: The wildcard . matches any character. For example,
#    a.b matches any string that contains an "a", then any other character and then a "b"
#    while a.*b matches any string that contains an "a" and a "b" at some later point.
<FilesMatch "(jquery|angular|ddsmooth).*\.(css|js)$">
Header unset Cache-Control
Header set Cache-Control "max-age=86400, public"
#	note: the next line must not be used in production
# Header set MyHeader "triggered Neil's c-block"
</FilesMatch>
#-----------------------------------------------------------------------------------------

{ ...snip... }
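The "this will override" comments rely on how Apache merges <FilesMatch> sections: they are processed in the order they appear in the configuration. When a file matches more than one block, the last "Header unset" / "Header set" pair wins. For example, jquery-1.10.2.min.js matches both the b-block and the c-block and ends up with the c-block's 24 hours. The C sketch below models only that ordering rule. It uses POSIX extended regex in place of Apache's PCRE, and effective_max_age is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The three <FilesMatch> blocks above, reduced to the max-age each sets. */
struct files_match { const char *pattern; long max_age; };

static const struct files_match blocks[] = {
    { "\\.(ico|gif|jpg|jpeg|png|pdf)$",          86400 },  /* a-block */
    { "\\.(htm|html|js|css)$",                   3600  },  /* b-block */
    { "(jquery|angular|ddsmooth).*\\.(css|js)$", 86400 },  /* c-block */
};

/* Hypothetical model: return the max-age left by the LAST matching block,
   or -1 if no block matches (no Cache-Control from these blocks). */
long effective_max_age(const char *filename)
{
    long result = -1;
    size_t i;

    assert(filename != NULL);
    for (i = 0; i < sizeof blocks / sizeof blocks[0]; i++) {
        regex_t re;

        if (regcomp(&re, blocks[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
            continue;
        if (regexec(&re, filename, 0, NULL, 0) == 0)
            result = blocks[i].max_age;   /* unset + set: later block wins */
        regfree(&re);
    }
    return result;
}
```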

Better Apache config (for corporate use on an intranet)

#
# file: [.conf]httpd.conf
#

{ ...snip... }

#
#	enable HTTP/1.1 keepalives so clients do not open/close on every page component
#
KeepAlive		On

#
#	more efficient than the default of 15 (should never exceed 60 seconds)
#
KeepAliveTimeout	30

#
#	more efficient than the default of 100
#
MaxKeepAliveRequests	999

#
#	since Apache only adds server processes once per second, start off with enough of them (close to MaxSpareServers)
#
StartServers		9
MinSpareServers		6
MaxSpareServers		12

#
#	you will never have more server processes than MaxClients
#
# If you set MaxClients too large then you might run out of VMS process slots when
# hackers use Apache to probe our system (I have seen this happen).
# Consider using SYSGEN to increase MaxProcessCnt when you increase MaxClients
#
MaxClients		100

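#	MaxRequestsPerChild defaults to 0, meaning a child is never recycled (which is good when there are no memory leaks); a non-zero value makes each child exit after handling that many requests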
#
MaxRequestsPerChild	999

#
# send less text on the server line (affects every web page, and page component)
#
# "ServerTokens Full" (default) sends this:
#			Server: Apache/2.0.63 (OpenVMS) mod_ssl/2.0.63 OpenSSL/0.9.8w
# "ServerTokens Min" sends this:
#			Server: Apache/2.0.63
# "ServerTokens Prod" sends this:
#			Server: Apache
#
ServerTokens		Prod

[...snip...]

Changes to OpenVMS SYSGEN (for Apache)

Apache needs resources for interprocess communications (VMS mailboxes are like UNIX pipes)

legend:	<sr> = system response
	<ur> = user response
-------------------------------------------------------------------------
	recording your current sysgen settings to a file
<sr>	$
<ur>	def/user sys$output sysgen_20131031.txt	! output will be diverted
<sr>	$
<ur>	mcr sysgen
<sr>	SYSGEN>
<ur>	sho /all				! output goes to file
<sr>	SYSGEN>
<ur>	exit
<sr>	$
<ur>	type sysgen_20131031.txt
<sr>	...file contents are displayed...
-------------------------------------------------------------------------
	making changes to your running system (DANGER)
<sr>	$
<ur>	mcr sysgen
<sr>	SYSGEN>
<ur>	sho maxbuf
<sr>	Parameter Name Current Default    Min.    Max.  Unit Dynamic
        -------------- ------- ------- ------- -------  ---- -------
        MAXBUF            8192    8192    4096   64000 Bytes       D
<ur>	set maxbuf 64000
<sr>	SYSGEN>
<ur>	set defmbxbufquo 64000
<sr>	SYSGEN>
<ur>	set defmbxmxmsg 64000
<sr>	SYSGEN>
<ur>	write current
<sr>	SYSGEN>
<ur>	write active
<sr>	SYSGEN>
<ur>	exit
<sr>	$
--------------------------------------------------------------------------
	ensure you also add these overrides to file sys$system:modparams.dat (e.g. MIN_MAXBUF = 64000) so the next AUTOGEN run does not undo them

Webpage hit-counters (2013-11-29)

Webmasters have always been told to never use hit-counters because they consume precious resources. But many rinky-dink sites, especially some on corporate intranets sitting behind a firewall, need them for good P.R. with other departments.

Image-based Counters

These seem to be the gold standard in counters. Between 1995 and 2000, web servers were powerful platforms serving up mostly static webpages to underpowered desktop PCs sporting Pentium-2 or Pentium-3 processors. In that era most sites used a bit of freeware written by Muhammad Muquit named "count". Needless to say, I am envious of his programming skills. But this solution places more than a trivial computational burden on the server. Why? Updating a count-file is the easy part, so there is no problem there. The harder part is using the count-file digits to reference a library of individual digit graphics, then assembling binary slices scan-line by scan-line into a resultant GIF.

Muquit's program works on many web server flavors, including those for OpenVMS. Some of those distributions can be found here
Text-based Counters (2013-11-xx)

I'm running an overworked decade-old server (an AlphaServer DS20e installed in 2002) and now think the time has come to shift the computational burden from the server to the client's browser. I have a little "C" program ready which increments the counter file then sends back the plain-text result. The calling webpage uses a small amount of AJAX (~20 lines) to send the increment request, receive the plain-text count, then inject the value into the browser's DOM for rendering. The browser may not have access to all the cool image libraries seen in Muhammad's offering but that may not be as big a deal as you might think. At least your boss (and user community) will get instant feedback.

On my systems, most users can't tell one font from another, so they never noticed anything other than a faster response.
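The author's counter program is not shown here, so the following is only a hypothetical sketch of the server side described above. It is a CGI-style C program that increments a plain-text count file and returns the new value as text/plain, which the page's AJAX call can inject into the DOM. The names bump_counter and serve_count are made up for illustration. There is no file locking, so two simultaneous hits can lose an increment.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: read the count from fp (an empty file counts as 0),
   add one and write it back in place. Returns the new count or -1. */
long bump_counter(FILE *fp)
{
    long count = 0;

    assert(fp != NULL);
    rewind(fp);
    if (fscanf(fp, "%ld", &count) != 1)
        count = 0;                      /* empty or unreadable file starts at zero */
    count++;
    rewind(fp);                         /* reposition between a read and a write */
    /* the count only grows, so overwriting in place never leaves stale digits */
    if (fprintf(fp, "%ld\n", count) < 0 || fflush(fp) != 0)
        return -1;
    return count;
}

/* Hypothetical CGI entry point: open (or create) the counter file and
   send the count back as plain text. */
int serve_count(const char *path)
{
    FILE *fp = fopen(path, "r+");
    long n;

    if (fp == NULL)
        fp = fopen(path, "w+");         /* first hit: create the file */
    if (fp == NULL)
        return 1;
    n = bump_counter(fp);
    fclose(fp);
    if (n < 0)
        return 1;
    /* no-cache so browsers (notably IE) don't reuse a cached AJAX response */
    printf("Content-Type: text/plain\r\nCache-Control: no-cache\r\n\r\n%ld", n);
    return 0;
}
```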

What's up with "browserconfig.xml"?

Okay, so it looks like IE-11 is always asking for file browserconfig.xml from your server root directory (this is where IE-11 looks for the tile settings used when a site is pinned in Windows 8.1)

  • If the file is sent then IE-11 will cache it (so will not ask for the file again during the current session)
  • If the file is not found then IE-11 will keep asking for it -AND- Apache will be continually writing file-not-found messages to the error log

So the best way forward is to create a minimal version of this file in the server root (on my system we have "/" mapped to "/apache$Documents/main/")

<?xml version="1.0" encoding="utf-8"?>
<browserconfig>
  <msapplication>
  </msapplication>
</browserconfig>

ACLs on OpenVMS

  • Most Apache files on OpenVMS have been tagged for auditing via an ACL (see the DIRECTORY/FULL listing just below)
  • This is done with DCL command "$SET SECURITY/ACL=(stuff) file.ext"
  • This means that whenever any of these files are touched in any way by anyone or anything, the touch event will be sent to the AUDIT_SERVER for logging.
  • This is a lot like running the system with full ACCOUNTING enabled, except that auditing attaches an additional compute burden only to the stuff being audited (but it is felt by all)
  • Auditing is most likely a good idea since Apache is Open Source and most of us have not inspected every line of code.
  • This is especially true when we are using third-party plug-ins and add-ons.
  • However...
    • If your system is very busy (or too busy at critical peak times-of-the-day)
    • And you trust the software
    • And you trust your system admins
    • Then consider removing ACLs like so:

      $ SET SECURITY/ACL/DELETE=ALL apache$common:[000000...]*.*;*

$ dir/full APACHE$HTTPD.EXE

Directory APACHE$COMMON:[000000]
APACHE$HTTPD.EXE;5            File ID:  (13942,118,0)         
Size:           25/32         Owner:    [AP_HTTPD,APACHE$WWW]
Created:    22-AUG-2012 18:00:19.83
Revised:     6-DEC-2013 14:13:29.47 (3)
Expires:    <None specified>
Backup:     <No backup recorded>
Effective:  <None specified>
Recording:  <None specified>
Accessed:    6-DEC-2013 14:13:26.49
Attributes:  6-DEC-2013 14:13:29.47
Modified:   23-AUG-2012 09:27:27.75
Linkcount:  1
File organization:  Sequential
Shelved state:      Online 
Caching attribute:  Writethrough
File attributes:    Allocation: 32, Extend: 0, Global buffer count: 0, Version limit: 15, Contiguous best try
Record format:      Fixed length 512 byte records
Record attributes:  None
RMS attributes:     None
Journaling enabled: None
File protection:    System:RWED, Owner:RWED, Group:, World:
Access Cntrl List:  (IDENTIFIER=APACHE$EXECUTE,ACCESS=READ+EXECUTE)
Client attributes:  None

Getting rid of .HTACCESS and INDEXES

  • The official Apache docs are full of reasons why you should get rid of .HTACCESS files by moving directives into file httpd.conf
  • By implementing directive "AllowOverride None" Apache will no longer search individual directories looking to see if an .HTACCESS file has been recently added, removed or changed (.HTACCESS was developed in the 1990s for use by customers who wanted to modify access rules on "their" individual directories but did not have access to the common httpd.conf file)
  • But this problem gets much worse when INDEXES (provides you with a directory listing when DEFAULT.HTML is not found) is enabled on a directory requiring authentication
  • I was recently experimenting with my own MOD_AUTH authentication files and had inserted some trace code which would log all activities including date-time
  • With browser cache always initialized before each test...
    accessing this path          ran the auth code   default.html exists?
    http://node/yada/            6 times             N
    http://node/yada/            2 times             Y
    http://node/yada/file.ext    1 time              n/a
  • You can only imagine the additional load this would place on a system if each execution required a single database lookup.
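The Apache .htaccess documentation gives a concrete example of this overhead. When overrides are allowed, a request for a file in /www/htdocs/example makes Apache look for /.htaccess, /www/.htaccess, /www/htdocs/.htaccess and /www/htdocs/example/.htaccess. This C sketch just lists those candidate files. It assumes UNIX-style paths and overrides enabled all the way down, and htaccess_candidates is a hypothetical name. With "AllowOverride None" none of these lookups happen.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_PATH_LEN 256

/* Hypothetical helper: list the .htaccess files Apache would look for when
   serving a file in `dir` (one per directory from "/" down to `dir`).
   Returns the number of candidates written to out. */
size_t htaccess_candidates(const char *dir, char out[][MAX_PATH_LEN], size_t max)
{
    size_t n = 0, len;
    const char *p;

    assert(dir != NULL && dir[0] == '/');
    len = strlen(dir);
    /* every '/' in the path ends one ancestor directory */
    for (p = dir; p != NULL && n < max; p = strchr(p + 1, '/')) {
        size_t prefix = (size_t)(p - dir) + 1;     /* keep the '/' */
        snprintf(out[n++], MAX_PATH_LEN, "%.*s.htaccess", (int)prefix, dir);
    }
    /* the directory itself, unless dir already ends with '/' */
    if (len > 1 && dir[len - 1] != '/' && n < max)
        snprintf(out[n++], MAX_PATH_LEN, "%s/.htaccess", dir);
    return n;
}
```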

Shift authentication from Apache to your apps -AND- remove unnecessary plugins

  • On many Apache servers you will see numerous authentication modules loaded from httpd.conf which may include several of this small subset:
    mod_auth / mod_auth_basic   supports Basic access authentication using a base64 Authorization string (mod_auth was renamed to mod_auth_basic in Apache-2.2)
    mod_auth_digest             better than Basic in that the password is sent as a message digest (hash) rather than as base64 text
    mod_auth_kerberos           authenticates by dipping into Active Directory (a.k.a. Microsoft stuff)
    mod_auth_ldap               authenticates by dipping into LDAP
    mod_auth_openvms            authenticates by dipping into SYSUAF (a.k.a. OpenVMS stuff)
  • many times these modules were enabled by a well-meaning webmaster who needed to solve an urgent problem while thinking that he would only enable the desired module in paths where it would be required. But the truth is actually a little more sinister:
    • every module which is loaded will create "per-directory OFF settings" for every directive supported by that plugin
    • every auth module will most likely contain a line similar to this:
      ap_hook_check_user_id(authenticate_whatever,NULL,NULL,APR_HOOK_MIDDLE);
      which will register routine "authenticate_whatever" into the check-user-id execution queue.
    • When Apache scans the check-user-id execution queue, it executes each hook in turn. It stops as soon as one registered plugin responds with SUCCESS (OK: allow) or FAIL (an error such as HTTP_UNAUTHORIZED: reject). A plugin that responds with DECLINE (DECLINED) defers to the next hook.
      • Loading five authentication modules means that up to five hooks will be fired (hopefully each plugin author did the best job possible)
      • While I was tracing "all activity" (this includes the DECLINEs) associated with an auth module I was writing for Apache 2.0, I found I was logging every Apache authentication event in the system. This happens for every authenticated transaction. So only load the auth modules you really need, then develop a plan to reduce the number of modules to one (mod_auth_basic) or none.
      • I got my system down to one by doing Basic access authentication in my own plugin. So it is my intention to:
        1. do cookie-based application authentication in all my apps
        2. use my custom plugin to limit access to certain restricted directories
  • If you shift authentication from "Apache proper (where some of this stuff would be passed to an application via CGI)" into your "Apache applications" -AND- unload the unnecessary Apache modules, you will greatly reduce per-transaction overhead.
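Here is a rough C model of the check-user-id queue described above. It shows why five loaded auth modules can mean five hook calls per authenticated request. It is a simulation, not Apache code. The SIM_ constants mirror the values httpd.h uses for OK (0), DECLINED (-1) and HTTP_UNAUTHORIZED (401). The hook functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins mirroring httpd.h: OK = 0, DECLINED = -1, HTTP_UNAUTHORIZED = 401 */
enum { SIM_OK = 0, SIM_DECLINED = -1, SIM_HTTP_UNAUTHORIZED = 401 };

typedef int (*check_user_id_fn)(const char *user);

static int calls;                 /* hooks run for the current request */

/* Hypothetical module not configured for this directory: it declines. */
static int auth_not_mine(const char *user)
{
    (void)user;
    calls++;
    return SIM_DECLINED;
}

/* Hypothetical module that owns this directory: any non-empty user is OK. */
static int auth_mine(const char *user)
{
    calls++;
    return user[0] != '\0' ? SIM_OK : SIM_HTTP_UNAUTHORIZED;
}

/* Run hooks in registration order until one returns something other
   than DECLINED (the "run first" behavior described in the text). */
int run_check_user_id(const check_user_id_fn *hooks, size_t n, const char *user)
{
    size_t i;

    assert(hooks != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int rc = hooks[i](user);
        if (rc != SIM_DECLINED)
            return rc;
    }
    return SIM_DECLINED;          /* nobody claimed the request */
}
```

Registering auth_mine first, or loading only that one module, brings the count down to one call per request.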

Freeing Up Disk Space with Log Rotation

Note: this will aid in analyzing your Apache log files so you can improve your web server system. Failure to do this places all your transactions in one huge file.

<sr>	$
<ur>	sh time
<sr>	27-NOV-2005 20:52:11
	$ 
<ur>	dir [...]*.*/siz=all/date/sel=siz=min=10000
<sr>	Directory APACHE$COMMON:[000000.SPECIFIC.KAWC15.LOGS]

	ACCESS_LOG.;1        1061068/1061095  16-SEP-2003 19:48:46.00
	ERROR_LOG.;1          455636/455700   16-SEP-2003 19:48:44.50
	SSL_ENGINE_LOG.;1     238230/238280   16-SEP-2003 19:48:44.56

	$
<ur>	del APACHE$COMMON:[000000.SPECIFIC.KAWC15.LOGS]*_log*.*;*

Wow! These files have been growing for 26 months

Notes:

  1. These files grow forever. I have not been able to get "built-in Apache log rotation" working on OpenVMS (it would have added overhead to Apache anyway) so it is probably a good idea to stop the server every month to delete the files
  2. Here is a better method (you do not need to stop the server)
    • start by creating 31 subdirectories named [.log01] to [.log31] which would hold log files for each day of the month
    • run a batch job every night (perhaps at one second after midnight) to do the following:
      1. rename the files into the desired subdirectory while they are still open (this is legal in OpenVMS and writing will continue in the new location)
      2. then tell Apache to flush and reopen its log files; the complete job looks like this (a DCL command template for day 31):
        $ set def sys$common:[000000.specific.www.logs]	! move to the Apache log directory
        $ ren *_log*.* [.log31]				! okay to do this while files are open
        $ @APACHE$COMMON:[000000]APACHE$SETUP		! create Apache symbols for our process
        $ httpd -k flush				! tell Apache to flush the log buffers to disk
        $ httpd -k new					! tell Apache to close current files then open new ones
  3. or you could just use this automated DCL script
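One detail to watch with a job that runs at one second after midnight: the log files it renames hold the previous day's entries. If you want [.logNN] to match the day the entries were written, the job should use yesterday's day number. In DCL that is something like F$CVTIME("YESTERDAY",,"DAY"). This C sketch shows the same calculation and lets mktime() handle month ends and leap years. previous_day_log_dir is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper: write "[.logNN]" for the day before `now` into buf.
   Returns that day of the month (1-31), or -1 on failure. */
int previous_day_log_dir(const struct tm *now, char *buf, size_t buflen)
{
    struct tm t;

    assert(now != NULL && buf != NULL && buflen > 0);
    t = *now;
    t.tm_mday -= 1;      /* day 0 is normalized to the last day of the previous month */
    t.tm_isdst = -1;     /* let mktime() decide whether daylight saving time applies */
    if (mktime(&t) == (time_t)-1)
        return -1;
    if (snprintf(buf, buflen, "[.log%02d]", t.tm_mday) >= (int)buflen)
        return -1;
    return t.tm_mday;
}
```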

Character Sets (why servers must never lie to clients)

Executive Summary
  • Unicode is what you store in memory or in a database
  • UTF-8 is what you use to move Unicode between computer systems and devices
    • confusion can arise here because many static web pages (files) employ UTF-8 to reduce file storage requirements
    • some programmers will mistakenly attempt to store UTF-8 in memory or in databases. These attempts almost always backfire and are never worth it in the long run (how do you collate UTF-8 strings? what is the character length of a given UTF-8 string? should C/C++ programmers terminate a Unicode string with a null byte, null word or null long?)

Overview: A few common character sets

Character Set   Set Size                 Character Size             Notes
7-bit ASCII     128                      one byte                   standardized (uses 8 bits but the top bit is always zero)
8-bit ASCII     256                      one byte                   non-standard; no rules for the upper half; lower half is the same as 7-bit ASCII
ISO-8859-1      256                      one byte                   standardized; not all 256 are defined; lower half is the same as 7-bit ASCII
Windows-1252    256                      one byte                   standardized; fills 27 of the 32 positions that ISO-8859-1 leaves to control codes (0x80-0x9F) with printable characters; also called ANSI
Unicode 9.0     characters: 128,237      it depends:                standardized; see the plane breakdown below
                total: >1 million        2 bytes (Windows)
                                         4 bytes (Linux)
UTF-8           1,112,064                variable: one to 4 bytes   this limit was set in 2003 by RFC-3629

Unicode plane breakdown
  character scope                                  last code point   data size
  plane 0: BMP (Basic Multilingual Plane)          U+FFFF            2 bytes    many systems only support this
  planes 0+1: Basic and Supplemental               U+1FFFF           3 bytes    many systems support this
  all 17 planes (0-16)                             U+10FFFF          3 bytes    newer systems support this

Caveats:
  1. The last table entry is confusing because UTF-8 is not a character set; it is a Unicode encoding (read on)
  2. Unicode
    • Windows defines wchar_t as a 16-bit "unsigned short", so a single wchar_t can only hold a plane-0 character; anything beyond plane 0 must be stored as a UTF-16 surrogate pair (two wchar_t values)
    • Lesser-used characters (Arabian, Aramaic, Ancient Greek, Persian, Phoenician, etc) are defined in plane-1. If you intend to support plane-0 and plane-1 then you will require more than 16 bits
    • Additional (rarer) ideographs associated with Asian languages are found in plane-2; the common ones are already in plane-0
    • 65,536 * 17 = 1,114,112 code points, which require 21 bits (2^20 = 1,048,576 is not enough) or 3 bytes
      • e.g. 1,114,112 - 1 = 1,114,111 = x10FFFF

Single Byte Characters

  • In the early days of mainframes and minicomputers, characters were stored one per 8-bit byte, usually using the 128-character 7-bit ASCII set. One exception to this was EBCDIC, which was created and promoted by IBM for its mainframe business
  • With the sale of minicomputers into a worldwide marketplace, additional Western European characters were added by supporting 8-bit ISO-8859
    • the lower 128-characters mapped to ASCII
    • the upper 128-characters mapped to the extended set
      • ISO-8859-1 (also called latin1) was the most popular character set used in the western world
      • ISO-8859-2 to ISO-8859-9 were used throughout the rest of the world
  • With the popularity of Windows-based personal computers, Microsoft filled 27 of the 32 positions that ISO-8859-1 reserves for control codes (0x80-0x9F) with additional printable characters. They called the result Windows-1252 (now commonly referred to as ANSI). This decision predates the internet and therein lies today's problem.

Wide Characters + Multibyte

  • With the advent of internet applications including email and web browsers, customer-facing software needed to move beyond the Latin Character Set and did so by supporting Unicode
  • Unicode began as Unicode88 which was nothing more than a 16-bit character set co-developed by Apple and Xerox. It was further developed by Sun Microsystems and Microsoft.
  • For programmers, the simplest way to represent Unicode is to use wide characters, where every character is stored in a fixed-size 2-byte or 4-byte entity (the choice depends on how many characters you are willing to support)
    • Wide Character Sets and Multibyte support has been available in C/C++ for a long while
      • the number of bytes used determines how many Unicode characters you are prepared to support
        • UCS-4 for Linux
        • UCS-2 for Windows (although UCS-2 is now defunct in the wild)
      • Here are two examples of many:
        • printf is used to output single byte characters
        • wprintf is used to output wide character sets
    • Unicode is built into Java
    • Unicode is built into some JavaScript technologies (JSON springs to mind)

Sending Unicode over a communications channel

Back in the early 1990s, most data communications occurred via modems, so engineers were always looking for ways to send data that would not lock up the modem. Many standards were developed, including UCS-2 (which preceded UTF-16) and UCS-4 (which preceded UTF-32), but UTF-8 became the most popular. Too bad its implementation in internet applications was so ad hoc.

  1. HTML meta statements incorrectly use the word CHARSET where UTF-8 support was added with very little thought
    <meta charset="UTF-8">
    The creators of the "charset kludge" probably meant "character set will be Unicode but will follow UTF-8 encoding rules"
  2. HTTP 1.1 Response Headers are a little less confusing:
    HTTP/1.1 200 OK
    Date: Mon, 23 May 2005 22:38:34 GMT
    Content-Type: text/html; charset=UTF-8
    Content-Length: 138
    Last-Modified: Wed, 08 Jan 2003 23:11:55 GMT
    Server: Apache/1.3.3.7 (Unix) (Red-Hat/Linux)
    Connection: close
  3. XML documents properly use the phrase ENCODING
    <?xml version="1.0" encoding="UTF-8"?>
  4. MySQL and MariaDB have some confusing settings. While you can store the data as "Unicode binary", you select the maximum number of bytes stored per character with these character set names:
    •  utf8 = up to 3 bytes per character (plane 0 only; also called utf8mb3)
    •  utf8mb4 = up to 4 bytes per character (all planes)
    On the flip side, if your client connects using charset=latin1 then sends latin1 data to a database expecting utf8, the database will convert the data before storing it. It is almost impossible to convert utf8 back to latin1 without losing something (unless you resort to HTML entities), which is why most web apps today only support utf-8
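The utf8 versus utf8mb4 distinction comes down to how many bytes UTF-8 needs per character. This C sketch applies the RFC 3629 length rules (the helper names are hypothetical). It assumes the 3-byte "utf8" character set cannot store 4-byte sequences. That is why characters outside plane 0, such as the emoji U+1F600, need utf8mb4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* UTF-8 bytes needed for one code point (1-4 per RFC 3629);
   0 for surrogates and values above U+10FFFF. */
int utf8_length(uint32_t cp)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    if (cp < 0x80)
        return 1;
    if (cp < 0x800)
        return 2;
    if (cp < 0x10000)
        return 3;
    return 4;
}

/* Hypothetical check: would a 3-byte "utf8" (utf8mb3) column hold it? */
int fits_mysql_utf8(uint32_t cp)
{
    int n = utf8_length(cp);
    return n > 0 && n <= 3;
}

/* Total UTF-8 bytes for a string of code points, or 0 if any is invalid. */
size_t utf8_bytes(const uint32_t *cps, size_t n)
{
    size_t i, total = 0;

    assert(cps != NULL || n == 0);
    for (i = 0; i < n; i++) {
        int len = utf8_length(cps[i]);
        if (len == 0)
            return 0;
        total += (size_t)len;
    }
    return total;
}
```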

Failover Logic in Browsers

While the browser wars are over, browser vendors still go out of their way to make sure documents render properly no matter what the server told them. For example, most versions of IE will render documents even though they are not HTML compliant (e.g. missing tags like <html>, <head>, and <body>). On top of this, all browsers tend to support only two character set flavors:

  1. utf-8 (which is not really a character set; it is a Unicode encoding)
  2. everything else fails over to windows-1252 (which is a superset of iso-8859-1, which is a superset of ASCII)
Failover Examples
  • If you send ASCII data to a browser expecting iso-8859-1, everything will work properly because ASCII is mapped over the lower half of the iso-8859-1 character set
  • If you send iso-8859-1 data to a browser expecting iso-8859-1, everything will work properly (obviously)
  • If you send windows-1252 (also called ANSI) data to a browser expecting iso-8859-1, everything will work properly because all browsers support windows-1252 even though iso-8859-1 was specified (huh?)

    quote: browsers will change to Windows-1252 when ISO-8859-1 is declared. This is done for any DOCTYPE: HTML4, HTML5, and XHTML.
    reference-1: http://www.w3schools.com/charsets/ref_html_ansi.asp

    quote: Most modern web browsers and e-mail clients treat the MIME charset ISO-8859-1 as Windows-1252 to accommodate such mislabeling. This is now standard behavior in the HTML5 specification, which requires that documents advertised as ISO-8859-1 actually be parsed with the Windows-1252 encoding.
    reference-2: https://en.wikipedia.org/wiki/Windows-1252

  • However, if you send UTF-8 data to a browser expecting iso-8859-1 then any characters between 128 and 255 will be interpreted incorrectly
  • Likewise, if you send iso-8859-1 data to a browser expecting UTF-8 then any characters between 128 and 255 will be interpreted incorrectly
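The failover rule can be written as a tiny decoder. This C sketch decodes bytes the way modern browsers treat a page labeled ISO-8859-1, i.e. as windows-1252. It follows the mapping in the WHATWG Encoding Standard, where the five unassigned positions map to C1 control characters. The function names are hypothetical. The sketch also shows the UTF-8-sent-as-iso-8859-1 failure: the two UTF-8 bytes for "é" (0xC3 0xA9) come out as "Ã©".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* windows-1252 code points for bytes 0x80-0x9F; the holes at 0x81, 0x8D,
   0x8F, 0x90 and 0x9D map to the same-valued C1 control (WHATWG mapping). */
static const uint16_t cp1252_80_9f[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178
};

/* Decode one byte of a page labeled ISO-8859-1, browser style. */
uint32_t decode_labeled_latin1(unsigned char b)
{
    if (b >= 0x80 && b <= 0x9F)
        return cp1252_80_9f[b - 0x80];
    return b;                       /* all other bytes map to U+0000-U+00FF */
}

/* Decode a whole buffer into code points; returns the count. */
size_t decode_buffer(const unsigned char *in, size_t n, uint32_t *out)
{
    size_t i;

    assert((in != NULL && out != NULL) || n == 0);
    for (i = 0; i < n; i++)
        out[i] = decode_labeled_latin1(in[i]);
    return n;
}
```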

Apache Directives

  • Apache configuration file httpd.conf is usually installed with directive AddDefaultCharset iso-8859-1 or AddDefaultCharset On, which both mean the same thing. This is instructive in that it appears that Apache was developed with iso-8859-1 in mind (a character set encoding first published in 1987)
  • Some newer Apache distributions provide a default httpd.conf where AddDefaultCharset is set to windows-1252, but uninformed webmasters sometimes change this back to iso-8859-1 while others change this to utf-8
  • According to Apache documentation, AddDefaultCharset is only used when you, the CGI (common gateway interface) programmer, do not supply a CHARSET in the HTTP response header. Here is an excerpt from the O'Reilly book Apache: The Definitive Guide which I highly recommend:
    This directive specifies the name of the character set that will be added to any response that does not have any parameter on the content type in the HTTP headers. This will override any character set specified in the body of the document via a META tag. A setting of AddDefaultCharset Off disables this functionality. "AddDefaultCharset On" enables Apache's internal default charset of iso-8859-1 as required by the directive. You can also specify an alternate charset to be used; e.g. "AddDefaultCharset utf-8"
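To keep AddDefaultCharset out of the picture, a CGI program can declare its own charset. The following is a minimal Python sketch. It assumes a script that Apache runs as a CGI program and a data source that holds windows-1252 text. The script itself, `build_response` and `CHARSET` are hypothetical names. Because the Content-Type header already carries a charset parameter, Apache should not append its default.

```python
#!/usr/bin/env python3
# Hypothetical CGI script: it supplies its own charset parameter, so the
# AddDefaultCharset directive does not apply to this response.
import sys

CHARSET = "windows-1252"  # assumption: the data source holds windows-1252 text

def build_response(body: str, charset: str = CHARSET) -> bytes:
    """Return the CGI header, a blank line, then the body encoded in the
    same charset that the header advertises."""
    header = f"Content-Type: text/html; charset={charset}\r\n\r\n"
    return header.encode("ascii") + body.encode(charset)

if __name__ == "__main__":
    page = "<html><body><p>Caf\u00e9 prices in \u20ac</p></body></html>"
    sys.stdout.buffer.write(build_response(page))
    sys.stdout.buffer.flush()
```

The design point is that one variable supplies both the charset named in the header and the codec used to encode the body, so the two cannot drift apart.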

Sending data back to the server

Both these HTTP methods...

  • GET (these are mostly data requests)
  • POST (these are mostly data pushes)

...can be coupled to a SUBMIT button on your HTML form -and- will usually (at least before 2008) default to the character set defined for the currently rendered web page. But this is not the case for AJAX-based pushes which now default to UTF-8 no matter what character set was used in the currently rendered web page. To confuse things further:

  1. people continually interchange the words Unicode and UTF-8
  2. UTF-8 transmits 7-bit ASCII (codes: 0-127) as-is but converts all codes above 127 to multi-byte encodings.
  3. If UTF-8 is declared, sending certain single byte characters (iso-8859-1 or windows-1252) as-is will...
    • be displayed incorrectly in most browsers
    • stop an XML parser and might throw an error message
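The mismatch between form submissions and AJAX pushes appears on the server as differently percent-encoded bytes for the same text. This Python sketch uses the standard `urllib.parse.parse_qs` function. The two query strings are hand-written assumptions of what a form on a windows-1252 page and an AJAX request would send for the value Café.

```python
from urllib.parse import parse_qs

# Assumed request bodies for the same text field (value: Café)
from_page_form = "name=Caf%E9"      # form on a windows-1252 page: é sent as one byte
from_ajax_post = "name=Caf%C3%A9"   # AJAX push: é sent as two UTF-8 bytes

# Decoding each with the charset that was actually used recovers the text
form_value = parse_qs(from_page_form, encoding="windows-1252")["name"][0]  # 'Café'
ajax_value = parse_qs(from_ajax_post, encoding="utf-8")["name"][0]         # 'Café'

# Guessing wrong corrupts it (parse_qs replaces undecodable bytes by default)
wrong_form = parse_qs(from_page_form, encoding="utf-8")["name"][0]         # 'Caf\ufffd'
wrong_ajax = parse_qs(from_ajax_post, encoding="windows-1252")["name"][0]  # 'CafÃ©'

print(form_value, ajax_value, wrong_form, wrong_ajax)
```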

More details on character sets

  • Unicode is a character set while UTF-8 is a Unicode encoding
  • What's the difference? Think about UTF-8 as a dialup-safe compression method for sending Unicode
    • some online references incorrectly show bytes 2, 3 and 4 (the continuation bytes) as "1xxxxxxx". That material comes from the initial proposal published in 1992. This was changed to "10xxxxxx" in the 1993 standard. The resulting change is a little less efficient but makes encoding errors easier to detect (e.g. a first byte matching "11110xxx" means exactly three bytes matching "10xxxxxx" must follow, and a byte matching "10xxxxxx" can never start a character)
    • some online references incorrectly show 5 and 6 byte codings. These were dropped from the official spec in 2003 (see RFC-3629)
    • this table describes how UTF-8 is currently implemented on the internet
      UTF-8 (2003)
      Total bytes  Data bits  First code point  Last code point  Byte 1    Byte 2    Byte 3    Byte 4
      1            7          U+0000            U+007F           0xxxxxxx
      2            11         U+0080            U+07FF           110xxxxx  10xxxxxx
      3            16         U+0800            U+FFFF           1110xxxx  10xxxxxx  10xxxxxx
      4            21         U+10000           U+10FFFF         11110xxx  10xxxxxx  10xxxxxx  10xxxxxx
  • At this point, you might want to inspect this page showing four character sets side-by-side: 
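The UTF-8 table translates directly into bit operations. Here is a minimal Python sketch. The helper names `utf8_encode` and `sequence_length` are hypothetical. The first encodes one code point using the RFC-3629 table. The second works in the other direction and predicts a sequence length from its first byte. The encoder's output can be checked against Python's built-in `str.encode("utf-8")`.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one code point following the RFC-3629 table (1 to 4 bytes)."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a valid Unicode scalar value: U+{cp:04X}")
    if cp <= 0x7F:                                    # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:                                   # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:                                  # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),                  # 11110xxx + three 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

def sequence_length(first_byte: int) -> int:
    """Number of bytes in a sequence, judged from its first byte.
    Returns 0 for a continuation byte (10xxxxxx) or an unknown pattern.
    Pattern check only: it does not reject 0xC0, 0xC1 or 0xF5-0xF7."""
    if first_byte < 0x80:
        return 1
    if (first_byte & 0xE0) == 0xC0:
        return 2
    if (first_byte & 0xF0) == 0xE0:
        return 3
    if (first_byte & 0xF8) == 0xF0:
        return 4
    return 0

for ch in "Aé€\U0001F600":
    print(f"U+{ord(ch):04X}", utf8_encode(ord(ch)).hex(" "))
```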

Behind the scenes (information for programmers)

  • As long as computers were using single byte character sets (ASCII, ISO-8859-1, Windows-1252) only a single byte of memory or storage was required
    • here is a list of C functions for doing byte i/o:
      • fgetc fputc fscanf fprintf ungetc fgets fputs scanf printf fread getc putc vfprintf fwrite gets puts vprintf getchar putchar
  • But switching to Unicode brings new problems. Do we...
    • represent Unicode characters and strings internally as multibyte characters (e.g. UTF-8)?    -or-
    • as wide characters, which are also known as binary (limited to one of: 2-bytes, 3-bytes or 4-bytes)?
  • While UTF-8 would require much less memory, wide characters and wide strings would be easier to compare and collate
  • Here is a short list of C library functions for converting characters including converting between multibyte and wide:
    • ecvt fcvt toupper gcvt mbtowc towctrans mbrtowc wctrans mbsrtowcs wcrtomb toascii wcsrtombs tolower
  • Here is a list of C functions for doing wide i/o:
    • fgetwc fputwc fwscanf fwprintf ungetwc fgetws fputws wscanf wprintf getwc putwc vfwprintf getwchar putwchar vwprintf
      caveat: These are implementation dependent. For example, wchar_t is defined as:
      • 32-bit wide on Linux holding UCS-4/UTF-32 encoded Unicode
      • 16-bit wide on Windows holding UTF-16 Unicode (originally it held UCS-2 Unicode, but UCS-2 is now officially obsolete)
  • Modern databases like MySQL and MariaDB can store data in whatever way you declare
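    • latin1 means one byte per character (MySQL's latin1 is actually windows-1252)
    • utf8 (also called utf8mb3) means up to 3 bytes per character, which covers the Basic Multilingual Plane including most CJK characters
    • utf8mb4 means up to 4 bytes per character, i.e. full UTF-8 as defined in RFC-3629 (needed for emoji and other characters above U+FFFF)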
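Python strings index by code point, so they behave much like the wide-character representation, while `str.encode()` produces the multibyte form. The hypothetical helper `storage_report` sketches the memory trade-off described above. It also checks whether a string would fit in a MySQL/MariaDB utf8 (utf8mb3) column, on the assumption that the only limit is that every character is at or below U+FFFF.

```python
def storage_report(s: str) -> dict:
    """Compare the storage cost of one string as multibyte vs wide characters."""
    return {
        "characters": len(s),                                  # code points
        "utf8_bytes": len(s.encode("utf-8")),                  # variable width, 1-4 bytes each
        "utf32_bytes": len(s.encode("utf-32-le")),             # fixed width, 4 bytes each (no BOM)
        "longest_utf8_char": max((len(c.encode("utf-8")) for c in s), default=0),
        # Assumption: utf8mb3 columns accept only characters of up to 3 UTF-8 bytes
        "fits_utf8mb3": all(ord(c) <= 0xFFFF for c in s),
    }

print(storage_report("Café"))
print(storage_report("ok \U0001F600"))
```

For mostly Western text the UTF-8 form is about a quarter of the UTF-32 size. The fixed-width form, however, lets you jump to the nth character without scanning.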
Hacking
  • Almost every text file used by System32 on Windows contains 16-bit characters (UCS-2). Most IDEs (Visual Studio, Eclipse, NetBeans) write characters as 16-bit binaries, which is why you had to edit via an IDE: common text editors were only single-byte aware. For people wanting to learn more, try playing with the free editor Notepad++, making sure to check out the Encoding menu.

    p.s. it can open ANY file including corrupt spreadsheets etc.
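One clue an editor can use to tell these 16-bit files apart from single-byte ones is a byte order mark (BOM) at the start of the file. The following sketch imitates only that first step, using Python's `codecs` BOM constants. It is not how Notepad++ works internally. The helper name `sniff_and_decode` is hypothetical, and the windows-1252 fallback for files without a BOM is an assumption.

```python
import codecs

# Checked in this order so a UTF-32-LE BOM (FF FE 00 00) is not
# mistaken for a UTF-16-LE BOM (FF FE)
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_and_decode(data: bytes, fallback: str = "windows-1252") -> tuple:
    """Guess the encoding from a BOM, strip the BOM and decode.
    Files without a BOM are decoded with the (assumed) fallback charset."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):].decode(name)
    return fallback, data.decode(fallback, errors="replace")

sample = codecs.BOM_UTF16_LE + "Hé".encode("utf-16-le")
print(sniff_and_decode(sample))
```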

Additional Technical Info

Affects all of XML as well

  • I recently received an XML document where the encoding directive on line-1 was set to "utf-8" but this document could not be parsed.
  • I inspected the data and noticed this French character: é (e-acute) in the CDATA block
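    • What? Up until this point I thought anything could be placed in the CDATA block. Wrong! CDATA only stops the parser from interpreting markup characters like < and &; every byte must still be valid in the encoding declared on line-1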
  • Obviously this French data was coming from a database which was holding windows-1252 data but some boilerplate had defaulted the XML encoding to utf-8
  • Just using a plain-text editor to modify the encoding to "windows-1252" fixed the problem. All the JavaScript parsers built into IE, Firefox, and Chrome now could read the XML document.
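The XML failure and both possible fixes can be reproduced with Python's standard `xml.etree.ElementTree`, whose expat parser rejects bytes that are invalid in the declared encoding. This is a sketch. The payload and the `parse_with_declared` helper are hypothetical stand-ins for the database output and for the boilerplate that wrote the XML declaration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the database returned windows-1252 bytes (0xE9 = é)
payload = b"<name><![CDATA[Caf\xe9]]></name>"

def parse_with_declared(encoding: str, body: bytes):
    """Prefix an XML declaration and parse. Returns the text, or None on a parse error."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?>'.encode("ascii") + body
    try:
        return ET.fromstring(doc).text
    except ET.ParseError:
        return None

mislabelled = parse_with_declared("utf-8", payload)        # None: 0xE9 is not valid UTF-8
relabelled = parse_with_declared("windows-1252", payload)   # 'Café'

# Alternative fix: keep the utf-8 declaration and transcode the data instead
transcoded = parse_with_declared("utf-8", payload.decode("windows-1252").encode("utf-8"))

print(mislabelled, relabelled, transcoded)
```

Relabelling, as described above, is the quickest fix. Transcoding to UTF-8 is the alternative when the declaration cannot be changed, provided you know the true source encoding.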

Interim Solution for OpenVMS

  • In order to edit UTF-8 encoded data files on OpenVMS, you will require:
    1. a terminal emulator able to directly deal with UTF-8
      • this is the default for Tera Term where encoding defaults to "UTF-8" (in fact, only UTF-8 is available without optional plugin modules)
        • Menu:Setup Item:Terminal Selector:Coding (receive) and Selector:Coding (transmit)
      • Reflection-v14
        • Menu:Setup Item:Terminal Tab:Emulation Selector:Host Character Set
      • your terminal driver set to eight-bit mode like so
        • $set term/eight
    2. an editor able to directly deal with UTF-8 (so changing one character position on your screen moves the file insertion pointer by 1-4 bytes)
  • Get a copy of VIM for OpenVMS from this cool site in Stockholm, Sweden
  • add these two lines to file VIM:defaults.vim
    • set encoding=utf-8
    • setglobal fileencoding=utf-8
  • refer to this reference:

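The reason the editor must be UTF-8 aware: moving the cursor one screen column steps through 1 to 4 bytes of the file. This Python sketch maps a screen column to its byte offset in the UTF-8 encoded line. The helper names are hypothetical, and the sketch assumes one screen column per code point, which does not hold for combining marks or double-width CJK characters.

```python
def char_widths(line: str) -> list:
    """UTF-8 byte count of each character in the line."""
    return [len(ch.encode("utf-8")) for ch in line]

def column_to_byte_offset(line: str, column: int) -> int:
    """Byte offset (0-based) where the character at 0-based `column` starts.
    Assumes one screen column per code point."""
    return len(line[:column].encode("utf-8"))

line = "Café € \U0001F600!"
print(char_widths(line))                 # [1, 1, 1, 2, 1, 3, 1, 4, 1]
print(column_to_byte_offset(line, 8))    # 14
```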
CSWS-2.2-1 (2015-07-28)

  • Okay so I didn't see this one coming
  • I'm in the middle of migrating our business system from an AlphaServer-DS20e to an Itanium rx2800-i2
  • About 99% of the work so far is pretty-much a no-brainer (copy/compile/link/next) but my web services are unavailable
    • did I mention we transitioned all of our green-screen apps to the web in 2013 then forced all our users to move over?
    • did I mention we now support ~ 1400 web users?
  • Most pages at our site also make extensive use of jQuery and AJAX so if the home page has problems then so will everything else.
  • Weird browser behavior
    • IE11 seemed to work properly
    • Firefox-39 rendered a page but there was no sliding menu, just a linked list
    • Chrome-44 didn't render anything at all, but further examination with Developer Tools (hit F12) showed a problem with incomplete chunked transfers
  • Rather than waste your time telling you all the things I did which did not work, let me say that CSWS-2.2-1 now requires text files to be stored in stream_lf format
  • We were running CSWS-2.2 on the Alpha which did not require stream_lf format text files so it looks like we're in a jam for a while with only two options:
    1. convert all text files to stream_lf                 -OR-
      • this includes rewriting software which generates static html-based reports
      • this includes rewriting software interfaces responsible for capturing file uploads
      • don't forget to convert css files (you will need to update the script)
      • don't forget to convert js files but you might not want to convert minified js files (oops, another wrinkle)
    2. roll back to CSWS-2.1 then reapply update patches #1 and #2
      • however, the version of OpenSSL baked into this release is vulnerable to numerous exploits

Update: 2015-07-29

  • Good news. I requested a patch from HP/HPE today via my OpenVMS support account. HP responded within an hour by placing the patch in a 24-hour drop box. I installed the patch (a replacement version of APACHE$HTTPD_SHR.EXE) and now everything works properly.
     
  • Bad news. This bug also affects the Alpha version of CSWS-2.2-1 but you can only get the patch if you have an "OpenVMS on Alpha" support contract. We dropped our Alpha support contract when we moved to Itanium (probably a mistake since we still have one production AlphaServer running but hey, we needed to do this to get approval for the Itanium business case). So I suppose we'll have to wait until HP/HPE releases a publicly available patch at their CSWS site. Meanwhile it would appear that our site, which is running a fully patched version of CSWS-2.2, got an "F" when I tested it on 2015-12-20 via this tool: https://www.ssllabs.com/ssltest/

Update: 2016-09-xx

  • Good news. We're an all-Itanium shop now. Too bad no one wants our AlphaServer machines. They were rockets 15 years ago but no longer.

Neil Rieck
Waterloo, Ontario, Canada.