OpenVMS Notes: Text File Structures

Document Scope: a VMS/OpenVMS application programmer's view of text storage

Hack #1 (non-stream simple text)

In a DCL session (VMS or OpenVMS), use EDT (edit/edt) or EVE (edit/tpu) to create a 10-line RMS-based text file that looks like the one below. Make sure it has no trailing spaces, no embedded control characters, and no blank lines (so do not hit <enter> after typing the last line; just save, then exit).
p.s. for this first demo I will use the DCL command "create" in case you do not know how to use either EDIT/EDT or EDIT/TPU (EVE)

Legend: <ur> = user response
        <sr> = system response
        <enter> = hit the enter key
        <ctrl z> = hit control Z
-----------------------------------
<sr> $
<ur> create yada.txt<enter>
     1234567890<enter>
     123456789<enter>
     12345678<enter>
     1234567<enter>
     123456<enter>
     12345<enter>
     1234<enter>
     123<enter>
     12<enter>
     1<ctrl z>
<sr> Exit
     $

Now inspect the file attributes using the DCL command: DIRECTORY/FULL
Notice that the record format contains the word "variable" but not "stream". This means that your software will view the contents of this file through the RMS (Record Management Services) library routines built into OpenVMS. C/C++ programs usually do not perform this additional processing, which can be a source of confusion on VMS/OpenVMS systems.

<sr> $
<ur> dir/full yada.txt
<sr> Directory CSMIS$USER3:[ADMCSM.NEIL]
     YADA.TXT;5                    File ID:  (320,23,0)            
     Size:            1/9          Owner:    [NEIL]
     Created:     2-JAN-2005 14:54:05.35
     Revised:     2-JAN-2005 14:54:05.40 (2)
     Expires:    <None specified>
     Backup:     <No backup recorded>
     Effective:  <None specified>
     Recording:  <None specified>
     Accessed:   <None specified>
     Attributes: <None specified>
     Modified:   <None specified>
     Linkcount:  1
     File organization:  Sequential
     Shelved state:      Online 
     Caching attribute:  Writethrough
     File attributes:    Allocation: 9, Extend: 0, Global buffer count: 0, No version limit
     Record format:      Variable length, maximum 255 bytes, longest 10 bytes See note #1 
     Record attributes:  Carriage return carriage control See note #2 
     RMS attributes:     None
     Journaling enabled: None
     File protection:    System:RWED, Owner:RWED, Group:RWED, World:RWE
     Access Cntrl List:  None
     Client attributes:  None

     Total of 1 file, 1/9 blocks.

Notes:	1. Variable means each record uses a length indicator
	2. Means RMS will append <cr> and <lf> to each record after retrieval

Inspect other file attributes by using ANALYZE/RMS

<sr> $
<ur> ana/rms  yada.txt
<sr> Check RMS File Integrity                      2-JAN-2005 14:58:39.29   Page 1
     CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5

     FILE HEADER

     File Spec: CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5
     File ID: (320,23,0)
     Owner UIC: [NEIL]
     Protection:  System: RWED, Owner: RWED, Group: RWED, World: RWE
     Creation Date:    2-JAN-2005 14:54:05.35
     Revision Date:    2-JAN-2005 14:54:05.40, Number: 2
     Expiration Date: none specified
     Backup Date:     none posted
     Contiguity Options:  none
     Performance Options: none
     Reliability Options: none
     Journaling Enabled:  none

     RMS FILE ATTRIBUTES

     File Organization: sequential
     Record Format: variable
     Record Attributes:  carriage-return 
     Maximum Record Size: 255
     Longest Record: 10
     Blocks Allocated: 9, Default Extend Size: 0
     End-of-File VBN: 1, Offset: %X'0050' 80 See note #1 
     File Monitoring: disabled
     File Length Hint (Record Count):     10 See note #2 
     File Length Hint (Data Byte Count):  55 See note #3 
     Global Buffer Count: 0

     The analysis uncovered NO errors.

     ANA/RMS YADA.TXT

Notes:	1. this file's EOF marker is at byte # 80
	2. this is the number of lines in my file
	3. this is the actual stored byte count without padding, length counts, etc.
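
The three annotated numbers can be cross-checked with a little arithmetic. This is only a sketch in Python (my choice for illustration; nothing here runs on VMS), and it assumes the on-disk layout seen in the DUMP output of this same file: a 2-byte length, then the data bytes, then one padding byte after every odd-length record.

```python
# Record lengths of the ten lines typed into yada.txt: 10, 9, ..., 1
record_lengths = list(range(10, 0, -1))

# "File Length Hint (Record Count)"
record_count = len(record_lengths)

# "File Length Hint (Data Byte Count)": the data bytes only
data_byte_count = sum(record_lengths)

# Assumed on-disk cost per record: 2-byte length + data + 1 pad byte if odd
eof_offset = sum(2 + n + (n % 2) for n in record_lengths)

print(record_count)
print(data_byte_count)
print(eof_offset, hex(eof_offset))
```

If the assumption holds, the three results should agree with the record count, data byte count, and end-of-file offset reported by ANA/RMS.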

Now use the DCL command "DUMP" to see how your data was stored in the RMS file on disk

<sr> $
<ur> dump yada.txt
<sr> Dump of file CSMIS$USER3:[ADMCSM.NEIL]YADA.TXT;5 on  2-JAN-2005 14:54:57.07
     File ID (320,23,0)   End of file block 1 / Allocated 9
     Virtual block number 1 (00000001), 512 (0200) bytes
                                                    <<<--- read this way ---|--- read this way --->>>
     36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A ..1234567890..123456789...123456 000000
     32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837 78..1234567...123456..12345...12 000020
     00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433 34..123...12..1................. 000040
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000060
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000080
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000A0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000C0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000E0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000100
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000120
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000140
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000160
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000180
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001A0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001C0
     00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001E0
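
The hex half of a DUMP line is printed "backwards" (lowest address on the right), which makes it awkward to read. Here is a minimal Python sketch that turns the hex field of one line back into bytes in file order. The function name dump_line_to_bytes is made up for this example; it is not part of any VMS tool.

```python
def dump_line_to_bytes(hex_field: str) -> bytes:
    """Convert the hex part of one DUMP line into bytes in file order.

    The line shows eight longwords with the lowest address on the right,
    and each longword is printed most-significant byte first, so the
    bytes of the whole line appear in descending address order.
    Reversing the byte sequence restores file order.
    """
    raw = bytes.fromhex(hex_field.replace(" ", ""))
    return raw[::-1]

# First line of the dump of yada.txt
line = ("36353433 32310008 00393837 36353433 "
        "32310009 30393837 36353433 3231000A")
data = dump_line_to_bytes(line)

print(data[:12])    # length word 0x000A followed by "1234567890"
print(data[12:24])  # length word 0x0009, "123456789", one pad byte
```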

Analysis #1 (for a non-stream format file)

  1. Each RMS record begins with a 16-bit length specifier (labelled "record length" in the diagram below) capable of describing a variable-length string of up to 32767 (0x7FFF) bytes. A record in this kind of file cannot be longer than that.
    note: using either DIR/FULL or ANA/RMS on this file reveals that the maximum record size is set to 255 bytes
  2. Notice that occasionally a <nul> byte is inserted into the file to word-align the next 16-bit length specifier. This happens after every odd-length record, and the padding byte is never counted in the length specifier.
  3. Notice that there are normally no embedded paper commands like <carriage return> or <line feed>. The original designers of VMS realized that the driver associated with the desired output device could insert these so-called paper commands as required. This is one reason that text files must be FTP'd into VMS using ASCII or TEXT mode: the end-of-line character used on the remote system must be stripped off for storage in this kind of file structure.
  4. 0x0000 means blank line (no data bytes on this line)
  5. 0xFFFF followed by <nul> bytes until EOF means no more RMS data
              0008                       0009                       000A <--- record length in bytes (not including padding)
  6 5 4 3  2 1        9 8 7  6 5 4 3  2 1      0 9 8 7  6 5 4 3  2 1     <--- data characters
                   00                                                    <--- padding to word-align length data
 -------- -------- -------- -------- -------- -------- -------- -------- ---------------------------------------
 36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A ..1234567890..123456789...123456 000000
 32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837 78..1234567...123456..12345...12 000020
 00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433 34..123...12..1................. 000040
                                     00                00                <--- padding to word-align length data
                                        1      2 1        3 2 1      4 3 <--- data characters
                                         0001     0002          0003     <--- record length in bytes (not including padding)
                                FFFF                                     <--- \ FFFF and null to EOF means...
 00000000 00000000 00000000 0000                                         <--- /     ...no more data
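
The rules listed in this analysis can be sketched as a small parser. This is Python written purely for illustration, not the way RMS itself is implemented: the function name parse_variable_records is hypothetical, and the layout it assumes (16-bit little-endian length, data, optional pad byte, 0xFFFF as "no more data") is simply what the dump shows.

```python
import struct

def parse_variable_records(block: bytes):
    """Split the raw bytes of a variable-length text file into records."""
    found = []
    pos = 0
    while pos + 2 <= len(block):
        (length,) = struct.unpack_from("<H", block, pos)
        if length == 0xFFFF:          # marker seen at EOF in the dump
            break
        pos += 2
        found.append(block[pos:pos + length])
        pos += length + (length & 1)  # skip the pad byte after odd lengths
    return found

# Hex fields of the first three dump lines of yada.txt
dump_lines = [
    "36353433 32310008 00393837 36353433 32310009 30393837 36353433 3231000A",
    "32310004 00353433 32310005 36353433 32310006 00373635 34333231 00073837",
    "00000000 00000000 00000000 0000FFFF 00310001 32310002 00333231 00033433",
]
# Each line lists its bytes in descending address order, so reverse it
block = b"".join(bytes.fromhex(h.replace(" ", ""))[::-1] for h in dump_lines)

lines = [rec.decode("ascii") for rec in parse_variable_records(block)]
for text in lines:
    print(text)
```

The parser should give back the ten lines typed in at the start of this hack, with no length words, padding, or end-of-line characters left in them.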

Hack #2 (non-stream file with some control characters)

Now use EDIT/EDT or EDIT/TPU (EVE) to create a second text file on VMS or OpenVMS.

Running ANA/RMS on this file shows the EOF marker at 0x10 (16), which is where the four 'F' characters (FFFF) appear in the dump below (put there by the editor, not RMS)

                                              00010000     0001     0001 <--- record length in bytes (not including padding)
                                           ++--------------------------- "A"
                                           ||            ++------------- <bel>
                                           ||            ||       ++---- <nul>
                                         00            00       00       <--- padding to word-align the length data
 00000000 00000000 00000000 0000FFFF                                     <--- means no more data
 -------- -------- -------- -------- -------- -------- -------- -------- ---------------------------------------
 00000000 00000000 00000000 0000FFFF 00000041 00010000 00070001 00000001 ............A................... 000000
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000020
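
Going the other way, the byte image in this dump can be rebuilt from a list of records. This is a hedged Python sketch: the function name encode_variable_records is invented for the example, and the five records are my reading of the dump (a <nul>, a <bel>, an empty line, the letter "A", and another empty line), not a statement of what was typed into the editor.

```python
import struct

def encode_variable_records(records):
    """Build the on-disk image assumed for a variable-length text file."""
    out = bytearray()
    for rec in records:
        out += struct.pack("<H", len(rec))  # 16-bit little-endian length
        out += rec
        if len(rec) % 2:                    # pad odd-length records
            out += b"\x00"
    eof_offset = len(out)
    out += b"\xff\xff"                      # marker seen at EOF in the dump
    return bytes(out), eof_offset

# Records as read from the dump (an assumption, see above)
records = [b"\x00", b"\x07", b"", b"A", b""]
image, eof_offset = encode_variable_records(records)

print(image.hex(" "))
print(hex(eof_offset))
```

Reading the hex field of the dump from right to left should give the same byte sequence, with the FFFF marker starting at the EOF offset.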

Analysis #2

Hack #3 (convert non-stream to stream_lf)

method #1 (simple)

! first we need a foreign command (two are provided)
$ rfmvar :== convert/fdl=nla0:
$ rfmstmlf :== convert/fdl="""record; format stream_lf"""
! now we use the foreign command to convert the file to stream_lf
$ rfmstmlf targetfile.txt

method #2 (detailed)

<sr> $
<ur> cre stream_lf.dat<enter>						! create a file
     <ctrl-Z>
<sr> $
<ur> set file stream_lf.dat /attr=(rfm:stmlf,lrl:32767,mrs:0,rat:cr)	! DCL cmd to set stream=lf with <cr> records
<sr> $
<ur> ana/rms/fdl/output=stream_lf.fdl stream_lf.dat			! create an FDL
<sr> $
<ur> convert/create/fdl=stream_lf.fdl yada.txt yada_lf.txt		! convert previous file into stream lf
<sr> $
<ur> dump yada_lf.txt							! dump file to terminal in ASCII and hex
<sr> $
<ur> ana/rms  yada_lf.txt						! analyze resultant file
<sr> Check RMS File Integrity                      3-JAN-2005 07:02:57.60   Page 1
     CSMIS$USER3:[ADMCSM.NEIL]yada_lf.TXT;5

     FILE HEADER

     File Spec: CSMIS$USER3:[ADMCSM.NEIL]yada_lf.TXT;5
     File ID: (479,34,0)
     Owner UIC: [NEIL]
     Protection:  System: RWED, Owner: RWED, Group: RWED, World: RWE
     Creation Date:    3-JAN-2005 00:20:23.92
     Revision Date:    3-JAN-2005 00:20:23.96, Number: 2
     Expiration Date: none specified
     Backup Date:     none posted
     Contiguity Options:  none
     Performance Options: none
     Reliability Options: none
     Journaling Enabled:  none

     RMS FILE ATTRIBUTES

     File Organization: sequential
     Record Format: stream-LF                Note: means each record is terminated with <lf>
     Record Attributes:  carriage-return     Note: means add a <cr> to each record after retrieval
     Maximum Record Size: 255
     Longest Record: 10
     Blocks Allocated: 9, Default Extend Size: 0
     End-of-File VBN: 1, Offset: %X'0041' 65 Note: this file's EOF marker is at byte # 65
     File Monitoring: disabled
     Global Buffer Count: 0

     The analysis uncovered NO errors.

     ANA/RMS yada_lf.TXT
$dump yada_lf.txt

     ++------------------++---------------------++---------------------- <lf>
  2 1   8  7 6 5 4  3 2 1    9 8 7 6  5 4 3 2  1   0 9  8 7 6 5  4 3 2 1 <--- data characters
 32310A38 37363534 3332310A 39383736 35343332 310A3039 38373635 34333231 1234567890.123456789.12345678.12 000000
 310A3231 0A333231 0A343332 310A3534 3332310A 36353433 32310A37 36353433 34567.123456.12345.1234.123.12.1 000020
 00000000 00000000 00000000 00000000 00000000 00000000 00000000 0000000A ................................ 000040
                                                            5 4  3 2 1   <--- ASCII data characters
                                                                      ++ <lf>
                                                                      ++ EOF is located just after this byte, at offset 0x41 (65)
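
The stream_lf layout is simple enough to reproduce in a few lines. This Python sketch (illustration only) assumes nothing beyond what the dump shows: each record is followed by a single <lf>, with no length words and no padding.

```python
# The ten records of yada.txt: "1234567890", "123456789", ..., "1"
records = ["1234567890"[:n].encode("ascii") for n in range(10, 0, -1)]

# stream_lf as seen in the dump: every record is terminated by <lf>
stream_lf = b"".join(rec + b"\n" for rec in records)

eof_offset = len(stream_lf)
print(eof_offset, hex(eof_offset))
print(stream_lf[:22])
```

The length of the result should match the end-of-file offset that ANA/RMS reports for yada_lf.txt.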

Analysis #3

Hack #4 (playing with UTF-8)

Do a "directory/full" command to see the file's attributes

<sr>	$
<ur>	dir/full  utf8-test.txt
<sr>	Directory CSMIS$USER3:[ADMCSM.NEIL]

	utf8-test.txt;1               File ID:  (5683,193,0)          
	Size:            1/16         Owner:    [NEIL]
	Created:     4-APR-2017 05:24:06.38
	Revised:     4-APR-2017 05:32:33.90 (4)
	Expires:    <None specified>
	Backup:     <No backup recorded>
	Effective:  <None specified>
	Recording:  <None specified>
	Accessed:   <None specified>
	Attributes:  4-APR-2017 05:32:33.90
	Modified:    4-APR-2017 05:24:06.38
	Linkcount:  1
	File organization:  Sequential
	Shelved state:      Online 
	Caching attribute:  Writethrough
	File attributes:    Allocation: 16, Extend: 0, Global buffer count: 0, No version limit
	Record format:      Stream_LF, maximum 0 bytes, longest 32767 bytes
	Record attributes:  Carriage return carriage control
	RMS attributes:     None
	Journaling enabled: None
	File protection:    System:RWD, Owner:RWD, Group:RWD, World:RWD
	Access Cntrl List:  None
	Client attributes:  None

	Total of 1 file, 1/16 blocks.
	$
Do an "analyze/rms" command to see the file's attributes including the EOF position
<ur>    ana/rms  utf8-test.txt
<sr>	Check RMS File Integrity                      4-APR-2017 05:53:37.24   Page 1
	CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1

	FILE HEADER

        File Spec: CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1
        File ID: (5683,193,0)
        Owner UIC: [NEIL]
        Protection:  System: RWD, Owner: RWD, Group: RWD, World: RWD
        Creation Date:    4-APR-2017 05:24:06.38
        Revision Date:    4-APR-2017 05:32:33.90, Number: 4
        Expiration Date: none specified
        Backup Date:     none posted
        Contiguity Options:  none
        Performance Options: none
        Reliability Options: none
        Journaling Enabled:  none

	RMS FILE ATTRIBUTES

        File Organization: sequential
        Record Format: stream-LF
        Record Attributes:  carriage-return 
        Maximum Record Size: 0
        Longest Record: 32767
        Blocks Allocated: 16, Default Extend Size: 0
        End-of-File VBN: 1, Offset: %X'0033'	Note: EOF is found in block-1 at position 51 (0x33)
        File Monitoring: disabled
        Global Buffer Count  pre-V8.3:          0
        Global Buffer Count post-V8.3:          0
        Global Buffer Flags post-V8.3:       none

	The analysis uncovered NO errors.


	ANA/RMS utf8-test.txt
	$
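Do a plain "analyze" command to view the file's contents record by record. With no qualifier, ANALYZE defaults to ANALYZE/OBJECT, so it treats this text file as an object module and reports errors that do not apply to text, but the hex dump of each record is still useful.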
 
<ur>	ana      utf8-test.txt
<sr>	Analyze Object File                           4-APR-2017 05:53:29.0   Page 1
	CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1
	ANALYZ I01-55

	***  Object record 1 contains invalid type code 73:	73=x49 so why complain?
          7  6  5  4  3  2  1  0          01234567
        ------------------------          --------
         73 20 73 99 80 E2 74 49|  0000  |Itâ..s s|
         20 64 65 73 6F 70 70 75|  0008  |upposed |
         80 E2 20 65 62 20 6F 74|  0010  |to be â.|
         64 20 74 61 74 89 C3 9C|  0018  |.Ã.tat d|
         E2 65 74 78 65 74 20 75|  0020  |u texteâ|
                        2E 9D 80|  0028  |...     |
	
	***  Object record 2 contains invalid type code 49:	49=x31 so why complain?
          7  6  5  4  3  2  1  0          01234567
        ------------------------          --------
                  2E 34 33 32 31|  0000  |1234.   |

	***  Object record 3 has a length of zero.

	Analyze Object File                           4-APR-2017 05:53:29.0   Page 2
	CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1
	ANALYZ I01-55

	SUMMARY STATISTICS:

	Record Type     Count   Total Bytes

	OBJ$C_DBG           0        0
	OBJ$C_TBT           0        0
	EOBJ$C_EMH          0        0
	EOBJ$C_EEOM         0        0
	EOBJ$C_EGSD         0        0
	EOBJ$C_ETIR         0        0
	EOBJ$C_EDBG         0        0
	EOBJ$C_ETBT         0        0

	Totals              0        0

	The analysis uncovered 3 errors. (not true)

	ANA utf8-test.txt
	$

Here is a short table of the special codes we expect to find in the file dump:

Character  Unicode code point  UTF-8 equivalent
’          x2019               e2 80 99
“          x201c               e2 80 9c
É          xc9                 c3 89
”          x201d               e2 80 9d
<lf>       x0a                 0a
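
The table can be double-checked by letting a UTF-8 encoder do the work. A small Python sketch follows (it needs Python 3.8 or later for bytes.hex with a separator); the variable names are arbitrary.

```python
# The special characters expected in the file, written as escapes so
# this example does not depend on the encoding of the source itself
chars = ["\u2019", "\u201c", "\u00c9", "\u201d", "\n"]

# Map each code point (in hex) to its UTF-8 byte sequence (in hex)
table = {f"x{ord(ch):04x}": ch.encode("utf-8").hex(" ") for ch in chars}

for code_point, utf8_bytes in table.items():
    print(code_point, "->", utf8_bytes)
```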

Optionally, do a "dump" command

<ur>	dump     utf8-test.txt
<sr>	Dump of file CSMIS$USER3:[ADMCSM.NEIL]utf8-test.txt;1 on  4-APR-2017 06:02:40.62
	File ID (5683,193,0)   End of file block 1 / Allocated 16

	Virtual block number 1 (00000001), 512 (0200) bytes
                                                                      <-- bytes | text --> 
	 64207461 7489C39C 80E22065 62206F74 20646573 6F707075 73207399 80E27449 Itâ..s supposed to be â..Ã.tat d 000000
	 00000000 00000000 00000000 000A0A2E 34333231 0A2E9D80 E2657478 65742075 u texteâ....1234................ 000020
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000040
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000060
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000080
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000A0
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000C0
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0000E0
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000100
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000120
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000140
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000160
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 000180
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001A0
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001C0
	 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 ................................ 0001E0
	$ 
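
To confirm that the dumped bytes really are UTF-8, they can be put back into file order and decoded. This is a Python sketch for illustration only; the EOF offset of 0x33 is taken from the ANA/RMS output for this file, and everything past it is ignored.

```python
# Hex fields of the first two dump lines of utf8-test.txt
dump_lines = [
    "64207461 7489C39C 80E22065 62206F74 20646573 6F707075 73207399 80E27449",
    "00000000 00000000 00000000 000A0A2E 34333231 0A2E9D80 E2657478 65742075",
]
# Each line lists its bytes in descending address order, so reverse it
block = b"".join(bytes.fromhex(h.replace(" ", ""))[::-1] for h in dump_lines)

eof_offset = 0x33                    # from ANA/RMS
text = block[:eof_offset].decode("utf-8")

# stream_lf: every record ends with <lf>, so drop the empty tail
records = text.split("\n")[:-1]
for number, record in enumerate(records, start=1):
    print(number, repr(record))
```

This should yield three records, the last one empty, which lines up with the three "object records" that the plain ANALYZE command listed.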

EOL Markers + FTP

Just about every operating system uses its own peculiar way to store text data.

Text is stored in files using one of these two character encodings:

Encoding  Notes
ASCII
EBCDIC    seen on older IBM mainframes and IBM minicomputers

They each employ one of these EOL (end-of-line) markers:

EOL Marker  Notes
<cr>        Seen in classic Mac OS (before OS X)
<lf>        Seen in UNIX and UNIX-like systems
<cr><lf>    Seen in CP/M, DOS, and Windows
<lf><cr>
<ctrl-Z>    Seen in some CP/M systems (where it marks end-of-file rather than end-of-line)
<ctrl-^>    Seen in older QNX systems

If you don't believe me, then consider the following problem often seen on Windows platforms. Opening a text file with NOTEPAD works only intermittently, but if you see junk on the screen then reopening the file with WORDPAD almost always works. How can this be? Well, the authors of WORDPAD put some special logic into their app to take care of foreign-formatted text files. Excel can do this too when importing data from text files containing either CSV or XML data.

Back in the day, the people who invented FTP were aware of this problem and so developed ASC (ASCII) Transfer Mode for handling text files. When an FTP connection is placed into ASC mode:

  1. the file is read using the EOL rules of the sending end
  2. the data is transmitted to the far end, each line followed by an end-of-line metadata marker
  3. the file is written using the EOL rules of the receiving end
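
The three steps above can be sketched as two tiny conversions. This is Python for illustration, not an FTP implementation: the function names to_wire and from_wire are hypothetical, and the sketch assumes the on-the-wire marker is <cr><lf>, which is what FTP's ASCII type specifies.

```python
WIRE_EOL = b"\r\n"   # end-of-line marker assumed on the wire

def to_wire(local_text: bytes, local_eol: bytes) -> bytes:
    """Steps 1 and 2: read with the sender's EOL rule, send with the wire EOL."""
    return WIRE_EOL.join(local_text.split(local_eol))

def from_wire(wire_text: bytes, local_eol: bytes) -> bytes:
    """Step 3: write using the receiver's EOL rule."""
    return local_eol.join(wire_text.split(WIRE_EOL))

# A file that uses <lf> is sent to a system that uses <cr>
sent = b"line 1\nline 2\n"
on_the_wire = to_wire(sent, b"\n")
received = from_wire(on_the_wire, b"\r")

print(on_the_wire)
print(received)
```

On VMS the receiving end has no EOL character to write for a variable-length file; each line simply becomes one RMS record.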

HPFM! (Hocus Pocus - Frickin Magic)

Unfortunately, the people who developed SFTP (the SSH File Transfer Protocol, which runs over SSH/SSH2 and is a different protocol from FTP) usually do everything as a binary transfer. This means that some files SFTP'd onto a VMS/OpenVMS system may require some post-transfer processing.

Neil Rieck
Waterloo, Ontario, Canada.