B5500 Emulation Project: 2012

Saturday, December 15, 2012

Decoding the ESPOL Loader Card

We have been making good progress on the B5500 emulator, and have recently started working on the I/O subsystem. There is quite a bit more in that area to be reported in later posts, but this one concerns some correspondence I had with a reader of this blog a few weeks ago.

The reader had stumbled across something called the "ESPOL Loader Card," which was a one-card binary loader used to bootstrap object programs that were written to punched cards by the ESPOL compiler. This bootstrap is described in a couple of Burroughs documents, notably the B5500 Operation Manual and the B5500 MCP Reference Manual, both available as scanned PDFs from bitsavers.org.

I gave the reader those references, but he came back shortly thereafter with some questions on how the loader worked. This was something I had been curious about myself, so decided to dig into it and try to completely understand that program. It took me a few evenings, but I was finally able to satisfy myself that I understood how this loader worked, and replied back to the reader with my findings.

This turned out to be a very interesting exercise, and I thought that others might also find it interesting, so I have reworked that original correspondence into this blog post. This is a highly technical subject, to say the least, so those not used to low-level bit twiddling may want to shield their eyes. Those who want to follow the workings of the individual instructions can find useful descriptions in the B5500 Reference Manual, and truely gruesome ones in the B5281 Processor Training Manual.

The loader is pretty simple, really, but some of the coding is crafty, and it turns out that you have to know something about how the ESPOL compiler generates card-load program decks. I had seen (but not tried to understand) the code that does that generation while working on porting ESPOL to the ESPOLXEM cross-compiler, so that prior experience turned out to be highly advantageous for this exercise.

Before we begin, there are a few significant things to note about the B5500 architecture when looking at a program like the ESPOL Loader:

The B5500 could load programs from either punched cards or drum (and later, the Head-per-Track disk). The mode of loading was determined by the Card Load Select switch on the operator's console. When depressed, this switch caused the system to load from cards; in its normal setting, the system loaded from drum or disk. The choice of drum or disk was hardwired in the I/O Units (channels).
When the Load button was pressed on the console, most system registers were cleared to zero, particularly A, B, F, L, R, and S in the processor. The exception was C, the instruction word address, which was set to @20. I'll use the ESPOL notation of "@" to indicate octal values. The other major registers are:

A -- the first (top) top-of-stack register.
B -- the second top-of-stack register.
S -- the top-of-stack memory address register, pointing to the word in the stack below the one in the B register. In Character Mode, the S register was repurposed as the destination address register, with the M register being used as the source address register.
F -- the stack frame register, which points to the base of the current procedure stack frame (activation record) in the stack.
R -- the PRT (Program Reference Table) address register. This pointed to the global environment for a program. In Algol, that would be the outer block; in COBOL, the Working-Storage Section. The R register had only nine bits, with the low-order six bits of the 15-bit PRT address always taken as zero. In Character Mode, R was repurposed as the TALLY register.
L -- the instruction syllable register. This was a two-bit register that identified the next syllable within a word to be executed in the code stream. That word was held in the P register. Words were loaded into the P register based on the address in the C register.

Each of the four possible I/O Units had separate I/O Finish interrupt vector addresses (@27-@32) and addresses to store a result descriptor word (@14-@17).
Subroutine calls worked as follows:

The program executed a Mark Stack (MKS) instruction which saved a few processor registers (notably R and F) in a Mark Stack Control Word (MSCW) that was pushed into the stack. At completion, F pointed to the MSCW.
Any parameters for the subroutine were typically evaluated and pushed into the stack at this point.
The program then executed an Operand Call (OPDC) or Descriptor Call (DESC) that referenced a Program Descriptor word. The PD principally indicated the address of the subroutine entry point, the code segment's presence in memory, and the mode: Character or Word. The call constructed a Return Control Word (RCW), containing the necessary state not already recorded in the MSCW, including the return address and current F register setting (which was pointing to the MSCW) and pushed that into the stack. At completion, F now pointed to the RCW, and the syllable branched to the entry address.
Parameters were addressed as negative offsets from F, accessing words in the stack below the RCW.
Upon exit, the processor restored the register state from the RCW, including the return address and F register setting. From the MSCW now pointed to by the restored value of F, the exit syllable moved F to S (the top of stack address register), restored the rest of the state from the MSCW and branched to the return address. By restoring the value of S, the stack used by the subroutine was automatically cut back to its original position.

The B5500 had a typical set of bitmask instructions (AND, OR), but instead of XOR had EQV, the complement of XOR. It did not, however, have shift or rotate instructions. Instead it had field insert operators, which in Word Mode were closely related to more generalized features in Character Mode. The idea was that the A and B registers were "dialed" to their respective starting bit numbers and then a transfer instruction would move a specified number of bits from A to B, leaving the rest of the bits in B unmolested and setting A to empty (i.e., deleting it from the stack). In Algol and ESPOL, these were used to implement "partial word" constructs, such as X:=Y.[3:17] and X:=Y&Z[24:42:6]. Dial A (DIA) and Dial B (DIB) set the character (G/K) and bit (H/V) number registers directly from the high order six bits of their opcode, so DIA @43 indicated character 4 bit 3, or word-bit number 27 (4*6+3). The count field in the Transfer Bits (TRB) instruction was just the binary number of bits.
Bits and characters were numbered starting with zero at the high-order position in a word as far as the hardware operators, Algol, and ESPOL were concerned. Thus, the bits were numbered 0-47 and the characters 0-7 moving from left to right in a word, with everything being big-endian. The circuit designers, however, decided to number the bits in their designs and documentation with 48 at the high end and 1 at the low end. A lot of the hardware documentation (particularly the B5500 Handbook and B5281 Processor Training Manual cited above) uses that latter convention, and sometimes it's not too clear which convention is in use, so watch out for that. (As an aside, in the B6500/6700, the two sides got together and agreed on a common, but still different convention, with the high-order bit as 47 and the low-order bit as 0. So after a lifetime working on that latter architecture, I'm now totally confused when working with the B5500).
While an instruction is executing, the C and L registers point to the next instruction in memory. Most branch instructions operate using a relative number of syllables or words from the current location, so those branch offsets are relative to the address of the syllable after the branch instruction.
You can think of the B register as the accumulator. Generally, A operated on B, leaving the result in B, although there are a few exceptions to this (such as INX). Typically A gets set to empty (deleted from the stack) at the end of dyadic operations. The hardware automatically adjusts the stack pointer (S register) and moves data between memory and the A and B registers as necessary.
Truth for conditional tests is determined by the low-order bit in a word, however you number it, 0=false and 1=true. The rest of the bits in the word are ignored. The bitmask operators worked on the low-order 47 bits of the word, leaving the high-order (flag) bit unchanged in the result (B register). The conditional branch operators branch on false, which is what you want in order to implement an IF statement -- a false condition should branch around the THEN part.
The internal character codes for the decimal digits were the same as their binary values. Thus, "0"=@00, "1"=@01, "2"=@02, ... "7"=@07, "8"=@10, and "9"=@11.

With that introduction, the following table presents the ESPOL Loader, syllable by syllable. Code addresses are referred to as ww:s, where ww is the absolute word address, and s is the syllable number (0-3) within the word. The hardware reads one binary card (80 columns = 960 bits = 20 words) starting at location @20 in memory. Since the Load button initializes the C register to @20 and the L register to 0, that is where execution begins after the card is successfully read. The Loader runs entirely in Control State.

Addr	Syl	Mnemonic	Description
20:0	0104	LITC @21	Entry point for the hardware load -- Literal Call: push a literal @21 into the stack (A register) as the absolute address of the card read descriptor
20:1	4411	IIO	Initiate the I/O by storing the A register at address @10, signalling Central Control, and setting A to empty. The I/O Unit picks up the descriptor from that stored address and initiates the I/O asynchronously
20:2	0020	LITC 4	Push the number of syllables to branch
20:3	4231	BFW	Branch forward 4 syllables to 22:0
21		(data)	Card read I/O descriptor: card reader #1 (CRA), 10 words (80 characters), alpha mode, address @44. Note that while the hardware load reads one binary card, the ESPOL Loader reads cards in alpha mode, i.e., the way they would normally be encoded by a keypunch machine.
22:0	4455	DIA @44	Dial A to bit 28 (numbered 20 in the Handbook word diagrams)
22:1	0211	ITI	Interrogate interrupts and branch to the vector address for the highest-priority one outstanding, if any. If no interrupts are outstanding, fall through to the next syllable.
22:2	0020	LITC 4	Push branch offset
22:3	4131	BBW	Branch back to 22:0 (loop until interrupt occurs)
23		(data)	Character mode program descriptor for the subroutine (starting in the next syllable) that will be called from 43:1, present in memory, character mode, F=0, C=@24
24:0	0453	RSA	Entry point of character mode subroutine: recall source address at F-4 (four words below the Return Control Word, or RCW, and in this case, three words below the Mark Stack Control Word, or MSCW) into the M and G registers
24:1	0304	RDA	Recall destination address at F-3 (two words below the MSCW) into S and K
24:2	0243	CRF	Call the repeat field (transfer count) at F-2 (one word below the MSCW)
24:3	0005	TRW	Transfer the designated number of words from source to destination addresses
25:0	0000	EXC	Exit character mode, in this case returning to 43:2
25:1	0065	TRB 0	Transfer zero bits from A to B (effectively a no-op, except that it sets A empty) [I don't think this and the next two instructions are normally executed, but @24 is the interrupt vector location for Keyboard Request from the SPO, and @23 is the location for I/O Busy, so the branch at 25:3 is probably just a trap for spurious interrupts. Undefined opcodes are no-ops on the B5500, so it's likely that a vector to one of the words above would eventually fall through to the next syllable at 25:2]
25:2	0100	LITC @20	Push branch offset
25:3	4131	BBW	Branch back to 22:0 to continue looping for interrupts
26:0	0110	LITC @22	Push branch offset [Don't know of an interrupt that vectors here, but this is also probably a trap for an unexpected vector]
26:1	4131	BBW	Branch to 22:0 to continue looping for interrupts
26:2	0055	DIA 0	(traditionally used as a no-op)
26:3	0055	DIA 0	another no-op
27:0	0000	LITC 0	Interrupt vector location for I/O Unit #1 finished: push a zero
27:1	0062	OPDC @14	Operand call: Push a copy of the result descriptor (RD) for I/O Unit #1 at R+@14 = @14 absolute. The R register is still zero after being initialized by the Load button
27:2	0050	LITC @12	Push branch offset
27:3	4231	BFW	Branch to 32:2 to handle interrupt
30:0	0000	LITC 0	Interrupt vector location for I/O Unit #2 finished: push a zero
30:1	0066	OPDC @15	Operand call: Push a copy of the result descriptor (RD) for I/O Unit #2 at R+@15 = @15 absolute
30:2	0030	LITC 6	Push branch offset
30:3	4231	BFW	Branch to 32:2 to handle interrupt
31:0	0000	LITC 0	Interrupt vector location for I/O Unit #3 finished: push a zero
31:1	0072	OPDC @16	Operand call: Push a copy of the result descriptor (RD) for I/O Unit #3 at R+@16 = @16 absolute
31:2	0010	LITC 2	Push branch offset
31:3	4231	BFW	Branch to 32:2 to handle interrupt
32:0	0000	LITC 0	Interrupt vector location for I/O Unit #4 finished: push a zero
32:1	0076	OPDC @17	Operand call: Push a copy of the result descriptor (RD) for I/O Unit #4 at R+@17 = @17 absolute into the stack
32:2	7561	DIB @75	Common point for handling an I/O finish interrupt -- Dial B to bit 47 (the low-order bit)
32:3	0165	TRB 1	Transfer bit 28 from the result descriptor in A (see the DIA at 22:0) to bit 47 in B (which currently holds the zero pushed at each of the interrupt vector entry points) and mark A empty to delete the RD from the stack.
33:0	0010	LITC 2	Push branch offset
33:1	0231	BFC	Branch Forward Conditional around the endless loop below to 34:0 if the low-order bit in B is 0 (false), i.e., if bit 28 (20 in the Handbook) in the RD was false, indicating no read-check error occurred on the card read. Since we are reading in alpha mode, false means there were no invalid punches on the card that couldn't be translated to internal character codes
33:2	0010	LITC 2	Push branch offset
33:3	4131	BBW	Branch back to 33:2 -- an endless loop. The reason for the loop is that there was no way to halt the processor in control state. We come to 33:2 if there was a read-check error, so in that case the loader just gives up and spins here until someone manually halts the system.
34:0	0004	LITC 1	To here if a successful card read -- push a 1 as an index for the DESC next. This is the beginning of code that builds parameters in the stack for the character-mode procedure that will be called at 43:1.
34:1	0107	DESC @21	Descriptor call: push a copy of the word at R+@21 (@21 absolute) into the stack. Since that word looks like a present data descriptor with a non-zero length field, things get really interesting. The descriptor must be indexed by the value just below it in the stack (the 1 just pushed there by the LITC). The DESC syllable does that automatically First, the index is checked against the length field, but 0 <= 1 < length, so we're okay; otherwise we would get an Invalid Index interrupt set (if we were in Normal State, that is). Next, the base address in [33:15] of the descriptor in B is indexed by adding (modulo 15 bits) the 1 in A to it. The length field [8:10] in the word is set to zero, indicating this copy descriptor points to a specific word address instead of an array of words. This is effectively an absolute address, and it's left in the stack as the operation's result. Note that it points to the second word in the 10-word card read buffer, so I'll refer to it as BUF[1]. Note: This word in the stack becomes the parameter for the source address at F-4 used by the character-mode procedure called at 43:1.
34:2	2025	DUP	Duplicate the indexed descriptor in the stack.
34:3	0044	LITC @11	Push @11 (9) into the stack as an index for the OPDC, next
35:0	0106	OPDC @21	Operand call: push another copy of the word at R+@21 into the stack. Since it still looks like a present data descriptor with a non-zero length field, OPDC indexes it by the 9 just below it. Instead of leaving the address in the stack, though, OPDC fetches the word at the indexed address (@44+@11 = @55, the last word in the 10-word buffer) and replaces the descriptor with a copy of that word's contents (I'll refer to this value as BUF[9]). Thus OPDC works somewhat like "load" on other machines and DESC works somewhat like "load address".
35:1	2025	DUP	Duplicate the value of BUF[9]
35:2	3355	DIA @33	Dial A to bit 21
35:3	4061	DIB @40	Dial B to bit 24
36:0	2565	TRB @25	Transfer 21 bits from A (the dup of BUF[9]) to B (the original copy of BUF[9]) and delete the dup.
36:1	2025	DUP	Duplicate the word just updated
36:2	2265	TRB @22	Transfer 18 bits from the dup word to the updated original word and delete the dup. What these next DUP/TRB sequences do is convert 6-bit character codes to 3-bit octal digits to form an absolute memory address in the low-order end of the word. See below for details.
36:3	2025	DUP	Duplicate the newly updated word again
37:0	1765	TRB @17	Transfer 15 bits from the dup word to the updated original word and delete the dup.
37:1	2025	DUP	Duplicate the newly updated word again
37:2	1465	TRB @14	Transfer 12 bits from the dup word to the updated original word and delete the dup. This is the last of the octal-to-binary conversion
37:3	5355	DIA @53	Dial A to bit 33 (start of the address field in the updated copy of BUF[9])
40:0	5361	DIB @53	Dial B to bit 33 (start of the address field in the dup of the indexed descriptor pointing to BUF[1])
40:1	1765	TRB @17	Transfer the 15-bit address field from the updated copy of BUF[9] to the dup of the indexed descriptor referencing BUF[1] and delete the updated copy of BUF[9]. The resulting descriptor in the top of stack is now pointing to a memory address that was constructed from fields in BUF[9]. Note: This word in the stack becomes the parameter for the destination address at F-3 used by the character-mode procedure.
40:2	0000	LITC 0	Push a zero into the stack to begin constructing the third parameter, the transfer count.
40:3	0044	LITC @11	Push another 9 into the stack to be used as an index
41:0	0106	OPDC @21	Index the I/O descriptor again and load a copy of BUF[9] into the stack
41:1	2025	DUP	Duplicate the copy of BUF[9]
41:2	1555	DIA @15	Dial A to bit 11
41:3	2261	DIB @22	Dial B to bit 14
42:0	0165	TRB 1	Transfer one bit, i.e., B:=B&A[14:11:1] (or equivalently, B.[14:1]:=A.[11:1]), and delete the dup copy of BUF[9]. This is another character-to-octal conversion. There is a two-character count of words on the card in the second and third characters of BUF[9]. The value can't be more than eight, since that's the maximum number of words that fit on a card (see the discussion on ESPOL below), so only the low-order bit of the high-order character can be significant (8=@10).
42:1	2255	DIA @22	Dial A to bit 14
42:2	7261	DIB @72	Dial B to bit 44
42:3	0465	TRB 4	Transfer 4 bits from bit 14 in A (from the last TRB result) to the low-order 4 bits of B (the zero pushed at 40:2). Note: This word in the stack becomes the the parameter for the transfer count at F-2 used by the character-mode procedure.
43:0	0441	MKS	Construct and push a Mark Stack Control Word (MSCW) in preparation for calling a procedure. On entry to the procedure, this word will be at F-1.
43:1	0116	OPDC @23	Construct and push a Return Control Word (RCW) with a return address of 43:2 and call the Character Mode procedure using the program descriptor at R+@23 (@23 absolute). On entry to the procedure, F points to the RCW. Note that the behavior of OPDC is based primarily on the type of word it finds at the relative address, not anything in the opcode it self -- a very B5500 thing to do. Also notice that there is no code to save or restore register state on entering and exiting the subroutine -- all of that is completely automatic. That procedure (entry point at 24:0) transfers a number of words from a source address to a destination address. The count of words is in the stack just below the MSCW, the destination address below that, and the source address below that. These will be addressed using negative offsets from F, which points to the RCW pushed by the OPDC procedure-call syllable. Typically parameters are pushed into the stack between the MSCW and RCW, but character-mode routines are unusual in that they sometimes address the stack directly below their stack frame.
43:2	0500	LITC @120	Push the branch offset
43:3	4131	BBW	Branch to 20:0 to read another card

If you were watching the stack closely, you would have noticed that the parameters for the character-mode procedure are below the MSCW, so they don't get cut back when that procedure exits, as normally happens with parameters that are pushed between the MSCW and RCW. So does the stack just grow? That's problematic, as it starts at address 0, so we'd process only about five cards before the stack ran into our loader routine at @20. This had me puzzled until I recalled that servicing an interrupt (which is what the ITI syllable does, even in control state) unconditionally sets S to @100. Thus, every time we read a card, the top of stack gets reset to address @100.

The reason for the weird sequence of DUPs and TRBs starting at 35:1 has to do with the format of a card load deck. I noticed the code to create this format when I was working on ESPOLXEM, but didn't pay much attention to it. When puzzling over what was going on with BUF[9], that piece of the compiler popped into my mind, and it provided a major clue.

The hardware load function reads this loader card in binary mode, which can hold up to 20 words. The deck produced by ESPOL is in alpha mode, however, and contains a maximum of eight words per card. The first word (8 characters) of the card are zero and are ignored by the loader (I suspect these could have been used to hold sequence numbers or something similar). The payload is in the next eight words. Word 9 (zero relative) holds a control word, as the relevant snippit from ESPOL compiler below shows, building the value of ELBAT[9]:

09432000  BEGIN ELBAT[0]~0; I~16;
09433000        DO BEGIN MOVE(8,SAVINFO[I.LINKR,I.LINKC],ELBAT[1]);
09434000            ELBAT[9]~B2D(I+96)&1[11:47:1]&(I+96)[23:35:1];
09435000                 WRITE(DECK,10,ELBAT[*]);
09436000           END UNTIL I~I+8}SAVNDX;
09437000        FILL ELBAT[*] WITH 0,
09438000             OCT7500000000000012,
09439000             OCT0004535530611765,
09440000             OCT7006000404210435,
09441000             OCT7700000000000015,
09442000             OCT0253010477527705,
09443000             OCT0051000004410046,
09444000             OCT0441070001000062,
09445000             OCT0040413100000000,
09446000             OCT0001000000000101;
09447000        WRITE(DECK,10,ELBAT[*]);
09447010       ELBAT[0] ~0&REAL(DECKTOG)[1:19:17];
09447020       FOR I ~ 0 STEP 1 UNTIL Q DO
09447030            BEGIN K ~ STACKHEAD[I].[23:15];
09447040                 M ~ STACKHEAD[I].[38:10];
09447050                 FOR J ~ 0 STEP 8 UNTIL M DO BEGIN
09447060                      MOVE(8,INFO[(J+K).LINKR,(J+K).LINKC],
09447070                            ELBAT [1]);
09447080                      ELBAT[9] ~ B2D(J)&"310"[1:31:17];
09447090                      WRITE(DECK,10,ELBAT[*]) END;
09447100            END;
09448000  END END END PROGRAM;

This occurs at the very end of code generation for the program. The DO loop in the first five lines writes out the code. Word 9 looks like this:

Bits [24:24] (the low-order 24) are the address+96 where this data should be loaded, as four octal characters. The six-bit character codes for the numeric digits are the same as their binary codes (e.g., "0" = @00, "4" = @04, etc.), so to convert from character codes to binary, all we need to do is remove the high-order 0 digit in each character and pack the low-order digits together.
Bit [11:1] is set to 1. This appears to be setting the number of words on the card to 8 in the second and third characters of the word, but expressed in octal like the addresses above. "10" = @0100, so just one bit needs to be set.
Bit [23:1] contains the value of (address+96).[35:1]. This appears to be the 13th bit of the biased address, so the full address (expressed in characters as octal) is probably in [18:30] (i.e., the low-order five characters of the word).

The address is probably biased by 96 (@140) to allow room for the control-state stack that starts at @100.

One of the smallest ESPOL programs we have is KERNEL, a bootstrap for the MCP that was stored on disk. I've enclosed at the end an octal dump of the DECK output from a compile of that program using ESPOLXEM, so you can see what the data looks like. You can compare this data to the code listing of the program at bitsavers.org.

What the TRBs starting at 36:0 do is compress the leading zero octades out of the character codes of the address to form a binary number. For example, if the address is @03160, this would be represented on the card as "03160" or @0003010600, and the progression of the DUP/TRB sequences would produce the following successive results (only the low-order 30 bits of the word are shown):

0003010600	Initially, as read from word [9] of the card: "03160" in alpha
0000301060	After TRB @25 at 36:0
0000030160	After TRB @22 at 36:2
0000003160	After TRB @17 at 37:0
0000003160	After TRB @14 at 37:2 (the high order character is 0, so there's no apparent change).

The low-order 15 bits of this result are eventually used as the destination address by the character-mode move procedure. That relocates the data from BUF[1] through BUF[8] at memory locations @45-@54 to the address specified by BUF[9] for the number of words also specified by BUF[9].

It helps to draw little pictures of the stack as you are studying code like this. After 40 years of working with stack machines, I still need to do that for all but the simplest cases, and I certainly did for this one.

The one remaining piece of this puzzle is, how does this program terminate? Thus far, we've seen that a read-check exception will stop it cold, but hopefully that doesn't happen. All other I/O exceptions appear to be ignored, so for example, if the card reader goes not ready (jams or is out of cards), the RD will probably contain a "not ready" exception. The loader ignores that and just keeps on truckin', moving the data from the card read buffer (which hasn't changed) to the appropriate destination address memory (which also hasn't changed), until finally the reader starts working again and the loader sees new data. It's crude, but since the processor does not have any way to idle, it's effective.

I think the answer to termination lies in what the ESPOL compiler outputs after the generated code for the program -- an extra card, as shown starting at line 09437000 in ESPOLXEM. The FILL statement just moves literal values into an array, and the WRITE statement outputs it to the card deck. Those octal constants are interesting -- they are the exactly code for the ESPOL Transfer Card, another one-card loader documented in the MCP Operation and MCP Reference manuals cited above. Note in particular the load address in the last word of the array (and the last word of the octal dump below) -- 8 words at address @11. That means the data on that Transfer Card will overlay locations @11-@20 -- that last being the first word of the code for the ESPOL loader, which will now contain a branch to 16:2.

Thus, after the ESPOL Transfer Card is read from the deck and loaded at @11, the ESPOL Loader blithely branches back to 20:0 to initiate the I/O for the next card, only to find the IIO syllable isn't there anymore, having been replaced by a branch into the Transfer Card's code. Very sneaky.

The Transfer Card program is equally sneaky. It calls two procedures, a word-mode one that constructs an MSCW and stores it at address 1 absolute, and also arranges for F to point to address 1 when the routine exits (this is required for proper stack linkage in the just-loaded program). The second is a character-mode procedure that slides 3969 (63*63) words from address @160 to @20 -- a difference of @140 (96) words, matching the bias the ESPOL compiler added to the addresses in BUF[9]. That second procedure is called from address 17:3, so it returns to address 20:0 -- which was just overlaid by the first word of the 3969-word sliding move. Effectively that second procedure exits into the entry point of the program that had just been loaded and relocated downwards in memory.

The technique of having initialization code exit into the entry point of whatever it's initialized is a common one in the B5500, and continues to be used in the Unisys MCP today.

The remainder of the code emitted by the ESPOL compiler after writing out the ESPOL Transfer Card appears to be some sort of stack initialization code, but it apparently isn't effective in the case of KERNEL, so I can't be sure just what is being output there, or (since it's after the Transfer Card) how it would get loaded.

Finally, here is the octal dump of the DECK output for the KERNEL program. Each line of text represents one 10-word card image:

0000000000000000 0740623100000000 0000000000000000 0211101607302231 0004613100000000 0000000000000000 0014613100000000 0020613100000000 0060202102350000 0001000000010600 
0000000000000000 0064202102350000 0070202102350000 0074202102350000 0000000000000000 0050613100000000 0000000000000000 0060613100000000 0064613100000000 0001000000010700 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000020000 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000020100 
0000000000000000 0000000000000000 0534623100000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000020200 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000020300 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000020400 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000020500 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000020600 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000020700 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000030000 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000030100 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000030200 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000030300 
0000000000000000 5000000000000000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 7660000000000022 7560000000000371 7560000000000400 0001000000030400 
0000000000000000 0400001421410230 1003014102301003 0141202111120055 0101102510211003 0141000010250421 0211023010030141 2021111600550225 0024223102301003 0001000000030500 
0000000000000000 0141000010250421 0044613100550055 0000000000010000 0000000000070000 0004101441211020 2021062001411003 0141777462551261 1265101004210204 0001000000030600 
0000000000000000 1003014110102021 5355304510250421 0204100301412021 1003014102001025 0421021010030141 0064102504210441 1642005502041003 0141202154251032 0001000000030700 
0000000000000000 0000442504740231 0224100301410004 1012045510451025 0421020410030141 2021100301411646 0055022410030141 2021745550610265 1025042104411652 0001000000040000 
0000000000000000 0055020410030141 2021542510320051 0204100301412021 1003014116560055 0224100301412021 7455506102651025 0421044116520055 0204100301412021 0001000000040100 
0000000000000000 5425103200510210 1003014102101003 0141202102241003 0141202100240401 0101102504210204 1003014120211003 0141000010250421 0220100301410441 0001000000040200 
0000000000000000 0204100301412021 1662005501411032 1025042102201003 0141202100004425 0024213100041003 0141001010121025 0421021410030141 0004101210250421 0001000000040300 
0000000000000000 0210100301412021 1012100304211002 0004100301412021 4125004022310230 1003014116660055 1025042102341003 0141167200551025 0421024010030141 0001000000040400 
0000000000000000 1676005510250421 0441170200551032 0051000000140131 0200100301410444 1025042104411706 0055023010030141 2021021510360200 1003014100501025 0001000000040500 
0000000000000000 0421044117120055 0230100301412021 0215103602001003 0141000010250421 0441171600550230 1003014120210215 1036023010030141 0230100301412021 0001000000040600 
0000000000000000 0200010110250421 0064100301411722 0055102504210070 1003014117260055 1025042100741003 0141173200551025 0421100200100301 1003042110020224 0001000000040700 
0000000000000000 1003014120217455 2461026502141003 0141202174552261 0265100304210004 0014214100000010 2141006410030141 4231000401042231 0024413100550055 0001000000050000 
0000000000000000 0140004000000000 0000000000004060 0140000100000000 0000000000006060 0140000040100000 3145652143312460 2124245125626260 2646516044234737 0001000000050100 
0000000000000000 0740000000000046 0140000047704235 0140000047700473 0140000041200017 0441023201004441 0253010453527705 3705005101002411 0000102642314006 0001000000050200 
0000000000000000 0235000000000000 0000700744110220 1003014104411022 1025042102201003 0141202141552345 4004042140060024 0415000044250140 0131400602350000 0001000000050300 
0000000000000000 0441100202001003 0141202101017006 5355304544410222 0104106601007006 1003014120210555 1045003402317006 0060715503610565 7004042102201003 0001000000050400 
0000000000000000 0141044170061032 1025042102201003 0141202100004425 0100013104350000 0000000000000000 0000000000000000 0000000000000000 0000000000000000 0001000000050500 
0000000000000000 7500000000000012 0004535530611765 7006000404210435 7700000000000015 0253010477527705 0051000004410046 0441070001000062 0040413100000000 0001000000000101

Wednesday, October 31, 2012

Compiler and Testbed Progress

We have made a couple of small tokens of progress over the past several weeks. This post is a brief update to report on those.

ALGOLXEM Compiles Algol and ESPOL

As previously reported, we had succeeded in porting the B5500 Mark XVI Algol and ESPOL compilers to the modern Unisys MCP (E-mode) environment, where they function as cross-compilers to generate B5500 object code files. We hope ultimately to feed these object files to our emulator. We call these cross-compilers ALGOLXEM and ESPOLXEM, respectively, with "XEM" standing for "cross-compiled on E-mode."

These are not quite full ports of the B5500 compilers, but are intended only to be sufficient to compile enough software that we can test and bootstrap the emulator into running the MCP operating system. In particular, most of the exotic source handling (patch merging, include files, CAST tapes, etc.) was difficult to port, and was simply removed from the compiler.

We had compiled a number of relatively small programs with these compilers, and as far as we have been able to tell, they had been generating the correct output. This hardly constitutes exhaustive testing, however, and I had been somewhat dreading the prospect of trying to compile the Algol compiler with its XEM counterpart. Frankly, I was expecting a train wreck in the form of compiler crashes, conversion errors, and source transcription errors that had gotten past the initial tests, and I wasn't looking forward to it.

Late last month, I found myself in possession of an idle afternoon. On a whim, I decided just to try compiling the B5500 Algol compiler and see what happened. Imagine my surprise when not only did the XEM compiler finish normally, but reported only seven syntax errors in the 12,700-line source file.

Three of those errors turned out to be straightforward typos in the original B5500 compiler source transcription. The rest were due to two bugs in ALGOLXEM, which in turn were due to typos in the original B5500 source. Finding and fixing those took all of an hour, at which point the XEM compiler compiled the B5500 compiler successfully on the second attempt.

This was too easy, so I decided to try the same thing with the B5500 ESPOL compiler. The first compilation finished with eight syntax errors, which all turned out to be typos in the ESPOL compiler source. After fixing those, ESPOL compiled on the second attempt as well.

No one is more amazed than me at the ease with which we have reached this step in the project. Of course, successfully compiling those two large programs hardly indicates that ALGOLXEM is working entirely correctly, but it is a very encouraging sign, indeed.

Comparative Timings

One interesting byproduct of these tests is that we now have some significant run-time data to compare against the B5500 times. I did these compilations on an MCP SDK (development) system called the LX100. For the past 15 years, Unisys has been very successful in emulating the E-mode architecture on Intel processors running under Windows, using that approach to replace proprietary hardware in the lower end of their product line. The SDK is composed of the standard E-mode emulator and a special development-only software license. In my case, it is running on a several years-old Dell workstation under Windows XP.

The emulated systems are typically throttled to a specified performance level. Unisys has long had a system performance metric termed RPM (Relative Performance Measure). It is determined from a mix of benchmarks for various types of workloads and normalized so that a specific model of the A6 (a late 1980s design) had an RPM value of 100. My SDK system has an RPM of 300. By way of comparison, a single-processor B6500/6700 system had an RPM of 50-60, and current high-end MCP systems have RPM ratings in excess of 20,000 per processor. To make a more general performance comparison, there are slightly less than 25 RPM per Gartner MIP.

Somewhere in the early 1970s I read a report stating that the B6500 designers were expecting an 8-10 times performance improvement over the B5500, but that they seldom got as high as a factor of eight except on heavily numeric benchmarks. That suggests that the B5500 probably had an RPM rating of roughly 7 to 10.

There are two sets of timings available from B5500 compilations. One is the time printed by the compiler at the end of its listing. This is the elapsed time, from the Algol TIME(1) intrinsic, excluding any initialization and termination for the compilation—specifically, it's the amount of time the compiler spent executing its PROGRAM procedure.

The other timings are those reported by the MCP. We have this for the Algol compilation on the B5500 from 1977, but not for the ESPOL compilation. The following table summarizes the data for the B5500 and XEM compilations (ET=elapsed time, CPU=processor time, I/O=channel usage time). In each case, these compilations generated a full listing and a cross-reference. All times are in seconds.

Compilation	Lines	Compiler ET	MCP ET	MCP CPU	MCP I/O
B5500 Algol	12,712	532	1045	736	369
XEM Algol	12,706	36	48	40	14
B5500 ESPOL	6,918	816	n/a	n/a	n/a
XEM ESPOL	6,916	21	28	24	8

The small differences in the number of lines of code reported by the compilers are probably due to the differing numbers of "$-card" pragmas which are not visible in the listings.

To be fair, the XEM runs were made on my otherwise idle development system. In addition to a 300 RPM rating, this system has 32 megawords of E-mode memory, so it does no virtual memory overlay. Disk I/Os are also heavily cached by the underlying Windows environment.

By contrast, the B5500 probably had the maximum 32 kilowords of real memory. We cannot know what other load was on the system while the compilation was taking place, although there probably was some. The 30% difference between CPU and elapsed time for Algol suggests that interference from other tasks was not all that great; the significantly higher elapsed time for the ESPOL compilation suggests that it was running in a busier mix of jobs. The Head-per-Track disk used by the B5500 had an average access time of about 40 milliseconds and a transfer rate of perhaps 200 KB per second—reasonable for its time but very slow by today's standards, let alone the lack of any memory caching.

From the single data point available by comparing the two compilations of the Algol compiler, it appears that the B5500 may have had an RPM rating of about 16 for this type of workload—about twice what I had been expecting. That would have made it about one-fourth the speed of the B6500 in this case, which considering that the B6500 seldom showed its hoped-for 8-10X improvement over the B5500, may have been typical performance.

Tuesday, September 4, 2012

End of the Summer Update

It has been a few months since the last post to this blog, but we're still here, working away on the B5500 emulator project, and have been quite busy. This post will summarize where we are on the project, what we've been doing, and what we see as the next steps. By way of a brief outline, here is what I'll cover:

Transcription of source code from scanned listings continues.
Work on the B5500 emulator itself has begun.
ESPOLXEM lives!
We are not alone.

Transcription of Source Code

As has been covered in earlier posts, we began this project with no machine-readable source or object code. Our only resource for B5500 system software has been scanned listings on bitsavers.org. Nigel Williams transcribed the Mark XVI Algol compiler source by hand, which has now been proofread, and a clone of it migrated to the modern Unisys MCP environment.

Over a year ago, Nigel discovered that Hans Pufal, of Angouleme, France, had similarly transcribed the ESPOL compiler source. I don't know how Nigel finds these things while sitting in Hobart, Tasmania, but he's a wizard at it. Hans generously donated his transcription to the project, and we have already put it to good use, as described in more detail below in the section on ESPOLXEM.

In late April, believing that the emulator would be fairly useless without a full MCP to operate it, I started transcribing the Mark XVI.0.187 DCMCP from 1977. "DC" refers to the "Datacom" MCP, as opposed to the "TS", or Timesharing MCP. The DCMCP provided basic batch and transactional terminal support, along with RJE. The TSMCP provided a time-slicing memory management scheme and the CANDE (Command and Edit) user interface. It was used by a number of schools and by a couple of commercial timesharing services in the late '60s and early '70s.

I found the prospect of having to key the whole MCP source from a listing to be daunting, but almost everything about this project is daunting, so I simply set those feelings aside and started pecking way. As Nigel found with the Algol source, this sort of transcription is best done in small pieces under a daily regimen. The listing of the DCMCP is about 700 pages, single-sided. I had it printed duplex, and first thing each morning, I try to key in at least one two-sided sheet of paper -- typically 80-100 lines of code. This week I passed the 10,000-line mark, which I think means I'm a little more than 25% along.

It is difficult to tell from the listing exactly how many lines are in the DCMCP source, for two reasons. First, the compile-time options specified for this compilation cause additional lines and page skips to be inserted into the listing. Most of the extra lines provide information for memory allocation of variables and for code segmentation. It prints the line number of the corresponding BEGIN when an END is encountered. The compiler also skips to the top of the next page after the end of each global procedure. With all of that, you can't easily translate page counts of program listing into line counts of source code. My guess is that the 700 pages of listing amounts to somewhere between 35,000 and 40,000 lines of code.

Second, and an issue with potentially more serious consequences, the MCP source is peppered with $OMIT conditional-compilation pragmas. These work similarly to the #if mechanism in a C compiler's preprocessor. Lines encountered by ESPOL when OMIT is true are not compiled. These pragmas are used to include or exclude optional features of the MCP, allowing a site to customize the features (and more importantly, the resulting memory utilization) for their system. There were more than a dozen optional modules that could be selected when compiling the MCP, as described in the B5500/B5700 System Operation Guide. Specifying these options and compiling your own copy of the MCP (which took a little over 20 minutes, 40 if you generated the cross-reference) is about as close to a SYSGEN as you ever got with the Burroughs systems, a characteristic that still holds today in the modern Unisys MCP.

The compiler has another option, $LISTA, that controls whether any omitted lines are listed. Alas, in the listing I am working from, that option was reset, so even after completing the transcription of the tens of thousands of lines in the listing, we still will not have the complete DCMCP source. What we have should work, but we will be missing the code for the modules that were omitted from the compilation when this listing was generated in 1977.

There is some hope for getting a better copy of the source code, however. The Computer History Museum in Mountain View, California, is in possession of a set of B5500 system tapes, reputedly a Mark XV release from 1975. These are the classic 10-inch, open-reel, 7-track tapes that were in common use during the 1960s and '70s. The tape drives for these went out of use about 30 years ago, though, so getting the data off these tapes has been a long-term challenge. Al Kossow, software curator at the museum, and saver-in-chief at bitsavers.org, has informed Nigel that the museum recently acquired a custom tape transport that they hope will be able to read the Burroughs tapes, along with others in their collection. We are hoping to hear of progress in this effort within the next few months.

Work on the B5500 Emulator

After a lot of discussion between Nigel and me, and no small amount of trudging through the Burroughs documents available on bitsavers.org, we started detailed design and coding for the emulator at the end of May.

I had initially thought we would build the emulator using something along the lines of Java or Python, but Nigel proposed that we use Javascript, and design the emulator so that it could be run from within a web browser. I was stunned by this idea, but Nigel pointed to some existing emulators that work this way. In particular, there is Fabrice Bellard's quite amazing jslinux, which emulates a basic Intel x86 processor and boots Linux!

After recovering from my shock at Nigel's proposal, both the wisdom of and challenge associated with this approach started to appeal to me. Thus we are proceeding to code the emulator in Javascript. A B5500 consisted of one or two processors, up to eight 4KiWord memory modules, up to four I/O units (DMA channels), a variety of peripheral devices, and a Central Control module to tie the whole thing together. We are concentrating initially on processor emulation, but have had to implement significant parts of Central Control as well, as it managed processor initiation, I/O initiation, interrupts, and the switching exchange between memory and the processor and I/O units.

At present, we have instruction stepping, instruction decode, and interrupt detection and vectoring implemented for the processor, along with almost half of the word-mode operators and about a third of the character-mode ones. We have addressed a combination of both simple and complex operators, and have plenty of both kinds left to be worked on. The largest areas that have not yet been addressed are the word-mode arithmetic, relational, and branching operators.

We have done nothing yet for I/O, which is going to be a challenge to address in the context of a web browser environment. There are a number of very interesting capabilities under development within the browser community, however, under the generic heading of HTML5. We will probably be taking advantage of several of those developments as they mature and move towards standardization. As it now stands, the emulator will probably require a fairly modern browser in order to operate. We are currently targeting the emulator for Mozilla Firefox and Google Chrome, with Apple Safari high on the list to follow. Microsoft's Internet Explorer can play once it decides to grow up, which it appears to be in the process of trying to do.

We also hope to be able to build the emulator so that it can be used in off-line environments, e.g., Mozilla Rhino. This means that the emulator and its user interfaces (operator's console, SPO, and maintenance displays) will need to be done in a way that different versions of the UIs can be built for different run-time environments and plugged into a common emulator engine.

We do not plan to post the code for the emulator to the project's Subversion repository until it is much further along, but hopefully we will have something to share by the end of the year. The design and implementation of the emulator is a subject that deserves a blog post on its own, which will be forthcoming.

ESPOLXEM Lives!

As we got the instruction decoding and several of the instructions implemented in the emulator, Nigel and I both built small test harnesses in an attempt to find out if any of that worked. We simply assembled some short instruction sequences by hand, had the test harness store the resulting binary words in B5500 memory, then preset the processor registers appropriately, and used Firebug or the Chrome debugger to watch what the emulator did. Within the limits of these tests, we were gratified to find that the instruction stepping, decode, and stack management were working.

Assembling B5500 instruction sequences by hand is not particularly easy, and translating the resulting octal codes to hexadecimal Javascript constants and then plunking them into absolute memory addresses is even less so. After a little bit of this, Nigel expressed a wish for an assembler to make the job easier. I balked, as not having an assembler for these systems has been a matter of Burroughsian pride for 50 years. Then I realized that we had something much better than an assembler: ESPOL.

ESPOL (Executive System Problem-Oriented Language) is a derivative of Burroughs B5500 Extended Algol. It was developed for the B5500 Disk MCP project after experience using a pseudo-assembler, named OSIL (Operating Systems Implementation Language), for the B5000 Drum MCP. OSIL generated B5000 object code, but you programmed in it as if the target had a three-address instruction set. OSIL did not run on the B5000 or B5500. It ran, reputedly slowly, as a cross-assembler on the Burroughs 220, an earlier vacuum tube machine.

As is discussed in the 1985 Burroughs B5000 Conference oral history, the B5500 team created ESPOL by starting with their then-current Algol compiler and ripping out the I/O constructs, automated run-time storage management, diagnostic facilities, intrinsic functions (SQRT, SIN, etc.), and the ability to nest blocks (although the ability to nest compound statements remained). In other words, they took out of the language all of the stuff you need an operating system to support. They then added a few things they thought they would need in order to get close enough to the hardware to generate an operating system, and some that they probably thought would just be cool to have. Principally, these were:

An octal literal, written as @nnn, where nnn is a string of up to 16 octal digits. As a consequence of this feature, ESPOL did not support floating-point literals in exponential (scientific notation) format.
An intrinsic array, MEMORY, that could access physical memory by absolute word address. As a shorthand, the compiler also defined an alias to this, M. MEMORY was a reserved word, but M could be redeclared by the program at deeper nesting levels.
A POLISH expression construct with which one could code arbitrary instruction syllable sequences. This was somewhat like an in-line assembler, but more powerful, and integrated well with the surrounding higher-level language. For example, the Algol expression A:=B+C could be coded exactly that way in ESPOL, or it could be coded in "Polish notation" as POLISH(B, C, +, .A, :=), or equivalently, as POLISH(B, C, ADD, .A, STD), with ADD and STD being the mnemonics for the stack-based, zero-address add and store-destructive syllables, respectively. The ".A" notation caused the stack-relative address of A to be pushed into the stack for use by the store instruction. As with MEMORY, POLISH had a shorthand alias P that could be redeclared at deeper nesting levels.
An in-line STREAM construct that, like STREAM procedures in Algol, provided access the hardware character-mode instructions, but without the necessity of defining a procedure out of line and then calling it.
SAVE procedures, all of the generated code for which was collected into one segment at the beginning of the object code file. The idea was that this segment would be loaded by the bootstrap routine, and would contain the initialization and low-level management functions of the operating system, with which the other segments of the operating system could be loaded as required.
Extensions to the way that statement labels could be defined. If L is a label identifier, then:
1. L: had the same meaning as in Algol.
2. L:: was similar, but caused the code for the resulting statement to be word-aligned, which was required for some forms of branching. The compiler would emit NOP (no-op) syllables as necessary to achieve the word alignment.
3. L:<number>: caused the statement to begin at the absolute word address specified by <number>. This was needed to define hardware interrupt vector locations.
4. L:*: specified that the statement should start at the first executable location in memory following the interrupt vector locations and PRT (Program Reference Table) The compiler determined this address from the global declarations present in the program.
5. L::: was used to precede not an executable statement, but a list of literal constants. This was used to generate data pools for C-relative addressing, the C register being the word portion of the current instruction address. This form of label could be used only in a POLISH expression, and not in a designational expression, e.g., as the object of a GO TO statement.
Type equivalence between arithmetic and Boolean expressions. The Boolean hardware instructions for AND, OR, etc., operated bitwise on whole words. In ESPOL these operators were considered to be arithmetic. The truth of a Boolean expression was determined by the hardware using only the low-order bit in a word, so in ESPOL, TRUE was equivalent to an odd value and FALSE was equivalent to an even one.
A few additional notations to better control the generation and use of addresses:
1. NAME declarations. These created variables that could hold memory references to other variables.
2. An expression in square brackets resulted in the enclosed expression being evaluated for a descriptor call (DESC) rather than an operand call (OPDC) instruction. This was frequently used with indexed array elements, e.g., [HEADER[6]] effectively evaluated to a pointer to HEADER[6], not the value of HEADER[6].
3. Prefixing a name expression with an asterisk caused the expression to be dereferenced with the LOD (load) operator. My favorite use of this is A[X]:=(*P(DUP))+1;. This construct proceeds as follows:
  1. The left-hand side of the assignment is evaluated as a name expression (a form of address), and results in a descriptor to the location of A[X] being left on the stack.
  2. The POLISH expression P(DUP) is evaluated. This simply generates the DUP syllable, which duplicates the value on the top of stack. Now we have two copies of the descriptor on the top of stack.
  3. The asterisk preceding the POLISH expression generates the LOD syllable, which causes the top descriptor in the stack to be dereferenced and replaced by the value to which it points. Now we have the value of A[X] on the top of stack, with the original descriptor word still immediately below it.
  4. The +1 generates a syllable to push a literal 1 onto the stack, followed by the ADD syllable. Now we have the value A[X]+1 on the top of stack, with the address of A[X] still below that.
  5. The := operator generates an XCH syllable to exchange the two top-of-stack words (store requires the destination address to be on top), and a STD (store destructive) syllable. "Destructive" indicates that both the address and value words are popped from the stack. There are also non-destructive stores that pop the address but leave the value at top of stack.
  In C, you would have written the equivalent statement as A[X]+=1, or perhaps ++A[X]. In Algol for the modern MCP you can get the same effect using A[X]:=*+1, where "*" in this context means that the left-hand side of the assignment is to be duplicated, dereferenced, and then used as the initial primary in the right-hand side expression, just as the more verbose ESPOL POLISH notation did.

ESPOL was obviously intended for use by people who knew how the hardware worked and were accustomed to programming at a level close to that of the hardware. All of the language extensions in ESPOL were subject to abuse, but the POLISH construct seems to have generated the most examples of egregious use. Since you could use this construct to directly manipulate the stack, ESPOL programmers did just that, frequently leaving intermediate values on the stack, and popping them into expressions later on. This was undoubtedly efficient, but it is often very difficult to follow. A few small parts of the MCP almost look like they were written in FORTH.

With that brief introduction to ESPOL, we can now return to the subjects of emulator test cases and the development of ESPOLXEM. Like the Algol compiler from which it was originally derived, ESPOL is written in B5500 Extended Algol. As described in earlier posts, I had been able to migrate the B5500 Algol compiler to the modern Unisys MCP environment, where it runs as a cross-compiler to generate B5500 object code files. Or at least that's what we hope it does, because we don't yet have any way to prove it, other than static inspection of the resulting code. We call this cross-compiler ALGOLXEM, with the XEM standing for "cross-compiled on E-mode," and E-mode being the Engineering name for the modern MCP architecture.

What we needed was the same type of migration applied to the ESPOL compiler source to create an ESPOLXEM that will run under the modern MCP. Over the past several weeks I have largely set aside work on the emulator to do just that.

We had Hans Pufal's ESPOL source, but it had not been completely proofread. We had learned from working with the Algol compiler source that proofreading needs to be done by someone other than the transcriber, anyway, so I set about to do that. Looking for a way to make the job easier and more reliable -- proofreading is even duller than transcribing -- and knowing that ESPOL was originally cloned from Algol, I decided to compare the two sources, hoping that there would be large chunks of the code that would be the same, and that for those I could do a much more mechanical comparison.

To compare the two source files, I used the excellent Compare It! utility from Grig Software. Indeed, there were large blocks of code that were identical between the two compilers, or at least similar enough that I could easily spot typos and other problems. There were also large blocks of code that were completely different, and some that were somewhat similar, but clearly had been modified or rewritten since the two compilers had diverged. In all, about half of the Algol compiler's code is not in the ESPOL compiler, and of the remainder, almost two-thirds was the same, or nearly so. It's somewhat amazing that this comparison approach worked as well as it did, as ESPOL was cloned from Algol probably sometime in 1963, and I was comparing the sources of two slightly different release levels from 13 years after that.

What is also interesting is that, where I found differences between the two sources that were due to errors in transcription, about a quarter of the problems were in the Algol compiler source. Most of these were in benign areas, such as comments or sequence numbers, but I did find a couple of subtle errors in the Algol source that would have been extremely difficult to spot otherwise. Corrections to these typos have been posted back to both Algol and ALGOLXEM in the project code repository.

With the proofreading pass completed, the next step was to apply the same type of migration technique I had used to generate ALGOLXEM from the Algol compiler source. I ran the ESPOL source through the pretty-printing program that I used in preparing ALGOLXEM, then manually reversed the bit numbering conventions used in partial-word designators (in B5500 Algol, the high-order bit of a word is 0 and the low-order bit is 47, but in modern Algol those are reversed).

I used Compare It! again to help identify areas in the ESPOLXEM source that needed to be addressed. I was able to bring across all of the stream procedure emulation code I had written for ALGOLXEM without change, and quite a bit of the converted code that replaced the stream procedures themselves. There were a few significant differences in the way the compilers worked, particularly in the way that the generated code was buffered and output, but these were relatively easy to deal with.

I don't know if the approach of comparing the ESPOL compiler source to our Algol compiler source saved any time, but it did appear to result in a more reliable migration. The time from completing the proofreading pass to the first completely successful compile of an ESPOL program was less than two weeks. At this point we have successfully compiled two programs, a small bootstrap program, KERNAL, that I transcribed from a bitsavers listing, and the significantly larger system cold/cool-start loader program, COOL, which Nigel kindly transcribed.

The final step in this process, which I am just beginning, is to instrument the emulator with the ability to load an object code file produced by the ESPOLXEM compiler, store the contents of that file appropriately in memory, initialize the processor registers, and initiate execution of the program. Once that is in place, we can start writing test cases to exercise the emulator.

We Are Not Alone

When we started this project, Nigel and I assumed that we would be the only ones on the planet foolish enough to attempt such a thing, and among only a few who would be at all interested in it. It therefore came as quite a surprise to be contacted this summer by not one, but two other people who are working on B5500 emulators. Sid McHarg in Seattle, Washington, started out simply trying to emulate the lights on the B5500 operator console, and when that proved insufficiently satisfying, decided to try emulating the whole thing. He started perhaps a year ago, and is now quite far along, with significant pieces of the system running. Sid is building his emulator using C++ on an Apple Macintosh.

A few months ago, Terry Heidelberg found our project on the web and contacted me. After some email exchanges, Terry decided to try his hand at building an emulator himself. He is proceeding with Python. We have given him access to our ALGOLXEM and ESPOLXEM cross compilers, and I believe he has managed to get some test cases running, but at last word was still working on significant parts of the instruction set.

The 50th anniversary of first customer shipment for the Burroughs B5000 will occur in early 2013. The 50th anniversary for the B5500 will be in early 2015. It is interesting that after all that time, there is so much interest in reviving this radically innovative architecture.

At present, we are staying in contact with these other projects and sharing questions and information, but not emulator code or design details. Speaking on Nigel's and my behalf, we don't want to know too much about how the other guys are going about their efforts, thinking it will be much healthier if we all proceed in our independent ways to get something working. Once we get to that point, it will be very interesting to be able to peek under each others' hoods and see how we each approached the emulator design differently. Meanwhile, we have a lot of work left to do.

Tuesday, April 24, 2012

The Agony and the Ecstasy of Stream Procedures

We have object code. Well, a little bit of object code.

For the past four weeks, I have been busy working on the B5500 Algol cross-compiler we call ALGOLXEM. The "XEM" stands for "cross-compiled on E-mode" which is the technical name for the present Unisys MCP architecture that has descended from the B5500.

Background

The whole purpose of this exercise is to have a way initially to generate object code for the emulator. We presently have no machine-readable source or object code from the B5500 -- all that we have are scans of listings of the system software on bitsavers, and a few small paper listings. Thus we need to turn the listings into source code and the source code into object code -- hence the XEM portion of this project to bootstrap the creation of object code.

The B5500 Algol compiler is written in Algol, and the modern MCP has an Algol compiler, the dialect for which has evolved from and is still somewhat similar to that for the B5500. It therefore made sense to try to use the modern MCP as the host for a cross-compiler and to port the B5500 compiler to modern Algol. It now appears that this approach actually is going to work. That cross-compiler is starting to generate what looks to be valid B5500 object code in the "code file" format used by the B5500 for an executable file.

As discussed in the last post, Nigel manually transcribed the 12,700-line source of the Algol compiler from the listing on bitsavers, and I have taken it from there. The first steps were to get that source into a form that could be used by the current MCP development tools, proofread the source by comparing it to the listing, and then reformat it using a pretty-printing utility into something that would be easier to read and understand. In the process, we identified a number of major issues that would need to be addressed in order to port the B5500 dialect of Algol to modern E-mode Algol:

Bit numbering for partial-word designators is reversed between the B5500 and modern Algol.
The B5500 and modern E-mode use the same 48-bit word size and numeric formats, but the B5500 used 6-bit characters, while E-mode uses 8-bit ones. The ability to process 6-bit characters natively was removed from the product line 30 years ago.
The I/O for the compiler would need to be reworked, partly to address 6/8-bit character differences, partly due to changes in the way that files are declared in a program, and partly to replace some constructs (principally the RELEASE statement) that are no longer supported.
The B5500 operated in two modes, word and character, with word mode being the default and primary one, at least for Algol programs. To provide for character mode, the compiler supported an additional syntax termed stream procedures. You called a stream procedure much like a standard Algol procedure, but syntactically they were little more than a free-form assembler for the machine's character-mode instruction set.

Of these, stream procedures are by far the largest and most challenging issue. The challenge comes in two parts. First, all of the stream procedure syntax would need to be replaced with equivalent functionality using E-mode constructs. Second, and more seriously, while word mode had hardware-enforced bounds protection and prevented user programs from modifying control words, character mode was much more primitive and had no such enforcement. If you knew what you were doing, you could manipulate absolute memory addresses and pass them between character and word mode. The people who wrote the compiler knew what they were doing, and they did just that.

Simplifying Source Input Management and Addressing

This memory addressing issue had me really worried. One of the precepts of the architectural line that began with the Burroughs B5000 was that user programs do not manipulate addresses. Instead they manipulate offsets from a base (or indexes into a vector of elements -- same thing), with the vector's base address maintained by the system in a protected control word called a descriptor. This approach allows the system to establish (and change) the base address without the program being aware of it. Descriptors are the basis for relocation, bounds protection, and what was originally referred to as automated overlay, but we now (somewhat erroneously) refer to as virtual memory. Alas, on the B5500, these addressing concepts were not yet fully developed, and in the implementation of character mode and stream procedures, not only could you manipulate absolute memory addresses, you didn't even need a descriptor on order to do so.

The addressing and descriptor concepts are much more fully developed in modern E-mode, and there simply isn't any way to manipulate and use arbitrary memory addresses. All addressing is based, in one way or another, on offsets from system-controlled descriptors. Modern Algol has POINTER and ARRAY REFERENCE variables, which can help to abstract a base address and pass it around within a program, but they are subject to up-level addressing restrictions, i.e., a reference in a more global scope is not permitted to point to an object in a more local scope, to avoid the possibility of dangling references.

There were a couple of hopeful signs in the addressing issue, however. First, in order to work reliably, any memory addresses that escaped from character mode routines had to point to so-called "save" memory -- areas that were fixed in position by the system and not subject to overlay paging. Such areas include physical I/O buffers, the program's PRT and stack area, and arrays explicitly declared with the SAVE keyword. Fortunately, the B5500 Algol compiler restricted its manipulation of addresses to these types of entities.

Second, it appeared that most of the address manipulation occurred between the compiler's token scanner and the source code input mechanism. This input mechanism was quite complex. The primary input to the compiler was from cards, but that could be redirected to a different medium, such as magnetic tape or disk. The compiler could also merge the primary input file with a secondary input file, based on sequence numbers in columns 73-80 of each record. In this mode, the primary file served as a patch against the secondary. There was also an "include" capability that would inject the contents of a file into the source stream, and something called a CAST file, which was essentially a library of source modules on a tape volume that was arranged and formatted for quick searching. If storing source libraries on tape seems odd today, the idea of storing source code on disk was still new in the early/mid-1960s, because disks were relatively new, relatively small, and very expensive. Most source programs in those days were maintained exclusively on punched cards. The final piece of input complexity was the compiler's DEFINE mechanism, which is similar to the #define construct that showed up later in C.

Low-level scanning is carried out by a stream procedure named, not surprisingly, SCAN. Much of character mode processing was based on a source index (SI) and destination index (DI). Each index was represented physically in the processor by a triplet of registers holding the word address (15 bits), character offset within the word (0-7, 3 bits), and bit offset within the character (0-5, 3 bits). These indexes were established from descriptor and numeric parameters passed to the stream procedure. Some character mode instructions, such as those transferring characters or words from source to destination, implicitly updated these indexes. Other instructions allowed the programmer to increment or decrement the indexes directly. You could also store the index value back into a parameter or return it as the value of the procedure. In either case, this could make the address available to the word-mode procedure that called the stream procedure. These stream procedure address were an 18-bit value composed of the word address in the low-order 15 bits and the character offset within that word in the high-order three bits. The bit offset within a character was not encoded by this format and could only be manipulated within a stream procedure.

These character-mode address values were used by the scanning mechanism to keep track of the current location on the source line where scanning was taking place. The starting location was passed to SCAN, which returned an updated location at exit. So far, no problem, but when the source of input changed -- say from the primary to secondary file, or from one of the file sources to the DEFINE text stored in the symbol table -- that 18-bit character-mode address was stashed away while scanning was directed to the other source and then reinstated when the source reverted back. Note that those addresses reflected not only the offset within the record area, but also the identity of the record area itself.

Worse, these redirections could be nested -- a $INCLUDE card from the primary input could be inserted into the secondary input's stream, which would redirect to the specified file, which in turn could invoke DEFINEs, references to which could be nested about 30 levels deep. It's important to note here that the Algol compiler was strictly a one-pass affair -- there was no separate pre-processor phase, so all of these input redirections were taking place while all of the scanning, parsing, lexical analysis, and code emission were also being done.

It's impressive how all of this worked, but it was extremely complex, and there was no obvious way to emulate the way the character-mode address values were being saved and restored within a modern E-mode environment. Not wanting to redesign and graft a new input-handling mechanism into the compiler, I decided to punt and change the problem. For the purpose XEM would be put to, we did not need all of that input richness, so I ripped out all of the input files except for the primary input. That left only one case of input redirection -- between the primary input and DEFINE text in the symbol table -- and that proved to be relatively easy to handle as a special case.

Input scanning now works using an ARRAY REFERENCE variable named SBUFF to point to the input source record (either CBUFF, the primary file's record area, or DEFINEARRAY, an existing array that holds DEFINE text during processing). It uses a pseudo character-mode address that has the word offset from the SBUFF base in the low-order 15 bits and the character offset in the next-higher three bits. This facilitates some manipulation of the character-mode addresses that is done in word mode, e.g., adding 32768 to the address to advance to the next character within a word.

The compiler also had a number of output files that were not really needed in XEM, so those were ripped out as well, along with any dependent routines (including some stream procedures) that used those files. In the end, the input/output mechanism of the compiler was reduced to one file for source input (CARD), one for compiled code output (CODE), one for generating an optional listing (LINE), and two temporary disk files used to generate an optional cross-reference listing.

There were a few remaining cases of character-mode addresses being returned to and manipulated by word-mode procedures, but fortunately those were kept pointing within one array, so I was able to deal with the word portion of the address as just an offset within that array and not as a general memory address.

Understanding Stream Procedures

That simplification of the source input mechanism solved the most serious of the stream procedure issues, and actually eliminated a number of stream procedures from the source, but that still left the fact that stream procedures were used for most of the text manipulation and formatting in the compiler. What to do about those?

In looking at the compiler during the proofreading phase, I had come to the conclusion that it would be better to retain 6-bit character coding for the compiler's internal operations, even though modern Algol does not have any facilities for directly manipulating 6-bit units. A lot of the data structures within the compiler were designed for 6-bit characters, and the resulting object code was going to have to be in terms of 6-bit characters anyway, so this seemed like the better way to go. Experience has since demonstrated that this was the right decision, but it meant that I would not be able to use much of the current language and hardware's very capable character manipulation capabilities -- the stuff that replaced stream procedures starting with the B6500.

To understand what needed to be done, we need to look at some stream procedure examples to see how they work and what they can do. Here's a simple example from the original compiler source, except that I've changed the B5500 special characters to their digraph equivalents:

1:  BOOLEAN STREAM PROCEDURE OCTIZE(S,D,SKP,CNT); VALUE SKP,CNT;        
2:    BEGIN                                  
3:    SI:=S; SI:=SI+3; DI:=D; SKP(DS:=3 RESET); % RIGHT JUSTIFY.       
4:    CNT(IF SC>="8"THEN TALLY:=1 ELSE IF SC<"0"THEN TALLY:=1; SKIP 3 SB;   
5:      3(IF SB THEN DS:=SET ELSE DS:=RESET; SKIP SB));           
6:    OCTIZE:=TALLY; % "1" = NON OCTAL CHARACTER.               
7:    END OCTIZE;

This procedure converts a string of octal characters to its binary equivalent. The string of characters is addressed by S (actually, this routine is designed to work from the token structure built by the scanner -- the first three characters have length and control information, so the text of the token begins in the fourth character). The binary result will be placed at the address of D (typically a REAL variable) offset by SKP octades. The number of octal characters converted is determined by CNT. The parenthesized constructs are loops, each preceded by its repeat count. This presentation is a little obscure, so let's reformat it a bit and add some comments to show what is going on.

1:  BOOLEAN STREAM PROCEDURE OCTIZE(S,D,SKP,CNT);   
2:   VALUE SKP,CNT;      
3:   BEGIN                   
4:   SI:=S;               % load source index from address in S
5:   SI:=SI+3;            % advance source index forward 3 chars to token text
6:   DI:=D;               % load destination index from address in D
7:   SKP(DS:=3 RESET);    % transfer 3-bit groups of zeros for (SKP mod 64) times to dest
8:   CNT(                 % repeat the enclosed statements (CNT mod 64) times
9:     IF SC>="8"THEN     % if the source character (SC) pointed to by SI >= "8":
10:      TALLY:=1         %   set TALLY (a 6-bit hardware register) to binary 1
11:    ELSE               % otherwise:
12:      IF SC<"0"THEN    %   if the source character is < "0"
13:        TALLY:=1;      %     set TALLY to 1 (this is an error flag)
14:       
15:    SKIP 3 SB;         % advance SI forward over the first 3 bits of the character
16:    3(                 % repeat the following 3 times
17:      IF SB THEN       % if the source bit pointed to by SI is a 1 (true):
18:        DS:=SET        %   set the destination bit addressed by DI to 1, advance DI
19:      ELSE             % otherwise
20:        DS:=RESET;     %   set the destination bit to zero, advance DI
21:           
22:      SKIP SB          % advance SI by one bit
23:      )                % end the repeat-3-times loop
24:    );                 % end the repeat-CNT-times loop
25:  OCTIZE:=TALLY;       % return the error flag (1 = non-octal character)
26:  END OCTIZE;

This would have been called from word mode with a statement similar to:

1:  IF OCTIZE(ACCUM[1],ACCUM[4],16-COUNT,COUNT) THEN  
2:    FLAG(521) % NON-OCTAL CHARACTER IN STRING.

OCTIZE is a good example of the character and bit manipulation that was possible with stream procedures. The character codes for the decimal digits were assigned to octal 00 through 11, so the low-order three bits of characters 0-7 were the corresponding octal value. TALLY (a six-bit register available for general use) was implicitly set to zero on entry to a stream procedure. Assignments to DS (the destination string) transferred data to the address specified by DI and incremented DI automatically by the units of data transferred (words, characters or bits). If at the start of a character or word transfer operation either of the indexes were not on a character or word boundary, respectively, the hardware automatically adjusted them forward to the appropriate boundary. Repeat values were limited to 63 (six bits), so nested repeat loops were often used to transfer larger quantities of data. Most stream statements compiled to a single 12-bit character-mode instruction.

Here is a more typical, but somewhat trickier example. This transfers up to 63 characters (N) from a source address (SORCE) offset by SK characters to a destination address (DEST) offset by DK characters. Both offsets are limited in this procedure to 127 characters from their respective bases:

1:  STREAM PROCEDURE MOVECHARACTERS(N,SORCE,SK,DEST,DK);           
2:    VALUE N, SK, DK ;           
3:    BEGIN   
4:    SI:=LOC SK;       % point SI to the location of SK (to the parameter word itself)
5:    SI:=SI+6;         % advance 6 chars (to the low-order 12 bits of the offset value)
6:    IF SC^="0" THEN   % if the source char is not zero (i.e., offset > 63):
7:      BEGIN   
8:      SI:=SORCE;      % point SI to the address in SORCE   
9:      2(SI:=SI+32);   % advance SI by 64 characters
10:     SORCE:=SI       % save the updated index back into the SORCE parameter cell
11:     END;  
12:       
13:   SI:=LOC DK;       % repeat the same process for DEST, DK, and DI
14:   SI:=SI+6;   
15:   DI:=DEST;                 
16:   IF SC^="0" THEN   
17:     2(DI:=DI+32);                  
18:      
19:   SI:=SORCE;        % restore the updated source index
20:   SI:=SI+SK;        % advance SI by (SK mod 64) characters
21:   DI:=DI+DK;        % advance DI by (DK mod 64) characters
22:   DS:=N CHR;        % transfer (N mod 64) characters from source to destination
23:   END MOVECHARACTERS;

Notice the significance of the LOC keyword. It causes the source or destination index to point to the parameter word in the stack itself, instead of to the location the parameter word addresses (assuming what it holds is actually an address). Also notice how all index increments and repeat counts are modulo 64. This is because the registers in the processor that hold these values are only six bits wide.

The hardware could distinguish a descriptor from an ordinary numeric operand, because all control words had their high-order ("flag") bit set. Descriptors only pointed to word boundaries, with the address of the word in the low-order 15 bits, but a numeric operand could represent both the word address and character offset using the 18-bit representation described above. When loading an address from a parameter word, the character-mode instructions dealt with the two types of words differently. In the case of MOVECHARACTERS, the SORCE parameter would normally be passed as a descriptor but be overwritten by the procedure at line 10 with an updated address in the 18-bit numeric-operand form. Note that what got overwritten was just the parameter word in the stack, not the original descriptor referenced by the caller.

There is one more example I want to show, which terrified me when I came across it and started to understand what it did.

1:  INTEGER STREAM PROCEDURE GETF(Q); VALUE Q;                 
2:   BEGIN   
3:   SI:=LOC GETF;    % point SI to return value cell
4:   SI:=SI-7;        % move SI to bit 6 of MSCW
5:   DI:=LOC Q;       % point DI to Q
6:   DI:=DI+5;        % move DI to bit 30 of Q
7:   SKIP 3 DB;       % move DI to bit 33 of Q [start of word addr field]
8:   9(               % move 9-bit register R value (PRT location) from MSCW
9:     IF SB THEN     %   into high-order 9 bits of Q's word addr field
10:      DS:=SET   
11:    ELSE   
12:      DS:=RESET;   
13:       
14:    SKIP SB  
15:    );      
16:  DI:=LOC Q;       % point DI to Q
17:  SI:=Q;           % load SI from value of Q (R+Q = R+3 from only call)
18:  DS:=WDS;         % copy FPB descriptor word from M[R+3] to Q
19:  SI:=Q;           % load SI with addr of FPB
20:  GETF:=SI         % return FPB memory address as the result
21:  END GETF;

What's worse, it was called from just one place in the compiler, during initialization, with this cryptic statement:

    IF EXAMIN(RR11:=GETF(3)+"Y08" )^=12 THEN

Yikes! What is going on here?

EXAMIN is a stream procedure that extracts and returns the 6-bit character code at an absolute character address in memory. The implementation of this procedure is very straightforward, and I won't go into its details here.

Upon entry, GETF points SI to the stack cell allocated for its return value, and then decrements that address into the prior word in the stack (on the B5500, a stack push increases the top-of-stack pointer, not decrements it as on some systems). That prior word is the MSCW (Mark Stack Control Word) that was pushed into the stack when the procedure was called. The primary function of the MSCW is to link this procedure's stack frame (activation record) to the prior one lower in the stack, but it also holds the value of the R register for the prior frame. The R register holds the address of the program's PRT (Program Reference Table, where the global variables and descriptors for the program are stored). The R register holds only the high-order nine bits of the PRT address; the low-order six bits of that address are implicitly zero. Physically, the low-order six bits of the R register were used to implement the stream procedure TALLY register.

GETF then moves the nine-bit R register value into the high-order nine bits of the word address portion of the value passed in parameter Q. It uses that address in Q to move the corresponding word in memory to the procedure's return value word, and exits, leaving that word on the top of the stack.

Since GETF is called with a value of 3 for the Q parameter, what it effectively does is return the word at PRT[3]. That word is a data descriptor to the program's File Parameter Block, or FPB. The FPB holds "label equation" information for a program's files -- the actual file names and device types that will be used at run time. It is initially built by the compiler, but updated by the MCP at run time from information that may optionally have been entered on control cards associated with the program's execution. For those of an IBM OS/360 bent, those control cards provided a capability somewhat like DD statements in JCL.

The "Y08" had me stumped for quite a while, until I realized that value is octal 700010, and to a stream procedure, that looks like word address 8 plus character offset 7. Since that value gets added to the value returned from GETF, which is a data descriptor with the absolute address of the FPB in its low-order 15 bits, what effectively gets stored in RR11 and passed to EXAMIN is the address of character 7 in word 8 of the FPB. The FPB has five-word entries, so what we're really looking at is word 3 (0-relative) of the second file declared in the program, which is the primary input, CARD. Character 7 of that word (the low-order six bits) identifies the device type to which the file is assigned, and 12 is one of the values used for disk. All of this information is available in the B5500 Handbook.

Thus this whole rigmarole was gone through to determine if the compiler's primary input would be coming from disk instead of something else. Now we know why file attributes were invented for the B6500 (and in a limited form, were back-ported to late versions of the B5500 MCP).

What did the compiler do with this information? The answer is that it simply adjusted some buffering and blocking factors for the file. The address into CARD's FPB entry was saved in the variable RR11, and the compiler continued to use increments of that address in multiples of five to perform similar tests on other files in the program and make similar adjustments to their file declarations.

How did I deal with this in preparing XEM for E-mode? That turned out to be quite simple. None of this type of file manipulation is normally necessary in modern MCP systems, and we have a much easier and safer way to get this type of information via file attributes if it did. I ended up deleting GETF and all of the references to its result, but I had to verify first that I could do that with impunity, and along the way it was quite a lesson in what people could (and did) do with stream procedures.

Converting Stream Procedures

Hairball case studies like GETF aside, I still had to deal with the 40-odd stream procedures that survived the simplification of the compiler's input and output handling. I wanted something that was modeled very closely on the stream procedure syntax and semantics, so that wherever possible, I could do one-for-one substitutions of the stream statements, and hopefully not worry very much about what those procedures were doing.

While modern E-mode systems no longer support 6-bit character codes natively, they have something that the B5500 didn't -- a much more capable facility for manipulating bit fields within words. In the Algol dialects of both the B5500 and modern E-mode, you can refer to a sub-field of a word in this way:

    W.[31:4]

This is termed a partial-word designator. It is a form of arithmetic primary. Inside the brackets, the number before the colon specifies the first (high-order) bit of the field and the number after the colon specifies the number of bits in the field. The result is the specified bit field, right-justified in a word of zeroes. Note that binary values in both systems are big-endian.

This partial-word notation accomplishes directly what you usually try to accomplish on other systems with shift and mask instructions. The B5500 had mask instructions (AND, OR, etc.) but no shift or rotate instructions -- it didn't need them. Isolation of a field of bits was quite efficient. On the B5500 it required three instructions, two to load the bit numbers into registers, and a third to extract and shift the field.

This syntax works on both systems, but there are two significant differences in it between the B5500 and E-mode:

The bit numbering is reversed -- bit 0 is high order for the B5500 and low order for E-mode.
On the B5500 the field specifications had to be constants, but with E-mode they can be arbitrary arithmetic expressions, albeit limited in range to 0-47 for the starting bit number and 1-48 for the field length. E-mode also allows a field to extend beyond the low-order bit and wrap around to the high-order end of the word. Further, when the bit numbers are constants, E-mode isolates the field using a single instruction.

The bit number reversal was a nuisance that I managed to resolve largely using some regular expressions and the text editor's find-and-replace capability, but E-mode's ability to have dynamic bit field specifications saved my bacon when it came to implementing 6-bit character handling.

There is a second form of partial word operation, termed bit concatenation, that was especially useful in emulating stream procedure behavior. It looks like this:

    A & B[23:31:4]

This is another form of arithmetic primary. What it does is transfer a bit field from the value of B into the value of A and yield the updated value of A as the value of the primary. The first value inside the square brackets specifies the high-order bit in A where the destination field will start, the second value specifies the high-order bit in B where the source field will start, and the third value specifies the number of bits to transfer. On an E-mode system, all three values can also be arbitrary arithmetic expressions, limited in range the same way as for partial word designators. The bits in A that are outside the specified destination field are not altered. A must be a primary, but B can be an arbitrary expression. Bit concatenation also has a shorthand form, A&B[23:4], where the first value inside the brackets is the starting destination bit number and the second value specifies the field length. The transfer is implicitly from the low-order bits of B.

I decided the easiest thing to do would be roughly model the way that character mode worked on the B5500. Rather that try to develop a separate module for this, I just embedded the emulation in the XEM compiler source. I defined global variables for the word, character, and bit index registers. The word indexes could not hold full addresses as on the B5500, so I also defined ARRAY REFERENCE variables to point to the memory areas to be operated on (separately allocated memory areas are generically referred to as "arrays" in Algol, although they are in fact one-dimensional vectors of memory words). The word address "registers" then held just a word offset within these arrays.

Character mode operations were implemented with a set of defines and procedures. Most stream procedure parameters that reflected addresses were converted into two parameters -- the array reference and the word offset within that array. For example, a stream procedure invocation P(ACCUM[1]) in the original source would be converted to P(ACCUM,1) in XEM. Internal to P, the stream statement SI:=A was converted to STREAMSETSI(A,AX), where A and AX are the formal parameters corresponding to actual parameters ACCUM and 1, with AX being the parameter introduced to carry the word offset. STREAMSETSI was implemented like this:

1:  DEFINE WOFF = [14:15] #, COFF = [17:3] #;  
2:    
3:  DEFINE STREAMSETSI(A, X) =  
4:   BEGIN  
5:   MBASE:= A;  
6:   MREG:= (X).WOFF;  
7:   GREG:= (X).COFF;  
8:   HREG:= 0;  
9:   END STREAMSETSI #;

MBASE is the array reference variable and the rest are ordinary REAL variables. In character mode, the source word address was held in the M register, the character offset within the word was held in the G register, and the bit offset within the character in the H register. The corresponding registers for the destination index were S, K, and V.

One of the most common things to do with a stream procedure is to transfer characters from a source address to a destination address. On the B5500, that would have been coded similar to this:

1:  SI:=S;  
2:  DI:=D;  
3:  DS:=30 CHR;

In XEM, that now looks similar to this:

1:  STREAMSETSI(S,0);  
2:  STREAMSETDI(D,0);  
3:  STREAMTRANSFERCHR(30);

The implementation of character transfer is this:

1:   DEFINE STREAMTRANSFERCHR(N) =  
2:     STREAMTRANSFERCHRQQ(MBASE, SBASE, N) #;  
3:   PROCEDURE STREAMTRANSFERCHRQQ(MBASE, SBASE, N);  
4:    VALUE N;  
5:    ARRAY MBASE, SBASE[0];  
6:    REAL N;  
7:    BEGIN  
8:    REAL  
9:     NC;     % CHARS LEFT TO TRANSFER  
10:    
11:   STREAMADJUSTSICHAR;  
12:   AREG:= MBASE[MREG];  
13:   STREAMADJUSTDICHAR;  
14:   BREG:= SBASE[SREG];  
15:   NC:= N;  
16:   WHILE NC > 0 DO  
17:     BEGIN  
18:     BREG:= * & AREG[47-KREG*6 : 47-GREG*6 : 6];  
19:     NC:= *-1;  
20:     IF GREG < 7 THEN  
21:       GREG:= *+1  
22:     ELSE  
23:       BEGIN  
24:       GREG:= 0;  
25:       MREG:= *+1;  
26:       IF NC > 0 THEN  
27:         AREG:= MBASE[MREG];  
28:       END;  
29:    
30:     IF KREG < 7 THEN  
31:       KREG:= *+1  
32:     ELSE  
33:       BEGIN  
34:       SBASE[SREG]:= BREG;  
35:       KREG:= 0;  
36:       SREG:= *+1;  
37:       IF NC > 0 THEN  
38:         BREG:= SBASE[SREG];  
39:       END;  
40:     END WHILE;  
41:    
42:     IF KREG > 0 THEN  
43:       SBASE[SREG]:= BREG;  
44:   END STREAMTRANSFERCHR;

The source and destination array reference variables, MBASE and SBASE, had to be declared locally to the converted stream procedure body to avoid up-level addressing restrictions, so the define was used to pass these as hidden parameters to the "QQ" procedure that did the actual transfer. The STREAMADJUST... calls are used to ensure that the source and destination indexes are on a character boundary, as required by the character-mode semantics.

When the processor was in character mode, the top-of-stack A and B registers were used as word buffers for the source and destination strings, respectively, and I used similarly-named global variables as word buffers. Note that I do not use these quite the same way the A and B registers were used in the B5500 -- those registers would have maintained their state between character-mode instructions, but I load and flush them inside each emulation procedure.

The actual data transfer takes place by means of the bit concatenation expression on line 18. Everything else is register state maintenance.

This approach is not the most faithful implementation of stream procedure semantics, it's not complete, and it's certainly not the most efficient one, but it doesn't need to be any of those. The goal in this phase of the project is simply to get the ALGOLXEM cross-compiler to work. Ultimately that compiler needs to do only three things: (a) generate test cases for the emulator, (b) compile the B5500 version of itself, and (c) compile the ESPOL compiler (yes, some day we plan to have an ESPOLXEM). Once we have the MCP bootstrapped to the emulator environment and the compilers working there, we won't need the XEM tools anymore.

I was successful enough in this approach that most stream statements indeed can be replaced one-for-one with emulation constructs, and in many cases the conversion of stream procedures in the compiler source was just mechanical substitution of one form of statement for another. I'm relatively happy with how it turned out.

The remaining piece of 6-bit character handling concerned getting data into and out of the compiler. Output to the CODE file and I/O for the two cross-reference temporary files is done in word-mode binary, so those were not sensitive to six- vs. eight-bit character coding considerations. Source input to the compiler was done using "array-row I/O," somewhat like fgets() and fputs() in C. Output to the listing file was a combination of array-row I/O and formatted I/O, which is similar to a FORTRAN FORMAT. It was easy to leave the formatted I/O more or less alone and have it operate natively in EBCDIC, but the array-row I/O constructs had to be converted.

I wrote a routine to translate the native EBCDIC character set used by E-mode to the B5500 character codes and pack those codes eight to the word. I wrote another routine to reverse the process and produce EBCDIC from 6-bit coded data. At each array-row I/O statement I inserted the appropriate translation routine after a read or before a write to allow the actual I/O to the file to remain in EBCDIC but the data used internal to the program to be in B5500 6-bit code. This was fairly easy to implement and works well.

Interestingly, the B5500 had three six-bit character sets. The first of these is BCL (Burroughs Common Language), which was a standard character code used to communicate with peripheral devices. The second is the internal code used to represent character data. This is commonly referred to as BCL. but it's actually not. Some documents refer to it as BCL-Internal, but in this project, we're referring to this character coding as BIC -- B5500 Internal Code. The third code was the collating sequence for character comparisons. This code was not physically represented, but was used implicitly by the hardware to determine the relative ordering of character codes. This collating sequence was decidedly not the same as the binary value of either the BCL or BIC codes. The stream procedure emulation had to take this separate collating sequence into account in the implementation of character-mode comparison operations.

Current Status

Our pretty-printing utility reformatted the compiler source into almost 20,000 lines. Conversion work to date has modified over 4,000 of those. Much of that is in deletions resulting from the simplification of input and output, and additions associated with the new code to emulate stream procedure operations. Reversing bit numbers in partial-word expressions affected almost 4% of the remaining lines. Most of the other differences are changes to individual stream procedures and their invocation parameters, and corrections to typos in the original transcription.

At this point we have successfully compiled a small, but non-trivial program using ALGOLXEM. I had a listing from 1970 of a B5500 Algol program that included the generated object code. This provided a basis for comparing what XEM was generating against what the real compiler once generated, and we resolved a lot of problems in the process of getting XEM to produce comparable output. We are still finding typos in the transcription of source from the listings on bitsavers, and expect to continue to do so as we put the cross-compiler through other tests. Other than those typos, the only true bugs we've found are -- not surprisingly -- ones in my stream procedure emulation logic.

I have posted the output from this test to the project web site. There is a download file available on that site with the output of the XEM compilation, including the compiler listing and binary code file. I have also posted a PDF scan of the original B5500 listing from 1970.

With only one good test under our belts, ALGOLXEM is hardly finished, and we are nowhere near being ready to try compiling the B5500 compiler with it. Significant portions of the compiler have not yet been exercised, particularly defines and word-mode procedure calls. We are actively looking for more B5500 code listings we can use to assemble test cases and validate the output of the compiler, but we are probably going to have to write quite a few tests ourselves and determine on our own whether the generated code is correct -- or at least looks like it would have worked.

If you have any B5500 compiler output you think may be useful, particularly listings with generated code, please contact us through the project web site.

The XEM compiler in its current state has been committed back to the project's Subversion repository, along with a new utility, CODEDUMP55, I wrote to generate a semi-formatted dump of a B5500 code file. If you are interested, have a look.

Thus passeth the agony of stream procedures. The only ecstasy in this is that I'm done with them. Mostly. I think.