Enter email address here to subscribe to B5500 blog posts...

Saturday, March 29, 2014

SWITCH vs. CASE, Part 2

This post continues the discussion of SWITCH vs. CASE as implemented in the Burroughs B5500 Extended Algol compiler. In Part 1, we examined the code generated for each of these constructs and analyzed their differences. In this second part, I will analyze what was wrong with the program I wrote to explore those constructs and describe how to fix it.

To briefly recap the discussion thus far, I wrote a small Algol program shortly after going to work for Burroughs in 1970. The ostensible purpose of this program was to examine the code for both the SWITCH and CASE constructs of the language to determine which was more efficient.

Today, being surrounded by such a glut of inexpensive, incredibly powerful computing devices that they literally have become hazardous waste, it is easy to forget how precious and expensive computer time was a few decades ago, and how difficult it often was to come by. The start of my career at Burroughs was blighted by assignment to a boring documentation project that offered no opportunity to program. Recreational programming was rarely an option in those days, so when the subject of SWITCH vs. CASE came up within another group in the office, I leaped at the chance to get a coding fix and help them decide which construct they should use.

Capitalizing on this opportunity, I decided to go beyond writing a program simply to analyze SWITCH and CASE, and add a few more things to see how they worked as well, including a couple of Stream Procedures that would attempt to dump the program's Program Reference Table, or PRT. In hindsight, that was more than a little foolish, as Stream Procedures, improperly used, could compromise the health of the entire system, and they were quite easy to use improperly.

I managed to demonstrate that ease on my first attempt by making a couple of really dumb mistakes in coding the Stream Procedures. I did not get the desired results, and the program aborted with an Invalid Index (array bounds violation) fault. Now, decades later, the goal is to figure out what happened and fix it.

The Dumb Mistakes

While the program and listing discussed in Part 1 were successful in illuminating how SWITCH and CASE worked, I didn't get the PRT dump I wanted. I got part of a dump before the program aborted with the Invalid Index interrupt, but it turns out that dump was not of the program's PRT.

What the program tried to do was copy the PRT to the array A in the program, and then format the words from that array as octal to a printer file. Copying the PRT was to be the responsibility of the MOVEPRT Stream Procedure. It turns out that I was on the right track, but made two serious errors.

The basic idea of the procedure is simple -- get the address of a word with a known offset within the PRT, adjust that address downward to the beginning of the PRT, then copy some number of words from that address to the destination array. The first variable declared in an Algol program is always at PRT offset 25 octal, so I chose that location as the base. The procedure has four parameters: a descriptor containing the address of the first variable, a descriptor for the destination array, and two integers, the first representing the number of words to copy divided by 64, and the second the number of words modulo 64. The reason for having two parameters is that repeat counts in B5500 Character Mode are limited to six bits -- values 0-63 -- so the div/mod parameters will be used as repeat counts for two nested loops.

The mistakes in this procedure are all on line 19 of the program:
SI ~ LOC PRT25; SI ~ SI - 21;
The first statement was intended to assign to the source index (SI) the address of the word at PRT offset 25 octal (the variable I in the program). Alas, what it assigned was the address in the stack of the PRT25 parameter itself. Dumb. I was confused about the semantics of LOC -- that keyword should not be there. The second statement was intended to back down that address by 21 decimal (25 octal) words to the beginning of the PRT. Alack, what it did was back down the address by 21 characters. Dumber.

The rest of the procedure was written correctly. The destination index (DI) was assigned the address of the array A, and the appropriate number of words gets moved by the "DS~...WDS" constructs. The parentheses indicate repeat loops, which are preceded by their repeat count, limited to the range 0-63. The hardware forces word transfers to begin on a word boundary, so even though the adjustment to SI above left it pointing in the middle of a word, whole words starting on word boundaries were transferred to the array.

At this point the alert reader may have noticed that the destination array is declared in the inner block as A[0:I], and the call on MOVEPRT at line 50 in the program uses I as the number of words to transfer, but I is never assigned a value in the program. How could that ever work? The answer lies in the second control card of the deck, "?COMMON = 100". That command stores the specified value in the first scalar variable declared in the program, before execution of the program begins. In this case, that store is to the integer I at PRT offset 25. Thus, A has dimensions of [0:100] and a length of 101; 100 words will be moved by MOVEPRT.

But what got moved to the array A? Since SI was adjusted backwards by 21 characters (two words plus five characters), it is pointing into the third word below the location of the PRT25 parameter in the stack. A word-oriented transfer adjusts the address, if necessary, forward to the next word boundary, so the transfer actually began two words below the location of that parameter and continued for 100 words. What the output in the original listing shows is a piece of the program's stack, starting one word below the stack frame for the call on MOVEPRT:
  • The first word in the output (all zeroes) is whatever was at top-of-stack before MOVEPRT was called.
  • The second word (beginning with a 6) is the Mark Stack Control Word (MSCW) that starts the stack frame for the call. The primary purpose of this word is to link to the prior stack frame, which is at address 12262 octal.
  • The third word (beginning with a 5) is the parameter PRT25. This is a data descriptor pointing to the variable I in the program, at address 13325 octal.
  • The fourth word is a data descriptor for the array A. The data for this array is present in memory at address 11737 octal.
  • The fifth word (value 1) is the value of I (100) divided by 64 and truncated to an integer. This word is the parameter N1.
  • The sixth word is the value 36 (100 mod 64 = 36, or 44 octal), although that may not be very obvious, as it is in B5500 floating-point notation. The RDV syllable that implements the Algol MOD operator produces a result in floating-point format, even if it is an integer value. This word is the parameter N2. The fact that this value is not a normalized integer is another problem, as discussed below.
  • The seventh word is the Return Control Word (RCW). The primary purpose of this word is to hold the procedure return address (12163 octal) and to link back to the MSCW (at 12264 octal). 
  • Any local variables for the procedure would appear after the RCW, but this procedure has none. What we see in the rest of the output is whatever was left in the stack by prior push-pop activity.
 The corrected statements for line 19 should look like this:

SI ~ PRT25;   8(SI ~ SI - 21);

Removing the LOC keyword causes SI to load the address of the variable passed as the parameter PRT25 (i.e., I), not the address of the parameter word itself. Adding a repeat of eight around the adjustment to SI decrements the index backward by 21 words instead of 21 characters. This could also have been written "21(SI~SI-8)", but the former involves larger decrements with fewer loop iterations, so is more efficient.

There is another bug concerning the MOVEPRT procedure, but it is in the call on line 50, not in the procedure itself. I discovered this as I was looking over the program's output in the original listing. As mentioned above, the value of N2 is in floating point format from that MOD operator used in the call, not in normalized integer format. The problem is that B5500 Character Mode doesn't know about floating-point numbers, and when presented with a dynamic repeat count, it simply takes the low-order six bits from the word. The low-order six bits of that floating-point word are zero, so wherever N2 is used in the Stream Procedure, the repeat will be executed zero times instead of 36 as intended. The same problem occurs with the MOD operator in the call to BINOCT on line 51.

One way to fix this problem is to compute the modulo count and assign it to an integer variable, then pass that integer variable as the parameter. Another way to force integer normalization is an integer divide (Algol DIV operator) by one on the result of the MOD operator. Since the variable K was not being used for anything at that point in the program, a quick-and-dirty solution is to use that variable in an in-line assignment solely for the side effect of generating an integer result, thus:

MOVEPRT (I, A[*], I DIV 64, (K ~ I MOD 64));
BINOCT (I DIV 64, (K ~ I MOD 64), A[*], B[*]);

Perhaps the best way to deal with the limit of 63 for Character Mode repeat counts is to do the div/mod inside the Stream Procedure, thus:

  LOCAL N1, N2;
  SI ~ LOC N;   SI ~ SI + 6;          % POINT TO 7TH CHAR OF N
  DI ~ LOC N1;  DI ~ DI + 7;          % POINT TO 8TH CHAR OF N1
  DS ~ CHR;                           % MOVE SIX BITS TO N1
  DI ~ LOC N2;  DI ~ DI + 7;          % POINT TO 8TH CHAR OF N2
  DS ~ CHR;                           % MOVE SIX BITS TO N2
  SI ~ PRT25;   8(SI ~ SI - 21);      % POINT TO PRT (NO CHANGE)
  DI ~ A;                             % POINT TO ARRAY A (NO CHANGE)

In this approach, the N1 and N2 parameters have been replaced by a single parameter, N. It still relies on the value of N being an normalized integer, but that is easier to accomplish in the call than with the MOD operator. N1 and N2 are now declared as words local to the Stream Procedure; they will be allocated in the stack frame for that procedure. Actually, since Character Mode did not use the stack as such, these locals are passed as hidden parameters by the caller. This allocation is done automatically by the compiler. The hidden parameter words are guaranteed to have a zero value upon entry to the procedure.

The partitioning of the count N into div-64 (in N1) and mod-64 (in N2) portions works as follows:
  • The address of the parameter word for N in the stack is stored in the Source Index, SI. This is an example of proper use of the LOC keyword -- SI is loaded with the location (address) of N rather than the contents of N. That address is then advanced by six characters to point to the second-lowest order character in the word. Assuming the count will be less than 4096, that character holds the binary value of the count, div 64.
  • Similarly, the Destination Index, DI, gets the address of the word allocated for N1. This address is advanced by seven characters to point to the low-order character in the word.
  • One character is moved to the Destination String (DS). This implicitly references SI and DI, and advances both by the number of characters moved. That results in the div-64 value being stored in the low-order six bits of N1.
  • Next, DI gets the address of the word allocated for N2, and that address is advanced to the low-order character in that word.
  • One character is again moved to DS. SI was left pointing to the next (low-order) character of N by the prior move, so this moves the mod-64 value to the low-order six bits of N2.
At this point, the values of N1 and N2 can be used in the same manner as in the original procedure. There is another optimization that can be made, however. The original procedure moved the mod-64 portion of the data by means of "N2(DS~WDS)". This specifies a loop repeating N2 times, moving one word on each iteration. It is significantly more efficient to write this as "DS~N2 WDS", which moves N2 words in one operation, avoiding the loop management overhead.

A similar technique can be used with BINOCT to pass a single parameter for the count and have it partitioned into div-64 and mod-64 values within the procedure.

Now that we have the problems with the Stream Procedures corrected, the next issue is the Invalid Index interrupt that aborted the program. The fault occurred during evaluation of one of the list elements in the WRITE statement at line 55, which is part of the FOR J loop that starts on line 52. The termination message on the listing gives us a strong clue as to what the problem is:

-INVALD INDEX CASESW /PAULROS= 3, S =   3, A =  63, 201 GEQ 201

As mentioned earlier, "S=3, A=63" refers to the decimal segment number and word offset within the segment where the interrupt occurred. The "= 3" preceding that refers to the program's mix number (equivalent to a PID in Unix/Windows). "201 GEQ 201" describes the bounds violation. The index value used was 201, which being zero-relative, exceeds the 201-word length of the array.

The problem is either the terminating value for the FOR loop, or the dimensionality of the array B -- take your pick. The loop is attempting to format the words from the array A as octal values. The BINOCT call at line 51 does the translation from binary to a character string representing the octal values of the words, but since we are translating from three-bit octades to six-bit characters, the destination string must have twice as much space as the source. Therefore, B should have twice the length of A, but it doesn't:

ARRAY A[0:I], B[0:2|I];

The variable I has the value 100, so A has a length of 101, and B has a length of 201. Oops.

As the joke goes, there are only two hard things in Computer Science: cache invalidation, naming things, and off-by-one errors.

To fix this, we need either to make that length at least 202 or to terminate the loop one iteration earlier. The latter is probably the more correct solution (since we moved only I words from the PRT into A), but I chose the former. With either of those corrections, the program will complete without throwing the Invalid Index interrupt.

There is still one more bug in this program -- well, okay -- at least one more bug. If you look at the program's output on the original listing, it consists of two columns. The left column is intended to be the PRT offset in octal, with the right column containing the octal value of the word at that offset. Alas, the left column is all zeroes. The PRT offset is not being formatted properly.

The offset is formatted by the BINOCT call on line 54. One word is formatted from the address of J to two words at the address of Y. Since Z is declared immediately after Y, the second word of the formatted result (the one with the significant half of the offset) will be in Z. Alas, the WRITE statement at line 55 references Y, which coming from the high-order bits of J, contains all zeroes. The solution is simply to substitute Z for Y in the WRITE statement's list.

In sum, the fixes to the program from 1970 required the following changes (shown in red) to five lines:

13: ARRAY A[0:I], B[0:2|I+1];
19:   SI ~ PRT25;   8(SI ~ SI - 21);
50:   MOVEPRT (I, A[*], I DIV 64, (K ~ I MOD 64));
51:   BINOCT (I DIV 64, (K ~ I MOD 64), A[*], B[*]);
55:       WRITE (PR, F1, Z, B[J|2], B[J|2+1]); 

That is a lot of bugs for a 55-line program, but perhaps is not so bad for a first attempt. Considering what I have been able to learn from it, not only in 1970, but more recently in the retro-B5500 emulator project, I count this as a very successful effort. That I finally got the program to work properly is just gravy.


In addition to the original listing cited in Part 1, you may be interested in the following files and documents, generated from the retro-B5500 emulator. Note that the listings from 1970 were produced by a B5500 running the Mark X system software release, probably with some local-site patches. The emulator is running the base Mark XIII software release from late 1971, so you should expect to see some slight differences in the output.


About two weeks after my fling with the B5500 in October, 1970, I was released from the durance vile of that boring documentation project and transferred into a group at the Benchmark Facility at Burroughs Great Valley Laboratories, also in suburban Philadelphia. There, in preparation for eventually becoming the software tech at a customer site, I began learning a new machine, the Burroughs B6500. Now 43 years later, I'm still learning the B6500 -- in the guise of its modern direct descendants bearing the Unisys ClearPath logo.

That little piece of Algol discussed in these two posts was the last time I wrote a program for (or even used) a B5500 until I started working on the emulator two years ago. It's gratifying to have finally been able to get that last program right.

I have yet to code another SWITCH.

Sunday, March 23, 2014

SWITCH vs. CASE, Part 1

I wrote a small program in 1970, and finally got it to work a few months ago. Here's the story...

The story has gotten a little long in the telling, so I have divided it into two parts. The nature of the division will become clear shortly. This post represents Part 1. Part 2 will be published next week.


In the Spring of 1970, I graduated from the University of Delaware with a degree in Chemical Engineering and a well-developed aversion to anything having to do with chemistry, which persists to this day. I had become increasingly interested in computers and software, though. During my senior year, the University acquired a Burroughs B5500, which really grabbed my interest. To make a long story short, upon graduation I went to work for the Burroughs Corporation near Philadelphia, Pennsylvania.

Just as I began that job, an economic recession took hold in the United States, and Burroughs was hurting a bit. To avoid having to lay me off, my management assigned me to a technical documentation project for a message-switching system. That assignment was really boring, carried on into October, and offered no opportunities to program -- let alone use -- computers. It was depressing, but not having much in the way of options, I stuck with it.

Another group in the office was just starting a project to redesign a law-enforcement information system. That system was based on the B5500 and written in Algol. I would overhear the other group's discussions, and with my job not requiring a lot of concentration, occasionally pay attention. At some point a question arose within this group as to which was more efficient, the Algol SWITCH construct or the then-new CASE construct.
  • SWITCH is a standard Algol construct, and acts somewhat like an array of labels -- you use it with a one-relative index value in a <designational expression>, generally as part of a GO TO statement to implement a multi-way branch. In its simplest form it is similar to the "computed go-to" of FORTRAN or "go-to depending on" of COBOL. 
  • The B5500 Algol CASE statement was a Burroughs extension to standard Algol, and is much like a modern case or C-style switch statement, but the cases are not labeled -- each statement in the body of the CASE is implicitly numbered starting from zero, only one of which is selected for execution based on the index value. BEGIN/END pairs could be used, of course, to create a block or compound statement as one selection in the CASE body. 
  • A significant difference between the two is that using an out-of-range index value with a SWITCH effectively made its GO TO  a no-op. Using an out-of-range index with a CASE statement terminated the program.

I got caught up in this discussion, and someone (probably me, as I was desperate for an opportunity -- any opportunity -- to do some programming) suggested writing a test to try both approaches and examine the code generated by each one. In any case, I spent a little time playing hookey from what I was supposed to be doing and teamed up with one of the programmers from the other group, Rose, who had an account on the division's B5500. I probably volunteered to write the program and keypunch it. More likely, I begged to do it.

The Program

Young whippersnapper that I was, I decided to expand my charter a bit and try to find out more about the B5500 while I had the chance. I had done some Algol programming on the B5500 at Delaware, and as a student employee in the Computer Center there had even made some minor changes to the XALGOL compiler. I had only a rough idea, though, of the inner workings of the machine and its instruction set. Thus, I decided I would put in a few constructs I was interested in knowing more about, including Stream Procedures. Not only that, but in a moment of complete recklessness, I decided I would try to have the program dump its own PRT.

On the B5500, the PRT, or Program Reference Table, is an area of memory that stores the the global variables plus some information the MCP uses to manage execution of the program. In an Algol program, the PRT holds the declarations of the outer block; in COBOL, the Data Division declarations. The processor's R register points to the base of this area. The other major data area for a program is the stack, which immediately precedes the PRT in memory. That arrangement allows the R register to serve as a limit register for the S (top-of-stack) address register, to detect stack overflows.

In order to access the PRT as a vector of words, it was necessary to use a Stream Procedure. The intended purpose of Stream Procedures in Burroughs Extended Algol is to provide access to the processor's Character Mode capabilities. Character Mode operations consist largely of manipulations of a source address (SI, the Source Index) and a destination address (DI, the Destination Index), plus various ways to test data and transfer (or stream) it from source to destination. Most operations can start and end in the middle of a word; data transfers can take place in units of words, characters, or bits.

Character Mode was sort of a last-minute add-on to the design of the B5000, the original product that became the B5500. It was taken from the design for an earlier machine, possibly the 4111, which was never built. It was quite powerful, and gave efficient character and bit manipulation capabilities to what otherwise was a strictly word-oriented scientific machine. The most significant characteristic of Character Mode, though, is that all of the nice bounds protection built into Word Mode was bypassed. Stream Procedures could potentially address and manipulate anything in memory. This was a capability that could be used for good or evil, and it was. You could implement sophisticated parsing and data movement operations in Character Mode via Stream Procedures. You could also crash the system with it.

Thus, my self-expanded charter for that test program was more than a little dangerous, especially since I didn't really know what I was doing. I don't think Rose knew all of what I was trying to do, either. There was exactly one B5500 for my whole division of Burroughs. It was quite busy, doing everything from payroll to (quite literally) rocket science, and having it crashed by a new-hire trying to satisfy his curiosity would not exactly have been a career-enhancing move. Oblivious to these risks, I wrote the program, keypunched it onto cards, and put it in the inter-office mail to be run at division headquarters. Here is what the card deck would have looked like, using the character-coding conventions of our B5500 emulator:

 2: ?COMMON = 100
 3: ?DATA
 5: %    CASE VS. SWITCH      10/01/70              ROSE & PK
 7: INTEGER I, J, K;
 8: FILE OUT PR 18 (2,15);
 9: BEGIN         %%% INNER BLOCK %%%
10: REAL X, Y, Z;
11: LABEL L1, L2, L3;
12: SWITCH S ~ L1, L2, L3;
13: ALPHA ARRAY A[0:I], B[0:2|I];
14: FORMAT F1 (X20,O,X5,2O);
17:   VALUE N1, N2;
19:   SI ~ LOC PRT25;   SI ~ SI - 21;
20:   DI ~ A;
21:   N1(2(DS ~ 32 WDS));   N2(DS ~ WDS);
25:   VALUE N1, N2;
27:   SI ~ S;
28:   DI ~ D;
29:   N1(32(32(DS~ 3 RESET; 3(IF SB THEN DS ~ SET ELSE DS ~ RESET;
30:                           SKIP SB))));
31:   N2(16(DS ~ 3 RESET;   3(IF SB THEN DS ~ SET ELSE DS ~ RESET;
32:                           SKIP SB)));
35: L1:
36:   J ~ 3;   GO TO S[J];
37: L2:
38:   CASE J MOD 10 OF
39:     BEGIN
40:       J ~ 3;
41:       K ~ J;
42:       X ~ K +J;
43:       Y ~ X ~ SQRT(X);
44:       ;
45:       Z ~ 2|Y + 6.0;
46:       ;
47:       K ~ 5000;
48:     END CASE;
49: L3:
50:   MOVEPRT (I, A[*], I DIV 64, I MOD 64);
51:   BINOCT (I DIV 64, I MOD 64, A[*], B[*]);
52:   FOR J ~ 0 STEP 1 UNTIL I DO
53:     BEGIN
54:       BINOCT (0, 1, J, Y);
55:       WRITE (PR, F1, Y, B[J|2], B[J|2+1]);
56:     END;
58: END.
59: ?END
Note that in our emulator, we use the tilde (~) to represent the B5500 left-arrow for assignment, and the vertical bar (|) to represent the small-cross for multiplication.

There may have been an initial compile that had syntax errors -- I don't clearly remember -- but on 3 October 1970 this program compiled successfully and ran. The card deck above was reconstructed from a listing of that run, which you can view here. Since we were interested in how the two constructs worked at the machine level, that listing includes the generated code, enabled by the DEBUGN option on the $-card at line 4. The instruction mnemonics are described in the B5500 Reference Manual. The notations on the listing are mine from 1970.

Alas, while the program ran, it did not do what I had intended, and as you can see from the end of the listing, it aborted with an Invalid Index interrupt (i.e., an array bounds violation) at "S=3,A=63". That stands for "segment 3, word offset 63" (decimal). If you look on the listing at the four-digit numbers on the far right of the source code lines, offset 63 for segment 3 is within the code generated for sequence number 00051000 (the WRITE statement on line 55 in the card deck above). Specifically, the fault occurred at the instruction "0373 DESC 0036" (Descriptor Call on PRT offset 36 octal, the array "B"). The 0373 is an octal syllable offset from the beginning of the segment. At four syllables per word, offset 63 (decimal) times four is 252, which is 374 octal. Interrupt state is stored by the processor in a manner similar to that for a subroutine call, so the "return" address for the interrupt is one after the syllable that caused the fault. No return is possible from this type of error, however.

It turns out that I was really lucky with this run, because the Stream Procedure MOVEPRT did not work properly at all. Fortunately, what it did was benign, and while it accessed memory locations it shouldn't have, at least it did not overwrite anything it shouldn't have. The invalid index did not have anything to do with the dumb mistakes I made in MOVEPRT -- that was due to an entirely different dumb mistake, as will be seen in Part 2.

The remainder of this Part 1 post will analyze how the SWITCH and CASE constructs work, and what we learned about them from this little program. Next week, Part 2 will analyze the dumb mistakes I made in writing the program, what actually happened when it ran, and what it took to make the program work properly.

SWITCH Declarations and Invocations

The bulk of the program above consists of an inner block that begins at line 9. Execution for that inner block begins at line 35. The SWITCH S is declared at line 12 to select among three labels, L1, L2, and L3, based on index values 1, 2, and 3, respectively. That SWITCH is used on line 36, where the integer variable J is set to 3 just before the statement "GO TO S[J]".

If you look at the listing, the SWITCH declaration generates some code. The way that code works will be clearer if we first examine what happens when the SWITCH is invoked. From line 36 in the program:

J ~ 3;   GO TO S[J];
    0160  LITC  0003  0014
    0161  LITC  0026  0130
    0162  ISD         4121
    0163  OPDC  0026  0132
    0164  LITC  0021  0104
    0165  ISD         4121
    0166  LITC  0035  0164
    0167  LBU         6131
    0170  NOP         0055
    0171  NOP         0055
    0172  NOP         0055

The first three syllables handle the assignment to J by (a) pushing the literal value 3 onto the stack, (b) pushing a literal value for PRT offset 26 octal (J) onto the stack, and (c) executing the Integer Store Destructive (ISD) syllable. The "destructive" indicates that both the PRT offset and value will be deleted from the stack when ISD completes.

The code for the SWITCH invocation starts at offset 0163:
  •  The Operand Call (OPDC) syllable copies the value of the switch index at PRT offset 26 octal (J) and pushes that value onto the stack. Then the offset for PRT location 21 octal is pushed, followed by another integer-store syllable. This copies the value of J into a word in the lower part of the PRT that is reserved for the MCP and compilers -- variables in the outer block of an Algol program are assigned higher PRT locations beginning at 25 octal.
  • At offset 0166, the value 35 octal is pushed onto the stack, followed by a Long Backward Unconditional (LBU) branch instruction. The "long" branches compute a destination address by taking from the stack an offset in words that is relative to the location of the branch itself, and branching to the first syllable in that destination word. The LBU is at offset 0167 octal, which is word 35 octal plus syllable 3. Going backwards 35 octal words lands us at offset 0 in the code segment, which is the start of the code for the SWITCH declaration, discussed immediately below.
  • The No Operation (NOP) syllables after the LBU have a purpose, which will be explained shortly.

Note that the code shown above is not quite what you will see on the listing. The Burroughs Algol compilers were (and still are) strictly one-pass affairs. The compiler generates code as it is reading and parsing the input source lines. This means that forward references, such as a branch to a point in the program that has not been encountered yet, cannot be resolved until later.

To deal with this, the compiler resorts to what I call "back-patching." When it encounters a forward reference, it makes an entry in its symbol table for the as-yet unresolved destination point and reserves syllables at the current place in the instruction stream for the instructions that will ultimately need to go there. It often stores linkage data in that reserved space, so that multiple references to the same unresolved address can be chained together. After the compiler encounters the destination point in the input source, it reaches back into the previously-emitted instruction stream to fix up the syllables that had been reserved earlier, overwriting those syllables with the correct opcodes and address offsets.

This behavior can be really confusing when you first look at it in a code listing, especially since the data that is initially emitted in the reserved spaces is often formatted as if it were instructions, and the fix-ups are output in the code listing intermixed with whatever else the compiler is generating at the moment. You have to pay attention to the octal offset on the left side of the lines of generated code to understand what is really happening. In the examples here, I have unraveled all of that out-of-order generation so the code will be easier to follow.

With that introduction, the code generated for the SWITCH declaration is this:

SWITCH S ~ L1, L2, L3;  
    0000 LITC 0000 0000  
    0001 OPDC 0021 0106  
    0002 GEQ       0125  
    0003 OPDC 0021 0106  
    0004 LITC 0003 0014  
    0005 GTR       0225  
    0006 LOR       0215  
    0007 OPDC 0021 0106  
    0010 DUP       2025  
    0011 ADD       0101  
    0012 BFC       0231  
    0013 LITC 0156 0670  
    0014 BFW       4231  
    0015 LITC 0031 0144  
    0016 LFU       6231  
    0017 LITC 0033 0154  
    0020 LFU       6231  
    0021 LITC 0047 0234  
    0022 LFU       6231  

The LBU syllable from the SWITCH invocation branches to offset 0000 in the SWITCH declaration.
  • The SWITCH declaration begins by pushing a zero onto the stack, followed by the value from PRT offset 21 (the copy of the value of the switch index, J). 
  • The Greater-Than-or-Equal (GEQ) syllable tests whether the second word in the stack is greater than or equal to to the top word in the stack; if so it pushes a one in the stack, otherwise it pushes a zero. In both cases, the original values are popped from the stack before the result value is pushed. On the B5500, a binary value is considered to be "true" if its low-order bit is set, so the zero and one values correspond to Algol FALSE and TRUE, respectively. All other bits in the word are ignored. Another way to remember this is that on the B5500 and its descendants, the truth is always odd.
  • At offset 0003, the value of the switch index at PRT location 21 octal is again pushed onto the stack, followed by a literal 3 and the Greater-Than (GTR) syllable. This works just like GEQ except for the difference in the relation of the two top-of-stack values being tested. 
  • We now have two Boolean values on the top of the stack, which are combined using the Logical OR (LOR) syllable. Once again, both original values are popped from the stack before the result value is pushed. What these two tests and the LOR have accomplished is to determine whether zero is greater-than-or-equal to the switch index (i.e., index<1) or the index is greater than 3. Since the SWITCH has three elements, this tests whether the index falls outside the valid range for the SWITCH.
  • With the result of the LOR remaining on top-of-stack, an OPDC 21 at offset 0007 pushes another copy of the index on top of that, followed by Duplicate (DUP) and Add (ADD) syllables. DUP simply makes a copy of the word on top-of-stack, and the ADD sums the two copies, effectively multiplying the value of the index by two. 
  • This is followed at offset 0012 by a Branch Forward Conditional (BFC) syllable. Unlike the "long" branch above which only branches to word boundaries, this is a syllable-oriented branch. The word at top of stack is the number of syllables to branch, relative to the location of the syllable after the branch. With a conditional branch, the second word in the stack holds the condition, which in this case is the result of the LOR above. The branch takes place if the condition is false (i.e., its low-order bit is zero), which is what you want for IF statements. Whether the branch occurs or not, both the branch offset and condition are popped from the stack.
  • Following the conditional branch, starting at offset 0013, are four pairs of literal-call/branch syllables. If the condition resulting from the LOR above is true (i.e., the index is out of bounds), the branch will not take place, so control simply proceeds in sequence. This will result in a branch forward of 156 octal syllables, or to offset 173 octal, which is the first instruction after the code generated for the SWITCH invocation on line 36. Thus, if J is out of bounds, the GO TO on line 36 is effectively a no-op, as the semantics of Algol require.
  • If the conditional branch is taken due to J being within bounds, one of the next three pairs of literal-call/branch syllables will be selected. As it takes two syllables to effect a branch -- one to push an offset and one to do the branch -- the value of the switch index had to be multiplied by two (DUP, ADD) to obtain the correct offset. The LFU opcode is a Long Forward Unconditional word-oriented branch. 
  • Note that the literal values for the relative offsets used by the branches will take control to the locations for labels L1, L2, and L3, respectively. The compiler emits NOPs as necessary to align the code following these labels on word boundaries, as the long branches require.

So much for how the SWITCH works in this program. But wait -- it is possible to use a SWITCH in multiple places in a program. In the code above, if the switch index is out of bounds, there is a branch to a fixed location. How can that be right if you use the SWITCH more than once?

The answer to that is an example of back-patching at its finest. For the first invocation of the SWITCH, the compiler generates exactly what is shown above. If it encounters a second invocation of the SWITCH, the compiler changes its strategy and fixes up both the code for the SWITCH declaration and first use of the SWITCH to use the new strategy. The fixed-up code for the SWITCH declaration will look like this:

SWITCH S ~ L1, L2, L3;  
    0000  LITC  0000  0000  
    0001  OPDC  0021  0106  
    0002  GEQ         0125  
    0003  OPDC  0021  0106  
    0004  LITC  0003  0014  
    0005  GTR         0225  
    0006  LOR         0215  
    0007  OPDC  0021  0106  
    0010  DUP         2025  
    0011  ADD         0101  
    0012  BFC         0231  
    0013  LITC  0017  0074  
    0014  RTS         1235
    0015  LITC  0034  0160  
    0016  RTS         1235
    0017  LITC  0035  0164  
    0020  RTS         1235
    0021  LITC  0036  0170  
    0022  RTS         1235  

and an invocation of the SWITCH will look like this:

J ~ 3;   GO TO S[J];
    0160  LITC  0003  0014
    0161  LITC  0026  0130
    0162  ISD         4121
    0163  OPDC  0026  0132
    0164  LITC  0021  0104
    0165  ISD         4121
    0166  OPDC  0032  0152
    0167  XRT         0061
    0170  LOD         2021
    0171  BFW         4231
    0172  NOP         0055

The differences from the original code are shown in red. What the compiler has done is convert the code for the SWITCH into a subroutine.
  • It is not shown here, but the compiler also allocates four additional PRT locations to hold Program Control Words (PCWs) for the entry point to the subroutine that SWITCH S has become and the locations of the labels L1, L2, and L3. Like all control words for the B5500, these will have their high-order (flag) bit set. The branch syllables are sensitive to the flag bit and can accept either an integer offset or a PCW on the top-of-stack as a branch destination. Executing an OPDC that references a PCW results in a subroutine call to that code location (OPDC is the do-it-all kid -- it will also index arrays if its top-of-stack operand is a data descriptor).
  • With the code for the SWITCH declaration reconfigured as a subroutine, testing the bounds of the index and selecting a pair of syllables to branch works the same as before, but instead of directly branching, the selected syllables return a PRT offset value as the subroutine result. RTS is the Return from Subroutine syllable, which takes the value on top-of-stack (the result of the selected LITC syllable in this case), cuts back the stack used by the subroutine, and branches to the return address, leaving the original top-of-stack value at the new top-of-stack location.
  • The code that invokes the SWITCH now does an operand call on the PCW for the SWITCH's subroutine, which will return the PRT offset for one of the label PCWs, as selected by the index value. XRT (Set Variant) extends the range of PRT addressing for the next syllable (it is not needed in this small program, but might be in programs having more than 512 words in their PRT). LOD (Load Operand) takes a PRT offset on top-of-stack and replaces it with the value at that PRT location, which will be one of the label PCWs. BFW (Branch Forward) is an unconditional syllable-oriented branch. It can accept either an integer offset or a PCW as its operand.
  • Note how the NOPs at offset 0170-0171 have been overwritten by the extra code needed to implement the subroutine-based switch invocation code. The one-pass compiler could not know whether the SWITCH would be referenced more than once, so initially emitted code that efficiently supported the simpler scenario.

In this example, the PCW for the SWITCH's subroutine is at PRT offset 32 octal. That subroutine will return one of 17, 34, 35, or 36 octal, depending upon the value of J. Offsets 34, 35, and 36 represent the PCWs for labels L1, L2, and L3 respectively, but offset 17, which is returned if J is out of bounds, is in the area of the PRT reserved for the MCP and compilers. What is that? It turns out that the word at PRT offset 17 octal always contains a zero. Thus, if the SWITCH index is out of bounds, the BFW will branch forward zero syllables, which is effectively a no-op, and control continues with the next statement in the program.

Switches can be even more complex than we have seen here, as the elements of a SWITCH declaration can themselves be designational expressions (e.g., invocations of some other SWITCH) which must be evaluated at run time. Investigation of how that works is left as an exercise, dear reader, to you.

CASE Statements

In contrast to SWITCH, the implementation of CASE statements is simple and straightforward. For each statement (which may be a compound statement or a block) that is immediately subordinate to the CASE statement, the compiler determines a relative branch offset. It then constructs a one-dimensional array of those offsets, indexed by the zero-relative case value. The array is stored in the object file for the program, and is pointed to by a data descriptor that is placed in the PRT. Indexing that descriptor by the CASE expression yields the offset to the appropriate statement.

The CASE statement in the program above has seven subordinate statements, and the compiler generates syllable offsets of 0, 5, 10, 17, 40, 26, 40, and 35 decimal, respectively, for them. The code to execute the CASE statement looks like this (again, with the back-patching unraveled for clarity):

      0170  OPDC  0026  0132
      0171  LITC  0012  0050
      0172  RDV         7001
      0173  OPDC  0042  0212
      0174  BFW         4231
    J ~ 3;
      0175  LITC  0003  0014
      0176  LITC  0026  0130
      0177  ISD         4121
      0200  LITC  0043  0214
      0201  BFW         4231
    K ~ J;
      0202  OPDC  0026  0132
      0203  LITC  0027  0134
      0204  ISD         4121
      0205  LITC  0036  0170
      0206  BFW         4231
    X ~ K +J;
      0207  OPDC  0027  0136
      0210  OPDC  0026  0132
      0211  ADD         0101
      0212  LITC  0032  0150
      0213  STD         0421
      0214  LITC  0027  0134
      0215  BFW         4231
    Y ~ X ~ SQRT(X);
      0216  MKS         0441
      0217  OPDC  0032  0152
      0220  OPDC  0043  0216
      0221  LITC  0032  0150
      0222  SND         1021
      0223  LITC  0033  0154
      0224  STD         0421
      0225  LITC  0016  0070
      0226  BFW         4231
    Z ~ 2|Y + 6.0;
      0227  LITC  0002  0010
      0230  OPDC  0033  0156
      0231  MUL         0401
      0232  DESC  1777  7777
      0233  ADD         0101
      0234  LITC  0034  0160
      0235  STD         0421
      0236  LITC  0005  0024
      0237  BFW         4231
    K ~ 5000;
      0240  DESC  1777  7777
      0241  LITC  0027  0134
      0242  ISD         4121
      0243  LITC  0000  0000
      0244  BFW         4231

The statement begins by computing the CASE index. The OPDC pushes the value of J onto the stack, LITC pushes the value 10 decimal onto the stack, and RDV (Remainder Divide) implements the MOD operator. The heavy lifting is done by the next syllable, another OPDC for PRT offset 42 octal:
  • That PRT location contains the data descriptor for the array of syllable offsets. 
  • When OPDC detects that its operand on the top-of-stack is an unindexed data descriptor, it applies the second word in the stack as an index to that descriptor, computes the memory address of the indexed element of the array, and loads the value at that address onto top-of-stack after popping both the descriptor and index value. 
    • A data descriptor contains the length, address, and presence status of the data for the array it describes.
    • If the value of the index value is less than zero or greater than or equal to the length of the array stored in the descriptor, OPDC will raise an Invalid Index interrupt and quit. Unless this fault is trapped by the program, the MCP will terminate the program.
    • Initially, the presence status of the descriptor is "absent" to indicate that the contents of the array it describes are not present in memory. Thus, the first time we execute this CASE statement, the OPDC will simply raise a Presence Bit interrupt and quit, allowing the hardware to branch to the appropriate interrupt vector.
    • The MCP will respond to the P-bit interrupt by allocating an area in memory of appropriate size (seven words in this case), reading the array element values from the object file on disk into that new area, fixing up the descriptor with the address of the new area, setting the presence bit [2:1] in the descriptor to indicate it now points to a real memory address, and exiting back into the program to restart the OPDC syllable, which this time will complete the index-and-load operation.
    • It is possible that the array of syllable offsets may be forced out of memory later due to pressure from other memory allocation activity, in which case the MCP simply deallocates the memory area and fixes up the descriptor for it in the program's PRT to point back to the copy of the data in the object code file. The area does not need to be written out to disk, since the system considers it to be read-only, and therefore not dirty. The next time the CASE statement is executed (if ever), the same P-bit process will bring the array back into memory from the code file.
The final result, whatever machinations are involved, is that the syllable offset ends up on top-of-stack. The syllable after the OPDC, BFW, uses that offset to branch to the beginning of the selected subordinate statement in the CASE statement. The BFW is at syllable offset 174 octal in its code segment, so the offsets in the array identified above would yield branch locations of 175, 202, 207, 216, 245, 227, 245, and 240 octal, respectively. Recall that syllable branches are relative to the syllable after the branch.

After the end of each of the subordinate statements, the compiler inserts a branch to the code for the statement following the CASE statement. These are shown in blue in the code above.

Note that two of the subordinate statements in the CASE statement are empty statements, represented by just their delimiting semicolons. For these the compiler generates no code, just an offset that branches around the CASE statement to the syllable at 0245.


So which is better, SWITCH or CASE? I have forgotten what Rose and I concluded in 1970, but looking at it afresh, my answer now is that it depends on what you are trying to do and how you are using the multi-way branch. Both constructs are very efficient in certain cases.

SWITCH must store the index value in PRT cell 21 octal and then fetch it three times. If the SWITCH is referenced more than once in a program, it must be invoked in a subroutine call, which is not an inexpensive operation. Those issues aside, the actual dispatch to a location based on the index value is very efficient. You can also do things with a SWITCH that you cannot with a CASE statement, including nesting designational expressions that have expressions for indexes, all of which will be evaluated dynamically at the time the SWITCH is invoked.

CASE is more efficient in the way that it computes the ultimate branch location, but there is significant overhead the first time it is used to handle the P-bit interrupt for its array of offsets and bring it into memory. That overhead can be repeated later, possibly multiple times, if the array is pushed out of memory. CASE is an in-line construct, so it is not subject, lexically, to multiple references. Of course, code segments can be pushed out, too, so the code for the SWITCH is not entirely immune to overlay by memory allocation pressure.

Using an out-of-bounds index with a SWITCH results in no branch occurring at all. This can be considered a feature or a bug, depending on your point of view. Because CASE obtains its branch offset by indexing an array, an out-of-bounds index causes an Invalid Index interrupt, which unless trapped, will abort the program. The programmer may need to insert bounds tests before the CASE statement to protect against aborts. Eventually, a numbered CASE statement with provision for a default case was implemented in Algol for the B6700/7700, but that does not appear ever to have been (officially) implemented for B5500 Algol.

I matured as a programmer during the post-Dijkstra, Go-To-Considered-Harmful era, so my personal preference would be to ignore the relatively minor performance-difference issues, use the CASE statement, and eschew the SWITCH, labels, and go-to statements altogether.

That completes the analysis of CASE vs. SWITCH. Tune in next week for Part 2, where I will deconstruct my less-than-stellar 1970 programming abilities, show what went wrong in the execution of the program, and demonstrate how to fix it.


In addition to the original listing, cited earlier in this post, you may be interested in the following files and documents, generated from the retro-B5500 emulator. Note that the listings from 1970 were produced by a B5500 running the Mark X system software release, probably with some local-site patches. The emulator is running the base Mark XIII software release from late 1971, so you should expect to see some slight differences in the output.
    The original card deck, as displayed above.
  • CASESW-PAULROSE-20131106-OUTPUT.txt --
    Printer output from the emulator for that original card deck, with a compile listing showing the generated code, incorrect PRT dump, and Invalid Index abort. The emulator had exactly the same problems with my program that a real B5500 did.

Sunday, January 19, 2014

Emulator Version 0.19 Released

Nigel and I are pleased to announce that version 0.19 of the retro-B5500 emulator was released on 10 January. All changes have been posted to the Subversion repository for our Google Code project. The hosting site has also been updated with this release, and for those of you running your own web server, a zip file of the release can be downloaded from Google Drive.

This release supports the ability to write to magnetic tape and to persist tape image data created by the emulator.

Background on Magnetic Tape Images

The emulator has had an initial magnetic tape implementation for the past few months. That initial implementation supported only reading tape images, and only from images in the ".bcd" format. This release expands on that initial one by supporting additional tape image formats and the ability to write to a tape image.

Tape images exist in the form of files in your local file system. The initial implementation allowed you to select an image file, which the emulator would load into memory and then parse into blocks in response to read commands from the I/O Unit. The ".bcd" format was originally supported because that was the one used for Sid McHarg's Mark XIII Burroughs release tapes.

Since the retro-B5500 emulator runs in a web browser, we must live with some limitations on the way that the browser is allowed to interact with your local file system. Modern browsers can read just about any file the user can select from a "file picker" form control, but at present there is not a useful way for a browser to write files under program control to the local file system. This restriction exists for very good security reasons -- without it, hackers would be able to play havoc with your local files -- but it makes getting data out of the browser environment difficult.

There are File System and FileWriter APIs that have been proposed as part of the HTML5 standards initiative, but they have two problems. First, only the Google Chrome and Opera browsers presently support them. Second, the files they create are "sand-boxed," i.e., the files exist within a virtual file system that is managed by the browser. Different browsers may format and manage this virtual file system differently. Worse, the files are generally hidden deep within the user profile directory used by the browser, are not named in obvious ways, and are difficult to find and use outside of the browser environment.

The only reasonably practical way we have found to output data from a browser is to format the data as text within a window and allow the user either to save the contents of that window to their local file system, or to copy/paste the contents into another program, such as a text editor, and save it from there. This process is neither automatic, nor very convenient for the user, but it is safe, and it works with all browsers.

Therefore, this new implementation for magnetic tape supports reading multiple tape image formats, but will output tape image data in only one format, ASCII text, as described in more detail below.

Loading Tape Images

Within the emulator, all tape image files are loaded in their entirety into a memory buffer. All tape reads and writes operate on that memory buffer. Depending on the block size used, a 2400-foot reel of 7-track tape could hold between about two and 22 million characters. It would probably be appalling to the original designers of the B5000 and B5500 that we would implement tape devices by sucking all of the data into "core" memory and operating on it exclusively there, but that turns out to be the easiest and most effective way to do it.

To support multiple tape image formats, the process of loading an image into a tape drive has changed slightly from the original implementation. The file picker control on the tape drive window has been replaced by a text box that will display the name of the image file after it is loaded. When you click the LOAD button on the tape drive window, a small dialog window now opens instead of activating the file picker as before. This window contains the file picker, a Format drop-down list to select the image format, a Write Ring check box to indicate whether the image is writable, and a Tape Length drop-down list to select the length of the tape reel:

To load an existing tape image, simply select it using the file picker. The emulator will attempt to infer the image format from the file name extension, as follows:
  • ".bcd" implies a ".bcd" tape image.
  • ".tap" is reserved for supporting the "tap" (or "taput") format in the future. See http://simh.trailing-edge.com/docs/simh_magtape.pdf for a specification of this format. It is not implemented at present.
  • All other extensions presently imply the "ASCII Odd Parity" format described below under Unloading Tape Images.
You can override the inferred tape format by making a selection from the Format drop-down list. Regardless of the format of a tape image you load into the emulator, it converts the image to the ".bcd" format and uses that format internally. The options presently available are:
  • (blank tape) -- This will load a blank tape image into the drive. Any file you previously selected with the file picker will be ignored.
  • .bcd image -- This will interpret the bytes in the selected file as a ".bcd" image and load it into the drive without conversion.
  • ASCII Even Parity -- This will interpret the bytes in the selected file as ASCII characters and convert them to an internal ".bcd" image with even-parity BCL codes.
  • ASCII Odd Parity -- This will interpret the bytes in the selected file as ASCII characters and convert them to an internal ".bcd" image with odd-parity B5500 Internal Code (BIC) codes.
Next to the Format list is a check box labeled Write Ring. If this box is checked, the internal tape image will be writable. The tape image on disk will not be affected by writing unless you later save the image over the original file when the tape is unloaded. If you merely want to read the tape, leave this box unchecked.

The last control is another drop-down list labeled Tape Length. It is relevant only when Write Ring is checked and is disabled otherwise. This specifies the length of tape on the virtual "reel" loaded into the drive. It defaults to 2400 feet, but lengths of 1200 and 600 feet are also selectable. The primary purpose of the length is to allow the drive to report it has passed the reflective EOT (end of tape) marker, which will signal the MCP that it is near the near to the end of the tape and that it should switch to a new reel. When Write Ring is not checked, the length of the tape is inferred from the size of the tape image data.

To load a blank tape into the drive, do not select a file using the file picker. Select a format of "(blank tape)," which is the default. The Write Ring box will be checked automatically and the tape length will default to 2400 feet. You may select a different tape length, if you wish. Any image file you previously selected with the file picker will be ignored.

If the Format list has a value other than "(blank tape)," you must select a file before clicking the OK button. When a file is selected using the file picker, the Format value is inferred from the file name extension, the Write Ring box is unchecked, and the Tape Length list is disabled. You may then alter the values of these controls before proceeding. The tape image is not loaded into memory until you click the OK button.

Unloading Tape Images

When a tape image is loaded with the Write Ring box unchecked, the image will be read-only, and writing to it will be inhibited. Later, when you click the UNLOAD button, that image data will simply be discarded by the drive, as it is still in the file from which it was loaded.

When the Write Ring box for an image is checked, the image will be writable, whether it was loaded as a blank tape or from an existing image file. Later, when you click the UNLOAD button, two things can happen:
  1. If the tape was not actually written to, the emulator will simply discard the image data, as for the case of a read-only image above.
  2. If the tape was written to, the emulator will ask whether you want to save the altered image data.
    • If you cancel out of this dialog, the image data will be discarded.
    • If you reply OK, however, the emulator will then open a new window and format the contents of the tape image as ASCII text. This may take several seconds for a full, 2400-foot reel of tape.
    • After the tape image text is formatted in the window, you may save it or copy/paste the data into another program. If you paste the text into another program, make sure that program will not trim trailing spaces from lines or compress multiple spaces using tab characters.
    • When you are finished with the image data, simply close the window.
 The ASCII tape image format is very simple:
  • The six-bit BCL or BIC character codes are translated to ASCII using the emulator's standard convention for the B5500 characters that do not have ASCII equivalents ("~" for left-arrow, "|" for the multiplication sign, "!" for not-equal, "{" for less-than-or-equal, and "}" for greater-than-or-equal).
  • Each tape block becomes one line of ASCII text. Lines are delimited by line-feed or carriage-return/line-feed characters, depending on the convention used by your browser.
  • An end-of-file (tape mark) is represented by a block containing a single "}" character.
  • All other blocks should be at least seven characters long, although the emulator does not enforce this.
Both even- and odd-parity tape image data converts to the same ASCII representation. The parity data is lost during this conversion.

An ASCII tape image may be loaded into a tape drive later as either "ASCII Even Parity" or "ASCII Odd Parity" format. These two formats will apply the appropriate parity to the data when it is converted to the ".bcd" format used by the drive internally.

Because the ASCII tape image format is just text, you can create tape images outside the emulator, either programmatically, or manually using an ordinary text editor.

See the Using the Magnetic Tape Drive wiki for more details on tape drive operation and the format of currently-supported tape image files.

Monday, December 30, 2013

Emulator Version 0.18 Released

Nigel and I are pleased to announce that version 0.18 of the retro-B5500 emulator has been released. All changes have been posted to the Subversion repository for our Google Code project. The hosting site has also been updated with this release, and for those of you running your own web server, a zip file of the release can be downloaded from Google Drive.

This is a bug-fix release. The only significant new feature is the ability to dump the system processor and memory state to a window in the B5500Console and B5500SyllableDebugger interfaces, as described below.

Processor Bugs Fixed

The following corrections were made in B5500Processor:
  1. The TRN (Transfer Numerics) syllable was not clearing zone bits in the destination character string as it should have. In addition, it was not always setting the True-False Flip Flop (TFFF, also known as MSFF) properly.
  2. The TRW (Transfer Words) syllable was incorrectly throwing an Invalid Address interrupt when the source string ended at an address which did not have a valid next address (e.g., the highest memory address, @77777).
  3. A call on cc.fieldIsolate() in the ISO (Field Isolate) syllable had an extraneous parameter. Thanks to Peter Grootswagers for identifying this.

More Corrections for Tape Drive Oscillation

We thought we had all of the issues with tape drives oscillating between rewind and ready states when tape was at BOT resolved in release 0.17, but alas, that proved not to be the case. Part of the problem was that the I/O Unit did not have tape writes implemented yet, but the rewind and interrogate functions are degenerate forms of tape write. As a result, result descriptors for Mod III I/O Units were not being properly constructed in some cases.

We also discovered, under Sid McHarg's advice, that the MCP is capable of detecting Mod III I/O Units automatically, but the problems mentioned above with the way that result descriptors were being constructed prevented that. With these fixes in place, that automatic detection now works properly, and the "SO USE OPTN 2" command described in the blog post for the 0.17 release is no longer necessary.

Dumping the System State

We have not had a good way to look at the complete system state up to now. The SyllableDebugger script provides some means to do this, but it is awkward to use when having to look at many widely-dispersed memory locations, and it's of no use at all when running the standard B5500Console user interface.

With this release, you can now dump the complete system state. In the SyllableDebugger, there is a Dump button near the top of the page. In the B5500Console interface, you can click the MEMORY CHECK indicator. Both methods will open a separate window and format both the processor state and the contents of core memory in that window. You can examine the dump in the window, save the window's contents, or copy/paste the contents into another application (such as a text editor) for further analysis. When you are finished with the dump, simply close its window.

These dumps can be requested any time the emulator is in a powered-on state. You can generate multiple dumps and have multiple dump windows open at a time. The emulator will pause while the dump is taking place and then resume operation. Thus, you can easily take a snapshot of the system state while the MCP is running, and then continue.

Further Work

We are continuing to see occasional flag-bit and invalid-address errors when running the MCP under heavy load. At present, we are working under the assumption that these are due to emulator bugs, and are pursuing them as a priority. The system state-dump feature mentioned above was implemented to facilitate this.

The next major feature scheduled for implementation is writing to tape, along with providing a means to persist tape volume data outside the emulator environment. After that we are considering implementing a more dynamic and user-friendly mechanism for controlling the system configuration, and implementing (finally) the double-precision numeric syllables.

Monday, December 16, 2013

Emulator Version 0.17 Released

Nigel and I are pleased to announce that version 0.17 of the retro-B5500 emulator has been released. All changes have been posted to the Subversion repository for our Google Code project. The hosting site has also been updated with this release, and for those of you running your own web server, a zip file of the release can be downloaded from Google Drive.

This release contains one really big thing and a number of smaller things.

P2 Lives!

The really big thing is that the second processor, P2, is now functional. Getting this to work is especially gratifying to me, as it is an important feature of the B5000/B5500 and I have wanted it to work from the beginning. It has been very frustrating trying to get this going, as there are only a few places in the processor design where it needs to know whether it's P1 or P2, and it should have just worked. It didn't.

I had made several serious runs at this problem over the past six months, but made very little progress until one morning a few weeks ago when I awoke early. Lying in bed, I started thinking about an intermittent problem we have seen with the system under load and how to trap it. In the middle of those thoughts, it suddenly came to me why P2 wasn't working. It wasn't so much something to do with the design of the emulator's Processor object as with the way we interleave the activities for all of the processors, I/O units, peripheral devices, and console display on one Javascript thread. Without going into a whole lot of details, the way that interleaving was being done interfered with the way we need to start and stop P2. Changing less than ten lines of code fixed the problem.

The processors on a B5500 are physically identical, with a manual switch in Central Control determining which one is P1 (the control processor) and which is P2. Unlike the B6500/B6700 that followed it (and most multiple-processor systems today), the processors on the B5500 do not operate symmetrically. P1 is the control processor. Only P1 can run in Control State, service interrupts, and initiate I/Os. Much of the MCP runs only on P1.

P1 can also run in Normal State, which is used for user tasks. P1 enters Normal State during an Initiate P1 (IP1) syllable, which effectively assigns the processor to a user task by automatically loading state from a few words in the task's stack. P1 reverts to Control State when any interrupt occurs, automatically storing the task's state in the stack and PRT in the form required by IP1 for reactivation.

P2, on the other hand, can run only in Normal State, and thus can be used only to run user tasks. The MCP (running in Control State in P1) assigns P2 to a user task by means of the Initiate P2 (IP2) syllable, which functions similarly to the IP1 syllable. P2 runs until it detects one of its internal interrupts or the MCP running in P1 executes a Halt P2 (HP2) syllable. When one of those events occurs, P2 stores its state in the stack and PRT as P1 does and then idles, waiting for the next IP2 to occur. P2 does not sense external interrupts (such as the timer and I/O completions) and is not directly affected by them. They are handled by P1 instead.

P1 never idles until the system is halted -- its equivalent of idling is for the MCP to continuously execute its NOTHINGTODO loop, waiting for some interrupt or change in system status to occur.

The design of the MCP is intended to assign user tasks preferentially to P2 if it is available. If there is more than one user task ready to run and P1 does not have any Control State duties at the  moment, then both processors can simultaneously run user tasks. When running a single program in the mix, we have seen the program bounce between the processors, but if there is enough processor-bound work in the mix, then P2 will stay busy. Every time P2 senses an internal interrupt or needs some MCP service (which it signals by triggering a COM interrupt), P2 stores its registers and idles. If there is another task waiting for a processor, the MCP will generally initiate P2 again right away on it.

Compared with today's sophisticated SMP system designs, the multiple-processor mechanism on the B5500 probably seems a little clunky and inefficient, and in hindsight it was. But then, even multi-tasking was rare and controversial in the early 1960s, and for the B5000/B5500 to support both that and multi-processing was so far out that Burroughs was often accused of faking it when the capability was demonstrated. It did work, though, and was quite effective if you had a sufficiently processor-bound workload and the memory to support that workload. The two main reasons P2 can be underutilized are insufficient memory (generally resulting in MCP overhead and I/O to overlay memory to and from disk) and high levels of requests for MCP services, such as I/O, both of which can handled only by P1.

You can enable second-processor functionality for the emulator by a setting in the emulator/B5500SystemConfiguration.js script. Set both PA and PB to true, and set PB1L to indicate which processor is P1. The other will be P2. At present, the second processor is disabled by default in the standard release.

Those of you using a hosted web site are not able to modify that configuration script, so we have implemented a temporary workaround to allow you to enable the second processor:

  • Clicking the NOT READY lamp on the Console display will toggle the availability of Processor B in your local copy of the system configuration. When enabled, it will run as P2.
  • When P2 is enabled in this way, the version level to the left of the "retro-B5500" logo will be yellow. When P2 is disabled, the logo will be white as usual.
  • The availability of P2 is evaluated only when the POWER ON button on the Console is clicked, so it is best to change the status of P2 when the emulator is in a powered-off state.
When the system is running and P2 is disabled, the P2BF annunciator (P2 Busy flip-flop) on the Console will be constantly lit. This lamp reflects a flip-flop in Central Control that prevents P1 from initiating P2, and generates a P2 Busy interrupt if P1 attempts to do so.

When the system is running and P2 is available, you should see P2BF turn on and off in concert with the B NORMAL lamp (assuming Processor B is configured as P2) as the MCP assigns user tasks to the processor. You may also see the HP2F (Halt Processor 2 flip-flop) annunciator flash occasionally as the MCP stops P2, usually to assign another task to it after the current task's time slice has expired.

You will need a fairly fast system to run both processors simultaneously at full speed. If the P1 Slack indicator on the emulator console is not at least 50% when running one processor, then your system may not be powerful enough. My one-year old, 3GHz, quad-core Dell Optiplex 390 running 64-bit Windows 7 Pro handles this with ease, as does a six-year old Mac Mini with an Intel Core 2 Duo processor running OS X Snow Leopard. An eight-year old Dell Optiplex GX 520 with a 2GHz Pentinum P4 running Windows XP is borderline. All of these tests were done using Firefox 24.0. Dual B5500 processors will still work on less powerful hosts, but the entire emulation will run slower than it otherwise would.

Tape Drive Oscillation at Load Point

Since the initial magnetic tape drive implementation was released in 0.15, a few people have been bedeviled by some odd behavior when a drive is made ready after loading a tape image, or after a tape is rewound by the MCP. There were some problems in the way that the driver was handling the tape image and reporting status to the MCP at load point (also known as Beginning of Tape or BOT). Those were mostly fixed in release 0.16, but a couple of people were still having significant problems. I couldn't reproduce them.

After about two weeks of collective hair-pulling, we finally determined that the problem was not the tape drive module, but an MCP configuration problem. Paul Cumberworth and Tim Sirianni were the ones still having the problem on 0.16, but they were also the ones working on cold-starting the emulator from card decks. In fact, they had that working, and had each cold-started their emulator instances from the Mark XIII cold start deck instead of the ColdLoader web script.

For some reason that we still have not run to ground, that version of the cold-start program does not properly set bit #2 (MOD3IOS) in the MCP option word. The emulator's I/O Control Units were written to behave like Mod-III I/O units, but that setting in the option word was telling the MCP that the I/O units did not have Mod-III capabilities. As a result, the MCP was handling the reading of tape labels entirely differently, and that difference was resulting in an endless cycle of the MCP reading the tape label and then rewinding the tape.

Paul Cumberworth reported this past week that the problem does not occur when cold-starting using the Mark XVI deck.

Fortunately, we found an easy workaround for systems that have been cold-started using the Mark XIII card deck. After the MCP has halt/loaded and you have entered the time of day, simply enter this command on the SPO:

That will set bit #2 in the option word, and that setting will persist across halt/loads and emulator/browser restarts. Note that you will not need to do this if you have cold-started using the ColdLoader script. Also note that the SPO TO command does not display the status of bit #2 in its output.

If anyone feels left out and wants to experience the tape oscillation problem for themselves, you can manually reset bit #2 in a similar way and see what was driving Paul and Tim batty:

Then load a tape image and make the drive ready. When you are tired of watching the tape drive misbehave (it probably won't take long), just use the first command above to set things right again.

Other Changes in this Release

  1. The Character Mode syllables FAD (Field Add) and FSU (Field Subtract) would complement the wrong operand in certain cases. This was due to the initial compare of the operands being done in an alphanumeric mode instead of a numeric mode.
  2. The Character Mode syllables TRN (Transfer Numerics), TRZ (Transfer Zones) and TBN (Transfer Blank for Non-numeric) did not work properly if the transfer spanned more than two words of memory. Words other than the first and last in the destination string were not being fetched into the B register.
  3. If you have access to the B5500SystemConfiguration.js script, you may now configure up to 16 tape drives on the system.
  4. The flip-flop latching mechanism in Central Control that is used by B5500Console for displaying annunciators was completely redone.
  5. The routine that cleared interrupts in Central Control was reworked for more efficient operation.
  6. New algorithms were implemented to compute average slack and delay in the Processor.schedule() routine.
  7. Some Character Mode syllables were optimized by substituting local variables for Processor object "this" properties.
  8. The Processor single-precision divide syllables were leaving the stack in an incorrect state after a divide by zero in Control State. This has been corrected.
  9. A few further minor tweaks to performance throttling were implemented.
  10. References to this.cc in the Processor.run() routine were optimized.
  11. A few improvements to the tape drive module were implemented to eliminate oscillation at load point and improve the timing of rewind operations. The bulk of the tape drive oscillation problem was caused by the configuration problem discussed above.
Finally, I want to remind everyone that we now have a web/email forum on Google Groups linked to this project. If you have a question or a problem, this is the first place you should go to look for answers or post a request for help. Membership is open to everyone. To join the group, go to: