Tuesday, September 4, 2012

End of the Summer Update

It has been a few months since the last post to this blog, but we're still here, working away on the B5500 emulator project, and have been quite busy. This post will summarize where we are on the project, what we've been doing, and what we see as the next steps. By way of a brief outline, here is what I'll cover:
  • Transcription of source code from scanned listings continues.
  • Work on the B5500 emulator itself has begun.
  • ESPOLXEM lives!
  • We are not alone.

Transcription of Source Code

As has been covered in earlier posts, we began this project with no machine-readable source or object code. Our only resource for B5500 system software has been scanned listings on bitsavers.org. Nigel Williams transcribed the Mark XVI Algol compiler source by hand, which has now been proofread, and a clone of it migrated to the modern Unisys MCP environment.

Over a year ago, Nigel discovered that Hans Pufal, of Angouleme, France, had similarly transcribed the ESPOL compiler source. I don't know how Nigel finds these things while sitting in Hobart, Tasmania, but he's a wizard at it. Hans generously donated his transcription to the project, and we have already put it to good use, as described in more detail below in the section on ESPOLXEM.

In late April, believing that the emulator would be fairly useless without a full MCP to operate it, I started transcribing the Mark XVI.0.187 DCMCP from 1977. "DC" refers to the "Datacom" MCP, as opposed to the "TS", or Timesharing MCP. The DCMCP provided basic batch and transactional terminal support, along with RJE. The TSMCP provided a time-slicing memory management scheme and the CANDE (Command and Edit) user interface. It was used by a number of schools and by a couple of commercial timesharing services in the late '60s and early '70s.

I found the prospect of having to key the whole MCP source from a listing to be daunting, but almost everything about this project is daunting, so I simply set those feelings aside and started pecking away. As Nigel found with the Algol source, this sort of transcription is best done in small pieces under a daily regimen. The listing of the DCMCP is about 700 pages, single-sided. I had it printed duplex, and first thing each morning, I try to key in at least one two-sided sheet of paper -- typically 80-100 lines of code. This week I passed the 10,000-line mark, which I think means I'm a little more than 25% along.

It is difficult to tell from the listing exactly how many lines are in the DCMCP source, for two reasons. First, the compile-time options specified for this compilation cause additional lines and page skips to be inserted into the listing. Most of the extra lines provide information for memory allocation of variables and for code segmentation. The compiler also prints the line number of the corresponding BEGIN when an END is encountered, and skips to the top of the next page after the end of each global procedure. With all of that, you can't easily translate page counts of program listing into line counts of source code. My guess is that the 700 pages of listing amounts to somewhere between 35,000 and 40,000 lines of code.

Second, and an issue with potentially more serious consequences, the MCP source is peppered with $OMIT conditional-compilation pragmas. These work similarly to the #if mechanism in a C compiler's preprocessor. Lines encountered by ESPOL when OMIT is true are not compiled. These pragmas are used to include or exclude optional features of the MCP, allowing a site to customize the features (and more importantly, the resulting memory utilization) for their system. There were more than a dozen optional modules that could be selected when compiling the MCP, as described in the B5500/B5700 System Operation Guide. Specifying these options and compiling your own copy of the MCP (which took a little over 20 minutes, 40 if you generated the cross-reference) is about as close to a SYSGEN as you ever got with the Burroughs systems, a characteristic that still holds today in the modern Unisys MCP.
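
For readers more at home with a C preprocessor than with ESPOL dollar cards, the effect of $OMIT can be pictured with a little Javascript sketch. The "$SET OMIT" / "$RESET OMIT" card text below is just a stand-in; I am not trying to reproduce the exact B5500 control-card syntax here:

    // Illustrative sketch only: mimics the effect of $OMIT on a stream of
    // source lines. The control-card text is a stand-in, not the exact
    // B5500 format.
    function filterOmittedLines(sourceLines) {
        var omit = false;
        var compiled = [];
        sourceLines.forEach(function (line) {
            var card = line.trim();
            if (card === "$SET OMIT") {
                omit = true;            // start skipping source lines
            } else if (card === "$RESET OMIT") {
                omit = false;           // resume compiling source lines
            } else if (!omit) {
                compiled.push(line);    // only non-omitted lines are compiled
            }
        });
        return compiled;
    }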

The compiler has another option, $LISTA, that controls whether any omitted lines are listed. Alas, in the listing I am working from, that option was reset, so even after completing the transcription of the tens of thousands of lines in the listing, we still will not have the complete DCMCP source. What we have should work, but we will be missing the code for the modules that were omitted from the compilation when this listing was generated in 1977.

There is some hope for getting a better copy of the source code, however. The Computer History Museum in Mountain View, California, is in possession of a set of B5500 system tapes, reputedly a Mark XV release from 1975. These are the classic 10-inch, open-reel, 7-track tapes that were in common use during the 1960s and '70s. The tape drives for these went out of use about 30 years ago, though, so getting the data off these tapes has been a long-term challenge. Al Kossow, software curator at the museum, and saver-in-chief at bitsavers.org, has informed Nigel that the museum recently acquired a custom tape transport that they hope will be able to read the Burroughs tapes, along with others in their collection. We are hoping to hear of progress in this effort within the next few months.

Work on the B5500 Emulator

After a lot of discussion between Nigel and me, and no small amount of trudging through the Burroughs documents available on bitsavers.org, we started detailed design and coding for the emulator at the end of May.

I had initially thought we would build the emulator using something along the lines of Java or Python, but Nigel proposed that we use Javascript, and design the emulator so that it could be run from within a web browser. I was stunned by this idea, but Nigel pointed to some existing emulators that work this way. In particular, there is Fabrice Bellard's quite amazing jslinux, which emulates a basic Intel x86 processor and boots Linux!

After recovering from my shock at Nigel's proposal, both the wisdom of and the challenge associated with this approach started to appeal to me. Thus we are proceeding to code the emulator in Javascript. A B5500 consisted of one or two processors, up to eight 4KiWord memory modules, up to four I/O units (DMA channels), a variety of peripheral devices, and a Central Control module to tie the whole thing together. We are concentrating initially on processor emulation, but have had to implement significant parts of Central Control as well, as it managed processor initiation, I/O initiation, interrupts, and the switching exchange between the memory modules and the processors and I/O units.
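
To give a sense of the scale of the machine being modeled -- and this is only an illustrative sketch, not the project's actual code, whose names and structure are still in flux -- a maximum configuration can be pictured as a Central Control object tying the other modules together:

    // Illustrative sketch of a maximum B5500 configuration. Names and
    // structure here are invented for the example.
    var WORDS_PER_MODULE = 4096;                // each memory module holds 4KiWords

    function CentralControl() {
        this.processors = [null, null];         // up to two processors
        this.memory = [];                       // up to eight 4KiWord core modules
        for (var m = 0; m < 8; m++) {
            this.memory.push(new Float64Array(WORDS_PER_MODULE)); // 48-bit words fit in JS numbers
        }
        this.ioUnits = [null, null, null, null]; // up to four I/O units (DMA channels)
    }

    // Central Control arbitrates all memory access by the processors and I/O units.
    CentralControl.prototype.storeWord = function (addr, word) {
        var module = addr >>> 12;               // high-order address bits select the module
        this.memory[module][addr & 0x0FFF] = word; // low-order 12 bits select the word within it
    };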

At present, we have instruction stepping, instruction decode, and interrupt detection and vectoring implemented for the processor, along with almost half of the word-mode operators and about a third of the character-mode ones. We have addressed a combination of both simple and complex operators, and have plenty of both kinds left to be worked on. The largest areas that have not yet been addressed are the word-mode arithmetic, relational, and branching operators.
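
In outline, a single emulation step looks something like the sketch below. The register and method names are invented for illustration; the real code has considerably more to it:

    // Illustrative fetch/decode/dispatch cycle for one 12-bit syllable.
    Processor.prototype.step = function () {
        // Honor any pending interrupt before fetching the next syllable.
        if (this.pendingInterrupt && this.normalState) {
            this.vectorToInterruptRoutine();    // save state and branch to the interrupt vector
            return;
        }
        var syllable = this.fetchSyllable();    // next 12-bit syllable; advances the C/L registers
        if (this.characterMode) {
            this.executeCharacterModeOp(syllable);  // dispatch to a character-mode operator
        } else {
            this.executeWordModeOp(syllable);       // dispatch to a word-mode operator
        }
    };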

We have done nothing yet for I/O, which is going to be a challenge to address in the context of a web browser environment. There are a number of very interesting capabilities under development within the browser community, however, under the generic heading of HTML5. We will probably be taking advantage of several of those developments as they mature and move towards standardization. As it now stands, the emulator will probably require a fairly modern browser in order to operate. We are currently targeting the emulator for Mozilla Firefox and Google Chrome, with Apple Safari high on the list to follow. Microsoft's Internet Explorer can play once it decides to grow up, which it appears to be in the process of trying to do.

We also hope to be able to build the emulator so that it can be used in off-line environments, e.g., Mozilla Rhino. This means that the emulator and its user interfaces (operator's console, SPO, and maintenance displays) will need to be done in a way that different versions of the UIs can be built for different run-time environments and plugged into a common emulator engine.
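
One way to picture that separation -- a sketch only, not a committed design -- is for the engine to talk to its user interfaces through a small callback object that each run-time environment supplies its own version of:

    // Sketch of the engine/UI split: the engine knows only an abstract "panel"
    // interface; each environment plugs in its own implementation.
    function B5500Emulator(panel) {
        this.panel = panel;                     // operator console, SPO, maintenance displays
    }
    B5500Emulator.prototype.reportHalt = function (reason) {
        this.panel.showHalt(reason);            // how it is shown depends on the environment
    };

    // A browser build might render to the page...
    var browserPanel = {
        showHalt: function (reason) { document.title = "HALT: " + reason; }
    };

    // ...while an off-line build (e.g., under Rhino) might simply print.
    var rhinoPanel = {
        showHalt: function (reason) { print("HALT: " + reason); }
    };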

We do not plan to post the code for the emulator to the project's Subversion repository until it is much further along, but hopefully we will have something to share by the end of the year. The design and implementation of the emulator is a subject that deserves a blog post on its own, which will be forthcoming.

ESPOLXEM Lives!

As we got the instruction decoding and several of the instructions implemented in the emulator, Nigel and I both built small test harnesses in an attempt to find out if any of that worked. We simply assembled some short instruction sequences by hand, had the test harness store the resulting binary words in B5500 memory, then preset the processor registers appropriately, and used Firebug or the Chrome debugger to watch what the emulator did. Within the limits of these tests, we were gratified to find that the instruction stepping, decode, and stack management were working.
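
For the curious, those harnesses amounted to little more than the following sort of thing. The constructor, register, and memory names assume the illustrative sketches above, and the octal words are placeholders rather than a real B5500 code sequence:

    // Illustrative harness: plant hand-assembled words in memory, preset the
    // registers, and single-step while watching in the debugger.
    var cc = new CentralControl();
    var p1 = new Processor(cc);

    // Hand-assembled instruction words, keyed in octal and converted to numbers.
    var program = ["0211000025", "0000000000"].map(function (octal) {
        return parseInt(octal, 8);              // placeholder words, not a real test case
    });
    program.forEach(function (word, i) {
        cc.storeWord(0x20 + i, word);           // load the words at an arbitrary low address
    });

    p1.C = 0x20;                                // word address of the first instruction
    p1.L = 0;                                   // first syllable within that word
    p1.S = 0x100;                               // preset the top-of-stack pointer
    for (var n = 0; n < 10; n++) {
        p1.step();                              // inspect the registers in Firebug or the Chrome debugger
    }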

Assembling B5500 instruction sequences by hand is not particularly easy, and translating the resulting octal codes to hexadecimal Javascript constants and then plunking them into absolute memory addresses is even less so. After a little bit of this, Nigel expressed a wish for an assembler to make the job easier. I balked, as not having an assembler for these systems has been a matter of Burroughsian pride for 50 years. Then I realized that we had something much better than an assembler: ESPOL.

ESPOL (Executive System Problem-Oriented Language) is a derivative of Burroughs B5500 Extended Algol. It was developed for the B5500 Disk MCP project after experience using a pseudo-assembler, named OSIL (Operating Systems Implementation Language), for the B5000 Drum MCP. OSIL generated B5000 object code, but you programmed in it as if the target had a three-address instruction set. OSIL did not run on the B5000 or B5500. It ran, reputedly slowly, as a cross-assembler on the Burroughs 220, an earlier vacuum tube machine.

As is discussed in the 1985 Burroughs B5000 Conference oral history, the B5500 team created ESPOL by starting with their then-current Algol compiler and ripping out the I/O constructs, automated run-time storage management, diagnostic facilities, intrinsic functions (SQRT, SIN, etc.), and the ability to nest blocks (although the ability to nest compound statements remained). In other words, they took out of the language all of the stuff you need an operating system to support. They then added a few things they thought they would need in order to get close enough to the hardware to generate an operating system, and some that they probably thought would just be cool to have. Principally, these were:
  1. An octal literal, written as @nnn, where nnn is a string of up to 16 octal digits. As a consequence of this feature, ESPOL did not support floating-point literals in exponential (scientific notation) format.
  2. An intrinsic array, MEMORY, that could access physical memory by absolute word address. As a shorthand, the compiler also defined an alias to this, M. MEMORY was a reserved word, but M could be redeclared by the program at deeper nesting levels.
  3. A POLISH expression construct with which one could code arbitrary instruction syllable sequences. This was somewhat like an in-line assembler, but more powerful, and integrated well with the surrounding higher-level language. For example, the Algol expression A:=B+C could be coded exactly that way in ESPOL, or it could be coded in "Polish notation" as POLISH(B, C, +, .A, :=), or equivalently, as POLISH(B, C, ADD, .A, STD), with ADD and STD being the mnemonics for the stack-based, zero-address add and store-destructive syllables, respectively. The ".A" notation caused the stack-relative address of A to be pushed into the stack for use by the store instruction. As with MEMORY, POLISH had a shorthand alias P that could be redeclared at deeper nesting levels.
  4. An in-line STREAM construct that, like STREAM procedures in Algol, provided access to the hardware character-mode instructions, but without the necessity of defining a procedure out of line and then calling it.
  5. SAVE procedures, all of the generated code for which was collected into one segment at the beginning of the object code file. The idea was that this segment would be loaded by the bootstrap routine, and would contain the initialization and low-level management functions of the operating system, with which the other segments of the operating system could be loaded as required.
  6. Extensions to the way that statement labels could be defined. If L is a label identifier, then:
    1. L: had the same meaning as in Algol.
    2. L:: was similar, but caused the code for the resulting statement to be word-aligned, which was required for some forms of branching. The compiler would emit NOP (no-op) syllables as necessary to achieve the word alignment.
    3. L:<number>: caused the statement to begin at the absolute word address specified by <number>. This was needed to define hardware interrupt vector locations.
    4. L:*: specified that the statement should start at the first executable location in memory following the interrupt vector locations and the PRT (Program Reference Table). The compiler determined this address from the global declarations present in the program.
    5. L::: was used to precede not an executable statement, but a list of literal constants. This was used to generate data pools for C-relative addressing, the C register being the word portion of the current instruction address. This form of label could be used only in a POLISH expression, and not in a designational expression, e.g., as the object of a GO TO statement.
  7. Type equivalence between arithmetic and Boolean expressions. The Boolean hardware instructions for AND, OR, etc., operated bitwise on whole words. In ESPOL these operators were considered to be arithmetic. The truth of a Boolean expression was determined by the hardware using only the low-order bit in a word, so in ESPOL, TRUE was equivalent to an odd value and FALSE was equivalent to an even one.
  8. A few additional notations to better control the generation and use of addresses:
    1. NAME declarations. These created variables that could hold memory references to other variables.
    2. An expression in square brackets resulted in the enclosed expression being evaluated for a descriptor call (DESC) rather than an operand call (OPDC) instruction. This was frequently used with indexed array elements, e.g., [HEADER[6]] effectively evaluated to a pointer to HEADER[6], not the value of HEADER[6].
    3. Prefixing a name expression with an asterisk caused the expression to be dereferenced with the LOD (load) operator. My favorite use of this is A[X]:=(*P(DUP))+1;. This construct proceeds as follows:
      1. The left-hand side of the assignment is evaluated as a name expression (a form of address), and results in a descriptor to the location of A[X] being left on the stack.
      2. The POLISH expression P(DUP) is evaluated. This simply generates the DUP syllable, which duplicates the value on the top of stack. Now we have two copies of the descriptor on the top of stack.
      3. The asterisk preceding the POLISH expression generates the LOD syllable, which causes the top descriptor in the stack to be dereferenced and replaced by the value to which it points. Now we have the value of A[X] on the top of stack, with the original descriptor word still immediately below it.
      4. The +1 generates a syllable to push a literal 1 onto the stack, followed by the ADD syllable. Now we have the value A[X]+1 on the top of stack, with the address of A[X] still below that.
      5. The := operator generates an XCH syllable to exchange the two top-of-stack words (store requires the destination address to be on top), and a STD (store destructive) syllable. "Destructive" indicates that both the address and value words are popped from the stack. There are also non-destructive stores that pop the address but leave the value at top of stack.
      In C, you would have written the equivalent statement as A[X]+=1, or perhaps ++A[X]. In Algol for the modern MCP you can get the same effect using A[X]:=*+1, where "*" in this context means that the left-hand side of the assignment is to be duplicated, dereferenced, and then used as the initial primary in the right-hand side expression, just as the more verbose ESPOL POLISH notation did. (There is a small Javascript sketch of these stack steps just after this list.)
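
To make that stack choreography concrete, here is a small Javascript model of those five steps. A plain array stands in for the stack and a little object stands in for the descriptor word; this models the behavior only, nothing about the real hardware word formats:

    // Model of the stack activity for A[X] := (*P(DUP)) + 1;
    var A = [10, 20, 30];                       // pretend array, with X = 1, so A[X] is 20
    var X = 1;
    var stack = [];

    // 1. Left-hand side: push a descriptor for the location of A[X].
    stack.push({descriptor: X});

    // 2. P(DUP): duplicate the top-of-stack word (two copies of the descriptor).
    stack.push(stack[stack.length - 1]);

    // 3. "*" generates LOD: replace the top descriptor with the value it points to.
    var d = stack.pop();
    stack.push(A[d.descriptor]);                // value of A[X] on top, descriptor below it

    // 4. "+1": push literal 1, then ADD.
    stack.push(1);
    var b = stack.pop(), a = stack.pop();
    stack.push(a + b);                          // A[X]+1 on top, descriptor still below

    // 5. ":=": XCH swaps value and address, then STD stores and pops both.
    var value = stack.pop(), dest = stack.pop(); // (the XCH is folded into the order of the pops)
    A[dest.descriptor] = value;                 // destructive store: the stack is empty again

    // A is now [10, 21, 30].
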
ESPOL was obviously intended for use by people who knew how the hardware worked and were accustomed to programming at a level close to that of the hardware. All of the language extensions in ESPOL were subject to abuse, but the POLISH construct seems to have generated the most examples of egregious use. Since you could use this construct to directly manipulate the stack, ESPOL programmers did just that, frequently leaving intermediate values on the stack, and popping them into expressions later on. This was undoubtedly efficient, but the result is often very difficult to follow. A few small parts of the MCP almost look like they were written in FORTH.

With that brief introduction to ESPOL, we can now return to the subjects of emulator test cases and the development of ESPOLXEM. Like the Algol compiler from which it was originally derived, ESPOL is written in B5500 Extended Algol. As described in earlier posts, I had been able to migrate the B5500 Algol compiler to the modern Unisys MCP environment, where it runs as a cross-compiler to generate B5500 object code files. Or at least that's what we hope it does, because we don't yet have any way to prove it, other than static inspection of the resulting code. We call this cross-compiler ALGOLXEM, with the XEM standing for "cross-compiled on E-mode," and E-mode being the Engineering name for the modern MCP architecture.

What we needed was the same type of migration applied to the ESPOL compiler source to create an ESPOLXEM that will run under the modern MCP. Over the past several weeks I have largely set aside work on the emulator to do just that.

We had Hans Pufal's ESPOL source, but it had not been completely proofread. We had learned from working with the Algol compiler source that proofreading needs to be done by someone other than the transcriber, anyway, so I set about to do that. Looking for a way to make the job easier and more reliable -- proofreading is even duller than transcribing -- and knowing that ESPOL was originally cloned from Algol, I decided to compare the two sources, hoping that there would be large chunks of the code that would be the same, and that for those I could do a much more mechanical comparison.

To compare the two source files, I used the excellent Compare It! utility from Grig Software. Indeed, there were large blocks of code that were identical between the two compilers, or at least similar enough that I could easily spot typos and other problems. There were also large blocks of code that were completely different, and some that were somewhat similar, but clearly had been modified or rewritten since the two compilers had diverged. In all, about half of the Algol compiler's code is not in the ESPOL compiler, and of the remainder, almost two-thirds was the same, or nearly so. It's somewhat amazing that this comparison approach worked as well as it did, as ESPOL was cloned from Algol probably sometime in 1963, and I was comparing the sources of two slightly different release levels from 13 years after that.

What is also interesting is that, where I found differences between the two sources that were due to errors in transcription, about a quarter of the problems were in the Algol compiler source. Most of these were in benign areas, such as comments or sequence numbers, but I did find a couple of subtle errors in the Algol source that would have been extremely difficult to spot otherwise. Corrections to these typos have been posted back to both Algol and ALGOLXEM in the project code repository.

With the proofreading pass completed, the next step was to apply the same type of migration technique I had used to generate ALGOLXEM from the Algol compiler source. I ran the ESPOL source through the pretty-printing program that I used in preparing ALGOLXEM, then manually reversed the bit numbering conventions used in partial-word designators (in B5500 Algol, the high-order bit of a word is 0 and the low-order bit is 47, but in modern Algol those are reversed).
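
The arithmetic of that conversion is simple, even if applying it by hand across a whole compiler is tedious: for a 48-bit word, a B5500 starting bit s becomes 47-s in the modern numbering, and the field length carries over unchanged. A trivial sketch, with the designator parsing itself glossed over:

    // Hedged sketch: convert a B5500 partial-word starting bit (0 = high
    // order, 47 = low order) to the modern numbering (47 = high order,
    // 0 = low order). Field lengths are unchanged.
    function convertStartBit(b5500StartBit) {
        return 47 - b5500StartBit;
    }
    convertStartBit(9);                         // a field starting at B5500 bit 9 starts at bit 38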

I used Compare It! again to help identify areas in the ESPOLXEM source that needed to be addressed. I was able to bring across all of the stream procedure emulation code I had written for ALGOLXEM without change, and quite a bit of the converted code that replaced the stream procedures themselves. There were a few significant differences in the way the compilers worked, particularly in the way that the generated code was buffered and output, but these were relatively easy to deal with.

I don't know if the approach of comparing the ESPOL compiler source to our Algol compiler source saved any time, but it did appear to result in a more reliable migration. The time from completing the proofreading pass to the first completely successful compile of an ESPOL program was less than two weeks. At this point we have successfully compiled two programs, a small bootstrap program, KERNAL, that I transcribed from a bitsavers listing, and the significantly larger system cold/cool-start loader program, COOL, which Nigel kindly transcribed.

The final step in this process, which I am just beginning, is to instrument the emulator with the ability to load an object code file produced by the ESPOLXEM compiler, store the contents of that file appropriately in memory, initialize the processor registers, and initiate execution of the program. Once that is in place, we can start writing test cases to exercise the emulator.
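
In outline, that loader is straightforward, something along these lines. The object-file layout and names below are placeholders; the real ESPOL code-file format has more structure than a flat list of words:

    // Illustrative loader: assumes the object file has been parsed into a list
    // of {address, word} pairs plus a starting word address.
    function loadAndStart(cc, processor, objectFile) {
        objectFile.words.forEach(function (entry) {
            cc.storeWord(entry.address, entry.word);    // place each code/data word in memory
        });
        processor.C = objectFile.startAddress;          // word address of the first instruction
        processor.L = 0;                                // begin with the first syllable of that word
        processor.run();                                // start stepping the emulated processor
    }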

We Are Not Alone

When we started this project, Nigel and I assumed that we would be the only ones on the planet foolish enough to attempt such a thing, and among only a few who would be at all interested in it. It therefore came as quite a surprise to be contacted this summer by not one, but two other people who are working on B5500 emulators. Sid McHarg in Seattle, Washington, started out simply trying to emulate the lights on the B5500 operator console, and when that proved insufficiently satisfying, decided to try emulating the whole thing. He started perhaps a year ago, and is now quite far along, with significant pieces of the system running. Sid is building his emulator using C++ on an Apple Macintosh.

A few months ago, Terry Heidelberg found our project on the web and contacted me. After some email exchanges, Terry decided to try his hand at building an emulator himself. He is proceeding with Python. We have given him access to our ALGOLXEM and ESPOLXEM cross compilers, and I believe he has managed to get some test cases running, but at last word was still working on significant parts of the instruction set.

The 50th anniversary of first customer shipment for the Burroughs B5000 will occur in early 2013. The 50th anniversary for the B5500 will be in early 2015. It is interesting that after all that time, there is so much interest in reviving this radically innovative architecture.

At present, we are staying in contact with these other projects and sharing questions and information, but not emulator code or design details. Speaking on Nigel's and my behalf, we don't want to know too much about how the other guys are going about their efforts, thinking it will be much healthier if we all proceed in our independent ways to get something working. Once we get to that point, it will be very interesting to be able to peek under each other's hoods and see how we each approached the emulator design differently. Meanwhile, we have a lot of work left to do.