Tuesday, October 22, 2013

On-going Progress with the Emulator

Another four months have come and gone since the last blog post for this project, and there is much to report. We made a lot of progress through July; then I was pulled away by other commitments for most of August and September, and have just re-engaged with the project over the past few weekends.

In the work leading up to my two-month hiatus, the emulator became substantially more stable and more capable on almost a weekly basis. We are now at Release 0.14, which was pushed out in early October. There are still problems, and more features to implement, but the emulator is now in a very usable state.

The last blog post ("It's Alive...," 3 June 2013) seemed to strike a chord. Several people are now using the emulator, others have contacted us with comments, and we have received offers of additional B5500 material. There is more on this subject below.

Significant Changes and Improvements

Shortly before the last blog post, we resolved a very nasty problem with the so-called "R+7" aspect of subroutine stack linkage. That one fix made the emulator about an order of magnitude more stable than it had been prior to that point. It enabled us to begin using the system as you would a real B5500 under the control of its MCP operating system. Since then, the following major enhancements and fixes have been implemented:

Card Reader
Initially, all that we had were the SPO and Head-per-Track disk peripheral units. This made it impossible to run anything but programs we could load from the Mark XIII system tape images using the ColdLoader utility. A high priority was to implement card input. The current driver emulates the Burroughs B129 1400-card/minute reader. Card decks are ordinary ASCII text files. You load one or more files into the reader using a standard file picker dialog, then press the reader's START button. The MCP senses the reader's change in status, and starts reading cards, just as it worked on a real B5500.
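Because decks are plain ASCII files, turning a picked file into card images is mostly line splitting. Below is a minimal sketch, assuming one card per line, with short lines padded to 80 columns and long lines truncated. The helper names `splitDeck` and `stackDecks` are hypothetical and are not the emulator's code.

```javascript
// Hypothetical helpers: turn the text of ASCII deck files into 80-column card images.
// Assumes one card per line; short lines are padded with spaces, long lines truncated.
const CARD_COLUMNS = 80;

function splitDeck(text) {
  // Accept both CRLF (Windows) and LF line endings.
  const lines = text.split(/\r?\n/);
  // A trailing newline yields an empty final element; drop it so it is not read as a blank card.
  if (lines.length > 0 && lines[lines.length - 1] === "") {
    lines.pop();
  }
  return lines.map((line) => line.padEnd(CARD_COLUMNS, " ").slice(0, CARD_COLUMNS));
}

// Several files loaded together are read as one continuous deck, in the order picked.
function stackDecks(fileTexts) {
  return fileTexts.flatMap(splitDeck);
}

// Sample text only, not a real B5500 job deck.
const cards = stackDecks(["FIRST CARD\r\nSECOND CARD\n", "THIRD CARD"]);
console.log(cards.length); // 3
```

Padding every image to a fixed width keeps the reader logic simple, since each card then transfers exactly 80 characters regardless of how the text file was edited.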
"Dummy" Line Printer
Nigel started working on the implementation for a line printer peripheral unit, and ran into problems getting a prototype to work. It turned out the problems were mine in the way that printer I/Os were being initiated and terminated in the IOUnit and CentralControl modules. In the process of fixing those, I literally threw together a very basic diagnostic printer driver out of pieces of the SPO and card reader implementations. After getting my IOUnit problems fixed, I took out the diagnostic stuff, and well, we're still using that. It works fine for simple output, but at some point will need to be replaced by something with a better user interface and more complete functionality.
Card Punch
After getting the card reader and preliminary line printer units to work, it was a straightforward task to clone a card punch peripheral driver out of those. Besides, I was beginning to work on the Card-Load-Select mode of loading the system, and needed a way to output card decks for programs like the COOL and COLD loaders.
Improved Console Display
The B5500 had a very minimalist operator console -- just a few buttons and lights. There were lots more lights on the maintenance panels in the Distribution and Display unit, but those were usually hidden from view behind the "skins" of the mainframe cabinets. With the emulator, though, it was often difficult to see from the console what was happening with the system (or whether anything was happening at all), so I have added some annunciators to the console that show the activity of the I/O Units, external interrupts, and the individual peripheral devices. These are really helpful to gauge the activity of the system, and they make a nice show, besides. The extra lights can be disabled if you are a purist and want the console to look like it did on a real B5500.
Smaller User Interface Windows
The initial designs of several of the peripheral-device windows were just too large. They worked fine on the 23-inch monitor I typically use for development and testing, but on other systems, particularly laptops, the windows crowded each other out and made it difficult to see what was happening with the system overall. The windows for the SPO, card reader, and card punch have all been reduced in size to better accommodate smaller displays.
RTS Presence-Bit Bug
RTS is the Return from Subroutine instruction. It is typically used to exit from what Burroughs termed "accidental-entry" procedures, but what the rest of the world refers to as "thunks." This type of subroutine is used to implement the semantics of Algol Call-by-Name parameters. It turns out that such subroutines can return data descriptors, and a requirement of the RTS instruction is to throw a Presence Bit interrupt (page fault) if the returned descriptor points to an absent memory segment.

That requirement was poorly documented, and we weren't checking for descriptor absent status in the emulator. The result was that the emulator could use the address field of an absent descriptor as if that field contained a valid memory address. This error only showed up while running some FORTRAN programs, and it proved to be very difficult to trace the symptoms back to the cause. It took several long, frustrating days of tracing and debugging before I finally found an obscure reference to the P-bit requirement, after which the fix was obvious and simple -- as is often the case with such problems.
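The missing check amounts to looking at one bit before trusting an address. Here is a minimal sketch, assuming 48-bit words held in JavaScript Numbers with bit 0 as the high-order bit, the flag (descriptor) bit in bit 0, the presence bit in bit 2, and the address in the low-order 15 bits. Those bit assignments, and the names `checkReturnedValue` and `PresenceBitInterrupt`, are assumptions for illustration rather than the emulator's actual code.

```javascript
// Sketch only; bit assignments are assumptions (flag = bit 0, presence = bit 2,
// address = low-order 15 bits). 48-bit words are JS Numbers; bit 0 is high-order.
function bit(word, n) {
  return Math.floor(word / 2 ** (47 - n)) % 2;
}

class PresenceBitInterrupt extends Error {}

// Hypothetical check applied to the value a thunk leaves for RTS.
function checkReturnedValue(word) {
  if (bit(word, 0) === 0) {
    return { kind: "operand", value: word }; // flag bit off: plain data
  }
  if (bit(word, 2) === 0) {
    // Absent segment: the address field holds no valid memory address, so
    // raise the P-bit interrupt rather than dereference it.
    throw new PresenceBitInterrupt("descriptor points to an absent segment");
  }
  return { kind: "descriptor", address: word % 2 ** 15 }; // low-order 15 bits
}

const present = 2 ** 47 + 2 ** 45 + 0o1234; // flag + presence + address
const absent = 2 ** 47 + 0o1234;            // flag only
console.log(checkReturnedValue(present).address === 0o1234); // true
try {
  checkReturnedValue(absent);
} catch (e) {
  console.log(e instanceof PresenceBitInterrupt); // true
}
```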
Floating-Point Arithmetic Bugs
One of the first things I tried once the card reader and line printer were working was an Algol number-crunching program from my student days, for which I still had listings of source and output from a B5500 run in 1970. The program does orthonormalization of vectors to compute rheological parameters for two-phase flow in a round pipe (and before you ask, no, I don't understand what that means anymore).

Getting the program to run was no problem, but the results from the emulator were not even close to those on the 1970 listing. I was getting at best one digit of agreement between the two. I have a few other programs and listings from that era, and they were showing similar problems for cases involving complex calculations. Programs doing simpler calculations showed quite good agreement, however, so that smelled like some sort of rounding or normalization problem in the emulator.

After being deviled by this for months and frustrated in a couple of attempts to find the problems, earlier this month I wrote an Algol program for the B5500 that generated a variety of numeric-word bit patterns, computed all of the combinations of those bit patterns for the add, subtract, multiply and floating-divide operators, and dumped the results in octal. I then converted that program to the modern Unisys MCP architecture (which uses the same numeric format) and generated an equivalent set of results.

Comparing the two sets of results indeed revealed a number of cases of off-by-one differences in mantissa values, and a few cases where the differences were even worse. Knowing the bit patterns that generated these differences, I was able to trace the evaluation of those specific patterns in the emulator, and found several problems with rounding and normalization. All but one of the problems were in add/subtract, which internally is the same operation with some sign manipulation. The remaining problem was due to rounding when multiplying two integers -- which by their nature should never have their product rounded.
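To make the octal comparison concrete, here is a sketch that decodes 16-digit octal dumps and flags off-by-one mantissas. It assumes the B5500 single-precision layout as we understand it from the reference manual: bit 0 flag, bit 1 mantissa sign, bit 2 exponent sign, bits 3-8 an octal exponent, bits 9-47 a 39-bit integer mantissa. The function names are hypothetical. Because the mantissa is an integer, whole numbers are represented with an exponent of zero, which is why an integer product that fits in 39 bits should never be rounded.

```javascript
// Sketch: decode 16-octal-digit B5500 numeric words and classify differences
// between two result dumps. Field layout is assumed (see lead-in).
function decodeWord(octal) {
  const word = parseInt(octal, 8); // 48 bits fit exactly in a JS Number
  const mantissa = word % 2 ** 39;                  // bits 9-47
  const exponent = Math.floor(word / 2 ** 39) % 64; // bits 3-8, a power of 8
  const expSign = Math.floor(word / 2 ** 45) % 2;   // bit 2
  const mantSign = Math.floor(word / 2 ** 46) % 2;  // bit 1
  return { mantissa, exponent: expSign ? -exponent : exponent, negative: mantSign === 1 };
}

function value(fields) {
  const magnitude = fields.mantissa * 8 ** fields.exponent;
  return fields.negative ? -magnitude : magnitude;
}

function compareResults(expectedOctal, actualOctal) {
  if (expectedOctal === actualOctal) return "match";
  const e = decodeWord(expectedOctal);
  const a = decodeWord(actualOctal);
  if (e.exponent === a.exponent && e.negative === a.negative &&
      Math.abs(e.mantissa - a.mantissa) === 1) {
    return "off by one in mantissa";
  }
  return "different";
}

console.log(value(decodeWord("1010000000000004"))); // mantissa 4, exponent -1: 0.5
console.log(value(decodeWord("0000000000000003"))); // an integer, exponent 0: 3
console.log(compareResults("0000000000000003", "0000000000000004")); // off by one in mantissa
```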

After correcting these issues, the results from the emulator now match -- to the digit -- the results in all of my listings from 1970 that I've been able to check thus far.
Card Load Select Bug
By default, the B5500 booted from disk. A push-button switch on the operator console would cause it to load from cards, however. Loading the MCP from disk has been working for months, but attempting to load from cards would read the binary boot loader card plus the first card of the program being loaded, then hang. After previously making several runs at the problem, earlier this month I finally found the cause -- hardware load proceeds much like any other I/O, but it is initiated as a special mode of the I/O Unit. It generates a result descriptor, but not a completion interrupt.

The problem turned out to be that the emulator was not suppressing the completion interrupt. Load from disk worked, either because of the timing involved, or more likely because the MCP's KERNEL bootstrap was smart enough to ignore the extraneous interrupt. In contrast, the binary one-card loader is pretty dumb, and apparently became confused by the pending interrupt left by the hardware load mechanism. Booting the system from cards now works.
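In outline, the fix is a single branch at I/O completion. The sketch below uses hypothetical names (`IOUnit`, `finish`, `loadMode`, `raiseIOCompleteInterrupt`) and a stub Central Control; it only illustrates the idea that a hardware load still stores its result descriptor but leaves no completion interrupt pending.

```javascript
// Minimal sketch with hypothetical names: a hardware-load operation stores its
// result descriptor like any other I/O, but must not leave an I/O-complete
// interrupt pending for the one-card loader to trip over.
class IOUnit {
  constructor(centralControl) {
    this.cc = centralControl;
    this.loadMode = false;     // true only while servicing a hardware (halt/load) read
    this.resultDescriptor = 0;
  }

  finish(resultDescriptor) {
    this.resultDescriptor = resultDescriptor; // always stored
    if (this.loadMode) {
      this.loadMode = false;  // load complete; suppress the completion interrupt
    } else {
      this.cc.raiseIOCompleteInterrupt(this);
    }
  }
}

// Stub Central Control that just records interrupts, for illustration.
const cc = { pending: [], raiseIOCompleteInterrupt(unit) { this.pending.push(unit); } };
const io = new IOUnit(cc);
io.loadMode = true;
io.finish(0o1234); // hardware load: no interrupt
io.finish(0o5670); // ordinary I/O: interrupt queued
console.log(cc.pending.length); // 1
```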
Hosting Site and Wiki
While the emulator runs entirely within a web browser, it requires a web server from which it can be loaded into the browser. Not everyone has the wherewithal to set up and operate their own web server, so we have set up a web site to support the following:
  • Host the current release of the emulator.
  • Make available the Mark XIII tape images containing the Burroughs system software. 
  • Make available releases of the emulator source code for downloading.
  • Serve as a central source for emulator utilities and information. 
You are welcome to visit and use this site at http://www.phkimpel.us/B5500/.

We have also created a number of wiki pages on the project's Google Code site (http://code.google.com/p/retro-b5500/), since moved to GitHub, describing how to set up and use the emulator and its components. There are links to these wiki pages under the main Help link on the hosting site.

Browser Status and Performance

One of the goals of this project has been to have the emulator execute programs at the speed of a real B5500, or as close to that as can be practical in its browser-based environment. Throughout the development of the emulator, especially in the Processor module, we have been concerned about the emulator's potential performance, and have tried to keep the coding as lean as possible. With the emulator becoming reasonably stable and the ability to compile and run various programs through the card reader, we are starting to get a feel for its performance.

Based on timing statistics in some of my listings from the 1970s, the emulator initially appeared to be running about 50% slower than a real B5500. I do most of my development and testing on a relatively new Dell Optiplex 390 with a 3.3GHz Intel Core i3-2120 processor (two cores, four hardware threads) running 64-bit Windows 7. Monitoring performance in the Windows Task Manager while the emulator was running in Mozilla Firefox showed that raw processor power was not the problem -- the emulator may have been running slower than desired, but the core¹ running that Javascript thread was loafing. So why was the emulator appearing to run slowly?

The problem has turned out to be two-fold. First, the emulator attempts to throttle its performance by estimating for each instruction the number of 1MHz clock cycles that instruction should take to execute. It accumulates those cycle counts and periodically compares the number of microsecond cycles it has accumulated against the amount of real time that has elapsed. If the number of microseconds accumulated is greater than the elapsed time, then the emulator is running too fast, so it pauses using the Javascript setTimeout() function to allow real time to catch up to emulated time. We were obviously accumulating more clock cycles than we should have, especially in the number of cycles that memory access consumed. Simply lowering the number of clock cycles accumulated for each read or write memory access brought the emulator to within about 15% of the timing statistics from 1970.
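The throttling arithmetic described above fits in a few lines. This is a sketch with a hypothetical `Throttle` class, not the emulator's code: cycles are counted at 1MHz, so each cycle is one microsecond of B5500 time, and the pause is whatever emulated time is ahead of real elapsed time.

```javascript
// Sketch of the throttling arithmetic (hypothetical names). Each instruction adds
// its estimated 1MHz clock cycles (= microseconds of B5500 time) to a running
// total, which is periodically compared with real elapsed time.
class Throttle {
  constructor(nowMs) {
    this.startMs = nowMs; // real time when emulation started
    this.cycles = 0;      // accumulated emulated microseconds
  }

  addCycles(n) {
    this.cycles += n;
  }

  // Milliseconds to pause so real time catches up with emulated time;
  // 0 means the emulator is not ahead and should keep running.
  delayNeeded(nowMs) {
    const emulatedMs = this.cycles / 1000; // 1 cycle = 1 microsecond
    const elapsedMs = nowMs - this.startMs;
    return Math.max(0, emulatedMs - elapsedMs);
  }
}

const t = new Throttle(0);
for (let i = 0; i < 5000; i++) t.addCycles(3); // e.g., 5000 instructions at 3 cycles each
console.log(t.delayNeeded(6));  // 15ms emulated vs 6ms real: pause 9
console.log(t.delayNeeded(20)); // behind real time: 0
```

In a browser the returned delay would be handed to setTimeout(). Overestimating the cycles charged per memory access inflates `cycles`, which makes the computed pauses too long and the emulator appear slow, as happened here.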

Second, we found that the Javascript setTimeout() function and time-of-day facility (new Date().getTime()) were not nearly as fine-grained as we were counting on. Some research revealed that (as of June of this year) most browsers on Windows had a timing resolution of about 15 milliseconds, and that the emerging HTML5 DOM standards clamp nested setTimeout() delays to a minimum of 4ms. The emulator was requesting delays below 4ms, with the result that when the throttling mechanism tried to delay, say, 3ms, the real delay might be 15ms, giving the appearance that the emulator was running slowly. It was actually running amazingly fast, but throttling too much.
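A toy model shows how much a coarse timer can cost. It assumes, as a simplification rather than a measurement, that short delays are clamped to 4ms and that the real pause is rounded up to the next timer tick. The function names and the tick sizes are illustrative.

```javascript
// Simplified model (an assumption, not a measurement) of how a coarse timer
// distorts throttling: short delays are clamped to 4ms, and the real pause
// is rounded up to the next timer tick.
function actualDelayMs(requestedMs, tickMs, minMs = 4) {
  const clamped = Math.max(requestedMs, minMs);
  return Math.ceil(clamped / tickMs) * tickMs;
}

// Speed relative to a real B5500 for one run/pause cycle.
function apparentSpeed(emulatedMs, runMs, tickMs) {
  const requested = emulatedMs - runMs; // what the throttle asks for
  const pause = requested > 0 ? actualDelayMs(requested, tickMs) : 0;
  return emulatedMs / (runMs + pause);
}

// Emulate 4ms of B5500 time in 1ms of real time, then ask for a 3ms pause.
console.log(apparentSpeed(4, 1, 1));  // 1ms ticks, 4ms clamp: 4 / (1 + 4) = 0.8
console.log(apparentSpeed(4, 1, 15)); // 15ms ticks: 4 / (1 + 15) = 0.25
```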

The throttling mechanism has another important role besides regulating the speed of the Processor module. Everything in the emulator runs synchronously on one Javascript thread -- the Processor, the I/O Units, the peripheral devices, the interval timer in Central Control, even the second Processor, once we get that working. If the Processor does not yield control of the thread periodically, I/Os will not be initiated and completed, external interrupts will not be serviced, and in general the system just won't work. Thus, while one approach to dealing with the Javascript timing granularity would be to increase the amount of time the Processor runs before throttling, hogging the thread has a negative impact on the system's ability to do I/O and service interrupts.

We could see this in the difference in the emulator's performance when running in Google Chrome compared to Mozilla Firefox. Earlier this summer, Chrome was ahead in implementing the 4ms HTML5 DOM standard for setTimeout(), and its effective speed was much closer to the B5500's than Firefox's was. Apparently Firefox changed its timer granularity in version 22, and the emulator now performs better in Firefox than it does in Chrome.

Resolving this problem in the emulator has turned out to be an adventure. The main improvement has been to change the throttling mechanism so that it could better tolerate long setTimeout() delays without interfering with access to the Javascript thread by the other components. What we needed was a way to yield control of the thread without introducing any additional delay -- if other components were scheduled to run on the thread, they would, but control could return to the Processor as soon as everyone else had their turn.

There is a proposed setImmediate() function for Javascript that does just that, but only Microsoft Internet Explorer implements it, and it does not appear that this proposal will become a standard. Some freely-available "shims" have been written to implement the behavior of setImmediate() using existing DOM features, however, and we found one by Domenic Denicola (https://github.com/NobleJS/setImmediate) that works quite well. The throttling mechanism now computes a delay time and compares it to some threshold value (currently 4ms). If the delay is greater than the threshold, it uses setTimeout(); otherwise it uses the pseudo setImmediate() implementation.

In that latter case, the Processor will resume sooner than it should (and not actually throttle the performance very much), but the throttling mechanism does its computations based on total cycles accumulated vs. total elapsed time, so at the end of the next throttling cycle, it will typically compute an even larger delay. Eventually the delay will grow to the point that it exceeds the threshold, and setTimeout() will be called to do some real throttling. This approach generates jitter in the execution of the Processor, but the delays and non-delays average out, and it happens fast enough (15ms is approximately the refresh interval of most monitors) that the jitter usually is not noticeable.
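Here is a sketch of that hybrid scheme, with hypothetical names. `scheduleNextSlice` uses setTimeout() above a 4ms threshold and otherwise just yields the thread. The yield uses native setImmediate where it exists (Node, IE10); a browser would substitute a shim such as the NobleJS one, and the setTimeout(fn, 0) fallback here is only a crude stand-in. The deterministic `simulate` function then shows the averaging behavior: yields do not pause, so the computed delay grows until setTimeout() takes over.

```javascript
// Sketch of the hybrid yield (hypothetical names).
const THRESHOLD_MS = 4;

const yieldNow =
  typeof setImmediate === "function"
    ? (fn) => setImmediate(fn)
    : (fn) => setTimeout(fn, 0); // crude fallback; subject to the 4ms clamp

function scheduleNextSlice(delayMs, runSlice) {
  if (delayMs > THRESHOLD_MS) {
    setTimeout(runSlice, delayMs); // real throttling
    return "setTimeout";
  }
  yieldNow(runSlice); // let I/O and timer callbacks run, then resume at once
  return "yield";
}

// Deterministic illustration: totals, not per-slice figures, drive the decision.
function simulate(slices, emulatedPerSlice, realPerSlice) {
  let emulated = 0;
  let elapsed = 0;
  const log = [];
  for (let i = 0; i < slices; i++) {
    emulated += emulatedPerSlice;
    elapsed += realPerSlice;
    const delay = emulated - elapsed;
    if (delay > THRESHOLD_MS) {
      log.push("setTimeout");
      elapsed += delay; // assume the timer honors the delay exactly
    } else {
      log.push("yield");
    }
  }
  return log;
}

console.log(simulate(6, 2, 0.5).join(" "));
// yield yield setTimeout yield yield setTimeout
```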

With this new approach to throttling in place, the emulator is still running a little slower than a real B5500, but only by 7-8%. Further improvements in apparent performance will probably require detailed tuning of clock accumulation in the individual instructions. That can wait. In any case, B5500 instruction timings are difficult to model, because the Processor overlapped execution and memory access whenever it could, and the crossbar memory access mechanism in Central Control could generate random delays due to conflicting access to a memory module by multiple Processors and I/O Units.

Another outstanding problem in performance tuning is that I/O times appear to be four to eight times longer in the emulator than my listings from 1970 indicate they should be. I/O time is essentially channel time -- the MCP records the time when an I/O request is initiated against an I/O unit, and again when the I/O complete interrupt is serviced. The difference between the two times is accumulated as I/O time for the requesting job. I suspect that this may be another problem with setTimeout() granularity, or possibly the IndexedDB mechanism we are using for disk I/O is just plain slow. [Correction as of 2013-10-23] Oops -- I am completely wrong about this. The program I have been using in timing tests outputs results at several points during its execution. I had thought that the times were differential between the output cases, but it turns out they are cumulative. When I compute the differences between the cases, the emulator's I/O times are actually lower than those on the 1970 listing, by almost half. The exception is the first output case, where the I/O time is substantially higher. That might be due to the overhead of the AUTOPRINT spooler printing the output of the program's compilation while that first case is running.
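Both halves of that paragraph reduce to simple bookkeeping. The sketch below uses hypothetical names: `JobIOClock` charges each request from initiation to the servicing of its completion interrupt, and `perCaseTimes` turns the cumulative totals printed at each output point into per-case figures. The sample totals are made up for illustration and are not from the 1970 listing.

```javascript
// Hypothetical names throughout. MCP-style accounting: I/O time is the interval
// from initiating a request to servicing its I/O-complete interrupt, summed per job.
class JobIOClock {
  constructor() {
    this.totalMs = 0;
    this.started = new Map(); // request id -> initiation time
  }
  initiate(requestId, nowMs) {
    this.started.set(requestId, nowMs);
  }
  complete(requestId, nowMs) {
    const t0 = this.started.get(requestId);
    if (t0 === undefined) return; // unknown request: nothing to charge
    this.totalMs += nowMs - t0;
    this.started.delete(requestId);
  }
}

// The listing reports the running total at each output point, so the
// per-case figure is the difference between successive totals.
function perCaseTimes(cumulative) {
  return cumulative.map((t, i) => (i === 0 ? t : t - cumulative[i - 1]));
}

const clock = new JobIOClock();
clock.initiate("read-1", 100);
clock.initiate("print-1", 105);
clock.complete("read-1", 130);  // 30ms charged
clock.complete("print-1", 145); // 40ms charged, overlapping the read
console.log(clock.totalMs);     // 70

console.log(perCaseTimes([12, 15, 19, 22])); // made-up totals -> [ 12, 3, 4, 3 ]
```

Because overlapping requests are each charged in full, accumulated I/O time can exceed the job's elapsed time.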

Performance of the emulator also depends somewhat on the underlying platform. Firefox and Chrome remain the two browsers we have found that support the features the emulator needs to run. Apple Safari through 6.0 does not yet support IndexedDB, although the emulator should work in Firefox on a Mac. It does not work on Microsoft Internet Explorer through IE10. We have not yet tried Opera.
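A page can test for the features it needs before starting. This is a hypothetical sketch: `indexedDB` and `setImmediate` are real global names, but which features the emulator actually checks, and how it reports them, are assumptions.

```javascript
// Hypothetical feature check run before starting the emulator. The global
// names are real; the choice of features to test is an assumption.
function checkBrowserFeatures(global = globalThis) {
  return {
    indexedDB: typeof global.indexedDB === "object" && global.indexedDB !== null,
    setImmediate: typeof global.setImmediate === "function",
  };
}

const features = checkBrowserFeatures();
if (!features.indexedDB) {
  console.log("IndexedDB unavailable: the disk subsystem cannot be created.");
}
```

A missing setImmediate is not fatal, since a shim can supply it, but a missing IndexedDB leaves no place to keep the disk subsystem, which is why Safari 6.0 and earlier cannot run the emulator.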

I have tried the emulator on a variety of Windows systems using Firefox, and it runs everywhere I have tried. It runs well, if slightly slower than on my Optiplex 390, on a five-year-old Dell D830 with a 2GHz Intel Core 2 Duo T7250 under 32-bit Windows 7, and it also runs, though significantly slower, on an eight-year-old Dell Optiplex GX520 with a 3GHz Pentium 4 under Windows XP. [Clarifications added 2013-10-26] It even runs acceptably on a four-year-old HP Mini netbook with an Atom processor, also running under XP. The big problem with the Mini is not performance, but that the screen is too small to see much more than one of the windows at a time. On laptops, the best performance has been observed when they are on external power. The performance is noticeably slower when running on battery, or on a travel charger with a lower wattage rating than the standard charger.

Other Participants

One of the gratifying things about this project is the interest that other people have shown in it. We have been somewhat surprised at the number of people who have picked up the emulator and started using it, without much apparent difficulty, and only then let us know what they were doing. A few of those people have become more intensely involved with the project:
  • Fausto Saporito of Naples, Italy has been an early user of the emulator, and has contributed a number of FORTRAN benchmarks, including Whetstone and an arctangent program that appears to do a good job of measuring floating-point loss of significance for a processor. Fausto has also single-handedly transcribed the Mark XVI FORTRAN source from the listing on bitsavers.org. The current version is in the project's Subversion repository on Google Code.
  • Tim Sirianni of Eureka, California, US, stunned us by reporting that he had the TS (timesharing) MCP running and was using the CANDE timesharing editor, sort of. We don't have datacom working yet in the emulator, which is where the "sort of" comes in. Tim found a way to use the SPO as a CANDE terminal, but it is awkward to use, and not an approach for the less-than-determined.
  • Paul Cumberworth of Adelaide, Australia has transcribed the patches for our Mark XVI ESPOL compiler source and gotten those to compile with the base source. These are also available in our Subversion repository.
Out of the blue, Ed Vandergriff of Chaska, Minnesota, US, wrote me in late August to say that he had a listing of the APL interpreter for the B5500, created by Gary Kildall (of CP/M fame) and others at the University of Washington in the early 1970s. He asked if we were interested in it. I had not been aware of the existence of this interpreter, but Nigel had been looking for a copy of it for some time without success. Ed generously sent me the listing, which is actually a first-generation photocopy taken from a line-printer listing. We have scanned it for our use, and will eventually donate the original to a museum for long-term preservation.

A copy of the scanned listing is available on our hosting site at http://www.phkimpel.us/PickUp/APL-B5500-Listing-19710111.pdf. It is about 44MB in size.

Fausto and Hans Pufal of Angouleme, France, have volunteered to transcribe the APL listing. Hans has helped us before, having previously transcribed the Mark XVI source code we used to create our ESPOLXEM cross-compiler. Fausto is starting from one end of the scanned APL listing and Hans from the other. At last report, they had only 20 pages to go until they meet in the middle, à la the Mont Blanc tunnel. Their progress to date is available in the Subversion repository. Once their transcription is complete, it will need to be proofread and corrected before it can be used. We will also need to get datacom working in the emulator.

Current Efforts

We have some known problems and a couple of high-priority features requiring attention.
  1. A proper datacom interface will be required to run the TSMCP and CANDE, as well as the APL interpreter. I am currently working on a very basic, one-terminal implementation of the B249 Data Transmission Control Unit and B487 Data Transmission Terminal Unit. Supporting external terminals in a browser environment is extremely difficult (browsers are quite determined to be clients, not servers), so this initial implementation will simply host a single terminal as a user interface to the B249/B487, somewhat similar to the way the SPO currently works. That should be adequate for most users. We hope to have this feature available soon.
  2. After datacom, the next priority is support in the emulator for magnetic tapes. We think we know how to approach this, but detailed design work has not yet begun.
  3. The B5500 could support two processors, but our attempts to get the second processor working have thus far been a failure. This has been especially frustrating, because the differences between P1 (the control processor, which is currently working) and P2 (which could only run Normal-State user programs under control of P1) are very minor. In fact, the two processors on a real B5500 were physically identical, and either one could be designated as P1 by means of a mechanical switch. I have made three serious runs at this problem, most recently last weekend, and come up short each time. I made some progress this last time, finding a problem in the way P2 was handled by the SFI (Store for Interrupt) instruction. With that change, P2 now runs for a few seconds before somehow failing. The problem is obviously subtle, and is proving difficult to trap, even with special code inserted into the emulator to do so. Getting P2 to work is a relatively low priority, so this problem has been set aside for now.
  4. The other major deficiency in the Processor implementation at present is that the double-precision arithmetic operators have never been finished. Their single-precision equivalents are currently standing in for them. One of Fausto's benchmarks requires double precision, and the compilers require double precision to compile double-precision literals properly, so the priority of this issue is rising.
We look forward to hearing from everyone who is using the emulator, or at least trying to. Please let us know why you are interested in the B5500 and what you are doing with the emulator. We are anxious to hear your comments and suggestions, and we especially want to hear about any problems or bugs that you encounter. Either comment on the blog, or contact me privately at paul (dot) kimpel (at) digm (dot) com. If you have a Google account, you can also post issues on our Google Code project site (since moved to GitHub) at http://code.google.com/p/retro-b5500/issues/list.




____________
¹ Has anyone else noticed that in the days of the B5500, "core" meant memory, but now it means processor?

Monday, June 3, 2013

It's Alive...

After almost six months of silence, this blog is back. You can code or you can blog, and since the last post in December, we have been coding -- and then, of course, debugging what we have been coding. The central components of the B5500 emulator are essentially finished and are starting to work. There is more to be done for I/O, we have ideas for a richer user interface and display of system state, and there are certain to be more bugs that have not yet been discovered, but in a technical sense, this project is over the hump -- the emulator is running, it has successfully halt/loaded (i.e., booted) a version of the B5500 Datacom MCP, and it is able to run a few programs natively, including the Algol compiler.

 

A Brief Background

As we have reported in previous posts, Nigel Williams and I started this project in early 2012, after having talked about it for several months previously. When we first started talking about it, I thought we would program the emulator in Java, or perhaps something like Python. Nigel stunned me by suggesting that we implement the emulator within a web browser, and do the programming in Javascript. To be honest, I initially didn't think this would be possible, but Nigel pointed to some existing browser-based emulators, and eventually won me over. The emulator is 100% Javascript and currently runs within a couple of standard web browsers.

We started with pretty good references for the hardware architecture, but no machine-readable source or object code for the Burroughs system software. Having an emulator for a system but no software to run on it is not much fun. There were, however, a number of scans of listings of B5500 source code available on the bitsavers.org web site, so it appeared that our only option was to transcribe the source code manually from those scans and then somehow figure out how to bootstrap that into the emulator environment.

Nigel had hand-transcribed the Mark XVI Extended Algol compiler source from such a listing the year before we started, and Hans Pufal in France had made a similar transcription of the Mark XVI ESPOL compiler and given it to Nigel. A little more than a year ago, I decided that I needed to do my part, so I started transcribing the Mark XVI Datacom MCP source. I'm still at it.

Since we thought the only way we were going to have system software for the B5500 was to transcribe the source from listings, we were going to need a way to bootstrap object code from that source. Thus, last year I took the manually-transcribed Extended Algol and ESPOL compiler sources and ported them to the Algol variant used by the modern successor to the B5500, Unisys ClearPath MCP systems. That effort was successful, producing what we term the ALGOLXEM and ESPOLXEM cross-compilers, but we were going to need more source code -- lots of it. Alas, it appeared it was going to take us a couple of years' work to prepare enough source code from the scanned listings to be able to make our emulator run.

Despite not having system software in a usable condition, and little hope for having much for a long time, we nonetheless started coding for the emulator itself exactly a year ago, over the (U.S.) Memorial Day weekend at the end of May 2012.

 

And Then the Most Amazing Thing Happened...


Sid McHarg of Seattle, Washington, contacted me in early July 2012. I knew Sid slightly from the annual Unisys user conferences. It turns out that Sid is an old B5500 hand, and over the prior year had been working on a B5500 emulator of his own, which he had recently gotten working. Our surprise at finding that someone else was crazy enough to try to emulate this old machine was matched by Sid's surprise that our project existed. That was surpassed only by our surprise to learn that not only did Sid have a working emulator, he had software!

Sid started out in the same position with respect to software (or rather, the lack thereof) that we were in, but Sid had something we didn't -- a set of 7-track Burroughs Mark XIII release tapes from 1971 that had been sitting on a shelf for 40 years. Amazingly, Sid found someone who had a 7-track drive that still worked, and even more amazingly, his tapes proved to be readable after all this time. He was in possession of a complete set of machine-readable source and object files for the B5500 system software, as they existed just past the apogee of the B5500's productive life.

Sid generously started working with Unisys on a license to share this data with others. We felt obliged to keep quiet about this while Sid was negotiating with Unisys, and since we did not know when or if that software might be available to us, we kept trudging along on our original path. We were eventually able to obtain raw images of Sid's tapes in late October 2012, and have recently acquired our own license for the Mark XIII software from Unisys.

 

The Road to First Halt/Load

With usable system software now in hand, the priority in the project shifted late last year from source transcription to getting the emulator working. I started working on I/O in early December, and by January of this year had initial implementations for the Input/Output Units, Head-per-Track disk, and SPO (supervisory keyboard/printer). I also built a web-based testbed to exercise individual instructions, which eventually evolved into the B5500SyllableDebugger, a basic debugging environment that could load and run whole programs.

Using our ESPOLXEM cross-compiler, I wrote a program to exercise the Character Mode instruction syllables for the system, used the SyllableDebugger to test those instructions, and found and fixed many bugs in the process. I am not very good at writing these kinds of tests, however, so I started casting about for an existing program I could use to debug more of the Word Mode side of the instruction set. I settled on the KERNEL, a small bootstrap program that was typically stored on disk. The hardware load mechanism would load and initiate KERNEL, which in turn would bring in the MCP's initialization code to finish the system boot process.

Booting the MCP is termed a "halt/load," after the two buttons on the operator's console that had to be pressed in series to accomplish the act. That term is still used in the modern Unisys MCP systems, even though the physical buttons disappeared long ago.

The KERNEL proved to be a very good vehicle for testing out a significant part of the instruction set. I compiled its source with ESPOLXEM to get an object file and a listing that included the generated machine instructions. I would then load the object file into the SyllableDebugger and single-step through the code, inspecting the processor registers and memory at each step for proper operation. Working this way I was able to identify and correct several problems in the processor portion of the emulator.

As part of the implementation of disk I/O earlier, I had written a standalone utility, B5500ColdLoader, that would initialize the disk subsystem, prepare a skeleton directory structure, and load files from the binary images of the tapes we had obtained from Sid. The standard B5500 software release had a couple of standalone programs that were loaded from cards to perform these functions -- COLD to initialize the disk subsystem and create an empty directory, and TAPEDSK to copy files from Library/Maintenance tapes (an MCP tape format used to dump and restore disk files, somewhat similar to Unix tar). We did not have the card reader working yet, though, and the HTML5 IndexedDB mechanism we were using to implement persistent storage within the browser for the B5500 disk subsystem required some initialization and setup of its own, so building a custom tool to do these initialization functions seemed like the best idea.

The ColdLoader and SyllableDebugger are just HTML files with a bunch of Javascript embedded in them, and you run these utilities from within a standard web browser, just as with the emulator itself.

Thus, by mid-February, when I was single-stepping through KERNEL with the SyllableDebugger, we already had a disk subsystem in place, with some of the system files, including the Datacom MCP, loaded into it. Over a period of several days, I was able to progress through the instructions in KERNEL, finding and fixing problems as I went, eventually arriving at the point where the program reads sector 0 from EU (disk Electronics Unit) 0 to get the MCP bootstrap address, which had been placed there (correctly, I hoped) by the ColdLoader utility. That eventually worked, so I continued stepping into the code that actually reads the MCP initialization segment.

Hardware load operations finish by branching to memory location @20 (@ indicating an octal literal in ESPOL), so the ESPOL compiler generates object code on the assumption that execution will start at that address. The MCP cannot be read directly into that address, however, since KERNEL (which is about 250 words in size) was loaded there and is still running in that area of memory. Therefore, KERNEL reads the MCP code into the next available 4KW memory module, generally at address @10000. Disk reads are limited to 63 sectors (1890 words) each, so the MCP initialization segment is read in three chunks into addresses @x4236, @x0474, and @x0020 (x being the module's leading octal digit), in that order. Disk I/Os take their disk address from the first word of the memory buffer, so the backwards sequence allows each read to overlay the disk address word of the prior one.
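
The address arithmetic above is easy to check. The short Javascript sketch below takes the chunk addresses from the text, assumes the module digit x is 1 (the usual @10000 base), and computes the distance between successive chunks; it does not model the disk I/O itself. Javascript's 0o prefix plays the role of ESPOL's @.

```javascript
// ESPOL writes octal literals as @4236; Javascript writes them as 0o4236.
const oct = (n) => "@" + n.toString(8);

// Disk reads are limited to 63 sectors, or 1890 words, so a sector is 30 words.
const MAX_SECTORS_PER_READ = 63;
const WORDS_PER_SECTOR = 1890 / MAX_SECTORS_PER_READ;
const MAX_WORDS_PER_READ = MAX_SECTORS_PER_READ * WORDS_PER_SECTOR;

// Assumption: the MCP is read into the module based at @10000 (x = 1).
const MODULE_BASE = 0o10000;

// Chunk addresses in the order KERNEL issues the reads, highest first.
const readOrder = [0o4236, 0o0474, 0o0020].map((offset) => MODULE_BASE + offset);

// Distance from each chunk's address up to the chunk that was read before it.
const gaps = readOrder.slice(1).map((addr, i) => readOrder[i] - addr);

console.log(readOrder.map(oct)); // [ '@14236', '@10474', '@10020' ]
console.log(MAX_WORDS_PER_READ); // 1890
console.log(gaps);               // [ 1890, 300 ]
```

The distance between the first two addresses is exactly one maximum-size read, and the distance between the last two is 300 words, or 10 sectors.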

After reading the MCP initialization code into the higher memory addresses, KERNEL deposits a small in-line Character Mode routine, in raw machine code, into addresses @15-17, and branches to it. That routine simply slides 4042 words starting at address @x0040 down to address @20, overwriting the memory used by the rest of KERNEL. The routine exits back to Word Mode while executing the last syllable at address @17, thus cleverly falling into the first syllable of the MCP initialization code that now resides at address @20.
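
The hand-off works because the little routine lives just below the region it overwrites. Here is a minimal Javascript sketch of the effect of that block move, assuming memory is a plain array of 32K words and that the MCP was read into the module based at @10000; the constants come from the text, while destinationSparesRoutine is a hypothetical name. It models only the result of the slide, not the Character Mode syllables that perform it.

```javascript
// Hypothetical model of the KERNEL hand-off: a tiny routine at @15-@17 moves
// the MCP initialization code down to @20, then falls into it.
const MEMORY_WORDS = 0o100000;                   // 32K words of core
const memory = new Array(MEMORY_WORDS).fill(0);

const ROUTINE_FIRST = 0o15, ROUTINE_LAST = 0o17; // where the slide routine lives
const SRC = 0o10040;                             // assumed: module based at @10000
const DST = 0o20;                                // hardware-load entry point
const COUNT = 4042;                              // words moved, per the text

// The routine must not lie inside the region it overwrites, or it would
// destroy its own code partway through the move.
function destinationSparesRoutine(dst, count) {
  const dstLast = dst + count - 1;
  return ROUTINE_LAST < dst || ROUTINE_FIRST > dstLast;
}

// Fill the source area with marker values standing in for MCP code words.
for (let i = 0; i < COUNT; i++) memory[SRC + i] = 1000000 + i;

if (!destinationSparesRoutine(DST, COUNT)) throw new Error("routine would overwrite itself");

// Array.prototype.copyWithin(target, start, end) moves a block within the array.
memory.copyWithin(DST, SRC, SRC + COUNT);

console.log(memory[DST], memory[DST + COUNT - 1]); // 1000000 1004041
```

Since the destination starts at @20, one word past the routine's last word at @17, the move can overwrite all of KERNEL without disturbing the code doing the moving. copyWithin would also cope with overlapping ranges, although here the source and destination do not overlap.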

So there I was, having started out intending just to debug some instructions, but now sitting there with the first word of actual Mark XIII MCP code loaded into the emulator's processor registers, poised to execute a LITC @704 syllable that had been generated in 1971. I thought, "Why not?" and kept right on stepping into the MCP. The date was Saturday, 2 March 2013.

Much of my spare time over the next couple of weeks was taken up with continuing to step through the MCP initialization segment, which is quite large. It consists of the INITIALIZE procedure itself, at about 570 words, plus the kernel MCP routines for allocating memory and doing disk I/O, plus DIRECTORYBUILDER, which implements the complex process of "complementing" the disk directory to determine which areas of disk are not allocated and to build the tables that manage available disk space. Not surprisingly, I found and fixed quite a few more emulator bugs during this process.
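
In general terms, "complementing" a directory means taking the disk areas that files occupy and deriving the gaps between them. The Javascript below is a generic sketch of that computation, not DIRECTORYBUILDER's actual method; the {start, length} extent format, the function name complementExtents, and the sample numbers are assumptions made for illustration.

```javascript
// Hypothetical sketch: derive free extents as the complement of allocated
// extents on a disk of `totalSegments` segments. Extents are {start, length}.
function complementExtents(allocated, totalSegments) {
  // Sort by starting segment so the gaps can be found in a single pass.
  const sorted = [...allocated].sort((a, b) => a.start - b.start);
  const free = [];
  let next = 0;                        // first segment not yet accounted for
  for (const { start, length } of sorted) {
    if (start > next) {
      free.push({ start: next, length: start - next });
    }
    // Overlapping or adjacent extents simply push the frontier forward.
    next = Math.max(next, start + length);
  }
  if (next < totalSegments) {
    free.push({ start: next, length: totalSegments - next });
  }
  return free;
}

const files = [
  { start: 500, length: 100 },
  { start: 0, length: 200 },
  { start: 250, length: 50 },
];
const freeSpace = complementExtents(files, 1000);
console.log(freeSpace.map((e) => `${e.start}+${e.length}`).join(", "));
// 200+50, 300+200, 600+400
```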

On 13 March, I saw the first halt/load message to come out of the emulator. This screen shot captured the moment, with the SyllableDebugger and emulated SPO shown running in Mozilla Firefox:
retro-B5500 first halt/load message
This message comes out at the point after most memory tables have been initialized and just before initialization of the disk tables begins. The text may be difficult to make out clearly. It reads:
-H/L WITH MCP/DISK MARK XIII MODS RR@@RR7#H13!4:7#H13!4:@@-
The string of odd characters near the end of the line is junk and should not be there. It was due to a bug in Character Mode that did not reset the BROF flip-flop when a memory transfer ended on a word boundary. That bug was easily found and fixed. There were several more emulator bugs as I got further into the disk initialization code, but by the 18th it appeared I had gotten all the way through it, and the SPO now produced this:
-H/L WITH MCP/DISK MARK XIII MODS RR@@RRRR-
 SYSTEM/LOG REMOVED
DKA EU0 SU 1,2,3,4,5, EU1 SU 1,2,3,4,5 WENT READY
 TIME IS 0000
 DATE IS WEDNESDAY, 6/29/10
#DT PLEASE
This was very encouraging. The date was not what it was supposed to be, but alas, the SPO would not respond to any input, so I could not change it. It turned out that the MCP was looping endlessly in its NOTHINGTODO loop, and the cause was eventually traced to a bug in handling the F register during accidental entries (thunks). That was easily fixed, and on the 19th I was finally able to see a reasonable halt/load completion with the MCP responding to SPO commands. In the transcript below, the lines that begin with a command (DT, TR, PD, MX, CU, and OL) are my inputs to the system; everything else is output from the MCP:

-H/L WITH MCP/DISK MARK XIII MODS RR@@RRRR-
 TIME IS 0026
 DATE IS WEDNESDAY, 6/29/10
#DT PLEASE
#TR PLEASE
DT 3/19/83
INV KBD DT 3/19/83
DT 03/19/83
INV KBD DT 03/19/83
DT 3-19-83
INV KBD DT 3-19-83
DT 031983
INV KBD DT 031983
DT 3/19/73
INV KBD DT 3/19/73
TR 2128
 TIME IS 2128
DT 3/19/83
INV KBD DT 3/19/83
PD =/=
 INT/DISK
 DUMP/ANALYZE
 PBD/SYSNOTE
 ALGOL/DISK
 LDCNTRL/DISK
 MCP/DISK
 ESPOL/DISK
 LIBMAIN/DISK
 PRNPBT/DISK
MX
 NULL MIX

CU
0:MCP/DISK= 0:SAVE=13485 OLAY=3649
TOTAL MEM IN USE= 17134

OL DKA
 DKA SCRATCH
OL SPO
 SPO SCRATCH
OL LPA
 LPA NOT READY
The problem with the DT command (which sets the system date) turned out to be another Character Mode bug, which was fixed the next day. The PD command lists the files in the disk directory (these were all put there by the ColdLoader utility). The MX command lists the tasks in the "mix" -- the set of currently-running jobs. CU prints the memory in use by all tasks in the mix. OL reports on the status of a specified peripheral unit -- DK for a disk controller, SPO for the SPO, and LP for a line printer.

I had long wanted Nigel to share in the experience of the emulator's first successful halt/load, but was feeling a little guilty about that at this point, because I had been slowly, manually, doing the first halt/load all by myself inside the SyllableDebugger. I had been keeping him posted on my progress by email, but there wasn't much else to be done about it, as Nigel is in Hobart, Tasmania, while I am in San Diego, California -- 12,840 km (almost 8,000 miles) away. At that time of the year we are 18 time zones apart, so most days he is going to bed just as I am getting up. This does not make for close collaboration.

It was time to try halt/loading the emulator on its own, though, so Nigel and I arranged to meet on Skype the following weekend. I wired up the Load button on the console UI to the load function in the emulator's CentralControl module, and on 22 March we watched together as our emulator booted the Mark XIII MCP all by itself for the first time.

Life After Halt/Load

This project has been an exercise in deferred gratification, and that first completely autonomous halt/load was certainly both deeply gratifying and a significant milestone for the project, but things were far from perfect. Some SPO commands (e.g., CD) crashed the system, and our initial attempts to run programs under the MCP were completely unsuccessful.

I was busy with work and travel during April, so I spent far less time on the project that month than in the previous ones. I managed to chip away at several problems, however, and by the end of the month was able to see one program (XREF/JONES) run to a successful completion. There were more bugs in the emulator, including a very serious one where the Program Release (PRL) operator was looking at the wrong bit in the I/O Descriptor word to determine whether to cause a Program Release or a Continuity Bit interrupt. The consequence of that was that I/Os would not get initiated against empty buffers, which in turn caused programs to wait for I/O completion interrupts that would never occur. There were a couple of problems with the way that the ColdLoader utility was initializing the disk, one of which was the cause of the problem with the CD command. Then there were two problems for which the root cause was determined to be that Paul was no longer a competent B5500 operator. The solution for that was to RTFM, only more carefully this time.

XREF/JONES was an interesting program -- sort of a Swiss Army knife for B5500 source code -- and I had been very happy to see it included in the files on Sid's tape images. As its name implies, it could generate an identifier cross-reference, but could also do flowcharting and generate a block/procedure structure outline. It was also a basic text formatter, along the lines of nroff, and could even extract and format documentary comments from source programs, making it a very early form of Javadoc processor.

XREF/JONES was controlled by "$" pragmas embedded in the source code, but since we did not yet have any way to input data into the system (the SPO is an operations console, not a user terminal), the program was running in its default mode of simply listing its input file, which in our initial tests was its own 14K-line source.

Once we got past the initial post-halt/load problems noted above (especially the one concerning the incompetent operator), XREF/JONES started to show very bizarre behavior. Sometimes it would finish successfully. Sometimes it would abort with a disk addressing error. Other times it would abort with an Integer Overflow or Invalid Address interrupt. The remaining times the system simply crashed, sometimes with an Invalid Address message as its last gasp, and sometimes with no indication at all.

Resolving this took three long, agonizing weeks in May. There were a number of problems, including some missing or misplaced break statements and missing "this." prefixes on method calls -- all really bad news when programming in Javascript. I also reworked the implementation for IP1 (Initiate Processor 1), which is what does all of the register setup to assign a task to a processor, as I had never been happy with the way I had coded it originally.
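
For anyone who has not been bitten by these, the contrived Javascript below shows both pitfalls. The names (decodeBuggy, register, clear, and so on) are hypothetical and not taken from the emulator. Neither mistake produces an error message: the missing break quietly falls through into the next case, and the missing "this." quietly calls a different function of the same name that happens to be in scope (if none were in scope, it would throw a ReferenceError at run time instead).

```javascript
// Pitfall 1: a missing `break` falls through into the next case.
function decodeBuggy(op) {
  const actions = [];
  switch (op) {
    case 1:
      actions.push("ADD");
      // missing break: execution continues into case 2
    case 2:
      actions.push("SUB");
      break;
    default:
      actions.push("NOP");
  }
  return actions;
}

function decodeFixed(op) {
  const actions = [];
  switch (op) {
    case 1:
      actions.push("ADD");
      break;
    case 2:
      actions.push("SUB");
      break;
    default:
      actions.push("NOP");
  }
  return actions;
}

// Pitfall 2: without `this.`, a call resolves to whatever function of that
// name is in the enclosing scope, not to the object's own method.
function clear() { return "outer clear()"; }

const register = {
  clear() { return "register.clear()"; },
  resetBuggy() { return clear(); },       // missing "this." prefix
  resetFixed() { return this.clear(); },
};

console.log(decodeBuggy(1), decodeFixed(1)); // [ 'ADD', 'SUB' ] [ 'ADD' ]
console.log(register.resetBuggy());          // outer clear()
console.log(register.resetFixed());          // register.clear()
```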

The biggest fly in the ointment, though, which I noticed entirely by accident while looking for something else, was an error in the way the processor traced MSCWs (Mark Stack Control Words) during procedure exit. Those who are familiar with the B5500 will recall that the MSCW stored the state of the F register, MSFF (Mark Stack Flip-Flop) and SALF (Subprogram-Level Flip-flop). During procedure exit, the processor had to follow the F-register links in the MSCWs backwards in the stack until it found the first one where MSFF was not set, in order to restore that word at address R+7 in memory.

Alas, I had written this mechanism to stop when either the MSFF or the SALF bit in the word was not set, which sometimes resulted in the wrong word being restored (especially when a procedure was called from MCP global code in Control State), and that in turn messed up the (R+7) relative addressing mode within the processor. That relative addressing mode is used infrequently, but the results were catastrophic and extremely difficult to trace back to the source. This wasn't actually a bug, but a misreading of how procedure exit was supposed to work, not that it made me feel any better about it once I discovered what the problem was.
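
The difference between the intended stopping rule and the one I had coded can be shown with a much-simplified model. In the Javascript sketch below, each MSCW is reduced to three fields (the saved F-register link, MSFF, and SALF), memory is a Map from addresses to those records, and findRestoreWord and the two rule functions are hypothetical names. Real MSCWs pack these fields into a single word, and real procedure exit does a great deal more than this.

```javascript
// Simplified, hypothetical model: address -> {f, msff, salf}, where `f` is
// the saved F-register link to the previous MSCW in the stack.
function findRestoreWord(memory, f, stopRule) {
  let addr = f;
  let word = memory.get(addr);
  // Follow the F-register links backwards until the stop rule is satisfied.
  while (word !== undefined && !stopRule(word)) {
    addr = word.f;
    word = memory.get(addr);
  }
  if (word === undefined) throw new Error("broken MSCW chain at @" + addr.toString(8));
  return addr;
}

// Intended rule: stop at the first MSCW whose MSFF is not set.
const correctRule = (mscw) => !mscw.msff;

// Mistaken rule: stop when either MSFF or SALF is not set.
const mistakenRule = (mscw) => !mscw.msff || !mscw.salf;

// Sample chain: the most recent MSCW has MSFF set but SALF clear.
const memory = new Map([
  [0o500, { f: 0o400, msff: true, salf: false }],
  [0o400, { f: 0o300, msff: true, salf: true }],
  [0o300, { f: 0, msff: false, salf: true }],
]);

console.log(findRestoreWord(memory, 0o500, correctRule).toString(8));  // 300
console.log(findRestoreWord(memory, 0o500, mistakenRule).toString(8)); // 500
```

With the mistaken rule the walk stops too early, so anything that depends on the word it finds, such as the R+7 addressing described above, gets the wrong value.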

With that problem resolved, the emulator suddenly started running much more reliably. Not only was I able to run XREF/JONES to a successful completion, but last weekend all of the following worked:
  • Compile a small Algol source file, SYMBOL/TAPCOPY, used to copy tapes. The compiled code is bit-identical to the version on Sid's tape image.
  • Compile XREF/JONES with the Algol compiler. The 14K-line source compiled in about 13 minutes elapsed time with default compile options, including generation of a listing.
  • Run the newly-compiled XREF/JONES against another small source file, SYMBOL/DSKDUMP.
  • Compile the Extended Algol compiler with itself. This source is 11K lines and compiled in about nine minutes elapsed time.
  • Compile SYMBOL/TAPCOPY with the new version of the compiler. The output of that is also bit-identical to the original object file.
This is hardly a scientific test, but it demonstrated an order of magnitude more capability and reliability than we had seen before. We are now ready to submit the emulator to more demanding tests, but in order to do that we need to implement more I/O capability.

Current State and Next Steps

The emulator consists of a number of Javascript objects:
  • CentralControl, which serves as the node through which all inter-component communication takes place. It has the memory exchange, allowing multiple processors and I/O Units (channels) to access memory. It also prioritizes interrupts, hosts the interval timer, initiates I/Os, and controls the system-load functions.
  • Memory Modules. These are actually implemented as a part of CentralControl. There can be up to eight 4K-word mods, for a total of 32K words on a system. We are currently running six mods, with a hole in the @2xxxx-@3xxxx address range to test the MCP's ability to configure around missing mods. The MCP handles that just fine.
  • Processor, which implements the instruction set. There can be two processors, but we have only tested the system with one thus far. The mechanisms to support a second processor have been implemented, but never tested.
  • I/O Unit, which is basically a DMA device that asynchronously manages input/output operations. There can be up to four of these on a system. We are currently running with two units, but need to test varying numbers. CentralControl is designed to adapt to the actual number automatically.
  • Peripheral controls, or drivers. With the exception of disk and datacom, these were implemented within the I/O cabinet and connected to the I/O Units through a peripheral exchange.
At present, we consider CentralControl and Memory to be functionally complete. Some additional work may be needed to support the maintenance display panels, but otherwise they are done.
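
Because each module holds @10000 (4096) words, the leading octal digit of a 15-bit address selects the module, which is why the two missing mods show up as the @2xxxx-@3xxxx range. A minimal Javascript sketch of that decoding with the six-mod configuration described above follows; the present array and the decodeAddress function are hypothetical and are not taken from CentralControl.

```javascript
// Hypothetical memory-module decoding for a 32K-word address space:
// eight modules of @10000 (4096) words each.
const WORDS_PER_MODULE = 0o10000;
const MODULE_COUNT = 8;

// Six modules present, with a hole where modules 2 and 3 (@20000-@37777) would be.
const present = [true, true, false, false, true, true, true, true];

function decodeAddress(addr) {
  if (addr < 0 || addr >= WORDS_PER_MODULE * MODULE_COUNT) {
    throw new RangeError("address outside the 15-bit range");
  }
  const mod = addr >> 12;                        // leading octal digit
  const offset = addr & (WORDS_PER_MODULE - 1);  // remaining four octal digits
  return { mod, offset, present: present[mod] };
}

console.log(decodeAddress(0o20000).present);                   // false
console.log(decodeAddress(0o47777).mod);                       // 4
console.log(present.filter(Boolean).length * WORDS_PER_MODULE); // 24576
```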

The Processor is complete except for the double-precision multiply and divide operators. At present those have been stubbed out with their single-precision equivalents. Some debugging will likely be needed to get a second processor to work, but we won't know that until we try.

We currently have the SPO and Head-per-Track disk peripherals working. A small amount of coding needs to be done in the I/O Unit object to support new peripheral types, and a separate object must be created to support each peripheral type.
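
One way to picture that division of labor is a small registry in which the I/O Unit looks up a separate driver object for each peripheral unit. The Javascript below is only a sketch of that arrangement; the IOUnit and CardReader classes, their methods, and the CRA unit name (in the style of DKA and LPA above) are hypothetical and are not the emulator's actual interfaces.

```javascript
// Hypothetical sketch: the I/O Unit looks up a driver object per unit name
// and delegates the operation to it.
class IOUnit {
  constructor() {
    this.drivers = new Map();            // unit name -> driver object
  }
  attach(unit, driver) {
    this.drivers.set(unit, driver);
  }
  read(unit) {
    const driver = this.drivers.get(unit);
    if (driver === undefined || !driver.ready()) {
      return { ok: false };              // no driver, or device not ready
    }
    return { ok: true, data: driver.read() };
  }
}

// One class per peripheral type; each supplies ready() and read().
class CardReader {
  constructor() {
    this.hopper = [];                    // card images waiting to be read
  }
  load(cards) {
    this.hopper.push(...cards);
  }
  ready() {
    return this.hopper.length > 0;
  }
  read() {
    return this.hopper.shift();
  }
}

const iou = new IOUnit();
const reader = new CardReader();
iou.attach("CRA", reader);

console.log(iou.read("CRA").ok);   // false (hopper empty)
reader.load(["CARD 1", "CARD 2"]);
console.log(iou.read("CRA").data); // CARD 1
console.log(iou.read("LPA").ok);   // false (no driver attached)
```

In this arrangement, adding a peripheral type means writing one new driver class and attaching an instance of it, while the I/O Unit's dispatch code stays small.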

Nigel has taken on the task of implementing a line printer control, and I am ready to start work on one for the card reader. With those two in place, we will have a basic way to interact with the system in batch mode. That will be much nicer than typing control card images on the SPO, as we must do now, and will give us the ability to compile and run programs other than those available on Sid's tape images.

Next after that will be a control for tape drives. This will be a bit of a challenge. Getting data from the local file system into a web browser is easy, but getting data out of a browser into the local file system is, by design, extremely difficult. Tapes are input/output devices, and a reel of tape could be read in both forward and backward directions (of which the MCP sort intrinsic took advantage). We may need to import tape reels into a persistent disk mechanism as we currently have for the Head-per-Track disk, and find a way to export data out of that for output tape reels that need to be saved.
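
Reading in both directions is simple to model once a reel's blocks are held in memory. In the sketch below a reel is an array of blocks plus a position between them, and readForward and readBackward each return the block on the corresponding side of that position and move past it. TapeReel and its methods are hypothetical names, and the sketch deliberately ignores how the hardware ordered characters within a block read backward.

```javascript
// Hypothetical in-memory tape reel: `blocks` is an array of records and
// `position` is the gap between blocks where the read head sits.
class TapeReel {
  constructor(blocks) {
    this.blocks = blocks;
    this.position = 0;               // 0 = at the load point
  }
  readForward() {
    if (this.position >= this.blocks.length) return null; // past the last block
    return this.blocks[this.position++];
  }
  readBackward() {
    if (this.position === 0) return null;                 // back at the load point
    return this.blocks[--this.position];
  }
  rewind() {
    this.position = 0;
  }
}

const reel = new TapeReel(["A", "B", "C"]);
const forward = [reel.readForward(), reel.readForward(), reel.readForward()];
const backward = [reel.readBackward(), reel.readBackward()];
console.log(forward.join(""), backward.join("")); // ABC CB
```

This is the kind of access a tape sort can use to read back a run it has just written without rewinding first.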

Datacom will also be a challenge. Not only is the datacom control complex (and from today's perspective, somewhat bizarre), but web browsers are by their nature client environments, and for datacom, the emulator needs to act as a server. Ideally we would like to implement something that current networking technology can connect to, e.g., Telnet NVT, but that may prove to be difficult. The Web Sockets protocol is worth looking into. At worst, we could probably implement some sort of network interface in a web server, and connect to that from the browser-hosted emulator environment using AJAX techniques.

The remaining peripheral types -- card punch, paper tape reader, and paper tape punch -- are much lower on the priority list. Unless we find a compelling reason, we may not implement paper tape devices at all.

We have the small operator console working, but would like to implement a full set of maintenance panels for the system. That is a lot of lights, and one concern is whether the performance will suffer from trying to refresh all of that state in real time.
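
One common way to limit that cost, offered here only as a sketch of one option rather than anything we have built, is to compare each register with the value last displayed and touch only the lamps whose bits changed. The refreshLamps function and its setLamp callback are hypothetical. BigInt is used because Javascript's ordinary bitwise operators work on only 32 bits, while B5500 words are 48 bits.

```javascript
// Hypothetical sketch: refresh only the lamps whose bits changed since the
// last display update. BigInt is used because B5500 words are 48 bits wide.
const WORD_MASK = (1n << 48n) - 1n;

function refreshLamps(previous, current, setLamp) {
  // XOR leaves a 1 in every bit position whose lamp must change state.
  let changed = (previous ^ current) & WORD_MASK;
  let updates = 0;
  for (let bit = 0; changed !== 0n; bit++, changed >>= 1n) {
    if ((changed & 1n) === 1n) {
      setLamp(bit, ((current >> BigInt(bit)) & 1n) === 1n);
      updates++;
    }
  }
  return updates;
}

const changes = [];
const count = refreshLamps(0o1234n, 0o1235n, (bit, on) => changes.push(`${bit}:${on ? "on" : "off"}`));
console.log(count, changes); // 1 [ '0:on' ]
```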

Also on the horizon are ideas to implement the emulator in non-browser environments, such as Mozilla Rhino and Node.js. Those will probably offer richer I/O capabilities, but the user interfaces will need to be completely rethought. We have also discussed implementing the emulator in other languages. Nigel has expressed interest in Google's Dart, and no doubt to his dismay, I am still interested in Java.

It is too early to say much about the performance of the emulator. One of our goals has been to make it run at the speed of a real B5500. We have some rough indications that it is running slower than that, but within 50%. Once we can get our own programs compiled and running within the emulator, we will be in a better position to judge the actual performance, and to understand where some tuning and optimization may be required.

Parts of the foregoing may read as if we think the existing emulator components are fully debugged, but we know that is not the case. Based on the most recent problems we have seen, we suspect we are entering the stage where the easy bugs have been found, and the rest are going to be subtle, difficult to reproduce, difficult to trace, and otherwise downright nasty. The only way to flush those out is to push the system harder, and that is what we intend to do.

As I told Nigel, we have a couple of careers' worth of things that we can do yet with just this emulator, provided we aren't locked away first, muttering incoherently about MSCW F-register linkages, interrupt-driven T-register syllable injection, C-register update during Inhibit Fetch, and the like.