Writing an emulator: memory management

In the previous post, we got to a slow, arguably underwhelming start by implementing a very small fraction of the Game Boy’s CPU’s instruction set — one instruction. That’s about 1/500th of the whole set, if you were wondering, since it contains 244 “core” instructions111 values are unused opcodes, and 0xcb is used to prefix extended instructions. and 256 “extended” instructions — which have a two-byte opcode starting with 0xcb — for shift/rotation/bit operations. Remember, though, that I also restricted our goal to implementing only the 49 instructions needed to execute the boot ROM.

Where are we now?

I thought I’d implement CPU instructions in the order they appear in the boot ROM until the program crashed or until I reached the end of said boot ROM. As mentioned at the end of the previous article, I tackled XOR A first, then LD HL,$9fff, for which I extended our CPU structure to transparently manage HL as a single register, then LD (HL-),A, then—

panic: runtime error: index out of range

goroutine 1 [running]:
main.ldaddrr(...)
	/home/kefen/emulator/the-first-steps-extended.go:89
main.ldhlda(...)
	/home/kefen/emulator/the-first-steps-extended.go:115 +0x58
main.main()
	/home/kefen/emulator/the-first-steps-extended.go:139 +0xdd
exit status 2

Oh. Right!

Remember how I mentioned we could just keep happily implementing CPU instructions until we hit an instruction that needed to access the RAM? Well, the fourth instruction in the boot ROM does exactly that: LD (HL-),A stores the value currently contained in register A into the memory location corresponding to the address in register HL, then decrements HL by one. This is the heart of a loop that forces memory from 0x8000 to 0x9fff to be zero2Since memory values in a real Game Boy are undefined at power up.:

        XOR A                   ; $0003  Zero the memory from $8000-$9FFF (VRAM)
        LD HL,$9fff             ; $0004
Addr_0007:
        LD (HL-),A              ; $0007
        BIT 7,H                 ; $0008
        JR NZ, Addr_0007        ; $000a

The first two instructions initialize A to zero3XOR r stores into A the result of an XOR operation between register r and A, and something XORed with itself is always zero. This is actually shorter and faster than using LD A,$00 which would require twice as many bytes and machine cycles to complete. and HL to 0x9fff. Then comes a typical case where two 8-bit registers are used as a single 16-bit register: HL is initialized and decremented as a whole, but the BIT instruction only checks the seventh bit of H because we’re not really interested in the whole 16 bits, here. The ROM code’s sole concern is to exit that loop when HL is no longer greater than or equal to 0x8000, and to know that, it only needs to check that the value in H is strictly lower than 0x80. This is what is checked by BIT 7,H and JR NZ,Addr_0007: if bit 7 of H was Non Zero4Which is the result of the previous BIT instruction. Also note that bits are counted from the right and from zero so here, bit 7 is the leftmost bit in H, e.g. 0x80., then its value is necessarily 0x80 or higher, and the code jumps back to the beginning of the loop at address 0x0007.

Okay, so why is the program crashing? Well, it turns out I blindly implemented those instructions using the same memory array we came up with last time, which only contains the boot ROM code and is only 256 bytes long.

// Generic instructions used to group common code.
//
// r    single (8-bit) register (A, F, B C, D, E, H or L)
// rr   double (16-bit) register (AF, BC, DE or HL -- SP and PC are handled
//      separately because I made them uint16s and I stand by that decision)
// d8   8-bit (unsigned) parameter (1 byte) after opcode
// d16  16-bit (unsigned) parameter (2 bytes, little-endian) after opcode
// r8   8-bit (signed) parameter (1 byte) after opcode

// ...

// LD (addr),r
func ldaddrr(c *CPU, addr uint16, reg uint8) {
        c.memory[addr] = reg
}

// Individual instructions in the order they appear in the boot ROM.

// ...

// LD (HL-),A
func ldhlda(c *CPU) {
        // Using extension methods to treat HL as a single 16-bit register.
        ldaddrr(c, c.HL(), c.A)
        c.SetHL(c.HL() - 1)
}

The culprit is c.memory[addr] = reg. When the program crashes, the value of addr is the current value in HL. We’re trying to write something at element 0x9fff, which is way beyond the end of our 256-byte array! Few languages allow that — and even when they do, it’s not usually a good thing. It was also very foolish of me to blindly access an array without checking its bounds first, obviously.

Can I see your manager?

The easiest and quickest solution would be to extend that memory array to be 64KB long, since the CPU can only manipulate 16-bit addresses between 0 and 65535. It will work, the program will no longer crash, and the assembly loop shown above will happily fill elements 0x8000 to 0x9fff of that array with zero. The thing is, many memory addresses don’t just store data, but do some sort of thing when written to, or return a value that varies depending on other factors. Right now, we only have a CPU structure, but very soon we’ll get started on the graphics part, hardware registers and so on. So the question is: “how am I going to dispatch memory accesses between my emulator’s components?”

Example of a slightly more complex memory access: in some states, the CPU can’t access video RAM. How do we implement this?
Source: The Ultimate Game Boy Talk (46:55)

I toyed with a few ideas — like having a 64KB-long array of pointers to objects responsible for that memory segment — but I wasn’t crazy about any of them. I eventually looked at existing implementations for more inspiration and ended up really liking the way coffee-gb did it, so I went the same direction and implemented an interface that covered exactly three things:

  • Reading from an address
  • Writing to an address
  • Indicating whether a given address is within the range managed by the object implementing the interface

I tinkered with names a little and finally settled for the following Addressable interface:

// Addressable interface provides functions to read/write bytes in a given
// 16-bit address space.
type Addressable interface {
        // Contains returns true if the given address belongs to the address space.
        Contains(addr uint16) bool
        // Read returns the value stored at the given address.
        Read(addr uint16) uint8
        // Write attempts to store the given value at the given address if writable.
        Write(addr uint16, value uint8)
}

This code is a small step towards reliable memory access: given any Addressable object, I know I can ask whether it contains an address, then read from or write to it if it does. But what if it doesn’t? The interface itself is only the first step. We’ll use it to build something a lot more useful: a memory management unit (a.k.a MMU).

What we really want is to be able to hit a single object, request an address (for reading or writing) and have it automatically dispatch that request to the Addressable instance that can handle it. Or, if that can’t be done — because some memory zones in the Game Boy are either poorly understood or just unusable — do something sensible that won’t crash the program instead. Like ignoring all write operations and returning a dummy value such as 0xff on reads. Try turning a Game Boy on with no cartridge in it: you get a black bar scrolling down instead of the Nintendo logo. Where the Game Boy would normally read graphical data from the cartridge itself, there is nothing to read, so it assumes that this non-existent memory is all 0xff and that translates to fully black tiles5Don’t take my word for it, we’ll get to that point in a few articles, I promise..

Memory accesses while executing LD (HL-),A using a Memory Management Unit. Notice the Boot ROM shadowing the first 256 bytes of the cartridge.

This sounds potentially funnier than blindly implementing opcodes at the moment — and the existing ones will need refactoring to make use of that MMU anyway. Let’s switch to that and come up with a quick structure and a couple methods to make it happen. We want an object that contains an arbitrary amount of addressable components, regardless of their actual type. This is where our Addressable interface comes into play.

// MMU manages an arbitrary number of ordered address spaces. It also satisfies
// the Addressable interface.
type MMU struct {
        Spaces []Addressable
}

// Returns the first address space that can handle the requested address or nil.
func (m *MMU) space(addr uint16) Addressable {
        for _, space := range m.Spaces {
                if space.Contains(addr) {
                        return space
                }
        }
        return nil 
}

// Contains returns whether one of the address spaces known to the MMU contains
// the given address. The first address space in the internal list containing a
// given address will shadow any other that may contain it. 
func (m *MMU) Contains(addr uint16) bool {
        return m.space(addr) != nil 
}

// Read finds the first address space compatible with the given address and 
// returns the value at that address. If no space contains the requested
// address, it returns 0xff (emulates black bar on boot).
func (m *MMU) Read(addr uint16) uint8 {
        if space := m.space(addr); space != nil {
                return space.Read(addr)
        }
        return 0xff
}

// Write finds the first address space compatible with the given address and 
// attempts writing the given value to that address.
func (m *MMU) Write(addr uint16, value uint8) {
        if space := m.space(addr); space != nil {
                space.Write(addr, value)
        }
}

I now have an object I can point the CPU to, which will support reading or writing to any memory address between 0x0000 and 0xffff. Making the MMU structure implement the Addressable interface itself makes it easy to embed in future types that will contain several address spaces (like all the various memory zones for controlling the display or playing sound).

Back to square zero, but with memory management

It’s time to get rid of that single byte array and refactor our code to use that brand new MMU structure. It mostly amounts to replacing all direct array accesses with calls to MMU.Read() or MMU.Write(). And lo, we’re back to pretty much exactly where we were at the beginning of this post, except now we can plug in various emulated components as we develop them. For the time being, let’s just use a big chunk of RAM as suggested earlier. Since all memory accesses are going to happen through our MMU, we can extend it later without affecting the implementation of our CPU instructions. Turning our byte array into an Addressable object is as trivial as giving it a type name and three tiny methods but we’ll go a tiny step further than that and also define a starting address.

// RAM as an arbitrary long list of R/W bytes at addresses starting from a
// given offset.
type RAM struct {
        bytes []uint8
        start uint16
}

// NewRAM instantiates a zeroed slice of the given size to represent RAM.
func NewRAM(start, size uint16) *RAM {
        return &RAM{make([]uint8, size), start}
}

// Read returns the byte at the given address (adjusting for offset).
func (r RAM) Read(addr uint16) uint8 {
        return r.bytes[addr-r.start]
}

// Write stores the given value at the given address (adjusting for offset).
func (r RAM) Write(addr uint16, value uint8) {
        r.bytes[addr-r.start] = value
}

// Contains returns true as long as the given address fits in the slice.
func (r RAM) Contains(addr uint16) bool {
        return addr >= r.start && addr < r.start+uint16(len(r.bytes))
}

Right now, we can get away with defining a block of RAM starting from 0x8000 (the beginning of video RAM) and covering all addresses up to 0xfff. This will be a good enough approximation of a Game Boy with no cartridge in it. We also need to convert all occurrences of c.memory[<address>] in our original code to c.mmu.Read(<address>) or c.mmu.Write(<address>).

But wait, what about our boot ROM, now? We could just cheat and fit it inside a RAM object starting from address zero, which is pretty much what I did at first. However, now is a good time to turn that ROM into its own address space with a twist: at the very end of the boot process, the Game Boy writes to a special memory address, 0xff50, which disables the boot ROM and lets the CPU access addresses 0x0000 to 0x00ff in the cartridge instead. It’s the very last instruction in the boot ROM so it’s borderline out of scope, but it’s a nice and simple way to illustrate how useful our Addressable interface is.

// Boot address space translating memory access to Boot ROM and the BOOT
// register at address 0xff50.
type Boot struct {
        rom      RAM   // The contents of the boot ROM.
        register uint8 // BOOT register at address 0xff50.
        disabled bool  // Writing to 0xff50 will disable boot ROM access.
}

// NewBoot returns a new address space containing the boot ROM and the BOOT
// register.
func NewBoot(filename string) *Boot {
        rom, err := ioutil.ReadFile(filename)
        if err != nil {
                panic(err)
        }
        return &Boot{rom: RAM{rom, 0}}
}

// Contains returns true if the given address is that of the BOOT register,
// or if the boot ROM is not disabled and contains the address.
func (b *Boot) Contains(addr uint16) bool {
        return addr == 0xff50 || (!b.disabled && b.rom.Contains(addr))
}

// Read returns the value stored at the given address in ROM or BOOT register.
// If the boot ROM was disabled, Contains() should ensure this method will
// only be called with address 0xff50.
func (b *Boot) Read(addr uint16) uint8 {
        if addr == 0xff50 {
                return b.register
        }
        return b.rom.Read(addr)
}

// Write is only supported for the BOOT register and disables the boot ROM.
func (b *Boot) Write(addr uint16, value uint8) {
        if addr == 0xff50 {
                b.register = value
                b.disabled = true
        } // Writing to the boot ROM itself is obviously not allowed.
}

Now that’s a little richer than just a dumb block of empty RAM. First, our structure groups both the memory occupied by the boot ROM code itself, and the register6Here, I mean “hardware register”, as I’ve seen used in a lot of documentation. This isn’t an actual CPU register but a special address in the Game Boy’s memory, which can be read/write, read-only or even write-only, and operates on various hardware components such as sound, video, etc. that controls whether the ROM is accessible. We still hijacked the RAM type we defined earlier to avoid writing the boring parts of Read() and Write() again. Instead, we just reused those methods and added a couple extra checks:

  • If the requested address is exactly 0xff50, we allow reading the register’s value and writing to it.
  • If the requested address is between 0x0000 and 0x00ff and the ROM is enabled, we allow reading from that address in the boot ROM.
  • Otherwise, we report that we don’t contain the address and the enclosing MMU will try its next address space.

You’ll notice that this 0xff50 address is also technically contained in our RAM object (which currently covers 0x8000 to 0xffff) but by placing the Boot ROM first in our MMU’s list of Addressable objects, it will take precedence and do the expected thing. And all of this is entirely transparent to the CPU!

“Are we there yet?”

That pretty much covers it. We can now split the Game Boy’s memory in as many Addressable objects as we need, as long as these objects satisfy our rather straightforward interface. Another example of an Addressable type we can build is a mapping of hardware register addresses to a corresponding uint8 variable. The Contains() check can just look up the address as a key in the mapping.

This is getting long, so I’ll gloss over going back to the implementation of more CPU instructions using our new MMU type. I will just point out that after writing some 38 more instructions, the program no longer exits. Not even in error! It just keeps running forever, like stuck in some kind of loop. This was a good opportunity to use vscode’s debugging mode and pause the execution to check the CPU’s state. Or, you know, just use a bunch of print statements, like I did in the example program. One way or another, we’ll figure out that the CPU is executing things in a loop between addresses 0x0064 and 0x0068. Back to the boot ROM’s assembly code, we’ll read:

Addr_0064:
        LD A,($FF00+$44)        ; $0064  wait for screen frame
        CP $90                  ; $0066
        JR NZ, Addr_0064        ; $0068

This little bit of code simply reads the value in hardware register 0xff44 repeatedly as long as this value is not 0x90 (144). Since we haven’t implemented anything specific to that register, whenever the CPU reads the value at address 0xff44, the MMU returns the content of that big empty RAM we came up with earlier, which, in this range, is full of zeroes.

“Wait for frame” endless loop roughly illustrated.

What is in register 0xff44? Why 144? How do we get this address to contain something else than zero? I’ll get to that in the next article. We’re finally going to start implementing the display!

References

You can download the example programs above and run them anywhere from the command line:

$ go run the-first-steps-extended.go
$ go run memory-management.go

Both of them expect a dmg-rom.bin file to be present in the same folder. You can trick the first example program into working (well, up to the crash mentioned at the beginning of this article) by creating a stub with the following command:

$ echo -n -e '\x31\xfe\xff\xaf\x21\xff\x9f\x32\xcb\x7c' > dmg-rom.bin

If you do want to run the second example program, you’re going to have to look up an actual DMG boot ROM online. You can still just have a look at the source code. Both programs are heavily commented and, hopefully, somewhat understandable.

Thank you for reading!