Writing an emulator: memory management
In the previous post, we got to a slow, arguably underwhelming start by implementing a very small fraction of the Game Boy’s CPU’s instruction set — one instruction. That’s about 1/500th of the whole set, if you were wondering, since it contains 244 “core” instructions1 and 256 “extended” instructions — which have a two-byte opcode starting with 0xcb
— for shift/rotation/bit operations. Remember, though, that I also restricted our goal to implementing only the 49 instructions needed to execute the boot ROM.
Where are we now?
I thought I’d implement CPU instructions in the order they appear in the boot ROM until the program crashed or until I reached the end of said boot ROM. As mentioned at the end of the previous article, I tackled XOR A
first, then LD HL,$9fff
, for which I extended our CPU structure to transparently manage HL as a single register, then LD (HL-),A
, then—
panic: runtime error: index out of range
goroutine 1 [running]:
main.ldaddrr(...)
/home/kefen/emulator/the-first-steps-extended.go:89
main.ldhlda(...)
/home/kefen/emulator/the-first-steps-extended.go:115 +0x58
main.main()
/home/kefen/emulator/the-first-steps-extended.go:139 +0xdd
exit status 2
Oh. Right!
Remember how I mentioned we could just keep happily implementing CPU instructions until we hit an instruction that needed to access the RAM? Well, the fourth instruction in the boot ROM does exactly that: LD (HL-),A
stores the value currently contained in register A
into the memory location corresponding to the address in register HL
, then decrements HL
by one. This is the heart of a loop that forces memory from 0x8000
to 0x9fff
to be zero2:
XOR A ; $0003 Zero the memory from $8000 to $9FFF (VRAM)
LD HL,$9fff ; $0004
Addr_0007:
LD (HL-),A ; $0007
BIT 7,H ; $0008
JR NZ, Addr_0007 ; $000a
The first two instructions initialize A
to zero3 and HL
to 0x9fff
. Then comes a typical case where two 8-bit registers are used as a single 16-bit register: HL
is initialized and decremented as a whole, but the BIT
instruction only checks the seventh bit of H
because we’re not really interested in the whole 16 bits, here. The ROM code’s sole concern is to exit that loop when HL
is no longer greater than or equal to 0x8000
, and to know that, it only needs to check that the value in H
is strictly lower than 0x80
. This is what is checked by BIT 7,H
and JR NZ,Addr_0007
: if bit 7 of H
was Non Zero4, then its value has to be 0x80
or higher, and the code jumps back to the beginning of the loop at address 0x0007
.
Okay, so why is the program crashing? Well, it turns out I blindly implemented those instructions using the same memory array we came up with last time, which only contains the boot ROM code and is only 256 bytes long.
// Generic instructions used to group common code.
//
// r single (8-bit) register (A, F, B C, D, E, H or L)
// rr double (16-bit) register (AF, BC, DE or HL -- SP and PC are handled
// separately because I made them uint16s and I stand by that decision)
// d8 8-bit (unsigned) parameter (1 byte) after opcode
// d16 16-bit (unsigned) parameter (2 bytes, little-endian) after opcode
// r8 8-bit (signed) parameter (1 byte) after opcode
// ...
// LD (addr),r
func ldaddrr(c *CPU, addr uint16, reg uint8) {
c.memory[addr] = reg
}
// Individual instructions in the order they appear in the boot ROM.
// ...
// LD (HL-),A
func ldhlda(c *CPU) {
// Using extension methods to treat HL as a single 16-bit register.
ldaddrr(c, c.HL(), c.A)
c.SetHL(c.HL() - 1)
}
The culprit is c.memory[addr] = reg
. When the program crashes, the value of addr
is the current value in HL
. We’re trying to write something at element 0x9fff
, which is way beyond the end of our 256-byte array! Few languages allow that — and even when they do, it’s not usually a good thing. It was also very foolish of me to blindly access an array without checking its bounds first, obviously.
Can I see your manager?
The easiest and quickest solution would be to extend that memory array to be 64KB long, since the CPU can only manipulate 16-bit addresses between 0 and 65535. It will work, the program will no longer crash, and the assembly loop shown above will happily fill elements 0x8000
to 0x9fff
of that array with zero. The thing is, many memory addresses don’t just store data, but do some sort of thing when written to, or return a value that varies depending on other factors. Right now, we only have a CPU structure, but very soon we’ll get started on the graphics part, hardware registers and so on. So the question is: “how am I going to dispatch memory accesses between my emulator’s components?”
So. How do we implement this?
I toyed with a few ideas — like having a 64KB-long array of pointers to objects responsible for that memory segment — but I wasn’t crazy about any of them. I eventually looked at existing implementations for more inspiration and ended up really liking the way coffee-gb did it, so I went the same direction and implemented an interface that covered exactly three things:
- Reading from an address.
- Writing to an address.
- Indicating whether a given address is within the range managed by the object implementing the interface.
I tinkered with names a little and finally settled for the following Addressable
interface:
// Addressable interface provides functions to read/write bytes in a given
// 16-bit address space.
type Addressable interface {
// Contains returns true if the given address belongs to the address space.
Contains(addr uint16) bool
// Read returns the value stored at the given address.
Read(addr uint16) uint8
// Write attempts to store the given value at the given address if writable.
Write(addr uint16, value uint8)
}
This code is a small step toward reliable memory access: given any Addressable
object, I know I can ask whether it contains an address, then read from or write to it if it does. But what if it doesn’t? The interface itself is only the first step. We’ll use it to build something a lot more useful: a memory management unit (a.k.a MMU).
What we really want is to be able to hit a single object, request an address (for reading or writing) and have it automatically dispatch that request to the Addressable
instance that can handle it. Or, if that can’t be done — because some memory zones in the Game Boy are either poorly understood or just unusable — do something sensible that won’t crash the program instead. Like ignoring all write operations and returning a dummy value such as 0xff
on reads. Try turning a Game Boy on with no cartridge in it: you get a black bar scrolling down instead of the Nintendo logo. Where the Game Boy would normally read graphical data from the cartridge itself, there is nothing to read, so it assumes that this non-existent memory is all 0xff
and that translates to fully black tiles5.
This sounds potentially funnier than blindly implementing opcodes at the moment — and the existing ones will need refactoring to make use of that MMU anyway. Let’s switch to that and come up with a quick structure and a couple methods to make it happen. We want an object that contains an arbitrary amount of addressable components, regardless of their actual type. This is where our Addressable
interface comes into play.
// MMU manages an arbitrary number of ordered address spaces. It also satisfies
// the Addressable interface.
type MMU struct {
Spaces []Addressable
}
// Returns the first address space that can handle the requested address or nil.
func (m *MMU) space(addr uint16) Addressable {
for _, space := range m.Spaces {
if space.Contains(addr) {
return space
}
}
return nil
}
// Contains returns whether one of the address spaces known to the MMU contains
// the given address. The first address space in the internal list containing a
// given address will shadow any other that may contain it.
func (m *MMU) Contains(addr uint16) bool {
return m.space(addr) != nil
}
// Read finds the first address space compatible with the given address and
// returns the value at that address. If no space contains the requested
// address, it returns 0xff (emulates black bar on boot).
func (m *MMU) Read(addr uint16) uint8 {
if space := m.space(addr); space != nil {
return space.Read(addr)
}
return 0xff
}
// Write finds the first address space compatible with the given address and
// attempts writing the given value to that address.
func (m *MMU) Write(addr uint16, value uint8) {
if space := m.space(addr); space != nil {
space.Write(addr, value)
}
}
I now have an object I can point the CPU to, which will support reading or writing to any memory address between 0x0000
and 0xffff
. Making the MMU structure implement the Addressable
interface itself makes it easy to embed in future types that will contain several address spaces (like all the various memory zones for controlling the display or playing sound).
Back to square zero, but with memory management
It’s time to get rid of that single byte array and refactor our code to use that brand new MMU structure. It mostly amounts to replacing all direct array accesses with calls to MMU.Read()
or MMU.Write()
. And lo, we’re back to pretty much exactly where we were at the beginning of this post, except now we can plug in various emulated components as we develop them. For the time being, let’s just use a big chunk of RAM as suggested earlier. Since all memory accesses are going to happen through our MMU, we can extend it later without affecting the implementation of our CPU instructions. Turning our byte array into an Addressable
object is as trivial as giving it a type name and three short methods but we’ll go a tiny step further than that and also define a starting address.
// RAM as an arbitrary long list of R/W bytes at addresses starting from a
// given offset.
type RAM struct {
bytes []uint8
start uint16
}
// NewRAM instantiates a zeroed slice of the given size to represent RAM.
func NewRAM(start, size uint16) *RAM {
return &RAM{make([]uint8, size), start}
}
// Read returns the byte at the given address (adjusting for offset).
func (r *RAM) Read(addr uint16) uint8 {
return r.bytes[addr-r.start]
}
// Write stores the given value at the given address (adjusting for offset).
func (r *RAM) Write(addr uint16, value uint8) {
r.bytes[addr-r.start] = value
}
// Contains returns true as long as the given address fits in the slice.
func (r *RAM) Contains(addr uint16) bool {
return addr >= r.start && addr < r.start+uint16(len(r.bytes))
}
Right now, we can get away with defining a block of RAM starting from 0x8000
(the beginning of video RAM) and covering all addresses up to 0xffff
. This will be a good enough approximation of a Game Boy with no cartridge in it. We also need to convert all occurrences of c.memory[<address>]
in our original code to c.mmu.Read(<address>)
or c.mmu.Write(<address>)
.
But wait, what about our boot ROM, now? We could just cheat and fit it inside a RAM object starting from address zero, which is pretty much what I did at first. However, now is a good time to turn that ROM into its own address space with a twist: at the very end of the boot process, the Game Boy writes to a special memory address, 0xff50
, which disables the boot ROM and lets the CPU access addresses 0x0000
to 0x00ff
in the cartridge instead. It’s the very last instruction in the boot ROM so it’s borderline out of scope, but it’s a nice and simple way to illustrate how useful our Addressable
interface is.
// Boot address space translating memory access to Boot ROM and the BOOT
// register at address 0xff50.
type Boot struct {
rom RAM // The contents of the boot ROM.
register uint8 // BOOT register at address 0xff50.
disabled bool // Writing to 0xff50 will disable boot ROM access.
}
// NewBoot returns a new address space containing the boot ROM and the BOOT
// register.
func NewBoot(filename string) *Boot {
rom, err := ioutil.ReadFile(filename)
if err != nil {
panic(err)
}
return &Boot{rom: RAM{rom, 0}}
}
// Contains returns true if the given address is that of the BOOT register,
// or if the boot ROM is not disabled and contains the address.
func (b *Boot) Contains(addr uint16) bool {
return addr == 0xff50 || (!b.disabled && b.rom.Contains(addr))
}
// Read returns the value stored at the given address in ROM or BOOT register.
// If the boot ROM was disabled, Contains() should ensure this method will
// only be called with address 0xff50.
func (b *Boot) Read(addr uint16) uint8 {
if addr == 0xff50 {
return b.register
}
return b.rom.Read(addr)
}
// Write is only supported for the BOOT register and disables the boot ROM.
func (b *Boot) Write(addr uint16, value uint8) {
if addr == 0xff50 {
b.register = value
b.disabled = true
} // Writing to the boot ROM itself is obviously not allowed.
}
Now that’s a little richer than just a dumb block of empty RAM. First, our structure groups both the memory occupied by the boot ROM code itself, and the register6 that controls whether the ROM is accessible. We still hijacked the RAM type we defined earlier to avoid writing the boring parts of Read()
and Write()
again. Instead, we just reused those methods and added a couple extra checks:
- If the requested address is exactly
0xff50
, we allow reading the register’s value and writing to it. - If the requested address is between
0x0000
and0x00ff
and the ROM is enabled, we allow reading from that address in the boot ROM. - Otherwise, we report that we don’t contain the address and the enclosing MMU will try its next address space.
You’ll notice that this 0xff50
address is also technically contained in our RAM object (which currently covers addresses from 0x8000
to 0xffff
) but by placing the Boot ROM first in our MMU’s list of Addressable
objects, it will take precedence and do the expected thing. And all of this is entirely transparent to the CPU!
“Are we there yet?”
That pretty much covers it. We can now split the Game Boy’s memory in as many Addressable
objects as we need, as long as these objects satisfy our rather straightforward interface. Another example of an Addressable
type we can build is a mapping of hardware register addresses to a corresponding uint8
variable. The Contains()
check can just verify that the given address exists as a key in that mapping.
This is getting long, so I’ll gloss over going back to the implementation of more CPU instructions using our new MMU type. I will just point out that after writing some 38 more instructions, the program no longer exits. Not even in error! It just keeps running forever, like stuck in some kind of loop. This was a good opportunity to use vscode’s debugging mode and pause the execution to check the CPU’s state. Or, you know, just use a bunch of print
statements, like I did in the example program. One way or another, we’ll figure out that the CPU is executing instructions in a loop between addresses 0x0064
and 0x0068
. Back to the boot ROM’s assembly code, we’ll read:
Addr_0064:
LD A,($FF00+$44) ; $0064 Wait for screen frame
CP $90 ; $0066
JR NZ, Addr_0064 ; $0068
This little bit of code simply reads the value in hardware register 0xff44
repeatedly as long as this value is not 0x90
(144). Since we haven’t implemented anything specific to that register, whenever the CPU reads the value at address 0xff44
, the MMU returns the content of that big empty RAM we came up with earlier, which, in this range, is full of zeroes.
What is in register 0xff44
? Why is the CPU expecting 144? How do we get this address to contain something else than zero? I’ll get to that in the next article. We’re finally going to start implementing the display!
References
- Gameboy (LR35902) OPCODES (we haven’t seen the last of it yet)
- Pan Docs: CPU Instruction Set (see above)
- Gameboy Bootstrap ROM
- The Ultimate Game Boy Talk
- The coffee-gb source code (credit where credit’s due)
- Example program: the first steps (extended)
- Example program: memory management
You can download the example programs above and run them anywhere from the command line:
$ go run the-first-steps-extended.go
$ go run memory-management.go
Both of them expect a dmg-rom.bin
file to be present in the same folder. You can trick the first example program into working (well, up to the crash mentioned at the beginning of this article) by creating a stub with the following command:
$ echo -n -e '\x31\xfe\xff\xaf\x21\xff\x9f\x32\xcb\x7c' > dmg-rom.bin
If you do want to run the second example program, you’re going to have to look up an actual DMG boot ROM online. You can still just have a look at the source code. Both programs are heavily commented and, hopefully, somewhat understandable.
Thank you for reading!
-
11 values are unused opcodes, and
0xcb
is used to prefix extended instructions. ↩︎ -
Since memory values in a real Game Boy are undefined at power up. This isn’t going to be an issue for the emulator, as Go will initialize all memory arrays to zero automatically. ↩︎
-
XOR r
stores intoA
the result of an XOR operation between registerr
andA
, and something XORed with itself is always zero. This is actually shorter and faster than usingLD A,$00
which would require twice as many bytes to encode. ↩︎ -
Which is the result of the previous
BIT
instruction. Also note that bits are counted from the right and from zero so here, bit 7 is the leftmost bit inH
, e.g.0x80
. ↩︎ -
Don’t take my word for it, we’ll get to that point in a few articles, I promise. ↩︎
-
Here, I mean “hardware register”, as I’ve seen used in a lot of documentation. This isn’t an actual CPU register but a special address in the Game Boy’s memory, which can be read/write, read-only or even write-only, and operates on various hardware components such as sound, video, etc. ↩︎