A colored table describing all the (non extended) CPU opcodes for the Game Boy microprocessor.

Writing an emulator: the first steps

2019-07-09 / Development, Emulator

So!

I’ve read an enlightening blog post. I’ve set up a fresh new git repo. I’ve tracked down a dump of the Game Boy’s boot ROM. I’m pumped as fuck!

…

Now what?

Setting the bar (low?)

The task ahead, exciting as it is, might seem a little daunting at first, so let’s set ourselves a small, reasonably attainable goal. Implementing a CPU from scratch isn’t so scary, there are good ways to get started with that, like the Synacor Challenge or the Advent of Code. It basically consists in writing a loop in which we’ll associate the next value in a memory array to some function, call said function, update our index in the memory array and repeat the whole thing ad infinitum — or until we run out of memory to read from.

So my first milestone will be to write a very simple command-line program that:

Defines a CPU object representing the Game Boy
Reads the boot ROM file and stores it in memory
Executes the first few bytes of that ROM
Displays the current state of the CPU when it’s done

All of this could reasonably be done in a day with a single array, a big switch/case block and a for loop. I’m only going to have to interpret 256¹ bytes after all. To make things easier, the commented assembly code for the boot ROM is also available online, and I can easily pick out the first couple machine instructions I’ll need to implement.

All right, I can do this, let’s get started

Let’s get a couple things out of the way right now: I’m writing this emulator in Go because it’s become my favorite language, I’m having a lot of fun using it, and it’s low-level enough for my purpose. I’m also using Visual Studio Code because it’s got good support for Go — especially the debugger — and runs great on Linux. That’s it, your mileage may vary.

Now, what’s a CPU anyway? Structurally, it’s a bunch of data (8-bit) registers to use as storage when operating on numbers and a couple address (16-bit) registers used to refer to the next instruction to execute in memory, or the bottom of the stack. Yes, this is a terrible oversimplification, but still a good enough approximation of the Game Boy’s CPU² to get us started.

If you haven’t spent most of your childhood reading up on Z80 assembly (you missed out), those letters may not look familiar. They just represent boxes into which the CPU can temporarily store 8-bit values (0 to 255) or 16-bit addresses (0 to 65535). Those pairs of 8-bit registers can also be used, in some circumstances, as four 16-bit registers (AF, BC, DE and HL) but at this stage, this isn’t too relevant.

// CPU represents the CPU registers that you would find in a Game Boy.
type CPU struct {
    A, F   uint8
    B, C   uint8
    D, E   uint8
    H, L   uint8
    SP     uint16
    PC     uint16
}

The first register we need to look at to get started is PC (for Program Counter), which will tell the CPU where to read the next instruction in memory. It conveniently starts at 0 so we already know to look at the very beginning of the boot ROM to see what is the very first CPU instruction we should implement. We could check the disassembled code from the boot ROM, or we could also just open said ROM with an hexadecimal editor, see that the first byte is 0x31 and look up what CPU instruction that is, either in the Pan Docs or in the instruction set tables available online — I mostly used the latter to look up opcodes and the former when I needed more information about an opcode in particular.

Either way, we’ll find out that 0x31 is the opcode for LD SP,d16 — a somewhat barbaric way to say “store (LoaD) into register SP the value contained in the next two bytes in memory”. The disassembled boot code available online even has comments offering a quick explanation:

LD SP, $fffe  ; $0000  Setup Stack
XOR A         ; $0003  Zero the memory from $8000-$9FFF (VRAM)
LD HL, $9fff  ; $0004

So the first thing a Game Boy’s CPU does when booting up is to set its SP register to the value 0xfffe. Easy! Obviously, I still need a tiny bit of logic to extract that 0xfffe value from memory after reading the instruction, which implies implementing memory first. I’ll get to that in a moment. For the time being, I’ll assume that, somewhere in that infinite loop I was mentioning back in the introduction, there will be a bit of code reading a little like:

for {
    opcode := memory[cpu.PC]
    cpu.PC++

    // ...
    if opcode == 0x31 {
        cpu.SP = memory[cpu.PC] | (memory[cpu.PC+1] << 8)
        cpu.PC += 2
    }
    // ...
}

Now, this is assuming that we have a memory array which contains, from index 0 onward, the individual bytes of the boot ROM code. Still, wasn’t that quick? I’ve already virtually executed three bytes from the ROM, only 197 to go! It’s not quite as bad as it might sound since many instructions will repeat throughout the boot ROM, only with different parameters. For instance, LD SP,$fffe is very similar to the LD HL,$9fff instruction two lines below, or, a bit further down, to LD HL,$ff26. I could just implement a generic function that takes a 16-bit register as parameter and stores the next two bytes in memory in it.

In fact, if I remove all numerical parameters from the boot ROM’s assembly code, it turns out only 49 different instructions are used. This means I could technically run the whole boot sequence by only implementing the remaining 48 CPU instructions and I’d never have to look at the rest of the instruction set!

ADD [HL]       INC B             LDH A,[$ff00+d8]  LD [d16],A      LD L,d8
BIT 7,H        INC C             LD A,B            LD D,A          LD SP,d16
CALL address   INC DE            LD A,d8           LD D,d8         POP BC
CP d8          INC H             LD A,[DE]         LD DE,d16       PUSH BC
CP [HL]        INC HL            LD A,E            LD E,d8         RET
DEC A          JR address        LD A,H            LD H,A          RLA
DEC B          JR NZ,address     LD A,L            LD [HL],A       RL C
DEC C          JR Z,address      LD B,d8           LD [HL+],A      SUB B
DEC D          LDH [$ff00+C],A   LD C,A            LD [HL-],A      XOR A
DEC E          LDH [$ff00+d8],A  LD C,d8           LD HL,d16

The list above is similar to the way those opcodes are listed in the instruction set tables mentioned earlier. Here d8 represents an 8-bit parameter value read from memory just after the opcode, d16 a 16-bit parameter or address, and r8 a signed 8-bit number to add to or subtract from PC (for relative jumps). You will also notice that many of these distinct opcodes actually have a very similar purpose. All these LD register,value instructions could definitely be described by even more generic functions. Let’s group together all instructions working on single registers, those working on double registers, those working with addresses, etc. Now we’re down to 25 generic instructions, that sounds totally manageable!

ADD (address)    DEC r              LD A,(address)    LD rr,d16     RET
BIT n,r          INC r              LD r,r            LD (HL+),A    RLA
CALL address     INC rr             LD r,d8           LD (HL-),A    RL r
CP d8            JR cond,address    LD r,(address)    POP rr        SUB r
CP (HL)          LD (address),A     LD (address),r    PUSH rr       XOR r

Incidentally, that is pretty much how the CPU instruction set is organized in the Pan Docs, and it did influence my implementation. Now, the code excerpt I wrote roughly shows how to implement LD SP,d16 but to truly execute it, we should really read the instruction directly from the ROM, so we need to implement some kind of memory for that…

The simplest memory model

Memory is conceptually simpler than a CPU: we can see it as an arbitrarily long list of numerical values that we can access in any order we want. In the vast majority of languages I can think of, this is easily represented by an array. Of course, we’ll want something better than that: the Game Boy has several distinct memory zones (work RAM, video RAM, boot ROM, cartridge ROM…) some of which have a direct effect on the display, sound and other bits of the hardware. Right now, though, we’ll make do with a mere byte array.

Actually, if you remember the criteria for the basic program I’m starting with, we don’t even need any kind of RAM yet: the three first CPU instructions only have an effect on the CPU itself. We can start with just the boot ROM as a 256-byte long array, point the CPU at element zero of that array and run our loop.

Reading a file to an array³ is a one-liner⁴. But let’s also properly rewrite that code snippet from earlier into something that’s got a better chance of compiling:

// Imports and CPU structure defined further above.

func main() {
    memory, err := ioutil.ReadFile("./dmg-rom.bin")
    if err != nil {
            panic(err)
    }

    cpu := CPU{}
    for {
        opcode := memory[cpu.PC]
        cpu.PC++

        switch opcode {
        // ... other opcodes

        case 0x31: // LD SP,d16

            // Here, Go requires some type conversion to fit two
            // uint8 into one uint16 and also to avoid grossly
            // overflowing an uint8 by shifting it left 8 bits,
            // which would just make it zero and yield 0x00fe in
            // the end instead of 0xfffe. *That* was fun to debug!
            cpu.SP = uint16(memory[cpu.PC]) | uint16(memory[cpu.PC+1])<<8
            cpu.PC += 2

        // ... other opcodes
        }
    }
}

Some implementation details (the package name, imports and the CPU structure) have been left out. The point is that the program reads the first byte in memory, gets ready to read the next one, figures out that the first byte is the opcode for LD SP,d16 and sets the SP register accordingly by building a 16-bit value from the next two bytes in memory. When it’s done, the CPU’s program counter is incremented to point to the next instruction right after the two bytes we just read.

Note that the Game Boy is little-endian: when reading a 16-bit value from memory, the least significant byte (the rightmost 8 bits) will come first (its memory address will be lower), and the most significant (the leftmost 8 bits) will come second (its address will be higher). In our case, we’re loading the value 0xfffe in the SP register, but in that memory array, it’s stored as two consecutive bytes: 0xfe followed by 0xff.

This is why we set SP to the first byte, and then OR it with the second one shifted left by 8 bits (i.e. we add the second value that’s been multiplied by 256). In the end, representing each value with 16 bits instead of 8, the operation is SP = 0x00fe + 0xff00 = 0xfffe.Think of it as reading a number backwards and rebuilding it from its individual digits. 421 would be written 124 but you’d decode it by computing 1+(2×10)+(4×100).

This is already quite a bit of work just to set a variable’s value, so let’s wrap it up. The code above will blissfully run through the whole memory array, updating the value in register SP whenever it reads the byte 0x31 — provided there are at least two more bytes left after it in memory. But we won’t see any of it since the program will crash as soon as it gets to the end of the boot ROM and tries reading outside the array.

What I initially did was to exit the loop as soon as it encountered an opcode it didn’t know, print the CPU’s internal state and exit the program. This made for an encouraging metric in the beginning: each opcode I implemented got me further along the boot ROM code, which showed in the way the PC register contained a higher and higher value at each new successful run.

        // ... rest of the switch block above.

        // Quit the program when reading an unimplemented opcode from memory.
        default:
            fmt.Printf("Unknown opcode: %#2x\n", opcode)
            fmt.Printf("cpu.PC=0x%04x\n", cpu.PC)
            fmt.Printf("cpu.SP=0x%04x\n", cpu.SP)
            return

Let’s try and execute that tiny, contrived example now:

$ go run the-first-steps.go
Unknown opcode: 0xaf
cpu.PC=0x0004
cpu.SP=0xfffe

It works! The emulator stopped at the second opcode, 0xaf, which is the XOR A on the second line in the boot code. It properly decoded and executed LD SP,$fffe and the program counter is already pointing at the next instruction in the fifth memory byte since XOR A is a single-byte opcode with no extra parameters. Now all that’s left is to implement XOR A⁵, then run the program again and then move on to LD HL,$9fff which, as we noted, should look extremely similar to LD SP,$fffe, and keep doing that until we hit an instruction that needs to access the RAM.

When we do, it’ll be time for a slightly more complicated memory model that I’ll cover in the next post. In the mean time, I consider the first milestone reached. This isn’t much at all, but the very core of the emulator is already there. We are — slowly, blindly, lovingly and painstakingly — booting up a Game Boy!

References

Game Boy CPU (SM83) instruction set (it’s going to come up a lot, especially when we get to timings)
Pan Docs: CPU Instruction Set
Gameboy Bootstrap ROM (with super cool details and links about how boot ROMs are dumped)
Example program

You can download the example program above and run it anywhere from the command line:

$ go run the-first-steps.go

It expects a dmg-rom.bin file to be present in the same folder. If you don’t feel like tracking down a boot ROM, you can create a usable stub with:

$ echo -n -e '\x31\xfe\xff\xaf\x21\xff\x9f\x32' > dmg-rom.bin

Thank you for reading!

Actually only 200 since 56 bytes of that ROM are just raw tile data. ↩︎
A Sharp LR35902, which is some kind of weird mix between an 8080 and a Z80. ↩︎
In Go it’s technically a slice but I’ll stick to calling it an array for simplicity’s sake. ↩︎
Okay really a four-liner because you should always check for errors. ↩︎
Which, considering it actually just means “set A to zero”, is no big feat. ↩︎