Click here to get a bigger logo!
Wow. Has it been two years already? Oh well… Anyway!
So hey, you remember how the Nintendo logo would scroll down when you turned on a Game Boy — and how writing enough of an emulator to see it was kind of the point of most of this blog?
Last time, I mentioned that the logo was read from the cartridge and copied over to the Game Boy’s video RAM by some specific code inside the boot ROM.
I initially thought tile data was blindly copied over from the cartridge header to the background map, but the boot ROM’s annotated source code contained a somewhat cryptic comment:
; ==== Graphic routine ====
LD C, A ; $0095 "Double up" all the bits of the graphics data
LD B, $04 ; $0096 and store in Video RAM
What do you mean “double up”? How is that logo encoded, exactly?
How big is your logo?
That scrolling logo is the result of the emulator’s display showing the visible tiles in the background map. The logo we see is 96×16 pixels, which in terms of Game Boy graphics means 12×2 tiles. Each of these has its own ID, so the contents of the memory where the background map resides go something like this:
If you don’t count the blank background tile (which is already implicitly defined by all the zeroes the boot ROM writes to the video memory at startup) that’s 25 distinct tiles, each taking up 16 bytes in memory (two bytes per row of 8 pixels).
That’s a whopping 400 bytes! Quite a lot when you consider it’s contained in every ROM, and those could be as small as 32KB. In fact, the graphical data for the Nintendo logo is also stored in the boot ROM, which compares it with the one in the cartridge, allegedly for some obscure copyright protection reason.
400 bytes is way larger than the boot ROM itself[1]. How does that logo even fit in 256 bytes?
Well. It turns out that it’s not as big as we think.
Is your logo too small?
I’ll cut to the chase and put a link to the Pan Docs right away.
The chapter about the cartridge header tells us that the logo is stored at address 0x0104 and that it’s 48 bytes long.
48 bytes? How is it more than eight times smaller than the picture we see?
I’m glossing over a few things here, but this is what the logo looks like in the header, if you look at it from the right angle and squint a little:
Each of those delimited rectangles represents a byte in the cartridge header, the high nibble being the top half, and the low nibble the bottom half. They are laid out in some special order in memory, but basically every nonzero bit in that data corresponds to a black pixel. This is the smallest the logo can be made.
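If you’d rather not squint, here is a rough Python sketch of that decoding. The “special order” used here is my reading of the Pan Docs description (top half of the logo first, then the bottom half; within each half, nibbles run top to bottom and then left to right), so treat it as an assumption. The names decode_header_logo and render are made up for this example.

```python
def decode_header_logo(data):
    """Turn 48 header bytes into 8 rows of 48 pixels (1 = black, 0 = blank).

    Assumed layout: the first 24 bytes hold the top 4 rows, the last 24 the
    bottom 4 rows. Each nibble is a run of 4 pixels, and nibbles are laid
    out top to bottom, then left to right.
    """
    if len(data) != 48:
        raise ValueError("expected exactly 48 bytes")
    rows = [[0] * 48 for _ in range(8)]
    for index in range(96):                      # 48 bytes hold 96 nibbles
        byte = data[index // 2]
        nibble = byte >> 4 if index % 2 == 0 else byte & 0x0F
        half, within = divmod(index, 48)         # 0 = top half, 1 = bottom half
        column, row = divmod(within, 4)          # 4 nibbles per 4-pixel column
        y = half * 4 + row
        for bit in range(4):                     # highest bit is the leftmost pixel
            rows[y][column * 4 + bit] = (nibble >> (3 - bit)) & 1
    return rows


def render(rows):
    """Draw the bitmap as text, '#' for black pixels and '.' for blank ones."""
    return "\n".join("".join("#" if p else "." for p in row) for row in rows)


# Made-up test data, not the real logo: one full nibble in each half.
sample = bytes([0xF0] + [0] * 23 + [0x80] + [0] * 23)
print(render(decode_header_logo(sample)))
```

Feeding it the 48 bytes read from address 0x0104 of a ROM file should print the small version of the logo, if the layout assumption holds.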
That smaller logo is only 48×8 pixels, though, so it will need to be made bigger indeed.
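For what it’s worth, the numbers line up. Here is a quick sanity check in Python, with helper names invented for the occasion. It assumes a plain 2× enlargement in both directions, and it takes the count of 25 tiles from earlier at face value (the 12×2 grid accounts for 24 of them; the extra one is, as far as I know, the small ® symbol, which isn’t part of the header data).

```python
def scaled_size(width, height, factor=2):
    """Size of a bitmap after enlarging it by the same factor in both directions."""
    return width * factor, height * factor


def tile_grid(width, height, tile=8):
    """How many 8x8 tiles are needed across and down for a bitmap of that size."""
    return width // tile, height // tile


def vram_bytes(tiles, bytes_per_tile=16):
    """Video RAM taken by that many tiles, at two bytes per row of 8 pixels."""
    return tiles * bytes_per_tile


header_bits = 48 * 8                 # 48 header bytes, one bit per pixel
assert header_bits == 48 * 8         # exactly enough for a 48x8 bitmap

big = scaled_size(48, 8)             # (96, 16)
grid = tile_grid(*big)               # (12, 2)
print(big, grid, vram_bytes(grid[0] * grid[1]), vram_bytes(25))
```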
Do you want to enlarge your logo?
I was curious as to what method was used to scale up what is essentially a bitmap. Scaling images up is something I’ve done a lot in image-editing software but I’ve never really known how it was done internally.
I wondered how it was achieved in assembly, but the first time I looked into it, I only had the old annotated source code for the boot ROM, which didn’t have many informative comments, unlike some newer sources.
I’ll show you the rest of that “graphic routine” I quoted earlier, just for fun:
; ==== Graphic routine ====
LD C, A ; $0095 "Double up" all the bits of the graphics data
LD B, $04 ; $0096 and store in Video RAM
Addr_0098:
PUSH BC ; $0098
RL C ; $0099
RLA ; $009b
POP BC ; $009c
RL C ; $009d
RLA ; $009f
DEC B ; $00a0
JR NZ, Addr_0098 ; $00a1
LD (HL+), A ; $00a3
INC HL ; $00a4
LD (HL+), A ; $00a5
INC HL ; $00a6
RET ; $00a7
Doesn’t look like much, does it? Barely fifteen instructions, most of them just moving data around or running a loop. In fact, the only instructions that seem to actively modify any data are RL C and RLA[2], operations that shift bits left. Is that it?
All RL does to a register is shift its bits left by one, move the bit that falls off the left end into the carry flag (bit 4 of F), and fill the rightmost bit with whatever the carry flag held before. If you do just that repeatedly, you’ll cycle through all the bits in a register, which gets back to its original value after 9 iterations.
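Here is a minimal Python model of that rotation, treating the register and the carry flag as one 9-bit loop. The function name rl and its (value, carry) return shape are choices made for this sketch, not anything taken from the boot ROM sources.

```python
def rl(value, carry):
    """Rotate an 8-bit value left through the carry flag.

    Returns (new_value, new_carry): the old leftmost bit becomes the carry,
    and the old carry becomes the new rightmost bit.
    """
    outgoing = (value >> 7) & 1
    rotated = ((value << 1) & 0xFF) | (carry & 1)
    return rotated, outgoing


value, carry = 0b10110010, 0
for step in range(9):
    value, carry = rl(value, carry)
    print(f"{step + 1}: {value:08b} carry={carry}")

# Eight register bits plus one carry bit: nine rotations bring everything home.
assert (value, carry) == (0b10110010, 0)
```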
Okay, but in the code we see two consecutive rotations, RL C followed by RLA, acting on C and A respectively. This looks more interesting: the first rotation shifts the leftmost bit out of C into the carry flag, and the second one shifts that carry value into the rightmost bit of A.
And doing that twice, with POP BC restoring C in between so that the same bit comes out again, would indeed duplicate that leftmost bit in A. Repeating the whole thing four times effectively “doubles up” the four leftmost bits of C. I can see how this could be part of the scaling algorithm, but that’s only part of it.
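To check that reasoning without a Game Boy at hand, here is a small Python sketch that follows the quoted routine instruction by instruction. The function names are invented for this example, and the interpretation of the final stores (same byte written twice, skipping one byte after each write) is my reading of the listing rather than something the old comments spell out.

```python
def rl(value, carry):
    """Rotate an 8-bit value left through carry; returns (new_value, new_carry)."""
    return ((value << 1) & 0xFF) | (carry & 1), (value >> 7) & 1


def double_up(a, c, carry=0):
    """Model of the loop at $0098: four passes, two rotations of A per pass."""
    for _ in range(4):              # LD B, $04 ... DEC B / JR NZ
        saved_c = c                 # PUSH BC
        c, carry = rl(c, carry)     # RL C: leftmost bit of C goes to carry
        a, carry = rl(a, carry)     # RLA: carry goes into bit 0 of A
        c = saved_c                 # POP BC: C is back to what it was
        c, carry = rl(c, carry)     # RL C: same bit again, C really shifts now
        a, carry = rl(a, carry)     # RLA: second copy of that bit lands in A
    return a, c, carry


def store_rows(vram, hl, a):
    """LD (HL+), A / INC HL, twice: same byte in two slots, one byte skipped each time."""
    vram[hl] = a
    vram[hl + 2] = a
    return hl + 4


byte = 0b10110010
a, c, carry = double_up(byte, byte)      # LD C, A means A and C start out equal
print(f"{a:08b}")                        # 1011 doubled up: 11001111

vram = bytearray(8)
hl = store_rows(vram, 0, a)
print(vram.hex(" "), hl)
```

In this model, C comes out of the loop with its original low nibble moved up into the high nibble, so running double_up again without reloading C doubles up the other half of the byte. Writing the same byte to two consecutive rows is what would stretch the picture vertically as well.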
This really intrigued me, so at that point, I started going through the whole algorithm by hand.
Then I decided it was way too slow and boring, so I tried putting it in code form and turning that into an animation, which in the end only took about two years, six months and most of the last two hours, but wasn’t boring at all!
I’ll let you judge how instructive those two hours of rendering turned out to be — I sped most of it up.
Obviously this happens quite a lot faster in an actual Game Boy.
Was it worth it?
Honestly? I’m really happy with the result, and I’m glad I now understand how fifteen lines of assembly can scale up Game Boy graphics. I might not have bothered if I had read the more recent boot ROM disassembly, which explains it all much more concisely than I just did, though not always more clearly.
Also, as someone who’s had a lot of side projects across the years, I’m very happy to still be tinkering with this after six years!
And I promise I’ll post something lighter next time, like buggy screenshots because I’m too dumb to understand textures, or something.
Thanks for reading!
-
Even if the boot code only had to write half of these bytes, that would still represent about 78% of the boot ROM! ↩︎
-
This is not a typo, I did not forget a space. The RLA instruction (opcode 0x17) is included in the basic instruction set of the Game Boy’s CPU. There is also an RL A instruction (opcode 0xcb 0x17) which, just like RL C (opcode 0xcb 0x11), is only available in the extended instruction set. The RLA instruction must have been used here to save space, considering there is just enough room in the boot ROM’s 256 bytes to fit all the code. As for why there are two flavors of RLA, I guess it’s related to why there are two instruction sets? Looks to me like the extended and non-extended versions overlap but I really have no idea. ↩︎