A photo of a magnifying glass held in front of the scaled up Nintendo logo on a computer screen.

Click here to get a bigger logo!

Wow. Has it been two years already? Oh well… Anyway!

So hey, you remember how the Nintendo logo would scroll down when you turned on a Game Boy — and how writing enough of an emulator to see it was kind of the point of most of this blog?

The Nintendo logo scrolling down.
This is what a semi-working emulator can do!

Last time, I mentioned that the logo was read from the cartridge and copied over to the Game Boy’s video RAM by some specific code inside the boot ROM.

I initially thought tile data was blindly copied over from the cartridge header to the background map, but the boot ROM’s annotated source code contained a somewhat cryptic comment:

; ==== Graphic routine ====

LD C, A     ; $0095  "Double up" all the bits of the graphics data
LD B, $04   ; $0096     and store in Video RAM

What do you mean “double up”? How is that logo encoded, exactly?

How big is your logo?

That scrolling logo is the result of the emulator’s display showing the visible tiles in the background map. The logo we see is 96×16 pixels, which in terms of Game Boy graphics means 12×2 tiles of 8×8 pixels. Each of these has its own ID, so the contents of the memory where the background map resides go something like this:

A grid of numbers showing the byte values from the Game Boy’s background map from memory address 0x9800 to 0x9a20. Most of them are zero for the background tiles, but 25 bytes hold the specific ID of each tile that makes up the logo to display.
The Nintendo logo’s 24 tiles in the Background Map. There is a 25th ID for the ® tile.

If you don’t count the blank background tile (which is already implicitly defined by all the zeroes the boot ROM writes to the video memory at startup), that’s 25 distinct tiles, each taking up 16 bytes in memory (two bytes per row of 8 pixels).

That’s a whopping 400 bytes! Quite a lot when you consider it’s contained in every ROM, and those could be as small as 32KB. In fact, the graphical data for the Nintendo logo is also stored in the boot ROM, which compares it with the one in the cartridge, allegedly for some obscure copyright protection reason.

400 bytes is way larger than the boot ROM itself¹. How does that logo even fit in 256 bytes?
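For the record, here is the arithmetic behind those numbers as a tiny Python sketch. The constant names are only labels I made up for this example; the values are the ones quoted in the text and in the first footnote.

```python
# Sizes quoted in the post; the names are hypothetical labels for this sketch.
TILE_ROWS = 8          # a tile is 8 rows of 8 pixels
BYTES_PER_ROW = 2      # two bytes per row of 8 pixels
LOGO_TILES = 25        # 24 logo tiles plus the ® tile
BOOT_ROM_SIZE = 256    # bytes available in the boot ROM

bytes_per_tile = TILE_ROWS * BYTES_PER_ROW
logo_vram_bytes = LOGO_TILES * bytes_per_tile

# Share of the boot ROM that even half of that data would occupy.
half_share = (logo_vram_bytes // 2) / BOOT_ROM_SIZE

print(bytes_per_tile)                   # 16
print(logo_vram_bytes)                  # 400
print(logo_vram_bytes > BOOT_ROM_SIZE)  # True
print(round(half_share * 100))          # 78
```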

Well. It turns out that it’s not as big as we think.

Is your logo too small?

I’ll cut to the chase and put a link to the Pan Docs right away.

The chapter about the cartridge header tells us that the logo is stored at address 0x0104 and that it’s 48 bytes long.

48 bytes? How is it more than eight times smaller than the picture we see?

I’m glossing over a few things here, but this is what the logo looks like in the header, if you look at it from the right angle and squint a little:

A zoomed-in picture of the Nintendo logo displayed by the Game Boy at startup, but the way it is stored in the cartridge, where each byte represents the top or bottom half of the final tile.

Each of those delimited rectangles represents a byte in the cartridge header, the highest nibble being the top half, and the lowest nibble the bottom half. They are laid out in some special order in memory, but basically every nonzero bit in that data corresponds to a black pixel. This is the smallest the logo can be made.
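To make that a bit more concrete, here is a rough Python sketch that unpacks the 48 header bytes into a small bitmap. The byte values are the ones listed in the Pan Docs cartridge header chapter, not something shown in this post. The layout is an assumption based on my reading of how the boot ROM consumes the data: each pair of bytes forms a 4×4 pixel block, the first 12 blocks make up the upper half of the logo and the last 12 the lower half. The function name `unpack_logo` is made up for the example.

```python
# Logo bytes as listed in the Pan Docs (cartridge header, 0x0104 to 0x0133).
NINTENDO_LOGO = bytes.fromhex(
    "CEED6666CC0D000B03730083000C000D"
    "0008111F8889000EDCCC6EE6DDDDD999"
    "BBBB67636E0EECCCDDDC999FBBB9333E"
)


def unpack_logo(header: bytes) -> list[str]:
    """Return the small bitmap as 8 strings of '#' (black) and '.' (blank)."""
    assert len(header) == 48
    rows = [""] * 8
    for index in range(0, 48, 2):
        block = index // 2              # 4x4 pixel block number, 0..23
        top = 0 if block < 12 else 4    # assumed: first 12 blocks are the upper half
        nibbles = [
            header[index] >> 4, header[index] & 0xF,
            header[index + 1] >> 4, header[index + 1] & 0xF,
        ]
        for offset, nibble in enumerate(nibbles):
            bits = format(nibble, "04b")
            rows[top + offset] += bits.replace("1", "#").replace("0", ".")
    return rows


for line in unpack_logo(NINTENDO_LOGO):
    print(line)
```

Every row comes out 48 characters wide and there are 8 of them, which matches the unscaled size discussed next.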

That smaller logo is only 48×8 pixels, though, so it will need to be made bigger indeed.

Do you want to enlarge your logo?

I was curious as to what method was used to scale up what is essentially a bitmap. Scaling images up is something I’ve done a lot in image-editing software but I’ve never really known how it was done internally.

I wondered how it was achieved in assembly, but the first time I looked into it, I only had the old annotated source code for the boot ROM, which didn’t have many informative comments, unlike some newer sources.

I’ll show you the rest of that “graphic routine” I quoted earlier, just for fun:

; ==== Graphic routine ====

  LD C, A           ; $0095  "Double up" all the bits of the graphics data
  LD B, $04         ; $0096     and store in Video RAM
Addr_0098:
  PUSH BC           ; $0098
  RL C              ; $0099
  RLA               ; $009b
  POP BC            ; $009c
  RL C              ; $009d
  RLA               ; $009f
  DEC B             ; $00a0
  JR NZ, Addr_0098  ; $00a1
  LD (HL+), A       ; $00a3
  INC HL            ; $00a4
  LD (HL+), A       ; $00a5
  INC HL            ; $00a6
  RET               ; $00a7

Doesn’t look like much, does it? Barely fifteen instructions, most of them just moving data around or running a loop. In fact, the only parts of the code that seem to actively modify any data are the RL C and RLA² instructions. Operations that shift bits left. Is that it?

The only thing RL does to a register is shift all its bits left once: the leftmost bit moves into the carry flag (bit 4 of F), and the rightmost bit is replaced with whatever the carry flag was before. If you do just that repeatedly, you’ll cycle through all the bits in a register and be back where you started after 9 iterations (8 register bits plus the carry).

RL C at work in a loop. The register goes back to its initial value every 9 rotations.
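Here is a minimal Python model of that rotation, just to check the 9-step cycle. The `rl` helper is a name I picked for this sketch; it treats the register as 8 bits and the carry as a separate 0 or 1.

```python
def rl(value: int, carry: int) -> tuple[int, int]:
    """Rotate an 8-bit value left through the carry flag.

    Returns (new_value, new_carry).
    """
    new_carry = (value >> 7) & 1                 # leftmost bit goes to carry
    new_value = ((value << 1) & 0xFF) | carry    # old carry fills the rightmost bit
    return new_value, new_carry


start = (0b11001110, 0)   # arbitrary register value, carry cleared
state = start
steps = 0
while True:
    state = rl(*state)
    steps += 1
    if state == start:
        break

print(steps)  # 9
```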

Okay, but in the code we see two consecutive rotations, using C and A respectively. This looks more interesting: the first rotation shifts the leftmost bit out of C to the carry flag, and the second one shifts that carry value into the rightmost bit of A.

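And doing that twice, as the boot ROM code does, would indeed duplicate that leftmost bit: PUSH BC saves C before the first rotation and POP BC restores it, so the second RL C shifts out the very same bit. Repeating the whole thing four times (B counts down from 4) effectively “doubles up” the four leftmost bits of C into the eight bits of A. I can see how this could be part of the scaling algorithm, but that’s only part of it.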
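As a sanity check, the loop can be mimicked instruction by instruction in Python. This is only a sketch under my own assumptions: `rl` and `double_up_loop` are hypothetical names, flags other than carry are ignored, and the test value 0xCE is the first byte of the logo as listed in the Pan Docs.

```python
def rl(value: int, carry: int) -> tuple[int, int]:
    """Rotate an 8-bit value left through carry; return (value, carry)."""
    return ((value << 1) & 0xFF) | carry, (value >> 7) & 1


def double_up_loop(a: int, c: int, carry: int = 0) -> tuple[int, int, int]:
    """Mimic the loop at $0098 with B = 4; return the final (A, C, carry)."""
    for _ in range(4):               # DEC B / JR NZ
        saved_c = c                  # PUSH BC
        c, carry = rl(c, carry)      # RL C: leftmost bit of C -> carry
        a, carry = rl(a, carry)      # RLA:  carry -> rightmost bit of A
        c = saved_c                  # POP BC: C is back to its saved value
        c, carry = rl(c, carry)      # RL C: the same bit goes to carry again
        a, carry = rl(a, carry)      # RLA:  second copy of that bit into A
    return a, c, carry


# Entering at $0095 copies A into C first (LD C, A).
a, c, carry = double_up_loop(0xCE, 0xCE)
print(f"{a:08b}")  # 11110000: the high nibble 1100 with every bit doubled

# Running the loop again without reloading C handles the low nibble,
# because C has been rotated left four times in the meantime.
a2, _, _ = double_up_loop(a, c, carry)
print(f"{a2:08b}")  # 11111100: the low nibble 1110 with every bit doubled
```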

This really intrigued me, so at that point, I started going through the whole algorithm by hand.

Then I decided it was way too slow and boring, so I tried putting it in code form and turning that into an animation, which in the end only took about two years, six months and most of the last two hours, but wasn’t boring at all!

I’ll let you judge how instructive those two hours of rendering turned out to be — I sped most of it up.

The full logo scaling process, only using bit rotations and a stack to save the current value so its leftmost bit can be used twice.

Obviously this happens quite a lot faster in an actual Game Boy.
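If you would rather read it than watch it, the whole process can be approximated in a few lines of Python. This sketch skips the rotation dance and doubles bits directly, so it is not a faithful port of the boot ROM. It also assumes that each doubled row is written twice and that only the first byte of each two-byte tile row is written (the other one staying zero), which is my reading of the LD (HL+), A / INC HL pairs. The names `double_bits` and `scale_to_vram` are made up, and the sample bytes 0xCE 0xED are the first two logo bytes as listed in the Pan Docs.

```python
def double_bits(nibble: int) -> int:
    """Stretch 4 bits into 8 by repeating each bit, e.g. 0b1100 -> 0b11110000."""
    out = 0
    for bit in range(3, -1, -1):
        b = (nibble >> bit) & 1
        out = (out << 2) | (b * 0b11)
    return out


def scale_to_vram(header: bytes) -> bytes:
    """Turn header logo bytes into tile data, 8 output bytes per input byte."""
    vram = bytearray()
    for byte in header:
        for nibble in (byte >> 4, byte & 0xF):
            row = double_bits(nibble)        # horizontal doubling
            vram += bytes([row, 0, row, 0])  # same row twice: vertical doubling
    return bytes(vram)


# Two header bytes are enough to fill one 16-byte tile.
tile = scale_to_vram(bytes([0xCE, 0xED]))
print(tile.hex(" "))
# f0 00 f0 00 fc 00 fc 00 fc 00 fc 00 f3 00 f3 00
```

Fed with all 48 header bytes, this produces 384 bytes of tile data, that is, the 24 logo tiles.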

Was it worth it?

Honestly? I’m really pleased with the result, and I’m happy I now understand how fifteen lines of assembly can scale up Game Boy graphics. I might not have bothered if I had read the more recent boot ROM disassembly, which explains it all much more concisely than I just did, though not always more clearly.

Also, as someone who’s had a lot of side projects over the years, I’m very happy to still be tinkering with this after six years!

And I promise I’ll post something lighter next time, like buggy screenshots because I’m too dumb to understand textures, or something.

Thanks for reading!


  1. Even if the boot code only had to write half of these bytes, that would still represent about 78% of the boot ROM! ↩︎

  2. This is not a typo; I did not forget a space. The RLA instruction (opcode 0x17) is included in the basic instruction set of the Game Boy’s CPU. There is also an RL A instruction (opcode 0xcb 0x17) which, just like RL C (opcode 0xcb 0x11), is only available in the extended instruction set. The RLA instruction must have been used here to save space, considering there is just enough room in the boot ROM’s 256 bytes to fit all the code. As for why there are two flavors of RLA, I guess it’s related to why there are two instruction sets? It looks to me like the extended and non-extended versions overlap, but I really have no idea. ↩︎