
Writing an emulator: sound will do

We wrote an ersatz APU last time. It kinda works, but how do we fit it into our existing code now?

We’ll take the example program exactly as we left it back in Scrolling at Last and see what we need:

For the sake of brevity, we’ll consider the first item as done. It mostly consisted of copy-pasting chunks of code from last time somewhere around the end of our example program. It’s also a good thing we’re about done because that one-file source is growing really big, but honestly, we have a (very) rough Game Boy emulator that fits in “only” 2000 lines of code, and that still feels sort of magic to me.

Now the real work is to rewrite our main() function. At the moment it’s doing the following:

It’s a small enough function that I’ll put it below as a reminder:

func main() {
    boot := NewBoot("./dmg-rom.bin")     // Covers 0x0000→0x00ff and 0xff50
    ram := NewRAM(0x8000, 0xffff-0x8000) // Covers 0x8000→0xffff

    // Create window and set it as the PPU's display.
    screen := NewSDL()
    ppu := NewPPU(screen) // Covers 0xff40, 0xff42, 0xff44 and 0xff47

    // MMU looking up addresses in boot ROM or BOOT register first,
    // then in the PPU, then in RAM, then in the cartridge (if any).
    mmu := MMU{[]Addressable{boot, ppu, ram}}

    // If a cartridge file is given as parameter, try to load it.
    if len(os.Args) == 2 {
        if cart := NewCartridge(os.Args[1]); cart != nil {
            mmu.Add(cart)
        }
    }

    ppu.Fetcher.mmu = &mmu
    cpu := CPU{mmu: mmu}

    fmt.Println("Press CTRL+C to quit...")

    for {
        // Each component performs one tick's worth of work per iteration of
        // our loop, which emulates them all executing in parallel.
        cpu.Tick()
        ppu.Tick()
    }
}

That main function is doing both initialization and the main loop. We’re about to split that.

Main loop callback

The main loop, that endless for where we repeatedly call Tick() on all our components, should now be replaced by the audio callback we defined. In essence it only involves moving the cpu.Tick() and ppu.Tick() lines from the main function to the callback. However that also comes with its own problems.

For one, the callback itself has no knowledge of the cpu or ppu variables (it will also need access to apu). It’s only like three pointers, though, so the easiest way is to store them in global variables. To make that slightly cleaner, we’ll group these in a trivial, conveniently named structure:

// Quick and dirty gameboy structure just so we can access CPU, PPU and APU
// from the audio callback.
type GameBoy struct {
    CPU *CPU
    PPU *PPU
    APU *APU
}

// I have given up on using the callback's `data` pointer for now. We'll just
// make sure this is initialized before we enable audio playback.
var gb *GameBoy

There is a lot more we could do with that GameBoy structure but that’s left as an exercise to the reader for now. We only have to make sure it’s initialized in our main function before enabling the audio driver.

Apart from that, the audio callback will look a lot like it did in the last article: we’ve only added two lines of code (the Tick() calls we moved from the main function), moved all needed variables into that global gb structure, and renamed the callback function to be more accurate.

func mainLoopCallback(data unsafe.Pointer, buf *C.Uint8, len C.int) {
    // Type-conversion shenanigans.
    n := int(len)
    hdr := reflect.SliceHeader{Data: uintptr(unsafe.Pointer(buf)), Len: n, Cap: n}
    buffer := *(*[]C.Uint8)(unsafe.Pointer(&hdr))

    // Tick everything as many times as needed to fill the audio buffer. See
    // how we now actually tick the whole emulator (CPU, PPU and APU) but only
    // update the audio buffer when the APU itself produces a sample frame.
    for i := 0; i < n; {
        gb.CPU.Tick()
        gb.PPU.Tick()
        left, right, play := gb.APU.Tick()
        if play {
            buffer[i] = C.Uint8(left)
            buffer[i+1] = C.Uint8(right)
            i += 2
        }
    }
}

As the comments say, we now tick all components every time the audio callback is invoked, and only use the APU’s samples every so often.
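To make the “every so often” part concrete, here is a minimal sketch of the same fill loop without any SDL or cgo. The fakeAPU type, the fillBuffer helper and both rate constants are stand-ins made up for this illustration (the real values live in the example program), and the fake APU simply emits a constant sample once enough ticks have elapsed.

```go
package main

import "fmt"

const (
	gameBoyRate  = 4194304 // Assumed CPU clock, in ticks per second.
	samplingRate = 22050   // Assumed audio sampling rate, in Hz.
)

// fakeAPU is a hypothetical stand-in for the APU: it only counts ticks and
// reports a sample frame once every gameBoyRate/samplingRate ticks.
type fakeAPU struct{ ticks uint }

// Tick returns a stereo frame and play=true only when a sample is due.
func (a *fakeAPU) Tick() (left, right uint8, play bool) {
	a.ticks++
	if a.ticks < gameBoyRate/samplingRate {
		return 0, 0, false
	}
	a.ticks = 0
	return 32, 32, true
}

// fillBuffer ticks the APU until the buffer is full and returns how many
// ticks that took. In the emulator, the CPU and PPU would tick here too.
func fillBuffer(apu *fakeAPU, buffer []uint8) (ticks int) {
	for i := 0; i+1 < len(buffer); {
		ticks++
		left, right, play := apu.Tick()
		if play {
			buffer[i], buffer[i+1] = left, right
			i += 2
		}
	}
	return ticks
}

func main() {
	buffer := make([]uint8, 8) // Room for four stereo frames.
	ticks := fillBuffer(&fakeAPU{}, buffer)
	fmt.Println("ticks needed:", ticks, "buffer:", buffer)
}
```

With these assumed rates, the whole emulator advances by a couple hundred ticks for each stereo frame written to the buffer, which is why the audio callback can double as our main loop.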

Let’s add sound initialization to the main function so we can give it a try!

Initialize sound

We’ll just copy and paste all sound initialization code from last time, that’s all those calls to SDL’s API to open the audio device. Then we should just instantiate the APU and add it to the MMU so the boot ROM code, instead of our manual initialization from last time, will put the proper values in the APU’s registers to define sound frequencies and so on. This will also significantly reduce the amount of code to copy over.

func main() {
    boot := NewBoot("./dmg-rom.bin")     // Covers 0x0000→0x00ff and 0xff50
    ram := NewRAM(0x8000, 0xffff-0x8000) // Covers 0x8000→0xffff

    // Create window and set it as the PPU's display.
    screen := NewSDL()
    ppu := NewPPU(screen) // Covers 0xff40, 0xff42, 0xff44 and 0xff47

    // Audio Processing Unit that will generate samples.
    apu := NewAPU()

    // MMU looking up addresses in boot ROM or BOOT register first,
    // then in the PPU, the APU, in RAM, and then in the cartridge (if any).
    mmu := MMU{[]Addressable{boot, ppu, apu, ram}}

    // If a cartridge file is given as parameter, try to load it.
    if len(os.Args) == 2 {
        if cart := NewCartridge(os.Args[1]); cart != nil {
            mmu.Add(cart)
        }
    }

    ppu.Fetcher.mmu = &mmu
    cpu := &CPU{mmu: mmu}

    // Group all these components in a single structure that the audio callback
    // can access directly.
    gb = &GameBoy{cpu, ppu, apu}

    fmt.Println("Press CTRL+C to quit...")

    // AudioSpec structure with our audio parameters.
    spec := sdl.AudioSpec{
        Freq:     SamplingRate,
        Format:   sdl.AUDIO_U8,
        Channels: 2,
        Samples:  FramesPerBuffer,
        Callback: sdl.AudioCallback(C.mainLoopCallback),
    }

    // Initialize the Audio driver.
    if err := sdl.OpenAudio(&spec, nil); err != nil {
        panic(err)
    }

    // Enable audio playback and let the audio callback do all the work.
    sdl.PauseAudio(false)

    // Do nothing here, the callback will take care of everything.
    for {
        sdl.Delay(1000)
    }
}

You may have noticed that even though we removed all calls to Tick() from the endless loop at the end of the function, we kept the loop itself and just do nothing in there. If we didn’t do that, the main function would just exit there and the audio callback, which runs in another thread, would never have time to get called[1].
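Instead of sleeping in a loop, the main function could also block on a Go channel until something tells it to stop. The sketch below only illustrates that idea: runUntilDone is a hypothetical helper, and a plain goroutine stands in for the audio callback's thread.

```go
package main

import "fmt"

// runUntilDone starts work in a separate goroutine (a stand-in for the audio
// callback's thread) and blocks until that goroutine signals it is finished.
func runUntilDone(work func()) {
	done := make(chan struct{})
	go func() {
		work()
		close(done) // Tell the waiting goroutine it may return.
	}()
	<-done // Blocks without burning CPU, unlike a busy loop.
}

func main() {
	ticks := 0
	runUntilDone(func() {
		for i := 0; i < 1000; i++ {
			ticks++
		}
	})
	fmt.Println("ticked", ticks, "times, exiting")
}
```

Closing the channel is what wakes up the receiver, so main only returns once the other side decides the program should stop.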

Expose our APU in memory

The only thing missing now is to make the APU structure actually compatible with our Addressable interface, if we are going to let code from the boot ROM access it. Remember how we came up with a convenient way to map hardware register addresses to structure fields a while ago? This is going to come in handy now more than ever.

If you recall, we needed to embed the Registers type in our structure, and then initialize it at instantiation time. We can put that in a constructor function:

type APU struct {
    Registers

    Square1 SquareWave
    Square2 SquareWave

    ticks uint // Clock ticks counter for mixing samples
}

// NewAPU instantiates an APU structure and properly associates register
// addresses with the relevant structure members.
func NewAPU() *APU {
    // Pre-instantiate the APU object so we can refer to its registers.
    apu := APU{}

    // Associate addresses with the corresponding register variables.
    apu.Registers = Registers{
        0xff11: &apu.Square1.NR11,
        0xff12: &apu.Square1.NR12,
        0xff13: &apu.Square1.NR13,
        0xff14: &apu.Square1.NR14,
        0xff16: &apu.Square2.NR11,
        0xff17: &apu.Square2.NR12,
        0xff18: &apu.Square2.NR13,
        0xff19: &apu.Square2.NR14,
    }
    return &apu
}
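As a reminder of the idea, here is a minimal sketch of how such a Registers type might behave. It is a re-creation written for this illustration, not the code from the earlier article: the lowercase type names, the trimmed-down register list and the 0xff value returned for unmapped addresses are all assumptions.

```go
package main

import "fmt"

// Registers maps hardware addresses to the variables backing them.
type Registers map[uint16]*uint8

// Contains reports whether addr is one of the mapped registers.
func (r Registers) Contains(addr uint16) bool {
	_, ok := r[addr]
	return ok
}

// Read returns the value of the register mapped at addr.
func (r Registers) Read(addr uint16) uint8 {
	if reg, ok := r[addr]; ok {
		return *reg
	}
	return 0xff // Assumed default for unmapped addresses.
}

// Write stores value in the register mapped at addr, if any.
func (r Registers) Write(addr uint16, value uint8) {
	if reg, ok := r[addr]; ok {
		*reg = value
	}
}

// square and apu are cut-down stand-ins for SquareWave and APU.
type square struct{ NR12, NR14 uint8 }

type apu struct {
	Registers
	Square1 square
}

func newAPU() *apu {
	a := &apu{}
	a.Registers = Registers{
		0xff12: &a.Square1.NR12,
		0xff14: &a.Square1.NR14,
	}
	return a
}

func main() {
	a := newAPU()
	a.Write(0xff12, 0xf3) // Lands directly in a.Square1.NR12.
	fmt.Printf("NR12=%#02x read=%#02x contains(0xff40)=%v\n",
		a.Square1.NR12, a.Read(0xff12), a.Contains(0xff40))
}
```

Because the map stores pointers to the structure's own fields, a write through the memory bus and a direct field access always see the same value.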

That’s it! All reads/writes from/to those APU addresses will hit the proper variable now… should we just give it a try, just to see what it looks and sounds like?

(Spoiler: I already lowered the following video’s volume but you still probably want to turn the sound down a bit.)

Ooookay, so that’s definitely sound, we’ve got that at least, but not much else. Most disappointingly, we no longer see our scrolling logo and there’s that low buzz. On the bright side, there’s a definite, albeit endless “beep” at the end!

One of these issues is actually trivial. One stems from the fact we still haven’t implemented that volume envelope I glossed over last time. One is going to come and bite us at the very end of this article. And one is a tedious implementation detail I’ll get out of the way right now because it has frustrated me a lot and it’s still pretty much out of scope.

Threading issues

I originally didn’t plan (or want) to get into threading, but those articles have grown a lot more technical than how they started anyway. It’s still something of an implementation detail, once again due to the fact stuff happens in a callback invoked from a different thread than the main one. I’ll try and make it quick; if you don’t care about threads, just skip to the end of this section.

The way I was taught threading (granted, that was a while ago), a process can execute code somewhat simultaneously in any number of sub-processes, or threads. Threads are supposed to share data, so if you change a global variable in one, the other should see the change[2]. So I sort of assumed that calling SDL code updating our window from the audio callback would just work.

Except it’s more subtle than that. To grossly oversimplify: some resources, like this renderer we had in our PPU, are not shared between threads, but must be used from what I’ve seen various documentation call the main thread — or UI thread or OS thread. Anyway, this all just means that whatever screen-related initialization we perform outside of the audio callback doesn’t carry over when we call SDL functions like Present() from the audio thread to repaint the window’s contents.

The solution is to somehow perform all display-related function calls from that main thread. Fortunately, the SDL bindings in Go come with a specific function for that exact purpose, so I integrated it in the code and I won’t be mentioning it here. I still left a few quick code comments to try and explain how and where we use it.
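For the curious, the general pattern behind that kind of helper can be sketched with nothing but the standard library. This is not the SDL bindings' implementation, only an illustration of the idea: the do and serve functions and the mainCalls channel are hypothetical names, and a goroutine stands in for the audio thread.

```go
package main

import (
	"fmt"
	"runtime"
)

// mainCalls carries closures that must be executed by the main thread.
var mainCalls = make(chan func())

func init() {
	// Pin the main goroutine to the thread the program started on.
	runtime.LockOSThread()
}

// do hands f over to the main thread and waits until it has been executed.
func do(f func()) {
	done := make(chan struct{})
	mainCalls <- func() {
		f()
		close(done)
	}
	<-done
}

// serve is meant to run on the main thread: it executes the next n calls
// queued through do.
func serve(n int) {
	for i := 0; i < n; i++ {
		(<-mainCalls)()
	}
}

func main() {
	frames := 0

	// Stand-in for the audio callback, which runs on another thread.
	go func() {
		for i := 0; i < 3; i++ {
			do(func() { frames++ }) // E.g. repainting the window.
		}
	}()

	serve(3)
	fmt.Println("frames presented from the main thread:", frames)
}
```

The audio side never touches the display directly: it only queues a closure and waits, while the main thread is the one actually running it.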

Let’s see how our program fares now, if we consider the threading issue solved.

This is… marginally better! At the very least it’s showing the combination of all our work so far. That buzz is still nasty though, and it’s the easiest thing to fix.

Have you tried turning it off?

Why is our emulator buzzing? Well, we never actually implemented a mechanism to enable or disable the output from the APU. In our last example, we played sound for the entire duration of the program’s execution, and it’s doing the same here, except we actually don’t want sound if the Game Boy code doesn’t enable it itself.

Right now, it’s just playing whatever is in the APU’s registers, and at the very beginning of our Game Boy startup, those registers are set to zero. If you recall the weird formula used to compute the generated frequency, this zero value translates to a 64Hz tone: that low buzzing.
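As a quick sanity check of that number, the formula can be evaluated on its own. The squareFrequency helper below is a hypothetical name used for this sketch; it applies the same 131072/(2048-x) formula to the 11-bit value stored across NR13 and NR14.

```go
package main

import "fmt"

// squareFrequency converts the 11-bit value split across NR13 (low 8 bits)
// and NR14 (bits 2-0) into a frequency in Hz, using 131072/(2048-x).
func squareFrequency(nr13, nr14 uint8) uint {
	raw := (uint(nr14)&7)<<8 | uint(nr13)
	return 131072 / (2048 - raw)
}

func main() {
	fmt.Println(squareFrequency(0x00, 0x00)) // Registers still at zero.
	fmt.Println(squareFrequency(0x00, 0x04)) // Raw value 1024.
	fmt.Println(squareFrequency(0xff, 0x07)) // Highest raw value, 2047.
}
```

With both registers at zero, the raw value is 0 and the result is 131072/2048, which is the 64Hz buzz we are hearing.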

The good news is that we only need to check a bit to know when sound should be enabled[3]:

FF14 – NR14 – Channel 1 Frequency hi (R/W)

Bit 7 - Initial (1=Restart Sound) (Write Only)

In reality, you’d want to catch writes to this address and trigger other things in the APU, but right now we can get away with adding this in our signal generator’s Tick() method:

func (s *SquareWave) Tick() (sample uint8) {
    // Only play if enabled.
    if s.NR14&0x80 == 0 {
        return
    }

    // ... rest of the method.

There! How does it sound now?

That’s another step closer! This is still too fast, we can’t really hear the first beep and the second one just never ends, but it’s still looking closer to the result we want.

But now, I guess it’s time… we need that volume envelope.

What’s a volume envelope?

Until now, the volume of the sound we produced was hardcoded into the program. The idea is to make that volume increase or decrease at a programmable rate. It sounded complicated to me at first, but now that we have implemented an object that generates a square signal at a programmable frequency, it almost sounds like we already have all we need!

Basically, we need an object that will adjust a value every few ticks. Sounds familiar yet?

Finding details about the volume envelope in the Game Boy specifically is a little trickier. The official documentation only says:

FF12 – NR12 – Channel 1 Volume Envelope (R/W)

Bit 7-4 - Initial Volume of envelope (0-0Fh) (0=No Sound)
Bit 3   - Envelope Direction (0=Decrease, 1=Increase)
Bit 2-0 - Number of envelope sweep (n: 0-7)
          (If zero, stop envelope operation.)

Length of 1 step = n*(1/64) seconds

So we can imagine an object whose Tick() method will either increase or decrease an initial volume every step, and each step will last sweep×(1/64) seconds, that is sweep×(CPUfreq/64) ticks. It will look very similar to how we computed the length of duty steps.
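To get a feel for the numbers involved, here is a small worked example of that computation. The clock constant is an assumption (the usual 4194304 Hz Game Boy clock) and both helper names are made up for this sketch.

```go
package main

import "fmt"

const gameBoyRate = 4194304 // Assumed CPU clock, in ticks per second.

// envelopeStepTicks returns how many clock ticks one envelope step lasts
// for a given sweep value, that is sweep×(gameBoyRate/64).
func envelopeStepTicks(sweep uint8) uint {
	return uint(sweep) * (gameBoyRate / 64)
}

// fadeOutTicks returns how many ticks a decreasing envelope needs to go
// from its initial volume all the way down to zero, one unit per step.
func fadeOutTicks(initial, sweep uint8) uint {
	return uint(initial) * envelopeStepTicks(sweep)
}

func main() {
	step := envelopeStepTicks(3)
	total := fadeOutTicks(15, 3)
	fmt.Printf("step: %d ticks, fade-out: %d ticks (%.2f s)\n",
		step, total, float64(total)/gameBoyRate)
}
```

Under these assumptions, a sweep of 3 and an initial volume of 15 fade out in about 0.7 seconds, a short decaying note rather than an endless beep.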

A diagram describing a square signal whose amplitude decreases every few steps and a caption indicating how many ticks a step lasts.
A decreasing volume envelope starting from 15

This volume is represented by a 4-bit value so it ranges from 0 to 15. This is significantly lower than the hardcoded value we worked with so far, but we can simply apply an arbitrary factor to it. I picked 4, which will give us a final volume between 0 and 60[4].

The structure we’re about to write should feel familiar by now:

// VolumeEnvelope structure that will act as a state machine only managing the
// current volume envelope for a Square or Noise signal generator.
type VolumeEnvelope struct {
    Initial  uint8 // NRx2 bits 7-4
    Decrease bool  // NRx2 bit 3
    Sweep    uint8 // NRx2 bits 2-0

    Volume   uint8 // Current calculated volume.

    enabled bool
    ticks   uint // Clock ticks counter.
}

// Tick advances the volume envelope one step. It will adjust the volume value
// every <sweep>×(1/64) seconds or <sweep>×(<cpuFreq>/64) ticks.
func (v *VolumeEnvelope) Tick() {
    // A sweep of zero stops envelope operation altogether.
    if !v.enabled || v.Sweep == 0 {
        return
    }

    // Volume must always stay in the 0-15 range.
    if (v.Volume == 0 && v.Decrease) || (v.Volume == 15 && !v.Decrease) {
        v.enabled = false
        return
    }

    // Only update volume every <sweep>×(<cpuFreq>/64) clocks.
    v.ticks++
    if v.ticks < uint(v.Sweep)*(GameBoyRate/64) {
        return
    }
    v.ticks = 0

    if v.Decrease {
        v.Volume -= 1
    } else {
        v.Volume += 1
    }
}

You might recognize the way we count ticks to only do something at a given frequency: we did something similar with square wave generation and duty cycles. We then adjust the current volume based on the envelope’s direction. I also relied on a more detailed article on the Game Boy development wiki, which also describes what happens when the envelope reaches 0 or 15. In that case, the envelope’s volume is no longer updated “until the channel is triggered again.” But what does that even mean?

Audio channel trigger

That “trigger” operation is mentioned in most documentation about Game Boy sound and is what happens when an audio channel is enabled by setting bit 7 to 1 in NR14 or NR24 (or NR34 or NR44 for the other two generators we haven’t implemented). At this moment, the volume envelope’s internal state should be reset. One easy way to emulate that is to detect a write to that memory address and add some logic to it[5].

We just need to override our APU’s Write() method where we still call the embedded Registers’ method, and perform a couple additional tasks. Since most sound generators work in a similar way there, we can add a specific method to them:

// SetNRx4 is called whenever the value in NR14 or NR24 was changed, so that we
// can reset the volume envelope and start sound output.
func (s *SquareWave) SetNRx4(value uint8) {
    if value&0x80 != 0 {
        s.envelope.Trigger()
        s.enabled = true
    }
}

While we are at it, I also added an enabled property to the SquareWave type, which we’ll use instead of that quick and dirty NR14 check at the beginning of Tick(). This new method should be called whenever the value in NR14 or NR24 changes:

// Write overrides Registers.Write to handle trigger events.
func (a *APU) Write(addr uint16, value uint8) {
    a.Registers.Write(addr, value)

    // After writing the value, see if we must enable or update something.
    switch addr {
    case 0xff14: // NR14
        a.Square1.SetNRx4(value) // Trigger channel
    case 0xff19: // NR24
        a.Square2.SetNRx4(value) // Trigger channel
    }
}

And that’s it! Our MMU will call the APU’s Write() method whenever a memory write to any of the audio registers occurs, let the Registers type do its thing and, in addition, trigger the relevant sound generator if needed.
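The envelope's Trigger() method called from SetNRx4 isn't listed in this article. A minimal sketch of what it might look like, assuming that resetting the internal state means reloading the initial volume, clearing the tick counter and re-enabling updates (the lowercase type below is a stand-in for VolumeEnvelope):

```go
package main

import "fmt"

// volumeEnvelope mirrors the fields of the VolumeEnvelope structure.
type volumeEnvelope struct {
	Initial  uint8 // NRx2 bits 7-4
	Decrease bool  // NRx2 bit 3
	Sweep    uint8 // NRx2 bits 2-0

	Volume uint8 // Current calculated volume.

	enabled bool
	ticks   uint // Clock ticks counter.
}

// Trigger resets the envelope's internal state. This is an assumed
// behavior: start again from the initial volume with a fresh tick count.
func (v *volumeEnvelope) Trigger() {
	v.Volume = v.Initial
	v.ticks = 0
	v.enabled = true
}

func main() {
	// An envelope that already faded out completely and stopped updating.
	env := volumeEnvelope{Initial: 15, Decrease: true, Sweep: 3}
	env.ticks = 1234

	env.Trigger()
	fmt.Println(env.Volume, env.enabled, env.ticks)
}
```

After a trigger, the envelope is back at its initial volume and will start counting ticks towards its next step again.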

However, triggering our volume envelope won’t do us much good right now if all its properties are zero.

Initialize and use the envelope

For that matter, the envelope object’s properties also need to be initialized whenever the NR12 or NR22 register is written to, as we need to parse an 8-bit value into two integers and a boolean. We can use the exact same idea and add another new method to the SquareWave type:

// SetNRx2 is called whenever the value in NR12 or NR22 was changed, so that we
// can update the volume envelope's state machine.
func (s *SquareWave) SetNRx2(value uint8) {
    s.envelope.Initial = value >> 4
    s.envelope.Decrease = (value&0x08 == 0)
    s.envelope.Sweep = value & 7
}
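To check the bit manipulation, here is the same decoding applied to a sample value. The decodeNRx2 function is a hypothetical standalone version of SetNRx2, and 0xf3 is just an example value picked for this sketch.

```go
package main

import "fmt"

// decodeNRx2 splits an NRx2 register value into its three fields: initial
// volume (bits 7-4), direction (bit 3, 0 meaning decrease) and sweep
// (bits 2-0).
func decodeNRx2(value uint8) (initial uint8, decrease bool, sweep uint8) {
	initial = value >> 4
	decrease = value&0x08 == 0
	sweep = value & 7
	return
}

func main() {
	// 0xf3 is 1111 0011 in binary.
	initial, decrease, sweep := decodeNRx2(0xf3)
	fmt.Println(initial, decrease, sweep)
}
```

For 0xf3, the upper four bits give an initial volume of 15, bit 3 is clear so the envelope decreases, and the lower three bits give a sweep of 3.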

Then add another override for the memory addresses corresponding to NR12 and NR22:

// Write overrides Registers.Write to handle trigger events.
func (a *APU) Write(addr uint16, value uint8) {
    a.Registers.Write(addr, value)

    // After writing the value, see if we must enable or update something.
    switch addr {
    case 0xff12: // NR12
        a.Square1.SetNRx2(value) // Update envelope
    case 0xff14: // NR14
        a.Square1.SetNRx4(value) // Trigger channel
    case 0xff17: // NR22
        a.Square2.SetNRx2(value) // Update envelope
    case 0xff19: // NR24
        a.Square2.SetNRx4(value) // Trigger channel
    }
}

Now, everything is ready to integrate the volume envelope to our square wave generator. There are just a couple additional things we need to do in the generator’s Tick() method:

// Tick produces a sample of the signal to generate based on the current value
// in the signal generator's registers. We use a named return value, which is
// conveniently set to zero (silence) by default.
func (s *SquareWave) Tick() (sample uint8) {
    // Only play if enabled.
    if !s.enabled {
        return
    }

    // Update volume envelope.
    s.envelope.Tick()

    // With `x` the 11-bit value in NR13/NR14, frequency is 131072/(2048-x) Hz.
    rawFreq := ((uint(s.NR14) & 7) << 8) | uint(s.NR13)
    freq := 131072 / (2048 - rawFreq)

    // Advance duty step every 1/(8f) where f is the sound's real frequency.
    if s.ticks++; s.ticks >= GameBoyRate/(freq*8) {
        s.dutyStep = (s.dutyStep + 1) % 8
        s.ticks = 0
    }

    // Use envelope's current volume (and an amplification factor).
    if DutyCycles[s.dutyType][s.dutyStep] {
        sample = s.envelope.Volume * 4
    }

    return
}

I don’t know about you, but at this point I’m really hoping this works and that we can consider we’re done with sound for the time being. It’s been work but we did get sound and pixels at the same time, and we finally wrote that volume envelope I have been putting off for the last couple articles.

Deep breath…

Heeeey, it doesn’t sound all that bad now, does it? We went from a beep to a ding!

Unfortunately, that was only a single ding, because that bar scrolled down too fast for us to hear the first note. Remember the issue I said was going to come bite us at the end of this article?

Back to timings

As far as I can tell, the sound part works. At least well enough, anyway. Sound is complicated.

No, the final issue is that, about five articles ago when we started implementing the PPU, I put the CPU’s whole logic in its own Tick() method, then said we’d implement its specific state machine in a future article. Well, here we are!

What’s happening right now is that our emulated CPU is executing machine instructions way too fast compared to the PPU. I’m still not entirely sure how, but I believe it affects the scrolling code in the boot ROM, which counts frames to wait and execute at the proper speed.

So this is what we’ll have to do in the next article, and then I’ll call it and say we’re done.

And for the time being, sound will do!

Thank you for reading.

References

You can download the example program above and run it anywhere from the command line:

$ go run sound-will-do.go

It expects a dmg-rom.bin file to be present in the same folder. Note that it might take a little while the first time you run the program as it will need to build the SDL libraries.

You can also download the same example ROM from last time and run it from the command line:

$ go run sound-will-do.go cartridge.gb

Finally, you can substitute cartridge.gb with the path to any GB ROM you have and see what happens!


  1. There are other ways we could achieve that. Goholint uses a Go channel that effectively acts as a blocking socket waiting for the audio callback to send a notification that the program should stop. We could also still keep the main loop logic in there if we used another method to send samples to the audio card that doesn’t involve a callback from another thread.

  2. Which implies a lot of funny issues related to synchronizing concurrent accesses to a same variable, race conditions, thread-safety… no really, I don’t want to get into threading.

  3. This approach will prove too simplistic even before the end of this article, but we’ll come back to it later.

  4. That’s roughly 25% of the maximal value for a sample, which sounds reasonably loud. You can easily adjust that factor to your liking in the code, and one interesting exercise would be to make it either configurable or, even better, something the user could adjust with a keypress. This is how I plan to do it in Goholint someday.

  5. I’ll also slightly reword things to refer to NRx4 since the same method will be used for our two generators and I don’t want to have to do anything special to distinguish NR14 and NR24.