APU DMC: Difference between revisions

From NESdev Wiki
(bbs ref for counter not changed properly)
m (The CPU vs APU clock rate was reversed.)
(28 intermediate revisions by 8 users not shown)
Line 1: Line 1:
[[Category:APU]]
[[Category:APU]]
The [[APU|NES APU's]] delta modulation channel (DMC) can output 1-bit [[wikipedia:Delta modulation|delta-encoded samples]] or can have its 7-bit counter directly loaded, allowing flexible manual sample playback.
The [[APU|NES APU's]] delta modulation channel (DMC) can output 1-bit [[wikipedia:Delta modulation|delta-encoded samples]] or can have its 7-bit counter directly loaded, allowing flexible manual sample playback.


The DMC channel contains the following: memory reader, interrupt flag, sample buffer, [[APU Misc|timer]], output unit, 7-bit counter.
== Overview ==
 
The DMC channel contains the following: memory reader, interrupt flag, sample buffer, [[APU Misc|timer]], output unit, 7-bit output level with up and down counter.


<pre>
<pre>
                        Timer
                        Timer
                          |
                          |
                          v
                          v
Reader ---> Buffer ---> Output ---> Counter ---> (to the mixer)
Reader ---> Buffer ---> Shifter ---> Output level ---> (to the mixer)
</pre>
</pre>


{| class='tabular'
{| class="wikitable"
| '''$4010''' || <tt>IL--.FFFF</tt> || '''Flags and frequency''' (write)
| '''$4010''' || <tt>IL--.RRRR</tt> || '''Flags and Rate''' (write)
|-
|-
| bit 7 || <tt>I--- ----</tt> || IRQ enabled flag. If clear, the interrupt flag is cleared.
| bit 7 || <tt>I---.----</tt> || IRQ enabled flag. If clear, the interrupt flag is cleared.
|-
|-
| bit 6 || <tt>-L-- ----</tt> || Loop flag
| bit 6 || <tt>-L--.----</tt> || Loop flag
|-
|-
| bits 3-0 || <tt>---- RRRR</tt> || Rate index<br>
| bits 3-0 || <tt>----.RRRR</tt> || Rate index<br>
<!-- If you modify this table, keep the values comma-separated so they can be used without changes in a program -->
<!-- If you modify this table, keep the values comma-separated so they can be used without changes in a program -->
<pre>
<pre>
Line 26: Line 29:
PAL  398, 354, 316, 298, 276, 236, 210, 198, 176, 148, 132, 118,  98,  78,  66,  50
PAL  398, 354, 316, 298, 276, 236, 210, 198, 176, 148, 132, 118,  98,  78,  66,  50
</pre>
</pre>
The rate determines how many CPU cycles elapse between changes in the output level during automatic delta-encoded sample playback. For example, on NTSC (1.789773 MHz), a rate of 428 gives a frequency of 1789773/428 Hz = 4181.71 Hz. These periods are all even numbers because there are 2 CPU cycles in an APU cycle. A rate of 428 means the output level changes every 214 APU cycles.
|-
|-
|colspan=3| &nbsp;
|colspan=3| &nbsp;
Line 31: Line 36:
| '''$4011''' || <tt>-DDD.DDDD</tt> || '''Direct load''' (write)
| '''$4011''' || <tt>-DDD.DDDD</tt> || '''Direct load''' (write)
|-
|-
| bits 6-0 || <tt>-DDD DDDD</tt> || The counter is loaded with D. If the timer is outputting a clock at the same time, the counter is occasionally not changed properly.[http://forums.nesdev.org/viewtopic.php?p=104491#p104491]
| bits 6-0 || <tt>-DDD.DDDD</tt> || The DMC output level is set to D, an unsigned value. If the timer is outputting a clock at the same time, the output level is occasionally not changed properly.[http://forums.nesdev.org/viewtopic.php?p=104491#p104491]
|-
|-
|colspan=3| &nbsp;
|colspan=3| &nbsp;
Line 37: Line 42:
| '''$4012''' || <tt>AAAA.AAAA</tt> || '''Sample address''' (write)
| '''$4012''' || <tt>AAAA.AAAA</tt> || '''Sample address''' (write)
|-
|-
| bits 7-0 || <tt>AAAA AAAA</tt> || Sample address = <tt>%11AAAAAA.AA000000</tt>
| bits 7-0 || <tt>AAAA.AAAA</tt> || Sample address = <tt>%11AAAAAA.AA000000</tt> = <tt>$C000 + (A * 64)</tt>
|-
|-
|colspan=3| &nbsp;
|colspan=3| &nbsp;
Line 43: Line 48:
| '''$4013''' || <tt>LLLL.LLLL</tt> || '''Sample length''' (write)
| '''$4013''' || <tt>LLLL.LLLL</tt> || '''Sample length''' (write)
|-
|-
| bits 7-0 || <tt>LLLL LLLL</tt> || Sample length = <tt>%LLLL.LLLL0001</tt>
| style="white-space: nowrap;" | bits 7-0 || <tt>LLLL.LLLL</tt> || Sample length = <tt>%LLLL.LLLL0001</tt> = <tt>(L * 16) + 1 bytes</tt>
|}
|}


The counter's value is sent to the [[APU Mixer|mixer]]. It is loaded with 0 on power-up.
The output level is sent to the [[APU Mixer|mixer]] whether the channel is enabled or not. It is loaded with 0 on power-up, and can be updated by $4011 writes and delta-encoded sample playback.


Automatic 1-bit [[wikipedia:Delta modulation|delta-encoded sample]] playback is carried out by a combination of three units. The ''memory reader'' fills the 8-bit ''sample buffer'' whenever it is emptied by the sample ''output unit''. The [[APU Status|status register]] is used to start and stop automatic sample playback.
Automatic 1-bit [[wikipedia:Delta modulation|delta-encoded sample]] playback is carried out by a combination of three units. The ''memory reader'' fills the 8-bit ''sample buffer'' whenever it is emptied by the sample ''output unit''. The [[APU Status|status register]] is used to start and stop automatic sample playback.
Line 53: Line 58:


=== Pitch table ===
=== Pitch table ===
{| class="tabular"
 
{| class="wikitable"
|-
|-
|
|
Line 79: Line 85:
|  $4
|  $4
| bgcolor=FFDDDD | $11E  ||  6257.95 Hz  || G-8  -3.86c
| bgcolor=FFDDDD | $11E  ||  6257.95 Hz  || G-8  -3.86c
| bgcolor=DDDDFF | $114  ||  6023.94 Hz  || F#+30.2c
| bgcolor=DDDDFF | $114  ||  6023.94 Hz  || G--69.8c
|-
|-
|  $5
|  $5
Line 126: Line 132:
|}
|}
(Deviation from note is given in cents, which are defined as 1/100 of a semitone.)
(Deviation from note is given in cents, which are defined as 1/100 of a semitone.)
Note that on PAL systems, the pitches at $4 and $C appear to be incorrect with respect to their intended A-440 tuning scheme<ref>[http://forums.nesdev.org/viewtopic.php?p=94079#p94079 Forum post]: PAL DPCM frequency table contains 2 errors.</ref>.


=== Memory reader ===
=== Memory reader ===
Line 133: Line 141:
When a sample is (re)started, the current address is set to the sample address, and bytes remaining is set to the sample length.
When a sample is (re)started, the current address is set to the sample address, and bytes remaining is set to the sample length.


Any time the sample buffer is in an empty state and bytes remaining is not zero, the following occur:
Any time the sample buffer is in an empty state and bytes remaining is not zero (including just after a write to $4015 that enables the channel, regardless of where that write occurs relative to the bit counter mentioned below), the following occur:


* The [[CPU|CPU]] is stalled for [http://forums.nesdev.org/viewtopic.php?p=62690#p62690 up to 4 CPU cycles] to allow the longest possible write (the return address and write after an IRQ) to finish. If [[PPU OAM#DMA|OAM DMA]] is in progress, it is paused for two cycles.[http://forums.nesdev.org/viewtopic.php?p=95703#95703]
* The [[CPU]] is stalled for up to 4 CPU cycles<ref>[//forums.nesdev.org/viewtopic.php?p=62690#p62690 Forum post:] Blargg's DMA tests</ref> to allow the longest possible write (the return address and write after an IRQ) to finish. If [[PPU OAM#DMA|OAM DMA]] is in progress, it is paused for two cycles.<ref>[//forums.nesdev.org/viewtopic.php?p=95703#95703 Forum post:] cpow's Visual 2A03 DMC vs OAM DMA analysis</ref> The sample fetch always occurs on an even CPU cycle due to its alignment with the APU. Specific delay cases:
* The sample buffer is filled with the next sample byte read from the current address, subject to whatever [[MMC|mapping hardware]] present.
** 4 cycles if it falls on a CPU read cycle.
** 3 cycles if it falls on a single CPU write cycle (or the second write of a double CPU write).
** 4 cycles if it falls on the first write of a double CPU write cycle.<ref>[//forums.nesdev.org/viewtopic.php?p=231604#p231604 Forum post:] Fiskbit's aligned controller read test</ref>
** 2 cycles if it occurs during an OAM DMA, or on the $4014 write cycle that triggers the OAM DMA.
** 1 cycle if it occurs on the second-last OAM DMA cycle.
** 3 cycles if it occurs on the last OAM DMA cycle.
* The sample buffer is filled with the next sample byte read from the current address, subject to whatever [[MMC|mapping hardware]] is present.
* The address is incremented; if it exceeds $FFFF, it is wrapped around to $8000.
* The address is incremented; if it exceeds $FFFF, it is wrapped around to $8000.
* The bytes remaining counter is decremented; if it becomes zero and the loop flag is set, the sample is restarted (see above); otherwise, if the bytes remaining counter becomes zero and the IRQ enabled flag is set, the interrupt flag is set.
* The bytes remaining counter is decremented; if it becomes zero and the loop flag is set, the sample is restarted (see above); otherwise, if the bytes remaining counter becomes zero and the IRQ enabled flag is set, the interrupt flag is set.
Line 145: Line 159:
=== Output unit ===
=== Output unit ===


The output unit continuously outputs a 7-bit value to the [[APU Mixer|mixer]]. It contains an 8-bit right shift register, a bits-remaining counter, a 7-bit delta-counter, and a silence flag.
The output unit continuously outputs a 7-bit value to the [[APU Mixer|mixer]]. It contains an 8-bit right shift register, a bits-remaining counter, a 7-bit output level (the same one that can be loaded directly via $4011), and a silence flag.
 
The bits-remaining counter is updated whenever the [[APU Misc|timer]] outputs a clock, regardless of whether a sample is currently playing. When this counter reaches zero, we say that the output cycle ends. The DPCM unit can only transition from silent to playing at the end of an output cycle.


When an output cycle ends, a new cycle is started as follows:
When an output cycle ends, a new cycle is started as follows:
Line 151: Line 167:
* If the sample buffer is empty, then the silence flag is set; otherwise, the silence flag is cleared and the sample buffer is emptied into the shift register.
* If the sample buffer is empty, then the silence flag is set; otherwise, the silence flag is cleared and the sample buffer is emptied into the shift register.


When the [[APU Misc|timer]] outputs a clock, the following actions occur in order:
When the timer outputs a clock, the following actions occur in order:
# If the silence flag is clear, bit 0 of the shift register is applied to the counter as follows: if bit 0 is clear and the delta-counter is greater than 1, the counter is decremented by 2; otherwise, if bit 0 is set and the delta-counter is less than 126, the counter is incremented by 2.
# If the silence flag is clear, the output level changes based on bit 0 of the shift register. If the bit is 1, add 2; otherwise, subtract 2. But if adding or subtracting 2 would cause the output level to leave the 0-127 range, leave the output level unchanged. This means subtract 2 only if the current level is at least 2, or add 2 only if the current level is at most 125.
# The right shift register is clocked.
# The right shift register is clocked.
# The bits-remaining counter is decremented. If it becomes zero, a new cycle is started.
# As stated above, the bits-remaining counter is decremented. If it becomes zero, a new output cycle is started.


''Nothing can interrupt a cycle; every cycle runs to completion before a new cycle is started.''
''Nothing can interrupt a cycle; every cycle runs to completion before a new cycle is started.''
== Conflict with controller and PPU read ==
On the NTSC NES and Famicom, if a new sample byte is fetched from memory at the same time the program is reading the [[Standard controller|controller]] through $4016/4017, a conflict occurs, corrupting the data read from the controller. Programs which use DPCM sample playback will normally use a redundant [[Controller Reading|controller read]] routine to work around this defect.
A similar problem occurs when reading data from the PPU through $2007, or polling $2002 for vblank.


=== Likely internal implementation of the read ===
=== Likely internal implementation of the read ===
Line 164: Line 186:
The 6502 cannot be pulled off of the bus normally. The 2A03 DMC gets around this by pulling RDY low internally. This causes the CPU to pause during the next read cycle, until RDY goes high again. The DMC unit holds RDY low for 4 cycles. The first three cycles it idles, as the CPU could have just started an interrupt cycle, and thus be writing for 3 consecutive cycles (and thus ignoring RDY). On the fourth cycle, the DMC unit drives the next sample address onto the address lines, and reads that byte from memory. It then drives RDY high again, and the CPU picks up where it left off.
The 6502 cannot be pulled off of the bus normally. The 2A03 DMC gets around this by pulling RDY low internally. This causes the CPU to pause during the next read cycle, until RDY goes high again. The DMC unit holds RDY low for 4 cycles. The first three cycles it idles, as the CPU could have just started an interrupt cycle, and thus be writing for 3 consecutive cycles (and thus ignoring RDY). On the fourth cycle, the DMC unit drives the next sample address onto the address lines, and reads that byte from memory. It then drives RDY high again, and the CPU picks up where it left off.


This matters, because on NTSC NES and Famicom, it can interfere with the expected operation of any register where reads have a side effect: the controller registers ($4016 and $4017), reads of the PPU status register ($2002), and reads of VRAM/VROM data ($2007) if they happen to occur in the same cycle that the DMC unit pulls RDY low.
This matters because on NTSC NES and Famicom, it can interfere with the expected operation of any register where reads have a side effect: the controller registers ($4016 and $4017), reads of the PPU status register ($2002), and reads of VRAM/VROM data ($2007) if they happen to occur in the same cycle that the DMC unit pulls RDY low.


For the controller registers, this can cause an extra rising clock edge to occur, and thus shift an extra bit out. For the others, the PPU will see multiple reads, which will cause extra increments of the address latches, or clear the vblank flag.
For the controller registers, this can cause an extra rising clock edge to occur, and thus shift an extra bit out. For the others, the PPU will see multiple reads, which will cause extra increments of the address latches, or clear the vblank flag.
Line 172: Line 194:
== Usage of DMC for syncing to video ==
== Usage of DMC for syncing to video ==


The following method is discussed [http://forums.nesdev.org/viewtopic.php?t=6521 here]
DMC IRQs can be used for timed video operations. The following method was discussed on the forum in 2010.<ref>[http://forums.nesdev.org/viewtopic.php?t=6521 Forum thread]: DMC IRQ as a video timer.</ref>


=== Concept ===
=== Concept ===
Line 180: Line 202:
However, the DMC channel can hypothetically be used for syncing with video instead of using it for sound. Unfortunately it's a bit complicated, but used correctly, it can function as a crude scanline counter, eliminating the need for an advanced mapper.
However, the DMC channel can hypothetically be used for syncing with video instead of using it for sound. Unfortunately it's a bit complicated, but used correctly, it can function as a crude scanline counter, eliminating the need for an advanced mapper.


The DMC's timing is completely separate from the video (obviously, it being an audio component). The DMC's timer is always running, and samples can only start every 8 clocks. However, because the DMC's timer isn't synchronized to the PPU in any way, these 8-clock boundaries occur on different scanlines each frame.
The DMC's timing is completely separate from the video. The DMC's timer is always running, and samples can only start every 8 clock cycles. However, because the DMC's timer isn't synchronized to the PPU in any way, these 8-clock boundaries occur on different scanlines each frame.


To achieve stable timing, this is what you do:
Here are the steps to achieve stable timing:


* At a fixed time in rendering (we'll use the start of vblank as an example), start a dummy single-byte sample at rate $F. Due to a hardware quirk*, the sample needs to be started three times in a row like this:
* At a fixed point in video rendering (we'll use the start of vblank as an example), a dummy single-byte sample at rate $F is started. Due to a hardware quirk†, the sample needs to be started three times in a row like this:


  sei
<pre>
  lda #$10  
sei
  sta $4015  
lda #$10  
  sta $4015  
sta $4015  
  sta $4015  
sta $4015  
  cli
sta $4015  
cli
</pre>


* Use any method to count how many cycles pass before the DMC IRQ happens (you can use an actual IRQ, or poll $4015).
* The number of cycles before a DMC IRQ happens is then measured (either using an actual IRQ, or by polling $4015).
** At rate $F, there are 54 cpu cycles between clocks, so there are 432 cpu cycles ((432 * 3) / 341 = about 3.8 scanlines) between boundaries. Your measurement - 432 = the amount of time you need to wait after a DMC IRQ, in order to synchronize with vblank.
** At rate $F, there are 54 CPU cycles between clocks, so there are 432 CPU cycles (432 × 3 ÷ 341 = about 3.8 scanlines) between boundaries.
* Start the actual sample that will be used for the timing.
* The main sample that will be used for the timing is then started (refer to the table below for the sample lengths that give various waiting times).
* When the IRQ happens, use your measurement from before to delay your program, and you'll synchronize the interrupt to vblank.
* When the main IRQ happens, the measurement from before is retrieved, and a timing loop with variable delay is used. In order to synchronize with vblank, after a DMC IRQ we should wait 432 CPU cycles minus the time we measured.
'''Note:''' The hardware quirk mentioned above deals with how DMC IRQs are generated. Basically, the IRQ is generated when the last '''byte''' of the sample is '''read''', '''not''' when the last ''sample'' of the sample ''plays''. The sample buffer sometimes has enough time to empty itself between your writes to $4015, meaning your next write to $4015 will trigger an immediate IRQ. Fortunately, writing to $4015 three times will avoid this issue.
'''Note:''' The hardware quirk mentioned above deals with how DMC IRQs are generated. Basically, the IRQ is generated when the last '''byte''' of the sample is '''read''', '''not''' when the last ''sample'' of the sample ''plays''. The sample buffer sometimes has enough time to empty itself between writes to $4015, meaning your next write to $4015 will trigger an immediate IRQ. Fortunately, writing to $4015 three times will avoid this issue.


Still using vblank as an example, the measurement you take will tell you how far into the 8-clock boundary vblank occured, and by delaying after a DMC IRQ, you can perform a raster effect at the same point within the 8-clock boundary, aligning it with vblank. If you perform this same method each frame, your raster effect will have a reasonably stable timing to it. As a bonus, since you're mostly using IRQs, the CPU is free to do something else, instead of waiting in a timed loop.
Still using vblank as an example, the measurement tells how far into the 8-clock boundary vblank occurred, and by delaying after a DMC IRQ, we perform a raster effect at the same point within the 8-clock boundary, aligning it with vblank. By performing this same method each frame, the raster effect will have a reasonably stable timing to it. As a bonus, since IRQs are mostly being used, the CPU is free to do something else, instead of waiting in a timed loop.


It's possible to use more than one IRQ per frame - but you need to do the ''measurement'' part at the ''same time'' within each frame, before you use any of the IRQs.
It's possible to use more than one IRQ per frame - but the ''measurement'' part needs to be done at the ''same time'' within each frame, before the usage of any IRQ.


You can only have one split-point per IRQ, with the shortest IRQ being 3.8 scanlines. If you want them closer together, you must use timed code.
Only a single split-point per IRQ is possible, with the shortest IRQ being 3.8 scanlines. For split points closer than this amount, timed code has to be used.


You should also make sure your samples are made up of all $00 bytes, and that you've written $00 to $4011 at some point. Otherwise, you might unintentionally create audio. This ''is'' a sound channel, after all.
In order to remain silent, samples should be made up of all $00 bytes, and $00 should have been previously written to $4011. Otherwise, audio will unintentionally be created. This ''is'' a sound channel, after all.


=== Timing table ===
=== Timing table ===
Line 211: Line 235:
This table converts sample length to a duration in scanlines (all values are rounded up to the next integer).
This table converts sample length to a duration in scanlines (all values are rounded up to the next integer).


NTSC              Rate  
<pre>
Length              $0    $1  $2  $3  $4  $5  $6  $7  $8  $9  $a  $b  $c  $d  $e  $f  
NTSC              Rate  
----------------------------------------------------------------------------------------------------  
Length              $0    $1  $2  $3  $4  $5  $6  $7  $8  $9  $a  $b  $c  $d  $e  $f  
1-byte (8 bits)    31    27  24  23  21  18  16  16  14  12  10  10  8    6    6    4  
----------------------------------------------------------------------------------------------------  
17-byte (136 bits)  **    **  **  **  **  **  **  **  228  192  170  154  127  101  87  65  
1-byte (8 bits)    31    27  24  23  21  18  16  16  14  12  10  10  8    6    6    4  
33-byte (264 bits)  **    **  **  **  **  **  **  **  **  **  **  **  **  196  168  126  
17-byte (136 bits)  **    **  **  **  **  **  **  **  228  192  170  154  127  101  87  65  
49-byte (392 bits)  **    **  **  **  **  **  **  **  **  **  **  **  **  **  **  187  
33-byte (264 bits)  **    **  **  **  **  **  **  **  **  **  **  **  **  196  168  126  
49-byte (392 bits)  **    **  **  **  **  **  **  **  **  **  **  **  **  **  **  187  


PAL                Rate  
PAL                Rate  
Length              $0    $1  $2  $3  $4  $5  $6  $7  $8  $9  $a  $b  $c  $d  $e  $f  
Length              $0    $1  $2  $3  $4  $5  $6  $7  $8  $9  $a  $b  $c  $d  $e  $f  
----------------------------------------------------------------------------------------------------  
----------------------------------------------------------------------------------------------------  
1-byte (8 bits)    30    27  24  23  21  18  16  15  14  12  10  9    8    6    5    4  
1-byte (8 bits)    30    27  24  23  21  18  16  15  14  12  10  9    8    6    5    4  
17-byte (136 bits)  **    **  **  **  **  **  **  **  225  189  169  151  126  100  85  64  
17-byte (136 bits)  **    **  **  **  **  **  **  **  225  189  169  151  126  100  85  64  
33-byte (264 bits)  **    **  **  **  **  **  **  **  **  **  **  **  **  194  164  124  
33-byte (264 bits)  **    **  **  **  **  **  **  **  **  **  **  **  **  194  164  124  
49-byte (392 bits)  **    **  **  **  **  **  **  **  **  **  **  **  **  **  **  184
49-byte (392 bits)  **    **  **  **  **  **  **  **  **  **  **  **  **  **  **  184
</pre>
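
The entries in the timing table can be reproduced with a short calculation. The following is a minimal sketch, not part of the original page: it assumes 341 PPU dots per scanline, 3 dots per CPU cycle on NTSC (as in the 432 × 3 ÷ 341 example above) and 16/5 dots per CPU cycle on PAL (an assumption not stated in the text). The function name <code>irq_delay_scanlines</code> is hypothetical.

```python
# Rate tables from the $4010 description, in CPU cycles per output bit.
NTSC_PERIODS = [428, 380, 340, 320, 286, 254, 226, 214,
                190, 160, 142, 128, 106, 84, 72, 54]
PAL_PERIODS = [398, 354, 316, 298, 276, 236, 210, 198,
               176, 148, 132, 118, 98, 78, 66, 50]

DOTS_PER_SCANLINE = 341  # PPU dots per scanline


def irq_delay_scanlines(length_bytes, rate_index, pal=False):
    """Scanlines needed to play `length_bytes` at `rate_index`, rounded up."""
    periods = PAL_PERIODS if pal else NTSC_PERIODS
    cpu_cycles = length_bytes * 8 * periods[rate_index]
    # Dots per CPU cycle as a fraction: 3/1 on NTSC, 16/5 on PAL (assumed).
    num, den = (16, 5) if pal else (3, 1)
    # Ceiling division done in integers to avoid floating-point rounding.
    return -(-(cpu_cycles * num) // (DOTS_PER_SCANLINE * den))
```

For example, <code>irq_delay_scanlines(17, 0x8)</code> corresponds to the NTSC "17-byte, rate $8" cell of the table.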


=== Number of scanlines to wait table ===
=== Number of scanlines to wait table ===
Line 233: Line 259:
Because a PAL interrupt will always happen at about the same time as, or a bit sooner than, an NTSC interrupt, the NTSC table is used to choose the "best" setting here:
Because a PAL interrupt will always happen at about the same time as, or a bit sooner than, an NTSC interrupt, the NTSC table is used to choose the "best" setting here:


Scanlines  Best opt. for IRQ  
<pre>
Scanlines  Best opt. for IRQ  
   
   
1-3        Timed code  
1-3        Timed code  
4-5        Length $0, rate $f  
4-5        Length $0, rate $f  
6-7        Length $0, rate $d  
6-7        Length $0, rate $d  
8-9        Length $0, rate $c  
8-9        Length $0, rate $c  
10-11      Length $0, rate $a  
10-11      Length $0, rate $a  
12-13      Length $0, rate $9  
12-13      Length $0, rate $9  
14-15      Length $0, rate $8  
14-15      Length $0, rate $8  
16-17      Length $0, rate $6  
16-17      Length $0, rate $6  
18-20      Length $0, rate $5  
18-20      Length $0, rate $5  
21-22      Length $0, rate $4  
21-22      Length $0, rate $4  
23        Length $0, rate $3  
23        Length $0, rate $3  
24-26      Length $0, rate $2  
24-26      Length $0, rate $2  
27-30      Length $0, rate $1  
27-30      Length $0, rate $1  
31-64      Length $0, rate $0  
31-64      Length $0, rate $0  
65-86      Length $1, rate $f  
65-86      Length $1, rate $f  
87-100    Length $1, rate $e  
87-100    Length $1, rate $e  
101-125    Length $1, rate $d  
101-125    Length $1, rate $d  
126        Length $2, rate $f  
126        Length $2, rate $f  
127-153    Length $1, rate $c  
127-153    Length $1, rate $c  
154-167    Length $1, rate $b  
154-167    Length $1, rate $b  
168-169    Length $2, rate $e  
168-169    Length $2, rate $e  
170-186    Length $1, rate $a  
170-186    Length $1, rate $a  
187-191    Length $3, rate $f  
187-191    Length $3, rate $f  
192-195    Length $1, rate $9  
192-195    Length $1, rate $9  
196-227    Length $2, rate $d  
196-227    Length $2, rate $d  
228-239    Length $1, rate $8
228-239    Length $1, rate $8
</pre>
 
== References ==
<references />

Revision as of 22:13, 14 May 2020


The NES APU's delta modulation channel (DMC) can output 1-bit delta-encoded samples or can have its 7-bit counter directly loaded, allowing flexible manual sample playback.

Overview

The DMC channel contains the following: memory reader, interrupt flag, sample buffer, timer, output unit, 7-bit output level with up and down counter.

                         Timer
                           |
                           v
Reader ---> Buffer ---> Shifter ---> Output level ---> (to the mixer)
$4010 IL--.RRRR Flags and Rate (write)
bit 7 I---.---- IRQ enabled flag. If clear, the interrupt flag is cleared.
bit 6 -L--.---- Loop flag
bits 3-0 ----.RRRR Rate index
Rate   $0   $1   $2   $3   $4   $5   $6   $7   $8   $9   $A   $B   $C   $D   $E   $F
      ------------------------------------------------------------------------------
NTSC  428, 380, 340, 320, 286, 254, 226, 214, 190, 160, 142, 128, 106,  84,  72,  54
PAL   398, 354, 316, 298, 276, 236, 210, 198, 176, 148, 132, 118,  98,  78,  66,  50

The rate determines how many CPU cycles elapse between changes in the output level during automatic delta-encoded sample playback. For example, on NTSC (1.789773 MHz), a rate of 428 gives a frequency of 1789773/428 Hz = 4181.71 Hz. These periods are all even numbers because there are 2 CPU cycles in an APU cycle. A rate of 428 means the output level changes every 214 APU cycles.
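
The arithmetic above can be sketched in a few lines. This is an illustrative example only, using the NTSC table and clock rate quoted in the text; the function names are hypothetical.

```python
# NTSC rate table from the $4010 description, in CPU cycles per output bit.
NTSC_PERIODS = [428, 380, 340, 320, 286, 254, 226, 214,
                190, 160, 142, 128, 106, 84, 72, 54]
NTSC_CPU_HZ = 1789773  # NTSC CPU clock quoted in the text


def dmc_bit_rate_hz(rate_index, periods=NTSC_PERIODS, cpu_hz=NTSC_CPU_HZ):
    """Frequency at which the output level can change for a 4-bit rate index."""
    return cpu_hz / periods[rate_index & 0x0F]


def dmc_period_apu_cycles(rate_index, periods=NTSC_PERIODS):
    """The same period expressed in APU cycles (2 CPU cycles each)."""
    return periods[rate_index & 0x0F] // 2
```

Passing a different table and clock rate (for instance the PAL row) to <code>dmc_bit_rate_hz</code> gives the corresponding figures for another system.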

 
$4011 -DDD.DDDD Direct load (write)
bits 6-0 -DDD.DDDD The DMC output level is set to D, an unsigned value. If the timer is outputting a clock at the same time, the output level is occasionally not changed properly.[1]
 
$4012 AAAA.AAAA Sample address (write)
bits 7-0 AAAA.AAAA Sample address = %11AAAAAA.AA000000 = $C000 + (A * 64)
 
$4013 LLLL.LLLL Sample length (write)
bits 7-0 LLLL.LLLL Sample length = %LLLL.LLLL0001 = (L * 16) + 1 bytes
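
As a small illustration of the $4012 and $4013 formulas, the following sketch converts register values to an address and a length, and back again for the address. The function names are hypothetical and not part of any existing tool.

```python
def dmc_sample_address(a):
    """Sample start address for a $4012 value: $C000 + (A * 64)."""
    return 0xC000 + ((a & 0xFF) << 6)


def dmc_sample_length(l):
    """Sample length in bytes for a $4013 value: (L * 16) + 1."""
    return ((l & 0xFF) << 4) + 1


def dmc_address_register(address):
    """Inverse of dmc_sample_address; the address must be 64-byte aligned."""
    if address < 0xC000 or address > 0xFFC0 or address % 64 != 0:
        raise ValueError("address must be 64-byte aligned in $C000-$FFC0")
    return (address - 0xC000) >> 6
```

Because of these formulas, samples must start on a 64-byte boundary at or above $C000, and every length is one more than a multiple of 16 bytes.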

The output level is sent to the mixer whether the channel is enabled or not. It is loaded with 0 on power-up, and can be updated by $4011 writes and delta-encoded sample playback.

Automatic 1-bit delta-encoded sample playback is carried out by a combination of three units. The memory reader fills the 8-bit sample buffer whenever it is emptied by the sample output unit. The status register is used to start and stop automatic sample playback.

The sample buffer either holds a single 8-bit sample byte or is empty. It is filled by the reader and can only be emptied by the output unit; once loaded with a sample byte it will be played back.

Pitch table

        NTSC                                  PAL
$4010   Period  Frequency    Note             Period  Frequency    Note
$0      $1AC    4181.71 Hz   C-8  -1.78c      $18E    4177.40 Hz   C-8  -3.56c
$1      $17C    4709.93 Hz   D-8  +4.16c      $162    4696.63 Hz   D-8  -.739c
$2      $154    5264.04 Hz   E-8  -3.29c      $13C    5261.41 Hz   E-8  -4.15c
$3      $140    5593.04 Hz   F-8  +1.67c      $12A    5579.22 Hz   F-8  -2.61c
$4      $11E    6257.95 Hz   G-8  -3.86c      $114    6023.94 Hz   G-8  -69.8c
$5      $0FE    7046.35 Hz   A-8  +1.56c      $0EC    7044.94 Hz   A-8  +1.22c
$6      $0E2    7919.35 Hz   B-8  +3.77c      $0D2    7917.18 Hz   B-8  +3.29c
$7      $0D6    8363.42 Hz   C-9  -1.78c      $0C6    8397.01 Hz   C-9  +5.16c
$8      $0BE    9419.86 Hz   D-9  +4.16c      $0B0    9446.63 Hz   D-9  +9.07c
$9      $0A0    11186.1 Hz   F-9  +1.67c      $094    11233.8 Hz   F-9  +9.04c
$A      $08E    12604.0 Hz   G-9  +8.29c      $084    12595.5 Hz   G-9  +7.11c
$B      $080    13982.6 Hz   A-9  -12.0c      $076    14089.9 Hz   A-9  +1.22c
$C      $06A    16884.6 Hz   C-10 +14.5c      $062    16965.4 Hz   C-10 +22.7c
$D      $054    21306.8 Hz   E-10 +17.2c      $04E    21315.5 Hz   E-10 +17.9c
$E      $048    24858.0 Hz   G-10 -15.9c      $042    25191.0 Hz   G-10 +7.11c
$F      $036    33143.9 Hz   C-11 -17.9c      $032    33252.1 Hz   C-11 -12.2c

(Deviation from note is given in cents, which are defined as 1/100 of a semitone.)

Note that on PAL systems, the pitches at $4 and $C appear to be incorrect with respect to their intended A-440 tuning scheme[1].
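
The cents figures in the pitch table can be checked with a short calculation. This is a sketch under the assumption of 12-tone equal temperament with A4 = 440 Hz; it reports the deviation from the nearest semitone, which is not necessarily the intended note listed in the table. The function name is hypothetical.

```python
import math


def nearest_semitone_and_cents(freq_hz, a4_hz=440.0):
    """Return (semitones above A4 of the nearest note, deviation in cents)."""
    semitones = 12 * math.log2(freq_hz / a4_hz)
    nearest = round(semitones)
    # One semitone is 100 cents, so the fractional part scales by 100.
    return nearest, 100 * (semitones - nearest)
```

For NTSC rate $0 (1789773/428 Hz), this gives a note 39 semitones above A4 with a deviation of roughly -1.78 cents, in line with the first row of the table.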

Memory reader

When the sample buffer is emptied, the memory reader fills the sample buffer with the next byte from the currently playing sample. It has an address counter and a bytes remaining counter.

When a sample is (re)started, the current address is set to the sample address, and bytes remaining is set to the sample length.

Any time the sample buffer is in an empty state and bytes remaining is not zero (including just after a write to $4015 that enables the channel, regardless of where that write occurs relative to the bit counter mentioned below), the following occur:

  • The CPU is stalled for up to 4 CPU cycles[2] to allow the longest possible write (the return address and write after an IRQ) to finish. If OAM DMA is in progress, it is paused for two cycles.[3] The sample fetch always occurs on an even CPU cycle due to its alignment with the APU. Specific delay cases:
    • 4 cycles if it falls on a CPU read cycle.
    • 3 cycles if it falls on a single CPU write cycle (or the second write of a double CPU write).
    • 4 cycles if it falls on the first write of a double CPU write cycle.[4]
    • 2 cycles if it occurs during an OAM DMA, or on the $4014 write cycle that triggers the OAM DMA.
    • 1 cycle if it occurs on the second-last OAM DMA cycle.
    • 3 cycles if it occurs on the last OAM DMA cycle.
  • The sample buffer is filled with the next sample byte read from the current address, subject to whatever mapping hardware is present.
  • The address is incremented; if it exceeds $FFFF, it is wrapped around to $8000.
  • The bytes remaining counter is decremented; if it becomes zero and the loop flag is set, the sample is restarted (see above); otherwise, if the bytes remaining counter becomes zero and the IRQ enabled flag is set, the interrupt flag is set.

At any time, if the interrupt flag is set, the CPU's IRQ line is continuously asserted until the interrupt flag is cleared. The processor will continue on from where it was stalled.
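
The bookkeeping described in this section can be sketched as follows. This is a simplified model, not an emulator-accurate implementation: CPU stall cycles and the interaction with OAM DMA are not modelled, and the class name, method names, and the <code>read_byte</code> callback are hypothetical.

```python
class DmcMemoryReader:
    """Simplified model of the DMC memory reader (no CPU stall timing)."""

    def __init__(self, read_byte, sample_address, sample_length,
                 loop=False, irq_enabled=False):
        self.read_byte = read_byte            # callback: address -> byte
        self.sample_address = sample_address  # derived from $4012
        self.sample_length = sample_length    # derived from $4013
        self.loop = loop
        self.irq_enabled = irq_enabled
        self.address = sample_address
        self.bytes_remaining = 0
        self.interrupt_flag = False

    def restart(self):
        """(Re)start the sample: reload the address and byte counters."""
        self.address = self.sample_address
        self.bytes_remaining = self.sample_length

    def fetch(self):
        """Fetch the next sample byte, or return None if none remain."""
        if self.bytes_remaining == 0:
            return None
        value = self.read_byte(self.address)
        self.address += 1
        if self.address > 0xFFFF:
            self.address = 0x8000  # wrap around past the top of memory
        self.bytes_remaining -= 1
        if self.bytes_remaining == 0:
            if self.loop:
                self.restart()
            elif self.irq_enabled:
                self.interrupt_flag = True
        return value
```

In this sketch the caller is expected to invoke <code>fetch()</code> whenever the sample buffer is empty and to place the returned byte in the buffer.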

Output unit

The output unit continuously outputs a 7-bit value to the mixer. It contains an 8-bit right shift register, a bits-remaining counter, a 7-bit output level (the same one that can be loaded directly via $4011), and a silence flag.

The bits-remaining counter is updated whenever the timer outputs a clock, regardless of whether a sample is currently playing. When this counter reaches zero, we say that the output cycle ends. The DPCM unit can only transition from silent to playing at the end of an output cycle.

When an output cycle ends, a new cycle is started as follows:

  • The bits-remaining counter is loaded with 8.
  • If the sample buffer is empty, then the silence flag is set; otherwise, the silence flag is cleared and the sample buffer is emptied into the shift register.

When the timer outputs a clock, the following actions occur in order:

  1. If the silence flag is clear, the output level changes based on bit 0 of the shift register. If the bit is 1, add 2; otherwise, subtract 2. But if adding or subtracting 2 would cause the output level to leave the 0-127 range, leave the output level unchanged. This means subtract 2 only if the current level is at least 2, or add 2 only if the current level is at most 125.
  2. The right shift register is clocked.
  3. As stated above, the bits-remaining counter is decremented. If it becomes zero, a new output cycle is started.

Nothing can interrupt a cycle; every cycle runs to completion before a new cycle is started.
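
As an illustration, the output cycle and the three timer-clock actions described above can be sketched in Python. The class and attribute names are made up for this example, `None` stands for an empty sample buffer, and the initial state (silent, 8 bits remaining) is an assumption rather than documented power-on behavior.

```python
class DmcOutputUnit:
    """Sketch of the DMC output unit (hypothetical names, assumed initial state)."""

    def __init__(self, level=0):
        self.shift_register = 0
        self.bits_remaining = 8        # assumed starting value
        self.output_level = level & 0x7F
        self.silence = True            # assumed starting value
        self.sample_buffer = None      # filled by the memory reader; None = empty

    def start_cycle(self):
        """Begin a new output cycle."""
        self.bits_remaining = 8
        if self.sample_buffer is None:
            self.silence = True
        else:
            self.silence = False
            self.shift_register = self.sample_buffer
            self.sample_buffer = None  # buffer is emptied into the shifter

    def clock(self):
        """One timer clock: adjust level, shift, then count down."""
        if not self.silence:
            if self.shift_register & 1:
                if self.output_level <= 125:
                    self.output_level += 2
            elif self.output_level >= 2:
                self.output_level -= 2
        self.shift_register >>= 1
        self.bits_remaining -= 1
        if self.bits_remaining == 0:
            self.start_cycle()
```

Note that a byte placed in the buffer mid-cycle has no effect until `start_cycle` runs, which is why the unit can only go from silent to playing at the end of an output cycle.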

Conflict with controller and PPU read

On the NTSC NES and Famicom, if a new sample byte is fetched from memory at the same time the program is reading the controller through $4016/$4017, a conflict occurs that corrupts the data read from the controller. Programs that use DPCM sample playback normally use a redundant controller read routine to work around this defect.

A similar problem occurs when reading data from the PPU through $2007, or polling $2002 for vblank.

Likely internal implementation of the read

The following is speculation, and thus not necessarily 100% accurate. It does accurately predict observed behavior.

The 6502 cannot be pulled off of the bus normally. The 2A03 DMC gets around this by pulling RDY low internally. This causes the CPU to pause during the next read cycle, until RDY goes high again. The DMC unit holds RDY low for 4 cycles. The first three cycles it idles, as the CPU could have just started an interrupt cycle, and thus be writing for 3 consecutive cycles (and thus ignoring RDY). On the fourth cycle, the DMC unit drives the next sample address onto the address lines, and reads that byte from memory. It then drives RDY high again, and the CPU picks up where it left off.

This matters because on NTSC NES and Famicom, it can interfere with the expected operation of any register where reads have a side effect: the controller registers ($4016 and $4017), reads of the PPU status register ($2002), and reads of VRAM/VROM data ($2007) if they happen to occur in the same cycle that the DMC unit pulls RDY low.

For the controller registers, this can cause an extra rising clock edge to occur, and thus shift an extra bit out. For the others, the PPU will see multiple reads, which will cause extra increments of the address latches, or clear the vblank flag.
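
The effect on a controller read can be illustrated with a small Python model. This is a sketch under stated assumptions: the controller is treated as a plain 8-bit shift register that returns 1 once all eight bits have been shifted out, the extra clock is modelled as one discarded read, and `read_until_stable` shows one possible form of a redundant read routine (repeat until two consecutive reads agree). All names here are invented for the example; real routines are written in 6502 assembly.

```python
class ControllerShiftRegister:
    """Standard controller modelled as an 8-bit shift register (assumption)."""

    def __init__(self, buttons):
        self.bits = list(buttons)   # eight button bits in report order
        self.pos = 0

    def clock_and_read(self):
        # After eight bits, this model returns 1 on every further read.
        bit = self.bits[self.pos] if self.pos < 8 else 1
        self.pos += 1
        return bit


def read_controller(buttons, glitch_at=None):
    """Read eight bits; glitch_at is the index of a read hit by a DMC fetch."""
    pad = ControllerShiftRegister(buttons)
    result = []
    for i in range(8):
        if i == glitch_at:
            pad.clock_and_read()    # extra clock edge: this bit is lost
        result.append(pad.clock_and_read())
    return result


def read_until_stable(read_once):
    """Repeat the read until two consecutive results match."""
    previous = read_once()
    while True:
        current = read_once()
        if current == previous:
            return current
        previous = current
```

In this model a single glitched read shifts every later bit one position earlier, so it differs from a clean read whenever any affected button is held, and the comparison catches it.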

This problem was fixed in the 2A07, so the PAL NES is exempt from this bug.

Usage of DMC for syncing to video

DMC IRQs can be used for timed video operations. The following method was discussed on the forum in 2010.[5]

Concept

The NES hardware has only limited tools for syncing code with video rendering. The VBlank NMI and sprite zero hit are the only two reasonably reliable flags that can be used, so only two synchronizations per frame are easily achievable. In addition, only the VBlank NMI can trigger an interrupt; the sprite zero flag has to be polled, potentially wasting a lot of CPU cycles.

However, the DMC channel can hypothetically be used for syncing with video instead of using it for sound. Unfortunately it's a bit complicated, but used correctly, it can function as a crude scanline counter, eliminating the need for an advanced mapper.

The DMC's timing is completely separate from the video. The DMC's timer is always running, and samples can only start every 8 clock cycles. However, because the DMC's timer isn't synchronized to the PPU in any way, these 8-clock boundaries occur on different scanlines each frame.

Here are the steps to achieve stable timing:

  • At a fixed point in video rendering (we'll use the start of vblank as an example), a dummy single-byte sample at rate $F is started. Due to a hardware quirk†, the sample needs to be started three times in a row like this:
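sei         ; mask IRQs while the sample is started
lda #$10    ; bit 4 of $4015 enables the DMC
sta $4015   ; start the sample...
sta $4015 
sta $4015   ; ...three times in a row (see the note below)
cli         ; allow IRQs again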
  • The number of cycles before a DMC IRQ happens is then measured (either using an actual IRQ, or by polling $4015).
    • At rate $F, there are 54 CPU cycles between clocks, so there are 432 CPU cycles (432 × 3 ÷ 341 = about 3.8 scanlines) between boundaries.
  • The main sample that will be used for the timing is then started (see the timing table below for the sample lengths that give various waiting times).
  • When the main IRQ happens, the measurement from before is retrieved, and a timing loop with variable delay is used. In order to synchronize with vblank, after a DMC IRQ we should wait 432 CPU cycles minus the time we measured.
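
The arithmetic behind these steps can be sketched in Python. The function and constant names are hypothetical, the figures are the NTSC ones quoted above (54 CPU cycles per clock at rate $F, 341 PPU dots per scanline at 3 dots per CPU cycle), and the measured value is assumed to be expressed in CPU cycles.

```python
BITS_PER_OUTPUT_CYCLE = 8
RATE_F_PERIOD_NTSC = 54                                        # CPU cycles per timer clock
BOUNDARY_CYCLES = BITS_PER_OUTPUT_CYCLE * RATE_F_PERIOD_NTSC   # 432


def compensation_delay(measured_cycles, boundary=BOUNDARY_CYCLES):
    """CPU cycles to wait after the main IRQ to line up with the reference point."""
    if not 0 <= measured_cycles <= boundary:
        raise ValueError("measurement must lie within one 8-clock period")
    return boundary - measured_cycles


def cycles_to_scanlines(cpu_cycles):
    """NTSC: 3 PPU dots per CPU cycle, 341 dots per scanline."""
    return cpu_cycles * 3 / 341
```

For example, if the dummy sample's IRQ arrived 100 cycles after the reference point, `compensation_delay(100)` gives 332 cycles to burn after each main IRQ that frame.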

Note: The hardware quirk mentioned above concerns how DMC IRQs are generated. The IRQ is generated when the last byte of the sample is read, not when the last bit of the sample finishes playing. The sample buffer sometimes has enough time to empty itself between writes to $4015, in which case the next write to $4015 triggers an immediate IRQ. Fortunately, writing to $4015 three times avoids this issue.

Still using vblank as an example, the measurement tells how far into the 8-clock boundary vblank occurred, and by delaying after a DMC IRQ, we perform a raster effect at the same point within the 8-clock boundary, aligning it with vblank. By performing this same method each frame, the raster effect will have reasonably stable timing. As a bonus, since IRQs do most of the waiting, the CPU is free to do something else instead of sitting in a timed loop.

It's possible to use more than one IRQ per frame - but the measurement part needs to be done at the same time within each frame, before the usage of any IRQ.

Only a single split-point per IRQ is possible, with the shortest IRQ being 3.8 scanlines. For split points closer than this amount, timed code has to be used.

In order to remain silent, samples should be made up of all $00 bytes, and $00 should have been previously written to $4011. Otherwise, audio will unintentionally be created. This is a sound channel, after all.

Timing table

This table converts sample length to duration in scanlines (all values are rounded up to the next integer).

NTSC               Rate 
Length              $0    $1   $2   $3   $4   $5   $6   $7   $8   $9   $a   $b   $c   $d   $e   $f 
---------------------------------------------------------------------------------------------------- 
1-byte (8 bits)     31    27   24   23   21   18   16   16   14   12   10   10   8    6    6    4 
17-byte (136 bits)  **    **   **   **   **   **   **   **   228  192  170  154  127  101  87   65 
33-byte (264 bits)  **    **   **   **   **   **   **   **   **   **   **   **   **   196  168  126 
49-byte (392 bits)  **    **   **   **   **   **   **   **   **   **   **   **   **   **   **   187 

PAL                Rate 
Length              $0    $1   $2   $3   $4   $5   $6   $7   $8   $9   $a   $b   $c   $d   $e   $f 
---------------------------------------------------------------------------------------------------- 
1-byte (8 bits)     30    27   24   23   21   18   16   15   14   12   10   9    8    6    5    4 
17-byte (136 bits)  **    **   **   **   **   **   **   **   225  189  169  151  126  100  85   64 
33-byte (264 bits)  **    **   **   **   **   **   **   **   **   **   **   **   **   194  164  124 
49-byte (392 bits)  **    **   **   **   **   **   **   **   **   **   **   **   **   **   **   184
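
The numeric entries above can be reproduced with a few lines of Python. This sketch assumes the NTSC and PAL period values from the rate table earlier in this article, 3 PPU dots per CPU cycle on NTSC and 3.2 on PAL (written as 16/5 to stay in integers), and 341 dots per scanline; the function name is hypothetical.

```python
# CPU cycles per timer clock for rate indices $0-$F (assumed from the rate table).
NTSC_PERIODS = [428, 380, 340, 320, 286, 254, 226, 214,
                190, 160, 142, 128, 106, 84, 72, 54]
PAL_PERIODS = [398, 354, 316, 298, 276, 236, 210, 198,
               176, 148, 132, 118, 98, 78, 66, 50]


def sample_scanlines(length_bytes, period, pal=False):
    """Scanlines taken by length_bytes * 8 timer clocks, rounded up."""
    cycles = length_bytes * 8 * period
    # Scanlines = cycles * dots_per_cycle / 341, with dots_per_cycle = 3 or 16/5.
    num, den = (16, 341 * 5) if pal else (3, 341)
    return -(-cycles * num // den)   # integer ceiling division


ntsc_1_byte = [sample_scanlines(1, p) for p in NTSC_PERIODS]
pal_1_byte = [sample_scanlines(1, p, pal=True) for p in PAL_PERIODS]
```

Integer arithmetic is used so that the rounding does not depend on floating-point error in borderline cases such as NTSC rate $A (about 9.99 scanlines).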

Number of scanlines to wait table

This table gives the best sample length and rate combination for each possible number of scanlines to wait. Length is the value written to $4013 ($0 = 1 byte, $1 = 17 bytes, and so on). These combinations are best because they leave the CPU the least time to kill. It is still possible to pick an option that waits for fewer lines and kill more time in the interrupt handler before the video effect.

Because a PAL interrupt always happens at about the same time as, or slightly sooner than, an NTSC interrupt, the NTSC table is used to determine the "best" setting here:

Scanlines  Best opt. for IRQ 
 
1-3        Timed code 
4-5        Length $0, rate $f 
6-7        Length $0, rate $d 
8-9        Length $0, rate $c 
10-11      Length $0, rate $a 
12-13      Length $0, rate $9 
14-15      Length $0, rate $8 
16-17      Length $0, rate $6 
18-20      Length $0, rate $5 
21-22      Length $0, rate $4 
23         Length $0, rate $3 
24-26      Length $0, rate $2 
27-30      Length $0, rate $1 
31-64      Length $0, rate $0 
65-86      Length $1, rate $f 
87-100     Length $1, rate $e 
101-125    Length $1, rate $d 
126        Length $2, rate $f 
127-153    Length $1, rate $c 
154-167    Length $1, rate $b 
168-169    Length $2, rate $e 
170-186    Length $1, rate $a 
187-191    Length $3, rate $f 
192-195    Length $1, rate $9 
196-227    Length $2, rate $d 
228-239    Length $1, rate $8
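
The selection rule behind this table can be sketched as a brute-force search in Python. This is an illustration only: it assumes the NTSC periods from the rate table earlier in this article, the $4013 relation of length value L giving L × 16 + 1 bytes, and it defines "best" as the longest sample whose rounded-up scanline count does not exceed the target. The function name is hypothetical.

```python
# NTSC CPU cycles per timer clock for rate indices $0-$F (assumed from the rate table).
NTSC_PERIODS = [428, 380, 340, 320, 286, 254, 226, 214,
                190, 160, 142, 128, 106, 84, 72, 54]


def best_option(target_scanlines):
    """Return (length_value, rate_index), or None if timed code is needed."""
    best = None
    for length_value in range(256):
        bits = (length_value * 16 + 1) * 8      # $4013 value -> bytes -> bits
        for rate, period in enumerate(NTSC_PERIODS):
            cycles = bits * period
            lines = -(-cycles * 3 // 341)        # rounded-up NTSC scanlines
            if lines <= target_scanlines and (best is None or cycles > best[0]):
                best = (cycles, length_value, rate)
    return None if best is None else best[1:]
```

For instance, `best_option(126)` picks length $2 at rate $F, while `best_option(127)` switches to length $1 at rate $C, as in the table.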

References

  1. Forum post: PAL DPCM frequency table contains 2 errors
  2. Forum post: Blargg's DMA tests
  3. Forum post: cpow's Visual 2A03 DMC vs OAM DMA analysis
  4. Forum post: Fiskbit's aligned controller read test
  5. Forum thread: DMC IRQ as a video timer