The skinny on NES scrolling
Preface
"The skinny on NES scrolling" was posted by loopy on 1999-04-13 to what eventually became the NESdev Yahoo! Group. It was the first document to publicly explain exactly how the PPU uses addresses written to its ports. After over a decade, it is still believed accurate. Some people get turned off by the fact that it's provided as monospaced text inside a zipfile, that addresses have nothing to distinguish them from years, and that the diagrams of what bits get copied where are allegedly difficult to read.
The document that follows attempts to explain the mechanisms of NES scrolling in detail, and is largely based on loopy's original notes.
PPU registers
If you aren't trying to split the screen, scrolling the background is as easy as writing the X and Y coordinates to $2005 and writing the high bit of both coordinates to $2000. Programming or emulating a game that uses complex raster effects, on the other hand, requires a complete understanding of how the various address registers inside the PPU work. Here are the related registers:
- v: Current VRAM address (15 bits)
- t: Temporary VRAM address (15 bits)
- x: Fine X scroll (3 bits)
- w: First or second write toggle (1 bit)
Registers v and t are 15 bits, but because emulators commonly store them in 16-bit machine words, they are shown with an extra bit that's never used.
The PPU uses the current VRAM address for both reading and writing PPU memory through $2007, and for fetching nametable data to draw the background. As it's drawing the background, it updates the address to point to the nametable data currently being drawn. Bits 10-11 hold the base address of the nametable minus $2000. Bits 12-14 are the Y offset of a scanline within a tile.
The 15 bit registers t and v are composed this way during rendering:
    yyy NN YYYYY XXXXX
    ||| || ||||| +++++-- coarse X scroll
    ||| || +++++-------- coarse Y scroll
    ||| ++-------------- nametable select
    +++----------------- fine Y scroll
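The layout above can be expressed as a handful of shift-and-mask helpers. This is only a sketch: the function names (`coarse_x`, `coarse_y`, `nametable`, `fine_y`, `pack_v`) are made up for illustration and do not come from loopy's notes.

```c
#include <stdint.h>

/* Field layout of the 15-bit v/t registers: yyy NN YYYYY XXXXX.
   All names here are hypothetical, chosen for this sketch. */
static inline unsigned coarse_x(uint16_t v)  { return v & 0x001Fu; }         /* bits 0-4   */
static inline unsigned coarse_y(uint16_t v)  { return (v >> 5) & 0x1Fu; }    /* bits 5-9   */
static inline unsigned nametable(uint16_t v) { return (v >> 10) & 0x03u; }   /* bits 10-11 */
static inline unsigned fine_y(uint16_t v)    { return (v >> 12) & 0x07u; }   /* bits 12-14 */

/* Pack the four fields back into a 15-bit register value. */
static inline uint16_t pack_v(unsigned fy, unsigned nt, unsigned cy, unsigned cx)
{
    return (uint16_t)(((fy & 0x07u) << 12) | ((nt & 0x03u) << 10) |
                      ((cy & 0x1Fu) << 5)  |  (cx & 0x1Fu));
}
```

For instance, fine Y 6, nametable 1, coarse Y 7 and coarse X 15 pack to $64EF.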
Stuff that affects register contents
In the following, d refers to the data written to the port, and A through H to individual bits of a value.
$2005 and $2006 share a common write toggle, so that the first write has one behaviour, and the second write has another. After the second write, the toggle is reset to the first write behaviour. This toggle may be manually reset by reading $2002.
- $2000 write:
      t: ...BA.. ........ = d: ......BA
- $2002 read:
      w:                  = 0
- $2005 first write (w is 0):
      t: ....... ...HGFED = d: HGFED...
      x:              CBA = d: .....CBA
      w:                  = 1
- $2005 second write (w is 1):
      t: .....HG FED..... = d: HGFED...
      t: CBA.... ........ = d: .....CBA
      w:                  = 0
- $2006 first write (w is 0):
      t: .FEDCBA ........ = d: ..FEDCBA
      t: G...... ........ = 0
      w:                  = 1
- $2006 second write (w is 1):
      t: ....... HGFEDCBA = d: HGFEDCBA
      v                   = t
      w:                  = 0
- At dot 250 of each scanline:
  - If rendering is enabled, the PPU increments the vertical position in v. The effective Y scroll coordinate is incremented, which is a complex operation that will correctly skip the attribute table memory regions, and wrap to the next nametable appropriately. See Wrapping around below.
- At dot 256 of each scanline:
  - If rendering is enabled, the PPU copies all bits related to horizontal position from t to v:
        v: ....H.. ...EDCBA = t: ....H.. ...EDCBA
- At dot 304 of the pre-render scanline (end of vblank):
  - If rendering is enabled, the PPU copies all bits from t to v:
        v = t
- $2007 writes:
  - Outside of rendering, writes to $2007 will add either 1 or 32 to v depending on the VRAM increment bit set via $2000. During rendering, it increments the Y scroll position in v with wrapping, regardless of the increment setting. This is not normally useful for scrolling.
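An emulator might model the port behaviour listed above roughly as follows. This is a sketch under assumptions, not a reference implementation: the `PpuScroll` struct and every function name are hypothetical, and only the register effects described in the list are modelled (no timing, no $2007).

```c
#include <stdint.h>

/* Hypothetical container for the internal registers described above. */
typedef struct {
    uint16_t v;  /* current VRAM address (15 bits)   */
    uint16_t t;  /* temporary VRAM address (15 bits) */
    uint8_t  x;  /* fine X scroll (3 bits)           */
    uint8_t  w;  /* write toggle: 0 = first write    */
} PpuScroll;

/* $2000 write: d bits 0-1 go to nametable select (t bits 10-11). */
void write_2000(PpuScroll *p, uint8_t d)
{
    p->t = (uint16_t)((p->t & 0x73FF) | ((d & 0x03) << 10));
}

/* $2002 read: resets the shared write toggle. */
void read_2002(PpuScroll *p) { p->w = 0; }

/* $2005 write: X scroll on the first write, Y scroll on the second. */
void write_2005(PpuScroll *p, uint8_t d)
{
    if (p->w == 0) {
        p->t = (uint16_t)((p->t & 0x7FE0) | (d >> 3));   /* coarse X */
        p->x = d & 0x07;                                 /* fine X   */
        p->w = 1;
    } else {
        p->t = (uint16_t)((p->t & 0x0C1F)
                          | ((d & 0x07) << 12)           /* fine Y   */
                          | ((d & 0xF8) << 2));          /* coarse Y */
        p->w = 0;
    }
}

/* $2006 write: high 6 bits first (bit 14 cleared), then low 8 bits and v = t. */
void write_2006(PpuScroll *p, uint8_t d)
{
    if (p->w == 0) {
        p->t = (uint16_t)((p->t & 0x00FF) | ((d & 0x3F) << 8));
        p->w = 1;
    } else {
        p->t = (uint16_t)((p->t & 0x7F00) | d);
        p->v = p->t;
        p->w = 0;
    }
}

/* End-of-scanline copy of the horizontal bits (coarse X, bit 10) from t to v. */
void copy_horizontal(PpuScroll *p)
{
    p->v = (uint16_t)((p->v & 0x7BE0) | (p->t & 0x041F));
}

/* Pre-render scanline copy of all bits from t to v. */
void copy_all(PpuScroll *p) { p->v = p->t; }
```

Starting from all-zero registers, writing $04 to $2006, $3E and $7D to $2005, and then $EF to $2006 leaves both t and v at $64EF with x equal to 5.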
All of this info agrees with the tests Loopy has run on an NES console and Quietust's analysis of a micrograph of the PPU die.
If there's something you don't agree with, please let the BBS know so that a member can verify it.
Explanation
- The implementation of scrolling has two components. There are two fine offsets, specifying what part of an 8x8 tile each pixel falls on, and two coarse offsets, specifying which tile. Because each tile corresponds to a single byte addressable by the PPU, during rendering the coarse offsets reuse the same VRAM address register (v) that is normally used to send and receive data from the PPU. Because of this reuse, the two registers $2005 and $2006 both offer control over v, but $2005 is mapped in a more obscure way, designed specifically to be convenient for scrolling.
- $2006 simply sets the VRAM address register. This is why the second write will immediately set v; it is expected you will immediately use this address to send data to the PPU via $2007. The PPU memory space is only 14 bits wide, but v has an extra bit that is used for scrolling only. The first write to $2006 will clear this extra bit (uncertain why, would it interfere with the address internally?).
- $2005 is designed to set the scroll position before the start of the frame. This is why it does not immediately set v, so that it can be set at precisely the right time to start rendering the screen.
- The high 5 bits of the X and Y scroll settings sent to $2005, when combined with the 2 nametable select bits sent to $2000, make a 12 bit address for the next tile to be fetched within the nametable address space $2000-2FFF. If set before the end of vblank, this 12 bit address gets loaded directly into v precisely when it is needed to fetch the tile for the top left pixel to render.
- The low 3 bits of X and Y sent to $2005 are the fine pixel offset within the 8x8 tile. The X component goes into the separate x register, which just counts pixels until the next tile should be fetched. The Y component goes into the high 3 bits of the v register, where during rendering they are not used as part of the PPU memory address (which is being overridden to use the nametable space $2000-2FFF). Instead they count the lines until the coarse Y memory address needs to be incremented (and wrapped appropriately when nametable boundaries are crossed).
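As a rough illustration of the points above, the combined effect of one $2000 write and two $2005 writes on t and x can be computed directly from the pixel scroll values. The `ScrollRegs` type and `scroll_to_regs` function are hypothetical names used only for this sketch.

```c
#include <stdint.h>

/* Hypothetical result type: the contents of t and x after the writes. */
typedef struct {
    uint16_t t;       /* yyy NN YYYYY XXXXX */
    uint8_t  fine_x;  /* 3-bit x register   */
} ScrollRegs;

/* scroll_x and scroll_y are the bytes written to $2005;
   nt is the value of the low two bits written to $2000. */
ScrollRegs scroll_to_regs(uint8_t scroll_x, uint8_t scroll_y, uint8_t nt)
{
    ScrollRegs r;
    r.t = (uint16_t)(((scroll_y & 0x07) << 12)   /* fine Y: low 3 bits of Y    */
                   | ((nt & 0x03) << 10)         /* nametable select           */
                   | ((scroll_y >> 3) << 5)      /* coarse Y: high 5 bits of Y */
                   |  (scroll_x >> 3));          /* coarse X: high 5 bits of X */
    r.fine_x = scroll_x & 0x07;                  /* fine X: low 3 bits of X    */
    return r;
}

/* The 12-bit nametable offset held in t, placed in the $2000-$2FFF space. */
uint16_t first_tile_address(ScrollRegs r)
{
    return (uint16_t)(0x2000 | (r.t & 0x0FFF));
}
```

With X = $7D, Y = $3E and nametable 1, t comes out as $64EF, x as 5, and the first tile fetched is at $24EF.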
Wrapping around
The following pseudocode examples explain how wrapping is performed when incrementing components of v. This code is written for clarity, and is not optimized for speed.
Coarse X increment
The coarse X component of v needs to be incremented when the next tile is reached. Bits 0-4 are incremented, with overflow toggling bit 10. This means that bits 0-4 count from 0 to 31 across a single nametable, and bit 10 selects the current nametable horizontally.
    if ((v & 0x001F) == 31) {  // if coarse X == 31
      v &= ~0x001F;            // coarse X = 0
      v ^= 0x0400;             // switch horizontal nametable
    } else {
      v += 1;                  // increment coarse X
    }
Y increment
At pixel 250 of each scanline, fine Y is incremented, overflowing to coarse Y, and finally adjusted to wrap among the nametables vertically.
Bits 12-14 are fine Y. Bits 5-9 are coarse Y. Bit 11 selects the vertical nametable.
    if ((v & 0x7000) != 0x7000) {      // if fine Y < 7
      v += 0x1000;                     // increment fine Y
    } else {
      v &= ~0x7000;                    // fine Y = 0
      int y = (v & 0x03E0) >> 5;       // let y = coarse Y
      if (y == 29) {
        y = 0;                         // coarse Y = 0
        v ^= 0x0800;                   // switch vertical nametable
      } else if (y == 31) {
        y = 0;                         // coarse Y = 0, nametable not switched
      } else {
        y += 1;                        // increment coarse Y
      }
      v = (v & ~0x03E0) | (y << 5);    // put coarse Y back into v
    }
Row 29 is the last row of tiles in a nametable, so when coarse Y is incremented from 29, the vertical nametable is switched by toggling bit 11, and coarse Y wraps to row 0.
Coarse Y can be set out of bounds (>29), which will cause the PPU to read the attribute data stored there as tile data. If coarse Y is incremented from 31, it will wrap to 0, but the nametable will not switch. For this reason, a write >=240 to $2005 may appear as a "negative" scroll value, where 1 or 2 rows of attribute data will appear before the nametable's tile data is reached.
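The "negative" scroll effect can be made concrete with a little arithmetic. The helpers below are hypothetical, and they only restate the paragraph above: a Y value of 240 or more starts rendering in row 30 or 31, so it behaves like a scroll of Y - 256 relative to the top of the nametable.

```c
#include <stdint.h>

/* Interpret a Y value written to $2005 as a signed offset from the top of
   the nametable. Values 240-255 start in the out-of-bounds rows 30-31 and
   therefore act like -16..-1. (Hypothetical helper for illustration.) */
int effective_scroll_y(uint8_t y)
{
    return (y >= 240) ? (int)y - 256 : (int)y;
}

/* Number of out-of-bounds rows (attribute data drawn as tiles) that are at
   least partly visible before row 0 of the nametable is reached. */
int attribute_rows_shown(uint8_t y)
{
    unsigned cy = y >> 3;                 /* coarse Y as loaded into t */
    return (cy >= 30) ? (int)(32 - cy) : 0;
}
```

A write of 240 gives an effective scroll of -16 with two attribute rows shown, while 248 gives -8 with one.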
Tile and attribute fetching
The high bits of v are used for fine Y during rendering, and addressing nametable data only requires 12 bits, with the high 2 CHR address lines fixed to the 0x2000 region. The address to be fetched during rendering can be deduced from v in the following way:
    tile address      = 0x2000 | (v & 0x0FFF)
    attribute address = 0x23C0 | (v & 0x0C00) | ((v >> 4) & 0x38) | ((v >> 2) & 0x07)
The low 12 bits of the attribute address are composed in the following way:
    NN 1111 YYY XXX
    || |||| ||| +++-- high 3 bits of coarse X (x/4)
    || |||| +++------ high 3 bits of coarse Y (y/4)
    || ++++---------- attribute offset (960 bytes)
    ++--------------- nametable select
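The two address formulas translate directly into C. The function names below are placeholders for this sketch; the expressions are the ones given above.

```c
#include <stdint.h>

/* Nametable byte address for the tile that v currently points at. */
uint16_t tile_address(uint16_t v)
{
    return (uint16_t)(0x2000 | (v & 0x0FFF));
}

/* Attribute byte address covering that tile. */
uint16_t attribute_address(uint16_t v)
{
    return (uint16_t)(0x23C0
                    | (v & 0x0C00)          /* nametable select          */
                    | ((v >> 4) & 0x38)     /* high 3 bits of coarse Y   */
                    | ((v >> 2) & 0x07));   /* high 3 bits of coarse X   */
}
```

For v = $64EF (nametable 1, coarse Y 7, coarse X 15) the tile address is $24EF and the attribute address is $27CB, which is attribute byte 1*8 + 3 = 11 of that nametable.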
Examples
Below is an example of 6502 code that completely sets the scroll register before the next scanline, indicating what happens to all relevant variables described above, both before and after the 6502 instructions are executed.
Individual bits written to a PPU register are colour-coded to reflect where they end up in t.
Assume all 6502 code is run sequentially in the order shown, one instruction after the next.
Single scroll
If only one scroll setting is needed for the entire screen, this can be done by writing $2000 once, and $2005 twice before the end of vblank.
- The low two bits of $2000 select which of the four nametables to use.
- The first write to $2005 specifies the X scroll, in pixels.
- The second write to $2005 specifies the Y scroll, in pixels.
After this, do not make any writes to $2006 before the end of vblank, as they will overwrite the t register. The v register will be completely copied from t at the end of vblank, setting the scroll.
Note that this series of two writes to $2005 presumes that the write toggle is in its first-write state (w = 0). If the state of the toggle is unknown, reset it by reading from $2002 before the first write to $2005.
Instead of writing $2000, the first write to $2006 can be used to select the nametable, if this happens to be more convenient (usually it is not because it will toggle w).
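A game engine that tracks its camera as a position across a 2x2 nametable area could derive the three register values like this. The coordinate convention (X in 0-511, Y in 0-479) and all names are assumptions made for this sketch, not something the original notes prescribe.

```c
#include <stdint.h>

/* Hypothetical bundle of the values to write during vblank. */
typedef struct {
    uint8_t ctrl;      /* value for $2000                */
    uint8_t scroll_x;  /* first write to $2005           */
    uint8_t scroll_y;  /* second write to $2005          */
} SingleScroll;

/* cam_x: 0..511, cam_y: 0..479 (assumed convention).
   ctrl_base: the other $2000 bits the program wants to keep. */
SingleScroll single_scroll(unsigned cam_x, unsigned cam_y, uint8_t ctrl_base)
{
    SingleScroll s;
    unsigned nt_x = (cam_x >> 8) & 1;          /* high bit of X            */
    unsigned nt_y = (cam_y >= 240) ? 1 : 0;    /* "high bit" of Y          */
    s.ctrl     = (uint8_t)((ctrl_base & 0xFC) | (nt_y << 1) | nt_x);
    s.scroll_x = (uint8_t)(cam_x & 0xFF);
    s.scroll_y = (uint8_t)(cam_y % 240);       /* keep coarse Y within 0-29 */
    return s;
}
```

Keeping the Y value below 240 avoids the out-of-bounds coarse Y rows described under Wrapping around.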
Split X scroll
The X scroll can be changed at the end of any scanline when the horizontal components of v get reloaded from t: Simply make one write to $2005 before the end of the line.
- The first write to $2005 alters the horizontal scroll position. The fine x register (sub-tile offset) gets updated immediately, but the coarse horizontal component (tile offset) is only written to t; it does not reach v until the horizontal bits are copied at the end of the line.
- An optional second write to $2005 is inconsequential; the changes it makes to t will be ignored at the end of the line. However, it will reset the write toggle w for any subsequent splits.
Like the single scroll example, reset the toggle by reading $2002 if it is in an unknown state. Since a write to $2005 and a read from $2002 are equally expensive in both bytes and time, whether you use one or the other to prepare for subsequent screen splits is up to you.
The first write to $2005 should usually be made as close to the end of the line as possible, but before the start of hblank when the coarse x scroll is copied from t to v. Because about 4 pixels of timing jitter are normally unavoidable, $2005 should be written a little bit early (once hblank begins, it is too late). The resulting glitch at the end of the line can be concealed by a line of single-colour pixels, or a sprite. To eliminate the glitch altogether, the following more advanced X/Y scroll technique could be used to update v during hblank instead.
Split X/Y scroll
To split both the X and Y scroll on a scanline, we must perform four writes to $2006 and $2005 alternately in order to completely reload v. Without the second write to $2006, only the horizontal portion of v will be loaded from t at the end of the scanline. By writing twice to $2006, the second write causes an immediate full reload of v from t, allowing you to update the vertical scroll in the middle of the screen.
This is based on Drag's example on the nesdev forum where writes to PPU registers are done in the order of $2006, $2005, $2005, $2006. This order of writes is important, understanding that the write toggle for $2005 is shared with $2006. As always, if the state of the toggle is unknown before beginning, read $2002 to reset it.
In this example we will perform two writes to each of $2005 and $2006. We will set the X scroll (X), Y scroll (Y), and nametable select (N) by writes to $2005 and $2006. This diagram shows where each value fits into the four register writes.
    N: %01
    X: %01111101 = $7D
    Y: %00111110 = $3E

    $2005.1 = X = %01111101 = $7D
    $2005.2 = Y = %00111110 = $3E
    $2006.1 = ((Y & %11000000) >> 6) | ((Y & %00000011) << 4) | (N << 2) = %00100100 = $24
    $2006.2 = ((X & %11111000) >> 3) | ((Y & %00111000) << 2) = %11101111 = $EF
However, since there is a great deal of overlap between the data sent to $2005 and $2006, only the last write to any particular bit of t matters. This makes the first write to $2006 mostly redundant, and we can simplify its setup significantly:
$2006.1 = N << 2 = %00000100 = $04
There are other redundancies in the writes to $2005, but since it is likely the original X and Y values are already available, these can be left as an exercise for the reader.
| Before t | Before v | Before x | Instructions | After t | After v | After x | Notes |
|---|---|---|---|---|---|---|---|
| ....... ........ | ....... ........ | ... | LDA #$04 (%00000100) STA $2006 | 0000100 ........ | ....... ........ | ... | Bit 14 of t set to zero |
| 0000100 ........ | ....... ........ | ... | LDA #$3E (%00111110) STA $2005 | 1100100 111..... | ....... ........ | ... | Behaviour of 2nd $2005 write |
| 1100100 111..... | ....... ........ | ... | LDA #$7D (%01111101) STA $2005 | 1100100 11101111 | ....... ........ | 101 | Behaviour of 1st $2005 write |
| 1100100 11101111 | ....... ........ | 101 | LDA #$EF (%11101111) STA $2006 | 1100100 11101111 | 1100100 11101111 | 101 | After t is updated, contents of t copied into v |
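The four values for this write sequence can be precomputed from X, Y and N, for example in a build tool or an emulator test. This sketch uses the simplified first $2006 write (N << 2); the struct and function names are hypothetical.

```c
#include <stdint.h>

/* The four bytes, in the order they are written:
   $2006, $2005 (Y), $2005 (X), $2006. Names are made up for this sketch. */
typedef struct {
    uint8_t write1_2006;  /* nametable select only           */
    uint8_t write2_2005;  /* Y (second-write behaviour)      */
    uint8_t write3_2005;  /* X (first-write behaviour)       */
    uint8_t write4_2006;  /* low byte of t, then v = t       */
} SplitWrites;

SplitWrites split_writes(uint8_t x, uint8_t y, uint8_t n)
{
    SplitWrites s;
    s.write1_2006 = (uint8_t)((n & 0x03) << 2);
    s.write2_2005 = y;
    s.write3_2005 = x;
    s.write4_2006 = (uint8_t)(((x & 0xF8) >> 3) | ((y & 0x38) << 2));
    return s;
}
```

For X = $7D, Y = $3E and N = %01 this yields $04, $3E, $7D and $EF, matching the instruction sequence shown above.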
Timing for this series of writes is important. Because the Y scroll in v will be incremented at pixel 250, you must either set it to the intended Y-1 before pixel 250, or set it to Y after pixel 250. Many games that use split scrolling have a visible glitch at the end of the line by timing it early like this.
Alternatively you can set the intended Y after pixel 250. The last two writes ($2005.1 / $2006.2) can be timed to fall within hblank to avoid any visible glitch. Hblank begins after pixel 256, and ends at pixel 320 when the first tile of the next line is fetched.
Because this method sets v immediately, it can be used to set the scroll in the middle of the line. This is not normally recommended, as the difficulty of exact timing and the interaction of tile fetches make it difficult to do cleanly.