PPU rendering

From NESdev Wiki
Revision as of 22:35, 6 April 2011 by Drag (talk | contribs) (I'm almost done with this, so I'll move it out of my namespace soon.)

Work in progress, do not alter.

TODO: Rename this as PPU Rendering instead of PPU Background Evaluation?

The PPU contains the following:

  • 2 16-bit shift registers - These contain the bitmap data for two tiles. Every 8 cycles, the bitmap data for the next tile is loaded into the upper 8 bits of this shift register. Meanwhile, the pixel to render is fetched from one of the lower 8 bits.
  • 2 8-bit shift registers - These contain the palette attributes for the lower 8 pixels of the 16-bit shift register. These registers are fed by a latch which contains the palette attribute for the next tile. Every 8 cycles, the latch is loaded with the palette attribute for the next tile.
                   [BBBBBBBB] - Bitmap of next tile, 2 bits per pixel
                    ||||||||
                    vvvvvvvv
                   [BBBBBBBBAAAAAAAA] - 16-bit shift registers
                   [BBBBBBBBAAAAAAAA] -
                            vvvvvvvv
                            ||||||||
[Select a bit]------------>[++++++++]--------------------------------> [Pixel]
                            ||||||||
                            ^^^^^^^^
                  [Latch]->[PPPPPPPP] - 8-bit shift registers
                  [Latch]->[PPPPPPPP] -
                     ^
                     |
                  [2-bit Palette Attribute for next tile (from attribute table)]

Every cycle, one bit is fetched from each of these 4 shift registers in order to create a pixel on screen. Exactly which bit is fetched depends on the fine X scroll, set by $2005 (this is how fine X scrolling is possible). Afterwards, each shift register is shifted once, advancing to the data for the next pixel.

Every 8 cycles/shifts, new data is loaded into these registers.
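The behaviour described above can be modelled roughly as follows. This is a minimal sketch, not a description of the actual circuit: the class `BgShifter` and its method names are hypothetical, the registers shift toward bit 0 (following the diagram, where the next tile sits in the upper 8 bits), bit 0 of a loaded byte is treated as the leftmost pixel, and packing the result as a 4-bit value (palette attribute in the upper two bits) is a convenience for the example.

```python
class BgShifter:
    """Toy model of the background shift registers (all names hypothetical)."""

    def __init__(self):
        self.bitmap = [0, 0]      # two 16-bit registers: bitmap A and bitmap B
        self.attr = [0, 0]        # two 8-bit registers: palette attribute bits
        self.attr_latch = [0, 0]  # latch holding the next tile's attribute bits

    def load_next_tile(self, bitmap_a, bitmap_b, palette):
        # Every 8 cycles: the next tile's bitmap goes into the upper 8 bits,
        # and the latch takes the next tile's 2-bit palette attribute.
        self.bitmap[0] = (self.bitmap[0] & 0x00FF) | (bitmap_a << 8)
        self.bitmap[1] = (self.bitmap[1] & 0x00FF) | (bitmap_b << 8)
        self.attr_latch = [palette & 1, (palette >> 1) & 1]

    def pixel(self, fine_x):
        # Select one of the lower 8 bits of each register using fine X (0-7).
        lo = (self.bitmap[0] >> fine_x) & 1
        hi = (self.bitmap[1] >> fine_x) & 1
        p0 = (self.attr[0] >> fine_x) & 1
        p1 = (self.attr[1] >> fine_x) & 1
        return (p1 << 3) | (p0 << 2) | (hi << 1) | lo

    def shift(self):
        # Shift once toward bit 0; the latch refills bit 7 of each attribute register.
        for i in range(2):
            self.bitmap[i] >>= 1
            self.attr[i] = (self.attr[i] >> 1) | (self.attr_latch[i] << 7)
```

In this model, a caller would invoke `pixel(fine_x)` and then `shift()` once per cycle, and `load_next_tile(...)` once every 8 cycles. A larger fine X simply reads a bit that is further ahead in the registers, so the picture appears shifted left by that many pixels.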

NTSC PPU

The PPU renders 262 scanlines per frame. Each scanline lasts for 341 PPU clock cycles (113.667 CPU clock cycles; 1 CPU cycle = 3 PPU cycles), with each clock cycle producing one pixel.
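The arithmetic behind these figures can be checked with a few lines. This is only a worked calculation from the numbers in the paragraph above; the constant names are made up for the example, and the frame total ignores the skipped cycle on odd frames.

```python
from fractions import Fraction

PPU_CYCLES_PER_SCANLINE = 341
SCANLINES_PER_FRAME = 262
PPU_CYCLES_PER_CPU_CYCLE = 3

# Exact CPU cycles per scanline: 341 / 3, which is not a whole number.
cpu_cycles_per_scanline = Fraction(PPU_CYCLES_PER_SCANLINE, PPU_CYCLES_PER_CPU_CYCLE)

# PPU cycles in a full-length frame.
ppu_cycles_per_frame = PPU_CYCLES_PER_SCANLINE * SCANLINES_PER_FRAME

print(float(cpu_cycles_per_scanline))  # roughly 113.667
print(ppu_cycles_per_frame)            # 89342
```

Because 341 is not divisible by 3, the CPU does not start every scanline at the same PPU cycle; the alignment repeats every 3 scanlines.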

Scanline -1 or 261

This is a dummy scanline, whose sole purpose is to perform the sprite evaluation for the next scanline, and to fill the shift registers with the data for the first two tiles of the next scanline. Although no pixels are rendered for this scanline, the PPU still makes the same memory accesses it would for a regular scanline.

This scanline varies in length, depending on whether an even or an odd frame is being rendered. For odd frames, the idle cycle at the end of the scanline is skipped. For even frames, the idle cycle occurs normally. This is done to compensate for some shortcomings with the way the PPU physically outputs its video signal, the end result being a crisper image when the screen isn't scrolling. However, this behavior can be bypassed by keeping rendering disabled until after this scanline has passed, which results in an image that looks more like a traditionally interlaced picture.
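The effect on frame length can be sketched as below. The function name and parameters are hypothetical, and `rendering_enabled` stands in for "rendering was enabled while this scanline passed", which simplifies the condition described above.

```python
def ppu_cycles_in_frame(frame_is_odd, rendering_enabled=True):
    """PPU cycles in one frame under the rules described in the text."""
    full_frame = 341 * 262  # 262 scanlines of 341 cycles each
    if frame_is_odd and rendering_enabled:
        return full_frame - 1  # idle cycle at the end of this scanline is skipped
    return full_frame
```

Under these assumptions an even/odd pair of frames takes 89342 + 89341 PPU cycles with rendering enabled, and 2 x 89342 when the skip is bypassed.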

Scanlines 0-239

These are the visible scanlines, which contain the graphics to be displayed on the screen. This includes the rendering of both the background and the sprites. During these scanlines, the PPU is busy fetching data, so the program should not access PPU memory during this time, unless rendering is turned off.

Cycles 0-255

The data for each tile is fetched during this phase. Each memory access takes 2 PPU cycles to complete, and 4 must be performed per tile:

  1. Nametable byte
  2. Attribute table byte
  3. Tile bitmap A
  4. Tile bitmap B (+8 bytes from tile bitmap A)

The data fetched from these accesses is placed into internal latches, and then fed to the appropriate shift registers when it's time to do so (every 8 cycles). Because the PPU can only fetch an attribute byte every 8 cycles, each sequential string of 8 pixels is forced to have the same palette attribute.
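The fetch pattern for this phase can be written out as a small lookup. This is a sketch under stated assumptions: the function name is hypothetical, tiles are numbered from 1 along the scanline (so the first fetch is for Tile 3, since Tiles 1 and 2 are already in the shift registers), and each fetch is taken to occupy an aligned pair of cycles.

```python
FETCHES = (
    "nametable byte",
    "attribute table byte",
    "tile bitmap A",
    "tile bitmap B",
)

def bg_fetch_at(cycle):
    """Return (tile_number, fetch_name) for a cycle in the range 0-255."""
    if not 0 <= cycle <= 255:
        raise ValueError("cycle is outside the background fetch phase")
    tile = cycle // 8 + 3              # 8 cycles per tile, starting at Tile 3
    fetch = FETCHES[(cycle % 8) // 2]  # 4 fetches per tile, 2 cycles each
    return tile, fetch
```

For example, `bg_fetch_at(0)` gives `(3, "nametable byte")` and `bg_fetch_at(8)` gives `(4, "nametable byte")`.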

Note: At the beginning of each scanline, the data for the first two tiles is already loaded into the shift registers (and ready to be rendered), so the first tile that gets fetched is Tile 3.

At some point (TODO: is it during cycles 256-319?), the X position and attribute bytes for each sprite on this scanline were copied from the secondary OAM into 8 separate counters (for X position) and 8 separate latches (for attribute bytes), which join the shift registers containing the sprite bitmaps loaded during cycles 256-319.

Every cycle, the 8 counters are decremented by 1. When a counter reaches 0, its sprite is "activated".

Then, for each "active" sprite, one pixel of bitmap data is shifted out from that sprite's pair of shift registers. These pixels, along with the pixel from the background, go into a multiplexer, which selects the first nonzero (nontransparent) sprite pixel (from highest to lowest sprite priority), outputting the background pixel if there is none. If the selected sprite has foreground priority, then the sprite's pixel is output to the screen. If the sprite has background priority, the background's pixel is output if it's nonzero (the sprite pixel being output otherwise). (Note: Even though the sprite is "behind the background", it was still the highest priority sprite to have a nonzero pixel, and thus the only sprite to be looked at. This is where the sprite priority quirk comes from.)
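The selection rule can be sketched as a small function. This is an illustration only: the function name, the `(pixel, behind_background)` tuple layout, and the tagged return value are assumptions made for the example, not part of the hardware description.

```python
def mux_pixel(bg_pixel, sprite_pixels):
    """Pick the output pixel for one cycle.

    sprite_pixels is a list of (pixel, behind_background) tuples for the
    active sprites, ordered from highest to lowest sprite priority.
    A pixel value of 0 means transparent.
    """
    for pixel, behind_background in sprite_pixels:
        if pixel != 0:
            # Only the first nonzero sprite pixel is ever considered.
            if behind_background and bg_pixel != 0:
                return ("background", bg_pixel)
            return ("sprite", pixel)
    # No sprite had a nonzero pixel here.
    return ("background", bg_pixel)
```

The quirk shows up with `mux_pixel(2, [(1, True), (3, False)])`: the first sprite has background priority and a nonzero pixel, so the background pixel is output and the second, foreground sprite is never looked at.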

While all of this is going on, sprite evaluation for the next scanline is taking place as a separate process, independent of what's happening here.

Cycles 256-319

The tile data for the sprites on the next scanline is fetched here. Again, each memory access takes 2 PPU cycles to complete, and 4 are performed for each of the 8 sprites:

  1. Garbage nametable byte
  2. Garbage nametable byte
  3. Tile bitmap A
  4. Tile bitmap B (+8 bytes from tile bitmap A)

The garbage fetches occur so that the same circuitry that performs the BG tile fetches can be reused for the sprite tile fetches.

If there are fewer than 8 sprites on the next scanline, then dummy tile fetches occur for the left-over sprites. This data is then discarded, and the sprites are loaded with a transparent bitmap instead.

Cycles 320-335

This is where the first two tiles for the next scanline are fetched and loaded into the shift registers. Again, each memory access takes 2 PPU cycles to complete, and 4 are performed for each of the two tiles:

  1. Nametable byte
  2. Attribute table byte
  3. Tile bitmap A
  4. Tile bitmap B (+8 bytes from tile bitmap A)

Cycles 336-339

Two bytes are fetched, but the purpose of these fetches is unknown. Each fetch takes 2 PPU cycles.

  1. Nametable byte
  2. Nametable byte

Cycle 340

The PPU idles for one cycle.
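The cycle ranges of a visible scanline can be summarised in a small table and checked against the 341-cycle total. This is a sketch: the table layout, the descriptions, and the function names are chosen for the example only.

```python
# (first_cycle, last_cycle, description), following the headings above.
PHASES = [
    (0, 255, "background tile fetches"),
    (256, 319, "sprite tile fetches for the next scanline"),
    (320, 335, "first two background tiles of the next scanline"),
    (336, 339, "two nametable fetches of unknown purpose"),
    (340, 340, "idle"),
]

def scanline_phase(cycle):
    """Return the description of the phase that contains the given cycle."""
    for first, last, description in PHASES:
        if first <= cycle <= last:
            return description
    raise ValueError("cycle must be in the range 0-340")

def memory_accesses(first, last):
    """Number of 2-cycle memory accesses that fit in an inclusive cycle range."""
    return (last - first + 1) // 2

total_cycles = sum(last - first + 1 for first, last, _ in PHASES)
```

Here `total_cycles` comes to 341, and the access counts work out to 128 (32 tiles x 4), 32 (8 sprites x 4), 8 (2 tiles x 4) and 2 for the four fetching phases.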

Scanline 240

The PPU just idles during this scanline. Despite this, this scanline still occurs before the VBlank flag is set.

Scanlines 241-260

These occur during VBlank. The VBlank flag of the PPU is set during scanline 241, so the VBlank NMI occurs here. During this time, the PPU makes no memory accesses, so PPU memory can be freely accessed by the program.
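The scanline layout of a frame can be summarised as below. This is a rough sketch with hypothetical function names. Treating the idle scanline 240 as safe for PPU memory access is an inference from "the PPU just idles", not something the text states outright.

```python
def scanline_kind(scanline):
    """Classify a scanline number (-1 and 261 both name the dummy scanline)."""
    if scanline in (-1, 261):
        return "pre-render"
    if 0 <= scanline <= 239:
        return "visible"
    if scanline == 240:
        return "idle"
    if 241 <= scanline <= 260:
        return "vblank"
    raise ValueError("scanline must be in the range -1 to 261")

def ppu_memory_is_free(scanline, rendering_enabled=True):
    """True if the program can access PPU memory without the PPU fetching."""
    if not rendering_enabled:
        return True
    # Assumption: scanline 240 is grouped with VBlank because the PPU idles.
    return scanline_kind(scanline) in ("idle", "vblank")

# One frame, counting the dummy scanline once (as 261).
kinds = [scanline_kind(n) for n in range(0, 262)]
```

The list `kinds` has 262 entries: 240 visible scanlines, 1 idle scanline, 20 VBlank scanlines, and 1 pre-render scanline.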
