PPU OAM
The OAM (Object Attribute Memory) is internal memory inside the PPU that contains a display list of up to 64 sprites, where each sprite's information occupies 4 bytes.
Byte 0
Y position of top of sprite
Sprite data is delayed by one scanline; you must subtract 1 from the sprite's Y coordinate before writing it here. Hide a sprite by writing any values in $EF-$FF here. Sprites are never displayed on the first line of the picture, and it is impossible to place a sprite partially off the top of the screen.
Byte 1
Tile index number
For 8x8 sprites, this is the tile number of this sprite within the pattern table selected in bit 3 of PPUCTRL ($2000).
For 8x16 sprites, the PPU ignores the pattern table selection and selects a pattern table from bit 0 of this number.
76543210 |||||||| |||||||+- Bank ($0000 or $1000) of tiles +++++++-- Tile number of top of sprite (0 to 254; bottom half gets the next tile)
Thus, the pattern table memory map for 8x16 sprites looks like this:
- $00: $0000-$001F
- $01: $1000-$101F
- $02: $0020-$003F
- $03: $1020-$103F
- $04: $0040-$005F
[...] - $FE: $0FE0-$0FFF
- $FF: $1FE0-$1FFF
Byte 2
Attributes
76543210 |||||||| ||||||++- Palette (4 to 7) of sprite |||+++--- Unimplemented ||+------ Priority (0: in front of background; 1: behind background) |+------- Flip sprite horizontally +-------- Flip sprite vertically
Flipping does not change the position of the sprite's bounding box, just the position of pixels within the sprite. If, for example, a sprite covers (120, 130) through (127, 137), it'll still cover the same area when flipped. In 8x16 mode, vertical flip flips each of the subtiles and also exchanges their position; the odd-numbered tile of a vertically flipped sprite is drawn on top. This behavior differs from the behavior of the unofficial 16x32 and 32x64 pixel sprite sizes on the Super NES, which will only vertically flip each square sub-region.
The three unimplemented bits of each sprite's byte 2 do not exist in the PPU and always read back as 0 on PPU revisions that allow reading PPU OAM through OAMDATA ($2004). This can be emulated by ANDing byte 2 with $E3 either when writing to or when reading from OAM. It has not been determined whether the PPU actually drives these bits low or whether this is the effect of data bus capacitance from reading the last byte of the instruction (LDA $2004, which assembles to AD 04 20).
Byte 3
X position of left side of sprite
X-scroll values of F9-FF do NOT result in the sprite wrapping around to the left side of the screen.
DMA
Most programs write to a copy of OAM somewhere in CPU addressable RAM (often $0200-$02FF) and then copy it to OAM each frame using the OAM_DMA ($4014) register. Writing value to this register causes the DMA circuitry inside the 2A03/07 to fully initialize the OAM by writing OAMDATA ($2004) 256 times using successive bytes from memory page value (i.e., starting at address $100×value). The CPU is suspended while the transfer is taking place.
Note that the address range to copy from could lie outside RAM, though that happens very rarely in practice.
Not counting the $4014 write tick, the above procedure takes 513 CPU cycles (+1 on odd CPU cycles): first one (or two) idle cycles, and then 256 pairs of alternating read/write cycles. (For comparison, an unrolled LDA/STA loop would usually take four times as long.)
Sprite zero hits
The sprite zero hit flag ($2002:6) is raised during image drawing at the point (if any) where a non-bg background pixel overlaps a non-bg sprite zero pixel ("non-bg" simply means that the pattern bits for the pixel are not both zero; the value at the corresponding index in the palette RAM does not matter), provided that background and sprite rendering are both enabled (via $2001:4-3). Sprite zero hits do not register at x=255 (independent of sprite zero's x coordinate), but sprites are still drawn there.
The sprite zero flag is automatically set to 0 at dot 1 of the pre-render line. There is no way to manually reset it, which means that you only get one sprite zero hit per frame.
Sprite overlapping
Priority between sprites is determined by their address inside OAM. So to have a sprite displayed in front of another sprite in a scanline, the sprite data that occurs first will overlap any other sprites after it. For example, when sprites at OAM $0C and $28 overlap, the sprite at $0C will appear in front.
Internal operation
In addition to the primary OAM memory, the PPU contains 32 bytes (enough for 8 sprites) of secondary OAM memory that is not directly accessible by the program. During each visible scanline this secondary OAM is first cleared, and then a linear search of the entire primary OAM is carried out to find sprites that are within y range for the next scanline (the sprite evaluation phase). The OAM data for each sprite found to be within range is copied into the secondary OAM, which is then used to initialize eight internal sprite output units.
For the precise timing, see this timing diagram.
The reason sprites at lower addresses in OAM overlap sprites at higher addresses is that sprites at lower addresses also get assigned a lower address in the secondary OAM, and hence get assigned a lower-numbered sprite output unit during the loading phase. Output from lower-numbered sprite output units is wired inside the PPU to take priority over output from higher-numbered sprite output units.
Sprite zero hit detection relies on the fact that sprite zero, when it is within y range for the next scanline, always gets assigned the first sprite output unit. The hit condition is basically sprite zero is in range AND the first sprite output unit is outputting a non-zero pixel AND the background drawing unit is outputting a non-zero pixel. (Internally the PPU actually uses two flags: one to keep track of whether sprite zero occurs on the next scanline, and another one—initialized from the first—to keep track of whether sprite zero occurs on the current scanline. This is to avoid sprite evaluation, which takes place concurrently with potential sprite zero hits, trampling on the second flag.)