Tile compression refers to techniques that allow fitting more graphics data into a smaller space. Programs using CHR ROM cannot use compressed tiles, as their tile data must be stored in the PPU's native format. But programs using CHR RAM can process tile data while copying it from PRG ROM to CHR RAM, and this processing allows storing more tiles in the same space.
Run-length encoding transforms runs of identical bytes into a shorter sequence of bytes that specifies the length of the run.
In NES tile data, byte-level run-length encoding works well when a row of 8 pixels in a tile is identical to the row above it. It also works well for nametable data because a horizontal run of blank tiles becomes a single tile.
Pixel-level run-length encoding is much slower but can achieve impressive results within a tile.
There are several different RLE data formats.
The PCX image format became popular on PC.
|00-BF||Write this byte to the output.|
|C0-FF||Read another byte, and write it to the output n - 192 times.|
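As a rough illustration of the table above, a PCX-style decoder can be sketched in a few lines of Python. The function name `pcx_rle_decode` is made up for this example, and the input is assumed to be well formed (every run header is followed by a data byte).

```python
def pcx_rle_decode(data: bytes) -> bytes:
    """Decode PCX-style RLE: $00-$BF are literals, $C0-$FF are run headers."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n < 0xC0:
            out.append(n)                        # write this byte to the output
        else:
            out += bytes([data[i]]) * (n - 192)  # repeat the next byte n - 192 times
            i += 1
    return bytes(out)
```

Note that a literal byte in the range $C0-$FF cannot be written directly; an encoder has to emit it as a run of length 1.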
PackBits
|00-7F||Copy n + 1 bytes from input to output.|
|81-FF||Read another byte, and write it to the output 257 - n times.|
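The two rows above translate almost directly into code. The following Python sketch uses a hypothetical function name, `packbits_decode`, and simply skips the value $80, which the table does not list; that handling is an assumption of this example.

```python
def packbits_decode(data: bytes) -> bytes:
    """Decode the literal/run scheme from the table above."""
    out = bytearray()
    i = 0
    while i < len(data):
        n = data[i]
        i += 1
        if n <= 0x7F:
            out += data[i:i + n + 1]             # copy n + 1 literal bytes
            i += n + 1
        elif n >= 0x81:
            out += bytes([data[i]]) * (257 - n)  # repeat the next byte 257 - n times
            i += 1
        # n == 0x80 is not listed in the table; this sketch ignores it
    return bytes(out)
```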
The Konami RLE format is used in the U.S. version of Contra and the Japanese version of Simon's Quest. It can be decoded and encoded with the Python program GraveyardDuck. Its compression ratio is more or less identical to that of PackBits.
|00-80||Read another byte, and write it to the output n times.|
|81-FE||Copy n - 128 bytes from input to output.|
|FF||End of compressed data|
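A decoder for this layout differs from PackBits mainly in the explicit terminator. The sketch below is an illustration only; `konami_rle_decode` is a made-up name, and the input is assumed to end with $FF (a missing terminator raises `IndexError`).

```python
def konami_rle_decode(data: bytes) -> bytes:
    """Decode the run/literal/terminator scheme from the table above."""
    out = bytearray()
    i = 0
    while True:
        n = data[i]
        i += 1
        if n == 0xFF:                       # end of compressed data
            return bytes(out)
        if n <= 0x80:
            out += bytes([data[i]]) * n     # run: repeat the next byte n times
            i += 1
        else:
            count = n - 128                 # literal: copy n - 128 bytes
            out += data[i:i + count]
            i += count
```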
Blades of Steel uses a subset of this format reserving a special value for jumping to a new PPU address. See: Blades of Steel DataCrystal reference
The BIOS of the Game Boy Advance and Nintendo DS contains a decompressor for an RLE format very similar to PackBits and Konami. As described in GBATEK, it has a 4-byte size header followed by this:
|00-7F||Copy n + 1 bytes from input to output.|
|80-FF||Read one byte from input and write it to output n - 125 times.|
PB53 was conceived by Damian Yerrick as an alternative to PackBits for the Action 53 multicart, and a Python packer and 6502 unpacker are included in the Action 53 menu distribution. Unlike freeform RLE formats such as Konami and PackBits, PB53 operates on 16-byte units, making it easy to divide the decompressed data into fixed-size packets to be sent to the PPU during vblank while rendering is turned on. Like LZSS, PB53 uses unary coding on the lengths of literal runs to save on overhead from switching between literal and run modes. This means that like LZSS, it has a worst case expansion of 12.5%, but it works fairly well on real tile data and OK on nametable data, which have shorter runs than the high-resolution files for which PackBits was designed. It also has a special mode to accommodate the layout of Shiru's NROM games LAN Master, Lawn Mower, and Chase, which have many identical tiles between the two pattern tables to allow tile animation.
Each tile consists of several 8-byte planes, two planes in the NES implementation. For the first plane in a tile:
|00-7F||PB8: After this control byte, copy the following byte from input to output. Then, for each bit in the control byte from 6 to 0, if the bit is 1, repeat the previous byte; otherwise, copy another byte from the input. This is somewhat similar to how control bytes are formatted in LZSS.|
|80||Write eight $00 bytes.|
|81||Write eight $FF bytes.|
|82||Copy one tile (16 bytes) starting one tile back. (Used for a repeated tile, such as the unused tiles in many games.)|
|83||Copy one tile starting one segment back, usually 4096 bytes. (Used for pattern tables that share tiles, as seen in several Shiru games. The decoder switches between two instances one segment apart, each with its own input stream and output buffer.)|
|84||Write sixteen $00 bytes. (Solid color 0)|
|85||Write eight $FF bytes then eight $00 bytes. (Solid color 1)|
|86||Write eight $00 bytes then eight $FF bytes. (Solid color 2)|
|87||Write sixteen $FF bytes. (Solid color 3)|
For other planes:
|00-81||Same as first plane|
|82||Copy previous 8 bytes. (Used for 1-bit tiles with colors 0 and 3.)|
|83||Copy previous 8 bytes, bit-inverted. (Used for 1-bit tiles with colors 1 and 2.)|
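The two tables can be combined into a small decoder for two-plane (NES) tiles. This is a simplified sketch, not the Action 53 unpacker: the names `decode_plane` and `pb53_decode` are invented here, the whole stream is decoded into one buffer, and first-plane code $83 (copy from one segment back) is left out because it depends on the two-instance arrangement described above.

```python
def decode_plane(ctrl, src):
    """Decode one 8-byte plane for control bytes $00-$81."""
    if ctrl == 0x80:
        return bytes(8)
    if ctrl == 0x81:
        return b"\xff" * 8
    prev = next(src)                  # the first byte is always a literal
    plane = bytearray([prev])
    for bit in range(6, -1, -1):      # control bits 6 down to 0
        if not (ctrl >> bit) & 1:     # 0: fetch a new byte from the input
            prev = next(src)
        plane.append(prev)            # 1: repeat the previous byte
    return bytes(plane)

def pb53_decode(data, num_tiles):
    src = iter(data)
    out = bytearray()
    for _ in range(num_tiles):
        ctrl = next(src)
        if 0x84 <= ctrl <= 0x87:      # solid color 0-3
            color = ctrl - 0x84
            out += (b"\xff" if color & 1 else b"\x00") * 8
            out += (b"\xff" if color & 2 else b"\x00") * 8
        elif ctrl == 0x82:            # repeat the previous tile
            out += out[-16:]
        elif ctrl <= 0x81:
            out += decode_plane(ctrl, src)              # first plane
            ctrl = next(src)                            # second plane
            if ctrl == 0x82:
                out += out[-8:]                         # same as first plane
            elif ctrl == 0x83:
                out += bytes(b ^ 0xFF for b in out[-8:])  # inverted first plane
            elif ctrl <= 0x81:
                out += decode_plane(ctrl, src)
            else:
                raise ValueError("unexpected second-plane code $%02X" % ctrl)
        else:
            raise ValueError("code $%02X is not handled in this sketch" % ctrl)
    return bytes(out)
```

Because each control byte produces exactly 8 or 16 bytes, a caller can stop after any tile and resume later, which is what makes the fixed-size vblank packets practical.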
PB8 is the same as PB53 except that bit 7 of the control byte is not special: it still means repeat the previous byte. It is used in Haunted: Halloween '85 and Haunted: Halloween '86. From July 2019 to May 2020, the workalike boot ROM included with the Game Boy Color emulator SameBoy used PB8 for the emulator's logo.
PB16 is similar to PB8 with one change: each 1 bit means a repeat of the value two bytes back. (Bit 7 is not special, unlike in PB53.) This distance of two bytes is better for Game Boy and Super NES tile data and Super NES tile maps. It is used in the Game Boy ports of 240p Test Suite and Magic Floor.
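Since PB8 and PB16 differ only in how far back a repeat looks, one sketch can cover both. The function name `pb_unpack`, the `distance` parameter, and the all-zero initial history are assumptions made for this illustration; the real packers may initialize the history differently.

```python
def pb_unpack(data, distance=1):
    """distance=1 behaves like PB8, distance=2 like PB16, as described above."""
    src = iter(data)
    out = bytearray()
    history = [0] * distance          # assumed starting history: all zeros
    for ctrl in src:                  # each control byte covers 8 output bytes
        for bit in range(7, -1, -1):  # bit 7 is treated like the other bits
            if (ctrl >> bit) & 1:
                value = history[0]    # repeat the byte `distance` positions back
            else:
                value = next(src)     # literal byte from the input
            history = history[1:] + [value]
            out.append(value)
    return bytes(out)
```

With `distance=2`, alternating byte pairs such as the two bitplane rows of a Game Boy tile line compress to a single pair followed by repeat bits.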
PB8 and PB16 inspired the creation of PB12 for the SameBoy emulator's boot ROM. So-called "PB12" by NieDzejkob (Jakub Kądziołka) is tuned to the statistics of the 3-color antialiased version of the SameBoy logo. Control bytes are interleaved with literal bytes. Control byte 00000001 is a terminator and thus must not occur in the byte stream. Otherwise, each 2 or 4 bits of the control correspond to a 4-bit word.
|00||copy next byte from input|
|0100||Copy 1 byte back, ORed with the same byte shifted left 1|
|0101||Copy 1 byte back, ANDed with the same byte shifted left 1|
|0110||Copy 1 byte back, ORed with the same byte shifted right 1|
|0111||Copy 1 byte back, ANDed with the same byte shifted right 1|
|10||Copy 2 bytes back|
|11||Copy 1 byte back|
RLEINC is an RLE variant used by Joel Yliluoma in the Simon's Quest retranslation project. It is very efficient at compressing nametables, which often contain redundancy in more complex forms than simple runs of repeating bytes. Examples include brick walls, which repeat two tiles, and complex graphics formed from an ascending series of successive tiles. For bitmap compression, it is slightly inferior to simpler RLE methods.
|00–3F||LIT: Copy (n+1) bytes from input to output backwards|
|40||END: End of stream|
|41–7F||SEQ: Read next byte b. Put b, (n−0x3F) times; add 1 to b after each iteration|
|80–9F||DBL: Read next byte b1, and next byte b2. Put b1, (n−0x7D) times; swap b2 and b1 after each iteration|
|A0–FF||RUN: Read byte b. Put b, (0x101−n) times.|
A compressor for this scheme can be found at http://bisqwit.iki.fi/src/nes-ppu_rleinc_compress.php.txt (PHP), and a decompressor at http://bisqwit.iki.fi/src/nes-ppu_rleinc_v2b.inc (CA65).
JRoatch made PBJ, which adds the PB8 mode from PB53 and a common-byte mode to a modified RLEINC.
Most RLE schemes deal with whole bytes. There are also schemes where the compressed data forms a bitstream that contains integers of different bit widths.
When compressing the combined tile graphics of Super Mario Bros. and Kirby's Adventure, a simple reference RLE algorithm (PackBits) gets a 12% reduction in data size. However, if the algorithm is modified as indicated below, a 21% reduction is achieved. For comparison, the graphics-specialized algorithm in PB53 achieves 25% for that data set. Tokumaru compression manages to reduce the data by 41%.
|0000||End of stream.|
|0nnn||Copy the next n×8 bits, i.e. n bytes, to the output.|
|1nnn||Read the next 8 bits, and output this byte n+2 times.|
A possible reason why bit-based compression methods are not popular on the NES is that bit-shifts are cumbersome and slow with the 6502 CPU, as it can only shift one bit at a time. The above algorithm is still relatively simple to implement, as it operates on units of 4 bits for the input and full bytes for the output. Coincidentally, it also produced the best compression out of all bit-based RLE algorithms that were brute-force-tested for that dataset.
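Because every field is a multiple of 4 bits, the stream can be treated as a sequence of nibbles. The following Python sketch shows one way to decode it; the name `nibble_rle_decode` and the high-nibble-first ordering are assumptions of this example, since the description above does not fix the bit order.

```python
def nibble_rle_decode(data):
    # Split the input into 4-bit units, high nibble first (an assumption).
    nibbles = iter([x for b in data for x in (b >> 4, b & 0x0F)])

    def read_byte():
        return (next(nibbles) << 4) | next(nibbles)

    out = bytearray()
    while True:
        cmd = next(nibbles)
        if cmd == 0:                  # 0000: end of stream
            return bytes(out)
        n = cmd & 7
        if cmd & 8:                   # 1nnn: output the next byte n + 2 times
            out += bytes([read_byte()]) * (n + 2)
        else:                         # 0nnn: copy the next n bytes
            out += bytes([read_byte() for _ in range(n)])
```

Bytes in the stream are not aligned to byte boundaries after an odd number of commands, which is where the shifting cost on the 6502 comes from.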
NES Stripe Image RLE
An RLE format used mostly by Nintendo in its arcade ports and Mario games, and also in some homebrew games.
Format: dest, len, data, [dest, len, data, ]* end
dest: PPU destination address (2 bytes, big endian), to be written to $2006
len: Length (Byte) of PPU Buffer Data:
|00-3F||Literal to right: Copy n + 1 bytes to video memory addresses increasing by 1|
|40-7F||Run to right: Copy one byte n - 63 times to video memory addresses increasing by 1|
|80-BF||Literal down: Copy n - 127 bytes to video memory addresses increasing by 32|
|C0-FF||Run down: Copy one byte n - 191 times to video memory addresses increasing by 32|
data: PPU Data to display
end: End of Data marker. Early games use $00; later games (particularly those with CHR RAM) may use any value with bit 7 set ($80-$FF) to allow writing to the first 16 tiles of video memory.
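To make the header layout concrete, here is a sketch that applies a stripe buffer to a simulated 16 KiB video memory. The name `apply_stripes` and the `zero_terminated` flag are hypothetical; the flag selects between the two end-marker conventions described above.

```python
def apply_stripes(buf, vram, zero_terminated=False):
    """Write NES stripe image data from buf into vram (a bytearray of 0x4000 bytes)."""
    i = 0
    while True:
        hi = buf[i]
        if hi >= 0x80 or (zero_terminated and hi == 0):
            return                        # end-of-data marker
        addr = (hi << 8) | buf[i + 1]     # big-endian PPU address
        n = buf[i + 2]
        i += 3
        step = 32 if n & 0x80 else 1      # bit 7: down (+32) or right (+1)
        is_run = bool(n & 0x40)           # bit 6: run or literal
        count = (n & 0x3F) + 1            # low 6 bits: length minus 1
        for k in range(count):
            value = buf[i] if is_run else buf[i + k]
            vram[(addr + k * step) & 0x3FFF] = value
        i += 1 if is_run else count
```

The masks reproduce the four table rows: for example, $43 is a run to the right of $43 - 63 = 4 bytes, and $C1 is a run downward of $C1 - 191 = 2 bytes.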
See also Popslide, a video memory update buffer framework using this format
SNES Stripe Image RLE
The same RLE format used by Nintendo as above, adapted for the SNES.
Format: dest, len, data, end
dest: PPU Destination: $2116 and $2117
len: Length (Word) of PPU Buffer Data:
|0000-3FFF||Copy n + 1 bytes to PPU buffer|
|4000-7FFF||Copy n + 1 bytes to PPU buffer, with RLE|
|8000-BFFF||Copy n + 1 bytes to PPU buffer, increment 32 bytes|
|C000-FFFF||Copy n + 1 bytes to PPU buffer, with RLE, increment 32 bytes|
data: PPU Data to display in 2-byte increments (or word increments)
end: Unlike the NES version, the end byte is $FF or $FFFF.
The Codemasters algorithm is a Markov chain (predictive) algorithm that packs predictions in a varying number of bits. It works on packets measured in whole tiles and compresses mostly pixel by pixel, which makes it slow. It is explained on the forum.
Tokumaru discovered an improvement to the way the frequency tables are changed in the Codemasters algorithm, and released the compressor and decompressor as open source. An open-source rewrite of the encoder and decoder with slightly better performance can be downloaded at: http://bisqwit.iki.fi/source/tokumaru.html
The compressed data begins with a byte that tells how many tiles it encodes. 256 is maximum. Technically this byte can be ignored, if you already know how many tiles you are going to decode.
After the byte, any number of blocks follows. Each block begins with a color description table. This table tells how to encode transitions between colors in tiles belonging to this block.
There are four elements in this table, one for each color n from 3 down to 0. Each element begins with a two-bit integer ncolors[n] that tells how many different colors may follow a pixel of this particular color. After the number of colors comes a pivot color a, encoded as follows:
|nothing||ncolors[n]=0||No pivot color|
|1 bit: 0||ncolors[n]>0||Pivot color a is 1 if n < 1, 0 otherwise.|
|2 bits: 10||ncolors[n]>0||Pivot color a is 2 if n < 2, 1 otherwise.|
|2 bits: 11||ncolors[n]>0||Pivot color a is 3 if n < 3, 2 otherwise.|
The table of transition colors is then calculated using n, a, and the number of colors ncolors[n]. First, two other colors b and c are calculated. They are the first color indexes that are neither n nor a. E.g. if a=2 and n=1, b and c become 0 and 3 respectively.
|When ncolors[n] is||Table of possible transition colors next[n] is|
For compression purposes, ideally ncolors[n] should be chosen to be the numbers of distinct colors that actually can follow the color n, based on measuring the tile data, and, if ncolors[n]=3, the pivot color a should be the color that is transitioned into most often from color n. This transition in tile data will be encoded in two bits, while the other transitions are encoded in three bits. For other values of ncolors[n], the choice of pivot color is mandated by the actual possible colors.
After the color description table comes tile data, with each tile encoding 16 bytes, or 8×8 pixels. Each tile consists of eight rows of pixels. Each pixel row begins with 1 bit that tells whether the row is to be copied. If the bit is set, the previously decoded row is duplicated, and no other data is encoded for this pixel row. At the start of the block, the "previously decoded row" is assumed to contain zero bytes. The previous row is not reset between different tiles, unless a new block begins. If the bit was clear, eight pixels follow for x coordinates 0 to 7. Each pixel is encoded as follows, depending on the color c of the previous pixel:
|2 bits||x = 0||The color of the first pixel c is encoded as a 2-bit integer.|
|nothing||ncolors[c]=0||c does not change, and nothing is encoded.|
|1 bit: 1||ncolors[c]>0||c does not change from previous pixel.|
|1 bit: 0||ncolors[c]=1||c becomes next[c].|
|2 bits: 00||ncolors[c]>1||c becomes next[c].|
|2 bits: 01||ncolors[c]=2||c becomes next[c].|
|3 bits: 010||ncolors[c]=3||c becomes next[c].|
|3 bits: 011||ncolors[c]=3||c becomes next[c].|
After each full tile, if there are still remaining tiles to be decoded, comes one bit that tells what comes next. Value 1 means a new block start, with a new color description table. Value 0 means that more tile data will follow.
Oracle common byte
This codec, used in The Legend of Zelda: Oracle of Seasons and The Legend of Zelda: Oracle of Ages according to Dwedit, is roughly comparable to RLE in complexity. For each 16-byte block, the compressor determines the most common byte in that block. The compressed data for each block starts with a 2-byte mask, then the most common byte, then other bytes in order that aren't the most common.
To decode a block: First read the two-byte mask. If the entire mask is zero, set the bit corresponding to the first byte to true. Then read the most common byte. For each bit in the mask, if the bit is 1, write the most common byte to output; otherwise, copy one byte.
Maximum expansion is 12.5% for any block that has 16 different bytes in it: two bytes of mask and 16 bytes of data.
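The decoding steps above can be sketched as follows. The function name `oracle_common_decode` is invented, and the mask layout (first mask byte is the high byte, most significant bit corresponds to the first output byte) is an assumption of this example rather than something the description specifies.

```python
def oracle_common_decode(data, num_blocks):
    src = iter(data)
    out = bytearray()
    for _ in range(num_blocks):
        mask = (next(src) << 8) | next(src)   # two-byte mask (assumed high byte first)
        if mask == 0:
            mask = 0x8000                     # all-zero mask: mark the first byte
        common = next(src)                    # most common byte of this block
        for bit in range(15, -1, -1):
            if (mask >> bit) & 1:
                out.append(common)            # 1: write the most common byte
            else:
                out.append(next(src))         # 0: copy one byte from the input
    return bytes(out)
```

In the worst case a block costs 2 + 1 + 15 = 18 bytes for 16 bytes of output, which is the 12.5% expansion mentioned above.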
A lot of games on platforms after the NES use LZ77 family compression methods such as LZSS, which generalizes run-length encoding to allow copying data from several bytes ago, not just the previous byte. Few NES games use LZ77 because the NES's limited work RAM and limited access to video memory make it difficult to resolve back references. (Fewer still use LZW or any other LZ78-based method because of patents that subsisted through the NES, Super NES, and Nintendo 64 eras.)
In LZSS, each mask byte contains 8 commands, each either to copy a literal byte or to copy a back reference. Once all commands have been processed, read another mask.
- Read a mask byte from input.
- For each bit in the mask byte:
  - If the bit is 0, this is a literal:
    - Copy one byte from input to output.
  - Otherwise, this is a back-reference:
    - Read and decode a length and distance from input. These will be positive integers.
    - Copy length bytes from the previous output, distance bytes before now, to output.
LZSS flavors vary mainly in how they encode lengths and distances.
The BIOS of the Game Boy Advance and Nintendo DS has a built-in decompressor for a straightforward LZSS flavor with 3- to 18-byte references into the previous 4096 bytes of output. The data format is described in Martin Korth's GBATEK.
Caution: In data intended for decompression directly to the GBA or DS video memory, the second byte of a 16-bit word cannot refer to the first byte of the same word. So the encoder must not write a run with distance 1 that starts at an odd offset.
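The general LZSS loop, with 3- to 18-byte references into the previous 4096 bytes, can be sketched as below. The exact bit layout used here is an assumption for illustration (mask bits read from bit 7 down, a 16-bit big-endian token holding length − 3 in the top 4 bits and distance − 1 in the low 12 bits); the 4-byte header is not parsed, and GBATEK remains the reference for the real BIOS format. The name `lzss_decode` and the `out_size` parameter are hypothetical.

```python
def lzss_decode(data, out_size):
    src = iter(data)
    out = bytearray()
    while len(out) < out_size:
        mask = next(src)                       # 8 commands per mask byte
        for bit in range(7, -1, -1):
            if len(out) >= out_size:
                break
            if not (mask >> bit) & 1:
                out.append(next(src))          # literal byte
            else:
                token = (next(src) << 8) | next(src)
                length = (token >> 12) + 3         # 3 to 18 bytes
                distance = (token & 0x0FFF) + 1    # 1 to 4096 bytes back
                for _ in range(length):
                    out.append(out[-distance])     # byte by byte, so overlaps work
    return bytes(out)
```

Copying one byte at a time lets a reference overlap its own output, so a distance of 1 behaves like an ordinary RLE run.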
This flavor of LZSS is used in The Legend of Zelda: Oracle games for Game Boy Color according to Dwedit.
An entire block can be compressed with one of two subtypes of Oracle LZ: short word mode and long word mode. Short words use references of 2 to 8 bytes into the previous 32 bytes of output, and long words use references of 3 to 33 bytes into the previous 2048 bytes. (A length value of 0 means read an additional byte and use that as the length.) Only short words would work very well on NES.
This compression scheme is used in the second-generation Pokémon games on the Game Boy. It is used for bitmap compression.
A byte n is read and split into two parts: code = n >> 5, and c = n & 0x1F. Byte 0xFF marks the end of the stream. Otherwise the code is interpreted as follows:
|0||Copy the next c + 1 bytes to the output.|
|1||Read another byte, and write it to the output c + 1 times.|
|2||Read another byte b1 and byte b2, and write byte b1 to the output c + 1 times, swapping b1 and b2 after each write.|
|3||Write a zero byte (0x00) to the output c + 1 times.|
|4||Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output.|
|5||Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output, reversing the bits in each byte.|
|6||Read byte b. If b < 0x80, then read byte b2; offset is b×256+b2 bytes from the output stream beginning. Else offset = b bytes behind from the current output stream end. Copy c + 1 bytes from the output stream at offset to the output, decrementing the read position after each write (i.e. reverse the data).|
|7||Read another byte b. Change code = (c >> 2), and change c = (c & 3) × 256 + b. Re-interpret code according to this table.|
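The command parsing, including the long form selected by code 7, can be sketched as follows. To keep the example short it handles only codes 0 to 3 (plus 7); the back-reference codes 4 to 6 are deliberately left out and raise an error. The function name `pokemon_rle_subset_decode` is made up for this illustration.

```python
def pokemon_rle_subset_decode(data):
    src = iter(data)
    out = bytearray()
    while True:
        n = next(src)
        if n == 0xFF:                          # end of stream
            return bytes(out)
        code, c = n >> 5, n & 0x1F
        if code == 7:                          # long form: 10-bit count
            b = next(src)
            code, c = c >> 2, (c & 3) * 256 + b
        count = c + 1
        if code == 0:                          # literal bytes
            out += bytes([next(src) for _ in range(count)])
        elif code == 1:                        # run of one byte
            out += bytes([next(src)]) * count
        elif code == 2:                        # alternating run of two bytes
            b1, b2 = next(src), next(src)
            out += bytes([b2 if k & 1 else b1 for k in range(count)])
        elif code == 3:                        # run of zero bytes
            out += bytes(count)
        else:
            raise NotImplementedError("codes 4-6 are not part of this sketch")
```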
Chrono Trigger LZ
This compression scheme is used in Square's Chrono Trigger for the SNES for compression of graphics and various data.
The data consists of packets. Each packet begins with a 16-bit offset, which gives the packet ending offset relative to the beginning of compressed data. At that offset, there is always a control byte. The first control byte sets the size of offsets: if the byte is < 0xC0, then offsetbits = 12; otherwise offsetbits = 11. The value of offsetbits is determined only once. The higher-order two bits in the control bytes of all other packets are ignored. The counter is assigned a default value of 8.
The decompression loop goes as follows:
- If the packet end has been reached, a control byte is read, and counter is assigned the low 6 bits of that byte (i.e. counter = byte & 0x3F). If the counter is zero, the decompression is complete and ends there. If the counter was nonzero, a new 16-bit packet end offset is read.
- If the end of the packet has not yet been reached, a mask byte is read.
- If the mask byte is zero, then next eight bytes are copied verbatim to the output. The counter is not affected.
- If the mask byte was nonzero, each of its bits is read in turn, beginning from the lowest-order bit. The number of bits to be read is determined by counter (which is in the range 1 to 63, i.e. it can be either smaller or greater than eight).
- If the bit is zero, a single byte is copied verbatim to the output.
- If the bit is nonzero, a 16-bit number is read from the input. offset becomes the lowest-order offsetbits from that number, and length is the rest of the bits plus 3. The decompressor copies length bytes from offset bytes behind to the output.
- After all bits have been read, the counter is reset to the default value of 8, and the decompression loop continues.