Scanning large tables

From NESdev Wiki
Revision as of 21:18, 24 September 2013 by Blargg (talk | contribs) (expand progression to optimal approach, add source lang tags to code)
Jump to navigationJump to search

Scanning tables larger than 256 bytes is more involved than scanning small tables, because there are no 16-bit equivalents to X and Y. Consider clearing a large block of RAM quickly. A straight-forward approach is to keep a 16-bit index in X and Y, and increment it until it reaches the desired index:

<source lang="6502"> begin = ... ; address of first byte to clear count = ... ; number of bytes

.zeropage addr: .res 2

.code

       lda #<begin
       sta addr
       lda #>begin
       sta addr+1
       
       lda #0
       
       ; X and Y form 16-bit index, with Y holding low byte
       ldx #0
       ldy #0
       

loop:  ; Clear one byte

       sta (addr),y
       
       ; Go to next byte
       iny
       bne nohigh
       inx
       inc addr+1

nohigh:

       ; See if low and high bytes of index match count
       cpy #<count
       bne loop
       cpx #>count
       bne loop

</source>

Using the same approach used in Efficient Forwards Scanning, we can make this more efficient.

First, consider the case of clearing all 64K of RAM. X and Y start out at 0, and they increment up to $FF $FF, then the loop ends.

<source lang="6502">

       lda #0
       sta addr
       lda #0
       sta addr+1
       
       lda #0
       
       ldx #0
       ldy #0
       

loop: sta (addr),y

       iny
       bne nohigh
       inx
       inc addr+1

nohigh:

       cpy #0
       bne loop
       cpx #0
       bne loop

</source>

Since we end once X and Y wrap to 0, we can optimize the inner loop to not check X every time:

<source lang="6502">

       lda #0
       sta addr
       lda #0
       sta addr+1
       
       lda #0
       
       ldx #0
       ldy #0
       

loop: sta (addr),y

       iny
       bne loop
       inc addr+1
       inx
       bne loop

</source>

If we wanted to clear just the last $103 bytes of memory, we'd start with the values the above clear loop has when there are only $103 bytes remaining. So we'd start with X=$FE and Y=$FD, and add $FE00 to addr. This would first clear a byte at $FEFD, then $FEFE, $FEFF, and the 256 bytes beginning at $FF00, as desired:

<source lang="6502">

       lda #0
       sta addr
       lda #$FE
       sta addr+1
       
       lda #0
       
       ldx #$FE
       ldy #$FD
       

loop: sta (addr),y

       iny
       bne loop
       inc addr+1
       inx
       bne loop

</source>

Now, if we can have the above loop clear the last $103 bytes of the 64K block "beginning" at address 0, we can have it clear the last $103 bytes of a 64K block that "begins" at any address in memory (it would wrap around to the beginning). For example, a 64K block at $1234 extends through $FFFF, then back to $0000 and through $1233, covering 64K total, and the last $103 bytes of this table would go from $1131-$1233.

Thus, we can use this to clear a block of $103 bytes anywhere in memory; it doesn't matter whether they are the last $103 bytes of a 64K-byte block, since the loop only accesses those $103 bytes. But it does add $FD to the address, since Y starts out with that value, so we must subtract that. The following efficiently clears $103 bytes anywhere in memory:

<source lang="6502">

       lda #<(begin - $FD)
       sta addr
       lda #>(begin - $FD)
       sta addr+1
       
       lda #0
       
       ldx #$FE
       ldy #$FD
       

loop: sta (addr),y

       iny
       bne loop
       
       inc addr+1
       inx
       bne loop

</source>

Addr starts out as begin-$FD. The first time through the loop, Y=$FD, so it accesses a byte at begin-$FD+$FD, that is, at begin, as desired. Then Y=$FE, so it accesses begin+1. Then Y=$FF and it accesses begin+2. Y wraps around to 0 and the inner loop falls through to the outer loop, which increments the high byte of addr and increments X to $FF and loops back. Now, Y=0 and addr=begin-$FD+$100, or more simply, begin+3, which is the desired address to be accessing this time through. Y keeps incrementing to $FF, then the inner and outer loops end, and it's cleared exactly $103 bytes of memory.

The general pattern is to load X with the high byte of $10000-count, Y with the low byte, and also subtract Y's initial value from begin. $10000-count can more simply be calculated as -count, since you get the same result:

<source lang="6502">

       lda #<(begin - <-count)
       sta addr
       lda #>(begin - <-count)
       sta addr+1
       
       lda #0
       
       ldx #>-count
       ldy #<-count
       

loop: sta (addr),y

       iny
       bne loop
       inc addr+1
       inx
       bne loop

</source>

If the address and/or size isn't known until the program is running, the above must be done at run-time with instructions, rather than relying on the assembler:

<source lang="6502"> .zeropage addr: .res 2 size: .res 2

.code

       ; Clears memory from addr through addr+size-1.
       ; Clears all memory if size=0.
       
       ; Adjust addr. Subtracting low byte of negated size is the same
       ; as adding the low byte of size and then a high byte of $FF,
       ; which greatly simplifies this calculation.
       lda addr
       clc
       adc size
       sta addr
       lda addr+1
       adc #$FF
       sta addr+1
       
       ; Subtract size from 0 to negate it, and put that into X and Y
       lda #0
       sec
       sbc size
       tay
       lda #0
       sbc size+1
       tax
       
       lda #0
       

loop: sta (addr),y

       iny
       bne loop
       inc addr+1
       inx
       bne loop

</source>

An alternate approach of similar efficienty is a dual loop, with the first handling full 256-byte pages efficiently and the second handling the final partial page:

<source lang="6502">

       ; Clears memory from addr through addr+size-1.
       ; Clears nothing if size=0.
       
       lda #0
       
       ldy #0
       
       ; Load page count and handle case where it's zero
       ldx size+1
       beq final
       
       ; Do full pages first

pages: sta (addr),y

       iny
       bne pages
       inc addr+1
       dex
       bne pages
       

final: ldx size

       beq done
       
       ldy #0

loop: sta (addr),y

       iny
       dex
       bne loop
       

done: </source>

This dual-loop approach works if the loop operation is short (here just the STA (addr),Y). If lots is done to each table entry, the previous approach is preferable due to reduced code size and not having to duplicate the code that operates on the table.