Delay code: Difference between revisions

From NESdev Wiki

Revision as of 07:54, 7 May 2016

Delay code

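Code that delays for a number of CPU cycles determined by a run-time parameter, usually a register value.

All branch instructions are written on the assumption that no page crossing occurs, because a taken branch whose target lies in a different page costs one extra cycle. If you want to ensure this condition at assembly time, use the bccnw/beqnw/etc. macros that are listed at Fixed cycle delay.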
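
The page-crossing penalty can be modelled as follows. This is a minimal Python sketch, not part of the original routines; the function name branch_cycles and the example addresses are made up for illustration. It assumes the usual 6502 rule that the penalty applies when the address of the instruction after the branch and the branch target lie in different 256-byte pages.

```python
def branch_cycles(branch_addr: int, target_addr: int, taken: bool) -> int:
    """Cycle cost of a 6502 relative branch located at branch_addr."""
    if not taken:
        return 2
    # A branch is 2 bytes long; the page comparison uses the address
    # of the instruction that follows it.
    next_pc = (branch_addr + 2) & 0xFFFF
    crosses_page = (next_pc & 0xFF00) != (target_addr & 0xFF00)
    return 4 if crosses_page else 3


# Hypothetical addresses: the same target costs 3 or 4 cycles
# depending on where the branch itself is placed.
print(branch_cycles(0x80FE, 0x8100, True))  # 3
print(branch_cycles(0x80FD, 0x8100, True))  # 4
```

Every cycle count on this page assumes the 3-cycle case for taken branches.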

Inline code

A + 2 cycles of delay (A = 0—1, Z reflects A)

        bne @1
@1:

A + 4 cycles of delay (A = 0—1, Z doesn’t reflect A)

        ora #0
        bne @1
@1:

A + 5 cycles of delay (A = 0—2, Z reflects A)

        beq @2
        lsr
@2:     bne @3
@3:

A + 6 cycles of delay (A = 0—3, Z reflects A)

        beq @2
        lsr
@2:     beq @4
        bcs @4
@4:

A + 9 cycles of delay (A = 0 to 4, passed in as A - 5, i.e. $FB to $FF, with C = 0)

        adc #3  ;  2 2 2 2 2  FE FF 00 01 02
        bcc @4  ;  3 3 2 2 2  FE FF 00 01 02
        lsr     ;  - - 2 2 2  -- -- 00 00 01
        beq @5  ;  - - 3 3 2  -- -- 00 00 01
@4:     lsr     ;  2 2 - - 2  7F 7F -- -- 00
@5:     bcs @6  ;  2 3 2 3 3  7F 7F 00 00 00
@6:

A + 9 cycles of delay (A = 0—5)

        lsr a
        bcs @2
@2:     beq @5
        lsr
        bcs @6
@5:     bne @6
@6:

A + 9 cycles of delay (A = 0—6)

        lsr a
        bcs @2
@2:     beq @6
        lsr
        beq @7
        bcc @7
@6:     bne @7
@7:
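
The branch ladders above can be checked mechanically. The following is a rough Python sketch of a cycle counter for a tiny 6502 subset (lsr A and same-page branches only); the function run, the tuple encoding and the "end" pseudo-instruction are all inventions of this example, and it assumes Z reflects A on entry.

```python
def run(program, a, carry=0):
    """Count cycles for a tiny 6502 subset (lsr A and same-page branches).

    program is a list of (label, mnemonic, target) tuples; the final entry
    must use the pseudo-mnemonic "end" so that branches can target it.
    """
    labels = {label: i for i, (label, _, _) in enumerate(program) if label}
    zero = int(a == 0)  # assumption: Z already reflects A on entry
    pc = cycles = 0
    while True:
        _, op, target = program[pc]
        pc += 1
        if op == "end":
            return cycles
        if op == "lsr":
            carry, a = a & 1, a >> 1
            zero = int(a == 0)
            cycles += 2
            continue
        taken = {"beq": zero == 1, "bne": zero == 0,
                 "bcs": carry == 1, "bcc": carry == 0}[op]
        cycles += 3 if taken else 2
        if taken:
            pc = labels[target]


# Hypothetical encoding of the "A + 9 cycles (A = 0 to 6)" snippet.
A_PLUS_9 = [
    (None, "lsr", None),
    (None, "bcs", "@2"),
    ("@2", "beq", "@6"),
    (None, "lsr", None),
    (None, "beq", "@7"),
    (None, "bcc", "@7"),
    ("@6", "bne", "@7"),
    ("@7", "end", None),
]

print([run(A_PLUS_9, a) for a in range(7)])  # [9, 10, 11, 12, 13, 14, 15]
```

The same encoding works for the shorter ladders, as long as the entry conditions stated in each heading hold.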

A + 15 cycles of delay

        sec     
@L:     sbc #5  
        bcs @L  ;  6 6 6 6 6  FB FC FD FE FF
        adc #3  ;  2 2 2 2 2  FE FF 00 01 02
        bcc @4  ;  3 3 2 2 2  FE FF 00 01 02
        lsr     ;  - - 2 2 2  -- -- 00 00 01
        beq @5  ;  - - 3 3 2  -- -- 00 00 01
@4:     lsr     ;  2 2 - - 2  7F 7F -- -- 00
@5:     bcs @6  ;  2 3 2 3 3  7F 7F 00 00 00
@6:

851968×Y + 3328×A + 13×X + 18 cycles of delay

        iny
@l1:    nop
        nop
@l2:    cpx #1
        dex
        sbc #0
        bcs @l1
        dey
        bne @l2

Callable functions

A + 25 cycles of delay, clobbers A, Z&N, C, V

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A+25 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
                  ;       Cycles              Accumulator         Carry flag
                  ; 0  1  2  3  4  5  6          (hex)           0 1 2 3 4 5 6
                  ;
                  ; 6  6  6  6  6  6  6   00 01 02 03 04 05 06
:      sbc #7     ; carry set by CMP
delay_a_25_clocks:
       cmp #7     ; 2  2  2  2  2  2  2   00 01 02 03 04 05 06   0 0 0 0 0 0 0
       bcs :-     ; 2  2  2  2  2  2  2   00 01 02 03 04 05 06   0 0 0 0 0 0 0
       lsr        ; 2  2  2  2  2  2  2   00 00 01 01 02 02 03   0 1 0 1 0 1 0
       bcs *+2    ; 2  3  2  3  2  3  2   00 00 01 01 02 02 03   0 1 0 1 0 1 0
       beq :+     ; 3  3  2  2  2  2  2   00 00 01 01 02 02 03   0 1 0 1 0 1 0
       lsr        ;       2  2  2  2  2         00 00 01 01 01       1 1 0 0 1
       beq @rts   ;       3  3  2  2  2         00 00 01 01 01       1 1 0 0 1
       bcc @rts   ;             3  3  2               01 01 01           0 0 1
:      bne @rts   ; 2  2              3   00 00             01   0 1         0
@rts:  rts        ; 6  6  6  6  6  6  6   00 00 00 00 01 01 01   0 1 1 1 0 0 1
; Total cycles:    25 26 27 28 29 30 31

A + 27 cycles of delay, clobbers A, Z&N, C, V

This code has a longer overhead than delay_a_25_clocks, but other code can fall through into it, because execution begins at its first instruction.

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A+27 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
delay_a_27_clocks:
        sec     
@L:     sbc #5  
        bcs @L  ;  6 6 6 6 6  FB FC FD FE FF
        adc #3  ;  2 2 2 2 2  FE FF 00 01 02
        bcc @4  ;  3 3 2 2 2  FE FF 00 01 02
        lsr     ;  - - 2 2 2  -- -- 00 00 01
        beq @5  ;  - - 3 3 2  -- -- 00 00 01
@4:     lsr     ;  2 2 - - 2  7F 7F -- -- 00
@5:     bcs @6  ;  2 3 2 3 3  7F 7F 00 00 00
@6:     rts     ;
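
As a cross-check of the A + 27 figure, the routine can be transliterated instruction by instruction. This is an illustrative Python sketch only; the function name delay_a_27_cycles is made up here, and every taken branch is assumed to stay within one page.

```python
def delay_a_27_cycles(a: int) -> int:
    """Cycles used by jsr delay_a_27_clocks for an 8-bit A, JSR and RTS included."""
    cycles = 6                      # jsr
    c = 1
    cycles += 2                     # sec
    while True:
        t = a - 5 - (1 - c)         # @L: sbc #5
        c, a = int(t >= 0), t & 0xFF
        cycles += 2
        if not c:
            cycles += 2             # bcs @L not taken
            break
        cycles += 3                 # bcs @L taken
    t = a + 3 + c                   # adc #3
    c, a = t >> 8, t & 0xFF
    cycles += 2
    do_lsr_at_4 = True
    if c == 0:
        cycles += 3                 # bcc @4 taken
    else:
        cycles += 2                 # bcc @4 not taken
        c, a = a & 1, a >> 1        # lsr
        cycles += 2
        if a == 0:
            cycles += 3             # beq @5 taken, skips the lsr at @4
            do_lsr_at_4 = False
        else:
            cycles += 2             # beq @5 not taken
    if do_lsr_at_4:
        c, a = a & 1, a >> 1        # @4: lsr
        cycles += 2
    cycles += 3 if c else 2         # @5: bcs @6
    return cycles + 6               # @6: rts


print(all(delay_a_27_cycles(a) == a + 27 for a in range(256)))  # True
```

Dropping the 6 cycles each for JSR and RTS gives the A + 15 figure of the inline version.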

256×A + X + 33 cycles of delay, clobbers A, Z&N, C, V

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A. Preserves X,Y. Has relocations.
;;;;;;;;;;;;;;;;;;;;;;;;
:	; do 256-5 cycles.
	sbc #1			; 2 cycles - Carry was set from cmp
	pha
	 lda #(256-5 - 27-7-2-2)	; = 213; the extra -2 accounts for this lda itself
	 jsr delay_a_27_clocks
	pla
delay_256a_x_33_clocks:
	cmp #1			; +2
	bcs :-			; +3 (-1)
	; 0-255 cycles remain, overhead = 4
	txa 			; +2; 6; +27 = 33
	;passthru
<<Place the function delay_a_27_clocks immediately following here>>

Can be trivially changed to swap X, Y.

256×A + X + 33 cycles of delay, relocatable, clobbers A, Y, Z&N, C, V

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A,Y. Preserves X. Relocatable.
;;;;;;;;;;;;;;;;;;;;;;;;
:	; do 256-5 cycles.
	sbc #1			; 2 cycles - Carry was set from cmp
	ldy #48  ;\
        dey      ; |- Clobbers Y; 246 cycles, 253 total
        bpl *-1  ;/
        ldy $A4  ;              ; 3 cycles, 256 total
delay_256a_x_33_clocks_b:
	cmp #1			; +2
	bcs :-			; +3 (-1)
	; 0-255 cycles remain, overhead = 4
	txa 			; +2; 6; +27 = 33
	;passthru
<<Place the function delay_a_27_clocks immediately following here>>

Can be trivially changed to swap X, Y.
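
The 246-cycle figure for the ldy #48 / dey / bpl filler in the Y-clobbering variant can be reproduced with a small model. This is a Python sketch under the same no-page-crossing assumption; the function name bpl_countdown_cycles is hypothetical.

```python
def bpl_countdown_cycles(start: int) -> int:
    """Cycles for: ldy #start / dey / bpl *-1, with 0 <= start <= 127."""
    cycles = 2                   # ldy #imm
    y = start
    while True:
        y = (y - 1) & 0xFF       # dey
        cycles += 2
        if y & 0x80:             # N set: bpl falls through, loop ends
            return cycles + 2
        cycles += 3              # bpl taken


filler = bpl_countdown_cycles(48)
print(filler)                    # 246
# cmp #1 + bcs taken (5), sbc #1 (2), filler, ldy $A4 (3)
print(5 + 2 + filler + 3)        # 256
```

The model reduces to 5 × start + 6 cycles, which is why 48 is the right starting value here and 46 (236 cycles) fits the variant that also has to save and restore registers.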

256×A + X + 33 cycles of delay, relocatable, clobbers A, Z&N, C, V

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A:X clocks+overhead
; Time: 256*A+X+33 clocks (including JSR)
; Clobbers A. Preserves X,Y. Relocatable.
; Does not depend on delay_a_25_clocks.
;;;;;;;;;;;;;;;;;;;;;;;;
:	; do 256 cycles.	; 5 cycles done so far. Loop is 2+1+ 1+2+1+2+1 + 1+1 = 12 bytes.
	sbc #1			; 2 cycles - Carry was set from cmp
        pha       ;\
         txa      ; |
         ldx #46  ; |
         dex      ; |-          ; 247 cycles, 254 total
         bpl *-1  ; |
         tax      ; |
        pla       ;/
        nop                     ; 2 cycles; 256 cycles total
delay_256a_x_33_clocks_c:
	cmp #1			; +2; 2 cycles overhead
	bcs :-			; +2; 4 cycles overhead
	; 0-255 cycles remain, overhead = 4
	txa 			; +2; 6; +27 = 33
	;passthru
<<Place the function delay_a_27_clocks immediately following here>>

256×A + 16 cycles of delay, clobbers A, Z&N, C, V

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A*256 clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A*256+16 clocks (including JSR)
; Depends on delay_a_25_clocks
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256a_16_clocks:
	cmp #0
	bne :+
	rts
delay_256a_11_clocks_:
:       pha
	 lda #(256-25-7-2-2-2-3)	; = 215; 256 minus callee overhead, pha+pla, lda, sec, sbc, bne
	 jsr delay_a_25_clocks
	pla
	sec
	sbc #1
	bne :-
	rts

An alternative that depends on delay_a_27_clocks instead:

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays A*256 clocks + overhead
; Clobbers A. Preserves X,Y.
; Time: A*256+16 clocks (including JSR)
; Depends on delay_a_27_clocks
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256a_16_clocks_b:
	cmp #0
	bne :+
	rts
delay_256a_11_clocks_b_:
:       pha
	 lda #(256-27-7-2-2-2-3)	; = 213; 256 minus callee overhead, pha+pla, lda, sec, sbc, bne
	 jsr delay_a_27_clocks
	pla
	sec
	sbc #1
	bne :-
	rts

256×X + 16 cycles of delay, relocatable, clobbers X, Y, Z&N

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays X*256 clocks + overhead
; Clobbers X,Y. Preserves A. Relocatable.
; Time: X*256+16 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256x_16_clocks:
	cpx #0
	bne :+
	rts
delay_256x_11_clocks_:
	;5 cycles done. Loop is 256 cycles
:       ldy #50
	dey
	bne *-1
	dex
	bne :-
	;Loop end is -1 cycles. Total: 4+JSR+RTS = 16
	rts

Can be trivially changed to swap X, Y.
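
The 256 × X + 16 total for delay_256x_16_clocks can be confirmed by stepping through the routine. This is an illustrative Python transliteration with a made-up function name; it assumes no branch crosses a page.

```python
def delay_256x_16_cycles(x: int) -> int:
    """Cycles used by jsr delay_256x_16_clocks, JSR and RTS included."""
    cycles = 6 + 2               # jsr, cpx #0
    if x == 0:
        return cycles + 2 + 6    # bne not taken, rts
    cycles += 3                  # bne :+ taken
    while True:
        y = 50
        cycles += 2              # ldy #50
        while True:
            y -= 1
            cycles += 2          # dey
            if y == 0:
                cycles += 2      # bne *-1 not taken
                break
            cycles += 3          # bne *-1 taken
        x -= 1
        cycles += 2              # dex
        if x == 0:
            cycles += 2          # bne :- not taken
            break
        cycles += 3              # bne :- taken
    return cycles + 6            # rts


print(delay_256x_16_cycles(0), delay_256x_16_cycles(1))  # 16 272
```

Each outer pass costs 2 + 249 + 2 + 3 = 256 cycles, and the final untaken bne is one cycle cheaper, which the taken bne :+ at entry makes up for.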

256×X + A + 30 cycles of delay, clobbers A, X, Z&N, C, V

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays X*256+A clocks + overhead
; Clobbers A,X. Preserves Y.
; Depends on delay_a_25_clocks within short branch distance
; Time: X*256+A+30 clocks (including JSR)
;;;;;;;;;;;;;;;;;;;;;;;;
delay_256x_a_30_clocks:
        cpx #0                  ;2
        beq delay_a_25_clocks   ;3
        ;4 cycles done. Must consume 256 cycles; 252 cycles remain.
        pha                             ;3
         lda #(256-4-(3+2+4+2+3))-25    ;2
         jsr delay_a_25_clocks          ;238
        pla                             ;4
        dex                             ;2
        jmp delay_256x_a_30_clocks      ;3

Can be trivially changed to swap X, Y.
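
The constant loaded before the jsr in delay_256x_a_30_clocks is what remains of the 256-cycle budget once every other instruction in the loop and the callee's fixed overhead are paid for. A short Python sketch of that bookkeeping follows; the dictionary and function names are invented for this example.

```python
# Cycle costs of the loop instructions other than the jsr itself.
LOOP_COSTS = {
    "cpx #0": 2,
    "beq (not taken)": 2,
    "pha": 3,
    "lda #imm": 2,
    "pla": 4,
    "dex": 2,
    "jmp abs": 3,
}


def filler_constant(period: int = 256, callee_overhead: int = 25) -> int:
    """Value to load into A so that one loop pass takes exactly `period` cycles."""
    return period - sum(LOOP_COSTS.values()) - callee_overhead


print(filler_constant())  # 213
```

The result equals the expression (256-4-(3+2+4+2+3))-25 used in the routine. The lda itself has to be part of the budget, which is easy to overlook.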

851968×Y + 3328×A + 13×X + 30 cycles of delay, clobbers A, X, Y, Z&N, C, V

;;;;;;;;;;;;;;;;;;;;;;;;
; Delays 30+13*(65536*Y+256*A+X) cycles including JSR.
; Clobbers A,X,Y.
delay_851968y_3328a_13x_30_clocks:
        iny
@l1:    nop
        nop
@l2:    cpx #1
        dex
        sbc #0
        bcs @l1
        dey
        bne @l2
        rts
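
The formula 30 + 13 × (65536 × Y + 256 × A + X) can be checked by simulating the loop, where cpx #1 / dex / sbc #0 together act as a borrow-propagating decrement of the 16-bit value A:X. This is a Python sketch with a hypothetical function name, assuming same-page branches.

```python
def delay_13_cycles(y: int, a: int, x: int) -> int:
    """Cycles used by jsr delay_851968y_3328a_13x_30_clocks, JSR and RTS included."""
    cycles = 6                       # jsr
    y = (y + 1) & 0xFF
    cycles += 2                      # iny
    enter_at_l1 = True
    while True:
        if enter_at_l1:
            cycles += 4              # @l1: nop, nop
        c = int(x >= 1)              # @l2: cpx #1
        x = (x - 1) & 0xFF           # dex
        t = a - (1 - c)              # sbc #0
        c, a = int(t >= 0), t & 0xFF
        cycles += 6
        if c:
            cycles += 3              # bcs @l1 taken
            enter_at_l1 = True
            continue
        cycles += 2                  # bcs @l1 not taken
        y = (y - 1) & 0xFF
        cycles += 2                  # dey
        if y != 0:
            cycles += 3              # bne @l2 taken (skips the nops)
            enter_at_l1 = False
            continue
        return cycles + 2 + 6        # bne @l2 not taken, rts


print(delay_13_cycles(0, 0, 0))      # 30
print(delay_13_cycles(0, 1, 0))      # 3358
```

Branching back to @l2 instead of @l1 skips the two nops, which compensates for the extra dey and bne on the pass where A:X underflows.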

See also