Cycle counting: Difference between revisions

Revision as of 06:22, 11 October 2020

It is often useful to delay for a specific number of CPU cycles, for example when timing raster effects or generating PCM audio. This article outlines a few relevant techniques.

Instruction timings

Comprehensive guides^[1]^[2] list the timing of every instruction, but a few rules of thumb^[3] help in remembering most of them:

Each byte of memory read or written adds another cycle to the instruction. This includes fetching the opcode and each byte of its operand, then any memory the instruction references.
Indexed instructions whose effective address crosses a page boundary take 1 extra cycle to fix up the high byte of the address first.
Read-modify-write instructions take an extra cycle.
Instructions that modify the stack take extra cycles.
"Extra" cycles often include an extra read or write that usually does not affect the outcome.
There is a minimum of 2 cycles per instruction.

Examples:

SEC - 2 cycles: 1 byte opcode, but has to wait for the 2-cycle minimum.
AND #imm - 2 cycles: opcode + operand = 2 bytes. Only affects registers.
LDA zp - 3 cycles: opcode + operand + byte fetched from zp.
STA abs - 4 cycles: opcode + 2 byte operand + byte written to abs.
LDA abs, X - 4 or 5 cycles: opcode + 2 byte operand + read from the indexed address, plus 1 extra cycle if adding X to the address crosses a page.
ASL zp - 5 cycles: opcode + operand + read from zp + write to zp, but it takes 1 extra cycle to modify the value.
LDA (indirect), Y - 5 or 6 cycles: opcode + operand + two reads from zp + read from indirect address. 1 extra cycle if a page is crossed.
STA (indirect), Y - 6 cycles: like LDA (indirect), Y, but it always assumes the worst case of a page crossing, so it always spends the extra cycle on a read in case the page correction is needed.
PHA - 3 cycles: opcode + stack write, but requires 1 extra cycle to perform the stack operation.
RTS - 6 cycles: opcode + two stack reads, but requires 3 extra cycles to perform the stack operations.
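
As a rough cross-check, the rules of thumb can be turned into a small estimator. This is a Python sketch only: `estimate_cycles` and `page_crossed` are hypothetical helpers, and the argument values for each instruction are one reading of the examples above rather than a cycle-exact bus trace, so the references remain the authority.

```python
def page_crossed(base: int, index: int) -> bool:
    """True if base + index lies on a different 256-byte page than base."""
    return (base & 0xFF00) != ((base + index) & 0xFF00)


def estimate_cycles(operand_bytes: int, reads: int = 0, writes: int = 0,
                    extra: int = 0, page_penalty: bool = False) -> int:
    """Rule-of-thumb cycle estimate for one instruction (not a bus trace).

    operand_bytes: bytes following the opcode (0, 1 or 2)
    reads, writes: data bytes read or written, including zero-page
                   pointer bytes and stack accesses
    extra:         internal cycles (read-modify-write, stack handling)
    page_penalty:  True if an indexed read crosses a page
    """
    total = 1 + operand_bytes + reads + writes + extra  # 1 = opcode fetch
    if page_penalty:
        total += 1
    return max(2, total)  # every instruction takes at least 2 cycles


# The examples from the list above, expressed with the hypothetical helper.
EXAMPLES = {
    "SEC": estimate_cycles(0),
    "AND #imm": estimate_cycles(1),
    "LDA zp": estimate_cycles(1, reads=1),
    "STA abs": estimate_cycles(2, writes=1),
    "LDA $12F0, X with X=$20": estimate_cycles(
        2, reads=1, page_penalty=page_crossed(0x12F0, 0x20)),
    "ASL zp": estimate_cycles(1, reads=1, writes=1, extra=1),
    "LDA (indirect), Y, no crossing": estimate_cycles(1, reads=3),
    "STA (indirect), Y": estimate_cycles(1, reads=2, writes=1, extra=1),
    "PHA": estimate_cycles(0, writes=1, extra=1),
    "RTS": estimate_cycles(0, reads=2, extra=3),
}

for name, cycles in EXAMPLES.items():
    print(f"{name}: {cycles} cycles")
```

STA (indirect), Y is modeled with a fixed extra cycle instead of a page penalty, matching the example list: the write always pays for the possible page correction.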

Short delays

Here are a few ways to create short delays with few or no side effects. Because the shortest instruction takes 2 cycles, a delay of exactly 1 cycle is not possible on its own.

NOP - 2 cycles, 1 byte, no side effects
JMP *+3 - 3 cycles, 3 bytes, no side effects
Bxx *+2 - 3 cycles, 2 bytes, no side effects but requires a known flag state (e.g. BCC if carry is known to be clear)
IGN zp - 3 cycles, 2 bytes, only side effect is a read, unofficial instruction
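
Combining these, any delay of 2 cycles or more can be built: one 3-cycle instruction absorbs an odd count and 2-cycle NOPs fill the rest. The Python sketch below (helper names are hypothetical) uses only the official, side-effect-free NOP and JMP *+3; Bxx *+2 or IGN zp would save a byte in place of the JMP, with the caveats listed above. For longer delays, filling with NOPs quickly wastes bytes.

```python
# Each entry: (mnemonic, cycles, bytes), taken from the list above.
NOP = ("NOP", 2, 1)
JMP_NEXT = ("JMP *+3", 3, 3)


def compose_delay(cycles: int) -> list:
    """Return an instruction sequence that takes exactly `cycles` cycles.

    One JMP *+3 absorbs an odd cycle count; NOPs cover the remainder.
    """
    if cycles < 2:
        raise ValueError("the shortest instruction takes 2 cycles")
    sequence = []
    if cycles % 2 == 1:
        sequence.append(JMP_NEXT)
        cycles -= 3
    sequence.extend([NOP] * (cycles // 2))
    return sequence


def totals(sequence) -> tuple:
    """(cycles, bytes) consumed by a sequence from compose_delay."""
    return (sum(c for _, c, _ in sequence), sum(b for _, _, b in sequence))


print([m for m, _, _ in compose_delay(7)], totals(compose_delay(7)))
```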

Clockslide

A clockslide^[4] is a sequence of instructions that wastes a small constant number of cycles plus one cycle per executed byte, whether it is entered at an odd or an even address. With official instructions, one can construct a clockslide from CMP instructions:

... C9 C9 C9 C9 C5 EA

Disassembled from the start, this is CMP #$C9, CMP #$C9, CMP $EA (6 bytes, 7 cycles). Disassembled from one byte in, it is CMP #$C9, CMP #$C5, NOP (5 bytes, 6 cycles). The entry point can be controlled with an indirect jump or the RTS Trick to precisely control the timing of raster effects or sample playback.
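
The one-cycle-per-byte property of the C9 ... C5 EA sequence can be checked by decoding the bytes from every possible entry point, the way the CPU would. This is a Python sketch with hypothetical helper names (`cmp_slide`, `cycles_from`); its timing table covers only the three opcodes in the slide, and it counts only the slide itself, not the jump into it.

```python
# (instruction length in bytes, cycles) for the three official opcodes used
TIMING = {
    0xC9: (2, 2),  # CMP #imm
    0xC5: (2, 3),  # CMP zp
    0xEA: (1, 2),  # NOP
}


def cmp_slide(filler: int) -> bytes:
    """`filler` bytes of $C9 followed by $C5 $EA."""
    return bytes([0xC9] * filler + [0xC5, 0xEA])


def cycles_from(slide: bytes, offset: int) -> int:
    """Decode and time the slide starting at `offset`."""
    pc, total = offset, 0
    while pc < len(slide):
        size, cycles = TIMING[slide[pc]]
        total += cycles
        pc += size
    return total


slide = cmp_slide(4)  # C9 C9 C9 C9 C5 EA
timings = [cycles_from(slide, off) for off in range(len(slide))]
print(timings)  # prints [7, 6, 5, 4, 3, 2]: bytes executed + 1 at every entry
```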

CMP has the side effect of modifying the N, Z, and C flags, but unofficial instructions that skip one byte can be used to preserve them. For example, replace $C9 (CMP #imm) with $89 or $80, which skip one immediate byte, and replace $C5 (CMP zp) with $04, $44, or $64, which read a byte from zero page and ignore it.
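
Because every entry point costs one cycle per remaining byte plus one, the entry offset for a desired delay follows directly: to spend d cycles in a slide of n bytes, enter d - 1 bytes before its end. A minimal Python sketch, assuming the unofficial opcodes time as described above ($80/$89 skip a byte in 2 cycles, $04/$44/$64 read zero page in 3 cycles); `make_slide`, `entry_offset` and `slide_cycles` are hypothetical helpers, and the fixed cost of the indirect jump or RTS into the slide is not included.

```python
# (length in bytes, cycles); unofficial opcodes as described in the text
TIMING = {
    0x80: (2, 2), 0x89: (2, 2),                 # skip one immediate byte
    0x04: (2, 3), 0x44: (2, 3), 0x64: (2, 3),   # read zero page, ignore it
    0xEA: (1, 2),                               # official NOP
}


def make_slide(length: int, skip: int = 0x80, ignore: int = 0x04) -> bytes:
    """Flag-preserving clockslide of `length` bytes, ending in ignore, NOP."""
    if length < 2:
        raise ValueError("the slide needs at least the final two bytes")
    return bytes([skip] * (length - 2) + [ignore, 0xEA])


def entry_offset(slide: bytes, delay: int) -> int:
    """Offset to enter at so the slide itself takes `delay` cycles."""
    if not 2 <= delay <= len(slide) + 1:
        raise ValueError("delay out of range for this slide")
    return len(slide) - (delay - 1)


def slide_cycles(slide: bytes, offset: int) -> int:
    """Decode and time the slide from `offset` to its end."""
    pc, total = offset, 0
    while pc < len(slide):
        size, cycles = TIMING[slide[pc]]
        total += cycles
        pc += size
    return total


s = make_slide(6)  # 80 80 80 80 04 EA
print(entry_offset(s, 4), slide_cycles(s, entry_offset(s, 4)))
```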

References

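[1] Obelisk: 6502 instruction reference (http://www.obelisk.me.uk/6502/reference.html)

[2] 6502_cpu.txt: cycle-by-cycle instruction behaviour (http://nesdev.org/6502_cpu.txt)

[3] Forum: Is there a logic to instruction timings? (https://forums.nesdev.org/viewtopic.php?p=256515)

[4] Clockslide: How to waste an exact number of clock cycles on the 6502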
