Cycle counting: Difference between revisions

From NESdev Wiki

Revision as of 06:22, 11 October 2020

It is often useful to delay a specific number of CPU cycles. Timing raster effects or generating PCM audio are some examples that might utilize this. This article outlines a few relevant techniques.

Instruction timings

You can use a comprehensive guide[1][2] as a reference for instruction timings, but a few rules of thumb[3] can help you remember most of them:

  • Each byte read from or written to memory adds one cycle to the instruction. This includes fetching the opcode, each byte of its operand, and any memory the instruction references.
  • Indexed instructions that cross a page boundary take 1 extra cycle to adjust the high byte of the effective address first.
  • Read-modify-write instructions take an extra cycle.
  • Instructions that modify the stack take extra cycles.
  • "Extra" cycles often include an extra read or write that usually does not affect the outcome.
  • There is a minimum of 2 cycles per instruction.

Examples:

  • SEC - 2 cycles: 1 byte opcode, but has to wait for the 2-cycle minimum.
  • AND #imm - 2 cycles: opcode + operand = 2 bytes. Only affects registers.
  • LDA zp - 3 cycles: opcode + operand + byte fetched from zp.
  • STA abs - 4 cycles: opcode + 2 byte operand + byte written to abs.
  • LDA abs, X - 4 or 5 cycles: opcode + 2 byte operand + read from abs, but if the addition of the X index causes a page crossing it delays 1 extra cycle.
  • ASL zp - 5 cycles: opcode + operand + read from zp + write to zp, but it takes 1 extra cycle to modify the value.
  • LDA (indirect), Y - 5 or 6 cycles: opcode + operand + two reads from zp + read from indirect address. 1 extra cycle if a page is crossed.
  • STA (indirect), Y - 6 cycles: like LDA (indirect), Y but assumes the worst case of a page crossing, so it always spends 1 extra cycle reading in case the page correction is being applied.
  • PHA - 3 cycles: opcode + stack write, but requires 1 extra cycle to perform the stack operation.
  • RTS - 6 cycles: opcode + two stack reads, but requires 3 extra cycles to perform the stack operations.
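The rules of thumb above can be checked against the listed examples with a small model. This is only a sketch for illustration: the function name `estimate_cycles` and its parameters are hypothetical, and the model encodes just the rules stated here (one cycle per instruction byte, one per data byte accessed, plus extra cycles, with a 2-cycle minimum). It is not a substitute for a full timing reference.

```python
# Rule-of-thumb cycle estimate for 6502 instructions (illustrative model only).
# estimate_cycles and its parameter names are hypothetical helpers.

def estimate_cycles(instr_bytes, data_accesses=0, extra=0):
    """One cycle per instruction byte fetched, one per data byte read or
    written, plus any extra cycles, with a floor of 2 cycles."""
    return max(2, instr_bytes + data_accesses + extra)

# (instruction bytes, data bytes read/written, extra cycles) for each example.
EXAMPLES = {
    "SEC":                      (1, 0, 0),  # hits the 2-cycle minimum
    "AND #imm":                 (2, 0, 0),
    "LDA zp":                   (2, 1, 0),
    "STA abs":                  (3, 1, 0),
    "LDA abs,X (same page)":    (3, 1, 0),
    "LDA abs,X (page crossed)": (3, 1, 1),
    "ASL zp":                   (2, 2, 1),  # read + write, 1 extra to modify
    "LDA (zp),Y (same page)":   (2, 3, 0),  # two pointer reads + data read
    "LDA (zp),Y (page crossed)": (2, 3, 1),
    "STA (zp),Y":               (2, 3, 1),  # always pays the page-fix cycle
    "PHA":                      (1, 1, 1),
    "RTS":                      (1, 2, 3),
}

for name, args in EXAMPLES.items():
    print(f"{name:28} {estimate_cycles(*args)} cycles")
```

The per-instruction "extra" values are taken directly from the examples rather than derived, so the model is a memory aid, not a predictor for arbitrary opcodes.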

Short delays

Here are a few ways to create short delays without side effects. As the shortest instruction time is 2 cycles, it is not possible to delay 1 cycle on its own.

  • NOP - 2 cycles, 1 byte, no side effects
  • JMP *+3 - 3 cycles, 3 bytes, no side effects
  • Bxx *+2 - 3 cycles, 2 bytes, no side effects but requires a known flag state (e.g. BCC if carry is known to be clear)
  • IGN zp - 3 cycles, 2 bytes, only side effect is a read, unofficial instruction
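Because 2-cycle and 3-cycle delays are both available, any delay of 2 or more cycles can be built from them. The following sketch shows one way to pick the instructions, using only NOP and JMP *+3 from the list above. The helper name `delay_sequence` and the lookup tables are hypothetical; other combinations (for example, a branch in place of the jump to save a byte) are equally valid.

```python
# Build an N-cycle delay from side-effect-free official instructions.
# delay_sequence, CYCLES and BYTES are hypothetical names for illustration.

CYCLES = {"NOP": 2, "JMP *+3": 3}
BYTES = {"NOP": 1, "JMP *+3": 3}

def delay_sequence(cycles):
    """Return a list of mnemonics whose total time is exactly `cycles`."""
    if cycles < 0 or cycles == 1:
        # The shortest instruction takes 2 cycles, so 1 cycle is impossible.
        raise ValueError("cannot delay exactly %d cycle(s)" % cycles)
    seq = []
    if cycles % 2 == 1:
        # An odd total needs exactly one 3-cycle instruction.
        seq.append("JMP *+3")
        cycles -= 3
    # The remainder is even and is filled with 2-cycle NOPs.
    seq.extend(["NOP"] * (cycles // 2))
    return seq

for n in (2, 3, 7, 10):
    seq = delay_sequence(n)
    size = sum(BYTES[m] for m in seq)
    print(f"{n:2} cycles, {size} bytes: {', '.join(seq)}")
```

For long delays this approach costs roughly one byte per two cycles, so a loop is usually a better fit once the delay grows beyond a handful of instructions.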

Clockslide

A clockslide[4] is a sequence of instructions that wastes a small constant number of cycles plus one cycle per executed byte, regardless of whether it is entered at an odd or even address. With official instructions, one can construct a clockslide from CMP instructions:

  ... C9 C9 C9 C9 C5 EA

  • Disassembled from the start: CMP #$C9, CMP #$C9, CMP $EA (6 bytes, 7 cycles).
  • Disassembled one byte in: CMP #$C9, CMP #$C5, NOP (5 bytes, 6 cycles).

The entry point can be selected with an indirect jump or the RTS Trick to precisely control raster effect or sample playback timing.
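The behaviour of the CMP clockslide can be verified by decoding the byte sequence from every possible entry point. The sketch below is a toy decoder that knows only the three opcodes involved ($C9, $C5, $EA); the names `OPCODES` and `run_slide` are hypothetical. Under these assumptions, each entry point costs exactly one cycle per remaining byte plus one constant cycle.

```python
# Toy decoder for the CMP clockslide; handles only the opcodes used in it.
# OPCODES and run_slide are hypothetical names for illustration.

SLIDE = bytes.fromhex("C9 C9 C9 C9 C5 EA")

# opcode -> (format string, size in bytes, cycles)
OPCODES = {
    0xC9: ("CMP #${:02X}", 2, 2),  # CMP immediate
    0xC5: ("CMP ${:02X}", 2, 3),   # CMP zero page
    0xEA: ("NOP", 1, 2),
}

def run_slide(code, entry):
    """Decode `code` starting at offset `entry`; return (listing, cycles)."""
    pc, cycles, listing = entry, 0, []
    while pc < len(code):
        fmt, size, cost = OPCODES[code[pc]]
        # Operand bytes (if any) follow the opcode byte.
        listing.append(fmt.format(*code[pc + 1:pc + size]))
        cycles += cost
        pc += size
    return listing, cycles

for entry in range(len(SLIDE)):
    listing, cycles = run_slide(SLIDE, entry)
    executed = len(SLIDE) - entry
    print(f"entry +{entry}: {executed} bytes, {cycles} cycles: {', '.join(listing)}")
```

Every entry point ends on the final NOP or on the CMP zero-page instruction that consumes it, which is what keeps the total at "bytes executed plus one" for both odd and even offsets.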

CMP has the side effect of destroying most of the flags, but unofficial instructions that skip one byte can be used to preserve them. For example, replace $C9 (CMP) with $89 or $80, each of which skips one immediate byte, and replace $C5 with $04, $44, or $64, each of which reads a byte from zero page and ignores it.

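References

  1. Obelisk: 6502 instruction reference, http://www.obelisk.me.uk/6502/reference.html
  2. 6502_cpu.txt: cycle-by-cycle instruction behaviour, http://nesdev.org/6502_cpu.txt
  3. Forum: Is there a logic to instruction timings?, https://forums.nesdev.org/viewtopic.php?p=256515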