8-bit Multiply

From NESdev Wiki
Revision as of 10:44, 12 October 2017 by Rainwarrior (talk | contribs) (→‎frantik: it seems more responsible just to fix the infinite loop case than add a warning about it)
Jump to navigationJump to search

Since the 6502 CPU has no multiplication instruction, this feature has to be written in software.

Bregalad

This routine by Bregalad does binary long multiplication (a.k.a. "the Russian Peasant method").

;8-bit multiply
;by Bregalad
;Enter with A,Y, numbers to multiply
;Output with YA = 16-bit result (X is unchanged)
Multiply:
	sty Factor  ;Store input factor
	ldy #$00
	sty Res
	sty Res2    ;Clear result
	ldy #$08    ;Number of shifts needed

-	lsr A       ;Shift right input number
	bcc +       ;Check if bit is set
	pha
	lda Res2
	clc
	adc Factor
	sta Res2    ;If so add number to result
	pla
+	lsr Res2    ;Shift result right
	ror Res
	dey
	bne -
	lda Res
	ldy Res2
	rts

An optimization for efficiency is made here; binary long multiplication requires adding one multiplicand to the result at various bit-shifts (i.e. multiply by each power of 2). The naive approach might maintain the value to add as a 16-bit value, left shifting it once each iteration to reach the next power of 2. This one, however, takes advantage of the input being only 8-bits wide, and instead pre-multiplies the result by 256 (8 bits), and each iteration instead right-shifts the result. After 8 iterations the pre-multiply is undone, and the advantage gained is that only the shift is 16-bit; adding the multiplicand remains an efficient 8-bit add.

Assuming no page crossings and zero page, this routine takes 184-320 cycles, not counting the JSR to call it. (Each non-zero bit in A adds 17 cycles.)

frantik

This routine by frantik is another binary long multiplication.

; Multiply two bytes in memory using Russian peasant algorithm
; by frantik

; Accepts: value1 and value2, labels for bytes in memory
;   value2 should ideally be the lesser of the two input values 
;   for increased efficiency
; Uses: $00, $01, $02 for temporary variables
; Returns: 16 bit value in $00 and $01

.macro multiply value1, value2

ret  = $00              ; return value
temp = $02              ; temp storage

	lda #$00        ; clear temporary variables
	sta ret
	sta ret+1
	sta temp
	lda value2
	bne +end
	jmp +start:

-loop:
	asl value1      ; double first value
	rol temp        ; using 16bit precision
	lsr value2      ; halve second vale
+start:
	lda value2      ;
	and #01         ; is new 2nd value an odd number?
	beq -loop:      ; 
	clc             ; if so, add new 1st value to running total
	lda ret         ;
	adc value1      ;
	sta ret         ;
	lda ret+1       ;
	adc temp        ;
	sta ret+1       ;
	lda value2      ;
	cmp #01         ; is 2nd value 1?  if so, we're done
	bne -loop:      ; otherwise, loop
+end:
.endm

Assuming no page crossings and zero page, this routine takes 17-403 cycles, though it is an in-place macro generating 48 bytes of new code each time it is used.

External Links

  • Forum post: Fast multi, ... - Faster multiplication routine using lookup tables, by keldon.