AMD64 Multiprecision Arithmetic
Eric Bainville - Dec 2006Division by a scalar
This function divides the integer represented by (Z,N) by a single word k. Let's start with a naive implementation, using the 64-bit div instruction:
; Loop from msb to lsb
lea Z, [Z + 8*N]
; rem is the input remainder, in 0..k-1
xor REM, REM
align 16
.a
lea Z, [Z - 8]
mov RDX, REM
mov RAX, [Z]
div K
mov [Z], RAX
mov REM, RDX
dec N
jnz .a
; rem is the output remainder, in 0..k-1
The div instruction is very slow, and we get near 74 cycles/word! Efficient alternatives require using mul instead. The trick to do this is to compute once a pseudo inverse k'=264/k, and multiply by k' instead of divide by k. At each step, we multiply ai+264r (the current word combined with the incoming remainder r in 0..k-1) by k' to get a close (and always lower) guess of the effective quotient. Then we multiply it by k to get the next remainder, and make corrections until the remainder is less than k. Actually, we shift all values to maximize the number of exact bits obtained by a single product by the pseudo inverse. The exact quotient and remainder are then determined by additions iteratively.
; Loop from msb to lsb
lea Z, [Z + 8*N]
; rem is the input remainder, in 0..k-1
xor REM, REM
; Build the pseudo-inverse of k: 2^(127-)/k
; First determine the shift in rcx to get the max number of bits in kinv
xor RCX, RCX
mov RAX, K
mov RDX, 1
.kinv1:
inc RCX
ror RDX, 1
shl RAX, 1
jnc .kinv1
dec RCX
; Here, rcx is a left shift moving the msb of k to bit 63
mov MASK, 1
shl MASK, cl
dec MASK
ror MASK, cl ; rcx bits at msb
; Then divide 2^(64+cx) by k (rdx already ok)
xor RAX, RAX
div K
mov KINV, RAX
; Here kinv has 64 bits
align 16
.a:
; Get 64 bits of quotient approx, multiplying
; most significant word of (rem*2^64+input)
mov RAX, MASK
and RAX, [Z - 8]
or RAX, REM
rol RAX, cl
mul KINV
rcl RAX, 1
rcl RDX, 1
mov QUOT, RDX
; Multiply by k and subtract to get remainder
; Subtraction must be done on two words
mov HREM, REM
mov REM, [Z - 8]
mov RAX, QUOT
mul K
sub REM, RAX
sbb HREM, RDX
jz .b ; high word is 0, goto adjust on single word
; Adjust quotient and remainder on two words
.d: inc QUOT
sub REM, K
sbb HREM, 0
jnz .d
; Adjust quotient and remainder on single word
.b: cmp REM, K
jb .c ; rem in 0..k-1, OK
sub REM, K
inc QUOT
jmp .b
; Store result
.c: mov [Z - 8], QUOT
lea Z, [Z - 8]
dec N
jnz .a
This code runs at 17 cycles/word.
| AMD64 Multiprecision : Left and right shifts | Top of Page | Intel64 Multiprecision |
