Schönhage–Strassen algorithm



[Figure caption: The Schönhage–Strassen algorithm is based on the fast Fourier transform (FFT) method of integer multiplication. The figure demonstrates multiplying 1234 × 5678 = 7,006,652 using the simple FFT method, with base 10 used in place of base 2^w for illustrative purposes.]
[Photo caption: Schönhage (on the right) and Strassen (on the left) playing chess in Oberwolfach, 1979.]

The Schönhage–Strassen algorithm is an asymptotically fast multiplication algorithm for large integers, published by Arnold Schönhage and Volker Strassen in 1971.[1] It works by recursively applying fast Fourier transforms (FFT) over the integers modulo 2^n + 1. The run-time bit complexity to multiply two n-digit numbers using the algorithm is O(n · log n · log log n) in [[big O notation|big O notation]].

The Schönhage–Strassen algorithm was the asymptotically fastest multiplication method known from 1971 until 2007. It is asymptotically faster than older methods such as Karatsuba and Toom–Cook multiplication, and starts to outperform them in practice for numbers beyond about 10,000 to 100,000 decimal digits.[2] In 2007, Martin Fürer published an algorithm with faster asymptotic complexity.[3] In 2019, David Harvey and Joris van der Hoeven demonstrated that multi-digit multiplication has theoretical O(n log n) complexity; however, their algorithm has constant factors which make it impossibly slow for any conceivable practical problem (see galactic algorithm).[4]

Applications of the Schönhage–Strassen algorithm include large computations done for their own sake such as the Great Internet Mersenne Prime Search and [[Approximations of π|approximations of π]], as well as practical applications such as Lenstra elliptic curve factorization via Kronecker substitution, which reduces polynomial multiplication to integer multiplication.[5][6]

Description

This section has a simplified version of the algorithm, showing how to compute the product ab of two natural numbers a, b, modulo a number of the form 2^n + 1, where n = 2^k · M is some fixed number. The integers a, b are to be divided into D = 2^k blocks of M bits, so in practical implementations, it is important to strike the right balance between the parameters M and k. In any case, this algorithm will provide a way to multiply two positive integers, provided n is chosen so that ab < 2^n + 1.

Let n = DM be the number of bits in the signals a and b, where D = 2^k is a power of two. Divide the signals a and b into D blocks of M bits each, storing the resulting blocks as arrays A, B (whose entries we shall consider for simplicity as arbitrary precision integers).

We now select a modulus for the Fourier transform, as follows. Let M′ be such that D·M′ ≥ 2M + k. Also put n′ = D·M′, and regard the elements of the arrays A, B as (arbitrary precision) integers modulo 2^{n′} + 1. Observe that since 2^{n′} + 1 ≥ 2^{2M+k} + 1 = D·2^{2M} + 1, the modulus is large enough to accommodate any carries that can result from multiplying a and b. Thus, the product ab (modulo 2^{n′} + 1) can be calculated by evaluating the convolution of A, B. Also, with g = 2^{2M′}, we have g^{D/2} ≡ −1 (mod 2^{n′} + 1), and so g is a primitive Dth root of unity modulo 2^{n′} + 1.

We now take the discrete Fourier transform of the arrays A, B in the ring Z/(2^{n′} + 1)Z, using the root of unity g for the Fourier basis, giving the transformed arrays Â, B̂. Because D = 2^k is a power of two, this can be achieved in O(D log D) ring operations using a fast Fourier transform.

Let Ĉ_i = Â_i · B̂_i (pointwise product), and compute the inverse transform C of the array Ĉ, again using the root of unity g. The array C is now the convolution of the arrays A and B. Finally, the product ab (mod 2^{n′} + 1) is given by evaluating ab ≡ Σ_j C_j · 2^{Mj} (mod 2^{n′} + 1).

This basic algorithm can be improved in several ways. Firstly, it is not necessary to store the digits of a, b to arbitrary precision, but rather only up to n′ + 1 bits, which gives a more efficient machine representation of the arrays A, B. Secondly, the multiplications by powers of g in the forward transforms are simple bit shifts. With some care, it is also possible to compute the inverse transform using only shifts. Taking care, it is thus possible to eliminate any true multiplications from the algorithm except for where the pointwise product Ĉ_i = Â_i · B̂_i is evaluated. It is therefore advantageous to select the parameters D and M so that this pointwise product can be performed efficiently, either because it is a single machine word or using some optimized algorithm for multiplying integers of an (ideally small) number of words. Selecting the parameters D, M is thus an important area for further optimization of the method.
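The simplified algorithm above can be sketched in Python. This is an illustrative sketch only, not the full modular method: instead of working modulo 2^n + 1 on the outside, it zero-pads the block arrays to length 2D so the cyclic convolution equals the acyclic one and the product ab is recovered exactly; the transform is a naive recursive NTT over Z/(2^{n′} + 1) with a power-of-two root of unity, and the function and parameter names are of our choosing.

```python
def ntt(v, root, mod):
    """Recursive radix-2 number-theoretic transform over Z/(mod)."""
    n = len(v)
    if n == 1:
        return v[:]
    even = ntt(v[0::2], root * root % mod, mod)
    odd = ntt(v[1::2], root * root % mod, mod)
    out = [0] * n
    w = 1
    for i in range(n // 2):
        t = w * odd[i] % mod
        out[i] = (even[i] + t) % mod
        out[i + n // 2] = (even[i] - t) % mod   # uses root^(n/2) = -1
        w = w * root % mod
    return out

def ssa_multiply(a, b, k, M):
    """Multiply nonnegative a, b (each < 2^(D*M), D = 2^k) via an NTT."""
    D = 1 << k
    L = 2 * D                       # transform length; padding avoids wraparound
    mask = (1 << M) - 1
    A = [(a >> (M * i)) & mask for i in range(D)] + [0] * D
    B = [(b >> (M * i)) & mask for i in range(D)] + [0] * D
    # choose n' (a multiple of L/2) so every convolution entry fits:
    # entries are bounded by D * 2^(2M) <= 2^(2M + k)
    half = L // 2
    n2 = half
    while n2 < 2 * M + k + 1:
        n2 += half
    mod = (1 << n2) + 1
    root = 1 << (2 * n2 // L)       # 2 has order 2n' mod 2^{n'}+1, so this
                                    # is a primitive L-th root of unity
    Ah, Bh = ntt(A, root, mod), ntt(B, root, mod)
    Ch = [x * y % mod for x, y in zip(Ah, Bh)]  # pointwise product
    C = ntt(Ch, pow(root, -1, mod), mod)        # inverse transform
    inv_L = pow(L, -1, mod)
    C = [c * inv_L % mod for c in C]            # normalize by 1/L
    return sum(c << (M * j) for j, c in enumerate(C))

print(ssa_multiply(1234, 5678, 2, 4))   # D = 4 blocks of M = 4 bits each
```

Because every multiplication by a power of `root` is a multiplication by a power of two, a real implementation replaces them with shifts, as the text notes.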

Details

Every number in base B can be written as a polynomial:

X = Σ_{i=0}^{N} x_i B^i

Furthermore, multiplication of two numbers can be thought of as a product of two polynomials:

XY = (Σ_{i=0}^{N} x_i B^i)(Σ_{j=0}^{N} y_j B^j)

Because, collecting the coefficient of B^k: c_k = Σ_{(i,j): i+j=k} a_i b_j = Σ_{i=0}^{k} a_i b_{k−i}, we have a convolution.
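This convolution identity can be checked directly in plain Python, reusing the base-10 example 1234 × 5678 from the figure caption above (a naive quadratic-time convolution, for illustration only):

```python
def convolve(a, b):
    # naive convolution: c_k = sum over i+j=k of a_i * b_j
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

# base-10 digit blocks of 1234 and 5678, least significant first
A = [4, 3, 2, 1]
B = [8, 7, 6, 5]
C = convolve(A, B)
# carrying (evaluating the polynomial at B = 10) recovers the product
product = sum(c * 10**k for k, c in enumerate(C))
print(C, product)
```

Note that the entries of `C` can exceed 9; the final carry propagation is exactly the evaluation Σ_k c_k · 10^k.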

By using the FFT (fast Fourier transform), used in the original version rather than the NTT (number-theoretic transform),[7] together with the convolution rule, we get

f̂(a ∗ b) = f̂(Σ_{i=0}^{k} a_i b_{k−i}) = f̂(a) · f̂(b).

That is, ĉ_k = â_k · b̂_k, where ĉ_k is the corresponding coefficient in Fourier space. This can also be written as: fft(a ∗ b) = fft(a) · fft(b).

We have the same coefficients due to linearity under the Fourier transform, and because these polynomials only consist of one unique term per coefficient:

f̂(x^n) = (i/(2π))^n δ^{(n)} and
f̂(aX(ξ) + bY(ξ)) = a·X̂(ξ) + b·Ŷ(ξ)

Convolution rule: f̂(X ∗ Y) = f̂(X) · f̂(Y)

We have thus reduced our convolution problem to a product problem, through the FFT.

By applying the inverse FFT to the pointwise products ĉ_k, one can determine the desired coefficients c_k.

This algorithm uses the divide-and-conquer method to divide the problem into subproblems.

Convolution under mod N

c_k = Σ_{(i,j): i+j ≡ k (mod n)} a_i b_j, working modulo N(n) = 2^n + 1.

By letting:

a_i′ = θ^i a_i and b_j′ = θ^j b_j,

where θ^n = −1 (so θ is a primitive 2nth root of unity), one sees that:[8]

C_k = Σ_{(i,j): i+j ≡ k (mod n)} a_i′ b_j′
 = Σ_{(i,j): i+j=k} θ^{i+j} a_i b_j + Σ_{(i,j): i+j=k+n} θ^{i+j} a_i b_j
 = θ^k ( Σ_{(i,j): i+j=k} a_i b_j + θ^n Σ_{(i,j): i+j=k+n} a_i b_j )
 = θ^k ( Σ_{(i,j): i+j=k} a_i b_j − Σ_{(i,j): i+j=k+n} a_i b_j ).

This means one can weight the inputs by θ^i, and then multiply the kth output by θ^{−k} afterwards, to obtain the negacyclic convolution.
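The identity above is exactly how implementations compute a negacyclic convolution: weight the inputs by powers of θ, take an ordinary cyclic convolution, and unweight. A small Python sketch (naive convolutions, with θ = 2 of order 8 modulo the Fermat prime 17, so θ^4 ≡ −1) checks it against the definition:

```python
def negacyclic_direct(a, b, mod):
    # c_k = sum_{i+j=k} a_i b_j - sum_{i+j=k+n} a_i b_j  (the definition)
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            sign = 1 if i + j < n else -1   # wrapped terms pick up a minus sign
            c[(i + j) % n] = (c[(i + j) % n] + sign * a[i] * b[j]) % mod
    return c

def negacyclic_weighted(a, b, theta, mod):
    # weight by theta^i, take the plain cyclic convolution, unweight by theta^-k
    n = len(a)
    aw = [a[i] * pow(theta, i, mod) % mod for i in range(n)]
    bw = [b[j] * pow(theta, j, mod) % mod for j in range(n)]
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + aw[i] * bw[j]) % mod
    inv = pow(theta, -1, mod)
    return [c[k] * pow(inv, k, mod) % mod for k in range(n)]

mod, theta = 17, 2          # theta^4 = 16 ≡ -1 (mod 17), so theta^n = -1 for n = 4
a, b = [1, 2, 3, 4], [5, 6, 7, 8]
print(negacyclic_direct(a, b, mod), negacyclic_weighted(a, b, theta, mod))
```

In the real algorithm the cyclic convolution in the middle is of course evaluated via the FFT/NTT, not naively.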

Instead of applying the weights, since θ^n = −1 in the first step of the recursion (when n = N), one can calculate directly:

C_k = Σ_{(i,j): i+j ≡ k (mod n)} ±a_i b_j = Σ_{(i,j): i+j=k} a_i b_j − Σ_{(i,j): i+j=k+n} a_i b_j

In a normal FFT, which operates over complex numbers, one would use the roots of unity

exp(2πik/n) = cos(2πk/n) + i·sin(2πk/n), k = 0, 1, …, n − 1,

so that, with θ = e^{i2π/n},

C_k = θ^{−k}( Σ_{(i,j): i+j=k} a_i b_j θ^k + Σ_{(i,j): i+j=k+n} a_i b_j θ^{n+k} ) = e^{−i2πk/n}( Σ_{(i,j): i+j=k} a_i b_j e^{i2πk/n} + Σ_{(i,j): i+j=k+n} a_i b_j e^{i2π(n+k)/n} )

However, the FFT can also be used as an NTT (number-theoretic transform) in Schönhage–Strassen. This means that we have to use θ to generate numbers in a finite field (for example GF(2^n + 1)).

A root of unity in a finite field GF(r) is an element θ such that θ^{r−1} ≡ 1, or equivalently θ^r ≡ θ. For example, GF(p), where p is a prime number, gives the multiplicative group {1, 2, …, p − 1}.

Notice that 2^n ≡ −1 in GF(2^n + 1), so 2 is a root of unity of order 2n there (and √2, where it exists, is one of order 4n). For these candidates, θ^N ≡ −1 under its finite field, and they therefore act the way we want.

The same FFT algorithms can still be used, though, as long as θ is a root of unity of a finite field.
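These order claims are easy to check numerically; here is a small Python sanity check using the Fermat prime 257 = 2^8 + 1:

```python
# In Z/(2^n + 1), the element 2 satisfies 2^n ≡ -1, so it has order 2n.
n = 8
mod = (1 << n) + 1                        # 257, a Fermat prime
assert pow(2, n, mod) == mod - 1          # 2^n ≡ -1
assert pow(2, 2 * n, mod) == 1            # hence 2 is a (2n)-th root of unity
# its first 2n powers are pairwise distinct, confirming the order is exactly 2n
powers = {pow(2, i, mod) for i in range(2 * n)}
print(len(powers))
```

This is why NTT butterflies in this ring need no true multiplications: every twiddle factor is a power of 2, i.e. a bit shift.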

To find the FFT/NTT transform, we do the following:

Ĉ_k = f̂( θ^{−k}( Σ_{(i,j): i+j=k} a_i b_j θ^k + Σ_{(i,j): i+j=k+n} a_i b_j θ^{n+k} ) )_k = Â_k · B̂_k,

i.e. applying the transform to the weighted sequences turns the convolution into a pointwise product of the transformed arrays Â, B̂.

The first sum gives the contribution to c_k, for each k. The second gives the contribution to c_k due to (i + j) mod n.

To do the inverse:

c_k = 2^{−m} · f̂^{−1}( θ^{−k} Ĉ )_k or c_k = f̂^{−1}( θ^{−k} Ĉ )_k

depending on whether the data needs to be normalized.

One multiplies by 2^{−m} to normalize the FFT data into a specific range, where 1/n ≡ 2^{−m} (mod N(n)), and m is found using the modular multiplicative inverse.
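Because the transform length is a power of two, the normalization factor 1/n is itself a (negated) power of two modulo 2^{n′} + 1, so it can be applied with a shift and a negation rather than a division; a quick Python check (variable names are ours):

```python
# In Z/(2^n' + 1): 2^{n'} ≡ -1, hence 2^{-d} ≡ -2^{n'-d}.  So dividing by a
# power-of-two transform length D = 2^d is just a shift and a negation.
n_prime = 8
mod = (1 << n_prime) + 1       # 257
D = 4                          # transform length, a power of two (d = 2)
inv_D = pow(D, -1, mod)        # modular multiplicative inverse of D
print(inv_D)
assert inv_D == (-(1 << (n_prime - 2))) % mod   # -2^{n'-d} mod 2^{n'}+1
assert (D * inv_D) % mod == 1
```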

Implementation details

Why N = 2^M + 1 mod N

In the Schönhage–Strassen algorithm, N = 2^M + 1. This should be thought of as a binary tree, where one has values in 0 ≤ index ≤ 2^M = 2^{i+j}. By letting K ∈ [0, M], for each K one can find all i + j = K, and group all (i, j) pairs into M different groups. Using i + j = k to group (i, j) pairs through convolution is a classical problem in algorithms.[9]

Having this in mind, N = 2^M + 1 helps us to group (i, j) into M/2^k groups for each group of subtasks at depth k in a tree with N = 2^{M/2^k} + 1.

Notice that N = 2^M + 1 = 2^{2^L} + 1 for some L when M is itself a power of two. This makes N a Fermat number. When doing arithmetic mod N = 2^{2^L} + 1, we have a Fermat ring.

Because some Fermat numbers are Fermat primes, one can in some cases avoid calculations.

There are other N that could have been used, of course, with the same prime number advantages. By letting N = 2^k − 1, one has the maximal number representable in k bits. N = 2^k − 1 is a Mersenne number, which in some cases is a Mersenne prime. It is a natural candidate against the Fermat number N = 2^{2^L} + 1.

In search of another N

Doing several mod calculations against different N can be helpful when it comes to solving integer products. By splitting N into several smaller numbers of different types and using the Chinese remainder theorem, one can recover the answer of the multiplication.[10]

Fermat numbers and Mersenne numbers are just two types of numbers in something called generalized Fermat–Mersenne numbers (GSM), with formula:[11]

G_{q,p,n} = Σ_{i=1}^{p} q^{(p−i)n} = (q^{pn} − 1)/(q^n − 1)
M_{p,n} = G_{2,p,n}

In this formula, M_{2,2^k} is a Fermat number, and M_{p,1} is a Mersenne number.

This formula can be used to generate sets of equations that can be used in the CRT (Chinese remainder theorem):[12]

g^{(M_{p,n} − 1)/2} ≡ 1 (mod M_{p,n}), where g is a number such that there exists an x with x² ≡ g (mod M_{p,n}), assuming N = 2^n

Furthermore, g^{2^{(p−1)n} − 1} ≡ a^{2^n − 1} (mod M_{p,n}), where a is an element that generates the elements {1, 2, 4, …, 2^{n−1}, 2^n} in a cyclic manner.

If N = 2^t, where 1 ≤ t ≤ n, then g_t = a^{(2^n − 1)·2^{n−t}}.

How to choose K for a specific N

The following formula is helpful for finding a proper K (the number of groups to divide N bits into) for a given bit size N, by calculating efficiency:[13]

E = (2N/K + k)/n

N is the bit size (the one used in 2^N + 1) at the outermost level. K gives N/K groups of bits, where K = 2^k.

n is found through N, K and k by finding the smallest x such that 2N/K + k ≤ n = K·2^x.

If one assumes efficiency above 50%, then n ≤ 2·(2N/K), K ≤ n, and k is very small compared with the rest of the formula; one gets

K ≤ 2√N

This means: when the method is used efficiently, K is bounded above by 2√N, i.e. asymptotically bounded above by √N.
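The parameter search above can be sketched as a small Python helper. This is a hypothetical illustration of the formula, not code from the cited source: it scans candidate K = 2^k up to the 2√N bound and scores each by the efficiency E = (2N/K + k)/n:

```python
import math

def choose_K(N):
    """Illustrative helper: pick K = 2^k with the best efficiency E.

    For each k, n is the smallest value of the form K * 2^x with
    2N/K + k <= n; efficiency is E = (2N/K + k) / n.  Candidates are
    capped at K <= 2*sqrt(N), per the bound derived above.
    """
    results = []
    k = 1
    while (1 << k) <= 2 * math.isqrt(N):
        K = 1 << k
        need = 2 * N // K + k          # bits actually required per group
        x = 0
        while (K << x) < need:         # smallest x with K * 2^x >= need
            x += 1
        n = K << x                     # bits used per group
        results.append((K, need / n))
        k += 1
    return max(results, key=lambda t: t[1])   # (K, efficiency) with best E

print(choose_K(1 << 20))
```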

Pseudocode

The following algorithm, the standard Modular Schönhage–Strassen Multiplication algorithm (with some optimizations), is found in overview through [14], built from these primitives:

  • T3MUL = Toom–Cook multiplication
  • SMUL = Schönhage–Strassen multiplication
  • Evaluate = FFT/IFFT

Further study

For implementation details, one can read the book Prime Numbers: A Computational Perspective.[15] This variant differs somewhat from Schönhage's original method in that it exploits the discrete weighted transform to perform negacyclic convolutions more efficiently. Another source for detailed information is Knuth's The Art of Computer Programming.[16]

Optimizations

This section explains a number of important practical optimizations that matter when implementing Schönhage–Strassen.

Use of other multiplication algorithms inside the algorithm

Below a certain cutoff point, it's more efficient to use other multiplication algorithms, such as Toom–Cook multiplication.[17]

Square root of 2 trick

The idea is to use √2 as a root of unity of order 2^{n+2} in the finite field GF(2^{n+2} + 1) (it is a solution to the equation θ^{2^{n+2}} ≡ 1 (mod 2^{n+2} + 1)) when weighting values in the NTT (number-theoretic transform) approach. It has been shown to save 10% in integer multiplication time.[18]
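The square root of 2 in this ring is explicit: for n divisible by 4, the known identity √2 ≡ 2^{3n/4} − 2^{n/4} (mod 2^n + 1) holds, since 2^n ≡ −1 makes the cross terms cancel when squaring. A quick Python check:

```python
# sqrt(2) modulo 2^n + 1: (2^(3n/4) - 2^(n/4))^2 ≡ 2, for n divisible by 4
n = 16
mod = (1 << n) + 1
sqrt2 = (1 << (3 * n // 4)) - (1 << (n // 4))
assert sqrt2 * sqrt2 % mod == 2
# 2 has order 2n here, so sqrt2 is a root of unity of twice that order (4n)
assert pow(sqrt2, 4 * n, mod) == 1 and pow(sqrt2, 2 * n, mod) != 1
print(sqrt2)
```

Because √2 is a difference of two powers of two, multiplying by it still costs only shifts and a subtraction, which is why doubling the order of the root comes nearly for free.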

Granlund's trick

By letting m = N + h, one can compute uv mod (2^N + 1) and (u mod 2^h)·(v mod 2^h), and combine the two residues with the CRT (Chinese remainder theorem) to find the exact value of the multiplication uv.[19]
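The combination step can be sketched as follows. This is an illustrative sketch, assuming uv < (2^N + 1)·2^h so the CRT reconstruction is exact; the function name and parameters are hypothetical:

```python
def combine_crt(u, v, N, h):
    # Recover u*v exactly from its residues mod 2^N + 1 and mod 2^h.
    # Valid when u*v < (2^N + 1) * 2^h.  Illustrative sketch only.
    m1 = (1 << N) + 1
    m2 = 1 << h
    r1 = (u * v) % m1                  # in practice: output of the mod-2^N+1 multiply
    r2 = ((u % m2) * (v % m2)) % m2    # cheap low-bits product
    # CRT: the unique x with x ≡ r1 (mod m1), x ≡ r2 (mod m2), 0 <= x < m1*m2
    t = ((r2 - r1) * pow(m1, -1, m2)) % m2
    return r1 + m1 * t

print(combine_crt(1000, 50, 8, 8))
```

The moduli are coprime (one is odd, the other a power of two), so the inverse `pow(m1, -1, m2)` always exists.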

References



  1. Template:Cite journal
  2. Karatsuba multiplication has asymptotic complexity of about O(n^1.58) and Toom–Cook multiplication has asymptotic complexity of about O(n^1.46). Template:Pb Template:Cite journal Template:Pb A discussion of practical crossover points between various algorithms can be found in: Overview of Magma V2.9 Features, arithmetic section Template:Webarchive Template:Pb Luis Carlos Coronado García, "Can Schönhage multiplication speed up the RSA encryption or decryption? Archived", University of Technology, Darmstadt (2005) Template:Pb The GNU Multi-Precision Library uses it for values of at least 1728 to 7808 64-bit words (33,000 to 150,000 decimal digits), depending on architecture. See: Template:Pb Template:Cite web Template:Pb Template:Cite web Template:Pb Template:Cite web
  3. Fürer's algorithm has asymptotic complexity O(n · log n · 2^{Θ(log* n)}). Template:Pb Template:Cite conference Template:Pb Template:Cite journal Template:Pb Fürer's algorithm is used in the Basic Polynomial Algebra Subprograms (BPAS) open source library. See: Template:Cite book
  4. Template:Cite journal
  5. This method is used in INRIA's ECM library.
  6. Template:Cite web
  7. Template:Cite web
  8. Template:Cite web
  9. Template:Cite book
  10. Template:Cite web
  11. Template:Cite web
  12. Template:Cite web
  13. Template:Cite web
  14. Template:Cite web
  15. R. Crandall & C. Pomerance. Prime Numbers – A Computational Perspective. Second Edition, Springer, 2005. Section 9.5.6: Schönhage method, p. 502. Template:ISBN
  16. Template:Cite book
  17. Template:Cite web
  18. Template:Cite web
  19. Template:Cite web