I just want to multiply two big numbers modulo a big number

by math_explorer, Mar 28, 2016, 5:15 AM

without overflowing or writing my own big integer structure or wasting too many processor cycles, because you never know, Time Limit Exceeded penalty is unforgiving. They're not that large. They'll fit into a long long just fine. How hard could it be?

This is simple and works with $m < 2^{62}$, but it's kind of slow. (We are assuming in all these below functions that the caller has already modded $a$ and $b$, so $0 \leq a, b < m$. And that long longs are 64 bits, yes. You might be able to get an extra bit by making everything unsigned and being really careful? But let's not. Also, I am avoiding the remainder operation everywhere, because it's quite slow; since we know tight bounds on each of the arithmetic operations we do, we can get by with subtracting $m$ as we find it necessary. Testing on my machine says this, using subtractions instead of modding, is a 9x speedup, which is a big deal if this is in an inner loop!)
long long mul1(long long a, long long b, long long m) {
    long long res = 0;
    while (b) {
        if (b & 1) {
            res += a;
            if (res >= m) res -= m;
        }
        b >>= 1;
        a <<= 1;
        if (a >= m) a -= m;
    }
    return res;
}

While carelessly Googling, I found some references to an algorithm called Schrage multiplication that seemed to do the same thing much more carefully. But references were scattered and incomplete and paid much less attention to performance than I cared about. Finally I reconstructed roughly what it looks like:
long long schrage(long long a, long long b, long long m) {
    if (a == 0) { return 0; }
    long long quot = m / a, rem = m % a;
    long long result = a * (b % quot) - rem * (b / quot);
    while (result < 0) { result += m; }
    return result;
}

You can work out the modular arithmetic yourself and discover that, somewhat amazingly, \[ \mathtt{result} \equiv ab \bmod{m}. \]But note that this does not work with arbitrary $a, b, m$ in range. As usually written, Schrage's algorithm seems to require that $\mathtt{quot} \geq \mathtt{rem}$. However, we can loosen that a little; the key is really just that the computation $\mathtt{rem} * (b / \mathtt{quot})$ should not overflow, and this is roughly true if $a \approx \sqrt{m}$. This means that to get a function that works for general $a$, we can split $a$ into two and use this as a subroutine.

Also, we're aggressively avoiding the remainder operation again. Under the original Schrage requirements, the while can be replaced with an if, but our loosening will require it, and using the remainder operation loses our speed margin over mul1. Optimization is hard.

It's hard to loosen it in a more precise and useful way, though. I juggled expressions for quite a while but couldn't find a good intermediate point to abstract out, so instead, I'll just proclaim that this works, again as long as $0 \leq a, b < m < 2^{62}$:
long long mul2(long long a, long long b, long long m) {
    int rtsize = (64 - __builtin_clzll(m)) / 2;
    long long rt = 1LL << rtsize;
    return (schrage(rt, schrage(a >> rtsize, b, m), m) + schrage(a & (rt - 1), b, m)) % m;
}

This quickly and crudely computes an approximate square root $\mathtt{rt}$ of $m$. It satisfies $m/2 < \mathtt{rt}^2 \leq 2m$. (Check this. I really hope I got it right.) As a result, in all three calls to schrage(...), $a^2 \leq 2m$.

With this, we can work out that in schrage, $\mathtt{rem} \leq a - 1$ and $\mathtt{quot} = \lfloor m/a \rfloor > m/a - 1$, so \[ \mathtt{rem}\cdot\left\lfloor\frac{b}{\mathtt{quot}}\right\rfloor < (a-1)\cdot\frac{b}{m/a - 1} = (a-1)\cdot\frac{ab}{m - a} = \frac{b(a^2 - a)}{m - a} \leq \frac{b(2m-a)}{m-a} < 2b \]is a safe multiplication that doesn't overflow, and we're done!

So, mul2 is faster. A little. Okay, it's just a 1.5x speedup compared to mul1, and I'm quite a bit more unsure on whether it always works, since proofs have bugs.

But now we have mul3, the enigma bequeathed from code of a different generation that prompted this entire odyssey. It doesn't quite reach the effective range of either of the above ranges — it seems to work up to $2^{58}$ or $2^{60}$ — but that's often enough, and blazing phoenixes, it's fast. It's, like, another 4.5x factor over mul2. How does it work? Why does it work?? I'm pretty sure it relies on undefined behavior???? And yet...
long long mul3(long long a, long long b, long long m) {
    return (a*b - (long long)(a/(long double)m*b + 1e-3)*m + m)%m;
}

Comment

2 Comments

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
hmm can you just use binary and do the same as the system multiply & mod function does
or does that count as making your own big integer structure

by Lord.of.AMC, Mar 28, 2016, 5:36 AM

The post below has been deleted. Click to close.
This post has been deleted. Click here to see post.
OK, "writing my own big integer structure" is fuzzy and not exactly what I'm optimizing for. The thing is that whatever code I prepare is something I have to print onto limited paper (unless I memorize it) and type out under limited time. I'm not sure what you mean by "system multiply & mod function" though. It's my impression that those things are single processor instructions, hardwired into the ALU or something, and not some sort of (relatively) high-level algorithm I can just reimplement equally fast versions of for bigger integer types, only using C or even assembly instructions.

by math_explorer, Mar 28, 2016, 4:28 PM

♪ i just hope you understand / sometimes the clothes do not make the man ♫ // https://beta.vero.site/

avatar

math_explorer
Archives
+ September 2019
+ February 2018
+ December 2017
+ September 2017
+ July 2017
+ March 2017
+ January 2017
+ November 2016
+ October 2016
+ August 2016
+ February 2016
+ January 2016
+ September 2015
+ July 2015
+ June 2015
+ January 2015
+ July 2014
+ June 2014
inv
+ April 2014
+ December 2013
+ November 2013
+ September 2013
+ February 2013
+ April 2012
Shouts
Submit
  • how do you have so many posts

    by krithikrokcs, Jul 14, 2023, 6:20 PM

  • lol⠀⠀⠀⠀⠀

    by math_explorer, Jan 20, 2021, 8:43 AM

  • woah ancient blog

    by suvamkonar, Jan 20, 2021, 4:14 AM

  • https://artofproblemsolving.com/community/c47h361466

    by math_explorer, Jun 10, 2020, 1:20 AM

  • when did the first greed control game start?

    by piphi, May 30, 2020, 1:08 AM

  • ok..........

    by asdf334, Sep 10, 2019, 3:48 PM

  • There is one existing way to obtain contributorship documented on this blog. See if you can find it.

    by math_explorer, Sep 10, 2019, 2:03 PM

  • SO MANY VIEWS!!!
    PLEASE CONTRIB
    :)

    by asdf334, Sep 10, 2019, 1:58 PM

  • Hullo bye

    by AnArtist, Jan 15, 2019, 8:59 AM

  • Hullo bye

    by tastymath75025, Nov 22, 2018, 9:08 PM

  • Hullo bye

    by Kayak, Jul 22, 2018, 1:29 PM

  • It's sad; the blog is still active but not really ;-;

    by GeneralCobra19, Sep 21, 2017, 1:09 AM

  • dope css

    by zxcv1337, Mar 27, 2017, 4:44 AM

  • nice blog ^_^

    by chezbgone, Mar 28, 2016, 5:18 AM

  • shouts make blogs happier

    by briantix, Mar 18, 2016, 9:58 PM

91 shouts
Contributors
Tags
About Owner
  • Posts: 583
  • Joined: Dec 16, 2006
Blog Stats
  • Blog created: May 17, 2010
  • Total entries: 327
  • Total visits: 358716
  • Total comments: 368
Search Blog