I just want to multiply two big numbers modulo a big number
by math_explorer, Mar 28, 2016, 5:15 AM
without overflowing or writing my own big integer structure or wasting too many processor cycles, because you never know; the Time Limit Exceeded penalty is unforgiving. They're not that large. They'll fit into a long long just fine. How hard could it be?
This is simple and works with $m < 2^{62}$, but it's kind of slow. (We are assuming in all the functions below that the caller has already modded $a$ and $b$, so $0 \leq a, b < m$. And that long longs are 64 bits, yes. You might be able to get an extra bit by making everything unsigned and being really careful? But let's not. Also, I am avoiding the remainder operation everywhere, because it's quite slow; since we know tight bounds on each of the arithmetic operations we do, we can get by with subtracting $m$ as we find it necessary. Testing on my machine says this, using subtractions instead of modding, is a 9x speedup, which is a big deal if this is in an inner loop!)

    long long mul1(long long a, long long b, long long m) {
        long long res = 0;
        while (b) {
            if (b & 1) {
                res += a;                // res, a < m, so res + a < 2m
                if (res >= m) res -= m;
            }
            b >>= 1;
            a <<= 1;                     // a < m < 2^62, so a << 1 < 2^63
            if (a >= m) a -= m;
        }
        return res;
    }
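(To see the subtract-instead-of-mod trick in isolation: the helper below is my sketch, not part of the original code. Since the caller already reduced everything mod $m$, a sum of two residues lands in $[0, 2m)$, and one conditional subtraction does the job of the % operator.)

    // Hypothetical helper (for illustration only): modular addition using a
    // conditional subtraction instead of %. Assumes 0 <= a, b < m and that
    // 2m fits in a long long, i.e. m < 2^62.
    long long addmod(long long a, long long b, long long m) {
        long long s = a + b;   // s < 2m, so no overflow
        if (s >= m) s -= m;
        return s;
    }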
While carelessly Googling, I found some references to an algorithm called Schrage multiplication that seemed to do the same thing much more carefully. But references were scattered and incomplete and paid much less attention to performance than I cared about. Finally I reconstructed roughly what it looks like:

    long long schrage(long long a, long long b, long long m) {
        if (a == 0) { return 0; }
        long long quot = m / a, rem = m % a;  // m == a * quot + rem
        long long result = a * (b % quot) - rem * (b / quot);
        while (result < 0) { result += m; }
        return result;
    }
You can work out the modular arithmetic yourself and discover that, somewhat amazingly,

\[ \mathtt{result} \equiv ab \bmod{m}. \]
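(If you'd rather not work it out yourself: I believe the intended computation is to substitute $b \bmod \mathtt{quot} = b - \mathtt{quot}\lfloor b/\mathtt{quot}\rfloor$ and $\mathtt{rem} = m - a\cdot\mathtt{quot}$, after which everything collapses:

\[ a(b \bmod \mathtt{quot}) - \mathtt{rem}\left\lfloor\frac{b}{\mathtt{quot}}\right\rfloor = ab - a\cdot\mathtt{quot}\left\lfloor\frac{b}{\mathtt{quot}}\right\rfloor - (m - a\cdot\mathtt{quot})\left\lfloor\frac{b}{\mathtt{quot}}\right\rfloor = ab - m\left\lfloor\frac{b}{\mathtt{quot}}\right\rfloor \equiv ab \bmod{m}. \])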
But note that this does not work with arbitrary $a$ in range. As usually written, Schrage's algorithm seems to require that $a \leq \sqrt{m}$. However, we can loosen that a little; the key is really just that the computation $\mathtt{rem} \cdot \lfloor b/\mathtt{quot} \rfloor$ should not overflow, and this is roughly true if $a^2 \leq 2m$. This means that to get a function that works for general $a$, we can split $a$ into two and use this as a subroutine.
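(Concretely, writing $r$ for half of $m$'s bit length, so that $2^r$ is the crude square root we compute below, the split I mean is

\[ ab \equiv 2^r\left(\left\lfloor\frac{a}{2^r}\right\rfloor b \bmod m\right) + (a \bmod 2^r)\, b \pmod{m}, \]

where each of the three multiplications is small enough for schrage. The notation is mine; the code is mul2 below.)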
Also, we're aggressively avoiding the remainder operation again. Under the original Schrage requirements, the while can be replaced with an if, but our loosening will require it, and using the remainder operation loses our speed margin over mul1. Optimization is hard.
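(For reference, here is my sketch of the classical variant, not from the post: when $a \leq \sqrt{m}$ we get $\mathtt{rem} < \mathtt{quot}$, so $\mathtt{rem}\cdot\lfloor b/\mathtt{quot}\rfloor \leq b < m$, the result lands in $(-m, m)$, and a single if suffices.)

    // Classical Schrage (my reconstruction, for comparison): requires
    // a <= sqrt(m), which gives rem < quot, so result is in (-m, m) and
    // one conditional addition replaces the while loop.
    long long schrage_classic(long long a, long long b, long long m) {
        if (a == 0) { return 0; }
        long long quot = m / a, rem = m % a;
        long long result = a * (b % quot) - rem * (b / quot);
        if (result < 0) { result += m; }
        return result;
    }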
It's hard to loosen it in a more precise and useful way, though. I juggled expressions for quite a while but couldn't find a good intermediate point to abstract out, so instead, I'll just proclaim that this works, again as long as $m < 2^{62}$:

    long long mul2(long long a, long long b, long long m) {
        int rtsize = (64 - __builtin_clzll(m)) / 2;  // half of m's bit length
        long long rt = 1LL << rtsize;                // crude square root of m
        return (schrage(rt, schrage(a >> rtsize, b, m), m)
              + schrage(a & (rt - 1), b, m)) % m;
    }
This quickly and crudely computes an approximate square root $\mathtt{rt}$ of $m$. It satisfies $m/2 < \mathtt{rt}^2 \leq 2m$. (Check this. I really hope I got it right.) As a result, in all three calls to schrage(...), $a^2 \leq 2m$.
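(Since we're asked to check it, here is a quick randomized test. The harness is mine, not the post's, and assumes GCC/Clang for __builtin_clzll:)

    #include <cstdio>
    #include <cstdlib>

    // Randomized check (not from the original post) that the crude square
    // root satisfies m/2 < rt^2 <= 2m, i.e. m < 2*rt*rt and rt*rt <= 2*m.
    int main() {
        srand(12345);
        for (int i = 0; i < 10000000; i++) {
            long long m = (((long long)rand() << 31) | rand()) % ((1LL << 62) - 1) + 1;
            int rtsize = (64 - __builtin_clzll(m)) / 2;
            long long rt = 1LL << rtsize;
            // rt*rt <= 2m < 2^63 and 2*m < 2^63, so neither side overflows
            if (!(m / 2 < rt * rt && rt * rt <= 2 * m)) {
                printf("counterexample: m = %lld\n", m);
                return 1;
            }
        }
        puts("ok");
        return 0;
    }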
With this, we can work out that in schrage, $a \cdot (b \bmod \mathtt{quot}) < a \cdot \mathtt{quot} \leq m$ and

\[ \mathtt{rem}\cdot\left\lfloor\frac{b}{\mathtt{quot}}\right\rfloor < (a-1)\cdot\frac{b}{m/a - 1} = (a-1)\cdot\frac{ab}{m - a} = \frac{b(a^2 - a)}{m - a} \leq \frac{b(2m-a)}{m-a} < 2b, \]

so $\mathtt{rem}\cdot\lfloor b/\mathtt{quot}\rfloor$ is a safe multiplication that doesn't overflow (and $\mathtt{result} > -2b > -2m$, so the while loop adds $m$ back at most twice), and we're done!
So, mul2 is faster. A little. Okay, it's just a 1.5x speedup compared to mul1, and I'm quite a bit more unsure about whether it always works, since proofs have bugs.
But now we have mul3, the enigma bequeathed from code of a different generation that prompted this entire odyssey. It doesn't quite reach the effective range of either of the functions above — it seems to work up to $m \leq 2^{61}$ or thereabouts — but that's often enough, and blazing phoenixes, it's fast. It's, like, another 4.5x factor over mul2. How does it work? Why does it work?? I'm pretty sure it relies on undefined behavior???? And yet...
    long long mul3(long long a, long long b, long long m) {
        // a*b overflows, but on typical two's-complement machines it wraps
        // mod 2^64; (long long)(a/(long double)m*b + 1e-3) approximates
        // floor(a*b/m), and the difference of the two wrapped quantities
        // recovers a*b mod m (give or take one m, fixed by the +m and the
        // final %m). Signed overflow is undefined behavior, though.
        return (a*b - (long long)(a/(long double)m*b + 1e-3)*m + m)%m;
    }
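(For what it's worth, we can at least test it. A final sketch, mine and not the post's: cross-check mul3 against the slow-but-trustworthy mul1 on random inputs, assuming the functions above are pasted into the same file.)

    #include <cstdio>
    #include <cstdlib>

    // Randomized cross-check (not from the original post): compare mul3
    // against mul1 on random inputs with m below 2^61.
    int main() {
        srand(54321);
        for (int i = 0; i < 1000000; i++) {
            long long m = (((long long)rand() << 31) | rand()) % ((1LL << 61) - 1) + 1;
            long long a = (((long long)rand() << 31) | rand()) % m;
            long long b = (((long long)rand() << 31) | rand()) % m;
            if (mul1(a, b, m) != mul3(a, b, m)) {
                printf("mismatch: a=%lld b=%lld m=%lld\n", a, b, m);
                return 1;
            }
        }
        puts("ok");
        return 0;
    }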