gh-100687: Reduce frequency of overallocation in long_add by mdickinson · Pull Request #100688 · python/cpython

mdickinson · 2023-01-02T13:31:04Z

Addresses #100687; see the issue for discussion and rationale.

This PR does some extra work to get a tighter upper bound on the number of digits needed of addition of two like-signed multidigit ints (or subtraction of two oppositely-signed multidigit ints).

Here's a benchmark script, tracking both memory allocations and performance in the case that's most significant, where both of the addends have exactly two digits (the case of single digit ints takes a fast path and doesn't use x_add)

import random
import timeit
import tracemalloc

def sum_pairs(pairs):
    for x, y in pairs:
        x + y

int_samples = 10**5
timeit_samples = 1000
seed = 12345

# Create test pairs, keeping track of memory allocations
tracemalloc.start()
snapshot1 = tracemalloc.take_snapshot()
random.seed(seed)
test_pairs = [
    (random.randrange(2**30, 2**60), random.randrange(2**30, 2**60))
    for _ in range(int_samples)
]
snapshot2 = tracemalloc.take_snapshot()
stats = snapshot2.compare_to(snapshot1, "lineno")
tracemalloc.stop()

# Get iteration time
time = timeit.timeit(stmt="sum_pairs(test_pairs)", globals=globals(), number=timeit_samples) / (timeit_samples * int_samples)

# Report
for stat in stats:
    print(stat)
print(f"Time per iteration (ns): {time * 1e9:.1f}")

Some sample results on my macOS / Intel machine. (cpython is main, cpython-modified is this branch; both were built with --enable-optimizations). Surprisingly, I actually get a minor speed increase where I was expecting a decrease.

mdickinson@lovelace python % ./cpython/python.exe time_int_sum.py         
/Users/mdickinson/Repositories/python/cpython/Lib/random.py:314: size=7031 KiB (+7031 KiB), count=200000 (+200000), average=36 B
/Users/mdickinson/Repositories/python/time_int_sum.py:18: size=6250 KiB (+6250 KiB), count=99984 (+99984), average=64 B
/Users/mdickinson/Repositories/python/cpython/Lib/tracemalloc.py:560: size=312 B (+312 B), count=2 (+2), average=156 B
/Users/mdickinson/Repositories/python/cpython/Lib/tracemalloc.py:423: size=312 B (+312 B), count=2 (+2), average=156 B
/Users/mdickinson/Repositories/python/cpython/Lib/tracemalloc.py:558: size=56 B (+56 B), count=1 (+1), average=56 B
Time per iteration (ns): 39.7
mdickinson@lovelace python % ./cpython-modified/python.exe time_int_sum.py
/Users/mdickinson/Repositories/python/cpython-modified/Lib/random.py:314: size=6250 KiB (+6250 KiB), count=200000 (+200000), average=32 B
/Users/mdickinson/Repositories/python/time_int_sum.py:18: size=6250 KiB (+6250 KiB), count=99984 (+99984), average=64 B
/Users/mdickinson/Repositories/python/cpython-modified/Lib/tracemalloc.py:560: size=312 B (+312 B), count=2 (+2), average=156 B
/Users/mdickinson/Repositories/python/cpython-modified/Lib/tracemalloc.py:423: size=312 B (+312 B), count=2 (+2), average=156 B
/Users/mdickinson/Repositories/python/cpython-modified/Lib/tracemalloc.py:558: size=56 B (+56 B), count=1 (+1), average=56 B
Time per iteration (ns): 37.5

Issue: Reduce frequency of memory overallocation in some int additions and subtractions #100687

To do:

More benchmarking, particularly the case of two digits + single digit; we should also check subtractions
Verify that there unit tests that exercise the case where we still overallocate

Objects/longobject.c

eendebakpt · 2023-10-23T20:43:09Z

I tested the PR with the pidigits benchmark (from the pyperformance package). Performance impact is neutral:

Mean +- std dev: [main] 209 ms +- 1 ms -> [pr] 209 ms +- 1 ms: 1.00x slower
Not significant!

(some profiling with valgrind shows that only 11% is spend in the x_add method, and within x_add most time is spend in the for loops performing the actual additions)

In the pidigits calculation there are roughly 156200 calls to x_add for which only 760 times the extra_digit is set (and zero times a false positive).

Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>

add test for long addition with over allocation

mdickinson · 2023-10-25T08:00:24Z

This needs a couple more changes thanks to the recent changes in long format. Converting to draft while I investigate.

mdickinson · 2023-10-25T09:06:48Z

Converting to draft while I investigate.

Should be good to go now. I've removed the special-case handling of normalization: it seems safer to call long_normalize just like everything else does, in case normalization rules change in the future.

mdickinson · 2023-10-25T09:08:14Z

@eendebakpt Thank you for looking into this, and for the suggestions and test! I fear that my changes have invalidated your benchmarks, but I suspect the take-away that performance isn't significantly affected either way remains valid.

eendebakpt · 2023-10-30T10:05:23Z

@eendebakpt Thank you for looking into this, and for the suggestions and test! I fear that my changes have invalidated your benchmarks, but I suspect the take-away that performance isn't significantly affected either way remains valid.

The pidigits benchmark results are indeed unaffected. The computation time is dominated by the actual addition (size_a is large for the additions in pidigits). The gain of this PR is in the slightly improved memory allocation for small size_a >= 2.

serhiy-storchaka

LGTM.

mdickinson · 2025-12-18T10:21:06Z

PR is stale; I suggest closing. I'll do that in a week or two if there are no objections.

skirpichev · 2025-12-18T23:57:03Z

Why? This looks good for me too.

PR has conflicts, that seems easy to fix. Probably it worth also to update benchmark results after #100688 (comment) (though, I doubt it will affect something).

Edit:

diff after conflict resolution

diff --git a/Lib/test/test_long.py b/Lib/test/test_long.py
index b48a8812a1a..c23d141ad7d 100644
--- a/Lib/test/test_long.py
+++ b/Lib/test/test_long.py
@@ -1698,6 +1698,13 @@ class MyInt(int):
         # GH-117195 -- This shouldn't crash
         object.__sizeof__(1)
 
+    def test_long_add_overallocate(self):
+        # see gh-100688
+        x = (MASK//2) * (MASK+1)
+        x2 = (MASK//2 + 1) * (MASK+1)
+        z = x + x2
+        self.assertEqual(x + x2, MASK * (MASK + 1))
+
     def test_hash(self):
         # gh-136599
         self.assertEqual(hash(-1), -2)
diff --git a/Misc/NEWS.d/next/Core_and_Builtins/2023-01-02-14-54-12.gh-issue-100688.yenIag.rst b/Misc/NEWS.d/next/Core_and_Builtins/2023-01-02-14-54-12.gh-issue-100688.yenIag.rst
new file mode 100644
index 00000000000..06c11e28c61
--- /dev/null
+++ b/Misc/NEWS.d/next/Core_and_Builtins/2023-01-02-14-54-12.gh-issue-100688.yenIag.rst
@@ -0,0 +1,2 @@
+Reduce frequency of overallocation in some cases of multidigit integer
+additions and subtractions. Thanks Pieter Eendebak for contributions.
diff --git a/Objects/longobject.c b/Objects/longobject.c
index 8ba1fd65078..4b5ba6bc61b 100644
--- a/Objects/longobject.c
+++ b/Objects/longobject.c
@@ -3751,6 +3751,10 @@ x_add(PyLongObject *a, PyLongObject *b)
     Py_ssize_t i;
     digit carry = 0;
 
+    /* There are fast paths for cases where a and b both have at most a single
+       digit, so if we end up here then at least one of them is multi-digit. */
+    assert(size_a >= 2 || size_b >= 2);
+
     /* Ensure a is the larger of the two: */
     if (size_a < size_b) {
         { PyLongObject *temp = a; a = b; b = temp; }
@@ -3758,7 +3762,15 @@ x_add(PyLongObject *a, PyLongObject *b)
             size_a = size_b;
             size_b = size_temp; }
     }
-    z = long_alloc(size_a+1);
+
+    /* Allocate sufficient space for the result. In the majority of cases
+       we allocate exactly the right number of digits, but in the (relatively
+       rare) case where the sum of the topmost digits is exactly PyLong_MASK,
+       we'll sometimes end up overallocating by a single digit. */
+    digit top_sum = a->long_value.ob_digit[size_a - 1]
+        + (size_b == size_a ? b->long_value.ob_digit[size_b - 1] : (digit)0);
+    int extra_digit = top_sum >= PyLong_MASK;
+    z = long_alloc(size_a + extra_digit);
     if (z == NULL)
         return NULL;
     for (i = 0; i < size_b; ++i) {
@@ -3771,7 +3783,10 @@ x_add(PyLongObject *a, PyLongObject *b)
         z->long_value.ob_digit[i] = carry & PyLong_MASK;
         carry >>= PyLong_SHIFT;
     }
-    z->long_value.ob_digit[i] = carry;
+    assert(carry == 0 || extra_digit);
+    if (extra_digit) {
+        z->long_value.ob_digit[i] = carry;
+    }
     return long_normalize(z);
 }

Adjust x_add so that it overallocates much less often

06c0bb6

bedevere-bot added the awaiting core review label Jan 2, 2023

bedevere-bot mentioned this pull request Jan 2, 2023

Reduce frequency of memory overallocation in some int additions and subtractions #100687

Open

mdickinson commented Jan 2, 2023

View reviewed changes

Objects/longobject.c Outdated Show resolved Hide resolved

mdickinson added 2 commits January 2, 2023 14:52

Revert unrelated change

9782b45

News entry

893f9d6

eendebakpt approved these changes Oct 22, 2023

View reviewed changes

Objects/longobject.c Show resolved Hide resolved

add test for long addition with over allocation

247df54

eendebakpt mentioned this pull request Oct 22, 2023

add test for long addition with over allocation mdickinson/cpython#3

Merged

mdickinson and others added 5 commits October 25, 2023 08:31

Double check that extra_digit is always true when we we've got a carry

8d6bd2a

Co-authored-by: Pieter Eendebak <pieter.eendebak@gmail.com>

Merge pull request #3 from eendebakpt/leaner-long-add-test

deb64d9

add test for long addition with over allocation

Merge remote-tracking branch 'upstream/main' into leaner-long-add

237afe8

Update Misc/NEWS entry to acknowledge contributions

e159e2e

Always set a value for the extra digit

5d97741

mdickinson marked this pull request as draft October 25, 2023 07:59

bedevere-app bot removed the awaiting core review label Oct 25, 2023

mdickinson added 2 commits October 25, 2023 09:22

Set digit count correctly

c3002e2

Style tweaks

db81b71

mdickinson marked this pull request as ready for review October 25, 2023 09:06

bedevere-app bot added the awaiting core review label Oct 25, 2023

eendebakpt approved these changes Dec 23, 2023

View reviewed changes

serhiy-storchaka approved these changes Dec 23, 2023

View reviewed changes

bedevere-app bot added awaiting merge and removed awaiting core review labels Dec 23, 2023

Merge in the main branch

9955db2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

gh-100687: Reduce frequency of overallocation in long_add#100688

gh-100687: Reduce frequency of overallocation in long_add#100688
mdickinson wants to merge 12 commits intopython:mainfrom
mdickinson:leaner-long-add

mdickinson commented Jan 2, 2023 •

edited

Loading

Uh oh!

Uh oh!

Uh oh!

eendebakpt commented Oct 23, 2023

Uh oh!

mdickinson commented Oct 25, 2023

Uh oh!

mdickinson commented Oct 25, 2023

Uh oh!

mdickinson commented Oct 25, 2023 •

edited

Loading

Uh oh!

eendebakpt commented Oct 30, 2023

Uh oh!

serhiy-storchaka left a comment

Uh oh!

mdickinson commented Dec 18, 2025

Uh oh!

skirpichev commented Dec 18, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

Uh oh!

Conversation

mdickinson commented Jan 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

eendebakpt commented Oct 23, 2023

Uh oh!

mdickinson commented Oct 25, 2023

Uh oh!

mdickinson commented Oct 25, 2023

Uh oh!

mdickinson commented Oct 25, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

eendebakpt commented Oct 30, 2023

Uh oh!

serhiy-storchaka left a comment

Choose a reason for hiding this comment

Uh oh!

mdickinson commented Dec 18, 2025

Uh oh!

skirpichev commented Dec 18, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants

mdickinson commented Jan 2, 2023 •

edited

Loading

mdickinson commented Oct 25, 2023 •

edited

Loading

skirpichev commented Dec 18, 2025 •

edited

Loading