
gh-150942: Speed up json.loads array and object decoding#150945

Open
eendebakpt wants to merge 1 commit into python:main from eendebakpt:json-takeref-opt


eendebakpt commented Jun 5, 2026

Benchmarks on the free-threaded (FT) build:

| Benchmark | Speedup vs main |
|---|---|
| array_strings | 1.16× |
| array_ints | 1.08× |
| config | 1.07× |
| bm_json_loads | 1.03× |
| geomean | 1.08× |

The relative gain is larger when #150639 is also applied (geomean 1.12×).
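As a quick consistency check, the geomean row is the geometric mean of the four per-benchmark speedups in the table above. The sketch below recomputes it with the standard library's `statistics.geometric_mean` (Python 3.8+). Because the listed speedups are already rounded to two decimals, recomputing from them only approximately reproduces the reported figure.

```python
from statistics import geometric_mean

# Speedups relative to main, as listed in the benchmark table (rounded to 2 dp).
speedups = {
    "array_strings": 1.16,
    "array_ints": 1.08,
    "config": 1.07,
    "bm_json_loads": 1.03,
}

# Geometric mean: the n-th root of the product of the n ratios.
geomean = geometric_mean(speedups.values())
print(f"geomean {geomean:.2f}x")  # prints "geomean 1.08x"
```

The geometric mean is the usual summary for speedup ratios because it treats a 1.1× gain and a 1/1.1× slowdown symmetrically.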

Benchmark script:

```python
"""Compare json.loads across four shapes: bm_json_loads, config, int array, string array."""
import json
import random
import sys
import pyperf

# --- bm_json_loads (pyperformance) documents ---
DICT = {'ads_flags': 0, 'age': 18, 'bulletin_count': 0, 'comment_count': 0,
        'country': 'BR', 'encrypted_id': 'G9urXXAJwjE', 'favorite_count': 9,
        'first_name': '', 'flags': 412317970704, 'friend_count': 0, 'gender': 'm',
        'gender_for_display': 'Male', 'id': 302935349, 'is_custom_profile_icon': 0,
        'last_name': '', 'locale_preference': 'pt_BR', 'member': 0,
        'tags': ['a', 'b', 'c', 'd', 'e', 'f', 'g'], 'profile_foo_id': 827119638,
        'secure_encrypted_id': 'Z_xxx2dYx3t4YAdnmfgyKw', 'session_number': 2,
        'signup_id': '201-19225-223', 'status': 'A', 'theme': 1,
        'time_created': 1225237014, 'time_updated': 1233134493,
        'unread_message_count': 0, 'user_group': '0', 'username': 'collinwinter',
        'play_count': 9, 'view_count': 7, 'zip': ''}
TUPLE = ([265867233, 265868503, 265252341, 265243910, 265879514, 266219766,
          266021701, 265843726, 265592821, 265246784, 265853180, 45526486,
          265463699, 265848143, 265863062, 265392591, 265877490, 265823665,
          265828884, 265753032], 60)


def _mutate(o, r):
    d = dict(o)
    for k, v in d.items():
        rv = r.random() * sys.maxsize
        if isinstance(k, (int, bytes, str)):
            d[k] = type(k)(rv)
    return d


_r = random.Random(5)
DICT_GROUP = [_mutate(DICT, _r) for _ in range(3)]
BM_OBJS = (json.dumps(DICT), json.dumps(TUPLE), json.dumps(DICT_GROUP))


def bm_json_loads(objs):
    for obj in objs:
        for _ in range(20):
            json.loads(obj)


# --- config.json shape (objects + small int arrays) ---
CONFIG = json.dumps({f"section_{i}": {"enabled": True, "values": list(range(50)),
                     "meta": {"name": f"s{i}", "desc": "x" * 40}} for i in range(3000)})
# --- array of ints ---
INTS = json.dumps([i for i in range(100000)])
# --- array of small strings ---
STRS = json.dumps([f"item_{i}" for i in range(100000)])

if __name__ == "__main__":
    runner = pyperf.Runner()
    runner.bench_func("bm_json_loads", bm_json_loads, BM_OBJS, inner_loops=20)
    runner.bench_func("config", json.loads, CONFIG)
    runner.bench_func("array_ints", json.loads, INTS)
    runner.bench_func("array_strings", json.loads, STRS)
```

Append parsed values to the result list with _PyList_AppendTakeRef and
insert key/value pairs with _PyDict_SetItem_Take2. Both take ownership of
the references passed in, instead of incref-ing on insert and then
decref-ing the local reference. This removes one reference-count round-trip
per element (and, on the free-threaded build, a per-append lock).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
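To make the saving described in the commit message concrete, here is a toy Python model of the two insertion strategies. It is not CPython code: `Obj`, `RefOps`, the `*_borrowing`/`*_stealing` helpers and the `simulate_*` drivers are all hypothetical, and the real functions (`_PyList_AppendTakeRef`, `_PyDict_SetItem_Take2`) are CPython-internal C functions operating on `PyObject*`. The model only counts simulated incref/decref calls: a borrowing insert (like `PyList_Append` or `PyDict_SetItem` followed by a decref of the local) costs one incref plus one decref per stored reference, while an ownership-taking insert costs none. Both end with each object owned solely by its container.

```python
"""Toy model of borrowing vs. ownership-taking inserts (hypothetical; not CPython code)."""
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False keeps identity hashing, so Obj can be a dict key
class Obj:
    # Simulated refcount: a freshly decoded value starts with the one
    # reference owned by the decoder's local variable.
    refcnt: int = 1


class RefOps:
    """Counts simulated incref/decref calls."""

    def __init__(self) -> None:
        self.count = 0

    def incref(self, o: Obj) -> None:
        o.refcnt += 1
        self.count += 1

    def decref(self, o: Obj) -> None:
        o.refcnt -= 1
        self.count += 1


def list_append_borrowing(lst, values, ops):
    # Models PyList_Append + decref of the local: the list takes its own
    # reference, then the decoder drops the local one.
    for v in values:
        ops.incref(v)
        lst.append(v)
        ops.decref(v)


def list_append_stealing(lst, values, ops):
    # Models an append that takes ownership of the caller's reference:
    # the reference just moves into the list (ops is intentionally unused).
    for v in values:
        lst.append(v)


def dict_set_borrowing(d, pairs, ops):
    # Models PyDict_SetItem + decref of both the key and value locals.
    for k, v in pairs:
        ops.incref(k)
        ops.incref(v)
        d[k] = v
        ops.decref(k)
        ops.decref(v)


def dict_set_stealing(d, pairs, ops):
    # Models a set-item that takes ownership of both key and value references.
    for k, v in pairs:
        d[k] = v


def simulate_list(strategy, n):
    ops, out = RefOps(), []
    strategy(out, [Obj() for _ in range(n)], ops)
    return ops.count, {v.refcnt for v in out}


def simulate_dict(strategy, n):
    ops, out = RefOps(), {}
    strategy(out, [(Obj(), Obj()) for _ in range(n)], ops)
    return ops.count, {o.refcnt for kv in out.items() for o in kv}


if __name__ == "__main__":
    n = 1000
    print(simulate_list(list_append_borrowing, n))  # (2000, {1})
    print(simulate_list(list_append_stealing, n))   # (0, {1})
    print(simulate_dict(dict_set_borrowing, n))     # (4000, {1})
    print(simulate_dict(dict_set_stealing, n))      # (0, {1})
```

In this model a 100,000-element array such as `array_ints` avoids 200,000 simulated refcount operations per `json.loads` call; how much wall-clock time that saves is what the benchmark table measures.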

sergey-miryanov left a comment


LGTM
