Lazy-load chat histories, fast list summaries, and mtime disk cache (closes #84)#88

Open
bradjin8 wants to merge 5 commits into master from feat/lazy-load-history

@bradjin8
Collaborator

@bradjin8 bradjin8 commented Jun 2, 2026

Summary

Implements issue #84: split summary and full-assembly code paths so list views avoid scanning global bubbleId:% rows, the workspace UI lazy-loads conversation bodies on tab selection, and repeat page loads are served from an mtime-keyed disk cache.

  • Home (GET /api/workspaces) — no global bubble scan; SQL-filtered composerData; per-composer MRC only when needed; cached composer_id_to_ws and project list
  • Workspace sidebar: GET /api/workspaces/<id>/tabs?summary=1 returns titles/metadata only; full payloads via GET /api/workspaces/<id>/tabs/<composer_id>
  • Phase 3 cache — project list, composer map, and tab summaries under ~/.cache/cursor-chat-browser/; invalidate on storage mtimes; bypass with ?nocache=1 or CURSOR_CHAT_BROWSER_NOCACHE=1
  • Full /tabs, export, and search — unchanged; monolithic /tabs still available for backward compatibility
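
As a rough illustration of the mtime-keyed cache described in the Phase 3 bullet, the sketch below rebuilds a payload only when the modification times of the tracked files change. The helper names (`fingerprint`, `load_cached`), the cache-file layout, and the choice to skip writing when `nocache` is set are assumptions made for this example, not the code in services/summary_cache.py.

```python
import json
import os
import tempfile
from pathlib import Path


def fingerprint(paths):
    """Return a JSON-stable fingerprint: sorted [path, mtime_ns] pairs (lists, not tuples)."""
    pairs = []
    for p in sorted(paths):
        try:
            pairs.append([p, os.stat(p).st_mtime_ns])
        except FileNotFoundError:
            pairs.append([p, None])  # a missing file is part of the fingerprint too
    return pairs


def load_cached(cache_file, fp, build, nocache=False):
    """Return the cached payload when fp still matches, otherwise call build()."""
    cache_file = Path(cache_file)
    if not nocache and cache_file.exists():
        try:
            entry = json.loads(cache_file.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            entry = None  # unreadable or corrupt cache: rebuild below
        if isinstance(entry, dict) and entry.get("fingerprint") == fp and "payload" in entry:
            return entry["payload"]
    payload = build()
    if not nocache:  # assumption: a bypassed request neither reads nor writes the cache
        cache_file.parent.mkdir(parents=True, exist_ok=True)
        cache_file.write_text(
            json.dumps({"fingerprint": fp, "payload": payload}), encoding="utf-8"
        )
    return payload
```

Touching any tracked file changes its `st_mtime_ns`, so the next call sees a different fingerprint and rebuilds.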

Problem

On large local Cursor datasets, the home page and workspace page each rescanned global KV storage on every load (often 1–2+ minutes). The UI blocked on full tab assembly before the sidebar could render.

Solution

Backend

| Area | Change |
| --- | --- |
| services/workspace_listing.py | Summary list path: no load_bubble_map; SQL filter for non-empty headers; lazy load_project_layouts_for_composer; disk cache wrapper |
| services/workspace_tabs.py | list_workspace_tab_summaries, assemble_single_tab, shared _assemble_tab_from_composer_data; scoped bubble/MRC/diff loaders |
| services/workspace_db.py | Scoped loaders + build_composer_id_to_workspace_id_cached |
| services/summary_cache.py | New mtime-keyed cache layer |
| api/workspaces.py | ?summary=1, /tabs/<composer_id>, ?nocache=1 |
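
To make the idea of a scoped loader concrete, here is a minimal sketch that fetches one composer's bubble rows with a key-range query instead of a global `bubbleId:%` scan. The table name `kv`, the key layout `bubbleId:<composer_id>:<bubble_id>`, and the function name are assumptions for illustration; the real loaders in services/workspace_db.py may differ.

```python
import sqlite3


def load_bubbles_for_composer(conn, composer_id):
    """Fetch only one composer's bubble rows via a half-open range on the key."""
    lo = f"bubbleId:{composer_id}:"
    hi = f"bubbleId:{composer_id};"  # ';' is the character right after ':' in ASCII
    rows = conn.execute(
        "SELECT key, value FROM kv WHERE key >= ? AND key < ? ORDER BY key",
        (lo, hi),
    )
    return {key: value for key, value in rows}
```

A range on an indexed key column avoids both the full-table LIKE scan and the need to escape `%` or `_` inside the composer id.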

Frontend

  • templates/workspace.html — fetch summary tabs on load; lazy-fetch full tab on selection; tabCache for loaded conversations
  • static/css/style.css — stable grid width during load; .main-content.is-loading for centered spinner

Tests (356 pass)

  • tests/test_workspace_listing_performance.py — spy: no global bubbleId:% on list path
  • tests/test_workspace_tabs_summary.py — summary/single-tab scoped queries and payload shape
  • tests/test_summary_cache.py — cache hit/miss and fingerprint invalidation
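
One way to write a "no global scan" spy, sketched under assumptions: instead of wrapping `open_global_db` as the PR's tests do, this version uses the standard-library `sqlite3.Connection.set_trace_callback` hook to record every statement. The helper names are hypothetical.

```python
import sqlite3


def record_sql(conn):
    """Attach a trace callback and return the list that statements are appended to."""
    executed = []
    conn.set_trace_callback(executed.append)
    return executed


def assert_no_global_bubble_scan(executed):
    """Fail if any recorded statement contains an unscoped bubbleId:% pattern."""
    # Guard against vacuous success: a spy that captured nothing proves nothing.
    assert executed, "spy recorded no SQL; is it attached to the right connection?"
    offenders = [sql for sql in executed if "bubbleId:%" in sql]
    assert not offenders, f"unscoped bubble scan: {offenders}"
```

Checking that the recorded list is non-empty matters as much as the pattern check, because a spy attached to the wrong connection would otherwise pass silently.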

Performance notes

Document representative row counts and timings from your local fixture in review comments if helpful.

Test plan

  • pytest — all tests pass locally
  • Home page: project cards appear; second refresh noticeably faster than first
  • Workspace page: sidebar titles render quickly; selecting a conversation loads bubbles with per-tab spinner
  • Copy All / Download work on a loaded conversation
  • Export script and full GET /api/workspaces/<id>/tabs (no summary=1) still work
  • ?nocache=1 bypasses cache; cache files appear under ~/.cache/cursor-chat-browser/

Closes #84

Summary by CodeRabbit

  • New Features

    • Lazy-loaded workspace UI with per-conversation summary endpoint and on-demand full conversation loading; sidebar shows per-conversation model badges.
  • Performance

    • Disk-backed summary cache for workspace/tab summaries (nocache option available).
    • Scoped data loading to fetch only targeted conversation rows, reducing expensive global scans.
  • Style

    • Updated layout and loading skeleton styles for consistent full-width loading states.
  • Documentation

    • Changelog updated with Phase 3 summary, caching, and API notes.
  • Tests

    • New/regression tests covering summary cache, tab summaries, scoped loading, and listing performance.
  • Bug Fixes

    • Improved inline error handling for workspace and conversation load failures.

@bradjin8 bradjin8 self-assigned this Jun 2, 2026
@coderabbitai

coderabbitai Bot commented Jun 2, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3f1b551a-973c-4fd3-998c-f2b33da775a7

📥 Commits

Reviewing files that changed from the base of the PR and between 14e9f54 and f6d3ff9.

📒 Files selected for processing (6)
  • services/summary_cache.py
  • services/workspace_db.py
  • templates/workspace.html
  • tests/test_summary_cache.py
  • tests/test_workspace_listing_performance.py
  • tests/test_workspace_tabs_summary.py
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/test_workspace_tabs_summary.py
  • tests/test_summary_cache.py
  • templates/workspace.html
  • tests/test_workspace_listing_performance.py
  • services/summary_cache.py
  • services/workspace_db.py

📝 Walkthrough

Adds a fingerprinted disk-backed summary cache, composer-scoped KV loaders to avoid global bubble scans, refactors workspace listing and tab assembly to use cached composer↔workspace maps, exposes summary-only and per-composer tab endpoints, and updates the frontend to lazy-load full tabs on demand with tests preventing unscoped scans.

Changes

Lazy-load workspace tabs with summary cache and scoped loaders

| Layer / File(s) | Summary |
| --- | --- |
| **Summary cache infrastructure**<br>services/summary_cache.py, CHANGELOG.md, tests/test_summary_cache.py | Disk-backed cache with fingerprinting, nocache bypass, atomic JSON I/O, and accessors for projects, composer→workspace mapping, and per-workspace tab summaries; unit tests validate hits/misses and nocache behavior. |
| **Composer-scoped KV loaders**<br>services/workspace_db.py | New helpers to extract root paths from messageRequestContext, load project layouts per composer, and scoped loaders for bubbles, message contexts, and code-block diffs; centralizes global DB path resolution and adds cached composer→workspace mapping. |
| **Workspace listing optimization**<br>services/workspace_listing.py, tests/test_workspace_listing_performance.py, tests/test_project_layouts_dict_shape.py | list_workspace_projects gains nocache flag and fingerprint-based caching, uses lightweight composer validation and constant composer-row SQL, and skips unscoped bubble scans; performance tests assert no bubbleId:% global LIKE queries and preserve project-card shape. |
| **Tab assembly modularization**<br>services/workspace_tabs.py | Extracts _assemble_tab_from_composer_data and _build_matching_ws_ids, centralizes bubble/context injection and metadata aggregation, and updates assembly loop to delegate per-composer tab construction to the helper. |
| **Summary and single-tab endpoints**<br>services/workspace_tabs.py, tests/test_workspace_tabs_summary.py | Adds list_workspace_tab_summaries (cached summary-only payload) and assemble_single_tab (scoped per-composer loads returning full tab); uncached builders and tests ensure scoped SQL and expected payload shapes, returning 404 for missing/unauthorized composer IDs. |
| **API endpoint routing**<br>api/workspaces.py | Parses nocache on workspace listing requests, branches GET /api/workspaces/<id>/tabs on summary=true to return summaries or full assembly, and adds GET /api/workspaces/<workspace_id>/tabs/<composer_id> for lazy per-composer fetches (CLI workspaces rejected). |
| **Frontend lazy-loading and UI**<br>templates/workspace.html, static/css/style.css | Workspace page renders skeleton immediately and requests summary tabs (?summary=1). selectTab becomes async, uses tabCache and lazy-fetch of full tabs with inline loading/error UIs, and sidebar shows a per-tab model badge. CSS adjusted for loading layout and consistent heights. |
| **Test updates and fixtures**<br>tests/* | Adds cache tests, regression/performance fixtures asserting no unscoped global scans, tab-summary and assemble-single-tab tests verifying scoped SQL and payloads, and updates parse-warning fixtures to new malformed JSON cases. |
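
The "atomic JSON I/O" mentioned for the cache layer is commonly done with a write-to-temp-then-rename pattern. The following is a minimal sketch of that pattern, assuming a hypothetical helper name `write_json_atomic`; it is not taken from services/summary_cache.py.

```python
import json
import os
import tempfile


def write_json_atomic(path, data):
    """Write JSON to a temp file in the target directory, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    os.makedirs(directory, exist_ok=True)
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes are on disk before the rename
        os.replace(tmp_path, path)  # atomic when source and target share a filesystem
    except BaseException:
        # On any failure, remove the temp file so no partial cache entry is left behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Because readers only ever see the old file or the fully written new one, a crash mid-write cannot leave a truncated cache entry.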

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • timon0305
  • clean6378-max-it
  • wpak-ai

Poem

🐇 A rabbit nibbles at the cache,
Fingerprints snug in a hidden stash,
Summaries first, then bubbles on call,
No more global scans that stall—
Hop, fetch, render, dash.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 43.04% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and concisely summarizes the three main changes: lazy-loading chat histories, fast list summaries, and mtime disk cache. It directly maps to the primary objectives in the PR description. |
| Linked Issues check | ✅ Passed | The PR implements all core requirements from issue #84: summary endpoints without global bubbleId scans, lazy-load full tabs on selection, and mtime-keyed disk cache. Tests verify no global scans and cache behavior. |
| Out of Scope Changes check | ✅ Passed | All changes are directly tied to implementing lazy-loading and caching: new cache service, scoped loaders, summary endpoints, frontend lazy-loading, CSS for loading states, and corresponding tests. No unrelated refactoring detected. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
templates/workspace.html (1)

124-175: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Possible stale-render race when switching tabs quickly.

selectTab is now async and awaits a network fetch before calling renderChat. If a user clicks conversation A (slow fetch) then conversation B, B may render first and A's later-resolving fetch will overwrite main-content with A's content while the sidebar shows B as active. Consider tracking the latest requested id and bailing out of stale resolutions.

🛠️ Suggested guard
```diff
 async function selectTab(id) {
   const summary = allTabs.find(t => t.id === id);
   if (!summary) return;
+  selectTab._latest = id;
```

Then after each await, before mutating selectedTab/main-content:

```js
if (selectTab._latest !== id) return; // a newer selection superseded this one
```
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@templates/workspace.html` around lines 124 - 175, selectTab can race when
multiple async fetches complete out-of-order; set a per-call marker (e.g. assign
selectTab._latest = id at the start of selectTab) and after any await or before
mutating shared state (tabCache, selectedTab, main-content, calling renderChat)
check if selectTab._latest !== id and if so return early to avoid overwriting a
newer tab; apply these guards around the fetch/try-catch resolution and any
other async points so only the most-recent selection updates the UI.
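
The latest-selection guard described above is not specific to JavaScript. As a language-neutral illustration, here is the same pattern sketched with Python's asyncio; the `TabView` class and its `fetch` callable are hypothetical stand-ins for `selectTab`, the network fetch, and `main-content`.

```python
import asyncio


class TabView:
    def __init__(self, fetch):
        self._fetch = fetch    # hypothetical async callable: tab id -> content
        self._latest = None    # id of the most recent selection
        self.rendered = None   # stands in for main-content

    async def select_tab(self, tab_id):
        self._latest = tab_id
        content = await self._fetch(tab_id)
        if self._latest != tab_id:
            return  # a newer selection superseded this one; drop the stale result
        self.rendered = content
```

If a slow fetch for tab A resolves after a fast fetch for tab B, the check after the await discards A's result, so the view keeps showing B.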
🧹 Nitpick comments (1)
tests/test_summary_cache.py (1)

37-66: ⚡ Quick win

Add a round-trip test with a realistic fingerprint.

Current tests only exercise scalar-only fingerprints, so they don't catch the tuple→list serialization mismatch in fingerprint_workspace_storage (see the services/summary_cache.py finding). A test that calls fingerprint_workspace_storage (so workspace_files is populated), then set_cached_projects/get_cached_projects with that fingerprint, would guard against the cache silently missing on every repeat load.

💚 Suggested regression test
```python
def test_cache_hit_with_workspace_files_fingerprint(self):
    with tempfile.TemporaryDirectory() as ws:
        entry_dir = os.path.join(ws, "entry1")
        os.makedirs(entry_dir)
        with open(os.path.join(entry_dir, "state.vscdb"), "wb") as f:
            f.write(b"x")
        entries = [{"name": "entry1"}]
        fp = fingerprint_workspace_storage(ws, entries, global_db_path=None, rules=[])
        set_cached_projects(fp, [{"id": "a"}], [])
        # Recompute the fingerprint to mimic a fresh process/page load.
        fp2 = fingerprint_workspace_storage(ws, entries, global_db_path=None, rules=[])
        self.assertIsNotNone(get_cached_projects(fp2))
```
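
The tuple→list mismatch that this regression test guards against can be reproduced in isolation with the standard `json` module. The fingerprint shape below is a simplified assumption, and `normalize` is a hypothetical helper showing one way to make the comparison robust.

```python
import json

# A fingerprint built in-process, using a tuple for each (path, mtime) pair.
fresh = {"workspace_files": [("entry1/state.vscdb", 1717360000.0)]}

# The same fingerprint after a JSON round trip, as it would come back from disk.
cached = json.loads(json.dumps(fresh))

mismatch = fresh != cached  # True: JSON has no tuple type, so the pair is now a list


def normalize(value):
    """Recursively convert tuples to lists so both sides share one shape."""
    if isinstance(value, (list, tuple)):
        return [normalize(v) for v in value]
    if isinstance(value, dict):
        return {k: normalize(v) for k, v in value.items()}
    return value


match_after_normalize = normalize(fresh) == cached  # True
```

Emitting lists in the fingerprint from the start avoids the need for normalization at comparison time.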
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_summary_cache.py` around lines 37 - 66, Add a regression test that
round-trips a realistic fingerprint containing workspace_files to catch the
tuple→list serialization mismatch: use fingerprint_workspace_storage to create
fp (with a temp workspace and an entry that creates a state.vscdb file), call
set_cached_projects(fp, projects, warnings), then recompute fp2 =
fingerprint_workspace_storage(...) to mimic a fresh process and assert
get_cached_projects(fp2) is not None; this ensures
set_cached_projects/get_cached_projects handle the workspace_files shape
consistently (see fingerprint_workspace_storage, set_cached_projects,
get_cached_projects, and services/summary_cache.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@services/summary_cache.py`:
- Around line 64-88: The fingerprint uses Python tuples in workspace_files but
JSON serialization converts tuples to lists, so cached fingerprints (lists)
never equal freshly computed fingerprints (tuples); change the construction of
workspace_files in the fingerprint generator to emit JSON-stable lists instead
of tuples (i.e., append [f"{name}/{rel}", mtime] rather than (f"{name}/{rel}",
mtime)) so set_cached_* and get_cached_* compare identical types, and optionally
make _fingerprint_equal normalize sequence types (e.g., convert tuples to lists
or use deep structural equality) to be robust to future tuple/list differences.

In `@services/workspace_db.py`:
- Around line 294-299: COMPOSER_ROWS_WITH_HEADERS_SQL's NOT LIKE predicate only
excludes the compact empty-array form and should also exclude the spaced JSON
form; update the SQL string constant COMPOSER_ROWS_WITH_HEADERS_SQL so the
predicate excludes both "\"fullConversationHeadersOnly\":[]" and
"\"fullConversationHeadersOnly\": []" (e.g. add an additional AND value NOT LIKE
'%fullConversationHeadersOnly\": []%' or normalize/remove whitespace in the
filter) to ensure rows with empty headers are filtered out.
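
A small in-memory reproduction of this finding, assuming a hypothetical `kv` table with `key` and `value` columns: a single NOT LIKE predicate on the compact form lets the spaced empty array through, while adding the second predicate filters both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kv (key TEXT PRIMARY KEY, value TEXT)")  # hypothetical table
conn.executemany("INSERT INTO kv VALUES (?, ?)", [
    ("composerData:compact", '{"fullConversationHeadersOnly":[]}'),
    ("composerData:spaced", '{"fullConversationHeadersOnly": []}'),
    ("composerData:full", '{"fullConversationHeadersOnly":[{"bubbleId":"b1"}]}'),
])

COMPACT_ONLY = """
    SELECT key FROM kv
    WHERE key LIKE 'composerData:%'
      AND value NOT LIKE '%"fullConversationHeadersOnly":[]%'
    ORDER BY key
"""

BOTH_FORMS = """
    SELECT key FROM kv
    WHERE key LIKE 'composerData:%'
      AND value NOT LIKE '%"fullConversationHeadersOnly":[]%'
      AND value NOT LIKE '%"fullConversationHeadersOnly": []%'
    ORDER BY key
"""


def keys(sql):
    """Run a query and return the matching keys as a list."""
    return [row[0] for row in conn.execute(sql)]


compact_only_keys = keys(COMPACT_ONLY)  # still includes composerData:spaced
both_forms_keys = keys(BOTH_FORMS)      # only composerData:full remains
```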

In `@tests/test_workspace_listing_performance.py`:
- Around line 88-109: The test's spy patch is targeting
services.workspace_db.open_global_db but list_workspace_projects calls the
module-local open_global_db imported into services.workspace_listing, so update
the patch to target services.workspace_listing.open_global_db (e.g. import
services.workspace_listing as _ws_listing_mod and use
patch.object(_ws_listing_mod, "open_global_db", _spying_open_global_db)) so the
spy (_spying_open_global_db) actually intercepts calls used by
list_workspace_projects; also ensure executed_queries is asserted non-empty
before filtering for "bubbleId:%" to avoid vacuous success (refer to symbols
_spying_open_global_db, executed_queries, list_workspace_projects,
open_global_db, and patch.object).
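
The patch-target issue follows from how `from module import name` works: the importing module gets its own reference, so patching the defining module leaves that reference untouched. The sketch below demonstrates this with two throwaway modules, `demo_db` and `demo_listing`, which are hypothetical stand-ins for services.workspace_db and services.workspace_listing.

```python
import sys
import types
from unittest.mock import patch

# Module that defines the function (stand-in for services.workspace_db).
db = types.ModuleType("demo_db")
exec("def open_global_db():\n    return 'real'\n", db.__dict__)
sys.modules["demo_db"] = db

# Module that imports the name directly (stand-in for services.workspace_listing).
listing = types.ModuleType("demo_listing")
exec(
    "from demo_db import open_global_db\n"
    "def list_projects():\n"
    "    return open_global_db()\n",
    listing.__dict__,
)
sys.modules["demo_listing"] = listing


def run_with_patch(target_module):
    """Patch open_global_db on the given module, then call the listing function."""
    with patch.object(target_module, "open_global_db", lambda: "spy"):
        return listing.list_projects()


patched_definer = run_with_patch(db)       # 'real': the spy never runs
patched_consumer = run_with_patch(listing)  # 'spy': the lookup site was patched
```

Patching the module where the name is looked up, rather than where it is defined, is what makes the spy intercept the call.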

In `@tests/test_workspace_tabs_summary.py`:
- Around line 111-133: The SQL spy wrapper in _collect_queries is patching
services.workspace_db.open_global_db but workspace_tabs imports open_global_db
directly, so change the patch target to services.workspace_tabs.open_global_db
in _collect_queries to ensure the wrapper runs; additionally harden the tests to
assert that the captured executed list is non-empty (fail the test if no SQL was
recorded) and update test_no_global_bubble_scan to call the exercised function
with nocache=True to force DB access; use the unique names _collect_queries,
executed, and test_no_global_bubble_scan/test_scoped_bubble_query_only to locate
the spots to modify.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: c8f448e2-81de-4319-86b7-18abdc0bb2c8

📥 Commits

Reviewing files that changed from the base of the PR and between 3bd7ce0 and 02fcb65.

📒 Files selected for processing (13)
  • CHANGELOG.md
  • api/workspaces.py
  • services/summary_cache.py
  • services/workspace_db.py
  • services/workspace_listing.py
  • services/workspace_tabs.py
  • static/css/style.css
  • templates/workspace.html
  • tests/test_parse_warnings.py
  • tests/test_project_layouts_dict_shape.py
  • tests/test_summary_cache.py
  • tests/test_workspace_listing_performance.py
  • tests/test_workspace_tabs_summary.py

Comment thread services/summary_cache.py Outdated
Comment thread services/workspace_db.py
Comment thread tests/test_workspace_listing_performance.py
Comment thread tests/test_workspace_tabs_summary.py
@bradjin8 bradjin8 requested a review from clean6378-max-it June 2, 2026 21:00
@clean6378-max-it
Collaborator

Should fix

services/workspace_tabs.py:693 — In assemble_single_tab, replace load_project_layouts_map(global_db) with load_project_layouts_for_composer(global_db, composer_id) (and a minimal map {composer_id: …} for determine_project_for_conversation). (Per-tab load still full-scans all messageRequestContext:% rows, contradicting the scoped-load design and CHANGELOG wording for the single-tab endpoint.)

services/workspace_tabs.py:696-701 — Gate the broad composerData:% alias query behind invalid_workspace_ids (same pattern as list_workspace_tab_summaries / listing), or scope it to the one composer. (Every tab open re-scans all composers for alias resolution even when no invalid workspaces exist.)

api/workspaces.py:167-192 — Add Flask test-client coverage for GET /api/workspaces//tabs?summary=1 and GET /api/workspaces//tabs/<composer_id> (status, shape, no bubbles on summary). (New user-facing routes are only exercised via direct service calls; aligns with eval verification void.)

Nice to have

services/workspace_tabs.py:594 — messageCount uses len(headers) while _assemble_tab_from_composer_data may omit empty bubbles; consider counting renderable messages or documenting the difference. (Sidebar count can disagree with opened conversation.)

templates/workspace.html:113 — Prefer data-id + addEventListener over onclick="selectTab('${tab.id}')". (Avoids breakage if IDs ever contain quotes; minor hardening.)

services/workspace_listing.py:167-224, services/workspace_tabs.py:531-606 — Shared helper for composer-row → project assignment loop. (Reduces drift between list, summary, and full assembly under monolith-duplication.)

Development

Successfully merging this pull request may close these issues.

Lazy-load chat histories and serve lightweight list summaries (avoid full global index scan on every page load)

2 participants