Lazy-load chat histories, fast list summaries, and mtime disk cache (closes #84)#88
Lazy-load chat histories, fast list summaries, and mtime disk cache (closes #84)#88bradjin8 wants to merge 5 commits into
Conversation
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: defaults Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (6)
🚧 Files skipped from review as they are similar to previous changes (6)
📝 WalkthroughWalkthroughAdds a fingerprinted disk-backed summary cache, composer-scoped KV loaders to avoid global bubble scans, refactors workspace listing and tab assembly to use cached composer↔workspace maps, exposes summary-only and per-composer tab endpoints, and updates the frontend to lazy-load full tabs on demand with tests preventing unscoped scans. ChangesLazy-load workspace tabs with summary cache and scoped loaders
Estimated code review effort 🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
templates/workspace.html (1)
124-175:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winPossible stale-render race when switching tabs quickly.
selectTabis now async and awaits a network fetch before callingrenderChat. If a user clicks conversation A (slow fetch) then conversation B, B may render first and A's later-resolving fetch will overwritemain-contentwith A's content while the sidebar shows B as active. Consider tracking the latest requested id and bailing out of stale resolutions.🛠️ Suggested guard
async function selectTab(id) { const summary = allTabs.find(t => t.id === id); if (!summary) return; + selectTab._latest = id;Then after each
await, before mutatingselectedTab/main-content:if (selectTab._latest !== id) return; // a newer selection superseded this one🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@templates/workspace.html` around lines 124 - 175, selectTab can race when multiple async fetches complete out-of-order; set a per-call marker (e.g. assign selectTab._latest = id at the start of selectTab) and after any await or before mutating shared state (tabCache, selectedTab, main-content, calling renderChat) check if selectTab._latest !== id and if so return early to avoid overwriting a newer tab; apply these guards around the fetch/try-catch resolution and any other async points so only the most-recent selection updates the UI.
🧹 Nitpick comments (1)
tests/test_summary_cache.py (1)
37-66: ⚡ Quick winAdd a round-trip test with a realistic fingerprint.
Current tests only exercise scalar-only fingerprints, so they don't catch the tuple→list serialization mismatch in
fingerprint_workspace_storage(see theservices/summary_cache.pyfinding). A test that callsfingerprint_workspace_storage(soworkspace_filesis populated), thenset_cached_projects/get_cached_projectswith that fingerprint, would guard against the cache silently missing on every repeat load.💚 Suggested regression test
def test_cache_hit_with_workspace_files_fingerprint(self): with tempfile.TemporaryDirectory() as ws: entry_dir = os.path.join(ws, "entry1") os.makedirs(entry_dir) with open(os.path.join(entry_dir, "state.vscdb"), "wb") as f: f.write(b"x") entries = [{"name": "entry1"}] fp = fingerprint_workspace_storage(ws, entries, global_db_path=None, rules=[]) set_cached_projects(fp, [{"id": "a"}], []) # Recompute the fingerprint to mimic a fresh process/page load. fp2 = fingerprint_workspace_storage(ws, entries, global_db_path=None, rules=[]) self.assertIsNotNone(get_cached_projects(fp2))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/test_summary_cache.py` around lines 37 - 66, Add a regression test that round-trips a realistic fingerprint containing workspace_files to catch the tuple→list serialization mismatch: use fingerprint_workspace_storage to create fp (with a temp workspace and an entry that creates a state.vscdb file), call set_cached_projects(fp, projects, warnings), then recompute fp2 = fingerprint_workspace_storage(...) to mimic a fresh process and assert get_cached_projects(fp2) is not None; this ensures set_cached_projects/get_cached_projects handle the workspace_files shape consistently (see fingerprint_workspace_storage, set_cached_projects, get_cached_projects, and services/summary_cache.py).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@services/summary_cache.py`:
- Around line 64-88: The fingerprint uses Python tuples in workspace_files but
JSON serialization converts tuples to lists, so cached fingerprints (lists)
never equal freshly computed fingerprints (tuples); change the construction of
workspace_files in the fingerprint generator to emit JSON-stable lists instead
of tuples (i.e., append [f"{name}/{rel}", mtime] rather than (f"{name}/{rel}",
mtime)) so set_cached_* and get_cached_* compare identical types, and optionally
make _fingerprint_equal normalize sequence types (e.g., convert tuples to lists
or use deep structural equality) to be robust to future tuple/list differences.
In `@services/workspace_db.py`:
- Around line 294-299: COMPOSER_ROWS_WITH_HEADERS_SQL's NOT LIKE predicate only
excludes the compact empty-array form and should also exclude the spaced JSON
form; update the SQL string constant COMPOSER_ROWS_WITH_HEADERS_SQL so the
predicate excludes both "\"fullConversationHeadersOnly\":[]" and
"\"fullConversationHeadersOnly\": []" (e.g. add an additional AND value NOT LIKE
'%fullConversationHeadersOnly\": []%' or normalize/remove whitespace in the
filter) to ensure rows with empty headers are filtered out.
In `@tests/test_workspace_listing_performance.py`:
- Around line 88-109: The test's spy patch is targeting
services.workspace_db.open_global_db but list_workspace_projects calls the
module-local open_global_db imported into services.workspace_listing, so update
the patch to target services.workspace_listing.open_global_db (e.g. import
services.workspace_listing as _ws_listing_mod and use
patch.object(_ws_listing_mod, "open_global_db", _spying_open_global_db)) so the
spy (_spying_open_global_db) actually intercepts calls used by
list_workspace_projects; also ensure executed_queries is asserted non-empty
before filtering for "bubbleId:%" to avoid vacuous success (refer to symbols
_spying_open_global_db, executed_queries, list_workspace_projects,
open_global_db, and patch.object).
In `@tests/test_workspace_tabs_summary.py`:
- Around line 111-133: The SQL spy wrapper in _collect_queries is patching
services.workspace_db.open_global_db but workspace_tabs imports open_global_db
directly, so change the patch target to services.workspace_tabs.open_global_db
in _collect_queries to ensure the wrapper runs; additionally harden the tests to
assert that the captured executed list is non-empty (fail the test if no SQL was
recorded) and update test_no_global_bubble_scan to call the exercised function
with nocache=True to force DB access; use the unique names _collect_queries,
executed, and test_no_global_bubble_scan/test_scoped_bubble_query_only to locate
the spots to modify.
---
Outside diff comments:
In `@templates/workspace.html`:
- Around line 124-175: selectTab can race when multiple async fetches complete
out-of-order; set a per-call marker (e.g. assign selectTab._latest = id at the
start of selectTab) and after any await or before mutating shared state
(tabCache, selectedTab, main-content, calling renderChat) check if
selectTab._latest !== id and if so return early to avoid overwriting a newer
tab; apply these guards around the fetch/try-catch resolution and any other
async points so only the most-recent selection updates the UI.
---
Nitpick comments:
In `@tests/test_summary_cache.py`:
- Around line 37-66: Add a regression test that round-trips a realistic
fingerprint containing workspace_files to catch the tuple→list serialization
mismatch: use fingerprint_workspace_storage to create fp (with a temp workspace
and an entry that creates a state.vscdb file), call set_cached_projects(fp,
projects, warnings), then recompute fp2 = fingerprint_workspace_storage(...) to
mimic a fresh process and assert get_cached_projects(fp2) is not None; this
ensures set_cached_projects/get_cached_projects handle the workspace_files shape
consistently (see fingerprint_workspace_storage, set_cached_projects,
get_cached_projects, and services/summary_cache.py).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: c8f448e2-81de-4319-86b7-18abdc0bb2c8
📒 Files selected for processing (13)
CHANGELOG.mdapi/workspaces.pyservices/summary_cache.pyservices/workspace_db.pyservices/workspace_listing.pyservices/workspace_tabs.pystatic/css/style.csstemplates/workspace.htmltests/test_parse_warnings.pytests/test_project_layouts_dict_shape.pytests/test_summary_cache.pytests/test_workspace_listing_performance.pytests/test_workspace_tabs_summary.py
Should fixservices/workspace_tabs.py:693 — In assemble_single_tab, replace load_project_layouts_map(global_db) with load_project_layouts_for_composer(global_db, composer_id) (and a minimal map {composer_id: …} for determine_project_for_conversation). (Per-tab load still full-scans all messageRequestContext:% rows, contradicting the scoped-load design and CHANGELOG wording for the single-tab endpoint.) services/workspace_tabs.py:696-701 — Gate the broad composerData:% alias query behind invalid_workspace_ids (same pattern as list_workspace_tab_summaries / listing), or scope it to the one composer. (Every tab open re-scans all composers for alias resolution even when no invalid workspaces exist.) api/workspaces.py:167-192 — Add Flask test-client coverage for GET /api/workspaces//tabs?summary=1 and GET /api/workspaces//tabs/<composer_id> (status, shape, no bubbles on summary). (New user-facing routes are only exercised via direct service calls; aligns with eval verification void.) Nice to haveservices/workspace_tabs.py:594 — messageCount uses len(headers) while _assemble_tab_from_composer_data may omit empty bubbles; consider counting renderable messages or documenting the difference. (Sidebar count can disagree with opened conversation.) templates/workspace.html:113 — Prefer data-id + addEventListener over onclick="selectTab('${tab.id}')". (Avoids breakage if IDs ever contain quotes; minor hardening.) services/workspace_listing.py:167-224, services/workspace_tabs.py:531-606 — Shared helper for composer-row → project assignment loop. (Reduces drift between list, summary, and full assembly under monolith-duplication.) |
Summary
Implements issue #84: split summary and full-assembly code paths so list views avoid scanning global
bubbleId:%rows, the workspace UI lazy-loads conversation bodies on tab selection, and repeat page loads are served from an mtime-keyed disk cache.GET /api/workspaces) — no global bubble scan; SQL-filteredcomposerData; per-composer MRC only when needed; cachedcomposer_id_to_wsand project listGET /api/workspaces/<id>/tabs?summary=1returns titles/metadata only; full payloads viaGET /api/workspaces/<id>/tabs/<composer_id>~/.cache/cursor-chat-browser/; invalidate on storage mtimes; bypass with?nocache=1orCURSOR_CHAT_BROWSER_NOCACHE=1/tabs, export, and search — unchanged; monolithic/tabsstill available for backward compatibilityProblem
On large local Cursor datasets, the home page and workspace page each rescanned global KV storage on every load (often 1–2+ minutes). The UI blocked on full tab assembly before the sidebar could render.
Solution
Backend
services/workspace_listing.pyload_bubble_map; SQL filter for non-empty headers; lazyload_project_layouts_for_composer; disk cache wrapperservices/workspace_tabs.pylist_workspace_tab_summaries,assemble_single_tab, shared_assemble_tab_from_composer_data; scoped bubble/MRC/diff loadersservices/workspace_db.pybuild_composer_id_to_workspace_id_cachedservices/summary_cache.pyapi/workspaces.py?summary=1,/tabs/<composer_id>,?nocache=1Frontend
templates/workspace.html— fetch summary tabs on load; lazy-fetch full tab on selection;tabCachefor loaded conversationsstatic/css/style.css— stable grid width during load;.main-content.is-loadingfor centered spinnerTests (356 pass)
tests/test_workspace_listing_performance.py— spy: no globalbubbleId:%on list pathtests/test_workspace_tabs_summary.py— summary/single-tab scoped queries and payload shapetests/test_summary_cache.py— cache hit/miss and fingerprint invalidationPerformance notes
GET /api/workspaces?nocache=1Document representative row counts and timings from your local fixture in review comments if helpful.
Test plan
pytest— all tests pass locallyGET /api/workspaces/<id>/tabs(nosummary=1) still work?nocache=1bypasses cache; cache files appear under~/.cache/cursor-chat-browser/Closes #84
Summary by CodeRabbit
New Features
Performance
Style
Documentation
Tests
Bug Fixes