Add typed raw accessors with drift logging; migrate workspace tabs and resolver by bradjin8 · Pull Request #94 · cppalliance/cppa-cursor-browser

bradjin8 · 2026-06-03T22:48:01Z

Summary

Introduce models/raw_access.py with shared optional-key readers and schema-drift WARNING logging when structurally important keys are missing or mistyped (projectLayouts, bubbleId in headers, etc.).
Add typed accessors on Composer and Bubble (newly_created_files, relevant_files, usage_data, bubble_timestamp_ms(), and related fields) so services can use model fields instead of silent .get() chains after from_dict validation.
Migrate services/workspace_tabs.py and services/workspace_resolver.py to pass Composer / Bubble through the hot path (bubble_map stores Bubble instances; determine_project_for_conversation accepts Composer | dict).
Add tests/test_raw_accessors.py for drift logging and bridge helpers.

Closes #90

Test plan

python -m pytest tests/test_raw_accessors.py -v
python -m pytest tests/test_workspace_assignment_fallback.py tests/test_models.py -v
python -m pytest tests/ -q (371 passed)
python -m mypy models services/workspace_tabs.py services/workspace_resolver.py
Manual: open workspace tabs in the UI for a workspace with real Cursor data; confirm tabs load and logs do not spam warnings for normal composers missing optional keys (newlyCreatedFiles, relevantFiles)

Notes

Commonly absent optional fields return empty defaults without a missing-key warning; wrong types still log drift.
Follow-up (out of scope here): migrate remaining .get() usage in workspace_listing.py, api/search.py, and scripts/export.py.

Summary by CodeRabbit

New Features
- Conversations and bubbles now expose richer usage/metrics, file-change rollups, model info, and more reliable timestamps; workspace tabs and project inference assemble from typed conversation/bubble objects for clearer displays.
- Text extraction accepts broader bubble inputs for more reliable content display.
Bug Fixes
- Fewer crashes and incorrect displays when conversation JSON is malformed; safer defaults and schema-drift warnings replace errors.
Tests
- Added tests covering typed raw accessors and schema-drift logging.

coderabbitai · 2026-06-03T22:48:12Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: f8e10e61-e26b-4bbb-81c3-d3eb936af80c

📥 Commits

Reviewing files that changed from the base of the PR and between 768c31d and 838ff0f.

📒 Files selected for processing (1)

services/workspace_tabs.py

📝 Walkthrough

Walkthrough

Adds defensive, raw-backed properties to Composer and Bubble, a new models/raw_access module with optional-field readers and schema-drift warnings, refactors workspace resolver/tabs to use these typed accessors and model objects, updates text extraction to accept model objects, and adds tests for drift-logging behavior.

Changes

Typed Raw Accessor Layer

Layer / File(s)	Summary
Composer and Bubble typed properties `models/conversation.py`	`Composer` and `Bubble` dataclasses gain typed, raw-backed properties and helpers that validate shapes, log schema-drift warnings on mismatches, and return safe defaults.
Raw access helper module with generic and domain-specific extractors `models/raw_access.py`	New module with generic optional readers (`optional_raw_value`, `optional_raw_list`, `optional_raw_dict`, `optional_raw_str`, `optional_raw_number`), missing-key warning helper, domain extractors (`conversation_header_bubble_id`, `message_request_context_project_layouts`), and polymorphic composer/bubble accessors.
Workspace resolver refactor to use typed accessors `services/workspace_resolver.py`	Replaces direct nested JSON reads with raw_access helpers (project layouts, newlyCreatedFiles, codeBlockData, headers, bubble-derived file/URI candidates); `determine_project_for_conversation` accepts `Composer
Workspace tabs refactor to use typed Bubble and Composer `services/workspace_tabs.py`	Stores `Bubble` instances in `bubble_map`, reads bubble/composer fields via model properties/methods (`bubble_timestamp_ms()`, `model_info`, `usage_data`, `total_*`, etc.), and extracts messageRequestContext layouts via `message_request_context_project_layouts`.
Schema-drift test coverage and text extraction update `tests/test_raw_accessors.py`, `utils/text_extract.py`, `tests/test_blob_parsing_fuzz.py`	New tests exercise drift-warning logging and safe defaults; `extract_text_from_bubble` now accepts dicts or objects with a `raw` attribute and unwraps accordingly; small Hypothesis adjustment.

Sequence Diagram(s)

sequenceDiagram
  participant Client as service caller
  participant Tabs as services.workspace_tabs
  participant Resolver as services.workspace_resolver
  participant Models as models.raw_access / models.conversation

  Client->>Tabs: request tabs assembly (composer raw rows)
  Tabs->>Models: Composer.from_dict / Composer properties (usage_data, model_name_from_config)
  Tabs->>Models: conversation_header_bubble_id(header)
  Tabs->>Resolver: determine_project_for_conversation(composer)
  Resolver->>Models: composer_newly_created_files / composer_code_block_data / bubble helpers
  Models-->>Resolver: validated lists/dicts or safe defaults
  Resolver-->>Tabs: resolved project/workspace
  Tabs-->>Client: assembled tab payloads

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

cppalliance/cppa-cursor-browser#83: Touches the workspace composition pipeline and determine_project_for_conversation.
cppalliance/cppa-cursor-browser#30: Related upstream typed Composer/Bubble parsing that these accessors build upon.
cppalliance/cppa-cursor-browser#31: Refactors workspace inference/tab assembly logic overlapping this area.

Suggested reviewers

clean6378-max-it
timon0305
wpak-ai

Poem

🐰 I dug through raw dict fields at dawn,
Found keys that slipped and types withdrawn.
I logged a whisper when the schema swayed,
Returned safe defaults where values frayed.
Hops of joy — the rabbit’s patch is calm.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and concisely summarizes the main changes: adding typed raw accessors with drift logging and migrating workspace tabs/resolver to use them.
Linked Issues check	✅ Passed	The PR successfully addresses all acceptance criteria from issue `#90`: implements typed accessors for top raw dict keys, adds schema-drift WARNING logging, includes type annotations for mypy verification, migrates workspace_tabs.py and workspace_resolver.py, and provides comprehensive test coverage.
Out of Scope Changes check	✅ Passed	All changes are directly aligned with issue `#90` objectives: models/raw_access.py provides the accessor infrastructure, models/conversation.py adds typed properties, service migrations use these accessors, tests validate drift-logging, and minor updates to text_extract.py and test_blob_parsing_fuzz.py support the core functionality.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/typed-accessor-methods

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (3)

tests/test_raw_accessors.py (1)

26-30: ⚡ Quick win

Make the “missing key” setup explicit.

This test currently relies on GOOD_COMPOSER_RAW not containing newlyCreatedFiles. To keep the test stable and self-descriptive, remove the key explicitly in-test before calling Composer.from_dict(...).

Suggested patch

-        bare = Composer.from_dict(GOOD_COMPOSER_RAW, composer_id="cid-2")
+        raw = dict(GOOD_COMPOSER_RAW)
+        raw.pop("newlyCreatedFiles", None)
+        bare = Composer.from_dict(raw, composer_id="cid-2")

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_raw_accessors.py` around lines 26 - 30, The test
test_composer_newly_created_files_empty_when_key_missing should explicitly
remove the 'newlyCreatedFiles' key from the input before calling
Composer.from_dict to avoid relying on the global GOOD_COMPOSER_RAW shape;
create a shallow or deep copy of GOOD_COMPOSER_RAW, pop the 'newlyCreatedFiles'
key if present on that copy, then pass the modified dict into
Composer.from_dict(..., composer_id="cid-2") and keep the existing with
self.assertNoLogs and self.assertEqual(bare.newly_created_files, []). Ensure you
do not mutate the global GOOD_COMPOSER_RAW variable.

utils/text_extract.py (1)

24-27: ⚡ Quick win

Typing precision: bubble: dict | object is effectively just object for static type checking (since dict is a subtype of object), so it loses most type value; narrow it to the specific “has raw” shape (e.g., a Protocol) to keep mypy/pyright helpful.

Suggested patch

+from typing import Any, Protocol
+
+class HasRaw(Protocol):
+    raw: dict[str, Any]
+
-def extract_text_from_bubble(bubble: dict | object) -> str:
+def extract_text_from_bubble(bubble: dict[str, Any] | HasRaw) -> str:
     """Extract displayable text from a bubble object (text, richText, codeBlocks)."""
-    if hasattr(bubble, "raw"):
-        bubble = bubble.raw  # type: ignore[union-attr]
+    if hasattr(bubble, "raw"):
+        bubble = bubble.raw

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@utils/text_extract.py` around lines 24 - 27, The parameter typing for
extract_text_from_bubble is too broad (dict | object) so static checkers lose
precision; define a small Protocol (e.g., HasRaw with attribute raw: dict) or a
TypedDict for the expected bubble shape and change the function signature to use
that Protocol/TypedDict (e.g., bubble: HasRaw | dict) so mypy/pyright can infer
attributes like raw and the contained fields used later in
extract_text_from_bubble; update any imports and type comments to keep
compatibility with Python versions in the project.

models/conversation.py (1)

3-9: 💤 Low value

Move _logger below the imports to avoid E402.

The _logger = logging.getLogger(__name__) assignment at Line 7 sits between from typing import Any and the from models.errors import SchemaError block, so the latter is a module-level import that no longer appears at the top of the file. Most linters (flake8/ruff E402) will flag this.

♻️ Proposed reordering

 import logging
 from dataclasses import dataclass, field
 from typing import Any
 
-_logger = logging.getLogger(__name__)
-
 from models.errors import SchemaError
 from models.from_dict_validation import (
     require_dict,
     require_key,
     require_non_empty_str,
     require_non_empty_str_field,
     require_type,
 )
+
+_logger = logging.getLogger(__name__)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@models/conversation.py` around lines 3 - 9, _the module-level logger
`_logger` is defined before the final import `from models.errors import
SchemaError`, causing an import-after-code lint error (E402); move the `_logger
= logging.getLogger(__name__)` line so it appears after all import statements
(including `from models.errors import SchemaError`) — locate `_logger` and the
import `SchemaError` in this file and reorder so all imports come first, then
initialize `_logger`.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@models/raw_access.py`:
- Around line 158-190: composer_headers currently calls optional_raw_list which
emits a missing-key WARNING, while composer_newly_created_files silently returns
[] when the key is absent; align them to the silent-on-missing contract by
changing composer_headers to mirror composer_newly_created_files: if data is a
Composer return data.full_conversation_headers_only, otherwise read value =
data.get("fullConversationHeadersOnly") when data is a dict, return [] if value
is None, warn only if value exists but is not a list (similar warning message
pattern used in composer_newly_created_files), and avoid relying on
optional_raw_list to emit missing-key warnings.
- Around line 203-242: The three accessors bubble_relevant_files,
bubble_attached_file_uris, and bubble_context silently drop values of the wrong
type instead of logging drift; change their dict-path handling to call the
existing helpers optional_raw_list (for keys "relevantFiles" and
"attachedFileCodeChunksUris") and optional_raw_dict (for key "context") rather
than manually checking types, so type-mismatches are logged consistently (remove
the duplicated extraction logic and return the helper results when present).

---

Nitpick comments:
In `@models/conversation.py`:
- Around line 3-9: _the module-level logger `_logger` is defined before the
final import `from models.errors import SchemaError`, causing an
import-after-code lint error (E402); move the `_logger =
logging.getLogger(__name__)` line so it appears after all import statements
(including `from models.errors import SchemaError`) — locate `_logger` and the
import `SchemaError` in this file and reorder so all imports come first, then
initialize `_logger`.

In `@tests/test_raw_accessors.py`:
- Around line 26-30: The test
test_composer_newly_created_files_empty_when_key_missing should explicitly
remove the 'newlyCreatedFiles' key from the input before calling
Composer.from_dict to avoid relying on the global GOOD_COMPOSER_RAW shape;
create a shallow or deep copy of GOOD_COMPOSER_RAW, pop the 'newlyCreatedFiles'
key if present on that copy, then pass the modified dict into
Composer.from_dict(..., composer_id="cid-2") and keep the existing with
self.assertNoLogs and self.assertEqual(bare.newly_created_files, []). Ensure you
do not mutate the global GOOD_COMPOSER_RAW variable.

In `@utils/text_extract.py`:
- Around line 24-27: The parameter typing for extract_text_from_bubble is too
broad (dict | object) so static checkers lose precision; define a small Protocol
(e.g., HasRaw with attribute raw: dict) or a TypedDict for the expected bubble
shape and change the function signature to use that Protocol/TypedDict (e.g.,
bubble: HasRaw | dict) so mypy/pyright can infer attributes like raw and the
contained fields used later in extract_text_from_bubble; update any imports and
type comments to keep compatibility with Python versions in the project.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 563cb83f-22bd-4ffb-a6d0-7d58c209538a

📥 Commits

Reviewing files that changed from the base of the PR and between 2839571 and 1e98ae3.

📒 Files selected for processing (6)

models/conversation.py
models/raw_access.py
services/workspace_resolver.py
services/workspace_tabs.py
tests/test_raw_accessors.py
utils/text_extract.py

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)

services/workspace_tabs.py (2)

136-143: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Don't treat an empty raw bubble payload as "missing".

bubble_map.get() returns None only when the id is absent. With the new Bubble | dict bridge, if not bubble_entry also drops {} payloads before Bubble.from_dict(...) runs, so context-only bubbles can disappear silently in the single-tab path.

Proposed fix

         bubble_id = conversation_header_bubble_id(header, composer_id=composer_id)
         if not bubble_id:
             continue
         bubble_entry = bubble_map.get(bubble_id)
-        if not bubble_entry:
+        if bubble_entry is None:
             continue
         if isinstance(bubble_entry, Bubble):
             bubble = bubble_entry
         else:

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/workspace_tabs.py` around lines 136 - 143, The code is incorrectly
treating empty dict payloads as missing because it uses a falsy check; change
the presence check to a strict None test so empty mappings are preserved —
replace "if not bubble_entry" with "if bubble_entry is None" (or alternately
check "if bubble_id not in bubble_map") before the isinstance/Bubble.from_dict
branch to ensure {} payloads are passed into Bubble.from_dict rather than
skipped.

573-617: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Validate summary rows with Composer.from_dict() too.

This path still bypasses the typed composer contract and reads cd.get(...) directly. A drifted row can therefore appear in list_workspace_tab_summaries() even though assemble_single_tab() and assemble_workspace_tabs() reject the same composer via Composer.from_dict(), which leaves the sidebar and detail endpoints disagreeing on what a valid conversation is.

Proposed fix

             if not isinstance(cd, dict):
                 parse_warnings.record_composer_skipped()
                 continue
             try:
+                composer = Composer.from_dict(cd, composer_id=composer_id)
                 if (
                     composer_id not in composer_id_to_ws
                     and composer_id not in project_layouts_map
                 ):
                     project_layouts_map[composer_id] = load_project_layouts_for_composer(
                         global_db, composer_id,
                     )
                 pid = determine_project_for_conversation(
-                    cd, composer_id, project_layouts_map,
+                    composer, composer_id, project_layouts_map,
                     project_name_map, workspace_path_map,
                     workspace_entries, {}, composer_id_to_ws, invalid_workspace_ids,
                 )
@@
-                headers = cd.get("fullConversationHeadersOnly") or []
+                headers = composer.full_conversation_headers_only
                 if not headers:
                     continue

-                title = cd.get("name") or f"Conversation {composer_id[:8]}"
+                title = composer.name or f"Conversation {composer_id[:8]}"

-                _early_model_config = cd.get("modelConfig") or {}
-                _early_model_name = _early_model_config.get("modelName")
+                _early_model_name = composer.model_name_from_config()
                 _early_model_names = [_early_model_name] if _early_model_name and _early_model_name != "default" else None
@@
                 tab_entry: dict = {
                     "id": composer_id,
                     "title": title,
-                    "timestamp": to_epoch_ms(cd.get("lastUpdatedAt")) or to_epoch_ms(cd.get("createdAt")) or int(datetime.now().timestamp() * 1000),
+                    "timestamp": to_epoch_ms(composer.last_updated_at) or to_epoch_ms(composer.created_at) or int(datetime.now().timestamp() * 1000),
                     "messageCount": len(headers),
                 }
+            except SchemaError as e:
+                _logger.warning(
+                    "Failed to parse Composer from composerData:%s: %s",
+                    composer_id,
+                    e,
+                )
+                parse_warnings.record_composer_skipped()
+                continue

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@services/workspace_tabs.py` around lines 573 - 617, The summary-row path in
list_workspace_tab_summaries currently reads raw composer dicts (cd) directly
and can include drifted rows that Composer.from_dict() would reject; update the
logic in list_workspace_tab_summaries to attempt constructing a typed Composer
via Composer.from_dict(cd) (same validation used by
assemble_single_tab()/assemble_workspace_tabs()), and if Composer.from_dict
raises/returns invalid, skip that summary row (continue) so the sidebar and
detail endpoints remain consistent; reference the composer dict variable cd and
functions Composer.from_dict, assemble_single_tab, assemble_workspace_tabs when
locating where to insert this validation.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@services/workspace_tabs.py`:
- Around line 136-143: The code is incorrectly treating empty dict payloads as
missing because it uses a falsy check; change the presence check to a strict
None test so empty mappings are preserved — replace "if not bubble_entry" with
"if bubble_entry is None" (or alternately check "if bubble_id not in
bubble_map") before the isinstance/Bubble.from_dict branch to ensure {} payloads
are passed into Bubble.from_dict rather than skipped.
- Around line 573-617: The summary-row path in list_workspace_tab_summaries
currently reads raw composer dicts (cd) directly and can include drifted rows
that Composer.from_dict() would reject; update the logic in
list_workspace_tab_summaries to attempt constructing a typed Composer via
Composer.from_dict(cd) (same validation used by
assemble_single_tab()/assemble_workspace_tabs()), and if Composer.from_dict
raises/returns invalid, skip that summary row (continue) so the sidebar and
detail endpoints remain consistent; reference the composer dict variable cd and
functions Composer.from_dict, assemble_single_tab, assemble_workspace_tabs when
locating where to insert this validation.

ℹ️ Review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d9b0fc91-65e8-46d2-955e-ee7f60388a4f

📥 Commits

Reviewing files that changed from the base of the PR and between 19eb9da and 8d45e95.

📒 Files selected for processing (2)

services/workspace_tabs.py
tests/test_blob_parsing_fuzz.py

bradjin8 self-assigned this Jun 3, 2026

coderabbitai Bot reviewed Jun 3, 2026

View reviewed changes

Comment thread models/raw_access.py

Comment thread models/raw_access.py

feat: initial implementation for typed accessor methods

19eb9da

bradjin8 force-pushed the feat/typed-accessor-methods branch from 1e98ae3 to 19eb9da Compare June 4, 2026 00:00

fix: typecheck fail error

8d45e95

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

bradjin8 added 2 commits June 3, 2026 21:44

fix: review comments by ai

768c31d

fix: outside diff range comments

838ff0f

bradjin8 requested a review from clean6378-max-it June 4, 2026 13:49

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add typed raw accessors with drift logging; migrate workspace tabs and resolver#94

Add typed raw accessors with drift logging; migrate workspace tabs and resolver#94
bradjin8 wants to merge 4 commits into
masterfrom
feat/typed-accessor-methods

bradjin8 commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bradjin8 commented Jun 3, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Notes

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Poem

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

bradjin8 commented Jun 3, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 3, 2026 •

edited

Loading