feat(agent): grep-based completeness sweep for the query agent#84

Open
KylinMountain wants to merge 8 commits into main from feat/grep-wiki-search

@KylinMountain
Collaborator

Summary

Adds a lexical grep capability to the query/chat agent as a final completeness check before it finalizes an answer — backstopping the lossy LLM-generated summary layer. Inspired by "Is Grep All You Need? How Agent Harnesses Reshape Agentic Search" (arXiv:2605.15184): at OpenKB's scale a capable agent + grep matches embedding/vector retrieval, with zero index infrastructure.

Division of labour stays clean:

  • index.md / summaries — primary breadth (unchanged, still first).
  • PageIndex — depth within long documents (unchanged).
  • grep (new) — completeness sweep over the raw *.md layer to catch details summaries dropped, pages never read, and contradicting mentions.

What changed

  1. grep_wiki_files() helper (openkb/agent/tools.py) — shells out to ripgrep (preferred) / grep (fallback) over every wiki *.md (summaries, concepts, entities, explorations, index.md, short-doc sources/*.md); excludes long-doc *.json (PageIndex's domain) and log.md. Injection-safe (shell=False, pattern via -e), result-capped at 50, --no-ignore so gitignored wiki dirs still match, never raises.
  2. grep_wiki agent tool (openkb/agent/query.py) — wraps the helper on the query agent; chat agent inherits it via clone.
  3. "COMPLETENESS SWEEP" instruction — tells the agent to grep the question's salient terms + unsourced draft claims before finalizing, try lexical variants, fold in unread matches, note contradictions with citations, bounded to ≤3 rounds; explicitly a check, not the primary search.
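
The helper described in item 1 can be approximated as follows. This is a minimal sketch, not the code in openkb/agent/tools.py: the names `build_grep_command` and `grep_wiki_files_sketch`, the parameters, the 30-second timeout, and the wording of the returned messages are all assumptions made for illustration. Only the behaviours named above (ripgrep preferred with grep as fallback, `shell=False`, pattern passed via `-e`, `--no-ignore`, `*.md` scope without `log.md`, a cap of 50, never raising) come from the description.

```python
import shutil
import subprocess
from pathlib import Path

MAX_RESULTS = 50  # result cap stated in the PR description


def build_grep_command(pattern, use_ripgrep, fixed_string=False, ignore_case=True):
    """Build an argv list (hypothetical helper, not the PR's code).

    The pattern is always its own argv element right after -e, so it is
    never parsed as a flag and never seen by a shell.
    """
    if use_ripgrep:
        cmd = ["rg", "--no-ignore", "--line-number", "--no-heading",
               "--color", "never", "--glob", "*.md", "--glob", "!log.md"]
    else:
        cmd = ["grep", "-r", "-n", "-I", "--include=*.md", "--exclude=log.md"]
    if fixed_string:
        cmd.append("-F")   # literal match
    elif not use_ripgrep:
        cmd.append("-E")   # extended regex for grep; rg is regex by default
    if ignore_case:
        cmd.append("-i")
    cmd += ["-e", pattern, "."]
    return cmd


def grep_wiki_files_sketch(wiki_dir, pattern, fixed_string=False, ignore_case=True):
    """Search *.md files under wiki_dir and always return a string."""
    root = Path(wiki_dir)
    if not pattern or not root.is_dir():
        return "No matches (empty pattern or missing wiki directory)."
    if shutil.which("rg"):
        use_ripgrep = True
    elif shutil.which("grep"):
        use_ripgrep = False
    else:
        return "grep unavailable: neither rg nor grep found on PATH."
    cmd = build_grep_command(pattern, use_ripgrep, fixed_string, ignore_case)
    try:
        # cwd=root keeps the reported paths relative to the wiki directory.
        proc = subprocess.run(cmd, cwd=root, capture_output=True, text=True,
                              errors="replace", timeout=30, shell=False)
    except (OSError, subprocess.SubprocessError) as exc:
        return f"grep failed: {exc}"
    if proc.returncode not in (0, 1):  # 1 means "no matches" for both tools
        return f"grep error: {proc.stderr.strip()}"
    lines = proc.stdout.splitlines()
    if not lines:
        return "No matches."
    shown = lines[:MAX_RESULTS]
    if len(lines) > MAX_RESULTS:
        shown.append(f"... truncated: showing {MAX_RESULTS} of {len(lines)} matches")
    return "\n".join(shown)
```

Returning a message instead of raising keeps a failed search from aborting the agent's turn; the agent simply sees text saying the sweep found nothing or could not run.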

Test Plan

  • pytest tests/test_grep.py tests/test_query.py → 26 passed
  • Helper unit tests: scope inclusion/exclusion (json + log.md excluded), case sensitivity, fixed-string vs regex, result cap + truncation notice, relative-path output, shell-injection safety, and the ripgrep branch (mock-verified command construction incl. --no-ignore)
  • Agent wiring: grep_wiki registered on query agent; chat inherits via clone
  • Manual: run openkb chat against a KB and confirm the agent greps before answering
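
For readers unfamiliar with the "mock-verified command construction" item, the sketch below shows one way such a test could look. It is illustrative only: `run_search` is a hypothetical stand-in defined inline, not the helper from this PR, and the actual tests in tests/test_grep.py may be structured differently.

```python
import shutil
import subprocess
from unittest import mock


def run_search(pattern):
    """Hypothetical stand-in for the helper under test (not the PR's code)."""
    if shutil.which("rg"):
        cmd = ["rg", "--no-ignore", "-e", pattern, "."]
    else:
        cmd = ["grep", "-r", "-e", pattern, "."]
    return subprocess.run(cmd, capture_output=True, text=True, shell=False)


def test_ripgrep_branch_builds_safe_command():
    hostile = "foo; rm -rf ~"
    fake_result = subprocess.CompletedProcess(args=[], returncode=1,
                                              stdout="", stderr="")
    # Pretend rg is installed and intercept the subprocess call, so the
    # test inspects the command without running anything.
    with mock.patch("shutil.which", return_value="/usr/bin/rg"), \
         mock.patch("subprocess.run", return_value=fake_result) as fake_run:
        run_search(hostile)
    args, kwargs = fake_run.call_args
    cmd = args[0]
    assert cmd[0] == "rg"
    assert "--no-ignore" in cmd
    # The hostile string must arrive as one argv element right after -e.
    assert cmd[cmd.index("-e") + 1] == hostile
    assert kwargs.get("shell") is False
```

Because the subprocess call is mocked, the test passes whether or not ripgrep is installed on the machine running it.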

Note: 4 unrelated tests/test_url_ingest.py failures are pre-existing (missing optional trafilatura dep), not introduced here.

Comment thread tests/test_grep.py Fixed
Comment thread tests/test_grep.py Fixed
Comment thread tests/test_grep.py

import pytest

import openkb.agent.tools as tools_mod