Skip to content

REFACTOR: consolidate content-filter markers into shared constants#1900

Merged
romanlutz merged 5 commits into
microsoft:mainfrom
romanlutz:romanlutz/standardize-catchable-errors
Jun 5, 2026
Merged

REFACTOR: consolidate content-filter markers into shared constants#1900
romanlutz merged 5 commits into
microsoft:mainfrom
romanlutz:romanlutz/standardize-catchable-errors

Conversation

@romanlutz
Copy link
Copy Markdown
Contributor

Description

Noticed while reviewing #1890 that _is_content_filter_error (and friends) have accumulated a sprawl of content-filter strings spread across multiple lists and branches: exact error.code matches in one list, substring fallbacks in another, a special-case invalid_prompt check with its own marker list, plus a hardcoded literal "Invalid prompt: your prompt was flagged as potentially violating our usage policy." in handle_bad_request_exception. Each new provider variant (most recently MAI's content_safety_violation) has required touching 2-3 of those spots, and it's easy to miss one.

These strings have no canonical spec; they're empirically observed from various OpenAI / Azure / MAI responses, where different providers expose the signal through different fields (error.code, finish_reason, incomplete_details.reason, free-form error.message) and the wording evolves. Given that, a single substring scan over the payload is both more honest about what we know and more resilient than pretending we have an enum.

This PR extracts two well-documented frozenset constants in pyrit/prompt_target/openai/openai_error_handling.py:

  • CONTENT_FILTER_MARKERS - substring markers scanned over the full payload. Covers content_filter, content_safety_violation, policy_violation (substring of Azure's content_policy_violation and OpenAI moderation's usage_policy_violation), and moderation_blocked.
  • SAFETY_MESSAGE_MARKERS - message-text markers used only when error.code == "invalid_prompt" (which is too generic on its own; it's also raised for schema validation errors). Now includes "usage policy" so OpenAI's "flagged as potentially violating our usage policy" prose is caught structurally.

_is_content_filter_error collapses to a single substring scan plus the invalid_prompt special case. The hardcoded literal in handle_bad_request_exception is dropped because the OpenAI flow now sets is_content_filter=True upstream via _extract_error_payload -> _is_content_filter_error -> the new "usage policy" marker; non-OpenAI callers (e.g. azure_ml_chat_target.py) still get the generic "content_filter" in response_text fallback.

Net effect: adding a new provider's content-filter string is now a one-line addition to CONTENT_FILTER_MARKERS, not a multi-file change like #1890 was.

Tests and Documentation

  • Expanded tests/unit/prompt_target/target/test_openai_error_handling.py:
    • Parametrized test asserting each exact error.code marker is detected.
    • Regression test for Azure's content_policy_violation (via the policy_violation substring).
    • Regression test for the OpenAI "usage policy" prose (the literal that used to live in handle_bad_request_exception).
    • Sanity-check tests pinning the contents of both frozensets so accidental removals are caught.
    • Existing invalid_prompt safety / non-safety tests preserved.
  • Verified: full tests/unit/prompt_target/ + tests/unit/exceptions/ suite (976 passed, 29 skipped) and pre-commit (ruff, ty type check) clean.
  • No documentation changes needed; behavior is preserved and the rationale lives in the module-level comments above the constants.

Replace the ad-hoc lists of content-filter strings scattered across _is_content_filter_error and handle_bad_request_exception with two module-level frozenset constants in openai_error_handling.py:

- CONTENT_FILTER_MARKERS: substring markers scanned over the full payload (covers content_filter, content_safety_violation, policy_violation, moderation_blocked, and via substring match Azure's content_policy_violation and OpenAI's usage_policy_violation).

- SAFETY_MESSAGE_MARKERS: message-text markers for the invalid_prompt error code (limited access, safety, usage policy).

Documents the empirical-not-specified nature of these strings and the rationale for each. Adding a new provider variant in the future is now a single-line change in one place rather than touching multiple lists and branches.

Also drops the hardcoded 'Invalid prompt: your prompt was flagged as potentially violating our usage policy.' literal from handle_bad_request_exception. That OpenAI message is now caught upstream via the invalid_prompt + 'usage policy' marker path in _is_content_filter_error (which sets is_content_filter=True before handle_bad_request_exception runs).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@jsong468 jsong468 self-assigned this Jun 3, 2026
Comment thread pyrit/prompt_target/openai/openai_error_handling.py Outdated
@hannahwestra25 hannahwestra25 self-assigned this Jun 3, 2026
Comment thread pyrit/exceptions/exception_classes.py Outdated
Copilot AI and others added 3 commits June 4, 2026 11:05
…s safety markers

The previous logic short-circuited to False for invalid_prompt payloads without a safety message marker, but the pre-refactor code fell through to the full-payload substring scan. This restores that behavior and prevents a regression for payloads where the marker lives outside error.message (e.g., in a nested inner_error.code or content_filter_results field).

Added a regression test covering an invalid_prompt payload whose message lacks a safety marker but whose nested inner_error.code contains 'content_filter'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… set in handle_bad_request_exception

Addresses jsong468's design question on PR microsoft#1900: the substring fallback in
``handle_bad_request_exception`` only matched ``"content_filter"`` while
``_is_content_filter_error`` already scanned the full marker set, so callers
that don't pre-compute ``is_content_filter`` (e.g. ``azure_ml_chat_target``)
silently missed ``moderation_blocked``, ``content_safety_violation``, and
``policy_violation`` payloads.

Fix:
- Move ``CONTENT_FILTER_MARKERS`` to ``pyrit.exceptions.exception_classes``
  as the single source of truth and export it from ``pyrit.exceptions``.
- Re-import (and re-export) the constant from
  ``pyrit.prompt_target.openai.openai_error_handling`` so existing imports
  keep working and the two consumers can't drift.
- Change the fallback condition in ``handle_bad_request_exception`` to scan
  ``response_text`` against the full ``CONTENT_FILTER_MARKERS`` set instead
  of only ``"content_filter"``. The ``is_content_filter`` parameter stays as
  the precise signal from OpenAI flows; the fallback now matches its coverage
  for non-OpenAI callers.

Adds regression tests in ``tests/unit/exceptions/test_exceptions.py`` that
verify (a) the marker set is the same object on both import paths, (b)
every marker triggers the blocked path through ``handle_bad_request_exception``,
(c) ``is_content_filter=True`` still triggers the blocked path, and
(d) unrelated errors are re-raised unchanged.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment thread pyrit/prompt_target/openai/openai_error_handling.py
Comment thread pyrit/prompt_target/openai/openai_error_handling.py Outdated
…error_handling

The re-export and its `historically imported'' justification were both wrong --
CONTENT_FILTER_MARKERS is brand new in this PR (introduced earlier in the same
series of commits), so there are no historical callers to preserve.

Drop the re-export, the related comment block, and the cross-module identity test
that only existed to validate the re-export. Update the openai_error_handling
test imports to pull CONTENT_FILTER_MARKERS from its canonical pyrit.exceptions
location. The import in openai_error_handling.py stays because
_is_content_filter_error still uses it internally.

Per hannah's deprecation question: nothing to deprecate, since the constant was
never the public location.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@romanlutz romanlutz added this pull request to the merge queue Jun 5, 2026
Merged via the queue into microsoft:main with commit a9da0a1 Jun 5, 2026
52 checks passed
@romanlutz romanlutz deleted the romanlutz/standardize-catchable-errors branch June 5, 2026 20:48
romanlutz pushed a commit to romanlutz/PyRIT that referenced this pull request Jun 5, 2026
Picks up 3 commits from main: PR microsoft#1883 (keyword-only init enforcement), microsoft#1934 (dataset metadata), microsoft#1939 (noqa audit), plus microsoft#1900 (content-filter constants) and microsoft#1947 (notebook fixes).

Resolved 6 conflicts in lazy-import noqa/type:ignore annotations by taking main's cleaner version (the # type: ignore[ty:unresolved-import] entries are unused warnings once --extra all is installed).

Re-doing a previous merge attempt that aborted without setting MERGE_HEAD, so the prior merge commit was a single-parent commit that silently dropped origin/main's recent changes.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants