REFACTOR: consolidate content-filter markers into shared constants by romanlutz · Pull Request #1900 · microsoft/PyRIT

romanlutz · 2026-06-03T02:05:54Z

Description

Noticed while reviewing #1890 that _is_content_filter_error (and friends) have accumulated a sprawl of content-filter strings spread across multiple lists and branches: exact error.code matches in one list, substring fallbacks in another, a special-case invalid_prompt check with its own marker list, plus a hardcoded literal "Invalid prompt: your prompt was flagged as potentially violating our usage policy." in handle_bad_request_exception. Each new provider variant (most recently MAI's content_safety_violation) has required touching 2-3 of those spots, and it's easy to miss one.

These strings have no canonical spec; they're empirically observed from various OpenAI / Azure / MAI responses, where different providers expose the signal through different fields (error.code, finish_reason, incomplete_details.reason, free-form error.message) and the wording evolves. Given that, a single substring scan over the payload is both more honest about what we know and more resilient than pretending we have an enum.

This PR extracts two well-documented frozenset constants in pyrit/prompt_target/openai/openai_error_handling.py:

CONTENT_FILTER_MARKERS - substring markers scanned over the full payload. Covers content_filter, content_safety_violation, policy_violation (substring of Azure's content_policy_violation and OpenAI moderation's usage_policy_violation), and moderation_blocked.
SAFETY_MESSAGE_MARKERS - message-text markers used only when error.code == "invalid_prompt" (which is too generic on its own; it's also raised for schema validation errors). Now includes "usage policy" so OpenAI's "flagged as potentially violating our usage policy" prose is caught structurally.

_is_content_filter_error collapses to a single substring scan plus the invalid_prompt special case. The hardcoded literal in handle_bad_request_exception is dropped because the OpenAI flow now sets is_content_filter=True upstream via _extract_error_payload -> _is_content_filter_error -> the new "usage policy" marker; non-OpenAI callers (e.g. azure_ml_chat_target.py) still get the generic "content_filter" in response_text fallback.

Net effect: adding a new provider's content-filter string is now a one-line addition to CONTENT_FILTER_MARKERS, not a multi-file change like #1890 was.

Tests and Documentation

Expanded tests/unit/prompt_target/target/test_openai_error_handling.py:
- Parametrized test asserting each exact error.code marker is detected.
- Regression test for Azure's content_policy_violation (via the policy_violation substring).
- Regression test for the OpenAI "usage policy" prose (the literal that used to live in handle_bad_request_exception).
- Sanity-check tests pinning the contents of both frozensets so accidental removals are caught.
- Existing invalid_prompt safety / non-safety tests preserved.
Verified: full tests/unit/prompt_target/ + tests/unit/exceptions/ suite (976 passed, 29 skipped) and pre-commit (ruff, ty type check) clean.
No documentation changes needed; behavior is preserved and the rationale lives in the module-level comments above the constants.

Replace the ad-hoc lists of content-filter strings scattered across _is_content_filter_error and handle_bad_request_exception with two module-level frozenset constants in openai_error_handling.py: - CONTENT_FILTER_MARKERS: substring markers scanned over the full payload (covers content_filter, content_safety_violation, policy_violation, moderation_blocked, and via substring match Azure's content_policy_violation and OpenAI's usage_policy_violation). - SAFETY_MESSAGE_MARKERS: message-text markers for the invalid_prompt error code (limited access, safety, usage policy). Documents the empirical-not-specified nature of these strings and the rationale for each. Adding a new provider variant in the future is now a single-line change in one place rather than touching multiple lists and branches. Also drops the hardcoded 'Invalid prompt: your prompt was flagged as potentially violating our usage policy.' literal from handle_bad_request_exception. That OpenAI message is now caught upstream via the invalid_prompt + 'usage policy' marker path in _is_content_filter_error (which sets is_content_filter=True before handle_bad_request_exception runs). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…s safety markers The previous logic short-circuited to False for invalid_prompt payloads without a safety message marker, but the pre-refactor code fell through to the full-payload substring scan. This restores that behavior and prevents a regression for payloads where the marker lives outside error.message (e.g., in a nested inner_error.code or content_filter_results field). Added a regression test covering an invalid_prompt payload whose message lacks a safety marker but whose nested inner_error.code contains 'content_filter'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

… set in handle_bad_request_exception Addresses jsong468's design question on PR microsoft#1900: the substring fallback in ``handle_bad_request_exception`` only matched ``"content_filter"`` while ``_is_content_filter_error`` already scanned the full marker set, so callers that don't pre-compute ``is_content_filter`` (e.g. ``azure_ml_chat_target``) silently missed ``moderation_blocked``, ``content_safety_violation``, and ``policy_violation`` payloads. Fix: - Move ``CONTENT_FILTER_MARKERS`` to ``pyrit.exceptions.exception_classes`` as the single source of truth and export it from ``pyrit.exceptions``. - Re-import (and re-export) the constant from ``pyrit.prompt_target.openai.openai_error_handling`` so existing imports keep working and the two consumers can't drift. - Change the fallback condition in ``handle_bad_request_exception`` to scan ``response_text`` against the full ``CONTENT_FILTER_MARKERS`` set instead of only ``"content_filter"``. The ``is_content_filter`` parameter stays as the precise signal from OpenAI flows; the fallback now matches its coverage for non-OpenAI callers. Adds regression tests in ``tests/unit/exceptions/test_exceptions.py`` that verify (a) the marker set is the same object on both import paths, (b) every marker triggers the blocked path through ``handle_bad_request_exception``, (c) ``is_content_filter=True`` still triggers the blocked path, and (d) unrelated errors are re-raised unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

…error_handling The re-export and its `historically imported'' justification were both wrong -- CONTENT_FILTER_MARKERS is brand new in this PR (introduced earlier in the same series of commits), so there are no historical callers to preserve. Drop the re-export, the related comment block, and the cross-module identity test that only existed to validate the re-export. Update the openai_error_handling test imports to pull CONTENT_FILTER_MARKERS from its canonical pyrit.exceptions location. The import in openai_error_handling.py stays because _is_content_filter_error still uses it internally. Per hannah's deprecation question: nothing to deprecate, since the constant was never the public location. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

Picks up 3 commits from main: PR microsoft#1883 (keyword-only init enforcement), microsoft#1934 (dataset metadata), microsoft#1939 (noqa audit), plus microsoft#1900 (content-filter constants) and microsoft#1947 (notebook fixes). Resolved 6 conflicts in lazy-import noqa/type:ignore annotations by taking main's cleaner version (the # type: ignore[ty:unresolved-import] entries are unused warnings once --extra all is installed). Re-doing a previous merge attempt that aborted without setting MERGE_HEAD, so the prior merge commit was a single-parent commit that silently dropped origin/main's recent changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

jsong468 self-assigned this Jun 3, 2026

hannahwestra25 reviewed Jun 3, 2026

View reviewed changes

Comment thread pyrit/prompt_target/openai/openai_error_handling.py Outdated

hannahwestra25 self-assigned this Jun 3, 2026

jsong468 reviewed Jun 3, 2026

View reviewed changes

Comment thread pyrit/exceptions/exception_classes.py Outdated

Copilot AI and others added 3 commits June 4, 2026 11:05

Merge branch 'main' into romanlutz/standardize-catchable-errors

3975729

hannahwestra25 reviewed Jun 5, 2026

View reviewed changes

Comment thread pyrit/prompt_target/openai/openai_error_handling.py

hannahwestra25 reviewed Jun 5, 2026

View reviewed changes

Comment thread pyrit/prompt_target/openai/openai_error_handling.py Outdated

jsong468 approved these changes Jun 5, 2026

View reviewed changes

hannahwestra25 approved these changes Jun 5, 2026

View reviewed changes

romanlutz added this pull request to the merge queue Jun 5, 2026

Merged via the queue into microsoft:main with commit a9da0a1 Jun 5, 2026
52 checks passed

romanlutz deleted the romanlutz/standardize-catchable-errors branch June 5, 2026 20:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

REFACTOR: consolidate content-filter markers into shared constants#1900

REFACTOR: consolidate content-filter markers into shared constants#1900
romanlutz merged 5 commits into
microsoft:mainfrom
romanlutz:romanlutz/standardize-catchable-errors

romanlutz commented Jun 3, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

romanlutz commented Jun 3, 2026

Description

Tests and Documentation

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants