REFACTOR: consolidate content-filter markers into shared constants#1900
Merged
romanlutz merged 5 commits intoJun 5, 2026
Merged
Conversation
Replace the ad-hoc lists of content-filter strings scattered across _is_content_filter_error and handle_bad_request_exception with two module-level frozenset constants in openai_error_handling.py: - CONTENT_FILTER_MARKERS: substring markers scanned over the full payload (covers content_filter, content_safety_violation, policy_violation, moderation_blocked, and via substring match Azure's content_policy_violation and OpenAI's usage_policy_violation). - SAFETY_MESSAGE_MARKERS: message-text markers for the invalid_prompt error code (limited access, safety, usage policy). Documents the empirical-not-specified nature of these strings and the rationale for each. Adding a new provider variant in the future is now a single-line change in one place rather than touching multiple lists and branches. Also drops the hardcoded 'Invalid prompt: your prompt was flagged as potentially violating our usage policy.' literal from handle_bad_request_exception. That OpenAI message is now caught upstream via the invalid_prompt + 'usage policy' marker path in _is_content_filter_error (which sets is_content_filter=True before handle_bad_request_exception runs). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jsong468
reviewed
Jun 3, 2026
…s safety markers The previous logic short-circuited to False for invalid_prompt payloads without a safety message marker, but the pre-refactor code fell through to the full-payload substring scan. This restores that behavior and prevents a regression for payloads where the marker lives outside error.message (e.g., in a nested inner_error.code or content_filter_results field). Added a regression test covering an invalid_prompt payload whose message lacks a safety marker but whose nested inner_error.code contains 'content_filter'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… set in handle_bad_request_exception Addresses jsong468's design question on PR microsoft#1900: the substring fallback in ``handle_bad_request_exception`` only matched ``"content_filter"`` while ``_is_content_filter_error`` already scanned the full marker set, so callers that don't pre-compute ``is_content_filter`` (e.g. ``azure_ml_chat_target``) silently missed ``moderation_blocked``, ``content_safety_violation``, and ``policy_violation`` payloads. Fix: - Move ``CONTENT_FILTER_MARKERS`` to ``pyrit.exceptions.exception_classes`` as the single source of truth and export it from ``pyrit.exceptions``. - Re-import (and re-export) the constant from ``pyrit.prompt_target.openai.openai_error_handling`` so existing imports keep working and the two consumers can't drift. - Change the fallback condition in ``handle_bad_request_exception`` to scan ``response_text`` against the full ``CONTENT_FILTER_MARKERS`` set instead of only ``"content_filter"``. The ``is_content_filter`` parameter stays as the precise signal from OpenAI flows; the fallback now matches its coverage for non-OpenAI callers. Adds regression tests in ``tests/unit/exceptions/test_exceptions.py`` that verify (a) the marker set is the same object on both import paths, (b) every marker triggers the blocked path through ``handle_bad_request_exception``, (c) ``is_content_filter=True`` still triggers the blocked path, and (d) unrelated errors are re-raised unchanged. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
jsong468
approved these changes
Jun 5, 2026
…error_handling The re-export and its `historically imported'' justification were both wrong -- CONTENT_FILTER_MARKERS is brand new in this PR (introduced earlier in the same series of commits), so there are no historical callers to preserve. Drop the re-export, the related comment block, and the cross-module identity test that only existed to validate the re-export. Update the openai_error_handling test imports to pull CONTENT_FILTER_MARKERS from its canonical pyrit.exceptions location. The import in openai_error_handling.py stays because _is_content_filter_error still uses it internally. Per hannah's deprecation question: nothing to deprecate, since the constant was never the public location. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
hannahwestra25
approved these changes
Jun 5, 2026
romanlutz
pushed a commit
to romanlutz/PyRIT
that referenced
this pull request
Jun 5, 2026
Picks up 3 commits from main: PR microsoft#1883 (keyword-only init enforcement), microsoft#1934 (dataset metadata), microsoft#1939 (noqa audit), plus microsoft#1900 (content-filter constants) and microsoft#1947 (notebook fixes). Resolved 6 conflicts in lazy-import noqa/type:ignore annotations by taking main's cleaner version (the # type: ignore[ty:unresolved-import] entries are unused warnings once --extra all is installed). Re-doing a previous merge attempt that aborted without setting MERGE_HEAD, so the prior merge commit was a single-parent commit that silently dropped origin/main's recent changes. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Noticed while reviewing #1890 that
_is_content_filter_error(and friends) have accumulated a sprawl of content-filter strings spread across multiple lists and branches: exacterror.codematches in one list, substring fallbacks in another, a special-caseinvalid_promptcheck with its own marker list, plus a hardcoded literal"Invalid prompt: your prompt was flagged as potentially violating our usage policy."inhandle_bad_request_exception. Each new provider variant (most recently MAI'scontent_safety_violation) has required touching 2-3 of those spots, and it's easy to miss one.These strings have no canonical spec; they're empirically observed from various OpenAI / Azure / MAI responses, where different providers expose the signal through different fields (
error.code,finish_reason,incomplete_details.reason, free-formerror.message) and the wording evolves. Given that, a single substring scan over the payload is both more honest about what we know and more resilient than pretending we have an enum.This PR extracts two well-documented
frozensetconstants inpyrit/prompt_target/openai/openai_error_handling.py:CONTENT_FILTER_MARKERS- substring markers scanned over the full payload. Coverscontent_filter,content_safety_violation,policy_violation(substring of Azure'scontent_policy_violationand OpenAI moderation'susage_policy_violation), andmoderation_blocked.SAFETY_MESSAGE_MARKERS- message-text markers used only whenerror.code == "invalid_prompt"(which is too generic on its own; it's also raised for schema validation errors). Now includes"usage policy"so OpenAI's "flagged as potentially violating our usage policy" prose is caught structurally._is_content_filter_errorcollapses to a single substring scan plus theinvalid_promptspecial case. The hardcoded literal inhandle_bad_request_exceptionis dropped because the OpenAI flow now setsis_content_filter=Trueupstream via_extract_error_payload->_is_content_filter_error-> the new"usage policy"marker; non-OpenAI callers (e.g.azure_ml_chat_target.py) still get the generic"content_filter" in response_textfallback.Net effect: adding a new provider's content-filter string is now a one-line addition to
CONTENT_FILTER_MARKERS, not a multi-file change like #1890 was.Tests and Documentation
tests/unit/prompt_target/target/test_openai_error_handling.py:error.codemarker is detected.content_policy_violation(via thepolicy_violationsubstring).handle_bad_request_exception).invalid_promptsafety / non-safety tests preserved.tests/unit/prompt_target/+tests/unit/exceptions/suite (976 passed, 29 skipped) and pre-commit (ruff, ty type check) clean.