[Regression] FSMN-VAD outputs single huge segment instead of separated speech segments in 1.3.9 (works in 1.3.1) #2970

@gpww

Bug Description

When using FSMN-VAD (iic/speech_fsmn_vad_zh-cn-16k-common-pytorch) with the same audio file and same model checkpoint (model.pt + config.yaml), funasr 1.3.9 produces a single ~52s segment, while funasr 1.3.1 correctly produces 19 separated speech segments.

max_end_silence_time (used to control segmentation granularity) appears to be ignored or behave differently in 1.3.9.

Environment

  • Python: 3.10
  • Platform: Ubuntu 22.04 (Docker)
  • PyTorch: latest
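
Because "latest" does not pin a specific PyTorch build, a small sketch like the one below could record the exact installed versions in each environment before comparing results. It uses only the standard library; the package list (funasr, torch, torchaudio, modelscope) is an assumption about what is relevant here, not something stated in this report.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package: version string}, or "not installed" when missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = "not installed"
    return found


# Package list is an assumption; adjust it to the environment under test.
for name, ver in installed_versions(["funasr", "torch", "torchaudio", "modelscope"]).items():
    print(f"{name}: {ver}")
```

Running this once in the 1.3.1 container and once in the 1.3.9 container would show whether anything besides funasr differs between the two runs.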

Steps to Reproduce

from funasr import AutoModel

model = AutoModel(
    model="path/to/fsmn-vad",   # local directory containing model.pt + config.yaml + am.mvn
    device="cpu",
    disable_update=True,
    disable_pbar=True,
    speech_noise_thres=0.9,
)
result = model.generate(input="test.mp3", max_end_silence_time=300)
print(len(result[0]['value']), "segments")
for s, e in result[0]['value']:
    print(f"  {s/1000:.2f}s ~ {e/1000:.2f}s")

Run with the same 52s audio under both versions:

funasr 1.3.1 (correct)

19 segments
  3.15s ~  4.35s
  4.69s ~  5.93s
  ...
  45.05s ~ 45.97s

funasr 1.3.9 (regression)

1 segments
  0.07s ~ 52.57s
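
To make the two outputs easier to compare at a glance, the segment list can be reduced to a few numbers. This is a minimal sketch, assuming the `[[start_ms, end_ms], ...]` layout that the reproduction script reads from `result[0]['value']`; the helper name `summarize_segments` is hypothetical.

```python
def summarize_segments(segments_ms):
    """Summarize VAD output given as [[start_ms, end_ms], ...]."""
    durations = [end - start for start, end in segments_ms]
    return {
        "count": len(durations),
        "total_speech_s": sum(durations) / 1000,
        "longest_s": max(durations, default=0) / 1000,
    }


# The single segment reported for 1.3.9 above: 0.07s ~ 52.57s.
regressed = [[70, 52570]]
print(summarize_segments(regressed))
# {'count': 1, 'total_speech_s': 52.5, 'longest_s': 52.5}
```

A longest segment that spans almost the whole file is a quick indicator that no end-of-speech boundary was ever detected.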

Expected Behavior

Same model + same audio + same parameters should produce the same segmentation result.
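
One way to turn this expectation into an automated check is to compare the segment lists from the two versions boundary by boundary. The sketch below is illustrative only: `segments_match` and its `tol_ms` tolerance are hypothetical names, and the millisecond values are approximations derived from the rounded output printed above.

```python
def segments_match(expected, actual, tol_ms=0):
    """True when both lists have the same length and every boundary
    differs by at most tol_ms milliseconds."""
    if len(expected) != len(actual):
        return False
    return all(
        abs(exp_start - start) <= tol_ms and abs(exp_end - end) <= tol_ms
        for (exp_start, exp_end), (start, end) in zip(expected, actual)
    )


# First two segments printed by 1.3.1 (approximate, from the rounded output).
from_1_3_1 = [[3150, 4350], [4690, 5930]]
# The only segment printed by 1.3.9.
from_1_3_9 = [[70, 52570]]

print(segments_match(from_1_3_1, from_1_3_1))  # True
print(segments_match(from_1_3_1, from_1_3_9))  # False
```

A small nonzero tolerance may be reasonable if minor boundary shifts between releases are acceptable; a change in segment count is not.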

Additional Notes

  • The model.pt and config.yaml files used are byte-identical between the two runs.
  • Passing max_end_silence_time at AutoModel(...) init time or at model.generate(...) time makes no difference in 1.3.9; the output is still a single segment.
  • Varying speech_noise_thres from 0.3 to 0.95 also has no effect in 1.3.9; the output is never split into multiple segments.
  • Likely a regression in the VAD post-processing pipeline or in how model_conf overrides are propagated.
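
The observations above suggest a simple diagnostic: sweep one parameter and check whether the segment count ever changes. The following is a rough sketch, not a definitive test. The helper names are hypothetical, and `run_vad` stands for any callable that runs the model with a given value and returns the segment list.

```python
def sweep_segment_counts(run_vad, values):
    """Call run_vad(value) for each value and record how many segments come back."""
    return {value: len(run_vad(value)) for value in values}


def parameter_has_effect(counts):
    """True when at least two swept values produced different segment counts."""
    return len(set(counts.values())) > 1


# Possible wiring, reusing only the calls from the reproduction script
# (the swept values are arbitrary examples):
#
#   run_vad = lambda ms: model.generate(
#       input="test.mp3", max_end_silence_time=ms
#   )[0]["value"]
#   counts = sweep_segment_counts(run_vad, [100, 300, 800, 2000])
#   print(counts, parameter_has_effect(counts))
```

Identical counts across a wide range of values would be consistent with the parameter being ignored, though they would not prove it on their own; running the same sweep under 1.3.1 gives the baseline to compare against.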

Suggested Fix

Please verify whether max_end_silence_time and the related streaming-VAD parameters are still applied to FsmnVADStreaming in 1.3.9. If they are not, please restore the parameter routing used in 1.3.1.

Attached: test_vad.py
