[Regression] FSMN-VAD outputs single huge segment instead of separated speech segments in 1.3.9 (works in 1.3.1)

## Bug Description

When using FSMN-VAD (`iic/speech_fsmn_vad_zh-cn-16k-common-pytorch`) with the **same audio file** and **same model checkpoint** (`model.pt` + `config.yaml`), funasr **1.3.9** produces a single ~52s segment, while funasr **1.3.1** correctly produces 19 separated speech segments.

`max_end_silence_time` (used to control segmentation granularity) appears to be ignored or behave differently in 1.3.9.

## Environment

- Python: 3.10
- Platform: Ubuntu 22.04 (Docker)
- PyTorch: latest
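
Since "latest" is ambiguous, the exact package versions in each environment could be captured with a small helper like the sketch below. It only uses the standard library; the package list (`funasr`, `torch`, `torchaudio`) is an assumption about what is relevant here, not something confirmed by the maintainers.

```python
import platform
from importlib import metadata


def collect_versions(packages=("funasr", "torch", "torchaudio")):
    """Return a dict of installed versions for the given distribution names."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Keep the key so a missing package is visible in the report.
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Running this once in the 1.3.1 environment and once in the 1.3.9 environment would show whether anything besides funasr differs between the two runs.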

## Steps to Reproduce

```python
from funasr import AutoModel

model = AutoModel(
    model="path/to/fsmn-vad",   # local directory containing model.pt + config.yaml + am.mvn
    device="cpu",
    disable_update=True,
    disable_pbar=True,
    speech_noise_thres=0.9,
)
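# max_end_silence_time is given in milliseconds; passing it at AutoModel(...) init was also tried.
result = model.generate(input="test.mp3", max_end_silence_time=300)
segments = result[0]["value"]  # list of [start_ms, end_ms] pairs
print(len(segments), "segments")
for s, e in segments:
    print(f"  {s/1000:.2f}s ~ {e/1000:.2f}s")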
```

Run with the **same 52s audio** under both versions:

### funasr 1.3.1 (correct)

```
19 segments
  3.15s ~  4.35s
  4.69s ~  5.93s
  ...
  45.05s ~ 45.97s
```

### funasr 1.3.9 (regression)

```
1 segments
  0.07s ~ 52.57s
```

## Expected Behavior

The same model, audio, and parameters should produce the same segmentation in both versions, i.e. the 19 segments that 1.3.1 returns.
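
To make the expectation checkable, the two outputs could be compared with a helper along these lines. This is a minimal sketch: `summarize_segments` and `same_segmentation` are hypothetical names, and the sample values are only the segments visible in the logs above (the elided middle segments are not reproduced).

```python
def summarize_segments(segments):
    """Summarize a list of [start_ms, end_ms] pairs (durations in seconds)."""
    durations = [(end - start) / 1000.0 for start, end in segments]
    return {
        "count": len(durations),
        "total_speech_s": round(sum(durations), 3),
        "longest_s": round(max(durations, default=0.0), 3),
    }


def same_segmentation(a, b, tol_ms=0):
    """True if both lists have the same length and boundaries within tol_ms."""
    if len(a) != len(b):
        return False
    return all(
        abs(sa - sb) <= tol_ms and abs(ea - eb) <= tol_ms
        for (sa, ea), (sb, eb) in zip(a, b)
    )


# Segments taken from the logs above, converted to milliseconds.
first_two_from_1_3_1 = [[3150, 4350], [4690, 5930]]
from_1_3_9 = [[70, 52570]]

print(summarize_segments(from_1_3_9))
print(same_segmentation(first_two_from_1_3_1, from_1_3_9))
```

Dumping `result[0]["value"]` from each version to a file and feeding both lists to `same_segmentation` would give a pass/fail check for the regression.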

## Additional Notes

- The `model.pt` and `config.yaml` files used are byte-identical between the two runs.
- Tried passing `max_end_silence_time` both at `AutoModel(...)` init time and at `model.generate(...)` time; neither has any effect in 1.3.9.
- Tried `speech_noise_thres` values from 0.3 to 0.95; 1.3.9 still returns a single segment regardless.
- Likely a regression in the VAD post-processing pipeline or in how `model_conf` overrides are propagated.

## Suggested Fix

Please verify whether `max_end_silence_time` and related streaming-VAD parameters are still being applied to `FsmnVADStreaming` in 1.3.9. If not, restore the parameter routing from 1.3.1.
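
One way to check this from the outside is to read the value the loaded model actually holds. The helper below is a generic sketch; the attribute path `model.vad_opts.max_end_silence_time` used in the commented usage is an assumption about funasr internals and may differ between versions, which is why the helper returns a default instead of raising.

```python
_MISSING = object()


def read_attr_path(obj, path, default=None):
    """Follow a dotted attribute path, returning `default` if any hop is missing."""
    current = obj
    for name in path.split("."):
        current = getattr(current, name, _MISSING)
        if current is _MISSING:
            return default
    return current


# Hypothetical usage against the AutoModel instance from the repro script
# (the attribute path is assumed, not confirmed for either version):
#
#   value = read_attr_path(model, "model.vad_opts.max_end_silence_time", "<not found>")
#   print("effective max_end_silence_time:", value)
```

If the path resolves in both versions, comparing the printed value after `AutoModel(..., max_end_silence_time=300)` would show whether the init-time override reaches the VAD options.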

[test_vad.py](https://github.com/user-attachments/files/28789806/test_vad.py)
