
ONNX Runtime AudioDecoder Error on Olive with Whisper Model #1362

Open

mridulrao opened this issue Sep 18, 2024 · 4 comments

@mridulrao

mridulrao commented Sep 18, 2024

Describe the bug
I encountered an error while using Olive with the Whisper ONNX model for transcription. The error occurs during the AudioDecoder step in the ONNX Runtime.

To Reproduce
Set up an environment with the Whisper ONNX model using Olive (exactly as described in the README.md), then run:

python test_transcription.py --config whisper_cpu_int8.json --audio_path yt_audio.mp3

Expected behavior
Transcriptions of the audio file. Instead, the run fails with the following error:

2024-09-18 08:49:27.862823112 [E:onnxruntime:, sequential_executor.cc:516 ExecuteKernel] Non-zero status code returned while running AudioDecoder node. Name:'AudioDecoder_1' Status Message: [AudioDecoder]: Cannot detect audio stream format
Traceback (most recent call last):
File "/teamspace/studios/this_studio/Olive/examples/whisper/test_transcription.py", line 129, in
output_text = main()
File "/teamspace/studios/this_studio/Olive/examples/whisper/test_transcription.py", line 124, in main
output = olive_model.run_session(session, input_data)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/olive/model/handler/onnx.py", line 146, in run_session
return session.run(output_names, inputs, **kwargs)
File "/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running AudioDecoder node. Name:'AudioDecoder_1' Status Message: [AudioDecoder]: Cannot detect audio stream format

Other information

  • OS: Linux
  • Olive version: main
  • Optimization Pipeline: CPU, INT8
@jambayk
Contributor

jambayk commented Sep 20, 2024

The audio input for Whisper has some restrictions, such as the sample rate being 16 kHz (https://github.com/openai/whisper/blob/279133e3107392276dc509148da1f41bfb532c7e/whisper/audio.py#L13).
It also cannot be longer than 30s.

Can you confirm your audio meets these requirements?
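
For reference, a minimal check along these lines (not part of the Olive example; it assumes librosa is installed and uses the yt_audio.mp3 path from the issue) can confirm both properties:

import librosa

AUDIO_PATH = "yt_audio.mp3"

# sr=None keeps the file's native sample rate instead of resampling
waveform, sample_rate = librosa.load(AUDIO_PATH, sr=None)
duration_s = len(waveform) / sample_rate

print(f"sample rate: {sample_rate} Hz, duration: {duration_s:.1f} s")
if sample_rate != 16000:
    print("warning: Whisper expects 16 kHz audio")
if duration_s > 30:
    print("warning: clip exceeds the 30 s limit of the exported model")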

@jambayk
Contributor

jambayk commented Sep 20, 2024

Can you also share the version of onnxruntime and onnxruntime-extensions you are using?

@mridulrao
Author

Oh, I didn't see the limit on audio length. The audio clips I am trying to process vary between 7 and 12 minutes. The sample rate is 16 kHz.

Versions -
onnxruntime==1.19.2
onnxruntime_extensions==0.12.0

Is it recommended to change the hard-coded lengths? Or should I clip the audio into multiple 30-second batches?

@jambayk
Contributor

jambayk commented Nov 1, 2024

Hi, sorry for the delayed response. The 30 s limit cannot be changed, so you would need to clip the audio into chunks and run them individually.
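
As a rough sketch (assuming librosa and soundfile are available; the chunk file names are hypothetical and not part of the repo), the long recording can be split into 30-second, 16 kHz chunks on disk and each chunk passed to test_transcription.py separately:

import librosa
import soundfile as sf

AUDIO_PATH = "yt_audio.mp3"
TARGET_SR = 16000          # Whisper expects 16 kHz input
CHUNK_SECONDS = 30         # hard limit of the exported model

# Load and resample to 16 kHz in one step
waveform, _ = librosa.load(AUDIO_PATH, sr=TARGET_SR)

chunk_len = TARGET_SR * CHUNK_SECONDS
chunk_paths = []
for i, start in enumerate(range(0, len(waveform), chunk_len)):
    chunk = waveform[start:start + chunk_len]
    out_path = f"chunk_{i:03d}.wav"
    sf.write(out_path, chunk, TARGET_SR)
    chunk_paths.append(out_path)

print(f"wrote {len(chunk_paths)} chunks; transcribe each one, for example:")
print("python test_transcription.py --config whisper_cpu_int8.json "
      f"--audio_path {chunk_paths[0]}")

The per-chunk transcripts can then be concatenated in order to recover the full transcription.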
