v2.22.0
LocalAI v2.22.0 is out 🥳
## 💡 Highlights
- Image-to-Text and Video-to-Text Support: The VLLM backend now supports both image-to-text and video-to-text processing.
- Enhanced Multimodal Support: Template placeholders are now available, offering more flexibility in multimodal applications
- Model Management Made Easy: List all your loaded models directly via the /system endpoint for seamless management.
- Various bugfixes and improvements: Fixed issues with dangling processes to ensure proper resource management and resolved channel closure issues in the base GRPC server.
## 🖼️ Multimodal vLLM
To use multimodal models with vLLM, simply specify the model in the YAML configuration file. Note that models differ in whether they support a single image or multiple images, and in how they internally process image placeholders.
Models and libraries express image, video, or audio placeholders in different ways. For example, the llama.cpp backend expects images within an `[img-ID]` tag, while other backends/models (e.g. vLLM) use a different notation (`<|image_|>`).

To override the defaults, it is now possible to set the following in the model configuration:
```yaml
template:
  video: "<|video_{{.ID}}|> {{.Text}}"
  image: "<|image_{{.ID}}|> {{.Text}}"
  audio: "<|audio_{{.ID}}|> {{.Text}}"
```
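The placeholders use Go's `text/template` syntax, where `{{.ID}}` is the index of the media item and `{{.Text}}` is the accompanying message text. The actual rendering happens server-side in LocalAI's Go code; the following Python sketch (the function name is ours, purely illustrative) only shows how such a template expands:

```python
def render_placeholder(template: str, media_id: int, text: str) -> str:
    """Illustrative stand-in for Go text/template expansion of {{.ID}} and {{.Text}}."""
    return template.replace("{{.ID}}", str(media_id)).replace("{{.Text}}", text)

print(render_placeholder("<|image_{{.ID}}|> {{.Text}}", 1, "Describe this image."))
# → <|image_1|> Describe this image.
```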
## 📹 Video and Audio understanding
Some libraries may support both video and audio. Currently only vLLM supports video understanding, which can be used via the API by "extending" the OpenAI API with `audio` and `video` content types alongside images:
```bash
curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "gpt-4o",
      "messages": [
        {
          "role": "user",
          "content": [
            {
              "type": "text",
              "text": "What'\''s in this video?"
            },
            {
              "type": "video_url",
              "video_url": {
                "url": "https://video-image-url"
              }
            }
          ]
        }
      ],
      "max_tokens": 300
    }'
```
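The same request can be built programmatically. A minimal Python sketch (the helper function is ours, not part of LocalAI or the OpenAI SDK) that constructs the payload, ready to POST to `http://localhost:8080/v1/chat/completions` with any HTTP client:

```python
import json

def build_video_chat_request(model: str, prompt: str, video_url: str) -> dict:
    """Build an OpenAI-style chat request mixing text and video_url content parts."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "video_url", "video_url": {"url": video_url}},
                ],
            }
        ],
        "max_tokens": 300,
    }

payload = build_video_chat_request("gpt-4o", "What's in this video?", "https://video-image-url")
print(json.dumps(payload, indent=2))
```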
## 🧑‍🏭 Work in progress

- Realtime API support is a work in progress, tracked in #3714. Give it a thumbs up if you want to see it supported in LocalAI!
## What's Changed

### Bug fixes 🐛
- chore: simplify model loading by @mudler in #3715
- fix(initializer): correctly reap dangling processes by @mudler in #3717
- fix(base-grpc): close channel in base grpc server by @mudler in #3734
- fix(vllm): bump cmake - vllm requires it by @mudler in #3744
- fix(llama-cpp): consistently select fallback by @mudler in #3789
- fix(welcome): do not list model twice if we have a config by @mudler in #3790
- fix: listmodelservice / welcome endpoint use LOOSE_ONLY by @dave-gray101 in #3791
### Exciting New Features 🎉
- feat(api): list loaded models in `/system` by @mudler in #3661
- feat: Add Get Token Metrics to GRPC server by @siddimore in #3687
- refactor: ListModels Filtering Upgrade by @dave-gray101 in #2773
- feat: track internally started models by ID by @mudler in #3693
- feat: tokenization endpoint by @shraddhazpy in #3710
- feat(multimodal): allow to template placeholders by @mudler in #3728
- feat(vllm): add support for image-to-text and video-to-text by @mudler in #3729
- feat(shutdown): allow force shutdown of backends by @mudler in #3733
- feat(transformers): Use downloaded model for Transformers backend if it already exists. by @joshbtn in #3777
- fix: roll out bluemonday Sanitize more widely by @dave-gray101 in #3794
### 🧠 Models
- models(gallery): add llama-3.2 3B and 1B by @mudler in #3671
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #3675
- models(gallery): add magnusintellectus-12b-v1-i1 by @mudler in #3678
- models(gallery): add bigqwen2.5-52b-instruct by @mudler in #3679
- feat(api): add correlationID to Track Chat requests by @siddimore in #3668
- models(gallery): add replete-llm-v2.5-qwen-14b by @mudler in #3688
- models(gallery): add replete-llm-v2.5-qwen-7b by @mudler in #3689
- models(gallery): add calme-2.2-qwen2.5-72b-i1 by @mudler in #3691
- models(gallery): add salamandra-7b-instruct by @mudler in #3726
- models(gallery): add mn-backyardai-party-12b-v1-iq-arm-imatrix by @mudler in #3740
- models(gallery): add t.e-8.1-iq-imatrix-request by @mudler in #3741
- models(gallery): add violet_twilight-v0.2-iq-imatrix by @mudler in #3742
- models(gallery): add gemma-2-9b-it-abliterated by @mudler in #3743
- models(gallery): add moe-girl-1ba-7bt-i1 by @mudler in #3766
- models(gallery): add archfunctions models by @mudler in #3767
- models(gallery): add versatillama-llama-3.2-3b-instruct-abliterated by @mudler in #3771
- models(gallery): add llama3.2-3b-enigma by @mudler in #3772
- models(gallery): add llama3.2-3b-esper2 by @mudler in #3773
- models(gallery): add llama-3.1-swallow-70b-v0.1-i1 by @mudler in #3774
- models(gallery): add rombos-llm-v2.5.1-qwen-3b by @mudler in #3778
- models(gallery): add qwen2.5-7b-ins-v3 by @mudler in #3779
- models(gallery): add dans-personalityengine-v1.0.0-8b by @mudler in #3780
- models(gallery): add llama-3.2-3b-agent007 by @mudler in #3781
- models(gallery): add nihappy-l3.1-8b-v0.09 by @mudler in #3782
- models(gallery): add llama-3.2-3b-agent007-coder by @mudler in #3783
- models(gallery): add fireball-meta-llama-3.2-8b-instruct-agent-003-128k-code-dpo by @mudler in #3784
- models(gallery): add gemma-2-ataraxy-v3i-9b by @mudler in #3785
### 📖 Documentation and examples

### 👒 Dependencies
- chore: ⬆️ Update ggerganov/llama.cpp to `ea9c32be71b91b42ecc538bd902e93cbb5fb36cb` by @localai-bot in #3667
- chore: ⬆️ Update ggerganov/whisper.cpp to `69339af2d104802f3f201fd419163defba52890e` by @localai-bot in #3666
- chore: ⬆️ Update ggerganov/llama.cpp to `95bc82fbc0df6d48cf66c857a4dda3d044f45ca2` by @localai-bot in #3674
- chore: ⬆️ Update ggerganov/llama.cpp to `b5de3b74a595cbfefab7eeb5a567425c6a9690cf` by @localai-bot in #3681
- chore: ⬆️ Update ggerganov/whisper.cpp to `8feb375fbdf0277ad36958c218c6bf48fa0ba75a` by @localai-bot in #3680
- chore: ⬆️ Update ggerganov/llama.cpp to `c919d5db39c8a7fcb64737f008e4b105ee0acd20` by @localai-bot in #3686
- chore(deps): bump grpcio to 1.66.2 by @mudler in #3690
- chore(deps): Bump openai from 1.47.1 to 1.50.2 in /examples/langchain-chroma by @dependabot in #3697
- chore(deps): Bump chromadb from 0.5.7 to 0.5.11 in /examples/langchain-chroma by @dependabot in #3696
- chore(deps): Bump langchain from 0.3.0 to 0.3.1 in /examples/langchain-chroma by @dependabot in #3694
- chore: ⬆️ Update ggerganov/llama.cpp to `6f1d9d71f4c568778a7637ff6582e6f6ba5fb9d3` by @localai-bot in #3708
- chore(deps): Bump securego/gosec from 2.21.0 to 2.21.4 by @dependabot in #3698
- chore(deps): Bump openai from 1.47.1 to 1.50.2 in /examples/functions by @dependabot in #3699
- chore(deps): Bump langchain from 0.3.0 to 0.3.1 in /examples/langchain/langchainpy-localai-example by @dependabot in #3704
- chore(deps): Bump greenlet from 3.1.0 to 3.1.1 in /examples/langchain/langchainpy-localai-example by @dependabot in #3703
- chore(deps): Bump langchain from 0.3.0 to 0.3.1 in /examples/functions by @dependabot in #3700
- chore(deps): Bump langchain-community from 0.2.16 to 0.3.1 in /examples/langchain/langchainpy-localai-example by @dependabot in #3702
- chore(deps): Bump gradio from 4.38.1 to 4.44.1 in /backend/python/openvoice by @dependabot in #3701
- chore(deps): Bump llama-index from 0.11.12 to 0.11.14 in /examples/langchain-chroma by @dependabot in #3695
- chore(deps): Bump aiohttp from 3.10.3 to 3.10.8 in /examples/langchain/langchainpy-localai-example by @dependabot in #3705
- chore(deps): Bump yarl from 1.11.1 to 1.13.1 in /examples/langchain/langchainpy-localai-example by @dependabot in #3706
- chore(deps): Bump llama-index from 0.11.12 to 0.11.14 in /examples/chainlit by @dependabot in #3707
- chore: ⬆️ Update ggerganov/whisper.cpp to `2ef717b293fe93872cc3a03ca77942936a281959` by @localai-bot in #3712
- chore: ⬆️ Update ggerganov/llama.cpp to `3f1ae2e32cde00c39b96be6d01c2997c29bae555` by @localai-bot in #3713
- chore: ⬆️ Update ggerganov/llama.cpp to `a39ab216aa624308fda7fa84439c6b61dc98b87a` by @localai-bot in #3718
- chore: ⬆️ Update ggerganov/whisper.cpp to `ede1718f6d45aa3f7ad4a1e169dfbc9d51570c4e` by @localai-bot in #3719
- chore: ⬆️ Update ggerganov/llama.cpp to `d5ed2b929d85bbd7dbeecb690880f07d9d7a6077` by @localai-bot in #3725
- chore: ⬆️ Update ggerganov/whisper.cpp to `ccc2547210e09e3a1785817383ab770389bb442b` by @localai-bot in #3724
- chore: ⬆️ Update ggerganov/llama.cpp to `71967c2a6d30da9f61580d3e2d4cb00e0223b6fa` by @localai-bot in #3731
- chore: ⬆️ Update ggerganov/whisper.cpp to `2944cb72d95282378037cb0eb45c9e2b2529ff2c` by @localai-bot in #3730
- chore: ⬆️ Update ggerganov/whisper.cpp to `6a94163b913d8e974e60d9ac56c8930d19f45773` by @localai-bot in #3735
- chore: ⬆️ Update ggerganov/llama.cpp to `8c475b97b8ba7d678d4c9904b1161bd8811a9b44` by @localai-bot in #3736
- chore: ⬆️ Update ggerganov/llama.cpp to `d5cb86844f26f600c48bf3643738ea68138f961d` by @localai-bot in #3738
- chore: ⬆️ Update ggerganov/whisper.cpp to `9f346d00840bcd7af62794871109841af40cecfb` by @localai-bot in #3739
- chore(deps): Bump langchain from 0.3.1 to 0.3.2 in /examples/functions by @dependabot in #3755
- chore(deps): Bump openai from 1.50.2 to 1.51.1 in /examples/functions by @dependabot in #3754
- chore(deps): Bump openai from 1.45.1 to 1.51.1 in /examples/langchain/langchainpy-localai-example by @dependabot in #3748
- chore(deps): Bump multidict from 6.0.5 to 6.1.0 in /examples/langchain/langchainpy-localai-example by @dependabot in #3749
- chore(deps): Bump aiohttp from 3.10.8 to 3.10.9 in /examples/langchain/langchainpy-localai-example by @dependabot in #3750
- chore(deps): Bump llama-index from 0.11.14 to 0.11.16 in /examples/chainlit by @dependabot in #3753
- chore(deps): Bump streamlit from 1.38.0 to 1.39.0 in /examples/streamlit-bot by @dependabot in #3757
- chore(deps): Bump debugpy from 1.8.2 to 1.8.6 in /examples/langchain/langchainpy-localai-example by @dependabot in #3751
- chore(deps): Bump langchain from 0.3.1 to 0.3.2 in /examples/langchain/langchainpy-localai-example by @dependabot in #3752
- chore(deps): Bump openai from 1.50.2 to 1.51.1 in /examples/langchain-chroma by @dependabot in #3758
- chore(deps): Bump llama-index from 0.11.14 to 0.11.16 in /examples/langchain-chroma by @dependabot in #3760
- chore(deps): Bump nginx from 1.27.0 to 1.27.2 in /examples/k8sgpt by @dependabot in #3761
- chore(deps): Bump appleboy/ssh-action from 1.0.3 to 1.1.0 by @dependabot in #3762
- chore: ⬆️ Update ggerganov/llama.cpp to `6374743747b14db4eb73ce82ae449a2978bc3b47` by @localai-bot in #3763
- chore: ⬆️ Update ggerganov/whisper.cpp to `ebca09a3d1033417b0c630bbbe607b0f185b1488` by @localai-bot in #3764
- chore: ⬆️ Update ggerganov/llama.cpp to `dca1d4b58a7f1acf1bd253be84e50d6367f492fd` by @localai-bot in #3769
- chore: ⬆️ Update ggerganov/whisper.cpp to `fdbfb460ed546452a5d53611bba66d10d842e719` by @localai-bot in #3768
- chore: ⬆️ Update ggerganov/llama.cpp to `c81f3bbb051f8b736e117dfc78c99d7c4e0450f6` by @localai-bot in #3775
- chore: ⬆️ Update ggerganov/llama.cpp to `0e9f760eb12546704ef8fa72577bc1a3ffe1bc04` by @localai-bot in #3786
- chore(deps): bump llama-cpp to 96776405a17034dcfd53d3ddf5d142d34bdbb657 by @mudler in #3793
### Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #3665
- feat(swagger): update swagger by @localai-bot in #3664
- chore(refactor): track grpcProcess in the model structure by @mudler in #3663
- chore: get model also from query by @mudler in #3716
- chore(federated): display a message when nodes are not available by @mudler in #3721
- chore(vllm): do not install from source by @mudler in #3745
- chore(Dockerfile): default to cmake from package manager by @mudler in #3746
- chore(tests): improve rwkv tests and consume TEST_FLAKES by @mudler in #3765
## New Contributors
- @siddimore made their first contribution in #3668
- @shraddhazpy made their first contribution in #3710
- @jjasghar made their first contribution in #3723
- @joshbtn made their first contribution in #3777
**Full Changelog**: v2.21.1...v2.22.0