API Handles

Gemma-4-26B-A4B-it

This is primarily used for VIP (for both RP and non-RP calls). Model Specs

Docker image's Current Sglang Config python3 -m sglang.launch_server --model-path google/gemma-4-26b-a4b-it --tp 2 --port 3000 --host 0.0.0.0 --attention-backend triton --mem-fraction-static 0.8 --max-running-requests 128 --chunked-prefill-size 4096 --context-length 32768 --trust-remote-code --enable-piecewise-cuda-graph --schedule-policy lpm Test Curl

curl -X \
 POST http://192.168.40.40:8002/v1/audio/speech   -H \
 "Content-Type: application/json"   -d '{
    "text": "This is a text-to-speech system check. Audio synthesis is functional on Indra.",
    "voice_ref": "nona.wav",
    "seed": 42
  }'   --output \
 tts_test.wav
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9