API Handles
Gemma-4-26B-A4B-it
Docker Image's Current SGLang Config
python3 -m sglang.launch_server \
--model-path google/gemma-4-26b-a4b-it \
--tp 2 \
--port 3000 \
--host 0.0.0.0 \
--attention-backend triton \
--mem-fraction-static 0.8 \
--max-running-requests 128 \
--chunked-prefill-size 4096 \
--context-length 32768 \
--trust-remote-code \
--enable-piecewise-cuda-graph \
--schedule-policy lpm
Test Curl
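After launch, the server can take a while to load the weights and capture CUDA graphs, and a request sent too early fails with "connection refused". The following is a minimal sketch for waiting before the test request. It assumes that the OpenAI-compatible `GET /v1/models` route starts answering once the model is loaded. `wait_for_server` and `reply_text` are hypothetical helper names, not part of SGLang.

```shell
# wait_for_server BASE_URL [MAX_ATTEMPTS] [SLEEP_SECONDS]
# Polls BASE_URL/v1/models until it returns HTTP 2xx/3xx.
# Assumption: /v1/models answering means the model is ready to serve requests.
# Returns 0 when the server responds, 1 if it never does within MAX_ATTEMPTS tries.
wait_for_server() {
  local base_url="$1"
  local max_attempts="${2:-120}"
  local sleep_s="${3:-5}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s: no progress meter, -f: non-zero exit on HTTP >= 400, -o /dev/null: discard body
    if curl -sf -o /dev/null --max-time 5 "${base_url}/v1/models"; then
      echo "server ready after ${attempt} attempt(s)" >&2
      return 0
    fi
    # Skip the sleep after the final failed attempt
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$sleep_s"
    fi
    attempt=$((attempt + 1))
  done
  echo "no response from ${base_url} after ${max_attempts} attempt(s)" >&2
  return 1
}

# reply_text: reads an OpenAI-style chat completion JSON document on stdin
# and prints choices[0].message.content (uses python3, already on the SGLang host).
reply_text() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}
```

Usage: source these functions and run `wait_for_server http://localhost:3000`, which defaults to 120 attempts spaced 5 s apart. Then add `-s` to the test curl and pipe its output into `reply_text` to print only the model's answer instead of the full JSON response.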
curl http://localhost:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemma-4-26b-a4b-it",
"messages": [{"role": "user", "content": "System check. Are you online?"}]
}'
faster-Qwen3-tts
Test Curl
curl -X POST http://192.168.40.40:8002/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"text": "This is a text-to-speech system check. Audio synthesis is functional on Indra.",
"voice_ref": "nona.wav",
"seed": 42
}' \
--output tts_test.wav
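Without `--fail`, curl exits 0 even when the server returns an HTTP error, and `--output` then saves the error body as tts_test.wav. A clean exit therefore does not prove that audio was produced. The following is a minimal sketch of a follow-up check. It assumes the endpoint returns a standard RIFF/WAVE file, which these notes do not confirm. `is_wav` is a hypothetical helper, not part of faster-Qwen3-tts.

```shell
# is_wav FILE -> exit 0 if FILE starts with a RIFF/WAVE header, 1 otherwise.
# WAV layout: bytes 0-3 are "RIFF", bytes 4-7 a chunk size, bytes 8-11 "WAVE".
is_wav() {
  local file="$1"
  # Missing or empty file (e.g. nothing was written)
  [ -s "$file" ] || return 1
  # RIFF chunk ID at offset 0
  [ "$(head -c 4 "$file")" = "RIFF" ] || return 1
  # WAVE format tag at offset 8
  [ "$(head -c 12 "$file" | tail -c 4)" = "WAVE" ] || return 1
}
```

For example, add `-f` to the command above and chain `&& is_wav tts_test.wav && echo "TTS OK"`. An HTTP error or a JSON error body saved under the .wav name then fails the check instead of passing silently.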