Indra-Swarm API
Gemma-4-26B-A4B-it
Docker image's current SGLang config
python3 -m sglang.launch_server \
--model-path google/gemma-4-26b-a4b-it \
--tp 2 \
--port 3000 \
--host 0.0.0.0 \
--attention-backend triton \
--mem-fraction-static 0.8 \
--max-running-requests 128 \
--chunked-prefill-size 4096 \
--context-length 32768 \
--trust-remote-code \
--enable-piecewise-cuda-graph \
--schedule-policy lpm

Test Curl
curl http://192.168.40.40:3000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "google/gemma-4-26b-a4b-it",
"messages": [{"role": "user", "content": "System check. Are you online?"}]
}'

faster-Qwen3-tts
Test Curl
curl -X POST http://192.168.40.40:8002/v1/audio/speech \
-H "Content-Type: application/json" \
-d '{
"model": "tts-1",
"input": "This is a text-to-speech system check. Audio synthesis is functional on Indra.",
"voice": "nona",
"response_format": "wav",
"seed": 42
}' \
--output tts_test.wav

To change voices, set "voice ref" to any of the following:
- aus-female-1.wav
- aus-female-2.wav
- aus-female-3.wav
- aus-female-4.wav
- aus-female-5.wav
- aus-female-6.wav
- aus-male-1.wav
- aus-male-2.wav
- aus-male-3.wav
- aus-male-4.wav
- aus-male-5.wav
- aus-male-6.wav
- aus-male-7.wav
- charter.wav
- gaius.wav
- _gantry.wav
- nona.wav
- oni.wav
- vulcan.wav
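To audition several voices in one go, a small loop can render one sample file per reference. This is a sketch that assumes, as the test request's `"voice": "nona"` alongside `nona.wav` suggests, that the `voice` value is the reference file name without `.wav`; that mapping, the helper names (`voice_name`, `build_tts_payload`, `render_voices`) and the `tts_<voice>.wav` output names are hypothetical. `python3` builds the JSON body so that quotes in the input text cannot break it.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for auditioning the TTS voices on port 8002.
# ASSUMPTION: "voice" takes the reference file name without ".wav",
# as in the test request ("voice": "nona" for nona.wav).

# Strip a trailing ".wav" from a reference file name.
voice_name() {
  basename "$1" .wav
}

# Build a speech request body; json.dumps handles quoting and escaping.
# Fields mirror the test curl (model tts-1, wav output, seed 42).
build_tts_payload() {
  python3 -c '
import json, sys
print(json.dumps({
    "model": "tts-1",
    "input": sys.argv[1],
    "voice": sys.argv[2],
    "response_format": "wav",
    "seed": 42,
}))' "$1" "$2"
}

# Render one sample per reference file given as arguments, into tts_<voice>.wav.
render_voices() {
  local text="This is a text-to-speech system check."
  local ref v
  for ref in "$@"; do
    v=$(voice_name "$ref")
    build_tts_payload "$text" "$v" |
      curl -sf -X POST http://192.168.40.40:8002/v1/audio/speech \
        -H "Content-Type: application/json" \
        --data-binary @- \
        --output "tts_${v}.wav" || echo "failed: ${v}" >&2
  done
}
```

For example, `render_voices nona.wav oni.wav aus-female-1.wav` requests `tts_nona.wav`, `tts_oni.wav` and `tts_aus-female-1.wav`, and reports any voice whose request fails on stderr. Keeping the fixed seed makes repeated runs easier to compare.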
faster-whisper-large-v3-turbo-ct2
Test Curl
Uses a known, locally saved audio file on the Indra machine for testing.
curl http://192.168.40.40:8005/v1/audio/transcriptions \
-H "Content-Type: multipart/form-data" \
-F "file=@/mnt/nvme3n1/swarm/voice-samples/aus-male-1.wav" \
-F "model=deepdml/faster-whisper-large-v3-turbo-ct2" \
-F "response_format=json"
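The three test curls can be combined into one smoke test for the whole stack: ask Gemma for a reply, synthesize a sentence with TTS, then transcribe that file back with Whisper. This is a sketch assuming OpenAI-style response bodies (`choices[0].message.content` for chat, a top-level `text` field for `response_format=json` transcriptions). It uses `python3` for JSON handling rather than assuming `jq` is installed, and every function name, the `INDRA_HOST` override and the `smoke_test.wav` file name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical smoke test: Gemma chat (3000), TTS (8002), then STT (8005).
HOST="${INDRA_HOST:-192.168.40.40}"   # INDRA_HOST is a hypothetical override

# Build a chat-completions body; json.dumps handles quoting and escaping.
build_chat_payload() {
  python3 -c '
import json, sys
print(json.dumps({
    "model": "google/gemma-4-26b-a4b-it",
    "messages": [{"role": "user", "content": sys.argv[1]}],
}))' "$1"
}

# Print choices[0].message.content from a chat response on stdin.
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Print the "text" field from a transcription response on stdin.
# ASSUMPTION: OpenAI-style {"text": "..."} body for response_format=json.
extract_text() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["text"].strip())'
}

smoke_test() {
  local wav="${1:-smoke_test.wav}"

  echo "Gemma:"
  build_chat_payload "System check. Are you online?" |
    curl -sf "http://${HOST}:3000/v1/chat/completions" \
      -H "Content-Type: application/json" --data-binary @- |
    extract_reply || return 1

  # TTS request fields follow the faster-Qwen3-tts test curl.
  curl -sf -X POST "http://${HOST}:8002/v1/audio/speech" \
    -H "Content-Type: application/json" \
    -d '{"model": "tts-1", "input": "Audio synthesis is functional on Indra.", "voice": "nona", "response_format": "wav", "seed": 42}' \
    --output "$wav" || return 1

  echo "Whisper heard:"
  # With -F, curl sets the multipart Content-Type and boundary itself.
  curl -sf "http://${HOST}:8005/v1/audio/transcriptions" \
    -F "file=@${wav}" \
    -F "model=deepdml/faster-whisper-large-v3-turbo-ct2" \
    -F "response_format=json" |
    extract_text
}
```

Running `smoke_test` prints Gemma's reply and the transcript of the synthesized file; the transcript should match the TTS input sentence apart from minor punctuation or casing differences. A non-zero exit status means one of the requests or JSON parses failed.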
