Indra-Swarm API

Gemma-4-26B-A4B-it

Current SGLang Config (Docker image)

python3 -m sglang.launch_server \
  --model-path google/gemma-4-26b-a4b-it \
  --tp 2 \
  --port 3000 \
  --host 0.0.0.0 \
  --attention-backend triton \
  --mem-fraction-static 0.8 \
  --max-running-requests 128 \
  --chunked-prefill-size 4096 \
  --context-length 32768 \
  --trust-remote-code \
  --enable-piecewise-cuda-graph \
  --schedule-policy lpm
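
Once the server is launched with the config above, a quick readiness check can confirm that it is listening and serving the expected model. The sketch below is a minimal example, not part of the Docker image. It assumes the server exposes the OpenAI-style "/v1/models" route and is reachable at 192.168.40.40:3000 (the address used by the test request in this document). The helper names are hypothetical.

```python
import json
import urllib.request

# Assumed address: host from the test request in this document, port from "--port 3000".
BASE_URL = "http://192.168.40.40:3000"


def parse_model_ids(payload):
    """Return model ids from an OpenAI-style model list: {"data": [{"id": ...}]}."""
    return [entry["id"] for entry in payload.get("data", [])]


def served_models(base_url=BASE_URL, timeout=5):
    """Query /v1/models (assumed to be exposed) and return the served model ids."""
    with urllib.request.urlopen(f"{base_url}/v1/models", timeout=timeout) as resp:
        return parse_model_ids(json.load(resp))


def is_ready(expected="google/gemma-4-26b-a4b-it", base_url=BASE_URL):
    """True if the server answers and lists the expected model, False otherwise."""
    try:
        return expected in served_models(base_url)
    except OSError:  # connection refused, timeout, DNS failure, HTTP error
        return False
```

Calling is_ready() in a retry loop after container start is one way to wait for model loading to finish before sending real requests.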

Test Curl

curl http://192.168.40.40:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-4-26b-a4b-it",
    "messages": [{"role": "user", "content": "System check. Are you online?"}]
  }'
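
The same request can be sent from Python. This is a minimal sketch using only the standard library. The URL, model name, and message format are copied from the curl command above. The response parsing assumes the usual OpenAI-style shape (choices, then message, then content), which this document does not show, and the function names are hypothetical.

```python
import json
import urllib.request

# Values copied from the curl example above.
BASE_URL = "http://192.168.40.40:3000"
MODEL = "google/gemma-4-26b-a4b-it"


def build_chat_request(prompt, base_url=BASE_URL, model=MODEL):
    """Build (but do not send) the POST request for /v1/chat/completions."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_json):
    """Return the assistant text, assuming an OpenAI-style response body."""
    return response_json["choices"][0]["message"]["content"]


def chat(prompt, timeout=60):
    """Send the prompt to the server and return the model's reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

For example, print(chat("System check. Are you online?")) repeats the test above and prints only the reply text.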

faster-Qwen3-tts

Model info

Test Curl

curl -X POST http://192.168.40.40:8002/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "This is a text-to-speech system check. Audio synthesis is functional on Indra.",
    "voice": "nona",
    "response_format": "wav",
    "seed": 42
  }' \
  --output tts_test.wav

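To change voices, set the voice reference ("voice" in the request above) to any of the following. Note that the test request passes "nona" without the .wav extension.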

aus-female-1.wav
aus-female-2.wav
aus-female-3.wav
aus-female-4.wav
aus-female-5.wav
aus-female-6.wav
aus-male-1.wav
aus-male-2.wav
aus-male-3.wav
aus-male-4.wav
aus-male-5.wav
aus-male-6.wav
aus-male-7.wav
charter.wav
gaius.wav
_gantry.wav
nona.wav
oni.wav
vulcan.wav
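
Switching voices from Python might look like the sketch below. The endpoint and request fields are copied from the curl command above. One assumption is labeled in the code: since the working test request sends "nona" while the list shows "nona.wav", the helper drops the ".wav" suffix before sending. If the server expects the full file name, pass it through unchanged instead. The function names are hypothetical.

```python
import json
import urllib.request

# Endpoint copied from the curl example above.
TTS_URL = "http://192.168.40.40:8002/v1/audio/speech"


def voice_id(ref):
    """Drop a trailing ".wav", so "nona.wav" becomes "nona".

    ASSUMPTION: the server wants the name without the extension, as in the
    test request above. This is not confirmed by the document.
    """
    return ref[:-4] if ref.endswith(".wav") else ref


def build_speech_request(text, voice_ref="nona.wav", seed=42):
    """Build (but do not send) the POST request for /v1/audio/speech."""
    body = {
        "model": "tts-1",
        "input": text,
        "voice": voice_id(voice_ref),
        "response_format": "wav",
        "seed": seed,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def synthesize(text, voice_ref, out_path, timeout=120):
    """Request speech for the given voice and write the audio to out_path."""
    with urllib.request.urlopen(build_speech_request(text, voice_ref), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return out_path
```

For example, synthesize("Voice check.", "aus-male-1.wav", "aus-male-1_test.wav") would write one sample for that voice. Looping over the list above produces a sample per voice for comparison.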

faster-whisper-large-v3-turbo-ct2

Model info

Test Curl

Uses a known audio file saved locally on the Indra machine.

curl http://192.168.40.40:8005/v1/audio/transcriptions \
  -H "Content-Type: multipart/form-data" \
  -F "file=@/mnt/nvme3n1/swarm/voice-samples/aus-male-1.wav" \
  -F "model=deepdml/faster-whisper-large-v3-turbo-ct2" \
  -F "response_format=json"
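
A Python equivalent of the upload above, as a minimal standard-library sketch. The URL, model name, and form fields are copied from the curl command. The multipart encoding is written by hand because the standard library has no ready-made helper for it. Reading the transcript from a "text" key assumes an OpenAI-style JSON response, which this document does not show. The function names are hypothetical.

```python
import json
import os
import uuid
import urllib.request

# Values copied from the curl example above.
STT_URL = "http://192.168.40.40:8005/v1/audio/transcriptions"
STT_MODEL = "deepdml/faster-whisper-large-v3-turbo-ct2"


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one file part named "file" as multipart/form-data.

    Returns (body_bytes, content_type_header_value).
    """
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    )
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(path, timeout=120):
    """Upload the audio file at path and return the transcript text."""
    with open(path, "rb") as f:
        audio = f.read()
    body, content_type = encode_multipart(
        {"model": STT_MODEL, "response_format": "json"},
        os.path.basename(path),
        audio,
    )
    req = urllib.request.Request(
        STT_URL, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # ASSUMPTION: the JSON response carries the transcript under "text".
        return json.load(resp)["text"]
```

Unlike the curl test, transcribe() reads the file on the machine running the script, so the path must exist there rather than on Indra.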