Indra-Swarm API

Gemma-4-26B-A4B-it

Model info

Docker image's Current SGLang Config

python3 -m sglang.launch_server \
      --model-path google/gemma-4-26b-a4b-it \
      --tp 2 \
      --port 3000 \
      --host 0.0.0.0 \
      --attention-backend triton \
      --mem-fraction-static 0.8 \
      --max-running-requests 128 \
      --chunked-prefill-size 4096 \
      --context-length 32768 \
      --trust-remote-code \
      --enable-piecewise-cuda-graph \
      --schedule-policy lpm
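
Loading the model can take a while, so it may help to wait for the server before sending the test request. The following is a minimal Python sketch of a readiness poll. The `/health` route is an assumption (substitute whichever endpoint your SGLang build answers on), and the helper names `http_ok` and `wait_until_ready` are hypothetical, not part of SGLang.

```python
import time
import urllib.error
import urllib.request

# Assumed health route on the port from the launch config; adjust if your build differs.
HEALTH_URL = "http://localhost:3000/health"


def http_ok(url, timeout=5):
    """Return True if a GET to `url` answers HTTP 200, False on connection or HTTP errors."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_ready(url=HEALTH_URL, max_wait=600, interval=5,
                     probe=http_ok, sleep=time.sleep):
    """Poll `probe(url)` until it succeeds or `max_wait` seconds of sleeping have elapsed."""
    waited = 0
    while True:
        if probe(url):
            return True
        if waited >= max_wait:
            return False
        sleep(interval)
        waited += interval


# Example (needs the server to be starting or running):
# print("ready" if wait_until_ready() else "timed out")
```

The `probe` and `sleep` parameters exist only so the loop can be exercised without a live server.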

Test Curl

curl http://localhost:3000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "google/gemma-4-26b-a4b-it",
    "messages": [{"role": "user", "content": "System check. Are you online?"}]
  }'
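
The same check can be scripted. Below is a minimal Python equivalent of the curl call, using only the standard library. It assumes the server returns an OpenAI-style response body (`choices[0].message.content`); the helper names `build_chat_request` and `ask` are illustrative and not part of any library.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3000"       # host and port from the launch config
MODEL = "google/gemma-4-26b-a4b-it"      # same model id as the curl example


def build_chat_request(prompt, model=MODEL, base_url=BASE_URL):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, timeout=60):
    """Send the prompt and return the assistant's reply text.

    Assumes an OpenAI-style response body: choices[0].message.content.
    """
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# Example (needs the server to be running):
# print(ask("System check. Are you online?"))
```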

faster-Qwen3-tts

Model info

Test Curl

curl -X POST http://192.168.40.40:8002/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is a text-to-speech system check. Audio synthesis is functional on Indra.",
    "voice_ref": "nona.wav",
    "seed": 42
  }' \
  --output tts_test.wav
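
For scripted checks, the request can also be sent from Python. The sketch below mirrors the JSON fields of the curl call and adds a quick sanity check on the result. It assumes the server replies with raw WAV bytes in the response body (suggested by the output filename, not confirmed here); `build_tts_request`, `looks_like_wav`, and `synthesize` are hypothetical helper names.

```python
import json
import urllib.request

TTS_URL = "http://192.168.40.40:8002/v1/audio/speech"  # endpoint from the curl example


def build_tts_request(text, voice_ref="nona.wav", seed=42, url=TTS_URL):
    """Build (but do not send) a POST request with the same JSON fields as the curl call."""
    payload = {"text": text, "voice_ref": voice_ref, "seed": seed}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def looks_like_wav(data):
    """Return True if the bytes start with a RIFF/WAVE header."""
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def synthesize(text, out_path="tts_test.wav", timeout=120):
    """Send the request, save the audio, and report whether it looks like WAV.

    Assumption: the response body is the audio file itself.
    """
    with urllib.request.urlopen(build_tts_request(text), timeout=timeout) as resp:
        audio = resp.read()
    with open(out_path, "wb") as f:
        f.write(audio)
    return looks_like_wav(audio)


# Example (needs the TTS server to be reachable):
# print(synthesize("This is a text-to-speech system check."))
```

The header check catches the common failure where an error message is saved under a .wav filename.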