Practical tools for running language models on commodity hardware. No hype, no SaaS.
Distributed Java inference engine for GGUF models. Runs LLMs across a cluster of commodity GPUs using pipeline or tensor parallelism over gRPC. No Python. No NCCL. No InfiniBand required.
juno-deploy.sh — GPU and CPU clusters. No Python. No Spring Boot. No framework bloat. The JVM reads GGUF directly and runs inference end to end.
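Pipeline parallelism here means each node owns a contiguous slice of the model's layers and forwards activations to the next node. A minimal sketch of the layer-partitioning step, in plain Java — the class and method names (`PipelineSketch`, `partition`) are illustrative, not Juno's actual API, and the gRPC transport is omitted:

```java
import java.util.ArrayList;
import java.util.List;

public class PipelineSketch {
    // Evenly split `layers` model layers across `stages` nodes;
    // earlier stages absorb any remainder. Each int[] is [start, end).
    static List<int[]> partition(int layers, int stages) {
        List<int[]> ranges = new ArrayList<>();
        int base = layers / stages, rem = layers % stages, start = 0;
        for (int s = 0; s < stages; s++) {
            int count = base + (s < rem ? 1 : 0);
            ranges.add(new int[]{start, start + count});
            start += count;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // e.g. a 32-layer model across four 4 GB GPUs -> 8 layers each
        for (int[] r : partition(32, 4)) {
            System.out.println(r[0] + ".." + r[1]); // 0..8, 8..16, 16..24, 24..32
        }
    }
}
```

Tensor parallelism instead splits each layer's weight matrices across nodes, trading one activation hand-off per stage for an all-reduce per layer — heavier on the network, lighter on per-node memory.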
Commodity hardware over premium gear. Sixteen 4 GB GPUs beat one 64 GB card at a fraction of the cost.
Tests before features. A module without a test suite is a module that can't be trusted.
Honest documentation. Known gaps, open issues, real benchmarks — not marketing copy.
Pick an open issue, send a PR. Every module has its own test suite. github.com/ml-cab/juno
GPU numbers on real hardware are the most useful contribution right now. CUDA 12.x access and the integration suite are all you need.
Tried Juno on your setup? Found a rough edge? Open an issue. Specific, reproducible reports move things forward fastest.