Practical tools for running language models on commodity hardware. No hype, no SaaS.
Distributed Java inference engine for GGUF models. Runs LLMs across multiple JVMs using pipeline parallelism over gRPC. No Python. No NCCL. No InfiniBand required.
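The pipeline-parallel idea is that each JVM hosts a contiguous slice of the model's transformer layers and forwards activations to the next stage over gRPC. A minimal sketch of the layer-partitioning step (the RPC plumbing is omitted; `StagePlan` is a hypothetical helper, not Juno's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: split N transformer layers across S pipeline stages.
// Each JVM would host one [start, end) range; activations flow stage to stage.
public class StagePlan {
    // Balanced partition: stage sizes differ by at most one layer.
    static List<int[]> partition(int numLayers, int numStages) {
        List<int[]> ranges = new ArrayList<>();
        int base = numLayers / numStages;
        int extra = numLayers % numStages; // first `extra` stages get one more layer
        int start = 0;
        for (int s = 0; s < numStages; s++) {
            int size = base + (s < extra ? 1 : 0);
            ranges.add(new int[]{start, start + size});
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // A 32-layer model split across 4 JVMs: 8 layers per stage.
        for (int[] r : partition(32, 4)) {
            System.out.println("layers " + r[0] + ".." + r[1]);
        }
    }
}
```

With 16 small GPUs, the same scheme assigns two layers of a 32-layer model to each stage; the cost is per-token latency across hops, not extra memory.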
No Python. No Spring Boot. No framework bloat. JVM reads GGUF directly and runs inference end to end.
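Reading GGUF from plain Java needs nothing beyond `java.nio`. A minimal sketch of validating the fixed-size header, following the public GGUF spec (little-endian magic, version, tensor count, metadata KV count) — this is an illustration, not Juno's actual reader:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch of a GGUF header check per the public GGUF spec; not Juno's code.
public class GgufHeader {
    static final int MAGIC = 0x46554747; // bytes 'G','G','U','F' read little-endian

    final int version;
    final long tensorCount;
    final long metadataKvCount;

    GgufHeader(int version, long tensorCount, long metadataKvCount) {
        this.version = version;
        this.tensorCount = tensorCount;
        this.metadataKvCount = metadataKvCount;
    }

    // Parses the 24-byte fixed header; throws if the magic doesn't match.
    static GgufHeader parse(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);
        if (buf.getInt() != MAGIC) {
            throw new IllegalArgumentException("not a GGUF file");
        }
        return new GgufHeader(buf.getInt(), buf.getLong(), buf.getLong());
    }

    public static void main(String[] args) {
        // Synthetic header for demonstration: version 3, 291 tensors, 19 KV pairs.
        ByteBuffer b = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
        b.putInt(MAGIC).putInt(3).putLong(291).putLong(19);
        b.flip();
        GgufHeader h = parse(b);
        System.out.println("GGUF v" + h.version + ", " + h.tensorCount + " tensors");
    }
}
```

In practice a real reader would memory-map the file and go on to decode the metadata key-value section and tensor infos that follow the header.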
Commodity hardware over premium gear. Sixteen 4 GB GPUs beat one 64 GB card and cost a fraction of the price.
Tests before features. A module without a test suite is a module that can't be trusted.
Honest documentation. Known gaps, open issues, real benchmarks — not marketing copy.
Pick an open issue, send a PR. Each module has its own test suite. github.com/ml-cab/juno
GPU numbers on real hardware are the most useful contribution right now. All you need is CUDA 12.x access and the integration suite.
Tried Juno on your setup? Found a rough edge? Open an issue. Specific, reproducible reports move things forward fastest.