Open source · Java ecosystem

Machine Learning Cabinet

Practical tools for running language models on commodity hardware. No hype, no SaaS.

Projects
What we're building
Active · Session 21

Juno

Distributed Java inference engine for GGUF models. Runs LLMs across a cluster of commodity GPUs using pipeline or tensor parallelism over gRPC. No Python. No NCCL. No InfiniBand required.
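The core idea behind the pipeline-parallel mode is simple: each worker owns a contiguous range of transformer layers and activations stream between them. A minimal sketch of that layer assignment, with hypothetical names (this is an illustration, not Juno's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: how pipeline parallelism might split `numLayers`
// transformer layers into `numWorkers` contiguous ranges [start, end).
public class LayerPartition {
    static List<int[]> partition(int numLayers, int numWorkers) {
        List<int[]> ranges = new ArrayList<>();
        int base = numLayers / numWorkers;
        int extra = numLayers % numWorkers; // first `extra` workers take one more layer
        int start = 0;
        for (int w = 0; w < numWorkers; w++) {
            int size = base + (w < extra ? 1 : 0);
            ranges.add(new int[] { start, start + size });
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 32 layers over 3 workers -> 0..11, 11..22, 22..32
        for (int[] r : partition(32, 3)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

Each range stays contiguous so only one activation tensor crosses the wire per stage boundary.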


At a glance

  • LLaMA · Phi-3 · Mistral · TinyLlama · Meta-Llama 3 · Gemma
  • CPU (parallel matVec) and CUDA (cublasSgemv, weights resident on GPU) backends
  • Pipeline and tensor parallelism — select with --pType
  • FLOAT32 / FLOAT16 / INT8 activation wire formats
  • LoRA fine-tuning · Session KV cache · JFR profiling
  • AWS cluster deployment via juno-deploy.sh — GPU and CPU clusters
  • 475+ tests · 0 failures · JDK 25 · Maven 3.9
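The CPU backend's "parallel matVec" refers to the fact that each output row of y = A·x is an independent dot product, so rows parallelize cleanly across cores. A minimal sketch of the idea (not Juno's actual implementation):

```java
import java.util.stream.IntStream;

// Sketch of a parallel CPU matVec: y = A * x, with A stored row-major.
public class ParallelMatVec {
    static float[] matVec(float[] a, float[] x, int rows, int cols) {
        float[] y = new float[rows];
        // Each row's dot product is independent, so rows split across cores.
        IntStream.range(0, rows).parallel().forEach(r -> {
            float sum = 0f;
            int off = r * cols;
            for (int c = 0; c < cols; c++) {
                sum += a[off + c] * x[c];
            }
            y[r] = sum;
        });
        return y;
    }

    public static void main(String[] args) {
        float[] a = { 1, 2, 3, 4 }; // 2x2 row-major
        float[] x = { 1, 1 };
        float[] y = matVec(a, x, 2, 2);
        System.out.println(y[0] + " " + y[1]); // 3.0 7.0
    }
}
```

The CUDA backend replaces this loop with a single cublasSgemv call, keeping the weight matrix resident on the GPU between tokens.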
Principles
How we work

No Python. No Spring Boot. No framework bloat. The JVM reads GGUF directly and runs inference end to end.
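"Reads GGUF directly" means parsing the file's fixed header by hand: a magic number, a version, a tensor count, and a metadata key-value count, all little-endian per the public GGUF spec. A hedged sketch of that header parse (an illustration, not Juno's parser):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Parses the fixed-size header at the start of a GGUF file.
// Field layout follows the public GGUF spec; names here are illustrative.
public class GgufHeader {
    final int version;
    final long tensorCount;
    final long metadataKvCount;

    GgufHeader(int version, long tensorCount, long metadataKvCount) {
        this.version = version;
        this.tensorCount = tensorCount;
        this.metadataKvCount = metadataKvCount;
    }

    static GgufHeader parse(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);  // GGUF is little-endian throughout
        int magic = buf.getInt();
        if (magic != 0x46554747) {           // the ASCII bytes "GGUF" read as a LE u32
            throw new IllegalArgumentException("not a GGUF file");
        }
        int version = buf.getInt();          // spec version (u32)
        long tensors = buf.getLong();        // tensor count (u64)
        long kvPairs = buf.getLong();        // metadata key-value count (u64)
        return new GgufHeader(version, tensors, kvPairs);
    }
}
```

After this header come the metadata key-value pairs and tensor descriptors, which is where model hyperparameters and weight offsets live.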

Commodity hardware over premium gear. 16 × 4 GB GPUs beat one 64 GB card and cost a fraction of the price.

Tests before features. A module without a test suite is a module that can't be trusted.

Honest documentation. Known gaps, open issues, real benchmarks — not marketing copy.

Contribute
How to get involved

Code

Pick an open issue and send a PR. Every module has its own test suite. github.com/ml-cab/juno

Benchmarks

GPU numbers on real hardware are the most useful contribution right now. CUDA 12.x access and the integration suite are all you need.

Bug reports

Tried Juno on your setup? Found a rough edge? Open an issue. Specific, reproducible reports move things forward fastest.