Open source · Java ecosystem

Machine Learning Cabinet

Practical tools for running language models on commodity hardware. No hype, no SaaS.

Projects
What we're building
Active · Session 12

Juno

Distributed Java inference engine for GGUF models. Runs LLMs across multiple JVMs using pipeline parallelism over gRPC. No Python. No NCCL. No InfiniBand required.
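Pipeline parallelism here means each JVM owns a contiguous slice of the model's layers and forwards activations to the next node. A minimal sketch of that flow, with threads standing in for the gRPC hops (`Stage` and `PipelineSketch` are illustrative names, not Juno's actual API):

```java
import java.util.List;

// Illustrative only: each stage owns a contiguous range of transformer
// layers; activations hop stage to stage the way they would between
// JVMs over gRPC in a pipeline-parallel setup.
record Stage(int firstLayer, int lastLayer) {
    float[] forward(float[] activations) {
        float[] out = activations.clone();
        for (int layer = firstLayer; layer <= lastLayer; layer++) {
            for (int i = 0; i < out.length; i++) {
                out[i] = out[i] * 0.5f + layer; // stand-in for real layer math
            }
        }
        return out;
    }
}

public class PipelineSketch {
    public static void main(String[] args) {
        // 32 layers split across 4 stages; in Juno each stage is a JVM.
        List<Stage> stages = List.of(
            new Stage(0, 7), new Stage(8, 15),
            new Stage(16, 23), new Stage(24, 31));
        float[] acts = {1f, 2f, 3f};
        for (Stage s : stages) {
            acts = s.forward(acts); // one network hop per stage boundary
        }
        System.out.println(acts.length); // activation width preserved: 3
    }
}
```

Because only the (small) activation vector crosses the wire at each stage boundary, ordinary Ethernet is enough; nothing like NCCL or InfiniBand is required.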

More info → GitHub

At a glance

  • LLaMA · Phi-3 · Mistral · TinyLlama · Gemma
  • CPU (parallel matVec) and CUDA (cublasSgemv) backends
  • FLOAT32 / FLOAT16 / INT8 activation wire formats
  • Session KV cache — turn latency stays flat
  • 375+ tests · 0 failures · JDK 25 · Maven 3.9
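The "parallel matVec" item above refers to the kernel a CPU backend spends most of its time in. A minimal row-parallel sketch of that operation on the JVM (illustrative, not Juno's actual kernel):

```java
import java.util.stream.IntStream;

// Sketch of a parallel matrix-vector multiply: each output row is an
// independent dot product, so rows can be split across cores.
public class MatVec {
    static float[] matVec(float[][] m, float[] x) {
        float[] y = new float[m.length];
        IntStream.range(0, m.length).parallel().forEach(row -> {
            float sum = 0f;
            for (int col = 0; col < x.length; col++) {
                sum += m[row][col] * x[col];
            }
            y[row] = sum; // each row writes a distinct slot: no contention
        });
        return y;
    }

    public static void main(String[] args) {
        float[][] m = {{1, 0}, {0, 2}, {1, 1}};
        float[] x = {3, 4};
        System.out.println(java.util.Arrays.toString(matVec(m, x)));
        // prints [3.0, 8.0, 7.0]
    }
}
```

The CUDA backend replaces this loop with a single `cublasSgemv` call; the per-row independence is the same property that makes both paths parallel.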
Principles
How we work

No Python. No Spring Boot. No framework bloat. The JVM reads GGUF files directly and runs inference end to end.
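Reading GGUF from plain Java takes little more than NIO. A header-parsing sketch, assuming the GGUF v2+ layout (magic "GGUF", then a 32-bit version and 64-bit tensor and metadata counts, all little-endian; `GgufHeader` is an illustrative name, not a Juno class):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: parse a GGUF header straight from the JVM, no native code.
public class GgufHeader {
    public static void read(Path model) throws IOException {
        try (FileChannel ch = FileChannel.open(model, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(24).order(ByteOrder.LITTLE_ENDIAN);
            ch.read(buf);
            buf.flip();
            if (buf.getInt() != 0x46554747) { // "GGUF" read as a little-endian int
                throw new IOException("not a GGUF file");
            }
            System.out.printf("version=%d tensors=%d metadata=%d%n",
                buf.getInt(), buf.getLong(), buf.getLong());
        }
    }
}
```

The tensor data that follows the metadata section can then be memory-mapped rather than copied onto the heap.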

Commodity hardware over premium gear. 16 × 4 GB GPUs beat one 64 GB card and cost a fraction of the price.
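The arithmetic behind that claim: shard the layers evenly and each small card only needs to hold its own slice. A back-of-envelope sketch (the layer count and per-layer size are illustrative, not a measured Juno model):

```java
// Why many small cards can host a model one of them cannot:
// each GPU only stores its share of the layers.
public class ShardMath {
    static int layersPerGpu(int layers, int gpus) {
        return (int) Math.ceil((double) layers / gpus);
    }

    public static void main(String[] args) {
        int layers = 32;           // e.g. a 7B-class model
        double gibPerLayer = 0.4;  // illustrative FP16 weight size per layer
        int gpus = 16, gibPerGpu = 4;
        int perGpu = layersPerGpu(layers, gpus);
        System.out.printf("%d layers/GPU, ~%.1f GiB of %d GiB used%n",
            perGpu, perGpu * gibPerLayer, gibPerGpu);
        // prints: 2 layers/GPU, ~0.8 GiB of 4 GiB used
    }
}
```

The remaining per-card headroom is what the KV cache grows into during long sessions.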

Tests before features. A module without a test suite is a module that can't be trusted.

Honest documentation. Known gaps, open issues, real benchmarks — not marketing copy.

Contribute
How to get involved

Code

Pick an open issue, send a PR. All modules have their own test suite. github.com/ml-cab/juno

Benchmarks

GPU numbers on real hardware are the most useful contribution right now. CUDA 12.x access and the integration suite are all you need.

Bug reports

Tried Juno on your setup? Found a rough edge? Open an issue. Specific, reproducible reports move things forward fastest.