Open source · Java ecosystem

Machine Learning Cabinet

Java-native tools for running and fine-tuning language models — on-prem, air-gapped, or in the cloud. No hype, no SaaS, no Python.

See projects GitHub →
Projects
What we're building
Active · Session 29

Juno

Java-native distributed LLM inference and fine-tuning. Runs open-source GGUF models locally, in a cluster, or embedded as a JVM library. OpenAI-compatible REST API included. No Python. No NCCL. No InfiniBand required.

More info → GitHub

At a glance

  • LLaMA · Phi-3 · Mistral · TinyLlama · Meta-Llama 3 · Gemma architectures
  • CPU and CUDA backends — FP16 resident weights, OOM fallback to CPU
  • Pipeline and tensor parallelism — select with --pType
  • FLOAT32 / FLOAT16 / INT8 activation wire formats
  • LoRA fine-tuning, inference overlay, and merge to standalone GGUF
  • OpenAI-compatible REST API — swap base URL, no glue code needed
  • Maven BOM on Central — cab.ml:juno-bom:0.1.0
  • AWS cluster automation via juno-deploy.sh
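Because the server speaks the OpenAI chat-completions wire format, any HTTP client can talk to it with no glue code. A minimal JDK-only sketch of building such a request — the localhost URL and model name are placeholders for illustration, not part of Juno:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class JunoRequest {
    // Build an OpenAI-style chat-completion request against a Juno server.
    // The base URL and model name are assumptions; point them at your deployment.
    static HttpRequest chatRequest(String baseUrl, String model, String prompt) {
        String body = String.format(
            "{\"model\":\"%s\",\"messages\":[{\"role\":\"user\",\"content\":\"%s\"}]}",
            model, prompt);
        return HttpRequest.newBuilder()
            .uri(URI.create(baseUrl + "/v1/chat/completions"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
    }

    public static void main(String[] args) {
        HttpRequest req = chatRequest("http://localhost:8080", "tinyllama", "Hello");
        // Dispatch with HttpClient.newHttpClient().send(req, BodyHandlers.ofString())
        System.out.println(req.uri());
    }
}
```

Swapping the base URL in an existing OpenAI client SDK works the same way.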
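Importing the published BOM is the standard Maven `dependencyManagement` pattern; only the coordinates listed above are used here — check the repository for the individual module artifact IDs:

```xml
<dependencyManagement>
  <dependencies>
    <!-- Juno BOM from Maven Central; pins versions for all Juno modules -->
    <dependency>
      <groupId>cab.ml</groupId>
      <artifactId>juno-bom</artifactId>
      <version>0.1.0</version>
      <type>pom</type>
      <scope>import</scope>
    </dependency>
  </dependencies>
</dependencyManagement>
```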
Principles
How we work

No Python. No Spring Boot. No framework bloat. The JVM reads GGUF files directly and runs inference end to end.

On-prem and air-gapped first. No mandatory cloud dependency, no telemetry, no SaaS lock-in.

Tests before features. A module without a test suite is a module that cannot be trusted.

Honest documentation. Known gaps, open issues, real benchmarks — not marketing copy.

Contribute
How to get involved

Code

Pick an open issue and send a PR. Every module has its own test suite. github.com/ml-cab/juno

Benchmarks

GPU numbers on real hardware are the most useful contribution right now. All you need is CUDA 12.x access and the integration suite.

Bug reports

Tried Juno on your setup? Found a rough edge? Open an issue. Specific, reproducible reports move things forward fastest.