Open source · Java ecosystem

Machine Learning Cabinet

Practical tools for running language models on commodity hardware. No hype, no SaaS.

Projects
What we're building
Active · Session 21

Juno

Distributed Java inference engine for GGUF models. Runs LLMs across a cluster of commodity GPUs using pipeline or tensor parallelism over gRPC. No Python. No NCCL. No InfiniBand required.
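The core idea behind the pipeline-parallel mode is simple: each worker owns a contiguous range of transformer layers and activations stream between them. A minimal sketch of that layer assignment, with hypothetical names (this is an illustration, not Juno's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: how pipeline parallelism might split `numLayers`
// transformer layers into `numWorkers` contiguous ranges [start, end).
public class LayerPartition {
    static List<int[]> partition(int numLayers, int numWorkers) {
        List<int[]> ranges = new ArrayList<>();
        int base = numLayers / numWorkers;
        int extra = numLayers % numWorkers; // first `extra` workers take one more layer
        int start = 0;
        for (int w = 0; w < numWorkers; w++) {
            int size = base + (w < extra ? 1 : 0);
            ranges.add(new int[] { start, start + size });
            start += size;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 32 layers over 3 workers -> 0..11, 11..22, 22..32
        for (int[] r : partition(32, 3)) {
            System.out.println(r[0] + ".." + r[1]);
        }
    }
}
```

Each range stays contiguous so only one activation tensor crosses the wire per stage boundary.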


At a glance

  • LLaMA · Phi-3 · Mistral · TinyLlama · Meta-Llama 3 · Gemma
  • CPU (parallel matVec) and CUDA (cublasSgemv, weights resident on GPU) backends
  • Pipeline and tensor parallelism — select with --pType
  • FLOAT32 / FLOAT16 / INT8 activation wire formats
  • LoRA fine-tuning · Session KV cache · JFR profiling
  • AWS cluster deployment via juno-deploy.sh — GPU and CPU clusters
  • 475+ tests · 0 failures · JDK 25 · Maven 3.9
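The CPU backend's "parallel matVec" refers to the fact that each output row of y = A·x is an independent dot product, so rows parallelize cleanly across cores. A minimal sketch of the idea (not Juno's actual implementation):

```java
import java.util.stream.IntStream;

// Sketch of a parallel CPU matVec: y = A * x, with A stored row-major.
public class ParallelMatVec {
    static float[] matVec(float[] a, float[] x, int rows, int cols) {
        float[] y = new float[rows];
        // Each row's dot product is independent, so rows split across cores.
        IntStream.range(0, rows).parallel().forEach(r -> {
            float sum = 0f;
            int off = r * cols;
            for (int c = 0; c < cols; c++) {
                sum += a[off + c] * x[c];
            }
            y[r] = sum;
        });
        return y;
    }

    public static void main(String[] args) {
        float[] a = { 1, 2, 3, 4 }; // 2x2 row-major
        float[] x = { 1, 1 };
        float[] y = matVec(a, x, 2, 2);
        System.out.println(y[0] + " " + y[1]); // 3.0 7.0
    }
}
```

The CUDA backend replaces this loop with a single cublasSgemv call, keeping the weight matrix resident on the GPU between tokens.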
Principles
How we work

No Python. No Spring Boot. No framework bloat. The JVM reads GGUF directly and runs inference end to end.
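"Reads GGUF directly" means parsing the file's fixed header by hand: a magic number, a version, a tensor count, and a metadata key-value count, all little-endian per the public GGUF spec. A hedged sketch of that header parse (an illustration, not Juno's parser):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Parses the fixed-size header at the start of a GGUF file.
// Field layout follows the public GGUF spec; names here are illustrative.
public class GgufHeader {
    final int version;
    final long tensorCount;
    final long metadataKvCount;

    GgufHeader(int version, long tensorCount, long metadataKvCount) {
        this.version = version;
        this.tensorCount = tensorCount;
        this.metadataKvCount = metadataKvCount;
    }

    static GgufHeader parse(ByteBuffer buf) {
        buf.order(ByteOrder.LITTLE_ENDIAN);  // GGUF is little-endian throughout
        int magic = buf.getInt();
        if (magic != 0x46554747) {           // the ASCII bytes "GGUF" read as a LE u32
            throw new IllegalArgumentException("not a GGUF file");
        }
        int version = buf.getInt();          // spec version (u32)
        long tensors = buf.getLong();        // tensor count (u64)
        long kvPairs = buf.getLong();        // metadata key-value count (u64)
        return new GgufHeader(version, tensors, kvPairs);
    }
}
```

After this header come the metadata key-value pairs and tensor descriptors, which is where model hyperparameters and weight offsets live.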

Commodity hardware over premium gear. 16 × 4 GB GPUs beat one 64 GB card and cost a fraction of the price.

Tests before features. A module without a test suite is a module that can't be trusted.

Honest documentation. Known gaps, open issues, real benchmarks — not marketing copy.

Contribute
How to get involved

Code

Pick an open issue and send a PR. Every module has its own test suite. github.com/ml-cab/juno

Benchmarks

GPU numbers on real hardware are the most useful contribution right now. CUDA 12.x access and the integration suite are all you need.

Bug reports

Tried Juno on your setup? Found a rough edge? Open an issue. Specific, reproducible reports move things forward fastest.