Hugind

Run your own AI assistant.

Hugind runs open-weight LLMs on the hardware you already own. Chat with documents and images, automate work with agents, process anything an OpenAI-compatible tool can handle. No API keys. No per-token bills. Nothing leaves your office.

What it does

Two jobs. One binary.

Chat you can use like ChatGPT. Agents you can trust with real work. Both running on your own hardware, both free.

01

Talk to it.

Pull any open-weight model from Hugging Face and chat with it. Llama, Gemma, Qwen, Mistral, DeepSeek, vision-language models. Drop in an image and ask about it. See the model think through a problem before it answers. Plug it into anything that speaks OpenAI.

  • Multimodal: images plus text in the same conversation
  • Thinking mode: visible reasoning before the answer
  • OpenAI-compatible API: drop-in replacement for existing code
  • Streaming, embeddings, auth, long-context caching
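
A request to the OpenAI-compatible endpoint carries the same multimodal message shape as OpenAI's Chat Completions API. Below is a minimal sketch of such a body, built but not sent; the model name, image bytes, and content-parts layout are assumptions based on the OpenAI format rather than Hugind's documented schema.

```python
import base64
import json

# Sketch of a multimodal chat request for an OpenAI-compatible server.
# The image bytes are a placeholder; in practice you would read a real file.
image_b64 = base64.b64encode(b"<raw image bytes>").decode("ascii")

payload = {
    "model": "gemma-4b",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is in this image?"},
                {
                    "type": "image_url",
                    "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                },
            ],
        }
    ],
    "stream": False,
}

# POST this body to http://localhost:8080/v1/chat/completions with
# Content-Type: application/json, as in the Quickstart curl example.
body = json.dumps(payload)
```
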
02

Put it to work.

An agent reads your email, opens a browser, fills forms, runs a shell command, or writes a file, but only with the capabilities you explicitly grant. Safe automation for the boring parts of the work, with a manifest your security team can actually review.

  • Network, filesystem, and shell permissions declared per agent
  • JavaScript (QuickJS) or WASM runtimes, each in its own OS process
  • MCP support, tool-use loops, and multi-agent teams for larger goals
  • Every tool call logged and reviewable
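
A permission manifest for such an agent might look like the sketch below. The schema is illustrative only: the field names and structure are invented for this example, not Hugind's documented format.

```json
{
  "agent": "support-inbox",
  "runtime": "quickjs",
  "permissions": {
    "network": ["imap.example.com:993", "admin.example.com:443"],
    "filesystem": {
      "read": ["/var/agents/support"],
      "write": ["/var/agents/support/out"]
    },
    "shell": []
  }
}
```

Everything not listed is denied, which is what makes the manifest reviewable: the security team reads one short file, not the agent's code.
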
What people do with it

Three shapes of Hugind project.

Private chat

A ChatGPT you actually own.

Ship a private assistant to every desk in the office without a monthly per-seat bill. Your team chats, summarizes, translates, codes, drafts, and works with images, all on one GPU server your IT team already runs. Nothing ever leaves the building.

  • Multimodal chat (text plus images)
  • Thinking mode for tricky questions
  • Plug into any OpenAI-compatible client your team already uses

Back-office automation

The boring half of your queue, automated.

A customer emails asking for a password reset. An agent watches the support inbox, reads the request, opens a browser, signs into the admin panel, resets the password, and replies with the new credentials. If anything looks unusual, it stops and tags a human. Every action is logged.

  • Email, browser, shell, and file tools, each scoped by a manifest
  • Dashboard logins and form fills, on the same machine as the agent
  • Full audit trail of every tool call and response
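
The audit-trail idea reduces to a few lines: record every tool call before it runs, and refuse anything the manifest does not grant. This is a toy illustration, not Hugind's internal code; the tool names, log shape, and `call_tool` helper are invented for the example.

```python
import json
import time

# Every tool call an agent makes is appended to the audit log,
# whether it was allowed or denied.
AUDIT_LOG = []

def call_tool(agent, tool, args, granted_tools):
    entry = {"ts": time.time(), "agent": agent, "tool": tool, "args": args}
    if tool not in granted_tools:
        entry["result"] = "denied: tool not in manifest"
        AUDIT_LOG.append(entry)
        raise PermissionError(entry["result"])
    entry["result"] = granted_tools[tool](**args)
    AUDIT_LOG.append(entry)
    return entry["result"]

# A toy tool set scoped to what this agent's manifest would grant.
tools = {"read_file": lambda path: f"<contents of {path}>"}

call_tool("support-inbox", "read_file", {"path": "ticket.txt"}, tools)
print(json.dumps(AUDIT_LOG[-1], default=str))
```
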
Sovereign document work

Documents that cannot leave the country.

A government office processes hundreds of passports, court filings, or contracts a day. The documents legally cannot leave the jurisdiction. Hugind runs on a local GPU server: page-by-page OCR, structured extraction, redaction, and indexing. Same workflow as a cloud service, without the cross-border problem.

  • Multimodal models read scanned pages directly
  • Agent pipelines ingest, extract, redact, and index
  • Runs inside data-residency boundaries, air-gapped if needed
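
The pipeline shape, reduced to a toy: each stage is a plain function, chained in order. A real deployment would use the multimodal model for OCR and structured extraction; the `Passport No:` pattern below is invented for the demo.

```python
import re

# Toy version of the ingest -> extract -> redact -> index pipeline.
def extract(page_text):
    # Pull a passport-number-like field out of the page (demo pattern).
    m = re.search(r"Passport No: (\w+)", page_text)
    return {"passport_no": m.group(1) if m else None, "text": page_text}

def redact(record):
    # Mask the sensitive field before the text is stored anywhere.
    record["text"] = re.sub(r"Passport No: \w+",
                            "Passport No: [REDACTED]", record["text"])
    return record

def index(record, store):
    store.append(record)
    return store

store = []
page = "Applicant: J. Doe. Passport No: X1234567. Issued 2021."
index(redact(extract(page)), store)
```

The structured field survives for downstream systems while the indexed text carries only the redacted copy.
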
Install

From zero to a running model. Four commands.

macOS, Linux, Windows. Models download from Hugging Face straight into your home directory. The server speaks OpenAI-compatible HTTP on port 8080.

Quickstart (terminal)
# 1. Install the runtime.
$ brew install hugind

# 2. Pull a model from Hugging Face.
$ hugind model add google/gemma-3-4b-it-qat-q4_0-gguf

# 3. Create a hardware-aware config (auto-detects your GPU).
$ hugind config init gemma-4b

# 4. Start the OpenAI-compatible server.
$ hugind server start gemma-4b
  ready on http://localhost:8080

# Talk to it like any OpenAI client.
$ curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model":"gemma-4b",
        "messages":[{"role":"user","content":"who are you?"}]}'

Hardware

Runs on what you already own.

You do not need a data center. Hugind auto-detects your hardware, picks the best backend, and tells you which models will fit before you download them. If the RAM is not there, it says so. If a smaller quant is a better fit, it suggests it.

Apple Silicon
M1 through M4

7B-14B models on a MacBook. 70B on a Mac Studio with unified memory.

Consumer GPUs
RTX 3060 and up

The gaming rig under your desk runs a useful model tonight.

Workstation GPUs
A / L / H series

Bigger models, more concurrent users, same binary, same config.

Jetson and CPU
Edge and fallback

Industrial edge devices. A slow but working CPU path on everything else.
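
The fit check can be approximated by hand. The heuristic below is a rough public rule of thumb, not Hugind's actual algorithm: weight memory in GB is roughly parameters (in billions) times bits per weight divided by 8, plus overhead for the KV cache and runtime buffers.

```python
# Back-of-envelope check for whether a quantized model fits in memory.
# 1B parameters at 8 bits per weight is ~1 GB of weights; the 1.2 factor
# is a guessed allowance for KV cache and runtime buffers.
def fits(params_billion, quant_bits, mem_gb, overhead=1.2):
    weight_gb = params_billion * quant_bits / 8
    return weight_gb * overhead <= mem_gb

fits(7, 4, 8)    # a 4-bit 7B model needs ~4.2 GB, fine on an 8 GB GPU
fits(70, 4, 24)  # a 4-bit 70B model wants ~42 GB, too big for 24 GB
```
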

Honest comparison

How it stacks up.

Ollama is easier for first contact. LM Studio has a nicer GUI. Raw llama.cpp gives you every low-level knob. Hugind is for teams that also want agents on top of inference.

                                Hugind           Ollama        LM Studio            llama.cpp
OpenAI-compatible server        Yes              Yes           Yes                  Yes
Runs offline, no account, free  Yes              Yes           Yes                  Yes
One-command install             Yes              Yes           Yes (GUI installer)  Build from source
Sandboxed agent runtime         Yes (JS + WASM)  No            No                   No
Per-agent permission manifest   Yes              No            No                   No
Multi-agent orchestration       Yes              No            No                   No
MCP support                     Yes              Experimental  No                   No
Easiest first-time setup        Good             Best          Best (GUI)           Hardest
Low-level tuning knobs          Some             Some          Some                 Best

"Yes" means the feature exists in the default install and is documented. We update this table as competitors ship. Corrections welcome on GitHub.

Price

Free. MIT. Permanent.

Hugind is free to install, free to use, and free to redistribute. There is no paid tier, no trial, no upsell, no telemetry reporting home. The MIT license means you can also fork it, embed it, or ship it inside a product.

If you run Hugind in production and need help past the docs, imaged Lab does paid integration engineering and custom agents. That is the only commercial path, and it is kept entirely separate from the project.

Get started

Pull a model. Start a server. Talk to it in five minutes.