screentinker/docs/local-ai-setup.md
ScreenTinker 1a4397ad24 docs: local AI setup guide for the Content Designer (#41)
How to run the AI design feature fully local + free: Ollama (OpenAI-compatible
LLM) for text/layout and stable-diffusion.cpp (Vulkan) for images, plus the
SELF_HOSTED requirement for localhost endpoints, an OpenAI fallback, and GPU
troubleshooting (incl. the Blackwell CUDA-fails/Vulkan-works note). Linked from
the README integrations section.
2026-06-09 13:57:02 -05:00

6.5 KiB
Raw Blame History

Local AI for the Content Designer

The Content Designer → AI generate feature turns a text prompt into a finished sign: the layout and copy come from an LLM, and (optionally) the background / foreground imagery comes from an image model. ScreenTinker is bring-your-own: you point each workspace at an OpenAI-compatible text endpoint and an image endpoint of your choice. Nothing is sent to us, and the operator pays no AI costs.

This guide sets up a fully local, free stack:

Prefer the cloud? Skip to Using OpenAI instead.

Important

To use localhost / LAN AI endpoints, your instance must run with SELF_HOSTED=true. ScreenTinker blocks private/internal addresses for the AI endpoints (SSRF protection) unless it is in self-hosted mode. See Enable self-hosted mode.


1. Enable self-hosted mode

The AI endpoint config is gated by an SSRF guard. On a self-hosted box this guard is relaxed so you can point at localhost. Set the env var:

# systemd: drop-in (recommended)
sudo mkdir -p /etc/systemd/system/screentinker.service.d
printf '[Service]\nEnvironment=SELF_HOSTED=true\n' | sudo tee /etc/systemd/system/screentinker.service.d/selfhosted.conf
sudo systemctl daemon-reload && sudo systemctl restart screentinker

(Or SELF_HOSTED=true npm start for a manual run.)


2. Text / layout model — Ollama

# Install (use a recent build — 0.30+ is required for NVIDIA 50-series / Blackwell)
curl -fsSL https://ollama.com/install.sh | sh

# Pull a model. 8B is a good size/quality balance for signage copy.
ollama pull llama3.1:8b

# Confirm it's loaded on the GPU
ollama ps

Ollama exposes an OpenAI-compatible API at http://localhost:11434/v1. No API key is required (any value works).

In Designer → ⚙ AI settings:

Field Value
Endpoint base URL http://localhost:11434/v1
Model llama3.1:8b (or click Load models)
API key (leave blank)

That alone enables AI generation (text + shapes). Add images below.


3. Image model — stable-diffusion.cpp (Vulkan)

We use the prebuilt stable-diffusion.cpp server. Its --backend runs on Vulkan, which works on modern NVIDIA GPUs even where CUDA/PyTorch (ComfyUI) fails to initialize — see GPU notes.

# 1. Grab the prebuilt server from the releases page and pick the variant for
#    your GPU (…-vulkan.zip works broadly; cuda / rocm builds also exist):
#    https://github.com/leejet/stable-diffusion.cpp/releases
mkdir -p ~/sd-server && cd ~/sd-server
unzip ~/Downloads/sd-*-vulkan.zip          # -> sd-server, sd-cli, libstable-diffusion.so

# 2. A checkpoint. SDXL base is a solid default (~6.5 GB):
mkdir -p models
curl -L -o models/sd_xl_base_1.0.safetensors \
  https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors

# 3. Find your GPU's Vulkan device index, then run the server.
#    The startup log prints "Found N Vulkan devices" — note the index of your
#    discrete GPU (an Intel/AMD iGPU is often device 0, the dGPU device 1).
LD_LIBRARY_PATH=~/sd-server ~/sd-server/sd-server \
  -m ~/sd-server/models/sd_xl_base_1.0.safetensors \
  --backend vulkan1 --listen-port 7860

The server is OpenAI-compatible at http://localhost:7860/v1 (POST /v1/images/generations). Smoke test:

curl -s http://localhost:7860/v1/images/generations \
  -H 'Content-Type: application/json' \
  -d '{"prompt":"a cozy cafe interior, no text","size":"1024x576","response_format":"b64_json"}' \
  | head -c 80

In Designer → ⚙ AI settings → AI images:

Field Value
Image provider Stable Diffusion — local (sd.cpp)
Image endpoint URL http://localhost:7860/v1
Image model (leave blank — uses the loaded checkpoint)
Image API key (leave blank)

Now a prompt produces a full sign: an atmospheric background, crisp text on top, and an optional foreground graphic.

# /etc/systemd/system/sd-server.service
[Unit]
Description=stable-diffusion.cpp image server
After=network.target

[Service]
User=youruser
Environment=LD_LIBRARY_PATH=/home/youruser/sd-server
ExecStart=/home/youruser/sd-server/sd-server -m /home/youruser/sd-server/models/sd_xl_base_1.0.safetensors --backend vulkan1 --listen-port 7860
Restart=on-failure
RestartSec=5

[Install]
WantedBy=multi-user.target
sudo systemctl daemon-reload && sudo systemctl enable --now sd-server

VRAM: the server keeps the checkpoint resident (~6.5 GB for SDXL). The app requests modest sizes (1024×576 background, 768×768 foreground) so it fits alongside the LLM on a single ~16 GB+ GPU. Larger sizes need a tiled VAE (--vae-tiling) or more VRAM. ComfyUI works too — set the provider to ComfyUI and point at http://localhost:8188.


Using OpenAI instead

No local hardware? Use the cloud (you pay OpenAI directly):

  • Text: endpoint https://api.openai.com/v1, model e.g. gpt-4o-mini, paste your key.
  • Images: provider OpenAI / OpenAI-compatible, endpoint https://api.openai.com/v1, model e.g. gpt-image-1.

If your text endpoint is local (no key) but images are OpenAI, put the OpenAI key in the separate Image API key field. When that field is blank, the image endpoint reuses the main API key.


GPU notes / troubleshooting

  • NVIDIA 50-series (Blackwell): CUDA compute can fail to initialize for PyTorch-based tools (ComfyUI) with CUDA unknown error, even though nvidia-smi works. Vulkan does work — which is why this guide uses Ollama (Vulkan) and stable-diffusion.cpp (Vulkan). Use a recent Ollama (0.30+).
  • Wrong/slow device: if generation is CPU-slow, the tool picked the wrong Vulkan device. Check the startup log's device list and set --backend vulkanN (sd.cpp) accordingly; Ollama honours GGML_VK_VISIBLE_DEVICES.
  • Endpoint URL not allowed when saving AI settings → the instance is not in self-hosted mode. See step 1.
  • Images time out → a cold or under-powered model. Try a smaller checkpoint (e.g. SD 1.5) or fewer steps; first request also pays the model-load cost.
  • Publishing a sign with images embeds the generated images in the widget, so configs can be a few MB each. That's expected today.