Image generation reused the single (text-endpoint) API key, which breaks the
common 'local LLM with no key + OpenAI for images' setup. Add an optional
image_api_key (encrypted, write-only, never returned); generate-design uses it
for image calls and falls back to the main key when blank (all-OpenAI setups).
Local sd.cpp / ComfyUI still need no key. Adds a schema column + migration.
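A minimal sketch of the key fallback described above, not the actual route code. The `api_key` field name and both helper names are assumptions for illustration; only `image_api_key` comes from this message.

```javascript
// Hypothetical helper: prefer the image key, fall back to the main key.
// `settings.api_key` is an assumed name for the main (text-endpoint) key.
function resolveImageKey(settings) {
  const imageKey = (settings.image_api_key || '').trim();
  const mainKey = (settings.api_key || '').trim();
  // Blank image key -> main key (all-OpenAI setups); neither -> no key.
  return imageKey || mainKey || null;
}

// Local backends get no Authorization header when there is no key.
function authHeaders(key) {
  return key ? { Authorization: `Bearer ${key}` } : {};
}
```

With a keyless local LLM and an OpenAI image key, `resolveImageKey` returns the image key; with both blank it returns `null` and no header is sent.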
A prompt now produces a full sign: the LLM writes the design AND image prompts,
the server generates the images and composites them with the crisp text layer.
- lib/image-gen.js: text-to-image with 3 BYO/self-hostable backends, all behind
the SSRF guard: 'sdcpp' (local stable-diffusion.cpp OpenAI-compatible server,
exact small sizes that fit VRAM), 'openai' (cloud / OpenAI-compatible, snapped
sizes), 'comfyui' (prompt/history/view API).
- ai.js: prompt asks for a background_prompt (preferred: full-bleed atmosphere)
and an optional foreground image element; after the design is normalized, the
bg + fg images are generated best-effort (a failed image never fails the sign)
and returned as data URLs. New image_* settings (provider/base_url/model),
image_provider whitelist, schema column + migration.
- designer.js: AI-images section in settings; generate applies the background
image; publish bakes the background image into the HTML so it survives.
- server.js: raise JSON body limit to 12mb for embedded image data URLs.
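The best-effort image step in ai.js could look roughly like the sketch below. `generateImage`, `foreground_prompt`, and the `*_image` output fields are hypothetical names; only `background_prompt` appears in this message.

```javascript
// Sketch: generate bg + fg images without letting a failure fail the sign.
// generateImage is an assumed async function: (prompt) => data URL string.
async function generateImagesBestEffort(design, generateImage) {
  const jobs = [
    ['background_image', design.background_prompt],
    ['foreground_image', design.foreground_prompt], // assumed field name
  ].filter(([, prompt]) => typeof prompt === 'string' && prompt.trim());

  // The async wrapper turns synchronous throws into rejections as well.
  const results = await Promise.allSettled(
    jobs.map(async ([, prompt]) => generateImage(prompt))
  );

  const out = { ...design };
  results.forEach((result, i) => {
    // Rejected jobs are skipped; the design is returned without that image.
    if (result.status === 'fulfilled') out[jobs[i][0]] = result.value;
  });
  return out;
}
```

`Promise.allSettled` is used rather than `Promise.all` so one rejected image never rejects the whole request.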
Verified end-to-end on local Vulkan SDXL (RTX 5090): prompt -> bg+fg images on
the canvas -> publish creates a widget with the images embedded. Suite 63/63.
Note: prod (not self-hosted) requires a PUBLIC image endpoint (e.g. OpenAI); the
SSRF guard blocks localhost there. Follow-up: upload generated images to the
content store and reference by URL to avoid multi-MB widget configs.
Models sometimes stacked text lines at the same y (unreadable) and emitted accent
shapes after text, so a band could hide the words.
- deoverlapTexts: push a line down only when it also overlaps horizontally
(leaves side-by-side text alone), with conservative line-height clearance so
real rendering doesn't re-overlap; shift the stack up if it ran past the bottom.
- Order shapes before text in the output so accent bands always render behind the
words.
Verified: 0 text-on-text overlaps across multiple prompts (Playwright DOM check);
unit test asserts overlapping lines get separated + shapes precede text. Suite 63/63.
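A rough sketch of the two passes above, assuming each text element carries a percent-based box (`x`, `y`, `width`, `height`) and elements have a `type` of 'shape' or 'text'. Those fields, the 1.25 clearance factor, and `shapesBeforeText` are assumptions; the real deoverlapTexts may differ.

```javascript
const CLEARANCE = 1.25; // assumed conservative line-height factor

function overlapsHorizontally(a, b) {
  return a.x < b.x + b.width && b.x < a.x + a.width;
}

// Push a line below any earlier line it shares horizontal space with.
function deoverlapTexts(texts) {
  const sorted = texts.map((t) => ({ ...t })).sort((a, b) => a.y - b.y);
  for (let i = 1; i < sorted.length; i++) {
    for (let j = 0; j < i; j++) {
      const minY = sorted[j].y + sorted[j].height * CLEARANCE;
      if (overlapsHorizontally(sorted[j], sorted[i]) && sorted[i].y < minY) {
        sorted[i].y = minY; // side-by-side lines never reach this branch
      }
    }
  }
  // If the stack ran past the bottom, shift everything up (not above 0).
  const bottom = Math.max(...sorted.map((t) => t.y + t.height));
  if (bottom > 100) {
    const top = Math.min(...sorted.map((t) => t.y));
    const shift = Math.min(bottom - 100, top);
    for (const t of sorted) t.y -= shift;
  }
  return sorted;
}

// Emit shapes first so accent bands render behind the words.
function shapesBeforeText(elements) {
  const shapes = elements.filter((e) => e.type === 'shape');
  const rest = elements.filter((e) => e.type !== 'shape');
  return [...shapes, ...rest];
}
```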
Text could run off the edge (long/large headlines, nowrap) and shapes placed at
the far edge (e.g. a bottom band at y=100) spilled over.
- Server-side fit pass on every generated element: shrink text fontSize so it
fits the canvas width (estimated as chars*fontSize*0.075, tuned for
bold/uppercase headlines) and height (incl. line-height), then nudge x/y to stay
within 4% margins; clamp shapes so x+width<=100 and y+height<=100.
Deterministic - doesn't rely on the model getting layout right.
- Designer preview: vw -> cqw (+ container-type on the canvas) so text scales to
the canvas, not the browser window. The preview was overstating size vs what
actually publishes; now it matches. Published widget keeps vw (scales on the
player).
Verified: Playwright DOM check shows zero elements overflowing the canvas after
generation; unit test asserts long text is shrunk + repositioned in-bounds. Suite 62/62.
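A sketch of the width half of the fit pass plus the shape clamp. The 0.075 factor and 4% margin come from this message; treating chars*fontSize*0.075 as a percentage of canvas width is an assumption (the fontSize unit is not stated here), as is the choice to move out-of-bounds shapes rather than shrink them. The height fit is omitted.

```javascript
const MARGIN = 4;           // percent margin, from the message
const WIDTH_FACTOR = 0.075; // per-char width estimate, from the message

const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);

// Shrink fontSize until the estimated width fits, then nudge x in-bounds.
function fitText(el) {
  const maxWidth = 100 - 2 * MARGIN;
  let fontSize = el.fontSize;
  let width = el.text.length * fontSize * WIDTH_FACTOR; // assumed to be %
  if (width > maxWidth) {
    fontSize *= maxWidth / width;
    width = maxWidth;
  }
  const x = clamp(el.x, MARGIN, 100 - MARGIN - width);
  return { ...el, fontSize, x };
}

// Keep x+width<=100 and y+height<=100 by capping size, then moving.
function clampShape(el) {
  const width = clamp(el.width, 0, 100);
  const height = clamp(el.height, 0, 100);
  return {
    ...el,
    width,
    height,
    x: clamp(el.x, 0, 100 - width),
    y: clamp(el.y, 0, 100 - height),
  };
}
```

For the bottom band mentioned above (y=100), `clampShape` moves it up so its lower edge sits at 100.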
- POST /api/ai/models lists the configured endpoint's models (OpenAI-compatible
/models) so the settings modal can populate a 'Load models' dropdown instead of
requiring users to type the model name. Combobox (datalist) so they can still
type a custom one. Admin only; same SSRF guard; uses the posted or saved key.
- Bump generate-design timeout 120s -> 180s for slow local endpoints.
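For reference, an OpenAI-compatible /models response has the shape `{ data: [{ id: ... }] }`. A sketch of the URL building and parsing the route might do; `modelsUrl` and `parseModelIds` are hypothetical names, and the fetch call and SSRF check are left out.

```javascript
// Build the /models URL from a configured base URL (trailing slashes ok).
function modelsUrl(baseUrl) {
  return baseUrl.replace(/\/+$/, '') + '/models';
}

// Extract model ids from an OpenAI-compatible list response, tolerating
// missing or malformed entries instead of trusting the endpoint.
function parseModelIds(body) {
  const data = body && Array.isArray(body.data) ? body.data : [];
  return data
    .map((m) => m && m.id)
    .filter((id) => typeof id === 'string' && id.length > 0)
    .sort();
}
```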
Competitor pressure (Mandoe 'AI Magic Create'): prompt -> signage. We answer it
in a way that's actually BETTER for signage and costs the operator nothing.
Key idea: don't generate raw images (AI garbles text - fatal for menus/promos).
The LLM returns a STRUCTURED design spec (headline, supporting text, accent
shapes, palette) that the existing Designer renders with real fonts - crisp and
fully editable. Reuses the whole Designer.
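Because the spec is model output, it has to be normalized before the Designer renders it. A minimal sketch, assuming a hypothetical `{ elements: [...] }` shape, a cap of 12 elements, hex-only colors, and a 1920x1080 reference canvas for px conversion; none of these specifics are confirmed by this message, and the regex tag stripping is deliberately naive.

```javascript
const MAX_ELEMENTS = 12; // assumed cap
const HEX_COLOR = /^#(?:[0-9a-f]{3}|[0-9a-f]{6})$/i;

const clamp = (v, lo, hi) => Math.min(Math.max(v, lo), hi);

// Naive tag stripping for illustration only.
const stripHtml = (s) => String(s).replace(/<[^>]*>/g, '').trim();

// "960px" -> percent of the canvas; other values are read as percentages.
function toPercent(value, canvasPx) {
  const s = String(value).trim();
  const px = /^(-?\d+(?:\.\d+)?)px$/.exec(s);
  const n = px ? (parseFloat(px[1]) / canvasPx) * 100 : parseFloat(s);
  return Number.isFinite(n) ? clamp(n, 0, 100) : 0;
}

function normalizeDesign(raw, canvas = { width: 1920, height: 1080 }) {
  const elements = Array.isArray(raw && raw.elements) ? raw.elements : [];
  return elements
    .filter((el) => el && typeof el === 'object')
    .slice(0, MAX_ELEMENTS)
    .map((el) => ({
      type: el.type === 'shape' ? 'shape' : 'text',
      text: stripHtml(el.text ?? ''),
      x: toPercent(el.x, canvas.width),
      y: toPercent(el.y, canvas.height),
      // Anything that is not a plain hex color falls back to white.
      color: HEX_COLOR.test(el.color) ? el.color : '#ffffff',
    }));
}
```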
BYOK, fully under the customer's control: each workspace configures its own
OpenAI-COMPATIBLE endpoint + key - OpenAI cloud OR self-hosted (Ollama / LM Studio
/ llama.cpp). Operator bears zero AI cost/liability.
- server/lib/secretbox.js: AES-256-GCM for the key at rest (never returned).
- routes/ai.js: GET/PUT /api/ai/settings (admin; key write-only) + POST
/generate-design (editor+). Output is strictly validated/normalized (cap count,
clamp ranges, px->%, strip HTML, validate colors) - never trust the model.
SSRF guard: hosted instances block private/internal targets; self-hosted (the
whole point of local AI) may point at localhost/LAN.
- Designer: an 'AI generate' panel (prompt + Generate) + a settings modal.
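For illustration, AES-256-GCM sealing with Node's built-in crypto module can be sketched as below. The function names and the iv|tag|ciphertext base64 layout are assumptions, not necessarily what server/lib/secretbox.js does.

```javascript
const crypto = require('node:crypto');

// key must be a 32-byte Buffer (AES-256). Layout: iv(12) | tag(16) | ct.
function sealSecret(plaintext, key) {
  const iv = crypto.randomBytes(12); // fresh nonce per encryption
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  const tag = cipher.getAuthTag();
  return Buffer.concat([iv, tag, ct]).toString('base64');
}

// Throws if the key is wrong or the sealed value was tampered with.
function openSecret(sealed, key) {
  const buf = Buffer.from(sealed, 'base64');
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ct = buf.subarray(28);
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString('utf8');
}
```

GCM authenticates as well as encrypts, so a modified database value fails to decrypt instead of yielding a garbage key.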
Verified end-to-end against local Ollama (llama3.1:8b): prompt -> editable design
on the canvas. Unit tests cover normalization + the SSRF guard. Suite 61/61.
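A simplified sketch of the kind of check the SSRF guard performs; `isBlockedTarget` and the `selfHosted` flag are hypothetical names. It only inspects the literal hostname, so a production guard would also need to check the addresses a hostname resolves to.

```javascript
const net = require('node:net');

function isPrivateIPv4(ip) {
  const [a, b] = ip.split('.').map(Number);
  return (
    a === 0 || a === 10 || a === 127 ||
    (a === 172 && b >= 16 && b <= 31) ||
    (a === 192 && b === 168) ||
    (a === 169 && b === 254) // link-local, incl. cloud metadata addresses
  );
}

// Returns true when the endpoint must be refused.
function isBlockedTarget(urlString, { selfHosted }) {
  let url;
  try {
    url = new URL(urlString);
  } catch {
    return true; // unparseable URLs are always refused
  }
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return true;
  if (selfHosted) return false; // localhost/LAN allowed when self-hosted

  // URL.hostname keeps the brackets around IPv6 literals; strip them.
  const host = url.hostname.replace(/^\[|\]$/g, '').toLowerCase();
  if (host === 'localhost' || host.endsWith('.localhost')) return true;
  if (net.isIPv4(host)) return isPrivateIPv4(host);
  if (net.isIPv6(host)) {
    // Loopback, unique-local (fc00::/7), link-local (fe80::/10), v4-mapped.
    return host === '::1' || /^f[cd]/.test(host) ||
      /^fe[89ab]/.test(host) || host.startsWith('::ffff:');
  }
  return false;
}
```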
Phase 2 (next): AI background images (OpenAI images / AUTOMATIC1111).