
AI Space

Your space. Your AI. Your device.

A privacy-first personal AI that lives entirely on your phone. No cloud. No account. No app store.

Private · Local · Offline

Built different.

Private by default

Your data never leaves your device. Zero cloud. Zero tracking.

Runs locally

AI model runs directly on your phone using WebGPU. No internet needed.

Device connected

iOS Shortcuts relay gives your AI eyes and hands on your phone.

Three steps. That's it.

1

Open this link

That's the install. No app store needed.

2

Choose your model

Downloads once, cached forever.

3

It's yours

Fully offline. Encrypted memory. Complete privacy.

Your rules.

Zero Trust

Local

Everything on device. No network calls. Ever.

Smart

Hybrid

Quick tasks local, heavy reasoning via cloud. You approve each call.

Max Power

Cloud

Full cloud AI. Fastest, smartest. Your choice.

See everything. Control everything.

Every API call logged. Every byte of data accounted for. Full audit trail, client-side encryption, and total transparency. You don't have to trust us — you can verify.

AI Space

Let's set up your space

How do you prefer to interact?

Chat
Text-based, type your thoughts
Talk
Voice-first, speak naturally

What should I call you?

Just checking...

How should I talk to you?

Casual
Relaxed, friendly
Balanced
Warm but clear
Professional
Precise, efficient
Playful
Fun, creative

Pick a voice

Choose your AI model

Pick a model to download. Smaller = faster download. You can always switch later in Settings.

AI Space STD
Checking connection...
Your space is ready. Try a question below, or speak with the mic.
Settings
Theme

Choose a color palette for your AI Space.

AI Avatar

Choose your AI assistant's personality and appearance.

Mode
Local
Everything runs on your device. Maximum privacy.
Hybrid
Local first, cloud assist for complex tasks. You approve each call.
Cloud
Use cloud AI. Faster, but data leaves your device.
Relay Hub Beta

What is a Relay?
A Relay is a prompt-based instruction artifact you send to your AI. It tells the AI how to format a command for a specific channel — iOS Shortcuts, your browser, or your device — so you can automate tasks across your apps without any servers.

How to use (3 steps) ▸
  1. Choose a Relay (iOS Shortcuts, Browser, or Device).
  2. Choose an Action (e.g. Summarize, Draft Reply, Morning Briefing).
  3. Paste your content, tap Build Relay Artifact. The prompt is loaded in chat — review it, then Send.

Tip: You can also type in chat, e.g. "build a shortcuts relay to summarize this text: ..."

Build Artifact → loads the prompt in chat for review. Build + Send → builds and sends immediately.
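As a concrete illustration, a relay artifact might be assembled like the minimal sketch below. The function name, channel list, and prompt template are assumptions for illustration, not the app's actual format:

```javascript
// Hypothetical relay-artifact builder. Names and template are illustrative
// assumptions -- the app's real artifact format may differ.
function buildRelayArtifact(channel, action, content) {
  const channels = ["iOS Shortcuts", "Browser", "Device"];
  if (!channels.includes(channel)) {
    throw new Error(`Unknown relay channel: ${channel}`);
  }
  // A relay is just a structured prompt: the AI expands it into a
  // channel-specific command, with no server round-trip involved.
  return [
    `RELAY CHANNEL: ${channel}`,
    `ACTION: ${action}`,
    "CONTENT:",
    content,
    `Format the response as a command for ${channel}.`,
  ].join("\n");
}
```

Building the artifact first rather than sending immediately mirrors the Build Artifact flow: the prompt lands in chat for review before anything runs.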
Workflow Studio Imported Skill

What is Workflow Studio?
Inspired by the bundled-skill architecture of the imported source, Workflow Studio turns a complex routine into a reusable local skill with steps, approval checkpoints, and a relay-safe manifest.

Draft Skill → saves an encrypted local manifest and loads the full Workflow Studio prompt into chat.
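A skill manifest could plausibly look like the sketch below, written as a plain object. Every field name here is an assumption for illustration; the real manifest is encrypted and its schema is not shown:

```javascript
// Hypothetical Workflow Studio skill manifest. Field names are
// illustrative assumptions, not the app's actual schema.
const skillManifest = {
  name: "morning-briefing",
  version: 1,
  steps: [
    { id: "fetch-news", relay: "Browser", action: "Summarize" },
    // An approval checkpoint: the user confirms before this step runs.
    { id: "draft-digest", relay: "Device", action: "Draft Reply", approval: true },
  ],
  relaySafe: true, // no step may bypass the relay review flow
};
```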
Background Runtime Beta

What is the Runtime?
A sandboxed mini-terminal that runs scripts inside a Web Worker — off the UI thread, locally on your device. Use it to ping URLs, fetch data, navigate tabs, or chain commands without cloud.

DSL command reference ▸
LOG <text> · Print a message to output
WAIT <ms> · Pause execution (max 60,000 ms)
RUN fetch <url> -> v · HTTP GET, store result in v
RUN json <url> -> v · Fetch & parse JSON into v
RUN echo <text> · Echo text (useful for testing)
RUN now · Print current ISO timestamp
NAVIGATE <url> · Request browser navigation
RETURN <text> · Return a string result & exit
RETURNJSON <json> · Return a JSON object & exit

Use {{var.path}} to interpolate stored values.
Example: LOG Status: {{api.status}}
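A minimal sketch of how such a runtime could interpret a script, covering only LOG, RUN echo, RETURN, and {{var.path}} interpolation. The real Worker-based runtime, and its handling of WAIT and the network commands, will differ:

```javascript
// Resolve {{var.path}} placeholders by walking dot-separated keys in vars.
function interpolate(text, vars) {
  return text.replace(/\{\{([\w.]+)\}\}/g, (_, path) =>
    String(path.split(".").reduce((obj, key) => (obj == null ? obj : obj[key]), vars))
  );
}

// Interpret a subset of the DSL line by line (LOG, RUN echo, RETURN only).
function runScript(source, vars = {}) {
  const output = [];
  for (const raw of source.split("\n")) {
    const line = interpolate(raw.trim(), vars);
    if (!line) continue;
    if (line.startsWith("LOG ")) output.push(line.slice(4));
    else if (line.startsWith("RUN echo ")) output.push(line.slice(9));
    else if (line.startsWith("RETURN ")) return { output, result: line.slice(7) };
  }
  return { output, result: null };
}

const { output, result } = runScript(
  "LOG Status: {{api.status}}\nRETURN done",
  { api: { status: 200 } }
);
// output: ["Status: 200"], result: "done"
```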

Local Internet Assist
Adds lightweight live context (Wikipedia search) to local responses. Stays optional and local-first.
Cloud API
⚡ TurboKV — Context Optimizer
Optimization Strategy
Standard
Direct token trimming. No transformation.
Sliding Window
Attention-sink + recency window.
Semantic
Importance-scored context selection.
Turbo
Max efficiency — bullet synopsis.
Direct token trimming. Best for short conversations. Zero overhead.
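The Sliding Window strategy can be sketched as follows, approximating token cost by word count. The function and parameter names are illustrative, not the app's internals:

```javascript
// Attention-sink + recency window: keep a few messages from the start of the
// conversation (the "sink") plus the newest messages that fit the budget,
// dropping the middle. Token cost is approximated by word count here.
function slidingWindow(messages, budget, sinkCount = 2) {
  const cost = (m) => m.content.split(/\s+/).length;
  const sink = messages.slice(0, sinkCount);
  let used = sink.reduce((sum, m) => sum + cost(m), 0);
  const recent = [];
  // Walk backwards from the newest message until the budget is spent.
  for (let i = messages.length - 1; i >= sinkCount; i--) {
    const c = cost(messages[i]);
    if (used + c > budget) break;
    recent.unshift(messages[i]);
    used += c;
  }
  return [...sink, ...recent];
}
```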
Context Window (WebGPU KV Cache)
Takes effect on next model load · larger = more GPU memory
Live Metrics
Tokens In
Tokens Out
0
Compressions
Tok/s
0%
Context Fill
Custom KV Script
Write a JS function body. Receives (messages, budget), must return an array.
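A custom script body might look like the sketch below. It is wrapped in a function here so it runs standalone (in the app you would paste only the body), and it approximates token cost by word count, which is an assumption:

```javascript
// Example custom KV script: keep every system message, plus the newest
// messages until the (word-count-approximated) budget runs out.
function customKV(messages, budget) {
  const keep = [];
  let used = 0;
  // Walk newest-to-oldest; system messages are always kept.
  for (let i = messages.length - 1; i >= 0; i--) {
    const msg = messages[i];
    const cost = msg.content.split(/\s+/).length;
    if (msg.role !== "system" && used + cost > budget) continue;
    keep.unshift(msg);
    used += cost;
  }
  return keep; // must return an array
}
```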
Compression Log (0 events)
No compressions yet. Start a long conversation.
Chat History
No conversations yet
Local Model
⚡ Tiny — Ultra Fast
SmolLM2 360M · 200 MB
Fastest option for lightweight tasks.
fast · lightweight
TinyLlama 1.1B · 640 MB
Ultra-fast chat at 1.1B parameters.
fast · chat
◈ Small — Balanced
Qwen 2.5 0.5B · 350 MB
Ultra-fast balance for everyday use.
fast · multilingual
Llama 3.2 1B · 700 MB · Recommended
Best quality local reasoning. Recommended default.
reasoning · 8K ctx
Qwen 2.5 1.5B · 900 MB
Better quality than 0.5B, lighter than Llama 1B.
multilingual · balanced
Gemma 2 2B · 1.4 GB
Google Gemma 2 at 2B. Strong instruction following.
google · instruction
◉ Medium — High Quality
Llama 3.2 3B · 2.0 GB
Significantly smarter than 1B. Best quality/size tradeoff.
reasoning · quality
Phi 3.5 Mini 3.8B · 2.2 GB
Microsoft Phi-3.5. Excellent reasoning. 16K context.
microsoft · 16K ctx
🔥 Large — Needs 6 GB+ GPU RAM
Mistral 7B v0.3 · 4.1 GB
Classic Mistral 7B. Excellent code & instruction following.
code · 32K ctx
Llama 3.1 8B · 5.0 GB
Meta Llama 3.1 8B. Top-tier local quality.
quality · reasoning
DeepSeek-R1 7B · 4.4 GB
R1 reasoning distillation. Chain-of-thought, math, logic.
reasoning · math · chain-of-thought
Runs entirely on your device via WebGPU
Trust Dashboard
Data Location
On device only
Active Model
Loading...
Cloud API Calls
0
Conversations Stored
0
Voice
Data
Export All Data
Clear All Data
AI Space v0.2.0
Privacy-first personal AI
Your data never leaves your device in local mode.
Live Activity
Idle