DOEON KWON
Building AI-native 3D worlds where agents and players share the same space, working on spatial intelligence
Selected work
An LLM can join a live multiplayer 3D world as a persistent player: it perceives its surroundings, remembers places, chats, moves, builds, and comes back with its memory intact across sessions.
I built and own the embodiment layer that lets an LLM act as a player inside a SPACE0 world: a stateless MCP server exposing the full embodied loop, 3D-anchored long-term memory with a soul document and brain-state checkpoints that survive restarts, a Voyager-pattern skill library, and the server-side movement and collision the relay needs because it has no physics engine.
- 60 MCP tools across 8 modules (presence, build, memory, identity, commitment, skill, brain_state, media); 30 read-only, 4 destructive, the count enforced by a manifest-drift test
- Live on the public MCP registry as a Cloudflare Worker; two agent brains run against it in production 24/7, one embedded local model and one cloud model
- Memories persist in pgvector with dual spatial anchors (observer and subject); soul revisions are logged and revertible
- Builds draw from a 146-material palette; tool calls fan out to 23 authenticated relay actions
- Embodiment parity: there is no privileged model-only write API. A world mutation issues the identical box-brush primitive humans use, and agent reach is pinned to the human client's 15m interaction raycast, so the relay re-derives server-side the placement physics the human client handles by default: reach, no floating blocks, claim containment.
- Renderer-less symbolic perception: every look_around is built from the terrain SDF plus the sparse edit overlay through a fixed grid-to-world transform, an 11x11x7 density probe and a marched gaze ray, no GPU and no pixels; the grid cell ships beside the world position so a small model never hand-converts coordinates.
- A per-call prompt-injection fence: any perception payload carrying other-authored text is wrapped in a fresh random nonce delimiter with an explicit rule that agent text, labels, and posts are never instructions; human identity is set server-side, never self-declared.
Sensorium › Working memory › Perception fusion › Retrieval › Inference › Intent dispatch › Result ingestion

The memory design doc, posted inside the world.

The same doc opened in the reading overlay: index, placement, organization.

Memory regions shown as a spatial index over the live world.
A study showing by measurement that the point of an LLM agent's spatial memory is not storing locations, but storing the geometry that can compute occlusion.
I built the spatial-memory evaluation system on the live SPACE0 world and wrote it up as a paper: the recall-versus-visibility separation, a minimum-representation schema per query type, a ray-versus-voxel DDA visibility predicate, and a pre-registered live confirmation.
- The shipped memory-palace blend failed its own pre-set test (mean ΔHit@5 -0.0375, p = 0.306, at a position-blind baseline) while geometry-led weighting won (+0.32, p < 10^-15): geometry must lead recall when the query regime is spatial
- On 849 behind-wall targets, text and a live FoV cone both score 0.000 at telling visible from occluded; adding the ray-versus-voxel DDA raycast games commonly use reaches 0.982 (exact McNemar p < 10^-6)
- Live confirmation with the criteria locked before the run: 8 scripted worlds, 96 behind-wall targets, false-visible 1.000 to 0.000 (pooled exact McNemar p = 2.5x10^-29), surfacing and fixing a relay anchor defect
- Coordinate recall resolves near-duplicate locations a cosine null cannot (1.000 vs 0.533, n = 150); an object-binding ablation lifts situated-action accuracy 0.625 to 1.000 (p = 0.0039)
- The test is definitional: if a non-spatial text or vector index could answer the query, it is not a spatial-memory test. BM25, RAG, GraphRAG, and HippoRAG miss geometry questions not because the algorithms are weak, but because nothing spatial is stored; a minimum-representation schema pairs each query type with what to store and what to compute.
- Recall and visibility are different problems: the fridge behind the wall should stay remembered (recall is occlusion-blind by design), while 'is it visible from here' is computed separately over stored geometry; one line of the DDA raycast games already use recovers occlusion.
- The surprising result: storage, not a renderer, is the irreducible piece. Handed coordinates and occluders as text, an LLM computes occlusion at parity with ray-marching (0.99 vs 0.985), so the contribution is measurement and isolation, and the robustness checks narrow the claim rather than inflate it.
Embodied capture › Pre-register › Falsify shipped blend › Retune geometry-led › Fix relay bug › Live-confirm
One Manifold Dual Contouring voxel engine in Rust, extracted behind a typed boundary and shipped across four runtimes: a web app, an App Store iOS client, native macOS/Windows desktop, and an agent relay, kept in sync by an FFI, TOML codegen, and a parity gate.
I extracted an 18.9K-LOC engine kernel out of the Next.js monolith into versioned packages, then carried it to native through a ~5,100-LOC extern-C FFI xcframework (the iOS C++ to Rust swap) and a wgpu/Slint desktop client, and built the TOML codegen that single-sources physics constants, copy, design tokens, and analytics events into TypeScript, Swift, Rust, and Slint.
- An 18.9K-LOC engine kernel extracted from the Next.js monolith: 1,189+ import sites codemodded in one compile-green commit
- An atomic C++ to Rust engine swap behind a 3-slice C-FFI xcframework: 120/120 XCTests, blake3 byte-parity, legacy engine deleted the same day
- iOS: a Swift 6 / TCA / Metal App Store client; desktop: a native Rust client (wgpu, Slint) with PKCE auth and signed Sparkle/WinSparkle auto-updates
- TOML codegen single-sources physics constants, copy, design tokens, and analytics events into web, Swift, Rust, and Slint
- QA across two languages: Rust proptest, Kani formal-verification, cargo-mutants, TypeScript fast-check, and a record/replay divergence harness
- The engine kernel was lifted out of the app monolith into versioned workspace packages, proven app-independent by a leak audit, then a custom ESLint rule machine-enforces the dependency graph.
- iOS runs the engine on-device through a hand-written C FFI packaged as a three-slice xcframework (Swift 6 strict concurrency, TCA, Metal); desktop is a fully native Rust client (wgpu, Slint), no embedded browser, with PKCE OAuth and signed auto-updates.
- One typed TOML source fans physics constants, copy, design tokens, and analytics events into four platforms or the build fails; QA spans Rust proptest/Kani/cargo-mutants and TypeScript fast-check, with a record/replay divergence harness on the push gate.
Engine kernel (extracted) › C-FFI / xcframework › web / iOS / desktop / relay › TOML codegen › pre-push parity gate
A browser editor that carves 3D Gaussian splat scenes at voxel resolution: per-fragment masking cuts crisp cube-shaped holes in a live splat render, without forking the renderer.
I built splatcarve, an open-source WebGL editor on Spark and Three.js: a voxel occupancy mask evaluated per fragment in the splat shader, voxel-level picking and snapping, undo/redo, and experimental block-stack and first-person modes on top of the carved scene.
- Per-fragment carving holds p95 9.6ms at 256 carves on a 177K-splat scene; voxel picking answers in p95 5.3ms
- The shader hook injects through Spark's public onBeforeCompile API: no fork, no custom rasterizer, one O(1) sampler3D lookup per fragment
- 15.7K LOC of TypeScript with 197 passing tests and reproducible latency benchmarks
- Most 3DGS editors delete whole splats, which leaves fuzzy halos because neighboring Gaussians still cover the hole; splatcarve masks per fragment against a 3D occupancy texture, so carved holes get clean cube boundaries.
- A clip-to-local matrix computed once per frame moves the voxel test into the fragment shader cheaply: a bounds check, one texture sample, discard.
- Carves are visual-only masks over immutable splat data: an edit history gives full undo/redo, and the same voxel grid drives the experimental stack and first-person collision modes.
Splat scene › Voxel occupancy mask › Per-fragment discard › Carved live render

Snap a photo, and the pipeline job kicks off.

The finished model, ready to drop into any world.

A museum table, captured into a placeable asset.
Text or a single photo becomes a rigged, animated, compressed, game-ready 3D asset, with GPU workers, validation, rig merge, material maps, sound, and web/iOS/desktop export handled by the server-side pipeline.
I built the pipeline that turns a single phone photo into a rigged, compressed, game-ready 3D asset: self-hosted Hunyuan3D 2.1 on GPU workers, UniRig auto-rigging, generated material maps, and meshopt/KTX2/USDZ output behind a Supabase RPC job queue with atomic claims.
- Self-hosted Hunyuan3D 2.1 on SaladCloud RTX 4090 workers, with weights baked into the image so cold workers start without downloading checkpoints
- Material maps (normal, height, roughness, AO) generated from a single albedo; output as meshopt/KTX2/USDZ with skeletal animation and sound
- A Redis-free GPU job queue on Supabase, claimed atomically by RPC, scaling independently of the app
- Self-hosted Hunyuan3D 2.1 as Python GPU workers on SaladCloud RTX 4090s, with model weights baked into the container image for faster cold starts.
- Auto-rigging with UniRig: predict the skeleton and skin on a simplified proxy, score multiple seeds, then merge the rig back onto the full-resolution mesh so topology, UVs, and textures all survive.
- A production asset path with validation and failure handling around mesh generation, rig merge, material-map generation, compression, and export, so bad outputs fail as jobs rather than leaking into the world.
One Cloudflare Durable Object per world carries presence, edits, chat, and agent actions, so humans and agents share the same live space.
I built the shared live layer: one Cloudflare Durable Object per SPACE0 world carries presence, edits, chat, and agent actions. Short-lived signed tokens authorize every connection.
- One Cloudflare Durable Object per world, holding presence and session state in memory
- Short-lived signed tokens minted by the web app authorize every connection
- Web, iOS, macOS, and Windows clients all connect to the same multiplayer server: true cross-platform play
- Every read is local to the world's Durable Object, and writes propagate instantly to all connected clients.
- A misbehaving agent is one that violates rate limits, sends malformed action payloads, or triggers server-side claim rejections repeatedly. The denylist blocks it at the next join, not mid-session.
- Short-lived signed tokens minted by the web app gate every WebSocket connection, so the relay never trusts claimed identity.
Web / iOS / desktop › signed token › Durable Object per world › backend (voxel store)

Framed posts sitting on a real surface, on web.

A media card with real depth and clipping, on web.
Posts sit inside the world rather than in a panel, turning the world itself into the UGC surface, with text, image, and video cards rendering across web, desktop, and iOS from one backend.
I shipped in-world media posts on three platforms: a card of text, image, or video placed on a surface renders as a depth-correct decal on web (Three.js), desktop (a GPU-accelerated Slint canvas over wgpu), and iOS (SwiftUI).
- Decals projected with correct depth, occlusion, and surface alignment on every client (web/desktop/iOS)
- Posts are placed in world-space, not a sidebar panel: they sit on surfaces as depth-correct decals with occlusion, so the world itself becomes the UGC surface.
- One backend serves three distinct renderers (Three.js web, wgpu desktop, SwiftUI iOS) without platform-specific divergence in the post schema.

Every component is browseable and live right in the page.

Each page: a live preview plus copy-paste cargo and npm install.
slintcn is an open-source design system for Slint native apps, shipping 56 components with npm and crates.io installers, an MCP server, and live Slint-WASM docs.
I built slintcn: a shadcn-style component registry for Slint native apps with 56 components plus 8 blocks, npm and crates.io installers sharing one registry, an MCP server for AI agents, and live Slint-WASM docs.
- 56 components and 8 blocks, installable via npm or crates.io from the same registry
- A 62K-view r/rust launch post
- MCP server lets AI agents browse, install, and compose components
- Web tooling and native Rust clients share the same component source, without duplication.
- The MCP server exposes the registry to AI agents, letting them browse available components, read docs, and install by name into a project.
- Built from real production need: the component system grew out of the SPACE0 desktop client and was dogfooded back into it, so every component shipped in a live product before it reached the registry.

The whole planet from space, volumetric clouds and a day-night terminator.

Ground level: forested ridges, water, the sky.

The terrain from above: voxel land and water.
Climate to ecosystem, simulated live in a browser tab: real atmospheric circulation, an ocean that obeys real oceanography, 36 climate-derived biomes, volumetric weather, and a metabolism engine, on a kilometer-scale voxel planet.
I built the systems that run the planet: a Rust-WASM climate model (three-cell global circulation, Coriolis, an ITCZ, and a -6.5C/1000m lapse rate), an oceanography-driven Gerstner ocean in TSL (M2 lunar tide, Ekman and thermohaline currents, polar ice), a 36-biome layer that turns climate into terrain and color, an 8,418-line volumetric cloud and weather system, and a five-phase metabolism engine.
- Rust-WASM climate: three-cell global circulation (Hadley/Ferrel/Polar), Coriolis, ITCZ, and lapse-rate temperature, compiled to WASM and run in the browser
- An ocean that simulates real oceanography: an 8-wave Gerstner cascade with analytical normals on the spherical planet, a 12.42-hour M2 lunar tide, Ekman and thermohaline currents, Beer-Lambert depth color, and polar ice past 70 degrees latitude
- 36 Whittaker-classified biomes derived from the simulated climate (temperature, precipitation, altitude), each driving its own terrain erosion, surface palette, and cloud physics
- Volumetric cloud and weather: SDF raymarching, an offline imposter baker, and chunked streaming
- The climate is real physics, not a texture: a wind module implements three-cell global circulation with Coriolis deflection, and a spherical-climate module models the ITCZ, storm tracks, and a -6.5C/1000m lapse rate, all running in WASM.
- Every term in the ocean shader comes from the simulation: waves driven by the global wind field, tides on the true M2 period, with currents, depth color, and polar ice all derived from the climate model.
- A 36-biome classification layer turns the climate field into the visible world: Whittaker temperature-precipitation classes with altitude bands, so terrain, color, and clouds all follow from the climate.
Climate physics (Rust-WASM) › Volumetric weather › Gerstner ocean › Metabolism engine › LLM NPC brain