feat: new version more robust network handle

fix: more robust network handling
feat: added provision error propagation
2026-07-01 12:15:30 +02:00 · 2026-02-09 01:52:33 -05:00 · 2026-02-09 01:52:06 -05:00 · 2026-02-09 01:48:20 -05:00 · 2026-02-09 01:16:56 -05:00 · 2026-02-09 01:16:20 -05:00
10 changed files with 390 additions and 82 deletions
@@ -1303,7 +1303,7 @@ checksum = "ba73ea9cf16a25df0c8caa16c51acb937d5712a8429db78a3ee29d5dcacd3a65"

 [[package]]
 name = "vibebox"
-version = "0.2.3"
+version = "0.2.5"
 dependencies = [
 "assert_cmd",
 "block2",
@@ -1,6 +1,6 @@
 [package]
 name = "vibebox"
-version = "0.2.3"
+version = "0.2.5"
 edition = "2024"
 authors = ["Finn Sheng"]
 description = "Ultrafast CLI on Apple Silicon macOS for fast, sandboxed development and LLM agents."
@@ -5,7 +5,7 @@
    </picture>
  </a>
 </p>
-<p align="center">Your ultrafast open source AI sandbox.</p>
+<p align="center">An ultrafast, open-source sandbox for running coding agents safely.</p>

 <p align="center">
  <a href="https://crates.io/crates/vibebox">
@@ -24,8 +24,36 @@
  <a href="README.zh.md">简体中文</a>
 </p>

-VibeBox is a lightweight, ultra-fast sandbox for AI agents to run commands, edit files, and execute code inside an
-isolated Apple Virtualization Framework micro-VM, no repeated permission prompts, minimal memory/disk overhead.
+**VibeBox is a per-project micro-VM sandbox for running coding agents on macOS (Apple Virtualization Framework).**
+It’s optimized for a *daily-driver* workflow: fast warm re-entry, explicit mounts, and reusable sessions.
+
+**Who it’s for:** macOS users running coding agents who want real isolation without giving up a fast daily workflow.
+
+**Quick facts:** warm re-entry is typically **<5s** on my M3 (varies by machine/cache); first run downloads and
+provisions a Debian base image (network dependent).
+
+**Security model:** Linux guest VM with explicit mount allowlists from `vibebox.toml` (repo-first, everything else
+opt-in).
+
+- **enter/attach in seconds:** `vibebox` drops you into a reusable sandbox for the current repo
+- **project-scoped by default:** explicit mounts + repo-contained changes (repo-first, everything else is allowlisted)
+- **sessioned:** multi-instance + session management (reuse, multiple terminals, cleanup)
+
+### Quick Demo
+
+```bash
+# from any repo
+cd my-project
+vibebox
+```
+
+What you should see (roughly):
+
+```text
+vibebox: starting (session: my-project)
+vibebox: attaching...
+vibecoder@vibebox:~/my-project$
+```

 [![VibeBox Terminal UI](docs/screenshot.png)](https://vibebox.robcholz.com)

@@ -33,25 +61,40 @@ isolated Apple Virtualization Framework micro-VM, no repeated permission prompts

 ### Why I built VibeBox

-I use agents like Codex and CC a lot, but I always felt uneasy running them directly on my host machine. If I lock
-things
-down, I get interrupted by constant “are you sure?” prompts. If I loosen it up, I worry the agent might touch the
-wrong files or run something I didn’t intend.
+I use coding agents daily, and I wanted to give them a real shell without handing them my host machine.
+Lock things down and you get nonstop confirmations; loosen it up and you worry about deleting files, touching secrets,
+or wandering outside the repo.

-I wanted something that feels as frictionless as giving an agent a real shell, but with a hard isolation boundary. So I
-built VibeBox: a per-project micro-VM sandbox that starts fast, keeps changes contained to the repo, and lets me iterate
-without babysitting permissions.
+VibeBox is the middle ground: a per-repo sandbox with a hard VM boundary, fast re-entry, and explicit mounts. It’s built
+to be “always on” for agent work without turning safety into a chore.
+
+### Why a micro-VM (vs containers)?
+
+Containers are great. VibeBox isn’t trying to replace Docker/devcontainers for building services.
+
+I specifically wanted a VM-shaped default for agent workflows on macOS:
+
+- **guest-kernel isolation boundary by default:** when I’m letting an agent run arbitrary commands, I want “safe mode”
+  to
+  be a Linux guest, not my host.
+- **sessions as a first-class workflow:** attach/reuse per repo, multiple terminals into the same sandbox, reliable
+  cleanup to avoid orphan environments.
+- **explicit mount allowlists as the primary UX:** repo-scoped by default; anything else is an explicit decision.
+- **minimal per-repo setup:** you *can* reproduce parts of this with compose/devcontainers, but I wanted a single
+  command
+  that works repo-to-repo without maintaining container configs for the basic “safe shell” workflow.

 ### Comparison

 Here’s why I didn’t just use existing options:

- **vibe**: super convenient, but it’s too minimal for what I need. It lacks basic configuration, and it doesn’t give me
-  the multi-instance + session management my workflow wants.
+- **vibe**: super convenient and nails “zero-config, just go”. VibeBox is intentionally on a different axis: per-repo
+  config + sessions + multi-instance lifecycle.
 - **QEMU**: powerful, but the configuration surface area is huge. For day-to-day sandboxing it’s not “open a repo and
  go” — it’s a project on its own.
- **Docker / devcontainers**: great ecosystem, but for daily use it feels heavy. Cold starts can be slow, and it’s not
-  something I can jump into instantly, repeatedly, all day.
+- **Docker / devcontainers / devpods**: great ecosystem. My friction wasn’t raw startup time, it was the day-to-day
+  overhead of keeping per-repo agent sandboxes *safe-by-default* (mount allowlists, secrets exposure, attach/reuse,
+  cleanup) without maintaining container configs per repo for the basic workflow.

 That’s what pushed me to build **VibeBox**: I wanted a per-project sandbox that’s fast to enter (just `vibebox`),
 supports real configuration + sessions, and keeps a hard isolation boundary.
@@ -59,26 +102,23 @@ supports real configuration + sessions, and keeps a hard isolation boundary.
 ### Installation

 ```bash
-# YOLO
+# install script
 curl -fsSL https://raw.githubusercontent.com/robcholz/vibebox/main/install | bash

-# Package managers
+# package managers
 cargo install vibebox

-# Or manually (bad)
+# manual install
 curl -LO https://github.com/robcholz/vibebox/releases/download/latest/vibebox-macos-arm64.zip
 unzip vibebox-macos-arm64.zip
 mkdir -p ~/.local/bin
-mv vibe ~/.local/bin
+mv vibebox ~/.local/bin
 export PATH="$HOME/.local/bin:$PATH"
 ```

-> [!TIP]
-> We truly recommend you to use `YOLO` to install.
-
 **Requirements**

- macOS on Apple Silicon (Vibebox uses Apple's virtualization APIs).
+- macOS on Apple Silicon (VibeBox uses Apple's virtualization APIs).

 **First Run**

@@ -94,7 +134,7 @@ cd /path/to/your/project
 vibebox
 ```

-On first run, Vibebox creates `vibebox.toml` in your project (if missing) and a `.vibebox/` directory for instance data.
+On first run, VibeBox creates `vibebox.toml` in your project (if missing) and a `.vibebox/` directory for instance data.

 **Configuration (`vibebox.toml`)**

@@ -145,7 +185,7 @@ vibebox explain     # show mounts and network info
 - Default SSH user: `vibecoder`
 - Hostname: `vibebox`
 - Base image provisioning installs: build tools, `git`, `curl`, `ripgrep`, `openssh-server`, and `sudo`.
- On first login, Vibebox installs `mise` and configures tools like `uv`, `node`, `@openai/codex`, and
+- On first login, VibeBox installs `mise` and configures tools like `uv`, `node`, `@openai/codex`, and
  `@anthropic-ai/claude-code` (best-effort).
 - Shell aliases: `:help` and `:exit`.

@@ -160,27 +200,24 @@ vibebox explain     # show mounts and network info
 If you're interested in contributing to VibeBox, please read our [contributing docs](CONTRIBUTING.md) before
 submitting a pull request.

-### Using VibeBox
-
-Feel free to use, but remember to promote VibeBox as well!
-
 ### FAQ

-#### How is this different from other Sandboxes?
+#### How is this different from other sandboxes?

-Vibebox is built for fast, repeatable local sandboxes with minimal ceremony. What’s different here:
+VibeBox is built for fast, repeatable local sandboxes with minimal ceremony. What’s different here:

- Warm startup is typically under **6 seconds** on my M3, so you can jump back in quickly.
+- Warm re-entry is typically **<5s** on my M3 (varies by machine/cache), so you can jump back in quickly.
 - One simple command — `vibebox` — drops you into the sandbox from your project.
 - Configuration lives in `vibebox.toml`, where you can set CPU, RAM, disk size, and mounts.
+- Sessions are first-class: reuse, multiple terminals, cleanup.

-### Special Thank
+### Special thanks

 [vibe](https://github.com/lynaghk/vibe) by lynaghk.

-And amazing Rust community, without your rich crates and fantastic toolchain like [crates.io](https://crates.io), this
-wouldn't be possible!
+And the amazing Rust community — without the ecosystem and toolchain like [crates.io](https://crates.io), this wouldn't
+be possible!

 ---

-**Follow me on X** [X.com](https://x.com/robcholz)
+**Follow me on X** [x.com/robcholz](https://x.com/robcholz)
@@ -5,7 +5,7 @@
    </picture>
  </a>
 </p>
-<p align="center">超高速、开源的 AI 沙盒。</p>
+<p align="center">用于安全运行 coding agents 的超高速开源沙盒。</p>

 <p align="center">
  <a href="https://crates.io/crates/vibebox">
@@ -24,8 +24,34 @@
  <a href="README.md">English</a>
 </p>

-VibeBox 是一个轻量、启动极快的沙盒环境，让 AI Agent 可以安全地直接跑命令、改文件、执行代码，不会不停弹“要不要允许”的提示。它基于
-Apple 的 Virtualization Framework 做到彻底隔离，所以无论 Agent 在里面怎么折腾，你的真实系统都不会被影响；同时做到了低内存和磁盘占用。
+**VibeBox 是一个按项目划分的 micro-VM 沙盒，用于在 macOS 上运行 coding agents（基于 Apple Virtualization Framework）。**
+它面向 *日常使用* 工作流优化：快速热启动、显式挂载、可复用会话。
+
+**适合谁：** 在 macOS 上使用 coding agents，并且既想要真实隔离又不想牺牲日常效率的人。
+
+**快速事实：** 在我的 M3 上，热启动通常 **<5s**（因机器/缓存而异）；首次运行会下载并初始化 Debian 基础镜像（受网络影响）。
+
+**安全模型：** Linux 来宾 VM + `vibebox.toml` 显式挂载白名单（默认仅项目目录，其它均需显式允许）。
+
+- **几秒进入/附加：** `vibebox` 直接进入当前仓库的可复用沙盒
+- **默认按项目范围：** 显式挂载 + 改动限制在仓库内（repo 优先，其他需 allowlist）
+- **会话化：** 多实例 + 会话管理（复用、多终端、清理）
+
+### 快速演示
+
+```bash
+# 在任意仓库内
+cd my-project
+vibebox
+```
+
+你大致会看到：
+
+```text
+vibebox: starting (session: my-project)
+vibebox: attaching...
+vibecoder@vibebox:~/my-project$
+```

 [![VibeBox Terminal UI](docs/screenshot.png)](https://vibebox.robcholz.com)

@@ -33,43 +59,51 @@ Apple 的 Virtualization Framework 做到彻底隔离，所以无论 Agent 在

 ### 我为什么做 VibeBox

-我平时经常用像 Codex 和 CC 这样的
-agent，但一直不太敢让它们直接在我的宿主机上跑。把权限收紧一点，就会被各种“你确定吗？”的确认弹窗不停打断；放松一点，又担心它哪天误操作，碰到不该碰的文件，或者跑了我没打算执行的命令。
+我每天都在用 coding agents，也希望它们有一个真实的 shell，但不想把宿主机直接交出去。
+权限收紧会被不停的确认打断；权限放开又担心误删文件、触及密钥，或者跑出仓库边界。

-我想要的是一种体验：像给 agent 一个真实 shell 一样顺滑，但同时又有一道硬隔离的安全边界。于是我做了 VibeBox：一个按项目划分的
-micro-VM 沙箱，启动很快，改动被限制在仓库范围内，让我可以更高频地迭代，而不用一直盯着权限确认。
+VibeBox 是中间方案：按项目隔离、硬 VM 边界、快速回到工作状态、显式挂载。它适合把 agent 当成日常工具，而不是把安全变成负担。
+
+### 为什么是 micro-VM（而不是容器）？
+
+容器很好用。VibeBox 并不是用来替代 Docker/devcontainers 去构建服务。
+
+我更想要的是在 macOS 上适合 agent 的 VM 默认形态：
+
+- **默认是 guest-kernel 隔离边界：** 让 agent 跑任意命令时，“安全模式”是 Linux 来宾而不是宿主机。
+- **会话是第一等公民：** 按项目附加/复用，多终端进入同一沙盒，可靠清理，避免孤儿环境。
+- **显式挂载白名单作为主要 UX：** 默认仅项目目录，其它都需显式允许。
+- **最少的每项目配置：** 你可以用 compose/devcontainers 实现一部分，但我想要的是一个命令，在不同仓库间直接工作，不必维护容器配置来获得基础“安全
+  shell”体验。

 ### 对比

 下面是我为什么没有直接用现成方案的原因：

- **vibe**：非常方便，但对我来说太“极简”了。它缺少一些基础配置能力，也没法提供我工作流里需要的多开和 session 管理。
+- **vibe**：非常方便，“零配置、直接用”做得很好。但 VibeBox 走的是另一条路：按项目配置 + 会话 + 多实例生命周期。
 - **QEMU**：很强大，但配置面太大了。日常当沙箱用，它不像是“进到 repo 就能用”，更像是你得先把它当成一个项目来折腾。
- **Docker / devcontainers**：生态很成熟，但日常使用对我来说偏重。冷启动有时会慢，而且它不是那种我能一天反复、随时秒进秒出的工具。
+- **Docker / devcontainers / devpods**：生态很成熟。我的痛点不在启动时间，而是日常保持
+  safe-by-default（挂载白名单、密钥暴露、附加/复用、清理）的开销，不想为基础 workflow 在每个项目维护容器配置。

-这些就促使我做了 **VibeBox**：我想要一个按项目隔离的沙箱，进入速度快（直接用 `vibebox`），支持真正可用的配置和
-sessions，同时还保留明确的硬隔离边界。
+这就是我做 **VibeBox** 的原因：我想要一个按项目隔离的沙箱，进入快（直接 `vibebox`），支持真实配置 + 会话，同时保持硬隔离边界。

 ### 安装

 ```bash
-# YOLO：一键安装（推荐）
+# YOLO：一键安装
 curl -fsSL https://raw.githubusercontent.com/robcholz/vibebox/main/install | bash

 # Cargo
 cargo install vibebox

-# 或者手动安装（不推荐）
+# 或者手动安装
 curl -LO https://github.com/robcholz/vibebox/releases/download/latest/vibebox-macos-arm64.zip
 unzip vibebox-macos-arm64.zip
 mkdir -p ~/.local/bin
-mv vibe ~/.local/bin
+mv vibebox ~/.local/bin
 export PATH="$HOME/.local/bin:$PATH"
 ```

-> [!TIP]
-> 强烈建议直接用 `YOLO` 方式安装，省事且更不容易踩坑。
-
 **系统要求**

 - Apple Silicon 的 macOS（VibeBox 使用了 Apple 的虚拟化 API）。
@@ -152,19 +186,16 @@ vibebox explain     # 显示挂载与网络信息

 如果你想参与贡献 VibeBox，请先阅读 [贡献指南](CONTRIBUTING.md)，再提交 Pull Request。

-### 使用 VibeBox
-
-欢迎使用，也别忘了顺手帮 VibeBox 做点宣传！
-
 ### FAQ

 #### 它和其它 Sandboxes 有什么不同？

-Vibebox 追求的是：本地、可复现、启动快、流程简单。主要差异点：
+VibeBox 追求的是：本地、可复现、启动快、流程简单。主要差异点：

- 在我的 M3 上，热启动通常 **6 秒以内**，可以非常快地回到工作状态。
+- 在我的 M3 上，热启动通常 **<5s**，可以非常快地回到工作状态。
 - 一个命令——`vibebox`——直接把你带进沙盒（从你的项目目录启动）。
 - 配置集中在 `vibebox.toml`，CPU / 内存 / 磁盘大小 / 挂载都能一眼看懂、随手改。
+- 会话是第一等公民：复用、多终端、清理。

 ### 特别鸣谢

@@ -64,3 +64,4 @@
 3. [ ] Redirect vm output to log.
 4. [ ] Redirect vm output to vibebox starting it.
 5. [ ] use anyhow to sync api.
+6. [ ] add support for ipv6.
@@ -120,9 +120,15 @@ fn main() -> Result<()> {
    tracing::debug!(auto_shutdown_ms, "auto shutdown config");
    let manager_conn =
        vm_manager::ensure_manager(&raw_args, auto_shutdown_ms, config_override.as_deref())
-            .map_err(|err| color_eyre::eyre::eyre!(err.to_string()))?;
+            .map_err(|err| {
+                tracing::error!(error = %err, "failed to ensure vm manager");
+                color_eyre::eyre::eyre!(err.to_string())
+            })?;

-    instance::run_with_ssh(manager_conn).map_err(|err| color_eyre::eyre::eyre!(err.to_string()))?;
+    instance::run_with_ssh(manager_conn).map_err(|err| {
+        tracing::error!(error = %err, "failed to run instance over ssh");
+        color_eyre::eyre::eyre!(err.to_string())
+    })?;

    tracing::info!("See you again — keep vibecoding (no SEVs, only vibes) 😈");

@@ -266,6 +266,11 @@ fn wait_for_vm_ipv4(
                Ok(status) => {
                    status_missing = false;
                    let status = status.trim().to_string();
+                    if status.starts_with("error:") {
+                        let _ = fs::remove_file(&status_path);
+                        let message = status.trim_start_matches("error:").trim().to_string();
+                        return Err(message.into());
+                    }
                    if !status.is_empty() && last_status.as_deref() != Some(status.as_str()) {
                        tracing::info!("[background]: {}", status);
                        last_status = Some(status);
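
The hunk above treats a status file that starts with `error:` as a failure report from the manager. A minimal sketch of that classification as a pure function, which makes the branch easy to unit test. `StatusLine` and `classify_status` are hypothetical names for illustration and are not part of the change:

```rust
/// Hypothetical result of reading the status file once.
#[derive(Debug, PartialEq, Eq)]
enum StatusLine {
    /// Nothing has been written yet.
    Empty,
    /// A progress message that the caller may log.
    Progress(String),
    /// A failure report; the payload is the text after `error:`.
    Failed(String),
}

/// Classifies the raw file content using the `error:` prefix convention.
fn classify_status(raw: &str) -> StatusLine {
    let status = raw.trim();
    // `strip_prefix` removes the prefix exactly once.
    if let Some(rest) = status.strip_prefix("error:") {
        return StatusLine::Failed(rest.trim().to_string());
    }
    if status.is_empty() {
        StatusLine::Empty
    } else {
        StatusLine::Progress(status.to_string())
    }
}
```

One difference from the diff is worth noting: `trim_start_matches("error:")` removes the prefix repeatedly, so an input such as `error:error:x` yields `x`, whereas `strip_prefix` removes it once. Which behaviour is preferable depends on whether error messages can themselves begin with `error:`.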
@@ -1,12 +1,55 @@
 #!/bin/bash
-set -eux
+set -eEux
+
+trap 'rc=$?; echo "[vibebox][error] provisioning failed at: ${BASH_COMMAND} (exit ${rc})"; echo "VIBEBOX_PROVISION_FAILED"; systemctl poweroff || true; exit 1' ERR
+
+# Wait for network + DNS before apt-get to avoid early boot flakiness.
+wait_for_network() {
+  echo "[vibebox] waiting for network/DNS readiness"
+  local deadline=$((SECONDS + 180))
+  while [ "$SECONDS" -lt "$deadline" ]; do
+    local has_route=0
+    if ip -4 route show default 2>/dev/null | grep -q .; then
+      has_route=1
+    elif ip -6 route show default 2>/dev/null | grep -q .; then
+      has_route=1
+    fi
+    if [ "$has_route" -eq 1 ]; then
+      if getent hosts deb.debian.org >/dev/null 2>&1; then
+        return 0
+      fi
+    fi
+    sleep 1
+  done
+  echo "[vibebox][warn] network/DNS still not ready after 180s; continuing" >&2
+  echo "[vibebox][warn] /etc/resolv.conf:" >&2
+  cat /etc/resolv.conf >&2 || true
+  ip -br addr >&2 || true
+  ip route >&2 || true
+  ip -6 route >&2 || true
+  return 0
+}
+
+apt_update_with_retries() {
+  local attempt=1
+  while [ "$attempt" -le 5 ]; do
+    if apt-get update; then
+      return 0
+    fi
+    echo "[vibebox][warn] apt-get update failed (attempt ${attempt}/5); retrying..." >&2
+    attempt=$((attempt + 1))
+    sleep 2
+  done
+  return 1
+}

 # Don't wait too long for slow mirrors.
-echo 'Acquire::http::Timeout "2";' | tee /etc/apt/apt.conf.d/99timeout
-echo 'Acquire::https::Timeout "2";' | tee -a /etc/apt/apt.conf.d/99timeout
-echo 'Acquire::Retries "2";' | tee -a /etc/apt/apt.conf.d/99timeout
+echo 'Acquire::http::Timeout "10";' | tee /etc/apt/apt.conf.d/99timeout
+echo 'Acquire::https::Timeout "10";' | tee -a /etc/apt/apt.conf.d/99timeout
+echo 'Acquire::Retries "5";' | tee -a /etc/apt/apt.conf.d/99timeout

-apt-get update
+wait_for_network
+apt_update_with_retries
 apt-get install -y --no-install-recommends      \
        build-essential                         \
        pkg-config                              \
@@ -55,4 +98,5 @@ sleep 100 # sleep here so that we don't see the login screen flash up before the
 EOF

 # Done provisioning, power off the VM
+echo "VIBEBOX_PROVISION_OK"
 systemctl poweroff
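
`apt_update_with_retries` in the script above is a bounded retry loop with a fixed delay. For comparison, the same pattern can be sketched on the Rust side. The helper name `retry` and its signature are assumptions made for this example and do not exist in the codebase:

```rust
use std::{thread, time::Duration};

/// Runs `op` until it succeeds or `max_attempts` is reached, sleeping `delay`
/// between attempts. The closure receives the 1-based attempt number.
/// `op` always runs at least once, even if `max_attempts` is 0.
fn retry<T, E>(
    max_attempts: u32,
    delay: Duration,
    mut op: impl FnMut(u32) -> Result<T, E>,
) -> Result<T, E> {
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: hand the last error back to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                attempt += 1;
                thread::sleep(delay);
            }
        }
    }
}
```

Unlike the shell loop, this sketch does not sleep after the final failed attempt and returns the last error instead of a bare non-zero status.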
@@ -40,6 +40,7 @@ const DEFAULT_RAM_MB: u64 = 2048;
 const DEFAULT_RAM_BYTES: u64 = DEFAULT_RAM_MB * BYTES_PER_MB;
 const START_TIMEOUT: Duration = Duration::from_secs(60);
 const LOGIN_EXPECT_TIMEOUT: Duration = Duration::from_secs(120);
+const PROVISION_EXPECT_TIMEOUT: Duration = Duration::from_secs(900);

 struct StatusFile {
    path: PathBuf,
@@ -76,7 +77,15 @@ const BASE_DISK_RAW_NAME: &str = "disk.raw";

 #[derive(Clone)]
 pub(crate) enum LoginAction {
-    Expect { text: String, timeout: Duration },
+    Expect {
+        text: String,
+        timeout: Duration,
+    },
+    ExpectEither {
+        success: String,
+        failure: String,
+        timeout: Duration,
+    },
    Send(String),
 }
 use LoginAction::*;
@@ -359,6 +368,12 @@ enum WaitResult {
    Found,
 }

+#[derive(PartialEq, Eq)]
+enum WaitAnyResult {
+    Timeout,
+    Found(usize),
+}
+
 pub enum VmInput {
    Bytes(Vec<u8>),
    Shutdown,
@@ -366,6 +381,7 @@ pub enum VmInput {

 enum VmOutput {
    LoginActionTimeout { action: String, timeout: Duration },
+    LoginActionFailed { action: String, reason: String },
 }

 #[derive(Default)]
@@ -402,6 +418,41 @@ impl OutputMonitor {
            WaitResult::Found
        }
    }
+
+    fn wait_for_any(&self, needles: &[&str], timeout: Duration) -> WaitAnyResult {
+        let mut found: Option<usize> = None;
+        let (_unused, timeout_result) = self
+            .condvar
+            .wait_timeout_while(self.buffer.lock().unwrap(), timeout, |buf| {
+                if let Some((pos, idx, len)) = find_any(buf, needles) {
+                    *buf = buf[(pos + len)..].to_string();
+                    found = Some(idx);
+                    false
+                } else {
+                    true
+                }
+            })
+            .unwrap();
+
+        if timeout_result.timed_out() {
+            WaitAnyResult::Timeout
+        } else {
+            WaitAnyResult::Found(found.unwrap_or(0))
+        }
+    }
+}
+
+fn find_any(buf: &str, needles: &[&str]) -> Option<(usize, usize, usize)> {
+    let mut best: Option<(usize, usize, usize)> = None; // (pos, idx, len)
+    for (idx, needle) in needles.iter().enumerate() {
+        if let Some(pos) = buf.find(needle) {
+            let candidate = (pos, idx, needle.len());
+            if best.is_none_or(|b| candidate.0 < b.0) {
+                best = Some(candidate);
+            }
+        }
+    }
+    best
 }

 #[derive(Debug)]
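
`find_any` above picks the marker that occurs earliest in the buffer, and the first needle in the list wins a tie. A compact sketch of the same selection rule, written without `Option::is_none_or` so it also compiles on older toolchains. `find_earliest` is a hypothetical name, and it returns only `(pos, idx)` rather than the triple used in the diff:

```rust
/// Returns `(byte position, needle index)` of the needle that appears first
/// in `buf`, or `None` if no needle is present.
fn find_earliest(buf: &str, needles: &[&str]) -> Option<(usize, usize)> {
    needles
        .iter()
        .enumerate()
        .filter_map(|(idx, needle)| buf.find(*needle).map(|pos| (pos, idx)))
        // Tuples compare by position first, then by index, so the earliest
        // match wins and ties go to the needle listed first.
        .min()
}
```

With the two provisioning markers, a buffer that contains the failure marker before the success marker resolves to index 1, which is what lets the caller treat any non-zero index as a failure.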
@@ -546,14 +597,25 @@ fn ensure_default_image(
    fs::copy(base_raw, default_raw)?;

    let provision_command = script_command_from_content(PROVISION_SCRIPT_NAME, PROVISION_SCRIPT)?;
-    run_vm(
+    let provision_actions = [
+        Send(provision_command),
+        ExpectEither {
+            success: "VIBEBOX_PROVISION_OK".to_string(),
+            failure: "VIBEBOX_PROVISION_FAILED".to_string(),
+            timeout: PROVISION_EXPECT_TIMEOUT,
+        },
+    ];
+    if let Err(err) = run_vm(
        default_raw,
-        &[Send(provision_command)],
+        &provision_actions,
        directory_shares,
        DEFAULT_CPU_COUNT,
        DEFAULT_RAM_BYTES,
        None,
-    )?;
+    ) {
+        let _ = fs::remove_file(default_raw);
+        return Err(err);
+    }

    Ok(())
 }
@@ -1033,6 +1095,27 @@ fn spawn_login_actions_thread(
                        return;
                    }
                }
+                ExpectEither {
+                    success,
+                    failure,
+                    timeout,
+                } => match output_monitor.wait_for_any(&[&success, &failure], timeout) {
+                    WaitAnyResult::Found(0) => {}
+                    WaitAnyResult::Found(_) => {
+                        let _ = vm_output_tx.send(VmOutput::LoginActionFailed {
+                            action: format!("expect '{}'", success),
+                            reason: format!("saw failure marker '{}'", failure),
+                        });
+                        return;
+                    }
+                    WaitAnyResult::Timeout => {
+                        let _ = vm_output_tx.send(VmOutput::LoginActionTimeout {
+                            action: format!("expect '{}'", success),
+                            timeout,
+                        });
+                        return;
+                    }
+                },
                Send(mut text) => {
                    text.push('\n'); // Type the newline so the command is actually submitted.
                    input_tx.send(VmInput::Bytes(text.into_bytes())).unwrap();
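
The `ExpectEither` arm above depends on a wait that blocks until one of two markers shows up in the console buffer or a timeout elapses. A self-contained sketch of that wait using `Condvar::wait_timeout_while` follows. `Monitor`, `push` and `wait_for_either` are hypothetical names that simplify `OutputMonitor`, and unlike the diff this version does not consume the matched text from the buffer:

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Simplified stand-in for the console output monitor (assumed shape).
#[derive(Default)]
struct Monitor {
    buffer: Mutex<String>,
    condvar: Condvar,
}

impl Monitor {
    /// Appends console output and wakes any waiting thread.
    fn push(&self, text: &str) {
        self.buffer.lock().unwrap().push_str(text);
        self.condvar.notify_all();
    }

    /// Returns `Some(0)` if `success` appears first, `Some(1)` if `failure`
    /// appears first, and `None` if neither shows up before `timeout`.
    fn wait_for_either(&self, success: &str, failure: &str, timeout: Duration) -> Option<usize> {
        let mut found = None;
        let guard = self.buffer.lock().unwrap();
        let (_guard, result) = self
            .condvar
            .wait_timeout_while(guard, timeout, |buf| {
                found = match (buf.find(success), buf.find(failure)) {
                    (Some(s), Some(f)) => Some(if s <= f { 0 } else { 1 }),
                    (Some(_), None) => Some(0),
                    (None, Some(_)) => Some(1),
                    (None, None) => None,
                };
                // Keep waiting while no marker has been seen.
                found.is_none()
            })
            .unwrap();
        if result.timed_out() { None } else { found }
    }
}
```

The predicate is evaluated once before the first wait, so a marker that was already pushed is detected without blocking.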
@@ -1192,6 +1275,24 @@ where
                }
                break;
            }
+            Ok(VmOutput::LoginActionFailed { action, reason }) => {
+                exit_result = Err(format!(
+                    "Login action ({}) failed: {}; shutting down.",
+                    action, reason
+                )
+                .into());
+                unsafe {
+                    if vm.canRequestStop() {
+                        if let Err(err) = vm.requestStopWithError() {
+                            tracing::error!(error = ?err, "failed to request VM stop");
+                        }
+                    } else if vm.canStop() {
+                        let handler = RcBlock::new(|_error: *mut NSError| {});
+                        vm.stopWithCompletionHandler(&handler);
+                    }
+                }
+                break;
+            }
            Err(mpsc::TryRecvError::Empty) => {}
            Err(mpsc::TryRecvError::Disconnected) => {}
        }
@@ -17,6 +17,7 @@ use std::{

 use crate::{
    config::CONFIG_PATH_ENV,
+    instance::STATUS_FILE_NAME,
    instance::VM_ROOT_LOG_NAME,
    instance::{
        DEFAULT_SSH_USER, InstanceConfig, build_ssh_login_actions, ensure_instance_dir,
@@ -30,8 +31,22 @@ use crate::{

 const VM_MANAGER_LOCK_NAME: &str = "vm.lock";
 const VM_MANAGER_LOG_NAME: &str = "vm_manager.log";
+const SHUTDOWN_RETRY_MS: u64 = 500;
+#[cfg(test)]
+const HARD_SHUTDOWN_TIMEOUT_MS: u64 = 1_000;
+#[cfg(not(test))]
 const HARD_SHUTDOWN_TIMEOUT_MS: u64 = 12_000;

+#[cfg(not(test))]
+fn force_exit(_reason: &str) -> ! {
+    std::process::exit(1);
+}
+
+#[cfg(test)]
+fn force_exit(reason: &str) -> ! {
+    panic!("{reason}");
+}
+
 pub fn ensure_manager(
    raw_args: &[std::ffi::OsString],
    auto_shutdown_ms: u64,
@@ -751,6 +766,10 @@ fn run_manager_with(
    );
    tracing::info!("vm manager vm run completed");
    let vm_err = vm_result.err().map(|e| e.to_string());
+    if let Some(err) = &vm_err {
+        let status_path = instance_dir.join(STATUS_FILE_NAME);
+        let _ = fs::write(&status_path, format!("error: {err}"));
+    }
    let _ = event_tx.send(ManagerEvent::VmExited(vm_err.clone()));
    let event_loop_result: Result<(), String> = event_loop_handle
        .join()
@@ -782,9 +801,14 @@ fn manager_event_loop(
    let hard_timeout = Duration::from_millis(HARD_SHUTDOWN_TIMEOUT_MS);

    loop {
-        let timeout = match shutdown_deadline {
-            Some(deadline) => deadline.saturating_duration_since(Instant::now()),
-            None => Duration::from_secs(1),
+        let timeout = match (shutdown_deadline, hard_deadline) {
+            (Some(shutdown), Some(hard)) => {
+                let next = if shutdown <= hard { shutdown } else { hard };
+                next.saturating_duration_since(Instant::now())
+            }
+            (Some(shutdown), None) => shutdown.saturating_duration_since(Instant::now()),
+            (None, Some(hard)) => hard.saturating_duration_since(Instant::now()),
+            (None, None) => Duration::from_secs(1),
        };

        match event_rx.recv_timeout(timeout) {
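
The four-way `match` above amounts to "wake up at the earlier of the two deadlines, or after one second if neither is set". A minimal sketch of the same computation as a standalone function. `next_timeout` and its `idle` parameter are hypothetical names chosen for this example:

```rust
use std::time::{Duration, Instant};

/// Time to block in `recv_timeout`: until the earliest pending deadline,
/// or `idle` when no deadline is set. Deadlines in the past yield zero.
fn next_timeout(
    now: Instant,
    shutdown: Option<Instant>,
    hard: Option<Instant>,
    idle: Duration,
) -> Duration {
    // `flatten` drops the unset deadlines; `min` picks the earliest one left.
    match [shutdown, hard].into_iter().flatten().min() {
        Some(deadline) => deadline.saturating_duration_since(now),
        None => idle,
    }
}
```

Taking `now` as a parameter keeps the function deterministic under test, whereas the loop in the diff calls `Instant::now()` inline.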
@@ -824,6 +848,9 @@ fn manager_event_loop(
                    && Instant::now() >= deadline
                    && !shutdown_sent
                {
+                    if hard_deadline.is_none() {
+                        hard_deadline = Some(Instant::now() + hard_timeout);
+                    }
                    let mut sent = false;
                    if let Some(tx) = vm_input_tx.lock().unwrap().clone() {
                        if tx
@@ -843,19 +870,26 @@ fn manager_event_loop(
                        shutdown_deadline = None;
                        hard_deadline = Some(Instant::now() + hard_timeout);
                    } else {
-                        shutdown_deadline = Some(Instant::now() + Duration::from_millis(500));
+                        shutdown_deadline =
+                            Some(Instant::now() + Duration::from_millis(SHUTDOWN_RETRY_MS));
                    }
                }
                if ref_count == 0
-                    && shutdown_sent
                    && let Some(deadline) = hard_deadline
                    && Instant::now() >= deadline
                {
-                    tracing::warn!(
-                        timeout_ms = HARD_SHUTDOWN_TIMEOUT_MS,
-                        "force exiting: VM did not stop after shutdown timeout"
-                    );
-                    std::process::exit(1);
+                    if shutdown_sent {
+                        tracing::warn!(
+                            timeout_ms = HARD_SHUTDOWN_TIMEOUT_MS,
+                            "force exiting: VM did not stop after shutdown timeout"
+                        );
+                    } else {
+                        tracing::warn!(
+                            timeout_ms = HARD_SHUTDOWN_TIMEOUT_MS,
+                            "force exiting: VM input not ready after shutdown timeout"
+                        );
+                    }
+                    force_exit("vm manager forced exit");
                }
            }
            Err(mpsc::RecvTimeoutError::Disconnected) => break,
@@ -868,7 +902,7 @@ fn manager_event_loop(
 #[cfg(test)]
 mod tests {
    use super::*;
-    use std::{sync::mpsc, time::Duration};
+    use std::{sync::mpsc, thread, time::Duration};

    #[test]
    fn manager_powers_off_after_grace_when_no_refs() {
@@ -901,4 +935,53 @@ mod tests {
        let _ = event_tx.send(ManagerEvent::VmExited(None));
        let _ = manager_thread.join();
    }
+
+    #[test]
+    fn manager_force_exits_when_vm_input_never_ready() {
+        let (event_tx, event_rx) = mpsc::channel::<ManagerEvent>();
+        let vm_input_tx = Arc::new(Mutex::new(None));
+
+        let manager_thread = thread::spawn(move || {
+            let _ = manager_event_loop(event_rx, vm_input_tx, 10);
+        });
+
+        event_tx.send(ManagerEvent::Inc(None)).unwrap();
+        event_tx.send(ManagerEvent::Dec(None)).unwrap();
+
+        let join_result = manager_thread.join();
+        assert!(
+            join_result.is_err(),
+            "expected manager to force-exit when vm input never becomes ready"
+        );
+    }
+
+    #[test]
+    fn manager_sends_shutdown_after_vm_input_becomes_ready() {
+        let (event_tx, event_rx) = mpsc::channel::<ManagerEvent>();
+        let (vm_tx, vm_rx) = mpsc::channel::<VmInput>();
+        let vm_input_tx = Arc::new(Mutex::new(None));
+        let vm_input_for_thread = vm_input_tx.clone();
+
+        let manager_thread = thread::spawn(move || {
+            manager_event_loop(event_rx, vm_input_for_thread, 10).expect("event loop");
+        });
+
+        event_tx.send(ManagerEvent::Inc(None)).unwrap();
+        event_tx.send(ManagerEvent::Dec(None)).unwrap();
+
+        thread::sleep(Duration::from_millis(100));
+        *vm_input_tx.lock().unwrap() = Some(vm_tx);
+
+        let msg = vm_rx
+            .recv_timeout(Duration::from_secs(2))
+            .expect("poweroff");
+        match msg {
+            VmInput::Bytes(data) => {
+                assert_eq!(data, b"systemctl poweroff\n");
+            }
+            _ => panic!("unexpected vm input"),
+        }
+        let _ = event_tx.send(ManagerEvent::VmExited(None));
+        let _ = manager_thread.join();
+    }
 }
Author	SHA1	Message	Date
robcholz	4d1529905e	feat: new version more robust network handle	2026-02-09 01:52:33 -05:00
robcholz	a568295bd3	fix: more robust network handling	2026-02-09 01:52:06 -05:00
robcholz	b5cd1f2064	feat: added provision error propagation	2026-02-09 01:48:20 -05:00
robcholz	b425ae4b77	feat: version 0.2.4	2026-02-09 01:16:56 -05:00
robcholz	b433d3ef93	fix: handled edge network issue when doing provisioning	2026-02-09 01:16:20 -05:00
robcholz	ecfce7acf7	fix: enforce shutdown timeout when vm input never becomes ready	2026-02-09 00:59:16 -05:00
robcholz	7065144e6f	doc: update	2026-02-09 00:28:20 -05:00
robcholz	5e95c09c75	doc: changes	2026-02-08 20:40:30 -05:00
robcholz	b1680e54fb	doc: changes	2026-02-08 20:09:43 -05:00
robcholz	0e4c4c7f53	doc: changes	2026-02-08 20:00:17 -05:00