I run a tailnet for my own infrastructure. Dev servers, CI runners, a couple of databases that should never touch the public internet. When I need to answer a question about it — which devices haven't checked in recently? who changed DNS last night? is this OAuth client still scoped the way I wrote it? — the answer is almost always two or three Tailscale v2 API calls glued together. List devices, filter by a field. Pull the audit log, grep for a substring. Fetch the OAuth client, eyeball the claim pattern.

For months my options were: click through the admin console tab by tab, write a one-off shell script I'd throw away after, or learn to live with not knowing. None of those scale. The admin console is paginated and slow to compose views. The throwaway scripts accumulate and rot. And "learn to live with not knowing" is how a dusty device with keyExpiryDisabled: true sits on your tailnet for six months.

So I built @yawlabs/tailscale-mcp — an MCP server that exposes all 99 endpoints of the Tailscale v2 API as tools the agent can compose. It's been in production on my own tailnet for weeks. Below are five audits it now runs for me, each from a single prompt. Then I'll walk through the design choices that make handing this to an agent less terrifying than it sounds.

Audit 1: Stale devices with key expiry disabled

The question: "Which devices haven't checked in for 30 days and have key expiry disabled?"

This is the audit I wrote a shell script for four separate times before getting tired of rewriting it. It's one API call (GET /api/v2/tailnet/-/devices) and then in-memory filtering — but the filtering needs two fields (lastSeen and keyExpiryDisabled) that nobody remembers the exact names of, a date comparison against "30 days ago," and output formatting.

What the agent does:

  1. Calls tailscale_list_devices with a fields filter set to name,lastSeen,keyExpiryDisabled,tags. The fields filter is the Tailscale API's native projection — the server never sees irrelevant data.
  2. Filters in memory: lastSeen < now - 30d AND keyExpiryDisabled === true.
  3. Formats as a table, sorted by how long since last seen.
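The step-2 filter is small enough to sketch. Below is a hypothetical TypeScript rendering of it; the `Device` type is a projection of the fields requested in step 1, not the full API response shape:

```typescript
// Projection of the fields requested via the API's fields filter (assumed shape).
type Device = {
  name: string;
  lastSeen: string; // ISO 8601 timestamp
  keyExpiryDisabled: boolean;
  tags?: string[];
};

const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000;

function staleWithExpiryDisabled(devices: Device[], now = Date.now()): Device[] {
  return devices
    .filter(d => d.keyExpiryDisabled && now - Date.parse(d.lastSeen) > THIRTY_DAYS_MS)
    // Oldest first, so the outliers sort to the top of the table.
    .sort((a, b) => Date.parse(a.lastSeen) - Date.parse(b.lastSeen));
}
```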

The result:

Device           Last seen     Tags
---------------  ------------  ------------
ancient-vpn-box  112 days ago  tag:legacy
ci-runner-04     87 days ago   tag:ci
jeff-mini-old    41 days ago   tag:personal

3 devices match. The VPN box stands out — do you want me to expire its key, or just report it?

That last line is the part a shell script doesn't write for you. The agent noticed the 112-day outlier and suggested a next step without being asked. I said "just report it, I'll poke the owner first" and moved on.

Audit 2: Who broke DNS at 2am

The question: "DNS broke overnight — who changed what in the last 24 hours?"

This one is the "someone paged me and I need to know what changed" audit. The raw material is the configuration audit log. It's not hard to fetch (GET /api/v2/tailnet/-/logging/configuration) — but the log is firehose-shaped. Every tag change, every ACL edit, every device authorization, interleaved. Finding the DNS-related entries means scrolling.

What the agent does:

  1. Calls tailscale_get_audit_log scoped to the last 24 hours.
  2. Filters entries where the affected endpoint includes /dns/, the action mentions nameservers, search paths, split DNS, or MagicDNS, or the before/after diff touches a DNS field.
  3. For each matching entry, pulls the structured before/after from the log payload and summarizes: "At 01:47 UTC, jane@example.com removed the split-DNS mapping for internal.example.com pointing at 10.0.0.53. No other DNS changes in the window."
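The step-2 filter is the part that's too specific to memorize. A hypothetical sketch of it, where the entry shape and field names are assumptions for illustration, not the real audit-log schema:

```typescript
// Assumed entry shape for illustration; the real audit-log schema differs.
type AuditEntry = {
  time: string;
  actor: string;
  action: string;
  endpoint: string;
};

// Match DNS-touching entries by endpoint path or action keywords.
const DNS_HINTS = [/\/dns\//i, /nameserver/i, /search\s?path/i, /split\s?dns/i, /magicdns/i];

function dnsRelated(entries: AuditEntry[]): AuditEntry[] {
  return entries.filter(e =>
    DNS_HINTS.some(rx => rx.test(e.endpoint) || rx.test(e.action))
  );
}
```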

One endpoint, one agent pass, an English answer that would have taken me five minutes of scrolling. The step that matters isn't the API call — it's the filter, which is too specific to be worth memorizing and too variable to hard-code in a script.

Audit 3: Draft a minimal ACL diff

The question: "Let tag:mobile reach tag:dashboard on port 443, but not tag:db. Draft the diff but don't apply it."

ACL edits are where "have the agent do it" gets risky, which is why this walkthrough matters most.

What the agent does:

  1. Calls tailscale_get_acl. This returns the current policy as HuJSON (JSON with comments), along with an ETag. Comments and trailing commas are preserved verbatim — the API speaks HuJSON, not minified JSON.
  2. Parses the HuJSON mentally, finds the existing ACLs section, and proposes a minimal addition. Critically, it doesn't rewrite my comments or reformat my whitespace — the instruction embedded in the tailscale_update_acl tool description says "Only modify the specific parts that need to change."
  3. Runs tailscale_validate_acl against the proposed policy. Validates without applying.
  4. Runs tailscale_preview_acl for type=user, previewFor=alice@example.com (a user with tag:mobile). Shows the exact rules that would apply after the change.
  5. Returns the diff to me.
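For this request the minimal change is one rule appended to the "acls" array. A hypothetical rendering of the proposal (HuJSON, trailing comma and all):

```jsonc
// Allow mobile devices to reach the dashboard over HTTPS only.
{
  "action": "accept",
  "src":    ["tag:mobile"],
  "dst":    ["tag:dashboard:443"],
},
```

Note what's absent: there is no explicit deny for tag:db. Tailscale ACLs are default-deny, so simply not granting tag:mobile any dst on tag:db is what keeps it out.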

The agent didn't apply the change. It couldn't, because I said "don't apply." But even if I had said "apply it," the server would have required me to pass back the ETag from step 1 — which means if someone else edited the policy in the meantime, the write fails cleanly instead of silently stomping their work. More on that under design choice 3.

Audit 4: GitHub Actions OIDC scope check

The question: "Show me the OIDC workload identity for our GitHub Actions and confirm its allowed subjects still match repo:Acme/*."

Workload identity is the feature that lets CI jobs authenticate as Tailscale principals without shipping long-lived credentials. The risk is drift: you set up the OIDC federation once, rename your GitHub org, and the sub claim pattern no longer matches your repo paths. Silent failure on the CI side, you chase it for an hour.

The audit is tailscale_list_workload_identities to find the GitHub Actions provider, then tailscale_get_workload_identity for its details. The agent parses the claim patterns, compares them to the pattern I mentioned in the question, and tells me whether they still match. Two API calls, pattern-match, done.
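The pattern comparison itself is just glob matching. A hypothetical sketch of what the agent effectively does with the sub claim pattern (the helper names are mine, not part of the server):

```typescript
// Escape regex metacharacters so only "*" in the pattern acts as a wildcard.
const escapeRx = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");

// Does the configured subject pattern still cover a given repo subject?
function subjectMatches(pattern: string, subject: string): boolean {
  const rx = new RegExp("^" + pattern.split("*").map(escapeRx).join(".*") + "$");
  return rx.test(subject);
}
```

After an org rename, `repo:Acme/*` stops covering `repo:AcmeCorp/api`, which is exactly the drift the audit is looking for.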

Audit 5: Rotate auth keys older than 90 days

The question: "Rotate every auth key older than 90 days. Keep the same tags. Print the new keys."

This one composes four tools:

  1. tailscale_list_keys to enumerate.
  2. For each key older than 90 days: tailscale_get_key to read its tags, expiry, reusable flag, and ephemeral flag.
  3. tailscale_create_key with the same settings.
  4. tailscale_delete_key on the old one.
  5. Print the new secrets to copy into CI.
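The glue between steps 2 and 3 can be sketched as two small functions. The key shape here is an assumption for illustration; the real key API response has more fields:

```typescript
// Hypothetical key shape; the real response nests these under capabilities.
type AuthKey = {
  id: string;
  created: string; // ISO 8601
  tags: string[];
  reusable: boolean;
  ephemeral: boolean;
};

const NINETY_DAYS_MS = 90 * 24 * 60 * 60 * 1000;

function needsRotation(key: AuthKey, now = Date.now()): boolean {
  return now - Date.parse(key.created) > NINETY_DAYS_MS;
}

// Build the create-key request that preserves the old key's settings.
function replacementFor(old: AuthKey) {
  return { tags: old.tags, reusable: old.reusable, ephemeral: old.ephemeral };
}
```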

This one I roll through with confirmation on each key pair rather than as a single "go do it" — because tailscale_delete_key carries destructiveHint: true, and my MCP client prompts on every destructive call. Which brings us to the part of this post that earns the rest.

Each step is a curl. The agent does the composition. That's the lift.

Everything above is a shell script you could write yourself. Most of them I did write myself, more than once. The value the MCP adds isn't the API calls — it's the composition, the filtering, the formatting, the memory of what field names exist, and above all the not doing it again next time I need a slightly different version of the same question.

But composition is also where it goes wrong. An agent that can list devices can also delete them. An agent that can read an ACL can also rewrite it. An agent that can list auth keys can also leak them to a log you didn't mean to write. Four design choices make the composition safer than it sounds.

Design choice 1: Every tool declares what kind of action it is

MCP defines three behavioral annotations per tool: readOnlyHint, destructiveHint, and idempotentHint. Every one of the 99 tools in this server declares all three. A few examples:

Tool                          readOnly  destructive  idempotent
----------------------------  --------  -----------  ----------
tailscale_list_devices        true      false        true
tailscale_authorize_device    false     false        true
tailscale_deauthorize_device  false     true         true
tailscale_delete_device       false     true         false
tailscale_update_acl          false     false        true

Note that tailscale_update_acl is not destructive. Why? Because it's ETag-guarded and idempotent — calling it twice with the same policy yields the same state. "Destructive" in MCP annotation terms means "performs updates that may have a significant negative impact if not reviewed", which is a stricter bar than "writes something." Delete-device is destructive because you can't get it back. Update-ACL is a safe write because the failure mode is bounded (you can always read the ACL again and re-update).

The annotations are metadata, not enforcement. What they enable is client-side gating. Claude Code and most other MCP clients let you configure: "skip confirmation on read-only tools, prompt on destructive ones, never auto-run writes." A skill that shells out to tailscale CLI can't express those distinctions — the client sees one opaque bash call. Here the client sees 99 clearly-typed actions and can apply policy per action.
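What client-side gating can look like, sketched as a hypothetical policy function over the three hints (the real knobs live in your MCP client's configuration, not in code like this, and the exact mapping is one possible policy, not the only one):

```typescript
type Hints = { readOnlyHint: boolean; destructiveHint: boolean; idempotentHint: boolean };
type Gate = "auto-run" | "confirm" | "never-auto";

// One possible policy: reads run freely, destructive non-idempotent calls
// always stop for a human, and every other write gets a confirmation prompt.
function gate(h: Hints): Gate {
  if (h.readOnlyHint) return "auto-run";
  if (h.destructiveHint && !h.idempotentHint) return "never-auto";
  return "confirm";
}
```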

Design choice 2: The 12-vs-99 problem, solved three ways

99 tools is a lot. If you install this alongside a dozen other MCP servers, your client's context window starts groaning under the tool list before you've asked it anything. The server provides orthogonal ways to shrink the surface; the two I lean on are profile presets (TAILSCALE_PROFILE) and a read-only mode (TAILSCALE_READONLY).

The combination I'd start with if you're nervous is TAILSCALE_PROFILE=core + TAILSCALE_READONLY=1. You get the five audits in this post; you don't get any tool that can change your tailnet. When you want mutations, you drop the readonly flag for that session.

The filtering happens server-side before tools/list is sent, which means the disabled tools don't consume context tokens. Run the server and it prints what loaded:

@yawlabs/tailscale-mcp v0.8.3 ready (19 tools, profile=core, readonly)
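Conceptually, the server-side selection is just a filter over the tool registry before tools/list goes out. A hypothetical sketch (the registry shape and per-tool profile membership are assumptions):

```typescript
// Assumed registry shape; each tool declares its hints and profile membership.
type ToolDef = { name: string; readOnlyHint: boolean; profiles: string[] };

function selectTools(all: ToolDef[], profile?: string, readonly = false): ToolDef[] {
  return all.filter(t =>
    (!profile || t.profiles.includes(profile)) && // profile preset trims the long tail
    (!readonly || t.readOnlyHint)                 // readonly drops every write path
  );
}
```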

Design choice 3: ETag concurrency on the ACL

ACLs are the one piece of tailnet state where "last write wins" is genuinely dangerous. Two people editing the policy at the same time, second write lands, first write is gone silently, nobody notices until the wrong traffic reaches the wrong host.

The Tailscale API handles this with optional If-Match headers on ACL updates. This server makes that path non-optional: tailscale_get_acl returns the ETag alongside the policy text, and tailscale_update_acl requires the ETag as a parameter. The Zod schema on the tool is literally:

z.object({
  policy: z.string().describe("The full ACL policy text..."),
  etag: z.string().describe(
    "The ETag from tailscale_get_acl. Required to prevent concurrent edit conflicts."
  ),
})

If the ETag doesn't match what the API has, the write returns 412 Precondition Failed and nothing changes. The agent gets a clean error and re-fetches. This is the single difference between "reasonable tool" and "footgun" on ACL writes, and it's the one most CLI wrappers skip because it's a couple of extra lines of plumbing per call. It's worth the couple of extra lines.
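The wire-level shape of the guarded write, sketched with fetch. This is a simplified illustration of the pattern, not the server's actual implementation; error handling is reduced to the one case that matters here:

```typescript
// POST the policy with If-Match so a concurrent edit fails with 412
// instead of silently overwriting someone else's change.
async function updateAcl(policy: string, etag: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.tailscale.com/api/v2/tailnet/-/acl", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/hujson",
      "If-Match": etag,
    },
    body: policy,
  });
  if (res.status === 412) {
    // Someone else wrote the policy first: re-fetch, re-diff, retry.
    throw new Error("ACL changed since last read; re-fetch the policy and its ETag.");
  }
  return res.text();
}
```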

Design choice 4: HuJSON comments survive round-trips

The Tailscale ACL format is HuJSON — JSON with comments and trailing commas. Most tooling I've seen parses it, loses the comments, and writes back minified JSON. Your // Allow prod SRE to reach DB on emergency-only ports comment vanishes the first time a script rewrites the policy.

This server speaks HuJSON natively in both directions — the Accept and Content-Type headers are set to application/hujson, the policy body is passed through verbatim, and the tool description instructs the agent to "Only modify the specific parts that need to change." Your comments, whitespace, and structure round-trip. This isn't a fancy feature — it's just not being destructive about text you didn't ask to change.

If "no AI on infra changes" is your line, this isn't for you

There's a real argument against any of this, and I've heard it from Tailscale users: handing an AI agent the ability to modify your tailnet — even with all the hint annotations and ETag guards in the world — is at odds with the zero-trust posture that made Tailscale attractive in the first place. The whole point of the system is that access is minimal, scoped, and deliberate. An LLM deliberating on your behalf is a category of actor your ACL doesn't know how to reason about.

I take that seriously, and I'm not going to tell you it's wrong. I've landed in a different place myself: the audits in this post — the stale-device sweep, the DNS forensics, the OIDC scope check — are reads, and reads are where most of the value sits. I run this server with mutations enabled because I trust my client's confirmation policy on destructive tools. If you don't want to make that bet, the TAILSCALE_READONLY=1 flag gives you the full audit surface with zero mutation paths. The agent can answer questions; it cannot touch anything.

There's no version of this post that makes that concern go away. What I can say is: the design choices above are there precisely because I think the concern is legitimate, and I'd rather ship something that takes the concern seriously than pretend it doesn't exist.

How to try it

Add it to your MCP client:

{
  "mcpServers": {
    "tailscale": {
      "command": "npx",
      "args": ["-y", "@yawlabs/tailscale-mcp"],
      "env": {
        "TAILSCALE_API_KEY": "tskey-api-...",
        "TAILSCALE_PROFILE": "core",
        "TAILSCALE_READONLY": "1"
      }
    }
  }
}

Start with TAILSCALE_PROFILE=core + TAILSCALE_READONLY=1. Drop the readonly flag once you're comfortable with the tools your client has approved. Drop the profile too if you need the long-tail tools (workload identity, posture integrations, log streaming) that the core preset doesn't ship.

Then ask it the first audit from this post:

Which devices haven't checked in for 30 days and have key expiry disabled?

If it comes back with zero, your tailnet is cleaner than mine. If it comes back with a list, you just did an audit you've been meaning to do for months.

Source, tests, and full tool reference: github.com/YawLabs/tailscale-mcp. MIT, PRs welcome, 735 unit tests and a nightly integration run against a real tailnet so regressions surface before a tag goes out.


Published by Yaw Labs.
