Every tailnet audit question is two or three API calls glued together. I got tired of writing one-off scripts, so I built an MCP server to let the agent do the gluing. Here's what that looks like in practice — and the design choices that keep it from turning into a footgun.
I run a tailnet for my own infrastructure. Dev servers, CI runners, a couple of databases that should never touch the public internet. When I need to answer a question about it — which of my devices haven't checked in recently?, who changed DNS last night?, is this OAuth client still scoped the way I wrote it? — the answer is almost always two or three Tailscale v2 API calls glued together. List devices, filter by a field. Pull the audit log, grep for a substring. Fetch the OAuth client, eyeball the claim pattern.
For months my options were: click through the admin console tab by tab, write a one-off shell script I'd throw away after, or learn to live with not knowing. None of those scale. The admin console is paginated and slow to compose views. The throwaway scripts accumulate and rot. And "learn to live with not knowing" is how a dusty device with keyExpiryDisabled: true sits on your tailnet for six months.
So I built @yawlabs/tailscale-mcp, an MCP server that exposes all 99 endpoints of the Tailscale v2 API as tools the agent can compose. It's been in production on my own tailnet for weeks. Below are five audits it handles for me in one prompt each. After that, I'll walk through the design choices that make handing this to an agent less terrifying than it sounds.
The question: "Which devices haven't checked in for 30 days and have key expiry disabled?"
This is the audit I wrote a shell script for four separate times before getting tired of rewriting it. It's one API call (GET /api/v2/tailnet/-/devices) and then in-memory filtering — but the filtering needs two fields (lastSeen and keyExpiryDisabled) that nobody remembers the exact names of, a date comparison against "30 days ago," and output formatting.
What the agent does:
1. tailscale_list_devices with a fields filter set to name,lastSeen,keyExpiryDisabled,tags. The fields filter is the Tailscale API's native projection, so the server never sees irrelevant data.
2. Filters the result in memory: lastSeen < now - 30d AND keyExpiryDisabled === true.
The result:
Device Last seen Tags
------------------ --------------- ------------------
ancient-vpn-box 112 days ago tag:legacy
ci-runner-04 87 days ago tag:ci
jeff-mini-old 41 days ago tag:personal
3 devices match. The VPN box stands out — do you want me to
expire its key, or just report it?
That last line is the part a shell script doesn't write for you. The agent noticed the 112-day outlier and suggested a next step without being asked. I said "just report it, I'll poke the owner first" and moved on.
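The in-memory filter from this audit is small enough to sketch. This is a minimal TypeScript version, not the server's actual code: the `Device` interface and the `staleDevices` helper are hypothetical names, and it assumes `lastSeen` arrives as an ISO 8601 timestamp string.

```typescript
// Hypothetical shape: only the fields requested through the `fields` filter.
interface Device {
  name: string;
  lastSeen: string; // assumed to be an ISO 8601 timestamp
  keyExpiryDisabled: boolean;
  tags?: string[];
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Return devices that have key expiry disabled and were last seen more than
// `maxAgeDays` before `now`, oldest first.
function staleDevices(devices: Device[], now: Date, maxAgeDays = 30): Device[] {
  const cutoff = now.getTime() - maxAgeDays * DAY_MS;
  return devices
    .filter((d) => d.keyExpiryDisabled && Date.parse(d.lastSeen) < cutoff)
    .sort((a, b) => Date.parse(a.lastSeen) - Date.parse(b.lastSeen));
}

// Made-up sample data for illustration.
const sample: Device[] = [
  { name: "ancient-vpn-box", lastSeen: "2024-01-01T00:00:00Z", keyExpiryDisabled: true, tags: ["tag:legacy"] },
  { name: "fresh-laptop", lastSeen: "2024-04-20T00:00:00Z", keyExpiryDisabled: true },
  { name: "old-but-expiring", lastSeen: "2024-01-01T00:00:00Z", keyExpiryDisabled: false },
];

console.log(staleDevices(sample, new Date("2024-04-22T00:00:00Z")).map((d) => d.name));
```

Passing `now` in explicitly, rather than calling `Date.now()` inside the function, keeps the "30 days ago" comparison reproducible when you re-run the audit against a saved device list.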
The question: "DNS broke overnight — who changed what in the last 24 hours?"
This one is the "someone paged me and I need to know what changed" audit. The raw material is the configuration audit log. It's not hard to fetch (GET /api/v2/tailnet/-/logging/configuration) — but the log is firehose-shaped. Every tag change, every ACL edit, every device authorization, interleaved. Finding the DNS-related entries means scrolling.
What the agent does:
1. tailscale_get_audit_log scoped to the last 24 hours.
2. Keeps only the DNS-related entries: the target matches /dns/, the action mentions nameservers, search paths, split DNS, or MagicDNS, or the before/after diff touches a DNS field.
3. Summarizes what it found in plain English: "[…] internal.example.com pointing at 10.0.0.53. No other DNS changes in the window."
One endpoint, one agent pass, an English answer that would have taken me five minutes of scrolling. The step that matters isn't the API call; it's the filter, which is too specific to be worth memorizing and too variable to hard-code in a script.
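For comparison, here is roughly what that filter looks like if you do hard-code it. This is a sketch under assumptions: `AuditEntry` is a simplified, hypothetical shape (the real log entries carry more structure), and `before`/`after` stand in for whatever the diff fields are actually called.

```typescript
// Hypothetical, simplified shape of one configuration audit log entry.
interface AuditEntry {
  eventTime: string; // assumed ISO 8601
  actor: string;
  action: string;
  target: string;
  before?: unknown; // stand-in for the "old" side of the diff
  after?: unknown;  // stand-in for the "new" side of the diff
}

// Terms from the walkthrough: nameservers, search paths, split DNS, MagicDNS.
// "split DNS" and "MagicDNS" are both caught by the plain "dns" alternative.
const DNS_TERMS = /dns|nameserver|search ?path/i;

function isDnsRelated(entry: AuditEntry): boolean {
  const diff = JSON.stringify([entry.before ?? null, entry.after ?? null]);
  return [entry.target, entry.action, diff].some((text) => DNS_TERMS.test(text));
}

function dnsChangesSince(entries: AuditEntry[], since: Date): AuditEntry[] {
  return entries.filter(
    (e) => Date.parse(e.eventTime) >= since.getTime() && isDnsRelated(e),
  );
}

// Made-up sample entries for illustration.
const log: AuditEntry[] = [
  { eventTime: "2024-04-22T02:10:00Z", actor: "admin@example.com", action: "UPDATE", target: "tailnet/dns/split-dns", after: { "internal.example.com": ["10.0.0.53"] } },
  { eventTime: "2024-04-22T03:00:00Z", actor: "admin@example.com", action: "UPDATE", target: "acl", after: { note: "tag change" } },
  { eventTime: "2024-04-19T09:00:00Z", actor: "admin@example.com", action: "UPDATE", target: "tailnet/dns/nameservers" },
];

const since = new Date(Date.parse("2024-04-22T08:00:00Z") - 24 * 60 * 60 * 1000);
console.log(dnsChangesSince(log, since).map((e) => `${e.actor} -> ${e.target}`));
```

The regular expression is the brittle part. Every new question ("who changed tags?", "who touched key expiry?") needs a different one, which is the argument for letting the agent write the filter per question.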
The question: "Let tag:mobile reach tag:dashboard on port 443, but not tag:db. Draft the diff but don't apply it."
ACL edits are where "have the agent do it" gets risky, which is why this walkthrough matters most. What the agent does:
1. tailscale_get_acl. This returns the current policy as HuJSON (JSON with comments), along with an ETag. Comments and trailing commas are preserved verbatim, because the API speaks HuJSON, not minified JSON.
2. Drafts the smallest edit that satisfies the request. The tailscale_update_acl tool description says "Only modify the specific parts that need to change."
3. tailscale_validate_acl against the proposed policy. Validates without applying.
4. tailscale_preview_acl for type=user, previewFor=alice@example.com (a user with tag:mobile). Shows the exact rules that would apply after the change.
The agent didn't apply the change. It couldn't, because I said "don't apply." But even if I had said "apply it," the server would have required me to pass back the ETag from step 1, which means that if someone else edited the policy in the meantime, the write fails cleanly instead of silently stomping their work. More on that in the design notes below.
The question: "Show me the OIDC workload identity for our GitHub Actions and confirm its allowed subjects still match repo:Acme/*."
Workload identity is the feature that lets CI jobs authenticate as Tailscale principals without shipping long-lived credentials. The failure mode is drift: you set up the OIDC federation once, rename your GitHub org, and the sub claim pattern no longer matches your repo paths. The CI side fails silently, and you chase it for an hour.
The audit is tailscale_list_workload_identities to find the GitHub Actions provider, then tailscale_get_workload_identity for its details. The agent parses the claim patterns, compares them to the pattern I mentioned in the question, and tells me whether they still match. Two API calls, pattern-match, done.
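The pattern-match step can be sketched as a glob check against a sample subject claim. This is an illustration only: `globToRegExp` and `driftedPatterns` are hypothetical helpers, and the sketch assumes `*` matches any run of characters, which may not be how the real matcher treats wildcards.

```typescript
// Convert a glob such as "repo:Acme/*" into an anchored RegExp.
// Assumption: "*" matches any run of characters; the real matcher may differ.
function globToRegExp(glob: string): RegExp {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$");
}

// Report which configured patterns no longer accept a sample subject claim.
function driftedPatterns(configured: string[], sampleSubject: string): string[] {
  return configured.filter((p) => !globToRegExp(p).test(sampleSubject));
}

// Sample GitHub Actions `sub` claim for a repo in the Acme org.
const subject = "repo:Acme/api:ref:refs/heads/main";

console.log(driftedPatterns(["repo:Acme/*"], subject));    // pattern still matches
console.log(driftedPatterns(["repo:OldAcme/*"], subject)); // pattern predates an org rename
```

An empty result means every configured pattern still accepts the subject; anything returned is a pattern worth a second look.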
The question: "Rotate every auth key older than 90 days. Keep the same tags. Print the new keys."
This one is four composed tools:
1. tailscale_list_keys to enumerate.
2. For each key older than 90 days, tailscale_get_key to read its tags, expiry, reusable flag, and ephemeral flag.
3. tailscale_create_key with the same settings.
4. tailscale_delete_key on the old one.
This one I do roll through with confirmation on each pair, not as a single "go do it," because tailscale_delete_key carries destructiveHint: true, and my MCP client prompts on every destructive call. Which brings us to the part of this post that earns the rest.
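The selection half of that rotation is pure data shuffling and can be sketched without touching the API. The `AuthKey` shape below is a hypothetical, flattened view of a key, and `planRotation` is an illustrative name; it only builds the list of create/delete pairs so each pair can be confirmed before anything runs.

```typescript
// Hypothetical, flattened view of an auth key's settings.
interface AuthKey {
  id: string;
  created: string; // assumed ISO 8601
  tags: string[];
  reusable: boolean;
  ephemeral: boolean;
}

interface RotationStep {
  deleteId: string; // old key to remove, only after its replacement exists
  create: { tags: string[]; reusable: boolean; ephemeral: boolean }; // settings carried over
}

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Build one create/delete pair for every key older than `maxAgeDays`.
function planRotation(keys: AuthKey[], now: Date, maxAgeDays = 90): RotationStep[] {
  const cutoff = now.getTime() - maxAgeDays * MS_PER_DAY;
  return keys
    .filter((k) => Date.parse(k.created) < cutoff)
    .map((k) => ({
      deleteId: k.id,
      create: { tags: [...k.tags], reusable: k.reusable, ephemeral: k.ephemeral },
    }));
}

// Made-up sample keys for illustration.
const keys: AuthKey[] = [
  { id: "k-old", created: "2024-01-01T00:00:00Z", tags: ["tag:ci"], reusable: true, ephemeral: false },
  { id: "k-new", created: "2024-04-01T00:00:00Z", tags: ["tag:ci"], reusable: false, ephemeral: true },
];

console.log(planRotation(keys, new Date("2024-04-22T00:00:00Z")));
```

Separating the plan from its execution is what makes per-pair confirmation possible: the destructive call happens one step at a time, against a list you have already read.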
Everything above is a shell script you could write yourself. Most of them I did write myself, more than once. The value the MCP adds isn't the API calls — it's the composition, the filtering, the formatting, the memory of what field names exist, and above all the not doing it again next time I need a slightly different version of the same question.
But composition is also where it goes wrong. An agent that can list devices can also delete them. An agent that can read an ACL can also rewrite it. An agent that can list auth keys can also leak them to a log you didn't mean to write. Four design choices make the composition safer than it sounds.
MCP defines behavior annotations per tool, including readOnlyHint, destructiveHint, and idempotentHint. Every one of the 99 tools in this server declares all three. A few examples:
| Tool | readOnly | destructive | idempotent |
|---|---|---|---|
| tailscale_list_devices | true | false | true |
| tailscale_authorize_device | false | false | true |
| tailscale_deauthorize_device | false | true | true |
| tailscale_delete_device | false | true | false |
| tailscale_update_acl | false | false | true |
Note that tailscale_update_acl is not destructive. Why? Because it's ETag-guarded and idempotent — calling it twice with the same policy yields the same state. "Destructive" in MCP annotation terms means "performs updates that may have a significant negative impact if not reviewed", which is a stricter bar than "writes something." Delete-device is destructive because you can't get it back. Update-ACL is a safe write because the failure mode is bounded (you can always read the ACL again and re-update).
The annotations are metadata, not enforcement. What they enable is client-side gating. Claude Code and most other MCP clients let you configure: "skip confirmation on read-only tools, prompt on destructive ones, never auto-run writes." A skill that shells out to tailscale CLI can't express those distinctions — the client sees one opaque bash call. Here the client sees 99 clearly-typed actions and can apply policy per action.
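To make the gating idea concrete, here is a minimal sketch of a per-tool policy a client could apply. It is not how Claude Code or any particular client implements it; `gate` and the `Decision` labels are hypothetical. When a hint is missing, the sketch takes the cautious reading: not read-only, possibly destructive.

```typescript
// The three hints discussed above; all are optional in a tool's annotations.
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
}

type Decision = "auto-run" | "confirm" | "confirm-destructive";

// Hypothetical client policy: reads run unattended, writes always prompt,
// and destructive writes get the louder prompt.
function gate(a: ToolAnnotations): Decision {
  if (a.readOnlyHint === true) return "auto-run";
  if (a.destructiveHint !== false) return "confirm-destructive";
  return "confirm";
}

// Annotation values taken from the table above.
console.log(gate({ readOnlyHint: true, destructiveHint: false, idempotentHint: true }));   // tailscale_list_devices
console.log(gate({ readOnlyHint: false, destructiveHint: false, idempotentHint: true }));  // tailscale_update_acl
console.log(gate({ readOnlyHint: false, destructiveHint: true, idempotentHint: false }));  // tailscale_delete_device
```

Because the hints are advisory, a policy like this is only as trustworthy as the server that declares them.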
99 tools is a lot. If you install this alongside a dozen other MCP servers, your client's context window starts groaning under the tool list before you've asked it anything. The server provides three orthogonal ways to shrink the surface:
- TAILSCALE_PROFILE: a preset. minimal (19 tools: status, devices, audit log), core (46 tools: adds ACL, DNS, keys, users), full (99 tools, default). For most people, core is the right answer.
- TAILSCALE_TOOLS: an explicit comma-separated group list. TAILSCALE_TOOLS=devices,acl,audit gives you exactly those three groups, nothing else. Overrides the profile.
- TAILSCALE_READONLY=1: drops every tool that doesn't have readOnlyHint: true. Intersects with the other two. Combined: "give me only read-only tools from the core profile" → you get 19 observation tools, zero mutations. The agent can audit but cannot act.
That last combination, TAILSCALE_PROFILE=core + TAILSCALE_READONLY=1, is the one I'd start with if you're nervous. You get the five audits in this post; you don't get any tool that can change your tailnet. When you want mutations, you drop the readonly flag for that session.
The filtering happens server-side before tools/list is sent, which means the disabled tools don't consume context tokens. Run the server and it prints what loaded:
@yawlabs/tailscale-mcp v0.8.3 ready (19 tools, profile=core, readonly)
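The way the three settings combine can be sketched as a single filter. This is an illustration of the rules as described, not the server's code: `ToolDef`, `selectTools`, and the group membership in `PROFILE_GROUPS` are all hypothetical.

```typescript
interface ToolDef {
  name: string;
  group: string;     // e.g. "devices", "acl", "audit"
  readOnly: boolean; // mirrors readOnlyHint
}

// Hypothetical group membership; the real presets are defined by the server.
const PROFILE_GROUPS: Record<string, string[] | "all"> = {
  minimal: ["status", "devices", "audit"],
  core: ["status", "devices", "audit", "acl", "dns", "keys", "users"],
  full: "all",
};

interface Env {
  TAILSCALE_PROFILE?: string;
  TAILSCALE_TOOLS?: string;
  TAILSCALE_READONLY?: string;
}

function selectTools(all: ToolDef[], env: Env): ToolDef[] {
  // An explicit group list overrides the profile.
  const explicit = (env.TAILSCALE_TOOLS ?? "")
    .split(",")
    .map((g) => g.trim())
    .filter((g) => g.length > 0);
  const groups: string[] | "all" =
    explicit.length > 0 ? explicit : (PROFILE_GROUPS[env.TAILSCALE_PROFILE ?? "full"] ?? "all");
  // The read-only flag intersects with whichever group set was chosen.
  const readOnlyMode = env.TAILSCALE_READONLY === "1";
  return all.filter(
    (t) => (groups === "all" || groups.includes(t.group)) && (!readOnlyMode || t.readOnly),
  );
}

// Made-up subset of the tool list for illustration.
const tools: ToolDef[] = [
  { name: "tailscale_list_devices", group: "devices", readOnly: true },
  { name: "tailscale_delete_device", group: "devices", readOnly: false },
  { name: "tailscale_get_acl", group: "acl", readOnly: true },
  { name: "tailscale_update_acl", group: "acl", readOnly: false },
  { name: "tailscale_list_workload_identities", group: "workload", readOnly: true },
];

console.log(selectTools(tools, { TAILSCALE_PROFILE: "core", TAILSCALE_READONLY: "1" }).map((t) => t.name));
```

Filtering before the tool list is built, rather than rejecting calls afterwards, is what keeps the disabled tools out of the client's context.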
ACLs are the one piece of tailnet state where "last write wins" is genuinely dangerous. Two people editing the policy at the same time, second write lands, first write is gone silently, nobody notices until the wrong traffic reaches the wrong host.
The Tailscale API handles this with optional If-Match headers on ACL updates. This server makes that path non-optional: tailscale_get_acl returns the ETag alongside the policy text, and tailscale_update_acl requires the ETag as a parameter. The Zod schema on the tool is literally:
z.object({
policy: z.string().describe("The full ACL policy text..."),
etag: z.string().describe("The ETag from tailscale_get_acl. Required to prevent concurrent edit conflicts."),
})
If the ETag doesn't match what the API has, the write returns 412 Precondition Failed and nothing changes. The agent gets a clean error and re-fetches. This is the single difference between "reasonable tool" and "footgun" on ACL writes, and it's the one most CLI wrappers skip because it's a couple of extra lines of plumbing per call. It's worth the couple of extra lines.
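The behavior is easy to see in a toy model. The class below is an in-memory stand-in for the ACL endpoint, written only to illustrate the If-Match check: `PolicyStore` is a hypothetical name, and its ETag is a simple version counter rather than whatever value the real API computes.

```typescript
// In-memory stand-in for the ACL endpoint, to show the If-Match behavior.
class PolicyStore {
  private policy: string;
  private version = 1;

  constructor(initial: string) {
    this.policy = initial;
  }

  get(): { policy: string; etag: string } {
    return { policy: this.policy, etag: `"v${this.version}"` };
  }

  // Returns 200 on success, 412 if the caller's ETag is stale.
  update(policy: string, etag: string): 200 | 412 {
    if (etag !== this.get().etag) return 412; // someone else wrote first
    this.policy = policy;
    this.version += 1;
    return 200;
  }
}

const store = new PolicyStore("// original policy");
const first = store.get();  // two editors read the same version
const second = store.get();

console.log(store.update(first.policy + "\n// first editor's rule", first.etag));   // first write lands
console.log(store.update(second.policy + "\n// second editor's rule", second.etag)); // stale ETag is rejected

const fresh = store.get();  // second editor re-fetches and re-applies
console.log(store.update(fresh.policy + "\n// second editor's rule", fresh.etag));
```

After the retry, the stored policy contains both editors' rules, which is the outcome "last write wins" cannot give you.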
The Tailscale ACL format is HuJSON — JSON with comments and trailing commas. Most tooling I've seen parses it, loses the comments, and writes back minified JSON. Your // Allow prod SRE to reach DB on emergency-only ports comment vanishes the first time a script rewrites the policy.
This server speaks HuJSON natively in both directions: the Accept and Content-Type headers are set to application/hujson, the policy body is passed through verbatim, and the tool description instructs the agent to "Only modify the specific parts that need to change." Your comments, whitespace, and structure round-trip. This isn't a fancy feature; it's just not being destructive about text you didn't ask to change.
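A sketch of what "passed through verbatim" means for the write request, assuming the policy endpoint lives at `/api/v2/tailnet/-/acl` like the other paths in this post. `buildAclWrite` and `AclWriteRequest` are hypothetical names. The point is that the policy text is never parsed and re-serialized on the way out.

```typescript
interface AclWriteRequest {
  method: "POST";
  path: string;
  headers: Record<string, string>;
  body: string;
}

// Build the write without parsing the policy: the text the caller hands in is
// the text that goes on the wire, comments and trailing commas included.
function buildAclWrite(policyText: string, etag: string): AclWriteRequest {
  return {
    method: "POST",
    path: "/api/v2/tailnet/-/acl", // assumed path, following the endpoints quoted earlier
    headers: {
      "Content-Type": "application/hujson",
      Accept: "application/hujson",
      "If-Match": etag,
    },
    body: policyText,
  };
}

// Made-up policy fragment; the comment is the one quoted above.
const policy = `{
  // Allow prod SRE to reach DB on emergency-only ports
  "acls": [
    {"action": "accept", "src": ["group:sre"], "dst": ["tag:db:5432"]},
  ],
}`;

const request = buildAclWrite(policy, '"example-etag"');
console.log(request.body === policy); // the comment and trailing commas survive
```

A wrapper that ran `JSON.parse` on this text would throw on the comment, and one that used a lenient parser and re-serialized would drop it.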
There's a real argument against any of this, and I've heard it from Tailscale users: handing an AI agent the ability to modify your tailnet — even with all the hint annotations and ETag guards in the world — is at odds with the zero-trust posture that made Tailscale attractive in the first place. The whole point of the system is that access is minimal, scoped, and deliberate. An LLM deliberating on your behalf is a category of actor your ACL doesn't know how to reason about.
I take that seriously, and I'm not going to tell you it's wrong. I've landed in a different place myself: the audits in this post — the stale-device sweep, the DNS forensic, the OIDC scope check — are reads, and reads are where most of the value sits. I run this server with mutations enabled because I trust my client's confirmation policy on destructive tools. If you don't want to make that bet, the TAILSCALE_READONLY=1 flag gives you the full audit surface with zero mutation paths. The agent can answer questions; it cannot touch anything.
There's no version of this post that makes that concern go away. What I can say is: the design choices above are there precisely because I think the concern is legitimate, and I'd rather ship something that takes the concern seriously than pretend it doesn't exist.
Add it to your MCP client:
{
"mcpServers": {
"tailscale": {
"command": "npx",
"args": ["-y", "@yawlabs/tailscale-mcp"],
"env": {
"TAILSCALE_API_KEY": "tskey-api-...",
"TAILSCALE_PROFILE": "core",
"TAILSCALE_READONLY": "1"
}
}
}
}
Start with TAILSCALE_PROFILE=core + TAILSCALE_READONLY=1. Drop the readonly flag once you're comfortable with the tools your client has approved. Drop the profile too if you need the long-tail tools (workload identity, posture integrations, log streaming) that the core preset doesn't ship.
Then ask it the first audit from this post:
Which devices haven't checked in for 30 days and have key expiry disabled?
If it comes back with zero, your tailnet is cleaner than mine. If it comes back with a list, you just did an audit you've been meaning to do for months.
Source, tests, and full tool reference: github.com/YawLabs/tailscale-mcp. MIT, PRs welcome, 735 unit tests and a nightly integration run against a real tailnet so regressions surface before a tag goes out.
Published by Yaw Labs.