One-line hook
Our support team’s most-used tool was a senior engineer with database access and a willingness to run UPDATE statements during a customer call. We replaced that workflow with a Model Context Protocol server, Claude Desktop, and a permissions model strict enough that we’d let an intern use it.
Why this article exists
MCP is new. Most of the public discussion is either (a) “here’s a toy filesystem server” or (b) marketing material. Almost nobody writes about what it’s like to operate an MCP server that touches a real production database in front of a real human support team. This is that article.
The setup (vague-client framing)
- Client: B2B SaaS platform with rich, nested operational data. Every customer record has a long graph of related entities (contracts, line items, attached media, audit history, signatories, etc.).
- Support team: ~20 humans, mix of L1/L2/L3. L1 handles 80% of tickets but escalates anything that requires editing a record, because their tooling was read-only.
- The escalation tax: every L2/L3 edit took an engineer ~5–15 minutes of context-switching. Aggregate cost: many engineer-days per week.
- The pre-existing “tool” was a Rails console session shared via Zoom screenshare. (Yes, really. We’re not naming names.)
The problem with the obvious solutions
- “Just build an admin UI for it.” We had one. It handled the 60% of cases that were predictable. The other 40% — the messy ones, the “this contract was signed by the wrong entity and we need to re-parent three downstream records” cases — needed ad-hoc reasoning that no form-based UI captures.
- “Let support run SQL.” Auditability nightmare. Also: support reps shouldn’t have to know your foreign key topology.
- “Build a chatbot against your API.” We tried. The LLM-as-API-caller pattern breaks the moment the LLM needs to reason about what to do next based on what it just saw. Tool-calling helps but isn’t enough for multi-step transactional edits.
- “Give the LLM database access directly.” Catastrophic failure mode. One hallucinated WHERE clause and you’ve nuked a tenant.
Why MCP was the right shape for this problem
- MCP separates what tools exist from what model is calling them. We could iterate on tools without retraining or reprompting.
- The server-side enforces the contract. The model can only do what the server permits. This is the same security posture as “your API decides what the client can do” — except the client is now Claude Desktop.
- Claude Desktop’s UX (conversation, persistent context, tool-call confirmation prompts) was already in the support team’s muscle memory from internal use.
What we built (the architecture)
- One MCP server, written in Python, exposed over stdio (initial dev) and then over HTTP+SSE (production, behind auth proxy).
- A tool catalog organized by intent, not by table. Not
update_contract_row— insteadcorrect_signatory_on_contract,void_and_reissue_line_item,re_parent_attached_document. Each tool has narrow, validated inputs and explicit pre/post-conditions. - A “dry-run by default” semantic. Every mutating tool has a
dry_run: bool = Trueparameter. The model gets back a diff preview of what would change, must explicitly confirm with the human, and only then re-issues the call withdry_run=False. - A separate read-only “context” tool surface. Fetching a customer record, walking the related-entity graph, pulling audit history. The model uses these freely; the support rep doesn’t have to dictate the lookup.
- Per-tool authorization. Tools declare a required permission grant. The MCP server checks the support rep’s role on every call. L1 reps see a strict subset of tools; L3 reps see the full set.
The Python snippets we’ll show
- The MCP server skeleton:
mcpPython SDK, registering tools with typed input schemas - A representative mutating tool with: dry-run, pydantic input validation, audit-log emission, structured diff output
- The permissions decorator that wraps every tool call against the current support rep’s role
- The audit log emitter — every call (read or write, dry-run or live, success or error) lands in an append-only store with the rep’s identity, the prompt context, and the diff
# Sketch — full version in the article
@tool(
name="correct_signatory_on_contract",
requires_permission="contracts.edit_signatory",
description="Reassign the signatory on a contract. Always dry-run first."
)
def correct_signatory(
contract_id: str,
new_signatory_id: str,
reason: str,
dry_run: bool = True,
) -> SignatoryChangeResult:
...
The hard parts nobody warns you about
- Tool design is the entire game. A badly-shaped tool (“update_record”) is worse than no tool at all because the model will use it confidently and incorrectly. Good tools are narrow, intent-named, and refuse to do anything outside their declared purpose.
- The model will try to compose your tools in unexpected ways. That’s mostly fine, but you need each tool to fail loudly when its preconditions aren’t met, rather than silently doing something partial.
- Dry-run is non-negotiable for mutations. We initially shipped without it. Within two days, a rep approved a confidently-wrong correction. Postmortem outcome: dry-run + diff preview became a server-level invariant, not a per-tool option.
- Context windows fill up faster than you think. Walking a customer’s related-entity graph naively can blow the context. We added a “summarize_customer_state” tool that produces a compressed, deterministic snapshot.
- Audit log first, feature second. If the audit log isn’t bulletproof, you can’t ship to production. Build it on day one, not day thirty.
- The model’s “I’m confident” tone is dangerous in support. Tools should return structured results that the model summarizes, but the rep should always see the structured diff, not just the model’s prose. We render diffs in Claude Desktop using markdown tables generated server-side.
A war story (anonymized)
A rep asked the assistant to “fix the line items on contract X — the totals look wrong.” The assistant reasoned its way to a sequence of corrections, dry-ran each, produced a diff that summed to a delta of -$48,000 on the customer’s account. The rep paused, escalated to an engineer, and the engineer caught that the source data was right and the customer’s complaint was based on a misread invoice. The model wasn’t wrong to propose the change — the rep wasn’t wrong to approve dry-running it — and the dry-run + escalation path caught it before any data moved. That whole loop took 90 seconds. In the old workflow, it would have been a 30-minute engineer interrupt.
What we’d do differently
- We’d write fewer tools, more carefully. Our first cut had 40+ tools. The trimmed catalog has 22 and is much more usable.
- We’d build the diff renderer first. It’s the part the rep actually sees and trusts.
- We’d version the tool catalog. Schema changes mid-session are confusing for the model and worse for the audit log.
- We’d put rate limits on the destructive tools per-rep-per-day from day one. Not because anyone abused it — because the option of accidental abuse should be off the table.
Results
- ~70% reduction in L1→engineering escalations for edit-required tickets
- Median time-to-resolution on the affected ticket types dropped from ~25 min to ~6 min
- Zero data-integrity incidents traced to the assistant in the first six months
- Engineering team got their afternoons back
The takeaway
MCP isn’t magic. It’s a sensible protocol for letting a model use tools that you control, on your terms. The hard work isn’t the protocol; it’s the tool design, the dry-run discipline, and the audit trail. Do those three things well and you can give a language model production database access without losing sleep. Do them badly and you’ve built a very expensive way to corrupt your customer data.