STIG Compliance at Federal Scale: Building VMware Compliance Tooling for vSphere 8.0
A federal agency customer’s STIG compliance program needed to scan 31 ESXi hosts, a vCenter Server, and the underlying VCSA appliance against the DISA vSphere 8.0 v2r3 baseline, and to produce per-host CKL accreditation artifacts that auditors would actually accept. The VMware STIG Tools Appliance ships ready to use, but real federal environments expose gaps the documentation doesn’t cover. This is the story of the five production tools that emerged.
The Challenge
The customer’s compliance team had a working STIG Tools Appliance deployed and a clear mandate: scan the entire vSphere environment regularly and produce accreditation-grade reports. The first scan attempt revealed a stack of friction points that, individually, looked like minor edge cases — but cumulatively, they blocked the team from running their first full-cluster scan.
The password authentication wall. The customer’s vCenter administrator account, as with any properly governed federal environment, used a complex password containing special characters: dollar signs, percent signs, ampersands, the works. The InSpec train-vmware plugin’s authentication kept failing with credential errors despite the password being correct. Browser logins worked fine. Direct PowerCLI logins worked fine. Only the InSpec-mediated path failed.
The single-host script. The PowerShell runner script shipped with the STIG Tools Appliance was designed to scan one ESXi host at a time. Running it manually against 31 hosts was tedious and error-prone. Worse, the team’s first attempt at adapting it failed with the error “A parameter cannot be found that matches parameter name 'allhosts'.” The sample script simply had no way to iterate over all hosts in a cluster.
The CKL-vs-CKLB format mismatch. STIG Viewer 3.x — the auditor-preferred tool for reviewing accreditation evidence — uses a native JSON format called CKLB. The InSpec → SAF CLI pipeline produces legacy CKL (XML). Importing CKL into STIG Viewer 3.x technically works, but the round-trip strips fidelity: GUIDs change, target metadata gets lost, mode flags are dropped, and findings sometimes reorder. Auditors noticed.
The static-IP appliance bootstrap. Federal environments don’t allow DHCP for management appliances. The STIG Tools Appliance defaults to DHCP and the documentation assumes it. Configuring static IPs on Photon OS 5.0 means working with systemd-networkd — a different configuration model than the network-scripts familiar from RHEL or older Photon builds.
Documentation gaps for non-default workflows. Scanning the full vSphere stack involves three different InSpec baselines (vSphere product controls, ESXi controls, VCSA appliance controls) with different working directories, different transport layers, and different inputs. The official documentation covers each in isolation but doesn’t walk through running all three back-to-back to produce a complete accreditation package.
The team needed working solutions for each of these — built quickly, documented thoroughly, and reusable across the broader federal customer base.
The Journey: Five Tools, One Workshop Series
The engagement ran as a multi-day workshop series. Each day surfaced new gaps, and each gap turned into a production tool. By the end, the customer had a complete scanning pipeline they could run unattended.
The train-vmware Password Fix
The discovery. Reading the train-vmware plugin’s connection.rb source revealed the issue immediately. The plugin constructs PowerShell commands by string interpolation:
login_command = "Connect-VIServer #{options[:viserver]} -User #{options[:username]} -Password '#{options[:password]}' | Out-Null"
When that string reaches PowerShell with a password like MyP@ss$Word123, PowerShell sees $Word123 as a variable reference. That variable doesn’t exist, so PowerShell substitutes an empty string, and the auth call goes out with MyP@ss instead of the password the user supplied. Authentication fails with no clear indication that interpolation is the culprit.
The fix that scaled. The naive fix — escape every special character in the password before interpolation — gets messy. PowerShell has its own escaping rules, the password might contain literal backticks (PowerShell’s escape character), and the list of characters that need handling grows over time. Maintaining an escape-list patch was a rabbit hole.
The cleaner approach: Base64-encode the password in Ruby, pass the encoded string through PowerShell as an opaque blob, decode it back to the original on the PowerShell side just before the auth call. PowerShell never sees the special characters. The authentication payload arrives byte-for-byte identical to what the user provided.
The patched connection.rb does exactly that. The install.sh script handles installation: backs up the original, drops the patched file in, and verifies the change took effect. Idempotent — safe to re-run if needed.
Tested character coverage: dollar signs, percent signs, ampersands, pipes, backticks, single quotes, double quotes, backslashes. All work. The fix is small (under 50 lines of changed Ruby) and surgical (only touches the password handling path).
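The encode-once, decode-once idea can be sketched in a few lines. The actual patch lives in Ruby inside connection.rb; the Python below is only an illustration of the technique, and the function name, server name, and account name are hypothetical. The PowerShell side relies on the standard .NET calls [System.Convert]::FromBase64String and [System.Text.Encoding]::UTF8.GetString.

```python
import base64


def build_login_command(server: str, username: str, password: str) -> str:
    """Build a PowerShell login command that carries the password as Base64."""
    # Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', none of which
    # PowerShell treats specially inside a single-quoted string.
    encoded = base64.b64encode(password.encode("utf-8")).decode("ascii")
    decode_expr = (
        "[System.Text.Encoding]::UTF8.GetString("
        f"[System.Convert]::FromBase64String('{encoded}'))"
    )
    # The decoded value is assigned to a PowerShell variable, so the original
    # password characters never appear in the command text itself.
    # Server and username are passed through unchanged in this sketch.
    return (
        f"$p = {decode_expr}; "
        f"Connect-VIServer -Server {server} -User {username} -Password $p | Out-Null"
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    tricky = "MyP@ss$Word123`'\"&|%\\"
    print(build_login_command("vcenter.example.test", "svc-stig", tricky))
```

Because the only thing interpolated into the command is the Base64 blob, there is no escape list to maintain: any new special character is covered automatically.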
The Enhanced InSpec Runner
The bug that revealed the gap. When the customer first tried to scan all 31 hosts at once with the sample script, PowerShell threw a parameter error. The shipped runner only supported -vmhost (single host), with no way to iterate. Modifying it for production use meant rewriting a substantial chunk anyway.
The enhanced runner — version 2.0 — adds three capabilities the original lacked:
– An -allhosts parameter that connects to vCenter once, enumerates all reachable ESXi hosts, and iterates through them in a single invocation. No bash loops, no copy-paste-modify-rerun, no manual host lists.
– Automatic environment variable setup for InSpec’s vmware:// transport. The runner sets VISERVER, VISERVER_USERNAME, and VISERVER_PASSWORD programmatically before each scan. Operators don’t have to remember to set them manually, and the variables don’t persist after the scan completes.
– Per-host JSON + CKL output with predictable naming. Each host produces <hostname>_<timestamp>.json and <hostname>_<timestamp>.ckl files. Auditors get one artifact per host. Compliance teams get a clear timestamp trail.
A single command produces 62 files (31 JSON + 31 CKL) ready for STIG Viewer import. What used to be a half-day’s work became a 30-minute unattended scan.
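Two of those behaviors, scoped environment variables and predictable per-host file names, are easy to model. The runner itself is PowerShell; the Python below is a rough sketch of the same logic, with hypothetical function names and an assumed timestamp format (the original does not specify one).

```python
import os
from contextlib import contextmanager
from datetime import datetime

VARS = ("VISERVER", "VISERVER_USERNAME", "VISERVER_PASSWORD")


@contextmanager
def viserver_env(server: str, username: str, password: str):
    """Set the transport variables for one scan, then restore the old state."""
    saved = {name: os.environ.get(name) for name in VARS}
    os.environ.update(dict(zip(VARS, (server, username, password))))
    try:
        yield
    finally:
        # Remove variables that did not exist before; restore ones that did.
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)
            else:
                os.environ[name] = old


def artifact_paths(hostname: str, when: datetime, out_dir: str = "."):
    """Return the per-host JSON and CKL paths: <hostname>_<timestamp>.<ext>."""
    stamp = when.strftime("%Y%m%d-%H%M%S")  # assumed timestamp format
    base = os.path.join(out_dir, f"{hostname}_{stamp}")
    return f"{base}.json", f"{base}.ckl"


def plan_scans(hosts, when: datetime, out_dir: str = "."):
    """Map every host to its two output files, mirroring an all-hosts run."""
    return {host: artifact_paths(host, when, out_dir) for host in hosts}
```

With 31 host names passed to plan_scans, the plan contains 62 paths, which matches the artifact count above. The try/finally in viserver_env is what keeps credentials from lingering in the environment if a scan fails partway through.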
The CKL → CKLB Converter
Why the SAF CLI path wasn’t enough. SAF CLI converts InSpec JSON output to CKL format. CKL is the legacy XML format that STIG Viewer has supported for years. STIG Viewer 3.x will open CKL files, but it converts them to its native CKLB JSON format on import — and that conversion is lossy.
What gets lost in the round-trip:
– Mode flags identifying whether a check passed automated scanning or required manual review
– Target metadata like host fully-qualified names, IP addresses, vCenter associations
– Per-finding GUIDs that auditors use to track evidence across scan runs
– Comment fields preserving notes about why findings were marked NotApplicable or NotReviewed
For one-off scans this doesn’t matter. For ongoing accreditation programs that compare scan-to-scan deltas across months of evidence, it matters a lot.
The Python converter reads CKL XML directly and writes the expanded native CKLB JSON format that STIG Viewer 3.x produces internally. Mode flags, targets, GUIDs, and comments are all preserved. The resulting CKLB file matches what STIG Viewer 3.x would have produced had the checklist been created there natively from the start.
The converter handles the v2r3-stig content baseline and uses standard library Python only — no dependency installation on the appliance, no virtual environments to manage. Drop in, run, get a CKLB.
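The core of such a converter is a direct walk over the CKL tree using only the standard library. The sketch below is not the repository’s converter. It reads the well-known CKL elements (ASSET, VULN, STIG_DATA, STATUS, FINDING_DETAILS, COMMENTS) and emits a small subset of CKLB-style keys. The output key names and the lowercase status strings are assumptions about the CKLB layout, and a real CKLB file carries many more fields (UUIDs, mode flags, STIG metadata).

```python
import json
import xml.etree.ElementTree as ET

# CKL status strings mapped to the lowercase forms assumed for CKLB.
STATUS_MAP = {
    "Open": "open",
    "NotAFinding": "not_a_finding",
    "Not_Reviewed": "not_reviewed",
    "Not_Applicable": "not_applicable",
}


def text(node, tag):
    """Return the stripped text of a child element, or '' if it is absent."""
    child = node.find(tag)
    return (child.text or "").strip() if child is not None else ""


def ckl_to_cklb(ckl_xml: str) -> dict:
    """Convert a CKL XML string into a minimal CKLB-style dictionary."""
    root = ET.fromstring(ckl_xml)
    asset = root.find("ASSET")
    if asset is None:
        asset = ET.Element("ASSET")  # tolerate checklists without target data
    rules = []
    for vuln in root.iter("VULN"):
        # Each STIG_DATA child holds one attribute/value pair (Vuln_Num, Rule_ID, ...).
        attrs = {
            text(d, "VULN_ATTRIBUTE"): text(d, "ATTRIBUTE_DATA")
            for d in vuln.findall("STIG_DATA")
        }
        rules.append({
            "group_id": attrs.get("Vuln_Num", ""),
            "rule_id": attrs.get("Rule_ID", ""),
            "status": STATUS_MAP.get(text(vuln, "STATUS"), "not_reviewed"),
            "finding_details": text(vuln, "FINDING_DETAILS"),
            "comments": text(vuln, "COMMENTS"),
        })
    return {
        "target_data": {
            "host_name": text(asset, "HOST_NAME"),
            "ip_address": text(asset, "HOST_IP"),
            "fqdn": text(asset, "HOST_FQDN"),
        },
        "stigs": [{"rules": rules}],
    }


def convert_file(src: str, dst: str) -> None:
    """Read a .ckl file and write the converted document as JSON."""
    with open(src, encoding="utf-8") as handle:
        document = ckl_to_cklb(handle.read())
    with open(dst, "w", encoding="utf-8") as handle:
        json.dump(document, handle, indent=2)
```

Copying comments and target fields explicitly, rather than relying on an import step, is what keeps them from being dropped at the format boundary.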
The Photon OS Network Configuration Guide
The gap. The STIG Tools Appliance documentation assumes DHCP, and federal environments don’t permit DHCP for management appliances. The gap is small but absolute: deploy the appliance, configure an IP address, reboot, and the appliance comes up with an unpredictable address (or no address at all) because the operator’s static-IP attempt didn’t take effect.
The right Photon OS 5.0 pattern. Photon OS 5.0 uses systemd-networkd rather than the network-scripts model familiar from older Photon builds or RHEL. The configuration files live in /etc/systemd/network/, named with priority prefixes like 99-static.network. The format is .ini-style sections ([Match], [Network]) rather than the shell-variable style of ifcfg-eth0. Reload via systemctl restart systemd-networkd. Verify via networkctl status.
The configuration guide documents the procedure end-to-end: how to identify the right network interface, how to write the configuration file, how to apply it without rebooting, how to verify the change took effect, and how to roll back cleanly if something goes wrong. Tested against the latest STIG Tools Appliance build (5.2.1.2-24725623, released 2025-05-01).
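For a sense of what the resulting file contains, here is a minimal sketch that renders a static-IP .network file and sanity-checks the addresses first. The function name and all addresses are hypothetical. The keys it writes (Name, DHCP, Address, Gateway, DNS) are standard systemd-networkd options.

```python
import ipaddress


def render_static_network(interface, address_cidr, gateway, dns_servers):
    """Render a systemd-networkd .network file for one statically addressed NIC."""
    # ipaddress raises ValueError on malformed input, catching typos early.
    iface = ipaddress.ip_interface(address_cidr)
    gw = ipaddress.ip_address(gateway)
    if gw not in iface.network:
        raise ValueError(f"gateway {gw} is outside {iface.network}")
    lines = [
        "[Match]",
        f"Name={interface}",
        "",
        "[Network]",
        "DHCP=no",
        f"Address={iface.with_prefixlen}",
        f"Gateway={gw}",
    ]
    # systemd-networkd accepts one DNS= line per resolver.
    lines += [f"DNS={ipaddress.ip_address(server)}" for server in dns_servers]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Documentation-range addresses, for illustration only.
    print(render_static_network("eth0", "192.0.2.10/24", "192.0.2.1", ["192.0.2.53"]))
```

The rendered text would be saved under /etc/systemd/network/ and applied with the restart and verify commands described above. One point worth checking on any given build: systemd-networkd applies the first file, in lexical order, that matches an interface, so a pre-existing DHCP file that sorts ahead of the static one would still win.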
The Complete Scanning Guide
Why a comprehensive guide was needed. The customer’s team had three baseline directories to navigate, three different transport mechanisms to use, three different inputs files to maintain, and the final output had to be aggregated into a single accreditation package. Each component is documented in isolation by VMware. None of the official documentation walks through the full pipeline as one workflow.
The Scanning Guide covers:
– Setting up environment variables (with the ''''-quote-escape pattern for the special character that the Base64 fix doesn’t address: literal single quotes in passwords)
– Scanning vCenter product controls (~67 STIG checks via the vmware:// transport)
– Scanning ESXi hosts (~240 checks per host, multiplied across the cluster)
– Scanning VCSA appliance components (the OS-level controls, scanned via SSH transport because they live on the underlying Photon OS, not surfaced through vCenter APIs)
– Generating the per-host CKL artifacts auditors need
– Converting to CKLB for STIG Viewer 3.x ingestion
– Common error patterns and their resolutions
The guide is written as a sequential workflow. Read top to bottom, follow each command, and you produce a complete accreditation package.
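Before handing a package to auditors, it can help to confirm that every ESXi host actually has its full set of files. The check below is a hypothetical helper, not part of the toolset. It assumes the <hostname>_<timestamp>.<ext> naming used by the runner and that the JSON, CKL, and CKLB files for the ESXi hosts all sit in one directory.

```python
from collections import defaultdict
from pathlib import Path

REQUIRED = {".json", ".ckl", ".cklb"}


def missing_artifacts(package_dir, expected_hosts):
    """Return {host: missing extensions} for hosts with an incomplete artifact set."""
    found = defaultdict(set)
    for path in Path(package_dir).iterdir():
        # Split on the last underscore so host names containing dots
        # or underscores are kept intact.
        host, sep, _ = path.stem.rpartition("_")
        if sep and path.suffix in REQUIRED:
            found[host].add(path.suffix)
    return {
        host: sorted(REQUIRED - found[host])
        for host in expected_hosts
        if REQUIRED - found[host]
    }
```

An empty result means every expected host has all three files. Anything else lists exactly which host and which format to regenerate.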
The Quick-Reference Cheat Sheet
For day-to-day operations, the team didn’t need a comprehensive guide. They needed a one-page command reference. The cheat sheet is exactly that: copy-paste commands for the most common scan patterns, organized by what the operator is trying to accomplish (full audit, vCenter only, single host, all ESXi, VCSA only).
The Results
Five focused tools transformed an “almost working” appliance into a production-ready scanning pipeline.
31 hosts, one command. What used to require manually invoking the runner script 31 times — with operator attention required between each — became a single -allhosts invocation that runs unattended in roughly 30 minutes and produces 62 ready-to-import artifacts.
Special-character passwords just work. The Base64 fix eliminated an entire class of authentication failures. The customer’s password rotation policy continued working without exception handling for “characters InSpec can’t handle.”
Audit-grade reports without manual rework. The CKLB converter eliminated the round-trip data loss. STIG Viewer 3.x ingests the converted files with full fidelity. Auditors stopped flagging missing metadata.
Reproducible appliance deployment. New STIG Tools Appliance instances come up with the right static IP on the first boot. The systemd-networkd pattern is captured in the documentation and applied identically across appliance rebuilds.
Documentation that survives turnover. The scanning guide and cheat sheet are written for operators who weren’t part of the original engagement. New team members can come up to speed by reading the guide front-to-back, then keep the cheat sheet open for daily operations.
Lessons Learned
Read the source when the docs disappoint. The train-vmware password issue was solved within an hour of opening connection.rb. The fix took longer to test than to write. When a tool’s documentation says something should work and it doesn’t, the source code is usually clearer than another round of doc-spelunking.
Base64-encode opaque data through shell pipelines. Any time data has to traverse multiple language runtimes — Ruby to PowerShell, Python to Bash, anywhere a string gets re-parsed by a different parser — Base64 encoding is the cleanest way to preserve byte-for-byte fidelity. Escaping special characters per-language is fragile; encoding once and decoding once is robust.
Build the -allhosts mode the way operators want to use it. The original sample runner was designed as a “here’s how it works” reference, not as a production tool. The enhanced runner is designed for the use case operators actually have: scan everything, produce per-host artifacts, exit cleanly. Prefer the production-shaped tool over the educational reference.
Preserve fidelity at format boundaries. The CKL → CKLB lossiness wasn’t visible until auditors started asking why scan-to-scan comparisons were broken. Format conversions silently drop data more often than expected. When building a converter, match the destination format’s native production faithfully — don’t approximate.
Workshop-driven engagements surface real bugs. Each of the five tools came from a problem that surfaced during a hands-on scanning session, not from an abstract architecture review. The -allhosts parameter error appeared when the customer ran the script for the first time. The CKL fidelity issue appeared when the auditor reviewed the first batch. The static-IP problem appeared on the first reboot. Workshop-driven engagements catch real bugs that proof-of-concept reviews miss.
What We’d Do Differently
Bundle the train-vmware fix earlier in the workshop sequence. The password authentication issue blocked the customer’s first scan attempt. Diagnosing it consumed most of Day 1. If the password fix had been delivered as a baseline workshop prerequisite — alongside the appliance deployment guide — Day 1 could have started with successful scans rather than authentication troubleshooting.
Add per-host CKLB output to the runner directly. The current pipeline runs the InSpec scan, runs SAF CLI to produce CKL, then runs the CKL → CKLB converter. Three steps, three intermediate artifacts. A future runner version could call SAF CLI’s library directly and produce CKLB output in one pass — fewer files to manage, faster execution, less to go wrong.
Capture STIG-Viewer-comparison evidence formally. The CKL fidelity issue was discovered when an auditor mentioned that scan-to-scan deltas looked off. A more rigorous validation harness — load both formats into a test STIG Viewer instance, diff the resulting in-memory representations programmatically — would have caught the issue earlier and given us harder evidence about exactly what was being lost.
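A lightweight version of that harness could start by diffing two exported checklists rule by rule. The sketch below compares parsed JSON documents rather than STIG Viewer’s in-memory state. The "stigs", "rules", and "rule_id" keys are assumptions about the CKLB layout, and the function names are hypothetical.

```python
def index_rules(checklist: dict) -> dict:
    """Index every rule in a parsed CKLB-style document by its rule_id."""
    return {
        rule["rule_id"]: rule
        for stig in checklist.get("stigs", [])
        for rule in stig.get("rules", [])
    }


def diff_checklists(before: dict, after: dict,
                    fields=("status", "comments", "finding_details")):
    """Return {rule_id: {field: (before, after)}} for rules whose fields differ."""
    old, new = index_rules(before), index_rules(after)
    report = {}
    # The union of keys also surfaces rules present in only one document.
    for rule_id in sorted(old.keys() | new.keys()):
        a, b = old.get(rule_id, {}), new.get(rule_id, {})
        changed = {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}
        if changed:
            report[rule_id] = changed
    return report
```

Running this against the same scan exported through both paths (STIG Viewer’s CKL import versus the direct converter) would show exactly which fields the import drops, for example a comment that comes back as an empty string.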
Document the workshop progression, not just the artifacts. The five tools are well-documented individually. The narrative of why each tool exists — what gap surfaced, what was tried first, what the discovery was — would help other consultants running similar engagements skip the same dead ends.
Getting Started
The toolset requires the VMware STIG Tools Appliance (latest build 5.2.1.2-24725623, released 2025-05-01), vCenter Server 8.0 U2 or U3 with administrator credentials, network connectivity from the appliance to vCenter and ESXi hosts, and a target STIG content baseline (this engagement targeted v2r3-stig).
Clone the repository, apply the train-vmware-fix first (so all subsequent scanning works regardless of password complexity), drop the enhanced InSpec runner into /usr/share/stigs/vsphere/8.0/v2r3-stig/vsphere/powercli/, and follow the scanning guide for your first end-to-end audit. The cheat sheet becomes the day-to-day reference once the workflow is familiar.
Conclusion
VMware STIG compliance scanning is, at the appliance level, a solved problem. At the customer-environment level, the gaps are real but small, and they cluster around the same friction points across federal deployments: special characters in service-account passwords, full-cluster scanning ergonomics, format fidelity for STIG Viewer 3.x, static-IP bootstrap on Photon OS 5.0, and end-to-end documentation for the full scanning workflow.
Five focused tools — a Ruby patch, a PowerShell rewrite, a Python converter, a network-config guide, and a comprehensive scanning workflow — closed all five gaps for one federal customer’s STIG compliance program. The same toolset applies, mostly unchanged, to any organization running vSphere STIG scanning at scale.
The complete repository is open-source on GitHub under GPL-3.0.
Repository: github.com/noahfarshad/vmware-stig-tools
