From Legacy vRO to Modern Aria Automation: A Six-Month Modernization Journey
A large enterprise customer — a Fortune 500 federal services integrator’s CIO and Hosting Services team — partnered with Essential Coach to modernize their VM provisioning pipeline. Six months. Six interlocking workstreams. Six open-source repositories that now serve the broader VMware community. This is the story.
Where We Started
The customer ran a sophisticated but heavily legacy environment:
- Platform: VMware Aria Automation 8.18.1 on VCF 5.2.2 (with VCF 9.x on the horizon)
- Datacenters: Two production sites connected via NSX Federation with Global Manager and Local Managers
- IPAM: BlueCat Address Manager, deeply integrated with their network team’s processes
- Identity: Workspace ONE Access / vIDM with AD-managed service accounts
- Source Control: Bitbucket
- Existing Provisioning: A 50+ step vRO workflow chain triggered from ServiceNow, with branches for every OS family and datacenter combination
The vRO workflows had been built up over years. They worked — mostly — but they carried the scars of every workaround that had ever been needed. Domain joins were happening twice in some paths because nobody had ever fixed the original failure. There were five- to eight-minute sleep timers padding around guest operations calls. Credentials were sprinkled throughout JavaScript actions. The “network catalog” the user picked from in ServiceNow was a static list maintained by hand.
The customer’s ask was simple to state and hard to deliver: “Give us a self-service Service Broker form that does what our ServiceNow portal does today, but better — and that our team can actually maintain.”
The Numbers
By engagement close, the transformation showed up clearly in the numbers:
- 6 separate vRO workflow branches (TX/VA × Windows/Linux/Oracle) collapsed into 1 unified Cloud Assembly blueprint
- 2,500+ stale network references in Aria reduced to 155 clean, curated workload networks per datacenter
- 834 fabric networks across two datacenters mapped, tagged, and IPAM-linked (551 + 283)
- 727 BlueCat IP ranges discovered and associated (up from 489 baseline)
- 8+ minutes of blind sleep timers in the legacy domain-join workflow eliminated
- 10 idempotent Ansible roles replacing years of bolted-on PowerShell
- 6 production-grade open-source repositories published as the engagement legacy
The customer left the engagement self-sufficient — able to maintain, extend, and troubleshoot every component without a single line of opaque legacy code.
The Six Workstreams
Each workstream is now an open-source repository and a detailed customer success story.
1. Production-Ready BlueCat IPAM Integration
The hardest single component. VMware ships a sample IPAM provider framework but no production BlueCat integration. We built one — fourteen versions deep, from v2.0.0 to the production v2.3.20 — handling NSX Federation segments, undocumented packaging quirks, and DNS lifecycle automation.
Key innovations: Strategy-4 prefix-stripping for Global Manager segments where CIDR comes back as None. Format-preserving package builds matching the reference Infoblox implementation byte-for-byte. Recursive IP search with 50,000-entry traversal limits. Disabling the internal IPAM subscription to prevent double-allocations.
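To make Strategy-4 concrete, here is a minimal Python sketch of the prefix-stripping idea, not the production code: the federation prefixes, helper names, and lookup table are hypothetical stand-ins for what the repository actually does.

```python
# Hypothetical sketch of "Strategy-4" prefix-stripping. When a Global
# Manager segment reports cidr: None (the CIDR lives on the Local
# Manager), derive the Local Manager segment name by stripping the
# federation prefix, then resolve the CIDR from a name -> CIDR map.

GM_PREFIXES = ("default:", "gm-")  # example prefixes; environment-specific

def resolve_cidr(segment: dict, lm_cidrs: dict) -> str | None:
    """Return a usable CIDR for an NSX segment, or None if unresolvable."""
    if segment.get("cidr"):          # Local Manager segments carry their CIDR
        return segment["cidr"]
    name = segment["name"]
    for prefix in GM_PREFIXES:       # Global Manager segments: strip and retry
        if name.startswith(prefix):
            stripped = name[len(prefix):]
            if stripped in lm_cidrs:
                return lm_cidrs[stripped]
    return lm_cidrs.get(name)        # last resort: exact-name lookup

# Usage: lm_cidrs would be built once per run from Local Manager inventory.
lm_cidrs = {"web-tier-seg": "10.20.30.0/24"}
gm_segment = {"name": "default:web-tier-seg", "cidr": None}
assert resolve_cidr(gm_segment, lm_cidrs) == "10.20.30.0/24"
```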
Repository: github.com/noahfarshad/bluecat-ipam-provider
2. Infrastructure-as-Code for Aria Automation
When we started, the customer’s TX network profile had 2,500+ stale network references — a mix of real workload segments, ghost entries from old discoveries, infrastructure backbone segments that had no business being there, and duplicates from NSX Federation Global Manager picking up the same network from multiple Local Managers.
We wrote a multi-tool Python toolkit to clean it up and keep it clean: aria_mapping.py for end-to-end IaC management of flavors, images, storage, capability tags, segment tags, and DNS; mapper.py for multi-profile NSX management with CIDR-based BlueCat IPAM matching; cleanup_profiles.py for Federation-aware ghost and duplicate removal. The result: 155 clean workload networks per datacenter, version-controlled in Bitbucket.
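For a flavor of the Federation-aware dedup that cleanup_profiles.py performs, here is a minimal Python sketch; the /iaas/api/fabric-networks call and field names follow the public Aria IaaS API shape, but treat them as assumptions rather than a copy of the production tool.

```python
import requests
from collections import defaultdict

# Minimal sketch: group discovered fabric networks by segment name and
# flag Federation duplicates (the same Global Manager segment discovered
# through several Local Managers). Field names are assumptions.

def fetch_fabric_networks(base_url: str, token: str) -> list:
    resp = requests.get(
        f"{base_url}/iaas/api/fabric-networks",
        headers={"Authorization": f"Bearer {token}"},
        params={"$top": 1000},
        verify=False,  # lab only; use a proper CA bundle in production
    )
    resp.raise_for_status()
    return resp.json().get("content", [])

def find_duplicates(networks: list) -> dict:
    """Return segment name -> records, for names seen more than once."""
    by_name = defaultdict(list)
    for net in networks:
        by_name[net["name"]].append(net)
    return {name: recs for name, recs in by_name.items() if len(recs) > 1}

nets = fetch_fabric_networks("https://aria.example.com", "TOKEN")
for name, recs in find_duplicates(nets).items():
    # Keep one canonical record per segment; the rest are Federation echoes.
    print(f"{name}: {len(recs)} records, {len(recs) - 1} candidate removals")
```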
Repository: github.com/noahfarshad/aria-network-automation
3. One Blueprint, Three OS Families
One blueprint. Three OS families. Two datacenters. Twenty-six inputs. Zero compromises. The customer’s legacy environment had six separate vRO workflows. We replaced all of them with a single Cloud Assembly blueprint that evolved from v5.5.1 to v8.8.3 as we ironed out the edge cases.
Key patterns: hybrid network resource declaration (Cloud.vSphere.Network for VLAN-backed segments, Cloud.NSX.Network for overlay) discovered after exhaustive testing; the count: 0 pattern for conditional second NICs; the empty-string-default that prevents accidental misdeployments; the $dynamicEnum integration for live-updating Service Broker dropdowns.
Repository: github.com/noahfarshad/aria-vm-blueprint
4. vRO Actions for Dynamic Catalog Behavior
Not everything belongs in a blueprint. Three JavaScript actions sitting in a com.essential.aria vRO module solve the things Aria can’t do natively: getNetworkSegmentsAll powers the dynamic NIC dropdown; getNetworkProfileTag routes deployment requests to the correct profile based on segment and datacenter; addDataDisksOnDeploy is the post-provision disk attach subscription that bypasses Aria’s storage validator.
Repository: github.com/noahfarshad/aria-vro-actions
5. Idempotent Windows Post-Deploy
The legacy vRO workflow for Windows post-deploy was a tangle of inline PowerShell, blind sleeps, and “if this thing fails, paper over it with another step” patterns. We replaced it with a clean Ansible role library following a desired-state dispatcher pattern: ten roles covering timezone, KMS, hardening, RDP, fleet agents, build info, AD groups, DNS validation, WinRM HTTPS, and domain join. WinRM itself is bootstrapped via an ABX bridge that pushes ConfigureRemotingForAnsible.ps1 through VMware Tools Guest Operations before Ansible ever connects, as sketched below.
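A rough pyVmomi sketch of that bootstrap bridge follows; the vCenter hostname, credentials, VM lookup, and in-guest script path are all illustrative, and the real ABX action handles staging and error paths.

```python
import ssl
from pyVim.connect import SmartConnect
from pyVmomi import vim

# Minimal sketch: run ConfigureRemotingForAnsible.ps1 inside the guest via
# VMware Tools Guest Operations, so WinRM exists before Ansible connects.
# Hostname, credentials, VM name, and in-guest script path are hypothetical.

ctx = ssl._create_unverified_context()  # lab only; validate certs in prod
si = SmartConnect(host="vcenter.example.com", user="svc-abx@corp.local",
                  pwd="***", sslContext=ctx)
content = si.RetrieveContent()

# Simplified lookup of the freshly provisioned VM by name.
view = content.viewManager.CreateContainerView(
    content.rootFolder, [vim.VirtualMachine], True)
vm = next(v for v in view.view if v.name == "new-windows-vm")
view.Destroy()

auth = vim.vm.guest.NamePasswordAuthentication(
    username="Administrator", password="***")

# Guest Operations needs only VMware Tools, not the network path WinRM uses,
# which is what lets this run before any remoting is configured.
spec = vim.vm.guest.ProcessManager.ProgramSpec(
    programPath=r"C:\Windows\System32\WindowsPowerShell\v1.0\powershell.exe",
    arguments=r"-ExecutionPolicy Bypass -File C:\Temp\ConfigureRemotingForAnsible.ps1")
pid = content.guestOperationsManager.processManager.StartProgramInGuest(
    vm=vm, auth=auth, spec=spec)
print(f"Bootstrap started in guest, PID {pid}")
```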
Repository: github.com/noahfarshad/ansible-windows-postdeploy
6. Dynamic Inventory Generators
Aria provisions the VM. Ansible needs to know about it. Two Python inventory generators bridge the gap, both triggered via vRO workflows so the operator never leaves the Aria UI. The vSphere generator pulls VMs from vCenter with intelligent IP resolution and OS/folder/environment grouping. The HPE OneView generator supports OneView 6.6, 8.9, and 10.0 with auth variants for local and directory accounts — multi-version support that was hard-won across the customer’s heterogeneous OneView estate.
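A stripped-down Python sketch of the vSphere generator’s shape, assuming pyVmomi and an authenticated ServiceInstance; the grouping keys are illustrative, and the production tool adds folder/environment grouping and smarter IP selection.

```python
import json
from pyVmomi import vim

# Minimal sketch of an Ansible dynamic inventory built from vCenter:
# every powered-on VM with a Tools-reported IP becomes a host, grouped
# by guest OS family.

def build_inventory(content) -> dict:
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    inventory = {"_meta": {"hostvars": {}}}
    for vm in view.view:
        ip = vm.guest.ipAddress
        if vm.runtime.powerState != "poweredOn" or not ip:
            continue  # skip templates, powered-off VMs, Tools-less guests
        os_name = (vm.guest.guestFullName or "").lower()
        group = "windows" if "windows" in os_name else "linux"
        inventory.setdefault(group, {"hosts": []})["hosts"].append(vm.name)
        inventory["_meta"]["hostvars"][vm.name] = {"ansible_host": ip}
    view.Destroy()
    return inventory

# Ansible invokes the script with --list and expects JSON on stdout, e.g.:
# print(json.dumps(build_inventory(si.RetrieveContent()), indent=2))
```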
Repository: github.com/noahfarshad/ansible-inventory-generators
The Side Quests
A six-month engagement is never just the headline workstreams. The customer leaned on us for guidance well beyond the original scope:
NSX bare-metal Edge hardware selection. When the customer’s network team began planning bare-metal NSX Edges, we caught a critical NIC compatibility blocker before any hardware was ordered — the NVIDIA ConnectX-6 Lx (the natural 25G option for the Dell R6615) is explicitly unsupported for NSX bare-metal Edge datapath per VMware KB 407663. We guided them to the ConnectX-6 Dx (Dell SKU 540-BCXN, dual-port 100GbE QSFP56) running at 25G on day one via QSA28 adapters, with a clean upgrade path to 100G later by swapping optics — no NIC replacement required. We also delivered comprehensive bare-metal Edge documentation for both VCF 5.2.2 (NSX 4.1.x) and VCF 9.x (NSX 4.2.x), with rack elevation diagrams and full BOMs for both sites.
VCF 5.2.2 → VCF 9.1 upgrade assessment. A thorough readiness review covering vSAN OSA-to-ESA migration, the Aria Suite Lifecycle and vIDM deprecation in favor of VCF MS Fleet Management and Identity Broker, the Aria Operations for Logs fresh-install requirement and 90-day data migration window, the SDDC Manager 5.2.1 prerequisite, and — critically — the FIPS 140-2/140-3 default-on behavior in VCF 9.0 and its potential impact on custom Python ABX integrations.
vSAN performance optimization. When an Oracle workload was consuming an outsized share of cluster resources, we helped the customer avoid a premature ~$300K hardware refresh by demonstrating that workload separation and disk group expansion would solve the problem operationally for a fraction of the cost — with a planned 2-3 year refresh cycle on R7615 / R7625 platforms when the time was actually right.
The Lessons That Apply Everywhere
Some of these were learned the hard way. They’re worth writing down.
On building integrations against undocumented or under-documented platforms. The single highest-leverage thing we did on the BlueCat IPAM build was reverse-engineering the Infoblox reference plugin before writing our own. The package format, the registration.yaml shape, the ABX action conventions, the way logos and schemas are bundled — all of it came from studying a working example. Reading the docs alone would have cost us weeks of trial and error.
On NSX Federation and IPAM. In a federated NSX environment, Global Manager segments come back from the API with cidr: None because the CIDR lives on the Local Manager. Any IPAM provider that assumes a populated CIDR will fail silently on the Federation segments. The fix — strategy-4 prefix stripping — is conceptually simple but had to be discovered by actually deploying.
On network profile hygiene. Aria Automation will happily accumulate thousands of stale network references over time. Once the count gets high enough, profile validation slows to a crawl, “ghost networks” get selected during deployment, and operators stop trusting the catalog. The cleanup pays for itself within a month. Build the cleanup tooling early. Run it on a schedule.
On legacy migration. When you encounter a 50-step vRO workflow with 8-minute sleep timers, the answer is almost never “translate it faithfully.” It’s “figure out what it was trying to accomplish and rebuild the desired-state version.” Half the legacy steps in this engagement existed only to paper over earlier broken steps. We threw them away.
On idempotency. Every single Ansible role we wrote checks current state before changing it. Domain join reads Win32_ComputerSystem.PartOfDomain first. Build info registry writes preserve CreatedDate on re-runs. Hardening checks effective policy before reapplying. The result is a playbook the customer can run against a production VM at 2 a.m. without fear.
On knowledge transfer. We organized the final week around documentation, not heroics. Every artifact — blueprint, role, script, action, package — has its own README with API surface, usage examples, troubleshooting tables, and credential references. Standard Operating Procedures cover the common day-to-day tasks the customer’s team will perform. A consultant who leaves a customer dependent on the consultant has not finished the job.
Outcomes
- Production self-service VM provisioning through Service Broker, replacing six separate vRO branches with a single blueprint
- Automated IP allocation and DNS registration through BlueCat at provision time, with proper cleanup on deallocation
- Federation-aware network management that handles Global Stretched segments correctly across both datacenters
- Idempotent Windows post-deploy via Ansible — readable, version-controlled, and re-runnable
- A complete SOP set so the customer’s team can add new network segments, rotate service-account credentials, update KMS keys, modify Ansible roles, and troubleshoot deployment failures without phoning a friend
- Six open-source repositories published under GPL-3.0 at github.com/noahfarshad — sanitized of all customer-identifying detail, ready for the broader VMware community
- Zero lost work — every artifact preserved in the customer’s Bitbucket, organized by workstream, fully documented
The customer ended the engagement self-sufficient. The Broadcom Professional Services dependency is gone. The platform is theirs.
What’s Next
The next chapter is already mapped out: expand Aria’s value from infrastructure delivery into full-stack provisioning. The Linux post-deploy pipeline is the natural next workstream, mirroring the Windows role library. The VCF 9.1 upgrade plan is queued. And the bare-metal Edge cluster is ready to roll into procurement.
For Essential Coach, the engagement leaves a public reference architecture that anyone running VMware Aria Automation with BlueCat IPAM, NSX Federation, and Ansible post-deploy can clone, study, and adapt. That’s the part that lasts.
Six workstreams. Six repositories. Six customer success stories. One customer team that left the engagement stronger than they found it.
Open-source artifacts: github.com/noahfarshad
Want to talk about a similar modernization at your organization? noah@essential.coach
