Infrastructure-as-Code for Aria Automation: Consistency Across Environments

A large enterprise customer transformed their Aria Automation infrastructure from accumulated drift into a clean, version-controlled, Federation-aware platform. This is the story of building the network automation toolkit that made it possible — aria_mapping.py, mapper.py, and cleanup_profiles.py — and the Infrastructure-as-Code patterns that turn manual UI configuration into reproducible deployments.

The Challenge

A Fortune 500 federal services integrator’s CIO and Hosting Services team operated VMware Aria Automation across multiple environments — two production datacenters connected via NSX Federation, plus development and test instances. Managing this through the Aria Automation UI had become unsustainable.

When the engagement started, the customer’s TX network profile contained over 2,500 stale network references. The accumulation came from years of normal operation: real workload segments mixed with ghost entries from old discoveries, infrastructure backbone segments that had no business being there, and duplicates from NSX Federation Global Manager picking up the same network from multiple Local Managers.

Profile validation slowed to a crawl. Operators couldn’t trust the catalog because “ghost networks” appeared in dropdowns. Deployments occasionally selected stale references that pointed at networks that had been decommissioned months earlier. Every cleanup attempt was manual, error-prone, and never quite finished before new drift accumulated.

The deeper problem was that the customer’s infrastructure was managed entirely through the Aria Automation UI. Image mappings, flavor mappings, storage profiles, network profiles, capability tags, segment tags — all of it created, edited, and versioned by hand. There was no version control, no peer review, no audit trail of changes, and no reliable way to ensure that two Aria instances were configured identically.

The customer needed two things: a way to clean up the accumulated drift, and a way to manage Aria Automation infrastructure as code so the drift wouldn’t accumulate again.

The Core Principle

Whether you have three Aria Automation instances or thirty, manual UI configuration leads to inconsistency. Infrastructure-as-Code isn’t just about scale — it’s about ensuring identical deployments regardless of environment. A blueprint that works in dev should work identically in test and production. Image mappings, flavor definitions, storage profiles, and network profiles must be consistent across all instances for reliable automation.

The customer’s environment exemplified this principle perfectly. The two production datacenters were supposed to be identical. They weren’t. Image names had drifted. Flavor sizing differed subtly between sites. Network profiles had accumulated different sets of capability tags. The same blueprint deployed to TX behaved slightly differently than the same blueprint deployed to VA — not because the blueprint was wrong, but because the underlying infrastructure had drifted.

The Solution: Three Tools, One Toolkit

The toolkit grew through several major versions to address each layer of the problem.

Tool 1: aria_mapping.py — The Swiss Army Knife (v1.6.0)

The foundation tool that turned Aria Automation infrastructure from a UI artifact into version-controlled configuration.

What it manages as code:
  • Flavor mappings — t-shirt sizes (small, medium, large, xlarge) with consistent CPU and memory across all instances
  • Image mappings — vSphere template references mapped to friendly names (Windows Server 2022, RHEL 9, Oracle Linux 8)
  • Storage profiles — placement policies and storage tags
  • Capability tags — the metadata that drives blueprint constraint matching
  • Segment tags — used by the Service Broker dynamic catalog dropdown
  • DNS configuration — domain associations for network profiles
  • --servicenow-tags mode — drives the dynamic catalog dropdown by tagging segments with the labels users see in Service Broker

The IaC workflow it enables:

  • Define infrastructure declaratively in YAML files committed to a repository.
  • Run aria_mapping.py to push the desired state to Aria Automation.
  • Detect drift by running with the --detect-drift flag and comparing actual state to desired state.
  • Remediate drift by re-running the import.
  • Provision a new Aria Automation instance by pointing the tool at it and importing the standard configuration.

What used to take weeks of UI clicking takes minutes.
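
The drift-detection step amounts to comparing the desired state (parsed from the YAML in the repository) with the actual state (fetched from the Aria Automation API). A minimal sketch, using plain dictionaries in place of the real YAML parsing and API calls; the function name and mapping shapes are illustrative assumptions, not the toolkit's actual interface:

```python
def detect_drift(desired: dict, actual: dict) -> dict:
    """Compare desired mappings (from the repo) to the live state.

    Both arguments map a resource name to its definition, e.g.
    {"small": {"cpu": 2, "memory_gb": 4}}. Returns the entries that
    must be added, removed, or updated to reach the desired state.
    """
    return {
        "missing": sorted(set(desired) - set(actual)),      # in repo, not in Aria
        "unexpected": sorted(set(actual) - set(desired)),   # in Aria, not in repo
        "changed": sorted(
            name for name in set(desired) & set(actual)
            if desired[name] != actual[name]
        ),
    }

# Example: the "medium" flavor was resized by hand in the UI and a
# "huge" flavor was added outside of version control.
desired = {"small": {"cpu": 2, "memory_gb": 4}, "medium": {"cpu": 4, "memory_gb": 8}}
actual = {"small": {"cpu": 2, "memory_gb": 4}, "medium": {"cpu": 4, "memory_gb": 16},
          "huge": {"cpu": 32, "memory_gb": 128}}
print(detect_drift(desired, actual))
# {'missing': [], 'unexpected': ['huge'], 'changed': ['medium']}
```

Remediation is then a matter of re-running the import so the live state converges back to the repository's version.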

Tool 2: mapper.py — Multi-Profile NSX Manager (v5.0)

The tool that handles network profile management across NSX Federation environments.

Core capabilities:

  • CIDR-based BlueCat IPAM matching — for each fabric network, look up the matching BlueCat IP range by CIDR rather than by name (names drift, CIDRs don’t)
  • Dynamic VCF sub-account discovery — VCF Cloud Accounts bundle vCenter and NSX Local Manager and create hidden sub-accounts at runtime; mapper.py resolves them automatically rather than requiring manual configuration
  • The --all flag — runs the entire pipeline (discovery, IPAM matching, profile assignment, tagging) in one command
  • Federation-aware behavior — handles Global Manager segments where Local Manager state must be queried for complete information
  • Prefix-based segment filtering — categorizes thousands of fabric networks across multiple logical profiles using prefix patterns
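
The CIDR-based matching in the first capability can be illustrated with the standard library's ipaddress module. This is a simplified sketch, not the toolkit's implementation; the BlueCat lookup is stood in for by a plain dictionary keyed on the normalized CIDR:

```python
import ipaddress

def match_bluecat_range(segment_cidr, bluecat_ranges):
    """Look up a BlueCat range identifier by CIDR rather than by name.

    Normalizing through ip_network(strict=False) means a host-form
    value like '10.20.30.5/24' still matches the '10.20.30.0/24' range,
    regardless of how either system formats the address.
    """
    key = str(ipaddress.ip_network(segment_cidr, strict=False))
    return bluecat_ranges.get(key)  # None -> flag for manual review

# Stand-in for a real BlueCat Address Manager query: normalized CIDR -> range ID.
ranges = {"10.20.30.0/24": "BC-RANGE-1138", "10.20.31.0/24": "BC-RANGE-1139"}
print(match_bluecat_range("10.20.30.0/24", ranges))   # BC-RANGE-1138
print(match_bluecat_range("192.168.0.0/24", ranges))  # None
```

Keying on the normalized network rather than the segment name is what makes the correlation robust against renames.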

Profile organization:

The tool organizes networks into logical profiles based on their location and purpose. The customer’s environment was organized into five profiles: ESXP TX W01 (Texas workload), ESXP VA W01 (Virginia workload), NSX Overlay TX (Texas overlay segments), NSX Overlay VA (Virginia overlay segments), and NSX Global Stretched (Federation-stretched segments). The prefix patterns (NZ-*, TX-*, VA-*, US-CI-*, G-*) automatically routed each network to its correct profile.
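
The prefix routing can be sketched with fnmatch against the patterns listed above. The pattern-to-profile assignments below are illustrative assumptions (the article does not state which prefix maps to which profile), and the function is a sketch rather than the tool's real code:

```python
from fnmatch import fnmatch

# Ordered pattern -> profile mapping; first match wins. Which prefix
# feeds which profile is an assumption made for this example.
PROFILE_PATTERNS = [
    ("G-*", "NSX Global Stretched"),
    ("TX-*", "ESXP TX W01"),
    ("VA-*", "ESXP VA W01"),
    ("NZ-*", "NSX Overlay TX"),
    ("US-CI-*", "NSX Overlay VA"),
]

def route_to_profile(segment_name):
    """Return the logical profile for a fabric network, or None if no
    pattern matches (a candidate for review or cleanup)."""
    for pattern, profile in PROFILE_PATTERNS:
        if fnmatch(segment_name, pattern):
            return profile
    return None

print(route_to_profile("TX-APP-0042"))    # ESXP TX W01
print(route_to_profile("G-SHARED-01"))    # NSX Global Stretched
print(route_to_profile("mgmt-backbone"))  # None
```

Extending the categorization is then a data change, not a code change: add a pattern/profile pair and re-run.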

Tool 3: cleanup_profiles.py — Federation-Aware Cleanup

The tool that strips ghost and duplicate network entries from NSX profiles.

What it removes:

  • Ghost entries — references to networks that no longer exist in NSX
  • Duplicate entries — the same network appearing multiple times due to Federation discovery
  • Backbone segments — infrastructure networks that should never appear in workload profiles
  • Stale references — networks that were renamed or restructured

Safety features:

  • Checks for active deployments using a profile entry before removing it
  • Scans all Blueprints for hardcoded network profile references
  • Validates Cloud Zone constraints to ensure cleanup doesn’t break zone configurations
  • Runs in dry-run mode by default, requiring explicit --execute to make changes
  • Backs up profile state before modification to enable rollback
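
The safety model boils down to a plan/execute split: build the removal plan first, veto anything still referenced, and mutate only when execution is explicitly requested. A simplified sketch, with in-memory stand-ins for the NSX existence check and the deployment/Blueprint reference scan (the function names are assumptions):

```python
def plan_cleanup(profile_entries, exists_in_nsx, in_use):
    """Split profile entries into removable ghosts and entries that are
    blocked because a deployment or Blueprint still references them.

    exists_in_nsx and in_use stand in for live NSX and Aria queries.
    """
    removable, blocked = [], []
    for entry in profile_entries:
        if entry in exists_in_nsx:
            continue  # real network, keep it
        (blocked if entry in in_use else removable).append(entry)
    return removable, blocked

def run_cleanup(profile_entries, exists_in_nsx, in_use, execute=False):
    removable, blocked = plan_cleanup(profile_entries, exists_in_nsx, in_use)
    if not execute:  # dry-run by default, mirroring the --execute flag
        print(f"DRY RUN: would remove {removable}, blocked: {blocked}")
        return profile_entries
    return [e for e in profile_entries if e not in removable]

entries = ["TX-APP-01", "ghost-old-net", "ghost-referenced"]
live = {"TX-APP-01"}
referenced = {"ghost-referenced"}
print(run_cleanup(entries, live, referenced))                # dry run, no change
print(run_cleanup(entries, live, referenced, execute=True))  # ghost-old-net removed
```

The real tool adds the Cloud Zone validation and the pre-change backup on top of this pattern.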

The Journey: From 2,500 to 155

Phase 1: Discovery and Mapping

The first task was understanding what was actually in the environment. aria_mapping.py was extended to enumerate every fabric network across every NSX Manager and produce a topology map. The discovery revealed 4,188 fabric networks across the federation — a number that surprised even the customer’s network team.

The map showed the scope of the cleanup challenge: 2,500+ entries in the TX profile alone, mixing real workload segments with ghost entries, backbone segments, and Federation duplicates.

Phase 2: BlueCat IPAM Correlation

With the topology mapped, the next phase correlated networks to BlueCat IP ranges. The customer’s BlueCat Address Manager had 489 ranges associated with networks at the start. Many networks had no BlueCat range at all — meaning IPAM allocation would fail when those networks were used.

mapper.py matched networks to BlueCat ranges by CIDR rather than by name (names had drifted over time, but the CIDR was authoritative). Where matches were found, the tool tagged the network profile entry with the BlueCat range identifier. Where matches were missing, the tool flagged the gap for manual investigation.

By v2.3.19 of the IPAM provider, working in conjunction with the mapper, 60 of the 65 missing BlueCat ranges had been resolved. The total of associated ranges grew from 489 to 727, and only 5 segments remained without BlueCat ranges across both datacenters.

Phase 3: Profile Cleanup

With networks correlated to IPAM data and categorized by prefix patterns, cleanup_profiles.py could safely identify what to remove. Each candidate for removal was checked against active deployments, Blueprint references, and Cloud Zone constraints. Dry-run mode showed what would be removed before any changes were made.

The cleanup ran in batches with manual review between phases. Each batch reduced the entry count without breaking active deployments. Within several iterations, the TX profile shrank from 2,500+ entries to 155 clean, curated workload networks.

Phase 4: Drift Prevention

Cleanup is meaningless if drift starts accumulating again the moment the cleanup finishes. The toolkit was extended with drift detection: a scheduled run of aria_mapping.py --detect-drift compared current state to the version-controlled desired state. New networks added through NSX automatically appeared in the mapper output. Profile entries that drifted from desired state generated alerts.

The customer adopted a pattern of running drift detection weekly. Real drift (new networks needing classification) triggered IaC updates. Unintended drift (manual UI changes that violated the IaC source of truth) triggered remediation runs.

The Results

The toolkit transformed network profile management from constant manual cleanup to automated, version-controlled infrastructure.

Quantifiable Outcomes

TX profile cleanup: Reduced from 2,500+ entries to 155 clean, curated workload networks.

Federation-wide mapping: 834 fabric networks across two datacenters mapped, tagged, and IPAM-linked (551 TX, 283 VA).

BlueCat coverage: 727 IP ranges associated (up from 489 baseline), with 93 explicitly linked to network profile entries.

Profile organization: 4,188 fabric networks categorized into 5 logical profiles using prefix-based filtering.

Drift control: Automated weekly drift detection replaced manual quarterly cleanup cycles.

Operational Improvements

Version-controlled infrastructure: All Aria Automation infrastructure (flavors, images, storage, network profiles, tags) defined in YAML committed to Bitbucket. Every change peer-reviewed before merge. Every deployment traceable to a commit.

Reproducible environment provisioning: New Aria Automation instances configured in minutes instead of weeks. The standard configuration repository contains everything needed to bring up a new instance to feature parity with production.

Federation-aware operation: Global Stretched segments handled correctly across both datacenters. The tool understands Federation topology and routes correctly without manual intervention.

Dynamic catalog freshness: The Service Broker network dropdown populates from live segment tags managed by aria_mapping.py --servicenow-tags. New networks appear in the catalog as soon as they’re tagged — no blueprint redeploy required.
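
The tag-driven dropdown pattern reduces to this: each curated segment carries a display-label tag, and the catalog's options are simply the set of labels found on tagged segments. A sketch of that idea; the tag key and segment shapes below are illustrative assumptions, not Service Broker's actual schema:

```python
def catalog_options(segments, tag_key="servicenow.label"):
    """Build the dropdown options from live segment tags: any segment
    carrying the label tag appears; untagged segments stay hidden."""
    return sorted({
        seg["tags"][tag_key]
        for seg in segments
        if tag_key in seg.get("tags", {})
    })

segments = [
    {"name": "TX-APP-0042", "tags": {"servicenow.label": "App Tier (TX)"}},
    {"name": "TX-DB-0007", "tags": {"servicenow.label": "DB Tier (TX)"}},
    {"name": "mgmt-backbone", "tags": {}},  # untagged: never shown to users
]
print(catalog_options(segments))  # ['App Tier (TX)', 'DB Tier (TX)']
```

Because the options derive from live tags, tagging a new segment is the entire publishing step; no blueprint redeploy is involved.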

Lessons Learned

What Worked Well

CIDR-based IPAM matching. Names drift over time. Networks get renamed. Tags shift. CIDRs are authoritative — the same CIDR identifies the same network regardless of what anyone has called it. Matching networks to BlueCat ranges by CIDR eliminated an entire class of correlation failures.

Dynamic sub-account discovery. VCF Cloud Accounts create hidden sub-accounts that have to be resolved at runtime. Building the resolver into mapper.py rather than requiring manual configuration meant the tool worked correctly the first time on every new Aria instance.

Prefix-based filtering. Using prefix patterns (NZ-*, TX-*, VA-*, US-CI-*, G-*) to route networks to logical profiles scaled cleanly. Adding a new profile required defining its prefix pattern and re-running the tool. The customer could extend the categorization without touching the tool’s code.

The --all flag. Tools that require running multiple commands in the right order get used wrong. Combining the entire pipeline behind one flag meant the customer’s team could run it without consulting documentation.

Federation-aware cleanup. Detecting Global Stretched segments and handling them differently from local segments prevented the cleanup tool from breaking Federation-aware deployments.

What We’d Do Differently

Build the cleanup tooling earlier. Aria Automation will happily accumulate thousands of stale network references over time. Once the count gets high enough, profile validation slows to a crawl, ghost networks get selected during deployment, and operators stop trusting the catalog. The cleanup pays for itself within a month. Build the cleanup tooling early. Run it on a schedule.

Document Federation API quirks earlier. The Global Manager returning cidr: None for some segments, forcing a Local Manager query to recover the subnet, was discovered through debugging. Documenting it in the API contract from version one would have saved future developers (and future versions of ourselves) from rediscovering it.
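
A minimal sketch of handling this quirk: when the Global Manager record reports cidr: None, fall back to Local Manager state, represented here by a plain lookup table (the field names are assumptions, not the NSX API's actual schema):

```python
def resolve_cidr(segment, local_manager_cidrs):
    """Return the segment's CIDR, falling back to Local Manager state
    when the Global Manager record reports cidr: None."""
    if segment.get("cidr") is not None:
        return segment["cidr"]
    # Global Manager gave us nothing; ask the Local Managers.
    return local_manager_cidrs.get(segment["name"])

lm_state = {"G-SHARED-01": "10.50.0.0/24"}  # stand-in for LM API queries
print(resolve_cidr({"name": "TX-APP-0042", "cidr": "10.20.30.0/24"}, lm_state))
print(resolve_cidr({"name": "G-SHARED-01", "cidr": None}, lm_state))  # falls back to LM
```

Encoding the fallback once, in one function, is what keeps the rest of the pipeline Federation-agnostic.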

Build IaC patterns from day one. The customer’s environment had drifted before the engagement started. If IaC had been the operating model from initial deployment, the cleanup work would have been unnecessary. New Aria deployments should start with IaC, not retrofit it later.

The Community Impact

The complete network automation toolkit has been published as open-source software for the broader VMware community.

Repository: github.com/noahfarshad/aria-network-automation

The repository includes the complete Python source for aria_mapping.py, mapper.py, and cleanup_profiles.py, NSX Federation handling utilities, BlueCat IPAM integration patterns, dry-run and backup capabilities, deployment guides for new and existing Aria Automation environments, configuration templates for the IaC workflow, and troubleshooting runbooks.

The toolkit is broadly applicable for any organization running NSX Federation with Aria Automation. The Federation-aware patterns are unique in the publicly available ecosystem — most off-the-shelf tools don’t handle Global Manager cidr: None correctly, don’t resolve VCF sub-accounts dynamically, and don’t categorize networks by prefix patterns.

Getting Started

The toolkit requires VMware Aria Automation 8.x or later, NSX Federation (any version supporting REST API), BlueCat Address Manager (recommended for IPAM correlation), Python 3.7+ runtime, and API credentials for NSX, Aria Automation, and BlueCat.

  • Clone the repository and install the Python dependencies.
  • Configure connection details for NSX and Aria Automation in config.json.
  • Run aria_mapping.py --detect-drift to compare current state to desired state.
  • Run mapper.py --all to discover networks and correlate them to IPAM data.
  • Run cleanup_profiles.py --dry-run to see what cleanup would remove before executing it.

Conclusion

Managing VMware Aria Automation infrastructure through manual UI configuration doesn’t scale and doesn’t survive contact with reality. Networks accumulate, drift creeps in, and operator trust in the catalog erodes. Infrastructure-as-Code is the answer, but it requires tooling that understands the platform’s quirks — Federation behavior, sub-account resolution, IPAM correlation, prefix-based categorization.

For organizations running VMware Aria Automation, the patterns demonstrated here provide a production-proven approach to managing infrastructure as code. Whether you’re cleaning up years of drift or starting fresh with IaC from day one, the toolkit handles the heavy lifting that off-the-shelf tools don’t.

The complete toolkit, documentation, and configuration templates are available on GitHub for organizations facing similar challenges.

