Using LLMs for Cloud Infrastructure Recommendations
After building our multi-cloud discovery platform, we had rich structured data about customer infrastructure — resource configurations, utilization metrics, cost breakdowns, dependency maps. The next step was turning that data into actionable recommendations. Traditionally this required senior cloud architects spending days analyzing the data. I wanted to see if LLMs could accelerate (not replace) that process.
Why LLMs for Infrastructure Recommendations?
Cloud infrastructure decisions are nuanced. Recommending whether to migrate a VM to a container, right-size a database, or consolidate storage depends on context that goes beyond simple threshold rules. An experienced architect considers workload patterns, compliance requirements, team capabilities, cost constraints, and interdependencies.
Rule-based engines can handle the obvious cases — "this VM has 5% average CPU utilization, consider downsizing" — but they fall apart on complex, multi-factor decisions. LLMs excel at synthesizing multiple data points into coherent, context-aware recommendations that read like what an experienced architect would write.
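To make the limitation concrete, here is a minimal sketch of the kind of single-threshold rule quoted above. The `VmStats` type, its field names, and the 10% cutoff are illustrative assumptions, not something taken from our platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VmStats:
    """Hypothetical utilization summary for one VM."""
    name: str
    avg_cpu_pct: float


def downsizing_rule(vm: VmStats, cpu_threshold_pct: float = 10.0) -> Optional[str]:
    """Single-factor rule: flag a VM whose average CPU is below the threshold."""
    if vm.avg_cpu_pct < cpu_threshold_pct:
        return f"{vm.name}: {vm.avg_cpu_pct:.0f}% average CPU utilization, consider downsizing"
    return None


print(downsizing_rule(VmStats(name="batch-worker-01", avg_cpu_pct=5.0)))
```

The rule sees exactly one number. It cannot tell whether the VM is idle by accident or by design (a standby node, for example), and that is the kind of context a multi-factor recommendation needs.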
The Implementation
I built the recommendation engine as a Python service using FastAPI, with Pydantic models for strict input/output validation. The pipeline works in three stages:
1. Data Preparation: Pull discovered resource data, utilization metrics, and cost data from MongoDB/Elasticsearch. Structure it into concise summaries per resource or resource group.
2. Prompt Construction: Build detailed prompts that include the resource configuration, utilization data, cost information, and relevant context (cloud provider best practices, pricing tiers, compliance requirements). Each recommendation type (right-sizing, migration, modernization) has its own prompt template.
3. LLM Inference + Validation: Send prompts to the LLM API, parse the structured response (JSON output format), and validate recommendations against Pydantic schemas before storing them.
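The three stages can be sketched roughly as follows. This is a simplified illustration, not the production service: the templates, field names, and sample resource are made up, the LLM call is replaced by a canned function, and a standard-library dataclass stands in for the Pydantic model so the sketch runs without extra dependencies.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical templates, one per recommendation type.
PROMPT_TEMPLATES: Dict[str, str] = {
    "right-sizing": "Assess whether this resource is over- or under-provisioned.\n{summary}",
    "migration": "Assess whether this resource should move to another platform.\n{summary}",
    "modernization": "Assess whether this resource should be re-architected.\n{summary}",
}


@dataclass
class Recommendation:
    """Stand-in for the Pydantic output model (field names are assumptions)."""
    recommendation_type: str
    priority: str
    estimated_monthly_savings: float
    implementation_steps: List[str]
    risks: List[str]


def summarize(resource: dict) -> str:
    """Stage 1: condense a discovered resource into a short text summary."""
    return (
        f"{resource['id']} ({resource['type']}): "
        f"avg CPU {resource['avg_cpu_pct']}%, monthly cost ${resource['monthly_cost']:.2f}"
    )


def build_prompt(rec_type: str, resource: dict) -> str:
    """Stage 2: fill the template for the requested recommendation type."""
    return PROMPT_TEMPLATES[rec_type].format(summary=summarize(resource))


def recommend(rec_type: str, resource: dict, call_llm: Callable[[str], str]) -> Recommendation:
    """Stage 3: call the model, parse its JSON reply, and check its shape."""
    raw = call_llm(build_prompt(rec_type, resource))
    data = json.loads(raw)  # raises ValueError on malformed JSON
    # Raises TypeError on missing or unexpected fields. Unlike Pydantic,
    # a dataclass does not check field types.
    return Recommendation(**data)


def fake_llm(prompt: str) -> str:
    """Canned reply used here in place of a real LLM API call."""
    return json.dumps({
        "recommendation_type": "right-sizing",
        "priority": "medium",
        "estimated_monthly_savings": 70.0,
        "implementation_steps": ["Resize during a maintenance window"],
        "risks": ["Brief downtime during resize"],
    })


vm = {"id": "vm-042", "type": "VM", "avg_cpu_pct": 5, "monthly_cost": 140.0}
rec = recommend("right-sizing", vm, fake_llm)
print(rec.priority, rec.estimated_monthly_savings)
```

Passing the LLM call in as a function keeps the pipeline testable without network access, which is why the sketch takes `call_llm` as a parameter.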
Prompt Engineering Lessons
The biggest challenge wasn't calling the API — it was crafting prompts that produce consistently useful recommendations. A few things I learned:
- Be specific about output format. Vague prompts produce vague recommendations. I specify the exact JSON schema I expect, including fields for recommendation type, priority, estimated savings, implementation steps, and risks.
- Include pricing context. The LLM doesn't know current cloud pricing. I inject relevant pricing data directly into the prompt so recommendations include realistic cost estimates.
- Use few-shot examples. Including 2–3 examples of good recommendations in the prompt dramatically improved consistency and quality.
- Separate analysis from recommendation. I ask the model to first analyze the data (a thinking step) and only then produce the recommendation. In my experience, this chain-of-thought approach reduced hallucinations and improved the reasoning in the output.
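Here is a minimal sketch of a prompt builder that applies all four lessons. The schema keys, the wording of the instructions, and the SKU names and prices are placeholders I made up for illustration; real prompts would be longer and provider-specific.

```python
import json
from typing import Dict, List

# Hypothetical output schema, mirroring the fields listed above.
OUTPUT_SCHEMA = {
    "recommendation_type": "right-sizing | migration | modernization",
    "priority": "high | medium | low",
    "estimated_monthly_savings": "number (USD)",
    "implementation_steps": ["string"],
    "risks": ["string"],
}


def build_prompt(resource_summary: str, pricing: Dict[str, float], examples: List[dict]) -> str:
    """Assemble a prompt with explicit format, pricing context, and few-shot examples."""
    pricing_lines = "\n".join(
        f"- {sku}: ${price:.2f}/month" for sku, price in sorted(pricing.items())
    )
    example_blocks = "\n\n".join(
        f"Example {i}:\n{json.dumps(ex, indent=2)}" for i, ex in enumerate(examples, start=1)
    )
    return "\n\n".join([
        "You are assisting a cloud architect. Use only the data provided below.",
        f"Resource:\n{resource_summary}",
        # Pricing is injected because the model does not know current prices.
        f"Current pricing:\n{pricing_lines}",
        f"Examples of good recommendations:\n{example_blocks}",
        # Analysis first, recommendation second.
        "Step 1: Write a short analysis of the data under the key \"analysis\".",
        "Step 2: Write the recommendation under the key \"recommendation\", "
        f"matching this JSON schema exactly:\n{json.dumps(OUTPUT_SCHEMA, indent=2)}",
        "Return a single JSON object with exactly those two keys.",
    ])


prompt = build_prompt(
    resource_summary="vm-042 (VM): avg CPU 5%, monthly cost $140.00",
    pricing={"example.large": 140.0, "example.medium": 70.0},  # made-up SKUs and prices
    examples=[{
        "recommendation_type": "right-sizing",
        "priority": "medium",
        "estimated_monthly_savings": 35.0,
        "implementation_steps": ["Resize during a maintenance window"],
        "risks": ["Brief downtime during resize"],
    }],
)
print(prompt)
```

Keeping the analysis under its own key means the downstream parser can discard it or log it for review, while only the `recommendation` object goes through schema validation.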
Guardrails and Validation
LLM outputs can't be trusted blindly for infrastructure decisions. Every recommendation goes through validation:
- Schema Validation: Pydantic models enforce that recommendations have all required fields with correct types.
- Sanity Checks: Estimated savings are bounded by actual current spend. Recommended instance types must exist in the target cloud provider's catalog.
- Confidence Scoring: The model assigns a confidence score; recommendations below the threshold are flagged for human review.
- Human-in-the-Loop: All recommendations are presented as suggestions. Cloud architects review and approve before any changes are made.
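The checks that run after schema validation can be sketched like this. The field names, the 0.7 confidence threshold, the catalog entries, and the choice to mark a recommendation as rejected when a sanity check fails are all assumptions made for this example; a dataclass again stands in for the Pydantic model.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Recommendation:
    """Stand-in for a recommendation that already passed schema validation."""
    resource_id: str
    recommended_instance_type: str
    estimated_monthly_savings: float
    confidence: float  # model-reported, expected in [0.0, 1.0]


@dataclass
class ReviewItem:
    """What the architect sees: the suggestion plus the outcome of the checks."""
    recommendation: Recommendation
    errors: List[str] = field(default_factory=list)
    flagged_low_confidence: bool = False
    status: str = "pending_approval"  # nothing is applied without architect sign-off


def check_recommendation(
    rec: Recommendation,
    current_monthly_spend: float,
    catalog: Set[str],
    confidence_threshold: float = 0.7,  # assumed value
) -> ReviewItem:
    item = ReviewItem(recommendation=rec)
    # Sanity check: savings cannot be negative or exceed what is currently spent.
    if not 0 <= rec.estimated_monthly_savings <= current_monthly_spend:
        item.errors.append("estimated savings outside [0, current spend]")
    # Sanity check: the suggested instance type must exist in the provider catalog.
    if rec.recommended_instance_type not in catalog:
        item.errors.append("instance type not in provider catalog")
    # Confidence scoring: low scores get flagged for closer human review.
    if rec.confidence < confidence_threshold:
        item.flagged_low_confidence = True
    if item.errors:
        item.status = "rejected"  # assumption: failed sanity checks are not shown as suggestions
    return item


catalog = {"example.small", "example.medium", "example.large"}  # made-up names
ok = check_recommendation(Recommendation("vm-042", "example.medium", 70.0, 0.9), 140.0, catalog)
bad = check_recommendation(Recommendation("vm-043", "example.huge", 500.0, 0.4), 140.0, catalog)
print(ok.status, ok.flagged_low_confidence)
print(bad.status, bad.flagged_low_confidence, bad.errors)
```

Note that even the passing recommendation ends in `pending_approval` rather than anything like "approved": the checks filter and annotate, but the decision stays with the architect.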
Results
The LLM-powered recommendation engine reduced the time to produce a comprehensive cloud optimization report from 3–5 days of architect time to about 4 hours (including review). Our senior architects rated the quality of its recommendations as comparable to manually produced reports in more than 80% of cases.
More importantly, the consistency improved. Manual reports varied in quality depending on the architect's experience and familiarity with the specific cloud provider. The LLM-powered system produces uniformly detailed recommendations across all providers.
This isn't about replacing cloud architects — it's about giving them a powerful first draft that they can refine, letting them focus on the genuinely complex decisions that require human judgment.