<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Daniel McLoughlin ☁️]]></title><description><![CDATA[Now then! I'm Daniel McLoughlin, a Technology Strategist bridging Azure App Mod with AI. Here, I share insights on cloud strategy, governance, and implementatio]]></description><link>https://daniel.mcloughlin.cloud</link><generator>RSS for Node</generator><lastBuildDate>Wed, 15 Apr 2026 18:34:26 GMT</lastBuildDate><atom:link href="https://daniel.mcloughlin.cloud/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Microsoft App Modernization Guidance for Azure]]></title><description><![CDATA[In recent months, I’ve created visual tools to help people make sense of Microsoft’s most important cloud frameworks:

The Cloud Adoption Framework (CAF) – Microsoft’s end-to-end cloud journey guidance

The Well-Architected Framework (WAF) – principl...]]></description><link>https://daniel.mcloughlin.cloud/microsoft-app-modernization-guidance-for-azure</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/microsoft-app-modernization-guidance-for-azure</guid><category><![CDATA[Azure]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[app development]]></category><category><![CDATA[modernization]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[copilot]]></category><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Mon, 04 Aug 2025 12:55:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1754311266210/a09f91cb-71b8-4c08-ae27-93bf4db37e5e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In recent months, I’ve created <a target="_blank" href="https://daniel.mcloughlin.cloud/visuals-for-azure-governance">visual tools to help people make sense of Microsoft’s most important cloud frameworks</a>:</p>
<ul>
<li><p>The <strong>Cloud Adoption Framework (CAF)</strong> – Microsoft’s end-to-end cloud journey guidance</p>
</li>
<li><p>The <strong>Well-Architected Framework (WAF)</strong> – principles for designing secure, reliable, and cost-effective workloads</p>
</li>
</ul>
<p>These two frameworks underpin how organisations plan, govern, and build in Azure.</p>
<p>But at <a target="_blank" href="https://techcommunity.microsoft.com/blog/appsonazureblog/reimagining-app-modernization-for-the-era-of-ai/4414793?previewMessage=true"><strong>Microsoft Build (May 2025)</strong></a>, a new addition quietly joined the party: the <strong>App Modernisation Guidance</strong> – a focused set of recommendations and patterns to help organisations bring legacy apps into the modern cloud world.</p>
<p>Where CAF shows you the overall journey and WAF ensures your workloads are well built, this new guidance zooms in on one of the most complex (and valuable) pieces of the puzzle – modernising legacy applications for the cloud.</p>
<hr />
<h1 id="heading-what-is-the-microsoft-app-modernisation-guidance">What is the Microsoft App Modernisation Guidance?</h1>
<p>The Microsoft App Modernisation Guidance is part of the wider <strong>Cloud Adoption Framework</strong>. It provides a practical, structured lifecycle to help organisations move from legacy systems to modern, cloud-native applications on Azure.</p>
<p>It breaks the journey into seven clear phases:</p>
<ul>
<li><p><strong>Get Started</strong> – Learn the lifecycle, define your roadmap, and assess where you are</p>
</li>
<li><p><strong>Assess</strong> – Evaluate app portfolios, gather inventory, and check cloud readiness</p>
</li>
<li><p><strong>Plan</strong> – Choose the right strategy using Microsoft’s 6 Rs (Rehost, Refactor, Rebuild, etc.)</p>
</li>
<li><p><strong>Launch</strong> – Run a proof of concept, and embed identity, security, and compliance</p>
</li>
<li><p><strong>Foundation</strong> – Set the groundwork for specific stacks like .NET, Java, SAP, Oracle, VMware</p>
</li>
<li><p><strong>Expand</strong> – Rebuild using APIs, microservices, serverless platforms, and data modernisation</p>
</li>
<li><p><strong>Innovate &amp; Optimise</strong> – Introduce AI, low-code, performance, and scale improvements</p>
</li>
</ul>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/app-modernization-guidance/get-started/application-modernization-life-cycle"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1754303442552/12ccbb7f-3c2a-4e43-a3fa-302e4889784a.png" alt="Application modernization life cycle diagram" class="image--center mx-auto" /></a></p>
<hr />
<h1 id="heading-why-does-modernisation-matter">Why does modernisation matter?</h1>
<p>Modernising applications isn’t just a technical upgrade – it’s a business enabler. Most organisations still rely on legacy systems that are hard to scale, expensive to run, and difficult to integrate with modern services.</p>
<p>Modernisation helps you:</p>
<ul>
<li><p>Deliver faster and adapt to change</p>
</li>
<li><p>Reduce technical debt and support overhead</p>
</li>
<li><p>Enable AI, analytics, and low-code tools</p>
</li>
<li><p>Improve performance, security, and scalability</p>
</li>
<li><p>Unlock agility and business value from your existing assets</p>
</li>
</ul>
<p>This guidance provides structure and clarity for what can otherwise feel like an overwhelming task.</p>
<hr />
<h1 id="heading-how-does-it-relate-to-caf-and-waf">How does it relate to CAF and WAF?</h1>
<p>Here’s the relationship in simple terms:</p>
<ul>
<li><p><strong>CAF</strong> = the full cloud journey</p>
</li>
<li><p><strong>App Modernisation Guidance</strong> = the "modernise your apps" part of that journey</p>
</li>
<li><p><strong>WAF</strong> = best practices to make sure your apps are well-architected, secure, and scalable</p>
</li>
</ul>
<p>The App Modernisation Guidance slots directly into CAF’s <strong>Modernise</strong> phase and can be used alongside WAF to validate the architecture of replatformed or rebuilt workloads.</p>
<hr />
<h1 id="heading-ai-powered-modernisation-with-copilot">AI-powered modernisation with Copilot</h1>
<p>Microsoft is also bringing AI into the process. The <strong>GitHub Copilot Upgrade Assistants</strong> help speed up app modernisation by offering intelligent recommendations and automated code changes.</p>
<ul>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/dotnet/core/porting/github-copilot-app-modernization-overview">Copilot for .NET</a> – for upgrading from .NET Framework to modern .NET</p>
</li>
<li><p><a target="_blank" href="https://learn.microsoft.com/en-us/java/upgrade/overview">Copilot for Java</a> – for migrating Java apps to Azure-native platforms</p>
</li>
</ul>
<p>These tools can save a significant amount of time when migrating large legacy estates, and are well worth a try.</p>
<hr />
<h1 id="heading-why-i-built-a-visual-explorer-for-this-guidance">Why I built a visual explorer for this guidance</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1754311279432/2fa02219-bd30-4439-b228-871e787fce03.png" alt class="image--center mx-auto" /></p>
<p>Like many others, I’ve found real value in having a <strong>visual, structured view</strong> of large Microsoft frameworks. After publishing explorers for <strong>CAF</strong>, <strong>WAF</strong>, and <strong>Microsoft AI Adoption</strong>, I wanted to do the same for this new addition.</p>
<p>The <strong>Markmap Visual Explorer for the App Modernisation Guidance</strong> lays out all seven phases in a single interactive map – with direct links to Microsoft’s official documentation.</p>
<p>It’s built for architects, developers, IT leads – anyone planning or delivering app modernisation.</p>
<p>📦 <a target="_blank" href="https://github.com/CloudDevDan/app-modernization-markmap"><strong>GitHub Repository</strong></a><br />🗺 <a target="_blank" href="https://app-mod.daniel.mcloughlin.cloud/"><strong>Launch Interactive Map</strong></a>  </p>
<hr />
<h1 id="heading-final-thoughts">Final thoughts</h1>
<p>In my opinion, this guidance is a real value-add to the Cloud Adoption Framework.</p>
<p>It helps clear away much of the ambiguity around how to get from A to B in the cloud – offering practical, grounded steps that go beyond theory. It’s not just about moving tech – it’s about aligning modernisation efforts to real business outcomes, in a way that’s measurable, focused, and easy to understand.</p>
<p>If you’re modernising legacy systems or preparing to – this is worth your time.</p>
<p>And if you want the big picture at a glance – give the Markmap a try.</p>
<hr />
<p><strong>A quick note on spelling…</strong></p>
<p>You may notice a mix of UK and US spelling in this post. I’m based in the UK and follow British spelling conventions (like <em>modernisation</em> and <em>optimise</em>), but I also use Microsoft's official product names and terminology (like <em>App Modernization Guidance</em>) as they appear in the documentation - which typically uses US spelling. It’s a deliberate mix to keep things both accurate and readable.</p>
]]></content:encoded></item><item><title><![CDATA[Visuals for Azure Governance]]></title><description><![CDATA[I've been working with the Azure Cloud Adoption Framework (CAF) and the Azure Well-Architected Framework (WAF) quite a bit recently - and while the content is excellent and comprehensive, I found it hard to navigate quickly or explain clearly in prac...]]></description><link>https://daniel.mcloughlin.cloud/visuals-for-azure-governance</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/visuals-for-azure-governance</guid><category><![CDATA[Microsoft]]></category><category><![CDATA[Azure]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Governance]]></category><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Tue, 24 Jun 2025 15:28:01 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750777702080/b54f522c-71e5-4417-a760-0d7f3a5a4654.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>I've been working with the <a target="_blank" href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/">Azure Cloud Adoption Framework (CAF)</a> and the <a target="_blank" href="https://learn.microsoft.com/en-us/azure/architecture/framework/">Azure Well-Architected Framework (WAF)</a> quite a bit recently - and while the content is excellent and comprehensive, I found it hard to navigate quickly or explain clearly in practice.</p>
<p>I'm a visual learner. The sheer volume of documentation can feel overwhelming, and I needed a better way to understand the structure, flow, and relationships between the key parts. So I built a couple of interactive mind maps using a tool called <a target="_blank" href="https://markmap.js.org/">Markmap</a>. They’ve helped me get to grips with the frameworks more easily, and I’m sharing them here in case they help others too.</p>
<p>Let’s take a look:</p>
<hr />
<h1 id="heading-azure-caf">Azure CAF</h1>
<p><img src="https://github.com/CloudDevDan/azure-caf-markmap/blob/main/img/overview.jpg?raw=true" alt="overview.jpg" /></p>
<p>This project visualises the structure of the Cloud Adoption Framework in a single, explorable map. From strategy and planning to governance and management, the layout is designed to help you grasp how the different stages fit together.</p>
<p>📦 <a target="_blank" href="https://github.com/CloudDevDan/azure-caf-markmap">GitHub Repository</a><br />🗺 <a target="_blank" href="https://azure-caf.daniel.mcloughlin.cloud/">Launch Interactive Map</a></p>
<hr />
<h1 id="heading-azure-waf">Azure WAF</h1>
<p><img src="https://github.com/CloudDevDan/azure-waf-markmap/raw/main/img/overview.jpg" alt="Azure WAF Markmap overview" /></p>
<p>The WAF map breaks down the five pillars of the Well-Architected Framework, along with key principles and design considerations. It’s a handy way to refer to the framework during architecture sessions or to anchor discussions around best practices.</p>
<p>📦 <a target="_blank" href="https://github.com/CloudDevDan/azure-waf-markmap">GitHub Repository</a><br />🗺 <a target="_blank" href="https://azure-waf.daniel.mcloughlin.cloud/">Launch Interactive Map</a></p>
<hr />
<h1 id="heading-microsoft-ai-adoption">Microsoft AI Adoption</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1753950959206/8af2800a-d647-4a94-ad3c-f832fe04b5c2.png" alt class="image--center mx-auto" /></p>
<p>The Microsoft AI Adoption framework is an official CAF adoption scenario designed to help organisations adopt AI responsibly and effectively – with tailored guidance for both startups and enterprises, plus checklists, best practices, and platform recommendations.</p>
<p>📦 <a target="_blank" href="https://github.com/CloudDevDan/ai-adoption-markmap">GitHub Repository</a><br />🗺 <a target="_blank" href="https://ai-adoption.daniel.mcloughlin.cloud/">Launch Interactive Map</a></p>
<hr />
<h1 id="heading-why-and-how">Why and How</h1>
<p>I built these maps because I learn best when I can <em>see</em> how things connect.</p>
<p>Working through long-form documentation is necessary - but sometimes, especially when preparing for reviews or walking through a framework with a team, I need a more structured, visual approach. That’s where <a target="_blank" href="https://markmap.js.org/">Markmap</a> comes in.</p>
<p>Markmap turns Markdown into an interactive mind map. It’s:</p>
<ul>
<li><p>Lightweight and open source</p>
</li>
<li><p>Easy to maintain in GitHub</p>
</li>
<li><p>Perfect for creating skimmable, navigable versions of dense content</p>
</li>
</ul>
<p>Each map starts with a plain Markdown file, structured using bullet points. Markmap does the rest.</p>
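<p>For a feel of how little markup is involved, here's an illustrative snippet (not taken from the actual repos) of the kind of Markdown Markmap renders into an interactive map:</p>

```markdown
# Cloud Adoption Framework
## Strategy
- Motivations
- Business outcomes
## Plan
- Digital estate
- Organisational alignment
## Govern
- Benchmark
- Governance disciplines
```

<p>Each heading becomes a branch and each bullet a leaf, so maintaining the map is just editing Markdown in GitHub.</p>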
<p>If you're a visual thinker, or just looking for a more approachable way to engage with CAF and WAF, these tools might be helpful.</p>
<hr />
<p>Both projects are open source and available to fork, adapt, or build upon. Feedback and contributions welcome.</p>
<p>Thanks for checking them out.</p>
<p>Dan</p>
]]></content:encoded></item><item><title><![CDATA[Azure AI Foundry vs Azure AI Services vs Azure Machine Learning]]></title><description><![CDATA[Introduction
I've seen a lot of confusion recently about Azure AI Foundry and its role in the Azure AI ecosystem. Specifically, people are assuming Foundry is a full machine learning platform where you can train and run ML workloads. To make matters ...]]></description><link>https://daniel.mcloughlin.cloud/azure-ai-foundry-vs-azure-ai-services-vs-azure-machine-learning</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/azure-ai-foundry-vs-azure-ai-services-vs-azure-machine-learning</guid><category><![CDATA[Microsoft]]></category><category><![CDATA[AI]]></category><category><![CDATA[genai]]></category><category><![CDATA[ML]]></category><category><![CDATA[Machine Learning]]></category><category><![CDATA[Azure]]></category><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Tue, 17 Jun 2025 11:39:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750156273407/13e6226a-3cc7-498c-8a08-a5f7aa36f4e5.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>I've seen a lot of confusion recently about Azure AI Foundry and its role in the Azure AI ecosystem. Specifically, people are assuming Foundry is a full machine learning platform where you can train and run ML workloads. To make matters worse, a recent change in the Azure portal has renamed "Azure AI Services" to "AI Foundry," adding even more fuel to the fire.</p>
<p>Let's clear this up.</p>
<hr />
<h1 id="heading-azure-ai-foundry-what-it-actually-is">Azure AI Foundry: What It Actually Is</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750159290603/f35decd0-8baf-4244-9ff0-2bff69cc8522.png" alt class="image--center mx-auto" /></p>
<p>Azure AI Foundry is Microsoft's <mark>GenAI orchestration platform</mark>. It helps you build generative AI applications by bringing together:</p>
<ul>
<li><p>Model catalogues (Azure OpenAI, OSS models, 3rd party models)</p>
</li>
<li><p>Prompt Flow integration</p>
</li>
<li><p>Fine-tuning for foundation models</p>
</li>
<li><p>Evaluation tooling</p>
</li>
<li><p>Responsible AI guardrails</p>
</li>
<li><p>Deployment endpoints</p>
</li>
<li><p>Agent and application templates</p>
</li>
</ul>
<p>It's designed for building GenAI apps - not for classical machine learning.</p>
<p>You deploy Azure AI Foundry as an Azure resource, and access it via its own portal.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750159495547/36bfc65e-67b3-4c24-bd2c-1596e6ca5b1d.png" alt class="image--center mx-auto" /></p>
<p>I go into much more detail on what Foundry is (and isn't) in my earlier post: "<a target="_blank" href="https://daniel.mcloughlin.cloud/azure-ai-foundry-what-it-is-and-what-it-isnt">Azure AI Foundry: What It Is &amp; What It Isn't</a>".</p>
<hr />
<h1 id="heading-what-foundry-is-not">What Foundry Is Not</h1>
<p>Azure AI Foundry is <strong>not</strong>:</p>
<ul>
<li><p>A platform for training ML models from your own data</p>
</li>
<li><p>A replacement for Azure Machine Learning</p>
</li>
<li><p>A full MLOps pipeline</p>
</li>
</ul>
<hr />
<h1 id="heading-azure-machine-learning-where-ml-lives">Azure Machine Learning: Where ML Lives</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750159388921/58d54a45-7afa-4254-a7e9-84579a1ba16b.png" alt class="image--center mx-auto" /></p>
<p>If you're doing traditional ML - training models, running experiments, managing pipelines - you still use Azure Machine Learning. That's where the full ML workflow lives, including hyperparameter tuning, feature stores, custom models, and deployment.</p>
<p>Much like Azure AI Foundry, you deploy Azure Machine Learning as an Azure resource, and access it via its own portal, called Azure Machine Learning Studio.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750159571625/5f8a25be-ec5f-41b7-83f7-dac57f66ee02.png" alt class="image--center mx-auto" /></p>
<hr />
<h1 id="heading-the-portal-rename-the-root-of-the-confusion">The Portal Rename: The Root of the Confusion</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1750159797160/5c74fcb2-7149-4cdb-8acd-12d98a423e3b.png" alt class="image--center mx-auto" /></p>
<p>The Azure portal renamed the "AI Services" blade to "AI Foundry". This blade now includes:</p>
<ul>
<li><p>The Foundry platform itself</p>
</li>
<li><p>Azure OpenAI, AI Foundry (projects), AI Hubs (parents of the projects), AI Search</p>
</li>
<li><p>All the classic Cognitive Services (Vision, Speech, Language, Document Intelligence, etc.)</p>
</li>
<li><p>Azure Machine Learning (still listed as a separate resource)</p>
</li>
</ul>
<p>Seeing all these services under the "AI Foundry" banner is leading people to assume Foundry now includes ML capabilities. It doesn't.</p>
<p>Azure Databricks, incidentally, isn't part of this blade at all - it remains its own separate service.</p>
<hr />
<h1 id="heading-a-quick-word-on-using-code">A Quick Word on Using Code</h1>
<p>Regardless of whether you're using Azure AI Foundry or Azure Machine Learning, both platforms can be accessed programmatically using languages like Python. For Foundry, this typically involves working with the Azure SDKs, REST APIs, or integrating with tools like Prompt Flow in code. For Azure Machine Learning, Python is often the primary interface - from data preparation and training to deployment and monitoring, with full SDK support for end-to-end ML workflows.</p>
<p>You can also work with both platforms directly from Visual Studio Code. With or without extensions, VS Code gives you the flexibility to write and manage code, interact with Azure resources, and integrate with version control. Microsoft offers specific extensions for Azure Machine Learning and Azure AI Foundry (in preview) that simplify code-level integration with the deployed platform in Azure.</p>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/ai-foundry/how-to/develop/get-started-projects-vs-code"><img src="https://learn.microsoft.com/en-us/azure/ai-foundry/media/how-to/get-started-projects-vs-code/visual-studio-command-palette-small.png" alt="A screenshot of the Visual Studio Code command palette for Azure AI Foundry." /></a></p>
<p>When it comes to endpoints, both Foundry and Azure ML allow you to manage deployment endpoints directly in the Azure portal. In Foundry, once you've selected and deployed a model from the model catalogue, an endpoint is automatically provisioned for you. You can then retrieve the endpoint URL and keys directly from the portal for use in your applications. Azure ML offers similar functionality for deploying custom models, exposing them as managed web services that can be consumed via REST APIs or integrated into larger applications.</p>
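<p>As a concrete (and hedged) sketch of the Foundry side: once a model is deployed, calling its endpoint is just a chat-completions request. The deployment name, environment variable names, and API version below are placeholder assumptions, not values from this post:</p>

```python
def build_chat_payload(deployment: str, user_prompt: str) -> dict:
    """Shape the body of a chat-completions call. `deployment` is the
    Foundry *deployment name* (a placeholder here), not the base model id."""
    return {
        "model": deployment,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": user_prompt},
        ],
    }

payload = build_chat_payload("my-gpt4o-deployment", "Hello, Foundry!")

# Sending it with the `openai` package's Azure client (needs live
# credentials, so shown commented out):
# import os
# from openai import AzureOpenAI
# client = AzureOpenAI(
#     azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
#     api_key=os.environ["AZURE_OPENAI_API_KEY"],
#     api_version="2024-06-01",
# )
# reply = client.chat.completions.create(**payload)
# print(reply.choices[0].message.content)
```

<p>Azure ML managed endpoints follow much the same pattern: the portal gives you a scoring URL and key, and you POST a JSON body to it.</p>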
<hr />
<h1 id="heading-quick-rule-of-thumb">Quick Rule of Thumb</h1>
<ul>
<li><p><strong>Building GenAI apps?</strong> Use Azure AI Foundry.</p>
</li>
<li><p><strong>Running ML workloads?</strong> Use Azure Machine Learning or Azure Databricks.</p>
</li>
<li><p><strong>Using pre-trained APIs?</strong> Use the services within the AI Foundry portal blade (formerly Cognitive Services and AI Services).</p>
</li>
</ul>
<p>Hopefully this helps cut through the noise. The Azure AI landscape is evolving fast, and it's easy to get tripped up - I'm learning it in public so you don't have to.</p>
<hr />
<h1 id="heading-further-reading">Further Reading</h1>
<p>If you're interested in more context on Foundry, check out my earlier posts:</p>
<ul>
<li><p><a target="_blank" href="https://daniel.mcloughlin.cloud/azure-ai-foundry-what-it-is-and-what-it-isnt">Azure AI Foundry: What It Is &amp; What It Isn't</a></p>
</li>
<li><p><a target="_blank" href="https://daniel.mcloughlin.cloud/azure-ai-foundry-why-use-it">Azure AI Foundry: Why Use It</a></p>
</li>
<li><p><a target="_blank" href="https://daniel.mcloughlin.cloud/azure-ai-foundry-vs-microsoft-copilot-studio">Azure AI Foundry vs Microsoft Copilot Studio</a></p>
</li>
<li><p><a target="_blank" href="https://daniel.mcloughlin.cloud/azure-ai-foundry-whats-new-from-build-2025">Azure AI Foundry: What's New From Build 2025</a></p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[Azure AI Foundry vs Microsoft Copilot Studio]]></title><description><![CDATA[Why reinvent the wheel - especially when someone else out there can do it better than you?
I had every intention of writing a piece comparing Azure AI Foundry to Microsoft Copilot Studio. It’s a question I’ve faced a few times in my line of work as a...]]></description><link>https://daniel.mcloughlin.cloud/azure-ai-foundry-vs-microsoft-copilot-studio</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/azure-ai-foundry-vs-microsoft-copilot-studio</guid><category><![CDATA[Microsoft]]></category><category><![CDATA[AI]]></category><category><![CDATA[copilot]]></category><category><![CDATA[foundry]]></category><category><![CDATA[Azure]]></category><category><![CDATA[generative ai]]></category><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Fri, 06 Jun 2025 14:34:03 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749219074800/36950c46-8c84-4b90-a42c-639e52f6c930.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Why reinvent the wheel - especially when someone else out there can do it better than you?</p>
<p>I had every intention of writing a piece comparing Azure AI Foundry to Microsoft Copilot Studio. It’s a question I’ve faced a few times in my line of work as a Tech Strategist, and it felt like a natural progression in my <a target="_blank" href="https://daniel.mcloughlin.cloud/series/azureai">Azure AI Foundry series</a> - where I’ve already explored what it is, what it isn’t, and why use it.</p>
<p>However, while scrolling LinkedIn earlier this week, I came across a post by none other than <a target="_blank" href="https://www.linkedin.com/in/pdtit/">Peter De Tender</a> - a well-respected Microsoft Technical Trainer - who had already tackled the exact comparison I had in mind.</p>
<p>So rather than start from scratch, I’m doing something better: amplifying his excellent article and encouraging you to give it a read.</p>
<p>👉 <strong>Here’s the link to his article:</strong></p>
<p><a target="_blank" href="https://techcommunity.microsoft.com/blog/azure-ai-services-blog/navigating-ai-solutions-microsoft-copilot-studio-vs-azure-ai-foundry/4411678">Navigating AI Solutions: Microsoft Copilot Studio vs Azure AI Foundry</a></p>
<h3 id="heading-tldr-my-quick-take">TL;DR – My Quick Take</h3>
<p>Peter nails the distinction:</p>
<ul>
<li><p><strong>Copilot Studio</strong> is perfect for rapid, low-code solutions - chatbots, automation, and lightweight business workflows. It’s ideal for organisations already deeply embedded in the Microsoft 365 ecosystem.</p>
</li>
<li><p><strong>Azure AI Foundry</strong>, on the other hand, is designed for developers and technical teams looking to build, customise, and scale GenAI apps with precision and flexibility. It’s about orchestration, experimentation, and end-to-end control.</p>
</li>
</ul>
<p>They serve different purposes - but both are valuable depending on the problem you're solving.</p>
<blockquote>
<p>So, like all things in tech, the answer to the question on which one to use is: it depends!</p>
</blockquote>
<p>From my side, one key consideration that stems beyond usage is the cost model, and the surrounding ecosystem you deploy and build into.</p>
<p>For example, Azure AI Foundry is deployed into Azure as a resource - just like a storage account or virtual machine. Whilst Azure AI Foundry's pricing isn't immediately clear (I'll write about this in detail in another post), the costs are billed via the Azure subscription into which the resource is deployed.</p>
<p>Copilot Studio, in contrast, operates within the Power Platform environment. Solutions are developed in Power Apps environments and published into environments tied to your Microsoft 365 tenant. Pricing is typically consumption-based, with licensing linked to Power Platform plans or pay-as-you-go usage via Power Platform capacity. This means you’re operating inside the governance, compliance, and licensing model of Microsoft 365, not Azure.</p>
]]></content:encoded></item><item><title><![CDATA[Azure AI Foundry: What's New From Build 2025]]></title><description><![CDATA[Introduction
Let’s be honest: keeping up with Microsoft Build announcements can feel like drinking from a fire hose - especially when AI is involved! But if, like me, you’re interested in Azure AI Foundry, this year’s updates are worth paying close a...]]></description><link>https://daniel.mcloughlin.cloud/azure-ai-foundry-whats-new-from-build-2025</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/azure-ai-foundry-whats-new-from-build-2025</guid><category><![CDATA[Microsoft]]></category><category><![CDATA[Azure]]></category><category><![CDATA[AI]]></category><category><![CDATA[foundry]]></category><category><![CDATA[build]]></category><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Thu, 29 May 2025 10:40:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748440948231/a8dc9a94-0b51-4cdf-96e6-e62d5c1ccacc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>Let’s be honest: keeping up with Microsoft Build announcements can feel like drinking from a fire hose - especially when AI is involved! But if, like me, you’re interested in Azure AI Foundry, this year’s updates are worth paying close attention to.</p>
<p>Following on from my last post - <a target="_blank" href="https://daniel.mcloughlin.cloud/azure-ai-foundry-why-use-it">Why Use Azure AI Foundry?</a> - I wanted to zoom in on the key announcements from Build 2025 and what they <em>actually mean</em> in practice. Not just the “what”, but the “why” - especially if you’re a builder trying to make sense of where to focus.</p>
<p>I won’t attempt to cover every announcement (there were loads!) - but I have pulled together the ones that stood out most to me from the Foundry perspective.</p>
<p>If you want the full rundown, the <a target="_blank" href="https://news.microsoft.com/build-2025-book-of-news/">Build Book of News</a> is a great companion resource.</p>
<hr />
<h1 id="heading-model-router"><strong>Model Router</strong></h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748508128619/70a679f7-89b7-422d-8d9a-052cc57a9acf.png" alt class="image--center mx-auto" /></p>
<p><strong>What it is:</strong><br />This one is really cool! The Model Router is a new feature that automatically selects the most suitable Azure OpenAI model for your specific prompt, optimising for performance and cost.</p>
<blockquote>
<p>By evaluating factors like query complexity, cost, and performance, it intelligently routes requests to the most suitable model.</p>
</blockquote>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/model-router"><em>Source</em></a></p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p>Reduces the need for manual model selection</p>
</li>
<li><p>Enhances response quality by choosing the best-fit model</p>
</li>
<li><p>Optimises costs by selecting the most efficient model</p>
</li>
</ul>
<p><strong>Example use cases:</strong></p>
<ul>
<li><p>Dynamic selection between GPT-4 and GPT-3.5 based on prompt complexity</p>
</li>
<li><p>Automatically routing image-related prompts to vision models</p>
</li>
<li><p>Selecting lightweight models for simple tasks to save costs</p>
</li>
</ul>
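<p>To be clear, the routing itself happens server-side: you deploy the router and call it like any other deployment. The toy sketch below only mimics the idea of complexity-based selection - the model names and threshold are placeholders, not how the router actually decides:</p>

```python
# Toy illustration of the Model Router *concept* only. The real router runs
# inside Azure OpenAI and weighs query complexity, cost, and performance
# itself; this stand-in uses prompt length as a crude complexity proxy.
def pick_model(prompt: str) -> str:
    word_count = len(prompt.split())
    # Placeholder model names: route long/complex prompts to a larger model.
    return "gpt-4o" if word_count > 50 else "gpt-4o-mini"

print(pick_model("What's the capital of France?"))  # short prompt -> gpt-4o-mini
```

<p>In practice, the request shape doesn't change at all - you simply point your client's <code>model</code> parameter at the router deployment's name instead of a specific model's.</p>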
<hr />
<h1 id="heading-more-models-amp-more-capacity"><strong>More Models &amp;</strong> More Capacity</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748511630913/4078747b-61d9-4059-87ca-0318f9eb69d6.png" alt class="image--center mx-auto" /></p>
<p><strong>What it is:</strong><br />Soooo many models! In Azure AI Foundry, there are models sold directly by Microsoft, and models from third parties. The update here is that you can now use models from AI-focussed companies such as xAI, Black Forest Labs, and Hugging Face.</p>
<p>On top of that, Microsoft is extending reserved capacity to cover Azure OpenAI and select Foundry Models (including Black Forest Labs, and xAI). This means you get consistent performance even when demand spikes.</p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p>Huge increase in model choice - over 11,000!</p>
</li>
<li><p>Consistent performance under load with reserved capacity options</p>
</li>
<li><p>Fine-tune and experiment without infrastructure overhead</p>
</li>
</ul>
<p><strong>Example use cases:</strong></p>
<ul>
<li><p>Experiment with emerging models in development, then scale to production seamlessly</p>
</li>
<li><p>Ensure consistent response times for AI services during high-traffic periods</p>
</li>
</ul>
<hr />
<h1 id="heading-foundry-local">Foundry Local</h1>
<p><img src="https://devblogs.microsoft.com/foundry/wp-content/uploads/sites/89/2025/05/foundry_local-1024x567.png" alt="Foundry Local Stack" /></p>
<p><strong>What it is:</strong><br />Foundry Local brings Azure AI Foundry capabilities directly to your own infrastructure – whether that’s a developer workstation, edge device, or air-gapped data centre. This includes support for offline execution! Imagine interacting with an AI chatbot as a Windows app on your laptop - even when completely offline!</p>
<p><a target="_blank" href="https://aka.ms/FoundryLocal"><em>Source</em></a></p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p>Enables AI use in scenarios with strict data privacy or sovereignty requirements</p>
</li>
<li><p>Delivers sub-second response times without a cloud round-trip</p>
</li>
<li><p>Supports hybrid and offline environments with no connectivity needed</p>
</li>
</ul>
<p><strong>Example use cases:</strong></p>
<ul>
<li><p>Run AI models at the edge in manufacturing, retail, or healthcare settings</p>
</li>
<li><p>Enable offline document processing or vision capabilities on laptops</p>
</li>
<li><p>Deploy secure, private agents in government or defence environments</p>
</li>
</ul>
<hr />
<h1 id="heading-fine-tuning-amp-developer-tier">Fine-Tuning &amp; Developer Tier</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748513065432/75847353-7fd2-4474-89b1-f85602048ad6.png" alt class="image--center mx-auto" /></p>
<p><strong>What it is:</strong><br />Fine-tuning has traditionally been tricky – not just technically, but also in terms of cost and where you could actually run it. That’s changing.</p>
<blockquote>
<p>Fine-tuning allows you to retrain a model on your own data so it adapts to your domain. This is different from grounding, which connects a model to external data at runtime without changing how the model is trained.</p>
</blockquote>
<p>With the public previews of <strong>Global Training</strong> and the new <strong>Developer Tier</strong>, Azure AI Foundry is making fine-tuning more accessible than ever. You can now fine-tune the latest Azure OpenAI models in new worldwide regions, with lower pricing designed specifically for experimentation and iteration.</p>
<p>Global Training handles the infrastructure behind the scenes – and the Developer Tier removes the upfront hosting cost, so you only pay when you actually train or use a model.</p>
<p><a target="_blank" href="https://techcommunity.microsoft.com/blog/Azure-AI-Services-blog/azure-openai-fine-tuning-is-everywhere/4414654"><em>Source</em></a></p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p>Run fine-tuning closer to your data with expanded regional support</p>
</li>
<li><p>Experiment more freely with reduced pricing</p>
</li>
<li><p>Skip the infra setup – Foundry handles it for you</p>
</li>
</ul>
<p><strong>Example use cases:</strong></p>
<ul>
<li><p>Trial multiple fine-tuning approaches in a low-cost environment before committing to production</p>
</li>
<li><p>Fine-tune a model in-region to meet data residency requirements for financial or healthcare data</p>
</li>
<li><p>Quickly test how adding domain-specific examples affects summarisation performance - without setting up infrastructure</p>
</li>
<li><p>Build and evaluate early-stage agent prototypes on the Developer Tier, then scale seamlessly to production using the same workflows</p>
</li>
</ul>
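<p>Whichever tier you train on, the training data takes the same shape: a JSONL file with one example conversation per line, each demonstrating the behaviour you want the model to learn. A minimal sketch of preparing such a file (the conversations here are made-up placeholders):</p>

```python
import json

# Chat fine-tuning data is supplied as JSONL: one training example per line,
# each a short conversation demonstrating the desired behaviour.
# The conversations below are illustrative placeholders.
examples = [
    {"messages": [
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant", "content": "Use the 'Forgot password' link on the sign-in page."},
    ]},
    {"messages": [
        {"role": "system", "content": "You are a concise support assistant."},
        {"role": "user", "content": "Where can I download my invoices?"},
        {"role": "assistant", "content": "Billing > Invoices in your account settings."},
    ]},
]

# Write one JSON object per line - this file becomes your training data upload.
with open("training.jsonl", "w", encoding="utf-8") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```

<p>In practice you’d need many more examples than this, but the per-line structure stays the same.</p>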
<hr />
<h1 id="heading-multi-agent-orchestration">Multi-agent Orchestration</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748513610093/54c4fcee-081e-4ea6-ab4e-33d54a7398b5.png" alt class="image--center mx-auto" /></p>
<p><strong>What it is:</strong><br />Azure AI Foundry now includes native tools for designing and coordinating multiple AI agents within a single system. This goes beyond prompt chaining - agents can now have distinct roles, shared memory, and coordinated workflows, all managed within Foundry. I need to write about this, and Agents in general, in a lot more detail!</p>
<p>As part of this, Microsoft also introduced an <strong>agent catalogue</strong>: a growing library of pre-built, configurable agents for common tasks like retrieval, planning, evaluation, and summarisation. You can use them as-is or customise them to fit your specific needs.</p>
<p><a target="_blank" href="https://aka.ms/Build25/Multi-Agent_Workflows"><em>Source</em></a></p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p>Enables more advanced, multi-step AI use cases</p>
</li>
<li><p>Removes the need for custom orchestration logic</p>
</li>
<li><p>Encourages modular, maintainable agent design</p>
</li>
<li><p>Supports real collaboration between specialised agents</p>
</li>
</ul>
<p><strong>Example use cases:</strong></p>
<ul>
<li><p>A planning agent delegates tasks to retrieval, generation, and validation agents</p>
</li>
<li><p>A multi-agent customer service flow handles triage, resolution, and escalation</p>
</li>
<li><p>A compliance assistant splits work across extraction, analysis, and reporting agents</p>
</li>
<li><p>A document workflow uses separate agents to summarise, translate, and format content</p>
</li>
</ul>
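<p>To make the idea concrete, here’s a toy sketch of a planner delegating to specialist agents through shared memory. This is plain illustrative Python - the agents and wiring are invented, and it is not the Foundry SDK:</p>

```python
from typing import Callable

# Hypothetical sketch of multi-agent coordination: a planner delegates steps
# to specialist agents and accumulates results in shared memory.
def retrieval_agent(task: str) -> str:
    return f"docs for: {task}"          # stand-in for a search/retrieval call

def summary_agent(text: str) -> str:
    return f"summary of: {text}"        # stand-in for a model call

AGENTS: dict[str, Callable[[str], str]] = {
    "retrieve": retrieval_agent,
    "summarise": summary_agent,
}

def planner(goal: str) -> dict[str, str]:
    memory: dict[str, str] = {}                               # shared memory
    memory["retrieve"] = AGENTS["retrieve"](goal)             # step 1: gather
    memory["summarise"] = AGENTS["summarise"](memory["retrieve"])  # step 2
    return memory

result = planner("Q3 sales report")
```

<p>The value of native orchestration is that this delegation, memory, and sequencing is managed for you rather than hand-rolled like this.</p>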
<hr />
<h1 id="heading-identity-for-agents">Identity for Agents</h1>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1748513751495/3eed9c11-a11c-4cb4-b1d2-a1ee0984c20c.png" alt class="image--center mx-auto" /></p>
<p><strong>What it is:</strong><br />Microsoft Entra Agent ID is a new capability that brings enterprise-grade identity and access management to AI agents. Just like you’d give an app or service a managed identity, you can now give AI agents their own secure, verifiable identity within your organisation.</p>
<p>This allows agents to authenticate, authorise, and operate securely across your systems, with support for auditing, policy enforcement, and lifecycle management - all integrated with Microsoft Entra.</p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p>Secures agent interactions with APIs, data, and enterprise resources</p>
</li>
<li><p>Enables RBAC and policy enforcement for AI agents</p>
</li>
<li><p>Improves traceability and auditing of agent actions</p>
</li>
<li><p>Aligns agent behaviour with existing identity and governance frameworks</p>
</li>
</ul>
<p><strong>Example use cases:</strong></p>
<ul>
<li><p>An AI agent authenticates with Entra to access SharePoint or Microsoft Graph</p>
</li>
<li><p>Different agents have scoped permissions based on function - e.g. read-only vs full write access</p>
</li>
<li><p>Agent activity is logged and monitored alongside human and app identities</p>
</li>
</ul>
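<p>The scoped-permissions idea can be illustrated with a toy check. This is purely conceptual - the agent IDs and scope strings are invented, and real enforcement happens in Microsoft Entra, not in your application code:</p>

```python
# Hypothetical sketch of scoped agent permissions: each agent identity carries
# a set of granted scopes, and every action is checked before it runs.
GRANTS = {
    "report-agent": {"files.read"},                  # read-only agent
    "editor-agent": {"files.read", "files.write"},   # full write access
}

def authorise(agent_id: str, scope: str) -> bool:
    """Return True only if this agent has been granted the scope."""
    return scope in GRANTS.get(agent_id, set())

allowed = authorise("editor-agent", "files.write")   # True
blocked = authorise("report-agent", "files.write")   # False: read-only agent
```
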
<hr />
<h1 id="heading-azure-ai-foundry-observability">Azure AI Foundry Observability</h1>
<p><img src="https://devblogs.microsoft.com/foundry/wp-content/uploads/sites/89/2025/05/Picture1.png" alt /></p>
<p><strong>What it is:</strong><br />Azure AI Foundry Observability is a unified solution for governance, evaluation, tracing, and monitoring. It brings real-time visibility into models, agents, workflows, and user interactions - all from a single view.</p>
<p>In my last post, I briefly described evaluation and monitoring in Foundry. I’ve not had a chance to explore this in greater detail yet, but as I understand the announcement, Foundry Observability brings together what already existed within Foundry - live request tracing, model metrics, agent behaviours, evaluation results, and so on - now fully integrated with Azure Monitor, Application Insights, and the Foundry portal itself.</p>
<p><a target="_blank" href="https://devblogs.microsoft.com/foundry/achieve-end-to-end-observability-in-azure-ai-foundry/"><em>Source</em></a></p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p>One integrated view - no more tool sprawl</p>
</li>
<li><p>Real-time tracing and evaluation across agents and models</p>
</li>
<li><p>Built-in governance features to support audits and responsible AI</p>
</li>
<li><p>Makes it easier to go from prototype to production with confidence</p>
</li>
</ul>
<p><strong>Example use cases:</strong></p>
<ul>
<li><p>Trace a user request across multiple agents in a multi-turn workflow</p>
</li>
<li><p>Set alerts when accuracy or safety thresholds are breached</p>
</li>
<li><p>Track agent behaviour over time to spot drift or unexpected changes</p>
</li>
<li><p>Export evaluation logs for compliance or internal reviews</p>
</li>
</ul>
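<p>If span-based tracing is new to you, this toy sketch shows the underlying idea: each step of a workflow is timed and recorded so a timeline can be reconstructed afterwards. It uses only the Python standard library and is not the Foundry tracing API:</p>

```python
import time
from contextlib import contextmanager

# Toy sketch of span-based tracing: wrap each step in a timed "span" and
# record it, so the sequence and duration of steps can be inspected later.
TRACE: list[dict] = []

@contextmanager
def span(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        TRACE.append({"span": name, "ms": (time.perf_counter() - start) * 1000})

# Nested spans mirror a request flowing through an agent into a model call.
with span("user-request"):
    with span("model-call"):
        time.sleep(0.01)    # stand-in for the actual model invocation
```

<p>Inner spans finish (and are recorded) before outer ones - exactly the parent/child structure a trace timeline visualises.</p>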
<hr />
<h1 id="heading-and-the-rest"><strong>And The Rest!</strong></h1>
<p>There’s a lot I haven’t covered here - from updates to Agentic Retrieval in Azure AI Search, to Agent Evaluators, and improvements to the Foundry API and SDK.</p>
<p>I’ve also noticed growing overlap with tools like Copilot Studio and Azure Logic Apps, which adds more capability - but also more complexity. It’s a lot to take in, but I hope this post has helped you cut through the noise and focus on what’s new and important in Azure AI Foundry.</p>
<p>The link below (click the image) will take you to another excellent write-up, and the <a target="_blank" href="https://news.microsoft.com/build-2025-book-of-news/"><strong>Build Book of News</strong></a> is another fantastic resource for exploring the full range of announcements.</p>
<p><a target="_blank" href="https://azure.microsoft.com/en-us/blog/azure-ai-foundry-your-ai-app-and-agent-factory/"><img src="https://azure.microsoft.com/en-us/blog/wp-content/uploads/2025/05/1042849_MS_Azure_Build-2025_BlogHeader-1_1260x708-1024x575.webp" alt="What's new in Azure AI Foundry text and Azure AI Foundry logo." /></a></p>
<p>If you’d prefer to see it all in action, I highly recommend the Build session: <a target="_blank" href="https://build.microsoft.com/en-US/sessions/BRK155?source=/schedule">Azure AI Foundry: The AI App and Agent Factory</a> - complete with demos of many of the updates mentioned here.  </p>
<hr />
<p><em>Disclaimer: The views expressed in this blog are my own and do not necessarily reflect those of my employer or Microsoft.</em></p>
]]></content:encoded></item><item><title><![CDATA[Azure AI Foundry: Why Use It?]]></title><description><![CDATA[Introduction
When I explore new tools and technologies, I tend to start with two questions: what is it, and why would I use it? Only once I’ve got a handle on those do I dive into the how.
In my previous post – Azure AI Foundry: What It Is & What It ...]]></description><link>https://daniel.mcloughlin.cloud/azure-ai-foundry-why-use-it</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/azure-ai-foundry-why-use-it</guid><category><![CDATA[Prompt Flow]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[AI]]></category><category><![CDATA[Azure AI Foundry]]></category><category><![CDATA[#microsoft-azure]]></category><category><![CDATA[#responsibleai]]></category><category><![CDATA[genai]]></category><category><![CDATA[generative ai]]></category><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Thu, 15 May 2025 19:13:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747305662597/9fe2efe8-1e9d-4cc9-b827-46afdfb02299.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>When I explore new tools and technologies, I tend to start with two questions: <em>what is it</em>, and <em>why would I use it?</em> Only once I’ve got a handle on those do I dive into the <em>how</em>.</p>
<p>In my previous post – <a target="_blank" href="https://daniel.mcloughlin.cloud/azure-ai-foundry-what-it-is-and-what-it-isnt"><em>Azure AI Foundry: What It Is &amp; What It Isn’t</em></a> – I covered the first of those questions. I unpacked what Azure AI Foundry actually is, what it isn’t, and how it fits into the broader Azure AI ecosystem.</p>
<p>This post focuses on the <em>why</em>.</p>
<p>Usually, this is the easy part. With traditional Azure services like Web Apps or Storage Accounts, it’s simple enough to connect features to use-cases: autoscaling for traffic spikes, geo-redundancy for disaster recovery, that sort of thing.</p>
<p>But Azure AI Foundry doesn’t work like that – and that’s where it gets interesting.</p>
<p>Foundry isn’t a single service. It’s not a standalone resource. It’s a platform – a combination of services, tooling, and governance layers designed to help you build, customise, deploy, monitor, and govern AI solutions in one place.</p>
<p>So if I’m going to explain why you’d use it, I need to talk about the platform as a whole and the building blocks within it – because that’s where the real value is.</p>
<p>Azure AI Foundry as a platform is structured into three core stages:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747334951031/310952cf-b022-49c2-92a5-b3f962d1ca60.png" alt class="image--center mx-auto" /></p>
<p><em>Note how each of these stages serves a purpose in the journey from idea to production!</em></p>
<p>In this post, I’m going to explore the use-cases for Azure AI Foundry by walking through each stage in turn.</p>
<p>Once you know why you’d make use of each stage, you’ll start to see the real power of AI Foundry as a wider platform.</p>
<hr />
<h1 id="heading-define-and-explore">Define and Explore</h1>
<p>Before you build anything, you need to choose the right model – and that’s where Azure AI Foundry really sets itself apart, thanks to two key capabilities: the <strong>Model Catalogue</strong> and the <strong>Playground</strong>.</p>
<h2 id="heading-model-catalogue">Model Catalogue</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747320514890/26730830-e164-4c37-90a3-3b1c262f0d76.png" alt class="image--center mx-auto" /></p>
<p>At the time of writing, the catalogue includes 1,987 models! They come from OpenAI, Meta, Mistral, DeepSeek, and others – all accessible in one place through a unified interface.</p>
<p>The real benefit is that you don’t need to integrate with third parties or set up separate accounts and environments to use non-Microsoft models. You can also search, filter, and rank options against metrics such as quality, cost, and latency – helping you find the right fit without guesswork or hours of R&amp;D.</p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p>Access top models across providers without leaving Azure</p>
</li>
<li><p>Use leaderboards to evaluate quality, cost-efficiency, and throughput</p>
</li>
<li><p>Avoid vendor lock-in with consistent interfaces for every model</p>
</li>
</ul>
<p><strong>Example:</strong><br />You’re building a chatbot for customer support. You need something that can:</p>
<ul>
<li><p>Respond conversationally and accurately</p>
</li>
<li><p>Return answers quickly (throughput)</p>
</li>
<li><p>Stay within budget</p>
</li>
</ul>
<p>Using the catalogue, you can shortlist models, compare them side-by-side, and pick one based on actual performance data – not just a brand name.</p>
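<p>Conceptually, that comparison is just filtering and ranking on the metrics you care about. A toy sketch with made-up model names and numbers:</p>

```python
# Hypothetical sketch of shortlisting models against your own criteria.
# The model names and metric values are made up for illustration.
models = [
    {"name": "model-a", "quality": 0.91, "cost": 8.0, "latency_ms": 900},
    {"name": "model-b", "quality": 0.87, "cost": 2.0, "latency_ms": 300},
    {"name": "model-c", "quality": 0.80, "cost": 0.5, "latency_ms": 150},
]

# Keep models that meet the quality bar and respond quickly enough,
# then take the cheapest of what's left.
shortlist = [m for m in models if m["quality"] >= 0.85 and m["latency_ms"] <= 500]
best = min(shortlist, key=lambda m: m["cost"])
```

<p>The catalogue’s leaderboards do this filtering for you, across real benchmark data rather than invented numbers.</p>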
<h2 id="heading-playground">Playground</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747320604785/d6ae28c4-6df4-4694-93fa-c0aed31f4785.png" alt class="image--center mx-auto" /></p>
<p>Think of the Playground as your safe space to get hands-on – whether you're working with text, speech, vision, or something more specialised.</p>
<p>And it’s not just for chatbots. There are dedicated playgrounds for:</p>
<ul>
<li><p><strong>Agents</strong> – build AI agents grounded in your enterprise data that can act independently</p>
</li>
<li><p><strong>Audio</strong> – test native speech-to-speech and voice-based experiences</p>
</li>
<li><p><strong>Images</strong> – generate visuals from text prompts using models like DALL·E</p>
</li>
<li><p><strong>Speech, Language &amp; Translation</strong> – summarise text, answer questions, or translate between languages</p>
</li>
</ul>
<p>Once you’ve picked a model, you can start interacting with it – trying prompts, tuning system instructions, uploading data, and refining the experience – all within the same portal.</p>
<p><strong>Why it matters:</strong></p>
<ul>
<li><p>Everything happens in one place – no infrastructure to spin up</p>
</li>
<li><p>Supports a wide range of AI tasks, not just conversational use-cases</p>
</li>
<li><p>You can test grounding, flow, tone, latency, and more</p>
</li>
</ul>
<p><strong>Example:</strong><br />You’re building an insurance claims assistant. In the Playground, you can test whether a model can understand policy language, summarise complaints, and respond appropriately – all before writing any code.</p>
<blockquote>
<p>There’s a clear pattern forming here: pick a model, try it out, then…</p>
</blockquote>
<hr />
<h1 id="heading-build-and-customise">Build and Customise</h1>
<p>Once you've identified the right model, Foundry provides everything you need to shape it into a working solution. This stage is where ideas become applications.</p>
<h2 id="heading-agents">Agents</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747321104498/94aad3fd-cbf2-4126-be7c-9ae02d37cc17.png" alt class="image--center mx-auto" /></p>
<p>If you want AI that does more than just respond to prompts – like fetching data, making decisions, or calling APIs – Agents give you that flexibility within a structured workflow.</p>
<p><strong>Why it matters:</strong><br />Great for building multi-step processes that need real-time decisions and data handling.</p>
<p><strong>Example:</strong><br />A user asks a chatbot, “Where’s my order?” Behind the scenes, agents could:</p>
<ul>
<li><p>Call an API to get the status</p>
</li>
<li><p>Determine if a refund is needed</p>
</li>
<li><p>Respond with a personalised update – all automatically</p>
</li>
</ul>
<h2 id="heading-templates">Templates</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747321165301/e093dc8d-8274-48c0-9dea-bf58dc54b05b.png" alt class="image--center mx-auto" /></p>
<p>Templates are code-first samples hosted on GitHub, with pre-written instructions and scripts ready to deploy locally, in containers, or GitHub Codespaces.</p>
<p><strong>Why it matters:</strong><br />They help you go from experiment to pro-code implementation fast – especially when you want to integrate a model into a real application.</p>
<p><strong>Example:</strong><br />You want to quickly test a multi-agent use-case that combines order tracking with sentiment analysis. Instead of starting from scratch, you deploy a GitHub-hosted template that wires everything together with minimal setup.</p>
<h2 id="heading-fine-tuning">Fine-tuning</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747321295155/bfc1deb8-5e95-4c52-907c-a7416d9feaf7.png" alt class="image--center mx-auto" /></p>
<p>Fine-tuning allows you to retrain a model on your own data so it adapts to your domain. This is different from grounding, which connects a model to external data at runtime without changing how the model is trained.</p>
<p>Think of it like this:</p>
<ul>
<li><p><strong>Grounding</strong> is real-time – you point the model to fresh data each time it’s used</p>
</li>
<li><p><strong>Fine-tuning</strong> is permanent – you change how the model behaves based on your own examples</p>
</li>
</ul>
<p><strong>Why it matters:</strong><br />Fine-tuning lets you customise a model to better reflect your domain, tone, and terminology - so it can generate responses that are more accurate, aligned with your brand, and tailored to your users.</p>
<p><strong>Example:</strong><br />You run a financial services chatbot. By fine-tuning the model on your internal policies and product names, it gives more accurate and compliant responses without needing long prompts every time.</p>
<h2 id="heading-content-understanding">Content Understanding</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747321525881/7e37e1aa-df66-4b99-aef0-5fc6d6876968.png" alt class="image--center mx-auto" /></p>
<p>Content Understanding uses AI and services like Azure AI Search to extract meaningful, structured, machine-readable data from unstructured sources – such as text, images, audio, or documents.</p>
<p><strong>Why it matters:</strong><br />Perfect for automating document processing, turning messy data into searchable content, and helping AI find the right information to generate accurate responses.</p>
<p><strong>Example:</strong><br />You upload scanned contracts, meeting transcripts, and product datasheets. Content Understanding processes these files, extracts key details like clauses, deadlines, and names, and makes them searchable – ready to power chatbots or automated review systems.</p>
<h2 id="heading-prompt-flow">Prompt Flow</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747321601090/6e60bad8-4093-4389-b8e7-768dca7cf35e.png" alt class="image--center mx-auto" /></p>
<p>Prompt Flow is a visual tool for designing and testing AI workflows by combining models, logic, and data into repeatable steps.</p>
<p><strong>Why it matters:</strong><br />Prompt Flow gives you a faster, more reliable way to turn prompt experiments into real applications. Instead of managing scattered scripts and logic, you can design, test, and iterate on AI workflows in one place — with visibility, version control, and deployability built in.  </p>
<p><strong>Example:</strong><br />You build a flow that takes customer feedback, summarises it using a model, runs sentiment analysis via a Python script, and stores the result in a database – all within a single visual workflow.</p>
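<p>Conceptually, a flow is a chain of named steps with data passed between them. A toy sketch of that example (plain Python with stand-in logic, not Prompt Flow’s actual format):</p>

```python
# Toy sketch of a flow: each function is one step, and the flow wires them
# together into a repeatable pipeline. The logic is a stand-in for model calls.
def summarise(feedback: str) -> str:
    return feedback[:40]        # stand-in for a summarisation model

def sentiment(text: str) -> str:
    return "negative" if "slow" in text else "positive"   # stand-in analysis

def flow(feedback: str) -> dict:
    summary = summarise(feedback)
    return {"summary": summary, "sentiment": sentiment(summary)}

result = flow("Checkout was slow on mobile yesterday")
```

<p>Prompt Flow’s contribution is making each step visible, versioned, and testable rather than buried in scripts like this.</p>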
<blockquote>
<p>At this stage, you’ve picked a model, tested it, and now built it into something usable. But before it’s truly production ready, there’s more to consider…</p>
</blockquote>
<hr />
<h1 id="heading-assess-and-improve">Assess and Improve</h1>
<p>Building an AI application is one thing – keeping it reliable, safe, and effective is another. Foundry supports this with tracing, evaluation, and Responsible AI tooling.</p>
<h2 id="heading-tracing">Tracing</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747322198941/38f14ef5-dfbe-485a-a6dd-92a118494ea8.png" alt class="image--center mx-auto" /></p>
<p>Tracing gives you detailed visibility into how your AI application runs behind the scenes. It captures each step in a prompt flow – including model calls, API interactions, function executions, and data handling – and presents it as a visual timeline using Azure Application Insights.</p>
<p><strong>Why it matters:</strong><br />Tracing helps you understand exactly how your AI flow behaves – which is critical when diagnosing issues, identifying performance bottlenecks, or validating that your logic is working as expected. It saves hours of guesswork and gives you confidence that your app will behave reliably in production.</p>
<p><strong>Example:</strong><br />You’ve built a multi-step chatbot that fetches data from several APIs. A user reports slow responses. Using tracing, you discover a third-party API call is consistently delaying the flow – allowing you to pinpoint and fix the issue quickly.</p>
<h2 id="heading-evaluation">Evaluation</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747322410835/3eba5877-9506-418e-ac09-7e21b8fc88cf.png" alt class="image--center mx-auto" /></p>
<p>Foundry's evaluation tooling lets you measure how well your application is performing using manual or automated evaluators. These are designed to test the quality, safety, and reliability of the outputs generated by your models, datasets, or prompt flows.</p>
<p>You can choose from Microsoft-curated evaluators like:</p>
<ul>
<li><p><strong>Groundedness Evaluator</strong> – checks how well the response stays tied to source data</p>
</li>
<li><p><strong>Toxicity and Violence Evaluators</strong> – flag unsafe content</p>
</li>
<li><p><strong>Relevance, Similarity, and Retrieval Evaluators</strong> – assess response quality</p>
</li>
<li><p><strong>Hate-and-Unfairness Evaluator</strong> – monitors bias and discrimination</p>
</li>
<li><p><strong>Indirect Attack Evaluator</strong> – part of red-teaming scenarios</p>
</li>
</ul>
<p>All evaluators can be run manually, automated in CI/CD pipelines, or used during development to fine-tune quality before release.</p>
<p><strong>Why it matters:</strong><br />Evaluation helps you build trustworthy AI. It gives you clear metrics for performance and safety, helps you choose the best model or prompt for your scenario, and ensures consistent quality as your application evolves.</p>
<p><strong>Example:</strong><br />You're testing two prompt flows for summarising customer emails. By using the Relevance and Groundedness evaluators, you can see which one produces more accurate summaries, and automatically flag hallucinated content – before your app ever goes live.</p>
<h2 id="heading-safety-and-security">Safety and Security</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1747324566711/8900eb29-de59-4042-93be-b4bf81c31e2c.png" alt class="image--center mx-auto" /></p>
<p>Foundry includes built-in tools to help you deliver AI responsibly – by identifying, measuring, and managing risk throughout your app’s lifecycle.</p>
<p>The process is structured around three key stages:</p>
<ul>
<li><p><strong>Map</strong> potential issues like misuse, bias, or unsafe content using red-teaming and testing</p>
</li>
<li><p><strong>Measure</strong> how frequently risks occur and how severe they are, using test datasets and evaluation metrics</p>
</li>
<li><p><strong>Manage</strong> risk in production with layered mitigation plans – including filters, grounding, system prompts, and operational monitoring</p>
</li>
</ul>
<p><strong>Why it matters:</strong><br />When you're shipping AI into the real world, trust is everything. Foundry helps you catch issues early, prove your system is safe, and stay ahead of regulatory or reputational risks – without having to build your own safety stack from scratch.</p>
<p><strong>Example:</strong><br />You're about to launch a chatbot that handles customer queries. With Foundry, you can simulate harmful prompts, evaluate how the model responds, and configure output filters and safety layers to prevent misuse – all before your users ever see it.</p>
<blockquote>
<p>There we go – you’ve picked a model, tested it, built it into something usable, and now you’ve embedded it into your application, knowing you can keep a close eye on it.</p>
</blockquote>
<hr />
<h1 id="heading-conclusion">Conclusion</h1>
<p>Let’s imagine for a moment that Azure AI Foundry didn’t exist. You’d be evaluating models from multiple providers, stitching together tools, and figuring out how to maintain visibility, security, and compliance at every step. Even spinning up a simple AI-enabled app would be time-consuming and complex.</p>
<p>Azure AI Foundry changes that. It brings everything into one unified platform – so you can move faster, stay consistent, and go from prototype to production with the guardrails already built in.</p>
<p>That’s the real value of Foundry: the heavy lifting is done for you. Everything’s under one roof and deeply integrated with the Azure ecosystem. You get consistency, speed, and security – so you can focus on building, not bolting things together.</p>
<hr />
<p><em>Disclaimer: The views expressed in this blog are my own and do not necessarily reflect those of my employer or Microsoft.</em></p>
]]></content:encoded></item><item><title><![CDATA[🧠 Dan’s AI Terminology Tracker]]></title><description><![CDATA[This past weekend - in between kids’ birthday parties, cake, and chaos - I pulled together something I’ve been meaning to build for a while:
🎯 A visual, open source AI terminology tracker - designed to help make sense of the rapidly evolving languag...]]></description><link>https://daniel.mcloughlin.cloud/dans-ai-terminology-tracker</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/dans-ai-terminology-tracker</guid><category><![CDATA[AI]]></category><category><![CDATA[ML]]></category><category><![CDATA[Azure]]></category><category><![CDATA[Microsoft]]></category><category><![CDATA[opensource]]></category><category><![CDATA[Open Source]]></category><category><![CDATA[terminology]]></category><category><![CDATA[cloud terminology]]></category><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Mon, 07 Apr 2025 11:35:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1744025869583/76cd0bd9-4381-4171-bdee-48c950a728a2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>This past weekend - in between kids’ birthday parties, cake, and chaos - I pulled together something I’ve been meaning to build for a while:</p>
<p>🎯 A <strong>visual, open source AI terminology tracker</strong> - designed to help make sense of the rapidly evolving language around AI, ML, Generative AI, and the Microsoft AI ecosystem.</p>
<p>As someone diving deep into <strong>Azure AI Foundry</strong>, I kept running into scattered definitions, mismatched documentation, and “what does that mean again?” moments. So I did what we do - I built a tool.</p>
<hr />
<h2 id="heading-what-it-is">🗺️ What It Is</h2>
<p>The tracker is a <strong>Markmap-powered mind map</strong> that organizes:</p>
<ul>
<li><p>AI/ML learning paradigms</p>
</li>
<li><p>LLM and Generative AI concepts</p>
</li>
<li><p>MLOps and GenAIOps workflows</p>
</li>
<li><p>Data pipelines and infrastructure</p>
</li>
<li><p>Microsoft-specific tools like Azure OpenAI, Prompt Flow, AI Studio, and more</p>
</li>
<li><p>Ethics, governance, and emerging regulations like the EU AI Act</p>
</li>
</ul>
<p>It’s designed to be <strong>clickable, skimmable, and useful</strong> whether you’re a beginner or deep in the Azure AI space.</p>
<hr />
<h2 id="heading-explore-it">🔗 Explore It</h2>
<ul>
<li><p><strong>Live Map:</strong> <a target="_blank" href="https://ai-terms.daniel.mcloughlin.cloud">ai-terms.daniel.mcloughlin.cloud</a></p>
</li>
<li><p><strong>GitHub Repo (open to contributions):</strong> <a target="_blank" href="https://github.com/clouddevdan/dans-ai-terminology-tracker">github.com/clouddevdan/dans-ai-terminology-tracker</a></p>
</li>
</ul>
<blockquote>
<p>💬 Got a term I’ve missed? A better link? Open a PR or raise an issue - it’s a living resource.</p>
</blockquote>
<hr />
<h2 id="heading-a-few-notes">🚧 A Few Notes</h2>
<ul>
<li><p>It’s still a <strong>work in progress</strong> — expect regular updates</p>
</li>
<li><p>It leans toward the <strong>Microsoft ecosystem</strong> — by design</p>
</li>
<li><p>Built with ❤️, Markdown, Markmap, and the 30 minutes between party clean-up and bedtime</p>
</li>
</ul>
<hr />
<p>Thanks for checking it out - and if it helps clarify something along your AI learning journey, I’d love to hear about it!</p>
]]></content:encoded></item><item><title><![CDATA[Azure AI Foundry: What It Is & What It Isn't]]></title><description><![CDATA[Introduction
The title of this post probably gives the game away, but why write it in the first place? Azure AI Foundry sounds impressive, but isn’t it just a rebrand of Azure AI Services? And wasn’t that just a rebrand of Azure Cognitive Services? O...]]></description><link>https://daniel.mcloughlin.cloud/azure-ai-foundry-what-it-is-and-what-it-isnt</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/azure-ai-foundry-what-it-is-and-what-it-isnt</guid><category><![CDATA[Microsoft]]></category><category><![CDATA[AI]]></category><category><![CDATA[Azure]]></category><category><![CDATA[azure ai services]]></category><category><![CDATA[Azure AI Foundry]]></category><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Thu, 03 Apr 2025 13:03:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743666241500/cca0c4d5-1b7b-4a10-b21f-909798056985.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h1 id="heading-introduction">Introduction</h1>
<p>The title of this post probably gives the game away, but why write it in the first place? Azure AI Foundry sounds impressive, but isn’t it just a rebrand of Azure AI Services? And wasn’t that just a rebrand of Azure Cognitive Services? Or maybe it’s all just a frontend for some advanced OpenAI stuff? Or is it the umbrella for everything AI-related in Azure now? Where does Copilot fit into all of this?!</p>
<p>See my point.</p>
<p>The name is great, but the purpose isn’t immediately obvious - and when something isn’t immediately clear, I find it helpful to break it down. So, in this post, I’ll do my best to unpick what Azure AI Foundry <em>is</em>, what it <em>is not</em>, and how it fits into the wider Azure AI story.</p>
<h1 id="heading-a-brief-history">A Brief History</h1>
<p>When I think of Azure AI, the first thing that comes to mind is Azure Cognitive Services. From what I remember, that was essentially a wrapper - a way to bundle up existing services like Document Intelligence, Speech Services, and others under a single label.</p>
<p>Then came Azure AI Services, which, at a glance, looked like another round of consolidation and rebranding.</p>
<p>Things started to shift with the introduction of Azure OpenAI, and more recently Azure AI Studio, which introduced a more unified space to interact with multiple services in one place.</p>
<p>Now we have Azure AI Foundry, which looks like the next iteration of that effort - with a clearer focus on generative AI, as well as orchestration.</p>
<p><strong>Quick timeline to give this some context:</strong></p>
<ul>
<li><p><strong>2015 (Apr)</strong>: Project Oxford launched (origin of Microsoft Cognitive Services).</p>
</li>
<li><p><strong>2016 (Mar)</strong>: Rebranded as Microsoft Cognitive Services.</p>
</li>
<li><p><strong>2017 (Apr)</strong>: Cognitive Services reached general availability.</p>
</li>
<li><p><strong>2017</strong>: Azure Machine Learning launched for model training and deployment.</p>
</li>
<li><p><strong>2019–2020</strong>: Expanded enterprise AI with MLOps and responsible AI tools.</p>
</li>
<li><p><strong>2023 (Jan)</strong>: Microsoft expanded its OpenAI partnership with a major investment.</p>
</li>
<li><p><strong>2023</strong>: Azure OpenAI Service launched (GPT, DALL·E, Codex via API).</p>
</li>
<li><p><strong>2023 (Nov)</strong>: Azure AI Studio launched at Ignite as a unified GenAI platform.</p>
</li>
<li><p><strong>2024 (Mar)</strong>: Azure AI Foundry announced for building and scaling GenAI solutions.</p>
</li>
<li><p><strong>2024 (Nov)</strong>: Azure AI Foundry launched, unifying Microsoft’s enterprise GenAI stack.</p>
</li>
</ul>
<p><em>I used ChatGPT for this, so please forgive any inaccuracies.</em></p>
<h1 id="heading-azure-ai-foundry-what-is-it">Azure AI Foundry - What Is It?</h1>
<p>In a nutshell, Azure AI Foundry is more than just a wrapper containing lots of related services - <strong>it’s a platform</strong>.</p>
<p>Under a unified interface, Azure AI Foundry:</p>
<ul>
<li><p>Brings together what used to be known as Cognitive Services (Document Intelligence, Vision, Language, Translator, Speech), Azure OpenAI, Azure AI Search, and more.</p>
</li>
<li><p>Gives access to a huge model catalogue, far beyond just Azure OpenAI models. You can compare models based on features and capabilities.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743594604874/366ab1a5-f781-421d-ac3b-b9c7406f9339.png" alt="Azure AI Foundry Model catalogue" class="image--center mx-auto" /></p>
<ul>
<li><p>Lets you deploy models directly from the catalogue (subject to availability), generating API endpoints along with sample code to get started.</p>
</li>
<li><p>Includes tools for model evaluation, helping you measure and compare model performance. (Model training also appears to be supported - I'll dig into the why and how in a future post.)</p>
</li>
<li><p>Provides a hands-on "playground" environment for experimentation, covering things like image generation and real-time audio. You can link your playground work to models from the catalogue.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743594673467/dbb541dc-fe05-4669-b1cb-18535348f234.png" alt="AI Playgrounds" class="image--center mx-auto" /></p>
<ul>
<li><p>Supports custom data connections for building more tailored applications.</p>
</li>
<li><p>Enables additional capabilities like building Agents, using Templates, fine-tuning models, and creating prompt-flows. (I’ll explore these more in future posts as I learn.)</p>
</li>
<li><p>Includes a dedicated set of features for safety and security - this is a big focus within Foundry and deserves its own section, which I plan to explore in more detail as I get hands-on.</p>
</li>
</ul>
<p>When evaluating new products and services like this, I try to take a holistic view - looking beyond just the features and capabilities. Azure AI Foundry isn’t a resource in the traditional sense. It’s not like a virtual machine or web app that you simply deploy and start using. Instead, it’s a platform within a platform, with its own self-contained ecosystem. It brings together a wide range of powerful tools and capabilities, which are impressive on their own—but the real value emerges when you start stitching them together and integrating them with the wider Azure and Microsoft ecosystem.</p>
<p>For example, you might use Azure AI Foundry to prototype, validate, and test your AI workflows, incorporating Responsible AI checks along the way. Then, as you transition from prototype to production, you’d connect your work to other Azure services such as Entra ID, Azure Monitor, AKS, API Management, and more—unlocking end-to-end scalability, governance, and operational readiness.</p>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/ai-foundry/concepts/evaluation-approach-gen-ai"><img src="https://learn.microsoft.com/en-us/azure/ai-foundry/media/evaluations/lifecycle.png#lightbox" alt="Diagram of enterprise GenAIOps lifecycle, showing model selection, building an AI application, and operationalizing." /></a></p>
<p>This diagram illustrates the GenAIOps lifecycle that underpins the journey I’ve just tried to describe - showing how Azure AI Foundry supports the transition from experimentation to production.</p>
<p>In short: it’s actually really hard to describe! There’s a lot to it, and in all honesty, I’m finding the portal UI a bit overwhelming. I’ll break that down into manageable chunks as this series progresses.</p>
<h1 id="heading-azure-ai-foundry-what-it-isnt">Azure AI Foundry - What It Isn’t!</h1>
<p>First off, at the time of writing at least, Azure AI Foundry is not something you have to use in order to consume Azure AI services. You can still deploy many of these services independently—for example, setting up Document Intelligence or Azure OpenAI as standalone resources.</p>
<p>Why might you want to do that? Well, deploying services independently may give you more flexibility, especially if you're only using a specific capability and don’t need the full orchestration layer that Foundry provides. It might also be preferable for production environments where you already have infrastructure and pipelines set up, or where resource-level access and governance are tightly controlled.</p>
<p>Second, Azure AI Foundry is not everything AI-related in Azure. Services like:</p>
<ul>
<li><p>Azure Machine Learning (for model training and ML workflows)</p>
</li>
<li><p>Azure Databricks (for collaborative data science and big data analytics)</p>
</li>
<li><p>Azure Synapse Analytics and Microsoft Fabric (for end-to-end analytics solutions)</p>
</li>
</ul>
<p>...are not natively part of Foundry.</p>
<p>That said, they can be integrated where needed. Foundry plays nicely with the rest of the Azure ecosystem, so you can still connect to data, embed workflows, or trigger interactions across services like Fabric or Logic Apps.</p>
<h1 id="heading-github-marketplace">GitHub Marketplace</h1>
<p>As with most things in IT, there’s more than one way to do things!</p>
<p>GitHub also offers a model catalogue and playground through the <a target="_blank" href="https://github.com/marketplace"><strong>GitHub Marketplace</strong></a>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743683672232/30c4232d-6360-4e64-aeee-8f0de5d5ab5d.png" alt="GitHub Marketplace" class="image--center mx-auto" /></p>
<p>Unlike Foundry, where you have to deploy a model to use it, the GitHub Marketplace allows you to use hosted models directly without any setup. You can even compare models side-by-side, much like you can in Foundry. So what’s the catch? GitHub is free to use, but comes with rate limits, making it better suited for experimentation and quick testing than anything more meaningful.</p>
<h1 id="heading-azure-ai-foundry-vs-microsoft-copilot">Azure AI Foundry vs Microsoft Copilot</h1>
<p>Calm down! Calm down! I’m not going to get into the whole “Pro-code vs Low-code” debate here. Not in this blog post, anyway!</p>
<p>I feel the need to call this out as “Microsoft AI” isn’t just one thing, or even one suite of things. There are different tools for different jobs.</p>
<p>Azure AI Foundry is built for developers and technical teams who want to create tailored AI experiences. It’s highly configurable, gives access to a broad set of models, and is designed for integration with other Azure services.</p>
<p>Copilot, on the other hand, is a ready-made tool aimed at end-users. It’s built into Microsoft 365 apps like Word, Excel, and Dynamics, and is meant to enhance productivity out of the box. It’s low-code (or no-code), and while it can be extended and customised to some extent, it doesn’t offer the same depth or flexibility as Foundry.</p>
<p>I’ll be digging deeper into this comparison in a dedicated blog post, where I’ll explore use cases, integration approaches, and how to decide which makes sense for your project.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>Azure AI Foundry isn’t a replacement for everything else in Azure’s AI and data stack - and it doesn’t try to be. Instead, it offers a structured, developer-friendly way to experiment with and build generative AI solutions using Microsoft’s growing set of pre-trained services.</p>
<p>As someone learning this in real time, I’m using this series to make sense of the evolving landscape. If you’re on a similar path, hopefully this helps clarify where Azure AI Foundry fits in - and where it doesn’t.</p>
<hr />
<p><em>Disclaimer: The views expressed in this blog are my own and do not necessarily reflect those of my employer or Microsoft. AI tools were used to assist with structure, spelling, and grammar — not for content authoring or ideas.</em></p>
]]></content:encoded></item><item><title><![CDATA[Beginner To Builder with Azure AI Foundry]]></title><description><![CDATA[Does anyone else feel utterly overwhelmed by the concept of learning generative AI, or is it just me?!
I love the variety you get when working in tech, but the rate of change can be hard to stay on top of - and when it comes to GenAI, it feels like i...]]></description><link>https://daniel.mcloughlin.cloud/beginner-to-builder-with-azure-ai-foundry</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/beginner-to-builder-with-azure-ai-foundry</guid><category><![CDATA[Azure]]></category><category><![CDATA[genai]]></category><category><![CDATA[foundry]]></category><category><![CDATA[AI]]></category><category><![CDATA[Microsoft]]></category><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Mon, 31 Mar 2025 16:22:38 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743431675206/797a0517-7112-4d0c-84cf-50cbdf127c1a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Does anyone else feel utterly overwhelmed by the concept of learning generative AI, or is it just me?!</p>
<p>I love the variety you get when working in tech, but the rate of change can be hard to stay on top of - and when it comes to GenAI, it feels like it’s flying past me at a rate of knots.</p>
<p>I like to think I’m a dab hand with ChatGPT and Copilot, and I’ve done my fair share of prompting, but these days I can’t open LinkedIn without being bombarded by posts and images on topics like Agentic AI, MCP, and don’t even get me started on the latest image generation features from OpenAI…</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1743437422848/9b45679e-a737-472c-b83c-6a9b8ae76e4c.png" alt="A surprised bald man with glasses and a beard holds his head in shock, standing in front of a whiteboard that reads &quot;Learning AI is hard.&quot;" class="image--center mx-auto" /></p>
<p><em>👆 Yes - I made the image above using ChatGPT. Sorry, not sorry.</em></p>
<p>I’m taking it upon myself to go deeper into the world of AI, but I know I need to narrow the scope to avoid being overwhelmed. I also want to build on the skills and experiences I already have with Microsoft Azure. That way, I’m laying foundations on solid ground - not starting from absolute zero.</p>
<p>I’ve got a rough idea of where to begin (shoutout to the <a target="_blank" href="https://learn.microsoft.com/en-us/credentials/certifications/azure-ai-fundamentals/?practice-assessment-type=certification">Azure AI Fundamentals certification on Microsoft Learn</a>), but I find even that content fairly broad. And I know from experience: I learn best by doing.</p>
<h3 id="heading-why-azure-ai-foundry">Why Azure AI Foundry?</h3>
<p>That’s why I’ve decided to anchor my learning on <strong>Azure AI Foundry</strong> - using the Fundamentals content as background reading, and getting hands-on within a self-contained platform that lets me move from beginner to builder, one step at a time.</p>
<p>I’ve chosen this platform because it offers a safe, structured space to explore GenAI specifically within the Azure ecosystem - my area of expertise - without the chaos of piecing everything together from scratch.</p>
<p>Azure AI Foundry brings together the key elements needed to build, customise, deploy, and scale GenAI-powered applications - all in one place. For someone like me, looking to explore real-world use cases without getting lost in the weeds, that’s a huge advantage.</p>
<h3 id="heading-what-to-expect-from-this-series">What to Expect from This Series</h3>
<p>I’m calling this series <strong>Beginner to Builder with Azure AI Foundry</strong>, because - quite literally - that’s the journey I’m on. I’ll be learning in public, documenting my findings, and sharing them in a way that helps others who might be in a similar place.</p>
<p>Expect posts that:</p>
<ul>
<li><p>Break down AI Foundry into digestible, logical chunks</p>
</li>
<li><p>Explore strategic planning, customisation, optimisation, and operational considerations</p>
</li>
<li><p>Reflect on real-world applicability, not just technical possibilities</p>
</li>
<li><p>Speak to those who are curious but unsure where to start</p>
</li>
</ul>
<p>If you’re keen to explore GenAI from a <strong>practical, Azure-first perspective</strong> - without trying to absorb everything at once - this series is for you.</p>
<p>First post dropping soon - keep an eye out!</p>
<hr />
<p><em>Disclaimer: The views expressed in this blog are my own and do not necessarily reflect those of my employer or Microsoft. AI tools were used to assist with structure, spelling, and grammar — not for content authoring or ideas.</em></p>
]]></content:encoded></item><item><title><![CDATA[Archive: Microsoft Azure AI Landing Zones]]></title><description><![CDATA[Note: This article has been archived.

To say it’s a hot topic at the moment would be the understatement of the year!
One need only look at Microsoft’s announcements from Build and Inspire this year to understand the scale of their investment into AI...]]></description><link>https://daniel.mcloughlin.cloud/microsoft-azure-ai-landing-zones</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/microsoft-azure-ai-landing-zones</guid><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Tue, 08 Aug 2023 12:15:25 GMT</pubDate><content:encoded><![CDATA[<p>Note: This article has been archived.</p>
<hr />
<p>To say it’s a hot topic at the moment would be the understatement of the year!</p>
<p>One need only look at Microsoft’s announcements from Build and Inspire this year to understand the scale of their investment into AI. They’ve not only launched new AI-driven products such as the Azure OpenAI Service and Bing Chat Enterprise – they’ve been baking it into what feels like their entire product set through the likes of Microsoft 365 Copilot, Microsoft Sales Copilot, GitHub Copilot and more. Microsoft has even partnered with Meta (the owners of Facebook et al.) to invest in Llama 2, a large language model (LLM) akin to ChatGPT (albeit with different use-cases).</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1691484625855/f306578e-c2ef-4528-aa28-d68f1f1116ae.png" alt class="image--center mx-auto" /></p>
<p>Given the plethora of Microsoft marketing campaigns, announcements, updates, and initiatives such as the <a target="_blank" href="https://www.microsoft.com/en-us/cloudskillschallenge/ai/registration/2023">Microsoft Learn AI Skills Challenge</a>, it’s understandably easy for those of us of working in the Microsoft space to fall down the rabbit hole of AI and lose sight of the bigger picture.</p>
<p>Consider OpenAI’s ChatGPT for example. It’s become so well known, even my non-technical friends and family have asked me about it. Conversational AI such as ChatGPT, and other generative AI such as OpenAI’s DALL·E 2 and Codex only form part of a much wider AI offering.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1691484574130/3956bcbd-18db-4f4d-9847-8e80a5eff710.png" alt class="image--center mx-auto" /></p>
<p>Consider the recently renamed Azure AI services offering from Microsoft. Whilst ChatGPT may be stealing the limelight, incredibly powerful AI-based solutions such as Cognitive Search and Speech offer a wealth of capabilities in their own right. Also, let’s not forget Azure Machine Learning, which rather confusingly seems to have been classified outside the Azure AI services grouping.</p>
<p>The point I’m trying to make is that AI, particularly in Azure, is way more than just OpenAI based solutions like ChatGPT.</p>
<p>The bigger picture, however, does not stop there. What good is AI without the data that feeds it, or the infrastructure that hosts it? And what good are either of those things without security, and governance, and connectivity, and so on….</p>
<p>AI in Azure is not something deployed in isolation. Not for anything beyond a dev/test scenario, anyway.</p>
<p>Consider the below Azure Speech Services solution:</p>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/architecture/solution-ideas/articles/speech-services"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1691485001258/31a97c6b-ff3d-4a87-b52b-8fd7f036afc2.png" alt class="image--center mx-auto" /></a></p>
<p>In addition to the Azure Cognitive Services for Language and Speech, there is data in the form of blob storage, processing in the form of Function Apps, and presentation in the form of a Web App and Power BI. The solution comprises several individual components working together.</p>
<p>Now, consider the security and governance aspects of this solution: Is each resource accessible from the internet, or behind a Private Endpoint? Can anyone in your organisation get read access to the data, or is it restricted to specific users or groups? What if multiple instances (and variations thereof) exist, such as production, development and testing environments?</p>
<p>This is where <a target="_blank" href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/"><strong>Landing Zones</strong></a> come in.</p>
<p>Applied at either an application level (such as in the above scenario) or at a platform level (covering your entire Azure tenant and subscriptions), an Azure Landing Zone can be used to define and maintain key design principles, such as security and governance.</p>
<p>Our sample solution could be deployed over multiple Azure subscriptions - one per environment. The subscription can be used as a boundary for both cost and access. For example, the production environment is deployed to its own subscription. Access to that subscription is granted via Role Based Access Control (RBAC) only to those users/groups that need it, and Azure Policy is used to audit or even enforce rules, such as the regional locations to which resources can be deployed. You'd apply the same concept to a development environment, for example, where you'd set subscription-level budgets to prevent costs spiralling out of control, and use Azure Policy to prevent the deployment of highly expensive SKUs.</p>
<p>You can scale this concept out even further, with centralised resources such as a firewall and VPN gateway providing a hub-and-spoke network topology, and so on.</p>
<p>You'd typically define your platform and application Landing Zones as infrastructure-as-code (IaC) such as Terraform or Bicep, maintain the code within source control, such as GitHub or Azure Repos, and deploy them via pipelines, such as GitHub Actions or Azure Pipelines. Doing so can ensure consistency, compliance and the application of best practice.</p>
<p>So how does this tie back to AI again?!</p>
<p>A couple of weeks ago, Microsoft released a <a target="_blank" href="https://techcommunity.microsoft.com/t5/azure-architecture-blog/azure-openai-landing-zone-reference-architecture/ba-p/3882102?WT.mc_id=academic-0000-abartolo">reference architecture for an Azure OpenAI Landing Zone</a>. Seen below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1691486470056/d4c493b7-4621-4183-941b-3c6ab44c5e76.png" alt class="image--center mx-auto" /></p>
<p>What you're looking at here appears to be nothing more than a slight extension to the Microsoft provided reference architecture for an <a target="_blank" href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/#azure-landing-zone-architecture">enterprise scale Azure landing zone</a>, pictured below:</p>
<p><a target="_blank" href="https://learn.microsoft.com/en-us/azure/cloud-adoption-framework/ready/landing-zone/#azure-landing-zone-architecture"><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1691487269474/311c0b44-2310-416c-9818-37e1706ab4d2.png" alt class="image--center mx-auto" /></a></p>
<p>Note that in Microsoft's AI Landing Zone example, they only reference OpenAI!</p>
<p>What Microsoft have shared with us is their best practice approach to integrating Azure AI resources within a wider context, such as an application or platform level Landing Zone. This includes the likes of Private Endpoints, Network Security Groups and Web Application Firewalls. They also address concepts such as load balancing, monitoring, and identity management.</p>
<p>Building out this solution (for example via IaC and pipelines) is way more complicated than it looks, especially with the complexities of private networking in Azure, but to me, this highlights the point I made earlier: AI in Azure doesn't exist in isolation.</p>
<p>Yes, it's new, shiny and very exciting. The possibilities are endless - and it's totally fine to get caught up in the hype! From an Azure perspective, however, implementing AI based resources needs to be as carefully considered as that of any other resource, such as a SQL Database or Virtual Machine. The AI itself is only as good (performant, secure, scalable, resilient etc.) as the infrastructure it's deployed on, and the environment it's deployed onto. Azure Landing Zones exist to help ensure just that.</p>
]]></content:encoded></item><item><title><![CDATA[Archive: Recovering From A Deleted Terraform State File]]></title><description><![CDATA[Note: This article has been archived.

Context: The article refers to Terraform and Microsoft Azure, but the underlying concepts would likely apply to any Terraform resource provider.

The "how" isn't important here (you'd need to buy me a drink to g...]]></description><link>https://daniel.mcloughlin.cloud/recovering-from-a-deleted-terraform-state-file</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/recovering-from-a-deleted-terraform-state-file</guid><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Mon, 17 Apr 2023 12:15:25 GMT</pubDate><content:encoded><![CDATA[<p>Note: This article has been archived.</p>
<hr />
<p>Context: <em>The article refers to Terraform and Microsoft Azure, but the underlying concepts would likely apply to any Terraform resource provider.</em></p>
<hr />
<p>The "how" isn't important here (you'd need to buy me a drink to get it out of me!) - but what you need to know is this:</p>
<p>Late last year, I was faced with a situation where one of my customers attempted to do a Terraform deployment to their Production environment, only to find the Terraform state file was missing! Not only that, the entire Storage Account hosting the state file had been deleted, along with the Staging environment state file as well.</p>
<p>The Storage Account in question had 14-day soft delete enabled; however, this issue was discovered on the 15th day, and Microsoft confirmed there was no way to recover the data.</p>
<p>The Terraform state file had been <strong>permanently</strong> <strong>deleted</strong>, and the customer's Production and Staging deployments were now blocked.</p>
<p>The Terraform state files for both Production and Staging needed to be manually recreated.</p>
<p>This is the story of how we got out of this mess...</p>
<h1 id="heading-rebuilding-the-storage-account">Rebuilding The Storage Account</h1>
<p>The first thing I needed to do was create a new Storage Account in which to host the re-created state files.</p>
<p>It turns out the previous (now deleted) one had a very generic name (<code>xxxxshared</code>), so it really wasn't obvious what it was used for. It also turns out that it didn't have any resource locks to prevent accidental deletion, nor did it have any tags for further identification or metadata. The RBAC was somewhat suitable given the context, but clearly the data protection settings (14-day soft delete on blobs) were not.</p>
<p>The new Storage Account was given a name and tags by which it was VERY obvious what it was used for, and a resource lock was applied to prevent accidental deletion. The soft delete timeframe was extended, and snapshots were also enabled. The more data protection on this, the better!</p>
<h1 id="heading-rebuilding-the-state-file">Rebuilding The State File</h1>
<p>The problem with the state file being deleted is that when you run a <code>terraform plan</code>, Terraform thinks that everything needs to be created from scratch, as if it were deploying to a clean environment.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682518833929/52106178-1407-4152-b476-8798fd63c10b.png" alt class="image--center mx-auto" /></p>
<p>When you try to run a <code>terraform apply</code>, Terraform throws a massive wobbly, complaining that resources exist in Azure, but not in state - which, to be fair, is totally accurate.</p>
<pre><code>Error: A resource with the ID "/subscriptions/xxxxxx/resourceGroups/rsg-uks-xxxxxx" already exists - to be managed via Terraform this resource needs to be imported into the State. Please see the resource documentation for "azurerm_resource_group" for more information.
</code></pre>
<p>The way to fix that is to run the <code>terraform import</code> command, where you pass in the Terraform resource ID and the corresponding Azure resource ID.</p>
<p>Thomas Thornton has an excellent blog post on this, <a target="_blank" href="https://thomasthornton.cloud/2021/03/31/importing-terraform-state-in-azure/">here</a>.</p>
<pre><code class="lang-shell">terraform import azurerm_resource_group.xxxxxx /subscriptions/xxxxxx/resourceGroups/rsg-uks-xxxxxx
</code></pre>
<p>In itself, this is a fairly simple approach; however, this particular Terraform deployment was incredibly complex. It was partially modular, riddled with resource interdependencies, and it was huge - absolutely huge! For context, the state file for the development environment was 58,176 lines long! I don't recall the number of Azure resources that were deployed, but there were a lot of them, in one big Terraform deployment.</p>
<p>This presented a rather large challenge when it came to running the <code>terraform import</code> commands (which I would need to do per resource). Firstly, how on earth was I going to get a list of every single Terraform resource - and then, how would I get the corresponding Azure resource ID for each one? Given the size and complexity of the deployment, this was no small task.</p>
<p>My saving grace was that I was able to get hold of the pipeline logs from the previous deployment to Staging.</p>
<p>From there, I was able to extract the <code>terraform apply</code> output, and from that I could parse the file (using PowerShell and some manual tweaking) to come up with a list of both Terraform and Azure resource IDs. The output of this was a whopping big script of multiple <code>terraform import</code> commands that I could run to rebuild the Staging state file.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682585558384/26b7f3e1-a736-45f9-8d95-1b4be80e6ded.png" alt class="image--center mx-auto" /></p>
<p><em>For a reason I can no longer recall, the</em> <code>terraform apply</code> <em>output was all I could get access to at this time. I think it was down to the plan being output to file and saved as a build artifact, which was eventually removed.</em></p>
<p>With Production, however, I was not so lucky. The Staging deployment had been done recently, in readiness to eventually deploy to Production. The most recent Production deployment logs were long gone due to a retention policy.</p>
<p>This meant that Staging was a version ahead of Production, so there were infrastructure differences between the two environments - I couldn't just duplicate the Staging state file and do a search and replace on the subscription and resource names.</p>
<p>To get around this, I had to repeatedly run <code>terraform import</code> followed by <code>terraform plan</code> to capture the Terraform resource IDs, manually matching them to the Azure resource IDs and updating as I went for anything the plan output flagged as missing. Thankfully, several of the resource IDs could be pinched from the Staging work I did earlier, which helped me fill in the gaps. It was a painstaking process.</p>
<h1 id="heading-deployment-issues">Deployment Issues</h1>
<p>I discovered some resources simply could not be imported back into state.</p>
<p>One example of this was the use of randomly generated UUID strings for creating unique Storage Account names. The customer was using the <a target="_blank" href="https://registry.terraform.io/providers/hashicorp/random/latest/docs/resources/uuid">random_uuid</a> Terraform resource, such as the below:</p>
<pre><code class="lang-hcl">resource "random_uuid" "ruuid" {}

resource "azurerm_storage_account" "xxx" {
  name                      = substr(replace("xxx${random_uuid.ruuid.result}", "-", ""), 0, 23)
  resource_group_name       = var.rsg_name
  location                  = var.location_name
  account_tier              = "Standard"
  account_replication_type  = "LRS"
  enable_https_traffic_only = true
  min_tls_version           = "TLS1_2"
  tags                      = jsondecode(var.environment_tag)
  blob_properties {
    versioning_enabled = true
  }
}
</code></pre>
<p>With the above, the randomly generated UUID existed in state as its own entity. This would be something like <code>aabbccdd-eeff-0011-2233-445566778899</code>. The Storage Account name consisted of only part of this UUID - how much of it depended on the friendly name prepended to it. With the above example, the Storage Account would be called <code>xxxaabbccddeeff00112233</code>, but obviously this differed between Storage Accounts.</p>
<p>As I didn't know the original full UUIDs, there was no way I could import it back into state.</p>
<p>To get around this, I got a bit hacky: I took the UUID fragment from each existing Storage Account name and padded it back out into a full UUID with a shed load of arbitrary digits, being careful to add the hyphens in the right places. For example:</p>
<pre><code class="lang-json">terraform import random_uuid.ruuid aabbccdd-eeff-0011-2233-666666666666
</code></pre>
<p>Needless to say, it was a hacky faff, but it worked.</p>
<p>Another fairly critical issue I faced was with RBAC role assignments. For example:</p>
<pre><code class="lang-json">resource <span class="hljs-string">"azurerm_role_assignment"</span> <span class="hljs-string">"example"</span> {
  scope                = data.azurerm_subscription.primary.id
  role_definition_name = <span class="hljs-attr">"Reader"</span>
  principal_id         = data.azurerm_client_config.example.object_id
}
</code></pre>
<p>These were numerous and existed across multiple modules. I was able to get the Terraform resource IDs easily enough, but the corresponding Azure resource IDs proved to be a nightmare, as they exist as GUIDs.</p>
<p>For example:</p>
<pre><code class="lang-json">terraform import azurerm_role_assignment.example /subscriptions/<span class="hljs-number">00000000</span><span class="hljs-number">-0000</span><span class="hljs-number">-0000</span><span class="hljs-number">-0000</span><span class="hljs-number">-000000000000</span>/providers/Microsoft.Authorization/roleAssignments/<span class="hljs-number">00000000</span><span class="hljs-number">-0000</span><span class="hljs-number">-0000</span><span class="hljs-number">-0000</span><span class="hljs-number">-000000000000</span>
</code></pre>
<p>I was able to run the PowerShell command <code>Get-AzRoleAssignment</code> (<a target="_blank" href="https://learn.microsoft.com/en-us/azure/role-based-access-control/role-assignments-list-powershell">source</a>) to pull the IDs from Azure, but the problem here was the sheer number of these to identify. It wasn't just a case of user IDs on a subscription or resource group, oooh no, there were also many, many inter-resource role assignments, for example a Function App Managed Identity with an RBAC role of "Azure Service Bus Data Sender" applied to an Azure Service Bus.</p>
<p>I was able to script this in PowerShell to pull a great big list of every single role assignment, and then did a lot of searching of the Terraform config to match the IDs.</p>
<p>Even with this in place, the <code>terraform import</code> command still failed on some, but not all, role assignments. Either I'd made mistakes when matching the IDs, or something else was going on under the hood in Azure.</p>
<p>Ultimately, due to the complexity and time pressures, we had to manually delete some role assignments and let Terraform recreate them on the next apply.</p>
<h1 id="heading-deployment-prep">Deployment Prep</h1>
<p>Before the Production <code>terraform apply</code> was run following the state file recreation, I made a point of taking manual backups of the Key Vault secrets and App Config contents, and of ensuring the databases had adequate backups. Anything deemed critical that could potentially be impacted by the deployment was backed up.</p>
<p>I mention this here should you be reading this article faced with a similar situation. May this be your prompt to make backups and be prepared for the worst!</p>
<h1 id="heading-lessons-learned">Lessons Learned</h1>
<p>As I'm sure you can imagine, this was a stressful time for both myself and my customer. The workarounds mentioned above were complex and far from ideal. Ultimately, the problem was resolved once the state files had been recreated, and <code>terraform plan</code> and <code>terraform apply</code> jobs had been run against them.</p>
<p>If you're ever in this position, then I really feel for you, and may this article act as a reference on how I got out of it.</p>
<p>For anyone reading this, I would strongly suggest following the below Lessons Learned to hopefully avoid landing yourself in a similar situation. If you can think of any more, let me know!</p>
<ul>
<li><p>When saving remote state in an Azure Storage Account:</p>
<ul>
<li><p>Give the Storage Account a suitable, easily identifiable name.</p>
</li>
<li><p>Apply adequate data protection, such as soft delete and snapshots.</p>
</li>
<li><p>Apply adequate access control, such as RBAC, ABAC and network restrictions where appropriate.</p>
</li>
<li><p>Apply tags.</p>
</li>
<li><p>Apply a resource lock.</p>
</li>
</ul>
</li>
</ul>
<p>    <img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1682586302735/49cbbecc-a061-4a30-8b01-51d432aa28cc.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>In my customer example, we deployed dedicated Storage Accounts to contain the state files for both Staging and Production. Both with the above points applied.</p>
</li>
<li><p>Make backups of your Terraform plan and apply outputs. For example, add a pipeline step to save them to blob storage (with blob retention policies applied).</p>
</li>
<li><p>Terraform resources such as role assignments and random UUIDs are very difficult to recreate in state. Review the documentation carefully to fully understand their application.</p>
</li>
<li><p>Run <code>terraform apply</code> very carefully, especially when testing fixes. Don't use <code>-auto-approve</code> in this context, and carefully review the <code>terraform plan</code> outputs.</p>
</li>
<li><p>Have adequate documentation, ideally with diagrams. Knowledge is power in this scenario.</p>
</li>
<li><p>If you don't have one already, have a disaster recovery plan, and consider adding a scenario such as this to it. For example, if you have to go through something such as this, can you get access to your Terraform variable file contents and environment variables (for example GitHub secrets)? Can you run a deployment locally if you really had to? Proper Planning and Preparation Prevents P!ss Poor Performance!</p>
</li>
<li><p>Follow <a target="_blank" href="https://www.terraform-best-practices.com/">Terraform best practice</a>. My personal advice would be to use a modular approach and have multiple smaller deployments (by component lifecycle, for example) rather than one large, complex deployment (and therefore one large, complex state file). Having said that, there isn't a one-size-fits-all approach, so do what's right for your given scenario. Check out <a target="_blank" href="https://developer.hashicorp.com/terraform/cloud-docs/recommended-practices">Terraform Recommended Practices</a> for more detail.</p>
</li>
</ul>
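<p>To make the remote state guidance above concrete, the below is a minimal sketch of an <code>azurerm</code> backend block pointing at a dedicated, clearly named Storage Account. The names here are illustrative, not the customer's, and protections such as soft delete, RBAC and resource locks are applied to the Storage Account itself, outside this block:</p>
<pre><code class="lang-json"># Illustrative remote state backend - names are examples only
terraform {
  backend "azurerm" {
    resource_group_name  = "rsg-terraform-state"
    storage_account_name = "stterraformstateprod01"
    container_name       = "tfstate"
    key                  = "production.terraform.tfstate"
  }
}
</code></pre>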
]]></content:encoded></item><item><title><![CDATA[Archive: The Realities of Azure Private Endpoints]]></title><description><![CDATA[Note: This article has been archived.

Following on from my previous article Private Endpoints and Terraform - A Tale of Time - I want to go into more detail about what my experience working with Private Endpoints has been like.
The use of Private En...]]></description><link>https://daniel.mcloughlin.cloud/the-realities-of-azure-private-endpoints</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/the-realities-of-azure-private-endpoints</guid><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Fri, 14 Apr 2023 12:15:25 GMT</pubDate><content:encoded><![CDATA[<p>Note: This article has been archived.</p>
<hr />
<p>Following on from my previous article <a target="_blank" href="https://clouddevdan.co.uk/private-endpoints-and-terraform-a-tale-of-time">Private Endpoints and Terraform - A Tale of Time</a> - I want to go into more detail about what my experience working with Private Endpoints has been like.</p>
<p>The use of Private Endpoints is an increasingly popular choice, and Microsoft suggests their use is best practice (<a target="_blank" href="https://learn.microsoft.com/en-us/azure/security/fundamentals/network-best-practices#secure-your-critical-azure-service-resources-to-only-your-virtual-networks">ref</a>); however, I feel a lot hinges on what the word 'private' really means to you.</p>
<p>In this article, I’ll go over what Private Endpoints are when compared to Service Endpoints and will provide examples of what I feel are the realities when working with them.</p>
<h1 id="heading-what-is-a-private-endpoint"><strong>What is a Private Endpoint?</strong></h1>
<p><em>To avoid repeating myself, the below is an extract from my Tale of Time article (link above) that covers how I defined what a Private Endpoint is:</em></p>
<p>A Private Endpoint adds a network interface to a resource, providing it with a private IP address assigned from your VNET (Virtual Network). Once applied, you can communicate with this resource exclusively via the VNET.</p>
<p>The alternative to Private Endpoints is Service Endpoints, where the resources are still accessible over the public internet, however, their integrated firewalls restrict access only to designated VNETS/subnets or public IP addresses.</p>
<p>Private Endpoints are more secure in this context but come with additional complexity.</p>
<p>Once a Private Endpoint is applied, a resource's endpoint is no longer publicly routable.</p>
<p>Private DNS Zones solve this issue. Linked to one or more VNETS, a Private DNS Zone holds DNS records for the private resources. When you deploy a Private Endpoint and link it to a Private DNS Zone, the resource's public DNS record is updated with a CNAME pointing it to the Private DNS Zone.</p>
<p>For example, <code>mystorageaccount.blob.core.windows.net</code> would point to <code>mystorageaccount.privatelink.blob.core.windows.net</code>, which in turn resolves to something like <code>192.168.0.8</code>.</p>
<p>Any VNET linked to one or more Private DNS Zones will resolve those endpoints privately.</p>
<p>Source: <a target="_blank" href="https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-overview"><strong>What is a private endpoint?</strong></a></p>
<h1 id="heading-the-realities">The Realities</h1>
<p>With Service Endpoints, a PaaS resource is still publicly available. You can configure inbound traffic to come in either from designated public IPs or from within the VNET itself. The resource itself is still accessible online.</p>
<p>With a Private Endpoint, however, there is no option to allow designated public IPs. Traffic comes in from the VNET, or not at all.</p>
<p>Because of this, performance may be improved by reducing network latency as communication stays within the Microsoft network backbone.</p>
<p>Also note that you can apply network policies for Private Endpoints (NSGs and Route Tables on the subnet hosting them) so you can get granular control over your network traffic.</p>
<p>In a nutshell, using a Private Endpoint provides the most private and granular level of network access to a resource. But what does private in this context really mean, and do the cons of using a Private Endpoint outweigh the pros?</p>
<p>Consider the below:</p>
<h2 id="heading-control-plane-vs-data-plane">Control Plane vs Data Plane</h2>
<p>It's important to note the difference between control plane and data plane operations.</p>
<p>For example, let's presume we have an Azure Storage Account with a Private Endpoint applied for the blob endpoint. A control plane operation would be something like using the Azure CLI to return the account keys. A data plane operation would be something like listing blobs within a private container.</p>
<p>With a Private Endpoint applied for the blob endpoint, you can perform control plane operations regardless of your presence on the VNET. Yes, you need to be authenticated, but your source IP does not matter in this context.</p>
<p>With a data plane operation, however, you can only perform operations if you are coming at it from within the VNET.</p>
<p>I point this out here to highlight that adding a Private Endpoint does not make the resource totally private, only specific operations.</p>
<p>This is important when setting expectations.</p>
<h2 id="heading-devops-amp-iac">DevOps &amp; IaC</h2>
<p>From a DevOps perspective (i.e. using pipelines to deploy infrastructure-as-code), you'll likely need to use self-hosted runners that have network connectivity to the VNET on which the Private Endpointed resources reside. Using a hub and spoke example, you could deploy a VM within the hub that hosts the DevOps agents, ensuring the hub VNET is peered and routable to the spoke VNETS that host the Private Endpointed resources.</p>
<p>As per my article <a target="_blank" href="https://clouddevdan.co.uk/private-endpoints-and-terraform-a-tale-of-time">Private Endpoints and Terraform - A Tale of Time</a>, the automated deployment of resources with Private Endpoints introduces further complexity when the likes of Terraform try to deploy resources in parallel but get chewed up when a resource's network config changes partway through.</p>
<h2 id="heading-network-complexities">Network Complexities</h2>
<p>Imagine you have a hub and spoke network topology. In your hub, you have a VNET and shared Private DNS Zones. There are multiple spokes that each have their own VNET, including resources with Private Endpoints applied.</p>
<p>The hub VNET would need to peer with each spoke VNET, and every Private DNS Zone would need to be linked to the hub VNET and every single spoke VNET.</p>
<p>Best practice would dictate that there are granular permissions in place to separate the hub from the spokes. For example, the hub and each spoke could be on dedicated subscriptions with individual RBAC policies applied.</p>
<p>This may sound fairly straightforward, but consider how a Terraform deployment would deploy a spoke environment. The deployment would need to be on a self-hosted runner, likely in the hub to facilitate network connectivity. The Terraform deployment would also need to utilise multiple identities (e.g. Service Principals) to have the correct RBAC level access to both the hub and the spoke.</p>
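<p>As a hedged sketch of that multiple-identity point (the names, variables and resource references are illustrative, not from any real deployment), you could give the Terraform deployment one <code>azurerm</code> provider alias per subscription and pin hub-side resources, such as the Private DNS Zone links, to the hub identity:</p>
<pre><code class="lang-json"># Illustrative provider aliases - one identity per subscription
provider "azurerm" {
  alias           = "hub"
  subscription_id = var.hub_subscription_id
  features {}
}

provider "azurerm" {
  alias           = "spoke"
  subscription_id = var.spoke_subscription_id
  features {}
}

# The DNS zone link lives in the hub, so it uses the hub identity
resource "azurerm_private_dns_zone_virtual_network_link" "spoke_link" {
  provider              = azurerm.hub
  name                  = "link-spoke-01"
  resource_group_name   = "rsg-hub-dns"
  private_dns_zone_name = "privatelink.blob.core.windows.net"
  virtual_network_id    = azurerm_virtual_network.spoke.id
}
</code></pre>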
<h2 id="heading-costs">Costs</h2>
<p>A single Private Endpoint costs approx £7.55 per month. Multiply that by how many resources you plan to add a Private Endpoint to, over how many environments you need to deploy, and the result can be expensive. Especially when you consider resources like Storage Accounts that have multiple endpoints (blob, table, file, queue) and therefore need a separate Private Endpoint for every endpoint used.</p>
<p>Furthermore, a single Private DNS Zone also costs approx £7.55.</p>
<p>Source: <a target="_blank" href="https://azure.microsoft.com/en-gb/pricing/calculator/">Pricing calculator</a></p>
<h1 id="heading-examples">Examples</h1>
<p>What I want to do here is demonstrate some of the realities of working with Private Endpoints, using a Storage Account as an example.</p>
<h2 id="heading-core-infrastructure">Core Infrastructure</h2>
<p>My core infra consists of a resource group called <code>rsg-pe-testing</code> and a VNET called <code>vnet-uks-01</code>. Within the VNET I've created a dedicated subnet for all Private Endpoints.</p>
<p>I've also created a Virtual Machine called <code>vm-uks-jumpbox</code>, attached to the above VNET. I'll be using this as a point of access to my VNET. I opted for an Ubuntu 20.04 VM so I don't have to faff about with an OS GUI and can jump right into the CLI.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681390821250/9c555921-aa36-4560-8ff1-05bdec183298.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-storage-account">Storage Account</h2>
<p>For my first example, I've created an LRS Storage Account called <code>strukspetestingdmclo01</code>. By default, I've left the network connectivity open to public access.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681390962457/f5097c21-3600-49d5-a249-01a795dd3ccc.png" alt class="image--center mx-auto" /></p>
<p>Before we move on, I want to talk about the above screen a little bit. I interpret the middle option 'selected virtual networks and IP addresses' to be related to public internet access and Service Endpoints. This is where direct internet access is still routed to the Storage Account through whitelisted public IPs or via Service Endpoints on the Microsoft backbone.</p>
<p>I interpret the bottom option as the one to select if I wanted to use a Private Endpoint i.e. public internet is outright blocked and the only allowed inbound traffic is from within the VNET itself.</p>
<p><em>Note the text in the above screenshot is from the Azure portal when first creating the Storage Account. Once within the Storage Account page itself, under the Networking blade, that option simply says 'Disabled'.</em></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681391435003/49f2f90b-25cd-46fc-9e03-674dd59958ee.png" alt class="image--center mx-auto" /></p>
<p>Under the Endpoints blade of the Storage Account in the Azure portal, I can pull the different endpoints for the Storage Account, such as Blob and Table. As expected, these currently resolve to a public IP address from both within and external to my VNET.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681391617566/ec5a1512-52a6-4463-b08f-3b342aa4aa91.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-service-endpoints">Service Endpoints</h3>
<p>To prove the point that the Storage Account is still exposed over the public internet, I enabled the Microsoft.Storage Service Endpoint on the subnet that the VM is running on and configured my Storage Account to only allow traffic from that specific subnet. This can be a faff to do in the portal, as if you forget to click the Save button, your changes won't be applied!</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681391864472/499fecf4-df71-48fa-8925-27b06fe0c0d2.png" alt class="image--center mx-auto" /></p>
<p>Once applied, the <code>nslookup</code> command both internally and externally still resolves the public IP address of the Storage Account - as expected here.</p>
<p>To further prove the point, I used the Azure CLI on both the Jump Box (inside the VNET - left pane) and my laptop (outside the VNET - right pane) to try and create a new blob container. As expected, it worked from within the VNET, but not from the outside. Creating a blob container is a data plane operation.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681393381441/63685cdf-1eed-4304-a5d6-63eba3d1794a.png" alt class="image--center mx-auto" /></p>
<p><em>Note that the VM has a Managed Identity assigned and both that, along with the user account from my laptop have the Storage Blob Data Owner RBAC role applied to the specified Storage Account.</em></p>
<h3 id="heading-private-endpoints">Private Endpoints</h3>
<p>Next, let's see what happens when I simply click the 'Disable' radio button on the Networking blade (before any Private Endpoints have been created).</p>
<p>What I'm expecting to see here is an outright block of any network ingress to the Storage Account.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681394317575/294ab98a-870f-40dd-a013-593581f83995.png" alt class="image--center mx-auto" /></p>
<p>As you can see, the <code>nslookup</code> command still resolved the endpoint DNS to a public IP address, but the Azure CLI command to create a new container is blocked both internally and externally due to the network rules.</p>
<p>This suggests to me that the Storage Account endpoint is still resolvable, but not routable. What I understand to be happening is that all traffic (regardless of source) is still hitting the Storage Account directly, but it's the integrated firewall that's blocking the requests.</p>
<p>Now (clearly) I'm not a network expert, far from it, but this doesn't sit right with me. If I wanted my Storage Account to be totally private and only accessible over the VNET, I'm not sure I'd be happy about this. In theory, I could run a denial-of-service attack on the Storage Account as it's still reachable online.</p>
<p>I turned to <a target="_blank" href="https://learn.microsoft.com/en-us/azure/storage/common/storage-network-security?tabs=azure-portal">the docs</a> to see if there was anything official to back this up, but all I came out with was the below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681397312531/3463f285-1815-4a4f-a380-dad713c306b5.png" alt class="image--center mx-auto" /></p>
<p>I created a new storage account called <code>strukspetestingdmclo02</code> with networking disabled during the creation wizard in the portal, however, this yielded the same results.</p>
<p>Let's see what adding a Private Endpoint does to this. After all, in the above test I am hitting the public endpoint, so let's flip this and run the same tests.</p>
<p>For <code>strukspetestingdmclo02</code>, I created a new Private Endpoint and a Private DNS Zone for <code>privatelink.blob.core.windows.net</code>. The Private DNS Zone is linked to my VNET, and I can see that the blob endpoint for my Storage Account has been assigned an internal IP of <code>10.0.1.4</code>.</p>
<p>Running <code>nslookup</code> on the Jump Box returns the private IP as expected, but my laptop returns the public IP still.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681397943096/da30662a-ab3e-4cf2-abaf-4bca703727fa.png" alt class="image--center mx-auto" /></p>
<p>Trying to create a new container yields the same results too. It works from the Jump Box, but my laptop is blocked by the network rules applied to the integrated firewall, not because it cannot reach it. Again, this is a data plane operation. A control plane operation such as returning the account keys still works from both devices.</p>
<p>Repeating the same tests on an Azure Function app confirms my theory. With public access enabled, the default landing page is displayed from my laptop:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681401634123/2eeb3299-9c25-4bdc-8735-bbe26dd21840.png" alt class="image--center mx-auto" /></p>
<p>However, with a Private Endpoint applied, I am presented with a 403 Forbidden error. I'm still hitting the Function App directly, but its integrated firewall is blocking my requests - it's not that I outright cannot route to it.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1681401512116/d2bbc440-1206-4107-be05-5eea40f6b57d.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-summary">Summary</h1>
<p>So what am I getting at here exactly?</p>
<p>First of all, I've demonstrated that the use of Private Endpoints, or simply setting a resource to Disable public access doesn't actually disable public access. Not in the true sense of the word. Public traffic still reaches the resource, it's just blocked by an integrated firewall.</p>
<p>In my mind I presumed this would work in the same way as deleting a Public IP address from a VM would work i.e. public traffic cannot in any way shape or form hit the resource.</p>
<p>Also note that a Private Endpoint will only protect data plane operations, not the control plane.</p>
<p>To be nit-picky, a Private Endpoint is not actually private. It's just more private than the other options!</p>
<p>This may be an important consideration when working with security-conscious customers. Such customers may wrongly presume that following Microsoft's best practices and adding Private Endpoints will render their environment totally offline, which isn't strictly true.</p>
<p>Private Endpoints are the most private and granular level of network access you can define on an Azure PaaS resource, but the term 'private' needs to have the correct expectations set.</p>
<h1 id="heading-conclusion">Conclusion</h1>
<p>In summary, Azure Private Endpoints and Service Endpoints both offer secure communication between virtual networks and Azure services, but they differ in their approach and benefits.</p>
<p>Private Endpoints provide a more secure and efficient communication option but introduce added costs and complexities. Also, be clear on the term 'private', as it can mean different things to different people.</p>
<p>Service Endpoints, on the other hand, are easier to configure and support, but do not provide connectivity to Azure services that is as private as Private Endpoints do.</p>
<p>When choosing between Private Endpoints and Service Endpoints, organisations should consider their security and performance needs, as well as the specific Azure services they plan to use. With the right configuration, both options can help organisations achieve a secure and efficient cloud environment.</p>
<h1 id="heading-feedback">Feedback</h1>
<p>Your feedback is welcome here. Networking is out of my comfort zone, and the above article is based on my experiences and opinions. Please feel free to comment or get in touch. Thank you.</p>
]]></content:encoded></item><item><title><![CDATA[Archive: Private Endpoints and Terraform - A Tale of Time]]></title><description><![CDATA[Note: This article has been archived.

Adding Private Endpoints to resources in Azure is an excellent way to secure their connectivity by preventing access over public internet. Taking resources offline however adds its own challenges, particularly w...]]></description><link>https://daniel.mcloughlin.cloud/private-endpoints-and-terraform-a-tale-of-time</link><guid isPermaLink="true">https://daniel.mcloughlin.cloud/private-endpoints-and-terraform-a-tale-of-time</guid><dc:creator><![CDATA[Daniel McLoughlin]]></dc:creator><pubDate>Sat, 25 Feb 2023 13:15:25 GMT</pubDate><content:encoded><![CDATA[<p>Note: This article has been archived.</p>
<hr />
<p>Adding Private Endpoints to resources in Azure is an excellent way to secure their connectivity by preventing access over the public internet. Taking resources offline, however, adds its own challenges, particularly when a resource's DNS changes part-way through a parallel deployment.</p>
<p>This article highlights the pitfalls with deploying private resources in Azure via Terraform, and how to overcome them.</p>
<p>This article is also my contribution to <a target="_blank" href="https://www.azurespringclean.com/">Azure Spring Clean</a> 2023 - aimed at promoting well-managed Azure tenants.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678958208404/322e513e-264d-4af9-a7ec-2fbaa6067551.jpeg" alt class="image--center mx-auto" /></p>
<h1 id="heading-mission-statement">Mission Statement</h1>
<p>I've been brought in to help a customer with a complex Terraform deployment.</p>
<p>Their application infrastructure consists of an Azure Web App front-end, and multiple back-end components including Storage Accounts, Key Vaults and Cosmos DB.</p>
<p>Their existing Terraform is modular, however, the entire solution is deployed in one run.</p>
<p>The infra deployment is via GitHub Actions on a self-hosted runner, running on an Azure VM.</p>
<p>I've already worked with this customer to implement a hub and spoke network topology, where centralised resources such as the self-hosted runner and Private DNS Zones are in the hub, and the application itself is deployed as a spoke.</p>
<p>The app infra deployment peers the spoke VNET to the hub VNET, and links the hub's Private DNS Zones to the spoke VNET.</p>
<p>The infrastructure deployment is throwing a lot of very confusing errors.</p>
<p><strong>My job is to solve the problem.</strong></p>
<h1 id="heading-replicating-the-issue">Replicating The Issue</h1>
<p>The Terraform deployment is complex, but I know the errors are specific to Private Endpoints. To save time and remove complexity, I replicate the scenario with a stripped-down Terraform config, deploying a smaller number of resources, as per the below:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678959200524/1b21cf47-1343-40f4-a1f6-370c579d0b65.png" alt class="image--center mx-auto" /></p>
<p>My Terraform deployment does the following:</p>
<ul>
<li><p>Deploys the spoke VNET and a subnet called Endpoints (for the Private Endpoints).</p>
</li>
<li><p>Peers the spoke VNET with the hub VNET (and vice versa).</p>
</li>
<li><p>Links the hub Private DNS Zones to the spoke VNET.</p>
</li>
<li><p>Deploys a Key Vault with a Private Endpoint, and sets the network rules to deny public access.</p>
</li>
<li><p>Deploys a number of Storage Accounts, each with multiple blob containers, tables and queues.</p>
</li>
<li><p>Each Storage Account will need 3 x Private Endpoints, one for blob, one for table and one for queue.</p>
</li>
<li><p>The network rules for each Storage Account are set to deny public access.</p>
</li>
<li><p>The primary key for each Storage Account is saved as a secret within the Key Vault.</p>
</li>
</ul>
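<p>One of the three Private Endpoints per Storage Account from the list above can be sketched as below. This is a simplified, illustrative example; the names and variable/resource references are assumptions, not taken from the repo:</p>
<pre><code class="lang-json">resource "azurerm_private_endpoint" "blob" {
  name                = "pe-strdemo01-blob"
  location            = var.location_name
  resource_group_name = var.rsg_name
  subnet_id           = azurerm_subnet.endpoints.id

  # Connects the endpoint to the Storage Account's blob sub-resource
  private_service_connection {
    name                           = "psc-strdemo01-blob"
    private_connection_resource_id = azurerm_storage_account.demo.id
    is_manual_connection           = false
    subresource_names              = ["blob"]
  }

  # Registers the A record in the shared privatelink DNS zone
  private_dns_zone_group {
    name                 = "default"
    private_dns_zone_ids = [data.azurerm_private_dns_zone.blob.id]
  }
}
</code></pre>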
<p>This is only a sample of what the wider app infra looks like.</p>
<p><a target="_blank" href="https://github.com/CloudDevDan/Terraform-Private-Networking-Demo/tree/main/private-storage">You can see my Terraform for this in my GitHub repo, here</a>.</p>
<h1 id="heading-what-is-a-private-endpoint">What is a Private Endpoint?</h1>
<p>A Private Endpoint adds a network interface to a resource, providing it with a private IP address assigned from your VNET. Once applied, you can communicate with this resource exclusively via the VNET.</p>
<p>The alternative to Private Endpoints is Service Endpoints, where the resources are still accessible over public internet, however, their integrated firewalls restrict access only to designated VNETS/subnets or public IP addresses.</p>
<p>Private Endpoints are more secure in this context but come with additional complexity.</p>
<p>Once a Private Endpoint is applied, a resource's endpoint is no longer publicly routable.</p>
<p>Private DNS Zones solve this issue. Linked to one or more VNETS, a Private DNS Zone holds DNS records for the private resources. When you deploy a Private Endpoint and link it to a Private DNS Zone, the resource's public DNS record is updated with a CNAME pointing it to the Private DNS Zone.</p>
<p>For example, <code>mystorageaccount.blob.core.windows.net</code> would point to <code>mystorageaccount.privatelink.blob.core.windows.net</code>, which in turn resolves to something like <code>192.168.0.8</code>.</p>
<p>Any VNET linked to one or more Private DNS Zones will resolve those endpoints privately.</p>
<p>In the context of my solution, the Private DNS Zones need to be linked to both the hub and spoke VNETS so that resources attached to either VNET can resolve private DNS requests correctly.</p>
<p>Below is an example of a Private DNS Zone for <code>blob.core.windows.net</code>:</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678962137449/159acd2a-039c-4c61-8fba-582c8bc61b1e.png" alt class="image--center mx-auto" /></p>
<p>In this solution example, the Private DNS Zones are centralised in the hub, so they can be shared amongst multiple spoke environments.</p>
<p>Source: <a target="_blank" href="https://learn.microsoft.com/en-us/azure/private-link/private-endpoint-overview">What is a private endpoint?</a></p>
<h1 id="heading-understanding-the-errors">Understanding The Errors</h1>
<p>Using my stripped-down sample, I've been able to replicate the errors my customer has been experiencing.</p>
<p>The one thing they all have in common is StatusCode=403 (HTTP Forbidden).</p>
<p>Sometimes the error is thrown when trying to save the Storage Account key as a Key Vault secret, other times it fails to create a particular blob container or storage queue, again, with a 403 error. There's no consistency - but it's clearly network related.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678962227948/e5ea6eaa-bad6-4136-a9e5-e7cfb0445e95.png" alt class="image--center mx-auto" /></p>
<p>The interesting point here is that these errors only happen on the first run. If I re-run the entire deployment, everything works.</p>
<p>The issue here was clearly to do with time.</p>
<p>Terraform (quite rightly) is trying to deploy as much as it can in parallel whilst working out its own dependency maps.</p>
<p>Terraform may understand that the Key Vault needs to exist before the Storage Account key can be written as a secret, but whilst it's queuing up its list of resources to deploy, their network settings are changing. What was originally publicly resolvable is now exclusively privately resolvable. The DNS has changed!</p>
<p>A re-run of the same deployment doesn't replicate the issue, because by then Terraform can detect the network change, and the self-hosted runner has picked up on it too.</p>
<p>I have a distant memory of something like this happening in Bicep a while ago, but I can't recall the specifics. I have a suspicion the underlying issue could be down to the Azure APIs under the hood. Either that or it's simply a race condition issue, or possibly DNS caching, either on the runner or in Terraform itself.</p>
<p>I can't be certain of the exact root cause on this one, but I'm going to blame it on DNS...</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678963021228/5db9921d-ce9a-47e8-b38e-3fb2a747936e.png" alt class="image--center mx-auto" /></p>
<h1 id="heading-considering-my-options">Considering My Options</h1>
<p>My first instinct was to split this rather large and complex deployment into multiple, smaller deployments according to resource lifecycle. For example, different Terraform deployments for the VNET, Key Vault and Storage Accounts.</p>
<p>Whilst this would have been a relatively simple job for my stripped-down sample, the customer's full-blown solution was far more complex and would have required significant re-work.</p>
<p>The issues would have been distributing the config between multiple deployments, and capturing the outputs of one deployment as input for another over a matrix of linked resources and modules.</p>
<p>I considered tools such as <a target="_blank" href="https://terragrunt.gruntwork.io/">Terragrunt</a> to centralise the config, but the customer wasn't keen.</p>
<p>I needed to get my hands dirty with some Terraform!</p>
<h1 id="heading-a-tale-of-time">A Tale of Time</h1>
<p>The code for my working solution can be found in my <a target="_blank" href="https://github.com/CloudDevDan/Terraform-Private-Networking-Demo/tree/main/private-storage-fix">GitHub repo, here</a>.</p>
<p>The first thing I did was determine the order in which resources were created. Primarily within the Storage Account module.</p>
<p>For example, I used <code>depends_on</code> blocks to specify that the Private Endpoint is created only after the containers, tables and queues. The resource that denies all public access to the Storage Account was set to be applied last.</p>
<pre><code class="lang-json">resource <span class="hljs-string">"azurerm_storage_account_network_rules"</span> <span class="hljs-string">"example"</span> {
  storage_account_id = azurerm_storage_account.example.id
  default_action     = <span class="hljs-attr">"Deny"</span>
  ip_rules           = []
  bypass             = [<span class="hljs-attr">"None"</span>]
  depends_on = [
    azurerm_storage_container.example,
    azurerm_storage_queue.example,
    azurerm_storage_table.example,
    azurerm_private_endpoint.private_endpoint
  ]
}
</code></pre>
<p>This helped, but I was still getting some ad-hoc 403 errors. I needed to allow time for the VNETs to peer, and DNS to propagate.</p>
<p>I solved this by adding some cheeky little sleep timers, using the Terraform <code>time_sleep</code> resource.</p>
<pre><code class="lang-json">resource <span class="hljs-string">"time_sleep"</span> <span class="hljs-string">"peering_propagation"</span> {
  create_duration = <span class="hljs-attr">"2m"</span>
  triggers = {
    peering_confirmation = module.vnet.peering_confirmation
  }
}
</code></pre>
<p>Within the VNET module, I set an output of the ID provided from the peering with the HUB VNET. Once the peering had been completed and the ID was produced, Terraform then waits for 2 minutes.</p>
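<p>That output looks something like the following sketch (the resource and output names are assumptions based on the pattern described, not an exact copy of my repo):</p>
<pre><code class="lang-json"># vnet module outputs.tf (illustrative sketch)
# Exposing the peering resource ID means the time_sleep trigger
# can't be evaluated until the spoke-to-hub peering actually exists.
output "peering_confirmation" {
  value = azurerm_virtual_network_peering.spoke-to-hub.id
}
</code></pre>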
<p>I did the same thing with the VNET links to the Private DNS zones in the hub, and also the creation of the Key Vault. For example:</p>
<pre><code class="lang-json">resource <span class="hljs-string">"time_sleep"</span> <span class="hljs-string">"vault_endpoint_propagation"</span> {
  create_duration = <span class="hljs-attr">"2m"</span>

  triggers = {
    propagation = module.private-key-vault.vault_endpoint_propagation
  }
}
</code></pre>
<p>Terraform will wait for 2 minutes once the Private Endpoint resource ID has been produced.</p>
<p>To determine the order in which my modules were deployed (e.g. don't run the Storage Account module until the Key Vault has a Private Endpoint applied), I used the <code>depends_on</code> block again, but this time linked to the sleep timer.</p>
<pre><code class="lang-json">module <span class="hljs-string">"private-storage-account"</span> {
  source               = <span class="hljs-attr">"./private-storage-account"</span>
  count                = var.resource_count
  tags                 = local.tags
  app_name             = local.app_name
  location             = local.location
  resource_group_name  = azurerm_resource_group.rg.name
  virtual_network_name = module.vnet.spoke-vnet-name
  endpoints_subnet_id  = module.vnet.endpoints-subnet-id
  private_dns_zone_ids = data.azurerm_private_dns_zone.dns_zones
  key_vault_id         = module.private-key-vault.key_vault_id
  depends_on = [
    time_sleep.vault_endpoint_propagation
  ]
}
</code></pre>
<p>Now, the Storage Account module will not run until the Key Vault has a Private Endpoint, and two minutes have passed, allowing time for DNS propagation.</p>
<p>Thankfully, both my Terraform config and the customer's are modular, so any module that needs access to the Key Vault, or otherwise needs to resolve private DNS, can be set to wait until the prerequisite resources have been deployed first.</p>
<p>It's not pretty, but it works!</p>
<h1 id="heading-testing-the-solution">Testing The Solution</h1>
<p>By running the fixed Terraform config and monitoring the output, I can clearly see the resources being deployed roughly in the order I specified. Anything that can be done in parallel is still handled by Terraform.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678966060357/8cda8f05-af6f-475b-9faa-efcb2dd5516f.png" alt class="image--center mx-auto" /></p>
<p>The sleep timers add some delay to the total time the deployment takes to complete, but it's better than waiting for it to fail and having to re-run it every time.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678967250542/9060572b-8ef6-4eec-af9c-64144a4358ce.png" alt class="image--center mx-auto" /></p>
<p>Another thing to note here is that before my fix was applied, Terraform destroy would also throw similar 403 errors. Following my fix, this is no longer an issue.</p>
<h1 id="heading-top-tips">Top Tips</h1>
<p>These are some tips I've picked up whilst working on this solution:</p>
<ul>
<li>If your GitHub Actions run on a self-hosted runner, add an action to flush DNS before running a plan or apply. If DNS changes have been made on the network (such as adding Private DNS zones) - this will clear things up. For example:</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1678964721144/402789a3-2fe0-463b-834f-2af9c7e2f12f.png" alt class="image--center mx-auto" /></p>
<ul>
<li><p>Use command-line tools (on the runner itself) to check the DNS resolution of a particular endpoint, for example: <code>nslookup</code> <a target="_blank" href="http://clouddevdan-kv.vault.azure.net"><code>clouddevdan-kv.vault.azure.net</code></a></p>
</li>
<li><p>If your Terraform code needs to interact with resources in a different subscription or resource group (such as with my example, performing write actions on the hub VNET by establishing a peer), you can use an additional Terraform <code>azurerm</code> provider and use it to specify different credentials and tenant/subscription IDs. For example:</p>
</li>
</ul>
<pre><code class="lang-json"># provider.tf
provider <span class="hljs-string">"azurerm"</span> {
  alias           = <span class="hljs-attr">"hub"</span>
  subscription_id = var.hub_subscription_id
  tenant_id       = var.az_tenant_id
  client_id       = var.azure_cli_hub_networking_client_id
  client_secret   = var.azure_cli_hub_networking_secret
  features {}
}

# vnet module call in main.tf (referencing the additional provider with alias <span class="hljs-string">"hub"</span>, set above)
module <span class="hljs-string">"vnet"</span> {
  source                         = <span class="hljs-attr">"./vnet"</span>
  resource_group_name            = azurerm_resource_group.rg.name
  location                       = local.location
  tags                           = local.tags
  virtual_network_name           = local.virtual_network_name
  hub_resource_group_name        = data.azurerm_resource_group.hub-rg.name
  hub_virtual_network_name       = data.azurerm_virtual_network.hub-vnet.name
  hub_virtual_network_id         = data.azurerm_virtual_network.hub-vnet.id
  address_space                  = local.address_space
  endpoints_subnet_name          = local.endpoints_subnet_name
  endpoints_subnet_address_space = local.endpoints_subnet_address_space
  providers = {
    azurerm.hub = azurerm.hub
  }
}

# within the vnet module main.tf, set:
terraform {
  required_providers {
    azurerm = {
      source  = <span class="hljs-attr">"hashicorp/azurerm"</span>
      version = <span class="hljs-attr">"=3.47.0"</span>
      configuration_aliases = [ azurerm.hub ]
    }
  }
}

# peering hub to spoke vnet using the <span class="hljs-string">"hub"</span> provider
resource <span class="hljs-string">"azurerm_virtual_network_peering"</span> <span class="hljs-string">"hub-to-spoke"</span> {
  provider                     = azurerm.hub
  name                         = <span class="hljs-attr">"hub-to-${var.virtual_network_name}"</span>
  resource_group_name          = var.hub_resource_group_name
  virtual_network_name         = var.hub_virtual_network_name
  remote_virtual_network_id    = azurerm_virtual_network.vnet.id
  allow_virtual_network_access = true
  allow_forwarded_traffic      = true
  allow_gateway_transit        = false
  depends_on = [
    azurerm_virtual_network_peering.spoke-to-hub
  ]
}
</code></pre>
<h1 id="heading-conclusion">Conclusion</h1>
<p>What I've learnt from this is that you can't always apply best practices.</p>
<p>No matter how much you may want to, sometimes your hands are tied and you need to make good on a tricky situation.</p>
<p>My solution may appear a tad hacky, but in the bigger picture it's relatively simple: all I'm doing is controlling the order in which some Azure resources are deployed in Terraform, and adding small time delays between some of them to account for network changes.</p>
<p>Thanks for reading!</p>
<p>All my source code is available within <a target="_blank" href="https://github.com/CloudDevDan/Terraform-Private-Networking-Demo">this GitHub repo</a>.</p>
<p><em>Source of cover photo by</em> <a target="_blank" href="https://unsplash.com/@icons8?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText"><em>Icons8 Team</em></a> <em>on</em> <a target="_blank" href="https://unsplash.com/s/photos/time?utm_source=unsplash&amp;utm_medium=referral&amp;utm_content=creditCopyText"><em>Unsplash</em></a></p>
]]></content:encoded></item></channel></rss>