Sunday, April 2, 2023

Schedule Azure Automation Runbook with Terraform and PowerShell

 


Source: Data Semantics

Azure Automation is a service that lets you automate processes from within Azure.

An automation account manages several other resources in order to achieve this. In this article, I'll go over these features and show how you can use them to automate a simple task.

Say you are managing software that constantly collects log files. To save storage costs, you have to clean the storage container regularly, let's say once a week. How can you automate this in Azure?

Everything shown here can be done via the Azure portal, but we will instead use Terraform.

As always with Azure, first create the resource group where all related resources will reside, followed by the automation account.
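A minimal sketch of those two blocks (the resource names and the location are placeholders):

resource "azurerm_resource_group" "logging" {
  name     = "rg-logging"
  location = "West Europe"
}

resource "azurerm_automation_account" "log_cleaning" {
  name                = "aa-log-cleaning"
  location            = azurerm_resource_group.logging.location
  resource_group_name = azurerm_resource_group.logging.name
  sku_name            = "Basic"

  # The system-assigned managed identity discussed below.
  identity {
    type = "SystemAssigned"
  }
}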

I will explain the SystemAssigned identity bit next.

We used system-assigned identity access above. Some older automation accounts use service principals, but this brings up issues such as:

  • Having to renew certificates every year to maintain access.
  • The service principal is given access to the entire subscription, whereas a managed identity can simply be given access to only what it needs.

And more…

Using managed identities to access automation accounts is a newer and very useful feature. We can use either user-assigned or system-assigned managed identities.

In this case, a system-assigned identity suffices, since we are not using it for any resources apart from this automation account, so it is created while deploying the automation account (in the Terraform script above).

Now all we need to do is assign it the required role using Terraform. Here, I will simply give it access to the whole resource group, which is where I will place all the resources involved in the task.
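A role assignment along these lines does that, using the identity's principal ID (the Contributor role and the resource names here are assumptions):

resource "azurerm_role_assignment" "automation_rg_access" {
  scope                = azurerm_resource_group.logging.id
  role_definition_name = "Contributor"
  principal_id         = azurerm_automation_account.log_cleaning.identity[0].principal_id
}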

Note: in order for the runbook to access the storage account that contains the logs, that storage account must also be in this logging resource group.

We first need to create the runbook resource using Terraform, so Terraform can manage it along with the others.

You would create a folder named files in the same directory as your Terraform files, and then create the script LogCleaning.ps1 inside it.
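The runbook itself can then be declared along these lines (the resource names are assumptions; the script is read in with Terraform's file() function):

resource "azurerm_automation_runbook" "log_cleaning" {
  name                    = "LogCleaning"
  location                = azurerm_resource_group.logging.location
  resource_group_name     = azurerm_resource_group.logging.name
  automation_account_name = azurerm_automation_account.log_cleaning.name
  runbook_type            = "PowerShellWorkflow"
  log_verbose             = true
  log_progress            = true
  description             = "Deletes old log files from the logging storage container"

  # Read the workflow script from the files folder described above.
  content = file("${path.module}/files/LogCleaning.ps1")
}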

This is roughly what your PowerShell Workflow script could look like.
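A minimal sketch (the parameter names, the Az.Storage cmdlets, and the InlineScript structure are assumptions; the credential asset is created further below):

workflow LogCleaning
{
    param(
        # All three values are passed in through the job schedule parameters.
        [Parameter(Mandatory = $true)]
        [string] $StorageCredentialsName,

        [Parameter(Mandatory = $true)]
        [string] $StorageAccountName,

        [Parameter(Mandatory = $true)]
        [string] $ContainerName
    )

    # Fetch the storage account key stored as an Automation credential asset.
    $credential = Get-AutomationPSCredential -Name $StorageCredentialsName

    # Method calls are not allowed directly in workflow scope, so the storage
    # work happens inside an InlineScript block.
    InlineScript {
        $key     = ($Using:credential).GetNetworkCredential().Password
        $context = New-AzStorageContext -StorageAccountName $Using:StorageAccountName -StorageAccountKey $key

        # Delete every blob currently in the logging container.
        Get-AzStorageBlob -Container $Using:ContainerName -Context $context |
            Remove-AzStorageBlob
    }
}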

This is a very simple script; you could add extras such as a try/catch block or logic to keep the most recent few log files. You can be as creative as you'd like!

You would notice the StorageCredentialsName in the runbook. This is used to access the storage account where the logs live. We can create these credentials using the Terraform block below.

This assumes we have already created a storage account named log_storage in another Terraform block to store these files.
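A sketch of that credential asset (resource names are assumptions; the storage account name and primary access key serve as the username and password):

resource "azurerm_automation_credential" "storage" {
  name                    = "log-storage-credentials"
  resource_group_name     = azurerm_resource_group.logging.name
  automation_account_name = azurerm_automation_account.log_cleaning.name
  username                = azurerm_storage_account.log_storage.name
  password                = azurerm_storage_account.log_storage.primary_access_key
  description             = "Credentials the runbook uses to reach the log storage account"
}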

Next comes the schedule. This is what determines how often the runbook executes.

You would also notice the Params dictionary in the runbook. There are many ways to pass input parameters to the runbook, but I will pass them through the schedule in this example, since we already require one.

We create a schedule first and then link it to the runbook using a job schedule.

We're again assuming a storage container named log_container was created earlier.

We also need to get the subscription ID first, using a snippet like the one below.
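A data source along these lines exposes it (assuming the azurerm provider is already configured):

# The subscription the provider is authenticated against.
data "azurerm_subscription" "current" {}

# data.azurerm_subscription.current.subscription_id can then be used wherever
# it is needed, for example in a role assignment scope or as a runbook parameter.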

Now we will proceed as follows:
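Here is a sketch of the schedule and the job schedule that links it to the runbook (resource and parameter names are assumptions; note that the provider expects the job schedule's parameter keys in lowercase):

resource "azurerm_automation_schedule" "weekly" {
  name                    = "weekly-log-cleaning"
  resource_group_name     = azurerm_resource_group.logging.name
  automation_account_name = azurerm_automation_account.log_cleaning.name
  frequency               = "Week"
  interval                = 1
  week_days               = ["Sunday"]
  description             = "Triggers the log cleaning runbook once a week"
}

resource "azurerm_automation_job_schedule" "weekly" {
  resource_group_name     = azurerm_resource_group.logging.name
  automation_account_name = azurerm_automation_account.log_cleaning.name
  schedule_name           = azurerm_automation_schedule.weekly.name
  runbook_name            = azurerm_automation_runbook.log_cleaning.name

  # The Params dictionary mentioned above: input values the runbook receives
  # every time the schedule fires.
  parameters = {
    storagecredentialsname = azurerm_automation_credential.storage.name
    storageaccountname     = azurerm_storage_account.log_storage.name
    containername          = azurerm_storage_container.log_container.name
  }
}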

And now you just need to run terraform plan followed by terraform apply, and you'll have your logs cleaned up every week without you doing a thing!

Saturday, July 23, 2022

Deploying Azure Automation Account and Runbooks via Terraform

 

Azure Automation Accounts leverage Azure Runbooks to automate processes within organizations’ Azure tenants. This process can be very powerful and help organizations effectively manage, scan, and update their environments. This post is not about Azure Automation Accounts or Azure Runbooks but rather the process by which to deploy these Accounts and their associated scripts via Terraform.

If unfamiliar, Terraform is an open-source infrastructure-as-code provider. One of its biggest selling points is that it can be used for deploying to a plethora of providers. Since we are dealing with Azure, we will be using the Azure provider. We will also assume that you are already familiar with how to deploy Terraform to Azure. If you are not, here is the Terraform walkthrough for Azure.

The first step in the deployment will be creating the Azure Automation Account. This is done via the azurerm_automation_account resource, like below:

resource "azurerm_automation_account" "aa_demo" {
  name                = "aademo"
  location            = azurerm_resource_group.rg_automation_account.location
  resource_group_name = azurerm_resource_group.rg_automation_account.name

  sku_name = "Basic"

}

This automation account is referencing a resource group that will also be created as part of the Terraform file. Automation Accounts, like any other Azure resource, require a Resource Group. The Resource Group is set up like this:

resource "azurerm_resource_group" "rg_automation_account" {
  name     = "rg-aatest-dev-eus"
  location = "east us"

}

Unfortunately, the ability to create the Automation Account as a “RunAsAccount” cannot be configured at this time via Terraform. RunAsAccount is similar to a Managed Identity in Azure: the script will run as the resource, thus “RunAsAccount”. That being said, there is a GitHub issue that outlines steps one could take to work around this. However, for the initial setup it might be easier to create the Automation Account and toggle the Run As account manually. This is done after the Automation Account has been created by going to “Run as Accounts” -> Create.

So at this point the Terraform file will create the Resource Group and the Azure Automation Account. However, we still need to create the Runbook and upload the code that will be run under it.

I have found the easiest way to do this is to store the script to be run in the same project as the Terraform file. In this case we have the script in runbooks/powershell/demo.ps1.

This script will need to be imported into the Terraform file as a data reference of type local_file. A block like this should do the trick:

data "local_file" "demo_ps1" {
  filename = "../runbooks/powershell/demo.ps1"
}

Once we do this Terraform is aware of the existence of the demo.ps1 file. This is important as we will pass this reference to the Runbook.

To create the Runbook we will leverage the azurerm_automation_runbook resource.

resource "azurerm_automation_runbook" "demo_rb" {
  name                    = "Demo-Runbook"
  location                = azurerm_resource_group.rg_automation_account.location
  resource_group_name     = azurerm_resource_group.rg_automation_account.name
  automation_account_name = azurerm_automation_account.aa_demo.name
  log_verbose             = "true"
  log_progress            = "true"
  description             = "This Run Book is a demo"
  runbook_type            = "PowerShell"
  content                 = data.local_file.demo_ps1.content
}

The content argument is key, as it passes the script that was referenced earlier and uploads its contents as part of the deployment.

So now that the Automation Account and the Runbook have been created, the demo.ps1 file can be executed in Azure. However, the need may still arise to schedule the execution of the demo.ps1 script. To do this we can leverage the azurerm_automation_job_schedule resource, with a schedule first defined via azurerm_automation_schedule.

First the schedule:

resource "azurerm_automation_schedule" "sunday" {
  name                    = "EverySundayEST"
  resource_group_name     = azurerm_resource_group.rg_automation_account.name
  automation_account_name = azurerm_automation_account.aa_demo.name
  frequency               = "Week"
  interval                = 1
  timezone                = "America/New_York"
  description             = "Run every Sunday"
  week_days               = ["Sunday"]
}

This schedule is agnostic of the current Runbook and can be reused multiple times.

Next is the Terraform that links the Runbook and the schedule together:

resource "azurerm_automation_job_schedule" "demo_sched" {
  resource_group_name     = azurerm_resource_group.rg_automation_account.name
  automation_account_name = azurerm_automation_account.aa_demo.name
  schedule_name           = azurerm_automation_schedule.sunday.name
  runbook_name            = azurerm_automation_runbook.demo_rb.name
  depends_on = [azurerm_automation_schedule.sunday]
}

Now, normally with Terraform the depends_on does not need to be declared, as Terraform should recognize that the sunday schedule is being referenced and thus infer that demo_sched won't be created until the sunday schedule exists. However, at the time of this blog post there is an open bug on this issue. Thus, the workaround is to explicitly call out the dependency.

After this everything is all done! Congratulations, you should now be able to deploy an Azure Automation Account, Azure Runbooks, Schedules, and associated scripts via Terraform!

Friday, July 22, 2022

Automating Cloud Infrastructure Management for AWS Projects with Terraform

 

Automating infrastructure management helps to enhance control over a product’s environment, optimize resource use, and reduce spending on cloud infrastructure maintenance. With the right tool in place, you can describe infrastructure in code: create it once and simply copy it to new applications, making only a few changes.

In this article, we compare three tools that manage infrastructure as code: AWS Cloud Development Kit (CDK), AWS CloudFormation, and Terraform. We also show how to create and automate the management of cloud infrastructure in a way that we can later use on other projects.

This article will be useful for cloud infrastructure management and DevOps teams looking for a way to optimize their work as well as for those who want to learn how to use Terraform to automate infrastructure configuration.

Contents:

Why manage infrastructure as code?

3 tools for managing AWS infrastructure as code

Configuring infrastructure for an AWS project with Terraform and Terragrunt

Conclusion

Why manage infrastructure as code?

IT infrastructure management oversees the performance of infrastructure elements needed for software to deliver business value. These elements include physical equipment like endpoints, servers, and data storage as well as virtual elements like network and app configurations, interfaces, and policies.

Usually, DevOps engineers are in charge of IT infrastructure. They need to keep it flexible, easily scalable, secure, and controllable. To achieve these goals, DevOps engineers containerize applications, deploying and managing them with tools like Docker.

Containerization allows for running an application in a manageable cluster without the need to manually configure the application and follow documentation step by step. Instead, engineers can use a Dockerfile to record changes and transfer code from one environment to another.

A containerized application can be deployed on a physical server, virtual machine, or cloud service. A cloud service is the most convenient option, since it comes with many more benefits than downsides:

Pros and cons of deploying apps in the cloud

Once an application is deployed in the cloud, DevOps engineers can start working on its infrastructure. Of course, they can do it manually, but that’s a bad development practice. Automating infrastructure management processes, on the other hand, helps you to experience the following benefits:

6 reasons to automate infrastructure management

The infrastructure as code (IaC) approach allows DevOps engineers to simplify and automate the creation, management, and monitoring of software infrastructure. With IaC, DevOps engineers can describe infrastructure elements, required policies, and resources in machine-readable configuration files. These files allow engineers to streamline resource management, copy infrastructure from one project to another, and share project knowledge.

The key downside of using IaC is the risk of duplicating errors from the initial project infrastructure when reusing it. That’s why creating configuration files requires a great deal of planning and expertise working with IaC tools. And it all starts with choosing the right tool.


3 tools for managing AWS infrastructure as code

In this article, we’ll talk about managing AWS-based infrastructure and some of the tools you can use for this purpose. Particularly, we’ll go over:

3 tools for AWS infrastructure management

AWS Cloud Development Kit, or CDK, is an open-source software development platform that allows you to specify resources for cloud applications. It ensures flexible management of containerized applications. It also allows DevOps engineers to write infrastructure code in JavaScript, TypeScript, Python, C#, Java, .NET, and Go.

On the downside, AWS CDK requires perfect knowledge of programming languages to be able to configure infrastructure properly. That creates an additional challenge for DevOps engineers, who usually don’t need a deep knowledge of programming languages.

AWS CloudFormation is an infrastructure as code solution that provides you with a simple way to model AWS and third-party resources, allocate infrastructure resources within minutes, and manage them during the whole lifecycle. The key benefit of AWS CloudFormation is its support of YAML configuration files. They help to easily organize infrastructure code.

The key downsides of AWS CloudFormation are that it only supports AWS cloud services and requires learning a specific syntax.

Terraform is an open-source tool that allows you to define and provision cloud infrastructure using the HashiCorp Configuration Language (HCL) or JSON. Both have convenient and easy-to-understand syntax.

The key benefit of implementing cloud automation using Terraform is support for all major cloud computing services: AWS, Google Cloud Platform, Microsoft Azure, and DigitalOcean. Terraform also supports the Kubernetes API. Plus, it has detailed documentation and many ready-to-use modules.

For a better experience, use Terraform with Terragrunt — a wrapper that provides you with additional tools to store infrastructure configurations and allows you to use modules.

With these advantages, Terraform appears to be the most convenient choice to automate the management of cloud infrastructure. This tool is more versatile than AWS CDK or AWS CloudFormation, as it allows you to work with various cloud services and use ready-made modules. That’s why in our own cloud infrastructure management activities, we mostly rely on Terraform.

With that in mind, let’s see how to use Terraform to automate AWS cloud infrastructure configuration and management.

Configuring infrastructure for an AWS project with Terraform and Terragrunt

Configuring project infrastructure as code allows us to upload it to the repository that we’ll later use to deploy the application. Terragrunt stores temporary files and sensitive data in the cloud so we can access them from various machines and don’t have to upload them to Git.

Here’s how Terraform can automate AWS cloud infrastructure:

Configuring application infrastructure in Terraform

But first, we need to create the elements of our environment. To do it, let’s create the following files and folders at the root of our repository:

1. The terragrunt.hcl file contains most of the Terragrunt configuration information: the region, the DynamoDB table used for state locking, and the bucket for state file storage.

State files are Terraform artifacts that store data on created resources. At each launch, Terraform compares the current project infrastructure with the corresponding state file, applies changes, and updates this file.

Figure 1. terragrunt.hcl contents
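A minimal sketch of what such a root terragrunt.hcl can look like, assuming an S3 backend with DynamoDB state locking (the bucket, table, and region names are placeholders):

remote_state {
  backend = "s3"

  config = {
    bucket         = "my-project-terraform-state"   # placeholder bucket name
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "my-project-terraform-locks"   # placeholder lock table
  }
}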

2. The tfvars file is for creating basic variables applicable to all environments. For example, these can be SSH administrator keys.

Figure 2. common.tfvars contents
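For illustration, such a file can be as simple as this (the variable names are assumptions):

# common.tfvars -- variables shared by every environment
admin_ssh_public_key = "ssh-rsa AAAAB3NzaC1yc2E... admin@example.com"
project_name         = "demo-project"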

3. The modules folder contains all the modules you use. With Terragrunt, you can use one module multiple times in different projects. You can also add third-party modules by adding a link to the corresponding repository or module branch.

Figure 3. ACM module that creates SSL certificates
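As a rough illustration, a minimal ACM module can look like this (the variable, resource, and output names are assumptions, not necessarily the module used in the project):

# modules/acm/main.tf -- a simplified sketch of an SSL certificate module
variable "domain_name" {
  type        = string
  description = "Domain the certificate is issued for"
}

resource "aws_acm_certificate" "this" {
  domain_name       = var.domain_name
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}

output "certificate_arn" {
  value = aws_acm_certificate.this.arn
}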

4. The environments/qa folder contains HCL configurations for the QA environment. For example, with the following code, we can call the ACM module that creates project certificates:

Figure 4. Calling the ACM module
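A hypothetical environments/qa/acm/terragrunt.hcl that calls the module could look like this (paths and input values are illustrative):

include {
  path = find_in_parent_folders()
}

terraform {
  # Reuse the ACM module defined in the modules folder.
  source = "../../../modules/acm"
}

inputs = {
  domain_name = "qa.example.com"
}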

5. The environments/qa/terraform.tfvars file is for all parameters that depend on the environment.

Figure 5. The contents of the environments/qa/terraform.tfvars file

As soon as the infrastructure code is ready, we can apply it to any AWS account without the need for additional manual activities and get a consistent result:

Figure 6. Executing infrastructure code

After we get this result, we can start automating infrastructure deployments in the cloud no matter which cloud services our projects use.

Downsides of managing infrastructure with Terraform

Keep in mind that there are several limitations in managing cloud infrastructure using Terraform that we’ve discovered when using it in our projects:

  • Complex configuration change management. When Terraform is actively used by several engineering teams, changing the project infrastructure with this tool may take more time than doing it manually. Any change in the configuration has to be committed to HCL files, tested, and implemented.
  • Complex permission management. To be able to work with Terraform, DevOps engineers need an account with elevated access rights. It may be complicated to divide project infrastructure into several parts and configure access rights for DevOps engineers correctly and securely.
  • Delayed delivery of product-specific features. Terraform developers often add support for new provider features later than you may expect; first-party products such as AWS CloudFormation typically get them first.


Conclusion

Automating cloud infrastructure management can greatly reduce the amount of time and effort DevOps engineers put into configuring the infrastructure of cloud-based projects. With the right tools and approach, you can configure infrastructure once and then reuse it in other projects, making only the necessary changes.

In this article, we showed you how to automate infrastructure deployments in the cloud with Terraform. But the skills of our DevOps and cloud infrastructure management engineers go far beyond that. Feel free to reach out if you need to leverage our expertise in your project!

It was originally published on https://www.apriorit.com/

Friday, May 27, 2022

Going Cloud Native with AWS Elastic Container Service

Summary: A very good article for understanding the benefits and some pitfalls of adopting cloud infrastructure or shifting to a different cloud infrastructure. The article compares in-house services to Amazon AWS, but the points are quite generic and useful.

Zoosk Java microservices are hosted on Amazon Elastic Container Service. In Amazon’s words, “Amazon EC2 Container Service (ECS) is a highly scalable, high performance container management service that supports Docker containers and allows you to easily run applications on a managed cluster of Amazon EC2 instances. Amazon ECS eliminates the need for you to install, operate, and scale your own cluster management infrastructure.”

Sounds very appealing to have them manage one’s container applications with minimal effort. Since we had to migrate our services without a dedicated Ops resource we decided this service would be the best to host our services. Migration of the services to Amazon Elastic Container Service involved changing how we developed, built, and deployed services. All services that were stateful had to be refactored to be stateless to leverage autoscaling, where instances come and go. The migration process involved updating our tech stack to use the latest open source frameworks, deciding what AWS Services fit our use cases, development, coming up with a roll out strategy to prevent user disruptions, and cutting spend in the cloud. As part of the development, we migrated our services from using RabbitMQ to SQSMySQLto AuroraMemcached to Elasticache, and Solr to ElasticSearch. Going from nothing to running production services supporting millions of users has shown us the pros and cons of ECS.

Development process for a service to be deployed to ECS


ECS Topology

The Good

Services only consume what they need — Before we had 12 servers (4 cores, intel xeon E3–1200, 32 gb ram) in our datacenter hosting the Java microservices. Some of the services consumed only 5% of the cpu and memory on the server. Hence the servers were severly underutilized and not cost effective. Migrating to ECS allowed full usage of CPU and memory by placing a service in a cluster of EC2 instances. The ECS scheduler places the service on a EC2 instance with enough CPU and memory to allocate. We reduced the instances needed to three m4.larges because of the ability to place multiple containers until an EC2 instance runs out of resources.

Services only consume what they need — Before, we had 12 servers (4 cores, Intel Xeon E3-1200, 32 GB RAM) in our datacenter hosting the Java microservices. Some of the services consumed only 5% of the CPU and memory on the server. Hence the servers were severely underutilized and not cost effective. Migrating to ECS allowed full usage of CPU and memory by placing a service in a cluster of EC2 instances. The ECS scheduler places the service on an EC2 instance with enough CPU and memory to allocate. We reduced the instances needed to three m4.larges because of the ability to place multiple containers until an EC2 instance runs out of resources.

Scales faster than EC2 instances — Generally, containers are faster to spin up than EC2 instances off of an AMI. If traffic increases past a defined threshold, ECS will create a new container and add it to the load balancer, provided the cluster always has enough spare resources to leverage the horizontal scaling benefits. Otherwise there will be an extra delay while an EC2 instance spins up to provide the resources needed.

Orchestration of containers — ECS features handle many use cases necessary for deploying and maintaining container services in a distributed system. There is minimal setup for a private Docker registry (ECR), load balancing, scheduling, and creating an orchestration server. This reduces the amount of Ops work needed to get the service up and running.

Autoscaling — During peak traffic, services scale out to handle the load and prevent downtime. During off-peak hours, services scale in to save money. With AWS ECS there are two levels of autoscaling: one at the cluster level (EC2 instances providing CPU and memory resources for the cluster) and one at the service level (Docker container instances to handle traffic).

Burstable CPU — An ECS service is allocated CPU and memory from an EC2 instance. A service allocated 1024 CPU units (1024 units = 1 core) on an instance that has 4096 available (quad core) is guaranteed to have one core available for the service at all times. If no other service is using the other three cores on the EC2 instance, the service is able to use all four cores if needed.

No extra cost associated with AWS ECS — Users only pay for the AWS resources used, i.e. EC2, Elastic Load Balancer, etc.
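Zero downtime deployments — When deploying a new version of a service, ECS will deploy the new version onto the cluster for staging. The load balancer executes a health check against the new containers' canary endpoint. If the containers pass the health check with HTTP 200, the load balancer sends traffic to the new containers and drains the old containers for deletion.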

The Bad

Might accidentally bring down a service — Autoscaling allows the ability to save money by turning off instances when not in use. However, scale-in actions at the cluster level can bring down instances running tasks. AWS autoscaling default policies delete instances with the oldest launch configuration or instances closest to the next billing hour. This can cause a service disruption if the instances terminated contained a service and its backups. Scaling in a cluster requires adding scale-in protection to the instances running tasks to prevent service disruptions. At Zoosk, we created a Python script to protect or unprotect instances that have a running task. Execution of the script happens before any scale-in action. Protect, then scale in. When we initially migrated our services, we over-provisioned our services to give us a buffer. After running for about a week, we did cost cutting by scaling in our cluster. Since I was an ECS newbie, I just thought that ECS was smart enough to not take down instances that had a service running on them. Wrong. I brought down a production service for a minute as ECS brought it back up. Not a good look for my first year at Zoosk. So learn from my mistake!

Deploys could be better — During deployments there is no phase that allows a percentage of traffic through to the new containers and, if things look good, commits, else rolls back. ECS commits a deploy if the containers pass the health check from the load balancer. It will drain the existing containers before allowing one to check if the new version is stable with the production traffic. To roll back in ECS, one must deploy the old task definition (containers). This increases the downtime of the service compared to flipping the traffic back to the old containers. Hence creating a canary endpoint for services is crucial for ECS, because this is the gatekeeper of committing a deploy. ECS has no way of stopping a deploy unless you set a deploy action of the same task definition.

Sitting idle EC2 instance — EC2 instances in the cluster might sit there with no tasks running but are needed to provide resources for scaling out when the time comes. Zero downtime deploys will not execute if you do not have at least twice as much CPU and memory for the deployed service available in the cluster.

Outdated ECS docs — The CloudFormation templates AWS provides around ECS create resources that are not configured properly, such as CloudWatch alarms for ECS. I was a newbie at CloudFormation, and the template contained unnamed resources. Resources created in the CloudFormation stack will have a hash appended to the name. Scripting against resources can be difficult if the name changes between deploys with CloudFormation. If a cluster needs to be renamed to remove the hash, the entire cluster has to be deleted and recreated. So remember to name the resources.

Stateless applications only — Services developed for ECS have to be stateless due to scale-in and scale-out possibilities. During deploys, ECS brings up a new set of containers and gets rid of the old, thus the state of your past version is gone.

The Learnings / Gotcha’s

Warm the load balancer — The AWS Elastic Load Balancer has limitations where the load balancer needs to gradually increase the request rate to allow time to scale; otherwise requests will be dropped. If you are sending all the traffic immediately, you will need to contact AWS to “pre-warm” the load balancer or run a load test to warm it up. We faced this issue where we tried to send 400k requests per minute to a fresh load balancer, and the load balancer would drop requests, causing downtime for our service. The workaround was that, before sending the traffic, we ran a load test to warm up the load balancer.

Autoscaling — Implementing autoscaling correctly involves getting rid of the peaks and falls of CPU usage by reducing and increasing CPU resources for a service. For a majority of our services we aimed to have a CPU usage range of 50-70%. Thus we set the scale-in trigger at 50% and the scale-out trigger at 70%. We set a CloudWatch alert to be triggered if the CPU reached 85% for longer than 5 minutes.

Metrics discrepancies — Two ways of monitoring the CPU and memory of the services are running docker stats inside the EC2 instance and using CloudWatch for an aggregate of usage across all containers of the service. While testing one instance of a service, I noticed discrepancies in how much CPU and memory was being utilized. Docker reported higher numbers, and I assumed that since it was the container platform it was correct. However, when running the htop command on the EC2 instance, the CPU usage coincided with the readings in CloudWatch. Trust CloudWatch.

Memory allocation matters — If a service consumes more than its allocated memory, the container will die. I wasted a lot of time wondering why the Docker container kept dying and found out it went over the memory allocated in ECS. In Java, if you allocate lots of memory to the JVM, garbage collection is triggered less often and the service consumes more memory. We found that our services were consuming a lot of memory when they did not have to. This wasted resources on ECS, which meant wasted money. Always profile (VisualVM is great) and load test (JMeter) to get a clear idea of how much memory the service needs.

Saving logs brought down a service — Send logs and metrics to a central location for analyzing, alerting, and monitoring. CloudWatch is a great option for that. Set up access logs for the ELB for auditing and storage in S3. There was an issue at Zoosk where services would stop working because there wasn’t any more disk space in the Docker container. Services wrote their application and access logs locally, and by default Docker allocates only 10 GB of disk space. By shipping logs and metrics to external services, the risk of a service going down due to full disk space is eliminated.

Docker SHA — Deploying with the Docker image SHA is mandatory in production, because using tags does not guarantee that the same version will be deployed (if a developer pushes their changes with that tag, it will overwrite the previous version). If you need to roll back, there is no way to do that with a tag. The Docker SHA is a unique identifier of a version of a container image. Using the SHA also allows auditability, because tags can be overwritten in a Docker registry but SHAs can’t.

For Java developers — Because AWS resources use DNS name entries that occasionally change, we recommend that you configure your JVM with a TTL value of no more than 60 seconds. This ensures that when a resource’s IP address changes, your application will be able to receive and use the resource’s new IP address by requerying the DNS.

Conclusion

AWS ECS is an excellent option for hosting container services in the cloud. A developer can easily deploy and maintain services on ECS with minimal Ops work needed. ECS reduces the troubles of having to manage your own container orchestration platform at zero cost. Of course we wish that the deployment process could be improved and there are many features I would like to see in the product. But, overall, we are happy about how ECS has been able to serve our millions of users the features that they love.


Monday, May 9, 2022

AWS - Single-page application

This one is a pretty simple starting point for a deployment model for a single-page application on AWS. Nothing complex, just basic building blocks on AWS.




Ref: Single-page application - AWS Serverless Multi-Tier Architectures with Amazon API Gateway and AWS Lambda

Presentation

Static website content hosted in Amazon S3, distributed by CloudFront.

AWS Certificate Manager allows a custom SSL/TLS certificate to be used.

Logic

API Gateway with AWS Lambda.

This architecture shows three exposed services (/tickets, /shows, and /info). API Gateway endpoints are secured by a Lambda authorizer. In this method, users sign in through a third-party identity provider and obtain access and ID tokens. These tokens are included in API Gateway calls, and the Lambda authorizer validates these tokens and generates an IAM policy containing API invocation permissions.

Each Lambda function is assigned its own IAM role to provide access to the appropriate data source.

Data

Amazon DynamoDB is used for the /tickets and /shows services.

Amazon ElastiCache is used by the /shows service to improve database performance. Cache misses are sent to DynamoDB.

Monday, May 2, 2022

CI/CD with API Management

 


Very good working code addressing the following questions:

Ref: GitHub - Azure/azure-api-management-devops-resource-kit: Azure API Management DevOps Resource Kit

  • How to automate deployment of APIs into API Management?
  • How to migrate configurations from one environment to another?
  • How to avoid interference between different development teams who share the same API Management instance?