🏗️ DevOps Infrastructure from Scratch: Implementation Guide
We'll build a DevOps infrastructure step by step, assuming nothing is installed yet.
📋 PREREQUISITES AND PREPARATION
🖥️ 1. Developer Machine Setup
# 1.1 Install WSL2 (for Windows users)
wsl --install
wsl --set-default-version 2
# 1.2 Install essential tools
# Ubuntu/Debian
sudo apt update && sudo apt install -y \
curl wget git vim nano unzip \
build-essential software-properties-common \
apt-transport-https ca-certificates gnupg lsb-release
# macOS
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
brew install curl wget git vim nano unzip
🔧 1.2 Development Tools Setup
# Install Docker (Ubuntu)
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt update && sudo apt install -y docker-ce docker-ce-cli containerd.io docker-compose-plugin
sudo usermod -aG docker $USER
newgrp docker
# Install Docker (macOS)
brew install --cask docker
# Docker test
docker --version
docker run hello-world
# Install kubectl
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client
# Install Helm
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash
helm version
# Install Terraform
wget -O- https://apt.releases.hashicorp.com/gpg | sudo gpg --dearmor -o /usr/share/keyrings/hashicorp-archive-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | sudo tee /etc/apt/sources.list.d/hashicorp.list
sudo apt update && sudo apt install terraform
terraform --version
# Install AWS CLI
curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
unzip awscliv2.zip
sudo ./aws/install
aws --version
🎯 1.3 IDE and Editor Setup
# Install VS Code
# Ubuntu
wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > packages.microsoft.gpg
sudo install -o root -g root -m 644 packages.microsoft.gpg /etc/apt/trusted.gpg.d/
sudo sh -c 'echo "deb [arch=amd64,arm64,armhf signed-by=/etc/apt/trusted.gpg.d/packages.microsoft.gpg] https://packages.microsoft.com/repos/code stable main" > /etc/apt/sources.list.d/vscode.list'
sudo apt update && sudo apt install code
# macOS
brew install --cask visual-studio-code
# Essential VS Code extensions
code --install-extension ms-vscode-remote.remote-wsl
code --install-extension ms-azuretools.vscode-docker
code --install-extension hashicorp.terraform
code --install-extension ms-kubernetes-tools.vscode-kubernetes-tools
code --install-extension redhat.vscode-yaml
code --install-extension ms-vscode.azure-account
🏢 PHASE 1: AWS ACCOUNT AND INITIAL SETUP (Days 1-2)
☁️ 2.1 AWS Account Creation and Organization Setup
# 2.1.1 Create an AWS account (manual - via the web)
# Sign up at https://aws.amazon.com/free/
# 2.1.2 Configure the AWS CLI
aws configure
# AWS Access Key ID [None]: YOUR_ACCESS_KEY
# AWS Secret Access Key [None]: YOUR_SECRET_KEY
# Default region name [None]: eu-west-1
# Default output format [None]: json
# 2.1.3 Verify the AWS account
aws sts get-caller-identity
aws ec2 describe-regions
# 2.1.4 AWS Organization setup (for the root account)
aws organizations create-organization --feature-set ALL
# 2.1.5 Create Organizational Units (replace r-xxxx with your root ID from: aws organizations list-roots)
aws organizations create-organizational-unit \
--parent-id r-xxxx \
--name "Production"
aws organizations create-organizational-unit \
--parent-id r-xxxx \
--name "Development"
aws organizations create-organizational-unit \
--parent-id r-xxxx \
--name "Security"
🔐 2.2 IAM Setup and Security Hardening
# 2.2.1 Create an admin user (to avoid working as the root user)
aws iam create-user --user-name devops-admin
# 2.2.2 Attach the AdministratorAccess policy to the admin user
aws iam attach-user-policy \
--user-name devops-admin \
--policy-arn arn:aws:iam::aws:policy/AdministratorAccess
# 2.2.3 Programmatic access for the admin user
aws iam create-access-key --user-name devops-admin
# Save the access key and secret key from the output
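# A quick way to switch the CLI to the new user (a sketch; uses the keys saved above):
aws configure --profile devops-admin
export AWS_PROFILE=devops-admin
aws sts get-caller-identity   # should now report .../user/devops-admin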
# 2.2.4 Create a password policy
cat > password-policy.json << 'EOF'
{
"MinimumPasswordLength": 12,
"RequireSymbols": true,
"RequireNumbers": true,
"RequireUppercaseCharacters": true,
"RequireLowercaseCharacters": true,
"AllowUsersToChangePassword": true,
"MaxPasswordAge": 90,
"PasswordReusePrevention": 5,
"HardExpiry": false
}
EOF
aws iam update-account-password-policy --cli-input-json file://password-policy.json
# 2.2.5 Enable MFA (done via the console)
# https://console.aws.amazon.com/iam/home#/security_credentials
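# Optionally, a virtual MFA device can also be set up from the CLI (a sketch; ACCOUNT_ID
# and the two authentication codes are placeholders):
aws iam create-virtual-mfa-device --virtual-mfa-device-name devops-admin-mfa \
  --outfile qr.png --bootstrap-method QRCodePNG
aws iam enable-mfa-device --user-name devops-admin \
  --serial-number arn:aws:iam::ACCOUNT_ID:mfa/devops-admin-mfa \
  --authentication-code1 123456 --authentication-code2 654321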
🏗️ 2.3 Creating the Project Directory Structure
# 2.3.1 Create the main project directory
mkdir -p ~/devops-infrastructure
cd ~/devops-infrastructure
# 2.3.2 Create the directory structure
mkdir -p {terraform/{modules,environments/{dev,staging,prod}},kubernetes/{base,overlays/{dev,staging,prod}},docker,scripts,docs,monitoring,backup}
# 2.3.3 Initialize the Git repository
git init
git config user.name "Your Name"
git config user.email "your.email@company.com"
# 2.3.4 Create .gitignore
cat > .gitignore << 'EOF'
# Terraform
*.tfstate
*.tfstate.*
.terraform/
.terraform.lock.hcl
terraform.tfvars
*.tfplan
# Docker
.dockerignore
# IDE
.vscode/
.idea/
# OS
.DS_Store
Thumbs.db
# Logs
*.log
# Secrets
secrets/
*.pem
*.key
!public.key
# Backup
backup/
EOF
# 2.3.5 Create README.md
cat > README.md << 'EOF'
# DevOps Infrastructure
This repository contains our company's DevOps infrastructure.
## Structure
- `terraform/` - Infrastructure as Code
- `kubernetes/` - K8s manifests
- `docker/` - Dockerfiles
- `scripts/` - Automation scripts
- `docs/` - Documentation
- `monitoring/` - Monitoring configs
- `backup/` - Backup scripts
## Setup
[Setup instructions go here]
EOF
git add .
git commit -m "Initial project structure"
🛠️ PHASE 2: TERRAFORM AND INFRASTRUCTURE AS CODE (Days 3-5)
🏗️ 3.1 Terraform Backend Setup
# 3.1.1 Create the S3 bucket and DynamoDB table for the Terraform backend
cd ~/devops-infrastructure/terraform
# 3.1.2 Backend setup script
cat > setup-backend.sh << 'EOF'
#!/bin/bash
# Variables
BUCKET_NAME="devops-terraform-state-$(openssl rand -hex 8)"
REGION="eu-west-1"
DYNAMODB_TABLE="terraform-state-lock"
# Create the S3 bucket
aws s3 mb s3://$BUCKET_NAME --region $REGION
# Enable S3 bucket versioning
aws s3api put-bucket-versioning \
--bucket $BUCKET_NAME \
--versioning-configuration Status=Enabled
# Enable S3 bucket encryption
aws s3api put-bucket-encryption \
--bucket $BUCKET_NAME \
--server-side-encryption-configuration '{
"Rules": [
{
"ApplyServerSideEncryptionByDefault": {
"SSEAlgorithm": "AES256"
}
}
]
}'
# Create the DynamoDB table
aws dynamodb create-table \
--table-name $DYNAMODB_TABLE \
--attribute-definitions AttributeName=LockID,AttributeType=S \
--key-schema AttributeName=LockID,KeyType=HASH \
--provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
--region $REGION
echo "Backend setup completed!"
echo "S3 Bucket: $BUCKET_NAME"
echo "DynamoDB Table: $DYNAMODB_TABLE"
echo "Region: $REGION"
# Save to a .env file; note the distinct ENVEOF delimiter so this inner heredoc
# does not terminate the outer 'EOF' heredoc prematurely
cat > ../.env << ENVEOF
export TF_VAR_backend_bucket=$BUCKET_NAME
export TF_VAR_backend_region=$REGION
export TF_VAR_backend_dynamodb_table=$DYNAMODB_TABLE
ENVEOF
EOF
chmod +x setup-backend.sh
./setup-backend.sh
source ../.env
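# Sanity-check that the backend resources exist before running terraform init:
aws s3api head-bucket --bucket "$TF_VAR_backend_bucket"
aws dynamodb describe-table --table-name "$TF_VAR_backend_dynamodb_table" --query 'Table.TableStatus' --output text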
🗂️ 3.2 Terraform Module Structure
# 3.2.1 Terraform module directory layout
cd ~/devops-infrastructure/terraform/modules
# 3.2.2 VPC module
mkdir -p vpc
cat > vpc/main.tf << 'EOF'
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
default = "10.0.0.0/16"
}
variable "availability_zones" {
description = "Availability zones"
type = list(string)
default = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}
variable "environment" {
description = "Environment name"
type = string
}
variable "project_name" {
description = "Project name"
type = string
}
# VPC
resource "aws_vpc" "main" {
cidr_block = var.vpc_cidr
enable_dns_hostnames = true
enable_dns_support = true
tags = {
Name = "${var.project_name}-${var.environment}-vpc"
Environment = var.environment
Project = var.project_name
}
}
# Internet Gateway
resource "aws_internet_gateway" "main" {
vpc_id = aws_vpc.main.id
tags = {
Name = "${var.project_name}-${var.environment}-igw"
Environment = var.environment
Project = var.project_name
}
}
# Public Subnets
resource "aws_subnet" "public" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index)
availability_zone = var.availability_zones[count.index]
map_public_ip_on_launch = true
tags = {
Name = "${var.project_name}-${var.environment}-public-${count.index + 1}"
Environment = var.environment
Project = var.project_name
Type = "public"
}
}
# Private Subnets
resource "aws_subnet" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
cidr_block = cidrsubnet(var.vpc_cidr, 8, count.index + length(var.availability_zones))
availability_zone = var.availability_zones[count.index]
tags = {
Name = "${var.project_name}-${var.environment}-private-${count.index + 1}"
Environment = var.environment
Project = var.project_name
Type = "private"
}
}
# Elastic IPs for NAT Gateways
resource "aws_eip" "nat" {
count = length(var.availability_zones)
domain = "vpc"
depends_on = [aws_internet_gateway.main]
tags = {
Name = "${var.project_name}-${var.environment}-eip-${count.index + 1}"
Environment = var.environment
Project = var.project_name
}
}
# NAT Gateways
resource "aws_nat_gateway" "main" {
count = length(var.availability_zones)
allocation_id = aws_eip.nat[count.index].id
subnet_id = aws_subnet.public[count.index].id
tags = {
Name = "${var.project_name}-${var.environment}-nat-${count.index + 1}"
Environment = var.environment
Project = var.project_name
}
depends_on = [aws_internet_gateway.main]
}
# Route table for public subnets
resource "aws_route_table" "public" {
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
gateway_id = aws_internet_gateway.main.id
}
tags = {
Name = "${var.project_name}-${var.environment}-public-rt"
Environment = var.environment
Project = var.project_name
}
}
# Route table associations for public subnets
resource "aws_route_table_association" "public" {
count = length(aws_subnet.public)
subnet_id = aws_subnet.public[count.index].id
route_table_id = aws_route_table.public.id
}
# Route tables for private subnets
resource "aws_route_table" "private" {
count = length(var.availability_zones)
vpc_id = aws_vpc.main.id
route {
cidr_block = "0.0.0.0/0"
nat_gateway_id = aws_nat_gateway.main[count.index].id
}
tags = {
Name = "${var.project_name}-${var.environment}-private-rt-${count.index + 1}"
Environment = var.environment
Project = var.project_name
}
}
# Route table associations for private subnets
resource "aws_route_table_association" "private" {
count = length(aws_subnet.private)
subnet_id = aws_subnet.private[count.index].id
route_table_id = aws_route_table.private[count.index].id
}
# VPC Flow Logs
resource "aws_flow_log" "vpc" {
iam_role_arn = aws_iam_role.flow_log.arn
log_destination = aws_cloudwatch_log_group.vpc_flow_log.arn
traffic_type = "ALL"
vpc_id = aws_vpc.main.id
}
resource "aws_cloudwatch_log_group" "vpc_flow_log" {
name = "/aws/vpc/flow-logs"
retention_in_days = 7
}
resource "aws_iam_role" "flow_log" {
name = "${var.project_name}-${var.environment}-flow-log-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "vpc-flow-logs.amazonaws.com"
}
}
]
})
}
resource "aws_iam_role_policy" "flow_log" {
name = "${var.project_name}-${var.environment}-flow-log-policy"
role = aws_iam_role.flow_log.id
policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = [
"logs:CreateLogGroup",
"logs:CreateLogStream",
"logs:PutLogEvents",
"logs:DescribeLogGroups",
"logs:DescribeLogStreams"
]
Effect = "Allow"
Resource = "*"
}
]
})
}
EOF
cat > vpc/outputs.tf << 'EOF'
output "vpc_id" {
description = "ID of the VPC"
value = aws_vpc.main.id
}
output "vpc_cidr_block" {
description = "CIDR block of the VPC"
value = aws_vpc.main.cidr_block
}
output "public_subnet_ids" {
description = "IDs of the public subnets"
value = aws_subnet.public[*].id
}
output "private_subnet_ids" {
description = "IDs of the private subnets"
value = aws_subnet.private[*].id
}
output "internet_gateway_id" {
description = "ID of the Internet Gateway"
value = aws_internet_gateway.main.id
}
output "nat_gateway_ids" {
description = "IDs of the NAT Gateways"
value = aws_nat_gateway.main[*].id
}
EOF
cat > vpc/versions.tf << 'EOF'
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
}
EOF
🔒 3.3 Security Groups Module
# 3.3.1 Security Groups module
mkdir -p security-groups
cat > security-groups/main.tf << 'EOF'
variable "vpc_id" {
description = "VPC ID"
type = string
}
variable "environment" {
description = "Environment name"
type = string
}
variable "project_name" {
description = "Project name"
type = string
}
# ALB Security Group
resource "aws_security_group" "alb" {
name_prefix = "${var.project_name}-${var.environment}-alb-"
vpc_id = var.vpc_id
ingress {
description = "HTTP"
from_port = 80
to_port = 80
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
ingress {
description = "HTTPS"
from_port = 443
to_port = 443
protocol = "tcp"
cidr_blocks = ["0.0.0.0/0"]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-${var.environment}-alb-sg"
Environment = var.environment
Project = var.project_name
}
lifecycle {
create_before_destroy = true
}
}
# EKS Cluster Security Group
resource "aws_security_group" "eks_cluster" {
name_prefix = "${var.project_name}-${var.environment}-eks-cluster-"
vpc_id = var.vpc_id
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-${var.environment}-eks-cluster-sg"
Environment = var.environment
Project = var.project_name
}
lifecycle {
create_before_destroy = true
}
}
# EKS Node Group Security Group
resource "aws_security_group" "eks_nodes" {
name_prefix = "${var.project_name}-${var.environment}-eks-nodes-"
vpc_id = var.vpc_id
ingress {
description = "Allow nodes to communicate with each other"
from_port = 0
to_port = 65535
protocol = "tcp"
self = true
}
ingress {
description = "Allow worker Kubelets and pods to receive communication from the cluster control plane"
from_port = 1025
to_port = 65535
protocol = "tcp"
security_groups = [aws_security_group.eks_cluster.id]
}
ingress {
description = "Allow pods running extension API servers on port 443 to receive communication from cluster control plane"
from_port = 443
to_port = 443
protocol = "tcp"
security_groups = [aws_security_group.eks_cluster.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-${var.environment}-eks-nodes-sg"
Environment = var.environment
Project = var.project_name
}
lifecycle {
create_before_destroy = true
}
}
# RDS Security Group
resource "aws_security_group" "rds" {
name_prefix = "${var.project_name}-${var.environment}-rds-"
vpc_id = var.vpc_id
ingress {
description = "MySQL/Aurora"
from_port = 3306
to_port = 3306
protocol = "tcp"
security_groups = [aws_security_group.eks_nodes.id]
}
ingress {
description = "PostgreSQL"
from_port = 5432
to_port = 5432
protocol = "tcp"
security_groups = [aws_security_group.eks_nodes.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-${var.environment}-rds-sg"
Environment = var.environment
Project = var.project_name
}
lifecycle {
create_before_destroy = true
}
}
# ElastiCache Security Group
resource "aws_security_group" "elasticache" {
name_prefix = "${var.project_name}-${var.environment}-elasticache-"
vpc_id = var.vpc_id
ingress {
description = "Redis"
from_port = 6379
to_port = 6379
protocol = "tcp"
security_groups = [aws_security_group.eks_nodes.id]
}
egress {
from_port = 0
to_port = 0
protocol = "-1"
cidr_blocks = ["0.0.0.0/0"]
}
tags = {
Name = "${var.project_name}-${var.environment}-elasticache-sg"
Environment = var.environment
Project = var.project_name
}
lifecycle {
create_before_destroy = true
}
}
EOF
cat > security-groups/outputs.tf << 'EOF'
output "alb_security_group_id" {
description = "ALB Security Group ID"
value = aws_security_group.alb.id
}
output "eks_cluster_security_group_id" {
description = "EKS Cluster Security Group ID"
value = aws_security_group.eks_cluster.id
}
output "eks_nodes_security_group_id" {
description = "EKS Nodes Security Group ID"
value = aws_security_group.eks_nodes.id
}
output "rds_security_group_id" {
description = "RDS Security Group ID"
value = aws_security_group.rds.id
}
output "elasticache_security_group_id" {
description = "ElastiCache Security Group ID"
value = aws_security_group.elasticache.id
}
EOF
🔧 3.4 EKS Module
# 3.4.1 EKS module
mkdir -p eks
cat > eks/main.tf << 'EOF'
variable "cluster_name" {
description = "EKS cluster name"
type = string
}
variable "cluster_version" {
description = "Kubernetes version"
type = string
default = "1.28"
}
variable "subnet_ids" {
description = "Subnet IDs for EKS cluster"
type = list(string)
}
variable "node_subnet_ids" {
description = "Subnet IDs for EKS node groups"
type = list(string)
}
variable "cluster_security_group_id" {
description = "Security group ID for EKS cluster"
type = string
}
variable "node_security_group_id" {
description = "Security group ID for EKS nodes"
type = string
}
variable "environment" {
description = "Environment name"
type = string
}
variable "project_name" {
description = "Project name"
type = string
}
# EKS Cluster IAM Role
resource "aws_iam_role" "cluster" {
name = "${var.project_name}-${var.environment}-eks-cluster-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "eks.amazonaws.com"
}
}
]
})
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_iam_role_policy_attachment" "cluster_AmazonEKSClusterPolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
role = aws_iam_role.cluster.name
}
# EKS Node Group IAM Role
resource "aws_iam_role" "node_group" {
name = "${var.project_name}-${var.environment}-eks-node-group-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "ec2.amazonaws.com"
}
}
]
})
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_iam_role_policy_attachment" "node_group_AmazonEKSWorkerNodePolicy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
role = aws_iam_role.node_group.name
}
resource "aws_iam_role_policy_attachment" "node_group_AmazonEKS_CNI_Policy" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
role = aws_iam_role.node_group.name
}
resource "aws_iam_role_policy_attachment" "node_group_AmazonEC2ContainerRegistryReadOnly" {
policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
role = aws_iam_role.node_group.name
}
# EKS Cluster
resource "aws_eks_cluster" "main" {
name = var.cluster_name
role_arn = aws_iam_role.cluster.arn
version = var.cluster_version
vpc_config {
subnet_ids = var.subnet_ids
security_group_ids = [var.cluster_security_group_id]
endpoint_private_access = true
endpoint_public_access = true
public_access_cidrs = ["0.0.0.0/0"]
}
enabled_cluster_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
encryption_config {
provider {
key_arn = aws_kms_key.eks.arn
}
resources = ["secrets"]
}
depends_on = [
aws_iam_role_policy_attachment.cluster_AmazonEKSClusterPolicy,
aws_cloudwatch_log_group.eks
]
tags = {
Name = var.cluster_name
Environment = var.environment
Project = var.project_name
}
}
# CloudWatch Log Group for EKS
resource "aws_cloudwatch_log_group" "eks" {
name = "/aws/eks/${var.cluster_name}/cluster"
retention_in_days = 7
}
# KMS Key for EKS encryption
resource "aws_kms_key" "eks" {
description = "EKS Secret Encryption Key"
deletion_window_in_days = 7
tags = {
Name = "${var.project_name}-${var.environment}-eks-kms"
Environment = var.environment
Project = var.project_name
}
}
resource "aws_kms_alias" "eks" {
name = "alias/${var.project_name}-${var.environment}-eks"
target_key_id = aws_kms_key.eks.key_id
}
# EKS Node Group
resource "aws_eks_node_group" "main" {
cluster_name = aws_eks_cluster.main.name
node_group_name = "${var.cluster_name}-node-group"
node_role_arn = aws_iam_role.node_group.arn
subnet_ids = var.node_subnet_ids
capacity_type = "ON_DEMAND"
ami_type = "AL2_x86_64"
instance_types = ["t3.medium"]
disk_size = 20
scaling_config {
desired_size = 2
max_size = 10
min_size = 1
}
update_config {
max_unavailable = 1
}
# Remote access configuration
remote_access {
ec2_ssh_key = aws_key_pair.eks_nodes.key_name
source_security_group_ids = [var.node_security_group_id]
}
depends_on = [
aws_iam_role_policy_attachment.node_group_AmazonEKSWorkerNodePolicy,
aws_iam_role_policy_attachment.node_group_AmazonEKS_CNI_Policy,
aws_iam_role_policy_attachment.node_group_AmazonEC2ContainerRegistryReadOnly,
]
tags = {
Name = "${var.cluster_name}-node-group"
Environment = var.environment
Project = var.project_name
}
}
# SSH Key Pair for EKS nodes
resource "aws_key_pair" "eks_nodes" {
key_name = "${var.cluster_name}-eks-nodes"
  public_key = file(pathexpand("~/.ssh/id_rsa.pub")) # file() does not expand ~
tags = {
Name = "${var.cluster_name}-eks-nodes"
Environment = var.environment
Project = var.project_name
}
}
# EKS Add-ons
resource "aws_eks_addon" "coredns" {
cluster_name = aws_eks_cluster.main.name
addon_name = "coredns"
addon_version = "v1.10.1-eksbuild.5"
resolve_conflicts_on_create = "OVERWRITE"
}
resource "aws_eks_addon" "kube_proxy" {
cluster_name = aws_eks_cluster.main.name
addon_name = "kube-proxy"
addon_version = "v1.28.2-eksbuild.2"
resolve_conflicts_on_create = "OVERWRITE"
}
resource "aws_eks_addon" "vpc_cni" {
cluster_name = aws_eks_cluster.main.name
addon_name = "vpc-cni"
addon_version = "v1.15.1-eksbuild.1"
resolve_conflicts_on_create = "OVERWRITE"
}
resource "aws_eks_addon" "ebs_csi" {
cluster_name = aws_eks_cluster.main.name
addon_name = "aws-ebs-csi-driver"
addon_version = "v1.25.0-eksbuild.1"
resolve_conflicts_on_create = "OVERWRITE"
}
EOF
cat > eks/outputs.tf << 'EOF'
output "cluster_id" {
description = "EKS cluster ID"
value = aws_eks_cluster.main.id
}
output "cluster_arn" {
description = "EKS cluster ARN"
value = aws_eks_cluster.main.arn
}
output "cluster_endpoint" {
description = "EKS cluster endpoint"
value = aws_eks_cluster.main.endpoint
}
output "cluster_security_group_id" {
description = "EKS cluster security group ID"
value = aws_eks_cluster.main.vpc_config[0].cluster_security_group_id
}
output "cluster_certificate_authority_data" {
description = "EKS cluster certificate authority data"
value = aws_eks_cluster.main.certificate_authority[0].data
}
output "cluster_version" {
description = "EKS cluster Kubernetes version"
value = aws_eks_cluster.main.version
}
output "node_group_arn" {
description = "EKS node group ARN"
value = aws_eks_node_group.main.arn
}
output "node_group_status" {
description = "EKS node group status"
value = aws_eks_node_group.main.status
}
EOF
🗃️ 3.5 RDS Module
# 3.5.1 RDS module
mkdir -p rds
cat > rds/main.tf << 'EOF'
variable "db_name" {
description = "Database name"
type = string
}
variable "db_username" {
description = "Database username"
type = string
default = "admin"
}
variable "db_password" {
description = "Database password"
type = string
sensitive = true
}
variable "subnet_ids" {
description = "Subnet IDs for RDS"
type = list(string)
}
variable "security_group_id" {
description = "Security group ID for RDS"
type = string
}
variable "environment" {
description = "Environment name"
type = string
}
variable "project_name" {
description = "Project name"
type = string
}
variable "engine" {
description = "Database engine"
type = string
default = "postgres"
}
variable "engine_version" {
description = "Database engine version"
type = string
default = "15.4"
}
variable "instance_class" {
description = "RDS instance class"
type = string
default = "db.t3.micro"
}
variable "allocated_storage" {
description = "RDS allocated storage"
type = number
default = 20
}
variable "backup_retention_period" {
description = "Backup retention period in days"
type = number
default = 7
}
# DB Subnet Group
resource "aws_db_subnet_group" "main" {
name = "${var.project_name}-${var.environment}-db-subnet-group"
subnet_ids = var.subnet_ids
tags = {
Name = "${var.project_name}-${var.environment}-db-subnet-group"
Environment = var.environment
Project = var.project_name
}
}
# DB Parameter Group
resource "aws_db_parameter_group" "main" {
family = "${var.engine}15"
name = "${var.project_name}-${var.environment}-db-params"
dynamic "parameter" {
for_each = var.engine == "postgres" ? [
{
name = "log_statement"
value = "all"
},
{
name = "log_duration"
value = "1"
},
{
name = "log_min_duration_statement"
value = "1000"
}
] : []
content {
name = parameter.value.name
value = parameter.value.value
}
}
tags = {
Name = "${var.project_name}-${var.environment}-db-params"
Environment = var.environment
Project = var.project_name
}
}
# KMS Key for RDS encryption
resource "aws_kms_key" "rds" {
description = "RDS encryption key"
deletion_window_in_days = 7
tags = {
Name = "${var.project_name}-${var.environment}-rds-kms"
Environment = var.environment
Project = var.project_name
}
}
resource "aws_kms_alias" "rds" {
name = "alias/${var.project_name}-${var.environment}-rds"
target_key_id = aws_kms_key.rds.key_id
}
# RDS Instance
resource "aws_db_instance" "main" {
identifier = "${var.project_name}-${var.environment}-db"
# Engine options
engine = var.engine
engine_version = var.engine_version
instance_class = var.instance_class
# Storage
allocated_storage = var.allocated_storage
max_allocated_storage = var.allocated_storage * 2
storage_type = "gp3"
storage_encrypted = true
kms_key_id = aws_kms_key.rds.arn
# Database
db_name = var.db_name
username = var.db_username
password = var.db_password
# Network & Security
db_subnet_group_name = aws_db_subnet_group.main.name
vpc_security_group_ids = [var.security_group_id]
publicly_accessible = false
# Backup
backup_retention_period = var.backup_retention_period
backup_window = "03:00-04:00"
maintenance_window = "sun:04:00-sun:05:00"
# Monitoring
monitoring_interval = 60
monitoring_role_arn = aws_iam_role.rds_monitoring.arn
# Performance Insights
performance_insights_enabled = true
performance_insights_kms_key_id = aws_kms_key.rds.arn
# Parameters
parameter_group_name = aws_db_parameter_group.main.name
# Deletion protection
deletion_protection = var.environment == "prod" ? true : false
skip_final_snapshot = var.environment == "prod" ? false : true
final_snapshot_identifier = var.environment == "prod" ? "${var.project_name}-${var.environment}-final-snapshot-${formatdate("YYYY-MM-DD-hhmm", timestamp())}" : null
tags = {
Name = "${var.project_name}-${var.environment}-db"
Environment = var.environment
Project = var.project_name
}
}
# IAM Role for RDS Enhanced Monitoring
resource "aws_iam_role" "rds_monitoring" {
name = "${var.project_name}-${var.environment}-rds-monitoring-role"
assume_role_policy = jsonencode({
Version = "2012-10-17"
Statement = [
{
Action = "sts:AssumeRole"
Effect = "Allow"
Principal = {
Service = "monitoring.rds.amazonaws.com"
}
}
]
})
tags = {
Environment = var.environment
Project = var.project_name
}
}
resource "aws_iam_role_policy_attachment" "rds_monitoring" {
role = aws_iam_role.rds_monitoring.name
policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole"
}
# Read Replica (for production)
resource "aws_db_instance" "read_replica" {
count = var.environment == "prod" ? 1 : 0
identifier = "${var.project_name}-${var.environment}-db-read-replica"
replicate_source_db = aws_db_instance.main.identifier
instance_class = var.instance_class
# Network & Security
vpc_security_group_ids = [var.security_group_id]
publicly_accessible = false
# Monitoring
monitoring_interval = 60
monitoring_role_arn = aws_iam_role.rds_monitoring.arn
# Performance Insights
performance_insights_enabled = true
performance_insights_kms_key_id = aws_kms_key.rds.arn
skip_final_snapshot = true
tags = {
Name = "${var.project_name}-${var.environment}-db-read-replica"
Environment = var.environment
Project = var.project_name
}
}
EOF
cat > rds/outputs.tf << 'EOF'
output "db_instance_endpoint" {
description = "RDS instance endpoint"
value = aws_db_instance.main.endpoint
}
output "db_instance_id" {
description = "RDS instance ID"
value = aws_db_instance.main.id
}
output "db_instance_arn" {
description = "RDS instance ARN"
value = aws_db_instance.main.arn
}
output "db_instance_port" {
description = "RDS instance port"
value = aws_db_instance.main.port
}
output "db_subnet_group_id" {
description = "DB subnet group ID"
value = aws_db_subnet_group.main.id
}
output "db_parameter_group_id" {
description = "DB parameter group ID"
value = aws_db_parameter_group.main.id
}
output "read_replica_endpoint" {
description = "Read replica endpoint"
value = var.environment == "prod" ? aws_db_instance.read_replica[0].endpoint : null
}
EOF
🎯 3.6 Environment-Specific Configurations
# 3.6.1 Development environment
cd ~/devops-infrastructure/terraform/environments/dev
# Create an SSH key pair (referenced by the EKS module's aws_key_pair)
ssh-keygen -t rsa -b 4096 -C "devops@company.com" -f ~/.ssh/id_rsa -N ""
cat > main.tf << 'EOF'
terraform {
required_version = ">= 1.0"
required_providers {
aws = {
source = "hashicorp/aws"
version = "~> 5.0"
}
}
backend "s3" {
# Backend configuration will be provided via backend config file
}
}
provider "aws" {
region = var.aws_region
default_tags {
tags = {
Environment = var.environment
Project = var.project_name
ManagedBy = "Terraform"
}
}
}
# Local values
locals {
cluster_name = "${var.project_name}-${var.environment}-eks"
}
# VPC Module
module "vpc" {
source = "../../modules/vpc"
vpc_cidr = var.vpc_cidr
availability_zones = var.availability_zones
environment = var.environment
project_name = var.project_name
}
# Security Groups Module
module "security_groups" {
source = "../../modules/security-groups"
vpc_id = module.vpc.vpc_id
environment = var.environment
project_name = var.project_name
}
# EKS Module
module "eks" {
source = "../../modules/eks"
cluster_name = local.cluster_name
cluster_version = var.kubernetes_version
subnet_ids = concat(module.vpc.public_subnet_ids, module.vpc.private_subnet_ids)
node_subnet_ids = module.vpc.private_subnet_ids
cluster_security_group_id = module.security_groups.eks_cluster_security_group_id
node_security_group_id = module.security_groups.eks_nodes_security_group_id
environment = var.environment
project_name = var.project_name
}
# RDS Module
module "rds" {
source = "../../modules/rds"
db_name = var.db_name
db_username = var.db_username
db_password = var.db_password
subnet_ids = module.vpc.private_subnet_ids
security_group_id = module.security_groups.rds_security_group_id
environment = var.environment
project_name = var.project_name
engine = "postgres"
engine_version = "15.4"
instance_class = "db.t3.micro"
allocated_storage = 20
}
EOF
cat > variables.tf << 'EOF'
variable "aws_region" {
description = "AWS region"
type = string
default = "eu-west-1"
}
variable "environment" {
description = "Environment name"
type = string
default = "dev"
}
variable "project_name" {
description = "Project name"
type = string
default = "mycompany"
}
variable "vpc_cidr" {
description = "CIDR block for VPC"
type = string
default = "10.0.0.0/16"
}
variable "availability_zones" {
description = "Availability zones"
type = list(string)
default = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
}
variable "kubernetes_version" {
description = "Kubernetes version"
type = string
default = "1.28"
}
variable "db_name" {
description = "Database name"
type = string
default = "mycompanydb"
}
variable "db_username" {
description = "Database username"
type = string
default = "admin"
}
variable "db_password" {
description = "Database password"
type = string
sensitive = true
}
EOF
# terraform.tfvars is git-ignored; keep real secrets out of version control
cat > terraform.tfvars << 'EOF'
aws_region = "eu-west-1"
environment = "dev"
project_name = "mycompany"
vpc_cidr = "10.0.0.0/16"
availability_zones = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
kubernetes_version = "1.28"
db_name = "mycompanydb"
db_username = "admin"
db_password = "SuperSecurePassword123!"
EOF
cat > outputs.tf << 'EOF'
output "vpc_id" {
description = "VPC ID"
value = module.vpc.vpc_id
}
output "eks_cluster_endpoint" {
description = "EKS cluster endpoint"
value = module.eks.cluster_endpoint
}
output "eks_cluster_name" {
description = "EKS cluster name"
value = module.eks.cluster_id
}
output "rds_endpoint" {
description = "RDS endpoint"
value = module.rds.db_instance_endpoint
}
output "configure_kubectl" {
description = "Configure kubectl command"
value = "aws eks update-kubeconfig --region ${var.aws_region} --name ${module.eks.cluster_id}"
}
EOF
# Backend configuration
cat > backend.conf << EOF
bucket = "$TF_VAR_backend_bucket"
key = "dev/terraform.tfstate"
region = "$TF_VAR_backend_region"
dynamodb_table = "$TF_VAR_backend_dynamodb_table"
encrypt = true
EOF
🚀 3.7 Terraform Initialize and Deploy
# 3.7.1 Initialize Terraform
cd ~/devops-infrastructure/terraform/environments/dev
terraform init -backend-config=backend.conf
# 3.7.2 Terraform plan
terraform plan -out=tfplan
# 3.7.3 Terraform apply
terraform apply tfplan
# 3.7.4 Configure kubectl
aws eks update-kubeconfig --region eu-west-1 --name $(terraform output -raw eks_cluster_name)
# 3.7.5 Test the cluster connection
kubectl get nodes
kubectl get pods --all-namespaces
# 3.7.6 Terraform outputs
terraform output
🐳 PHASE 3: CONTAINERIZATION AND REGISTRY (Days 6-7)
📦 4.1 GitHub Container Registry Setup
# 4.1.1 Create a GitHub Personal Access Token
# GitHub -> Settings -> Developer settings -> Personal access tokens -> Tokens (classic)
# Permissions: write:packages, read:packages, delete:packages
# 4.1.2 Log in to GitHub Container Registry (export GITHUB_TOKEN=<your PAT> first)
echo $GITHUB_TOKEN | docker login ghcr.io -u USERNAME --password-stdin
# 4.1.3 Test image push
docker pull hello-world
docker tag hello-world ghcr.io/yourusername/hello-world:latest
docker push ghcr.io/yourusername/hello-world:latest
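# Verify the push (ghcr.io packages are private by default; visibility can be changed
# in the package settings on GitHub):
docker pull ghcr.io/yourusername/hello-world:latest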
🏗️ 4.2 Docker Multi-Stage Build Templates
# 4.2.1 Docker templates directory
cd ~/devops-infrastructure/docker
mkdir -p {nodejs,python,golang,java,nginx}
# 4.2.2 Node.js Dockerfile template
cat > nodejs/Dockerfile << 'EOF'
# Multi-stage build for Node.js applications
FROM node:18-alpine AS builder
# Set working directory
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install all dependencies (dev dependencies are needed for the build step below)
RUN npm ci
# Copy source code
COPY . .
# Build application
RUN npm run build
# Production stage
FROM node:18-alpine AS production
# Install dumb-init for proper signal handling
RUN apk add --no-cache dumb-init
# Create non-root user
RUN addgroup -g 1001 -S nodejs && \
adduser -S nodejs -u 1001
# Set working directory
WORKDIR /app
# Copy package files
COPY package*.json ./
# Install only production dependencies (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev && npm cache clean --force
# Copy built application from builder stage
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
# Switch to non-root user
USER nodejs
# Expose port
EXPOSE 3000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD node healthcheck.js
# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]
# Start application
CMD ["node", "dist/index.js"]
EOF
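# The HEALTHCHECK above expects a healthcheck.js shipped next to the app; a minimal
# sketch, assuming the service exposes GET /health on port 3000:
cat > nodejs/healthcheck.js << 'EOF'
const http = require('http');
// Exit 0 only when the app answers /health with HTTP 200
const req = http.get('http://localhost:3000/health', (res) => {
  process.exit(res.statusCode === 200 ? 0 : 1);
});
req.on('error', () => process.exit(1));
EOF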
# 4.2.3 Python Dockerfile template
cat > python/Dockerfile << 'EOF'
# Multi-stage build for Python applications
FROM python:3.11-slim AS builder
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PIP_NO_CACHE_DIR=1 \
PIP_DISABLE_PIP_VERSION_CHECK=1
# Install system dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
build-essential \
&& rm -rf /var/lib/apt/lists/*
# Create virtual environment
RUN python -m venv /opt/venv
ENV PATH="/opt/venv/bin:$PATH"
# Copy requirements
COPY requirements.txt .
# Install Python dependencies
RUN pip install --no-cache-dir -r requirements.txt
# Production stage
FROM python:3.11-slim AS production
# Set environment variables
ENV PYTHONDONTWRITEBYTECODE=1 \
PYTHONUNBUFFERED=1 \
PATH="/opt/venv/bin:$PATH"
# Install runtime dependencies
RUN apt-get update && apt-get install -y --no-install-recommends \
dumb-init \
&& rm -rf /var/lib/apt/lists/*
# Create non-root user
RUN groupadd -r appuser && useradd -r -g appuser appuser
# Copy virtual environment from builder
COPY --from=builder /opt/venv /opt/venv
# Set working directory
WORKDIR /app
# Copy application code
COPY --chown=appuser:appuser . .
# Switch to non-root user
USER appuser
# Expose port
EXPOSE 8000
# Health check
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD python healthcheck.py
# Use dumb-init to handle signals properly
ENTRYPOINT ["dumb-init", "--"]
# Start application
CMD ["python", "app.py"]
EOF
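# Likewise, a minimal healthcheck.py for the HEALTHCHECK above (assumes GET /health on port 8000):
cat > python/healthcheck.py << 'EOF'
import sys
import urllib.request

# Exit 0 only when the app answers /health with HTTP 200
try:
    with urllib.request.urlopen("http://localhost:8000/health", timeout=3) as resp:
        sys.exit(0 if resp.status == 200 else 1)
except Exception:
    sys.exit(1)
EOF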
# 4.2.4 Golang Dockerfile template
cat > golang/Dockerfile << 'EOF'
# Multi-stage build for Go applications
FROM golang:1.21-alpine AS builder
# Install git for go modules
RUN apk add --no-cache git
# Set working directory
WORKDIR /app
# Copy go mod files
COPY go.mod go.sum ./
# Download dependencies
RUN go mod download
# Copy source code
COPY . .
# Build application with optimizations
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build \
-ldflags='-w -s -extldflags "-static"' \
-a -installsuffix cgo \
-o main .
# Production stage
FROM scratch AS production
# Add ca-certificates for HTTPS
COPY --from=builder /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
# Copy binary from builder
COPY --from=builder /app/main /main
# Expose port
EXPOSE 8080
# Health check (for scratch images, implement in Go)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
CMD ["/main", "-health"]
# Start application
ENTRYPOINT ["/main"]
EOF
# 4.2.5 Docker Compose template
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
web:
build:
context: .
dockerfile: Dockerfile
target: production
ports:
- "3000:3000"
environment:
- NODE_ENV=production
- DATABASE_URL=postgresql://user:password@db:5432/myapp
- REDIS_URL=redis://redis:6379
depends_on:
- db
- redis
networks:
- app-network
restart: unless-stopped
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
interval: 30s
timeout: 10s
retries: 3
db:
image: postgres:15-alpine
environment:
- POSTGRES_DB=myapp
- POSTGRES_USER=user
- POSTGRES_PASSWORD=password
volumes:
- postgres_data:/var/lib/postgresql/data
- ./init.sql:/docker-entrypoint-initdb.d/init.sql
networks:
- app-network
restart: unless-stopped
healthcheck:
test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
interval: 30s
timeout: 5s
retries: 3
redis:
image: redis:7-alpine
command: redis-server --appendonly yes
volumes:
- redis_data:/data
networks:
- app-network
restart: unless-stopped
healthcheck:
test: ["CMD", "redis-cli", "ping"]
interval: 30s
timeout: 5s
retries: 3
volumes:
postgres_data:
redis_data:
networks:
app-network:
driver: bridge
EOF
# 4.2.6 .dockerignore
cat > .dockerignore << 'EOF'
# Git
.git
.gitignore
# Documentation
README.md
CHANGELOG.md
docs/
# Dependencies
node_modules/
vendor/
__pycache__/
*.pyc
target/
# Build artifacts
dist/
build/
*.log
# IDE
.vscode/
.idea/
*.swp
*.swo
# OS
.DS_Store
Thumbs.db
# Environment
.env
.env.local
.env.*.local
# Testing
coverage/
.nyc_output/
test-results/
# Terraform
*.tfstate
*.tfstate.*
.terraform/
# Docker
Dockerfile*
docker-compose*
EOF
🔒 4.3 Container Security Scanning Setup
# 4.3.1 Install Trivy (vulnerability scanner)
# Ubuntu/Debian
wget -qO - https://aquasecurity.github.io/trivy-repo/deb/public.key | sudo gpg --dearmor -o /usr/share/keyrings/trivy.gpg
echo "deb [signed-by=/usr/share/keyrings/trivy.gpg] https://aquasecurity.github.io/trivy-repo/deb $(lsb_release -sc) main" | sudo tee /etc/apt/sources.list.d/trivy.list
sudo apt-get update && sudo apt-get install trivy
# macOS
brew install trivy
# 4.3.2 Install Hadolint (Dockerfile linter)
# Ubuntu/Debian
wget -O hadolint https://github.com/hadolint/hadolint/releases/download/v2.12.0/hadolint-Linux-x86_64
chmod +x hadolint
sudo mv hadolint /usr/local/bin/
# macOS
brew install hadolint
# 4.3.3 Container security scanning script
cat > ~/devops-infrastructure/scripts/container-security-scan.sh << 'EOF'
#!/bin/bash
# Container Security Scanning Script
set -e
IMAGE_NAME=$1
if [ -z "$IMAGE_NAME" ]; then
echo "Usage: $0 <image-name>"
exit 1
fi
echo "🔍 Starting security scan for $IMAGE_NAME..."
# 1. Dockerfile linting
echo "📋 Running Dockerfile lint..."
if [ -f "Dockerfile" ]; then
hadolint Dockerfile || echo "⚠️ Dockerfile linting issues found"
else
echo "❌ Dockerfile not found"
fi
# 2. Image vulnerability scanning
echo "🛡️ Running vulnerability scan..."
trivy image --exit-code 1 --severity HIGH,CRITICAL $IMAGE_NAME
# 3. Configuration scanning
echo "⚙️ Running configuration scan..."
trivy config --exit-code 1 .
# 4. Secret scanning
echo "🔐 Running secret scan..."
trivy fs --exit-code 1 --scanners secret .
echo "✅ Security scan completed for $IMAGE_NAME"
EOF
chmod +x ~/devops-infrastructure/scripts/container-security-scan.sh
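# Example usage, against the test image pushed in section 4.1:
~/devops-infrastructure/scripts/container-security-scan.sh ghcr.io/yourusername/hello-world:latest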
# 4.3.4 Security scanning via pre-commit hooks
cat > ~/devops-infrastructure/.pre-commit-config.yaml << 'EOF'
repos:
- repo: https://github.com/hadolint/hadolint
rev: v2.12.0
hooks:
- id: hadolint-docker
args: [--config, .hadolint.yaml]
  # Run Trivy as a local hook (the aquasecurity/trivy repo does not ship pre-commit hooks)
  - repo: local
    hooks:
      - id: trivy-fs
        name: trivy filesystem scan
        entry: trivy fs --exit-code 1 --severity HIGH,CRITICAL .
        language: system
        pass_filenames: false
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v4.4.0
hooks:
- id: trailing-whitespace
- id: end-of-file-fixer
- id: check-yaml
- id: check-added-large-files
- id: check-merge-conflict
EOF
# Hadolint config
cat > ~/devops-infrastructure/.hadolint.yaml << 'EOF'
ignored:
- DL3008 # Pin versions in apt get install
- DL3009 # Delete the apt-get lists after installing something
- DL3015 # Avoid additional packages by specifying --no-install-recommends
trustedRegistries:
- docker.io
- ghcr.io
- quay.io
EOF
🔄 PHASE 4: CI/CD PIPELINE SETUP (Days 8-10)
🛠️ 5.1 Jenkins on Kubernetes Setup
# 5.1.1 Create the Jenkins namespace and RBAC
cd ~/devops-infrastructure/kubernetes/base
mkdir -p jenkins
cat > jenkins/namespace.yaml << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
name: jenkins
labels:
name: jenkins
EOF
cat > jenkins/serviceaccount.yaml << 'EOF'
apiVersion: v1
kind: ServiceAccount
metadata:
name: jenkins
namespace: jenkins
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: jenkins
rules:
- apiGroups: [""]
resources: ["pods"]
verbs: ["create","delete","get","list","patch","update","watch"]
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create","delete","get","list","patch","update","watch"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get","list","watch"]
- apiGroups: [""]
resources: ["secrets"]
verbs: ["get"]
- apiGroups: [""]
resources: ["events"]
verbs: ["get","list","watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: jenkins
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: jenkins
subjects:
- kind: ServiceAccount
name: jenkins
namespace: jenkins
EOF
cat > jenkins/pvc.yaml << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: jenkins-pvc
namespace: jenkins
spec:
accessModes:
- ReadWriteOnce
storageClassName: gp3
resources:
requests:
storage: 10Gi
EOF
cat > jenkins/deployment.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: jenkins
namespace: jenkins
spec:
replicas: 1
selector:
matchLabels:
app: jenkins
template:
metadata:
labels:
app: jenkins
spec:
serviceAccountName: jenkins
containers:
- name: jenkins
image: jenkins/jenkins:2.414.1-lts-jdk11
ports:
- containerPort: 8080
- containerPort: 50000
env:
- name: JAVA_OPTS
value: "-Xmx2048m -Dhudson.slaves.NodeProvisioner.MARGIN=50 -Dhudson.slaves.NodeProvisioner.MARGIN0=0.85"
- name: JENKINS_OPTS
value: "--httpPort=8080"
volumeMounts:
- name: jenkins-home
mountPath: /var/jenkins_home
- name: docker-sock
mountPath: /var/run/docker.sock
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
livenessProbe:
httpGet:
path: /login
port: 8080
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 5
readinessProbe:
httpGet:
path: /login
port: 8080
initialDelaySeconds: 60
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
volumes:
- name: jenkins-home
persistentVolumeClaim:
claimName: jenkins-pvc
- name: docker-sock
hostPath:
path: /var/run/docker.sock
securityContext:
fsGroup: 1000
runAsUser: 1000
EOF
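# NOTE: mounting /var/run/docker.sock assumes a Docker daemon on the node; EKS 1.24+
# nodes run containerd, so in-cluster image builds typically need kaniko or BuildKit instead.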
cat > jenkins/service.yaml << 'EOF'
apiVersion: v1
kind: Service
metadata:
name: jenkins
namespace: jenkins
spec:
ports:
- name: http
port: 8080
targetPort: 8080
- name: jnlp
port: 50000
targetPort: 50000
selector:
app: jenkins
type: ClusterIP
EOF
cat > jenkins/ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: jenkins
namespace: jenkins
annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/proxy-request-buffering: "off"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
tls:
- hosts:
- jenkins.yourdomain.com
secretName: jenkins-tls
rules:
- host: jenkins.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: jenkins
port:
number: 8080
EOF
# 5.1.2 Deploy Jenkins (note: jenkins-pvc uses the gp3 StorageClass defined in section 6.2; apply storage-classes.yaml first)
kubectl apply -f jenkins/
kubectl get pods -n jenkins
kubectl logs -f deployment/jenkins -n jenkins
🌐 5.2 NGINX Ingress Controller Setup
# 5.2.1 Install the NGINX Ingress Controller
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install ingress-nginx ingress-nginx/ingress-nginx \
--namespace ingress-nginx \
--create-namespace \
--set controller.service.type=LoadBalancer \
--set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-type"="nlb" \
--set controller.service.annotations."service\.beta\.kubernetes\.io/aws-load-balancer-cross-zone-load-balancing-enabled"="true"
# 5.2.2 Check the ingress controller status
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx
# 5.2.3 Get the external LoadBalancer hostname (NLBs expose a DNS name, not an IP)
EXTERNAL_IP=$(kubectl get svc ingress-nginx-controller -n ingress-nginx -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "External LoadBalancer: $EXTERNAL_IP"
🔐 5.3 Cert-Manager Setup (SSL/TLS)
# 5.3.1 Install cert-manager
helm repo add jetstack https://charts.jetstack.io
helm repo update
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.13.0/cert-manager.crds.yaml
helm install cert-manager jetstack/cert-manager \
--namespace cert-manager \
--create-namespace \
--version v1.13.0
# 5.3.2 Let's Encrypt ClusterIssuer
cat > ~/devops-infrastructure/kubernetes/base/cert-manager-issuer.yaml << 'EOF'
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-staging
spec:
acme:
server: https://acme-staging-v02.api.letsencrypt.org/directory
email: admin@yourdomain.com
privateKeySecretRef:
name: letsencrypt-staging
solvers:
- http01:
ingress:
class: nginx
---
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
name: letsencrypt-prod
spec:
acme:
server: https://acme-v02.api.letsencrypt.org/directory
email: admin@yourdomain.com
privateKeySecretRef:
name: letsencrypt-prod
solvers:
- http01:
ingress:
class: nginx
EOF
kubectl apply -f ~/devops-infrastructure/kubernetes/base/cert-manager-issuer.yaml
# 5.3.3 Check cert-manager status
kubectl get pods -n cert-manager
kubectl get clusterissuers
🔧 5.4 Jenkins Initial Setup
# 5.4.1 Get the Jenkins admin password
kubectl exec -n jenkins -it deployment/jenkins -- cat /var/jenkins_home/secrets/initialAdminPassword
# 5.4.2 Access the Jenkins UI (via port-forward)
kubectl port-forward -n jenkins svc/jenkins 8080:8080
# 5.4.3 Jenkins initial setup (in the browser)
# http://localhost:8080
# - Enter the initial password
# - Install the suggested plugins
# - Create an admin user
# - Set the Jenkins URL
# 5.4.4 Install essential Jenkins plugins (via the browser)
# Manage Jenkins -> Manage Plugins -> Available
# - Blue Ocean
# - Pipeline
# - Git Pipeline for Blue Ocean
# - Docker Pipeline
# - Kubernetes CLI
# - GitHub Integration
# - Slack Notification
# - Build Timestamp
# - AnsiColor
# - Workspace Cleanup
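# 5.4.5 (optional) Plugins can also be installed non-interactively by baking them into a
# custom image with jenkins-plugin-cli, which ships in the official image; a sketch,
# assuming the usual short plugin IDs:
mkdir -p ~/devops-infrastructure/docker/jenkins
cat > ~/devops-infrastructure/docker/jenkins/Dockerfile << 'EOF'
FROM jenkins/jenkins:2.414.1-lts-jdk11
RUN jenkins-plugin-cli --plugins \
    blueocean workflow-aggregator docker-workflow kubernetes-cli github slack ansicolor ws-cleanup
EOF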
📝 5.5 Jenkins Pipeline as Code
# 5.5.1 Create a shared pipeline library
mkdir -p ~/devops-infrastructure/jenkins/shared-library/{vars,src,resources}
cat > ~/devops-infrastructure/jenkins/shared-library/vars/buildAndPush.groovy << 'EOF'
def call(Map config) {
pipeline {
agent {
kubernetes {
yaml """
apiVersion: v1
kind: Pod
spec:
containers:
- name: docker
image: docker:latest
command:
- cat
tty: true
volumeMounts:
- mountPath: /var/run/docker.sock
name: docker-sock
- name: kubectl
image: bitnami/kubectl:latest
command:
- cat
tty: true
- name: helm
image: alpine/helm:latest
command:
- cat
tty: true
volumes:
- name: docker-sock
hostPath:
path: /var/run/docker.sock
"""
}
}
environment {
DOCKER_REGISTRY = 'ghcr.io'
IMAGE_NAME = "${config.imageName}"
GIT_COMMIT_SHORT = sh(script: "git rev-parse --short HEAD", returnStdout: true).trim()
BUILD_VERSION = "${env.BUILD_NUMBER}-${GIT_COMMIT_SHORT}"
}
stages {
stage('Checkout') {
steps {
checkout scm
}
}
stage('Build Info') {
steps {
script {
currentBuild.displayName = "#${env.BUILD_NUMBER} - ${BUILD_VERSION}"
currentBuild.description = "Branch: ${env.BRANCH_NAME}"
}
}
}
stage('Lint Dockerfile') {
steps {
container('docker') {
sh '''
echo "🔍 Linting Dockerfile..."
# Dockerfile linting would go here
'''
}
}
}
stage('Build Docker Image') {
steps {
container('docker') {
script {
def image = docker.build("${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_VERSION}")
docker.withRegistry("https://${DOCKER_REGISTRY}", 'github-registry-credentials') {
image.push()
image.push("latest")
}
}
}
}
}
stage('Security Scan') {
steps {
container('docker') {
sh '''
echo "🛡️ Running security scan..."
# Trivy scanning would go here
'''
}
}
}
stage('Deploy to Dev') {
when {
branch 'develop'
}
steps {
container('kubectl') {
sh '''
echo "🚀 Deploying to development..."
kubectl set image deployment/${IMAGE_NAME} ${IMAGE_NAME}=${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_VERSION} -n dev
kubectl rollout status deployment/${IMAGE_NAME} -n dev
'''
}
}
}
stage('Deploy to Staging') {
when {
branch 'main'
}
steps {
container('kubectl') {
sh '''
echo "🚀 Deploying to staging..."
kubectl set image deployment/${IMAGE_NAME} ${IMAGE_NAME}=${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_VERSION} -n staging
kubectl rollout status deployment/${IMAGE_NAME} -n staging
'''
}
}
}
stage('Deploy to Production') {
when {
buildingTag()
}
steps {
script {
timeout(time: 5, unit: 'MINUTES') {
input message: 'Deploy to production?', ok: 'Deploy'
}
}
container('kubectl') {
sh '''
echo "🚀 Deploying to production..."
kubectl set image deployment/${IMAGE_NAME} ${IMAGE_NAME}=${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_VERSION} -n production
kubectl rollout status deployment/${IMAGE_NAME} -n production
'''
}
}
}
}
post {
success {
slackSend(
channel: '#deployments',
color: 'good',
message: "✅ ${IMAGE_NAME} v${BUILD_VERSION} deployed successfully to ${env.BRANCH_NAME}"
)
}
failure {
slackSend(
channel: '#deployments',
color: 'danger',
message: "❌ ${IMAGE_NAME} v${BUILD_VERSION} deployment failed on ${env.BRANCH_NAME}"
)
}
}
}
}
EOF
# 5.5.2 Sample application Jenkinsfile
cat > ~/devops-infrastructure/jenkins/sample-Jenkinsfile << 'EOF'
@Library('shared-library') _
buildAndPush([
imageName: 'mycompany/sample-app'
])
EOF
🔐 5.6 Jenkins Credentials Setup
# 5.6.1 Create the GitHub credentials secret
kubectl create secret generic github-registry-credentials \
--from-literal=username=YOUR_GITHUB_USERNAME \
--from-literal=password=YOUR_GITHUB_TOKEN \
--namespace=jenkins
# 5.6.2 Create the AWS credentials secret
kubectl create secret generic aws-credentials \
--from-literal=access-key-id=YOUR_AWS_ACCESS_KEY \
--from-literal=secret-access-key=YOUR_AWS_SECRET_KEY \
--namespace=jenkins
# 5.6.3 Add credentials in Jenkins (via the browser)
# Manage Jenkins -> Manage Credentials -> Global -> Add Credentials
# - GitHub Token: Kind=Username with password, ID=github-registry-credentials
# - AWS Credentials: Kind=AWS Credentials, ID=aws-credentials
# - Kubeconfig: Kind=Secret file, ID=kubeconfig
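# Alternatively, with the "Configuration as Code" (JCasC) plugin, these credentials can
# be declared in YAML instead of clicking through the UI (a sketch; the ID matches the
# one used by the shared pipeline library):
cat > ~/devops-infrastructure/jenkins/casc-credentials.yaml << 'EOF'
credentials:
  system:
    domainCredentials:
      - credentials:
          - usernamePassword:
              scope: GLOBAL
              id: github-registry-credentials
              username: YOUR_GITHUB_USERNAME
              password: YOUR_GITHUB_TOKEN
EOF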
☸️ PHASE 5: KUBERNETES ADVANCED SETUP (Days 11-13)
🏷️ 6.1 Namespace and RBAC Setup
# 6.1.1 Create the environment namespaces
cd ~/devops-infrastructure/kubernetes/base
cat > namespaces.yaml << 'EOF'
apiVersion: v1
kind: Namespace
metadata:
name: dev
labels:
environment: dev
istio-injection: enabled
---
apiVersion: v1
kind: Namespace
metadata:
name: staging
labels:
environment: staging
istio-injection: enabled
---
apiVersion: v1
kind: Namespace
metadata:
name: production
labels:
environment: production
istio-injection: enabled
---
apiVersion: v1
kind: Namespace
metadata:
name: monitoring
labels:
environment: monitoring
istio-injection: disabled
---
apiVersion: v1
kind: Namespace
metadata:
name: logging
labels:
environment: logging
istio-injection: disabled
EOF
kubectl apply -f namespaces.yaml
# 6.1.2 RBAC setup
cat > rbac.yaml << 'EOF'
# Developer Role - dev namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: dev
name: developer
rules:
- apiGroups: [""]
resources: ["pods", "services", "configmaps", "secrets"]
verbs: ["get", "list", "create", "update", "patch", "delete"]
- apiGroups: ["apps"]
resources: ["deployments", "replicasets"]
verbs: ["get", "list", "create", "update", "patch", "delete"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get", "list"]
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create"]
---
# Staging Role - staging namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: staging
name: staging-deployer
rules:
- apiGroups: [""]
resources: ["pods", "services", "configmaps"]
verbs: ["get", "list"]
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["get", "list", "update", "patch"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get", "list"]
---
# Production Role - production namespace (read-only + deploy)
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: production
name: production-deployer
rules:
- apiGroups: [""]
resources: ["pods", "services", "configmaps"]
verbs: ["get", "list"]
- apiGroups: ["apps"]
resources: ["deployments"]
verbs: ["get", "list", "update", "patch"]
- apiGroups: [""]
resources: ["pods/log"]
verbs: ["get", "list"]
---
# ServiceAccount for developers
apiVersion: v1
kind: ServiceAccount
metadata:
name: developer
namespace: dev
---
# ServiceAccount for staging
apiVersion: v1
kind: ServiceAccount
metadata:
name: staging-deployer
namespace: staging
---
# ServiceAccount for production
apiVersion: v1
kind: ServiceAccount
metadata:
name: production-deployer
namespace: production
---
# RoleBinding for developers
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: developer-binding
namespace: dev
subjects:
- kind: ServiceAccount
name: developer
namespace: dev
roleRef:
kind: Role
name: developer
apiGroup: rbac.authorization.k8s.io
---
# RoleBinding for staging
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: staging-deployer-binding
namespace: staging
subjects:
- kind: ServiceAccount
name: staging-deployer
namespace: staging
roleRef:
kind: Role
name: staging-deployer
apiGroup: rbac.authorization.k8s.io
---
# RoleBinding for production
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: production-deployer-binding
namespace: production
subjects:
- kind: ServiceAccount
name: production-deployer
namespace: production
roleRef:
kind: Role
name: production-deployer
apiGroup: rbac.authorization.k8s.io
EOF
kubectl apply -f rbac.yaml
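# Spot-check the bindings with kubectl auth can-i:
kubectl auth can-i create deployments -n dev --as system:serviceaccount:dev:developer          # expect: yes
kubectl auth can-i delete deployments -n production --as system:serviceaccount:production:production-deployer  # expect: no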
📦 6.2 StorageClass and Persistent Volumes
# 6.2.1 StorageClass definitions
cat > storage-classes.yaml << 'EOF'
# GP3 StorageClass (default)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: gp3
annotations:
storageclass.kubernetes.io/is-default-class: "true"
provisioner: ebs.csi.aws.com
parameters:
type: gp3
fsType: ext4
encrypted: "true"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# GP3 Fast StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: gp3-fast
provisioner: ebs.csi.aws.com
parameters:
type: gp3
fsType: ext4
encrypted: "true"
iops: "4000"
throughput: "250"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
---
# IO1 StorageClass (high performance)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: io1
provisioner: ebs.csi.aws.com
parameters:
type: io1
fsType: ext4
encrypted: "true"
iops: "1000"
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
EOF
kubectl apply -f storage-classes.yaml
kubectl get storageclass
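Optionally, verify dynamic provisioning with a throwaway PVC (the gp3-test name is arbitrary):
cat > test-pvc.yaml << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gp3-test
  namespace: dev
spec:
  accessModes: ["ReadWriteOnce"]
  storageClassName: gp3
  resources:
    requests:
      storage: 1Gi
EOF
kubectl apply -f test-pvc.yaml
# With WaitForFirstConsumer the PVC stays Pending until a pod mounts it
kubectl get pvc gp3-test -n dev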
🔧 6.3 Horizontal Pod Autoscaler (HPA) Setup#
# 6.3.1 Metrics Server installation
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
# Metrics Server fix (for EKS, where kubelet certs are not signed by the cluster CA)
kubectl patch deployment metrics-server -n kube-system --type='json' -p='[
{
"op": "add",
"path": "/spec/template/spec/containers/0/args/-",
"value": "--kubelet-insecure-tls"
}
]'
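Verify the metrics pipeline before relying on the HPA (readings can take a minute or two to appear):
kubectl -n kube-system rollout status deployment/metrics-server
kubectl top nodes
kubectl top pods -n kube-system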
# 6.3.2 HPA template
cat > hpa-template.yaml << 'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: sample-app-hpa
namespace: dev
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: sample-app
minReplicas: 2
maxReplicas: 10
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 50
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 60
policies:
- type: Percent
value: 100
periodSeconds: 30
EOF
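To exercise the HPA, apply it and generate some traffic; this sketch assumes a sample-app Deployment and Service already exist in dev:
kubectl apply -f hpa-template.yaml
kubectl get hpa -n dev -w
# In a second terminal, a crude load generator:
kubectl run load-generator -n dev --rm -i --tty --image=busybox -- \
/bin/sh -c "while true; do wget -q -O- http://sample-app; done"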
🔄 6.4 Cluster Autoscaler Setup#
# 6.4.1 Cluster Autoscaler installation
cat > cluster-autoscaler.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: cluster-autoscaler
namespace: kube-system
labels:
app: cluster-autoscaler
spec:
selector:
matchLabels:
app: cluster-autoscaler
template:
metadata:
labels:
app: cluster-autoscaler
annotations:
prometheus.io/scrape: 'true'
prometheus.io/port: '8085'
spec:
serviceAccountName: cluster-autoscaler
containers:
- image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.28.0 # k8s.gcr.io is frozen; registry.k8s.io is the current registry
name: cluster-autoscaler
resources:
limits:
cpu: 100m
memory: 300Mi
requests:
cpu: 100m
memory: 300Mi
command:
- ./cluster-autoscaler
- --v=4
- --stderrthreshold=info
- --cloud-provider=aws
- --skip-nodes-with-local-storage=false
- --expander=least-waste
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/CLUSTER_NAME
env:
- name: AWS_REGION
value: eu-west-1
volumeMounts:
- name: ssl-certs
mountPath: /etc/ssl/certs/ca-certificates.crt
readOnly: true
imagePullPolicy: "Always"
volumes:
- name: ssl-certs
hostPath:
path: "/etc/ssl/certs/ca-bundle.crt"
---
apiVersion: v1
kind: ServiceAccount
metadata:
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
name: cluster-autoscaler
namespace: kube-system
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/cluster-autoscaler
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: cluster-autoscaler
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
rules:
- apiGroups: [""]
resources: ["events", "endpoints"]
verbs: ["create", "patch"]
- apiGroups: [""]
resources: ["pods/eviction"]
verbs: ["create"]
- apiGroups: [""]
resources: ["pods/status"]
verbs: ["update"]
- apiGroups: [""]
resources: ["endpoints"]
resourceNames: ["cluster-autoscaler"]
verbs: ["get", "update"]
- apiGroups: [""]
resources: ["nodes"]
verbs: ["watch", "list", "get", "update"]
- apiGroups: [""]
resources: ["pods", "services", "replicationcontrollers", "persistentvolumeclaims", "persistentvolumes"]
verbs: ["watch", "list", "get"]
- apiGroups: ["extensions"]
resources: ["replicasets", "daemonsets"]
verbs: ["watch", "list", "get"]
- apiGroups: ["policy"]
resources: ["poddisruptionbudgets"]
verbs: ["watch", "list"]
- apiGroups: ["apps"]
resources: ["statefulsets", "replicasets", "daemonsets"]
verbs: ["watch", "list", "get"]
- apiGroups: ["storage.k8s.io"]
resources: ["storageclasses", "csinodes", "csidrivers", "csistoragecapacities"]
verbs: ["watch", "list", "get"]
- apiGroups: ["batch", "extensions"]
resources: ["jobs"]
verbs: ["get", "list", "watch", "patch"]
- apiGroups: ["coordination.k8s.io"]
resources: ["leases"]
verbs: ["create"]
- apiGroups: ["coordination.k8s.io"]
resourceNames: ["cluster-autoscaler"]
resources: ["leases"]
verbs: ["get", "update"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: cluster-autoscaler
labels:
k8s-addon: cluster-autoscaler.addons.k8s.io
k8s-app: cluster-autoscaler
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: cluster-autoscaler
subjects:
- kind: ServiceAccount
name: cluster-autoscaler
namespace: kube-system
EOF
# Replace CLUSTER_NAME with the real cluster name
sed -i 's/CLUSTER_NAME/mycompany-dev-eks/g' cluster-autoscaler.yaml
kubectl apply -f cluster-autoscaler.yaml
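Check that the autoscaler started and discovered the node groups:
kubectl -n kube-system rollout status deployment/cluster-autoscaler
kubectl -n kube-system logs deployment/cluster-autoscaler --tail=20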
🔒 6.5 Network Policies#
# 6.5.1 Calico installation (for Network Policies; note that on EKS, Calico is typically run in policy-only mode alongside the VPC CNI)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml
# Calico configuration
cat > calico-config.yaml << 'EOF'
apiVersion: operator.tigera.io/v1
kind: Installation
metadata:
name: default
spec:
calicoNetwork:
ipPools:
- blockSize: 26
cidr: 192.168.0.0/16
encapsulation: VXLANCrossSubnet
natOutgoing: Enabled
nodeSelector: all()
EOF
kubectl apply -f calico-config.yaml
# 6.5.2 Network Policy templates
cat > network-policies.yaml << 'EOF'
# Default deny all ingress traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: default-deny-ingress
namespace: dev
spec:
podSelector: {}
policyTypes:
- Ingress
---
# Allow ingress from same namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-same-namespace
namespace: dev
spec:
podSelector: {}
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: dev
---
# Allow ingress from ingress-nginx
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-ingress-nginx
namespace: dev
spec:
podSelector:
matchLabels:
app: frontend
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
ports:
- protocol: TCP
port: 8080
---
# Allow database access only from backend
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: database-access
namespace: dev
spec:
podSelector:
matchLabels:
app: database
policyTypes:
- Ingress
ingress:
- from:
- podSelector:
matchLabels:
app: backend
ports:
- protocol: TCP
port: 5432
---
# Allow monitoring namespace access
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: allow-monitoring
namespace: dev
spec:
podSelector: {}
policyTypes:
- Ingress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: monitoring
ports:
- protocol: TCP
port: 8080
- protocol: TCP
port: 9090
EOF
kubectl apply -f network-policies.yaml
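A rough check that the default-deny policy works, probing from a namespace the policies do not allow (assumes a sample-app Service exists in dev):
kubectl run np-test --rm -i --tty --image=busybox -n default -- \
wget -q -T 5 -O- http://sample-app.dev.svc.cluster.local || echo "blocked, as expected"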
📊 PHASE 6: OBSERVABILITY STACK (Days 14-16)#
📈 7.1 Prometheus & Grafana Setup#
# 7.1.1 kube-prometheus-stack installation
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Create a custom values.yaml
cat > monitoring-values.yaml << 'EOF'
prometheus:
prometheusSpec:
storageSpec:
volumeClaimTemplate:
spec:
storageClassName: gp3
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 20Gi
retention: 15d
resources:
requests:
memory: 2Gi
cpu: 1000m
limits:
memory: 4Gi
cpu: 2000m
additionalScrapeConfigs: |
- job_name: 'kubernetes-pods'
kubernetes_sd_configs:
- role: pod
relabel_configs:
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
action: replace
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
target_label: __address__
alertmanager:
alertmanagerSpec:
storage:
volumeClaimTemplate:
spec:
storageClassName: gp3
accessModes: ["ReadWriteOnce"]
resources:
requests:
storage: 5Gi
resources:
requests:
memory: 256Mi
cpu: 100m
limits:
memory: 512Mi
cpu: 200m
config:
global:
slack_api_url: 'YOUR_SLACK_WEBHOOK_URL'
route:
group_by: ['alertname']
group_wait: 10s
group_interval: 10s
repeat_interval: 1h
receiver: 'web.hook'
routes:
- match:
alertname: DeadMansSwitch
receiver: 'null'
- match_re:
severity: critical|warning
receiver: 'slack-notifications'
receivers:
- name: 'null'
- name: 'web.hook'
webhook_configs:
- url: 'http://127.0.0.1:5001/'
- name: 'slack-notifications'
slack_configs:
- channel: '#alerts'
title: 'Cluster Alert - {{ .GroupLabels.alertname }}'
text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
send_resolved: true
grafana:
adminPassword: 'AdminPassword123!' # placeholder; rotate this before exposing Grafana
persistence:
enabled: true
storageClassName: gp3
size: 10Gi
resources:
requests:
memory: 256Mi
cpu: 100m
limits:
memory: 512Mi
cpu: 200m
dashboardProviders:
dashboardproviders.yaml:
apiVersion: 1
providers:
- name: 'default'
orgId: 1
folder: ''
type: file
disableDeletion: false
editable: true
options:
path: /var/lib/grafana/dashboards/default
dashboards:
default:
kubernetes-cluster-overview:
gnetId: 7249
revision: 1
datasource: Prometheus
kubernetes-pod-overview:
gnetId: 6417
revision: 1
datasource: Prometheus
nginx-ingress-controller:
gnetId: 9614
revision: 1
datasource: Prometheus
node-exporter:
gnetId: 1860
revision: 31
datasource: Prometheus
nodeExporter:
enabled: true
kubeStateMetrics:
enabled: true
defaultRules:
create: true
rules:
alertmanager: true
etcd: true
configReloaders: true
general: true
k8s: true
kubeApiserverAvailability: true
kubeApiserverBurnrate: true
kubeApiserverHistogram: true
kubeApiserverSlos: true
kubelet: true
kubeProxy: true
kubePrometheusGeneral: true
kubePrometheusNodeRecording: true
kubernetesApps: true
kubernetesResources: true
kubernetesStorage: true
kubernetesSystem: true
network: true
node: true
nodeExporterAlerting: true
nodeExporterRecording: true
prometheus: true
prometheusOperator: true
EOF
# Create the monitoring namespace and install kube-prometheus-stack
kubectl create namespace monitoring
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring \
--values monitoring-values.yaml
# 7.1.2 Check monitoring status
kubectl get pods -n monitoring
kubectl get svc -n monitoring
# 7.1.3 Grafana ingress
cat > grafana-ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: grafana
namespace: monitoring
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/ssl-redirect: "true"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
tls:
- hosts:
- grafana.yourdomain.com
secretName: grafana-tls
rules:
- host: grafana.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: kube-prometheus-stack-grafana
port:
number: 80
EOF
kubectl apply -f grafana-ingress.yaml
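If DNS/TLS is not ready yet, Grafana can also be reached with a port-forward (the password is the adminPassword from monitoring-values.yaml):
kubectl port-forward svc/kube-prometheus-stack-grafana 3000:80 -n monitoring
# then open http://localhost:3000 and log in as admin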
📝 7.2 Centralized Logging Setup#
# 7.2.1 OpenSearch (Elasticsearch alternative) installation
helm repo add opensearch https://opensearch-project.github.io/helm-charts/
helm repo update
cat > opensearch-values.yaml << 'EOF'
clusterName: "opensearch-cluster"
nodeGroup: "master"
roles:
- master
- ingest
- data
replicas: 3
opensearchJavaOpts: "-Xmx1g -Xms1g"
resources:
requests:
cpu: "500m"
memory: "1Gi"
limits:
cpu: "1000m"
memory: "2Gi"
persistence:
enabled: true
size: 30Gi
storageClass: gp3
config:
opensearch.yml: |
cluster.name: opensearch-cluster
network.host: 0.0.0.0
plugins:
security:
ssl:
transport:
pemcert_filepath: esnode.pem
pemkey_filepath: esnode-key.pem
pemtrustedcas_filepath: root-ca.pem
enforce_hostname_verification: false
http:
enabled: false
allow_unsafe_democertificates: true
allow_default_init_securityindex: true
authcz:
admin_dn:
- CN=kirk,OU=client,O=client,L=test,C=de
audit.type: internal_opensearch
enable_snapshot_restore_privilege: true
check_snapshot_restore_write_privileges: true
restapi:
roles_enabled: ["all_access", "security_rest_api_access"]
system_indices:
enabled: true
indices:
[
".opendistro-alerting-config",
".opendistro-alerting-alert*",
".opendistro-anomaly-results*",
".opendistro-anomaly-detector*",
".opendistro-anomaly-checkpoints",
".opendistro-anomaly-detection-state",
".opendistro-reports-*",
".opendistro-notifications-*",
".opendistro-notebooks",
".opendistro-asynchronous-search-response*",
]
EOF
kubectl create namespace logging
helm install opensearch opensearch/opensearch \
--namespace logging \
--values opensearch-values.yaml
# 7.2.2 OpenSearch Dashboards installation
cat > opensearch-dashboards-values.yaml << 'EOF'
replicaCount: 1
opensearchHosts: "https://opensearch-cluster-master:9200"
resources:
requests:
cpu: "250m"
memory: "512Mi"
limits:
cpu: "500m"
memory: "1Gi"
config:
opensearch_dashboards.yml: |
server.name: opensearch-dashboards
server.host: 0.0.0.0
opensearch.hosts: [https://opensearch-cluster-master:9200]
opensearch.ssl.verificationMode: none
opensearch.username: admin
opensearch.password: admin
opensearch.requestHeadersAllowlist: [authorization, securitytenant]
opensearch_security.multitenancy.enabled: true
opensearch_security.multitenancy.tenants.preferred: [Private, Global]
opensearch_security.readonly_mode.roles: [kibana_read_only]
opensearch_security.cookie.secure: false
EOF
helm install opensearch-dashboards opensearch/opensearch-dashboards \
--namespace logging \
--values opensearch-dashboards-values.yaml
# 7.2.3 Fluent Bit installation
cat > fluent-bit-values.yaml << 'EOF'
daemonSetVolumes:
- name: varlog
hostPath:
path: /var/log
- name: varlibdockercontainers
hostPath:
path: /var/lib/docker/containers
- name: etcmachineid
hostPath:
path: /etc/machine-id
type: File
daemonSetVolumeMounts:
- name: varlog
mountPath: /var/log
readOnly: true
- name: varlibdockercontainers
mountPath: /var/lib/docker/containers
readOnly: true
- name: etcmachineid
mountPath: /etc/machine-id
readOnly: true
config:
service: |
[SERVICE]
Daemon Off
Flush {{ .Values.flush }}
Log_Level {{ .Values.logLevel }}
Parsers_File parsers.conf
Parsers_File custom_parsers.conf
HTTP_Server On
HTTP_Listen 0.0.0.0
HTTP_Port {{ .Values.metricsPort }}
Health_Check On
inputs: |
[INPUT]
Name tail
Path /var/log/containers/*.log
multiline.parser docker, cri
Tag kube.*
Mem_Buf_Limit 50MB
Skip_Long_Lines On
[INPUT]
Name systemd
Tag host.*
Systemd_Filter _SYSTEMD_UNIT=kubelet.service
Read_From_Tail On
filters: |
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
Annotations Off
Labels On
[FILTER]
Name nest
Match kube.*
Operation lift
Nested_under kubernetes
Add_prefix kubernetes_
[FILTER]
Name modify
Match kube.*
Remove kubernetes_pod_id
Remove kubernetes_docker_id
Remove kubernetes_container_hash
outputs: |
[OUTPUT]
Name opensearch
Match kube.*
Host opensearch-cluster-master.logging.svc.cluster.local
Port 9200
Index fluentbit
Type _doc
HTTP_User admin
HTTP_Passwd admin
tls On
tls.verify Off
Suppress_Type_Name On
Replace_Dots On
[OUTPUT]
Name opensearch
Match host.*
Host opensearch-cluster-master.logging.svc.cluster.local
Port 9200
Index fluentbit-systemd
Type _doc
HTTP_User admin
HTTP_Passwd admin
tls On
tls.verify Off
Suppress_Type_Name On
Replace_Dots On
EOF
helm repo add fluent https://fluent.github.io/helm-charts
helm install fluent-bit fluent/fluent-bit \
--namespace logging \
--values fluent-bit-values.yaml
# 7.2.4 OpenSearch Dashboards ingress
cat > opensearch-dashboards-ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: opensearch-dashboards
namespace: logging
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/backend-protocol: "HTTP"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
tls:
- hosts:
- logs.yourdomain.com
secretName: opensearch-dashboards-tls
rules:
- host: logs.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: opensearch-dashboards
port:
number: 5601
EOF
kubectl apply -f opensearch-dashboards-ingress.yaml
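A quick cluster-health probe against OpenSearch itself (admin/admin are the demo credentials used above; change them for anything real):
kubectl port-forward svc/opensearch-cluster-master 9200:9200 -n logging &
curl -sk -u admin:admin https://localhost:9200/_cluster/health | jq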
🔍 7.3 Distributed Tracing with Jaeger#
# 7.3.1 Jaeger Operator installation
kubectl create namespace observability
kubectl apply -f https://github.com/jaegertracing/jaeger-operator/releases/download/v1.47.0/jaeger-operator.yaml -n observability
# 7.3.2 Jaeger instance
cat > jaeger.yaml << 'EOF'
apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
name: jaeger
namespace: observability
spec:
strategy: production
storage:
# The Jaeger Operator uses the "elasticsearch" storage type for OpenSearch-compatible backends
type: elasticsearch
options:
es:
server-urls: https://opensearch-cluster-master.logging.svc.cluster.local:9200
username: admin
password: admin
tls:
skip-host-verify: true
collector:
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 200m
memory: 256Mi
query:
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 200m
memory: 256Mi
ingress:
enabled: true
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/ssl-redirect: "true"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
hosts:
- jaeger.yourdomain.com
tls:
- secretName: jaeger-tls
hosts:
- jaeger.yourdomain.com
EOF
kubectl apply -f jaeger.yaml
# 7.3.3 OpenTelemetry Collector installation
helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update
cat > otel-collector-values.yaml << 'EOF'
mode: daemonset
presets:
logsCollection:
enabled: true
hostMetrics:
enabled: true
kubernetesAttributes:
enabled: true
config:
receivers:
otlp:
protocols:
grpc:
endpoint: 0.0.0.0:4317
http:
endpoint: 0.0.0.0:4318
jaeger:
protocols:
grpc:
endpoint: 0.0.0.0:14250
thrift_http:
endpoint: 0.0.0.0:14268
thrift_compact:
endpoint: 0.0.0.0:6831
zipkin:
endpoint: 0.0.0.0:9411
processors:
batch: {}
memory_limiter:
# check_interval is mandatory for the memory_limiter processor
check_interval: 1s
limit_mib: 400
resource:
attributes:
- key: cluster.name
value: mycompany-dev-eks
action: insert
exporters:
jaeger:
endpoint: jaeger-collector.observability.svc.cluster.local:14250
tls:
insecure: true
prometheus:
endpoint: "0.0.0.0:8889"
const_labels:
cluster: mycompany-dev-eks
service:
pipelines:
traces:
receivers: [otlp, jaeger, zipkin]
processors: [memory_limiter, resource, batch]
exporters: [jaeger]
metrics:
receivers: [otlp]
processors: [memory_limiter, resource, batch]
exporters: [prometheus]
logs:
receivers: [otlp]
processors: [memory_limiter, resource, batch]
# an empty exporter list fails config validation; the chart's default logging exporter is a safe sink
exporters: [logging]
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 256m
memory: 512Mi
EOF
helm install opentelemetry-collector open-telemetry/opentelemetry-collector \
--namespace observability \
--values otel-collector-values.yaml
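Applications can then ship telemetry to the collector on their own node; a common sketch is injecting the host IP as the OTLP endpoint (variable names are illustrative):
env:
- name: NODE_IP
  valueFrom:
    fieldRef:
      fieldPath: status.hostIP
- name: OTEL_EXPORTER_OTLP_ENDPOINT
  value: "http://$(NODE_IP):4318"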
🔒 PHASE 7: SECRETS MANAGEMENT & SECURITY (Days 17-18)#
🔐 8.1 HashiCorp Vault Setup#
# 8.1.1 Vault Helm installation
helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update
cat > vault-values.yaml << 'EOF'
global:
enabled: true
tlsDisable: false
injector:
enabled: true
replicas: 1
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 256Mi
cpu: 250m
server:
image:
repository: "vault"
tag: "1.15.0"
resources:
requests:
memory: 256Mi
cpu: 250m
limits:
memory: 256Mi
cpu: 250m
readinessProbe:
enabled: true
path: "/v1/sys/health?standbyok=true&sealedcode=204&uninitcode=204"
livenessProbe:
enabled: true
path: "/v1/sys/health?standbyok=true"
initialDelaySeconds: 60
extraEnvironmentVars:
VAULT_CACERT: /vault/userconfig/vault-ha-tls/vault.ca
VAULT_TLSCERT: /vault/userconfig/vault-ha-tls/vault.crt
VAULT_TLSKEY: /vault/userconfig/vault-ha-tls/vault.key
extraVolumes:
- type: secret
name: vault-ha-tls
path: /vault/userconfig
standalone:
enabled: false
ha:
enabled: true
replicas: 3
raft:
enabled: true
setNodeId: true
config: |
ui = true
listener "tcp" {
tls_disable = 0
address = "[::]:8200"
cluster_address = "[::]:8201"
tls_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
tls_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
tls_client_ca_file = "/vault/userconfig/vault-ha-tls/vault.ca"
}
storage "raft" {
path = "/vault/data"
retry_join {
leader_api_addr = "https://vault-0.vault-internal:8200"
leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"
leader_client_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
leader_client_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
}
retry_join {
leader_api_addr = "https://vault-1.vault-internal:8200"
leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"
leader_client_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
leader_client_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
}
retry_join {
leader_api_addr = "https://vault-2.vault-internal:8200"
leader_ca_cert_file = "/vault/userconfig/vault-ha-tls/vault.ca"
leader_client_cert_file = "/vault/userconfig/vault-ha-tls/vault.crt"
leader_client_key_file = "/vault/userconfig/vault-ha-tls/vault.key"
}
}
service_registration "kubernetes" {}
service:
enabled: true
type: ClusterIP
port: 8200
targetPort: 8200
dataStorage:
enabled: true
size: 10Gi
storageClass: gp3
auditStorage:
enabled: true
size: 10Gi
storageClass: gp3
ui:
enabled: true
serviceType: ClusterIP
EOF
# 8.1.2 Create the TLS certificates
mkdir -p vault-tls
cd vault-tls
# CA private key
openssl genrsa -out vault-ca.key 2048
# CA certificate
openssl req -new -x509 -key vault-ca.key -out vault-ca.crt -days 365 \
-subj "/C=US/ST=CA/L=San Francisco/O=HashiCorp/CN=Vault CA"
# Vault private key
openssl genrsa -out vault.key 2048
# Vault certificate signing request
cat > vault.conf << 'EOF'
[req]
distinguished_name = req_distinguished_name
req_extensions = v3_req
prompt = no
[req_distinguished_name]
C = US
ST = CA
L = San Francisco
O = HashiCorp
CN = vault
[v3_req]
keyUsage = keyEncipherment, dataEncipherment
extendedKeyUsage = serverAuth
subjectAltName = @alt_names
[alt_names]
DNS.1 = vault
DNS.2 = vault.vault
DNS.3 = vault.vault.svc
DNS.4 = vault.vault.svc.cluster.local
DNS.5 = vault-0.vault-internal
DNS.6 = vault-1.vault-internal
DNS.7 = vault-2.vault-internal
DNS.8 = vault-0.vault-internal.vault.svc.cluster.local
DNS.9 = vault-1.vault-internal.vault.svc.cluster.local
DNS.10 = vault-2.vault-internal.vault.svc.cluster.local
DNS.11 = vault.yourdomain.com
IP.1 = 127.0.0.1
EOF
openssl req -new -key vault.key -out vault.csr -config vault.conf
# Vault certificate
openssl x509 -req -in vault.csr -CA vault-ca.crt -CAkey vault-ca.key \
-CAcreateserial -out vault.crt -days 365 -extensions v3_req -extfile vault.conf
# 8.1.3 Create the Vault namespace and TLS secret
kubectl create namespace vault
kubectl create secret generic vault-ha-tls \
--from-file=vault.key=vault.key \
--from-file=vault.crt=vault.crt \
--from-file=vault.ca=vault-ca.crt \
--namespace vault
cd ..
# 8.1.4 Vault installation
helm install vault hashicorp/vault \
--namespace vault \
--values vault-values.yaml
# 8.1.5 Initialize and unseal Vault
kubectl exec vault-0 -n vault -- vault operator init \
-key-shares=5 \
-key-threshold=3 \
-format=json > cluster-keys.json
# Extract the root token and unseal keys
VAULT_UNSEAL_KEY_1=$(cat cluster-keys.json | jq -r ".unseal_keys_b64[0]")
VAULT_UNSEAL_KEY_2=$(cat cluster-keys.json | jq -r ".unseal_keys_b64[1]")
VAULT_UNSEAL_KEY_3=$(cat cluster-keys.json | jq -r ".unseal_keys_b64[2]")
CLUSTER_ROOT_TOKEN=$(cat cluster-keys.json | jq -r ".root_token")
# Vault unseal
kubectl exec vault-0 -n vault -- vault operator unseal $VAULT_UNSEAL_KEY_1
kubectl exec vault-0 -n vault -- vault operator unseal $VAULT_UNSEAL_KEY_2
kubectl exec vault-0 -n vault -- vault operator unseal $VAULT_UNSEAL_KEY_3
# Join the remaining nodes
kubectl exec vault-1 -n vault -- vault operator raft join https://vault-0.vault-internal:8200
kubectl exec vault-1 -n vault -- vault operator unseal $VAULT_UNSEAL_KEY_1
kubectl exec vault-1 -n vault -- vault operator unseal $VAULT_UNSEAL_KEY_2
kubectl exec vault-1 -n vault -- vault operator unseal $VAULT_UNSEAL_KEY_3
kubectl exec vault-2 -n vault -- vault operator raft join https://vault-0.vault-internal:8200
kubectl exec vault-2 -n vault -- vault operator unseal $VAULT_UNSEAL_KEY_1
kubectl exec vault-2 -n vault -- vault operator unseal $VAULT_UNSEAL_KEY_2
kubectl exec vault-2 -n vault -- vault operator unseal $VAULT_UNSEAL_KEY_3
echo "Root Token: $CLUSTER_ROOT_TOKEN"
🔧 8.2 External Secrets Operator#
# 8.2.1 External Secrets Operator installation
helm repo add external-secrets https://charts.external-secrets.io
helm repo update
helm install external-secrets external-secrets/external-secrets \
--namespace external-secrets \
--create-namespace
# 8.2.2 Enable the Kubernetes auth method in Vault
kubectl exec vault-0 -n vault -- env VAULT_TOKEN=$CLUSTER_ROOT_TOKEN vault auth enable kubernetes
# Get the service account token (on Kubernetes 1.24+ token secrets are no longer auto-created;
# use "kubectl create token vault -n vault" instead of reading .secrets[0])
TOKEN_REVIEW_JWT=$(kubectl get secret \
$(kubectl get serviceaccount vault -n vault -o jsonpath='{.secrets[0].name}') \
-n vault -o jsonpath='{.data.token}' | base64 --decode)
KUBE_CA_CERT=$(kubectl config view --raw --minify --flatten -o jsonpath='{.clusters[].cluster.certificate-authority-data}' | base64 --decode)
KUBE_HOST=$(kubectl config view --raw --minify --flatten -o jsonpath='{.clusters[].cluster.server}')
# Configure the Kubernetes auth method
kubectl exec vault-0 -n vault -- env VAULT_TOKEN=$CLUSTER_ROOT_TOKEN vault write auth/kubernetes/config \
token_reviewer_jwt="$TOKEN_REVIEW_JWT" \
kubernetes_host="$KUBE_HOST" \
kubernetes_ca_cert="$KUBE_CA_CERT"
# 8.2.3 Create a Vault policy and role
# the -i flag is required so the heredoc reaches the container's stdin
kubectl exec -i vault-0 -n vault -- env VAULT_TOKEN=$CLUSTER_ROOT_TOKEN vault policy write mycompany-dev - <<EOF
path "secret/data/dev/*" {
capabilities = ["read"]
}
EOF
kubectl exec vault-0 -n vault -- env VAULT_TOKEN=$CLUSTER_ROOT_TOKEN vault write auth/kubernetes/role/mycompany-dev \
bound_service_account_names=external-secrets \
bound_service_account_namespaces=dev \
policies=mycompany-dev \
ttl=24h
# 8.2.4 Enable the secrets engine in Vault
kubectl exec vault-0 -n vault -- env VAULT_TOKEN=$CLUSTER_ROOT_TOKEN vault secrets enable -path=secret kv-v2
# Add test secrets
kubectl exec vault-0 -n vault -- env VAULT_TOKEN=$CLUSTER_ROOT_TOKEN vault kv put secret/dev/database \
username=myapp \
password=SuperSecretPassword123!
kubectl exec vault-0 -n vault -- env VAULT_TOKEN=$CLUSTER_ROOT_TOKEN vault kv put secret/dev/api-keys \
github-token=ghp_xxxxxxxxxxxx \
slack-webhook=https://hooks.slack.com/services/xxx
# 8.2.5 Create the SecretStore
cat > vault-secret-store.yaml << 'EOF'
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
name: vault-backend
namespace: dev
spec:
provider:
vault:
server: "https://vault.vault.svc.cluster.local:8200"
path: "secret"
version: "v2"
caBundle: "LS0tLS1CRUdJTi..." # Base64 encoded CA cert
auth:
kubernetes:
mountPath: "kubernetes"
role: "mycompany-dev"
serviceAccountRef:
name: "external-secrets"
EOF
# Base64-encode the CA cert (use "|" as the sed delimiter: base64 output contains "/")
CA_BUNDLE=$(cat vault-tls/vault-ca.crt | base64 -w 0)
sed -i "s|LS0tLS1CRUdJTi...|$CA_BUNDLE|g" vault-secret-store.yaml
kubectl apply -f vault-secret-store.yaml
# 8.2.6 Create the ExternalSecret
cat > external-secret-database.yaml << 'EOF'
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
name: database-credentials
namespace: dev
spec:
refreshInterval: 1m
secretStoreRef:
name: vault-backend
kind: SecretStore
target:
name: database-secret
creationPolicy: Owner
data:
- secretKey: username
remoteRef:
key: secret/dev/database
property: username
- secretKey: password
remoteRef:
key: secret/dev/database
property: password
EOF
kubectl apply -f external-secret-database.yaml
# Verify that the secret was created
kubectl get secrets -n dev
kubectl describe externalsecret database-credentials -n dev
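The synced values can be decoded directly to confirm they match what was written to Vault:
kubectl get secret database-secret -n dev -o jsonpath='{.data.username}' | base64 -d; echo
kubectl get secret database-secret -n dev -o jsonpath='{.data.password}' | base64 -d; echo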
🛡️ 8.3 Pod Security Standards#
# 8.3.1 Apply Pod Security Standards
kubectl label --overwrite namespace dev pod-security.kubernetes.io/enforce=restricted
kubectl label --overwrite namespace dev pod-security.kubernetes.io/audit=restricted
kubectl label --overwrite namespace dev pod-security.kubernetes.io/warn=restricted
kubectl label --overwrite namespace staging pod-security.kubernetes.io/enforce=restricted
kubectl label --overwrite namespace staging pod-security.kubernetes.io/audit=restricted
kubectl label --overwrite namespace staging pod-security.kubernetes.io/warn=restricted
kubectl label --overwrite namespace production pod-security.kubernetes.io/enforce=restricted
kubectl label --overwrite namespace production pod-security.kubernetes.io/audit=restricted
kubectl label --overwrite namespace production pod-security.kubernetes.io/warn=restricted
# 8.3.2 Security context template
cat > security-context-template.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: secure-app
namespace: dev
spec:
replicas: 1
selector:
matchLabels:
app: secure-app
template:
metadata:
labels:
app: secure-app
spec:
securityContext:
runAsNonRoot: true
runAsUser: 10001
runAsGroup: 10001
fsGroup: 10001
seccompProfile:
type: RuntimeDefault
containers:
- name: app
image: nginxinc/nginx-unprivileged:alpine # stock nginx binds port 80 as root; the unprivileged variant serves on 8080
ports:
- containerPort: 8080
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 10001
runAsGroup: 10001
capabilities:
drop:
- ALL
add:
- NET_BIND_SERVICE
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
volumeMounts:
- name: tmp
mountPath: /tmp
- name: var-cache-nginx
mountPath: /var/cache/nginx
- name: var-run
mountPath: /var/run
volumes:
- name: tmp
emptyDir: {}
- name: var-cache-nginx
emptyDir: {}
- name: var-run
emptyDir: {}
EOF
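A server-side dry run surfaces any Pod Security warnings before the Deployment is actually created:
kubectl apply -f security-context-template.yaml --dry-run=server
kubectl apply -f security-context-template.yaml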
🔍 8.4 Falco Runtime Security#
# 8.4.1 Falco installation
helm repo add falcosecurity https://falcosecurity.github.io/charts
helm repo update
cat > falco-values.yaml << 'EOF'
falco:
rules_file:
- /etc/falco/falco_rules.yaml
- /etc/falco/falco_rules.local.yaml
- /etc/falco/k8s_audit_rules.yaml
- /etc/falco/rules.d
time_format_iso_8601: true
json_output: true
json_include_output_property: true
json_include_tags_property: true
log_stderr: true
log_syslog: true
log_level: info
priority: debug
buffered_outputs: false
syscall_event_drops:
actions:
- log
- alert
rate: 0.03333
max_burst: 1000
outputs:
rate: 1
max_burst: 1000
syslog_output:
enabled: true
file_output:
enabled: false
stdout_output:
enabled: true
webserver:
enabled: true
listen_port: 8765
k8s_healthz_endpoint: /healthz
ssl_enabled: false
ssl_certificate: /etc/ssl/falco/falco.pem
grpc:
enabled: false
grpc_output:
enabled: false
customRules:
custom-rules.yaml: |-
- rule: Unexpected outbound connection destination
desc: Detect outbound connections to unexpected destinations
condition: >
outbound and not
(fd.sip in (internal_networks))
output: Outbound connection to unexpected destination (command=%proc.cmdline dest=%fd.rip)
priority: WARNING
tags: [network, mitre_exfiltration]
- rule: Suspicious process in container
desc: Detect suspicious processes running in containers
condition: >
spawned_process and container and
(proc.name in (nc, ncat, netcat, nmap, dig, nslookup, tcpdump))
output: Suspicious process in container (command=%proc.cmdline container=%container.name)
priority: WARNING
tags: [process, container]
driver:
enabled: true
kind: ebpf
collectors:
enabled: true
docker:
enabled: true
containerd:
enabled: true
crio:
enabled: false
resources:
requests:
cpu: 100m
memory: 512Mi
limits:
cpu: 200m
memory: 1024Mi
tolerations:
- effect: NoSchedule
key: node-role.kubernetes.io/master
- effect: NoSchedule
key: node-role.kubernetes.io/control-plane
falcosidekick:
enabled: true
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 200m
memory: 256Mi
config:
slack:
webhookurl: "YOUR_SLACK_WEBHOOK_URL"
channel: "#security-alerts"
username: "Falco"
minimumpriority: "warning"
messageformat: "long"
alertmanager:
hostport: "http://kube-prometheus-stack-alertmanager.monitoring.svc.cluster.local:9093"
minimumpriority: "warning"
EOF
kubectl create namespace falco
helm install falco falcosecurity/falco \
--namespace falco \
--values falco-values.yaml
# 8.4.2 Check Falco status
kubectl get pods -n falco
kubectl logs -l app.kubernetes.io/name=falco -n falco
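To see Falco react, spawn an interactive shell in a scratch pod (outside the restricted namespaces); this should show up as the stock "Terminal shell in container" event:
kubectl run falco-test -n default --image=alpine -- sleep 300
kubectl exec -it falco-test -n default -- sh -c "id"
kubectl logs -l app.kubernetes.io/name=falco -n falco --tail=20
kubectl delete pod falco-test -n default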
🗄️ PHASE 8: BACKUP & DISASTER RECOVERY (Days 19-20)#
💾 9.1 Velero Backup Setup#
# 9.1.1 Create the AWS S3 bucket
BACKUP_BUCKET="mycompany-k8s-backups-$(openssl rand -hex 4)"
aws s3 mb s3://$BACKUP_BUCKET --region eu-west-1
# S3 bucket policy
cat > backup-bucket-policy.json << EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "VeleroBackupAccess",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/velero-role"
},
"Action": [
"s3:GetObject",
"s3:DeleteObject",
"s3:PutObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Resource": "arn:aws:s3:::$BACKUP_BUCKET/*"
},
{
"Sid": "VeleroBackupList",
"Effect": "Allow",
"Principal": {
"AWS": "arn:aws:iam::$(aws sts get-caller-identity --query Account --output text):role/velero-role"
},
"Action": [
"s3:ListBucket"
],
"Resource": "arn:aws:s3:::$BACKUP_BUCKET"
}
]
}
EOF
# Apply the bucket policy (the file above is otherwise never used)
aws s3api put-bucket-policy --bucket $BACKUP_BUCKET --policy file://backup-bucket-policy.json
# Velero permissions for the IAM policy
cat > velero-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Action": [
"ec2:DescribeVolumes",
"ec2:DescribeSnapshots",
"ec2:CreateTags",
"ec2:CreateVolume",
"ec2:CreateSnapshot",
"ec2:DeleteSnapshot"
],
"Resource": "*"
},
{
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:DeleteObject",
"s3:PutObject",
"s3:AbortMultipartUpload",
"s3:ListMultipartUploadParts"
],
"Resource": [
"arn:aws:s3:::BUCKET-NAME/*"
]
},
{
"Effect": "Allow",
"Action": [
"s3:ListBucket"
],
"Resource": [
"arn:aws:s3:::BUCKET-NAME"
]
}
]
}
EOF
sed -i "s/BUCKET-NAME/$BACKUP_BUCKET/g" velero-policy.json
# Create the IAM policy
aws iam create-policy \
--policy-name VeleroBackupPolicy \
--policy-document file://velero-policy.json
# Trust policy for the service account
cat > velero-trust-policy.json << 'EOF'
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Federated": "arn:aws:iam::ACCOUNT-ID:oidc-provider/OIDC-URL"
},
"Action": "sts:AssumeRoleWithWebIdentity",
"Condition": {
"StringEquals": {
"OIDC-URL:sub": "system:serviceaccount:velero:velero",
"OIDC-URL:aud": "sts.amazonaws.com"
}
}
}
]
}
EOF
# Get the OIDC provider URL
OIDC_URL=$(aws eks describe-cluster --name mycompany-dev-eks --query "cluster.identity.oidc.issuer" --output text | sed 's|https://||')
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
sed -i "s/ACCOUNT-ID/$ACCOUNT_ID/g" velero-trust-policy.json
sed -i "s/OIDC-URL/$OIDC_URL/g" velero-trust-policy.json
# Create the IAM role
aws iam create-role \
--role-name velero-role \
--assume-role-policy-document file://velero-trust-policy.json
# Attach the policy to the role (attach-role-policy takes --role-name, not an ARN)
aws iam attach-role-policy \
--role-name velero-role \
--policy-arn arn:aws:iam::$ACCOUNT_ID:policy/VeleroBackupPolicy
# 9.1.2 Velero CLI installation
wget https://github.com/vmware-tanzu/velero/releases/download/v1.12.0/velero-v1.12.0-linux-amd64.tar.gz
tar -xzf velero-v1.12.0-linux-amd64.tar.gz
sudo mv velero-v1.12.0-linux-amd64/velero /usr/local/bin/
rm -rf velero-v1.12.0-linux-amd64*
# 9.1.3 Velero installation
cat > velero-values.yaml << EOF
configuration:
backupStorageLocation:
- name: aws
provider: aws
bucket: $BACKUP_BUCKET
config:
region: eu-west-1
volumeSnapshotLocation:
- name: aws
provider: aws
config:
region: eu-west-1
credentials:
useSecret: false
serviceAccount:
server:
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::$ACCOUNT_ID:role/velero-role
initContainers:
- name: velero-plugin-for-aws
image: velero/velero-plugin-for-aws:v1.8.0
volumeMounts:
- mountPath: /target
name: plugins
resources:
requests:
cpu: 500m
memory: 128Mi
limits:
cpu: 1000m
memory: 512Mi
nodeAgent:
resources:
requests:
cpu: 500m
memory: 512Mi
limits:
cpu: 1000m
memory: 1024Mi
schedules:
daily-backup:
disabled: false
schedule: "0 2 * * *"
template:
includedNamespaces:
- dev
- staging
- production
- monitoring
- vault
excludedResources:
- events
- events.events.k8s.io
storageLocation: aws
ttl: 720h0m0s
snapshotVolumes: true
weekly-backup:
disabled: false
schedule: "0 3 * * 0"
template:
includedNamespaces:
- dev
- staging
- production
- monitoring
- vault
excludedResources:
- events
- events.events.k8s.io
storageLocation: aws
ttl: 2160h0m0s
snapshotVolumes: true
EOF
helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
helm repo update
kubectl create namespace velero
helm install velero vmware-tanzu/velero \
--namespace velero \
--values velero-values.yaml
# 9.1.4 Manual backup test
velero backup create test-backup --include-namespaces dev
velero backup describe test-backup
velero backup logs test-backup
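Restores can be rehearsed the same way, mapping into a scratch namespace so nothing in dev is touched:
velero restore create test-restore --from-backup test-backup --namespace-mappings dev:dev-restore
velero restore describe test-restore
velero schedule get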
echo "Backup bucket: $BACKUP_BUCKET"
🔄 9.2 Database Backup Strategy#
# 9.2.1 RDS automated backup script
cat > ~/devops-infrastructure/scripts/rds-backup.sh << 'EOF'
#!/bin/bash
# RDS Backup Script
set -e
DB_IDENTIFIER="mycompany-dev-db"
BACKUP_PREFIX="manual-backup"
REGION="eu-west-1"
# Create manual snapshot
SNAPSHOT_ID="${BACKUP_PREFIX}-$(date +%Y%m%d%H%M%S)"
echo "Creating RDS snapshot: $SNAPSHOT_ID"
aws rds create-db-snapshot \
--db-instance-identifier $DB_IDENTIFIER \
--db-snapshot-identifier $SNAPSHOT_ID \
--region $REGION
# Wait for snapshot completion
echo "Waiting for snapshot completion..."
aws rds wait db-snapshot-completed \
--db-snapshot-identifier $SNAPSHOT_ID \
--region $REGION
echo "Snapshot created successfully: $SNAPSHOT_ID"
# List recent snapshots
echo "Recent snapshots:"
aws rds describe-db-snapshots \
--db-instance-identifier $DB_IDENTIFIER \
--snapshot-type manual \
--region $REGION \
--query 'DBSnapshots[0:5].[DBSnapshotIdentifier,Status,SnapshotCreateTime]' \
--output table
# Cleanup old manual snapshots (keep last 7)
OLD_SNAPSHOTS=$(aws rds describe-db-snapshots \
--db-instance-identifier $DB_IDENTIFIER \
--snapshot-type manual \
--region $REGION \
--query 'DBSnapshots[7:].DBSnapshotIdentifier' \
--output text)
if [ ! -z "$OLD_SNAPSHOTS" ]; then
echo "Cleaning up old snapshots..."
for snapshot in $OLD_SNAPSHOTS; do
echo "Deleting snapshot: $snapshot"
aws rds delete-db-snapshot \
--db-snapshot-identifier $snapshot \
--region $REGION
done
fi
echo "Backup completed successfully!"
EOF
chmod +x ~/devops-infrastructure/scripts/rds-backup.sh
# 9.2.2 PostgreSQL logical backup (for application data)
cat > ~/devops-infrastructure/scripts/postgres-logical-backup.sh << 'EOF'
#!/bin/bash
# PostgreSQL Logical Backup Script
set -e
# Configuration
DB_HOST="your-rds-endpoint"
DB_NAME="mycompanydb"
DB_USER="admin"
BACKUP_DIR="/tmp/pg-backups"
S3_BUCKET="mycompany-db-logical-backups"
DATE=$(date +%Y%m%d_%H%M%S)
# Create backup directory
mkdir -p $BACKUP_DIR
# Get password from Kubernetes secret
DB_PASSWORD=$(kubectl get secret database-secret -n dev -o jsonpath='{.data.password}' | base64 -d)
export PGPASSWORD=$DB_PASSWORD
# Create backup
echo "Creating logical backup..."
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME \
--verbose \
--no-password \
--format=custom \
--compress=9 \
--file=$BACKUP_DIR/logical-backup-$DATE.dump
# Upload to S3
echo "Uploading to S3..."
aws s3 cp $BACKUP_DIR/logical-backup-$DATE.dump \
s3://$S3_BUCKET/logical-backups/logical-backup-$DATE.dump
# Cleanup local file
rm $BACKUP_DIR/logical-backup-$DATE.dump
# Cleanup old S3 backups (keep last 30 days)
echo "Cleaning up old backups..."
# "aws s3 ls" does not honor --query; use the s3api variant, which returns JSON
aws s3api list-objects-v2 \
--bucket $S3_BUCKET \
--prefix logical-backups/ \
--query "Contents[?LastModified<='$(date -d '30 days ago' --iso-8601)'].Key" \
--output text | \
xargs -r -n 1 -I {} aws s3 rm s3://$S3_BUCKET/{}
echo "Logical backup completed successfully!"
EOF
chmod +x ~/devops-infrastructure/scripts/postgres-logical-backup.sh
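The matching restore path looks like this (the dump file name and endpoint are placeholders):
aws s3 cp s3://mycompany-db-logical-backups/logical-backups/logical-backup-<DATE>.dump .
pg_restore -h <rds-endpoint> -U admin -d mycompanydb --clean --if-exists logical-backup-<DATE>.dump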
# 9.2.3 CronJob for automated database backups
cat > database-backup-cronjob.yaml << 'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
name: postgres-logical-backup
namespace: dev
spec:
schedule: "0 1 * * *" # Daily at 1 AM
concurrencyPolicy: Forbid
successfulJobsHistoryLimit: 3
failedJobsHistoryLimit: 3
jobTemplate:
spec:
template:
spec:
serviceAccountName: backup-sa
containers:
- name: backup
image: postgres:15-alpine
env:
- name: DB_HOST
value: "your-rds-endpoint"
- name: DB_NAME
value: "mycompanydb"
- name: DB_USER
value: "admin"
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: database-secret
key: password
- name: S3_BUCKET
value: "mycompany-db-logical-backups"
command:
- /bin/bash
- -c
- |
set -e
apk add --no-cache aws-cli
DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_FILE="/tmp/logical-backup-$DATE.dump"
export PGPASSWORD=$DB_PASSWORD
echo "Creating logical backup..."
pg_dump -h $DB_HOST -U $DB_USER -d $DB_NAME \
--verbose \
--no-password \
--format=custom \
--compress=9 \
--file=$BACKUP_FILE
echo "Uploading to S3..."
aws s3 cp $BACKUP_FILE s3://$S3_BUCKET/logical-backups/
echo "Backup completed successfully!"
resources:
requests:
memory: "256Mi"
cpu: "100m"
limits:
memory: "512Mi"
cpu: "200m"
restartPolicy: OnFailure
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: backup-sa
namespace: dev
annotations:
eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/backup-role
EOF
kubectl apply -f database-backup-cronjob.yaml
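Rather than waiting for 1 AM, the CronJob can be triggered once by hand:
kubectl create job --from=cronjob/postgres-logical-backup manual-backup-test -n dev
kubectl logs -f job/manual-backup-test -n dev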
📋 9.3 Disaster Recovery Runbook#
# 9.3.1 Create the DR runbook
cat > ~/devops-infrastructure/docs/disaster-recovery-runbook.md << 'EOF'
# Disaster Recovery Runbook
## Overview
This document contains the disaster recovery procedures for the Kubernetes cluster and the RDS database.
## RTO/RPO Targets
- **RTO (Recovery Time Objective)**: 4 hours
- **RPO (Recovery Point Objective)**: 1 hour
## Disaster Scenarios
### 1. Complete Cluster Loss
#### Assessment
```bash
# Check cluster status
kubectl get nodes
kubectl get pods --all-namespaces
# AWS EKS cluster status
aws eks describe-cluster --name mycompany-dev-eks
```
#### Recovery Steps
1. Create a new cluster
2. Restore with Velero
3. Verify database connectivity
### 2. Database Disaster
#### Assessment
```bash
# Check RDS status
aws rds describe-db-instances \
  --db-instance-identifier mycompany-dev-db
# Connection test
kubectl run db-test --rm -i --tty \
  --image=postgres:15-alpine -- \
  psql -h RDS_ENDPOINT -U admin -d mycompanydb -c "SELECT 1;"
```
#### Recovery Steps
1. Point-in-time recovery
```bash
# Find the latest restorable time
aws rds describe-db-instances \
  --db-instance-identifier mycompany-dev-db \
  --query 'DBInstances[0].LatestRestorableTime'
# Point-in-time restore
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier mycompany-dev-db \
  --target-db-instance-identifier mycompany-dev-db-recovered \
  --restore-time 2024-XX-XXTXX:XX:XX.000Z
```
2. Manual snapshot restore
3. Application reconnection
### 3. Data Corruption
#### Assessment
```bash
# Check for data inconsistencies
kubectl exec -it deployment/backend -n dev -- \
  python manage.py check_data_integrity
# Check database logs
aws rds describe-db-log-files \
  --db-instance-identifier mycompany-dev-db
```
#### Recovery Steps
1. Identify the corruption scope
2. Restore from a logical backup
## Testing Procedures
### Monthly DR Drill
- Create a test restore in a separate namespace
- Verify data integrity
- Test application functionality
- Document lessons learned
### Quarterly Full DR Test
- Complete environment recreation
- Full data restore
- End-to-end testing
- Performance validation
## Emergency Contacts
- DevOps Team: +90-XXX-XXX-XXXX
- Database Team: +90-XXX-XXX-XXXX
- On-call Engineer: +90-XXX-XXX-XXXX
## Post-Incident Actions
1. Root Cause Analysis
   - Document the incident timeline
   - Identify failure points
   - Implement preventive measures
2. Update Procedures
   - Update runbooks
   - Improve monitoring
   - Enhance alerting
3. Team Communication
   - Share lessons learned
   - Update training materials
   - Schedule a review meeting
EOF
# 9.3.2 DR test script
cat > ~/devops-infrastructure/scripts/dr-test.sh << 'EOF'
#!/bin/bash
# Disaster Recovery Test Script
set -e
NAMESPACE="dr-test"
BACKUP_NAME="$1"
if [ -z "$BACKUP_NAME" ]; then
echo "Usage: $0 <backup-name>"
exit 1
fi
echo "Starting DR test with backup: $BACKUP_NAME"
# Create the test namespace
kubectl create namespace $NAMESPACE --dry-run=client -o yaml | kubectl apply -f -
# Restore from the backup into the test namespace
velero restore create dr-test-$(date +%Y%m%d%H%M%S) \
--from-backup $BACKUP_NAME \
--namespace-mappings dev:$NAMESPACE,staging:$NAMESPACE
# Wait for restore completion
echo "Waiting for restore completion..."
sleep 60
# Check restored resources
echo "Checking restored resources..."
kubectl get all -n $NAMESPACE
# Test database connectivity
echo "Testing database connectivity..."
kubectl run db-test -n $NAMESPACE --rm -i --tty \
--image=postgres:15-alpine -- \
psql -h $(kubectl get secret database-secret -n $NAMESPACE -o jsonpath='{.data.host}' | base64 -d) \
-U $(kubectl get secret database-secret -n $NAMESPACE -o jsonpath='{.data.username}' | base64 -d) \
-d mycompanydb \
-c "SELECT COUNT(*) FROM information_schema.tables;"
echo "DR test completed successfully!"
echo "Cleanup: kubectl delete namespace $NAMESPACE"
EOF
chmod +x ~/devops-infrastructure/scripts/dr-test.sh
🎯 PHASE 9: GITOPS & DEPLOYMENT AUTOMATION (Days 21-22)#
🔄 10.1 ArgoCD Setup#
# 10.1.1 ArgoCD installation
kubectl create namespace argocd
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# 10.1.2 ArgoCD CLI installation
wget https://github.com/argoproj/argo-cd/releases/latest/download/argocd-linux-amd64
sudo install -m 555 argocd-linux-amd64 /usr/local/bin/argocd
rm argocd-linux-amd64
# 10.1.3 ArgoCD initial password
ARGOCD_PASSWORD=$(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
echo "ArgoCD admin password: $ARGOCD_PASSWORD"
# 10.1.4 ArgoCD ingress
cat > argocd-ingress.yaml << 'EOF'
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: argocd-server-ingress
namespace: argocd
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/ssl-redirect: "true"
nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
tls:
- hosts:
- argocd.yourdomain.com
secretName: argocd-tls
rules:
- host: argocd.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: argocd-server
port:
number: 443
EOF
kubectl apply -f argocd-ingress.yaml
# 10.1.5 ArgoCD server configuration
kubectl patch configmap argocd-cmd-params-cm -n argocd --patch '{"data":{"server.insecure":"true"}}'
kubectl rollout restart deployment argocd-server -n argocd
# 10.1.6 ArgoCD login
argocd login argocd.yourdomain.com --username admin --password $ARGOCD_PASSWORD --insecure
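Rotate the generated admin password right away and delete the bootstrap secret (standard ArgoCD hygiene):
argocd account update-password --current-password $ARGOCD_PASSWORD --new-password '<strong-password>'
kubectl -n argocd delete secret argocd-initial-admin-secret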
📁 10.2 GitOps Repository Structure#
# 10.2.1 Create the GitOps repository
cd ~/
git clone https://github.com/yourusername/gitops-config.git
cd gitops-config
# Repository structure
mkdir -p {applications/{dev,staging,production},infrastructure/{monitoring,logging,security},bootstrap}
# 10.2.2 Application of Applications pattern
cat > bootstrap/root-app.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: root-app
namespace: argocd
finalizers:
- resources-finalizer.argocd.argoproj.io
spec:
project: default
source:
repoURL: https://github.com/yourusername/gitops-config.git
targetRevision: main
path: bootstrap
destination:
server: https://kubernetes.default.svc
namespace: argocd
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
EOF
# 10.2.3 Infrastructure applications
cat > bootstrap/infrastructure-apps.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: monitoring-stack
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/yourusername/gitops-config.git
targetRevision: main
path: infrastructure/monitoring
destination:
server: https://kubernetes.default.svc
namespace: monitoring
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: logging-stack
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/yourusername/gitops-config.git
targetRevision: main
path: infrastructure/logging
destination:
server: https://kubernetes.default.svc
namespace: logging
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
---
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: security-stack
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/yourusername/gitops-config.git
targetRevision: main
path: infrastructure/security
destination:
server: https://kubernetes.default.svc
namespace: security
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
EOF
# 10.2.4 Environment-specific applications
cat > bootstrap/dev-apps.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: dev-applications
namespace: argocd
spec:
project: default
source:
repoURL: https://github.com/yourusername/gitops-config.git
targetRevision: main
path: applications/dev
destination:
server: https://kubernetes.default.svc
namespace: dev
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
EOF
# 10.2.5 Sample application manifest
cat > applications/dev/sample-app.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
name: sample-app
namespace: dev
labels:
app: sample-app
spec:
replicas: 2
selector:
matchLabels:
app: sample-app
template:
metadata:
labels:
app: sample-app
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
securityContext:
runAsNonRoot: true
runAsUser: 10001
runAsGroup: 10001
fsGroup: 10001
containers:
- name: app
image: ghcr.io/yourusername/sample-app:v1.0.0
ports:
- containerPort: 8080
name: http
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: database-secret
key: url
securityContext:
allowPrivilegeEscalation: false
readOnlyRootFilesystem: true
runAsNonRoot: true
runAsUser: 10001
runAsGroup: 10001
capabilities:
drop:
- ALL
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
volumeMounts:
- name: tmp
mountPath: /tmp
volumes:
- name: tmp
emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
name: sample-app
namespace: dev
labels:
app: sample-app
spec:
selector:
app: sample-app
ports:
- port: 80
targetPort: 8080
name: http
type: ClusterIP
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: sample-app
namespace: dev
annotations:
kubernetes.io/ingress.class: nginx
nginx.ingress.kubernetes.io/ssl-redirect: "true"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
tls:
- hosts:
- app-dev.yourdomain.com
secretName: sample-app-tls
rules:
- host: app-dev.yourdomain.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: sample-app
port:
number: 80
EOF
# Commit to Git
git add .
git commit -m "Initial GitOps repository structure"
git push origin main
🚀 10.3 Progressive Delivery with Argo Rollouts#
# 10.3.1 Argo Rollouts installation
kubectl create namespace argo-rollouts
kubectl apply -n argo-rollouts -f https://github.com/argoproj/argo-rollouts/releases/latest/download/install.yaml
# 10.3.2 Argo Rollouts CLI
wget https://github.com/argoproj/argo-rollouts/releases/latest/download/kubectl-argo-rollouts-linux-amd64
sudo install -m 555 kubectl-argo-rollouts-linux-amd64 /usr/local/bin/kubectl-argo-rollouts
rm kubectl-argo-rollouts-linux-amd64
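Confirm the plugin is on the PATH:
kubectl argo rollouts version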
# 10.3.3 Canary deployment example
cat > applications/dev/sample-app-rollout.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: sample-app-rollout
namespace: dev
spec:
replicas: 5
strategy:
canary:
steps:
- setWeight: 20
- pause: {}
- setWeight: 40
- pause: {duration: 10}
- setWeight: 60
- pause: {duration: 10}
- setWeight: 80
- pause: {duration: 10}
canaryService: sample-app-canary
stableService: sample-app-stable
trafficRouting:
nginx:
stableIngress: sample-app-stable
annotationPrefix: nginx.ingress.kubernetes.io
additionalIngressAnnotations:
canary-by-header: X-Canary
analysis:
templates:
- templateName: success-rate
startingStep: 2
args:
- name: service-name
value: sample-app-canary.dev.svc.cluster.local
selector:
matchLabels:
app: sample-app
template:
metadata:
labels:
app: sample-app
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
prometheus.io/path: "/metrics"
spec:
containers:
- name: app
image: ghcr.io/yourusername/sample-app:v1.0.0
ports:
- containerPort: 8080
name: http
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: database-secret
key: url
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
livenessProbe:
httpGet:
path: /health
port: 8080
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
name: sample-app-stable
namespace: dev
spec:
selector:
app: sample-app
ports:
- port: 80
targetPort: 8080
name: http
---
apiVersion: v1
kind: Service
metadata:
name: sample-app-canary
namespace: dev
spec:
selector:
app: sample-app
ports:
- port: 80
targetPort: 8080
name: http
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
name: success-rate
namespace: dev
spec:
args:
- name: service-name
metrics:
- name: success-rate
interval: 30s
successCondition: result[0] >= 0.95
failureLimit: 3
provider:
prometheus:
address: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
query: |
sum(irate(
http_requests_total{job="{{args.service-name}}",status!~"5.*"}[5m]
)) /
sum(irate(
http_requests_total{job="{{args.service-name}}"}[5m]
))
EOF
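Day-to-day canary operations then look like this (the rollout name matches the manifest above):
kubectl argo rollouts get rollout sample-app-rollout -n dev --watch
# Resume past the first manual pause step
kubectl argo rollouts promote sample-app-rollout -n dev
# Roll back if the analysis degrades
kubectl argo rollouts abort sample-app-rollout -n dev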
# 10.3.4 Blue-Green deployment example
cat > applications/staging/sample-app-bluegreen.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
name: sample-app-bluegreen
namespace: staging
spec:
replicas: 3
strategy:
blueGreen:
activeService: sample-app-active
previewService: sample-app-preview
autoPromotionEnabled: false
scaleDownDelaySeconds: 30
prePromotionAnalysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: sample-app-preview.staging.svc.cluster.local
postPromotionAnalysis:
templates:
- templateName: success-rate
args:
- name: service-name
value: sample-app-active.staging.svc.cluster.local
selector:
matchLabels:
app: sample-app
template:
metadata:
labels:
app: sample-app
spec:
containers:
- name: app
image: ghcr.io/yourusername/sample-app:v1.0.0
ports:
- containerPort: 8080
name: http
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "256Mi"
cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
name: sample-app-active
namespace: staging
spec:
selector:
app: sample-app
ports:
- port: 80
targetPort: 8080
name: http
---
apiVersion: v1
kind: Service
metadata:
name: sample-app-preview
namespace: staging
spec:
selector:
app: sample-app
ports:
- port: 80
targetPort: 8080
name: http
EOF
# Commit the changes
git add .
git commit -m "Add progressive delivery configurations"
git push origin main
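With autoPromotionEnabled: false, the blue-green cutover is an explicit action after the preview checks pass:
kubectl argo rollouts get rollout sample-app-bluegreen -n staging --watch
kubectl argo rollouts promote sample-app-bluegreen -n staging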
🔧 10.4 CI/CD Integration with GitOps#
# 10.4.1 ArgoCD configuration for the image updater
cat > argocd-image-updater.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: argocd-image-updater-config
namespace: argocd
data:
registries.conf: |
registries:
- name: GitHub Container Registry
prefix: ghcr.io
api_url: https://ghcr.io
credentials: ext:/scripts/auth1.sh
credsexpire: 10h
ssh_config: |
Host github.com
User git
IdentitiesOnly yes
IdentityFile ~/.ssh/id_rsa
StrictHostKeyChecking no
---
apiVersion: v1
kind: Secret
metadata:
name: argocd-image-updater-secret
namespace: argocd
type: Opaque
stringData:
auth1.sh: |
#!/bin/sh
echo "username:$GITHUB_TOKEN"
EOF
kubectl apply -f argocd-image-updater.yaml
# 10.4.2 Application annotation for image updates
cat > applications/dev/sample-app-with-image-updater.yaml << 'EOF'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
name: sample-app
namespace: argocd
annotations:
argocd-image-updater.argoproj.io/image-list: myapp=ghcr.io/yourusername/sample-app
argocd-image-updater.argoproj.io/write-back-method: git
argocd-image-updater.argoproj.io/git-branch: main
spec:
project: default
source:
repoURL: https://github.com/yourusername/gitops-config.git
targetRevision: main
path: applications/dev
destination:
server: https://kubernetes.default.svc
namespace: dev
syncPolicy:
automated:
prune: true
selfHeal: true
syncOptions:
- CreateNamespace=true
EOF
# 10.4.3 Updated Jenkins pipeline with GitOps
cat > ~/devops-infrastructure/jenkins/gitops-pipeline.groovy << 'EOF'
@Library('shared-library') _
pipeline {
agent {
kubernetes {
yaml """
apiVersion: v1
kind: Pod
spec:
containers:
- name: docker
image: docker:latest
command:
- cat
tty: true
volumeMounts:
- mountPath: /var/run/docker.sock
name: docker-sock
- name: git
image: alpine/git:latest
command:
- cat
tty: true
volumes:
- name: docker-sock
hostPath:
path: /var/run/docker.sock
"""
}
}
environment {
DOCKER_REGISTRY = 'ghcr.io'
IMAGE_NAME = 'yourusername/sample-app'
GIT_COMMIT_SHORT = sh(script: "git rev-parse --short HEAD", returnStdout: true).trim()
BUILD_VERSION = "v1.0.${env.BUILD_NUMBER}-${GIT_COMMIT_SHORT}"
GITOPS_REPO = 'https://github.com/yourusername/gitops-config.git'
}
stages {
stage('Build & Push') {
steps {
container('docker') {
script {
def image = docker.build("${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_VERSION}")
docker.withRegistry("https://${DOCKER_REGISTRY}", 'github-registry-credentials') {
image.push()
image.push("latest")
}
}
}
}
}
stage('Update GitOps Repo') {
steps {
container('git') {
withCredentials([usernamePassword(credentialsId: 'github-credentials', usernameVariable: 'GIT_USERNAME', passwordVariable: 'GIT_TOKEN')]) {
sh '''
git config --global user.email "jenkins@company.com"
git config --global user.name "Jenkins CI"
# Clone GitOps repository
git clone https://${GIT_USERNAME}:${GIT_TOKEN}@github.com/yourusername/gitops-config.git
cd gitops-config
# Update image tag in deployment manifest
sed -i "s|image: ${DOCKER_REGISTRY}/${IMAGE_NAME}:.*|image: ${DOCKER_REGISTRY}/${IMAGE_NAME}:${BUILD_VERSION}|g" applications/dev/sample-app.yaml
# Commit and push changes
git add .
git commit -m "Update ${IMAGE_NAME} to ${BUILD_VERSION}"
git push origin main
'''
}
}
}
}
}
post {
success {
slackSend(
channel: '#deployments',
color: 'good',
message: "✅ ${IMAGE_NAME}:${BUILD_VERSION} built and GitOps updated successfully"
)
}
failure {
slackSend(
channel: '#deployments',
color: 'danger',
message: "❌ Pipeline failed for ${IMAGE_NAME}:${BUILD_VERSION}"
)
}
}
}
EOF
# 10.4.4 Deploy the root application to ArgoCD
kubectl apply -f ~/gitops-config/bootstrap/root-app.yaml
echo "GitOps setup completed!"
echo "ArgoCD UI: https://argocd.yourdomain.com"
echo "Login: admin / $ARGOCD_PASSWORD"
📈 PHASE 10: COST OPTIMIZATION & PERFORMANCE (Day 23-24)#
💰 11.1 Cost Monitoring Setup#
# 11.1.1 AWS Cost and Usage Report setup
cat > ~/devops-infrastructure/scripts/setup-cost-monitoring.sh << 'EOF'
#!/bin/bash
# AWS Cost Monitoring Setup Script
set -e
BUCKET_NAME="mycompany-cost-reports-$(openssl rand -hex 4)"
REGION="eu-west-1"
# Create S3 bucket for cost reports
aws s3 mb s3://$BUCKET_NAME --region $REGION
# Bucket policy for AWS Cost and Usage Reports
# Use a distinct delimiter so this nested heredoc doesn't terminate the outer 'EOF' early
cat > cost-bucket-policy.json << POLICY_EOF
{
"Version": "2012-10-17",
"Statement": [
{
"Effect": "Allow",
"Principal": {
"Service": "billingreports.amazonaws.com"
},
"Action": [
"s3:GetBucketAcl",
"s3:GetBucketPolicy"
],
"Resource": "arn:aws:s3:::$BUCKET_NAME"
},
{
"Effect": "Allow",
"Principal": {
"Service": "billingreports.amazonaws.com"
},
"Action": "s3:PutObject",
"Resource": "arn:aws:s3:::$BUCKET_NAME/*"
}
]
}
POLICY_EOF
aws s3api put-bucket-policy --bucket $BUCKET_NAME --policy file://cost-bucket-policy.json
echo "Cost monitoring S3 bucket created: $BUCKET_NAME"
echo "Configure Cost and Usage Report in AWS Console:"
echo "https://console.aws.amazon.com/billing/home#/reports"
rm cost-bucket-policy.json
EOF
chmod +x ~/devops-infrastructure/scripts/setup-cost-monitoring.sh
~/devops-infrastructure/scripts/setup-cost-monitoring.sh
# 11.1.2 Kubecost installation
helm repo add kubecost https://kubecost.github.io/cost-analyzer/
helm repo update
cat > kubecost-values.yaml << 'EOF'
global:
prometheus:
fqdn: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
enabled: false
grafana:
fqdn: http://kube-prometheus-stack-grafana.monitoring.svc.cluster.local:80
enabled: false
kubecostFrontend:
image: "kubecost/frontend"
resources:
requests:
cpu: "10m"
memory: "55Mi"
limits:
cpu: "100m"
memory: "256Mi"
kubecost:
image: "kubecost/server"
resources:
requests:
cpu: "100m"
memory: "55Mi"
limits:
cpu: "200m"
memory: "256Mi"
kubecostModel:
image: "kubecost/cost-model"
resources:
requests:
cpu: "200m"
memory: "55Mi"
limits:
cpu: "800m"
memory: "256Mi"
ingress:
enabled: true
className: nginx
annotations:
nginx.ingress.kubernetes.io/ssl-redirect: "true"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
hosts:
- host: kubecost.yourdomain.com
paths:
- path: /
pathType: Prefix
tls:
- secretName: kubecost-tls
hosts:
- kubecost.yourdomain.com
persistentVolume:
enabled: true
storageClass: gp3
size: 32Gi
nodeSelector: {}
tolerations: []
affinity: {}
service:
type: ClusterIP
port: 9090
targetPort: 9090
EOF
kubectl create namespace kubecost
helm install kubecost kubecost/cost-analyzer \
--namespace kubecost \
--values kubecost-values.yaml
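Until DNS and TLS for kubecost.yourdomain.com are in place, a port-forward is the quickest smoke test; the Deployment name below assumes the chart's default fullname kubecost-cost-analyzer:
kubectl port-forward -n kubecost deployment/kubecost-cost-analyzer 9090:9090
# UI is then available at http://localhost:9090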
# 11.1.3 Resource recommendation script
cat > ~/devops-infrastructure/scripts/resource-recommendations.sh << 'EOF'
#!/bin/bash
# Resource Recommendations Script
set -e
echo "📊 Generating resource recommendations..."
# VPA recommendations
echo "=== VPA Recommendations ==="
kubectl get vpa --all-namespaces -o custom-columns=\
NAMESPACE:.metadata.namespace,\
NAME:.metadata.name,\
MODE:.spec.updatePolicy.updateMode,\
CPU_TARGET:.status.recommendation.containerRecommendations[0].target.cpu,\
MEMORY_TARGET:.status.recommendation.containerRecommendations[0].target.memory
# Top resource consuming pods
echo "=== Top CPU Consuming Pods ==="
kubectl top pods --all-namespaces --sort-by=cpu | head -10
echo "=== Top Memory Consuming Pods ==="
kubectl top pods --all-namespaces --sort-by=memory | head -10
# Unused resources
echo "=== Pods with Low Resource Utilization ==="
kubectl get pods --all-namespaces -o json | \
jq -r '.items[] | select(.status.phase=="Running") |
.metadata.namespace + "/" + .metadata.name + " - " +
(.spec.containers[0].resources.requests.cpu // "no-limit") + " CPU, " +
(.spec.containers[0].resources.requests.memory // "no-limit") + " Memory"'
# HPA status
echo "=== HPA Status ==="
kubectl get hpa --all-namespaces
echo "📋 Recommendations:"
echo "1. Check VPA recommendations for right-sizing"
echo "2. Set resource requests/limits for pods without them"
echo "3. Consider HPA for variable workloads"
echo "4. Use VPA in recommendation mode first"
EOF
chmod +x ~/devops-infrastructure/scripts/resource-recommendations.sh
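The script depends on metrics-server for kubectl top and on jq being installed locally; once those are in place, run it directly:
~/devops-infrastructure/scripts/resource-recommendations.sh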
⚡ 11.2 Performance Optimization#
# 11.2.1 Vertical Pod Autoscaler setup
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh
cd ~/devops-infrastructure
# 11.2.2 VPA example configurations
cat > vpa-examples.yaml << 'EOF'
# VPA for sample app (recommendation mode)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: sample-app-vpa
namespace: dev
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: sample-app
updatePolicy:
updateMode: "Off" # Recommendation only
resourcePolicy:
containerPolicies:
- containerName: app
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 1000m
memory: 1Gi
controlledResources: ["cpu", "memory"]
---
# VPA for monitoring stack (auto mode)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: kube-prometheus-stack-prometheus
  namespace: monitoring
spec:
  targetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    # StatefulSet created by the kube-prometheus-stack chart
    name: prometheus-kube-prometheus-stack-prometheus
  updatePolicy:
    updateMode: "Auto"
  resourcePolicy:
    containerPolicies:
      - containerName: prometheus
        minAllowed:
          cpu: 500m
          memory: 1Gi
        maxAllowed:
          cpu: 4000m
          memory: 8Gi
        controlledResources: ["cpu", "memory"]
EOF
kubectl apply -f vpa-examples.yaml
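Recommendations appear on the VPA status once the recommender has observed some real usage (typically after a few minutes):
kubectl describe vpa sample-app-vpa -n dev
kubectl get vpa sample-app-vpa -n dev -o jsonpath='{.status.recommendation.containerRecommendations[0].target}'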
# 11.2.3 KEDA (Event-driven autoscaling) setup
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda \
--namespace keda \
--create-namespace
# 11.2.4 KEDA ScaledObject example (Redis queue)
cat > keda-redis-scaler.yaml << 'EOF'
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: redis-scaledobject
namespace: dev
spec:
scaleTargetRef:
name: worker-deployment
minReplicaCount: 1
maxReplicaCount: 10
triggers:
- type: redis
metadata:
address: redis.dev.svc.cluster.local:6379
listName: job_queue
listLength: '5'
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
name: prometheus-scaledobject
namespace: dev
spec:
scaleTargetRef:
name: sample-app
minReplicaCount: 2
maxReplicaCount: 20
triggers:
- type: prometheus
metadata:
serverAddress: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090
metricName: http_requests_per_second
threshold: '100'
query: sum(rate(http_requests_total{job="sample-app"}[1m]))
EOF
kubectl apply -f keda-redis-scaler.yaml
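KEDA implements each ScaledObject by creating an HPA behind the scenes (named keda-hpa-<scaledobject-name> by default), so both views are worth checking:
kubectl get scaledobject -n dev
kubectl get hpa -n dev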
# 11.2.5 Performance monitoring dashboard
cat > performance-monitoring.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: performance-dashboard
namespace: monitoring
labels:
grafana_dashboard: "1"
data:
performance-dashboard.json: |
{
"dashboard": {
"id": null,
"title": "Application Performance Monitoring",
"tags": ["performance", "apm"],
"timezone": "browser",
"panels": [
{
"id": 1,
"title": "Request Rate",
"type": "graph",
"targets": [
{
"expr": "sum(rate(http_requests_total[5m])) by (service)",
"legendFormat": "{{service}}"
}
]
},
{
"id": 2,
"title": "Response Time",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, service))",
"legendFormat": "95th percentile - {{service}}"
}
]
},
{
"id": 3,
"title": "Error Rate",
"type": "graph",
"targets": [
{
"expr": "sum(rate(http_requests_total{status=~'5..'}[5m])) by (service) / sum(rate(http_requests_total[5m])) by (service)",
"legendFormat": "Error rate - {{service}}"
}
]
}
],
"time": {
"from": "now-1h",
"to": "now"
},
"refresh": "30s"
}
}
EOF
kubectl apply -f performance-monitoring.yaml
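The grafana_dashboard: "1" label is what the kube-prometheus-stack Grafana sidecar watches for, so the dashboard should show up in Grafana shortly after the ConfigMap is created:
kubectl get configmap -n monitoring -l grafana_dashboard=1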
🧪 11.3 Load Testing & Performance Validation#
# 11.3.1 K6 load testing setup
mkdir -p load-testing
cat > load-testing/k6-config.yaml << 'EOF'
apiVersion: v1
kind: ConfigMap
metadata:
name: k6-scripts
namespace: dev
data:
load-test.js: |
import http from 'k6/http';
import { check, sleep } from 'k6';
import { Rate } from 'k6/metrics';
export let errorRate = new Rate('errors');
export let options = {
stages: [
{ duration: '2m', target: 10 }, // Ramp up
{ duration: '5m', target: 100 }, // Stay at 100 users
{ duration: '2m', target: 200 }, // Ramp up to 200 users
{ duration: '5m', target: 200 }, // Stay at 200 users
{ duration: '2m', target: 0 }, // Ramp down
],
thresholds: {
http_req_duration: ['p(95)<500'], // 95% of requests under 500ms
http_req_failed: ['rate<0.05'], // Error rate under 5%
errors: ['rate<0.1'], // Custom error rate under 10%
},
};
export default function() {
let response = http.get('https://app-dev.yourdomain.com/api/health');
check(response, {
'status is 200': (r) => r.status === 200,
'response time < 500ms': (r) => r.timings.duration < 500,
}) || errorRate.add(1);
sleep(1);
}
stress-test.js: |
import http from 'k6/http';
import { check } from 'k6';
export let options = {
stages: [
{ duration: '1m', target: 50 },
{ duration: '1m', target: 100 },
{ duration: '1m', target: 200 },
{ duration: '1m', target: 500 },
{ duration: '2m', target: 1000 }, // Stress level
{ duration: '2m', target: 0 },
],
};
export default function() {
let response = http.get('https://app-dev.yourdomain.com/api/users');
check(response, {
'status is 200': (r) => r.status === 200,
});
}
EOF
kubectl apply -f load-testing/
# 11.3.2 K6 operator installation
kubectl apply -f https://github.com/grafana/k6-operator/releases/latest/download/bundle.yaml
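A quick check that the operator came up; the namespace assumes the default created by the release bundle:
kubectl get pods -n k6-operator-system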
# 11.3.3 Load test job
cat > load-test-job.yaml << 'EOF'
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
name: load-test
namespace: dev
spec:
parallelism: 4
script:
configMap:
name: k6-scripts
file: load-test.js
separate: true
runner:
image: grafana/k6:latest
resources:
requests:
cpu: 100m
memory: 128Mi
limits:
cpu: 500m
memory: 256Mi
env:
- name: K6_PROMETHEUS_RW_SERVER_URL
value: http://kube-prometheus-stack-prometheus.monitoring.svc.cluster.local:9090/api/v1/write
- name: K6_PROMETHEUS_RW_TREND_AS_NATIVE_HISTOGRAM
value: "true"
EOF
# Run the load test
kubectl apply -f load-test-job.yaml
kubectl logs -f job/load-test-1 -n dev
# 11.3.4 Automated performance test pipeline
# Distinct delimiter: the embedded k6 heredocs use EOF and would otherwise end this one early
cat > ~/devops-infrastructure/jenkins/performance-test-pipeline.groovy << 'PIPELINE_EOF'
pipeline {
agent {
kubernetes {
yaml """
apiVersion: v1
kind: Pod
spec:
containers:
- name: kubectl
image: bitnami/kubectl:latest
command:
- cat
tty: true
- name: k6
image: grafana/k6:latest
command:
- cat
tty: true
"""
}
}
parameters {
choice(
name: 'TEST_TYPE',
choices: ['load-test', 'stress-test', 'spike-test'],
description: 'Type of performance test to run'
)
string(
name: 'TARGET_URL',
defaultValue: 'https://app-staging.yourdomain.com',
description: 'Target URL for testing'
)
string(
name: 'DURATION',
defaultValue: '5m',
description: 'Test duration'
)
}
stages {
stage('Deploy Test Config') {
steps {
container('kubectl') {
sh '''
cat > k6-test-config.yaml << EOF
apiVersion: v1
kind: ConfigMap
metadata:
name: k6-test-config-${BUILD_NUMBER}
namespace: dev
data:
test.js: |
import http from 'k6/http';
import { check, sleep } from 'k6';
export let options = {
duration: '${DURATION}',
vus: 50,
thresholds: {
http_req_duration: ['p(95)<1000'],
http_req_failed: ['rate<0.05'],
},
};
export default function() {
let response = http.get('${TARGET_URL}/health');
check(response, {
'status is 200': (r) => r.status === 200,
});
sleep(1);
}
EOF
kubectl apply -f k6-test-config.yaml
'''
}
}
}
stage('Run Performance Test') {
steps {
container('kubectl') {
sh '''
cat > k6-job.yaml << EOF
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
name: perf-test-${BUILD_NUMBER}
namespace: dev
spec:
parallelism: 2
script:
configMap:
name: k6-test-config-${BUILD_NUMBER}
file: test.js
separate: true
EOF
kubectl apply -f k6-job.yaml
# Wait for test completion
kubectl wait --for=condition=complete job/perf-test-${BUILD_NUMBER}-1 -n dev --timeout=600s
# Get test results
kubectl logs job/perf-test-${BUILD_NUMBER}-1 -n dev
'''
}
}
}
stage('Analyze Results') {
steps {
container('kubectl') {
sh '''
# Extract test metrics and validate against thresholds
TEST_RESULTS=$(kubectl logs job/perf-test-${BUILD_NUMBER}-1 -n dev | grep -E "(http_req_duration|http_req_failed)")
echo "Test Results: $TEST_RESULTS"
# Check if test passed thresholds
if kubectl logs job/perf-test-${BUILD_NUMBER}-1 -n dev | grep -q "✓"; then
echo "Performance test PASSED"
else
echo "Performance test FAILED"
exit 1
fi
'''
}
}
}
}
post {
always {
container('kubectl') {
sh '''
# Cleanup test resources
kubectl delete configmap k6-test-config-${BUILD_NUMBER} -n dev || true
kubectl delete k6 perf-test-${BUILD_NUMBER} -n dev || true
'''
}
}
success {
slackSend(
channel: '#performance',
color: 'good',
message: "✅ Performance test passed for ${params.TARGET_URL}"
)
}
failure {
slackSend(
channel: '#performance',
color: 'danger',
message: "❌ Performance test failed for ${params.TARGET_URL}"
)
}
}
}
PIPELINE_EOF
📊 11.4 Cost Optimization Scripts#
# 11.4.1 Resource rightsizing script
cat > ~/devops-infrastructure/scripts/cost-optimization.sh << 'EOF'
#!/bin/bash
# Cost Optimization Analysis Script
set -e
echo "💰 AWS Cost Optimization Analysis"
echo "=================================="
# 1. Unused EBS volumes
echo "🔍 Checking for unused EBS volumes..."
aws ec2 describe-volumes \
--filters Name=status,Values=available \
--query 'Volumes[*].[VolumeId,Size,VolumeType,CreateTime]' \
--output table
# 2. Unattached Elastic IPs
echo "🔍 Checking for unattached Elastic IPs..."
aws ec2 describe-addresses \
--query 'Addresses[?AssociationId==null].[PublicIp,AllocationId]' \
--output table
# 3. Old snapshots (older than 30 days)
echo "🔍 Checking for old snapshots..."
CUTOFF_DATE=$(date -d '30 days ago' --iso-8601)
aws ec2 describe-snapshots \
--owner-ids self \
--query "Snapshots[?StartTime<='$CUTOFF_DATE'].[SnapshotId,StartTime,VolumeSize]" \
--output table
# 4. Right-sizing recommendations
echo "🔍 Generating right-sizing recommendations..."
aws ce get-rightsizing-recommendation \
--service "EC2-Instance" \
--query 'RightsizingRecommendations[*].[CurrentInstance.InstanceName,CurrentInstance.InstanceType,RightsizingType,TargetInstances[0].EstimatedMonthlySavings.Amount]' \
--output table
# 5. Reserved Instance recommendations
echo "🔍 Checking Reserved Instance opportunities..."
aws ce get-reservation-purchase-recommendation \
--service "EC2-Instance" \
--query 'Recommendations[*].[InstanceDetails.EC2InstanceDetails.InstanceType,InstanceDetails.EC2InstanceDetails.Region,RecommendationDetails.EstimatedMonthlySavingsAmount]' \
--output table
echo "💡 Cost Optimization Recommendations:"
echo "1. Delete unused EBS volumes"
echo "2. Release unattached Elastic IPs"
echo "3. Delete old snapshots"
echo "4. Implement right-sizing recommendations"
echo "5. Consider Reserved Instances for stable workloads"
EOF
chmod +x ~/devops-infrastructure/scripts/cost-optimization.sh
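Before running it, note that the aws ce commands require Cost Explorer to be enabled on the account (plus ce:Get* permissions on the calling identity), and recommendations can lag roughly a day behind actual usage:
~/devops-infrastructure/scripts/cost-optimization.sh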
# 11.4.2 Spot instance integration
cat > spot-instances.yaml << 'EOF'
# Karpenter for spot instances
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: spot-provisioner
spec:
# Requirements that constrain which nodes will be created
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: kubernetes.io/arch
operator: In
values: ["amd64"]
- key: node.kubernetes.io/instance-type
operator: In
values: ["t3.medium", "t3.large", "m5.large", "m5.xlarge"]
# Provisioned nodes will have these taints
taints:
- key: spot
value: "true"
effect: NoSchedule
# Resource limits constrain the total size of the cluster
limits:
resources:
cpu: 1000
memory: 1000Gi
# Deprovisioning configuration
ttlSecondsAfterEmpty: 30
# Provider-specific configuration
providerRef:
name: spot-nodepool
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodePool
metadata:
name: spot-nodepool
spec:
amiFamily: AL2
subnetSelector:
karpenter.sh/discovery: "mycompany-dev-eks"
securityGroupSelector:
karpenter.sh/discovery: "mycompany-dev-eks"
instanceProfile: "KarpenterNodeInstanceProfile"
# Spot instance configuration
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["spot"]
- key: node.kubernetes.io/instance-type
operator: In
values: ["t3.medium", "t3.large", "m5.large"]
userData: |
#!/bin/bash
/etc/eks/bootstrap.sh mycompany-dev-eks
echo "spot=true" >> /etc/kubernetes/kubelet/kubelet-config.json
EOF
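Because the provisioner taints spot nodes with spot=true:NoSchedule, only workloads that explicitly tolerate the taint will land on them. A minimal sketch (the batch-worker Deployment is a hypothetical example; karpenter.sh/capacity-type is the node label Karpenter sets):
cat > spot-workload-example.yaml << 'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: batch-worker  # hypothetical example workload
  namespace: dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: batch-worker
  template:
    metadata:
      labels:
        app: batch-worker
    spec:
      # Opt in to the tainted spot nodes
      tolerations:
        - key: spot
          operator: Equal
          value: "true"
          effect: NoSchedule
      # Pin to spot capacity provisioned by Karpenter
      nodeSelector:
        karpenter.sh/capacity-type: spot
      containers:
        - name: worker
          image: busybox:1.36
          command: ["sh", "-c", "sleep 3600"]
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
EOF
kubectl apply -f spot-workload-example.yaml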
# 11.4.3 Resource quotas and limits
cat > resource-quotas.yaml << 'EOF'
# Development namespace quotas
apiVersion: v1
kind: ResourceQuota
metadata:
name: dev-quota
namespace: dev
spec:
hard:
requests.cpu: "4"
requests.memory: 8Gi
limits.cpu: "8"
limits.memory: 16Gi
persistentvolumeclaims: "10"
pods: "20"
services: "10"
secrets: "20"
configmaps: "20"
---
# Staging namespace quotas
apiVersion: v1
kind: ResourceQuota
metadata:
name: staging-quota
namespace: staging
spec:
hard:
requests.cpu: "8"
requests.memory: 16Gi
limits.cpu: "16"
limits.memory: 32Gi
persistentvolumeclaims: "15"
pods: "30"
services: "15"
---
# Production namespace quotas
apiVersion: v1
kind: ResourceQuota
metadata:
name: production-quota
namespace: production
spec:
hard:
requests.cpu: "20"
requests.memory: 40Gi
limits.cpu: "40"
limits.memory: 80Gi
persistentvolumeclaims: "25"
pods: "50"
services: "25"
---
# Limit ranges for all namespaces
apiVersion: v1
kind: LimitRange
metadata:
name: default-limits
namespace: dev
spec:
limits:
- default:
cpu: "200m"
memory: "256Mi"
defaultRequest:
cpu: "100m"
memory: "128Mi"
type: Container
- max:
cpu: "2"
memory: "4Gi"
min:
cpu: "50m"
memory: "64Mi"
type: Container
EOF
kubectl apply -f resource-quotas.yaml
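Quota accounting is easiest to sanity-check right after applying:
kubectl describe quota dev-quota -n dev
kubectl describe limitrange default-limits -n dev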
📚 PHASE 11: DOCUMENTATION & TEAM PROCESSES (Day 25-26)#
📖 12.1 Comprehensive Documentation#
# 12.1.1 Architecture documentation
cat > ~/devops-infrastructure/docs/architecture-overview.md << 'EOF'
# DevOps Infrastructure Architecture
## Overview
This document details the architecture of our company's Kubernetes-based DevOps infrastructure.
## High-Level Architecture
```mermaid
graph TB
Developer[Developer] --> GitHub[GitHub Repository]
GitHub --> Jenkins[Jenkins CI/CD]
Jenkins --> Registry[GitHub Container Registry]
Jenkins --> ArgoCD[ArgoCD GitOps]
ArgoCD --> EKS[Amazon EKS]
EKS --> Apps[Applications]
subgraph "AWS Infrastructure"
VPC[VPC]
EKS --> VPC
RDS[RDS PostgreSQL]
ElastiCache[ElastiCache Redis]
S3[S3 Buckets]
ALB[Application Load Balancer]
end
subgraph "Monitoring Stack"
Prometheus[Prometheus]
Grafana[Grafana]
AlertManager[AlertManager]
Jaeger[Jaeger Tracing]
end
subgraph "Logging Stack"
FluentBit[Fluent Bit]
OpenSearch[OpenSearch]
OpenSearchDashboards[OpenSearch Dashboards]
end
subgraph "Security"
Vault[HashiCorp Vault]
Falco[Falco Runtime Security]
OPA[OPA Gatekeeper]
end
Apps --> Monitoring Stack
Apps --> Logging Stack
Apps --> Security
## Component Details

### Infrastructure Layer

#### AWS Services
- **VPC**: Multi-AZ setup with public/private subnets
- **EKS**: Managed Kubernetes cluster (v1.28)
- **RDS**: PostgreSQL with Multi-AZ and read replicas
- **ElastiCache**: Redis for caching and session storage
- **ALB**: Application Load Balancer with SSL termination
- **S3**: Object storage for backups, logs, and artifacts

#### Kubernetes Components
- **Namespaces**: dev, staging, production, monitoring, logging, security
- **RBAC**: Role-based access control for different teams
- **Network Policies**: Micro-segmentation with Calico
- **Pod Security Standards**: Enforced security contexts
- **Storage Classes**: GP3, IO1 for different performance needs

### Application Layer

#### Deployment Strategy
- **GitOps**: ArgoCD-based continuous deployment
- **Progressive Delivery**: Canary and Blue-Green deployments
- **Auto-scaling**: HPA, VPA, and KEDA for event-driven scaling
- **Service Mesh**: Istio for traffic management (optional)

#### Security
- **Secrets Management**: HashiCorp Vault with External Secrets Operator
- **Runtime Security**: Falco for threat detection
- **Policy Enforcement**: OPA Gatekeeper for admission control
- **Image Security**: Trivy scanning in CI/CD pipeline

### Observability

#### Monitoring
- **Metrics**: Prometheus with custom and pre-built dashboards
- **Visualization**: Grafana with role-based dashboards
- **Alerting**: AlertManager with Slack/PagerDuty integration
- **Distributed Tracing**: Jaeger for request tracing

#### Logging
- **Collection**: Fluent Bit daemonset
- **Storage**: OpenSearch cluster
- **Analysis**: OpenSearch Dashboards
- **Retention**: 30-day retention with automated cleanup

### Security Architecture

#### Access Control
- **AWS IAM**: Service accounts with IRSA
- **Kubernetes RBAC**: Namespace-level permissions
- **Vault**: Centralized secrets management
- **Network Policies**: Pod-to-pod communication rules

#### Security Scanning
- **Container Images**: Trivy in CI/CD
- **Infrastructure**: Checkov for Terraform
- **Runtime**: Falco for anomaly detection
- **Policy**: OPA for compliance enforcement

### Disaster Recovery

#### Backup Strategy
- **Kubernetes**: Velero daily/weekly backups
- **Database**: RDS automated backups + manual snapshots
- **Storage**: EBS snapshots
- **Cross-region**: S3 replication for critical data

#### Recovery Objectives
- **RTO**: 4 hours for complete infrastructure
- **RPO**: 1 hour for data loss
- **Testing**: Monthly DR drills

### Cost Optimization

#### Strategies
- **Resource Right-sizing**: VPA recommendations
- **Spot Instances**: Karpenter for non-critical workloads
- **Storage Optimization**: GP3 for better price/performance
- **Reserved Instances**: For predictable workloads

#### Monitoring
- **Kubecost**: Kubernetes cost visibility
- **AWS Cost Explorer**: Infrastructure cost analysis
- **Automated Cleanup**: Unused resources identification

### Performance Optimization

#### Auto-scaling
- **HPA**: CPU/Memory-based pod scaling
- **VPA**: Resource recommendation and adjustment
- **KEDA**: Event-driven scaling (queue length, metrics)
- **Cluster Autoscaler**: Node-level scaling

#### Load Testing
- **K6**: Automated performance testing
- **Chaos Engineering**: Failure injection testing
- **SLI/SLO**: Service level monitoring

### Operational Procedures

#### Deployment Process
1. Developer pushes code to GitHub
2. Jenkins builds and tests application
3. Jenkins pushes image to GHCR
4. Jenkins updates GitOps repository
5. ArgoCD syncs changes to Kubernetes
6. Progressive delivery monitors health

#### Incident Response
1. **Detection**: Automated alerting via AlertManager
2. **Notification**: Slack/PagerDuty escalation
3. **Response**: Runbook-driven remediation
4. **Recovery**: Automated rollback if needed
5. **Post-mortem**: Root cause analysis

### Team Responsibilities

#### DevOps Team
- Infrastructure maintenance
- CI/CD pipeline management
- Security compliance
- Performance optimization

#### Development Teams
- Application deployment
- Resource requirements definition
- Application monitoring setup
- Performance testing

#### Operations Team
- Incident response
- Backup verification
- Capacity planning
- Change management
EOF
# 12.1.2 Operational runbooks
cat > ~/devops-infrastructure/docs/operational-runbooks.md << 'EOF'
# Operational Runbooks

## Incident Response Procedures

### High CPU Usage Alert

#### Symptoms
- AlertManager fires "High CPU Usage" alert
- Application response times increase
- Users report slowness

#### Investigation Steps
```bash
# 1. Check current CPU usage
kubectl top pods -n <namespace> --sort-by=cpu
# 2. Check HPA status
kubectl get hpa -n <namespace>
# 3. Check pod resource limits
kubectl describe pod <pod-name> -n <namespace>
# 4. Review metrics in Grafana
# Go to CPU Usage dashboard: https://grafana.yourdomain.com/d/cpu-usage
```

#### Resolution Steps
```bash
# 1. Immediate: Scale up manually if HPA not working
kubectl scale deployment <deployment-name> --replicas=<new-count> -n <namespace>
# 2. Raise resource limits if they are too tight
kubectl patch deployment <deployment-name> -n <namespace> --patch '
{
  "spec": {
    "template": {
      "spec": {
        "containers": [
          {
            "name": "<container-name>",
            "resources": {
              "limits": {
                "cpu": "1000m",
                "memory": "1Gi"
              }
            }
          }
        ]
      }
    }
  }
}'
# 3. Restart problematic pods
kubectl rollout restart deployment <deployment-name> -n <namespace>
```

#### Prevention
- Implement proper resource requests/limits
- Set up HPA with appropriate thresholds
- Regular load testing

### Database Connection Issues

#### Symptoms
- Applications cannot connect to database
- Connection timeout errors
- Database-related alerts

#### Investigation Steps
```bash
# 1. Check database connectivity from a pod
kubectl run db-test --rm -i --tty --image=postgres:15-alpine -- \
  psql -h <db-host> -U <username> -d <database> -c "SELECT 1;"
# 2. Check database secret
kubectl get secret database-secret -n <namespace> -o yaml
# 3. Check RDS status
aws rds describe-db-instances --db-instance-identifier <db-identifier>
# 4. Check security groups
aws ec2 describe-security-groups --group-ids <sg-id>
```

#### Resolution Steps
```bash
# 1. Restart application pods
kubectl rollout restart deployment -n <namespace>
# 2. Check and update database credentials
kubectl patch secret database-secret -n <namespace> --patch '
{
  "data": {
    "password": "<base64-encoded-new-password>"
  }
}'
# 3. If RDS issue, check AWS console and restart if needed
aws rds reboot-db-instance --db-instance-identifier <db-identifier>
```

### Pod Stuck in Pending State

#### Investigation Steps
```bash
# 1. Describe the pod
kubectl describe pod <pod-name> -n <namespace>
# 2. Check node resources
kubectl describe nodes
# 3. Check PVC status if using persistent storage
kubectl get pvc -n <namespace>
# 4. Check for resource quotas
kubectl describe quota -n <namespace>
```

#### Resolution Steps
```bash
# 1. If insufficient resources, scale cluster
aws eks update-nodegroup-config \
  --cluster-name <cluster-name> \
  --nodegroup-name <nodegroup-name> \
  --scaling-config minSize=<min>,maxSize=<max>,desiredSize=<desired>
# 2. If PVC issue, check storage class
kubectl get storageclass
# 3. If quota exceeded, increase or clean up resources
kubectl delete deployment <unused-deployment> -n <namespace>
```

## Maintenance Procedures

### Kubernetes Cluster Upgrade

#### Pre-upgrade Checklist
- Backup cluster state with Velero
- Review breaking changes in new version
- Test upgrade in staging environment
- Notify team about maintenance window
- Prepare rollback plan

#### Upgrade Steps
```bash
# 1. Update control plane
aws eks update-cluster-version \
  --name <cluster-name> \
  --version <new-version>
# 2. Wait for update completion
aws eks wait cluster-active --name <cluster-name>
# 3. Update node groups
aws eks update-nodegroup-version \
  --cluster-name <cluster-name> \
  --nodegroup-name <nodegroup-name> \
  --version <new-version>
# 4. Update addons
aws eks update-addon \
  --cluster-name <cluster-name> \
  --addon-name vpc-cni \
  --addon-version <new-version>
# 5. Verify cluster health
kubectl get nodes
kubectl get pods --all-namespaces
```

### Database Maintenance

#### Monthly Tasks
```bash
# 1. Review database performance
aws rds describe-db-instances \
  --db-instance-identifier <db-identifier> \
  --query 'DBInstances[0].PerformanceInsights'
# 2. Cleanup old snapshots (keeps the 30 most recent manual snapshots)
aws rds describe-db-snapshots \
  --db-instance-identifier <db-identifier> \
  --snapshot-type manual \
  --query 'DBSnapshots[30:].[DBSnapshotIdentifier]' \
  --output text | \
  xargs -I {} aws rds delete-db-snapshot --db-snapshot-identifier {}
# 3. Analyze slow queries
# Access RDS Performance Insights dashboard
```

### Certificate Renewal

#### Let's Encrypt Certificates
```bash
# 1. Check certificate expiry
kubectl get certificates -A
# 2. Force renewal if needed
kubectl annotate certificate <cert-name> -n <namespace> \
  cert-manager.io/issue-temporary-certificate="true"
# 3. Verify renewal
kubectl describe certificate <cert-name> -n <namespace>
```

## Monitoring and Alerting

### Key Metrics to Monitor

#### Infrastructure
- Node CPU/Memory usage > 80%
- Disk usage > 85%
- Network connectivity issues
- Pod restart frequency

#### Application
- Response time > 2s (95th percentile)
- Error rate > 5%
- Request rate anomalies
- Database connection pool exhaustion

#### Security
- Failed authentication attempts
- Privilege escalation attempts
- Unusual network traffic
- Policy violations

### Alert Escalation

#### Severity Levels
- **P1 (Critical)**: Immediate response (5 min)
  - Production down
  - Data breach
  - Security incident
- **P2 (High)**: 30 min response
  - Performance degradation
  - Service partially down
  - High error rates
- **P3 (Medium)**: 2 hour response
  - Non-critical service issues
  - Capacity warnings
  - Configuration issues
- **P4 (Low)**: Next business day
  - Informational alerts
  - Optimization opportunities
  - Compliance warnings

## Change Management

### Deployment Approval Process

#### Development Environment
- Automatic deployment on merge to `develop` branch
- No approval required
- Immediate rollback available

#### Staging Environment
- Automatic deployment on merge to `main` branch
- Automated testing required
- Manual approval for production promotion

#### Production Environment
- Manual approval required
- Deployment during maintenance window
- Canary deployment strategy
- Automated rollback on failure

### Emergency Change Process
1. Incident commander approval
2. Minimal viable fix
3. Fast-track testing
4. Immediate deployment
5. Post-incident review
EOF
# 12.1.3 Team onboarding guide
# Distinct delimiter: the embedded kubectl heredocs use EOF and would otherwise end this one early
cat > ~/devops-infrastructure/docs/team-onboarding.md << 'ONBOARD_EOF'
# Team Onboarding Guide

## Prerequisites

### Required Tools
- **kubectl** - Kubernetes CLI
- **helm** - Kubernetes package manager
- **terraform** - Infrastructure as Code
- **docker** - Container runtime
- **aws-cli** - AWS command line interface
- **argocd** - GitOps CLI
- **git** - Version control

### Installation Script
```bash
# Run the automated setup script
curl -fsSL https://raw.githubusercontent.com/yourusername/devops-infrastructure/main/scripts/setup-dev-environment.sh | bash
```

## Access Setup

### 1. AWS Access
```bash
# Configure AWS CLI
aws configure
# Use provided access key and secret key
# Test access
aws sts get-caller-identity
```

### 2. Kubernetes Access
```bash
# Configure kubectl
aws eks update-kubeconfig --region eu-west-1 --name mycompany-dev-eks
# Test cluster access
kubectl get nodes
```

### 3. ArgoCD Access
```bash
# Login via CLI
argocd login argocd.yourdomain.com --username admin
# Initial admin password (ask the DevOps team if it has been rotated)
kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d
```

### 4. Vault Access
```bash
# Set Vault address
export VAULT_ADDR="https://vault.yourdomain.com"
# Login with your user credentials
vault login -method=userpass username=<your-username>
```

## Development Workflow

### 1. Application Development
```bash
# 1. Clone application repository
git clone https://github.com/yourusername/sample-app.git
cd sample-app
# 2. Create feature branch
git checkout -b feature/new-feature
# 3. Make changes and test locally
docker build -t sample-app:local .
docker run -p 8080:8080 sample-app:local
# 4. Commit and push
git add .
git commit -m "feat: add new feature"
git push origin feature/new-feature
# 5. Create pull request
# Pipeline will automatically build and deploy to dev environment
```

### 2. Infrastructure Changes
```bash
# 1. Clone infrastructure repository
git clone https://github.com/yourusername/devops-infrastructure.git
cd devops-infrastructure
# 2. Make changes to Terraform
cd terraform/environments/dev
terraform plan
# 3. Apply changes
terraform apply
# 4. Update GitOps repository if needed
cd ../../..
git clone https://github.com/yourusername/gitops-config.git
# Make necessary Kubernetes manifest changes
```

## Common Tasks

### Deploy New Application

#### 1. Create Kubernetes Manifests
```yaml
# applications/dev/new-app.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: new-app
  namespace: dev
spec:
  replicas: 2
  selector:
    matchLabels:
      app: new-app
  template:
    metadata:
      labels:
        app: new-app
    spec:
      containers:
        - name: app
          image: ghcr.io/yourusername/new-app:v1.0.0
          ports:
            - containerPort: 8080
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "256Mi"
              cpu: "200m"
```

#### 2. Create Service and Ingress
```yaml
---
apiVersion: v1
kind: Service
metadata:
  name: new-app
  namespace: dev
spec:
  selector:
    app: new-app
  ports:
    - port: 80
      targetPort: 8080
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: new-app
  namespace: dev
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: new-app-dev.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: new-app
                port:
                  number: 80
```

### Debug Application Issues

#### 1. Check Pod Status
```bash
# List pods
kubectl get pods -n dev
# Describe problematic pod
kubectl describe pod <pod-name> -n dev
# Check logs
kubectl logs <pod-name> -n dev --tail=100
```

#### 2. Access Pod for Debugging
```bash
# Execute commands in pod
kubectl exec -it <pod-name> -n dev -- /bin/bash
# Port forward for local access
kubectl port-forward <pod-name> 8080:8080 -n dev
```

#### 3. Check Resource Usage
```bash
# Pod and node resource consumption (requires metrics-server)
kubectl top pods -n dev
kubectl top nodes
```

### Scale Applications

#### Manual Scaling
```bash
# Scale deployment
kubectl scale deployment <app-name> --replicas=5 -n dev
# Check scaling status
kubectl get deployment <app-name> -n dev
```

#### Configure Auto-scaling
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
  namespace: dev
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app-name
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

## Monitoring and Troubleshooting

### Access Monitoring Tools

#### Grafana Dashboards
- URL: https://grafana.yourdomain.com
- Default Dashboards:
  - Kubernetes Cluster Overview
  - Application Performance
  - Infrastructure Metrics
  - Cost Analysis

#### Log Analysis
- URL: https://logs.yourdomain.com
- Common Queries:

#### Distributed Tracing
- URL: https://jaeger.yourdomain.com
- Usage: Search by service name, operation, or trace ID

### Performance Testing

#### Run Load Test
```bash
# Apply load test configuration
kubectl apply -f - <<EOF
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: load-test
  namespace: dev
spec:
  parallelism: 2
  script:
    configMap:
      name: k6-scripts
      file: load-test.js
EOF
# Monitor test progress
kubectl logs -f job/load-test-1 -n dev
```

## Security Best Practices

### Container Security
- Use minimal base images (distroless, alpine)
- Run as non-root user
- Set resource limits
- Scan images for vulnerabilities

### Kubernetes Security
- Use namespaces for isolation
- Implement RBAC properly
- Set Pod Security Standards
- Use Network Policies

### Secrets Management
```bash
# Create secret in Vault
vault kv put secret/dev/app-config \
  database_password="super-secret" \
  api_key="api-key-value"
# Create ExternalSecret to sync
kubectl apply -f - <<EOF
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-config
  namespace: dev
spec:
  refreshInterval: 1m
  secretStoreRef:
    name: vault-backend
    kind: SecretStore
  target:
    name: app-config-secret
  data:
    - secretKey: database_password
      remoteRef:
        key: secret/dev/app-config
        property: database_password
EOF
```

## Getting Help

### Internal Resources
- DevOps Team Slack: #devops-team
- Documentation: https://docs.company.com/devops
- Runbooks: ~/devops-infrastructure/docs/
- Architecture Diagrams: ~/devops-infrastructure/docs/architecture/

### Emergency Contacts
- On-call Engineer: +90-XXX-XXX-XXXX
- DevOps Team Lead: +90-XXX-XXX-XXXX
- Security Team: security@company.com

### External Resources
- Kubernetes Documentation: https://kubernetes.io/docs/
- AWS EKS Guide: https://docs.aws.amazon.com/eks/
- ArgoCD Documentation: https://argo-cd.readthedocs.io/
- Prometheus Documentation: https://prometheus.io/docs/
ONBOARD_EOF
### 📊 **12.2 Automated Reporting**
```bash
# 12.2.1 Infrastructure health report script
cat > ~/devops-infrastructure/scripts/health-report.sh << 'EOF'
#!/bin/bash
# Infrastructure Health Report Generator
set -e
REPORT_DATE=$(date +"%Y-%m-%d")
REPORT_FILE="/tmp/infrastructure-health-report-$REPORT_DATE.md"
# Distinct delimiter so this nested heredoc does not terminate the outer one
cat > $REPORT_FILE << REPORT_EOF
# Infrastructure Health Report - $REPORT_DATE
## Executive Summary
Generated at: $(date)
Report Period: Last 24 hours
## Cluster Health
### Node Status
\`\`\`
$(kubectl get nodes -o wide)
\`\`\`
### Resource Utilization
\`\`\`
$(kubectl top nodes)
\`\`\`
### Pod Status Summary
\`\`\`
$(kubectl get pods --all-namespaces | grep -E "(Running|Pending|Failed|Error)" | awk '{print $4}' | sort | uniq -c)
\`\`\`
## Application Health
### Deployment Status
\`\`\`
$(kubectl get deployments --all-namespaces)
\`\`\`
### Failed Pods (if any)
\`\`\`
$(kubectl get pods --all-namespaces --field-selector=status.phase=Failed)
\`\`\`
### HPA Status
\`\`\`
$(kubectl get hpa --all-namespaces)
\`\`\`
## Security Status
### Pod Security Policy Violations
\`\`\`
$(kubectl get events --all-namespaces | grep -i "security\|policy" | head -10)
\`\`\`
### Certificate Status
\`\`\`
$(kubectl get certificates --all-namespaces)
\`\`\`
## Cost Summary
### Resource Requests vs Limits
\`\`\`
$(kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.status.phase=="Running") | "\(.metadata.namespace)/\(.metadata.name): CPU Req: \(.spec.containers[0].resources.requests.cpu // "none"), Mem Req: \(.spec.containers[0].resources.requests.memory // "none")"')
\`\`\`
## Backup Status
### Velero Backup Status
\`\`\`
$(velero backup get | head -10)
\`\`\`
### Latest Backup Results
\`\`\`
$(velero backup describe $(velero backup get -o name | head -1 | cut -d'/' -f2) | grep -E "(Status|Started|Completed)")
\`\`\`
## Alerts Summary
### Active Alerts (Last 24h)
\`\`\`
$(curl -s "http://kube-prometheus-stack-alertmanager.monitoring.svc.cluster.local:9093/api/v1/alerts" | jq -r '.data[] | select(.status.state=="firing") | "\(.labels.alertname): \(.labels.severity)"' | sort | uniq -c)
\`\`\`
## Performance Metrics
### Top Resource Consuming Pods
\`\`\`
$(kubectl top pods --all-namespaces --sort-by=cpu | head -10)
\`\`\`
## Recommendations
REPORT_EOF
# Add recommendations based on findings
echo "### Current Issues" >> $REPORT_FILE
# Check for pods without resource limits
NO_LIMITS=$(kubectl get pods --all-namespaces -o json | jq -r '.items[] | select(.status.phase=="Running") | select(.spec.containers[0].resources.limits == null) | "\(.metadata.namespace)/\(.metadata.name)"' | wc -l)
if [ $NO_LIMITS -gt 0 ]; then
  echo "- $NO_LIMITS pods running without resource limits" >> $REPORT_FILE
fi
# Check for high CPU usage
HIGH_CPU_NODES=$(kubectl top nodes --no-headers | awk '$3 > 80 {count++} END {print count+0}')
if [ $HIGH_CPU_NODES -gt 0 ]; then
  echo "- $HIGH_CPU_NODES nodes with high CPU usage (>80%)" >> $REPORT_FILE
fi
# Check for failed pods
FAILED_PODS=$(kubectl get pods --all-namespaces --field-selector=status.phase=Failed --no-headers | wc -l)
if [ $FAILED_PODS -gt 0 ]; then
  echo "- $FAILED_PODS failed pods need investigation" >> $REPORT_FILE
fi
echo "" >> $REPORT_FILE
echo "### Optimization Opportunities" >> $REPORT_FILE
echo "- Review VPA recommendations for resource optimization" >> $REPORT_FILE
echo "- Consider implementing HPA for variable workloads" >> $REPORT_FILE
echo "- Evaluate spot instance usage for cost savings" >> $REPORT_FILE
echo "Report generated: $REPORT_FILE"
# Send to Slack if webhook configured
if [ ! -z "$SLACK_WEBHOOK_URL" ]; then
  curl -X POST -H 'Content-type: application/json' \
    --data "{\"text\":\"📊 Daily Infrastructure Health Report generated for $REPORT_DATE\"}" \
    $SLACK_WEBHOOK_URL
fi
EOF
chmod +x ~/devops-infrastructure/scripts/health-report.sh
# 12.2.2 Automated health report CronJob
cat > health-report-cronjob.yaml << 'EOF'
apiVersion: batch/v1
kind: CronJob
metadata:
  name: infrastructure-health-report
  namespace: monitoring
spec:
  schedule: "0 8 * * *" # Daily at 8 AM
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: health-reporter
          containers:
            - name: reporter
              image: bitnami/kubectl:latest
              command:
                - /bin/bash
                - -c
                - |
                  # Install required tools
                  apt-get update && apt-get install -y curl jq
                  # Generate report
                  /scripts/health-report.sh
                  # Upload to S3 if configured
                  if [ ! -z "$S3_BUCKET" ]; then
                    aws s3 cp /tmp/infrastructure-health-report-*.md s3://$S3_BUCKET/reports/
                  fi
              env:
                - name: S3_BUCKET
                  value: "mycompany-reports"
                - name: SLACK_WEBHOOK_URL
                  valueFrom:
                    secretKeyRef:
                      name: slack-webhook
                      key: url
              volumeMounts:
                - name: scripts
                  mountPath: /scripts
              resources:
                requests:
                  memory: "128Mi"
                  cpu: "100m"
                limits:
                  memory: "256Mi"
                  cpu: "200m"
          volumes:
            - name: scripts
              configMap:
                name: health-report-scripts
                defaultMode: 0755
          restartPolicy: OnFailure
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: health-reporter
  namespace: monitoring
  annotations:
    eks.amazonaws.com/role-arn: arn:aws:iam::ACCOUNT_ID:role/health-reporter-role
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: health-reporter
rules:
  - apiGroups: [""]
    resources: ["nodes", "pods", "services", "events"]
    verbs: ["get", "list"]
  - apiGroups: ["apps"]
    resources: ["deployments", "replicasets"]
    verbs: ["get", "list"]
  - apiGroups: ["autoscaling"]
    resources: ["horizontalpodautoscalers"]
    verbs: ["get", "list"]
  - apiGroups: ["metrics.k8s.io"]
    resources: ["nodes", "pods"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: health-reporter
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: health-reporter
subjects:
  - kind: ServiceAccount
    name: health-reporter
    namespace: monitoring
EOF
# ConfigMap for scripts
kubectl create configmap health-report-scripts \
  --from-file=health-report.sh=~/devops-infrastructure/scripts/health-report.sh \
  -n monitoring
kubectl apply -f health-report-cronjob.yaml
```
🎓 12.3 Training and Knowledge Transfer#
# 12.3.1 Training curriculum
# Distinct delimiter: the embedded kubectl heredocs use EOF and would otherwise end this one early
cat > ~/devops-infrastructure/docs/training-curriculum.md << 'CURRICULUM_EOF'
# DevOps Team Training Curriculum
## Week 1: Fundamentals
### Day 1-2: Kubernetes Basics
- **Topics**: Pods, Services, Deployments, ConfigMaps, Secrets
- **Hands-on**: Deploy sample application
- **Assessment**: Create multi-tier application deployment
### Day 3-4: Infrastructure as Code
- **Topics**: Terraform basics, AWS resources, State management
- **Hands-on**: Create VPC and EKS cluster
- **Assessment**: Deploy complete infrastructure
### Day 5: CI/CD Fundamentals
- **Topics**: Jenkins, Pipeline as Code, Docker
- **Hands-on**: Create build pipeline
- **Assessment**: End-to-end deployment pipeline
## Week 2: Advanced Topics
### Day 1-2: GitOps and Progressive Delivery
- **Topics**: ArgoCD, Argo Rollouts, Canary deployments
- **Hands-on**: Setup GitOps workflow
- **Assessment**: Implement progressive delivery
### Day 3: Monitoring and Observability
- **Topics**: Prometheus, Grafana, Jaeger, Log analysis
- **Hands-on**: Create custom dashboards
- **Assessment**: End-to-end observability setup
### Day 4: Security Best Practices
- **Topics**: Vault, RBAC, Network Policies, Image scanning
- **Hands-on**: Implement security controls
- **Assessment**: Security audit and remediation
### Day 5: Troubleshooting and Operations
- **Topics**: Debugging techniques, Performance tuning, Incident response
- **Hands-on**: Simulate and resolve incidents
- **Assessment**: Handle real-world scenarios
## Ongoing Learning
### Monthly Topics
- **Month 1**: Cost optimization and resource management
- **Month 2**: Advanced networking and service mesh
- **Month 3**: Disaster recovery and backup strategies
- **Month 4**: Chaos engineering and reliability
- **Month 5**: Multi-cluster and multi-cloud strategies
- **Month 6**: Advanced security and compliance
### Certification Paths
1. **AWS Certified DevOps Engineer**
2. **Certified Kubernetes Administrator (CKA)**
3. **Certified Kubernetes Security Specialist (CKS)**
4. **HashiCorp Certified: Terraform Associate**
## Lab Exercises
### Exercise 1: Application Deployment
```bash
# Deploy sample application with monitoring
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
name: training-app
namespace: training
spec:
replicas: 3
selector:
matchLabels:
app: training-app
template:
metadata:
labels:
app: training-app
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
spec:
containers:
- name: app
image: nginx:alpine
ports:
- containerPort: 8080
resources:
requests:
memory: "64Mi"
cpu: "50m"
limits:
memory: "128Mi"
cpu: "100m"
EOF
```

### Exercise 2: Troubleshooting Scenarios
- Pod CrashLoopBackOff
- Service discovery issues
- Resource exhaustion
- Network connectivity problems
- Storage mounting failures

### Exercise 3: Performance Testing
```bash
# Setup load testing
kubectl apply -f - <<EOF
apiVersion: k6.io/v1alpha1
kind: K6
metadata:
  name: training-load-test
  namespace: training
spec:
  parallelism: 2
  script:
    configMap:
      name: training-scripts
      file: basic-test.js
EOF
```

## Knowledge Check Questions

### Kubernetes
- Explain the difference between Deployment and StatefulSet
- How do you troubleshoot a pod stuck in Pending state?
- What are the different types of Services in Kubernetes?
- How do resource requests and limits work?

### Infrastructure
- Explain Terraform state management
- How do you handle secrets in infrastructure code?
- What are the best practices for AWS resource tagging?
- How do you implement blue-green deployments?

### Monitoring
- What are the four golden signals of monitoring?
- How do you set up custom metrics in Prometheus?
- Explain the difference between metrics, logs, and traces
- How do you create effective alerting rules?

### Security
- What is the principle of least privilege?
- How do you implement network segmentation in Kubernetes?
- Explain the role of service accounts and RBAC
- What are the security best practices for container images?

## Practical Assessments

### Assessment 1: Deploy Production-Ready Application
- Set up complete infrastructure with Terraform
- Deploy application with proper security context
- Implement monitoring and alerting
- Set up automated backups
- Document deployment process

### Assessment 2: Incident Response Simulation
- **Scenario**: Database connectivity issues
- **Task**: Diagnose and resolve the problem
- **Evaluation**: Time to resolution, troubleshooting approach
- **Documentation**: Post-incident report

### Assessment 3: Performance Optimization
- **Given**: Application with performance issues
- **Task**: Identify bottlenecks and optimize
- **Tools**: Use monitoring data and profiling
- **Deliverable**: Performance improvement plan

## Resources

### Documentation

### Training Platforms

### Books
- "Kubernetes in Action" by Marko Lukša
- "Terraform: Up & Running" by Yevgeniy Brikman
- "Site Reliability Engineering" by Google
- "The DevOps Handbook" by Gene Kim
CURRICULUM_EOF
# 12.3.2 Knowledge base setup
cat > ~/devops-infrastructure/scripts/setup-knowledge-base.sh << 'EOF'
#!/bin/bash
# Knowledge Base Setup Script
set -e
echo "📚 Setting up team knowledge base..."
# Create knowledge base structure
mkdir -p ~/devops-infrastructure/docs/{architecture,runbooks,tutorials,troubleshooting,best-practices}
# Architecture documentation
echo "Creating architecture documentation..."
cat > ~/devops-infrastructure/docs/architecture/README.md << 'ARCH_EOF'
# Architecture Documentation

## Overview
This directory contains all architecture-related documentation.

## Contents
- `system-overview.md` - High-level system architecture
- `data-flow.md` - Data flow diagrams and explanations
- `security-architecture.md` - Security design and controls
- `networking.md` - Network architecture and routing
- `disaster-recovery.md` - DR architecture and procedures

## Diagrams
All diagrams are created using Mermaid and can be viewed in GitHub or VS Code with the Mermaid extension.
ARCH_EOF
# Runbooks directory
echo "Creating runbooks..."
cat > ~/devops-infrastructure/docs/runbooks/README.md << 'RUN_EOF'
# Operational Runbooks

## Purpose
Step-by-step procedures for common operational tasks and incident response.

## Runbook Categories
- `incident-response/` - Emergency response procedures
- `maintenance/` - Scheduled maintenance procedures
- `deployment/` - Deployment and rollback procedures
- `monitoring/` - Monitoring and alerting procedures

## Runbook Template
Each runbook should include:
1. Purpose and scope
2. Prerequisites
3. Step-by-step procedures
4. Verification steps
5. Rollback procedures
6. Post-completion tasks
RUN_EOF
# Create searchable index
echo "Creating searchable documentation index..."
cat > ~/devops-infrastructure/scripts/generate-docs-index.sh << 'INDEX_EOF'
#!/bin/bash
# Generate searchable documentation index
echo "# Documentation Index" > ~/devops-infrastructure/docs/INDEX.md
echo "Generated on: $(date)" >> ~/devops-infrastructure/docs/INDEX.md
echo "" >> ~/devops-infrastructure/docs/INDEX.md
find ~/devops-infrastructure/docs -name "*.md" -not -name "INDEX.md" | while read file; do
  echo "## $(basename "$file" .md)" >> ~/devops-infrastructure/docs/INDEX.md
  echo "*Path:* $file" >> ~/devops-infrastructure/docs/INDEX.md
  echo "" >> ~/devops-infrastructure/docs/INDEX.md
  # Extract first paragraph as summary
  head -10 "$file" | grep -E "^[A-Za-z]" | head -1 >> ~/devops-infrastructure/docs/INDEX.md
  echo "" >> ~/devops-infrastructure/docs/INDEX.md
done
echo "Documentation index generated!"
INDEX_EOF
chmod +x ~/devops-infrastructure/scripts/generate-docs-index.sh
echo "✅ Knowledge base structure created!"
echo "Run ~/devops-infrastructure/scripts/generate-docs-index.sh to create searchable index"
EOF
chmod +x ~/devops-infrastructure/scripts/setup-knowledge-base.sh
~/devops-infrastructure/scripts/setup-knowledge-base.sh
---
## 🎉 **FINAL SETUP AND VALIDATION** (Day 27-28)
### ✅ **13.1 End-to-End Testing**
```bash
# 13.1.1 Complete system validation script
cat > ~/devops-infrastructure/scripts/system-validation.sh << 'EOF'
#!/bin/bash
# Complete System Validation Script
set -e
echo "🧪 Starting End-to-End System Validation..."
echo "=========================================="
# Colors for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m' # No Color
SUCCESS_COUNT=0
TOTAL_TESTS=0
check_test() {
local test_name="$1"
local test_command="$2"
TOTAL_TESTS=$((TOTAL_TESTS + 1))
echo -n "Testing $test_name... "
if eval "$test_command" &>/dev/null; then
echo -e "${GREEN}✓ PASS${NC}"
SUCCESS_COUNT=$((SUCCESS_COUNT + 1))
return 0
else
echo -e "${RED}✗ FAIL${NC}"
return 1
fi
}
echo "🔧 Infrastructure Tests"
echo "----------------------"
# AWS connectivity
check_test "AWS CLI access" "aws sts get-caller-identity"
# Terraform state
check_test "Terraform state accessible" "terraform show -json > /dev/null" || true
# EKS cluster
check_test "EKS cluster connectivity" "kubectl get nodes"
# Core system pods
check_test "CoreDNS running" "kubectl get pods -n kube-system -l k8s-app=kube-dns | grep Running"
check_test "AWS Load Balancer Controller" "kubectl get pods -n kube-system -l app.kubernetes.io/name=aws-load-balancer-controller | grep Running"
echo ""
echo "📊 Monitoring Stack Tests"
echo "-------------------------"
# Prometheus
check_test "Prometheus accessible" "kubectl get pods -n monitoring -l app.kubernetes.io/name=prometheus | grep Running"
# Grafana
check_test "Grafana accessible" "kubectl get pods -n monitoring -l app.kubernetes.io/name=grafana | grep Running"
# AlertManager
check_test "AlertManager accessible" "kubectl get pods -n monitoring -l app.kubernetes.io/name=alertmanager | grep Running"
echo ""
echo "📝 Logging Stack Tests"
echo "----------------------"
# Fluent Bit
check_test "Fluent Bit running" "kubectl get pods -n logging -l app.kubernetes.io/name=fluent-bit | grep Running"
# OpenSearch
check_test "OpenSearch cluster healthy" "kubectl get pods -n logging -l app=opensearch | grep Running"
echo ""
echo "🔒 Security Tests"
echo "----------------"
# Vault
check_test "Vault cluster running" "kubectl get pods -n vault -l app.kubernetes.io/name=vault | grep Running"
# External Secrets Operator
check_test "External Secrets Operator" "kubectl get pods -n external-secrets | grep Running"
# Falco
check_test "Falco security monitoring" "kubectl get pods -n falco -l app.kubernetes.io/name=falco | grep Running"
echo ""
echo "🔄 GitOps Tests"
echo "---------------"
# ArgoCD
check_test "ArgoCD server running" "kubectl get pods -n argocd -l app.kubernetes.io/name=argocd-server | grep Running"
# ArgoCD applications
check_test "ArgoCD applications synced" "argocd app list | grep -E 'Synced.*Healthy'"
echo ""
echo "💾 Backup Tests"
echo "---------------"
# Velero
check_test "Velero backup controller" "kubectl get pods -n velero -l app.kubernetes.io/name=velero | grep Running"
# Recent backup
check_test "Recent backup exists" "velero backup get | grep Completed | head -1"
echo ""
echo "🚀 Application Tests"
echo "--------------------"
# Sample application
check_test "Sample application running" "kubectl get pods -n dev -l app=sample-app | grep Running" || true
# Ingress connectivity
check_test "Ingress controller responsive" "kubectl get pods -n ingress-nginx -l app.kubernetes.io/name=ingress-nginx | grep Running"
echo ""
echo "📈 Performance Tests"
echo "-------------------"
# HPA
check_test "HPA controllers active" "kubectl get hpa --all-namespaces | grep -v TARGETS" || true
# VPA
check_test "VPA recommendations available" "kubectl get vpa --all-namespaces" || true
# Resource usage
check_test "Node resource usage healthy" "kubectl top nodes --no-headers | awk '\$3+0 < 90 && \$5+0 < 90' | wc -l | grep -v '^0-vpa
namespace: monitoring
spec:
targetRef:
apiVersion: apps/v1
kind: StatefulSet"
echo ""
echo "🌐 Network Tests"
echo "---------------"
# CoreDNS resolution
check_test "DNS resolution working" "kubectl exec -n kube-system deployments/coredns -- nslookup kubernetes.default.svc.cluster.local"
# Pod-to-pod communication
check_test "Inter-pod communication" "kubectl run network-test --image=busybox --rm -it --restart=Never -- nslookup kubernetes.default" || true
echo ""
echo "🔐 Certificate Tests"
echo "-------------------"
# Cert-manager
check_test "Cert-manager running" "kubectl get pods -n cert-manager | grep Running"
# Certificate issuers
check_test "Certificate issuers ready" "kubectl get clusterissuers | grep True"
# Valid certificates
check_test "TLS certificates valid" "kubectl get certificates --all-namespaces | grep True" || true
echo ""
echo "📊 Cost Monitoring Tests"
echo "------------------------"
# Kubecost
check_test "Kubecost running" "kubectl get pods -n kubecost | grep Running" || true
echo ""
echo "🔍 Observability Tests"
echo "----------------------"
# Jaeger
check_test "Jaeger tracing available" "kubectl get pods -n observability -l app.kubernetes.io/name=jaeger | grep Running" || true
# OpenTelemetry
check_test "OpenTelemetry collector" "kubectl get pods -n observability -l app.kubernetes.io/name=opentelemetry-collector | grep Running" || true
echo ""
echo "================================================"
echo "🎯 VALIDATION SUMMARY"
echo "================================================"
echo "Total Tests: $TOTAL_TESTS"
echo "Passed: $SUCCESS_COUNT"
echo "Failed: $((TOTAL_TESTS - SUCCESS_COUNT))"
if [ $SUCCESS_COUNT -eq $TOTAL_TESTS ]; then
echo -e "${GREEN}🎉 ALL TESTS PASSED! System is fully operational.${NC}"
exit 0
elif [ $SUCCESS_COUNT -gt $((TOTAL_TESTS * 80 / 100)) ]; then
echo -e "${YELLOW}⚠️ Most tests passed. Minor issues detected.${NC}"
exit 0
else
echo -e "${RED}❌ Critical issues detected. System requires attention.${NC}"
exit 1
fi
EOF
chmod +x ~/devops-infrastructure/scripts/system-validation.sh
# 13.1.2 Automated testing pipeline
cat > ~/devops-infrastructure/jenkins/system-validation-pipeline.groovy << 'EOF'
pipeline {
agent {
kubernetes {
yaml """
apiVersion: v1
kind: Pod
spec:
containers:
- name: kubectl
image: bitnami/kubectl:latest
command:
- cat
tty: true
- name: argocd
image: argoproj/argocd:latest
command:
- cat
tty: true
- name: velero
image: velero/velero:latest
command:
- cat
tty: true
"""
}
}
triggers {
cron('0 6 * * *') // Daily at 6 AM
}
stages {
stage('System Validation') {
steps {
container('kubectl') {
script {
sh '''
# Copy validation script
curl -fsSL https://raw.githubusercontent.com/yourusername/devops-infrastructure/main/scripts/system-validation.sh -o validation.sh
chmod +x validation.sh
# Run validation
./validation.sh
'''
}
}
}
}
stage('Generate Report') {
steps {
container('kubectl') {
sh '''
# Generate detailed report
echo "# System Health Report - $(date)" > system-report.md
echo "" >> system-report.md
echo "## Cluster Overview" >> system-report.md
echo "\`\`\`" >> system-report.md
kubectl get nodes -o wide >> system-report.md
echo "\`\`\`" >> system-report.md
echo "## Pod Status" >> system-report.md
echo "\`\`\`" >> system-report.md
kubectl get pods --all-namespaces | grep -v Running | head -20 >> system-report.md
echo "\`\`\`" >> system-report.md
echo "## Resource Usage" >> system-report.md
echo "\`\`\`" >> system-report.md
kubectl top nodes >> system-report.md
echo "\`\`\`" >> system-report.md
# Archive report
cat system-report.md
'''
}
}
}
}
post {
success {
slackSend(
channel: '#infrastructure',
color: 'good',
message: "✅ Daily system validation completed successfully"
)
}
failure {
slackSend(
channel: '#infrastructure',
color: 'danger',
message: "❌ Daily system validation failed. Immediate attention required!"
)
}
always {
archiveArtifacts artifacts: '*.md', allowEmptyArchive: true
}
}
}
EOF
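One way to register this pipeline is to commit it to the infrastructure repo and point a Jenkins job at it (repo name and layout are assumptions):
cd ~/devops-infrastructure
git add jenkins/system-validation-pipeline.groovy
git commit -m "Add daily system-validation pipeline"
git push origin main
# In Jenkins: New Item -> Pipeline -> "Pipeline script from SCM",
# with script path jenkins/system-validation-pipeline.groovy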
# 13.1.3 Run the validation script
~/devops-infrastructure/scripts/system-validation.sh
📚 13.2 Final Documentation#
# 13.2.1 Complete setup summary
cat > ~/devops-infrastructure/README.md << 'EOF'
# DevOps Infrastructure - Complete Setup
🎉 **Congratulations!** You have successfully deployed a production-ready DevOps infrastructure.
## 🏗️ What We've Built
### Infrastructure Components
- ✅ **AWS EKS Cluster** - Managed Kubernetes with auto-scaling
- ✅ **VPC & Networking** - Multi-AZ setup with security groups
- ✅ **RDS PostgreSQL** - Managed database with backups
- ✅ **ElastiCache Redis** - In-memory caching
- ✅ **Application Load Balancer** - SSL termination and routing
### CI/CD Pipeline
- ✅ **Jenkins** - Automated build and deployment
- ✅ **ArgoCD** - GitOps continuous deployment
- ✅ **GitHub Container Registry** - Container image storage
- ✅ **Progressive Delivery** - Canary and blue-green deployments
### Monitoring & Observability
- ✅ **Prometheus** - Metrics collection and storage
- ✅ **Grafana** - Visualization and dashboards
- ✅ **AlertManager** - Intelligent alerting
- ✅ **Jaeger** - Distributed tracing
- ✅ **OpenSearch** - Log aggregation and search
- ✅ **Fluent Bit** - Log collection
### Security
- ✅ **HashiCorp Vault** - Secrets management
- ✅ **External Secrets Operator** - Kubernetes-Vault integration
- ✅ **Falco** - Runtime security monitoring
- ✅ **OPA Gatekeeper** - Policy enforcement
- ✅ **Network Policies** - Micro-segmentation
- ✅ **Pod Security Standards** - Container security
### Backup & DR
- ✅ **Velero** - Kubernetes backup and restore
- ✅ **RDS Automated Backups** - Database recovery
- ✅ **Cross-region Replication** - Disaster recovery
- ✅ **Automated Testing** - DR drill automation
### Cost Optimization
- ✅ **Kubecost** - Kubernetes cost visibility
- ✅ **VPA/HPA** - Resource optimization
- ✅ **Spot Instances** - Cost-effective compute
- ✅ **Resource Quotas** - Spend control
## 🚀 Access URLs
| Service | URL | Purpose |
|---------|-----|---------|
| ArgoCD | https://argocd.yourdomain.com | GitOps Management |
| Grafana | https://grafana.yourdomain.com | Monitoring Dashboards |
| Jaeger | https://jaeger.yourdomain.com | Distributed Tracing |
| OpenSearch | https://logs.yourdomain.com | Log Analysis |
| Vault | https://vault.yourdomain.com | Secrets Management |
| Jenkins | https://jenkins.yourdomain.com | CI/CD Pipelines |
| Kubecost | https://kubecost.yourdomain.com | Cost Analytics |
## 🔑 Default Credentials
```bash
# ArgoCD
Username: admin
Password: $(kubectl -n argocd get secret argocd-initial-admin-secret -o jsonpath="{.data.password}" | base64 -d)
# Grafana
Username: admin
Password: AdminPassword123!
# Vault Root Token
Token: $(cat cluster-keys.json | jq -r ".root_token")
```
## 📊 System Overview
# Check overall system health
kubectl get nodes
kubectl get pods --all-namespaces | grep -v Running
# Monitor resource usage
kubectl top nodes
kubectl top pods --all-namespaces --sort-by=cpu
# Check applications
argocd app list
helm list --all-namespaces
## 🛠️ Common Operations
### Deploy New Application
# 1. Add application manifests to GitOps repo
cd gitops-config/applications/dev
# Create your application YAML files
# 2. Commit and push
git add .
git commit -m "Add new application"
git push origin main
# 3. ArgoCD will automatically sync
argocd app sync <app-name>
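For step 1, a minimal ArgoCD Application manifest could look like this (repo URL and paths are placeholders):
cat > gitops-config/applications/dev/my-app.yaml << 'APP'
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-app
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourusername/gitops-config.git
    targetRevision: main
    path: applications/dev/my-app
  destination:
    server: https://kubernetes.default.svc
    namespace: dev
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
APP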
### Scale Applications
# Manual scaling
kubectl scale deployment <app-name> --replicas=5 -n <namespace>
# Auto-scaling with HPA
kubectl autoscale deployment <app-name> --cpu-percent=70 --min=2 --max=10 -n <namespace>
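The autoscale command creates a v1 HPA imperatively; a declarative autoscaling/v2 equivalent is easier to keep in Git (names and thresholds are placeholders):
cat > my-app-hpa.yaml << 'HPA'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
  namespace: dev
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
HPA
kubectl apply -f my-app-hpa.yaml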
### Check Logs
# Pod logs
kubectl logs <pod-name> -n <namespace> --tail=100
# Application logs in OpenSearch
# Visit: https://logs.yourdomain.com
# Query: kubernetes.namespace_name:"dev" AND kubernetes.labels.app:"your-app"
### Monitor Performance
# Real-time metrics
kubectl top pods -n <namespace>
# Grafana dashboards
# Visit: https://grafana.yourdomain.com
# Check: Kubernetes Cluster Overview dashboard
### Backup and Restore
# Create backup
velero backup create <backup-name> --include-namespaces <namespace>
# Restore from backup
velero restore create <restore-name> --from-backup <backup-name>
# Check backup status
velero backup describe <backup-name>
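For recurring protection, a Velero schedule complements one-off backups (the cron expression and retention are assumptions):
# Nightly backup at 02:00, retained for 7 days
velero schedule create daily-backup --schedule="0 2 * * *" --include-namespaces <namespace> --ttl 168h0m0s
velero schedule get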
## 🚨 Troubleshooting
### Pod Issues
# Pod not starting
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace>
# Resource issues
kubectl top pods -n <namespace>
kubectl describe node <node-name>
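Recent events often explain scheduling and image-pull failures:
kubectl get events -n <namespace> --sort-by='.lastTimestamp' | tail -20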
### Network Issues
# DNS resolution
kubectl exec -it <pod-name> -n <namespace> -- nslookup kubernetes.default
# Service connectivity
kubectl exec -it <pod-name> -n <namespace> -- curl <service-name>.<namespace>.svc.cluster.local
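For deeper debugging, a throwaway pod with full network tooling helps (image choice is a suggestion):
kubectl run netshoot --rm -it --image=nicolaka/netshoot -n <namespace> -- bash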
### Storage Issues
# PVC status
kubectl get pvc -n <namespace>
kubectl describe pvc <pvc-name> -n <namespace>
# Storage classes
kubectl get storageclass
## 📈 Performance Optimization
### Resource Right-sizing
# Check VPA recommendations
kubectl get vpa --all-namespaces
# Apply VPA recommendations
kubectl patch deployment <app-name> -n <namespace> --patch '
{
"spec": {
"template": {
"spec": {
"containers": [
{
"name": "<container-name>",
"resources": {
"requests": {
"cpu": "<recommended-cpu>",
"memory": "<recommended-memory>"
}
}
}
]
}
}
}
}'
### Cost Optimization
# Check cost recommendations
# Visit: https://kubecost.yourdomain.com
# Taint spot nodes so only interruption-tolerant workloads (dev, batch) schedule there
kubectl taint node <node-name> spot=true:NoSchedule
# Implement resource quotas
kubectl apply -f resource-quotas.yaml
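The resource-quotas.yaml referenced above might contain (namespace and limits are placeholders):
cat > resource-quotas.yaml << 'RQ'
apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
RQ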
## 🔒 Security Best Practices
### Regular Security Tasks
# Update base images regularly
docker pull nginx:alpine
docker tag nginx:alpine ghcr.io/yourusername/nginx:latest
docker push ghcr.io/yourusername/nginx:latest
# Scan for vulnerabilities
trivy image <image-name>
# Check for policy violations
kubectl get events --all-namespaces | grep -i policy
# Review Falco alerts
kubectl logs -l app.kubernetes.io/name=falco -n falco
### Certificate Management
# Check certificate status
kubectl get certificates --all-namespaces
# Force certificate renewal: delete the backing secret and cert-manager reissues it
kubectl delete secret <cert-secret-name> -n <namespace>
# Or, if the cmctl CLI is installed:
cmctl renew <cert-name> -n <namespace>
## 🆘 Support and Contacts
### Internal Support
- DevOps Team: #devops-team (Slack)
- On-call Engineer: +90-XXX-XXX-XXXX
- Documentation: ~/devops-infrastructure/docs/
### Emergency Procedures
- Production Down: Follow incident response runbook
- Security Incident: Contact security team immediately
- Data Loss: Initiate disaster recovery procedures
## 🎉 Congratulations!
You now have a production-ready, enterprise-grade DevOps infrastructure that includes:
- ✅ **Automated Infrastructure** - Everything as code
- ✅ **Continuous Deployment** - GitOps workflow
- ✅ **Comprehensive Monitoring** - Full observability stack
- ✅ **Enterprise Security** - Multi-layer security controls
- ✅ **Disaster Recovery** - Automated backup and restore
- ✅ **Cost Optimization** - Resource efficiency and cost visibility
- ✅ **Performance Management** - Auto-scaling and optimization
- ✅ **Team Processes** - Documentation and runbooks
Your infrastructure is ready to support modern application development and deployment at scale! 🚀
EOF

# Append generation metadata (expanded at write time, unlike the quoted heredoc above)
cat >> ~/devops-infrastructure/README.md << EOF
Generated on: $(date)
Infrastructure Version: v1.0.0
Last Updated: $(date '+%Y-%m-%d %H:%M:%S')
EOF
# 13.2.2 Quick start guide
cat > ~/devops-infrastructure/QUICKSTART.md << 'EOF'
# 🚀 Quick Start Guide
## Prerequisites Checklist
- AWS Account with administrative access
- Domain name for services (yourdomain.com)
- GitHub account for repositories
- Slack workspace for notifications
- Local development environment setup
## 30-Minute Setup
### Step 1: Initial Setup (5 minutes)
# Clone repository
git clone https://github.com/yourusername/devops-infrastructure.git
cd devops-infrastructure
# Run automated setup
./scripts/quick-setup.sh
### Step 2: Infrastructure Deployment (15 minutes)
# Deploy AWS infrastructure
cd terraform/environments/dev
terraform init -backend-config=backend.conf
terraform plan
terraform apply -auto-approve
# Configure kubectl
aws eks update-kubeconfig --region eu-west-1 --name mycompany-dev-eks
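backend.conf is assumed to hold the remote-state settings, for example (bucket and table names are placeholders):
bucket         = "mycompany-terraform-state"
key            = "dev/terraform.tfstate"
region         = "eu-west-1"
dynamodb_table = "terraform-locks"
encrypt        = true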
### Step 3: Application Deployment (10 minutes)
# Deploy monitoring stack
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack \
--namespace monitoring --create-namespace --values monitoring-values.yaml
# Deploy ArgoCD
kubectl apply -n argocd -f https://raw.githubusercontent.com/argoproj/argo-cd/stable/manifests/install.yaml
# Deploy root application
kubectl apply -f bootstrap/root-app.yaml
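bootstrap/root-app.yaml is assumed to be an app-of-apps entry point along these lines (repo URL is a placeholder):
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: root
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/yourusername/gitops-config.git
    targetRevision: main
    path: applications/dev
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true
      selfHeal: true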
## Verification
# Check cluster health
kubectl get nodes
kubectl get pods --all-namespaces
# Access services
echo "ArgoCD: https://argocd.yourdomain.com"
echo "Grafana: https://grafana.yourdomain.com"
echo "Applications ready! 🎉"
## Next Steps
- Configure DNS - Point your domain to the load balancer
- Setup Certificates - Configure SSL/TLS certificates
- Deploy Applications - Add your applications to GitOps
- Configure Monitoring - Set up dashboards and alerts
- Train Team - Share access and documentation
## Need Help?
- 📖 Full Documentation: README.md
- 🔧 Troubleshooting: docs/troubleshooting.md
- 💬 Support: Contact DevOps team
Happy deploying! 🚀🚀🚀
EOF
echo ""
echo "🎉 ============================================"
echo "🎉 DEVOPS INFRASTRUCTURE SETUP COMPLETE!"
echo "🎉 ============================================"
echo ""
echo "📊 Summary:"
echo "✅ Infrastructure as Code (Terraform)"
echo "✅ Kubernetes Cluster (EKS)"
echo "✅ CI/CD Pipeline (Jenkins + ArgoCD)"
echo "✅ Monitoring Stack (Prometheus + Grafana)"
echo "✅ Logging Stack (OpenSearch + Fluent Bit)"
echo "✅ Security Layer (Vault + Falco + OPA)"
echo "✅ Backup & DR (Velero + RDS Backups)"
echo "✅ Cost Optimization (Kubecost + VPA/HPA)"
echo "✅ Documentation & Runbooks"
echo ""
echo "🔗 Access URLs:"
echo "• ArgoCD: https://argocd.yourdomain.com"
echo "• Grafana: https://grafana.yourdomain.com"
echo "• Jenkins: https://jenkins.yourdomain.com"
echo "• Vault: https://vault.yourdomain.com"
echo ""
echo "📚 Next Steps:"
echo "1. Run system validation: ./scripts/system-validation.sh"
echo "2. Configure your domain DNS"
echo "3. Deploy your first application"
echo "4. Train your team with provided documentation"
echo ""
echo "🎯 Your enterprise-grade DevOps infrastructure is ready!"
echo " Happy DevOps! 🚀🚀🚀"
With this comprehensive implementation guide you can build a fully functional, production-ready DevOps infrastructure from scratch in 28 days. Every step includes detailed commands, configurations, and best practices.