Description
Key Responsibilities:
- Demonstrated expertise in DevOps methodologies, fostering a culture of collaboration, automation, and continuous delivery.
- Mentor and support team members, sharing expertise across the technology stack.
- Design and implement scalable, multi-cloud infrastructure solutions across AWS, GCP, and other cloud platforms, with a focus on infrastructure-as-code (IaC) using tools like Terraform and CloudFormation.
- Manage and optimize cloud environments (AWS, GCP) with an emphasis on auto-scaling for dynamic, cost-efficient infrastructure.
- Experience in deploying and managing containerized environments with orchestration tools like EKS, GKE, and other Kubernetes platforms, ensuring resilience and high availability.
- Hands-on experience with Kubernetes ecosystem tools such as Istio (or another service mesh), KEDA, Karpenter, Helm, Argo CD, and Terraform to manage microservices traffic, improve security, and streamline observability.
- Deep knowledge of the Linux operating system, particularly kernel subsystems such as memory, storage, and networking, with expertise in DNS, routing, load balancing, and high availability (HA) strategies.
- Develop and maintain robust CI/CD pipelines using tools like Jenkins, GitHub Actions, and Argo CD for smooth, automated deployments.
- Proficiency in scripting and development languages like Shell, Python, and Go, applying them to automation, monitoring, and system performance enhancement.
- Proactively implement and enhance monitoring and alerting for infrastructure and applications, driving improved reliability and faster issue resolution.
- Troubleshoot and resolve complex production outages and performance issues in cloud infrastructure and the application stack.
Required Skills:
- Cloud Platforms: Deep understanding and hands-on experience with AWS and GCP, including infrastructure design, management, and optimization.
- Infrastructure-as-Code (IaC): Proficiency in using Terraform and CloudFormation to automate infrastructure provisioning, scaling, and management across cloud environments.
- Kubernetes & Containers: Expertise in container orchestration using EKS, GKE, and other Kubernetes platforms for deploying and managing microservices.
- Service Mesh: Practical knowledge of Istio or similar service mesh technologies for managing service-to-service communication, security, and observability.
- MongoDB: Strong experience in MongoDB administration, performance tuning, scaling, and high availability.
- CI/CD Pipelines: Expertise in setting up and maintaining CI/CD pipelines using tools such as Jenkins, GitHub Actions, and Argo CD for automated deployments and continuous delivery.
- Scripting & Programming: Proficiency in Shell scripting, Python, and Go, with a focus on automation, system optimization, and performance improvements.
- Monitoring & Alerting: Strong skills in setting up and enhancing monitoring and alerting solutions (e.g., Prometheus, Grafana, CloudWatch) for infrastructure, applications, and databases like MongoDB.
- Troubleshooting: Ability to diagnose and resolve infrastructure, database, and application stack issues in real time, particularly within cloud and multi-cloud environments.
- Networking & Security: In-depth knowledge of DNS, routing, load balancing, and high availability (HA) solutions, with a focus on securing cloud environments and ensuring redundancy.
- Automation & Scalability: Expertise in automating infrastructure tasks and ensuring auto-scaling capabilities for cloud environments to optimize performance and costs.
- Multi-Cloud Management: Experience in managing and optimizing multi-cloud environments with a focus on scalability, disaster recovery, and backup strategies.
- Messaging Systems: Expertise in distributed messaging systems like Kafka and Pulsar.