Job Description
Key Technical Skills Required:
- 10+ years of Cloud Infrastructure experience (Azure or Google Cloud).
- Experience within the financial industry and/or a highly regulated technology organization.
- Expertise in building cloud services and components using IaC (Infrastructure as Code).
- Expertise in building, deploying, and managing cloud services such as Azure APIM, managed Kubernetes services like AKS or GKE, Storage, Azure Function Apps/Google Cloud Functions, and DAG-based data processing services like Airflow or Google Composer, with a focus on high availability and scalability.
- Expertise in setting up cloud monitoring and management tools, plus experience with incident response, troubleshooting, scripting, and automation.
- Experience implementing information security best practices using security tools such as MS Defender, CrowdStrike, Wiz, Snyk, etc.
- Experience with observability/monitoring technologies like Splunk, CloudWatch, Azure Monitor, etc.
- Excellent communication skills.
- Adaptable and innovative; stays current with cloud technologies.
- Candidates with relevant certifications, such as Certified Kubernetes Administrator, are preferred.
Roles and Responsibilities:
- Build and manage cloud infrastructure, services and applications using automation.
- Build, deploy, and manage highly available and scalable cloud services such as Azure APIM, managed Kubernetes services like AKS or GKE, Storage, Azure Function Apps/Google Cloud Functions, and DAG-based data processing services like Airflow or Google Composer.
- Continuously monitor cloud resources, respond to incidents and outages within SLAs, automate operational tasks, manage backups and disaster recovery plans, and ensure high availability.
- Manage, maintain, and monitor different infrastructure components (servers, network, applications).
- Provide hands-on technical expertise during service-impacting events.
- Collaborate with other engineers on code reviews, internal infrastructure improvements, and process enhancements; measure, tune, and optimize system performance.
- Adhere to security and regulatory best practices for cloud deployments, and implement recommendations from security tools such as MS Defender, CrowdStrike, Wiz, Snyk, etc.
- Troubleshoot and remediate issues impacting the operation of the overall infrastructure.
- Automate routine systems tasks and processes using PowerShell or similar tools.
- Maintain security, backup, and redundancy of the enterprise environment.
- Build and maintain “playbooks” and other supporting documentation.
- Collaborate with team members to perform regular system audits and security reviews.
- Manage projects and processes, working independently with limited supervision.
- Participate in periodic 24x7 on-call duties.
- Other duties as assigned.