*Roles & Responsibilities:*
Perform full-lifecycle L2 and L3 triage for all events on production servers, including incident logging and troubleshooting.
Provide support to end-users and address their issues promptly.
Analyze various metrics and logs to understand the reasons for system failure or non-performance, using different debugging and diagnostic tools.
Work with developers to deploy applications ready for production with runbooks, instructions, and processes.
Ensure smooth rollouts and minimize downtime during deployments.
Document current and future procedures and configurations in Confluence and the wiki.
Collaborate with development teams to improve application performance and reliability through performance tuning, load testing, and code optimization.
Identify and implement performance optimizations to enhance system efficiency.
Participate in on-call rotations, respond to incidents, and perform root cause analysis to prevent recurrence.
Implement monitoring and alerting solutions using Prometheus, Grafana, and ELK Stack, enabling proactive issue detection and reducing mean time to resolution.
Monitor application and infrastructure alerts and react quickly.
Recommend and implement solutions to mitigate repeat product issues.
Take ownership of open support tickets and respond to them.
Flexibility and a willingness to learn are essential to support evolving technologies and systems.
Identify opportunities for automation to improve efficiency in monitoring, incident response, and routine tasks.
Job Type: Contract
Pay: $40.00 - $45.00 per hour
Application Question(s):
* Are you willing to work on W2?
Work Location: On the road