Roles and responsibilities
We are seeking a proactive L1 Infrastructure Support Engineer to provide support for hardware, servers, and storage systems. The candidate should have a foundational understanding of IT infrastructure and hands-on experience with tools such as SolarWinds, Dynatrace, ServiceNow, and Nexthink for monitoring and incident management.
Key Responsibilities
- Monitoring & Incident Detection:
- Monitor servers, storage, and other IT infrastructure using tools such as SolarWinds, Dynatrace, and Nexthink.
- Detect and respond to hardware and software alerts promptly.
- Escalate issues to L2/L3 teams following standard procedures.
- Basic Troubleshooting & Incident Management:
- Perform basic troubleshooting for hardware and system issues, using logs and system diagnostics.
- Record and categorize incidents in ServiceNow, and escalate them as needed.
- Communicate incident status and resolution steps clearly to stakeholders.
- Infrastructure Maintenance & Support:
- Assist with routine maintenance tasks like updates, backups, and health checks.
- Support hardware and software installations and upgrades under L2 guidance.
- Use Nexthink for deeper endpoint insights and health checks.
- Documentation & Reporting:
- Maintain accurate documentation for infrastructure setups and troubleshooting processes.
- Generate regular reports on infrastructure health and incident metrics.
- Collaboration & Communication:
- Coordinate with L2/L3 teams to resolve issues efficiently.
- Participate in team discussions to support continuous improvement.
Required Qualifications
- Experience: 1-3 years in infrastructure support roles.
- Technical Skills: Basic knowledge of servers, virtualization, and networking.
- Tools: Familiarity with monitoring tools such as SolarWinds, Dynatrace, and Nexthink, and with incident management tools such as ServiceNow.
- Problem-Solving: Effective troubleshooting skills for infrastructure issues.
- Communication: Clear communication with both technical and non-technical audiences.
Preferred Qualifications
- Certifications such as CompTIA Server+ or a comparable basic infrastructure certification.
- Experience with virtual environments and cloud-based infrastructure.
Desired candidate profile
- Infrastructure Management:
- Hardware Support: Ensure the proper functioning and maintenance of physical hardware, including servers, storage devices, and network equipment. This may include diagnosing hardware issues, performing replacements, and managing upgrades.
- Operating Systems and Software Installation: Install, configure, and maintain operating systems (Windows, Linux, Unix, etc.), server software, and relevant patches/updates.
- Cloud Infrastructure: Manage cloud services (AWS, Azure, Google Cloud) and resources, ensuring scalability, security, and cost-effectiveness.
- Network Configuration and Support: Maintain and troubleshoot network components such as routers, switches, firewalls, VPNs, and load balancers. Ensure network availability and security.
- System Monitoring and Maintenance:
- System Monitoring: Continuously monitor the performance of servers, networks, and cloud infrastructure. Use monitoring tools such as Nagios, Zabbix, SolarWinds, or cloud-native solutions (e.g., AWS CloudWatch) to detect and address potential issues before they affect operations.
- Performance Optimization: Monitor system resources (CPU, memory, disk usage) and network performance to identify bottlenecks or inefficiencies. Implement changes to optimize performance.
- Routine Maintenance: Perform regular updates, patches, and security scans to maintain the integrity of the infrastructure.
- Troubleshooting and Incident Management:
- Troubleshooting: Identify, diagnose, and resolve hardware, software, and network issues affecting the infrastructure promptly, to minimize downtime and disruption.
- Incident Response: Log and manage incidents, escalate when necessary, and see each one through to resolution. Track incidents in a ticketing system such as Jira, ServiceNow, or Zendesk.
- Root Cause Analysis: After an incident is resolved, conduct a post-mortem analysis to identify the root cause and ensure preventive measures are implemented to avoid recurrence.
- Backup and Disaster Recovery:
- Backup Systems: Manage and schedule regular backups of critical infrastructure, ensuring data is protected from loss.
- Disaster Recovery: Design, test, and implement disaster recovery plans to ensure business continuity in case of system failures, natural disasters, or cyber-attacks.
- Security Management:
- Security Patch Management: Ensure all infrastructure components, including operating systems and network devices, are regularly updated with the latest security patches.
- Access Control: Implement and manage security policies for access control, ensuring that only authorized users have access to sensitive systems and data.
- Vulnerability Management: Conduct vulnerability assessments and work with security teams to mitigate the risks identified in the infrastructure.
- Firewalls and VPNs: Configure and maintain firewalls and VPNs to secure network traffic and prevent unauthorized access.