Area / Job Type
Cape Town area, South Africa (hybrid & flexi-hours)
Permanent, Full-time
The Company
Our client specialises in servicing large online gaming clients and solving their complex operational challenges. At their core, they are expert problem-solvers who work hard to deliver gaming software of the highest quality, every time, for each of their clients. They push the boundaries of gaming design and development and aim at all times to provide advanced solutions.
Overview
Administer the online gaming platforms, ensure their optimum performance and availability, provide third/fourth-level troubleshooting support, and liaise with all relevant service providers and third parties, including the Network Operations Centre (NOC) and 1st/2nd-line support. Provide technical support to colleagues in all roles.
Personal Characteristics
You are a great team player who is passionate about technology and your job. You have a proactive, flexible attitude and tackle day-to-day tasks and projects in a way that lets you constantly take new skills on board.
Sound fault-finding skills and the ability to react to problem situations in a focused manner are essential in this role. Some on-call support may be required as 3rd/4th-line support when problems arise.
Key Responsibilities
- Take part in the design and implementation of new setups
- Assist in process improvement and automation
- Design and test new systems/services and assess their suitability for use in production environments
- Proactively monitor and support the live and staging infrastructure
- Investigate and react to any live or staging issues that might arise
- Prepare and take part in the periodic release of new software
- Act as an escalation point for issues flagged by customer support
- Handle service requests, incidents, problems, and change requests using ITIL best practice
- Assist in in-depth problem investigations, using detailed log and metrics analysis to identify root causes, resolve issues, and prevent them from recurring
Desirable Skills
- At least 10 years of experience predominantly in Linux-based environments
- All-rounder with very good general IT knowledge, especially of Linux technologies
- Excellent communication skills, with the ability to convey information to both technical and non-technical personnel
- Ability to work both autonomously and in a team environment
- Deep knowledge of various monitoring tools
- Excellent problem-solving skills
- Detect and analyse alarms to provide fault isolation and remote troubleshooting, taking ownership of problems until they are fully resolved
- Act as a senior technical resource during outages until service is restored
- Ability to analyse logs to troubleshoot issues, including malicious activity or cybersecurity threats, code and system errors, and other problems
- Excellent attention to detail and organisational skills
- Understands how to direct, support, and guide other team members and knows how to motivate them
- At least 18 months of production support or implementation experience with Kubernetes, preferably in Google's managed Kubernetes environment (GKE)
- Virtualisation: OpenStack, VMware
- Cloud: experience with any of Google GKE, Amazon EKS, Azure AKS, OpenShift/OKD
- Excellent networking skills including packet capture and analysis
- Excellent log analysis skills and experience setting up aggregated logging systems such as Splunk/EFK
- Excellent monitoring and metrics analysis skills, and experience implementing systems such as Grafana, Graphite, Prometheus, Zabbix, etc.
- Excellent usage and understanding of infrastructure orchestration systems such as Ansible and Ansible Tower or, failing that, experience with Terraform/Chef/Puppet/SaltStack
- Experience with code repositories such as GitHub/GitLab
- Experience with MongoDB from an administration and deployment perspective including MongoDB clusters and data replication between instances
- Experience with CI/CD pipelines
- Experience with DNS and HTTPS certificates