A well-run game operation relies on stable server infrastructure, effective outage prevention, and an agile response team. Rayark is seeking an experienced site reliability engineer to monitor, improve, and design our server infrastructure. We want to hear from engineers who know how to build and maintain scalable, highly available services. You will develop strategies and tools to operate large-scale distributed systems and troubleshoot the system issues that cause incidents.
You will start by facilitating the deployment of a wide range of services to Google Cloud Platform. You will then collaborate with backend engineers to design high-quality architectures for scalable and reliable services.
RESPONSIBILITIES
- Maintain the availability of production services (includes on-call duty).
- Troubleshoot system issues.
- Develop strategies and tools to operate large-scale distributed systems.
- Refine the architecture of web services to improve availability, scalability, and maintainability.
REQUIREMENTS
- Basic understanding of networking theory and protocols (layer 3 and above, such as DNS, TCP/IP, HTTP/HTTPS)
- Basic knowledge of database management systems
- Ability to perform tasks on Linux such as scheduling jobs, managing file systems, and mounting/unmounting disks
- Minimum of 4 years of experience in any programming language
- Experience with cloud services, especially GCP (Google Cloud Platform)
- Experience with container technologies (Docker, Kubernetes)
REQUIRED APPLICATION MATERIALS
- Cover letter
- Transcript (for applicants who recently graduated, were recently discharged from the army, or have less than 3 years of work experience)