Job Description
As a Site Reliability Engineer on our team, you’ll work with the DoD on the development of more robust systems by building resilient infrastructure. You’ll analyze redundancy, implement monitoring tools, and automate wherever possible. You’ll reduce toil by scripting routine tasks and automating self-repair. This is your chance to leverage your expertise in managing Linux cloud-based application stacks, including front-end, database, networking, and systems, while broadening your own knowledge base. Join us as we help protect our country against improvised threat networks.
Qualifications
- 2+ years of experience working in Linux environments
- 1+ years of experience supporting production enterprise applications
- Experience with container technologies, including Docker and Kubernetes
- Experience with scripting and declarative Infrastructure as Code tools, including Puppet, Terraform, and Ansible
- Ability to dive deep into all aspects of the stack to troubleshoot, identify, and fix problems
- Ability to obtain a DoD 8570 IAT Level II Certification, including Security+, within 30 days of hire
Desired Qualifications
- Experience with Python and Go, Microservices, Serverless, MLOps, and AIOps
- Experience with a Big Data stack using Hadoop, Spark, Accumulo or MongoDB, and Solr or Elasticsearch
- Experience with Red Hat Satellite and Identity Management
- Knowledge of PaaS products, including Rancher or OpenShift
- Knowledge of agile and DevSecOps practices
- Knowledge of software development processes and code management tools and processes
- Knowledge of cloud platforms, such as AWS or Azure, including monitoring systems, logging, and security implementation