Application Reliability Specialist

  • Location:

    Toronto, Canada

  • Sector:

    Software LifeCycle

  • Contact:

    Archana Namasivayam

  • Published:

    almost 2 years ago

Position: Java Application Reliability Developer

Type: Full Time

Location: Toronto, ON



Our client currently has an opening for an Application Reliability Developer to join their team in downtown Toronto. As a hands-on developer, you will be responsible for the maintenance and support of the company’s highly distributed, high-performance system.


  • Diagnose and resolve application issues to ensure optimal performance and availability of all IT applications and provide root cause analysis with recommendations for improvements.
  • Focus on Site (Application) Reliability Engineering activities including proactive monitoring, responding to alerts and automation.
  • Work with senior members of the team to gather monitoring requirements from stakeholders and deliver solutions utilizing the enterprise monitoring toolset, paving the way to the SRE-based next-generation platform.
  • Be available on call.
  • Understanding large-scale Java applications, database architectures, application monitoring and fault management.
  • Troubleshooting applications by leveraging APM tools like AppDynamics/Dynatrace.
  • Having an SRE mindset toward ensuring application availability.
  • Identifying application monitoring needs or performance issues and instrumenting them appropriately in AppDynamics and Splunk.
  • Designing and instrumenting AppDynamics monitoring and tuning (health rules, alerts) for various applications.
  • Identifying areas of automation for building self-remediation needs.
  • Creating performance analysis reports & dashboards for application teams.
  • Proposing and implementing solutions to improve application availability and reliability.
  • Working with API and microservices technologies and containers.


  • University Degree in Computer Science Engineering or equivalent combination of education and experience.
  • Must have a minimum of 3 years' experience in application support with a focus on application monitoring, designing and instrumenting monitoring dashboards, and tuning alerts, preferably with AppDynamics, Splunk and ELK.
  • Must have RDBMS expertise - Oracle and/or DB2.
  • Experience with ELK, Splunk, HTTP tracing tools or Prometheus is a plus.
  • Exposure to a CI/CD platform will be an added advantage.
  • SRE (Site Reliability Engineering) expertise is nice to have.
  • 8+ years of software development (Java)/maintenance experience, preferably in the payment systems or banking domain.
  • Experienced in core Java object-oriented programming, with an understanding of basic Enterprise Integration Patterns.
  • Experience in application support and maintenance of Java/JEE applications.
  • Good knowledge of REST APIs.
  • Debugging expertise in the Java tool stack.
  • Excellent understanding of ITIL service management processes.
  • Skilled in IT problem diagnosis and resolution.
  • Experienced in scripting tools such as PowerShell, Bash, Python, Ansible.
  • Solid understanding of different types of open source packages, preferably anything Spring, Apache and data transformation (JAXB2, JSON, XML).
  • Familiarity with web components is a bonus.
  • Experience with Atlassian products (Confluence/JIRA/JIRA AGILE/Crucible/Bamboo).
  • Strong communication skills - verbal and written (technical documentation).
  • Experience participating in the overall delivery of software components as part of an agile development process.


We are looking for someone with extensive experience in the Java stack and REST APIs, along with extensive debugging and monitoring experience (AppDynamics, Dynatrace, Splunk or ELK preferred, but any other monitoring tool would work as well). The candidate should have 1-2 years of extensive DevOps (CI/CD) experience; payment system or banking domain experience would be an asset. The person will lead troubleshooting and focus on fixing issues permanently, in a role one step above application support.