Senior Site Reliability Engineer

  • Location:

    Charlotte, United States

  • Sector:

    Infrastructure/Networking

  • Job type:

    Temporary

  • Contact:

    Archana Namasivayam

  • Job ref:

    13097

  • Published:

    3 months ago

  • Duration:

    12.0

  • Expiry date:

    2019-07-22

  • Start date:

    2019-07-29

Job Title: Senior Site Reliability Engineer

Location: Charlotte, NC

Duration: 3 months (Contract)

Job Overview

The Senior Site Reliability Engineer is responsible for providing continuous feedback on site health, reliability, availability, and user experience for all AvidXchange core products.

Job Duties:

  • Production SaaS support, incident management, problem management, and service restoration as needed to quickly respond to and resolve production issues
  • Contribute to the maintenance and continuity of the Site Reliability Engineering strategies and processes
  • Provide technical leadership for usage and maintenance of tools for measuring core product health in production (with opportunities to extend those capabilities all the way back through the entire DevOps pipeline)
  • Provide technical leadership for calculating system availability SLAs across AvidXchange products
  • Implement measurement and testing of site reliability using Chaos Monkey-based methodologies, and train team members on them
  • Implement the tool consolidation strategy to optimize spend versus value for our end-to-end monitoring platform, and train team members on it
  • Contribute to definition of strategy, standardization of technologies, and establishment of patterns for rapid and continuous development and application of automated solutions to address reliability issues and automate manual tasks
  • Lead, implement, and train team members on the DevOps principle of ‘Feedback’ by creating user experience measures for all AvidXchange products
  • Work with the Software DevOps team to implement and train team members on the DevOps CI/CD continuous performance testing, monitoring, and reliability strategy using Visual Studio Team Services and other cloud-based tools
  • Work with the Software DevOps and Performance Engineering teams to implement and contribute to the strategy for DevOps CI/CD performance and monitoring quality gates within the delivery pipeline
  • Lead, implement and train team members on measurement capability of core product availability across Azure and AvidXchange Cloud using HTTP endpoint testing and synthetic user testing
  • Maintain automated site availability reporting and data platform
  • Present usability, reliability, incident, and user experience data for AvidXchange products to senior and/or executive leadership on a weekly basis
  • Define and report SLOs / SLAs for availability to executive leadership and business partners
  • Influence product delivery teams to implement usability and reliability enhancements leading to improved user experience index scores and improved availability
  • Provide detailed analysis and troubleshooting for system outages, providing feedback to product / software engineering
  • Serve as SME for multiple required technologies
  • Facilitate knowledge transfer to advance the department's overall skillset

Skills:

  • 4 or more years of experience with Dynatrace AppMon, Dynatrace SaaS, or competing products
  • Measure site availability using synthetic testing platforms such as Panopta or Gomez
  • Understanding of web hosting infrastructure and high availability architecture
  • Experience measuring and monitoring .NET applications, SQL Server databases, and serverless cloud resources, or equivalent Java-based experience
  • Troubleshoot solutions with service-oriented or microservice architectures
  • PowerShell or Linux scripting for creating automated routines for ensuring site availability
  • Development/coding experience and skills for writing custom automation solutions
  • Experience working in an Agile software development environment (Scrum / Kanban)
  • Knowledge and skills surrounding Public Cloud architectures (Azure experience highly desired)
  • Forensic system troubleshooting tools and techniques, including but not limited to:
    • SQL query data reconnaissance
    • SQL Profiler trace collection and analysis
    • SQL Query Plan analysis
    • Windows Performance Monitor
    • Network trace analysis
    • Port monitoring
    • Use of monitoring tools such as Panopta, Dynatrace, Azure Monitor, AppInsights, etc.
    • Log data analysis, correlation and trending using tools like Splunk and Azure Log Analytics
    • SysInternals Suite tools like Process Monitor, Process Explorer, etc.

Competencies:

  • Strong technical leadership and interpersonal skills
  • Dependable, motivated, and a quick learner
  • Performs analysis of complex systems and presents findings
  • Provides consultation to teams throughout AvidXchange
  • Able to rapidly comprehend the functions and capabilities of new technologies
  • Works collaboratively and openly seeks and shares information across the enterprise
  • Exhibits enterprise-level thinking
  • Facilitates knowledge transfer
  • Able to work independently, self-organize, and prioritize work