Observability Engineer

March 11, 2024
Categories: Bank, Insurance, Financial services
Toronto, ON • Full time

Venture outside the ordinary - TMX Careers

The TMX group of companies includes leading global exchanges such as the Toronto Stock Exchange, Montreal Exchange, and numerous innovative organizations enhancing capital markets. United as a global team, we’re connecting cross-functionally, traversing industries and geographies, moving opportunity into action, advancing global economic growth, and propelling progress. Through a rich exchange of ideas, meaningful collaboration, and a nimble operating model, we're powering some of the nation's most critical systems, fueling capital formation and innovation, bringing increased opportunity to business visionaries, product ingenuity to consumers, and career exploration to our team.

Ready to be part of the action?

Department Overview:

Global Technology Services (GTS) is one of the foundational divisions of the TMX Group. It empowers internal TMX business lines to meet their technology, operations, and digital innovation needs. As a client-centric organization, GTS focuses on building technology capabilities, enabling our clients with the best technology solutions, and providing effective technology financial and resource management processes. Cost-effective operation is a key attribute of GTS execution. GTS is responsible for the delivery of all technology initiatives and services across TMX.

Role Summary:

As an Observability Engineer, you will play a crucial role in maintaining and improving the operational health of our applications and infrastructure. You will be responsible for setting up, configuring, and maintaining our monitoring and observability stack to ensure optimal system performance and reliability. You will apply GitOps/DevOps principles to manage the platform and help drive functionality and adoption through continuous improvement, simplification, and automation. You will work on the alignment, optimization, and strategy of our observability tools and platform. You'll work within a team of fellow observability and systems engineers to make TMX reliability best of breed.

Key Accountabilities:

  • Develop and maintain robust monitoring solutions using Splunk, Splunk Observability, Grafana, AWS CloudWatch, and Prometheus.

  • Implement, maintain, and consult on the observability and monitoring framework that supports the needs of multiple internal stakeholders.

  • Create and manage dashboards and visualizations to provide actionable insights into system health, performance, and operational efficiency.

  • Help manage the Event, Incident, and Operations Escalation Management Policies.

  • Grow and evangelize the capabilities of our observability tools and platforms.

  • Collaborate with development and operations teams to integrate observability tools into the development lifecycle for continuous improvement.

  • Translate business requirements into technical solutions, applying best practices and standards that meet strategic business goals.

  • Conduct performance analysis, diagnose issues, and provide solutions to enhance system reliability and scalability.

  • Document observability best practices and maintain configuration documentation.

  • Provide 2nd and 3rd level systems support.

  • Liaise with vendors and other IT personnel for problem resolution.

Must haves:

  • Proven experience with key observability and monitoring tools such as Splunk, Splunk Observability, OpenTelemetry (OTel), Grafana, AWS CloudWatch, and Prometheus.

  • Strong understanding of cloud environments, preferably AWS, including deployment, management, and operations.

  • Proficient in creating and managing monitoring dashboards and setting up alerts to monitor all phases of the environment.

  • Solid background in scripting and automation using languages such as Python, Bash, or similar.

  • Excellent problem-solving skills, with the ability to handle complex troubleshooting and make critical system-related decisions.

  • Familiarity with configuration management and infrastructure-as-code tools such as Ansible and Terraform.

  • Linux operating system knowledge (Red Hat Linux preferred).

  • Experience with source control systems (Git, GitLab, GitHub, Bitbucket) and familiarity with basic branching and merging strategies.

  • Strong communication skills, capable of effectively articulating technical challenges and solutions to stakeholders.

Nice to haves:

  • Experience with OS deployment systems (Red Hat Satellite).

  • Experience with virtualization (VMware, RHV).

  • Experience with container platforms and orchestration (Kubernetes, OpenShift, Docker Swarm).

Preferred qualifications:

  • Bachelor’s degree in Computer Science, Engineering, or a related technical field.

  • 5+ years of experience in systems engineering/administration, platform/cloud/devops engineering, or a related field.

  • Relevant certifications in Splunk, AWS, or similar technologies.

  • Experience with additional observability and monitoring tools is a plus.

    In the market for…

    Excitement - Explore emerging technology and innovation, as well as ventures and digital finance that shape the future of global markets! Experience the movement of the market while grounded in the stability of close to 200 years of success.

    Connection - With site hubs in some of the world’s most multicultural cities, we leverage our size and structure to create rich connections and belonging while experiencing powerful global impact through our work.

    Impact - More than a platform, we use our talents to power mission-critical systems that drive global economic advancement, innovation, and growth. As well, our employee-led Team Impact spreads social good via our giving strategy.

    Wellness - From empathetic leadership to a culture of flexibility and balance, we believe wellness at work creates the maximum yield and a stronger “we”. Plus, with a cloud-first and hybrid workstyle, as well as generous time-off and leaves, we support a life well lived!

    Growth - From a growth mindset in our work, to expansion in our business, TMX is home to action-takers energized by the achievement of ambitious growth.

    Ready to enrich your career with impactful work, leaders who truly care, and the flexibility and programs to help you thrive as part of #TeamTMX? Apply now.

    TMX is committed to creating and sustaining a collegial work environment in which all individuals are treated with dignity and respect and one which reflects the diversity of the community in which we operate. We provide accommodations for applicants and employees who require them.

