Network HAM-20240906 Resolved
Priority - Critical

Problem and Remediation

At 10:03pm EST, we observed a network outage at our upstream provider Cogent Co, which affected the connectivity of several of our servers located in Hamilton to the public internet.

Root cause

All servers are running normally and data storage is not affected. Connectivity to certain subnets of the public internet is not operational. The root cause is a loss of connectivity at an upstream provider (Cogent Co). We are in contact with the provider to restore service as soon as possible and to identify the root cause on their end.

https://downdetector.ca/status/cogent/

Updates

  • Cogent confirms this is a Toronto/Hamilton-area incident
  • Cogent is still working on resolving the issue (1am EST)
  • Cogent reports that the root cause was a power outage affecting a network switch, which field technicians are investigating (1:30am EST)

Cause & Resolution

The root cause was a power outage affecting a remote network switch, which had to be resolved by field technicians.

The repair was completed by 3:16am EST.

All our monitors are showing connectivity and systems are back to normal now.

 

Network HAM-2024-06-01 Resolved
Priority - Critical

At 3:20am EDT, one of our upstream network providers, COGECO, experienced a severe network outage, as also reported by third parties.

This caused some of our Hamilton IP ranges to be temporarily unreachable, until switchover to our redundant provider COGENT completed.

By around 4:30am EDT, all IPs were fully reachable worldwide, as verified with our third-party monitoring services.
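A reachability check of the kind used to verify this can be sketched as a simple TCP probe. This is a minimal, hypothetical example: our actual verification relies on external third-party monitoring services, and the addresses below are RFC 5737 documentation IPs, not our real ranges.

```python
import socket

def is_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    # Documentation (TEST-NET-1) addresses used purely as placeholders.
    for host in ["192.0.2.10", "192.0.2.11"]:
        print(host, "reachable" if is_reachable(host) else "unreachable")
```

In practice such probes are run from multiple external vantage points, since a host can be reachable from inside the data centre while an upstream provider outage makes it unreachable from parts of the public internet.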

At the moment, all our services are fully operational.

The cause of the COGECO issue is still unknown; we will continue to follow up with our upstream provider.

For more information, check https://downdetector.ca/status/cogeco/

 

 

Network TOR-2023-01-23 Resolved
Priority - Critical

On Tuesday, Jan 23, at 12:01 EST, our on-call support and network operations teams were notified of a network outage by our monitoring system.

As a result of the outage, incoming traffic on certain IPs in the 69.90 sub-range was not routed to the correct cloud servers.

Several services were affected until our team resolved the issue and restored connectivity at 14:46 EST.
Third party personnel at our Toronto data centre were involved briefly during this process.
The resolution involved a forced restart of the affected router and restoration of network settings.

During this time, our support team kept affected clients updated on progress via support tickets.

 

 

Network HAM-2022-01-21 Resolved
Priority - Critical

Problem and Remediation

At 08:15, we observed a switch reboot within our cloud storage network, which caused a number of cloud servers to report storage failures. This was reported by our internal and third-party monitoring services. Our on-site team noticed the issue immediately and initiated remediation and server reboots over the following 90 minutes.

Root cause

Further investigation showed that this issue was caused by a power interruption in the building's control system, combined with the failure of a local redundant power supply due to overload.

Preventative Action

Replacement of the faulty local UPS and provisioning of additional UPS redundancy; coordination with building control on power switchovers.