OVHcloud Network Status

FS#17328 — sbg-4a-a9
Incident Report for Network & Infrastructure
Resolved
An error whose origin has not yet been determined is preventing us from applying equipment configurations.

% Failed to commit .. As an error (Unknown) encountered during commit operation. Changes may not have been committed:
'CfgMgr' detected the 'fatal' condition 'The Configuration Manager has encountered a file i/o error.': No such file or directory

We are investigating.

Update(s):

Date: 2016-04-04 16:07:13 UTC
The router is up. We made the necessary checks with Cisco to ensure that the reload has fixed the problem: everything seems OK.
We are closely investigating the root cause with them. This case was interesting and complicated to manage: the inability to commit seems to stem from the disk of the RSP, which filled up after several crash dumps of a process (rdsfs_svr) involved in file system management. This would in turn have caused the failure of the process (cfgmgr) responsible for applying and synchronizing the configuration. It is too early to give the exact sequence of crashes and to say which caused what, but these are the first leads (it is certain, however, that these processes are involved in the incident). The inability to synchronize configurations between the 2 supervisor cards prevented a switchover, as well as a "clean" reload (because the configuration could not be saved).
We had to force a reload, which unfortunately resulted in the loss of the configuration on the router. After loading it again, everything is back to normal.
Though complicated, this incident has had no major direct impact on production since this task was opened. However, the application of some configurations was delayed.
The main impact was during the reload (3:50 am EST). Because we could not isolate the router beforehand (configuration commits were impossible), a short outage may have been observed during the time required for full recovery of the router (until 5:10 am EST), and latency was higher than under normal circumstances.
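
As a rough illustration of the suspected root cause (a disk filling up with crash dumps), here is a minimal Python sketch of a check that warns before that happens. It is not the tooling used on the router: the function names, the `*core*` file pattern, and both thresholds are hypothetical, and it assumes a general-purpose host with a Python interpreter rather than the RSP itself.

```python
import shutil
from pathlib import Path


def evaluate(total_bytes, used_bytes, dump_count,
             max_used_fraction=0.8, max_dumps=3):
    """Return a list of warnings; an empty list means nothing to report.

    Both thresholds are illustrative defaults, not vendor recommendations.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    warnings = []
    used_fraction = used_bytes / total_bytes
    if used_fraction >= max_used_fraction:
        warnings.append(f"disk {used_fraction:.0%} full")
    if dump_count >= max_dumps:
        warnings.append(f"{dump_count} crash dumps present")
    return warnings


def check_dump_dir(dump_dir, pattern="*core*"):
    """Collect disk usage and dump count for one directory, then evaluate.

    The glob pattern is an assumption about how dump files are named.
    """
    path = Path(dump_dir)
    usage = shutil.disk_usage(path)
    dump_count = sum(1 for p in path.glob(pattern) if p.is_file())
    return evaluate(usage.total, usage.used, dump_count)


if __name__ == "__main__":
    # Made-up figures: a nearly full disk holding four dumps.
    print(evaluate(total_bytes=100, used_bytes=95, dump_count=4))
```

Keeping the threshold logic in `evaluate` separate from the file system access makes it possible to exercise the rule with plain numbers.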

Date: 2016-04-04 15:56:42 UTC
Restoration in progress...

RP/0/RSP0/CPU0:ios#copy usb: running-config
Sun Apr 3 02:28:15.867 UTC
Source filename [/usb:]?sbg-4a-a9.cfg

Parsing...........................................................................

Date: 2016-04-04 15:56:04 UTC
The router has restarted without configuration. We are working to restore it.

Date: 2016-04-04 15:49:50 UTC
The switchover is impossible because the blocked process prevents the standby supervisor card from reaching the required state.

RP/0/RSP1/CPU0:sbg-4a-a9#admin redundancy switchover
Sun Apr 3 02:59:13.278 CEST
Switchover disallowed: Standby node is not ready.

We need to restart the router to get back to a stable state.
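
The decision above can be sketched as a small Python helper that reads the CLI reply and extracts the reason for the refusal. Only the "Switchover disallowed:" text comes from the output shown; the function name and the idea of scripting this check are assumptions made for illustration.

```python
REFUSAL_PREFIX = "Switchover disallowed:"


def switchover_refusal_reason(cli_output):
    """Return the refusal reason found in the CLI reply, or None.

    Hypothetical helper: it only looks for the refusal line quoted in
    this report and does not talk to any device.
    """
    for line in cli_output.splitlines():
        line = line.strip()
        if line.startswith(REFUSAL_PREFIX):
            return line[len(REFUSAL_PREFIX):].strip()
    return None


# The reply copied from the update above.
reply = """RP/0/RSP1/CPU0:sbg-4a-a9#admin redundancy switchover
Sun Apr 3 02:59:13.278 CEST
Switchover disallowed: Standby node is not ready.
"""

reason = switchover_refusal_reason(reply)
if reason is not None:
    print("Switchover refused:", reason)
```

A return value of `None` only means that no refusal line was found; it does not by itself prove that a switchover succeeded.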

Date: 2016-04-04 15:48:40 UTC
We are troubleshooting with Cisco and carrying out tests.

A process seems to be blocked. We tried to restart it without success.

If this situation does not change, we will perform a switchover, which consists of moving over to the supervisor card currently in standby.

Date: 2016-04-04 15:45:46 UTC
We are suspending all operations on sbg-4a/b-a9 to avoid configuration desynchronization.

We are also opening a case with the manufacturer to obtain more information and take the necessary corrective measures during the night.
Posted Apr 04, 2016 - 15:41 UTC