May 24th 2022 - Delay in Console / NetFoundry API Services

 

[ISSUE IDENTIFIED - 23 May 2022 09:45 UTC] We have been experiencing unexpected delays in our provisioning process (www.nfconsole.io and our console APIs). As a result, creating or editing components such as Networks, Edge Routers, Endpoints, and Services in our management and operations platform may be delayed or may temporarily fail. This issue is a recurrence of an incident we had on 18 May 2022. There is no impact on the runtime operations of customer networks.

 

[TEMPORARY FIX - 24 May 2022 13:02 UTC] Provisioning actions have been restored, and unfinished processes have been cleared from the process database. These stranded processes will not complete. Customers who see an operation that has not completed under Management Networks -> Network process should restart that operation from the beginning.

 

[ISSUE INVESTIGATION AND RESOLUTION PROGRESS - 24 May 2022 16:25 UTC]

We have found an issue where creating an Edge Router and then quickly deleting it caused a process to run for an extended period (20 minutes).
A fix for this issue is ready and will be deployed later today. This issue was discovered by our customer Island.

We have identified a bug in Service creation and updates that causes those processes to stall. Fixes for this bug are in progress. We are still determining whether it is the root cause of the other delays, such as Endpoint provisioning delays. Correcting it requires further development, so the fix is not yet ready to deploy. The team is treating this as its highest priority.

 

[DEPLOYMENT TO PRODUCTION FOR POTENTIALLY HANGING PROCESSES - 24 May 2022 19:50 UTC]

We have deployed a fix to production that addresses multiple cases in which resource provisioning can become stuck. We will continue to closely monitor resource provisioning for any additional causes of unexpected delays.

 

[RESOLUTION PROGRESS - 27 May 2022 14:30 UTC]

We have deployed a fix that shortens timeouts and makes certain failures time out the process instead of triggering unnecessary retries. Failed controller updates will still be retried.

 

 

 
