OpenShift Scenarios

OpenShift Login Scenario!

Here’s a tricky yet critical scenario to kickstart your preparation:

🔥 The Monday Morning Login Crisis: An OpenShift UPI/Disconnected Environment Goes Down.

A customer has deployed an OpenShift UPI (user-provisioned infrastructure), disconnected environment that is live and fully configured with Day 2 activities, including RBAC, node labeling, and four other essential setups. The only change they made: adding 3 additional nodes and applying the same configurations to them.

Monday Morning Nightmare:

  • OpenShift Console? Inaccessible.
  • oc login? Not working.
  • Application Services? Not running.
  • SSH Access to Nodes? Disabled.


The environment is live, and the pressure is real. All eyes are on you to resolve the issue ASAP.

🔥 The Challenge:
How do you troubleshoot and fix this scenario with no access to the console or SSH?

Feeling the heat already? Here’s your chance to think critically and showcase your troubleshooting skills.

💡 What’s Your Checklist to Resolve This Issue Quickly?

If you want to know how I resolved this exact situation, drop a comment below, and I’ll share the complete troubleshooting steps.

🔥 Let’s Learn, Grow, and Become the OpenShift Experts the industry needs!

Problem Determination Checklist:

  1. Check whether SSH login works – no, as already stated.
  2. Check whether oc login works – no, it does not.
  3. Export KUBECONFIG (the installer-generated admin kubeconfig) and check whether you can bypass the OAuth login – no, still not working.
  4. Check whether the nodes are running – you can’t check this from the cluster side, since oc login is not working.
  5. Check whether the cluster operators are running – again, not possible while oc login is failing.
  6. Ping each node and check whether you get a response – working, meaning the nodes are alive and responding.
  7. Check that the api, api-int, and *.apps endpoints resolve properly using the dig and dig -x commands – working as expected.
  8. Telnet to api, api-int, and *.apps on their respective ports (80, 443, 6443, and 22623) – working as expected; this proves traffic reaches the LB, and at this moment that is all we can verify from outside.
  9. Check which endpoints (nodes) are listed on the LB for the respective ports and verify the list is correctly defined – at first glance this looks irrelevant because this list rarely changes, but it is possible.
  10. Check where port 443 resolves (ingress) – that is, which node is actually serving port 443 (see the command sketch just after this list).
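
To make steps 2–3 and 6–8 concrete, here is a minimal command sketch as you would run it from a bastion host. The hostnames, node IP, and kubeconfig path are placeholders for your own environment, not values from this incident:

    # Steps 2-3: try oc login, then bypass the OAuth flow with the
    # installer-generated admin kubeconfig (path is a placeholder)
    oc login https://api.cluster.example.com:6443 -u kubeadmin    # fails in this scenario
    export KUBECONFIG=/path/to/install-dir/auth/kubeconfig
    oc get nodes                                                  # also fails here

    # Step 6: confirm the nodes themselves are alive
    ping -c 3 master-0.cluster.example.com

    # Step 7: forward and reverse DNS for the cluster endpoints
    dig api.cluster.example.com
    dig api-int.cluster.example.com
    dig console-openshift-console.apps.cluster.example.com
    dig -x 10.0.0.10    # reverse lookup of a node IP (placeholder)

    # Step 8: confirm the LB answers on the cluster ports
    telnet api.cluster.example.com 6443        # Kubernetes API
    telnet api-int.cluster.example.com 22623   # machine config server
    telnet console-openshift-console.apps.cluster.example.com 443   # ingress HTTPS
    telnet console-openshift-console.apps.cluster.example.com 80    # ingress HTTP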

Problem: The newly added nodes carried the same infra label as the existing infra nodes, so the ingress (router) pods were rescheduled onto the new nodes. Since the LB had no firewall access to the new nodes, the ingress pods were unreachable.
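
For reference, once API access is restored (see the solution below), the misplacement itself can be confirmed with standard oc commands. A short sketch, assuming the common node-role.kubernetes.io/infra label convention and the default ingress controller:

    # Where did the ingress (router) pods actually land?
    oc get pods -n openshift-ingress -o wide

    # Which nodes currently carry the infra label the router schedules onto?
    # (label key is the common convention; yours may differ)
    oc get nodes -l node-role.kubernetes.io/infra

    # Then compare those node IPs against the LB backend list for ports 80/443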

Solution: We first pointed the LB at the new nodes (ports 443 and 80) to make the cluster accessible again – this temporary change was necessary just to regain access. Once we were back in, we removed the infra label from the new nodes, let the ingress pods reschedule back onto the old infra nodes, and reverted the LB endpoint list to the old infra nodes.
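
A sketch of that recovery sequence, again assuming the node-role.kubernetes.io/infra label and the default router-default deployment; node names are placeholders:

    # 1. On the LB itself: temporarily point the 80/443 backend pools at the new nodes

    # 2. With cluster access restored, remove the infra label from the new nodes
    oc label node new-node-1 node-role.kubernetes.io/infra-
    oc label node new-node-2 node-role.kubernetes.io/infra-
    oc label node new-node-3 node-role.kubernetes.io/infra-

    # 3. Restart the router pods so they reschedule onto the old infra nodes
    oc -n openshift-ingress rollout restart deployment/router-default

    # 4. Finally, revert the LB 80/443 endpoint list to the old infra nodes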

Share your thoughts in the comments!

WhatsApp Dhinesh Kumar (+91 9444410227) if you are looking for one to one OpenShift Learning.