As the conversation about Network Automation flows around us, this topic seems to be getting some traction (and quite a reaction!).
- Have you disabled the CLI?
- Should we disable the CLI?
- How long before the CLI is disabled?
I don’t believe those are the questions we should be asking, but they made me examine why they might be asked.
I’ll walk you through my thought process. Like most network engineers I immediately went to the ‘HOW’.
Hey, we are, at the core, problem solvers (and yet we always start with “how” and not with “what problem are we trying to solve?”).
HOW, WHAT, & WHO
To clarify the “HOW”, I wanted to define the CLI for myself.
The CLI is the method via which we interact with discrete network devices.
So what does interact actually mean?
Ultimately I boiled it down to these two functions*:
| Execute read-only commands to display configuration and protocol state | Other Network Devices or Systems |
| Execute write commands to provision, upgrade, add new services, and apply configuration updates | Other Network Devices or Systems |
* For the purposes of this discussion at least
How am I going to disable the CLI and what happens when I do?
Now that I clarified in my own mind what functionality I was getting from the CLI, it allowed me to think logically about the question.
OK. So let’s disable the CLI. A device gets a local account for emergency use only. All individual accounts are disabled. All configuration must be done via some type of automation or automation framework (say Ansible) using a service account only available to the framework. How about using SNMP SETs, at least post provisioning? I am by no means advocating this, but as engineers we have to look at the complete landscape and ask why existing options did not get widespread adoption.
What have we fixed? What have we broken? (Troubleshooting is not going to be fun)
It is at this point (should I survive disabling read-only CLI access) that I remember that I need to curb the instinct to start with the how and instead start with the why.
Why do we need to disable the CLI?
Let’s ask my favorite question:
WHAT PROBLEM ARE WE TRYING TO SOLVE?
A CLI is not inherently bad. Interfaces are an essential part of our everyday life.
So what’s the problem?
- CLI interfaces were designed for human interaction. They are slow and do not value consistency or standardization in their output within and across vendors.
- The output itself is designed to be consumed by a human, so the things that make us happy (neat rows, headings, and columns in an “easy to read” format) are irrelevant to programmatic consumption.
- The CLI cannot prevent you from causing an outage or from executing configuration commands that don’t comply with Enterprise guidelines or standards. The CLI offers syntax checking but typically no value past that. That is, as long as the syntax is correct, the interface does not care about the data or the configuration payload.
So yes…the CLI was the means via which I caused a network outage but not the actual cause.
I’m not sure the CLI will go away in my lifetime (or ever) but it will change form and likely become less “valuable” and critical to operations. We have been seeing that for some time as network devices offer new and more “programmatic” ways to interact (and not with humans).
So let’s summarize the problems with the CLI:
| Speed | The CLI is designed to interact with humans and not programs. It’s slow. | other machine-friendly interfaces and protocols |
| Lack of structured data | Since CLI output was designed to be consumed by humans, at best we can call it semi-structured data, and if a network device does not offer any other options we have to parse the output to make it consumable by our automation. | other machine-friendly interfaces and protocols |
| Enforces Syntax Only | As long as you enter valid syntax you are at liberty to enter anything else. |
Lightbulbs going off, anyone?
Speed and Lack of structured data
With the wealth of automation tools available to us, Speed and Lack of Structured data don’t seem to be very daunting issues. Today, I can still parse state information and execute a configuration payload via a script faster (while doing more, doing it consistently, and not forgetting any steps) than a network engineer can manually.
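To make that concrete, here is a minimal sketch of turning human-oriented output into structured data. The sample text is made up in the style of a VLAN table (it was not captured from a real device), and the regular expression and the parse_vlan_table name are my own assumptions. Real output varies by vendor and platform, which is exactly the problem.

```python
import re

# Made-up sample in the style of a VLAN table; not real device output.
SAMPLE_OUTPUT = """\
VLAN Name                             Status    Ports
---- -------------------------------- --------- ----------------
1    default                          active    Gi0/1, Gi0/2
200  VENDING_10.1.200.0               active    Gi0/3
300  SIGNAGE_10.1.44.0                active
"""

# One data row: numeric VLAN ID, a name without spaces, a status word,
# then an optional comma-separated port list.
ROW = re.compile(
    r"^(?P<vlan_id>\d+)\s+(?P<name>\S+)\s+(?P<status>\S+)\s*(?P<ports>.*)$"
)


def parse_vlan_table(text):
    """Turn human-oriented table text into a list of dictionaries."""
    vlans = []
    for line in text.splitlines():
        match = ROW.match(line)
        if not match:
            continue  # skip the heading, the ruler line, and blank lines
        ports = [p.strip() for p in match["ports"].split(",") if p.strip()]
        vlans.append({
            "vlan_id": int(match["vlan_id"]),
            "name": match["name"],
            "status": match["status"],
            "ports": ports,
        })
    return vlans


parsed = parse_vlan_table(SAMPLE_OUTPUT)
```

Once the rows are dictionaries, a script can filter, compare, and report on them without caring how the columns were lined up for a human.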
Enforces Syntax Only
The fact that I entered a perfectly valid, syntactically correct command which did not follow my company’s guidelines, or caused an outage, or both, seems to be worthy of further consideration.
Take these guidelines as an example:
- Any VLAN supporting Digital Signage should use a VLAN ID in the 300 range
- Any VLAN supporting Vending Machines should use a VLAN ID in the 200 range
- All VLANs should have names which include function and subnet
And this request:
Please configure a Digital Signage VLAN and a Vending Machine VLAN on switch X.
Little Mary configuring Vlans
Little Joey configuring Vlans
Little Jenny (always in hurry) configuring Vlans
Little Mary and Little Joey technically complied with enterprise standards but their configurations are still different. I think Little Jenny tried but …
While the CLI provides the means for this, it is not culpable for applying configuration that is non-standard or wrong. Can this be addressed with QA? Sure, to some degree. So why do many networks exhibit some form of this “configuration drift”? QA/Auditing after the fact is expensive and (usually) insufficient if it’s your only means of enforcing naming and numbering standards, and it may not help at all with misconfigurations. Doing it right the first time must be the goal, as it sets the stage for automation across the entire workflow.
Even if the guidelines were more specific, you still have to deal with Little Jenny whose issue was not a lack of understanding or interpretation of the guidelines but a lack of time and attention.
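One way to take time and attention out of the equation is to check a proposed VLAN against the guidelines before anything reaches a device. This is only a sketch: the VLAN_RANGES table, the check_vlan function, and especially the FUNCTION_network naming pattern are my own assumptions about how the three guidelines above could be made specific enough to test.

```python
import ipaddress

# Hypothetical encoding of the numbering guidelines above.
VLAN_RANGES = {"SIGNAGE": range(300, 400), "VENDING": range(200, 300)}


def check_vlan(function, vlan_id, name, subnet):
    """Return a list of guideline violations (an empty list means compliant)."""
    problems = []

    allowed = VLAN_RANGES.get(function)
    if allowed is None:
        problems.append(f"unknown function {function!r}")
    elif vlan_id not in allowed:
        problems.append(
            f"VLAN {vlan_id} is outside the {allowed.start} range for {function}"
        )

    try:
        network = ipaddress.ip_network(subnet)
    except ValueError:
        problems.append(f"{subnet!r} is not a valid subnet")
    else:
        # Assumed naming pattern: FUNCTION_network-address.
        expected = f"{function}_{network.network_address}"
        if name != expected:
            problems.append(f"name {name!r} should be {expected!r}")

    return problems


good = check_vlan("SIGNAGE", 300, "SIGNAGE_10.1.44.0", "10.1.44.0/24")
rushed = check_vlan("VENDING", 310, "vending machines", "10.1.200.0/24")
```

Here good comes back empty, while rushed reports both the wrong range and the free-form name. The point is not this particular pattern but that a rule a program can test leaves nothing to individual interpretation.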
So we’ve just committed murder (killed the CLI) but we didn’t actually kill the culprit.
It’s not the user interface but the payload.
The CLI does not misconfigure the network; humans misconfigure the network.
I don’t mean for that to sound harsh, but it’s undeniable that given a set of high-level guidelines, individual network engineers will do things their own way, even within those guidelines, if anything is left to choice. An individual may be consistent, but take a team spread across geographies, languages, and business units, without detailed instructions, specific templates, and time to execute accurately, and consistency (standards) will suffer.
Well, now we can have a meaningful and clear discussion about disabling the CLI. I’m talking about disabling the ability to use the CLI to change the behavior of the device (configuration). In thinking it through, the read-only aspect of the CLI (what we often use for troubleshooting and verification) is not really causing major issues today. Let us focus on the write aspect of the CLI.
In fact, we need to put the CLI fully into the context of the broader network change workflow. I submit to you that step 6 below is not the issue; step 2 is. At least step 2 is a really great place to start. If we don’t address configuration consistency and accuracy (step 2), step 6 is just a different means to the same problem we have today: inconsistent and possibly incorrect configuration.
Can I misconfigure my network with Ansible (for example) instead of the CLI? Yes. We’ve just moved the problem around.
I don’t deny there could be benefits to getting started with an automation framework but it is not the transformative change that everyone is expecting from Network Automation. For that level of change we have to look at all the moving parts and how they work together.
1. Change Details (Change Requirements or “Design”)
Where is this data stored? As long as the network engineer can determine these without validation (before step 6) we are at the mercy of Little Jenny.
2. I like to call these ‘configlets’. As long as the network engineer can generate these (often in the dreaded Notepad) without validation, we are at the mercy of their attention to detail and consistency.
3. With any change, there must be a step to quantify the impact to users, services, and other systems.
4. Initiate Change Management
Most organizations of sufficiently large size and rigor will have a Change Management process which must be part of this workflow.
5. State Analysis and Verification
At some point, manually or via some other means, current state must be checked and assessed against the requested change. Today, this step is incredibly important because often the “standard state” you expect is actually not what has been implemented. By focusing on the consistency of the configuration payload, this issue will improve over time.
6. Executing the commands in the configlet (or just typing in the commands off the top of your head) via the CLI or any other means turns out to be a pretty small piece of this workflow seen in this context!
7. Test, Verification, QA
After step 6, either manually or via some other means, the change must be assessed for
– Test: Functionality – is it working?
– Verification: Was this the expected change – do the commands applied match the configlet developed in step 2?
– Quality Assurance: Does the new configuration deviate from the Enterprise standards?
8. Close Out Activities
There should always be close-out activities, manual or via automation, to document steps 5, 6, and 7, update the customer, and follow up on any impacts noted in step 3.
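The verification question in step 7 (do the commands applied match the configlet developed in step 2?) lends itself to a simple comparison. The sketch below uses Python’s standard difflib module. The normalize and verify_change helpers are hypothetical names of mine, and stripping indentation is a simplification that ignores configuration hierarchy.

```python
import difflib


def normalize(config_text):
    """Strip whitespace and drop blank lines and '!' separator lines."""
    lines = []
    for raw in config_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("!"):
            lines.append(line)
    return lines


def verify_change(configlet, applied):
    """Return unified-diff lines; an empty list means the change matches."""
    return list(difflib.unified_diff(
        normalize(configlet),
        normalize(applied),
        fromfile="configlet",
        tofile="applied",
        lineterm="",
    ))


configlet = "vlan 300\n name SIGNAGE_10.1.44.0\n"
applied_ok = "!\nvlan 300\n name SIGNAGE_10.1.44.0\n!\n"
applied_bad = "vlan 300\n name signage\n"

matches = verify_change(configlet, applied_ok)
drift = verify_change(configlet, applied_bad)
```

An empty result for matches is the evidence you attach to the change record, and anything in drift is a conversation to have before the change is closed.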
Conclusion & Next Steps
Killing the CLI is like killing the Messenger.
So it’s the message we have to work on!
Most Enterprises already have templates (usually in a Word document or a text file with lots of notes on what to do to varying degrees of specificity and with the expectation that they are manually filled in for a specific device).
Take your text templates and
1. Turn them into Jinja Templates.
2. Turn those notes into programmatic logic.
3. Put them under revision control.
4. Automate your configuration payloads!
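As a rough illustration of steps 1 and 2, here is a text template with its “notes” turned into code. To keep the sketch runnable with nothing but the standard library, it uses Python’s string.Template as a stand-in for Jinja; in a Jinja template the placeholders would be written as {{ vlan_id }} rather than $vlan_id. The template text, the render_vlan function, and the naming pattern are all assumptions for illustration.

```python
from string import Template

# Stand-in for a Jinja template; placeholders use $name syntax here.
VLAN_CONFIGLET = Template(
    "vlan $vlan_id\n"
    " name ${function}_$network\n"
)


def render_vlan(function, vlan_id, network):
    """Fill in the template; substitute() raises KeyError if a value is missing."""
    # The note "names are upper case" becomes logic instead of a comment.
    return VLAN_CONFIGLET.substitute(
        function=function.upper(),
        vlan_id=vlan_id,
        network=network,
    )


configlet = render_vlan("signage", 300, "10.1.44.0")
print(configlet)
```

Everyone who renders this template gets the same two lines for the same inputs, and once the file is under revision control you can see who changed the standard and when.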
Real World Example
I run across many enterprises who are not comfortable pushing configuration via Automation. (Note: this analysis may help reframe the conversation a little bit)
In this situation, I always recommend using automation “around” the configuration process.
Use automation to generate:
– the pre checks,
– the required commands (“configlets”),
– the change request documentation, and
– the specific post checks for the change.
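A minimal sketch of what generating those four artifacts from one set of inputs might look like. The build_change_package function and the shape of its result are hypothetical, and the show commands are Cisco IOS-style examples; substitute whatever checks your platform and process call for.

```python
def build_change_package(switch, vlan_id, vlan_name):
    """Assemble everything a VLAN-add change needs from one set of inputs."""
    show_vlan = f"show vlan id {vlan_id}"
    return {
        # Run before the change to record the starting state.
        "pre_checks": [show_vlan, "show interfaces trunk"],
        # The standard commands to apply, by hand or otherwise.
        "configlet": [f"vlan {vlan_id}", f" name {vlan_name}"],
        # Run after the change and compare with the pre-checks.
        "post_checks": [show_vlan, "show interfaces trunk"],
        # Text for the change request.
        "change_request": f"Add VLAN {vlan_id} ({vlan_name}) on {switch}.",
    }


package = build_change_package("switch-x", 300, "SIGNAGE_10.1.44.0")
for command in package["configlet"]:
    print(command)
```

Because the checks, the commands, and the documentation all come from the same inputs, they cannot disagree with each other about which VLAN is being added.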
No “ad hoc” “let me just type in the commands from memory” configuration should be allowed. So here, while we use the CLI to apply the commands, the “configlets” are predefined templates and completely standard.
At some point this will start to change and Enterprises will be more comfortable with automated updates. With this approach we have already automated a critical part of the workflow.
Is this my preferred approach? No, but let me just say this: the number of times I’ve killed access to a switch because I forgot to include “add” in my vlan trunk statement has decreased dramatically.
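That particular mistake is easy to catch mechanically before a configlet is ever pasted in. The sketch below flags Cisco IOS-style trunk statements that replace the allowed VLAN list instead of adding to it. The pattern and the find_trunk_overwrites name are my own, and since an overwrite is sometimes intended (initial provisioning, for example) the result is a prompt for review rather than a hard failure.

```python
import re

# Matches a trunk statement whose argument is a bare VLAN list, meaning it
# replaces the allowed list rather than using add/remove/except/all/none.
OVERWRITE = re.compile(r"^\s*switchport trunk allowed vlan\s+\d[\d,\-\s]*$")


def find_trunk_overwrites(configlet):
    """Return (line_number, text) pairs for statements that overwrite the list."""
    return [
        (number, line.strip())
        for number, line in enumerate(configlet.splitlines(), start=1)
        if OVERWRITE.match(line)
    ]


configlet = (
    "interface GigabitEthernet0/1\n"
    " switchport trunk allowed vlan 300\n"
    " switchport trunk allowed vlan add 200\n"
)
flagged = find_trunk_overwrites(configlet)
```

Only line 2 of the sample is flagged; the statement with “add” passes.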
Baby steps. In fact, if you have or are going to implement these recommendations, welcome to your first steps towards infrastructure as code.
Examples always help me (and clearly you deserve something if you are still with me) so to that end, there is a GitHub Repository config_as_code to illustrate what some of this might look like.
A Parting Consideration
An interface by definition has at least two sides. Here I only examined one side, the CLI itself. If we are to make informed decisions about the complete automation of the network change workflow I’ve shown above we will need to consider what the human side of the CLI brings to the table! Chew on that for a while. I know I will.