Using Python and Pandas to look at Pandemic Data

The script and supporting files in this repository are intended to show how the Python Pandas module can be used to analyze data, specifically COVID-19 data.

I am going to recommend 3 data sets to “investigate”:

Background

WHO Data

The repository comes with the WHO data file from 06 April 2020 (WHO-COVID-19-global-data.csv). The simplest run of the script will use this WHO data file.

To download the latest file, go to the WHO Overview Map and download the Map Data from the link on the lower right-hand side.

This CSV file will need cleanup. Remove the spaces from the column titles. Some rows have spaces in the country names (Belize and Palestine, for example), which shifts values into the wrong columns; you will need to recombine the name and shift the data back into the correct columns. Welcome to the world of data.
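If you are doing this cleanup in Pandas itself, a minimal sketch might look like the following. The column names are assumptions based on the data frame shown later in this post, so adjust them to match your downloaded file.

import pandas as pd

# Load the freshly downloaded WHO file
df = pd.read_csv("WHO-COVID-19-global-data.csv")

# Remove spaces from the column titles (e.g. " Cumulative Confirmed" -> "CumulativeConfirmed")
df.columns = df.columns.str.replace(" ", "")

# Spot-check the rows known to shift because of spaces in the country name;
# these still need to be repaired (by hand or with custom logic) before analysis
print(df[df["CountryName"].isin(["Belize", "Palestine"])])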

[Screenshot: downloading the Map Data from the WHO Overview Map]

Johns Hopkins University (JHU) Center for Systems Science and Engineering (CSSE) Data

The Johns Hopkins University CSSE data is widely used in the media and either drives or is incorporated into many other data sets.
More importantly for our purposes, this wonderful institution of higher learning makes the raw data available on a public repository (GitHub).

CSSEGISandData on GitHub

I’ve cloned the repository so that it sits as a subdirectory in my pandas_for_pandemic_data folder and I refresh it every day.

# Clone the pandas_for_pandemic_data repository
git clone https://github.com/cldeluna/pandas_for_pandemic_data.git

# Change into the pandas_for_pandemic_data repository
cd pandas_for_pandemic_data

# Clone the Johns Hopkins University CSSE data repository
git clone https://github.com/CSSEGISandData/COVID-19.git

# Refresh the JHU Data
cd COVID-19
git pull
# Example of refreshing the JHU repository
Claudias-iMac:COVID-19 claudia$ git pull
remote: Enumerating objects: 148, done.
remote: Counting objects: 100% (148/148), done.
remote: Compressing objects: 100% (13/13), done.
remote: Total 252 (delta 135), reused 140 (delta 135), pack-reused 104
Receiving objects: 100% (252/252), 1.25 MiB | 6.51 MiB/s, done.
Resolving deltas: 100% (157/157), completed with 14 local objects.
From https://github.com/CSSEGISandData/COVID-19
865c933c..f3dea791 master -> origin/master
513b21a4..493821d3 web-data -> origin/web-data
Updating 865c933c..f3dea791
Fast-forward
csse_covid_19_data/UID_ISO_FIPS_LookUp_Table.csv | 7141 ++++++++++----------
…/csse_covid_19_daily_reports/04-06-2020.csv | 2810 ++++++++
…/time_series_covid19_confirmed_US.csv | 6508 +++++++++---------
…/time_series_covid19_confirmed_global.csv | 527 +-
…/time_series_covid19_deaths_US.csv | 6508 +++++++++---------
…/time_series_covid19_deaths_global.csv | 527 +-
…/time_series_covid19_recovered_global.csv | 499 +-
7 files changed, 13668 insertions(+), 10852 deletions(-)
create mode 100644 csse_covid_19_data/csse_covid_19_daily_reports/04-06-2020.csv
Claudias-iMac:COVID-19 claudia$


Feel free to put it elsewhere in your directory structure. The script sets the default path in the arguments section at the bottom. You can either update the default path directly or use the -d option when you execute the script to redirect it to look there for the daily files.
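For reference, here is a sketch of how that default path and the -d option are likely wired up with argparse; the actual arguments section of todays_totals.py may differ in its details.

import argparse

parser = argparse.ArgumentParser(description="Script Description")
parser.add_argument(
    "-d",
    "--daily_reports_folder",
    help="Set path to the CSSE daily reports folder",
    default="./COVID-19/csse_covid_19_data/csse_covid_19_daily_reports",
)
args = parser.parse_args()
print(f"Looking for daily files in {args.daily_reports_folder}")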

[Screenshot: the default path set in the arguments section of the script]

New York Times Data


The New York Times has also shared their data. This repository only contains data for the US. They share two flavors:

  • US State Level data
  • US County Level data

They do a good job of keeping the data set very clean. It's all numeric, and so far I've not seen any missing data, which is rare for any data set.

== Number of MISSING values in each column:
date 0
state 0
fips 0
cases 0
deaths 0
dtype: int64

New York Times US Data GitHub Repository

I took the same approach with this repository as I did for the JHU data. I’ve cloned the repository so that it sits as a subdirectory in my pandas_for_pandemic_data folder and I refresh it every day.

Script

In general, the script will take in a CSV data file, turn it into a Pandas Data Frame, and execute a set of commands against the data. The main section manages which options to execute and sends the relevant data frame to a function that prints out the various analysis statements for that data frame. Here is what will be shown for each data frame:

  • Describe the data (pandas method showing interesting statistical facts about the data)
  • Show the shape of the data frame (number of rows and columns)
  • Show the first and last 5 lines of data
  • List the column headings
  • Show the data type of each column
  • Look for the total number of missing values in each column
  • Sum the columns (only makes sense for columns holding numeric data)

The various options let you control which data set you want to investigate and filter. The output is sent to your screen. These are just some of the actions available to you with Pandas. Once you have the data in a Pandas data frame you can query the data frame for the data that is meaningful to you.
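As a rough sketch of what those steps look like in plain Pandas (the script wraps them in a function, and the details may differ):

import pandas as pd

df = pd.read_csv("WHO-COVID-19-global-data.csv")

print(df.describe())               # statistical summary of the numeric columns
print(df.shape)                    # (rows, columns)
print(df.head())                   # first 5 rows
print(df.tail())                   # last 5 rows
print(df.columns.values)           # column headings
print(df.dtypes)                   # data type of each column
print(df.isnull().sum())           # number of missing values in each column
print(df.sum(numeric_only=True))   # sum of the numeric columns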

Cheatsheet for the todays_totals.py script

python todays_totals.py -h
  Display all the options available (Help).

python todays_totals.py
  WHO Data. Without any options, the script will load the local WHO data file from 6 April into a Pandas Data Frame and run some commands to investigate the data.
  Reminder: If you download a fresh WHO CSV file, please note the cleanup steps listed above so that you can cleanly import the CSV into a Data Frame.

python todays_totals.py -c "MX"
  WHO Data filtered for a specific country.
  Note: use the 2-letter country code as an argument with the -c option.

python todays_totals.py -t
  Johns Hopkins University CSSE Data. The -t option will look for today's daily log file in the JHU CSSE repository (remember to clone the repository).

python todays_totals.py -t -c "Mexico"
  Johns Hopkins University CSSE Data. The -t -c "country or region" option lets you filter for a country.

python todays_totals.py -t -s "California"
  Johns Hopkins University CSSE Data. The -t -s "state" option filters the JHU data set for a state or province.

python todays_totals.py -t -f 06037
  Johns Hopkins University CSSE Data. The -t -f FIPS option filters the JHU data set for a FIPS county code.
  Note: FIPS code 06037 is for Los Angeles County.

python todays_totals.py -n
  New York Times Data. US totals only for the full NY Times data set with the -n option (remember to clone the repository).

python todays_totals.py -n -f 6
or
python todays_totals.py -n -p "California"
  New York Times Data. This data set has both "state" and "fips", but here fips represents the FIPS state code, so in this example 6 is the FIPS state code for California. This should get you exactly the same data as python todays_totals.py -n -p "California".

Script CLI Cheat Sheet

The todays_totals.py script will give you an idea of how to load pandemic data into a Pandas data frame and interrogate the data. Executing it with the -h option will give you help on the options.

(pandas) Claudias-iMac:pandas_for_pandemic_data claudia$ python todays_totals.py -h
usage: todays_totals.py [-h] [-d DAILY_REPORTS_FOLDER] [-c COUNTRY_REGION]
[-p PROVINCE_STATE] [-s SPECIFIC_DAY] [-f FIPS] [-w]
[-t] [-n]
Script Description
optional arguments:
-h, --help show this help message and exit
-d DAILY_REPORTS_FOLDER, --daily_reports_folder DAILY_REPORTS_FOLDER
Set path to CSSE Dailty Report folder
csse_covid_19_daily_reports. Default is ./COVID-19/css
e_covid_19_data/csse_covid_19_daily_reports
-c COUNTRY_REGION, --country_region COUNTRY_REGION
Filer on 2 letter Country Region. Example: "US"
-p PROVINCE_STATE, --province_state PROVINCE_STATE
Filer on Province State. Example: "California"
-s SPECIFIC_DAY, --specific_day SPECIFIC_DAY
File for specific day. Example: 04-01-2020
-f FIPS, --fips FIPS FIPS County Code Example: 06037 (Los Angeles County)
-w, --who_data_file Analyze the WHO data file provided
-t, --today_csse Analyze todays file in the CSSE repo
-n, --new_york_times Analyze the New York Times Data
Usage: 'python todays_totals'

Running the script without any parameters yields some information on the WHO data set from 6 April, which is part of the repository. This information includes:

  • A description of the data frame, including some statistical data on the numeric values
  • The number of rows and columns (the shape of the data frame)
  • The first and last 5 rows of data, the column headings, and the data type of each column
  • The number of missing values in each column and the sums of the numeric columns

Example output for WHO Data:

(pandas) Claudias-iMac:pandas_for_pandemic_data claudia$ python todays_totals.py
==================== DATA FRAME CHECK ====================
==================== WHO Data Frame from WHO-COVID-19-global-data.csv ====================

== Describe the Data Frame:
Deaths CumulativeDeaths Confirmed CumulativeConfirmed
count 6786.000000 6786.000000 6786.000000 6786.000000
mean 9.971412 104.946360 178.487179 2313.689213
std 73.268455 761.569677 1183.935476 12918.006118
min 0.000000 0.000000 0.000000 1.000000
25% 0.000000 0.000000 0.000000 4.000000
50% 0.000000 0.000000 2.000000 26.000000
75% 0.000000 3.000000 25.000000 235.000000
max 2003.000000 15889.000000 33510.000000 307318.000000

== Shape of the Data Frame:
(6786, 8)

== SAMPLE (first and last 5 rows):
day Country CountryName Region Deaths CumulativeDeaths Confirmed CumulativeConfirmed
0 2/25/20 AF Afghanistan EMRO 0 0 1 1
1 2/26/20 AF Afghanistan EMRO 0 0 0 1
2 2/27/20 AF Afghanistan EMRO 0 0 0 1
3 2/28/20 AF Afghanistan EMRO 0 0 0 1
4 2/29/20 AF Afghanistan EMRO 0 0 0 1
day Country CountryName Region Deaths CumulativeDeaths Confirmed CumulativeConfirmed
6781 4/2/20 ZW Zimbabwe AFRO 0 1 0 8
6782 4/3/20 ZW Zimbabwe AFRO 0 1 0 8
6783 4/4/20 ZW Zimbabwe AFRO 0 1 1 9
6784 4/5/20 ZW Zimbabwe AFRO 0 1 0 9
6785 4/6/20 ZW Zimbabwe AFRO 0 1 0 9

== Column Headings of the data set:
['day' 'Country' 'CountryName' 'Region' 'Deaths' 'CumulativeDeaths'
'Confirmed' 'CumulativeConfirmed']

== Column default data type:
day object
Country object
CountryName object
Region object
Deaths int64
CumulativeDeaths int64
Confirmed int64
CumulativeConfirmed int64
dtype: object

== Number of MISSING values in each column:
day 0
Country 85
CountryName 0
Region 62
Deaths 0
CumulativeDeaths 0
Confirmed 0
CumulativeConfirmed 0
dtype: int64

== Sum just the numeric columns in the Data Frame:
Deaths 67666
CumulativeDeaths 712166
Confirmed 1211214
CumulativeConfirmed 15700695
dtype: int64

(pandas) Claudias-iMac:pandas_for_pandemic_data claudia$

Nornir – A New Network Automation Framework

nornir (formerly brigade) – A new network automation framework

Before getting started, let me say that I'm a big fan of Ansible. It is one of my go-to automation frameworks. Having said that, there have been use cases where I've run into some of the limitations of Ansible and, to be fair, some of those limitations may have been my own.

By limitations, I don't mean that I could not do what I wanted, but rather that doing what I wanted got a bit more complex than perhaps it should have been; when I go back to it in 6 months, I'll have no idea how I got it to work. These use cases often involve more complex logic than what Ansible handles well with its "simpler" constructs and Domain Specific Language (DSL).

So I was very intrigued to hear that the following automation heavyweights have been working on a new automation framework, Nornir:

  • David Barroso (think NAPALM – the python library not the sticky flammable stuff)
  • Kirk Byers (think Netmiko, Netmiko tools, and teacher extraordinaire – I can say that with full confidence as I’m pretty sure I’ve taken every one of his courses – some twice!)
  • Patrick Ogenstad (think NetworkLore)

Nornir Documentation

As an engineer, one of my favorite questions is "What problem are we trying to solve?" Here is my answer to that question when applied to Nornir.

Simpler, more complex logic

An oxymoron?  Certainly. Read on and I will try to explain.

By using pure Python, this framework solves the complex logic frustrations you may ultimately encounter with Ansible. If you can do it with Python, or have done it with Python, you are good to go. That's not to say you can't take your Python script and turn it into an Ansible module, but Nornir may save you that step.

Domain specific languages can be both a blessing and a curse. They can allow you the illusion that you are not programming and so facilitate getting you started if "programming" is not your cup of tea. They can help you get productive very quickly, but eventually you may hit a tipping point where the cost of doing what you need to do with the tools and features in the DSL is too high in terms of complexity and supportability. Nornir "simplifies" that complex logic by allowing you access to all the tools you have in native Python. As a side, and not insignificant, benefit, you might actually remember what your code does when you get back to it in 6 months.

Native on all platforms

Many companies only provide Windows laptops and so I’ve always tried to be very mindful of that when developing solutions. 

Scenario: I've got all of these Ansible playbooks that we can use to standardize network discovery, build configurations, and apply configurations, but most of my colleagues have Windows laptops. While I was able to develop and run these on my Mac, where I can easily run an Ansible control server, we now need to get an Ansible control server onto a bunch of Windows laptops, and that is not natively supported by Ansible.

There are certainly solutions for this (see Using Docker as an Ansible and Python platform for Network Engineers) but that’s an extra step. There may be other reasons for taking that step but Nornir is a pip installable module and so you don’t need to.

I spent a Sunday afternoon dabbling in Nornir and it was well worth the time. It took me about 45 minutes to get things set up on my Windows system and run the example Patrick Ogenstad included in his post. While Nornir is said to support Python 2.7 (but recommends Python 3.6) I did have installation issues even with the latest pip installed. That was a significant part of the 45 minutes. Once I set up a Python3 virtual environment it worked flawlessly. You can see my work in this GitHub repository.

This is an exciting new framework with a great deal of promise which we can add to our automation arsenal!

Over the next few weeks (or months ) I’ll continue to familiarize myself with Nornir and report back.

This was originally published May 7, 2018 on LinkedIn but has been updated to support the latest Nornir, and the scripts have been renamed so that there is no confusion between brigade and nornir. But let me say that this renaming is tied with the renaming of Cisco's Spark platform to Webex Teams for the worst ever, or at least of the last decade.

Original Brigade GitHub repository

Part 2 of this Series – Configuration Creation with Nornir

Pandas for Network Engineers (Who doesn't love Pandas?)

The module not the mammal!

My original title for this article was going to be *Decomposing Pandas* as a follow on to *Decomposing Data Structures* but I was advised against that name. Go figure.

One of the things I love most about Python is that it's always waiting for me to get just a little bit better so it can show me a slightly smarter way to do something. Pandas is the latest such example.

Pandas is a powerful data science Python library that excels at manipulating multidimensional data.

Why is this even remotely interesting to me as a network engineer?

Well, that's what Excel does, right?

I spend more time than I care to admit processing data in Excel. I find that Excel is always the lowest common denominator. I understand why, and often I'm a culprit myself, but eventually one grows weary of all the data being in a spreadsheet and of having to manipulate it. I'm working on the former, and Pandas is helping with the latter.

Google around enough for help on processing spreadsheets and you will come across references to the Pandas Python module.

If you are anything like me, you go through some or all of these stages:

  • You dismiss it as irrelevant to what you are trying to do
  • You dismiss it because it seems to be about big data, analytics, and scientific analysis of data (not your thing, right?)
  • As you continue to struggle with what got you here in the first place (there has got to be a better way to deal with this spreadsheet data) you reconsider. So you try to do some processing in Pandas and pull a mental muscle…and what the heck is this NaN thing that keeps making my program crash? Basically, you find yourself way way out of your comfort zone (well..I did)!
  • You determine that your limited Python skills are not up to something quite this complex…after all, you know just enough Python to do the automation stuff you need to do and you are not a data scientist.

Finally, in a fit of desperation as you see all the Excel files you have to process, you decide that a python module is not going to get the better of you and you give it another go!

So here I am, on the other side of that brain sprain, and better for it, as is usually the case.

What is possible with Pandas…

Once you get the hang of it, manipulating spreadsheet-like data sets becomes so much simpler with Pandas. In fact, that's true for any data set, not just ones from spreadsheets; in the examples below, the data set comes from parsing show commands with TextFSM.

Knowing how to work with Pandas, even in a limited fashion as is the case with me, is going to be a handy skill to have for any Network Engineer who is (or is trying to become) conversant in programmability & automation.

My goal here is not to teach you Pandas, as there is quite a lot of excellent material out there to do that. I've highlighted the content which helped me the most in the "Study Guide" section at the end.

My goal is to share what I’ve been able to do with it as a Network Engineer, what I found most useful as I tried to wrap my head around it, and my own REPL work.

Let's look at something simple. I need to get the ARP table from a device and "interrogate" the data.

In this example, I have a text file with the output of the “show ip arp” command which I’ve parsed with TextFSM.

Here is the raw data returned from the TextFSM parsing script:

 # Executing textfsm strainer function only to get data
  strained, strainer = basic_textfsm.textfsm_strainer(template_file, output_file, debug=False)

In [1]: strained                                                                                                                                                                                                            
Out[1]:
[['Internet', '10.1.10.1', '5', '28c6.8ee1.659b', 'ARPA', 'Vlan1'],
['Internet', '10.1.10.11', '4', '6400.6a64.f5ca', 'ARPA', 'Vlan1'],
['Internet', '10.1.10.10', '172', '0018.7149.5160', 'ARPA', 'Vlan1'],
['Internet', '10.1.10.21', '0', 'a860.b603.421c', 'ARPA', 'Vlan1'],
['Internet', '10.1.10.37', '18', 'a4c3.f047.4528', 'ARPA', 'Vlan1'],
['Internet', '10.10.101.1', '-', '0018.b9b5.93c2', 'ARPA', 'Vlan101'],
['Internet', '10.10.100.1', '-', '0018.b9b5.93c1', 'ARPA', 'Vlan100'],
['Internet', '10.1.10.102', '-', '0018.b9b5.93c0', 'ARPA', 'Vlan1'],
['Internet', '71.103.129.220', '4', '28c6.8ee1.6599', 'ARPA', 'Vlan1'],
['Internet', '10.1.10.170', '0', '000c.294f.a20b', 'ARPA', 'Vlan1'],
['Internet', '10.1.10.181', '0', '000c.298c.d663', 'ARPA', 'Vlan1']]

Note: don’t read anything into the variable name strained. The function I use to parse the data is called textfsm_strainer because I “strain” the data through TextFSM to get structured data out of it so I put the resulting parsed data from that function into a variable called “strained”.

Here is that data in a Pandas Data Frame:

# strained is the parsed data from my TextFSM function and the first command below
# loads that parsed data into a Pandas Data Frame called "df"
​
In [1]: df = pd.DataFrame(strained, columns=strainer.header)                                                                                                                                                                                                           
In [2]: df                                                                                                                                                                                                                                                      
Out[2]: 
​
    PROTOCOL         ADDRESS  AGE             MAC  TYPE INTERFACE
0   Internet       10.1.10.1    5  28c6.8ee1.659b  ARPA     Vlan1
1   Internet      10.1.10.11    4  6400.6a64.f5ca  ARPA     Vlan1
2   Internet      10.1.10.10  172  0018.7149.5160  ARPA     Vlan1
3   Internet      10.1.10.21    0  a860.b603.421c  ARPA     Vlan1
4   Internet      10.1.10.37   18  a4c3.f047.4528  ARPA     Vlan1
5   Internet     10.10.101.1    -  0018.b9b5.93c2  ARPA   Vlan101
6   Internet     10.10.100.1    -  0018.b9b5.93c1  ARPA   Vlan100
7   Internet     10.1.10.102    -  0018.b9b5.93c0  ARPA     Vlan1
8   Internet  71.103.129.220    4  28c6.8ee1.6599  ARPA     Vlan1
9   Internet     10.1.10.170    0  000c.294f.a20b  ARPA     Vlan1
10  Internet     10.1.10.181    0  000c.298c.d663  ARPA     Vlan1

I now have a spreadsheet like data structure with columns and rows that I can query and manipulate.


My first question:

What are all the IPs in Vlan1?

Just Python

Before Pandas, I would initialize an empty list to hold the one or more IPs and then I would iterate through the data structure (strained in this example) and where the interface “column” value (which in this list of lists in the strained variable is at index 5) was equal to ‘Vlan1’ I appended that IP to the list. The IP is in index 1 in each item the strained list.

# Using Python Only
print("\n\tUsing Python only..")
vlan1ips = []
for line in strained:
    if line[5] == 'Vlan1':
        vlan1ips.append(line[1])
print(f"{vlan1ips}")

The resulting output would look something like this:

['10.1.10.1', '10.1.10.11', '10.1.10.10', '10.1.10.21', '10.1.10.37', '10.1.10.102', '71.103.129.220', '10.1.10.170', '10.1.10.181']

Python and Pandas

Using a Pandas data frame df to hold the parsed data:

pandas_vlan1ips = df['ADDRESS'].loc[df['INTERFACE'] == 'Vlan1'].values

The resulting output from the one liner above would look something like this:

 ['10.1.10.1' '10.1.10.11' '10.1.10.10' '10.1.10.21' '10.1.10.37'
'10.1.10.102' '71.103.129.220' '10.1.10.170' '10.1.10.181']

Same output with a single command!

Python List Comprehension

For those more conversant with Python, you could say that list comprehension is just as efficient.

# Using list comprehension
print("Using Python List Comprehension...")
lc_vlan1ips = [line[1] for line in strained if line[5] == 'Vlan1' ]

Results in:

Using List Comprehension: 
['10.1.10.1', '10.1.10.11', '10.1.10.10', '10.1.10.21', '10.1.10.37', '10.1.10.102', '71.103.129.220', '10.1.10.170', '10.1.10.181']

So yes, list comprehension gets us down to one line, but I find it a bit obscure to read, and a week later I will have no idea what is in line[5] or line[1].

I could turn the data into a list of dictionaries so that, rather than using positional indexes, I could turn line[1] into line['IP_ADDRESS'] and line[5] into line['INTERFACE']. That would make the list comprehension and the basic Python easier to read, but now we've added lines to the script (a sketch of that conversion is below).
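For what it's worth, that conversion is only a line or two itself. A minimal sketch, reusing the strained data and the column names TextFSM already knows about (strainer.header):

# Build a list of dictionaries from the parsed TextFSM output
arp_records = [dict(zip(strainer.header, line)) for line in strained]

# Now the filter reads by name instead of by position
vlan1ips = [rec['ADDRESS'] for rec in arp_records if rec['INTERFACE'] == 'Vlan1']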

Finally, yes, it's one line, but I'm still iterating over the data.

Pandas is set up to do all the iteration for me and lets me refer to data by name or by position “out of the box” and without any extra steps.

Let's decompose the one line of code:

If you think of this expression as a filter sandwich, df['ADDRESS'] and .values are the bread, and the middle .loc[df['INTERFACE'] == 'Vlan1'] part that does the filtering is the main ingredient.

Without the middle part you would have a Pandas Series, or list, of all the IPs in the ARP table. Basically, you get the entire contents of the 'ADDRESS' column in the data frame without any filtering.

When you "qualify" df['ADDRESS'] with .loc[df['INTERFACE'] == 'Vlan1'], you filter the ADDRESS column in the data frame for just those records where INTERFACE is 'Vlan1', and you return only the IP values by using the .values attribute.

Now, this will return a numpy.ndarray which might be great for some subsequent statistical analysis but as network engineers our needs are simple.

I’m using iPython in the examples below as you can see from the “In” and “Out” line prefixes.

In [1]: pandas_vlan1ips = df['ADDRESS'].loc[df['INTERFACE'] == 'Vlan1'].values

In [2]: type(pandas_vlan1ips)
Out[2]: numpy.ndarray

I would like my list back as an actual Python list, and that's no problem for Pandas.


In [3]: pandas_vlan1ips = df['ADDRESS'].loc[df['INTERFACE'] == 'Vlan1'].to_list()

In [4]: type(pandas_vlan1ips)
Out[4]: list

In [5]: pandas_vlan1ips
Out[5]:
['10.1.10.1',
 '10.1.10.11',
 '10.1.10.10',
 '10.1.10.21',
 '10.1.10.37',
 '10.1.10.102',
 '71.103.129.220',
 '10.1.10.170',
 '10.1.10.181']

You know what would be really handy? A list of dictionaries where I can reference both the IP ADDRESS and the MAC as keys.

In [5]: vlan1ipmac_ldict = df[['ADDRESS', 'MAC']].to_dict(orient='records')

In [6]: type(vlan1ipmac_ldict)
Out[6]: list

In [7]: vlan1ipmac_ldict
Out[7]:
[{'ADDRESS': '10.1.10.1', 'MAC': '28c6.8ee1.659b'},
 {'ADDRESS': '10.1.10.11', 'MAC': '6400.6a64.f5ca'},
 {'ADDRESS': '10.1.10.10', 'MAC': '0018.7149.5160'},
 {'ADDRESS': '10.1.10.21', 'MAC': 'a860.b603.421c'},
 {'ADDRESS': '10.1.10.37', 'MAC': 'a4c3.f047.4528'},
 {'ADDRESS': '10.10.101.1', 'MAC': '0018.b9b5.93c2'},
 {'ADDRESS': '10.10.100.1', 'MAC': '0018.b9b5.93c1'},
 {'ADDRESS': '10.1.10.102', 'MAC': '0018.b9b5.93c0'},
 {'ADDRESS': '71.103.129.220', 'MAC': '28c6.8ee1.6599'},
 {'ADDRESS': '10.1.10.170', 'MAC': '000c.294f.a20b'},
 {'ADDRESS': '10.1.10.181', 'MAC': '000c.298c.d663'}]

In [8]: len(vlan1ipmac_ldict)
Out[8]: 11

MAC address Lookup

Not impressed yet? Let's see what else we can do with this Data Frame.

I have a small function that performs MAC address lookups to get the Vendor OUI.

This function is called get_oui_macvendors() and you pass it a MAC address and it returns the vendor name.

It uses the MacVendors.co API.
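The function itself isn't shown in this post, but a sketch of what such a lookup might look like with the requests library follows. The endpoint path and the shape of the JSON response are assumptions for illustration only; check the MacVendors.co documentation for the exact format.

import requests

def get_oui_macvendors(mac):
    # Assumed endpoint pattern; verify against the MacVendors.co docs
    url = f"https://macvendors.co/api/{mac}"
    try:
        resp = requests.get(url, timeout=5)
        resp.raise_for_status()
        data = resp.json()
        # Assumed response shape: {"result": {"company": "<vendor name>"}}
        return data.get("result", {}).get("company", "Unknown")
    except requests.RequestException:
        return "Lookup failed"

Caching the responses (or batching the lookups) is worth considering, since the same OUI will appear many times in a large ARP table.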

I’d like to add a column of data to our Data Frame with the Vendor OUI for each MAC address.

In the one line below, I’ve added a column to the data frame titled ‘OUI’ and populated its value by performing a lookup on each MAC and using the result from the get_oui_macvendors function.

df['OUI'] = df['MAC'].map(get_oui_macvendors)

The left side of the assignment references a column in the Data Frame which does not yet exist, so it will be added.

The right side takes each MAC address in the existing MAC column, runs it through the get_oui_macvendors function to get the vendor OUI, and "maps" that result into the new OUI "cell" for that MAC's row in the data frame.

[Diagram: what is happening under the hood in the one-line command to add a column]

Now we have an updated Data Frame with a new OUI column giving the vendor code for each MAC.

In [1]: df                                                                                                                                                                                                                                                      
 Out[1]: 
     PROTOCOL         ADDRESS  AGE             MAC  TYPE INTERFACE                 OUI
 0   Internet       10.1.10.1    5  28c6.8ee1.659b  ARPA     Vlan1             NETGEAR
 1   Internet      10.1.10.11    4  6400.6a64.f5ca  ARPA     Vlan1           Dell Inc.
 2   Internet      10.1.10.10  172  0018.7149.5160  ARPA     Vlan1     Hewlett Packard
 3   Internet      10.1.10.21    0  a860.b603.421c  ARPA     Vlan1         Apple, Inc.
 4   Internet      10.1.10.37   18  a4c3.f047.4528  ARPA     Vlan1     Intel Corporate
 5   Internet     10.10.101.1    -  0018.b9b5.93c2  ARPA   Vlan101  Cisco Systems, Inc
 6   Internet     10.10.100.1    -  0018.b9b5.93c1  ARPA   Vlan100  Cisco Systems, Inc
 7   Internet     10.1.10.102    -  0018.b9b5.93c0  ARPA     Vlan1  Cisco Systems, Inc
 8   Internet  71.103.129.220    4  28c6.8ee1.6599  ARPA     Vlan1             NETGEAR
 9   Internet     10.1.10.170    0  000c.294f.a20b  ARPA     Vlan1        VMware, Inc.
 10  Internet     10.1.10.181    0  000c.298c.d663  ARPA     Vlan1        VMware, Inc.

More questions

Let's interrogate our data set further.

I want a unique list of all the INTERFACE values.

In [3]: df['INTERFACE'].unique()                                                                                                                                                                                                                                
 Out[3]: array(['Vlan1', 'Vlan101', 'Vlan100'], dtype=object)

How about “Give me a total count of each of the unique INTERFACE values?”

In [4]: df.groupby('INTERFACE').size()                                                                                                                                                                                                                          
 Out[4]: 
 INTERFACE
 Vlan1      9
 Vlan100    1
 Vlan101    1
 dtype: int64

Let's take it down a level and get unique totals based on INTERFACE and vendor OUI.

In [2]: df.groupby(['INTERFACE','OUI']).size()                                                                                                                                                                                                                  
 Out[2]: 
 INTERFACE  OUI               
 Vlan1      Apple, Inc.           1
            Cisco Systems, Inc    1
            Dell Inc.             1
            Hewlett Packard       1
            Intel Corporate       1
            NETGEAR               2
            VMware, Inc.          2
 Vlan100    Cisco Systems, Inc    1
 Vlan101    Cisco Systems, Inc    1
 dtype: int64

I could do this all day long!

Conclusion

I’ve just scratched the surface of what Pandas can do and I hope some of the examples I’ve shown above illustrate why investing in learning how to use data frames could be very beneficial. Filtering, getting unique values with counts, even Pivot Tables are possible with Pandas.
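As a quick taste of that last point, here is a minimal pivot-table sketch against the same ARP data frame used above; the choice of index, columns, and values is just one way to slice it.

# Count of ARP entries per interface and vendor OUI
pivot = df.pivot_table(index='INTERFACE', columns='OUI',
                       values='ADDRESS', aggfunc='count', fill_value=0)
print(pivot)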

Don’t be discouraged by its seeming complexity like I was.

Don’t discount it because it does not seem to be applicable to what you are trying to do as a Network Engineer, like I did. I hope I’ve shown how very wrong I was and that it is very applicable.

In fact, this small example and some of the other content in this repository comes from an actual use case.

I’m involved in several large refresh projects and our workflow is what you would expect.

  1. Snapshot the environment before you change out the equipment
  2. Perform some basic reachability tests
  3. Replace the equipment (switches in this case)
  4. Perform basic reachability tests again
  5. Compare PRE and POST state and confirm that all the devices you had just before you started are back on the network.
  6. Troubleshoot as needed

As you can see if you delve into this repository, it's heavy on ARP and MAC data manipulation so that we can automate most of the workflow described above. Could I have done it without Pandas? Yes. Could I have done it as quickly and efficiently, with code that I will have some shot of understanding in a month, without Pandas? No.
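A minimal sketch of the PRE/POST comparison idea from step 5 above, assuming two hypothetical data frames, pre_df and post_df, built from the ARP tables captured before and after the change:

# pre_df and post_df are hypothetical data frames with a 'MAC' column
pre_macs = set(pre_df['MAC'])
post_macs = set(post_df['MAC'])

missing_after_change = pre_macs - post_macs   # seen before the change but not after
new_after_change = post_macs - pre_macs       # seen only after the change

print(f"Missing after change: {sorted(missing_after_change)}")
print(f"New after change: {sorted(new_after_change)}")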

I hope I’ve either put Pandas on your radar as a possible tool to use in the future or actually gotten you curious enough to take the next steps.

I really hope that the latter is the case and I encourage you to just dive in.

The companion repository on GitHub is intended to help and give you examples.


Next Steps

The “Study Guide” links below have some very good and clear content to get you started. Of all the content out there, these resources were the most helpful for me.

Let me also say that it took a focused effort to get to the point where I was doing useful work with Pandas, and I've only just scratched the surface. It was worth every minute! What I have described here and in this repository are the things that were useful for me as a Network Engineer.

Once you’ve gone through the Study Guide links and any others that you have found, you can return to this repository to see examples. In particular, this repository contains a Python script called arp_interrogate.py.

It goes through loading the ARP data from the “show ip arp” command, parsing it, and creating a Pandas Data Frame.

It then goes through a variety of questions (some of which you have seen above) to show how the Data Frame can be “interrogated” to get to information that might prove useful.

There are comments throughout which are reminders for me and which may be useful to you.

The script is designed to run with data in the repository by default but you can pass it your own “show ip arp” output with the -o option.

Using the -i option will drop you into iPython with all of the data still in memory for you to use. This will allow you to interrogate the data in the Data Frame yourself.

If you would like to use it make sure you clone or download the repository and set up the expected environment.

Options for the arp_interrogate.py script:

(pandas) Claudias-iMac:pandas_neteng claudia$ python arp_interrogate.py -h
usage: arp_interrogate.py [-h] [-t TEMPLATE_FILE] [-o OUTPUT_FILE] [-v]
                        [-f FILENAME] [-s] [-i] [-c]

Script Description

optional arguments:
-h, --help           show this help message and exit
-t TEMPLATE_FILE, --template_file TEMPLATE_FILE
                      TextFSM Template File
-o OUTPUT_FILE, --output_file OUTPUT_FILE
                      Full path to file with show command show ip arp output
-v, --verbose         Enable all of the extra print statements used to
                      investigate the results
-f FILENAME, --filename FILENAME
                      Resulting device data parsed output file name suffix
-s, --save           Save Parsed output in TXT, JSON, YAML, and CSV Formats
-i, --interactive     Drop into iPython
-c, --comparison     Show Comparison

Usage: ' python arp_interrogate.py Will run with default data in the
repository'
(pandas) Claudias-iMac:pandas_neteng claudia$

Study Guide

A Quick Introduction to the “Pandas” Python Library

https://towardsdatascience.com/a-quick-introduction-to-the-pandas-python-library-f1b678f34673

Pandas Fundamentals by Paweł Kordek on PluralSight is exceptionally good. For me, this is the class that made all the other classes start to make sense. Note that this class is not free.

There is quite a lot to Pandas and it can be overwhelming (at least it was for me), but this course in particular got me working very quickly and explained things in a very clear way.

Python Pandas Tutorial 2: Dataframe Basics by codebasics <- good for Pandas operations and set_index

Python Pandas Tutorial 5: Handle Missing Data: fillna, dropna, interpolate by codebasics

Python Pandas Tutorial 6. Handle Missing Data: replace function by codebasics

Real Python <- this is a terrific resource for learning Python

There is a lot of content here. Explore at will. The two below I found particularly helpful.

https://realpython.com/search?q=pandas

Intro to DataFrames by Joe James <- great 'cheat sheet'



What others have shared…

Analyzing Wireshark Data with Pandas


Disclaimer

THE SOFTWARE in the mentioned repository IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON INFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Building a Production-ish Ready WebEx Teams ChatBot

Introduction

  • Is your interrupt-driven day no longer supportable?
  • Is there a particular Project Manager that asks you the same question every morning?
  • Do you often have to take some technical data and simplify it for semi- or non-technical consumption?
  • Would you like to pull out the relevant sections of technical data for a sibling team? Today you send them all the data and hope they can find what they need, because you just don't have the time to put together something customized that contains only what they need.

I suspect many of us have these situations far too often.

I spend quite a bit of time on Webex Teams (referred to as Spark from here on out in protest of the terrible re-branding of Spark to WebEx Teams which not only completely lacks imagination but also confuses anyone who also has to deal with Microsoft Teams…. a rebranding so awful it might actually unseat the rebranding of brigade). It is currently the messaging application of choice for my two biggest projects and a principal culprit in my interrupt-driven day. Maybe this can help turn it into an advantage… a mini-you if you will.

Not long ago, I lost a young and incredibly capable engineer to another project and that loss was pretty impactful. Many of the day to day activities fell back to me and I found lots of little (and not so little) things falling through the cracks. How was I going to work smarter not harder?

Network engineer getting asked questions from everyone!

The Back Story

I got to attend Interop this year and was very inspired by every session I attended but a few in particular really got me thinking about how I might automate my way out of my current predicament.

  • Jeremy Schulman – Self-service Network Automation Using Slack
  • Nick Russo – Automation for Bureaucracies
  • Hank Preston – A Practical Look at NetDevOps. While Hank's session inspired other ideas, it is his work with DevNet and a SparkBot module that is most relevant for this exercise.

The Resulting Scene

The effort documented here is a real world experience and solution. To protect the innocent I’ve created a “demo” version of this solution that will highlight the functionality and the possibilities. Over the course of several posts, I’ll document the requirements, use cases, solutions, and issues.


I like to start with the problem statement:

  1. I need to be able to provide fine grained project status on demand
  2. I need to be able to print a pretty Summary report for management on demand
  3. I need to be able to extract the relevant portion of some technical information for a parallel project on demand
  4. I’d like to incorporate something fun to get the team comfortable and familiar with the Bot (on demand of course)

I also like to document the initial state and the characteristics of the environment:

  • Spark was the required platform for messaging. Enough of the team utilized Spark so as to make it an extremely viable delivery platform.
  • A SAAS platform was used for project status.
  • All the collateral and technical information resided in a document repository that everyone constantly works on and syncs to.

The solution concept: develop a SparkBot attendant, available to the Spark Team under which all of our spaces are created, that can access an external API and a document repository, and that responds to simple commands starting with:

  1. site_status
  2. site_summary
  3. site_networks
  4. comic_relief

OK... I started Googling and found lots of material, but there were two issues:

  • Almost everything leveraged a dummy test environment often using ngrok, a fabulous little tool, but I needed this to be at least in the neighborhood of production ready. I did not want it to run on my home system.
  • Assembly was required in that there was good material on specific steps but not much on putting it all together. I suspect the thought there was that some of that should have been obvious but it wasn’t to me.

When I knew enough to be dangerous I figured I needed the following components:

Component: Messaging Client & Account
Platform used: Cisco WebEx Teams (Spark!!) Web Client, Desktop App, Mobile App

The most basic requirement was a messaging client that was in use by enough of the team so as to be a viable delivery platform. This one was easy. Spark was already the messaging application of choice for the project. Using the messaging client solved all kinds of issues: I already had a client for a wide range of platforms (web, desktop, and mobile) and I had a known User Interface to which the team was already accustomed.

Component: Messaging Platform
Platform used: Cisco WebEx Teams (Spark!!)

Of course the back-end messaging platform had to support the required functionality for a Bot type application, and research showed that it did. Spark has all the hooks for provisioning a Webhook and a Bot application.

Component: Document Repository
Platform used: Google Drive

The document repository the team used had to have a client that I could install on the Web Server so that the data store would be available to bot functions and would always be synchronized (the latest).

Component: Project Status API
Platform used: SAAS Project Management Tool

The Project Status tool had to support an API, and I knew that the project SAAS did. (More on this later.)

Component: Web Server (Front End)
Platform used: Nginx

Here is where it started getting tricky. I am by no means a System Administrator, but I had to find a suitable web server and web server technology to use. The Web Server had to support the back-end environment for the functionality and, as I mentioned, also had to have a client so that I could present and sync the document repository.

Component: Web Server Backend
Platform used: Flask & Python

For the back-end technology I always gravitate to Django, but it was not a good fit in this case:

  1. I did not foresee a need for a database
  2. Most of the examples I found utilized Flask and I needed to get this working quickly before the village came after me with pitchforks and torches.

This was my first use of Flask and it is a testament to what you always hear about it. It is simple and lightweight with quite a lot of functionality. More than enough for my purposes. With the excellent examples out there I had no issues getting it to work.

Component: Hosting Platform
Platform used: MacInCloud, Digital Ocean Droplet, Google Compute Platform (GCP). Note: the Demo is built on a Digital Ocean Droplet.

This was another challenge.

I mentioned I'm not a SysAdmin, and initially I felt that bringing up a Linux server that I could "harden" sufficiently was probably not something I was going to do well, so I started with a MacInCloud instance. It was a good Proof of Concept, but I wasn't happy with the performance or the cost, so I abandoned it as the platform for the Bot Server.

There was nothing for it. I needed to go down the path of a Linux host. I think I knew that already, but I've been wanting to check out MacInCloud for a while and I thought that route would also save me time in researching how to secure a Linux server. I always gravitate to Digital Ocean for this. They have superb documentation and How-Tos, a simple and intuitive interface, excellent price points, and I'm a diver.

The first iteration of the Bot was actually on Digital Ocean and working quite well, but I ran into a showstopper. The document sync client for Linux did not give you the option of changing the mount point. I could not grow the home volume on my Droplet sufficiently (the project data store is at about ~500GB) to host the data. I could add a volume, but the sync client would try to sync to the home directory. I found a hack, but it talked about sync instability, and if I could not rely on the data being in sync on the Web Server then the rest was going to be pointless. Much of the functionality I needed involved accessing the latest data and documents on the repository.

Well, the other hosting platform I've been wanting to try is the Google Compute Platform. I must say I'm very impressed. Everything is pretty intuitive. I already had the domain I wanted to use on Google so that made DNS a breeze!

The production Bot is currently on GCP. The public Demo system that is part of this post is on Digital Ocean. The Demo does not have the constraints of the production system, and without the Digital Ocean documentation it's doubtful I would have a working solution.

Component: Messaging Platform SDK or Module
Platform used: hpreston/webexteamsbot

I knew Spark had an SDK, so worst case I had tools for the Bot. Further research resulted in quite a few options for the Bot.

Here is where Hank Preston really saved me a ton of time. Hank's webexteamsbot module provides a very nice framework for your own bot functions. It comes with a few that you will want to keep (/help is one) and provides some very good examples.

As your bot gets more capable (i.e. understands new functions or commands) you simply add those functions to your main script. The biggest issue I had was understanding what the module abstracted out. I needed to tap into a lot of the information... for example, the Spark space or room name had information that I needed to parse out so that if you ran a command within a space it would customize the output.

Component: Scripts
Platform used: Python

Each "function" or Bot command would need its own script or set of scripts and functions.

Component: HTTPS
Platform used: HTTPS, certbot

I wanted to include as much security as my feeble SysAdmin abilities could muster. At a minimum, SSL and some basic firewall functions.

Component: Domain
Platform used: cdl-automation.net

I wanted to use a FQDN.
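To make the webexteamsbot piece a bit more concrete, here is a minimal sketch of how a bot with a couple of the commands listed earlier might be wired up. It follows the general pattern from the hpreston/webexteamsbot examples; the token, URL, and email values are placeholders, the handler bodies are stubs, and parameter names may differ between versions of the module.

from webexteamsbot import TeamsBot

# Placeholder values; the real ones come from the Bot you provision in the
# WebEx Teams (Spark) developer portal and from your web server's public URL.
bot = TeamsBot(
    "demo-bot",
    teams_bot_token="<bot access token>",
    teams_bot_url="https://bot.example.com",
    teams_bot_email="demo-bot@webex.bot",
)

def site_status(incoming_msg):
    # Stub: this is where the project status SAAS API would be queried
    return "Site status: all green (placeholder)"

def comic_relief(incoming_msg):
    return "Why did the packet cross the router? (placeholder)"

bot.add_command("/site_status", "Fine-grained project status", site_status)
bot.add_command("/comic_relief", "Something fun", comic_relief)

if __name__ == "__main__":
    # Flask application under the hood; front-ended by Nginx in the production setup
    bot.run(host="0.0.0.0", port=5000)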
The Webex Teams ChatBot Subsystems
The ChatBot Subsystems

This is the first of a 5 Part Series on creating a production-ish ready WebEx Teams Chat Bot.

  1. Introduction
  2. The Basic Web Server
  3. The WebEx Teams (Spark) Backend Webhook
  4. The Bot Application on the Web Server
  5. The Bot

Building a Custom TextFSM Template

If you have seen any of the TextFSM posts on this site you know how useful the Network To Code TextFSM Template repository can be. Rarely do I not find what I need there!

I recently had to parse route summary information from JUNOS Looking Glass routers. I always check the very rich set of templates in the NTC Template index repository but in this case I was out of luck. I was going to have to build my own… and you get to watch.

Two fantastic resources you can use when you are in the same boat are here:

It's good to begin by familiarizing yourself with the output you need to parse. Here is a snippet of the show command output.

>show route summary
Autonomous system number: 2495
Router ID: 164.113.193.221
inet.0: 762484 destinations, 1079411 routes (762477 active, 0 holddown, 12 hidden)
Direct: 1 routes, 1 active
Local: 1 routes, 1 active
BGP: 1079404 routes, 762470 active
Static: 5 routes, 5 active
inet.2: 3073 destinations, 3073 routes (3073 active, 0 holddown, 0 hidden)
BGP: 3073 routes, 3073 active

Start with something simple like ASN and RouterID

A basic TextFSM Template

I wanted to start slowly with something I knew I could get to work. Looking at the data, it should be simple to extract the first two values I need:
– ASN
– Router ID

I started with those values as they are by far the simpler ones to extract from the 'show route summary' command. I will try not to cover material that is covered by the two Google links above. However, I do want to point out the concept of TextFSM (as I understand it or explain it to myself), which is to provide context for your regular expressions. That is, not only can you define the specific pattern to search for, but you can also define its "environment". As you can see below, the "Value" keyword lets me define a variable I want to pluck out of the unstructured text (the show command output). Line 4 defines the "action" section to start processing, and the first thing to look for is a line that starts with "Autonomous system number:", one or more spaces noted by the \s+, and then our ASN variable, which we defined above as a pattern of one or more digits \d+. So you have the power of the regular expression that defines the value you want, and the power of regular expressions to help you define the context where your value will be found.

Junos ‘show route summary’ TextFSM Template – Version 1
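Here is a reconstruction of that first template, based on the description above and on the final template shown later (the file in the repository may differ slightly):

Value ASN (\d+)
Value RTRID (\S+)

Start
  ^Autonomous system number:\s+${ASN}
  ^Router ID:\s+${RTRID}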

For this exercise we will use my textfsm3 GitHub repository and the “test_textfsm.py” script for our testing rather than the Python command interpreter. Simply clone the repository to get started.
Note that the repository has the completed version of the template. Look at the history of the template file on GitHub to see its “evolution”.

(txtfsm3) Claudias-iMac:textfsm3 claudia$ python test_textfsm.py -h
usage: test_textfsm.py [-h] [-v] template_file output_file
This script applys a textfsm template to a text file of unstructured data (often show commands). The resulting structured data is saved as text (output.txt) and CSV (output.csv).
positional arguments:
template_file TextFSM Template File
output_file Device data (show command) output
optional arguments:
-h, --help show this help message and exit
-v, --verbose Enable all of the extra print statements used to investigate the results

In the first iteration of the template file, we obtain the output below.

(txtfsm3) Claudias-iMac:textfsm3 claudia$ python test_textfsm.py junos_show_route_summary.template junos_show_route_summary.txt

TextFSM Results Header:
['ASN', 'RTRID']
================================
['2495', '164.113.193.221']
================================

Extract more details

So we have successfully built a template that will extract ASN and RouterID from the Junos show route summary command. Now it will get interesting because we also want this next set of values.

  • Interface
  • Destinations
  • Routes
  • Active
  • Holddown
  • Hidden

The first challenge here was to pick up the totals line. Here, one of my favorite tools comes into play: RegEx101. Regular expressions don't come easily to me, and this site makes it so easy! I saved the working session for trying to match the first part of that long totals line. As you can see, you can't just match "inet", or "inet" plus a digit; you also have to account for the "small." prefix. Using RegEx101 and trial and error, I came up with the following regular expression.

Value INT (([a-z]+.)?[a-z]+(\d)?.\d+)

inet.0: 762484 destinations, 1079411 routes (762477 active, 0 holddown, 12 hidden)

inet6.0: 66912 destinations, 103194 routes (66897 active, 0 holddown, 30 hidden)
Direct: 3 routes, 3 active

small.inet6.0: 31162 destinations, 31162 routes (31162 active, 0 holddown, 0 hidden)
BGP: 31162 routes, 31162 active

Let’s break it down…

The diagram below breaks the regex down into the key sections and numbers them. At the bottom you can see the actual text we are trying to parse and the numbers above indicate which section of the regex picked up the text we were interested in.

Breaking down the regular expression to extract the interface identifier (inet.x) for your TextFSM Template

The regex for INT (inet.x) was by far the most complicated. See sections 3 and 4 above. The rest of the line is far simpler; you just need to make sure you have it exactly as it appears in the raw text. Note that the parentheses, which are part of the raw show command output, must also be escaped, just like the period.
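If you want to convince yourself the INT pattern behaves as described, a quick throwaway test with Python's re module against the sample lines above does the trick (this is just for experimentation, not part of the template itself):

import re

int_pattern = re.compile(r"(([a-z]+.)?[a-z]+(\d)?.\d+)")

samples = [
    "inet.0: 762484 destinations, 1079411 routes (762477 active, 0 holddown, 12 hidden)",
    "inet6.0: 66912 destinations, 103194 routes (66897 active, 0 holddown, 30 hidden)",
    "small.inet6.0: 31162 destinations, 31162 routes (31162 active, 0 holddown, 0 hidden)",
]

for line in samples:
    print(int_pattern.match(line).group(1))   # inet.0, inet6.0, small.inet6.0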

Here is the TextFSM Template so far:

 Value Filldown ASN (\d+)
Value Filldown RTRID (\S+)
Value INT (([a-z]+.)?[a-z]+(\d)?.\d+)
Value DEST (\d+)
Value Required ROUTES (\d+)
Value ACTIVE (\d+)
Value HOLDDOWN (\d+)
Value HIDDEN (\d+)
Start
^Autonomous system number:\s+${ASN}
^Router ID:\s+${RTRID}
^${INT}:\s+${DEST}\s+destinations,\s+${ROUTES}\s+routes\s+\(${ACTIVE}\s+active,\s+${HOLDDOWN}\s+holddown,\s+${HIDDEN}\s+hidden\) -> Record

…and the resulting structured data:

(txtfsm3) Claudias-iMac:textfsm3 claudia$ python test_textfsm.py junos_show_route_summary.template junos_show_route_summary.txt
TextFSM Results Header:
['ASN', 'RTRID', 'INT', 'DEST', 'ROUTES', 'ACTIVE', 'HOLDDOWN', 'HIDDEN']
['2495', '164.113.193.221', 'inet.0', '762484', '1079411', '762477', '0', '12']
['2495', '164.113.193.221', 'inet.2', '3073', '3073', '3073', '0', '0']
['2495', '164.113.193.221', 'small.inet.0', '116371', '116377', '116371', '0', '0']
['2495', '164.113.193.221', 'inet6.0', '66912', '103194', '66897', '0', '30']
['2495', '164.113.193.221', 'small.inet6.0', '31162', '31162', '31162', '0', '0']

A few things to highlight: I used the 'Filldown' keyword for ASN and RTRID so that each "record" would have that information. The 'Filldown' keyword takes a value that appears once and duplicates it in subsequent records. If nothing else, it identifies the router from which the entry came, but it also serves to simplify some things you might want to do down the line, as each "record" has all the data. I also used the 'Required' keyword for ROUTES to get rid of the empty last row that is generated when you use 'Filldown'.

Almost there! We just need to pick up the source routes under each totals line.

Value SOURCE (\w+)
Value SRC_ROUTES (\d+)
Value SRC_ACTIVE (\d+)

Here is what the final (for now anyway) template looks like:

 Value Filldown ASN (\d+)
Value Filldown RTRID (\S+)
Value Filldown INT (([a-z]+.)?[a-z]+(\d)?.\d+)
Value DEST (\d+)
Value ROUTES (\d+)
Value ACTIVE (\d+)
Value HOLDDOWN (\d+)
Value HIDDEN (\d+)
Value SOURCE (\w+)
Value SRC_ROUTES (\d+)
Value SRC_ACTIVE (\d+)

Start
^Autonomous system number:\s+${ASN}
^Router ID:\s+${RTRID}
^${INT}:\s+${DEST}\s+destinations,\s+${ROUTES}\s+routes\s+\(${ACTIVE}\s+active,\s+${HOLDDOWN}\s+holddown,\s+${HIDDEN}\s+hidden\) -> Record
^\s+${SOURCE}:\s+${SRC_ROUTES}\s+routes,\s+${SRC_ACTIVE}\s+active -> Record

A few highlights. Because I wanted to store the source routes in a different value (SRC_ROUTES), I had to remove 'Required' from ROUTES in order to pick up the rows. I now have an extra row at the end, but I can live with that for now. I also added 'Filldown' to INT so that it's clear where the source information came from.

(txtfsm3) Claudias-iMac:textfsm3 claudia$ python test_textfsm.py junos_show_route_summary.template junos_show_route_summary.txt

TextFSM Results Header:
['ASN', 'RTRID', 'INT', 'DEST', 'ROUTES', 'ACTIVE', 'HOLDDOWN', 'HIDDEN', 'SOURCE', 'SRC_ROUTES', 'SRC_ACTIVE']
['2495', '164.113.193.221', 'inet.0', '762484', '1079411', '762477', '0', '12', '', '', '']
['2495', '164.113.193.221', 'inet.0', '', '', '', '', '', 'Direct', '1', '1']
['2495', '164.113.193.221', 'inet.0', '', '', '', '', '', 'Local', '1', '1']
['2495', '164.113.193.221', 'inet.0', '', '', '', '', '', 'BGP', '1079404', '762470']
['2495', '164.113.193.221', 'inet.0', '', '', '', '', '', 'Static', '5', '5']
['2495', '164.113.193.221', 'inet.2', '3073', '3073', '3073', '0', '0', '', '', '']
['2495', '164.113.193.221', 'inet.2', '', '', '', '', '', 'BGP', '3073', '3073']
['2495', '164.113.193.221', 'small.inet.0', '116371', '116377', '116371', '0', '0', '', '', '']
['2495', '164.113.193.221', 'small.inet.0', '', '', '', '', '', 'BGP', '116377', '116371']
['2495', '164.113.193.221', 'inet6.0', '66912', '103194', '66897', '0', '30', '', '', '']
['2495', '164.113.193.221', 'inet6.0', '', '', '', '', '', 'Direct', '3', '3']
['2495', '164.113.193.221', 'inet6.0', '', '', '', '', '', 'Local', '2', '2']
['2495', '164.113.193.221', 'inet6.0', '', '', '', '', '', 'BGP', '103185', '66888']
['2495', '164.113.193.221', 'inet6.0', '', '', '', '', '', 'Static', '4', '4']
['2495', '164.113.193.221', 'small.inet6.0', '31162', '31162', '31162', '0', '0', '', '', '']
['2495', '164.113.193.221', 'small.inet6.0', '', '', '', '', '', 'BGP', '31162', '31162']
['2495', '164.113.193.221', 'small.inet6.0', '', '', '', '', '', '', '', '']

The test_textfsm.py file will save your output into a text file as well as into a CSV file.
I did try using ROUTES for both sections and making it Required again. This got rid of the extra empty row but really impacts readability. I would have to keep track of how I used ROUTES as I would have lost the SRC_ROUTES distinction. That is a far greater sin in my opinion than an empty row at the end which is clearly just an empty row.

A quick example of using TextFSM to parse data from Cisco show commands – Python3 Version

As part of my ongoing effort to migrate everything over to Python 3, it’s time to show this “quick example” in Python 3.

TextFSM is a powerful parsing tool (Python module) developed by Google. There are some great examples out there to get you started. Here are two I urge you to read if this topic is of interest to you:

I can never get enough of examples so here is a very simple one to get you started or keep you practicing.  I find that a quick example where I can see results of my own making really energizes my learning process.

For this example, you need Python 3 and the textfsm module installed.

TextFSM Getting Started is an excellent resource which includes the installation process for textfsm (textfsm is a pip installable module).

In addition to the environment, you will need 2 things to get started.

  • A TextFSM template
  • Content to parse with the TextFSM template

The TextFSM Pattern Matching or “Parsing” Template

The template may be the tricky part.  This template defines the structure of the data you are trying to gather or parse out of your content.  We are very fortunate that the Network to Code (NTC) team has given us a large library of templates from which to choose and so very often we don’t have to worry too much about the template itself. We just need to know what Cisco IOS show command has the information we want.

Content (network device output) to Parse

Once you have your template, you need content to parse.  You can get this in a variety of ways.  You can query your devices real time via Ansible or via a python script or you can act on file based (saved) data that you already have.

In this example we will keep it simple and assume we have a text file of show commands that we need to parse to get the device hardware information and software version.

To get hardware and software information, the “show version” output should have what we want.  Looking at the existing templates in the NTC library, it looks like cisco_ios_show_version.template has what we need. If we look at the template we can see that it includes the two values we care about, VERSION and HARDWARE (which will return a list).  That looks just about right for the information we want to extract, and luckily the file of show commands includes the output of the “show version” command.
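For reference, the two Value definitions of interest look roughly like this (paraphrased from the value_map we will inspect later; the real NTC template defines several more values and may have changed since this was written):

Value VERSION (.+?)
Value List HARDWARE (\S+\d\S+)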

So here are the two files we will work with in this example:

– the TextFSM template file (downloadable from GitHub and also included in my textfsm3 repository on GitHub)
ntc-templates/templates/cisco_ios_show_version.template

– the content file with show commands including the output of the show version command (downloadable here)
lab-swtich-show-cmds

For simplicity I’ve put them both in a temp directory and I will launch the python interpreter from there so we can work real time. I also list the modules that I have in the virtual environment. The only one you need for this example is textfsm.

Working directory and files

(txtfsm3) Eugenias-PB:textfsm3 eugenia$ tree
.
├── cisco_ios_show_version.template
└── lab-swtich-show-cmds.txt

The environment

(textfsm3) Eugenias-PB:~ eugenia$ python --version 
Python 3.6.5
(textfsm3) Eugenias-PB:textfsm-example eugenia$ pip freeze
ciscoconfparse==1.3.20
colorama==0.3.9
dnspython==1.15.0
dnspython3==1.15.0
ipaddress==1.0.22
textfsm==0.4.1
(textfsm3) Eugenias-PB:textfsm-example eugenia$

Let's get started…

Launch the interpreter and import the textfsm module. The “>>>” tells you that you are in the python command interpreter.

(textfsm) Eugenias-PB:textfsm3 eugenia$ python 
Python 3.6.5 (default, Apr 25 2018, 14:26:36)
[GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.39.2)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import textfsm

Open the template file into a file handle I’ve called template and pass that as the argument to the textfsm.TextFSM method creating a “parsing object” based on that template (the show version template in our case).

>>> template = open('cisco_ios_show_version.template') 
>>> results_template = textfsm.TextFSM(template)

Look at some of the methods (functions) available in the results_template object by using dir() on the results_template object you just created. Make note of the ‘ParseText‘ one as that is the one we will use shortly to parse our content.

>>> import textfsm
>>> template = open('cisco_ios_show_version.template')
>>> results_template = textfsm.TextFSM(template)
>>> dir(results_template)
['GetValuesByAttrib', 'MAX_NAME_LEN', 'ParseText', 'Reset', '_AppendRecord', '_AssignVar', '_CheckLine', '_CheckRule', '_ClearAllRecord', '_ClearRecord', '_DEFAULT_OPTIONS', '_GetHeader', '_GetValue', '_Operations', '_Parse', '_ParseFSMState', '_ParseFSMVariables', '_ValidateFSM', '_ValidateOptions', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cur_state', '_cur_state_name', '_line_num', '_options_cls', '_result', 'comment_regex', 'header', 'state_list', 'state_name_re', 'states', 'value_map', 'values']

>>> results_template.value_map
{'VERSION': '(?P<VERSION>.+?)', 'ROMMON': '(?P<ROMMON>\\S+)', 'HOSTNAME': '(?P<HOSTNAME>\\S+)', 'UPTIME': '(?P<UPTIME>.+)', 'RELOAD_REASON': '(?P<RELOAD_REASON>.+?)', 'RUNNING_IMAGE': '(?P<RUNNING_IMAGE>\\S+)', 'HARDWARE': '(?P<HARDWARE>\\S+\\d\\S+)', 'SERIAL': '(?P<SERIAL>\\S+)', 'CONFIG_REGISTER': '(?P<CONFIG_REGISTER>\\S+)', 'MAC': '(?P<MAC>[0-9a-fA-F]{2}(:[0-9a-fA-F]{2}){5})'}
>>>

You don’t need these commands. I just wanted to show you how to investigate the object a little, which is quite easy when working in the interpreter. Start with dir(), which will give you the various methods or actions you can perform on the object you created. Focus on the ones without the leading underscore, as those are generally internal methods, although in many cases they can be useful. In the case of the textfsm module I use the _result attribute and the header method often.

This is the heart of what we are trying to do now that we’ve selected our parsing template and created an object that will parse our content and pull out the information we want. So far all we have done is build a specific “strainer” to catch the information we need. Now we need to pour our data through our strainer.

First we open our text file of show commands. Below I create a file handle (basically a variable) to the show command file called content2parse and read the contents of our text file of show commands into the variable content. You have to open the file first before you can read its contents.

>>> content2parse = open('lab-swtich-show-cmds.txt') 
>>> content = content2parse.read()
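As a quick aside, outside of an interactive walkthrough the more idiomatic pattern is a context manager, which closes the file for you:

with open('lab-swtich-show-cmds.txt') as content2parse:
    content = content2parse.read()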

Now we parse out the data we want using our results_template object and its ParseText method against our content and store the results in the parsed_results variable. As you can see, this is a list of lists. If you ran this against show command files from 5 different devices you can begin to see the possibilities. You would have a list of 5 lists with the show version information for each device.

>>> parsed_results = results_template.ParseText(content) 
>>> parsed_results
[['15.2(2)E3', 'Bootstrap', 'artic-sw01', '4 days, 14 hours, 2 minutes', 'Reload command', 'c2960s-universalk9-mz.152-2.E3.bin', ['WS-C2960S-24TS-S'], ['FOC1709W1DT'], '0xF', ['70:10:5C:53:D4:80']]]
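To make that multi-device idea concrete, here is a small sketch (the file names are hypothetical) that runs the same template against several saved show command files and collects one row per device:

import textfsm

show_files = ['switch01-show-cmds.txt', 'switch02-show-cmds.txt']  # hypothetical file names

with open('cisco_ios_show_version.template') as template:
    fsm = textfsm.TextFSM(template)

all_results = []
for show_file in show_files:
    fsm.Reset()                          # clear state and prior records between files
    with open(show_file) as f:
        all_results.extend(fsm.ParseText(f.read()))

print(all_results)                       # one inner list per device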

On a side note, notice that the value of results_template._result is equal to our parsed_results. It's still a good practice to put your results in a variable of your own (parsed_results here) rather than relying on the internal _result attribute.

>>> results_template._result
[['15.2(2)E3', 'Bootstrap', 'artic-sw01', '4 days, 14 hours, 2 minutes', 'Reload command', 'c2960s-universalk9-mz.152-2.E3.bin', ['WS-C2960S-24TS-S'], ['FOC1709W1DT'], '0xF', ['70:10:5C:53:D4:80']]]

Your first inclination might be to iterate over the results, but remember that it's a list of lists (notice the double opening square brackets, [[ , the first denoting that the entire result is a list and the second denoting that the first element in that list is also a list), so iterating over the results would only step over the single item in the “top level” list. What you want is to iterate over each element of that inner list, the [0] element. I know that my top level list has 1 element, the [0] element, and that inner list has 10 elements. We only parsed one file, which is why our top level list only has one element. If I iterate over the [0] element of the list I get each individual piece of information.

Get the length of the list (the “outer” or top level list); it should only have one element since we only parsed one file:

>>> len(parsed_results) 
1

>>> for item in parsed_results:
...     print(item)
...
['15.2(2)E3', 'Bootstrap', 'artic-sw01', '4 days, 14 hours, 2 minutes', 'Reload command', 'c2960s-universalk9-mz.152-2.E3.bin', ['WS-C2960S-24TS-S'], ['FOC1709W1DT'], '0xF', ['70:10:5C:53:D4:80']]

Iterating over our outer list gives a single line of output: the one element, which is itself a list, holding the data for our one device. Next, get the length of that first element (the [0] element, as lists are zero indexed).

Let's find out how many elements our inner list has:

>>> len(parsed_results[0])
10

>>> index = 0
>>> for item in parsed_results[0]:
...     print(f"Element # {index}: \t{item}")
...     index += 1
...
Element # 0: 15.2(2)E3
Element # 1: Bootstrap
Element # 2: artic-sw01
Element # 3: 4 days, 14 hours, 2 minutes
Element # 4: Reload command
Element # 5: c2960s-universalk9-mz.152-2.E3.bin
Element # 6: ['WS-C2960S-24TS-S']
Element # 7: ['FOC1709W1DT']
Element # 8: 0xF
Element # 9: ['70:10:5C:53:D4:80']
>>>

I know I have 10 elements (0-9) because I checked the length, and if you take the length of the value_map you can see that the template is designed to capture 10 pieces of information, so that makes sense.

Get the length of the first (inner) list and of the value map:

>>> len(parsed_results[0]) 
10
>>> len(results_template.value_map)
10

Finally, after all of this I wanted to get the version and hardware information out of this file and here it is:

>>> parsed_results[0][0] 
'15.2(2)E3'
>>> parsed_results[0][6]
['WS-C2960S-24TS-S']
>>>
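Hard-coding index 0 and index 6 works, but it is fragile if the template changes. A more robust pattern, sketched here with the same objects from above, is to look up positions by name using the header attribute:

row = parsed_results[0]
header = results_template.header                 # column names in template order

print(row[header.index('VERSION')])              # '15.2(2)E3'
print(row[header.index('HARDWARE')])             # ['WS-C2960S-24TS-S']

# or zip the header and row into a dictionary keyed by value name
device_facts = dict(zip(header, row))
print(device_facts['VERSION'], device_facts['HARDWARE'])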

I hope these basics help you get started or keep practicing.

Here are all the main commands in sequence so you can see the flow without all of the interruptions. Also easier to copy and paste!

>>> import textfsm 
>>> template = open('cisco_ios_show_version.template')
>>> results_template = textfsm.TextFSM(template)
>>> content2parse = open('lab-swtich-show-cmds.txt')
>>> content = content2parse.read()
>>> parsed_results = results_template.ParseText(content)
>>> parsed_results
[['15.2(2)E3', 'Bootstrap', 'artic-sw01', '4 days, 14 hours, 2 minutes', 'Reload command', 'c2960s-universalk9-mz.152-2.E3.bin', ['WS-C2960S-24TS-S'], ['FOC1709W1DT'], '0xF', ['70:10:5C:53:D4:80']]]

https://github.com/cldeluna/textfsm3

A quick example of using TextFSM to parse data from Cisco show commands

This is the original post which used Python 2.7. Please see the updated post using Python 3.

TextFSM is a powerful parsing tool (python module) developed by Google.    There are some great examples out there to get you started. Here are two I urge you to read if this topic is of interest to you:

I can never get enough of examples so here is a very simple one to get you started or keep you practicing.  I find that a quick example where I can see results of my own making really energizes my learning process.

For this example, you need python 2.7 and the textfsm module installed.

TextFSM Getting Started is an excellent resource which includes the installation process for textfsm (textfsm is a pip installable module).

In addition to the environment, you will need 2 things to get started.

  • A TextFSM template
  • Content to parse with the TextFSM template

The template may be the tricky part.  This template defines the structure of the data you are trying to gather or parse out of your content.  We are very fortunate that the Network to Code (NTC) team has given us a large library of templates from which to choose and so we don’t have to worry too much about the template itself. We just need to know what Cisco IOS show command has the information we want.

Once you have your template, you need content to parse.  You can get this in a variety of ways.  You can query your devices real time via Ansible or via a python script or you can act on file based data that you already have.

In this example we will keep it simple and assume we have a text file of show commands that we need to parse to get the device hardware information and software version.

To get hardware and software information, the “show version” output should have what we want.  Looking at the existing templates in the NTC library, it looks like cisco_ios_show_version.template has what we need. If we look at the template we can see that it includes the two values we care about, VERSION and HARDWARE (which will return a list).  That looks just about right for the information we want to extract, and luckily the file of show commands includes the output of the “show version” command.

So here are the two files we will work with in this example:

– the textFSM template file (downloadable from GitHub)
ntc-templates/templates/cisco_ios_show_version.template

– the content file with show commands including the output of the show version command (downloadable here)
lab-swtich-show-cmds

For simplicity I’ve put them both in a temp directory and I will launch the python interpreter from there so we can work real time. I also list the modules that I have in the virtual environment. The only one you need for this example is textfsm.

Working directory and files
(textfsm) Eugenias-PB:textfsm-example eugenia$ pwd
/Users/eugenia/Downloads/textfsm-example
(textfsm) Eugenias-PB:textfsm-example eugenia$ ls
cisco_ios_show_version.template lab-swtich-show-cmds.txt
(textfsm) Eugenias-PB:textfsm-example eugenia$ ls -al
total 24
drwxr-xr-x 4 eugenia staff 136 May 11 05:06 .
drwx------+ 118 eugenia staff 4012 May 11 05:04 ..
-rw-r--r--@ 1 eugenia staff 680 May 11 05:06 cisco_ios_show_version.template
-rw-r--r--@ 1 eugenia staff 7425 May 11 05:05 lab-swtich-show-cmds.txt

The environment
(textfsm) Eugenias-PB:~ eugenia$ python --version
Python 2.7.15


(textfsm) Eugenias-PB:textfsm-example eugenia$ pip freeze
et-xmlfile==1.0.1
graphviz==0.8.2
jdcal==1.3
netaddr==0.7.19
openpyxl==2.5.1
textfsm==0.3.2
xlrd==1.1.0
(textfsm) Eugenias-PB:textfsm-example eugenia$
Let's get started…

Launch the interpreter and import the textfsm module. The ">>>" tells you that you are in the python command interpreter.


(textfsm) Eugenias-PB:textfsm-example eugenia$ python
Python 2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
>>> import textfsm
Open the template file into a file handle I've called template and pass that as the argument to the textfsm.TextFSM method
creating a "parsing object" based on that template (the show version template in our case).


>>> template = open('cisco_ios_show_version.template')
>>> results_template = textfsm.TextFSM(template)
Look at some of the methods available in the results_template object by using dir() on the results_template object you just created.  Make note of the 'ParseText' one 
as that is the one we will use shortly to parse our content.


>>> dir(results_template)
['GetValuesByAttrib', 'MAX_NAME_LEN', 'ParseText', 'Reset', '_AppendRecord', '_AssignVar', '_CheckLine', '_CheckRule', '_ClearAllRecord', '_ClearRecord', '_DEFAULT_OPTIONS', '_GetHeader', '_GetValue', '_Operations', '_Parse', '_ParseFSMState', '_ParseFSMVariables', '_ValidateFSM', '_ValidateOptions', '__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', '_cur_state', '_cur_state_name', '_line_num', '_options_cls', '_result', 'comment_regex', 'header', 'state_list', 'state_name_re', 'states', 'value_map', 'values']

>>> results_template.value_map
{'UPTIME': '(?P<UPTIME>.+)', 'HOSTNAME': '(?P<HOSTNAME>\\S+)', 'RUNNING_IMAGE': '(?P<RUNNING_IMAGE>\\S+)', 'CONFIG_REGISTER': '(?P<CONFIG_REGISTER>\\S+)', 'HARDWARE': '(?P<HARDWARE>\\S+\\d\\S+)', 'VERSION': '(?P<VERSION>.+?)', 'SERIAL': '(?P<SERIAL>\\S+)', 'ROMMON': '(?P<ROMMON>\\S+)'}

>>> results_template.values
[<textfsm.TextFSMValue object at 0x107c0ac90>, <textfsm.TextFSMValue object at 0x107c181d0>, <textfsm.TextFSMValue object at 0x107c18450>, <textfsm.TextFSMValue object at 0x107cf1750>, <textfsm.TextFSMValue object at 0x107cf1790>, <textfsm.TextFSMValue object at 0x107cf17d0>, <textfsm.TextFSMValue object at 0x107cf18d0>, <textfsm.TextFSMValue object at 0x107cf1950>]


You don’t need the last three commands. I just wanted to show you how to investigate the object a little which is quite easy when working in the interpreter.

This is the heart of what we are trying to do now that we’ve selected our parsing template and created an object that will parse our content and pull out the information we want.

First we open our text file of show commands. Below I create a file handle (basically a variable) to the 
show command file called content2parse and read the contents of that file into the variable content. 
You have to open the file first before you can read its contents.

>>> content2parse = open('lab-swtich-show-cmds.txt')
>>> content = content2parse.read()



Now we parse out the data we want using our results_template object and its ParseText method against
our content and store the results in the parsed_results variable. As you can see, this is a list of lists. 
If you ran this against show command files from 5 different devices you can begin to see the possibilities. 
You would have a list of 5 lists with the show version information for each device.

>>> parsed_results = results_template.ParseText(content)
>>> parsed_results
[['15.2(2)E3', 'Bootstrap', 'artic-sw01', '4 days, 14 hours, 2 minutes', 'c2960s-universalk9-mz.152-2.E3.bin', ['WS-C2960S-24TS-S'], ['FOC1709W1DT'], '0xF']]



Your first inclination might be to iterate over the results but remember that it's a list of lists so iterating over the results
would iterate over the only item in the "top level" list [0]. What you want is to iterate over each element of the results list
which is also a list. I know that my top level list has 1 element, the [0] element, and that list has 8 elements.  We only parsed one file which is why our list only has one element.  If I iterate 
over the [0] element of the list I get each individual bit of information.

Get the length of the top level list (should only have one element since we only parsed 1 file)
>>> len(parsed_results)
1
Get the length of that first element
>>> len(parsed_results[0])
8
>>> for item in parsed_results:
...     print(item)
...
['15.2(2)E3', 'Bootstrap', 'artic-sw01', '4 days, 14 hours, 2 minutes', 'c2960s-universalk9-mz.152-2.E3.bin', ['WS-C2960S-24TS-S'], ['FOC1709W1DT'], '0xF']

>>> index = 0
>>> for item in parsed_results[0]:
... print("Element # {}: {}".format(index,item))
... index = index + 1
...
Element # 0: 15.2(2)E3
Element # 1: Bootstrap
Element # 2: artic-sw01
Element # 3: 4 days, 14 hours, 2 minutes
Element # 4: c2960s-universalk9-mz.152-2.E3.bin
Element # 5: ['WS-C2960S-24TS-S']
Element # 6: ['FOC1709W1DT']
Element # 7: 0xF
>>>


I know I have 8 elements (0-7) because I checked the length, and if you take the length of the value_map you can see
that the template is designed to capture 8 pieces of information, so that makes sense.

Get the length of the first list
>>> len(parsed_results[0])
8
Get the length of the value map
>>> len(results_template.value_map)
8



Finally, after all of this I wanted to get the version and hardware information out of this file and here it is:


>>> parsed_results[0][0]
'15.2(2)E3'
>>> parsed_results[0][5]
['WS-C2960S-24TS-S']
>>>

I hope these basics help you get started or keep practicing.

Here are all the main commands in sequence so you can see the flow without all of the interruptions. Also easier to copy and paste!

Python 2.7.10 (default, Feb 7 2017, 00:08:15)
[GCC 4.2.1 Compatible Apple LLVM 8.0.0 (clang-800.0.34)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import textfsm
>>> template = open('cisco_ios_show_version.template')
>>> results_template = textfsm.TextFSM(template)
>>> content2parse = open('lab-swtich-show-cmds.txt')
>>> content = content2parse.read()
>>> parsed_results = results_template.ParseText(content)
>>> parsed_results
[['15.2(2)E3', 'Bootstrap', 'artic-sw01', '4 days, 14 hours, 2 minutes', 'c2960s-universalk9-mz.152-2.E3.bin', ['WS-C2960S-24TS-S'], ['FOC1709W1DT'], '0xF']]