Decomposing Data Structures

Whether you are trying to find all the tenants in an ACI fabric, or all the interface IPs and descriptions on a network device, or trying to determine if the Earth is in imminent danger from an asteroid hurtling towards it, understanding complex data structures is a critical skill for anyone working with modern IT infrastructure technology.

As APIs become more and more prevalent, acting on an object (device, controller, cloud-based service, management system, etc.) will return structured data in a format you may have never seen before. For beginners, this may be a little daunting but once you get a few basics down you will be decomposing data structures at dinner parties!

Most modern APIs return data in XML or JSON format. Some give you the option to choose. We won't spend any time on how you get this structured data. That will be a topic for another day. The focus today is how to interpret and manipulate data returned to you in JSON. I'm not a fan of XML so if I ever run into an API that only returns XML (odds are you will too) I do my very best to convert it to JSON first.

Lets get some basics out of the way. Curly braces {}, square brackets [], and spacing (whitespace) provide a syntax for this returned data. If you know a little Python these will be familiar to you.

Symbol


Meaning
[ ]LIST
Square brackets denote a list of values will be found between the opening [ and closing ] brackets separated by commas List elements are referenced by the number of its position in the list so that in a list like my_list = [1,2,3,4], if you want the information in the 3rd position (the third element) which is the number 3 you say my_list[2] because lists are zero indexed and so my_list[0] is the number 1, my_list[1] is the number 2, etc.
{ }DICTIONARY
Curly braces denote a list of key value pairs will be found between the opening { and closing } braces separated by commas with the key and value pair separated by a colon : key: value Dictionary key:value pairs are reference by the key so that in a dictionary like my_dict = {'la': 'Dodgers', 'sf': 'Giants'} if you want to know the baseball team in LA you reference my_dict['la'].

Lists look like this:

[
item1,
item2,
item3
]

or (equally valid)

[item1,item2,item3]

Dictionaries look like this:

{
key:value,
otherkey:othervalue
}

or (equally valid)

{key:value, otherkey:othervalue}

The other thing to note is that when these structures are formatted, spacing is important and can tell you a lot about the hierarchy.

The examples above are not quoted and they should be. I didn't because I wanted to highlight that single and double quotes can be used in Python but the JSON standard requires double quotes.

Its also important to understand that these two basic structures can be combined so that you can have a list of dictionaries or a dictionary of key:value pairs where the value is a list or a dictionary. You can have many levels and combinations and often that is what makes the data structure seem complex.

Lets start with a simple example.

Below is the output of an Ansible playbook that queries an ACI fabric for the configured tenants.

Cisco ACI Tenatn Query Playbook output
ACI Tenant Query Playbook - output

Lets look at the data that was returned, which is highlighted in a big yellow outer box in the image below. Also highlighted are the syntax symbols that will let us decompose this structure. As you can see from a, the entire structure is encased in curly braces {}. That tells us that the outermost structure is a dictionary and we know that dictionaries provide us with a key (think of this as an index value you use to get to the data in the value part) followed by a colon : followed by a value. In this case, we have a dictionary with one element or key:value pair and the value is a list. We know it is a list from b, which shows left square bracket immediately following the colon and denoting the start of a list structure.

{ "key" : "value" } ( where value is a list structure [])
{"tenantlist": ["common", "mgmt", "infra", etc..]}

Building a Custom TextFSM Template
ACI Tenant Query Playbook - output with annotations


This is a simple structure. It is short, so you can see the entire data structure in one view, but it has both types of data structures that you will encounter, a list and a dictionary.


Lets look at something a bit more complex.

REST COUNTRIES provides a public REST API for obtaining information about countries.

Check them out!

Using Postman, I submitted a query about Kenya. Below are the first 25 or so lines of the data returned.

Noting the colored letters and line numbers below in the image:

a. Line 1 shows us the left square bracket (outlined in a green box) which tells us this is a list. That is the outermost structure so you know that to get to the first element you will need to use a list reference like mylist[0] for the first element.

b. Line 2 shows us the left curly brace (outlined in a yellow box) which indicates that what follows as the first element of the list is a dictionary. Line 3 is a key:value pair where the key "name" has a value of "Kenya".

c. Line 4 is a key:value pair but the value of key "topLevelDomain" is a list (of one element ".ke").
Line7 returns us to a simple key:value pair.

Structured data returned from REST Countries API query
Structured data returned from REST Countries API query

Here is where it can start getting confusing. Remembering our reference rules..that is:

  • you reference a list element by its zero indexed positional number and
  • you reference the value of a dictionary element by its key

Don't be distracted by the data within the syntax symbols just yet. If you see something like [ {},{},{} ], (ignoring the contents inside the braces and brackets) you should see that this is a list of three elements and those elements are dictionaries. Assuming alist = [ {},{},{} ] you access the first element (dictionary) with alist[0].

Going a step further, if you see this structure [ {"key1":[]},{"a_key": "blue"},{"listkey": [1,2,3]} ], you already know its a list of dictionaries. Now you can also see that the first element in the list is a dictionary with a single key:value pair and the key is "key1" and the value is an empty list. The second element in the list is also a dictionary with a single key:value pair with a key of "a_key" and a value of a string "blue". I'll leave you to describe the third element in the list.

Assuming my_list = [ {"key1":[]},{"a_key": "blue"},{"listkey": [1,2,3]} ], if I wanted to pull out the string "blue" I would reference my_color = my_list[1]["a_key"] and the variable my_color would be equal to "blue". The string "blue" is the value of the second dictionary in the list. Remembering that list elements start at "0" (zero indexed), you need to access the element in the second position with [1]. To get to "blue" you have to use the key of "a_key" and so you have mylist[1]["a_key"] which will give you "blue".

Lets try to extract the two letter country code for Kenya.

So I'm going to cheat a little here to introduce you to the concept of digging in to the data structure to "pluck" out the data you want. You really want to understand the entire data structure before doing this so I'm going to assume that this is a list with one element and that element is a dictionary of key value pairs with different information about Kenya.

That being the case, first I have to reference the element in the list.

  • The list has one element so I know I have to reference it with the zero index [0] (the first and only element in the list)
  • Next I have to pluck out the 2 letter country code for Kenya and that is in a key:value pair with the key of 'alpha2code'

So assuming we have a variable country_info that has our data structure, then to reference the 2 letter country code I would need to use

country_info[0]["alpha2Code"]

That reference structure above would return "KE".

[0] takes us one level deep into the first element of the list. This is where the dictionary (note the curly brace in line 2 below) can be accessed. At this level we can access the key we need "alpha2Code" to get the 2 letter country code.

country_info =

Extracting the Country Code from the JSON output
Extracting the Country Code from the JSON output

Lets build on this. What if I need the country calling code so I can make a phone call to someone in Kenya from another country? For this, we need to go a level deeper as the country code is in a key named "callingCodes" at the same level of "alpha2Code" but the value is a list rather than a variable. See lines 9 - 11 in the image above. We know how to reference a list, so in this case, if I wanted the first country code in the list my reference structure would look like:

country_info[0]["callingCodes"][0]

That would return "254" (a string).

In many cases, you might want the entire list and so to get that:

country_info[0]["callingCodes"]

That would return ["254"] as a list (yes a list with only one element but a list because its enclosed in square brackets). There are cases where you may want to do some specific manipulation and you need the entire list.

Extra: In the companion GitHub repository to this post there is a quick & dirty Python3 script country_info_rest.py that will let you get some country data and save it to a file. There is also an optional "decompose function" that you can tailor to your needs to get a feel for decomposing the data structure via a script.

(generic_py3_env) Claudias-iMac:claudia$ python country_info_rest.py -h
usage: country_info_rest.py [-h] [-n CNAME] [-d]
​
Call REST Countries REST API with a country name.
​
optional arguments:
  -h, --help            show this help message and exit
  -n CNAME, --cname CNAME
                        Country Name to override default (Mexico)
  -d, --decompose       Execute a function to help decompose the response
​
Usage: 'python country_info_rest.py' without the --cname argument the script
will use the default country name of Mexico. Usage with optional name
parameter: 'python country_info_rest.py -n Singapore'. Note: this is a python3
script.
​​

Lets look at something really complex.

Now a word about "complex". At this point I'm hoping you can start to see the pattern. It is a pattern of understanding the "breadcrumbs" that you need to follow to get to the data you want. You now know all the "breadcrumb" formats:

  • [positional number] for lists and
  • ["key"] for dictionaries

From here on out its more of the same. Lets see this in action with the data returned from the NASA Asetroids - Near Earth Object Web Service.

But first, here is why I called this "really complex". A better term might be "long with scary data at first glance". At least it was for me because when I first ran my first successful Ansible playbook and finally got data back it was like opening a shoe box with my favorite pair of shoes in it and finding a spider in the box…yes, shrieking "what IS that!" and jumping away.

I hope that by this point there is no shrieking but more of a quizzical.."Hmmm OK, I see the outer dictionary with a key of "asteroid_output" and a value of another dictionary with quite alot of keys and some odd looking stuff…". Lets get into it!

…and if there is shrieking then I hope this helps get you on your way to where its quieter.

Raw output from an Ansible Playbook querying the NASA Near Earth Web Service
Raw output from an Ansible Playbook

I want to pluck out the diameter of the asteroid as well as something that tells me if there is any danger to Earth.

Somewhere, in the middle of all of this output, is this section which has the data we want but where is it in reference to the start of the entire data structure? Where are the breadcrumbs? Hard to tell... or is it?

Information we need to get to within the data structure returned by the Ansible playbook

You can visually walk the data but for data structures of this size and with many levels of hierarchy it can be time consuming and a bit daunting until you get the hang of it. There are a number of approaches I've tried including:

  1. visually inspecting it (good for 25 lines or less....if you cannot fit it on a single page to "eyeball" it try one of the other methods below...I promise you it will save you time)
  2. saving the output to a text file and opening it up in a modern text editor or IDE so that you can inspect and collapse sections to get a better understanding of the structure
  3. using a Python script or Ansible playbook to decompose by trial and error
  4. using a JSON editor to convert to a more readable structure and to interpret the data structure for you

I don't recommend the first approach at all unless your data is like our first example or you combine it with the Python (or Ansible) trial and error approach but this can be time consuming. I do have to recommend doing it this way once because it really helps you understand what is going on.

Using a good advanced editor (*not* Notepad.exe) or IDE (Integrated Development Environment) is a good approach but for something that makes my eyes cross like the output above I use a JSON editor.

In the two sections below I'll show you a bit more detail on approach #2 and #4. Play around with the companion GitHub repository for an example of approach #3.

Asteroid Data collapsed down in Sublime Text Editor

Note that this has already been collapsed down to the value of the key asteroid_output so the outer dictionary is already stripped off. In this view it looks a bit more manageable and the values we want can be found at the level shown below:

asteroid_output["json"]["near_earth_objects"][<date_key>]

where <date_key> can be any of the 8 date keys found in line 23, line 858, line 1154 etc. The gap in the line numbers give you a sense of how much data we've collapsed down but I hope you can begin to see how that makes it easier to start understanding you how you need to walk the path to the data you want.

asteroid_output =

Expanding one of the date keys as shown in the next image shows us how we might start to get to the data we want.

asteroid_output["json"]["near_earth_objects"]["2019-07-07"]

The date key we expanded, "2019-07-07", has a value that is a list. If we take the first element of that list we can get the estimated diameter in feet and the boolean value of "are we to go the way of the dinosaurs" or technically they value of key "is_potentially_hazardous_asteroid".

Estimated maximum diameter in feet:

asteroid_output["json"]["near_earth_objects"]["2019-07-07"][0]["estimated_diameter"]["feet"]["estimated_diameter_max"]

Is this going to be an extinction level event?:

asteroid_output["json"]["near_earth_objects"]["2019-07-07"][0]["is_potentially_hazardous_asteroid"]

Which will give us false (for that one date anyway :D).

Using a good text editor or IDE to investigate a data structure by expanding
Using a good text editor or IDE to investigate a data structure by expanding

Using JSON tools to decompose your data structure

It is here I must confess that these days if I can't visually figure out or "eyeball" the "breadcrumbs" I need to use to get to the data I want, I immediately go to this approach. Invariably I think I can "eyeball" it and miss a level.

If I'm working with non-sensitive data JSON Editor Online is my personal favorite.

  1. I copy the output and paste it into the left window,
  2. click to analyze and format into the right window, and
  3. then I collapse and expand to explore the data structure and figure out the breadcrumbs that I need.

The Editor gives you additional information and element counts and has many other useful features. One of them is allowing you to save an analysis on line so you can share it.

Decomposing_Data_Structures_asteroid_output in JSON Editor Online

Using the JSON Editor Online to navigate through the returned data from your API call
Using the JSON Editor Online to navigate through the returned data from your API call

There are occasions where I'm not working with public data and in those cases I'm more comfortable using a local application. My "go to" local utility is JSON Editor from Vlad Badea available from the Apple Store. I don't have a recommendation for Windows but I know such tools exist and some look interesting.

For this data set, the local JSON Editor application does a nicer job of representing the asteroid_output because it really collapses that hairy content value.

Using the JSON Editor App on your system to navigate through the returned data from your API call
JSON Editor APP on local system

Using a Python script to decompose by trial and error

In this repository there is a rudimentary Python3 script country_info_rest.py which, when executed with the "-d" option, will attempt to walk the response from the REST Country API a couple of levels.

The first part of the script executes a REST GET and saves the response. With the "-d" option it also executes a "decompose" function to help understand the returned data structure. Some sample output from the script follows.

Outer structure (0) levels deep:
        The data structure 0 levels deep is a <class 'list'>
        The length of the data structure 0 levels deep is 1
​
One level deep:
        The data structure 1 level deep is a <class 'dict'>
        The length of the data structure 1 level deep is 24
​
        Dictionary keys are dict_keys(['name', 'topLevelDomain', 'alpha2Code', 'alpha3Code', 'callingCodes', 'capital', 'altSpellings', 'region', 'subregion', 'population', 'latlng', 'demonym', 'area', 'gini', 'timezones', 'borders', 'nativeName', 'numericCode', 'currencies', 'languages', 'translations', 'flag', 'regionalBlocs', 'cioc'])
​
                Key: name       Value: Singapore
​
                Key: topLevelDomain     Value: ['.sg']
​
                Key: alpha2Code         Value: SG
​
                Key: alpha3Code         Value: SGP
​
                Key: callingCodes       Value: ['65']
​
                Key: capital    Value: Singapore
​
                Key: altSpellings       Value: ['SG', 'Singapura', 'Republik Singapura', '新加坡共和国']
​
                Key: region     Value: Asia
​
                Key: subregion  Value: South-Eastern Asia
​
                Key: population         Value: 5535000
​
                Key: latlng     Value: [1.36666666, 103.8]
​
                Key: demonym    Value: Singaporean
​
                Key: area       Value: 710.0
​
                Key: gini       Value: 48.1
​
                Key: timezones  Value: ['UTC+08:00']
​
                Key: borders    Value: []
​
                Key: nativeName         Value: Singapore
​
                Key: numericCode        Value: 702
​
                Key: currencies         Value: [{'code': 'BND', 'name': 'Brunei dollar', 'symbol': '$'}, {'code': 'SGD', 'name': 'Singapore dollar', 'symbol': '$'}]
​
                Key: languages  Value: [{'iso639_1': 'en', 'iso639_2': 'eng', 'name': 'English', 'nativeName': 'English'}, {'iso639_1': 'ms', 'iso639_2': 'msa', 'name': 'Malay', 'nativeName': 'bahasa Melayu'}, {'iso639_1': 'ta', 'iso639_2': 'tam', 'name': 'Tamil', 'nativeName': 'தமிழ்'}, {'iso639_1': 'zh', 'iso639_2': 'zho', 'name': 'Chinese', 'nativeName': '中文 (Zhōngwén)'}]
​
                Key: translations       Value: {'de': 'Singapur', 'es': 'Singapur', 'fr': 'Singapour', 'ja': 'シンガポール', 'it': 'Singapore', 'br': 'Singapura', 'pt': 'Singapura', 'nl': 'Singapore', 'hr': 'Singapur', 'faگاپور'}
​
                Key: flag       Value: https://restcountries.eu/data/sgp.svg
​
                Key: regionalBlocs      Value: [{'acronym': 'ASEAN', 'name': 'Association of Southeast Asian Nations', 'otherAcronyms': [], 'otherNames': []}]
​
                Key: cioc       Value: SIN

===== Plucking out specific data:     
2 Letter Country Code:                          SG     
First (0 index) International Calling Code:     65     
List of International Calling Code:             ['65']     
==========================================

Feel free to take this script and add to it and modify it for your own data structure!


Apart from the first example, I have deliberately not used data from network devices. I wanted to show that the data source really does not matter. Once you understand how to decompose the data, that is, get to the data you want within a returned data structure, you can pluck out data all day long from any data set. A secondary objective was to play around with some of these APIs. While checking to see if the Earth is about to get broadsided by an asteroid is clearly important there are quite a few public and fee-based APIs out there with perhaps more practical use.

Of course within our own networks, we will be querying device and controller APIs for status and pushing configuration payloads. We will be pulling inventory data from a CMDB system API, executing some actions, perhaps some updates, and recording any changes via API to the Ticketing System.


Some final tips, links, and notes:

Some sites have excellent API documentation that tell you exactly what will be returned but some don't, so in many instances you have to do this decomposition exercise anyway. It's best to get familiar with it. It's like knowing how to read a map and and how to navigate in case you forget your GPS.

JSON Tools

JSON Editor Online

REST APIs

REST COUNTRIES

NASA Asteroids - Near Earth Object Web Service

https://api.nasa.gov/api.html#NeoWS

Examples Repository cldeluna/Decomposing_DataStructures


The Gratuitous. Arp

2019-07-07