Handling complex nested dicts in Python
Python is a lovely language for data processing, but it can get a little verbose when dealing with large nested dictionaries.
Let’s say you’re using some parsed JSON, for example from the Wikidata API. The structure is fairly predictable, but not always: some keys in the dictionary may be missing from some items.
Consider a structure like this:
animals = [
    {
        "animal": "bunny"
    },
    {}
]
If you try to access the animal property directly, like this:
for item in animals:
    print(item["animal"])
You get an error like this:
bunny
Traceback (most recent call last):
KeyError: 'animal'
This happens because the animal key is missing from the second item in the list. You could use the handy get method instead:
for item in animals:
    print(item.get("animal", "no animal available"))
The second argument to get is a default value that is used when the key is not available:
bunny
no animal available
Excellent! However, this runs into problems with a nested structure:
animals = [
    {
        "animal": {
            "type": "bunny"
        }
    },
    {
        "animal": {}
    },
    {}
]
You could chain the get calls:
for item in animals:
    print(item.get("animal").get("type"))
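But that leads to an error, because the animal key is missing from the third item: the first get returns None, and None has no get method (AttributeError: 'NoneType' object has no attribute 'get').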
You could do something like this:
for item in animals:
    if "animal" in item:
        print(item.get("animal").get("type"))
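Another common idiom, not from the original post, is to pass an empty dict as the default to every intermediate get, so each call always returns something that has its own get method. A minimal sketch with the same data:

```python
animals = [
    {"animal": {"type": "bunny"}},
    {"animal": {}},
    {},
]

for item in animals:
    # The {} default keeps the chain going when "animal" is missing;
    # the outer get then returns None when "type" is missing.
    print(item.get("animal", {}).get("type"))
```

This prints bunny, None and None. It still fails when a key is present but holds None (for example {"animal": None}), and you need one extra {} default for every level of nesting.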
But with deeply nested structures (I counted seven levels in the Wikidata API) this gets unwieldy pretty fast.
Wouldn’t it be awesome if you could simply do this?
for item in animals:
    print(item.get("animal/type"))
Note the / in the key passed to get.
Unfortunately, this is not possible in vanilla Python, but with a really small helper class you can easily make this happen:
class DictQuery(dict):
    def get(self, path, default=None):
        keys = path.split("/")
        val = None
        for key in keys:
            if val:
                if isinstance(val, list):
                    # Apply the key to every element of a list
                    val = [v.get(key, default) if v else None for v in val]
                else:
                    val = val.get(key, default)
            else:
                # First key: use the regular dict lookup on ourselves
                val = dict.get(self, key, default)
            # Stop at the first missing or empty value
            if not val:
                break
        return val
Now you can do this:
for item in animals:
    print(DictQuery(item).get("animal/type"))
which prints:
bunny
{}
None
Nice, huh?
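The examples above only use one level of nesting. The sketch below, with made-up zoo data, shows two other things the class handles: paths deeper than two levels, and lists, where the next key is looked up in every element of the list. It also explains the {} above: the loop stops at the first falsy value and returns it as-is, so an empty dict comes back instead of the default. The class is repeated so the snippet runs on its own.

```python
class DictQuery(dict):
    def get(self, path, default=None):
        keys = path.split("/")
        val = None
        for key in keys:
            if val:
                if isinstance(val, list):
                    # Apply the key to every element of a list
                    val = [v.get(key, default) if v else None for v in val]
                else:
                    val = val.get(key, default)
            else:
                # First key: use the regular dict lookup on ourselves
                val = dict.get(self, key, default)
            # Stop at the first missing or empty value
            if not val:
                break
        return val


# Made-up example data
zoo = DictQuery({
    "enclosures": [
        {"animal": {"type": "bunny"}},
        {"animal": {"type": "cat"}},
        {},
    ],
    "keeper": {"name": {"first": "Ann"}},
})

print(zoo.get("enclosures/animal/type"))  # ['bunny', 'cat', None]
print(zoo.get("keeper/name/first"))       # Ann
```

One caveat: an intermediate value that is neither a dict nor a list (a string, say) raises AttributeError, because the class calls .get on it.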
Nils Breunese
Nice if that’s your use case, but aren’t you just beginning to reinvent XPath for JSON? Oh wait, apparently there is already JSONQuery, JSONPath, JSONselect, JSPath, Json Pointer, Jsel…
“The nice thing about standards is that you have so many to choose from.” – Andrew S. Tanenbaum
Nils Breunese
Oh and by the way: jq is awesome!
https://stedolan.github.io/jq/
Juka Gruenzeug
Thank you for this blog post, it is exactly what I was looking for! Well, it would be nice to have a library for searching complex dict/list structures, but so far I haven’t found any…
For future readers / concerning Nils’ post:
None of these are for Python, so listing them wasn’t exactly helpful…?
Jorge Ibáñez
Juka, dpath is a library to perform complex queries on dictionaries. See:
https://github.com/akesterson/dpath-python
dliao
very useful dict helper, tx very much!!
Alex
So strange that Python doesn’t have this feature built in. Here is an alternative helper function that takes a list of keys and returns the nested value:
def get_nested(my_dict, keys):
    # Split off the first key without mutating the caller's list
    key, rest = keys[0], keys[1:]
    if not rest:
        return my_dict[key]
    return get_nested(my_dict[key], rest)
M
This is really awesome. Helped me a lot
Jon
Thanks for writing this tutorial.
Teja
Thank you for posting this. It has helped me work with a YAML file containing multiple documents in a very concise way.
Kristoffer B
Thanks for posting this, I’ve bookmarked it for the next time I’ll run into this issue.
I can also recommend the JMESPath library: https://github.com/jmespath/jmespath.py
I find this easy to use when I need to do more advanced parsing or transformation (also it’s quite frequently updated).
Anonymous
Nice, but it only works two levels deep…
mike
Excellent class! I’ve been looking for a proper Pythonic way to handle missing keys in large dicts, and this is the best thing I’ve found so far.
I went as far as writing messy nested try/except statements for KeyError, IndexError, etc. Really ugly stuff. This is much more elegant.
hay
Thanks! You might be interested in dataknead, my new data processing library which has this dict querying method built in.
Abhi
very nice!
Ravi T C
I am trying to convert JSON data into CSV in Python 3, but this script no longer works and gives me various errors. Does anyone know how to fix it for Python 3? Thanks.
Below is my JSON data:
{
    "fruit": [
        {
            "name": "Apple",
            "binomial name": "Malus domestica",
            "major_producers": [
                "China",
                "United States",
                "Turkey"
            ],
            "nutrition": {
                "carbohydrates": "13.81g",
                "fat": "0.17g",
                "protein": "0.26g"
            }
        },
        {
            "name": "Orange",
            "binomial name": "Citrus x sinensis",
            "major_producers": [
                "Brazil",
                "United States",
                "India"
            ],
            "nutrition": {
                "carbohydrates": "11.75g",
                "fat": "0.12g",
                "protein": "0.94g"
            }
        },
        {
            "name": "Mango",
            "binomial name": "Mangifera indica",
            "major_producers": [
                "India",
                "China",
                "Thailand"
            ],
            "nutrition": {
                "carbohydrates": "15g",
                "fat": "0.38g",
                "protein": "0.82g"
            }
        }
    ]
}
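The script referred to in this comment isn’t shown, so it can’t be fixed directly. As a sketch, here is a minimal Python 3 conversion using only the standard json and csv modules. It assumes the goal is one CSV row per fruit, with the producer list joined into a single cell and each nutrition value in its own column. The data is the JSON above shortened to two fruits, and fruit_rows and FIELDS are hypothetical names.

```python
import csv
import io
import json

# The fruit data from the comment above, shortened to two entries
raw = """
{"fruit": [
  {"name": "Apple", "binomial name": "Malus domestica",
   "major_producers": ["China", "United States", "Turkey"],
   "nutrition": {"carbohydrates": "13.81g", "fat": "0.17g", "protein": "0.26g"}},
  {"name": "Orange", "binomial name": "Citrus x sinensis",
   "major_producers": ["Brazil", "United States", "India"],
   "nutrition": {"carbohydrates": "11.75g", "fat": "0.12g", "protein": "0.94g"}}
]}
"""

FIELDS = ["name", "binomial name", "major_producers",
          "carbohydrates", "fat", "protein"]


def fruit_rows(data):
    """Flatten each fruit dict into one CSV row (hypothetical layout)."""
    for fruit in data.get("fruit", []):
        nutrition = fruit.get("nutrition", {})
        yield {
            "name": fruit.get("name", ""),
            "binomial name": fruit.get("binomial name", ""),
            # Join the list into one cell; ";" is an arbitrary choice
            "major_producers": ";".join(fruit.get("major_producers", [])),
            "carbohydrates": nutrition.get("carbohydrates", ""),
            "fat": nutrition.get("fat", ""),
            "protein": nutrition.get("protein", ""),
        }


# StringIO keeps the example self-contained; to write a real file use
# open("fruit.csv", "w", newline="") instead.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=FIELDS)
writer.writeheader()
writer.writerows(fruit_rows(json.loads(raw)))
print(out.getvalue())
```

Using .get with defaults on every lookup means a fruit with a missing field produces an empty cell instead of a KeyError.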
Bimlesh Sharma
How can I change the value of a nested key? For example:
{a: b, a1: {}, a2: {b1: test, b2: vest}, a3: [{}, {}]}
I want to change a value inside the a3 key.
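Not a reply from the thread, just a sketch. It assumes the structure is a dict such as {"a": "b", "a1": {}, "a2": {"b1": "test", "b2": "vest"}, "a3": [{}, {}]}. In that case, plain indexing already works: data["a3"][0]["c"] = "new value". When the path is given as a list of keys, a small hypothetical set_nested helper can walk to the parent container with reduce and getitem, then assign to the last key:

```python
from functools import reduce
from operator import getitem


def set_nested(obj, keys, value):
    """Hypothetical helper: walk to the parent container, then assign."""
    *parent_keys, last = keys  # keys must contain at least one element
    parent = reduce(getitem, parent_keys, obj)
    parent[last] = value


data = {"a": "b", "a1": {}, "a2": {"b1": "test", "b2": "vest"}, "a3": [{}, {}]}

# Add a key inside the first dict of the a3 list
set_nested(data, ["a3", 0, "c"], "new value")
print(data["a3"])  # [{'c': 'new value'}, {}]
```

Because the lookup uses getitem, integer keys index into lists and string keys index into dicts, and a missing intermediate key raises KeyError rather than being created.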
Fabio Caccamo
I suggest trying python-benedict, which adds full dotted-keypath support to dicts.
https://github.com/fabiocaccamo/python-benedict
Anon
As far as I know, you should add a license to (at least) your ‘DictQuery’ helper snippet
hay
@Anon: if you need a license, I hereby declare that it is in the public domain under the CC0 license: https://creativecommons.org/publicdomain/zero/1.0/deed
Anonymous
Very good.
Anonymous
Here’s another way:
from functools import reduce
from operator import getitem
def get_nested(nested_obj, keylist):
    """
    Programmatic access to nested dictionaries, lists and tuples
    (such as output from load_json_data()).
    Example:
        nested_obj = {'1': {'b': ['hello', 'world']}}
        keylist = ('1', 'b', 1)
        -> 'world'
    :param nested_obj: Any subscriptable Python data object (with nested content)
    :param keylist: Path of keys needed to access the desired data item.
    :return: Data item
    """
    return reduce(getitem, keylist, nested_obj)
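The reduce version raises on the first missing key, just like indexing by hand. Here is a sketch of a variant that returns a default instead. It is not from the comment, and get_nested_or_default is a hypothetical name. The lookup is wrapped in a try/except for the errors a bad path can produce:

```python
from functools import reduce
from operator import getitem


def get_nested_or_default(nested_obj, keylist, default=None):
    """Like get_nested, but return default instead of raising on a bad path."""
    try:
        return reduce(getitem, keylist, nested_obj)
    except (KeyError, IndexError, TypeError):
        # KeyError: missing dict key; IndexError: list index out of range;
        # TypeError: indexing into None, or using a str key on a list
        return default


animals = [{"animal": {"type": "bunny"}}, {"animal": {}}, {}]
results = [get_nested_or_default(a, ["animal", "type"]) for a in animals]
print(results)  # ['bunny', None, None]
```

Catching TypeError is a deliberate trade-off: it handles None values in the middle of a path, but it can also hide genuine bugs, such as passing the wrong kind of object.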