Notebook: using jsonstat.py python library with jsonstat format version 1.¶

This Jupyter notebook shows the python library jsonstat.py in action. The JSON-stat is a simple lightweight JSON dissemination format. For more information about the format see the official site. This example shows how to explore the example data file oecd-canada from json-stat.org site. This file is compliant to the version 1 of jsonstat.

# all import here
from __future__ import print_function
import os
import pandas as ps # using panda to convert jsonstat dataset to pandas dataframe
import jsonstat     # import jsonstat.py package

import matplotlib as plt  # for plotting

%matplotlib inline

Download or use cached file oecd-canada.json. Caching file on disk permits to work off-line and to speed up the exploration of the data.

url = 'http://json-stat.org/samples/oecd-canada.json'
file_name = "oecd-canada.json"

file_path = os.path.abspath(os.path.join("..", "tests", "fixtures", "www.json-stat.org", file_name))
if os.path.exists(file_path):
    print("using already downloaded file {}".format(file_path))
else:
    print("download file and storing on disk")
    jsonstat.download(url, file_name)
    file_path = file_name

using already downloaded file /Users/26fe_nas/gioprj.on_mac/prj.python/jsonstat.py/tests/fixtures/www.json-stat.org/oecd-canada.json

Initialize JsonStatCollection from the file and print the list of dataset contained into the collection.

collection = jsonstat.from_file(file_path)
collection

JsonstatCollection contains the following JsonStatDataSet:

pos	dataset
0	'oecd'
1	'canada'

Select the dataset named oedc. Oecd dataset has three dimensions (concept, area, year), and contains 432 values.

oecd = collection.dataset('oecd')
oecd

name: 'oecd'
label: 'Unemployment rate in the OECD countries 2003-2014'
source: 'Unemployment rate in the OECD countries 2003-2014'
size: 3

pos	id	label	size	role
0	concept	indicator	1	metric
1	area	OECD countries, EU15 and total	36	geo
2	year	2003-2014	12	time

Shows some detailed info about dimensions

oecd.dimension('concept')

pos	idx	label
0	'UNR'	'unemployment rate'

oecd.dimension('area')

pos	idx	label
0	'AU'	'Australia'
1	'AT'	'Austria'
2	'BE'	'Belgium'
3	'CA'	'Canada'
...	...	...

oecd.dimension('year')

pos	idx	label
0	'2003'	''
1	'2004'	''
2	'2005'	''
3	'2006'	''
...	...	...

Accessing value in the dataset¶

Print the value in oecd dataset for area = IT and year = 2012

oecd.data(area='IT', year='2012')

JsonStatValue(idx=201, value=10.55546863, status=None)

oecd.value(area='IT', year='2012')

10.55546863

oecd.value(concept='unemployment rate',area='Australia',year='2004') # 5.39663128

5.39663128

oecd.value(concept='UNR',area='AU',year='2004')

5.39663128

Trasforming dataset into pandas DataFrame¶

df_oecd = oecd.to_data_frame('year', content='id')
df_oecd.head()

	concept	area	Value
year
2003	UNR	AU	5.943826
2004	UNR	AU	5.396631
2005	UNR	AU	5.044791
2006	UNR	AU	4.789363
2007	UNR	AU	4.379649

df_oecd['area'].describe() # area contains 36 values

count     432
unique     36
top        JP
freq       12
Name: area, dtype: object

Extract a subset of data in a pandas dataframe from the jsonstat dataset. We can trasform dataset freezing the dimension area to a specific country (Canada)

df_oecd_ca = oecd.to_data_frame('year', content='id', blocked_dims={'area':'CA'})
df_oecd_ca.tail()

	concept	area	Value
year
2010	UNR	CA	7.988900
2011	UNR	CA	7.453610
2012	UNR	CA	7.323584
2013	UNR	CA	7.169742
2014	UNR	CA	6.881227

df_oecd_ca['area'].describe()  # area contains only one value (CA)

count     12
unique     1
top       CA
freq      12
Name: area, dtype: object

df_oecd_ca.plot(grid=True)

<matplotlib.axes._subplots.AxesSubplot at 0x113980908>

../_images/oecd-canada-jsonstat_v1_24_1.png

Trasforming a dataset into a python list¶

oecd.to_table()[:5]

[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Australia', '2004', 5.39663128],
 ['unemployment rate', 'Australia', '2005', 5.044790587],
 ['unemployment rate', 'Australia', '2006', 4.789362794]]

It is possible to trasform jsonstat data into table in different order

order = [i.did() for i in oecd.dimensions()]
order = order[::-1]  # reverse list
table = oecd.to_table(order=order)
table[:5]

[['indicator', 'OECD countries, EU15 and total', '2003-2014', 'Value'],
 ['unemployment rate', 'Australia', '2003', 5.943826289],
 ['unemployment rate', 'Austria', '2003', 4.278559338],
 ['unemployment rate', 'Belgium', '2003', 8.158333333],
 ['unemployment rate', 'Canada', '2003', 7.594616751]]