Data visualization

The most common use cases are presented:

matplotlib: to visualize the results of computations

pandas: to visualize data from files

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
In [2]:
%matplotlib inline
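`%matplotlib inline` is an IPython magic, so it only works inside Jupyter/IPython. Below is a minimal sketch of the plain-script equivalent. It assumes the figure should be written to a PNG file; the file name `squares.png` and the temporary directory are illustrative choices. The figure is rendered explicitly with `fig.savefig`. `plt.show()` would open a window instead.

```python
import os
import tempfile

import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)

fig, ax = plt.subplots()
ax.plot(x, x**2)

# Outside Jupyter, render the figure to a file (illustrative path in a temp dir)
out_path = os.path.join(tempfile.mkdtemp(), "squares.png")
fig.savefig(out_path)
plt.close(fig)  # free the figure once it has been written
```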

Simple X vs. [Y1, Y2, ...] plots

Used to visualize results of computations

In [3]:
x = np.arange(0,10)
x
Out[3]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [29]:
y = x**2
y1 = x**1.8
In [43]:
plt.plot(x,y,x,y1)  # each curve needs its own x: (x, y) and (x, y1)
Out[43]:
[<matplotlib.lines.Line2D at 0x7f88014232b0>,
 <matplotlib.lines.Line2D at 0x7f8801423470>]
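Here is a sketch of the same two-curve plot using the object-oriented `fig, ax = plt.subplots()` interface instead of pyplot's implicit state. Each curve gets its own `(x, y)` pair, and `label=` together with `ax.legend()` shows which curve is which. The label strings are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)
y = x**2
y1 = x**1.8

fig, ax = plt.subplots()
# One (x, y) pair per curve; label= feeds the legend
ax.plot(x, y, label="y = x**2")
ax.plot(x, y1, label="y1 = x**1.8")
ax.legend()
```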

Change styles

In [7]:
plt.plot(x,y,'r*')
Out[7]:
[<matplotlib.lines.Line2D at 0x7f8801a884a8>]
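The format string `'r*'` packs the color (`r`, red) and the marker (`*`, star) into one argument. Because no line style is given, only markers are drawn. The sketch below sets the same style with the equivalent keyword arguments (`color`, `marker`, `linestyle`), which are easier to read once more options are involved. The `+ 10` offset only keeps the two series from overlapping.

```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.colors import to_rgb

x = np.arange(0, 10)
y = x**2

fig, ax = plt.subplots()
# Format string: red star markers, no connecting line
fmt_line, = ax.plot(x, y, 'r*')
# The same style spelled out with keyword arguments (shifted up by 10)
kw_line, = ax.plot(x, y + 10, color='r', marker='*', linestyle='None')
```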

Zoom, title, axis labels

In [42]:
# View a specific portion of the graph
plt.xlim(0,4)
plt.ylim(0,10)

# Set the title, axis labels and custom x tick labels
plt.title("TITLE")
plt.xlabel('X LABEL')
plt.ylabel('Y LABEL')
plt.xticks([0,1,2,3,4],['A','B','C','D','E'])

plt.plot(x,y)
Out[42]:
[<matplotlib.lines.Line2D at 0x7f88014630b8>]
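The pyplot calls above act on the current axes implicitly. Below is a sketch of the same settings through the object-oriented interface, where each `plt.xlim`/`plt.title`/... call has an `ax.set_*` counterpart. Here `plot` is called before the limits are set, for readability. Explicitly set limits turn off autoscaling for that axis, so they are kept either way.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 10)
y = x**2

fig, ax = plt.subplots()
ax.plot(x, y)

# Zoom: restrict the visible x and y ranges
ax.set_xlim(0, 4)
ax.set_ylim(0, 10)

# Title, axis labels and custom tick labels at x = 0..4
ax.set_title("TITLE")
ax.set_xlabel("X LABEL")
ax.set_ylabel("Y LABEL")
ax.set_xticks([0, 1, 2, 3, 4])
ax.set_xticklabels(["A", "B", "C", "D", "E"])
```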

Scatter plot to visualize features from two classes

In [66]:
feature = np.random.randint(0,10,(10,2))
labels = np.random.randint(0,2,10)
print(feature)
print(labels)
[[7 9]
 [6 7]
 [1 6]
 [0 3]
 [6 5]
 [5 9]
 [1 3]
 [6 1]
 [7 3]
 [4 2]]
[0 0 0 1 0 0 0 1 1 0]

Each point on the plot is (feature[sample,0], feature[sample,1]), colored by labels[sample].

In [69]:
#plt.plot(feature[:,0],feature[:,1],'*')  # draws the same points, but plot() uses one color per line, so scatter() is used to color each point by its label
plt.scatter(feature[:,0],feature[:,1],c=labels)
Out[69]:
<matplotlib.collections.PathCollection at 0x7f88010814e0>
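`c=labels` colors the points but does not produce a legend. The sketch below shows one way to get a labeled legend: draw each class with its own `scatter` call. The data are the values printed above, hard-coded for reproducibility. The marker shapes and axis names are illustrative.

```python
import numpy as np
import matplotlib.pyplot as plt

# The feature/label values printed above
feature = np.array([[7, 9], [6, 7], [1, 6], [0, 3], [6, 5],
                    [5, 9], [1, 3], [6, 1], [7, 3], [4, 2]])
labels = np.array([0, 0, 0, 1, 0, 0, 0, 1, 1, 0])

fig, ax = plt.subplots()
# One scatter call per class so each class gets its own legend entry
for cls, marker in [(0, 'o'), (1, '^')]:
    mask = labels == cls
    ax.scatter(feature[mask, 0], feature[mask, 1], marker=marker, label=f"class {cls}")
ax.set_xlabel("feature 0")
ax.set_ylabel("feature 1")
ax.legend()
```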

Heatmaps

Used to visualize CNN filters

In [16]:
mat = np.arange(0,100).reshape(10,10)
mat
Out[16]:
array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
       [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
       [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
       [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
       [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
       [70, 71, 72, 73, 74, 75, 76, 77, 78, 79],
       [80, 81, 82, 83, 84, 85, 86, 87, 88, 89],
       [90, 91, 92, 93, 94, 95, 96, 97, 98, 99]])
In [17]:
plt.imshow(mat)
Out[17]:
<matplotlib.image.AxesImage at 0x7f880195bcf8>
In [19]:
plt.imshow(mat,cmap='coolwarm')
plt.colorbar()
Out[19]:
<matplotlib.colorbar.Colorbar at 0x7f88018ea780>
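Since heatmaps are used here for CNN filters, here is a sketch of plotting several filters side by side. It assumes the weights are a NumPy array of shape `(n_filters, height, width)`, with random numbers standing in for trained weights. The same `vmin`/`vmax` is passed to every `imshow` call, which puts all panels on one color scale, so a single colorbar applies to the whole grid.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-in for trained CNN weights: 8 filters of size 3x3
rng = np.random.default_rng(0)
filters = rng.normal(size=(8, 3, 3))

# One shared color range so panels can be compared directly
vmin, vmax = filters.min(), filters.max()

fig, axes = plt.subplots(2, 4, figsize=(8, 4))
for i, ax in enumerate(axes.flat):
    im = ax.imshow(filters[i], cmap="coolwarm", vmin=vmin, vmax=vmax)
    ax.set_title(f"filter {i}")
    ax.axis("off")  # hide ticks; only the weight values matter

# A single colorbar attached to all eight panels
fig.colorbar(im, ax=axes.ravel().tolist())
```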
In [52]:
#For non-square matrices such as this 100x5 one, the default square pixels give a tall, thin image, so the aspect may need changing
mat2 = np.random.randint(0,101,(100,5))
plt.imshow(mat2)
Out[52]:
<matplotlib.image.AxesImage at 0x7f880126d630>
In [55]:
#Set the aspect: each pixel is drawn 0.08 times as tall as it is wide
plt.imshow(mat2,aspect=0.08)
Out[55]:
<matplotlib.image.AxesImage at 0x7f8801188b38>
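Instead of tuning the aspect number by eye, there are two options, sketched below. `aspect='auto'` lets the image stretch to fill the axes. A numeric aspect of `n_cols / n_rows` (0.05 for this 100 × 5 matrix) draws the whole matrix as a square, because the numeric value is the displayed height of one data unit relative to its width.

```python
import numpy as np
import matplotlib.pyplot as plt

mat2 = np.random.randint(0, 101, (100, 5))
n_rows, n_cols = mat2.shape

fig, (ax_auto, ax_num) = plt.subplots(1, 2)
# 'auto': stretch the pixels so the image fills the axes box
ax_auto.imshow(mat2, aspect='auto')
# Numeric aspect = pixel height / pixel width; n_cols / n_rows makes the image square
ax_num.imshow(mat2, aspect=n_cols / n_rows)
```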

Visualization using Pandas

Use pandas to visualize data read from files: there is no need to extract columns or set axis labels manually, since the column names are used as labels.

In [3]:
df = pd.read_csv('files/data_csv.csv')
df
Out[3]:
   Country   Age   Salary Purchased
0   France  44.0  72000.0        No
1    Spain  27.0  48000.0       Yes
2  Germany  30.0  54000.0        No
3    Spain  38.0  61000.0        No
4  Germany  40.0      NaN       Yes
5   France  35.0  58000.0       Yes
6    Spain   NaN  52000.0        No
7   France  48.0  79000.0       Yes
8  Germany  50.0  83000.0        No
9   France  37.0  67000.0       Yes
In [36]:
df.plot(x='Age',y='Salary',kind='scatter',title="TITLE")
Out[36]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f88015e2208>
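Here is a sketch comparing the pandas one-liner with roughly the same plot built in plain matplotlib, where the columns and labels have to be passed by hand. The DataFrame is rebuilt inline from the table printed above (only the columns used here), so the example runs without `files/data_csv.csv`.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Age and Salary columns of the table printed above
df = pd.DataFrame({
    "Age": [44, 27, 30, 38, 40, 35, np.nan, 48, 50, 37],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
})

# pandas: selects columns by name and uses the names as axis labels
ax = df.plot(x="Age", y="Salary", kind="scatter", title="TITLE")

# Roughly the same plot in plain matplotlib: columns and labels set by hand
fig, ax2 = plt.subplots()
ax2.scatter(df["Age"], df["Salary"])
ax2.set_xlabel("Age")
ax2.set_ylabel("Salary")
ax2.set_title("TITLE")
```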
In [37]:
df.plot(x='Country',y='Salary',kind='bar',title="TITLE")
Out[37]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f88015a5240>
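`kind='bar'` draws one bar per row, so each country appears several times along the x axis. If one bar per country is wanted, the sketch below aggregates first. Taking the mean is an illustrative choice; a sum or median works the same way. The data are the Country and Salary columns of the table printed above.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({
    "Country": ["France", "Spain", "Germany", "Spain", "Germany",
                "France", "Spain", "France", "Germany", "France"],
    "Salary": [72000, 48000, 54000, 61000, np.nan,
               58000, 52000, 79000, 83000, 67000],
})

# Average salary per country; mean() skips the missing (NaN) salary
mean_salary = df.groupby("Country")["Salary"].mean()
ax = mean_salary.plot(kind="bar", title="Mean salary by country")
ax.set_ylabel("Salary")
```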
In [10]:
df1 = pd.read_csv("files/pima-indians-diabetes.csv")
df1.head()
Out[10]:
  Number_pregnant Glucose_concentration Blood_pressure  Triceps  Insulin      BMI Pedigree Age Class Group
0               6              0.743719       0.590164 0.353535 0.000000 0.500745 0.234415  50     1     B
1               1              0.427136       0.540984 0.292929 0.000000 0.396423 0.116567  31     0     C
2               8              0.919598       0.524590 0.000000 0.000000 0.347243 0.253629  32     1     B
3               1              0.447236       0.540984 0.232323 0.111111 0.418778 0.038002  21     0     B
4               0              0.688442       0.327869 0.353535 0.198582 0.642325 0.943638  33     1     C

Histograms

In [11]:
df1["Age"].hist(bins=20)
Out[11]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff695179588>

The same histogram drawn directly with matplotlib

In [12]:
plt.hist(df1["Age"].values,bins=20)
Out[12]:
(array([173., 127.,  96.,  61.,  41.,  51.,  47.,  39.,  34.,  18.,  21.,
         13.,  15.,  11.,   8.,   8.,   3.,   1.,   0.,   1.]),
 array([21., 24., 27., 30., 33., 36., 39., 42., 45., 48., 51., 54., 57.,
        60., 63., 66., 69., 72., 75., 78., 81.]),
 <a list of 20 Patch objects>)
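`plt.hist` returns three things: the bin counts, the bin edges and the drawn bars. With `bins=20` there are 21 edges, and the counts printed above sum to 768, the number of ages binned. The sketch below shows that `np.histogram` produces the same counts and edges without drawing anything. The ages are hypothetical random integers standing in for `df1["Age"]`, since the CSV is not included here.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical ages (21..81) standing in for df1["Age"]
rng = np.random.default_rng(42)
ages = rng.integers(21, 82, size=768)

fig, ax = plt.subplots()
counts, edges, patches = ax.hist(ages, bins=20)

# np.histogram computes the same counts and edges without plotting
np_counts, np_edges = np.histogram(ages, bins=20)
```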