Thursday, November 23, 2017

Python libraries and Data Structures

Python Data Structures
The following are some of the data structures used in Python. You should be familiar with them in order to use them appropriately.

Lists – Lists are one of the most versatile data structures in Python. A list can be defined by writing comma-separated values in square brackets. Lists might contain items of different types, but usually the items all have the same type. Python lists are mutable, so individual elements of a list can be changed.
Here is a quick example to define a list and then access it:
country = ["Brazil", "Russia", "India", "China", "South Africa"]
Individual elements of a list can be accessed by writing the index number in square brackets. Keep in mind that the first index of a list is 0 and not 1.
print(country[1])
#returns – Russia
A range of a list (a slice) can be accessed by providing the first index number and the last index number. The element at the last index is not included.
print(country[1:3])
#returns - ['Russia', 'India']
A negative index accesses the elements of a list from end.
print(country[-2])
#returns - China
A few common methods applicable to the list include: append(), extend(), insert(), remove(), count(), sort(), reverse().
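As a rough illustration of the methods listed above, the sketch below applies each of them to the same country list. The extra country names are made up for the example.

```python
# Start from the same list used above
country = ["Brazil", "Russia", "India", "China", "South Africa"]

country.append("Egypt")                 # add one item to the end
country.extend(["Iran", "Ethiopia"])    # add every item of another list
country.insert(0, "Argentina")          # insert a value at a given index
country.remove("Argentina")             # remove the first matching value
print(country.count("India"))           # number of times a value occurs: 1

country.sort()                          # sort in place (alphabetical here)
print(country)
country.reverse()                       # reverse the order in place
print(country)
```

Note that sort() and reverse() change the list itself and return None, so there is no need to assign their result.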

Strings – Strings can be defined with single ( ' ), double ( " ) or triple ( ''' ) quotes. Strings enclosed in triple quotes ( ''' ) can span multiple lines and are frequently used in docstrings (Python's way of documenting functions). \ is used as an escape character. Please note that Python strings are immutable, so you cannot change part of a string.
greeting = 'Hello'
print(greeting[1])             # Prints the character at index 1: e
print(len(greeting))           # Prints the length of the string: 5
print(greeting +' ' +'World')  # Prints Hello World
 
A raw string passes the string on as is: the Python interpreter does not process escape sequences in it. A raw string is defined by prefixing the string literal with r.
string = r'\n is a new line char by default.'
print(string)          # Returns - \n is a new line char by default.
Python strings are immutable and hence can't be changed in place.
greeting[1:] = 'i'     # Trying to change Hello to Hi. This raises a TypeError.
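Since a string cannot be modified in place, the usual workaround is to build a new string from the parts you want to keep. A minimal sketch (the name new_greeting is made up for this example):

```python
greeting = 'Hello'

# Slice assignment is not supported on str, so this raises a TypeError
try:
    greeting[1:] = 'i'
except TypeError as err:
    print('TypeError:', err)

# Build a new string instead of changing the old one
new_greeting = greeting[0] + 'i'
print(new_greeting)    # Hi
print(greeting)        # Hello (the original string is unchanged)
```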
 
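Dictionary – A dictionary is a collection of key: value pairs, with the requirement that the keys are unique (within one dictionary). Dictionaries were unordered in older versions of Python; since Python 3.7 they keep the order in which the keys were inserted.
Following is a simple example –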
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}
print("student['Name']: ", student['Name'])
print("student['Age']: ", student['Age'])
When the above code is executed, it produces the following result –
student['Name']:  Zara
student['Age']:  7
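Because dictionaries are mutable, entries can be added, updated and removed after creation. The sketch below reuses the keys and values from the example above; the variable name student and the 'City' entry are made up for illustration.

```python
student = {'Name': 'Zara', 'Age': 7, 'Class': 'First'}

student['Age'] = 8           # update the value of an existing key
student['City'] = 'Pune'     # add a new key (hypothetical value)
del student['Class']         # remove a key and its value

# get() returns a default instead of raising KeyError for a missing key
print(student.get('Class', 'not set'))

# items() yields each key together with its value
for key, value in student.items():
    print(key, value)
```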


Monday, November 20, 2017

Lists and Tuples in Python

Lists are what they seem - a list of values. Each one of them is numbered, starting from zero - the first one is numbered zero, the second 1, the third 2, etc. You can remove values from the list, and add new values to the end. Example: Your many cats' names.
Tuples are just like lists, but you can't change their values. The values that you give a tuple at the start are the values that you are stuck with for the rest of the program. Again, each value is numbered starting from zero, for easy reference. Example: the names of the months of the year.

Tuples
Tuples are pretty easy to make. You give your tuple a name, then after that the list of values it will carry. For example, the months of the year:
months = ('January','February','March','April','May','June','July','August','September','October','November','December')

Python then organises those values in a handy, numbered index - starting from zero, in the order that you entered them in. It would be organised like this:

Index   Value
0       January
1       February
2       March
3       April
4       May
5       June
6       July
7       August
8       September
9       October
10      November
11      December
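The index table above can be checked directly. The following sketch recalls single values and a range from the months tuple, and shows that trying to change a value fails:

```python
months = ('January', 'February', 'March', 'April', 'May', 'June', 'July',
          'August', 'September', 'October', 'November', 'December')

print(months[0])      # January
print(months[11])     # December
print(months[0:3])    # ('January', 'February', 'March')

# Tuples are immutable, so item assignment raises a TypeError
try:
    months[0] = 'Jan'
except TypeError as err:
    print('TypeError:', err)
```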

Lists
Lists are very similar to tuples. Lists are modifiable (or 'mutable'), so their values can be changed. Most of the time we use lists, not tuples, because we want to easily change the values of things if we need to.
Lists are defined very similarly to tuples. Say you have FIVE cats, called Tom, Snappy, Kitty, Jessie and Chester. To put them in a list, you would do this:
cats = ['Tom', 'Snappy', 'Kitty', 'Jessie', 'Chester']
You recall values from lists exactly the same as you do with tuples. For example, to print the name of your 3rd cat you would do this:
print(cats[2])
You can also recall a range of values, just as with tuples. For example, cats[0:2] would recall your 1st and 2nd cats.
Where lists come into their own is how they can be modified. To add a value to a list, you use the 'append()' method. Let's say you got a new cat called Catherine. To add her to the list you'd do this:
cats.append('Catherine')
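Values can also be changed or removed, which is exactly what a tuple does not allow. A small sketch using the same cats list (the replacement name 'Felix' is made up for the example):

```python
cats = ['Tom', 'Snappy', 'Kitty', 'Jessie', 'Chester']
cats.append('Catherine')    # add a value to the end

cats[1] = 'Felix'           # change the value at index 1
del cats[2]                 # remove the value at index 2 ('Kitty')
print(cats)                 # ['Tom', 'Felix', 'Jessie', 'Chester', 'Catherine']
```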


Tuesday, November 14, 2017

List Comprehensions in Python

List comprehensions provide a concise way to create lists. A list comprehension consists of brackets containing an expression followed by a for clause, then zero or more for or if clauses. The expression can be anything, meaning you can put all kinds of objects in lists.

The result will be a new list resulting from evaluating the expression in the context of the for and if clauses which follow it. 

The list comprehension always returns a result list. 

If you used to do it like this:
new_list = []
for i in old_list:
    if filter(i):
        new_list.append(expression(i))

You can obtain the same thing using list comprehension:
new_list = [expression(i) for i in old_list if filter(i)]

Examples

x = [i for i in range(10)]
print(x)
# This will give the output:
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

For the next example, assume we want to create a list of squares.

# You can either use loops:
squares = []

for x in range(10):
    squares.append(x**2)
 
print(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]

# Or you can use list comprehensions to get the same result:
squares = [x**2 for x in range(10)]

print(squares)
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
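The examples above only use a single for clause. As a small sketch of the optional if clause and of a second for clause (the variable names are chosen just for this illustration):

```python
# Keep only the squares of even numbers by adding an if clause
even_squares = [x**2 for x in range(10) if x % 2 == 0]
print(even_squares)    # [0, 4, 16, 36, 64]

# Two for clauses behave like nested loops, the first one being the outer loop
pairs = [(x, y) for x in range(3) for y in range(2)]
print(pairs)           # [(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1)]
```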

Wednesday, September 6, 2017

Python Simple Heat Map

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

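a = np.random.random((5, 4))   # 5 x 4 array of random values in [0, 1)
# each row of a becomes one column, giving a DataFrame with 4 rows and 5 columns
df = pd.DataFrame({'a': a[0], 'b': a[1], 'c': a[2], 'd': a[3], 'e': a[4]})
rows = list('abcd')            # labels for the 4 rows
columns = list(df.columns)     # labels for the 5 columns

fig = plt.figure(figsize=(10,10), dpi=72) 
ax = fig.add_subplot(111) 
plt.pcolor(df, cmap = plt.cm.Reds) # cmap='hot'

# shift the ticks by 0.5 so each label sits at the centre of its cell
ax.set_xticks(np.arange(0,5)+0.5)
ax.set_yticks(np.arange(0,4)+0.5)
ax.set_yticklabels(rows)
ax.set_xticklabels(columns)
plt.show()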

Friday, September 1, 2017

Python pandas iloc vs ix vs loc explanation

"""
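loc is label-based indexing: it looks rows up by their index label
iloc is integer position-based indexing: it looks rows up by their position (0, 1, 2, ...)
ix was a general method that first tried label-based indexing and, if that failed, fell back to integer position
ix was deprecated in pandas 0.20 and removed in pandas 1.0, so use loc or iloc instead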
"""
import pandas as pd

df = pd.DataFrame({'A': ['abc', 'xyz', 'pqr'], 'B': [25, 50, 75]}, index = [50, 100, 150])
print(df)

print(df.loc[100, :])  # subset dataframe df where the index label = 100, across all columns

print(df.iloc[1, :])   # subset dataframe df for row position 1 (the second row), across all columns

df = pd.DataFrame({'A': ['abc', 'xyz', 'pqr'], 'B': [25, 50, 75]}, index = ['50', '100', '150'])

# the following yield the same result (with ix this was df.ix['50', :] and df.ix[0, :])
print(df.loc['50', :])   # look up by label
print(df.iloc[0, :])     # look up by position

Sunday, August 27, 2017

Use Python to find specific files and copy them from one location to another location

import os
import shutil
from fnmatch import fnmatch


root = r'H:'        # source location to search
dst_dir = r'G:'     # destination location
pattern = "*.tif"   # file name pattern to match
i = 1               # running number used as a prefix for the copied files
for path, subdirs, files in os.walk(root):
  for name in files:
    if fnmatch(name, pattern):
      current_file = os.path.join(path, name)
      # copy straight to the numbered name, so files with the same name
      # in different folders do not overwrite each other
      dst_new_file_name = os.path.join(dst_dir, "["+ str(i) +"] " + name)
      shutil.copy(current_file, dst_new_file_name)
      print(current_file)
      i += 1

Python Split a large file into multiple files....

import csv


def file_split(filehandler, delimiter='\t', 
               row_limit=25, 
               output_name_template='output_%s.txt', 
               output_path='F:\\Novus\\Decision Tree\\Data\\', keep_headers=True):
    reader = csv.reader(filehandler, delimiter=delimiter)   #read the source file using the csv module
    current_piece = 1                                       #number of the current output file
    current_out_path = ''.join([output_path, output_name_template % current_piece]) #full path of the output file
    
    of = open(current_out_path, 'w', newline='')                #open the first output file (newline='' avoids blank rows on Windows)
    current_out_writer = csv.writer(of, delimiter=delimiter)    #csv writer for the current output file
    current_limit = row_limit                                   #last source row that belongs in the current output file
    
    if keep_headers:                            #optionally repeat the header row in every output file
        headers = next(reader)
        current_out_writer.writerow(headers)
        
    for i, row in enumerate(reader):
        if i + 1 > current_limit:               #row limit reached, so start a new output file
            of.close()
            current_piece += 1
            current_limit = row_limit * current_piece
            current_out_path = ''.join([output_path, output_name_template % current_piece])
            of = open(current_out_path, 'w', newline='')
            current_out_writer = csv.writer(of, delimiter=delimiter)
            
            if keep_headers:
                current_out_writer.writerow(headers)
        current_out_writer.writerow(row)
    of.close()
        
               
with open(r'F:\Novus\Decision Tree\Data\iris_data.txt', 'r', newline='') as source_file:
    file_split(source_file, row_limit=100,
               output_name_template='iris_split_%s.txt',
               output_path='F:\\Novus\\Decision Tree\\Data\\')

Thursday, August 24, 2017

Python dask dataframe for large data

The data that I am using here fits in memory, but dask will work even when the data is larger than memory.
import dask.dataframe as dd

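# blocksize is given in bytes: 100**4 = 100,000,000, so each partition holds up to about 100 MB
df = dd.read_csv(r'F:\Novus\Decision Tree\Data\iris_data_copy.txt', 
                sep='\t', header=0, encoding='latin-1', blocksize=100**4)
print(df.npartitions)   # number of partitions the file was split into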

df_summary = df.groupby(['Species']).mean()
print(df_summary.compute())

Friday, August 18, 2017

Update a Python list - Don't know why this is the case.

I was trying to update a list which contains a few lists. When I try to update one element of a list within the list, it seems to update all the lists in the list. But it only happens when I create the list in the following way. I have no clue why this is the case.
example_list = [[0,0]]*3
print(example_list)

example_list[0][0] = 1
print(example_list)
print('')

example_list = [[0,0],[0,0],[0,0]]
print(example_list)

example_list[0][0] = 1
print(example_list)
[[0, 0], [0, 0], [0, 0]]
[[1, 0], [1, 0], [1, 0]]

[[0, 0], [0, 0], [0, 0]]
[[1, 0], [0, 0], [0, 0]]
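The difference comes down to how the * operator builds the outer list: it repeats references to the same inner list rather than copying it, so all three entries are one and the same object. A short sketch that checks this with the is operator (the name independent_list is made up for this example):

```python
example_list = [[0, 0]] * 3
# All three entries refer to the same inner list object
print(example_list[0] is example_list[1])          # True

# A list comprehension creates a new inner list on every iteration
independent_list = [[0, 0] for _ in range(3)]
print(independent_list[0] is independent_list[1])  # False

independent_list[0][0] = 1
print(independent_list)                            # [[1, 0], [0, 0], [0, 0]]
```

Writing the inner lists out by hand, as in the second half of the example above, has the same effect as the list comprehension: each [0, 0] is a separate object.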

Wednesday, August 16, 2017

Python Recursive Function

Recursion is a way of programming or coding a problem, in which a function calls itself one or more times in its body. Usually, the function returns the result of this call. If a function definition fulfils the condition of recursion, we call this function a recursive function. 

Termination condition:
A recursive function has to terminate to be used in a program. A recursive function terminates if, with every recursive call, the problem is downsized and moves towards a base case. A base case is a case where the problem can be solved without further recursion. A recursion can lead to an infinite loop if the base case is never reached; in Python this ends with a RecursionError once the maximum recursion depth is exceeded. 

Example: 
4! = 4 * 3!
3! = 3 * 2!
2! = 2 * 1
 
Replacing the calculated values gives us the following expression 
4! = 4 * 3 * 2 * 1 

Generally we can say: Recursion in computer science is a method where the solution to a problem is based on solving smaller instances of the same problem. 
#Recursive function to calculate the sum of a list
def cum_sum(l):
    if not l: return 0                      #base case: an empty list sums to 0
    else: return l[0]+cum_sum(l[1:])

#Recursive function to calculate the factorial of a non-negative integer
def fact(number):
    if number <= 1: return 1                #base case: 0! = 1! = 1
    else: return number * fact(number-1)
    
print(cum_sum([1,2,3,4,5,6,7,8,9]))
print(fact(3))

#45
#6
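To make the termination condition more concrete, here is a rough sketch that prints every call on the way down to the base case, followed by a function that never reaches one. The names fact_traced and no_base_case are made up for this illustration.

```python
def fact_traced(number, depth=0):
    # Print each call, indented by how deep the recursion currently is
    print('  ' * depth + 'fact(' + str(number) + ')')
    if number <= 1:              # base case: no further recursion needed
        return 1
    return number * fact_traced(number - 1, depth + 1)

result = fact_traced(4)
print(result)                    # 24


def no_base_case(number):
    # Nothing ever stops the recursion here
    return number * no_base_case(number - 1)

try:
    no_base_case(4)
except RecursionError as err:
    print('RecursionError:', err)
```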

Monday, August 14, 2017

Python Grouping Data Elements (itertools groupby)

#PYTHON GROUPING DATA ELEMENTS (ITERTOOLS GROUPBY)

import itertools

#groupby makes an iterator that returns consecutive keys and groups from the iterable, so the list is sorted first
list_01 = [100, 50, 50, 50, 50, 50, 60, 60, 60, 80, 80, 70, 70, 70, 70, 70, 70]

keys = []
groups = []
sorted_list_01 = sorted(list_01)
for k, g in itertools.groupby(sorted_list_01):
    keys.append(k)
    #make g a list
    groups.append(list(g))
    
print(keys, "==>", groups)

#make dict of key and group

dict_list_01 = dict(zip(keys, groups))
print(dict_list_01)

#another example
list_02 = 'AAAAAAACCCDDEEEEFF'
sorted_list_02 = sorted(list_02)
dict_list_02 = {k: list(g) for k, g in itertools.groupby(sorted_list_02)}
print(dict_list_02)

dict_list_02_len = {k: len(list(g)) for k, g in itertools.groupby(sorted_list_02)}
print(dict_list_02_len)
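The examples above sort the data before grouping. The sketch below, using a short made-up list, shows why: groupby only groups values that sit next to each other, so without sorting the same key can come back more than once.

```python
import itertools

values = [100, 50, 50, 60, 50, 60]    # hypothetical, deliberately unsorted data

# Without sorting, each run of equal neighbours becomes its own group
unsorted_groups = [(k, list(g)) for k, g in itertools.groupby(values)]
print(unsorted_groups)
# [(100, [100]), (50, [50, 50]), (60, [60]), (50, [50]), (60, [60])]

# After sorting, every key appears exactly once
sorted_groups = [(k, list(g)) for k, g in itertools.groupby(sorted(values))]
print(sorted_groups)
# [(50, [50, 50, 50]), (60, [60, 60]), (100, [100])]
```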

Friday, August 11, 2017

Python pickle

In this post I am going to talk about pickle. It is used for serializing and de-serializing a Python object structure. Most Python objects can be pickled so that they can be saved on disk. Pickle “serialises” the object first, before writing it to a file. Pickling is a way to convert a Python object (list, dict, etc.) into a byte stream. The idea is that this byte stream contains all the information necessary to reconstruct the object in another Python script.

Example of pickle creation

##Import pickle module
import pickle
##Let's create an example dict object
example_dict = {'Sarbadal': ['Manager', '5 years', 'Data Science', 'Photography'], 
                'AJ':       ['Lead', '4 years', 'Data Science', 'Video Game'], 
                'Shobhit':  ['Sr. Analyst', '1 year', 'Data Science', 'Python'], 
                'Abhishek': ['Analyst', '2 years', 'Data Science', 'Painting']}
##Creating pickle
filename = r'F:\Python\Pickle Objects\pickle_example_dict'
with open(filename, 'wb') as file:      #the with block closes the file once the pickle is written
    pickle.dump(example_dict, file)
##Loading pickle
with open(filename, 'rb') as file:
    new_dict = pickle.load(file)

##Checking the loaded object
print(type(new_dict))
print(new_dict['AJ'])

<class 'dict'>
['Lead', '4 years', 'Data Science', 'Video Game']
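The same round trip can be tried without touching the disk, which makes the serialised form easier to inspect. This sketch uses pickle.dumps and pickle.loads on a cut-down dictionary; the variable names are made up for the example.

```python
import pickle

example = {'AJ': ['Lead', '4 years', 'Data Science', 'Video Game']}

data = pickle.dumps(example)     # serialise to an in-memory bytes object
print(type(data))                # <class 'bytes'>

restored = pickle.loads(data)    # rebuild the object from the bytes
print(restored == example)       # True: same content
print(restored is example)       # False: a new, separate object
```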

Thursday, August 10, 2017

Permutation Function with Repetition

def perm(l=None, n=None, str_a=None, perm_a=None):
    if len(str_a) == n:
        return [str_a] + perm_a
    else:
        new_perm_a = perm_a
        for c in l:
            new_perm_a = perm(l=l, n=n, str_a=str_a + c, perm_a=new_perm_a)
        return new_perm_a

def permutations(l=None, n=None):
    str_a, perm_a = '', []
    result = perm(l=l, n=n, str_a=str_a, perm_a=perm_a)
    return result

lst = permutations(l=['a', 'b', 'c', 'd'], n=3)
print(lst)
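For comparison, the standard library can produce the same set of strings with itertools.product. This is an alternative sketch, not a rewrite of the recursive function above: the function name permutations_with_repetition is made up, and the results come out in a different order than the recursive version returns them.

```python
import itertools

def permutations_with_repetition(items, n):
    # product(items, repeat=n) yields every n-length tuple of items,
    # allowing the same item to be used more than once
    return [''.join(p) for p in itertools.product(items, repeat=n)]

result = permutations_with_repetition(['a', 'b', 'c', 'd'], 3)
print(len(result))     # 64, which is 4 ** 3
print(result[:5])      # ['aaa', 'aab', 'aac', 'aad', 'aba']
```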