ISRC Python Workshop: Basics II

__Functions, File I/O and External Libraries__

<hr>

@author: Zhiya Zuo

@email: zhiya-zuo@uiowa.edu

source: https://github.com/zhiyzuo/python-tutorial

---

### Functions

#### Calling functions

Previously, we made use of many built-in functions to facilitate programming. A function is a block of code that takes input arguments (and, optionally, returns values) for a specific purpose. In Python (and many other languages), a function call looks like the following:

```python
output = function(input_argument)
```

For example:

In [1]:
range(5)

range(0, 5)

Because the `range` function in Python 3 returns a lazy, [`iterator`](https://stackoverflow.com/questions/25653996/what-is-the-difference-between-list-and-iterator-in-python)-like object instead of a `list`, we can manually convert the output into a `list` to see the values explicitly:

In [2]:
list(range(5))

[0, 1, 2, 3, 4]
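
A function can take more than one input argument. As a small sketch, `range` also accepts an optional start and step; the variable names below are only illustrative:

```python
# range(stop), range(start, stop), and range(start, stop, step)
first_five = list(range(5))
two_to_six = list(range(2, 7))
evens = list(range(0, 10, 2))

print(first_five)  # [0, 1, 2, 3, 4]
print(two_to_six)  # [2, 3, 4, 5, 6]
print(evens)       # [0, 2, 4, 6, 8]
```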

#### Define our own functions

Note that we are not limited to built-in functions. Let's now try to make our own. Before that, we need to be clear on the structure of a function:
```python
def func_name(arg1, arg2, arg3, ...):
    #####################
    # Do something here #
    #####################
    return output
```

\* *`return output` is NOT required*
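
To illustrate the note above, here is a minimal sketch of a function without a `return` statement. The function name `greet` is made up for this example. Calling such a function still works, and the value it hands back is `None`:

```python
def greet(name):
    # No return statement: the function only prints a message
    print("Hello, " + name)

result = greet("ISRC")  # prints: Hello, ISRC
print(result)           # None
```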

In the following example, we make use of `sum`, a built-in function that sums up numeric iterables.

In [3]:
def mySum(list_to_sum):
    return sum(list_to_sum)

In [4]:
mySum(range(5))

10

A more complicated one that does not use the `sum` function:

In [5]:
def mySumUsingLoop(list_to_sum):
    sum_ = list_to_sum[0]
    for item in list_to_sum[1:]:
        sum_ += item
    return sum_

In [6]:
mySumUsingLoop(range(5))

10

*The two example functions do not do anything interesting; they just serve as illustrations of how to build customized functions.*
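
As a variation on the loop version, and assuming we also want empty input to work, the accumulator can start at 0 instead of at the first element. The name `my_sum_from_zero` is hypothetical:

```python
def my_sum_from_zero(items):
    # Start from 0 so that an empty input returns 0
    # instead of raising an IndexError on items[0]
    total = 0
    for item in items:
        total += item
    return total

print(my_sum_from_zero(range(5)))  # 10
print(my_sum_from_zero([]))        # 0
```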

### Libraries

Oftentimes, we need either internal or external help for complicated computation tasks. On these occasions, we need to _import libraries_.

#### Built-in libraries

Python provides many built-in packages so that we do not have to reimplement common and useful functions.

We will use __math__ as an example.

In [7]:
import math # use import to load a library

To use a function from the library, write `library_name.function_name`. For example, to calculate the logarithm using a function from the `math` library, we can call `math.log`:

In [8]:
x = 3
print("e^x = e^3 = %f"%math.exp(x))
print("log(x) = log(3) = %f"%math.log(x))

e^x = e^3 = 20.085537
log(x) = log(3) = 1.098612


You can also import one specific function:

In [9]:
from math import exp # You can import a specific function
print(exp(x)) # This way, you don't need to use math.exp but just exp

20.085536923187668


Or all:

In [10]:
from math import * # Import all functions

In [11]:
print(exp(x))
print(log(x)) # Before importing math, calling `exp` or `log` will raise errors

20.085536923187668
1.0986122886681098


Depending on what you want to achieve, you may want to choose between importing a few or all (by `*`) functions within a package.
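
Two other common import forms, shown here as a brief sketch: giving a module a shorter alias with `as`, and importing several specific names in one line. The alias `m` is an arbitrary choice for this example:

```python
import math as m           # alias the whole module
from math import sqrt, pi  # import only the names we need

print(m.floor(2.7))  # 2
print(sqrt(16))      # 4.0
print(pi > 3)        # True
```

Importing only what you need keeps it clear where each name comes from.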

---

### Quick Intro to Numpy

Instead of using the native data structures, we use `numpy.ndarray` for data analytics most of the time. While they are not as "flexible" as lists, they are easy to use and have better performance. As Numpy's official documentation states:
> NumPy’s main object is the homogeneous multidimensional array. It is a table of elements (usually numbers), all of the same type, indexed by a tuple of positive integers.

The most common alias for `numpy` is `np`:

In [12]:
import numpy as np

#### Create arrays

Depending on the type of analysis we are going to work on later, we can choose the most appropriate array initialization method.

##### By hand

This is very similar to creating a list of elements manually, except that we wrap the list in `np.array()`.

In [13]:
arr = np.array([1,2,3,8])
arr

array([1, 2, 3, 8])

In [14]:
arr.shape

(4,)

Multidimensional arrays: each row is a list, and rows are separated by commas

1 by 4: 1 row and 4 columns

In [15]:
arr = np.array([[1,2,3,8]])
arr.shape

(1, 4)

In [16]:
arr

array([[1, 2, 3, 8]])

3 by 4: 3 rows and 4 columns

In [17]:
arr = np.array([[1,2,3,8], [3,2,3,2], [4,5,0,8]])
arr.shape

(3, 4)

In [18]:
arr

array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

##### By functions

There are many special array initialization methods to call:

In [19]:
np.zeros([3,5], dtype=int)

array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0]])

In [20]:
np.ones([3,5])

array([[1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.],
       [1., 1., 1., 1., 1.]])
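
A few other initialization functions, shown as a short sketch. These functions are not used elsewhere in this tutorial and are included only as further examples; the variable names are illustrative:

```python
import numpy as np

stepped = np.arange(0, 10, 2)   # like range, but returns an array
filled = np.full((2, 3), 7)     # 2 by 3 array where every element is 7
spaced = np.linspace(0, 1, 5)   # 5 evenly spaced values from 0 to 1, inclusive

print(stepped)  # [0 2 4 6 8]
print(filled.shape)  # (2, 3)
print(spaced)
```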

#### Arithmetic operations

The rules are very similar to R: operations are generally element-wise.

In [21]:
arr

array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

In [22]:
arr * 6

array([[ 6, 12, 18, 48],
       [18, 12, 18, 12],
       [24, 30,  0, 48]])

In [23]:
arr - 5

array([[-4, -3, -2,  3],
       [-2, -3, -2, -3],
       [-1,  0, -5,  3]])

In [24]:
np.exp(arr)

array([[2.71828183e+00, 7.38905610e+00, 2.00855369e+01, 2.98095799e+03],
       [2.00855369e+01, 7.38905610e+00, 2.00855369e+01, 7.38905610e+00],
       [5.45981500e+01, 1.48413159e+02, 1.00000000e+00, 2.98095799e+03]])
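
The same element-wise rule applies when both operands are arrays of the same shape. A minimal sketch using the `arr` from above (redefined here so the block runs on its own):

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])

doubled = arr + arr  # adds matching elements
squared = arr * arr  # multiplies matching elements; this is NOT matrix multiplication

print(doubled)
print(squared)
```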

##### Operation based on itself

There are many methods to calculate statistics of the array itself along some axis:
- `axis=1` means row-wise (one result per row)
- `axis=0` means column-wise (one result per column)

In [25]:
arr

array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

In [26]:
arr.max()

8

In [27]:
arr.max(axis=1)

array([8, 3, 8])

In [28]:
arr.max(axis=0)

array([4, 5, 3, 8])
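
Other statistics follow the same `axis` convention. A brief sketch with `sum` and `mean`, again using the same `arr` (redefined so the block is self-contained):

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])

col_sums = arr.sum(axis=0)    # one sum per column
row_sums = arr.sum(axis=1)    # one sum per row
row_means = arr.mean(axis=1)  # one mean per row

print(col_sums)   # [ 8  9  6 18]
print(row_sums)   # [14 10 17]
print(row_means)  # [3.5  2.5  4.25]
```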

#### Indexing and slicing

The most important part is how to index and slice an `np.array`. It is very similar to indexing a `list`, except that we may now need more than one index, because most real-life datasets have more than one dimension.

##### 1 dimensional case

In [29]:
a1 = np.array([1,2,8,100])
a1

array([  1,   2,   8, 100])

In [30]:
a1[0]

1

In [31]:
a1[-2]

8

In [32]:
a1[[0,1,3]]

array([  1,   2, 100])

We can also use boolean values to index
- `True` means we want this element

In [33]:
a1 > 3

array([False, False,  True,  True])

In [34]:
a1[a1 > 3]

array([  8, 100])
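
Boolean conditions can also be combined before indexing. A small sketch, using `&` for "and" and `~` for "not" (each condition needs its own parentheses); the name `mask` is illustrative:

```python
import numpy as np

a1 = np.array([1, 2, 8, 100])

mask = (a1 > 1) & (a1 < 50)  # True where both conditions hold
print(mask)       # [False  True  True False]
print(a1[mask])   # [2 8]
print(a1[~mask])  # [  1 100]
```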

##### 2 dimensional case

In [35]:
arr

array([[1, 2, 3, 8],
       [3, 2, 3, 2],
       [4, 5, 0, 8]])

Using only one number to index returns a subset of the original multidimensional array, which is also an array:

In [36]:
arr[0]

array([1, 2, 3, 8])

In [37]:
type(arr[0])

numpy.ndarray

Since we have 2 dimensions now, there are 2 indices we can use for indexing the 2 dimensions respectively

In [38]:
arr[0,0]

1

We can use `:` to indicate everything along that axis

In [39]:
arr[1]

array([3, 2, 3, 2])

In [40]:
arr[1, :]

array([3, 2, 3, 2])

In [41]:
arr[1,:] == arr[1]

array([ True,  True,  True,  True])

In [42]:
arr[:, 1]

array([2, 2, 5])
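
The same ideas combine in two dimensions. As a brief sketch (with `arr` redefined so the block runs on its own), we can slice a range along each axis, use negative indices, or select rows with a boolean condition:

```python
import numpy as np

arr = np.array([[1, 2, 3, 8], [3, 2, 3, 2], [4, 5, 0, 8]])

block = arr[0:2, 1:3]         # rows 0-1 and columns 1-2
last_col = arr[:, -1]         # last column of every row
big_rows = arr[arr[:, 0] > 2] # rows whose first element is greater than 2

print(block)
print(last_col)  # [8 2 8]
print(big_rows)
```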