What are we looking for?

Data Science -> Artificial Intelligence

Throughout history, fields like statistics, optimization, data mining, machine learning, and artificial intelligence have been combined to build machines that make life easier, more comfortable, and faster; together they form the fields of data science. In the artificial-intelligence sub-field, computers receive and store data from sensors over their networks so they can perceive their environment, learn, and take actions to satisfy themselves/humans!

Data Science -> Artificial Intelligence -> Machine Learning

The way computers run this whole process requires some sort of algorithm that helps them reason step by step (or in parallel). The procedures of these algorithms are designed by humans, but this could also happen independently. The major categories of procedures used in machine learning are listed below:

- Supervised learning (task driven)
  - Regression (for continuous data)
    - Linear
    - Logistic
    - Generalized Linear Model (GLM)
    - Gaussian Process Regression (GPR)
    - Support Vector Regression (SVR)
    - Ensemble Methods
    - Decision Tree
  - Classification (for categorical data)
    - Naive Bayes
    - Support Vector Machine (SVM)
    - Random Decision Forest
    - AdaBoost
    - Gradient Boosting
    - Logistic
    - Nearest Neighbour (NN)
    - Discriminant Analysis
- Unsupervised learning (data driven)
  - Clustering
    - K-Means
    - K-Medoids
    - KNN
    - Hierarchical
    - Gaussian Mixture
    - Hidden Markov Model
  - Dimensionality Reduction
    - PCA
    - SVD
- Semi-supervised learning
  - Self Training
  - Low-Density Separation Models
  - Graph-Based Algorithms
- Reinforcement learning (learn to react)
  - Dynamic Programming
  - Monte Carlo Methods
  - Heuristic Methods
- Self learning
- Feature learning
- Sparse dictionary learning
- Anomaly detection (outlier detection)
- Robot learning
- Association rule learning

Running any of these procedures requires building models to train. Models can be nature-inspired, born from imagination, based on logical reasoning, etc. Some are mentioned in the lists above; the rest include:

- Artificial Neural Networks (ANN)
  - Feedforward
    - Radial Basis
    - Kohonen
    - Learning Vector
    - Modular
  - Recurrent
    - Hopfield
    - Elman/Jordan
    - Echo State
    - LSTM
    - BRNN
    - CTRNN
    - HRNN
    - RMLP
    - Second Order
    - Multi-Timescale
    - Stochastic
- Genetic Algorithm (GA)
- Bayesian Networks (belief networks)

For a clearer overview, the famous algorithms used in machine learning are grouped above by similar features.

## Let’s start coding!

Using algorithms already written and embedded in packages and modules is an easy-peasy way to solve problems, but here we will write **some** algorithms from scratch so we can compare them with the **sklearn**, **TensorFlow**, etc. versions. The Python programming language will be used widely! (We could also use R, …). For any questions here, feel free to contact me.

- Linear Regression (python code : Link)

Mathematical definition of a line: **Y = mX + b** (m: slope, b: Y-intercept)

here, regression minimizes the mean squared distances of the points, which gives: **m = ( mean(X)·mean(Y) − mean(X·Y) ) / ( mean(X)^{2} − mean(X^{2}) )** and **b = mean(Y) − m·mean(X)**

Test the code against Origin, Excel, and the OpenOffice spreadsheet.
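The mean-based slope and intercept formulas above can be sketched in plain Python; the toy data set below is made up for illustration.

```python
# Simple linear regression from scratch, using the mean-based slope and
# intercept formulas above. The data set is a made-up toy example.

def best_fit_line(xs, ys):
    """Return slope m and intercept b of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    mean_xy = sum(x * y for x, y in zip(xs, ys)) / n
    mean_x2 = sum(x * x for x in xs) / n
    m = (mean_x * mean_y - mean_xy) / (mean_x ** 2 - mean_x2)
    b = mean_y - m * mean_x
    return m, b

xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]          # exactly y = 2x, so m = 2 and b = 0
m, b = best_fit_line(xs, ys)
```

Running it on perfectly linear data is a quick sanity check before comparing against a spreadsheet's trendline.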

- K-Nearest Neighbors (KNN) (python code : Link)

Euclidean distance : d(p,q) = ( Σ_{i} (p_{i} − q_{i})^{2} )^{0.5}

here, for a new data point we calculate the Euclidean distances to all labeled points and take the 'K' nearest neighbours. Like an election, the group holding the most of those nearest neighbours wins (i.e. determines which group our new data point belongs to).

For comparison, we plot the centers of mass and the group boundaries to see how the KNN results change with the number 'k'.
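The "election" described above can be sketched in a few lines of plain Python; the toy training set and k = 3 below are assumptions for illustration.

```python
# Minimal KNN classifier: compute Euclidean distances from the new point to
# every labeled point, take the k nearest, and let the majority label win.

from collections import Counter
from math import sqrt

def knn_predict(train, new_point, k=3):
    """train: list of (features, label) pairs; returns the majority label."""
    dists = sorted(
        (sqrt(sum((p - q) ** 2 for p, q in zip(feats, new_point))), label)
        for feats, label in train
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

train = [([1, 1], 'a'), ([1, 2], 'a'), ([2, 1], 'a'),
         ([6, 6], 'b'), ([7, 7], 'b'), ([6, 7], 'b')]
print(knn_predict(train, [2, 2]))   # the three nearest neighbours are all 'a'
```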

- Support Vector Machine (SVM) (python code: Link)

loss function : hinge loss (runs fast while maximizing the margin)

if sign(predicted value) = sign(actual value) -> cost : 0

w : normal vector to the hyperplane

x : set of points (samples and features)

bias : hyperplane intercept

offset : the SVM's distance from the decision boundary (the margins)

margin : reinforcement range of values ([−1, 1])

α : learning rate

λ : regularization parameter (balances margin maximization and loss)

Iteration : repetition of the process

To get the gradient update, we take the partial derivatives with respect to each weight:

df/dw_{j} = { 0 if y_{i}(w^{T}·x_{i}+b) ≥ 1 ; −y_{i}·x_{ij} if y_{i}(w^{T}·x_{i}+b) < 1 }

Minimizing λ||w||^{2} gives w = w − α·(2λw); when a point violates the margin, the active hinge term contributes the gradient −y_{i}·x_{i}

misclassification gradient update : w += α·(y_{i}·x_{i} − 2λw)

otherwise gradient update : w −= α·(2λw)
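The two update rules above can be sketched with NumPy. In this sketch the bias b is folded into w by appending a constant-1 feature; the toy data, learning rate α, and λ are assumptions for illustration.

```python
# Gradient-descent SVM with hinge loss: for each sample, apply the
# misclassification update when the margin is violated (y_i * (w·x_i) < 1),
# otherwise only the regularization shrink.

import numpy as np

def train_svm(X, y, alpha=0.01, lam=0.01, iters=1000):
    """Train a linear SVM; bias folded into w via a constant-1 feature."""
    Xb = np.c_[X, np.ones(len(X))]          # append bias feature
    w = np.zeros(Xb.shape[1])
    for _ in range(iters):
        for xi, yi in zip(Xb, y):
            if yi * np.dot(w, xi) < 1:      # margin violated: hinge active
                w += alpha * (yi * xi - 2 * lam * w)
            else:                           # correct side: regularize only
                w -= alpha * (2 * lam * w)
    return w

X = np.array([[1.0, 1.0], [2.0, 1.0], [7.0, 8.0], [8.0, 7.0]])
y = np.array([-1, -1, 1, 1])
w = train_svm(X, y)
preds = np.sign(np.c_[X, np.ones(len(X))] @ w)
```

On this linearly separable toy set, the learned sign predictions match the labels; comparing against sklearn's `SVC(kernel='linear')` is a good follow-up check.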

- K-Means (python code : Link)

The algorithm partitions all observations into 'k' clusters by nearest-mean assignment (i.e. all we need to specify is the number of clusters). First we tag each point with one of the k centers, then we minimize the distances to the centers and recompute them, repeating until the centers are stable enough for the accuracy we need. The initial 'k' centers are simply the first 'k' points.

For prediction, the Frobenius norm is used to find the distances from each center.
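The loop described above (assign to the nearest center, recompute the means, repeat until stable) can be sketched with NumPy; the 1-D toy data and k = 2 are assumptions for illustration.

```python
# Minimal k-means: initial centers are the first k points, as described above.

import numpy as np

def kmeans(X, k, iters=100):
    """Return final centers and the cluster label of each point."""
    centers = X[:k].astype(float)           # initial centers: first k points
    for _ in range(iters):
        # Euclidean distance of every point to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)           # tag each point with a center
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):
            break                           # centers stopped moving
        centers = new_centers
    return centers, labels

X = np.array([[1.0], [1.5], [1.2], [8.0], [8.5], [9.0]])
centers, labels = kmeans(X, 2)
```

Note this sketch assumes no cluster ever becomes empty; a production version would re-seed empty clusters.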

- Mean-Shift (python code : Link)

1st : every data point looks at the others (i.e. every data point could be a cluster center).

2nd : the mean of all points whose distance is lower than the radius becomes the new cluster center.

Finally, by repeating these two steps, we get the optimum centers.
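The two steps above can be sketched with NumPy: every point starts as a candidate center, each center is repeatedly replaced by the mean of the points within `radius` of it, and converged duplicates are merged. The 1-D toy data and the radius value are assumptions for illustration.

```python
# Minimal mean-shift: shift every candidate center to the mean of its
# in-radius neighbourhood until nothing moves, then merge duplicate centers.

import numpy as np

def mean_shift(X, radius=2.0, iters=100):
    """Return the unique converged cluster centers."""
    centers = X.astype(float)               # every point is a candidate center
    for _ in range(iters):
        new_centers = np.array([
            X[np.linalg.norm(X - c, axis=1) <= radius].mean(axis=0)
            for c in centers
        ])
        if np.allclose(new_centers, centers):
            break                           # all centers have converged
        centers = new_centers
    return np.unique(np.round(centers, 6), axis=0)   # merge duplicates

X = np.array([[1.0], [1.5], [1.2], [8.0], [8.5], [9.0]])
centers = mean_shift(X)
```

Unlike k-means, the number of clusters falls out of the radius choice rather than being specified up front.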

- Recurrent Neural Networks (RNNs)

soon…

  - Multiple Timescales Recurrent Neural Network (MTRNN)
  - Long Short-Term Memory (LSTM)

- Convolutional Neural Network (CNN)

soon…