Welcome to the machine learning tutorial with scikit-learn and TensorFlow. If you've completed a data analysis tutorial with tools like NumPy, Pandas, and Matplotlib, then you're ready to take the next step and dive into machine learning.
In this tutorial, we'll explore two popular machine learning libraries: scikit-learn and TensorFlow. Scikit-learn is a powerful library for building traditional machine learning models, while TensorFlow is a popular library for building deep learning models.
Scikit-Learn is a popular Python library for machine learning. It provides simple and efficient tools for data mining and data analysis, as well as a consistent interface for a variety of machine learning algorithms.
Scikit-Learn is built on top of NumPy, SciPy, and matplotlib, which are other popular Python libraries for scientific computing and data visualization.
In this section, we will explore some of the key concepts and features of Scikit-Learn, including supervised and unsupervised learning, regression and classification, clustering, and model evaluation and selection.
Supervised learning is a type of machine learning where the algorithm is trained on a labeled dataset, meaning that the input data has corresponding output values. The goal of supervised learning is to learn a mapping from inputs to outputs that generalizes well to new, unseen data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Load iris dataset
iris = load_iris()
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2, random_state=42)
# Create a decision tree classifier with max_depth=2
clf = DecisionTreeClassifier(max_depth=2)
# Train the classifier on the training data
clf.fit(X_train, y_train)
# Predict the target values for the testing data
y_pred = clf.predict(X_test)
# Compute the accuracy of the classifier
acc = accuracy_score(y_test, y_pred)
# Print the accuracy
print("Accuracy:", acc)
This code loads the iris dataset, splits it into training and testing sets, creates a decision tree classifier, trains the classifier on the training data, predicts the target values for the testing data, and computes the accuracy of the classifier. Finally, it prints the accuracy.
Regression is a type of supervised learning where the goal is to predict a continuous output variable, such as the price of a house or the temperature of a city. Scikit-Learn provides a variety of linear and non-linear regression algorithms, as well as tools for preprocessing data and tuning hyperparameters.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
# Load diabetes dataset
diabetes = load_diabetes()
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.2, random_state=42)
# Create a linear regression model
reg = LinearRegression()
# Train the model on the training data
reg.fit(X_train, y_train)
# Predict the target values for the testing data
y_pred = reg.predict(X_test)
# Compute the mean squared error and R^2 score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
# Print the mean squared error and R^2 score
print("Mean Squared Error:", mse)
print("R^2 Score:", r2)
This code loads the diabetes dataset, splits it into training and testing sets, creates a linear regression model, trains the model on the training data, predicts the target values for the testing data, and computes the mean squared error and R^2 score. Finally, it prints both metrics.
Classification is another type of supervised learning where the goal is to predict a categorical output variable, such as whether an email is spam or not, or which species a plant belongs to. Scikit-Learn provides a variety of classification algorithms, including logistic regression, decision trees, and support vector machines (SVMs).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
# Load breast cancer dataset
breast_cancer = load_breast_cancer()
# Split dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(breast_cancer.data, breast_cancer.target, test_size=0.2, random_state=42)
# Create a logistic regression model
clf = LogisticRegression(max_iter=10000)  # raise max_iter so the solver converges on the unscaled features
# Train the model on the training data
clf.fit(X_train, y_train)
# Predict the target values for the testing data
y_pred = clf.predict(X_test)
# Compute the accuracy, precision, recall, and F1 score
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
# Print the accuracy, precision, recall, and F1 score
print("Accuracy:", accuracy)
print("Precision:", precision)
print("Recall:", recall)
print("F1 Score:", f1)
This code loads the breast cancer dataset, splits it into training and testing sets, creates a logistic regression model, trains the model on the training data, predicts the target values for the testing data, and computes the accuracy, precision, recall, and F1 score. Finally, it prints these metrics.
Unsupervised learning is a type of machine learning where the data is not labeled or classified. The goal of unsupervised learning is to find patterns or groups in the data without prior knowledge or guidance from the user.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Generate a sample dataset with 3 clusters
X, y = make_blobs(n_samples=300, centers=3, cluster_std=0.5, random_state=0)
# Initialize KMeans with 3 clusters
kmeans = KMeans(n_clusters=3, random_state=0)
# Fit KMeans to the dataset
kmeans.fit(X)
# Get cluster labels for each data point
labels = kmeans.predict(X)
# Plot the dataset with color-coded clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, s=50, cmap='viridis')
plt.show()
This example generates a sample dataset with three clusters using the make_blobs function, initializes the KMeans algorithm with three clusters, fits it to the dataset, and then assigns cluster labels to each data point using the predict method. Finally, it visualizes the data points with color-coded clusters using matplotlib.
Clustering
Clustering is an unsupervised learning technique that involves grouping together similar data points based on their features. It can be used to identify patterns or structures within data, and is commonly used in fields such as marketing, biology, and social science.
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
# Load the iris dataset
iris = load_iris()
X = iris.data
# Initialize KMeans with 3 clusters
kmeans = KMeans(n_clusters=3, random_state=0)
# Fit KMeans to the dataset
kmeans.fit(X)
# Get cluster labels for each data point
labels = kmeans.predict(X)
# Plot the dataset with color-coded clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, s=50, cmap='viridis')
plt.show()
This example loads the iris dataset, initializes the KMeans algorithm with three clusters, fits it to the dataset, and then assigns cluster labels to each data point using the predict method. Finally, it visualizes the data points with color-coded clusters using matplotlib.
Dimensionality Reduction
Dimensionality reduction is another unsupervised learning technique that involves reducing the number of features in a dataset while preserving as much of the original information as possible. This is useful when dealing with high-dimensional data that may be difficult to visualize or analyze. Examples of dimensionality reduction algorithms include Principal Component Analysis (PCA) and t-Distributed Stochastic Neighbor Embedding (t-SNE).
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
# Load the digits dataset
digits = load_digits()
X = digits.data
y = digits.target
# Reduce dimensionality with PCA
pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)
# Plot the reduced dataset
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, edgecolor='none', alpha=0.8, cmap='viridis')
plt.xlabel('component 1')
plt.ylabel('component 2')
plt.colorbar()
plt.show()
This example loads the digits dataset, reduces the dimensionality of the dataset from 64 dimensions to 2 dimensions using PCA, and then visualizes the reduced dataset with points color-coded by their true class using matplotlib.
Once we have trained a model on our data, we need to evaluate its performance and select the best model for our problem. In this section, we will explore some of the techniques for model evaluation and selection in Scikit-Learn.
The holdout method involves splitting our data into a training set and a test set. We use the training set to fit the model and the test set to evaluate its performance. This method is quick and easy to implement, but it may not provide an accurate estimate of the model's performance on new, unseen data. Here is an example using linear regression (Iris is really a classification dataset, but its integer class labels serve as a simple numeric target for demonstrating the holdout split).
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
# Load the Iris dataset
iris = load_iris()
# Split the dataset into training and testing sets using the holdout method
X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)
# Train a linear regression model on the training set
model = LinearRegression()
model.fit(X_train, y_train)
# Evaluate the model on the testing set
score = model.score(X_test, y_test)
print("Model accuracy: {:.2f}%".format( score * 100))
In this example, we load the Iris dataset using the load_iris() function from scikit-learn. We then split the dataset into a training set and a testing set using the holdout method with train_test_split(). We train a linear regression model on the training set using LinearRegression() and fit(). Finally, we evaluate the model on the testing set using score(), which for a regression model returns the R^2 score, and print it.
Here is an example of polynomial regression.
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
import numpy as np
# Load the California Housing dataset
housing = fetch_california_housing()
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target, test_size=0.2, random_state=42)
# Create a polynomial features object with degree 2
poly_features = PolynomialFeatures(degree=2, include_bias=False)
# Transform the training features to include polynomial terms up to degree 2
X_train_poly = poly_features.fit_transform(X_train)
# Fit a linear regression model on the transformed features
lin_reg = LinearRegression()
lin_reg.fit(X_train_poly, y_train)
# Transform the testing features to include polynomial terms up to degree 2
X_test_poly = poly_features.transform(X_test)
# Make predictions on the testing set
y_pred = lin_reg.predict(X_test_poly)
# Calculate the mean squared error and R-squared score
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean squared error: {:.2f}".format( mse))
print("R-squared score: {:.2f}".format(r2))
This code uses the California Housing dataset from Scikit-Learn and splits the data into training and testing sets. It creates a PolynomialFeatures object with degree 2, transforms the training features to include polynomial terms up to degree 2, and fits a linear regression model on the transformed features. It then applies the same transformation to the testing features, makes predictions on the testing set, and calculates the mean squared error and R-squared score.
Here's an example of Decision Tree Regression using the diabetes dataset.
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
# Load the diabetes dataset
diabetes = load_diabetes()
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(diabetes.data, diabetes.target, test_size=0.2, random_state=42)
# Create a decision tree regressor with max depth of 3
regressor = DecisionTreeRegressor(max_depth=3)
# Train the model on the training data
regressor.fit(X_train, y_train)
# Make predictions on the test data
y_pred = regressor.predict(X_test)
# Compute the mean squared error of the predictions
mse = mean_squared_error(y_test, y_pred)
print("Mean squared error:", mse)
In this example, we load the diabetes dataset and split it into training and testing sets. We then create a decision tree regressor with a max depth of 3 and train it on the training data. We use the model to make predictions on the test data and compute the mean squared error of the predictions.
Here's an example of Random Forest Regression using the California Housing dataset.
import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
# Load the California Housing dataset
data = fetch_california_housing(as_frame=True)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2, random_state=42)
# Create a Random Forest Regression model
model = RandomForestRegressor(n_estimators=100, random_state=42)
# Train the model on the training set
model.fit(X_train, y_train)
# Make predictions on the testing set
y_pred = model.predict(X_test)
# Evaluate the model using mean squared error
mse = mean_squared_error(y_test, y_pred)
print("Mean squared error: {:.2f}".format(mse))
In this example, we first load the California Housing dataset using fetch_california_housing from Scikit-Learn. We then split the data into training and testing sets using train_test_split and create a Random Forest Regression model using RandomForestRegressor. We train the model on the training set using fit and make predictions on the testing set using predict. Finally, we evaluate the model using mean_squared_error from Scikit-Learn's metrics module.
Here is an example of K-Nearest Neighbors Regression using the California Housing dataset.
from sklearn.datasets import fetch_california_housing
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
# Load the dataset
data = fetch_california_housing(as_frame=True)
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)
# Create and train the K-Nearest Neighbors Regression model
knn_reg = KNeighborsRegressor(n_neighbors=5)
knn_reg.fit(X_train, y_train)
# Make predictions on the test set
y_pred = knn_reg.predict(X_test)
# Evaluate the performance of the model
score = knn_reg.score(X_test, y_test)
print("R^2 score: {:.2f}".format(score))
This code loads the California Housing dataset as a pandas DataFrame using the as_frame=True argument of fetch_california_housing. The data is then split into training and testing sets using train_test_split(). A K-Nearest Neighbors Regression model is created and trained on the training set using KNeighborsRegressor() and fit(). Predictions are made on the test set using predict(), and the performance of the model is evaluated using the R^2 score, which is calculated with score().
Cross-validation is a technique that involves splitting our data into k folds, training the model on k-1 folds, and testing it on the remaining fold. We repeat this process k times, with each fold used once as the test set. This technique can provide a more accurate estimate of the model's performance and is especially useful when we have limited data.
The most common type of cross-validation is k-fold cross-validation, where the dataset is split into k equally-sized folds. The model is trained on k-1 of these folds and evaluated on the remaining fold. This process is repeated k times, with each fold serving as the test set exactly once. The results of each evaluation are then averaged to produce a final performance metric. Here is an example
from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# Load the California housing dataset
X, y = fetch_california_housing(return_X_y=True)
# Create a linear regression model
model = LinearRegression()
# Evaluate the model using k-fold cross-validation
scores = cross_val_score(model, X, y, cv=5) # Use 5 folds
print("Mean score:", f"Mean score: {scores.mean():.2f}")
print("Standard deviation:", f"Standard deviation: {scores.std():.2f}")
In this example, we first load the California housing dataset using the fetch_california_housing function. We then create a LinearRegression model and evaluate its performance using 5-fold cross-validation, as specified by the cv parameter. Finally, we print the mean and standard deviation of the scores to get an idea of the model's overall performance.
Here is another example of cross-validation using the diabetes dataset
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
# Load the diabetes dataset
diabetes = load_diabetes()
# Use linear regression model
lr = LinearRegression()
# Perform 10-fold cross-validation
scores = cross_val_score(lr, diabetes.data, diabetes.target, cv=10)
print("Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() * 2))
In this example, we first load the diabetes dataset and create a LinearRegression model. We then perform 10-fold cross-validation using the cross_val_score function, which takes in the model, input data, target data, and the number of folds (cv=10). Finally, we print the mean and standard deviation of the R^2 scores to evaluate the performance of the model.
Hyperparameter tuning is the process of selecting the best values for the hyperparameters of a machine learning model. This is typically done by evaluating the performance of the model on a validation set, and then adjusting the hyperparameters accordingly. Hyperparameter tuning is often done using a combination of manual search, grid search, and other optimization techniques such as Bayesian optimization.
Grid search is a technique for hyperparameter tuning that involves defining a grid of hyperparameter values and exhaustively searching over the grid to find the best combination of hyperparameters for the model. We train and evaluate the model on each combination of hyperparameters in the grid and select the one that performs the best. Grid search can be time-consuming and computationally expensive, but it can help us find the optimal hyperparameters for our model.
Here's an example of Grid Search.
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
# Load the iris dataset
iris = load_iris()
# Define the hyperparameters to tune
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10], 'kernel': ['linear', 'rbf', 'poly']}
# Create a support vector classifier object
svc = SVC()
# Create a GridSearchCV object
grid = GridSearchCV(svc, param_grid, cv=5)
# Fit the GridSearchCV object to the data
grid.fit(iris.data, iris.target)
print("Best hyperparameters: ", grid.best_params_)
print("Accuracy score: ", grid.best_score_)
In this example, we load the Iris dataset and define a set of hyperparameters to tune using Grid Search. We create a Support Vector Classifier (SVC) object and a GridSearchCV object. We then fit the GridSearchCV object to the data and print the best hyperparameters and the corresponding mean cross-validation accuracy. The cv parameter is set to 5, which means that 5-fold cross-validation is performed during the Grid Search.
This line param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10], 'kernel': ['linear', 'rbf', 'poly']}
defines the hyperparameters and their possible values for the Support Vector Machine (SVM) model. Specifically, C, gamma, and kernel are three hyperparameters that can significantly impact the performance of the SVM model.
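Grid search tries every combination exhaustively, which can be slow for large search spaces. A lighter-weight alternative is scikit-learn's RandomizedSearchCV, which samples a fixed number of combinations instead; here is a minimal sketch on the same Iris data and search space:
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC
# Load the iris dataset
iris = load_iris()
# Same search space as above
param_dist = {'C': [0.1, 1, 10, 100], 'gamma': [0.01, 0.1, 1, 10], 'kernel': ['linear', 'rbf', 'poly']}
# Sample only 10 of the possible combinations, with 5-fold cross-validation
search = RandomizedSearchCV(SVC(), param_dist, n_iter=10, cv=5, random_state=42)
search.fit(iris.data, iris.target)
print("Best hyperparameters: ", search.best_params_)
print("Best cross-validation accuracy: ", search.best_score_)
With n_iter=10, only 10 of the 48 possible combinations are evaluated, trading some thoroughness for speed.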
Neural networks are a powerful class of machine learning models that are designed to mimic the way the human brain works. At a high level, a neural network consists of layers of interconnected nodes or "neurons" that process input data and produce output predictions. Let's walk through the key building blocks, starting with the input layer.
Here's an example of how to create an input layer:
import tensorflow as tf
# Define the shape of the input data
input_shape = (32, 32, 3)  # (height, width, channels); this corresponds to a 32x32 RGB image
# Create an input layer with the specified shape
inputs = tf.keras.layers.Input(shape=input_shape)
# Print the input shape of the layer
print(inputs.shape)
Note that the input layer is typically the first layer in a neural network model and is used to specify the shape of the input data that will be fed into the model.
Here's an example of how to create hidden layers:
# The "input" value is from the previous code example
# Create a hidden layer with 64 units and a ReLU activation function
hidden1 = tf.keras.layers.Dense(64, activation='relu')(inputs)
# Create another hidden layer with 32 units and a ReLU activation function
hidden2 = tf.keras.layers.Dense(32, activation='relu')(hidden1)
# Print the output shape of the last hidden layer
print(hidden2.shape)
Note that the number of hidden layers and the number of units in each hidden layer can vary depending on the complexity of the problem being solved. The choice of activation function can also have a significant impact on the performance of the model.
Here's an example of how to create an output layer:
# This is a continuation of the previous example
# Create an output layer with 10 units and a softmax activation function
outputs = tf.keras.layers.Dense(10, activation='softmax')(hidden2)
# Print the output shape of the output layer
print(outputs.shape)
Note that the number of units in the output layer should match the number of classes in the problem being solved. The choice of activation function for the output layer also depends on the problem being solved, e.g., sigmoid activation function for binary classification problems.
ReLU (Rectified Linear Unit) is one of the most commonly used activation functions in deep learning. It sets all negative values to zero and leaves positive values unchanged. Mathematically, it is defined as f(x) = max(0, x). The ReLU activation function is computationally efficient and helps to prevent the vanishing gradient problem.
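To see the function in isolation before using it inside a network, this quick sketch applies TensorFlow's built-in tf.nn.relu to a few sample values:
import tensorflow as tf
x = tf.constant([-2.0, -0.5, 0.0, 1.5, 3.0])
# Negative values are clipped to zero; positive values pass through unchanged
print(tf.nn.relu(x).numpy())  # [0.  0.  0.  1.5 3. ]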
Note: this one is a heavy computation and it may not work in our editor.
import tensorflow as tf
from tensorflow.keras.datasets import mnist
# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Reshape the input data to 1D vectors
x_train = x_train.reshape(-1, 784)
x_test = x_test.reshape(-1, 784)
# Normalize the input data
x_train = x_train / 255.0
x_test = x_test / 255.0
# Convert the labels to one-hot encoding
y_train = tf.keras.utils.to_categorical(y_train, num_classes=10)
y_test = tf.keras.utils.to_categorical(y_test, num_classes=10)
# Define the input layer
input_layer = tf.keras.layers.Input(shape=(784,))
# Define a hidden layer with ReLU activation function
hidden_layer = tf.keras.layers.Dense(units=128, activation='relu')(input_layer)
# Define the output layer with softmax activation function
output_layer = tf.keras.layers.Dense(units=10, activation='softmax')(hidden_layer)
# Create a model
model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
# Compile the model with categorical crossentropy loss and Adam optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Train the model on the training data
model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
Sigmoid is commonly used in binary classification problems where the output should be a probability between 0 and 1. The sigmoid function has an S-shaped curve and is defined as f(x) = 1 / (1 + exp(-x)).
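As a quick illustration of this squashing behavior, the following sketch applies tf.math.sigmoid to a few sample values:
import tensorflow as tf
x = tf.constant([-4.0, 0.0, 4.0])
# Outputs are squeezed into the (0, 1) range
print(tf.math.sigmoid(x).numpy())  # approximately [0.018 0.5 0.982]
The model below uses sigmoid activations in both the hidden and output layers: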
import tensorflow as tf
# Define the input layer
input_layer = tf.keras.layers.Input(shape=(10,))
# Define a hidden layer with sigmoid activation function
hidden_layer = tf.keras.layers.Dense(units=5, activation='sigmoid')(input_layer)
# Define the output layer with sigmoid activation function
output_layer = tf.keras.layers.Dense(units=1, activation='sigmoid')(hidden_layer)
# Create a model
model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
# Compile the model with binary crossentropy loss and Adam optimizer
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Generate some random data for training and testing
import numpy as np
x_train = np.random.randn(100, 10)
y_train = np.random.randint(0, 2, size=(100, 1))
x_test = np.random.randn(20, 10)
y_test = np.random.randint(0, 2, size=(20, 1))
# Train the model on the training data
model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
Tanh is a common activation function used in neural networks. It is similar to the sigmoid function, but it outputs values between -1 and 1. The tanh function is defined as f(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x)).
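Again, a small sketch makes the output range visible:
import tensorflow as tf
x = tf.constant([-2.0, 0.0, 2.0])
# Outputs fall in the (-1, 1) range and are centered at zero
print(tf.math.tanh(x).numpy())  # approximately [-0.964 0. 0.964]
The model below uses tanh in its hidden layer: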
import tensorflow as tf
# Define the input layer
input_layer = tf.keras.layers.Input(shape=(10,))
# Define a hidden layer with tanh activation function
hidden_layer = tf.keras.layers.Dense(units=5, activation='tanh')(input_layer)
# Define the output layer with sigmoid activation function
output_layer = tf.keras.layers.Dense(units=1, activation='sigmoid')(hidden_layer)
# Create a model
model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
# Compile the model with binary crossentropy loss and Adam optimizer
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
# Generate some random data for training and testing
import numpy as np
x_train = np.random.randn(100, 10)
y_train = np.random.randint(0, 2, size=(100, 1))
x_test = np.random.randn(20, 10)
y_test = np.random.randint(0, 2, size=(20, 1))
# Train the model on the training data
model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
Softmax is used in multiclass classification problems, where the output should be a probability distribution over multiple classes. The softmax function outputs values between 0 and 1 and ensures that the sum of the outputs is 1. The softmax function is defined as f(x) = exp(x) / sum(exp(x)).
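This short sketch shows softmax turning a vector of raw scores (logits) into a probability distribution:
import tensorflow as tf
logits = tf.constant([[2.0, 1.0, 0.1]])
probs = tf.nn.softmax(logits)
# Each row sums to 1 and can be read as class probabilities
print(probs.numpy())                 # approximately [[0.659 0.242 0.099]]
print(tf.reduce_sum(probs).numpy())  # approximately 1.0
The model below uses softmax in its output layer: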
import tensorflow as tf
# Define the input layer
input_layer = tf.keras.layers.Input(shape=(10,))
# Define a hidden layer with ReLU activation function
hidden_layer = tf.keras.layers.Dense(units=5, activation='relu')(input_layer)
# Define the output layer with softmax activation function
output_layer = tf.keras.layers.Dense(units=3, activation='softmax')(hidden_layer)
# Create a model
model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
# Compile the model with categorical crossentropy loss and Adam optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Generate some random data for training and testing
import numpy as np
x_train = np.random.randn(100, 10)
y_train = np.random.randint(0, 3, size=(100, 1))
y_train = tf.keras.utils.to_categorical(y_train, num_classes=3)
x_test = np.random.randn(20, 10)
y_test = np.random.randint(0, 3, size=(20, 1))
y_test = tf.keras.utils.to_categorical(y_test, num_classes=3)
# Train the model on the training data
model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
Mean Squared Error (MSE) is used in regression problems where the output is a continuous variable. It is calculated as the average of the squared differences between the predicted and actual values: MSE = (1/n) * ∑(y_pred - y_actual)^2
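As a quick sanity check, the formula can be evaluated by hand and compared with Keras' built-in mean_squared_error, as in this small sketch:
import numpy as np
import tensorflow as tf
y_actual = np.array([1.0, 2.0, 3.0])
y_pred = np.array([1.5, 1.5, 3.5])
# Average of squared differences: (0.25 + 0.25 + 0.25) / 3 = 0.25
print(np.mean((y_pred - y_actual) ** 2))
print(tf.keras.losses.mean_squared_error(y_actual, y_pred).numpy())
The following model is trained with this loss: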
import tensorflow as tf
# Define the input layer
input_layer = tf.keras.layers.Input(shape=(10,))
# Define a hidden layer with ReLU activation function
hidden_layer = tf.keras.layers.Dense(units=5, activation='relu')(input_layer)
# Define the output layer with linear activation function
output_layer = tf.keras.layers.Dense(units=1, activation='linear')(hidden_layer)
# Create a model
model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
# Compile the model with MSE loss and Adam optimizer
model.compile(loss='mean_squared_error', optimizer='adam')
# Generate some random data for training and testing
import numpy as np
x_train = np.random.randn(100, 10)
y_train = np.random.randn(100, 1)
x_test = np.random.randn(20, 10)
y_test = np.random.randn(20, 1)
# Train the model on the training data
model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
Binary crossentropy is used in binary classification problems where the output is a binary variable. It measures the difference between the predicted probabilities and the actual binary labels: Binary Crossentropy = -(y_actual * log(y_pred) + (1 - y_actual) * log(1 - y_pred))
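This small sketch evaluates the formula directly and compares it with Keras' built-in binary_crossentropy:
import numpy as np
import tensorflow as tf
y_actual = np.array([1.0, 0.0])
y_pred = np.array([0.9, 0.2])
# Apply the formula element-wise, then average over the samples
manual = -np.mean(y_actual * np.log(y_pred) + (1 - y_actual) * np.log(1 - y_pred))
print(manual)  # approximately 0.164
print(tf.keras.losses.binary_crossentropy(y_actual, y_pred).numpy())
The model below is trained with this loss: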
import tensorflow as tf
# Define the input layer
input_layer = tf.keras.layers.Input(shape=(10,))
# Define a hidden layer with ReLU activation function
hidden_layer = tf.keras.layers.Dense(units=5, activation='relu')(input_layer)
# Define the output layer with sigmoid activation function
output_layer = tf.keras.layers.Dense(units=1, activation='sigmoid')(hidden_layer)
# Create a model
model = tf.keras.models.Model(inputs=input_layer, outputs=output_layer)
# Compile the model with Binary Crossentropy loss and Adam optimizer
model.compile(loss='binary_crossentropy', optimizer='adam')
# Generate some random data for training and testing
import numpy as np
x_train = np.random.randn(100, 10)
y_train = np.random.randint(2, size=(100, 1))
x_test = np.random.randn(20, 10)
y_test = np.random.randint(2, size=(20, 1))
# Train the model on the training data
model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
Categorical crossentropy is used in multi-class classification problems where the output is a categorical variable. It measures the difference between the predicted probability distribution and the actual one-hot labels: Categorical Crossentropy = -∑(y_actual * log(y_pred))
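Worked on a single prediction, the formula reduces to the negative log of the probability the model assigns to the true class, as this small sketch shows:
import numpy as np
import tensorflow as tf
y_actual = np.array([[0.0, 1.0, 0.0]])  # one-hot label for class 1
y_pred = np.array([[0.1, 0.8, 0.1]])    # predicted class probabilities
manual = -np.sum(y_actual * np.log(y_pred))
print(manual)  # approximately 0.223, i.e. -log(0.8)
print(tf.keras.losses.categorical_crossentropy(y_actual, y_pred).numpy())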
Note: this one is a heavy computation and it may not work in our editor.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Define the input shape
input_shape = (784,)
# Create a sequential model
model = Sequential()
# Add a dense layer with 64 units and relu activation
model.add(Dense(units=64, activation='relu', input_shape=input_shape))
# Add a dense output layer with softmax activation
model.add(Dense(units=10, activation='softmax'))
# Compile the model with categorical crossentropy loss and adam optimizer
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Load the data
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Reshape the data to have the expected input shape
x_train = x_train.reshape(x_train.shape[0], 784)
x_test = x_test.reshape(x_test.shape[0], 784)
# Normalize the data
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
# Convert the labels to categorical one-hot encoding
num_classes = 10
y_train = np.eye(num_classes)[y_train]
y_test = np.eye(num_classes)[y_test]
# Train the model
model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
Sparse categorical crossentropy is similar to categorical crossentropy, but it is used when the categorical labels are represented as integers rather than one-hot vectors.
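The only practical difference is the label format; integer labels and their one-hot equivalents yield the same loss value, as this small sketch shows:
import tensorflow as tf
y_int = tf.constant([2])                   # integer label
y_onehot = tf.constant([[0.0, 0.0, 1.0]])  # the same label, one-hot encoded
y_pred = tf.constant([[0.1, 0.2, 0.7]])
print(tf.keras.losses.sparse_categorical_crossentropy(y_int, y_pred).numpy())
print(tf.keras.losses.categorical_crossentropy(y_onehot, y_pred).numpy())
# Both print approximately 0.357, i.e. -log(0.7)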
Note: this one is a heavy computation and it may not work in our editor.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
# Define the input shape
input_shape = (784,)
# Create a sequential model
model = Sequential()
# Add a dense layer with 64 units and relu activation
model.add(Dense(units=64, activation='relu', input_shape=input_shape))
# Add a dense output layer with softmax activation
model.add(Dense(units=10, activation='softmax'))
# Compile the model with sparse categorical crossentropy loss and adam optimizer
model.compile(loss='sparse_categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
# Load the data
from tensorflow.keras.datasets import mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
# Reshape the data to have the expected input shape
x_train = x_train.reshape(x_train.shape[0], 784)
x_test = x_test.reshape(x_test.shape[0], 784)
# Normalize the data
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
# Keep the labels as integers; sparse categorical crossentropy does not need one-hot encoding
# Train the model
model.fit(x_train, y_train, batch_size=32, epochs=10, validation_data=(x_test, y_test))
Kullback-Leibler (KL) divergence is used to measure the difference between two probability distributions. It is commonly used in generative models such as variational autoencoders.
The KL divergence between two probability distributions p and q is defined as: KL(p || q) = ∑x p(x) log(p(x) / q(x)) where x is the set of possible outcomes, p(x) is the probability of outcome x under distribution p, and q(x) is the probability of outcome x under distribution q.
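As a quick sketch, the definition can be evaluated by hand and compared with Keras' built-in kl_divergence:
import numpy as np
import tensorflow as tf
p = np.array([[0.4, 0.6]])
q = np.array([[0.5, 0.5]])
# Direct evaluation of KL(p || q)
print(np.sum(p * np.log(p / q)))  # approximately 0.020
print(tf.keras.losses.kl_divergence(p, q).numpy())
The model below trains with this loss; note that the targets are normalized so each row forms a valid probability distribution: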
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.losses import KLDivergence
# generate some dummy data for demonstration purposes
import numpy as np
x_train = np.random.rand(1000, 10)
y_train = np.random.rand(1000, 5)
# Normalize each row so the targets form valid probability distributions for the KL loss
y_train = y_train / y_train.sum(axis=1, keepdims=True)
# define the model architecture
model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(10,)))
model.add(Dense(8, activation='relu'))
model.add(Dense(5, activation='softmax'))
# compile the model with KL divergence loss
model.compile(optimizer='adam', loss=KLDivergence())
# train the model
model.fit(x_train, y_train, epochs=10)
Backpropagation is the algorithm used to adjust the weights of the nodes in the network during training. It uses the gradients of the loss function with respect to the weights to update the weights in the direction that minimizes the loss.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
import numpy as np
# generate some dummy data for training
X_train = np.random.rand(100, 10)
y_train = np.random.randint(2, size=(100, 1))
# create the model
model = Sequential()
model.add(Dense(5, input_shape=(10,), activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
# configure the model for training
sgd = SGD(learning_rate=0.1)
model.compile(optimizer=sgd, loss='binary_crossentropy')
# train the model for 10 epochs
model.fit(X_train, y_train, epochs=10)
# evaluate the model on some test data
X_test = np.random.rand(10, 10)
y_test = np.random.randint(2, size=(10, 1))
loss = model.evaluate(X_test, y_test)
print("Test loss:", loss)
Let's now put these pieces together and build a small network that learns the XOR function, step by step. Import the necessary libraries:
import tensorflow as tf
import numpy as np
Define the input data and the expected output data:
# Input data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=np.float32)
# Expected output data
y = np.array([[0], [1], [1], [0]], dtype=np.float32)
Define the layers of the neural network using the tf.keras.layers API:
# Define the layers
input_layer = tf.keras.layers.Input(shape=(2,))
hidden_layer = tf.keras.layers.Dense(4, activation='sigmoid')(input_layer)
output_layer = tf.keras.layers.Dense(1, activation='sigmoid')(hidden_layer)
Define the model and compile it using an optimizer and a loss function:
# Define the model
model = tf.keras.Model(inputs=input_layer, outputs=output_layer)
# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
Train the model using the input and output data:
# Train the model
model.fit(X, y, epochs=1000, verbose=0)
Make predictions using the trained model:
# Make predictions using the trained model
predictions = model.predict(X)
# Print the predictions
print(predictions)
Note: this is a typical way to create a neural network, but not the only one; the example is meant purely for illustration.
Convolutional Neural Networks (CNNs) are a type of neural network that are commonly used in computer vision tasks such as image recognition and classification. They are made up of several layers, including convolutional layers, pooling layers, and fully connected layers.
The convolutional layers perform convolutions on the input image to extract features such as edges and corners. The pooling layers downsample the feature maps to reduce the dimensionality of the data. Finally, the fully connected layers use the features extracted by the convolutional and pooling layers to make predictions.
CNNs are particularly effective at image recognition tasks because they are able to learn spatial hierarchies of features from the input images. This allows them to recognize complex patterns and objects in images with a high degree of accuracy.
Here's an example code block that shows how to create a simple CNN using the TensorFlow library in Python:
Note: this one is a heavy computation and it may not work in our editor.
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
# Load the CIFAR-10 dataset
(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
# Normalize pixel values to be between 0 and 1
train_images, test_images = train_images / 255.0, test_images / 255.0
# Define the CNN architecture
model = models.Sequential([
    layers.Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    layers.MaxPooling2D(pool_size=(2, 2)),
    layers.Conv2D(64, kernel_size=(3, 3), activation='relu'),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10)
])
# Compile the model
model.compile(optimizer='adam', loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True), metrics=['accuracy'])
# Train the model
history = model.fit(train_images, train_labels, epochs=10, validation_data=(test_images, test_labels))
Recurrent Neural Networks (RNNs) are a type of neural network that can process sequential data such as time series, natural language, and audio. RNNs are unique in that they have a "memory" that allows them to keep track of previous inputs, making them ideal for tasks that require understanding of context and temporal dependencies.
Here's an example of how to implement a basic RNN:
import tensorflow as tf
import numpy as np
# Define the model architecture
model = tf.keras.Sequential([
    tf.keras.layers.SimpleRNN(units=32, input_shape=(None, 1)),
    tf.keras.layers.Dense(units=1)
])
# Compile the model
model.compile(loss='mean_squared_error', optimizer='sgd')
# Train the model
xs = np.random.normal(size=(100, 10, 1))
ys = np.random.normal(size=(100, 1))
model.fit(xs, ys, epochs=10)
This code creates a basic RNN model using TensorFlow. It uses a SimpleRNN layer with 32 units, followed by a dense layer with 1 unit. The model is compiled with a mean squared error loss function and a stochastic gradient descent optimizer, and is then trained on randomly generated data using the fit method.
In TensorFlow, you can build advanced neural networks by incorporating different types of layers and techniques. Here are some concepts to get you started:
Here's an example of building a neural network with dropout, batch normalization, and recurrent layers:
Note: this one is a heavy computation and it may not work in our editor.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, BatchNormalization, Dropout, Dense
# Dummy sequence data, assumed here for illustration: 100 samples, 10 timesteps, 8 features
x_train = np.random.randn(100, 10, 8)
y_train = np.random.randint(2, size=(100, 1))
model = Sequential()
# Recurrent layer that reads each input sequence
model.add(LSTM(64, input_shape=(10, 8)))
# Batch normalization stabilizes activations between layers
model.add(BatchNormalization())
# Dropout randomly zeroes activations during training to reduce overfitting
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(x_train, y_train, validation_split=0.2, epochs=10, batch_size=32)
When training neural networks, it's important to optimize the model's parameters to achieve better performance. Here are some common techniques for optimizing neural networks:
Here's an example of building a neural network with regularization, a custom optimizer, and early stopping:
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout
from tensorflow.keras import regularizers
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.callbacks import EarlyStopping
# Dummy binary-classification data, assumed here for illustration
X_train = np.random.randn(500, 20)
y_train = np.random.randint(2, size=(500, 1))
model = Sequential()
# L2 weight regularization penalizes large weights to reduce overfitting
model.add(Dense(64, input_dim=X_train.shape[1], activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(0.5))
model.add(Dense(32, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(0.5))
model.add(Dense(1, activation='sigmoid'))
# Custom optimizer: SGD with momentum and Nesterov acceleration
# (recent Keras versions use learning_rate; the old lr and decay arguments are deprecated)
sgd = SGD(learning_rate=0.01, momentum=0.9, nesterov=True)
model.compile(loss='binary_crossentropy', optimizer=sgd, metrics=['accuracy'])
# Stop training when the validation loss stops improving
early_stopping = EarlyStopping(monitor='val_loss', patience=5, mode='min')
model.fit(X_train, y_train, validation_split=0.2, epochs=50, batch_size=32, callbacks=[early_stopping])
We have covered a lot of ground in this tutorial series about TensorFlow.
We started with the basics of TensorFlow and gradually progressed towards building more advanced neural networks. We learned about different types of layers, activation functions, loss functions, optimizers, and how to handle overfitting. We also learned how to use TensorFlow to build image classification models and how to fine-tune pre-trained models. While we have covered a lot, there is still so much more to learn about TensorFlow.
This is just the beginning of the journey towards mastering this powerful tool for building and optimizing neural networks. By continuing to learn and explore, you can build even more sophisticated models that can solve complex problems in a wide range of fields, from computer vision to natural language processing. So keep learning, experimenting, and building with TensorFlow, and see where it can take you!