Fake News Classification Machine Learning Model

by | Jan 20, 2023 | Coding, Machine Learning

Introduction

In this tutorial we build a fake news classification model: a machine learning model trained to label news articles or statements as genuine or fake. The model is trained on a dataset of labeled real and fake news and can then classify new, unseen articles or statements automatically. There are several ways to build such a model; common approaches combine natural language processing with classical machine learning or deep learning. The model's performance is evaluated by measuring accuracy, precision, recall, and other metrics on a separate test dataset.

The model classifies news as fake or real according to the words that appear in the text. We use a bag-of-words representation built with scikit-learn's CountVectorizer, together with NLTK's Porter stemmer to normalize the words.
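As a rough illustration of how stemming and a bag-of-words representation fit together, the sketch below cleans two made-up titles and counts the resulting stems. The `clean` helper and the tiny `STOP_WORDS` set are hypothetical stand-ins chosen for this example; the full code further down uses NLTK's English stop word list instead.

```python
import re

from nltk.stem.porter import PorterStemmer
from sklearn.feature_extraction.text import CountVectorizer

# Tiny illustrative stop word list (an assumption for this sketch only).
STOP_WORDS = {"the", "is", "a", "about", "are"}
stemmer = PorterStemmer()


def clean(text):
    """Keep letters only, lowercase, drop stop words, and stem each word."""
    words = re.sub("[^a-zA-Z]", " ", text).lower().split()
    return " ".join(stemmer.stem(w) for w in words if w not in STOP_WORDS)


titles = ["Officials are running new tests!", "The claims about the tests"]
cleaned = [clean(t) for t in titles]

# Each column of the matrix is one stem; each row is one title.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(cleaned)

print(cleaned)
print(sorted(vectorizer.vocabulary_))
print(counts.toarray())
```

Because both titles contain a form of "tests", they share a column after stemming, which is the point of normalizing words before counting them.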

 

Objectives

The main objectives of creating a fake news classification machine learning model are:

  • Identifying fake news by automatically classifying news articles or statements as genuine or fake based on patterns and characteristics learned from a labeled training dataset.
  • Improving the accuracy and performance of the classifier by experimenting with different machine learning algorithms, feature engineering techniques, and hyperparameter tuning.
  • Making the classifier more robust by handling different types of text and handling issues such as imbalanced classes, missing data, and noisy data.
  • Incorporating additional information sources, such as social media data, to improve the classifier’s ability to identify fake news.
  • Improving the interpretability of the classifier by providing insights into the features and decision rules used by the model.
  • Continuously monitoring the classifier’s performance and updating it as new fake news detection techniques and data become available.
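One way to approach the tuning objective is a cross-validated grid search. The sketch below is only an illustration under stated assumptions: the eight headlines and their labels are invented, the label encoding (1 = fake) is assumed, and the grid of `alpha` values is arbitrary. It scores candidates with F1 rather than accuracy, which tends to be more informative when classes are imbalanced.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Invented headlines; assumed encoding: 1 = fake, 0 = real.
headlines = [
    "aliens secretly run the senate",
    "miracle pill cures every disease overnight",
    "celebrity clone spotted on the moon",
    "secret pill turns water into fuel",
    "senate passes annual budget bill",
    "city council approves new bus routes",
    "central bank keeps interest rates unchanged",
    "local school opens new science lab",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

pipeline = Pipeline([
    ("bow", CountVectorizer(ngram_range=(1, 2))),  # unigrams and bigrams
    ("nb", MultinomialNB()),
])

# Candidate smoothing values for Naive Bayes (an arbitrary illustrative grid).
param_grid = {"nb__alpha": [0.1, 0.5, 1.0]}

# Stratified folds keep the fake/real ratio the same in every split.
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=cv)
search.fit(headlines, labels)

print("best alpha:", search.best_params_["nb__alpha"])
print("mean cross-validated F1:", search.best_score_)
```

Putting the vectorizer inside the pipeline means the vocabulary is rebuilt on each training fold, so no information from the held-out fold leaks into the features.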

Requirements

Building a fake news classification model in Python typically requires the following:

  • A labeled dataset of real and fake news articles or statements to train and evaluate the classifier.
  • The Python programming language and a set of commonly used libraries such as NumPy, pandas, scikit-learn, and NLTK for data pre-processing, feature extraction, and machine learning.
  • A machine learning algorithm for building the classifier, such as logistic regression, Naive Bayes, decision trees, random forests, or deep learning models.
  • Knowledge of natural language processing techniques for text processing, such as tokenization, stemming, and lemmatization.
  • A development environment for coding and testing the classifier, such as Jupyter Notebook or PyCharm. We have used Jupyter Notebook.
  • Access to a computing platform with sufficient resources to train and test the classifier, such as a local machine or a cloud-based platform.
  • Familiarity with machine learning and data analysis fundamentals, such as feature engineering, model evaluation, and hyperparameter tuning.
  • Experience with visualization libraries such as Matplotlib and Seaborn to visualize the results and insights of the model.
  • Familiarity with web scraping and web crawling to extract data from different sources.

Source Code

import pandas as pd
df=pd.read_csv('fake-news/train.csv')
df.head()
## Get the Independent Features
X=df.drop('label',axis=1)
X.head()
## Get the Dependent features
y=df['label']
y.head()
df.shape
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer, HashingVectorizer
df=df.dropna()
df.head(10)
messages=df.copy()
messages.reset_index(inplace=True)
messages.head(10)
messages['title'][6]
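import re
import nltk
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer

nltk.download('stopwords')  # one-time download of the NLTK stop word list
ps = PorterStemmer()
stop_words = set(stopwords.words('english'))  # build the set once, not once per word
corpus = []
for i in range(0, len(messages)):
    # Keep letters only, lowercase the title, and split it into words
    review = re.sub('[^a-zA-Z]', ' ', messages['title'][i])
    review = review.lower()
    review = review.split()
    # Drop stop words and reduce each remaining word to its stem
    review = [ps.stem(word) for word in review if word not in stop_words]
    review = ' '.join(review)
    corpus.append(review)
corpus[3]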
## Applying Countvectorizer
# Creating the Bag of Words model
from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(max_features=5000,ngram_range=(1,3))
X = cv.fit_transform(corpus).toarray()
X.shape
y=messages['label']
## Divide the dataset into Train and Test
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)
# On scikit-learn versions before 1.0, use cv.get_feature_names() instead
cv.get_feature_names_out()[:20]
cv.get_params()
count_df = pd.DataFrame(X_train, columns=cv.get_feature_names_out())
count_df.head()
import itertools
import numpy as np
import matplotlib.pyplot as plt

def plot_confusion_matrix(cm, classes,
                          normalize=False,
                          title='Confusion matrix',
                          cmap=plt.cm.Blues):
    """
    See full source and example:
    http://scikit-learn.org/stable/auto_examples/model_selection/plot_confusion_matrix.html

    This function prints and plots the confusion matrix.
    Normalization can be applied by setting `normalize=True`.
    """
    plt.imshow(cm, interpolation='nearest', cmap=cmap)
    plt.title(title)
    plt.colorbar()
    tick_marks = np.arange(len(classes))
    plt.xticks(tick_marks, classes, rotation=45)
    plt.yticks(tick_marks, classes)
    if normalize:
        # Divide each row by its total so cells show per-class fractions
        cm = cm.astype('float') / cm.sum(axis=1)[:, np.newaxis]
        print("Normalized confusion matrix")
    else:
        print('Confusion matrix, without normalization')
    # Write each cell value, in white on dark cells and black on light ones
    thresh = cm.max() / 2.
    for i, j in itertools.product(range(cm.shape[0]), range(cm.shape[1])):
        plt.text(j, i, cm[i, j],
                 horizontalalignment="center",
                 color="white" if cm[i, j] > thresh else "black")
    plt.tight_layout()
    plt.ylabel('True label')
    plt.xlabel('Predicted label')
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics
import numpy as np
import itertools

# Train a multinomial Naive Bayes classifier and evaluate it on the test split
classifier = MultinomialNB()
classifier.fit(X_train, y_train)
pred = classifier.predict(X_test)
score = metrics.accuracy_score(y_test, pred)
print("accuracy: %0.3f" % score)
cm = metrics.confusion_matrix(y_test, pred)
# Class names follow the sorted label values (0 first, then 1);
# check this order against the label encoding of your dataset
plot_confusion_matrix(cm, classes=['FAKE', 'REAL'])
score
y_train.shape
from sklearn.linear_model import PassiveAggressiveClassifier
# max_iter replaces the old n_iter argument, which current scikit-learn no longer accepts
linear_clf = PassiveAggressiveClassifier(max_iter=50)
linear_clf.fit(X_train, y_train)
pred = linear_clf.predict(X_test)
score = metrics.accuracy_score(y_test, pred)
print("accuracy: %0.3f" % score)
cm = metrics.confusion_matrix(y_test, pred)
plot_confusion_matrix(cm, classes=['FAKE Data', 'REAL Data'])
classifier = MultinomialNB(alpha=0.1)
previous_score = 0
# Try several smoothing values and keep the classifier with the best test accuracy
for alpha in np.arange(0, 1, 0.1):
    sub_classifier = MultinomialNB(alpha=alpha)
    sub_classifier.fit(X_train, y_train)
    y_pred = sub_classifier.predict(X_test)
    score = metrics.accuracy_score(y_test, y_pred)
    if score > previous_score:
        classifier = sub_classifier
        previous_score = score  # remember the best score seen so far
    print("Alpha: {}, Score : {}".format(alpha, score))
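## Get feature names
feature_names = cv.get_feature_names_out()
# Log-probability of each n-gram given class 1. This stands in for coef_[0],
# which newer scikit-learn releases no longer provide on MultinomialNB.
word_scores = classifier.feature_log_prob_[1]
word_scores
### Most real
sorted(zip(word_scores, feature_names), reverse=True)[:20]
### Most fake
sorted(zip(word_scores, feature_names))[:20]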
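Once a classifier like the one above is trained, it can be applied to headlines it has never seen. The following is a minimal, self-contained sketch of that step, not the article's exact workflow: the training titles, their labels, and the two new headlines are invented, the encoding (1 = fake, 0 = real) is an assumption, and stemming is left out to keep the example short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training titles; assumed encoding: 1 = fake, 0 = real.
train_titles = [
    "aliens secretly control the government",
    "miracle cure doctors do not want you to know",
    "shocking secret about the moon landing",
    "government approves new budget",
    "doctors publish study on heart health",
    "city opens new public library",
]
train_labels = [1, 1, 1, 0, 0, 0]

# The pipeline applies the same vectorizer to training and new text,
# so new headlines are mapped onto the vocabulary learned during fit().
model = make_pipeline(
    CountVectorizer(ngram_range=(1, 3)),
    MultinomialNB(alpha=0.1),
)
model.fit(train_titles, train_labels)

new_titles = [
    "shocking miracle cure found by aliens",
    "city approves new library budget",
]
predictions = model.predict(new_titles)
probabilities = model.predict_proba(new_titles)  # columns follow model.classes_

for title, label, proba in zip(new_titles, predictions, probabilities):
    print(title, "->", label, proba.round(3))
```

Wrapping the vectorizer and classifier in one pipeline avoids a common mistake: calling `fit_transform` on new text, which would build a different vocabulary from the one the classifier was trained on.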

Output

[Image: output of the fake news classification machine learning model]

Explanation of the Code

1. First, we imported the libraries needed to build the model and loaded the training data with pandas.

2. Then, we cleaned the dataset by dropping rows with null values using the dropna() function.

3. We inspected the dataset with the head() function.

4. We replaced every non-letter character in the titles with a space, lowercased the text, and removed English stop words so that the analysis becomes easier.

5. Using the Natural Language Toolkit (NLTK), we stemmed each word with the Porter stemmer, built a bag-of-words representation with CountVectorizer, and trained the classifiers with the fit() function.

6. Algorithms used: CountVectorizer for feature extraction, with MultinomialNB and PassiveAggressiveClassifier as classifiers. HashingVectorizer and TfidfVectorizer are imported but not used in the code.
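The three vectorizers named above turn text into features in different ways. As a small sketch on an invented, already-stemmed corpus (the corpus and the `n_features=16` setting are assumptions made for illustration), the snippet below shows what each one produces.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    HashingVectorizer,
    TfidfVectorizer,
)

# Invented, already-stemmed titles used only for illustration.
corpus = ["senat vote new budget", "alien endors budget", "new alien vote"]

# CountVectorizer: raw counts, one column per word in the learned vocabulary.
counts = CountVectorizer().fit_transform(corpus)

# TfidfVectorizer: same columns, but counts are reweighted so that words
# appearing in many documents contribute less; rows are L2-normalized.
tfidf = TfidfVectorizer().fit_transform(corpus)

# HashingVectorizer: no stored vocabulary; words are hashed into a fixed
# number of columns, so it needs no fit step but columns cannot be named.
hashed = HashingVectorizer(n_features=16).transform(corpus)

print("count matrix shape:", counts.shape)
print("tf-idf matrix shape:", tfidf.shape)
print("hashed matrix shape:", hashed.shape)
print("tf-idf row norms:", np.linalg.norm(tfidf.toarray(), axis=1))
```

CountVectorizer and TfidfVectorizer keep a vocabulary, which is what makes it possible to inspect the most informative words afterwards; HashingVectorizer trades that away for a fixed memory footprint.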

Conclusion

We have built a machine learning model that predicts whether a news item is fake or real from the words in its title, which can help flag likely disinformation.

 
