Hands-On Project: Sentiment Analysis of Movie Reviews

Project

Sentiment analysis involves determining whether a given piece of text conveys a positive, negative, or neutral sentiment. In this project, we will classify movie reviews as either positive or negative.

Objective

To build a machine learning model that can classify the sentiment of movie reviews using natural language processing (NLP) techniques.

Dataset

The IMDB Movie Reviews Dataset is a widely used dataset for sentiment analysis, containing 50,000 labelled reviews split evenly between positive and negative. You can download it from Kaggle as IMDB Dataset.csv.

Dataset Details:

• review: The review text.

• sentiment: The label (positive or negative).

Tools and Libraries Required

bash



pip install pandas numpy scikit-learn nltk matplotlib seaborn tensorflow

Step-by-Step Guide

1. Import Libraries

python


import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import classification_report, confusion_matrix
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import re

2. Load the Dataset

python

# Load dataset
data = pd.read_csv('IMDB Dataset.csv')
# Display dataset information
print(data.head())
print(data['sentiment'].value_counts())

3. Data Cleaning

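Convert text to lowercase.
Strip the HTML tags (such as <br />) that appear in the raw reviews.
Remove punctuation, special characters, and numbers.
Tokenise the text and remove stop-words.
Optionally apply stemming or lemmatization (the function below skips this step).

python

nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # needed by word_tokenize in newer NLTK releases

# Build the stop-word set once; calling stopwords.words() for every word is very slow
STOP_WORDS = set(stopwords.words('english'))

# Data cleaning function
def clean_text(text):
    # Convert to lowercase
    text = text.lower()
    # Replace HTML tags such as <br /> with a space
    text = re.sub(r'<[^>]+>', ' ', text)
    # Remove punctuation, special characters, and numbers
    text = re.sub(r'[^a-z\s]', '', text)
    # Tokenize and remove stopwords
    words = word_tokenize(text)
    words = [word for word in words if word not in STOP_WORDS]
    return ' '.join(words)

data['cleaned_review'] = data['review'].apply(clean_text)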
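
The cleaning steps mention stemming, which the function above does not perform. A minimal sketch of how it could be added with NLTK's PorterStemmer follows; the helper name stem_text is made up for illustration, and splitting on whitespace assumes the text has already been cleaned.

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def stem_text(cleaned_text):
    # Stem each whitespace-separated token of an already cleaned review
    return ' '.join(stemmer.stem(word) for word in cleaned_text.split())

stemmed_example = stem_text("loved acting")
print(stemmed_example)
```

If you adopt this, apply stem_text to the cleaned_review column before vectorizing, and apply it to any new review before prediction so that training and inference see the same tokens.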

4. Split the Dataset

python


X = data['cleaned_review']
y = data['sentiment'].apply(lambda x: 1 if x == 'positive' else 0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

5. Feature Extraction with CountVectorizer

Convert text into numerical features using bag-of-words:

python

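vectorizer = CountVectorizer(max_features=5000)
# Keep the sparse matrices: MultinomialNB accepts them directly, and calling
# .toarray() on tens of thousands of reviews can need more than a gigabyte of memory
X_train_vectors = vectorizer.fit_transform(X_train)
X_test_vectors = vectorizer.transform(X_test)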

6. Train the Model

We use a Multinomial Naive Bayes classifier for simplicity:

python

model = MultinomialNB()
model.fit(X_train_vectors, y_train)
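
Keeping the vectorizer and the classifier as separate objects means every new review has to be passed through both by hand. One possible alternative, sketched here on made-up toy data, is to bundle them with scikit-learn's make_pipeline; the variable names and the training sentences are illustrative assumptions, not part of the project above.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data for illustration only (1 = positive, 0 = negative)
toy_reviews = [
    "great movie loved it",
    "fantastic acting great plot",
    "boring movie hated it",
    "terrible acting awful plot",
]
toy_labels = [1, 1, 0, 0]

# The pipeline vectorizes the text and then fits the classifier in one call
sentiment_pipeline = make_pipeline(CountVectorizer(max_features=5000), MultinomialNB())
sentiment_pipeline.fit(toy_reviews, toy_labels)

# Raw strings go straight in; the pipeline applies the fitted vectorizer first
toy_predictions = sentiment_pipeline.predict(["great movie", "boring movie"]).tolist()
print(toy_predictions)
```

In the project itself, the same pipeline could be fitted on X_train and y_train, which removes the risk of forgetting to vectorize a review before calling predict.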

7. Evaluate the Model

python

# Predictions
y_pred = model.predict(X_test_vectors)
# Metrics
print(classification_report(y_test, y_pred))
conf_matrix = confusion_matrix(y_test, y_pred)
# Confusion Matrix Heatmap
sns.heatmap(conf_matrix, annot=True, fmt='d', cmap='Blues', xticklabels=['Negative', 'Positive'], yticklabels=['Negative', 'Positive'])
plt.xlabel('Predicted')
plt.ylabel('Actual')
plt.title('Confusion Matrix')
plt.show()
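
The figures in the classification report can be derived directly from the confusion matrix. The sketch below shows the arithmetic for the positive class, assuming scikit-learn's layout (rows are actual labels, columns are predicted labels, negative class first); the counts are made up and are not results from the IMDB model.

```python
def binary_metrics(conf_matrix):
    """Derive positive-class metrics from a 2x2 confusion matrix.

    Assumes rows are actual labels and columns are predicted labels,
    with class 0 (negative) listed first.
    """
    (tn, fp), (fn, tp) = conf_matrix
    precision = tp / (tp + fp)   # of the predicted positives, how many were right
    recall = tp / (tp + fn)      # of the actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Made-up counts for illustration only
example_matrix = [[40, 10], [5, 45]]
metrics = binary_metrics(example_matrix)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The helper does not guard against division by zero, which would occur if the model never predicted the positive class.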

8. Test with Custom Reviews

python

# Example review
new_review = ["The movie was fantastic! Amazing plot and great acting."]
new_review_cleaned = [clean_text(review) for review in new_review]
new_review_vectorized = vectorizer.transform(new_review_cleaned)  # sparse input works with predict
# Prediction
prediction = model.predict(new_review_vectorized)
print("Sentiment:", "Positive" if prediction[0] == 1 else "Negative")

Enhancements with Deep Learning

For improved performance, you can use deep learning models like LSTMs or Transformers.
Here’s a brief outline:

Preprocess text using TensorFlow/Keras Tokenizer.
Convert text to padded integer sequences and map each token to an embedding vector (e.g., pretrained Word2Vec or GloVe).
Build an LSTM or GRU model in TensorFlow/Keras.
Train the model on the cleaned dataset.
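
The outline above could look roughly like the following sketch. It makes several assumptions: it uses the Keras TextVectorization layer in place of the older Tokenizer class, it learns the embeddings from scratch instead of loading Word2Vec or GloVe, and the toy sentences, layer sizes, and epoch count are placeholders, not tuned values.

```python
import numpy as np
import tensorflow as tf

# Toy data for illustration only; in the project these would be the
# cleaned reviews and their 0/1 labels
texts = [
    "great movie loved it",
    "fantastic acting great plot",
    "boring movie hated it",
    "terrible acting awful plot",
]
labels = np.array([1, 1, 0, 0])

max_tokens = 1000      # vocabulary size cap (assumed value)
sequence_length = 10   # pad or truncate each review to this length (assumed value)

# Map each word to an integer id and pad every review to a fixed length
vectorize_layer = tf.keras.layers.TextVectorization(
    max_tokens=max_tokens, output_sequence_length=sequence_length
)
vectorize_layer.adapt(tf.constant(texts))
sequences = vectorize_layer(tf.constant(texts)).numpy()

# Embedding -> LSTM -> sigmoid output for binary sentiment
lstm_model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=max_tokens, output_dim=16),
    tf.keras.layers.LSTM(16),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
lstm_model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
lstm_model.fit(sequences, labels, epochs=2, verbose=0)

# One probability of the positive class per review
probabilities = lstm_model.predict(sequences, verbose=0)
print(probabilities.shape)
```

With only four sentences the model learns nothing meaningful; the point is the shape of the workflow. On the real dataset you would adapt the vectorization layer on the training split only and evaluate on the held-out test split.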

Use Cases of Sentiment Analysis

Customer Feedback Analysis: Identify satisfaction levels from product reviews.
Brand Monitoring: Analyse public sentiment about a brand on social media.
Movie Recommendations: Predict user preferences based on past reviews.

Conclusion

Sentiment analysis of movie reviews is an excellent hands-on project to understand natural language processing and machine learning. By following these steps, you’ll not only build a working model but also gain practical knowledge of preprocessing, vectorization, and classification techniques.
