Car Price Prediction Machine Learning Model

by | Jan 18, 2023 | Coding, Machine Learning



A car price prediction machine learning model is a type of algorithm that uses historical data on car sales and features to predict the price of a car. The model is trained on a dataset of car information such as make, model, year, mileage, condition, and the corresponding sale price. Once trained, the model can be used to predict the sale price of a car based on its features. Common techniques for creating a car price prediction model include linear regression, decision trees, and random forests.
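To make the three techniques named above concrete, here is a minimal sketch that fits each of them with scikit-learn and compares their R² scores on held-out data. The data is synthetic: the features (age and mileage), their ranges, and the pricing rule are assumptions invented for illustration, not values from the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
# Synthetic features: car age in years and mileage in thousands of km (made-up ranges)
age = rng.uniform(0, 15, size=300)
mileage = rng.uniform(0, 200, size=300)
X = np.column_stack([age, mileage])
# Made-up pricing rule plus noise, used only so the models have something to learn
price = 20.0 - 0.8 * age - 0.03 * mileage + rng.normal(0, 0.5, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, price, test_size=0.3, random_state=0
)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
    "random forest": RandomForestRegressor(n_estimators=50, random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on the held-out rows
    print(name, round(scores[name], 3))
```

Because the synthetic price is close to linear in the features, linear regression should do well here. On real car data the ranking of the three models may differ.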



The objectives behind building this car price prediction machine learning model are:

  • To predict the price of a car so that buyers can choose one that suits their needs and fits their price range.
  • To help businesses in the automobile industry set prices that meet customers' expectations and grow their businesses accordingly.
  • To use historical data to train the model and make predictions on new, unseen data.


To build a car price prediction model using Python, you will need the following:

  • A dataset of car information: This dataset should include features such as make, model, year, mileage, and condition, as well as the corresponding sale price.
  • Python programming language: You must install Python on your computer to build the model.
  • Required Libraries: You will need to install libraries such as numpy, pandas, scikit-learn, matplotlib, and seaborn. These libraries are used in Python for data manipulation, visualization, and machine learning.
  • Jupyter Notebook/IDE: You will need a development environment such as Jupyter Notebook or an IDE to write and run the code for the model.
  • A basic understanding of machine learning concepts and Python programming.
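Before running the source code, it can help to confirm that the required libraries are installed. The short check below is a sketch that uses only the standard library. Note that scikit-learn is imported under the name sklearn.

```python
import importlib.util

# Import names of the libraries used in this tutorial
required = ["numpy", "pandas", "sklearn", "matplotlib", "seaborn"]

# find_spec returns None when a top-level package cannot be found
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing libraries:", ", ".join(missing))
else:
    print("All required libraries are installed.")
```

Any name that is reported as missing can be installed with pip (for sklearn, the package to install is scikit-learn).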

Source Code

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline
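df = pd.read_csv('car data.csv')
# check missing values and drop any rows that contain them
print(df.isnull().sum())
final_dataset = df.dropna().copy()
# replace the model year with the car's age, using 2022 as the reference year
final_dataset['Current Year'] = 2022
final_dataset['no_year'] = final_dataset['Current Year'] - final_dataset['Year']
final_dataset = final_dataset.drop(['Current Year'], axis=1)
# one-hot encode any text columns so the models can use them (assumed step, not shown in the original listing)
final_dataset = pd.get_dummies(final_dataset, drop_first=True)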
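# get correlations of each feature in the dataset
corrmat = df.corr(numeric_only=True)
top_corr_features = corrmat.index
# plot heat map
plt.figure(figsize=(20, 20))
sns.heatmap(df[top_corr_features].corr(), annot=True, cmap='RdYlGn')
plt.show()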
X = final_dataset.iloc[:, 1:]  # independent features
y = final_dataset.iloc[:, 0]   # dependent feature (selling price, assumed to be the first column)
# feature importance
from sklearn.ensemble import ExtraTreesRegressor
model = ExtraTreesRegressor()
model.fit(X, y)
print(model.feature_importances_)  # a higher value means a more important feature
# plot graph to better visualize feature importances
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.sort_values().plot(kind='barh')
plt.show()
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
from sklearn.ensemble import RandomForestRegressor
# Number of trees in random forest
n_estimators = [int(x) for x in np.linspace(start=100, stop=1200, num=12)]
# Number of features to consider at every split
# (1.0 means all features; it replaces 'auto', which scikit-learn 1.3 removed)
max_features = [1.0, 'sqrt']
# Max number of levels in the tree
max_depth = [int(x) for x in np.linspace(5, 30, num = 6)]
# max_depth.append(None)
# Min number of samples that are required to split a node
min_samples_split = [2, 5, 10, 15, 100]
# Min number of samples that are required at each leaf node
min_samples_leaf = [1, 2, 5, 10]
from sklearn.model_selection import RandomizedSearchCV
#Randomized Search CV
# Create the random grid
random_grid = {'n_estimators': n_estimators,
'max_features': max_features,
'max_depth': max_depth,
'min_samples_split': min_samples_split,
'min_samples_leaf': min_samples_leaf}
# Use the random grid to search for the best hyperparameters
# First create the base model to tune
rf = RandomForestRegressor()
# Random search of parameters using 5-fold cross-validation,
# sampling 10 different combinations
rf_random = RandomizedSearchCV(estimator=rf, param_distributions=random_grid, scoring='neg_mean_squared_error', n_iter=10, cv=5, verbose=2, random_state=42, n_jobs=1)
rf_random.fit(X_train, y_train)
# predict prices for the held-out test set
predictions = rf_random.predict(X_test)
from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
import pickle
# open a file where the trained model will be stored and dump the model into it;
# the with block closes the file once the dump has finished
with open('random_forest_regression_model.pkl', 'wb') as file:
    pickle.dump(rf_random, file)
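
Once a model has been pickled, it can be loaded again later and used for predictions without retraining. The following is a minimal sketch of that round trip. It trains a small stand-in random forest on synthetic data and writes to a temporary file, so the data, the file name model.pkl, and the model settings are assumptions for illustration only.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small stand-in model trained on synthetic data (not the real car dataset)
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 2))
y = 3.0 * X[:, 0] + X[:, 1]
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Save the model to a temporary file
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
with open(path, "wb") as f:
    pickle.dump(model, f)

# Load it back and check that it gives the same predictions
with open(path, "rb") as f:
    restored = pickle.load(f)

same = np.allclose(model.predict(X[:5]), restored.predict(X[:5]))
print("Restored model matches the original:", same)
```

Only load pickle files from sources you trust, and ideally with the same scikit-learn version that created them, since unpickling can execute arbitrary code and saved models are not guaranteed to load across versions.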

Explanation of the Code

1. Initially, we imported the dataset and all the necessary libraries that were needed to build our model.

2. Then, we checked for the null values in our dataset, and if present, we removed them accordingly.

3. Next, we cleaned the dataset based on its features and dropped the columns that are not useful for building the model.

4. Then we split the data into training and test sets, trained a Random Forest Regressor, and used Randomized Search CV to select the best hyperparameters for the model.

5. We also created plots, such as the correlation heat map and the feature importance chart, to gain insight into the dataset.

6. Finally, once training was done, we predicted prices for the test set and evaluated them using MAE, MSE, and RMSE.
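
The steps above can be sketched end to end on a small synthetic table. Everything about the data below is an assumption made for illustration: the column names (Selling_Price, Year, Present_Price, Fuel_Type), the value ranges, and the pricing rule are invented, and the search grid is kept much smaller than the one in the source code so that it runs quickly.

```python
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in for the car dataset; column names and values are assumptions
rng = np.random.default_rng(0)
n = 200
cars = pd.DataFrame({
    "Selling_Price": np.zeros(n),
    "Year": rng.integers(2005, 2022, size=n),
    "Present_Price": rng.uniform(1.0, 20.0, size=n),
    "Fuel_Type": rng.choice(["Petrol", "Diesel"], size=n),
})
age = 2022 - cars["Year"]
cars["Selling_Price"] = (
    0.8 * cars["Present_Price"] - 0.3 * age + rng.normal(0, 0.5, size=n)
)
cars.loc[[3, 17], "Present_Price"] = np.nan  # inject two missing values

# Step 2: check for missing values and remove the affected rows
print(cars.isnull().sum())
cars = cars.dropna().copy()

# Step 3: derive the car's age, drop the raw year, encode the text column
cars["no_year"] = 2022 - cars["Year"]
cars = cars.drop(columns=["Year"])
cars = pd.get_dummies(cars, drop_first=True, dtype=float)

# Step 4: split the data, then tune a random forest with a small randomized search
X = cars.iloc[:, 1:]  # every column except the first
y = cars.iloc[:, 0]   # first column: selling price
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_distributions={
        "n_estimators": [20, 50],
        "max_depth": [3, 5, None],
        "min_samples_leaf": [1, 2, 5],
    },
    n_iter=4,
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=42,
)
search.fit(X_train, y_train)

# Step 6: predict on the held-out rows and report the errors
predictions = search.predict(X_test)
mae = metrics.mean_absolute_error(y_test, predictions)
rmse = np.sqrt(metrics.mean_squared_error(y_test, predictions))
print(search.best_params_)
print("MAE:", mae, "RMSE:", rmse)
```

MAE is the average absolute difference between predicted and actual prices, while RMSE penalizes large errors more heavily, so RMSE is never smaller than MAE on the same predictions.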




We have now built the car price prediction machine learning model. It predicts the price of a car from the features in our dataset, which can help individuals select the car best suited to their needs and budget. It can also help businesses grow and increase their revenue.

