Introduction of the Project
Here we have created a coding project for Employee Salary Prediction using Machine Learning. This machine learning model predicts the salary of the employee based on the number of years of work experience he/she had in the industry. We have our dataset with two columns which are years of experience and salary, and accordingly, we will train our model using a Supervised Machine Learning algorithm. And finally, we will be able to predict the salary of the employee on the basis of the number of years of work experience.
Objectives
- The purpose of building this model is to have a brief idea about the salary of an employee on the basis of how many years an employee has spent in his professional career.
- This machine learning model can help businesses to decide what range of salary must be provided to an individual trying to seek a job in the respective organization having relevant years of experience.
Requirements
As far as requirements while building this Employee Salary Prediction using Machine Learning model are concerned:
1. Python Libraries
- Pandas
- NumPy
- Matplotlib
2. Jupyter Notebook or Google Colab
[Note: All the necessary libraries can be installed by running the pip command directly on the command prompt]
Source Code
# Importing all the necessary libraries import pandas as pd import numpy as np from matplotlib import pyplot as plt %matplotlib inline # pandas provide us function to load our data set to our model #(using read_csv() function) df = pd.read_csv("Salary_Data.csv") df # Printing the data set (YearsExperience and Salary) # Lets have a look over the shape of dataset using shape function! df.shape # Tells us the number of rows and number of colomns in our data set # head and tail functions let us know about the first five observations and last five #observations in our data set respectively! df.head() # prints the first five observation of our data set! df.tail() # Prints the last five observations of our data set! # This process is to check the wether our dataset is filled with any empty # values and tells us through a boolean value to make # inference # As we can see that entire datset comes out to be with a false value on isna() # function which means that no null values are # present df.isna() # This step is to confirm that no null values are present in our data set # so we can move ahead with further operations! df.isna().sum() # matplotlib is the library used for plotting the graphs which we have already #imported above # This means that we need to plot YearsExperience on x-axis and Salary on y-axis df.plot(x = 'YearsExperience' , y = 'Salary' , style = 'o') # This title function gives the us the options to provide a specefic title to our # plot! plt.title('YearsExperience vs Salary') # Now its time to give the titles to x-axis and y-axis using xlabel() and ylabel() # functions respectively! # xlabel() gives the title to the x-axis plt.xlabel('Years Of Experience') # ylabel() gives the title to y-axis plt.ylabel('Salary') # Now we have given all the specifications to our graph , so lets have a look on # our plot using show() function! plt.show() # Now its time to make our model which will predict the resuts on the given data # So dividing our data into features(inputs) and labels(output) using iloc() function in python # This gives us the entire x attributes into an specefic array! x = df.iloc[:,:-1].values x # This gives us the entire y attributes into a specefic array! y = df.iloc[:,1].values # Lets train our model and test it accordingly! from sklearn.model_selection import train_test_split # We are training our data set with a test_size of 0.2 which means that we are only using 20% of our data to train our model! X_train , X_test , y_train , y_test = train_test_split(x , y , test_size = 0.2 , random_state = 0) # Importing linear regression algorithm which comes under sklearn from sklearn.linear_model import LinearRegression regressor = LinearRegression() # We fitted our training variables in the LinearRegression Model using fit Function regressor.fit(X_train , y_train) # Linear Regression model called Successfully! # Training complete # Plotting the regression line line = regressor.coef_*x+regressor.intercept_# compare it with equation of straight line [y = mx + c]) # Plotting for the test data plt.scatter(x, y) plt.plot(x, line , color = 'red'); plt.show() # Testing our model print(X_test) # these are the attributes of Years Of Experience we have tested! # Making predictions on the values we have tested! y_pred = regressor.predict(X_test) # Predicted # Creating a data frame with Actual and Predicted Values! data_frame = pd.DataFrame({'Actual' : y_test , 'Predicted' : y_pred}) data_frame # Checking our Model finally at 2.5 years of Experience years_exp = [[10.0]] own_pred = regressor.predict(years_exp) print("Years of exp is {}" .format(years_exp)) print("Predicted salary is {}" .format(own_pred[0]))
Explanation of the Code
1. Initially, we loaded our dataset in our Jupyer Notebook using the read_csv function and looked into the first five and last five observations of our dataset using the head() and tail() functions, respectively.
2. The next step is to check if our dataset contains any missing values or null values; if they exist, we need to remove the null values from our dataset, and if not, we can move ahead with our dataset with further operations.
3. Next, we can use “matplotlib” to create visualizations.
4. After this, the linear regression algorithm is loaded by the “sklearn” library, and the model is in the training phase.
5. The best fit line is plotted on our graph, which represents the linear relationship between both variables.
Output
On running this machine learning source code to predict the salary of an employee successfully, you will get the following output.
Conclusion
Hence we successfully created a machine learning model to predict the salary of employees based on the number of years of experience in the industry. This Employee Salary Prediction Using Machine Learning model helps businesses to predict the salary of employees and can decide the pay range of the employee working in the company as well as for the employee who is willing to join the company based on the number of years of experience.

Cisco Ramon is an American software engineer who has experience in several popular and commercially successful programming languages and development tools. He has been writing content since last 5 years. He is a Senior Manager at Rude Labs Pvt. Ltd.
0 Comments