XML Document Parsing And Extraction In Python | Python Project

by | Mar 19, 2022 | Coding, Python

Introduction to the Project

Today, we will convert an XML file into a CSV file! Sounds interesting, right? Let us look step by step at how XML document parsing and extraction is implemented in Python.

XML (Extensible Markup Language) is a markup language and file format for storing, transmitting, and reconstructing arbitrary data.

Parsing is the process of analyzing a string of text and breaking it into components (for XML: elements, attributes, and text) that a program can process more easily.
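As a small illustration of what parsing produces, here is a sketch that uses ElementTree's fromstring() on a made-up XML string. SAMPLE is a hypothetical name and the data is not from the tutorial's file; the point is that the text becomes a tree of elements, each with a tag, text, and children.

```python
import xml.etree.ElementTree as ET

# A tiny, hypothetical XML document shaped like an RSS-style feed
# (channel -> item -> child elements); it is not the tutorial's sample file.
SAMPLE = """<rss>
  <channel>
    <item>
      <Domain>Python</Domain>
      <ProjectName>Calculator</ProjectName>
    </item>
  </channel>
</rss>"""

# fromstring() parses a string and returns the root Element directly
# (ET.parse() does the same for a file but returns an ElementTree).
root = ET.fromstring(SAMPLE)

# Each Element exposes its tag name, its text, and its children.
item = root.find("./channel/item")
for child in item:
    print(child.tag, "->", child.text)
```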

A CSV (comma-separated values) file is a plain-text file that stores a table: each line is one row, and the values in a row are separated by commas.
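A minimal sketch of that format using Python's csv module, assuming a small made-up table. An io.StringIO buffer stands in for a real file so the result can be printed. Note that csv.writer quotes a value that itself contains a comma and, by default, ends each row with \r\n.

```python
import csv
import io

# Rows of a small, made-up table; the first row is the header.
rows = [
    ["ProjectName", "Domain"],
    ["Calculator", "Python"],
    ["To-do list, v2", "JavaScript"],  # contains a comma, so it gets quoted
]

# io.StringIO stands in for a real file so the output can be inspected.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)

print(buffer.getvalue())
```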

 

Requirements

1. A Python IDE or VS Code, to run the code.

2. An XML file whose data will be parsed and extracted into a CSV file.

Steps For XML Document Parsing And Extraction In Python

Step 1: Download the XML file and update its path in the code to match where you saved it.

Step 2: Paste the code below into your IDE or code editor.

Source Code

# Import the required modules
import csv
import xml.etree.ElementTree as ET


# Function 1 - To parse the XML file
def parseXML(xmlfile):
    # Parse the XML document into an element tree
    tree = ET.parse(xmlfile)

    # Get the root element of this tree
    root = tree.getroot()

    # Empty list that will hold one dictionary per item
    content = []

    # findall() returns a list of all matching elements in document order
    for c in root.findall('./channel/item'):

        # An empty dictionary for this item
        project = {}

        # Iterate over the child elements of the item
        for child in c:

            # Special check for the namespaced media content element:
            # store its 'url' attribute rather than its text
            if child.tag == '{https://myprojectideas.com/}content':
                project['Media'] = child.attrib['url']
            else:
                # Keep the text as a str (encoding it to bytes would write b'...'
                # into the CSV); empty elements have text None, so fall back to ''
                # and strip surrounding whitespace
                project[child.tag] = (child.text or '').strip()

        # Append the dictionary to the content list
        content.append(project)

    # Return the content list
    return content


# Function 2 - To write the parsed XML content to a CSV file
def savetoCSV(content, filename):
    # Specify the fields (column names) for the CSV file
    fields = ['Domain', 'ProjectName', 'Description', 'WebsiteLink', 'YoutubeLink', 'Media']

    # Open the CSV file in write mode ('w'); newline='' stops the csv module
    # from producing blank lines between rows on Windows
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:

        # Create a CSV dictionary writer object
        writer = csv.DictWriter(csvfile, fieldnames=fields)

        # Write the header row
        writer.writeheader()

        # Write the data rows
        writer.writerows(content)


# Main function to call the sub functions
def main():
    # Parse the XML file
    content = parseXML('videos/SampleXML.xml')

    # Write the parsed content to a CSV file
    savetoCSV(content, 'videos/CSVFile.csv')


if __name__ == "__main__":
    # Call the main function
    main()
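The script above reads videos/SampleXML.xml from disk. To try the same approach without the sample file, here is a self-contained sketch that applies the same parsing logic to an in-memory XML string and writes the CSV to a string instead of a file. SAMPLE_XML, parse_items(), and to_csv_text() are hypothetical names, and the XML is a made-up document shaped the way the code expects (channel/item elements plus one content element in the https://myprojectideas.com/ namespace); the real sample file may differ.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample shaped like the structure the tutorial's code expects:
# <channel><item> elements whose children become CSV columns, plus one
# namespaced <content> element whose "url" attribute fills the Media column.
SAMPLE_XML = """<rss xmlns:media="https://myprojectideas.com/">
  <channel>
    <item>
      <Domain>Python</Domain>
      <ProjectName>XML to CSV</ProjectName>
      <Description>Convert an XML feed to CSV</Description>
      <WebsiteLink>https://example.com/xml-to-csv</WebsiteLink>
      <YoutubeLink>https://example.com/video</YoutubeLink>
      <media:content url="https://example.com/image.png" />
    </item>
  </channel>
</rss>"""

FIELDS = ['Domain', 'ProjectName', 'Description', 'WebsiteLink', 'YoutubeLink', 'Media']
MEDIA_TAG = '{https://myprojectideas.com/}content'


def parse_items(root):
    """Turn every channel/item element into a dict keyed by child tag."""
    content = []
    for item in root.findall('./channel/item'):
        project = {}
        for child in item:
            if child.tag == MEDIA_TAG:
                project['Media'] = child.attrib['url']
            else:
                project[child.tag] = (child.text or '').strip()
        content.append(project)
    return content


def to_csv_text(content):
    """Write the dicts with csv.DictWriter into a string instead of a file."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(content)
    return buffer.getvalue()


content = parse_items(ET.fromstring(SAMPLE_XML))
print(to_csv_text(content))
```

Swapping ET.parse() for ET.fromstring() and the file for io.StringIO is only a convenience for experimenting; the element handling is the same as in parseXML() and savetoCSV().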

Explanation Of The Code

We first import the csv module, for writing the CSV file, and the xml.etree.ElementTree module, for parsing the XML file.

1. The first function, parseXML(), performs the parsing.

2. ET.parse() parses the XML document into an element tree.

3. The tree's getroot() method returns the root element.

4. Next, we create an empty list to store the content extracted from the XML file.

5. root.findall('./channel/item') returns a list of all item elements inside channel, in document order, and the for loop iterates over them.

6. For each item, we create a dictionary that maps each child element's tag to its text (for the namespaced media content element, we store its url attribute under Media instead) and append the dictionary to the content list.

7. The second function, savetoCSV(), writes the parsed content to a CSV file.

8. Here, we specify the fields (column names) for the CSV file.

9. We open the CSV file in write mode ('w') with the open() function.

10. We create a csv.DictWriter object and write the header row and the data rows with its writeheader() and writerows() methods.

11. The main function calls both functions: parseXML() first, then savetoCSV().

12. Finally, the if __name__ == "__main__": check calls main() when the script is run directly.
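Steps 5 and 6 rely on two ElementTree details that are easy to miss: findall() takes a path expression, and namespaced tags are stored in {uri}localname form. The sketch below, built on a hypothetical XML snippet that binds a media prefix to the same URI the tutorial checks for, shows both, plus the namespaces mapping that find() and findall() accept as an alternative to spelling out the full URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML with a "media" prefix bound to the same URI the
# tutorial's code checks for.
XML = """<rss xmlns:media="https://myprojectideas.com/">
  <channel>
    <item>
      <ProjectName>Demo</ProjectName>
      <media:content url="https://example.com/a.png" />
    </item>
  </channel>
</rss>"""

root = ET.fromstring(XML)
item = root.find('./channel/item')

# ElementTree drops the "media:" prefix and stores the full namespace URI
# in "{uri}localname" form, which is why the tutorial compares child.tag
# against '{https://myprojectideas.com/}content'.
tags = [child.tag for child in item]
print(tags)  # ['ProjectName', '{https://myprojectideas.com/}content']

# find()/findall() accept a prefix-to-URI mapping, so the same element can
# be looked up with a readable prefix instead of the full URI.
ns = {'media': 'https://myprojectideas.com/'}
media = item.find('media:content', ns)
print(media.get('url'))  # https://example.com/a.png
```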

Output

The following input XML file was used for this project.

https://drive.google.com/drive/folders/14-WQrsU6o-mxb_kambI6cuxT5IROzG1S?usp=sharing

The XML file is parsed and its data is extracted into CSV format. When you open the resulting CSV file, it should look like the image below.

[Image: XML Document Parsing And Extraction In Python]
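To check the output without opening it in an editor, here is a minimal sketch, assuming the CSV has a header row: summarize_csv() (a hypothetical helper) reads the file with csv.DictReader and reports the column names and the number of data rows. The demo writes a throwaway file; pass videos/CSVFile.csv instead to inspect the real output.

```python
import csv
import os
import tempfile


def summarize_csv(path):
    """Return the header row and the number of data rows in a CSV file."""
    with open(path, newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        rows = list(reader)
        header = reader.fieldnames
    return header, len(rows)


# Demo on a throwaway file; point summarize_csv() at videos/CSVFile.csv
# (the path used in main()) to check the real output instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.csv')
    with open(path, 'w', newline='', encoding='utf-8') as f:
        f.write('Domain,ProjectName\r\nPython,XML to CSV\r\n')
    header, count = summarize_csv(path)

print(header, count)  # ['Domain', 'ProjectName'] 1
```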

 

Things to Remember 

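  • Before running the code, check that the XML and CSV file paths in main() match the locations on your machine.
  • If you use your own XML file, don't forget to change the field names in savetoCSV() to match its tags; otherwise, csv.DictWriter raises a ValueError for any tag that is not in the fields list. If your file is not organized as channel/item elements, also adjust the path passed to findall() in parseXML(). It is easiest to run the code with the given XML file first and then adapt it.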
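One way to avoid editing the field list by hand, offered as a sketch rather than the tutorial's approach: collect the keys from the parsed dictionaries and use them as the CSV header. collect_fields() is a hypothetical helper and the content below is made-up data. Rows that lack a column get DictWriter's restval, an empty string by default.

```python
import csv
import io


def collect_fields(content):
    """Gather every key across the parsed dicts, in first-seen order.

    Hypothetical helper: it lets the CSV header follow whatever tags
    your own XML file contains instead of a hard-coded field list.
    """
    fields = []
    for project in content:
        for key in project:
            if key not in fields:
                fields.append(key)
    return fields


# Made-up parsed content where the second item has an extra tag.
content = [
    {'Domain': 'Python', 'ProjectName': 'XML to CSV'},
    {'Domain': 'Java', 'ProjectName': 'Chat app', 'Media': 'https://example.com/b.png'},
]

fields = collect_fields(content)
buffer = io.StringIO()
# Rows missing a key are written with restval ('' by default) in that column.
writer = csv.DictWriter(buffer, fieldnames=fields)
writer.writeheader()
writer.writerows(content)
print(fields)  # ['Domain', 'ProjectName', 'Media']
```

If you would rather keep a fixed field list and silently drop unexpected tags, csv.DictWriter also accepts extrasaction='ignore'.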

 
