XML Document Parsing And Extraction In Python | Python Project

by | Mar 19, 2022 | Coding, Python


Introduction to the Project

Today, we will convert an XML file into a CSV file! Sounds interesting, right? Let's take a step-by-step look at how to implement XML document parsing and extraction in Python.

XML stands for Extensible Markup Language, which is a markup language and file format for storing, transmitting, and reconstructing arbitrary data.

Parsing is the process of reading text (here, the contents of an XML file) and breaking it into components that are easier to work with, such as a tree of elements.
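As a minimal sketch of what parsing looks like with Python's built-in ElementTree, the snippet below parses a small XML string. The sample document and its tags are made up for illustration; they only mimic the channel/item layout that the project's code expects. Note that ET.fromstring() returns the root element directly, whereas ET.parse(), which the project uses for files, returns a tree whose getroot() method gives the root.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up XML document shaped like an RSS feed (rss -> channel -> item)
SAMPLE = """<rss>
  <channel>
    <item><Domain>Python</Domain><ProjectName>Calculator</ProjectName></item>
    <item><Domain>Web</Domain><ProjectName>Portfolio</ProjectName></item>
  </channel>
</rss>"""

# fromstring() parses the text and returns the root element directly
root = ET.fromstring(SAMPLE)
print(root.tag)  # rss

# findall() takes a limited XPath expression; each child exposes .tag and .text
for item in root.findall('./channel/item'):
    print({child.tag: child.text for child in item})
# {'Domain': 'Python', 'ProjectName': 'Calculator'}
# {'Domain': 'Web', 'ProjectName': 'Portfolio'}
```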

A CSV (comma-separated values) file is a plain-text file that stores tabular data: each line is one row, and the values within a row are separated by commas.
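To illustrate the format, here is a small sketch that writes two hypothetical rows with csv.DictWriter into an in-memory buffer and prints the result. The column names are borrowed from the project; the values are invented.

```python
import csv
import io

# Hypothetical rows: each dict maps a column name to a cell value
rows = [
    {'Domain': 'Python', 'ProjectName': 'Calculator'},
    {'Domain': 'Web', 'ProjectName': 'Portfolio, v2'},
]

# Write to an in-memory text buffer instead of a real file
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['Domain', 'ProjectName'])
writer.writeheader()
writer.writerows(rows)

print(buffer.getvalue())
# Domain,ProjectName
# Python,Calculator
# Web,"Portfolio, v2"
```

Note how the value containing a comma is wrapped in double quotes so it stays in one column; the csv module applies this quoting automatically.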


Requirements

1. A Python IDE or VS Code to run the code.

2. An XML file to parse and extract into a CSV file.

Steps For XML Document Parsing And Extraction In Python

Step 1: Download the XML file and update the file path in the code to match its location on your machine.

Step 2: Paste the code below into your IDE or code editor:

Source Code

```python
# Import the required modules
import csv
import xml.etree.ElementTree as ET


# Function 1 - To parse the XML file
def parseXML(xmlfile):
    # To parse the XML document into an element tree
    tree = ET.parse(xmlfile)

    # To get the root element of this tree
    root = tree.getroot()

    # Empty list for content
    content = []

    # findall() returns a list containing all matching elements in document order
    for c in root.findall('./channel/item'):
        # An empty dictionary for this item
        project = {}

        # To iterate over the child elements of the item
        for child in c:
            # A special check for the namespaced media content element
            if child.tag == '{https://myprojectideas.com/}content':
                project['Media'] = child.attrib['url']
            else:
                # Keep the text as a str (in Python 3, .encode('utf8') would give
                # bytes, written to the CSV as b'...'); use '' for empty elements
                project[child.tag] = child.text or ''

        # To append the dictionary to the content list
        content.append(project)

    # To return the content list
    return content


# Function 2 - To write the extracted content to a CSV file
def savetoCSV(content, filename):
    # Specify the fields (column headers) for the CSV file
    fields = ['Domain', 'ProjectName', 'Description', 'WebsiteLink', 'YoutubeLink', 'Media']

    # Open the CSV file for writing ('w'); the csv docs recommend newline=''
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        # Create a CSV dictionary writer object
        writer = csv.DictWriter(csvfile, fieldnames=fields)

        # To write the headers
        writer.writeheader()

        # To write the rows
        writer.writerows(content)


# Main function to call the sub functions
def main():
    # To parse the XML file
    content = parseXML('videos/SampleXML.xml')

    # To write the parsed content to a CSV file
    savetoCSV(content, 'videos/CSVFile.csv')


if __name__ == "__main__":
    # To call the main function
    main()
```
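A note on text encoding: in Python 3, ElementTree already returns element text as a str, so there is no need to call .encode('utf8') on it. Doing so produces a bytes object, and the csv module converts non-string values with str(), so the cell ends up holding the literal text b'...'. This short sketch shows the difference using an in-memory buffer and a made-up value:

```python
import csv
import io

text = 'Calculator'  # element text from ElementTree is already a str

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow([text.encode('utf8')])  # bytes: converted with str(), giving b'...'
writer.writerow([text])                 # str: written as-is

print(buffer.getvalue())
# b'Calculator'
# Calculator
```

To control the file's encoding, pass it to open() instead, for example open(filename, 'w', newline='', encoding='utf-8'). The csv module documentation recommends newline='' so that the writer's \r\n line endings are not translated again by the file object, which otherwise shows up as blank rows on Windows.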

Explanation Of The Code

We first import the csv module, which writes the CSV file, and the xml.etree.ElementTree module (aliased as ET), which parses the XML.

1. The first function, parseXML(), performs the parsing.

2. ET.parse() parses the XML document into an element tree.

3. The getroot() method returns the root element of this tree.

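4. Next, we create an empty list to store the content extracted from the XML file.

5. root.findall('./channel/item') returns a list of all matching item elements in document order, and the for loop iterates over them.

6. For each item, we create a dictionary that maps each child element's tag to its text (for the namespaced media content element, we store its url attribute under the key 'Media' instead), and append the dictionary to the content list.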

7. The second function, savetoCSV(), writes the extracted content to a CSV file.

8. Here, we specify the fields (the column headers) for the CSV file.

9. We open the CSV file for writing by calling open() with the mode 'w'.

10. We create a csv.DictWriter object, then write the header row and the data rows with its writeheader() and writerows() methods.

11. The main() function calls both functions: first to parse the XML file, then to save the result as a CSV file.

12. Finally, the if __name__ == "__main__": check calls main() when the script is run directly.
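The dictionary-building step relies on how ElementTree handles namespaces: a prefixed tag such as media:content is reported as {namespace-URI}content, which is why the code compares child.tag against '{https://myprojectideas.com/}content'. The sketch below assumes the sample file binds a media prefix to that URI; the prefix name, the sample item, and the image URL are illustrative assumptions, not taken from the project's XML file.

```python
import xml.etree.ElementTree as ET

# Hypothetical item that binds the "media" prefix to the project's namespace URI
SAMPLE = """<item xmlns:media="https://myprojectideas.com/">
  <ProjectName>Calculator</ProjectName>
  <media:content url="https://example.com/calculator.png"/>
</item>"""

item = ET.fromstring(SAMPLE)
for child in item:
    print(child.tag)
# ProjectName
# {https://myprojectideas.com/}content

# After parsing, the prefix is gone; match on the full {uri}localname form
media = item.find('{https://myprojectideas.com/}content')
print(media.attrib['url'])  # https://example.com/calculator.png

# Alternatively, map a prefix of your choice to the URI via a namespaces dict
media = item.find('media:content', namespaces={'media': 'https://myprojectideas.com/'})
```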

Output

The following input XML file was used for this project.

https://drive.google.com/drive/folders/14-WQrsU6o-mxb_kambI6cuxT5IROzG1S?usp=sharing

The XML file is parsed and its data is extracted into CSV format. When you open the resulting CSV file, it should look clean, like the image below.
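If you would rather check the result from Python than open the file by hand, a small helper like the one below reads back the header and the first few rows. The function name read_csv_rows and the default limit of 5 are illustrative choices, not part of the original project.

```python
import csv


def read_csv_rows(path, limit=5):
    """Return the CSV header and up to `limit` data rows (as dicts)."""
    # newline='' lets the csv module handle line endings itself
    with open(path, newline='', encoding='utf-8') as csvfile:
        reader = csv.DictReader(csvfile)
        rows = []
        for row in reader:
            if len(rows) >= limit:
                break
            rows.append(row)
        # fieldnames is read from the first line of the file
        return reader.fieldnames, rows
```

For example, header, rows = read_csv_rows('videos/CSVFile.csv') returns the column names and up to five rows as dictionaries.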

[Image: XML Document Parsing And Extraction In Python]


Things to Remember 

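  • Before running the code, check that the XML and CSV file paths in main() are correct for your machine.
  • If you use your own XML file, remember to update the field names in savetoCSV() (and, if needed, the './channel/item' path in parseXML()); otherwise csv.DictWriter raises a ValueError for any tag that is not in the field list. It helps to run the code on the provided XML file first and then adapt it to your own.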
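One way to avoid editing the field list by hand is to derive the columns from the parsed data. The sketch below is a hypothetical variant of savetoCSV() (named save_to_csv_any_fields here); it collects every key in the order it first appears, so DictWriter never meets an unexpected field. The trade-off is that the column order follows the XML rather than a fixed list you choose.

```python
import csv


def save_to_csv_any_fields(content, filename):
    """Write a list of dicts to CSV, deriving the columns from the data.

    Columns appear in the order keys are first seen; rows missing a key
    get an empty cell (DictWriter's default restval is '').
    """
    # dict.fromkeys keeps insertion order and drops duplicate keys
    fields = list(dict.fromkeys(key for row in content for key in row))

    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=fields)
        writer.writeheader()
        writer.writerows(content)
    return fields
```

Calling save_to_csv_any_fields(parseXML('videos/SampleXML.xml'), 'videos/CSVFile.csv') would then work for any XML file whose items follow the channel/item layout.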

