Introduction to the Project
Today, we will convert an XML file into a CSV file. Let us go step by step through how XML document parsing and extraction is implemented in Python.
XML stands for Extensible Markup Language, which is a markup language and file format for storing, transmitting, and reconstructing arbitrary data.
Parsing is the process of breaking a string into components that are easier to process.
A CSV (comma-separated values) file is a plain-text file that stores tabular data, with one row per line and the values in each row separated by commas.
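To make these three terms concrete, here is a minimal sketch that parses a small XML string and returns the same records as CSV text. The tag names (projects, project, name, domain) are made up for illustration and are not taken from the XML file used in this project.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical XML snippet; the tag names are illustrative only
XML_TEXT = """<projects>
  <project><name>Parser</name><domain>Python</domain></project>
  <project><name>Scraper</name><domain>Web</domain></project>
</projects>"""


def xml_to_csv_text(xml_text):
    """Parse XML text and return the same records as CSV text."""
    root = ET.fromstring(xml_text)
    # One dictionary per <project>, keyed by child tag name
    rows = [{child.tag: child.text for child in project}
            for project in root.findall('project')]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=['name', 'domain'],
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(xml_to_csv_text(XML_TEXT))
```

Each project element becomes one CSV row, and each child tag becomes a column.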
Requirements
1. A Python IDE or VS Code to run the code.
2. An XML file to parse and extract into a CSV file.
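If you do not have the XML file at hand, a stand-in can be generated for testing. The sketch below is only an assumption about the file's layout, inferred from the tags and namespace that the parsing code looks for; the rss root element, the media prefix, and all values are placeholders rather than the contents of the original file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Assumed layout, inferred from the tags the parsing code looks for;
# the values are placeholders, not the contents of the original file.
SAMPLE_XML = """<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:media="https://myprojectideas.com/">
  <channel>
    <item>
      <Domain>Python</Domain>
      <ProjectName>XML to CSV</ProjectName>
      <Description>Parse an XML file and save it as CSV</Description>
      <WebsiteLink>https://example.com/project</WebsiteLink>
      <YoutubeLink>https://example.com/video</YoutubeLink>
      <media:content url="https://example.com/image.png"/>
    </item>
  </channel>
</rss>
"""


def write_sample_xml(path):
    """Write the placeholder XML document to `path` and return the path."""
    with open(path, 'w', encoding='utf-8') as xmlfile:
        xmlfile.write(SAMPLE_XML)
    return path


# Write into a temporary folder so nothing in the project is overwritten
sample_path = write_sample_xml(
    os.path.join(tempfile.mkdtemp(), 'SampleXML.xml'))
items = ET.parse(sample_path).getroot().findall('./channel/item')
print(len(items))
```

Pass your own path to write_sample_xml() if you want the file next to your script.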
Steps For XML Document Parsing And Extraction In Python
Step 1: Download the XML file and update its path in the code to match its location on your machine.
Step 2: Paste the code below into your IDE or code editor.
Source Code
# Import the required modules
import csv
import xml.etree.ElementTree as ET


# Function 1 - To parse the XML file
def parseXML(xmlfile):
    # To parse XML document into element tree
    tree = ET.parse(xmlfile)
    # To get the root element of this tree
    root = tree.getroot()
    # Empty list for content
    content = []
    # findall() returns a list of all matching elements in document order
    for c in root.findall('./channel/item'):
        # An empty dictionary
        project = {}
        # To iterate the child elements of the content
        for child in c:
            # A special checking for namespace object content : media
            if child.tag == '{https://myprojectideas.com/}content':
                project['Media'] = child.attrib['url']
            else:
                # Keep the text as str (bytes would be written as b'...' in Python 3);
                # an empty element has text None, so fall back to ''
                project[child.tag] = child.text or ''
        # To append the dictionary to content list
        content.append(project)
    # To return the content list
    return content


# Function 2 - To write the parsed content to a CSV file
def savetoCSV(content, filename):
    # Specify the fields for the csv file
    fields = ['Domain', 'ProjectName', 'Description', 'WebsiteLink', 'YoutubeLink', 'Media']
    # Open the csv file in write mode 'w'; newline='' avoids blank rows on Windows
    with open(filename, 'w', newline='', encoding='utf-8') as csvfile:
        # Create a csv dictionary writer object
        writer = csv.DictWriter(csvfile, fieldnames=fields)
        # To write the headers
        writer.writeheader()
        # To write the rows
        writer.writerows(content)


# Main function to call the sub functions
def main():
    # To parse the XML file
    content = parseXML('videos/SampleXML.xml')
    # To write the parsed content to the CSV file
    savetoCSV(content, 'videos/CSVFile.csv')


if __name__ == "__main__":
    # To call the main function
    main()
Explanation Of The Code
We first import the csv and xml.etree.ElementTree modules, which handle CSV writing and XML parsing respectively.
1. The first function, parseXML(), performs the parsing.
2. We use the parse() function to parse the XML document into an element tree.
3. To get the root element of this tree, we use the getroot() function.
4. After this, we create an empty list to store the content of the XML file.
5. The findall() call returns a list of all matching item elements in document order, and the for loop iterates over that list.
6. Inside the for loop, we create a dictionary for each item, fill it from the item's child elements (the namespaced content element is stored under the Media key using its url attribute), and append it to the content list.
7. The second function, savetoCSV(), writes the parsed content to a CSV file.
8. Here, we need to specify the fields for the CSV file.
9. Using the open() function with mode 'w', we open the CSV file for writing.
10. We create a CSV dictionary writer object and write the headers and rows using the writeheader() and writerows() functions.
11. In the main function, we call both functions.
12. Finally, we call the main function to run the program.
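Two details in the steps above are easy to miss: how ElementTree names a namespaced tag, and what it returns for an empty element. The following is a small sketch of both, using a made-up XML string whose layout is assumed to resemble the input file.

```python
import xml.etree.ElementTree as ET

# Made-up XML; the layout is an assumption modelled on the parsing code
XML_TEXT = """<rss xmlns:media="https://myprojectideas.com/">
  <channel>
    <item>
      <ProjectName>XML to CSV</ProjectName>
      <Description/>
      <media:content url="https://example.com/image.png"/>
    </item>
  </channel>
</rss>"""

root = ET.fromstring(XML_TEXT)
item = root.findall('./channel/item')[0]

# A prefixed tag is stored with its namespace URI in curly braces
tags = [child.tag for child in item]
print(tags)

# An empty element has text None, so fall back to an empty string
description = item.find('Description').text or ''
print(repr(description))
```

This is why the code compares child.tag against the full '{https://myprojectideas.com/}content' string rather than 'media:content'.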
Output
The following input XML file was used for this project.
https://drive.google.com/drive/folders/14-WQrsU6o-mxb_kambI6cuxT5IROzG1S?usp=sharing
The program parses the XML file and writes its data to a CSV file. When you open the CSV file in an editor, the data appears as a clean table.
Things to Remember
- Before running the code, check that the XML and CSV file paths are correct; the output is written to the CSV path.
- If you are using your own XML file, update the field names in the code to match its tags; otherwise, the code will raise an error. It is easiest to run the code with the given XML file first and then edit it accordingly.
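The error arises because csv.DictWriter raises a ValueError when a row contains a key that is not listed in fieldnames. One possible workaround, sketched below, is to build the field list from the parsed rows instead of hard-coding it. The collect_fields() helper and the sample rows are hypothetical and not part of the original code.

```python
import csv
import io


def collect_fields(rows):
    """Return every key used in `rows`, in first-seen order."""
    fields = []
    for row in rows:
        for key in row:
            if key not in fields:
                fields.append(key)
    return fields


# Hypothetical rows, shaped like the dictionaries the parser builds
rows = [
    {'Title': 'First', 'Link': 'https://example.com/1'},
    {'Title': 'Second', 'Author': 'Someone'},
]

fields = collect_fields(rows)
buffer = io.StringIO()
# restval fills in columns that a given row does not have
writer = csv.DictWriter(buffer, fieldnames=fields, restval='',
                        lineterminator='\n')
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

With this approach the CSV columns follow whatever tags the XML file contains, at the cost of no longer controlling the column order by hand.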

Cisco Ramon is an American software engineer with experience in several popular and commercially successful programming languages and development tools. He has been writing content for the last 5 years. He is a Senior Manager at Rude Labs Pvt. Ltd.