.pdf to .txt Converter
This is a short Python script that extracts the text from a PDF file and saves it as a plain .txt file.
Prerequisites
You will need to install the “pdfminer” library first for this to work (the package is published on PyPI as pdfminer.six):
pip install pdfminer.six
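If you are not sure whether the install worked, a quick check like the one below can help. It is only a sketch: the helper name is_installed is made up for this example, and it relies on the fact that the package installed as pdfminer.six is imported under the name pdfminer.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the named top-level module can be found for import."""
    return importlib.util.find_spec(module_name) is not None


# pdfminer.six is imported as "pdfminer"
if is_installed("pdfminer"):
    print("pdfminer is available")
else:
    print("pdfminer is missing; run: pip install pdfminer.six")
```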
How to use it
- Save the script below in the folder of your choice
- Rename the .pdf file you want to convert to “test.pdf” and save it in the same folder as the script
- Run the script
- The text file is saved as “test.txt”
import pdfminer.high_level

# Open the PDF file in read-binary mode
with open('test.pdf', 'rb') as pdf_file:
    # Use pdfminer to extract the text
    extracted_text = pdfminer.high_level.extract_text(pdf_file)

# Open a new text file in write mode
with open('test.txt', 'w', encoding='utf-8') as txt_file:
    # Write the extracted text to the text file
    txt_file.write(extracted_text)
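If renaming every file to “test.pdf” gets tedious, the script above could be adapted to take file names from the command line. The version below is a rough sketch rather than a tested tool: the function names output_path_for and convert are invented for this example, and it assumes each output file should sit next to its PDF with the same name and a .txt extension.

```python
import sys
from pathlib import Path


def output_path_for(pdf_path):
    """Return the .txt path that sits next to the given PDF (hypothetical helper)."""
    return Path(pdf_path).with_suffix(".txt")


def convert(pdf_path):
    """Extract the text of one PDF and write it to a .txt file beside it."""
    # Imported here so the path helper still works without pdfminer installed
    from pdfminer.high_level import extract_text

    # extract_text accepts a file path as well as an open file object
    text = extract_text(str(pdf_path))
    out_path = output_path_for(pdf_path)
    out_path.write_text(text, encoding="utf-8")
    return out_path


if __name__ == "__main__":
    # Convert every PDF named on the command line
    for name in sys.argv[1:]:
        print("Saved", convert(name))
```

Saved as, say, convert.py (the file name is just an example), it would be run as: python convert.py report.pdf notes.pdf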