.pdf to .txt Converter

A short Python script that extracts the text from a PDF file and saves it as a plain-text (.txt) file.

Prerequisites

Install the “pdfminer.six” library first (it provides the pdfminer package that the script imports):

pip install pdfminer.six

How to use it

  1. Save the script below in a folder of your choice
  2. Rename the PDF you want to convert to “test.pdf” and put it in the same folder as the script
  3. Run the script from that folder (the file names are resolved relative to the current working directory)
  4. The extracted text is saved as “test.txt” in the same folder

Script

import pdfminer.high_level

# Open the PDF file in read-binary mode
with open('test.pdf', 'rb') as pdf_file:

    # Use pdfminer to extract the text
    extracted_text = pdfminer.high_level.extract_text(pdf_file)

# Open a new text file in write mode
with open('test.txt', 'w', encoding='utf-8') as txt_file:

    # Write the extracted text to the text file
    txt_file.write(extracted_text)
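
If renaming every file to “test.pdf” gets tedious, the same idea can be wrapped in a small function that takes the PDF path as an argument. The following is only a sketch, not part of the original script: the names output_path_for and convert, and the file name convert_pdf.py, are made up for illustration. It relies on extract_text accepting a file path as well as an open file.

```python
import sys
from pathlib import Path


def output_path_for(pdf_path):
    """Return the .txt path that sits next to the given PDF (hypothetical helper)."""
    return Path(pdf_path).with_suffix(".txt")


def convert(pdf_path):
    """Extract the text of one PDF and write it next to the original."""
    # Imported here so the path helper above works even without pdfminer.six
    from pdfminer.high_level import extract_text

    txt_path = output_path_for(pdf_path)
    # extract_text accepts a file path as well as an open binary file
    text = extract_text(str(pdf_path))
    txt_path.write_text(text, encoding="utf-8")
    return txt_path


if __name__ == "__main__":
    # Assumed usage: python convert_pdf.py first.pdf second.pdf
    for name in sys.argv[1:]:
        print(convert(name))
```

Each PDF named on the command line gets a text file with the same base name, so nothing has to be renamed beforehand.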