
.csv to HTML Converter

This is a really simple but effective way to convert a single-column .csv file into an HTML table.

Usage

Why would you use this? Well, sometimes you have a long list that you want to render as a table on a website. Doing this by hand can be very time-consuming, so this script does it automatically in the blink of an eye. I used it on this post to render the list of voices as a table.

Prerequisites

There is nothing to install for this script. It only uses Python's standard library.

How to use it

  1. Save the script below in the folder of your choice
  2. Rename the .csv file you want to convert as “input.csv” and save it in the same folder as the script
  3. Run the script
  4. The HTML file is saved as “output.html”
  5. If you want a different number of columns, change “2” in the two places in the script indicated by comments
import csv
import html

def csv_to_html(input_csv, output_html):
    # newline='' lets the csv module handle line endings itself
    with open(input_csv, 'r', newline='') as file:
        reader = csv.reader(file)
        # Keep the first column only, skipping blank lines
        data = [row[0] for row in reader if row]

    html_table = '<table>\n'
    for i in range(0, len(data), 2): # Change "2" to the number of columns you require
        html_table += '<tr>\n'
        for j in range(2):  # Change "2" to the number of columns you require
            if i + j < len(data):
                # Escape &, < and > so the values cannot break the markup
                html_table += f'<td>{html.escape(data[i + j])}</td>'
            else:
                html_table += '<td></td>'  # Add empty cell if data is exhausted
        html_table += '</tr>\n'
    html_table += '</table>'

    with open(output_html, 'w') as output_file:
        output_file.write(html_table)

if __name__ == "__main__":
    input_csv_file = "input.csv"
    output_html_file = "output.html"
    
    csv_to_html(input_csv_file, output_html_file)
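
If you change the column count often, editing “2” in two places gets tedious. Here is a minimal sketch of the same idea with the column count passed in as an argument. The names rows_to_html_table, read_first_column and columns are my own for illustration and are not part of the script above. The sample data is made up.

```python
import csv
import html
import io


def read_first_column(csv_file):
    """Return the first field of every non-empty row."""
    return [row[0] for row in csv.reader(csv_file) if row]


def rows_to_html_table(values, columns=2):
    """Lay out a flat list of strings as an HTML table, `columns` cells per row."""
    lines = ['<table>']
    for start in range(0, len(values), columns):
        # Take the next slice and pad the final row with empty cells.
        row = values[start:start + columns]
        row += [''] * (columns - len(row))
        cells = ''.join(f'<td>{html.escape(cell)}</td>' for cell in row)
        lines.append(f'<tr>{cells}</tr>')
    lines.append('</table>')
    return '\n'.join(lines)


if __name__ == '__main__':
    # In-memory stand-in for input.csv (hypothetical sample data).
    sample = io.StringIO('Alice\nBob\nCarol\n')
    print(rows_to_html_table(read_first_column(sample), columns=2))
```

With three values and two columns, the last row gets one empty cell, just as in the original script.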

Text Chunker

This script works its way through a long text file and “chunks” the text into smaller files, each with up to 100 sentences (the last file holds whatever is left over). You can easily change the length of the chunked files.

Usage

This is useful for creating smaller text files for summarisation, or for converting a long book into smaller sections and then using a text-to-speech script HERE to create bite-size audio files.

Prerequisites

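You will need to install the “nltk” library first, for this to work:

pip install nltk

The sentence splitter also needs NLTK's Punkt tokenizer data, which you download once from Python with nltk.download('punkt') (recent NLTK releases ask for 'punkt_tab' instead).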

How to use it

  1. Save the script below in the folder of your choice
  2. Rename the text file you want to chunk as “test.txt” and save it in the same folder as the script
  3. Run the script
  4. The chunked files are saved in the “chunks” folder
  5. You can change the length of each chunk by adjusting the parameter currently set to “100”.
import nltk
import re
import os

# Define the path to the input file
input_path = 'test.txt'

# Read in the input file
with open(input_path, 'r', encoding='utf-8', errors='ignore') as f:
    text = f.read()
    text = text.replace('\uf0b7', '#')  # replace problematic character with #
	
# Replace line breaks and page breaks with a space so words do not run together
text = re.sub(r'[\n\f]+', ' ', text)

# Use NLTK to split the text into individual sentences
sentences = nltk.sent_tokenize(text)

# Use list slicing to group the sentences into chunks of 100
chunk_size = 100
chunks = [sentences[i:i+chunk_size] for i in range(0, len(sentences), chunk_size)]

# Create a directory to store the output files (no error if it already exists)
os.makedirs('chunks', exist_ok=True)

# Loop through the chunks and save each one as a separate text file
for i, chunk in enumerate(chunks):
    chunk_text = ' '.join(chunk)
    chunk_num = i + 1
    output_path = f'chunks/chunk {chunk_num}.txt'
    with open(output_path, 'w', encoding='utf-8') as f:
        f.write(chunk_text)
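
To see the chunking step on its own, here is a small sketch you can run without NLTK installed. It assumes a naive regular-expression splitter (split_sentences, my own stand-in) in place of nltk.sent_tokenize, so it will mis-split abbreviations such as “Dr.”. The function names and the sample text are illustrative only.

```python
import re


def split_sentences(text):
    """Naive stand-in for nltk.sent_tokenize: split after '.', '!' or '?'."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def chunk_sentences(sentences, chunk_size=100):
    """Group sentences into consecutive lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError('chunk_size must be at least 1')
    return [sentences[i:i + chunk_size]
            for i in range(0, len(sentences), chunk_size)]


if __name__ == '__main__':
    text = 'One. Two! Three? Four. Five.'
    chunks = chunk_sentences(split_sentences(text), chunk_size=2)
    for number, chunk in enumerate(chunks, start=1):
        print(number, ' '.join(chunk))
```

Five sentences with a chunk size of 2 give three chunks, and the last one holds the single leftover sentence.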

.pdf to .txt Converter

This is a really simple but effective way to convert .pdf files to .txt files.

Prerequisites

You will need to install the “pdfminer” library (the package is called “pdfminer.six”) first, for this to work:

pip install pdfminer.six

How to use it

  1. Save the script below in the folder of your choice
  2. Rename the .pdf file you want to convert as “test.pdf” and save it in the same folder as the script
  3. Run the script
  4. The text file is saved as “test.txt”
import pdfminer.high_level

# Open the PDF file in read-binary mode
with open('test.pdf', 'rb') as pdf_file:

    # Use pdfminer to extract the text
    extracted_text = pdfminer.high_level.extract_text(pdf_file)

# Open a new text file in write mode
with open('test.txt', 'w', encoding='utf-8') as txt_file:

    # Write the extracted text to the text file
    txt_file.write(extracted_text)
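
If you would rather not rename every file to “test.pdf”, here is a minimal sketch that takes the file names from the command line instead. It assumes pdfminer.six is installed. The helper names txt_path_for and convert_pdf and the script name pdf_to_txt.py are hypothetical, chosen just for this example.

```python
import sys
from pathlib import Path


def txt_path_for(pdf_path):
    """Return the output path: same folder and name, with a .txt extension."""
    return Path(pdf_path).with_suffix('.txt')


def convert_pdf(pdf_path):
    """Extract the text of one PDF and write it next to the original file."""
    # Imported here so txt_path_for still works without pdfminer.six installed.
    from pdfminer.high_level import extract_text

    out_path = txt_path_for(pdf_path)
    out_path.write_text(extract_text(str(pdf_path)), encoding='utf-8')
    return out_path


if __name__ == '__main__':
    # Hypothetical usage: python pdf_to_txt.py report.pdf notes.pdf
    for name in sys.argv[1:]:
        print('Wrote', convert_pdf(name))
```

Each text file is written next to its PDF with the same name, so report.pdf becomes report.txt.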