How I saved 100s of hours of work for a local company using Python and python-docx (N00B Friendly) [Part 2/2]

By coderad on 4/10/2025

So in last week's post we got our hands on our Word documents and wrote a Python script to extract the data we need from each one. We used that data to build a folder of .csv files, one per document, with the pared-down data inside.

Quick and easy, and now we have all our fire alarm system device information in files named for the address of each building, complex, or facility, in a fraction of the time it would have taken to compile manually.

And we didn't blow anything up! Trust me, you do not want the fire department called; there's so much more paperwork surrounding that, as I've learned from this gig.

(Image source: KlingAI)

Getting Started

.CSVs in hand, it's time to integrate the data into the new reports! This is essentially the reverse of the first part, with a twist: we're still scanning for tables and iterating through them, we're just pasting the new data instead of copying the old.

As always, we start with our imports, which will be the same as in FetchData.py, with one addition:

import os
import docx
import csv
import math

We have os for file system access, and docx and csv for working with .docx and .csv files. We added math for its ceiling function, which we'll use later to calculate the number of pages we need for our output report.

Finding our Target Table

With our original source sheets, the only tables in them were the ones containing the data we needed. In our new integrated report, the fire alarm devices don't start until page 13, so we're going to need a unique marker: something that differentiates our table from the dozen-odd other tables we don't care about. A quick scan of the report template shows that our table says "Device Location" in its top-left cell, and none of the other tables do, so there's our hook! We're also going to be dealing with multiple tables eventually, so we'll support that as well:

def find_device_tables(doc):
    tables = []
    for table in doc.tables:
        # Match on the unique "Device Location" text in cell (0, 0)
        if table.rows and table.rows[0].cells and table.rows[0].cells[0].text.strip() == "Device Location":
            tables.append(table)
    return tables

What are we up to here?

  • We're defining a function called find_device_tables that takes an open docx Document as input
  • We're declaring an empty list to hold the matching tables
  • We're checking each table in the template against our reference string in cell (0, 0)
  • We're adding each match to the list, then returning the list when we're done

Now we have a nice list of table objects that we can use to place our data once we load it from the csv.
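
If you want a quick sanity check that the hook works, open a report and count the matches. The filename here is just a stand-in for whatever your blank report is called:

doc = docx.Document("report_template.docx")  # stand-in filename
device_tables = find_device_tables(doc)
print(f"Found {len(device_tables)} device table(s)")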

Building the Templates

Our new report template only has space for 20 records. This part of the report lists all the fire protection devices that need to be checked, like smoke alarms, fire extinguishers and fire alarm pull stations.

You know, the things that you're not supposed to pull as a joke, ever, and which Kling has apparently never heard of.

(Image source: KlingAI)

On small jobs, 20 might be fine, but when you get into a big building like a school, hospital, or mall, there could be dozens of devices, if not hundreds.

My first thought was to find the blank table, programmatically copy it, create a new page with another copy of the table, and repeat for however many pages we'd need to hold all the records for each site. But when I actually tried it out, I had a lot of difficulty getting the styles to carry over. I could create lots of new tables, but the formatting was a mess: the vertically aligned legend, the background cell colors, the column widths. After a couple of approaches, I quickly realized it would be much faster to manually create multiple report templates, each with the right number of pages already in place.
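
For the curious, the core of what I was attempting looked something like this. It's a sketch rather than my exact code: python-docx has no public copy-a-table API, so you end up deep-copying the table's underlying XML element via the private _tbl attribute:

import copy
import docx

doc = docx.Document("report_template.docx")  # stand-in filename
source_table = doc.tables[0]  # the blank device table

# Copy the table's underlying XML element and insert the copy
# directly after the original (addnext is an lxml method)
new_tbl = copy.deepcopy(source_table._tbl)
source_table._tbl.addnext(new_tbl)

doc.save("report_expanded.docx")

The table structure itself comes across, but this is exactly where the legend alignment, cell shading, and column widths started misbehaving for me. Hence, the manual route.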

Easy! I created a new folder called /templates and saved the original template as t0020.docx. Then I copied my template page, pasted it, and saved the file as t0040.docx (then t0060, t0080, t0100, and so on). Three inputs each: CTRL-V (paste), F12 (Save As), and change one digit. Repeat, adding a page each time under a new name.

Could I have automated this? Certainly, but the fact is it took all of 20 minutes to manually generate templates supporting up to 1,200 devices. It would have taken at least that long to brew a coffee and articulate the problem. A fun exercise, maybe, but a little overkill for a one-time task.

Let's do up a quick function to pick our template:

def get_template_for_row_count(row_count):
    template_size = math.ceil(row_count / 20) * 20  # Round up to nearest 20
    return f"./templates/t{str(template_size).zfill(4)}.docx"

We're using ceiling from math here to quickly round our row count (which we'll grab from the .csv file) up to the nearest 20, and returning the matching template name, e.g. t0120.docx, to the function that's going to build the new file.
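
For example, a 37-device site rounds up to the two-page template:

print(get_template_for_row_count(37))   # ./templates/t0040.docx
print(get_template_for_row_count(260))  # ./templates/t0260.docx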

Performing the Merge

This is a longer function, but that's okay. No one will ever look at your code, unless you blog about it or something similarly silly. We'll break it up a little here.

We start by defining our merge function and loading up the .csv, dumping the data into a list. We call our new get_template_for_row_count function to select our template. If the template doesn't exist, that's fine -- the script tells us how many rows we need, so we can cobble together the appropriate template afterwards and run it again.

In the end I needed to make a few extra sizes, like t2150 and t3060, for particularly large sites, which was fine.

def merge_data(csv_path, output_path):
    # Read CSV data
    with open(csv_path, newline='', encoding="utf-8") as csvfile:
        reader = csv.reader(csvfile)
        data_rows = list(reader)

    row_count = len(data_rows)
    template_path = get_template_for_row_count(row_count)
    
    if not os.path.exists(template_path):
        print(f"Error: No suitable template found for {csv_path} ({row_count} rows).")
        return

Now that we've selected our template, we can load it and use our find_device_tables function from earlier to find our tables:

    # Load the Word template
    doc = docx.Document(template_path)
    tables = find_device_tables(doc)

We insert our csv data:

    # Insert data into tables
    row_index = 0
    for table in tables:
        for i in range(1, len(table.rows)):  # Skip header row
            if row_index >= row_count:
                break  # Stop when all data is inserted
            cells = table.rows[i].cells
            cells[0].text = data_rows[row_index][0]  # Location is Column 0
            cells[2].text = data_rows[row_index][1]  # Device is Column 2
            cells[4].text = data_rows[row_index][2]  # Circuit is Column 4
            row_index += 1
        if row_index >= row_count:
            break  # Stop when all data is inserted

    if row_index < row_count:
        print(f"Warning: Not all data was inserted for {csv_path} (ran out of table space).")

And finally we set the filename, save the report, and delete the original csv:

    # Generate output filename
    new_filename = os.path.basename(csv_path).replace("Device Data.csv", "Fire Alarm Report.docx")
    output_file = os.path.join(output_path, new_filename)

    # Save the modified report
    doc.save(output_file)
    print(f"Merged: {csv_path} -> {output_file}")

    # Delete CSV after successful merge
    os.remove(csv_path)
    print(f"Deleted: {csv_path}")

Why do we delete the csv? Mostly because this way we can glance at our csv folder after running the script. Any files still there obviously need a little special attention: could be something with the filename, could be a missing report template. It's just an easy way to tell at a glance what got skipped. And of course, we still have our source files, so we can regenerate the whole folder of .csv files in no time flat if we want.
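
One last aside: the cell indices in the merge loop (0, 2, and 4) come straight from this particular template's layout. If your table is laid out differently, a quick probe of the header row will show you which indices to use:

doc = docx.Document("./templates/t0020.docx")  # any of your templates works here
table = find_device_tables(doc)[0]
for idx, cell in enumerate(table.rows[0].cells):
    print(idx, repr(cell.text))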

Finishing Up

Now we just need to tie it all together!

def process_all_files():
    input_folder = "./processed"
    output_folder = "./migrated"

    os.makedirs(output_folder, exist_ok=True)  # Create the output folder if needed

    for filename in os.listdir(input_folder):
        if filename.endswith(".csv"):
            csv_path = os.path.join(input_folder, filename)
            merge_data(csv_path, output_folder)

process_all_files()

All we have to do now is drop to the CLI, run python3 MergeData.py, and we're off to the races! Check the processed folder afterwards, deal with any stragglers, and Bob's your uncle.

I did some quick napkin math, and by my reckoning it would have taken at least 160 man-hours to pull this off manually. If they had assistants' assistants doing it for $20 an hour, that would have been about $3,200.

Of course, they don't have assistants' assistants there, so the work would have fallen to the actual assistants as well as the account managers. So not only would it have cost far more in wages, it would have pulled those people away from their real (and time-sensitive) jobs for the better part of two weeks.

Since I was able to throw this together in a couple of days for less than a thousand bucks, it was a massive win for everyone.

And nobody had to call the fire department.

(Image source: KlingAI)

Comments

@aftabirshad 4/11/2025

Awesome work! Good work once again

@coderad 4/11/2025

thank you kindly my friend!

@marilui91 4/11/2025

Excellent information.

@coderad 4/11/2025

thank you!

sometimes it's nice to see this stuff actually solving real-world problems, instead of making like the 1 millionth TODO List app haha
