So in last week's post we got our hands on our Word documents and wrote a Python script to extract the data we need from each one. We used that data to build a folder of .csv files, one per document, with the pared-down data inside.
Quick and easy, and now we have all our fire alarm system device information in files named after the address of the building, complex, or facility, in a fraction of the time it would have taken to compile the information manually.
And we didn't blow anything up! Trust me you do not want the fire department called, there's so much more paperwork surrounding that as I've learned from this gig.
Source: KlingAI
Getting Started
.CSVs in hand, it's time to integrate the data into the new reports! This is essentially the reverse of the first part with a twist: we're still scanning for tables and iterating through them, we're just pasting in new data instead of copying out old data.
As always, we start with our imports, which will be the same as in FetchData.py, with one addition:
import os
import docx
import csv
import math
We have os for file system access, and docx and csv for working with .docx and .csv files. We've added math for access to the ceiling function, which we'll use later on to calculate the number of pages we need for our output report.
Finding our Target Table
With our original source sheets, the only tables in there were ones that contained the data we need. In our new integrated report, the fire alarm devices start on page 13. So we're going to need a unique marker, something that differentiates our table from the other dozen tables that we don't care about. A quick scan of the report template shows that our table says "Device Location" in the top left cell, and none of the other tables do, so there's our hook! We're also going to be dealing with multiple tables eventually, so we'll support that as well:
def find_device_tables(doc):
    tables = []
    for table in doc.tables:
        if table.rows and table.rows[0].cells and table.rows[0].cells[0].text.strip() == "Device Location":
            tables.append(table)
    return tables
What are we up to here?
- We're defining a function called find_device_tables that takes a docx Document as input
- We're declaring an empty list to hold the tables we find
- We're looking at each table in the template to see if it matches our reference string in cell 0,0
- We're appending each matching table to our list
Now we have a nice list of device tables that we can use to place our data once we load it from the csv.
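Before moving on, it's worth a quick sanity check that the hook actually catches the right table and nothing else. A minimal sketch, assuming the report template is saved locally as template.docx (a hypothetical filename):

    # Quick sanity check: how many tables match our "Device Location" hook?
    doc = docx.Document("template.docx")  # hypothetical filename for the report template
    device_tables = find_device_tables(doc)
    print(f"Found {len(device_tables)} device table(s) out of {len(doc.tables)} total tables")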
Building the Templates
Our new report template only has space for 20 records. This part of the report lists all the fire protection devices that need to be checked, like smoke alarms, fire extinguishers and fire alarm pull stations.
You know, the things that you're not supposed to pull as a joke, ever, and which Kling has apparently never heard of.
Source: KlingAI
On small jobs, 20 might be fine but when you get into a big building like a school or hospital or mall, there could be dozens, if not hundreds.
My first thought was to basically just find the blank table and then programmatically copy it and create a new page with another copy of the table and repeat for however many times we need to hold all our records for each site. But when I actually tried it out, I had a lot of difficulty getting the styles to carry over. I could create lots of new tables, but the formatting was a mess. The vertically aligned legend, the background cell colors, the column widths. I tried a couple approaches but I quickly realized that it was going to be much faster to just actually manually create multiple report templates where each template already had the right amount of pages available.
Easy! I created a new folder called /templates and then saved the original template as t0020.docx. Then I copied my template page, pasted it, saved it as t0040.docx. (Then t0060, t0080, t0100 etc). Three inputs each: CTRL-V (Paste), F12 (Save As), and change one digit. Then repeat, adding a page each time with a new name.
Could I have automated this? Certainly, but the fact is it took all of 20 minutes to manually generate the reports with support for up to 1200 devices. It would have taken at least that long to brew a coffee and articulate the problem. A fun exercise, maybe, but a little overkill for this one-time task.
Let's do up a quick function to pick our template:
def get_template_for_row_count(row_count):
    template_size = math.ceil(row_count / 20) * 20  # Round up to nearest multiple of 20
    return f"./templates/t{str(template_size).zfill(4)}.docx"
We're using ceiling from math here to quickly round our row count (which we'll grab from the .csv file) up to the nearest multiple of 20, and return the corresponding template name, e.g. t0120.docx, to the function that's going to build the new file.
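A couple of quick examples of how that rounding plays out, assuming the matching templates exist at the printed paths:

    print(get_template_for_row_count(53))   # ./templates/t0060.docx (53 rounds up to 60)
    print(get_template_for_row_count(400))  # ./templates/t0400.docx (already a multiple of 20)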
Performing the Merge
This is a longer function, but that's okay. No one will ever look at your code, unless you blog about it or something similarly silly. We'll break it up a little here.
We start with defining our merge function and loading up the .csv, dumping the data into a list. We'll call our new get_template_for_row_count function to select our template. If the template doesn't exist, then that's fine -- it tells us how many rows we need so we can cobble together the appropriate template afterwards and run it again.
In the end I needed to make a few of these like t2150 and t3060 for particularly unique sites, which is fine.
def merge_data(csv_path, output_path):
    # Read CSV data
    with open(csv_path, newline='', encoding="utf-8") as csvfile:
        reader = csv.reader(csvfile)
        data_rows = list(reader)
    row_count = len(data_rows)

    template_path = get_template_for_row_count(row_count)
    if not template_path or not os.path.exists(template_path):
        print(f"Error: No suitable template found for {csv_path} ({row_count} rows).")
        return
Now that we've selected our template, we can load it and use our find_device_tables function from earlier to find our tables:
    # Load the Word template
    doc = docx.Document(template_path)
    tables = find_device_tables(doc)
We insert our csv data:
    # Insert data into tables
    row_index = 0
    for table in tables:
        for i in range(1, len(table.rows)):  # Skip header row
            if row_index >= row_count:
                break  # Stop when all data is inserted
            cells = table.rows[i].cells
            cells[0].text = data_rows[row_index][0]  # Location is Column 0
            cells[2].text = data_rows[row_index][1]  # Device is Column 2
            cells[4].text = data_rows[row_index][2]  # Circuit is Column 4
            row_index += 1
        if row_index >= row_count:
            break  # Stop when all data is inserted

    if row_index < row_count:
        print(f"Warning: Not all data was inserted for {csv_path} (ran out of table space).")
And finally set the filename and save it, and delete the original csv:
    # Generate output filename
    new_filename = os.path.basename(csv_path).replace("Device Data.csv", "Fire Alarm Report.docx")
    output_file = os.path.join(output_path, new_filename)

    # Save the modified report
    doc.save(output_file)
    print(f"Merged: {csv_path} -> {output_file}")

    # Delete CSV after successful merge
    os.remove(csv_path)
    print(f"Deleted: {csv_path}")
Why do we delete the csv? Mostly because this way we can easily look at our csv folder after we run our script. Any files still sitting there obviously need a little special attention; could be something with the filename, could be a missing report template. It's just an easy way to tell at a glance what got skipped. And of course, we still have our source files, so we can regenerate the whole folder of .csv files in no time flat if we want.
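If you want that at-a-glance check right in the terminal, a tiny sketch over the same ./processed folder used in the next section (an assumption on my part that your csvs live there) does the trick:

    # List any .csv files that survived the run and still need attention
    leftovers = [f for f in os.listdir("./processed") if f.endswith(".csv")]
    print(f"{len(leftovers)} file(s) still need attention: {leftovers}")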
Finishing Up
Now we just need to tie it all together!
def process_all_files():
    input_folder = "./processed"
    output_folder = "./migrated"

    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    for filename in os.listdir(input_folder):
        if filename.endswith(".csv"):
            csv_path = os.path.join(input_folder, filename)
            merge_data(csv_path, output_folder)

process_all_files()
All we have to do now is drop to the CLI, run python3 MergeData.py, and we're off to the races! Check our processed folder afterwards, deal with any stragglers, and Bob's your uncle.
I did some quick napkin math, and by my reckoning it would have taken them at least 160 man-hours to pull this off manually. If they just had assistant assistants doing it for $20 an hour, that would have been about $3,200.
Of course, they don't have assistant assistants there, so they would have had to use actual assistants as well as the actual account managers. So not only would it be far more in wages, it would have pulled those resources away from their real (and time-sensitive) jobs for the better part of two weeks.
Since I was able to throw this together for less than a thousand bucks in a couple of days, it was a massive win for everyone.
And nobody had to call the fire department.
Source: KlingAI