No description
Find a file
2025-09-06 11:25:55 +12:00
.gitignore Initial 2025-09-06 02:08:40 +12:00
html_optimizer.py optimize existing post 2025-09-06 11:25:55 +12:00
main.py optimization 2025-09-06 11:18:14 +12:00
README.md optimization 2025-09-06 11:18:14 +12:00

Blog Generator with Image Optimization

Overview

The enhanced main_optimized.py script extends the original blog generator with powerful image optimization and cloud storage capabilities.

Key Features

🖼️ Image Optimization

  • Automatic resizing: Images larger than 1200x1200px are resized while maintaining aspect ratio
  • Format conversion: PNG/JPG images are converted to WebP for better compression
  • Quality optimization: Images are compressed with 85% quality for optimal size/quality balance
  • EXIF handling: Automatic rotation based on EXIF orientation data
  • Massive size reduction: Typically 85-95% file size reduction

☁️ Cloud Storage Integration

  • Google Cloud Storage: Automatically uploads optimized images to GCP bucket
  • Public URLs: Generates public URLs for all uploaded images
  • Organized structure: Images are organized by folder (e.g., 06/ for June)
  • CDN delivery: Images served from Google's global CDN for fast loading

🌐 HTML Generation

  • Public URLs: HTML uses cloud storage URLs instead of local file paths
  • Fallback support: If upload fails, falls back to local file references
  • Same format: Maintains the same HTML structure as the original script

Setup

1. Install Dependencies

pip install python-docx google-cloud-translate google-cloud-storage Pillow

2. Configure Environment

# Required: Path to service account JSON
export GOOGLE_APPLICATION_CREDENTIALS="./service_account.json"

# Optional: Override default bucket name
export GCP_BUCKET_NAME="your-bucket-name"  # defaults to "filipkin-blog-images"

3. Ensure GCP Bucket Exists

The bucket filipkin-blog-images should already exist and be publicly accessible.

Usage

python main_optimized.py input.docx output.html

Example

python main_optimized.py files/06/index.docx files/06/index_optimized.html

What Happens

  1. Extract images from the DOCX file
  2. Optimize each image:
    • Resize if larger than 1200x1200px
    • Convert to WebP format
    • Compress with 85% quality
    • Apply EXIF orientation fixes
  3. Upload to GCP bucket:
    • Upload to gs://filipkin-blog-images/FOLDER/imagename.webp
    • Generate public URL: https://storage.googleapis.com/filipkin-blog-images/FOLDER/imagename.webp
  4. Generate HTML with public URLs
  5. Save backup of optimized images locally

Performance Improvements

Before (Original)

  • File sizes: 1-3MB per image
  • Format: PNG/JPG
  • Storage: Local files only
  • Loading: Slow, especially on mobile
  • Total size: ~35MB for a typical blog post

After (Optimized)

  • File sizes: 50-400KB per image (85-95% reduction)
  • Format: WebP (better compression)
  • Storage: Google Cloud Storage with CDN
  • Loading: Fast global delivery
  • Total size: ~3.5MB for the same blog post

Error Handling

  • Upload failures: Falls back to local file references
  • Optimization failures: Uses original image if optimization fails
  • Missing credentials: Clear error messages with setup instructions
  • Network issues: Continues processing other images if one fails

File Organization

filipkin-blog-images/
├── 06/
│   ├── image1.webp
│   ├── image2.webp
│   └── ...
├── 07/
│   ├── image1.webp
│   └── ...
└── test/
    └── test_images...

Benefits

  1. Faster loading: 90%+ smaller file sizes
  2. Better user experience: Especially on mobile/slow connections
  3. Global CDN: Fast delivery worldwide via Google's infrastructure
  4. Future-proof: Easy to update images without re-deploying
  5. Cost effective: Reduces bandwidth costs
  6. SEO benefits: Faster page load times improve search rankings

Backward Compatibility

The optimized script maintains full compatibility with the original:

  • Same command-line interface
  • Same HTML structure
  • Same translation features
  • Falls back gracefully if cloud features aren't available