01

Project Overview

Understanding the autonomous blog engine

🤖

AI-Powered Content

Google Gemini 2.5 Flash generates rich, philosophical analysis posts with unique insights for every movie and series.

🎬

TMDB Integration

Automatically fetches movie/series data, cast info, and high-quality backdrop images from The Movie Database.

GitHub Actions

Scheduled workflows run twice daily, handling everything from content generation to deployment.

🖼️

Image Processing

All images are automatically processed to optimized WebP format with intelligent compression.

📧

Backup System

Automated backups are emailed every 3 days as gzip-compressed archives, which are split into parts when they exceed 24MB.

🚀

Jekyll Deployment

Automatic build and deploy to GitHub Pages whenever new content is pushed to the repository.

Technology Stack

💎 Jekyll Static Site
🐙 GitHub Pages Hosting
⚙️ GitHub Actions Automation
🧠 Gemini 2.5 AI Content
🎞️ TMDB API Movie Data
📸 Unsplash Recap Images
✈️ Telegram Notifications
🐍 Python 3.11 Core Language

📅 Weekly Output Schedule

Day | Posts | Type
Monday - Saturday | 4 posts/day | 2 movies + 2 series (or 4 movies)
Sunday | 1 post | Weekly philosophical recap
Total | 25 posts/week | Fully automated
02

System Architecture

High-level flow of the automation system

GitHub Actions Triggers

0 3 * * * ~08:30 AM IST daily
0 12 * * 1-6 ~05:30 PM IST Mon-Sat
0 6 */3 * * ~11:30 AM IST (backup)
push to main Jekyll build & deploy
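
GitHub Actions evaluates cron schedules in UTC, so the IST times listed above follow from adding the +05:30 offset. A minimal sketch of that conversion, assuming only the fixed minute and hour fields matter (the helper name `cron_utc_to_ist` is hypothetical and not part of the project):

```python
from datetime import datetime, timedelta, timezone

IST = timezone(timedelta(hours=5, minutes=30))  # India Standard Time, no DST


def cron_utc_to_ist(cron: str) -> str:
    """Return the IST wall-clock time for a cron line with fixed minute/hour fields."""
    minute, hour = cron.split()[:2]
    # Any date works here because IST never shifts for daylight saving.
    utc_time = datetime(2026, 1, 1, int(hour), int(minute), tzinfo=timezone.utc)
    return utc_time.astimezone(IST).strftime("%I:%M %p")


for line in ("0 3 * * *", "0 12 * * 1-6", "0 6 */3 * *"):
    print(line, "->", cron_utc_to_ist(line))
```

Scheduled runs can start later than the nominal time, which is why the times are shown as approximate.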
🐍

main.py Execution

1 VALIDATE Environment
2 SELECT Items from CSV
3 FETCH TMDB Data
4 DOWNLOAD Process Images
5 GENERATE Gemini Content
6 SAVE Post & Update
7 REMOVE From CSV
8 PRE-CHECK Next Items
📤

Git Commit & Push

  • Commits new posts, images, updated data files
  • Pushes to main branch
  • Triggers pages-deploy.yml workflow
🚀

Jekyll Build & Deploy

  • Builds static site with Jekyll
  • Deploys to GitHub Pages
  • ✨ Site is live!
03

Data Files Deep Dive

Core data structures powering the automation

04

Core Python Scripts

The heart of the automation system

🐍

main.py

2,478 lines
Core Engine
💾

backup.py

327 lines
Backup System
Backup Directories:
_posts/ Blog posts
assets/img/posts/ Post images
data/ CSVs, metadata
Original/ CSV backups
Key Functions:
  • collect_stats() - Scans directories, counts files and bytes
  • create_archive() - Creates .tar.gz with max compression
  • split_archive() - Splits archives larger than 24MB to stay under Gmail's 25MB attachment limit
  • send_backup_email() - Sends an HTML summary email with the archive attached
05

GitHub Actions Workflows

Automated CI/CD pipeline configuration

📝

daily_post.yml

210 lines
Content Automation
Schedule:
0 3 * * * 08:30 AM IST daily (Sun: recap)
0 12 * * 1-6 05:30 PM IST Mon-Sat only
Steps:
📥 Checkout
🐍 Setup Python 3.11
📦 Install dependencies
Random delay (0-20 min)
📁 Create directories
📋 Verify CSVs
🤖 Run main.py
📝 Git config
📤 Commit changes
🚀 Push to main
💾

backup.yml

85 lines
Data Backup
Schedule:
0 6 */3 * * ~11:30 AM IST every 3 days
Steps:
📥 Checkout
🐍 Setup Python 3.11
📁 Verify directories
💾 Run backup.py
📊 Write summary
🚀

pages-deploy.yml

65 lines
Jekyll Deployment
Trigger:
push to main When _posts/ changes
Jobs:
Build Job
  • Setup Ruby 3.3
  • Configure Pages
  • Jekyll build
  • Upload artifact
Deploy Job
  • Download artifact
  • Deploy to Pages
  • Output URL
06

Daily Automation Flow

Complete sequence of operations for each run

Trigger Time: 08:30 AM IST or 05:30 PM IST
⚙️ GitHub Actions
🐍 main.py
🌐 External APIs
GitHub Actions: 🚀 Trigger workflow
GitHub Actions: Random delay (0-20 min)
main.py: Validate environment → APIs
main.py: 📂 Load CSV files & processing queue
🔄 FOR EACH ITEM (2 items: movie + series)
  1 Check Telegram uploads
  2 Fetch TMDB data
  3 Download & process images
  4 Generate content (Gemini)
  5 Save .md post
  6 Update history & metadata
  7 Remove from CSV
main.py: 🔍 Pre-check next items & save queue → APIs
GitHub Actions: 📤 Git add → commit → push
GitHub Actions: 🚀 Triggers Jekyll build & deploy → pages-deploy.yml
07

Sunday Weekly Recap Flow

The philosophical synthesis process

1

🔍 Detect Sunday

is_sunday() returns True based on TRIGGER_SCHEDULE or datetime.weekday()
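
One way the detection described in this step could work is sketched below. This is an assumption about the logic, not the actual body of `is_sunday()`: it treats the evening cron as never being a recap run and otherwise falls back to the IST weekday.

```python
import os
from datetime import datetime, timedelta, timezone
from typing import Optional

IST = timezone(timedelta(hours=5, minutes=30))
EVENING_CRON = "0 12 * * 1-6"  # Mon-Sat only, so never a recap run


def is_sunday(now: Optional[datetime] = None) -> bool:
    """Sketch: rule out the evening cron, then check the weekday in IST."""
    schedule = os.environ.get("TRIGGER_SCHEDULE", "").strip()
    if schedule == EVENING_CRON:
        return False
    now = now or datetime.now(IST)
    # datetime.weekday(): Monday is 0, Sunday is 6.
    return now.weekday() == 6
```

Passing `now` explicitly makes the function easy to exercise locally without waiting for a Sunday.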

2

📋 Get Week's Posts

Parse history.log, filter Monday 00:00 → Sunday 23:59, exclude recap entries. Minimum 3 posts required.
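
The week filter can be sketched as follows. The layout of history.log is not documented here, so the `<ISO timestamp>|<entry id>` line format is purely an assumption for illustration, as are the helper names.

```python
from datetime import datetime, timedelta

MIN_POSTS = 3  # minimum number of posts required for a recap


def week_bounds(today: datetime) -> tuple[datetime, datetime]:
    """Monday 00:00:00 through Sunday 23:59:59 of the week containing `today`."""
    monday = (today - timedelta(days=today.weekday())).replace(
        hour=0, minute=0, second=0, microsecond=0
    )
    return monday, monday + timedelta(days=7) - timedelta(seconds=1)


def posts_for_week(log_lines: list[str], today: datetime) -> list[str]:
    """Assumed line format (hypothetical): '<ISO timestamp>|<entry id>'."""
    start, end = week_bounds(today)
    entries = []
    for line in log_lines:
        stamp, _, entry = line.strip().partition("|")
        if not entry or entry.startswith("recap_"):
            continue  # skip malformed lines and earlier recap entries
        if start <= datetime.fromisoformat(stamp) <= end:
            entries.append(entry)
    return entries


def enough_posts(entries: list[str]) -> bool:
    return len(entries) >= MIN_POSTS
```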

3

🔎 Generate Search Query

Gemini analyzes titles → outputs 2-4 word philosophical query
Example: "existential contemplation"

4

📸 Fetch Unsplash Image

Search /search/photos?orientation=landscape, filter not in used_unsplash.json, select random from top 10
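
A rough sketch of the request URL and the selection rule follows. The function names are hypothetical, `content_filter=high` is taken from the Unsplash parameters listed under API Integrations, and the selection assumes that used photos are filtered out before the top 10 are taken.

```python
import random
from typing import Optional
from urllib.parse import urlencode

UNSPLASH_SEARCH = "https://api.unsplash.com/search/photos"


def build_search_url(query: str) -> str:
    """Build the search URL for a landscape-only, high-safety query."""
    params = {"query": query, "orientation": "landscape", "content_filter": "high"}
    return f"{UNSPLASH_SEARCH}?{urlencode(params)}"


def pick_unused_photo(results: list[dict], used_ids: set[str],
                      top_n: int = 10) -> Optional[dict]:
    """Pick a random photo from the first `top_n` results not used before."""
    candidates = [photo for photo in results if photo["id"] not in used_ids][:top_n]
    return random.choice(candidates) if candidates else None
```

The caller would then add the chosen photo's `id` to used_unsplash.json so that later recaps skip it.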

5

🖼️ Process Hero Image

Convert to RGB → Resize to 1920px → Compress to ≤500KB WebP → Save as week{N}_{year}_recap_hero.webp
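
The "compress to ≤500KB" part of this step can be sketched as a loop over decreasing quality settings. The encoder is passed in as a callable so the sketch stays library-agnostic; with Pillow it might wrap `image.save(buffer, "WEBP", quality=q)`. The helper names and the quality range are assumptions, not values taken from main.py.

```python
from typing import Callable, Iterable, Optional

MAX_BYTES = 500 * 1024  # 500KB budget for the hero image


def recap_hero_name(week: int, year: int) -> str:
    """File name pattern from the step above."""
    return f"week{week}_{year}_recap_hero.webp"


def compress_to_budget(encode: Callable[[int], bytes],
                       max_bytes: int = MAX_BYTES,
                       qualities: Iterable[int] = range(90, 20, -10)) -> Optional[bytes]:
    """Try decreasing quality settings until the encoded image fits the budget."""
    for quality in qualities:
        data = encode(quality)
        if len(data) <= max_bytes:
            return data
    return None  # nothing fit; the caller decides how to handle that
```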

6

🧠 Generate Recap Content

Gemini creates 1500-2000 word philosophical synthesis weaving ALL films into one narrative with 6-8 blockquotes

7

💾 Save Post

Filename: YYYY-MM-DD-weekly-recap-week-N.md
Sanitize title, write to _posts/

8

📜 Update History

Entry: recap_w{N}_{year}
Prevents duplicate recap generation
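
The post file name and the history entry from the last two steps can be derived from the run date. The sketch below assumes that N is the ISO week number, which the source does not state, and uses hypothetical helper names.

```python
from datetime import date


def recap_names(day: date) -> tuple[str, str]:
    """Return (post file name, history entry) for the week containing `day`."""
    year, week, _ = day.isocalendar()  # ISO year and week (an assumption)
    filename = f"{day:%Y-%m-%d}-weekly-recap-week-{week}.md"
    history_entry = f"recap_w{week}_{year}"
    return filename, history_entry


def already_generated(history_entries: set[str], day: date) -> bool:
    """True if a recap for this week is already recorded in the history."""
    return recap_names(day)[1] in history_entries
```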

08

Backup System

Automated data protection every 3 days

💾

backup.py Execution

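Every 3 days at 06:00 UTC (~11:30 AM IST). The */3 day-of-month field fires on days 1, 4, 7, ... and restarts each month, so the gap can be shorter at a month boundary.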
1 Validate SMTP credentials
2 Scan directories: _posts/, assets/img/posts/, data/, Original/
3 Create archive: whatsup_backup_YYYY-MM-DD_HHMM.tar.gz (gzip level 9)
4 Check size: < 24MB → Single file; > 24MB → Split into .part01, .part02...
5 Send email(s) with HTML summary + archive attachment(s)
6 Cleanup temporary files
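
Steps 3 and 4 above can be sketched with the standard library alone. The function names match those listed for backup.py, but the signatures and bodies here are assumptions rather than the real implementation.

```python
import tarfile
from pathlib import Path

MAX_PART_BYTES = 24 * 1024 * 1024  # 24MB threshold from step 4


def create_archive(archive_path: Path, directories: list[Path]) -> Path:
    """Write a .tar.gz of the given directories using gzip level 9."""
    with tarfile.open(archive_path, "w:gz", compresslevel=9) as tar:
        for directory in directories:
            if directory.exists():  # skip directories that are not present
                tar.add(directory)
    return archive_path


def split_archive(archive_path: Path, max_bytes: int = MAX_PART_BYTES) -> list[Path]:
    """Return [archive_path] if it is small enough, else numbered .partNN files."""
    if archive_path.stat().st_size <= max_bytes:
        return [archive_path]
    parts = []
    with archive_path.open("rb") as src:
        index = 1
        while chunk := src.read(max_bytes):
            part = archive_path.with_name(f"{archive_path.name}.part{index:02d}")
            part.write_bytes(chunk)
            parts.append(part)
            index += 1
    return parts
```

The parts can be restored by concatenating them in order, for example with `cat archive.tar.gz.part* > archive.tar.gz`.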
09

API Integrations

External services powering the automation

🎬

TMDB API

Base URL: https://api.themoviedb.org/3
Endpoints:
/find/{imdb_id} Convert IMDb ID to TMDB ID
/movie/{id} Get movie details
/tv/{id} Get TV show details
Image CDN:
https://image.tmdb.org/t/p/original/{file_path}
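
The endpoints above translate into URLs roughly as follows. This sketch assumes v3 authentication through the `api_key` query parameter and uses the `external_source=imdb_id` parameter that `/find` needs for IMDb lookups; the helper names are hypothetical.

```python
from urllib.parse import urlencode

TMDB_BASE = "https://api.themoviedb.org/3"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/original"


def find_url(imdb_id: str, api_key: str) -> str:
    """URL that resolves an IMDb ID (e.g. tt3748528) to TMDB records."""
    # external_source tells /find which kind of ID is being looked up.
    query = urlencode({"api_key": api_key, "external_source": "imdb_id"})
    return f"{TMDB_BASE}/find/{imdb_id}?{query}"


def details_url(media_type: str, tmdb_id: int, api_key: str) -> str:
    """URL for /movie/{id} or /tv/{id} details."""
    if media_type not in ("movie", "tv"):
        raise ValueError(f"unsupported media type: {media_type}")
    query = urlencode({"api_key": api_key})
    return f"{TMDB_BASE}/{media_type}/{tmdb_id}?{query}"


def image_url(file_path: str) -> str:
    """Full-size image URL; TMDB file paths already start with '/'."""
    return f"{TMDB_IMAGE_BASE}/{file_path.lstrip('/')}"
```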
🧠

Google Gemini API

SDK: google-genai Python package
Model: gemini-2.5-flash
Features Used:
  • Blog post generation (800-1200 words)
  • Weekly recap synthesis (1500-2000 words)
  • Unsplash search query generation
  • Image prompt generation
📸

Unsplash API

Base URL: https://api.unsplash.com
Endpoint:
/search/photos Search for images
Parameters:
  • query - philosophical query
  • orientation=landscape
  • content_filter=high
✈️

Telegram Bot API

Base URL: https://api.telegram.org/bot{token}
Endpoints:
/sendMessage Send alerts
/getUpdates Poll for photos
/getFile Download path
/deleteMessage Cleanup
📧

Gmail SMTP

Server: smtp.gmail.com:587
Auth: App Password (not regular password)
Usage:
  • Backup email delivery
  • Alert notifications
  • Manual fallback requests
10

Error Handling & Fallbacks

Resilient systems for edge cases

🔄

API Error Handling

Gemini API:
  • 3 retry attempts max
  • Exponential backoff on 503/429 errors
  • Wait: 10s → 20s → 40s between retries
TMDB API:
  • Graceful degradation on failure
  • Returns None so the caller can fall back to web search
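
The Gemini retry policy above (up to 3 retries, waiting 10s, 20s, then 40s) can be sketched as a small wrapper. `RetryableError` is a hypothetical stand-in for a 503 or 429 response, not a class from the google-genai SDK, and the `sleep` parameter exists only so the waits can be observed in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Stand-in for a 503/429 response (hypothetical, not an SDK class)."""


def call_with_backoff(fn: Callable[[], T], retries: int = 3,
                      base_delay: float = 10.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call fn, retrying up to `retries` times with doubling waits."""
    for attempt in range(retries + 1):
        try:
            return fn()
        except RetryableError:
            if attempt == retries:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** attempt)  # 10s, 20s, 40s
    raise AssertionError("unreachable")
```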
🖼️

Missing Images Fallback

Telegram Manual Upload System:
  1. Pre-check identifies missing images
  2. Sends Telegram + Email alerts
  3. Upload with captions: HERO_{imdb_id}, IMG1_{imdb_id}...
  4. Next run checks for uploads before TMDB
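
Matching uploaded photos to items depends on parsing the caption convention from step 3. A minimal sketch, assuming captions are exactly `HERO_{imdb_id}` or `IMG{n}_{imdb_id}` and that IMDb IDs look like `tt` followed by digits (the function name is hypothetical):

```python
import re
from typing import Optional

CAPTION_RE = re.compile(r"^(HERO|IMG(\d+))_(tt\d+)$", re.IGNORECASE)


def parse_caption(caption: str) -> Optional[tuple[str, str]]:
    """Return (imdb_id, slot), where slot is 'hero' or the image number."""
    match = CAPTION_RE.match(caption.strip())
    if not match:
        return None  # not a manual-upload caption
    slot = "hero" if match.group(2) is None else match.group(2)
    return match.group(3).lower(), slot
```

The returned pair lines up with the image file names used elsewhere in the project, such as tt3748528_hero.webp and tt3748528_1.webp.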
📺

Series Depletion

When series.csv is empty:

if available_series.empty:
  # Switch to double movie mode
  # Select 2 movies instead
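
The fallback above can be written out as a small selection function. The original snippet checks a pandas-style `.empty` attribute; this sketch uses plain lists instead, and the function name and return shape are assumptions.

```python
def select_items(available_movies: list[str],
                 available_series: list[str]) -> list[tuple[str, str]]:
    """Pick one movie and one series, or two movies when no series remain."""
    if available_series:
        picks = [("movie", movie) for movie in available_movies[:1]]
        picks.append(("series", available_series[0]))
        return picks
    # Double movie mode: series.csv is exhausted, so take two movies.
    return [("movie", movie) for movie in available_movies[:2]]
```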
🔁

Image Deduplication Reset

When all images for an item have been used:

if not available_backdrops:
  print("All images used!")
  available_backdrops = backdrops # Reset
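
A fuller sketch of the reset is shown below. It assumes the used images are tracked as a set of file paths (as used_images.json might store them) and that the first available backdrop is chosen; both are assumptions, as is the function name.

```python
def pick_backdrop(backdrops: list[str], used: set[str]) -> str:
    """Return an unused backdrop path, resetting the pool once all are used."""
    if not backdrops:
        raise ValueError("no backdrops available for this item")
    available_backdrops = [b for b in backdrops if b not in used]
    if not available_backdrops:
        print("All images used!")
        used.difference_update(backdrops)  # forget only this item's images
        available_backdrops = list(backdrops)
    choice = available_backdrops[0]
    used.add(choice)
    return choice
```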
11

Environment Variables

Configuration secrets and settings

Required

Automation fails without these

Variable | Purpose | Source
GEMINI_API_KEY | Google Gemini AI | Google AI Studio
TMDB_API_KEY | Movie database | TMDB Account
Optional

Features disabled without these

Variable | Purpose | Source
UNSPLASH_ACCESS_KEY | Sunday recap images | Unsplash Developer
TELEGRAM_BOT_TOKEN | Manual upload fallback | BotFather
TELEGRAM_CHAT_ID | Target chat for alerts | Bot chat
SMTP_EMAIL | Email sender address | Gmail
SMTP_PASSWORD | Email app password | Gmail App Passwords
NOTIFICATION_EMAIL | Recipient for alerts | Any email
GitHub Actions

CI/CD specific variables

Variable | Purpose
GH_PAT | Personal Access Token for push
TRIGGER_SCHEDULE | Cron expression that triggered run
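
The required/optional split above suggests a validation step along these lines. The grouping of optional variables into features is an assumption made for illustration, and `validate_environment` is a hypothetical name rather than the function in main.py.

```python
import os
from typing import Mapping

REQUIRED = ("GEMINI_API_KEY", "TMDB_API_KEY")
OPTIONAL_FEATURES = {
    "Sunday recap images": ("UNSPLASH_ACCESS_KEY",),
    "Telegram manual upload fallback": ("TELEGRAM_BOT_TOKEN", "TELEGRAM_CHAT_ID"),
    "Email alerts and backups": ("SMTP_EMAIL", "SMTP_PASSWORD", "NOTIFICATION_EMAIL"),
}


def validate_environment(env: Mapping[str, str] = os.environ) -> list[str]:
    """Raise if a required key is missing; return the disabled optional features."""
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required variables: {', '.join(missing)}")
    return [feature for feature, names in OPTIONAL_FEATURES.items()
            if not all(env.get(name) for name in names)]
```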
12

File Structure Reference

Complete project organization

📁 WHATSUP/
📁 .github/workflows/
📄 daily_post.yml Content automation
📄 backup.yml Data backup
📄 pages-deploy.yml Jekyll deployment
📁 scripts/
🐍 main.py Core automation (2478 lines)
🐍 backup.py Backup system (327 lines)
📄 requirements.txt Python dependencies
📁 data/
📊 movies.csv Source movies
📊 series.csv Source series
📜 history.log Processed items log
📋 metadata_db.json Post metadata
📋 processing_queue.json Pre-validated items
📋 used_images.json TMDB deduplication
📋 used_unsplash.json Unsplash deduplication
📁 _posts/
📝 2026-02-02-rogue-one.md
📝 2026-02-02-game-of-thrones.md
... More markdown posts
📁 assets/img/posts/
🖼️ tt3748528_hero.webp
🖼️ tt3748528_1.webp
... More WebP images
⚙️ _config.yml Jekyll configuration
💎 Gemfile Ruby dependencies
🏠 index.html Site homepage
📖 README.md Project readme
13

Troubleshooting Guide

Common issues and solutions

🛠️ Manual Intervention Commands

Run locally (testing):
cd scripts && python main.py
Skip random delay:
python main.py --skip-delay
Trigger workflow manually:

Actions tab → Select workflow → "Run workflow"
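
For reference, the `--skip-delay` flag shown above could be wired to the 0-20 minute random delay roughly like this. It is a sketch under the assumption that the flag is a plain boolean switch; the helper names are hypothetical.

```python
import argparse
import random
import time

MAX_DELAY_SECONDS = 20 * 60  # 0-20 minute window


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run the blog automation")
    parser.add_argument("--skip-delay", action="store_true",
                        help="skip the random startup delay (useful for local testing)")
    return parser.parse_args(argv)


def startup_delay(skip: bool, sleep=time.sleep) -> float:
    """Sleep for a random 0-20 minute interval unless skipped; return seconds waited."""
    if skip:
        return 0.0
    seconds = random.uniform(0, MAX_DELAY_SECONDS)
    sleep(seconds)
    return seconds
```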