Paper Image Downloader - Setup Guide

These scripts download your research papers as PDFs, extract the first page of each as a cover image, and help you update the publications carousel on your website.

Scripts

download_paper_images.py - Downloads PDFs and extracts first pages as JPG images
update_carousel.py - Helps update the research.html carousel with the new images

Prerequisites

Install the required Python packages:

pip install requests pdf2image Pillow

Additional System Requirements

pdf2image relies on the Poppler command-line tools (packaged as poppler-utils on Debian/Ubuntu). Install them for your OS:

Windows:

choco install poppler

Or download from: https://github.com/oschwartz10612/poppler-windows/releases/

macOS:

brew install poppler

Linux (Ubuntu/Debian):

sudo apt-get install poppler-utils

Usage

Step 1: Download Paper Cover Images

Run the downloader script from the workspace root:

python download_paper_images.py

This will:

Download all 18 papers from your publications
Extract the first page from each PDF
Convert to JPG images (150 DPI)
Save to /images folder with pattern: {paper_name}_cover.jpg
Show a summary of success/failures
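
The conversion steps above can be sketched roughly as follows. This is a minimal illustration, not the actual contents of download_paper_images.py: the helper names cover_path and save_first_page are hypothetical, and the images output folder is assumed to sit in the workspace root. It uses pdf2image's convert_from_path with first_page and last_page so that only page 1 is rendered, at 150 DPI.

```python
from pathlib import Path

IMAGES_DIR = Path("images")  # assumed output folder, relative to the workspace root


def cover_path(paper_name: str, images_dir: Path = IMAGES_DIR) -> Path:
    """Build the output path using the {paper_name}_cover.jpg pattern."""
    return images_dir / f"{paper_name}_cover.jpg"


def save_first_page(pdf_path: Path, paper_name: str, dpi: int = 150) -> Path:
    """Render page 1 of a PDF and save it as a JPG cover image."""
    # Imported here so the path helper works even without pdf2image installed.
    from pdf2image import convert_from_path

    pages = convert_from_path(str(pdf_path), dpi=dpi, first_page=1, last_page=1)
    if not pages:
        raise ValueError(f"No pages rendered from {pdf_path}")

    out = cover_path(paper_name)
    out.parent.mkdir(parents=True, exist_ok=True)
    # JPEG has no alpha channel, so normalise to RGB before saving.
    pages[0].convert("RGB").save(out, "JPEG")
    return out
```

Rendering only the first page keeps memory use low for long papers, since the remaining pages are never rasterised.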

Step 2: Review Generated Images

Check the /images folder to review the generated cover images. You should see:

images/
  ├── backbones_cover.jpg
  ├── breakhis_cover.jpg
  ├── chatty_cover.jpg
  ├── cxv_cover.jpg
  ├── edsnet_cover.jpg
  ├── emnlp_cover.jpg
  ├── federated_cover.jpg
  ├── fld_cover.jpg
  ├── fld_plus_cover.jpg
  ├── graphs_cover.jpg
  ├── inpainting_cover.jpg
  ├── rl2_cover.jpg
  ├── samsung_cover.jpg
  ├── vixwacv22_cover.jpg
  ├── wavemix_cover.jpg
  ├── wavemixsr_cover.jpg
  ├── wavemixsrv2_cover.jpg
  └── wavepaint_cover.jpg
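
To check the folder without eyeballing it, a small script along these lines can list which of the 18 expected covers are missing or empty. The function name missing_covers is hypothetical and is not part of the provided scripts. The paper names are taken from the listing above.

```python
from pathlib import Path

EXPECTED_PAPERS = [
    "backbones", "breakhis", "chatty", "cxv", "edsnet", "emnlp",
    "federated", "fld", "fld_plus", "graphs", "inpainting", "rl2",
    "samsung", "vixwacv22", "wavemix", "wavemixsr", "wavemixsrv2", "wavepaint",
]


def missing_covers(images_dir: Path) -> list:
    """Return paper names whose cover image is absent or zero bytes."""
    missing = []
    for name in EXPECTED_PAPERS:
        cover = images_dir / f"{name}_cover.jpg"
        if not cover.is_file() or cover.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling missing_covers(Path("images")) from the workspace root returns an empty list when every cover was generated.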

Step 3: Update Carousel HTML

Run the carousel updater to assist with HTML updates:

python update_carousel.py

Then manually update the carousel section of _pages/research.html:

Replace this pattern:

<div class="w-full h-48 bg-gradient-to-br from-purple-400 to-indigo-600 flex items-center justify-center">
  <span class="text-white text-center px-4"><strong>Paper Name</strong><br/>Subtitle</span>
</div>

With:

<img src="/images/{paper_name}_cover.jpg" alt="Paper Cover" class="w-full h-48 object-cover">
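
If you prefer to script the replacement, the sketch below swaps one gradient placeholder for the image tag using a regular expression. It is an illustration only: whether update_carousel.py works this way is not stated here, the function names are hypothetical, and the pattern assumes the placeholder markup matches the snippet above exactly (a div with the gradient classes wrapping a single span).

```python
import re

# Matches the gradient placeholder <div> shown above, including its inner <span>.
PLACEHOLDER = re.compile(
    r'<div class="w-full h-48 bg-gradient-to-br[^"]*">\s*<span[^>]*>.*?</span>\s*</div>',
    re.DOTALL,
)


def cover_img_tag(paper_name: str) -> str:
    """Build the <img> tag for a paper's cover image."""
    return (
        f'<img src="/images/{paper_name}_cover.jpg" '
        'alt="Paper Cover" class="w-full h-48 object-cover">'
    )


def replace_first_placeholder(html: str, paper_name: str) -> str:
    """Replace the first placeholder block in html; leave html unchanged if none is found."""
    return PLACEHOLDER.sub(lambda _match: cover_img_tag(paper_name), html, count=1)
```

Because only the first match is replaced per call, the paper names must be supplied in the same order as the carousel entries. Check the result in a browser before committing.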

Paper Name Mappings

Image File	Publication
backbones_cover.jpg	Which Backbone to Use
breakhis_cover.jpg	Magnification Invariant Medical Image Analysis
chatty_cover.jpg	Adversarial Transport Terms (ATT)
cxv_cover.jpg	Convolutional Xformers for Vision
edsnet_cover.jpg	EDSNet: Efficient-DSNet for Video Summarization
emnlp_cover.jpg	So You Think You’re Funny?
federated_cover.jpg	FLeNS: Federated Learning
fld_cover.jpg	FLD: Normalizing Flow Based Metric
fld_plus_cover.jpg	FLD+: Data-efficient Evaluation Metric
graphs_cover.jpg	Heterogeneous Graphs for Breast Cancer
inpainting_cover.jpg	Resource-efficient Image Inpainting
rl2_cover.jpg	RL2: Histopathology Metric
samsung_cover.jpg	PawFACS: Pet Facial Action Recognition
vixwacv22_cover.jpg	ViX: Resource-Efficient Hybrid X-Formers
wavemix_cover.jpg	WaveMix: Token Mixer
wavemixsr_cover.jpg	WaveMixSR: Super-resolution
wavemixsrv2_cover.jpg	WaveMixSR-V2: Enhanced Super-resolution
wavepaint_cover.jpg	WavePaint: Self-Supervised Inpainting

Troubleshooting

Downloads fail for certain papers

Some journals may:

Block automated downloads
Have CAPTCHAs
Require subscriptions

Solution: Download the PDFs manually, place them in the .temp_papers folder, and run the script again. It skips files that already exist.
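
The skip-existing behaviour can be pictured with a sketch like the one below. It is not the script's actual code: fetch_pdf is a hypothetical helper, and the file name the script expects inside .temp_papers is not documented here, so check the script before placing files by hand.

```python
from pathlib import Path


def fetch_pdf(url: str, dest: Path, timeout: float = 30.0) -> bool:
    """Download url to dest unless a non-empty file is already there.

    Returns True if a download happened, False if the existing file was kept.
    """
    if dest.is_file() and dest.stat().st_size > 0:
        return False  # manually placed or previously downloaded PDF

    import requests  # imported lazily; only needed when a download is required

    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # surfaces 403/404 responses from blocked downloads
    dest.parent.mkdir(parents=True, exist_ok=True)
    dest.write_bytes(response.content)
    return True
```

With this pattern, a PDF dropped into the folder by hand is treated the same as one fetched on an earlier run.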

PDF to image conversion fails

Ensure poppler is installed correctly:

# Test poppler (if installed via chocolatey/brew)
pdftoppm --version
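
The same check can be made from Python, which is useful when the terminal and the Python environment see different PATH values. The helper name find_poppler is hypothetical; shutil.which is part of the standard library.

```python
import shutil
from typing import Optional


def find_poppler() -> Optional[str]:
    """Return the full path to pdftoppm if it is on PATH, else None."""
    return shutil.which("pdftoppm")


if find_poppler() is None:
    print("pdftoppm not found on PATH; pdf2image will not be able to convert PDFs.")
```

If Poppler is installed but not on PATH (common with the manual Windows download), pdf2image's convert_from_path also accepts a poppler_path argument pointing at the folder that contains the Poppler binaries.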

Missing first page

Some PDFs have a blank first page. Review the generated images and, where needed, export a more suitable page manually.
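
A rough way to flag blank covers automatically is to compare the darkest and lightest pixel of each image. The sketch below is a heuristic only and is not part of the provided scripts: is_blank and first_non_blank are hypothetical names, and the image argument is assumed to be a Pillow image such as those returned by pdf2image.

```python
def is_blank(image, tolerance: int = 5) -> bool:
    """Heuristic: a page is blank if its darkest and lightest pixels are nearly equal.

    image is expected to be a PIL.Image.Image; in grayscale ("L") mode,
    getextrema() returns a (min, max) pair of pixel values.
    """
    darkest, lightest = image.convert("L").getextrema()
    return (lightest - darkest) <= tolerance


def first_non_blank(pages):
    """Return the first page image that is not blank, or None if all are blank."""
    for page in pages:
        if not is_blank(page):
            return page
    return None
```

A page that is empty apart from a page number or a stray mark will not be caught by this test, so a visual review is still worthwhile.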

Script crashes on specific papers

Check the error message. Common issues:

PDF is encrypted or corrupted
Network timeout (try again)
Unsupported PDF format

You can manually download and convert those PDFs separately.
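
If you adapt the script, one way to keep a single bad PDF from stopping the whole run is to isolate each paper in its own try block and collect the failures for the final summary. This is a sketch under that assumption: process_all and the process_one callback are hypothetical names, not functions from the provided scripts.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def process_all(
    paper_names: Iterable[str],
    process_one: Callable[[str], None],
) -> Tuple[List[str], Dict[str, str]]:
    """Run process_one for every paper, recording failures instead of crashing."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for name in paper_names:
        try:
            process_one(name)
        except Exception as exc:  # broad on purpose: any per-paper error is recorded
            failed[name] = f"{type(exc).__name__}: {exc}"
        else:
            succeeded.append(name)
    return succeeded, failed
```

The failed dictionary then tells you exactly which papers to download and convert by hand.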

Manual Alternative

If automated downloading fails, you can:

Visit each paper link manually
Download the PDF
Extract first page using any PDF reader
Save as JPG to /images/{paper_name}_cover.jpg
Update the HTML manually

Clean Up

After successful conversion, the script auto-deletes temporary PDFs. To clean up manually:

Linux/macOS:

rm -rf .temp_papers

Windows (Command Prompt):

rmdir /s /q .temp_papers
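
A cross-platform alternative is to remove the folder from Python, which avoids the shell differences between operating systems. The helper name remove_temp_dir is hypothetical; shutil.rmtree is part of the standard library.

```python
import shutil
from pathlib import Path


def remove_temp_dir(temp_dir: Path = Path(".temp_papers")) -> bool:
    """Delete the temporary PDF folder; return True if something was removed."""
    if not temp_dir.is_dir():
        return False
    shutil.rmtree(temp_dir)
    return True
```

Run it from the workspace root so the relative path resolves to the right folder.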

Need Help?

If you encounter issues:

Check that all required packages are installed: pip list
Verify poppler is in system PATH: pdftoppm --version
Try downloading a single paper manually to test the PDF
Check file permissions in /images folder

Created: March 2026
For: Pranav’s Research Portfolio Website

Dr. Pranav Jeevan P