"""
Folder Comparison Tool - Cross-Platform Side-by-Side Excel Report Generator
=============================================================================
File      : 05-Compare-Two-Folders_V01.py
Version   : 1.0.0
Author    : Yahya Nazer
Copyright : (c) 2025 Chatbizdb.com - Yahya Nazer
License   : Proprietary
Email     : contact@chatbizdb.com
Status    : Production
Date      : 2026-05-13

Objective
---------
Let the user select two folders (A and B) via GUI dialogs, recursively scan
both for files, and write a four-sheet Excel (.xlsx) report that shows:
    - System and folder metadata (setup sheet).
    - All files from folder A with clickable hyperlinks (a_folder sheet).
    - All files from folder B with clickable hyperlinks (b_folder sheet).
    - A side-by-side path comparison with colour-coded highlighting to show
      which files exist in one folder but not the other (compare sheet).

The report is saved to a timestamped .xlsx file in a 'report/' subfolder
beside the script.

How the Comparison Works
--------------------------
Files are compared using their RELATIVE PATHS from each folder root, not
their absolute paths or filenames alone.  This means:
    - 'subfolder/photo.jpg' in Folder A is treated as the same file as
      'subfolder/photo.jpg' in Folder B.
    - 'photo.jpg' at the root of A is different from 'subfolder/photo.jpg'
      in B, even though the filename is the same.

Relative path normalisation:
    os.path.relpath(file_info['full_path'], folder_path)
        Converts an absolute path to a path relative to the folder root.
        Example:
            full_path   = /home/user/Photos/2024/January/img001.jpg
            folder_path = /home/user/Photos
            relpath     = 2024/January/img001.jpg

    .replace(os.sep, '/')
        Normalises the separator to forward slash on all OS so Windows
        paths ('2024\\January\\img001.jpg') match Unix paths correctly
        when compared as strings.

Set-based comparison:
    a_folder_set = set(a_folder_relative_paths)
    b_folder_set = set(b_folder_relative_paths)

    Set membership tests (in / not in) run in O(1) average time.
    For each file in A, checking 'not in b_folder_set' instantly tells
    whether the file is missing from B.  This is far faster than a
    nested loop (O(n²)) for large folders.

Excel Colour Coding in the Compare Sheet
------------------------------------------
    Orange cell (FFE4B5 = Moccasin)
        A file path highlighted in orange exists in ONE folder only.
        Column B cell is orange  -> file is in A but NOT in B.
        Column C cell is orange  -> file is in B but NOT in A.
    No highlight
        File exists in BOTH folders (same relative path).

Excel Report Structure (Four Sheets)
---------------------------------------
    Sheet       Content
    ----------  ---------------------------------------------------------------
    setup       System info (OS, machine), folder A and B paths, report timestamp.
    a_folder    All files from folder A: Index, Filename, Path, Size, Full Path.
                Full Path cells are clickable Excel hyperlinks (file:// URL).
    b_folder    Same structure for folder B.
    compare     Three columns: Index | A_Folder Files | B_Folder Files.
                A and B columns list relative paths in their original order.
                Cells are coloured orange where a path is missing from the other side.

openpyxl Key Concepts Used
-----------------------------
    Concept                         Usage
    ------------------------------  --------------------------------------------
    openpyxl.Workbook()             Create a new empty Excel workbook in memory.
    wb.remove(wb.active)            Delete the default 'Sheet' that openpyxl
                                    creates automatically.
    wb.create_sheet('name')         Add a named worksheet to the workbook.
    ws.append([values])             Write a list as a new row at the end of the sheet.
    ws.cell(row, column, value)     Write to a specific cell by row/column index.
    Font(bold, color, underline)    Apply text formatting to a cell.
    PatternFill(start_color, ...)   Apply a solid background colour to a cell.
    cell.hyperlink = url            Make the cell a clickable link.
    ws.auto_filter.ref = 'A1:Zn'   Add Excel AutoFilter dropdown to all columns.
    get_column_letter(idx)          Convert 1-based column index to letter: 1->'A'.
    worksheet.column_dimensions[l].width  Set column width in character units.
    wb.save(filepath)               Write the workbook to an .xlsx file on disk.

Column Width Auto-Adjustment
------------------------------
auto_adjust_column_widths() iterates every cell in every column to find the
longest value, then sets the column width to that length plus a 3-character
margin, clamped between 12 and 120 characters:
    adjusted = max(min(max_length + 3, 120), 12)

This prevents columns from being too narrow (unreadable) or too wide (off-screen).

macOS / Linux Hidden File Filtering
--------------------------------------
On macOS and Linux, os.walk() yields hidden files and folders (names starting
with '.', including .DS_Store on macOS).  These are filtered out:

    dirs[:] = [d for d in dirs if not d.startswith('.')]
        In-place list mutation ([:] slice assignment) modifies the dirs list
        that os.walk() uses to decide which subdirectories to descend into.
        Assigning to dirs = [...] (without [:]) would create a new local
        variable and have NO effect on os.walk().

    Files beginning with '.' and the macOS files '.DS_Store' and '.localized'
    are individually skipped during the per-file loop.

Output Folder Layout
---------------------
    <script directory>/
    +-- report/
        +-- 05-compare-YYYY-MM-DD--HH-MM.xlsx

Dependencies
-------------
Standard library plus two third-party packages (install once):
    pip install pandas openpyxl

The if __name__ == '__main__' block checks for these before calling main()
and prints the exact install command for Windows, macOS, and Linux if any
are missing.

Function Summary
-----------------
    Function                    Purpose
    --------------------------  --------------------------------------------------
    get_system_info()           Detect OS, version, and path separator; return dict.
    get_report_folder_path()    Resolve and create the 'report/' output folder.
    open_excel_file()           Open the Excel file with the OS default application.
    select_folder()             GUI folder-picker dialog; return selected path.
    auto_adjust_column_widths() Set each column width based on its longest cell value.
    get_all_files_info()        Walk a folder tree; return list of file metadata dicts.
    create_hyperlink_formula()  Build an OS-correct file:// URL for Excel hyperlinks.
    create_excel_report()       Build and save the four-sheet Excel workbook.
    show_completion_message()   Display a GUI info dialog summarising the report.
    main()                      Orchestrate all steps end-to-end.

Step-by-Step Flow (inside main())
------------------------------------
Step 1  : Detect OS and print environment info.
Step 2  : GUI folder-picker: select folder A.
Step 3  : GUI folder-picker: select folder B.
Step 4  : Set up the report folder and derive the timestamped output filename.
Step 5  : Scan folder A recursively; exit if empty or error.
Step 6  : Scan folder B recursively; exit if empty or error.
Step 7  : Build the four-sheet Excel report and save it.
Step 8  : Open the Excel file in the default application.
Step 9  : Show the GUI completion info dialog.

Notes
-----
- The script does not require admin privileges; it only reads source folders
  and writes to the 'report/' subfolder beside itself.
- openpyxl hyperlinks on macOS use 'file://' (two slashes); Windows uses
  'file:///' (three slashes including the drive letter).
- root.quit() is called before root.destroy() in select_folder() to cleanly
  exit the tkinter event loop before destroying the window object.
- The pandas import is included in the dependency check at the bottom even
  though the current version of this script uses openpyxl directly; it may
  be used in a future version for data aggregation.
"""

__version__    = '1.0.0'
__author__     = 'Yahya Nazer'
__copyright__  = 'Copyright (c) 2025 Chatbizdb.com - Yahya Nazer'
__license__    = 'Proprietary'
__maintainer__ = 'Yahya Nazer'
__email__      = 'contact@chatbizdb.com'
__status__     = 'Production'

# ===========================================================================
# Imports
# ===========================================================================
import os                               # File system operations and path handling
import platform                         # OS detection (system, version, machine)
import subprocess                       # OS-native file launcher (macOS / Linux)
import sys                              # stdout reconfiguration
import tkinter as tk                    # GUI toolkit for dialogs
from tkinter import filedialog, messagebox   # Folder picker and info/error dialogs
import pandas as pd                     # Available for future data aggregation use
from datetime import datetime           # Timestamp generation
import openpyxl                         # Excel .xlsx workbook creation and styling
from openpyxl.utils.dataframe import dataframe_to_rows  # DataFrame -> row iterator
from openpyxl.styles import Font, PatternFill           # Cell text and fill styles
from openpyxl.utils import get_column_letter            # Column index -> letter (1->'A')

# Reconfigure stdout to UTF-8 for correct display of paths with special characters
if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')


# ===========================================================================
# Function: get_system_info()
# ===========================================================================
def get_system_info() -> dict:
    """
    Detect the current OS, version, machine type, and path separator.

    Returns a dict used throughout the script to make OS-specific decisions:
        - Whether to filter hidden files (macOS / Linux).
        - Which file:// URL format to use in Excel hyperlinks.
        - Which OS launcher to call (os.startfile / open / xdg-open).
        - Which path separator to use in normalisation.

    platform.system()   : 'Windows', 'Darwin' (macOS), or 'Linux'.
    platform.version()  : Detailed OS version string.
    platform.machine()  : Hardware architecture ('x86_64', 'ARM64', etc.).
    os.sep              : Path separator: '\\' on Windows, '/' on Unix.

    Returns:
        dict with keys: 'os', 'os_version', 'machine', 'separator'.
    """
    system_info = {
        'os'        : platform.system(),    # 'Windows', 'Darwin', or 'Linux'
        'os_version': platform.version(),   # Detailed version string
        'machine'   : platform.machine(),   # CPU architecture
        'separator' : os.sep,               # Path separator for this OS
    }

    print(f"[INFO]  Detected OS      : {system_info['os']} "
          f"{system_info['os_version']}")
    print(f"[INFO]  Path separator   : '{system_info['separator']}'")

    return system_info


# ===========================================================================
# Function: get_report_folder_path()
# ===========================================================================
def get_report_folder_path() -> str:
    """
    Resolve and create the 'report/' output folder beside this script.

    The report folder is always located at:
        <directory containing this script>/report/

    Using os.path.abspath(__file__) as the anchor guarantees the same
    location regardless of which directory the terminal is open in when
    the script is run.

    os.makedirs(path) is called without exist_ok=True here; the if/else
    checks existence first so the print message is accurate.

    Returns:
        str: Full absolute path of the 'report/' folder.
    """
    script_dir = os.path.dirname(os.path.abspath(__file__))
    report_dir = os.path.join(script_dir, 'report')

    if not os.path.exists(report_dir):
        os.makedirs(report_dir)
        print(f'[INFO]  Created report directory: {report_dir}')
    else:
        print(f'[INFO]  Using existing report directory: {report_dir}')

    return report_dir


# ===========================================================================
# Function: open_excel_file()
# ===========================================================================
def open_excel_file(file_path: str, system_info: dict) -> None:
    """
    Open the Excel report file using the OS default application.

    OS-specific launchers:
        Windows : os.startfile(path)     -- Windows ShellExecute API.
        macOS   : subprocess ['open']    -- macOS built-in file launcher.
        Linux   : subprocess ['xdg-open']-- freedesktop.org standard launcher.

    On failure (app not found, file not yet flushed to disk) the except
    block prints a manual navigation hint instead of crashing.

    Args:
        file_path   (str):  Full path to the .xlsx file to open.
        system_info (dict): From get_system_info(); provides the 'os' key.
    """
    print(f'[INFO]  Opening Excel file: {file_path}')

    try:
        if system_info['os'] == 'Windows':
            os.startfile(file_path)
        elif system_info['os'] == 'Darwin':             # macOS
            subprocess.run(['open', file_path])
        else:                                           # Linux / other Unix
            subprocess.run(['xdg-open', file_path])

        print('[INFO]  Excel file opened successfully.')

    except Exception as e:
        print(f'[WARN]  Could not open Excel file automatically: {e}')
        print('[INFO]  Please navigate to the report folder and open the file manually.')


# ===========================================================================
# Function: select_folder()
# ===========================================================================
def select_folder(prompt_text: str) -> str | None:
    """
    Open a GUI folder-picker dialog and return the selected folder path.

    Uses tkinter's askdirectory() which opens the OS-native folder browser.
    Returns None if the user cancels.

    root.lift() / root.attributes('-topmost', True)
        Two complementary methods to ensure the dialog appears on top:
        lift() raises the window above other application windows.
        '-topmost' keeps it above ALL windows until destroyed.

    root.quit() before root.destroy()
        quit() cleanly exits the tkinter event loop (if it was started).
        destroy() then frees all window resources.
        Calling destroy() directly without quit() can cause a warning
        in some tkinter versions.

    os.path.normpath(folder_path)
        Normalises the path separators for the current OS.
        tkinter's askdirectory() always returns forward slashes, even on
        Windows.  normpath converts 'C:/Users/...' to 'C:\\Users\\...'
        on Windows so all subsequent os.path operations work correctly.

    Args:
        prompt_text (str): Title bar text displayed on the dialog window.

    Returns:
        str | None: Normalised absolute path of the selected folder, or None.
    """
    root = tk.Tk()
    root.withdraw()                         # Hide blank root window
    root.lift()                             # Raise above sibling windows
    root.attributes('-topmost', True)       # Stay on top of all windows

    folder_path = filedialog.askdirectory(title=prompt_text)

    root.quit()                             # Exit tkinter event loop cleanly
    root.destroy()                          # Free all tkinter resources

    if folder_path:
        # Normalise separators for the current OS
        folder_path = os.path.normpath(folder_path)
        print(f'[INFO]  Selected folder: {folder_path}')
        return folder_path
    else:
        print('[INFO]  No folder selected.')
        return None


# ===========================================================================
# Function: auto_adjust_column_widths()
# ===========================================================================
def auto_adjust_column_widths(worksheet) -> None:
    """
    Set each column's width to fit its longest cell value.

    Algorithm:
        1. Iterate every column in the worksheet.
        2. For each column, find the maximum character length across all cells.
        3. Set the column width to that length + 3 (padding margin).
        4. Clamp the width between 12 (minimum readable) and 120 (maximum
           before horizontal scrolling becomes uncomfortable).

    Width formula:
        adjusted = max(min(max_length + 3, 120), 12)

        min(..., 120)   : Cap at 120 characters to prevent very long paths
                          from making the column impossibly wide.
        max(..., 12)    : Ensure a minimum of 12 characters so short headers
                          like 'Index' still display cleanly.

    get_column_letter(column[0].column)
        column[0] is the first cell of the column (row 1 header).
        .column returns the 1-based column index (1, 2, 3, ...).
        get_column_letter() converts it to the Excel letter ('A', 'B', 'C', ...).

    worksheet.column_dimensions[letter].width
        Sets the column width in Excel's character unit (roughly one character
        wide in the default font).

    The bare 'except: pass' is intentional -- some cell types (merged cells,
    formula cells) raise exceptions when len() is called on their value.
    Skipping them silently is safe because they never contain the longest value.

    Args:
        worksheet: An openpyxl Worksheet object (any sheet).
    """
    for column in worksheet.columns:
        max_length    = 0
        column_letter = get_column_letter(column[0].column)   # e.g. 'A', 'B', 'C'

        for cell in column:
            try:
                if cell.value:
                    cell_length = len(str(cell.value))
                    if cell_length > max_length:
                        max_length = cell_length
            except:
                pass    # Skip merged/formula cells that raise on len()

        # +3 margin; clamp between minimum 12 and maximum 120 characters
        adjusted_width = min(max_length + 3, 120)
        adjusted_width = max(adjusted_width, 12)

        worksheet.column_dimensions[column_letter].width = adjusted_width


# ===========================================================================
# Function: get_all_files_info()
# ===========================================================================
def get_all_files_info(folder_path: str, system_info: dict) -> list[dict]:
    """
    Recursively walk a folder tree and return a list of file metadata dicts.

    macOS / Linux hidden file filtering:
        dirs[:] = [d for d in dirs if not d.startswith('.')]
            Modifies the dirs list IN-PLACE using slice assignment ([:]).
            os.walk() uses the dirs list to decide which subdirectories to
            descend into.  In-place modification is REQUIRED -- assigning to
            'dirs = [...]' (without [:]) creates a new local variable that
            os.walk() never sees, so hidden subfolders would still be visited.

        Files starting with '.' are individually skipped.
        The macOS-specific files '.DS_Store' and '.localized' are also
        explicitly excluded.

    os.path.normpath(path)
        Normalises path separators for the current OS.
        Applied to both the directory path (root) and the full file path
        so all downstream comparisons use consistent separator characters.

    Per-file try/except:
        Catches OSError, IOError (permission denied, broken symlinks), and
        UnicodeDecodeError (filenames with non-decodable byte sequences).
        Problematic files are skipped with a warning; the scan continues.

    Args:
        folder_path (str):  Absolute path of the folder to scan.
        system_info (dict): From get_system_info(); 'os' key controls filtering.

    Returns:
        list[dict]: One dict per file with keys:
            'filename'  (str): File name only (no directory prefix).
            'path'      (str): Normalised directory containing the file.
            'size'      (int): File size in bytes (0 if unreadable).
            'full_path' (str): Normalised absolute path to the file.
    """
    files_info = []
    print(f'[INFO]  Scanning folder: {folder_path}')

    try:
        for root, dirs, files in os.walk(folder_path):

            # Filter hidden subdirectories on macOS and Linux
            # [:] mutates the list in-place so os.walk() respects the change
            if system_info['os'] in ['Darwin', 'Linux']:
                dirs[:] = [d for d in dirs if not d.startswith('.')]

            for file in files:

                try:
                    # Skip hidden files on macOS / Linux
                    if system_info['os'] in ['Darwin', 'Linux'] and file.startswith('.'):
                        continue

                    # Skip macOS metadata artifacts
                    if system_info['os'] == 'Darwin' and file in ['.DS_Store', '.localized']:
                        continue

                    file_path = os.path.join(root, file)

                    # Get size; default to 0 if the file cannot be read
                    try:
                        file_size = os.path.getsize(file_path)
                    except (OSError, IOError):
                        file_size = 0

                    # Normalise separators for cross-platform consistency
                    normalized_root      = os.path.normpath(root)
                    normalized_full_path = os.path.normpath(file_path)

                    file_info = {
                        'filename' : file,
                        'path'     : normalized_root,
                        'size'     : file_size,
                        'full_path': normalized_full_path,
                    }
                    files_info.append(file_info)

                except (OSError, IOError, UnicodeDecodeError) as e:
                    print(f'[WARN]  Could not access file {file}: {e}')
                    continue

    except Exception as e:
        print(f'[ERROR] Error scanning folder {folder_path}: {e}')
        return []

    print(f'[INFO]  Found {len(files_info)} files in {folder_path}')
    return files_info


# ===========================================================================
# Function: create_hyperlink_formula()
# ===========================================================================
def create_hyperlink_formula(file_path: str, system_info: dict) -> str:
    """
    Build an OS-correct file:// URL for use as an Excel cell hyperlink.

    File URL format by OS:
        Windows :  file:///C:/Users/...
                   Three slashes; backslashes replaced with forward slashes.
                   os.sep is '\\' on Windows so replace(os.sep, '/') converts
                   'C:\\Users\\file.txt' to 'C:/Users/file.txt'.

        macOS   :  file:///Users/...
                   Three slashes (two from 'file://' + leading '/' of abs path).

        Linux   :  file:///home/...
                   Same format as macOS.

    When assigned to cell.hyperlink in openpyxl, clicking the cell in Excel
    opens the file with the OS default application.

    Args:
        file_path   (str):  Normalised absolute path to the file.
        system_info (dict): From get_system_info(); provides the 'os' key.

    Returns:
        str: A file:// URL string suitable for openpyxl cell.hyperlink.
    """
    if system_info['os'] == 'Windows':
        # Replace Windows backslashes with forward slashes for URL compatibility
        return f"file:///{file_path.replace(os.sep, '/')}"
    else:
        # macOS and Linux paths start with '/', giving file:///path
        return f'file://{file_path}'


# ===========================================================================
# Function: create_excel_report()
# ===========================================================================
def create_excel_report(a_folder_files: list,
                         b_folder_files: list,
                         output_filepath: str,
                         a_folder_path: str,
                         b_folder_path: str,
                         system_info: dict) -> bool:
    """
    Build and save a four-sheet Excel workbook comparing two folder scans.

    Sheet 1: 'setup'
        System information (OS, machine) and the two folder paths.
        Used for traceability: opening the report later, the user can see
        exactly which folders were compared and when.

    Sheet 2: 'a_folder'
        All files from folder A.  Columns: Index, File Name, Path,
        File Size (bytes), Full Path.  Full Path cells are Excel hyperlinks
        (blue, underlined) that open the file on click.

    Sheet 3: 'b_folder'
        Same structure for folder B.

    Sheet 4: 'compare'
        Side-by-side relative paths.  Column B = A folder paths,
        Column C = B folder paths.  Cells highlighted orange where
        a path exists in one folder but not the other.

    wb.remove(wb.active):
        openpyxl.Workbook() always creates a default sheet named 'Sheet'.
        Removing it before adding named sheets keeps the workbook clean.

    PatternFill(start_color, end_color, fill_type):
        start_color = end_color = 'FFE4B5' (Moccasin / light orange).
        Both colours are the same for a solid fill.
        fill_type='solid' means the colour covers the entire cell background.
        'FFE4B5' is the standard Excel hex colour for a subtle highlight.

    Relative path comparison using sets:
        a_folder_set = set(a_folder_relative_paths)
        b_folder_set = set(b_folder_relative_paths)

        Python set membership test 'x in set' runs in O(1) average time
        (hash-based lookup), making it efficient for large folder scans.
        A nested loop alternative would be O(n * m) which becomes very slow
        when both folders contain thousands of files.

        relative_path = os.path.relpath(full_path, folder_path)
            Produces a path relative to the folder root.
        .replace(os.sep, '/')
            Normalises separators so 'sub\\file.txt' (Windows) and
            'sub/file.txt' (Unix) compare as equal strings.

    Compare sheet row alignment:
        Column B is written first (all A paths, rows 2..N_A+1).
        Column C is written second (all B paths, rows 2..N_B+1).
        If B has more files than A, the rows beyond A's range need an
        Index cell in column A -- handled by the 'else' branch.
        If B has fewer files, rows up to A's range already have an Index.
        This approach writes both columns independently to avoid zipping
        two lists of different lengths.

    ws.auto_filter.ref = 'A1:Zn'
        Adds Excel AutoFilter dropdown arrows to the header row.
        The ref attribute takes a cell range string.
        get_column_letter(ws.max_column) converts the last column index
        to its letter; ws.max_row gives the last row with data.

    Args:
        a_folder_files  (list[dict]): Output of get_all_files_info() for A.
        b_folder_files  (list[dict]): Output of get_all_files_info() for B.
        output_filepath (str):        Full path of the .xlsx file to write.
        a_folder_path   (str):        Absolute path of folder A.
        b_folder_path   (str):        Absolute path of folder B.
        system_info     (dict):       From get_system_info().

    Returns:
        bool: True if the file was saved successfully; False on error.
    """
    print(f'[INFO]  Creating Excel report: {output_filepath}')

    # Create a new workbook and remove the default empty sheet
    wb = openpyxl.Workbook()
    wb.remove(wb.active)    # Remove the auto-created 'Sheet' tab

    # ------------------------------------------------------------------
    # Sheet 1: setup -- system info and folder paths
    # ------------------------------------------------------------------
    print('[INFO]  Creating setup sheet...')
    ws_setup = wb.create_sheet('setup')

    ws_setup['A1'] = 'System Information'
    ws_setup['A1'].font = Font(bold=True)
    ws_setup['A2'] = (f"Operating System: {system_info['os']} "
                      f"{system_info['os_version']}")
    ws_setup['A3'] = f"Machine: {system_info['machine']}"
    ws_setup['A4'] = ''                                         # Blank separator row
    ws_setup['A5'] = 'Compared Folders'
    ws_setup['A5'].font = Font(bold=True)
    ws_setup['A6'] = f'Folder A: {a_folder_path}'
    ws_setup['A7'] = f'Folder B: {b_folder_path}'
    ws_setup['A8'] = ''
    ws_setup['A9'] = (f"Report Generated: "
                      f"{datetime.now().strftime('%Y-%m-%d %H:%M:%S')}")

    auto_adjust_column_widths(ws_setup)
    print('[INFO]  Setup sheet created.')

    # ------------------------------------------------------------------
    # Sheet 2: a_folder -- all files from folder A with hyperlinks
    # ------------------------------------------------------------------
    print('[INFO]  Creating a_folder sheet...')
    ws_a      = wb.create_sheet('a_folder')
    headers_a = ['Index', 'File Name', 'Path', 'File Size (bytes)', 'Full Path']

    ws_a.append(headers_a)
    for cell in ws_a[1]:               # Row 1 = header row
        cell.font = Font(bold=True)    # Bold all header cells

    for i, file_info in enumerate(a_folder_files, start=2):    # Data starts at row 2
        ws_a.cell(row=i, column=1, value=i - 1)                # 1-based index
        ws_a.cell(row=i, column=2, value=file_info['filename'])
        ws_a.cell(row=i, column=3, value=file_info['path'])
        ws_a.cell(row=i, column=4, value=file_info['size'])

        # Full Path column: set as a clickable hyperlink
        full_path  = file_info['full_path']
        cell       = ws_a.cell(row=i, column=5, value=full_path)
        try:
            # cell.hyperlink makes the cell clickable in Excel
            # Font color='0000FF' (blue) and underline='single' give the
            # classic hyperlink appearance
            cell.hyperlink = create_hyperlink_formula(full_path, system_info)
            cell.font      = Font(color='0000FF', underline='single')
        except:
            # If hyperlink creation fails (very long paths, special chars)
            # fall back to plain black text
            cell.font = Font(color='000000')

    auto_adjust_column_widths(ws_a)
    print(f'[INFO]  a_folder sheet: {len(a_folder_files)} rows written.')

    # ------------------------------------------------------------------
    # Sheet 3: b_folder -- all files from folder B with hyperlinks
    # ------------------------------------------------------------------
    print('[INFO]  Creating b_folder sheet...')
    ws_b      = wb.create_sheet('b_folder')
    headers_b = ['Index', 'File Name', 'Path', 'File Size (bytes)', 'Full Path']

    ws_b.append(headers_b)
    for cell in ws_b[1]:
        cell.font = Font(bold=True)

    for i, file_info in enumerate(b_folder_files, start=2):
        ws_b.cell(row=i, column=1, value=i - 1)
        ws_b.cell(row=i, column=2, value=file_info['filename'])
        ws_b.cell(row=i, column=3, value=file_info['path'])
        ws_b.cell(row=i, column=4, value=file_info['size'])

        full_path  = file_info['full_path']
        cell       = ws_b.cell(row=i, column=5, value=full_path)
        try:
            cell.hyperlink = create_hyperlink_formula(full_path, system_info)
            cell.font      = Font(color='0000FF', underline='single')
        except:
            cell.font = Font(color='000000')

    auto_adjust_column_widths(ws_b)
    print(f'[INFO]  b_folder sheet: {len(b_folder_files)} rows written.')

    # ------------------------------------------------------------------
    # Sheet 4: compare -- side-by-side relative paths with colour coding
    # ------------------------------------------------------------------
    print('[INFO]  Creating compare sheet...')
    ws_compare      = wb.create_sheet('compare')
    headers_compare = ['Index', 'A_Folder Files', 'B_Folder Files']

    ws_compare.append(headers_compare)
    for cell in ws_compare[1]:
        cell.font = Font(bold=True)

    # Build relative path lists: strip the folder root, normalise separators
    a_folder_relative_paths = []
    b_folder_relative_paths = []

    for file_info in a_folder_files:
        rel_path = os.path.relpath(file_info['full_path'], a_folder_path)
        rel_path = rel_path.replace(os.sep, '/')   # Normalise to forward slash
        a_folder_relative_paths.append(rel_path)

    for file_info in b_folder_files:
        rel_path = os.path.relpath(file_info['full_path'], b_folder_path)
        rel_path = rel_path.replace(os.sep, '/')
        b_folder_relative_paths.append(rel_path)

    # Convert to sets for O(1) membership tests
    a_folder_set = set(a_folder_relative_paths)
    b_folder_set = set(b_folder_relative_paths)

    # Total row count covers both lists
    max_rows = max(len(a_folder_relative_paths), len(b_folder_relative_paths))

    # Orange fill: highlights cells where the path is MISSING from the other folder
    # FFE4B5 = Moccasin (light orange) -- subtle but clearly visible
    light_orange_fill = PatternFill(
        start_color='FFE4B5',
        end_color  ='FFE4B5',
        fill_type  ='solid'
    )

    # Write Column B (A folder paths) starting at row 2
    for i, relative_path in enumerate(a_folder_relative_paths, start=2):
        ws_compare.cell(row=i, column=1, value=i - 1)      # Index column
        cell = ws_compare.cell(row=i, column=2, value=relative_path)
        # Highlight if this path is NOT present in folder B
        if relative_path not in b_folder_set:
            cell.fill = light_orange_fill

    # Write Column C (B folder paths) starting at row 2
    for i, relative_path in enumerate(b_folder_relative_paths, start=2):
        # If this row already has an A-folder row, just write to column C
        # If B has more files than A, also write the Index in column A
        if i <= len(a_folder_relative_paths) + 1:
            cell = ws_compare.cell(row=i, column=3, value=relative_path)
        else:
            # Rows beyond A's range: add the Index cell too
            ws_compare.cell(row=i, column=1, value=i - 1)
            cell = ws_compare.cell(row=i, column=3, value=relative_path)

        # Highlight if this path is NOT present in folder A
        if relative_path not in a_folder_set:
            cell.fill = light_orange_fill

    print(f'[INFO]  Compare sheet: {max_rows} rows written with colour coding.')

    auto_adjust_column_widths(ws_compare)

    # Apply AutoFilter to the compare sheet header row
    # auto_filter.ref takes an Excel range string: 'A1:Cn' where n = last row
    ws_compare.auto_filter.ref = (
        f'A1:{get_column_letter(ws_compare.max_column)}{ws_compare.max_row}'
    )
    print('[INFO]  AutoFilter applied to compare sheet.')

    # ------------------------------------------------------------------
    # Save the workbook to disk
    # ------------------------------------------------------------------
    try:
        wb.save(output_filepath)
        print(f'[INFO]  Excel file saved: {output_filepath}')
        return True

    except Exception as e:
        print(f'[ERROR] Could not save Excel file: {e}')
        return False


# ===========================================================================
# Function: show_completion_message()
# ===========================================================================
def show_completion_message(output_filename: str,
                             a_folder_count: int,
                             b_folder_count: int,
                             system_info: dict) -> None:
    """
    Display a GUI information dialog summarising the completed comparison.

    messagebox.showinfo() renders a native OS info box with an OK button.
    The message summarises the output filename, OS, file counts, the four
    Excel sheets, and the colour-coding scheme.

    root.lift() and '-topmost' ensure the dialog appears on top of other
    windows.  root.quit() cleanly exits the event loop before destroy().

    Args:
        output_filename  (str):  Just the filename (not full path) for display.
        a_folder_count   (int):  Number of files found in folder A.
        b_folder_count   (int):  Number of files found in folder B.
        system_info      (dict): From get_system_info(); provides 'os' key.
    """
    message = (
        f'Comparison completed successfully!\n\n'
        f'Report saved as: {output_filename}\n'
        f"Operating System: {system_info['os']}\n"
        f'A_folder files: {a_folder_count:,}\n'
        f'B_folder files: {b_folder_count:,}\n\n'
        f'The Excel file contains:\n'
        f"  'setup' sheet   : System info and folder paths\n"
        f"  'a_folder' sheet: All files from the first folder\n"
        f"  'b_folder' sheet: All files from the second folder\n"
        f"  'compare' sheet : Side-by-side comparison with colour coding\n\n"
        f'Colour coding:\n'
        f'  Orange cell = file exists in that folder but NOT in the other.\n\n'
        f'Click file paths in the Excel report to open files directly!'
    )

    root = tk.Tk()
    root.withdraw()
    root.lift()
    root.attributes('-topmost', True)

    messagebox.showinfo('Comparison Complete', message)

    root.quit()
    root.destroy()


# ===========================================================================
# Function: main()
# ===========================================================================
def main() -> None:
    """
    Orchestrate the full folder comparison workflow end-to-end.

    Steps:
        1.  Detect OS and print environment info.
        2.  GUI folder-picker: select folder A.
        3.  GUI folder-picker: select folder B.
        4.  Set up the 'report/' folder and derive the timestamped filename.
        5.  Scan folder A recursively; exit if empty or error.
        6.  Scan folder B recursively; exit if empty or error.
        7.  Build the four-sheet Excel report and save it.
        8.  Open the Excel file in the OS default application.
        9.  Show the GUI completion info dialog.
    """
    # Step 1 -- OS detection and banner
    system_info = get_system_info()
    print('=' * 70)
    print('CROSS-PLATFORM FOLDER COMPARISON TOOL')
    print(f"Running on: {system_info['os']} {system_info['os_version']}")
    print('=' * 70)

    # Step 2 -- Select folder A
    print('\n[STEP 2] Select the first folder (a_folder)...')
    a_folder = select_folder('Select the first folder (a_folder)')
    if not a_folder:
        print('[INFO]  No folder selected. Exiting.')
        return

    # Step 3 -- Select folder B
    print('\n[STEP 3] Select the second folder (b_folder)...')
    b_folder = select_folder('Select the second folder (b_folder)')
    if not b_folder:
        print('[INFO]  No folder selected. Exiting.')
        return

    # Step 4 -- Set up report folder and output filename
    print('\n[STEP 4] Setting up report folder and output file...')
    report_folder   = get_report_folder_path()
    timestamp       = datetime.now().strftime('%Y-%m-%d--%H-%M')
    output_filename = f'05-compare-{timestamp}.xlsx'
    output_filepath = os.path.join(report_folder, output_filename)
    print(f'[INFO]  Output file: {output_filepath}')

    # Step 5 -- Scan folder A
    print('\n[STEP 5] Scanning a_folder...')
    a_folder_files = get_all_files_info(a_folder, system_info)
    if not a_folder_files:
        print('[ERROR] No files found in a_folder or an error occurred.')
        return

    # Step 6 -- Scan folder B
    print('\n[STEP 6] Scanning b_folder...')
    b_folder_files = get_all_files_info(b_folder, system_info)
    if not b_folder_files:
        print('[ERROR] No files found in b_folder or an error occurred.')
        return

    # Step 7 -- Create the Excel report
    print('\n[STEP 7] Creating Excel report...')
    success = create_excel_report(
        a_folder_files, b_folder_files,
        output_filepath,
        a_folder, b_folder,
        system_info
    )

    if success:
        print('\n' + '=' * 70)
        print('[DONE]  Process completed successfully!')
        print('=' * 70)
        print(f'[INFO]  Report    : {output_filepath}')
        print(f'[INFO]  A files   : {len(a_folder_files):,}')
        print(f'[INFO]  B files   : {len(b_folder_files):,}')
        print(f"[INFO]  Platform  : {system_info['os']}")

        # Step 8 -- Open the Excel file
        print('\n[STEP 8] Opening Excel file...')
        open_excel_file(output_filepath, system_info)

        # Step 9 -- Show GUI completion dialog
        show_completion_message(
            output_filename,
            len(a_folder_files),
            len(b_folder_files),
            system_info
        )

    else:
        print('\n[ERROR] Failed to create Excel report.')
        root = tk.Tk()
        root.withdraw()
        messagebox.showerror(
            'Error',
            'Failed to create Excel report.\nCheck the console for details.'
        )
        root.destroy()


# ===========================================================================
# Entry-point guard with dependency check
#
# The dependency check runs BEFORE main() so a clear install message is shown
# if pandas or openpyxl are missing, rather than a raw ImportError traceback.
#
# Why check here and not at the top of the file?
#   Checking at the top would crash immediately on import with an unhelpful
#   error.  Checking here allows the script to print OS-specific install
#   instructions and wait for the user to read them (input('Press Enter...')).
# ===========================================================================
if __name__ == '__main__':
    try:
        # Re-import inside the guard to confirm all packages are importable
        import tkinter as tk
        from tkinter import filedialog, messagebox
        import pandas as pd
        import openpyxl
        import platform
        print('[INFO]  All required modules are available.')

    except ImportError as e:
        print(f'[ERROR] Missing required module: {e}')
        print('\nPlease install the required packages:')

        # OS-specific install instructions
        if platform.system() == 'Windows':
            print('    pip install pandas openpyxl')
        elif platform.system() == 'Darwin':             # macOS
            print('    pip3 install pandas openpyxl')
            print('    or: python3 -m pip install pandas openpyxl')
        else:                                           # Linux
            print('    pip3 install pandas openpyxl')
            print('    or: sudo apt-get install python3-pandas python3-openpyxl')

        input('\nPress Enter to exit...')
        exit(1)

    main()
