"""
Chapter-01-First-Analysis - First Analysis in VS Code
======================================================
File    : Session1-First-Analysis-V04.py
Version : V04
Date    : 2026-05-13

Objective
---------
Load a CSV file chosen interactively by the user through a file-picker dialog,
compute basic descriptive statistics and the Pearson correlation between
Height and Weight, then save the results to the C-Results folder under the
project root.

This version introduces tkinter -- Python's built-in GUI toolkit.
tkinter lets you create small desktop dialog windows (file pickers, folder
pickers, message boxes) without installing any extra packages.

See R02-Pathlib-Summary.pdf in the references folder for path guidance.

What Is tkinter?
----------------
tkinter ships with the official python.org installers for Windows and macOS.
Some other distributions, including many Linux packages, provide it separately:
    sudo apt install python3-tk          # Debian / Ubuntu
    sudo dnf install python3-tkinter     # Fedora / RHEL

The pattern used in this script:
    root = Tk()             <- Create the root window (required before any dialog).
    root.withdraw()         <- Hide it immediately so only the dialog appears.
    path = filedialog.askopenfilename(...)  <- Open the file picker dialog.
    root.destroy()          <- Clean up the window after the user picks a file.
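
A minimal sketch of this pattern as a reusable function. The names pick_file
and to_path_or_none are illustrative and are not part of this script; the
try/finally is one possible way to make sure destroy() runs even if the
dialog raises an error.

```python
from pathlib import Path

try:
    from tkinter import Tk, filedialog
except ImportError:              # tkinter missing (e.g. a minimal Linux install)
    Tk = None
    filedialog = None


def to_path_or_none(selected):
    # askopenfilename() returns an empty value when the user cancels.
    return Path(selected) if selected else None


def pick_file(title='Select a file'):
    # Hypothetical helper: returns a Path, or None if cancelled or no tkinter.
    if Tk is None:
        return None
    root = Tk()                  # The root window must exist before any dialog
    try:
        root.withdraw()          # Hide it so only the dialog is visible
        selected = filedialog.askopenfilename(title=title)
    finally:
        root.destroy()           # Release GUI resources in every case
    return to_path_or_none(selected)
```

Calling pick_file() on a desktop opens the dialog; on a system without
tkinter it simply returns None.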

tkinter vs Hardcoded Path: Quick Comparison
-------------------------------------------
    Version   How the input file is specified
    -------   -------------------------------------------------------
    V02       Hardcoded relative string  ../../A-Data/Height-Weight.csv
    V03       Hardcoded pathlib Path     project_root / 'A-Data' / 'Height-Weight.csv'
    V04       Interactive dialog         User browses and clicks the file at runtime

Expected Folder Layout
----------------------
    (root)/
    |
    +-- A-Data/
    |   +-- Height-Weight.csv               <- Suggested input (user may choose any CSV)
    |
    +-- Chapter-01-First-Analysis/
    |   +-- S01-First-Analysis/
    |       +-- Session1-First-Analysis-V04.py   <- THIS script
    |
    +-- C-Results/
        +-- Summary-Session-V04.csv         <- Output produced here
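
A short sketch of how pathlib's parents indices map onto this layout. The
absolute prefix '/project' is a hypothetical stand-in for (root), and
PurePosixPath is used only so the example prints the same on every platform.

```python
from pathlib import PurePosixPath

# Hypothetical absolute location; '/project' stands in for (root).
script = PurePosixPath(
    '/project/Chapter-01-First-Analysis/S01-First-Analysis/'
    'Session1-First-Analysis-V04.py'
)

# Each index climbs one more folder level away from the script file.
for index, folder in enumerate(script.parents):
    print(f'parents[{index}] = {folder}')

# The / operator joins path segments.
results_dir = script.parents[2] / 'C-Results'
print(results_dir)
```

Under this layout, parents[0] is S01-First-Analysis, parents[1] is
Chapter-01-First-Analysis, and parents[2] is the root that holds A-Data and
C-Results.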

Required Input File Columns
-----------------------------
The CSV file chosen by the user must contain at least these two columns:
    Height_cm   (numeric, centimetres)
    Weight_kg   (numeric, kilograms)

Column names are case-sensitive. Any extra columns are ignored.

Sample file content  (Height-Weight.csv):
    Name,Height_cm,Weight_kg
    Ali,175,70
    Sara,160,55
    John,180,75
    Mei,165,60
    Luis,170,68
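
The column requirement can be checked before any statistics are computed.
The sketch below uses only the standard-library csv module rather than
pandas, and the helper name missing_columns is an assumption made for
illustration. It shows why a header spelled height_cm fails the
case-sensitive check.

```python
import csv

REQUIRED = {'Height_cm', 'Weight_kg'}


def missing_columns(lines):
    # Return the required column names that are absent from the header row.
    header = next(csv.reader(lines))
    return REQUIRED - set(header)


sample = [
    'Name,Height_cm,Weight_kg',
    'Ali,175,70',
]
wrong_case = [
    'Name,height_cm,Weight_kg',   # lower-case h: does not match Height_cm
    'Ali,175,70',
]

print(missing_columns(sample))
print(missing_columns(wrong_case))
```

An empty set means every required column is present; the extra Name column
does not affect the result.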

Quick Start
-----------
1. Open VS Code and open the S01-First-Analysis folder.
2. Open Session1-First-Analysis-V04.py.
3. Run (Ctrl+F5, or right-click -> Run Python File in Terminal).
4. A file-picker dialog will open -- browse to Height-Weight.csv and click Open.
5. Read the console output for a data preview and statistics.
6. Open C-Results/Summary-Session-V04.csv to inspect the saved results.

Step-by-Step Flow
-----------------
Step 1  : Reconfigure stdout/stderr to UTF-8 for cross-platform printing.
Step 2  : Import required libraries: sys, pathlib.Path, pandas, tkinter.
Step 3  : Define pick_csv() -- opens a file-picker dialog and returns a Path.
Step 4  : Resolve the project root from this script's location using pathlib.
Step 5  : Build the output folder path and create it if it does not exist.
Step 6  : Call pick_csv() to let the user choose the input CSV file.
Step 7  : Validate that the user selected a file (exit gracefully if not).
Step 8  : Load the chosen CSV into a pandas DataFrame.
Step 9  : Validate that the required columns exist in the loaded file.
Step 10 : Print the first five rows as a quick data preview.
Step 11 : Compute mean and standard deviation for Height_cm and Weight_kg.
Step 12 : Print the summary statistics to the console.
Step 13 : Compute the Pearson correlation between the two numeric columns.
Step 14 : Print the correlation value and a plain-language interpretation.
Step 15 : Assemble all results into a small summary DataFrame.
Step 16 : Save the summary DataFrame as Summary-Session-V04.csv.
Step 17 : Print a success message showing the full output path.
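
Steps 11 to 13 can be cross-checked by hand. The sketch below recomputes the
same quantities for the five sample rows with the standard-library
statistics module instead of pandas, using the textbook definition
r = cov(X, Y) / (std(X) * std(Y)). The variable names are illustrative.

```python
import statistics

heights = [175, 160, 180, 165, 170]   # Height_cm values from the sample file
weights = [70, 55, 75, 60, 68]        # Weight_kg values from the sample file

mean_h = statistics.mean(heights)
mean_w = statistics.mean(weights)
std_h = statistics.stdev(heights)     # Sample standard deviation (divides by n-1)
std_w = statistics.stdev(weights)

# Sample covariance, also divided by n-1 so it matches the standard deviations.
n = len(heights)
cov = sum((h - mean_h) * (w - mean_w) for h, w in zip(heights, weights)) / (n - 1)

r = cov / (std_h * std_w)             # Pearson correlation coefficient

print(f'mean height {mean_h:.1f}, std {std_h:.4f}')
print(f'mean weight {mean_w:.1f}, std {std_w:.4f}')
print(f'Pearson r   {r:.4f}')
```

For the sample rows this gives r of about 0.986, which falls in the
strong positive band.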

Output File  (Summary-Session-V04.csv)
----------------------------------------
    Metric,Value
    Mean Height (cm),<value>
    Std Dev Height (cm),<value>
    Mean Weight (kg),<value>
    Std Dev Weight (kg),<value>
    Pearson Correlation,<value>

Notes
-----
- Column names Height_cm and Weight_kg are case-sensitive.
  Update COLUMN_HEIGHT and COLUMN_WEIGHT below if your CSV uses different names.
- If tkinter is unavailable (some headless Linux servers), the pick_csv()
  function prints a clear message and returns None; the script then exits cleanly.
- Path.resolve() returns an absolute path with symlinks and '..' segments resolved.
- parents[n] climbs n+1 levels from the script file: parents[0] is the folder
  that contains the script, parents[1] is that folder's parent, and so on.
- The script exits with raise SystemExit(code). This is equivalent to
  sys.exit(code), which raises SystemExit itself; neither prints a traceback.
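
A small sketch of how raise SystemExit(code) and sys.exit(code) relate. Both
raise the same exception, so the exit code can be observed by catching it.
The helper names below are illustrative only.

```python
import sys


def exit_code_of(func):
    # Run func and report the exit code it requested, or None if it returned.
    try:
        func()
    except SystemExit as exc:
        return exc.code
    return None


def stop_with_raise():
    raise SystemExit(2)


def stop_with_sys_exit():
    sys.exit(2)


print(exit_code_of(stop_with_raise))
print(exit_code_of(stop_with_sys_exit))
```

When SystemExit is not caught, the interpreter exits with that code and
prints no traceback.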
"""

# ===========================================================================
# Step 1 - Reconfigure stdout and stderr to UTF-8
#          Prevents garbled or missing characters on Windows terminals.
#          The hasattr() guard skips the call on streams without reconfigure()
#          (Python older than 3.7, or a stream replaced by another object).
# ===========================================================================
import sys

if hasattr(sys.stdout, 'reconfigure'):
    sys.stdout.reconfigure(encoding='utf-8')
if hasattr(sys.stderr, 'reconfigure'):
    sys.stderr.reconfigure(encoding='utf-8')

# ===========================================================================
# Step 2 - Import required libraries
# ===========================================================================
from pathlib import Path           # Modern object-oriented path handling (built-in)
import pandas as pd                # DataFrame for data loading and analysis

# ---------------------------------------------------------------------------
# Step 2b - Import tkinter components (built-in GUI toolkit)
#           Tk          -> the root window class (must exist before any dialog)
#           filedialog  -> provides askopenfilename() for the file-picker dialog
#
#           The try/except safely handles environments where tkinter is not
#           installed (e.g. minimal Linux servers). Both names are set to None
#           so the pick_csv() function below can detect and report the problem.
# ---------------------------------------------------------------------------
try:
    from tkinter import Tk, filedialog
except ImportError:
    Tk         = None   # tkinter unavailable -- pick_csv() will handle this
    filedialog = None

# ===========================================================================
# Step 3 - Define pick_csv() -- the file-picker dialog function
#
#  PURPOSE:
#      Opens a native OS file-picker dialog filtered to .csv files.
#      Returns the selected file as a pathlib.Path, or None if cancelled.
#
#  PARAMETERS:
#      title (str) : Text shown in the title bar of the dialog window.
#                    Defaults to a helpful prompt describing what to select.
#
#  RETURNS:
#      Path  -> absolute path of the chosen file if the user clicked Open.
#      None  -> if the user cancelled, closed the dialog, or tkinter is missing.
#
#  TKINTER PATTERN USED:
#      root = Tk()             Create the root window (tkinter requires one
#                              before any dialog can be opened).
#      root.withdraw()         Hide it immediately so the user only sees the
#                              file picker, not a blank window behind it.
#      filedialog.askopenfilename(...)
#                              Opens the OS-native file picker and blocks until
#                              the user makes a choice. Returns the selected
#                              path as a plain string, or '' if cancelled.
#      root.destroy()          Release all tkinter resources after the dialog
#                              closes. Always call this to avoid memory leaks.
# ===========================================================================
def pick_csv(title: str = 'Select CSV file (e.g., Height-Weight.csv)') -> Path | None:
    """
    Open a file-picker dialog and return the chosen CSV file as a Path object.

    Steps:
        1. Check whether tkinter is available; print a message and return None if not.
        2. Create a tkinter root window and hide it immediately.
        3. Open the OS file-picker dialog filtered to .csv files.
        4. Destroy the root window to release GUI resources.
        5. Return the selected path as a Path object, or None if cancelled.

    Args:
        title (str): Title bar text shown on the dialog window.

    Returns:
        Path | None: Absolute path of the selected file, or None if no file
                     was chosen or tkinter is unavailable.
    """
    # Step 3a - Guard: tkinter unavailable on this system
    if Tk is None:
        print('[WARN] tkinter is not available on this system.')
        print('[WARN] Please run the script on a desktop with tkinter installed.')
        return None

    # Step 3b - Create the root window (required by tkinter)
    root = Tk()
    try:
        root.withdraw()                     # Hide immediately -- no blank window shown

        # Step 3c - Open the file-picker dialog
        #           filetypes filters the dialog to show only .csv files by default.
        #           The user can switch to 'All files' to pick any file type.
        #           askopenfilename() returns a string path, or '' if cancelled.
        path = filedialog.askopenfilename(
            title     = title,
            filetypes = [
                ('CSV files', '*.csv'),     # Default filter: show only .csv
                ('All files', '*.*'),       # Fallback: show everything
            ]
        )
    finally:
        # Step 3d - Always destroy the root window, even if the dialog raised
        root.destroy()

    # Step 3e - Return a Path object if a file was chosen, otherwise None
    return Path(path) if path else None     # path == '' means user cancelled


# ===========================================================================
# Step 4 - Resolve path objects using pathlib
#
#  Path(__file__).resolve()       -> absolute path of THIS script file
#  .parents[2]                    -> three levels up from the file (project root)
#                                    parents[0] = S01-First-Analysis/
#                                    parents[1] = Chapter-01-First-Analysis/
#                                    parents[2] = (root)/
#
#  The / operator joins Path segments (not arithmetic division):
#      project_root / 'C-Results'  ->  (root)/C-Results/
# ===========================================================================
project_root = Path(__file__).resolve().parents[2]
# e.g. C:\...\(root)\

print(f'[DEBUG] project_root : {project_root}')

# ===========================================================================
# Step 5 - Build the output folder path and create it if it does not exist
#          parents=True  -> create any missing parent folders as well.
#          exist_ok=True -> do nothing silently if the folder already exists.
# ===========================================================================
OUTPUT_FILENAME = 'Summary-Session-V04.csv'     # Output CSV filename

out_folder = project_root / 'C-Results'         # e.g. (root)/C-Results/
out_folder.mkdir(parents=True, exist_ok=True)   # Create folder if missing
out_file   = out_folder / OUTPUT_FILENAME        # Full path for the output CSV

print(f'[DEBUG] out_folder   : {out_folder}')
print(f'[DEBUG] out_file     : {out_file}')

# ===========================================================================
# Step 6 - Call pick_csv() to let the user choose the input CSV file
#          The dialog opens at this point; script execution pauses until
#          the user makes a selection or cancels the dialog.
# ===========================================================================
print('\n[INFO] Opening file picker dialog...')
csv_path = pick_csv()
print(f'[DEBUG] csv_path     : {csv_path}')

# ===========================================================================
# Step 7 - Validate that the user selected a file
#          pick_csv() returns None if the user cancelled or tkinter is missing.
#          raise SystemExit(1) exits immediately with error code 1 and no
#          traceback -- cleaner than letting the script crash on the next line.
# ===========================================================================
if not csv_path:
    print('[ERROR] No file selected. Exiting.')
    raise SystemExit(1)

print(f'[INFO] Selected file : {csv_path}')

# ===========================================================================
# Step 8 - Load the chosen CSV into a pandas DataFrame
#          encoding='utf-8' handles accented or special characters in names.
#          pathlib Path objects are accepted directly by pd.read_csv().
# ===========================================================================
print('\n[INFO] Loading data from CSV...')
df = pd.read_csv(csv_path, encoding='utf-8')    # df = main data table
print(f'[DEBUG] Rows loaded    : {len(df)}')
print(f'[DEBUG] Columns found  : {list(df.columns)}')

# ===========================================================================
# Step 9 - Validate that required columns exist in the loaded file
#
#          required_cols is a Python set.
#          set.issubset(df.columns) returns True if EVERY element of
#          required_cols is present in the DataFrame column list.
#          If any column is missing, we print a clear message and exit with
#          error code 2 (different from code 1 = no file selected).
# ===========================================================================
COLUMN_HEIGHT  = 'Height_cm'                    # Exact column name for height
COLUMN_WEIGHT  = 'Weight_kg'                    # Exact column name for weight
required_cols  = {COLUMN_HEIGHT, COLUMN_WEIGHT} # Set of columns that must exist

if not required_cols.issubset(df.columns):
    missing_cols = sorted(required_cols - set(df.columns))  # Only the absent names
    print(f'[ERROR] Missing required columns: {missing_cols}')
    print(f'[ERROR] Columns found in file  : {list(df.columns)}')
    print('[ERROR] Check column names and try again. Exiting.')
    raise SystemExit(2)

print(f'[INFO] Required columns found: {required_cols}')

# ===========================================================================
# Step 10 - Print a quick data preview (first 5 rows)
#           Confirms the data loaded correctly before computing anything.
# ===========================================================================
print('\n=== Data Preview (first 5 rows) ===')
print(df.head())

# ===========================================================================
# Step 11 - Compute descriptive statistics for Height and Weight
#
#           .mean()  -> arithmetic average of all values in the column
#           .std()   -> sample standard deviation (pandas default: ddof=1)
#                       ddof=1 divides by (n-1) rather than n, which makes the
#                       variance estimate unbiased when working with a sample.
# ===========================================================================
mean_height = df[COLUMN_HEIGHT].mean()     # Average height across all rows
mean_weight = df[COLUMN_WEIGHT].mean()     # Average weight across all rows
std_height  = df[COLUMN_HEIGHT].std()      # Height spread / variability
std_weight  = df[COLUMN_WEIGHT].std()      # Weight spread / variability

# ===========================================================================
# Step 12 - Print the summary statistics
# ===========================================================================
print('\n=== Summary Statistics ===')
print(f'  Average Height : {mean_height:.1f} cm')
print(f'  Std Dev Height : {std_height:.1f} cm')
print(f'  Average Weight : {mean_weight:.1f} kg')
print(f'  Std Dev Weight : {std_weight:.1f} kg')

# ===========================================================================
# Step 13 - Compute the Pearson correlation coefficient
#
#           .corr() measures the linear relationship between two numeric Series.
#           Formula:  r = cov(X, Y) / (std(X) * std(Y))
#
#           r always lies between -1 and 1. Interpretation bands:
#               r >= 0.8          -> strong positive correlation
#               0.5 <= r < 0.8    -> moderate positive correlation
#               0.0 <= r < 0.5    -> weak or no positive correlation
#               r <  0.0          -> negative correlation (taller -> lighter)
# ===========================================================================
corr = df[COLUMN_HEIGHT].corr(df[COLUMN_WEIGHT])   # Pearson r (default method)

# ===========================================================================
# Step 14 - Print the correlation and a plain-language interpretation
# ===========================================================================
print(f'\n  Pearson Correlation (Height vs Weight) : {corr:.2f}')

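if pd.isna(corr):
    # NaN fails every comparison, so without this check it would reach the else branch.
    print('  [WARN] Correlation is undefined (too few rows or a constant column).')
elif corr >= 0.8:
    print('  [INFO] Strong positive correlation.')
elif corr >= 0.5:
    print('  [INFO] Moderate positive correlation.')
elif corr >= 0.0:
    print('  [INFO] Weak or no positive correlation.')
else:
    print('  [INFO] Negative correlation detected.')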

# ===========================================================================
# Step 15 - Assemble all results into a compact summary DataFrame
#           Two columns: Metric (label) and Value (rounded number).
#           round(..., 4) keeps the CSV tidy without losing meaningful precision.
# ===========================================================================
summary = pd.DataFrame({
    'Metric': [
        'Mean Height (cm)',
        'Std Dev Height (cm)',
        'Mean Weight (kg)',
        'Std Dev Weight (kg)',
        'Pearson Correlation',
    ],
    'Value': [
        round(mean_height, 4),
        round(std_height,  4),
        round(mean_weight, 4),
        round(std_weight,  4),
        round(corr,        4),
    ]
})

print('\n=== Results Table ===')
print(summary.to_string(index=False))          # Print cleanly without row numbers

# ===========================================================================
# Step 16 - Save the summary DataFrame to CSV
#           index=False omits the automatic integer row-number column.
#           encoding='utf-8' writes the file as UTF-8 (also the pandas default).
#           out_file is a Path object; to_csv() accepts it directly.
# ===========================================================================
summary.to_csv(out_file, index=False, encoding='utf-8')

# ===========================================================================
# Step 17 - Print a success message with the full output file path
# ===========================================================================
print('A-Data    - Where input data files are stored')
print('B-Engines - Where Python code is stored')
print('C-Results - Where the results of the program are stored')
print(f'\n[DONE] Results saved to: {out_file}')
