diff --git a/README.md b/README.md index 8d6edea..458a712 100644 --- a/README.md +++ b/README.md @@ -14,7 +14,7 @@ REL01_ADDITIONS MARS-826 -- Currently working on: - MARS-828 + MARS-828s -- Awaiting deployment below REL03 @@ -69,8 +69,8 @@ sql "ADMIN/Cloudpass#34@ggmichalski_high" "@rollback_mars835_prehook.sql" cd .\MARS_Packages\REL03\MARS-1057 -sql "ADMIN/Cloudpass#34@ggmichalski_high" "@install_mars1057.sql" -sql "ADMIN/Cloudpass#34@ggmichalski_high" "@rollback_mars1057.sql" +echo 'yes' | sql "ADMIN/Cloudpass#34@ggmichalski_high" "@install_mars1057.sql" +echo 'yes' | sql "ADMIN/Cloudpass#34@ggmichalski_high" "@rollback_mars1057.sql" 7z a -pMojeSuperHaslo#123 -mhe=on M1057_arch.7z MARS-1057 diff --git a/confluence/FILE_ARCHIVER_Guide.md b/confluence/FILE_ARCHIVER_Guide.md index 2e9bd6b..18ad5e0 100644 --- a/confluence/FILE_ARCHIVER_Guide.md +++ b/confluence/FILE_ARCHIVER_Guide.md @@ -10,7 +10,6 @@ The FILE_ARCHIVER package provides flexible archival strategies that accommodate - **Three Archival Strategies**: THRESHOLD_BASED, MINIMUM_AGE_MONTHS (with 0=current month only), HYBRID - **Flexible Configuration**: Per-table archival strategy configuration via A_SOURCE_FILE_CONFIG -- **Backward Compatible**: Default THRESHOLD_BASED strategy maintains existing behavior - **Validation**: Automatic validation of strategy-specific configuration requirements - **OCI Integration**: Works seamlessly with DBMS_CLOUD operations via cloud_wrapper @@ -18,7 +17,7 @@ The FILE_ARCHIVER package provides flexible archival strategies that accommodate - **Schema**: CT_MRDS - **Package**: FILE_ARCHIVER -- **Current Version**: 3.2.0 +- **Current Version**: 3.3.0 - **Dependencies**: ENV_MANAGER, FILE_MANAGER, cloud_wrapper, A_SOURCE_FILE_CONFIG, A_SOURCE_FILE_RECEIVED, A_WORKFLOW_HISTORY ### Critical Prerequisites @@ -38,7 +37,7 @@ The FILE_ARCHIVER package provides flexible archival strategies that accommodate | Strategy | WHERE Clause Logic | Configuration Required | Primary Use Case
| |----------|-------------------|----------------------|------------------| -| `THRESHOLD_BASED` | Days since workflow start > threshold | DAYS_FOR_ARCHIVE_THRESHOLD | Legacy compatibility, simple time-based archival | +| `THRESHOLD_BASED` | Days since workflow start > threshold | DAYS_FOR_ARCHIVE_THRESHOLD | Simple time-based archival | | `MINIMUM_AGE_MONTHS` | Archive data older than X months (0=current month only) | MINIMUM_AGE_MONTHS (≥0) | All sources - flexible retention (0 for LM, 6 for CSDB) | | `HYBRID` | Combines month boundary + minimum age | MINIMUM_AGE_MONTHS | Advanced retention scenarios | @@ -62,11 +61,11 @@ WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID = 'C2D_TABLE'; ``` -**Use Case**: Simple time-based archival, backward compatible with FILE_ARCHIVER v2.0.0 behavior. +**Use Case**: Simple time-based archival. ### 2. MINIMUM_AGE_MONTHS -Archives data older than specified number of months. **Special case**: MINIMUM_AGE_MONTHS = 0 archives all data before current month (replaces deprecated CURRENT_MONTH_ONLY strategy). +Archives data older than specified number of months. **Special case**: MINIMUM_AGE_MONTHS = 0 archives all data before current month. **WHERE Clause**: ```sql @@ -132,6 +131,60 @@ WHERE SOURCE_FILE_TYPE = 'INPUT' **Use Case**: Advanced scenarios requiring both current month retention AND minimum age threshold. +## Archival Triggering Logic + +### Strategy-Specific Execution Behavior + +The FILE_ARCHIVER package uses **different triggering logic** depending on the configured archival strategy: + +#### MINIMUM_AGE_MONTHS Strategy (Threshold-Independent) + +**Behavior**: Archives data **immediately** when age criteria is met, **without checking** archival thresholds. 
+ +```sql +-- Executed when MINIMUM_AGE_MONTHS strategy is configured +IF vSourceFileConfig.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS' THEN + vArchivalTriggeredBy := 'AGE_BASED'; + -- Proceeds with archival regardless of FILES_COUNT, ROWS_COUNT, or BYTES_SUM +END IF; +``` + +**Why**: This strategy is designed for **strict retention policies** where data **must** be archived based on age alone (e.g., regulatory compliance requiring current month only). + +#### THRESHOLD_BASED and HYBRID Strategies (Threshold-Dependent) + +**Behavior**: Archives data **only when** at least one of the following thresholds is exceeded: + +1. **FILES_COUNT_OVER_ARCHIVE_THRESHOLD** - Number of files eligible for archival +2. **ROWS_COUNT_OVER_ARCHIVE_THRESHOLD** - Number of rows eligible for archival +3. **BYTES_SUM_OVER_ARCHIVE_THRESHOLD** - Total size in bytes eligible for archival + +```sql +-- Executed for THRESHOLD_BASED and HYBRID strategies +IF vTableStat.OVER_ARCH_THRESOLD_FILE_COUNT >= vSourceFileConfig.FILES_COUNT_OVER_ARCHIVE_THRESHOLD THEN + vArchivalTriggeredBy := 'FILES_COUNT'; +ELSIF vTableStat.OVER_ARCH_THRESOLD_ROW_COUNT >= vSourceFileConfig.ROWS_COUNT_OVER_ARCHIVE_THRESHOLD THEN + vArchivalTriggeredBy := 'ROWS_COUNT'; +ELSIF vTableStat.OVER_ARCH_THRESOLD_SIZE >= vSourceFileConfig.BYTES_SUM_OVER_ARCHIVE_THRESHOLD THEN + vArchivalTriggeredBy := 'BYTES_SUM'; +END IF; +``` + +**Why**: These strategies provide **performance optimization** by avoiding unnecessary archival operations when data volume is small. 
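The two branches above can be combined into one stand-alone sketch. This is an illustration only, not the package code: the local variables stand in for the `vSourceFileConfig` and `vTableStat` records, the values are invented, and the assumption that the strategy check runs before the threshold checks is not confirmed by the snippets shown here.

```sql
-- Hypothetical stand-alone sketch of the triggering decision.
-- Local variables replace the package records; all values are made up.
SET SERVEROUTPUT ON
DECLARE
  vStrategy        VARCHAR2(30) := 'THRESHOLD_BASED';
  vEligibleFiles   NUMBER := 12;         -- files eligible for archival
  vEligibleRows    NUMBER := 40000;      -- rows eligible for archival
  vEligibleBytes   NUMBER := 52428800;   -- 50 MB eligible for archival
  vFilesThreshold  NUMBER := 10;
  vRowsThreshold   NUMBER := 100000;
  vBytesThreshold  NUMBER := 104857600;  -- 100 MB
  vTriggeredBy     VARCHAR2(30);         -- stays NULL when nothing triggers
BEGIN
  IF vStrategy = 'MINIMUM_AGE_MONTHS' THEN
    -- Age-based strategy: thresholds are not consulted at all
    vTriggeredBy := 'AGE_BASED';
  ELSIF vEligibleFiles >= vFilesThreshold THEN
    vTriggeredBy := 'FILES_COUNT';
  ELSIF vEligibleRows >= vRowsThreshold THEN
    vTriggeredBy := 'ROWS_COUNT';
  ELSIF vEligibleBytes >= vBytesThreshold THEN
    vTriggeredBy := 'BYTES_SUM';
  END IF;

  DBMS_OUTPUT.PUT_LINE('Triggered by: ' || NVL(vTriggeredBy, 'nothing (archival skipped)'));
END;
/
```

With these sample values the block prints `Triggered by: FILES_COUNT`, because the first matching branch wins. Note that the comparisons use `>=`, as in the snippet above, so reaching a threshold exactly is enough to trigger archival.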
+ +**Configuration Example**: +```sql +-- Set archival thresholds for THRESHOLD_BASED strategy +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET FILES_COUNT_OVER_ARCHIVE_THRESHOLD = 10, -- Archive when 10+ files eligible + ROWS_COUNT_OVER_ARCHIVE_THRESHOLD = 100000, -- Archive when 100k+ rows eligible + BYTES_SUM_OVER_ARCHIVE_THRESHOLD = 104857600 -- Archive when 100MB+ eligible +WHERE ARCHIVAL_STRATEGY = 'THRESHOLD_BASED' + AND TABLE_ID = 'YOUR_TABLE'; +``` + +**Important**: For **MINIMUM_AGE_MONTHS** strategy, these threshold values are **ignored** - archival proceeds based on age alone. + ## Configuration Validation ### Validation Trigger @@ -158,8 +211,132 @@ WHERE ...; -- Error: ORA-20001: Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set ``` +## Archival Control Configuration + +### ARCHIVE_ENABLED Column + +Controls whether archival is enabled for specific table configuration. + +**Column**: `A_SOURCE_FILE_CONFIG.ARCHIVE_ENABLED` (VARCHAR2(1), DEFAULT 'Y') + +**Values**: +- `'Y'` (default) - Table is eligible for archival processing +- `'N'` - Table is excluded from archival (batch operations skip this config) + +**Use Cases**: +- Disable archival for specific tables without removing configuration +- Temporarily suspend archival during data migration or troubleshooting +- Selective archival in batch operations + +**Configuration Example**: +```sql +-- Disable archival for specific table +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET ARCHIVE_ENABLED = 'N' +WHERE SOURCE_FILE_TYPE = 'INPUT' + AND SOURCE_FILE_ID = 'CSDB' + AND TABLE_ID = 'CSDB_DEBT'; +COMMIT; + +-- Re-enable archival +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET ARCHIVE_ENABLED = 'Y' +WHERE SOURCE_FILE_TYPE = 'INPUT' + AND SOURCE_FILE_ID = 'CSDB' + AND TABLE_ID = 'CSDB_DEBT'; +COMMIT; + +-- Check archival status +SELECT + SOURCE_FILE_ID, + TABLE_ID, + ARCHIVE_ENABLED, + ARCHIVAL_STRATEGY +FROM CT_MRDS.A_SOURCE_FILE_CONFIG +WHERE SOURCE_FILE_TYPE = 'INPUT' +ORDER BY SOURCE_FILE_ID, TABLE_ID; +``` 
+ +### KEEP_IN_TRASH Column + +Controls the TRASH folder retention policy for archived files. + +**Column**: `A_SOURCE_FILE_CONFIG.KEEP_IN_TRASH` (VARCHAR2(1), DEFAULT 'Y') + +**Values**: +- `'Y'` (default) - CSV files kept in TRASH folder after archival (status: ARCHIVED_AND_TRASHED) +- `'N'` - CSV files deleted from TRASH folder after archival (status: ARCHIVED_AND_PURGED) + +**Benefits of TRASH Retention (`'Y'`)**: +- Safety net for rollback if archival issues are discovered +- Supports compliance and audit requirements +- Enables file restoration via `RESTORE_FILE_FROM_TRASH` procedure + +**Benefits of TRASH Cleanup (`'N'`)**: +- Reduces storage costs in DATA bucket +- Simplifies bucket management +- Appropriate for non-critical or test data + +**Configuration Example**: +```sql +-- Production: Keep files in TRASH (recommended) +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET KEEP_IN_TRASH = 'Y' +WHERE SOURCE_FILE_TYPE = 'INPUT' + AND SOURCE_FILE_ID = 'LM' + AND TABLE_ID LIKE 'LM\_%' ESCAPE '\'; -- escape "_" so it is not treated as a wildcard +COMMIT; + +-- Test environment: Clean up TRASH to save storage +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET KEEP_IN_TRASH = 'N' +WHERE SOURCE_FILE_TYPE = 'INPUT' + AND SOURCE_FILE_ID = 'TEST_SOURCE'; +COMMIT; + +-- Bulk configuration by source +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET KEEP_IN_TRASH = 'Y' +WHERE SOURCE_FILE_TYPE = 'INPUT' + AND SOURCE_FILE_ID IN ('CSDB', 'C2D', 'LM'); +COMMIT; +``` + ## Data Lifecycle Workflow +### Status Tracking in A_SOURCE_FILE_RECEIVED + +FILE_ARCHIVER tracks the file lifecycle through the `PROCESSING_STATUS` column in the `CT_MRDS.A_SOURCE_FILE_RECEIVED` table: + +**Status Progression**: +``` +INGESTED → ARCHIVED_AND_TRASHED → ARCHIVED_AND_PURGED (optional) + ↓ + INGESTED (via RESTORE_FILE_FROM_TRASH) +``` + +**Status Descriptions**: +- **INGESTED**: File successfully processed through Airflow+DBT, residing in ODS bucket +- **ARCHIVED_AND_TRASHED**: File archived to Parquet in ARCHIVE bucket, CSV retained in TRASH folder (DATA bucket) +- **ARCHIVED_AND_PURGED**: File
archived to Parquet, CSV deleted from TRASH folder (when KEEP_IN_TRASH='N') + +**Associated Columns Updated During Archival**: +```sql +UPDATE CT_MRDS.A_SOURCE_FILE_RECEIVED + SET PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED', -- Status change + ARCH_PATH = 'archive_directory_prefix/', -- Directory with Parquet files + PARTITION_YEAR = 2026, -- Year partition value + PARTITION_MONTH = 02 -- Month partition value + WHERE SOURCE_FILE_NAME = 'file.csv'; +``` + +**ARCH_PATH Column**: Contains the **directory prefix** (URI) where archived Parquet files are located in the ARCHIVE bucket. Since `DBMS_CLOUD.EXPORT_DATA` may create multiple Parquet files with parallel execution, the system stores the directory location rather than individual filenames. + +**Example ARCH_PATH**: +``` +https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/namespace/b/archive/o/ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/ +``` + ### Standard File Processing Flow ``` @@ -183,9 +360,9 @@ WHERE ...; 2.1 TRASH Subfolder (DATA Bucket - File Retention) ├─ Located in DATA bucket (e.g., TRASH/LM/TABLE_NAME) ├─ Stores CSV files after archival to Parquet - ├─ Status: ARCHIVED_AND_TRASHED (default retention) + ├─ Status: ARCHIVED_AND_TRASHED (default, controlled by KEEP_IN_TRASH config) ├─ Enables rollback if archival issues occur - └─ Optional cleanup: ARCHIVED_AND_PURGED (pKeepInTrash=FALSE) + └─ Optional cleanup: ARCHIVED_AND_PURGED (when KEEP_IN_TRASH = 'N') 3. 
ARCHIVE Bucket (Long-term Storage) ├─ Historical data in Parquet format @@ -194,29 +371,48 @@ WHERE ...; └─ Optimized for big data analytics (Spark, Hive) **Key Procedures**: -- `ARCHIVE_TABLE_DATA(pSourceFileConfigKey, pKeepInTrash)` - Main archival procedure using strategy-specific WHERE clause - - `pKeepInTrash` (BOOLEAN, DEFAULT TRUE) - Controls TRASH folder retention - - TRUE: Files kept in TRASH folder for safety and rollback capability (default) - - FALSE: Files deleted from TRASH folder after successful archival +- `ARCHIVE_TABLE_DATA(pSourceFileConfigKey)` - Main archival procedure using strategy-specific WHERE clause + - TRASH folder retention controlled by `KEEP_IN_TRASH` column in A_SOURCE_FILE_CONFIG +- `ARCHIVE_ALL(pSourceFileConfigKey, pSourceKey, pArchiveAll)` - Batch archival with 3-level granularity and error handling + - **Level 3 (Highest Priority)**: Single configuration via `pSourceFileConfigKey` + - **Level 2 (Medium Priority)**: All configurations for source via `pSourceKey` + - **Level 1 (Lowest Priority)**: All configurations system-wide via `pArchiveAll` + - **Error Handling**: Continues processing other tables on individual failures + - **Filtering**: Respects `ARCHIVE_ENABLED='Y'` (skips disabled configurations) + - **Individual TRASH Policy**: Each table's `KEEP_IN_TRASH` setting applied independently + - **Summary Reporting**: Returns counts of Archived/Skipped/Failed tables - `GET_ARCHIVAL_WHERE_CLAUSE` - Returns WHERE clause based on configured strategy - `GATHER_TABLE_STAT` - Calculates archival statistics using strategy logic +- `GATHER_TABLE_STAT_ALL(pSourceFileConfigKey, pSourceKey, pGatherAll)` - Batch statistics with 3-level granularity +- `RESTORE_FILE_FROM_TRASH(pSourceFileConfigKey, pSourceKey, pRestoreAll)` - Restore archived files from TRASH +- `PURGE_TRASH_FOLDER(pSourceFileConfigKey, pSourceKey, pPurgeAll)` - Purge TRASH folder with 3-level granularity **Archival Execution**: ```sql --- Default behavior: Keep files in 
TRASH folder (ARCHIVED_AND_TRASHED status) +-- Single table archival (TRASH retention controlled by KEEP_IN_TRASH config) BEGIN CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA( - pSourceFileConfigKey => vSourceFileConfigKey, - pKeepInTrash => TRUE -- DEFAULT value + pSourceFileConfigKey => vSourceFileConfigKey ); END; / --- Optional: Delete files from TRASH after archival (ARCHIVED_AND_PURGED status) +-- Batch archival: All tables for specific source BEGIN - CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA( - pSourceFileConfigKey => vSourceFileConfigKey, - pKeepInTrash => FALSE -- Cleanup TRASH folder + CT_MRDS.FILE_ARCHIVER.ARCHIVE_ALL( + pSourceFileConfigKey => NULL, + pSourceKey => 'LM', -- Archive all LM tables + pArchiveAll => FALSE + ); +END; +/ + +-- Batch archival: All tables system-wide +BEGIN + CT_MRDS.FILE_ARCHIVER.ARCHIVE_ALL( + pSourceFileConfigKey => NULL, + pSourceKey => NULL, + pArchiveAll => TRUE -- Archive all configured tables ); END; / @@ -225,10 +421,121 @@ END; **Strategy-Based Filtering**: - Package retrieves ARCHIVAL_STRATEGY from A_SOURCE_FILE_CONFIG - GET_ARCHIVAL_WHERE_CLAUSE generates appropriate WHERE clause +- Only tables with ARCHIVE_ENABLED = 'Y' are processed - Data matching criteria moved from ODS to ARCHIVE bucket - CSV files moved to TRASH subfolder in DATA bucket (ODS/ → TRASH/) - Parquet format with Hive-style partitioning applied to ARCHIVE bucket -- TRASH retention controlled by pKeepInTrash parameter +- TRASH retention controlled by KEEP_IN_TRASH column in A_SOURCE_FILE_CONFIG + +### Automatic Rollback Mechanism + +FILE_ARCHIVER implements **automatic rollback** to ensure data integrity if archival process fails: + +**Process Flow**: +1. **Export to ARCHIVE**: Data exported to Parquet format in ARCHIVE bucket +2. **Status Update**: A_SOURCE_FILE_RECEIVED records updated to 'ARCHIVED_AND_TRASHED' +3. **Move to TRASH**: CSV files moved from ODS to TRASH folder (DATA bucket) +4. 
**Optional Cleanup**: If KEEP_IN_TRASH='N', files deleted from TRASH + +**Automatic Rollback Trigger**: +If **any error occurs** during step 3 (Move to TRASH), the system: +- **Reverts all files**: Moves successfully processed files from TRASH back to ODS +- **Rolls back status**: Resets A_SOURCE_FILE_RECEIVED status to 'INGESTED' +- **Logs error**: Records detailed error information in A_PROCESS_LOG +- **Raises exception**: Propagates error to calling process + +**Rollback Logic (from code)**: +```sql +-- If MOVE_FILE_TO_TRASH fails for any file +ELSIF vProcessControlStatus = 'MOVE_FILE_TO_TRASH_FAILURE' THEN + FOR f in (files already moved to TRASH) LOOP + -- Move file back from TRASH to ODS + DBMS_CLOUD.MOVE_OBJECT( + source_object_uri => 'TRASH/.../filename', + target_object_uri => 'ODS/.../filename' + ); + + -- Revert status back to INGESTED + UPDATE A_SOURCE_FILE_RECEIVED + SET PROCESSING_STATUS = 'INGESTED' + WHERE source_file_name = f.filename; + END LOOP; +END IF; +``` + +**Why This Matters**: Ensures **all-or-nothing** archival - either all files for a YEAR_MONTH partition are successfully archived, or **none** are (maintains data consistency). 
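The all-or-nothing behavior described above can be tried out in isolation with a small simulation. This is a sketch under stated assumptions, not the package implementation: the file names are invented, `move_to_trash` is a hypothetical stand-in that only prints messages instead of calling `DBMS_CLOUD.MOVE_OBJECT`, and the forced failure on `c.csv` exists only to exercise the rollback path.

```sql
-- Hypothetical simulation of the rollback pattern; no bucket or table is touched.
SET SERVEROUTPUT ON
DECLARE
  TYPE tFileList IS TABLE OF VARCHAR2(100);
  vFiles      tFileList := tFileList('a.csv', 'b.csv', 'c.csv');  -- made-up file names
  vMoved      tFileList := tFileList();                           -- files already in TRASH
  eMoveFailed EXCEPTION;

  -- Stand-in for the real move; fails on 'c.csv' to simulate an OCI error.
  PROCEDURE move_to_trash(pFile VARCHAR2) IS
  BEGIN
    IF pFile = 'c.csv' THEN
      RAISE eMoveFailed;
    END IF;
    DBMS_OUTPUT.PUT_LINE('moved to TRASH: ' || pFile);
  END;
BEGIN
  FOR i IN 1 .. vFiles.COUNT LOOP
    move_to_trash(vFiles(i));
    -- Remember each successful move so it can be undone later
    vMoved.EXTEND;
    vMoved(vMoved.COUNT) := vFiles(i);
  END LOOP;
  DBMS_OUTPUT.PUT_LINE('all files archived');
EXCEPTION
  WHEN eMoveFailed THEN
    -- Undo in reverse order, touching only files that actually reached TRASH
    FOR i IN REVERSE 1 .. vMoved.COUNT LOOP
      DBMS_OUTPUT.PUT_LINE('moved back to ODS, status INGESTED: ' || vMoved(i));
    END LOOP;
    -- The guide states that the real procedure also logs and re-raises the error
END;
/
```

Tracking successful moves in a separate collection is the key design choice here: the handler never has to guess which files reached TRASH, so a failure on the third file reverts exactly the first two.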
+ +### TRASH Management Procedures + +#### RESTORE_FILE_FROM_TRASH + +Restores files from TRASH folder back to ODS with **3-level granularity**: + +**Level 3 (Highest Priority)** - Single File Restore: +```sql +-- Restore specific file by A_SOURCE_FILE_RECEIVED_KEY +CALL FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH( + pSourceFileReceivedKey => 12345 +); +``` + +**Level 2 (Medium Priority)** - Configuration-Based Restore: +```sql +-- Restore all files for specific table configuration +CALL FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH( + pSourceFileConfigKey => 341 +); +``` + +**Level 1 (Lowest Priority)** - Global Restore: +```sql +-- Restore ALL files with ARCHIVED_AND_TRASHED status system-wide +CALL FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH( + pRestoreAll => TRUE +); +``` + +**Restore Operations**: +- **Moves files**: TRASH folder → ODS folder (using DBMS_CLOUD.MOVE_OBJECT) +- **Updates status**: ARCHIVED_AND_TRASHED → INGESTED +- **Clears metadata**: Sets ARCH_PATH, PARTITION_YEAR, PARTITION_MONTH to NULL +- **Returns files to active processing**: Makes data available for Airflow+DBT pipeline + +#### PURGE_TRASH_FOLDER + +Permanently deletes files from TRASH with **3-level granularity**: + +**Level 3 (Highest Priority)** - Single File Purge: +```sql +-- Delete specific file from TRASH +CALL FILE_ARCHIVER.PURGE_TRASH_FOLDER( + pSourceFileReceivedKey => 12345 +); +``` + +**Level 2 (Medium Priority)** - Configuration-Based Purge: +```sql +-- Delete all TRASH files for specific table configuration +CALL FILE_ARCHIVER.PURGE_TRASH_FOLDER( + pSourceFileConfigKey => 341 +); +``` + +**Level 1 (Lowest Priority)** - Global Purge: +```sql +-- Delete ALL files with ARCHIVED_AND_TRASHED status system-wide +CALL FILE_ARCHIVER.PURGE_TRASH_FOLDER( + pPurgeAll => TRUE +); +``` + +**Purge Operations**: +- **Deletes files**: Permanently removes from TRASH folder (using DBMS_CLOUD.DELETE_OBJECT) +- **Updates status**: ARCHIVED_AND_TRASHED → ARCHIVED_AND_PURGED +- **Warning**: **Irreversible 
operation** - files cannot be restored after purge +- **Use case**: Storage optimization, compliance with data retention policies + +**Important**: Purge is **not automatic** - must be explicitly called. This provides additional safety layer for data retention. ## Configuration Examples @@ -335,6 +642,56 @@ GROUP BY ARCHIVAL_STRATEGY ORDER BY ARCHIVAL_STRATEGY; ``` +### Example 5: Configure Archival Control Settings + +```sql +-- Complete configuration with all archival settings +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS', + MINIMUM_AGE_MONTHS = 6, + ARCHIVE_ENABLED = 'Y', -- Enable archival + KEEP_IN_TRASH = 'Y' -- Keep files in TRASH for safety +WHERE SOURCE_FILE_TYPE = 'INPUT' + AND SOURCE_FILE_ID = 'CSDB' + AND TABLE_ID = 'CSDB_DEBT'; +COMMIT; + +-- Disable archival temporarily for troubleshooting +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET ARCHIVE_ENABLED = 'N' -- Batch operations will skip this table +WHERE TABLE_ID = 'CSDB_DEBT'; +COMMIT; + +-- Configure TRASH cleanup for test environment +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET KEEP_IN_TRASH = 'N' -- Delete files from TRASH after archival +WHERE SOURCE_FILE_TYPE = 'INPUT' + AND SOURCE_FILE_ID = 'TEST_SOURCE'; +COMMIT; + +-- View complete configuration +SELECT + SOURCE_FILE_ID, + TABLE_ID, + ARCHIVAL_STRATEGY, + MINIMUM_AGE_MONTHS, + ARCHIVE_ENABLED, + KEEP_IN_TRASH +FROM CT_MRDS.A_SOURCE_FILE_CONFIG +WHERE SOURCE_FILE_TYPE = 'INPUT' +ORDER BY SOURCE_FILE_ID, TABLE_ID; + +-- Summary by archival status +SELECT + ARCHIVE_ENABLED, + KEEP_IN_TRASH, + COUNT(*) AS TABLE_COUNT +FROM CT_MRDS.A_SOURCE_FILE_CONFIG +WHERE SOURCE_FILE_TYPE = 'INPUT' +GROUP BY ARCHIVE_ENABLED, KEEP_IN_TRASH +ORDER BY ARCHIVE_ENABLED DESC, KEEP_IN_TRASH DESC; +``` + ## Release 01 Configuration ### Configured Tables (MARS-828) @@ -425,7 +782,180 @@ SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS', WHERE ...; ``` -#### Issue 2: Archival Not Working as Expected +#### Issue 2: Archival Not Triggering Despite 
Configuration + +**Scenario A**: **MINIMUM_AGE_MONTHS** strategy not archiving +```sql +-- Check files that should be archived +SELECT + SFR.A_SOURCE_FILE_RECEIVED_KEY, + SFR.SOURCE_FILE_NAME, + SFR.PROCESSING_STATUS, + LH.LOAD_START, + TRUNC(MONTHS_BETWEEN(SYSDATE, LH.LOAD_START)) AS MONTHS_AGE, + SFC.MINIMUM_AGE_MONTHS AS THRESHOLD +FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR +JOIN CT_ODS.A_LOAD_HISTORY LH ON SFR.A_WORKFLOW_HISTORY_KEY = LH.A_WORKFLOW_HISTORY_KEY +JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY +WHERE SFC.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS' + AND SFR.PROCESSING_STATUS = 'INGESTED' + AND SFC.ARCHIVE_ENABLED = 'Y' +ORDER BY LH.LOAD_START; + +-- Note: MINIMUM_AGE_MONTHS archives immediately (threshold-independent) +-- If files not archived, check ARCHIVE_ENABLED='Y' and run ARCHIVE_TABLE_DATA +``` + +**Scenario B**: **THRESHOLD_BASED** or **HYBRID** strategy not archiving +```sql +-- Check if threshold reached for specific configuration +SELECT + SFC.SOURCE_FILE_ID, + SFC.TABLE_ID, + SFC.ARCHIVAL_STRATEGY, + SFC.FILES_COUNT_OVER_ARCHIVE_THRESHOLD AS FILE_THRESHOLD, + SFC.ROWS_COUNT_OVER_ARCHIVE_THRESHOLD AS ROW_THRESHOLD, + SFC.BYTES_SUM_OVER_ARCHIVE_THRESHOLD AS BYTE_THRESHOLD, + COUNT(SFR.A_SOURCE_FILE_RECEIVED_KEY) AS CURRENT_FILES, + SUM(SFR.TOTAL_RECORDS) AS CURRENT_ROWS, + SUM(SFR.FILE_SIZE_BYTES) AS CURRENT_BYTES +FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC +LEFT JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR + ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY + AND SFR.PROCESSING_STATUS = 'INGESTED' +WHERE SFC.ARCHIVAL_STRATEGY IN ('THRESHOLD_BASED', 'HYBRID') + AND SFC.ARCHIVE_ENABLED = 'Y' + AND SFC.A_SOURCE_FILE_CONFIG_KEY = :yourConfigKey +GROUP BY + SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY, + SFC.FILES_COUNT_OVER_ARCHIVE_THRESHOLD, + SFC.ROWS_COUNT_OVER_ARCHIVE_THRESHOLD, + SFC.BYTES_SUM_OVER_ARCHIVE_THRESHOLD; + +-- Expected: At least ONE threshold (FILE/ROW/BYTE) 
must be exceeded +-- If no threshold exceeded, archival will NOT trigger (threshold-dependent behavior) +``` + +#### Issue 3: ARCH_PATH Contains Directory Not Filename + +**Symptoms**: A_SOURCE_FILE_RECEIVED.ARCH_PATH shows folder path instead of specific file + +**Explanation**: This is **expected behavior**: +```sql +-- Example ARCH_PATH value +SELECT ARCH_PATH +FROM CT_MRDS.A_SOURCE_FILE_RECEIVED +WHERE PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' + AND ROWNUM = 1; + +-- Result (example): +-- https://objectstorage.../ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/ + +-- Reason: DBMS_CLOUD.EXPORT_DATA with parallel execution creates multiple Parquet files: +-- - STANDING_FACILITIES_part_00001.parquet +-- - STANDING_FACILITIES_part_00002.parquet +-- - ... +-- System stores directory prefix to track ALL generated files +``` + +**To List Actual Parquet Files**: +```sql +-- Use DBMS_CLOUD.LIST_OBJECTS with ARCH_PATH as prefix +SELECT object_name, bytes, created +FROM TABLE(DBMS_CLOUD.LIST_OBJECTS( + credential_name => 'OCI$RESOURCE_PRINCIPAL', + location_uri => 'https://objectstorage.../b/archive/o/' +)) +WHERE object_name LIKE 'ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/%'; +``` + +#### Issue 4: Files Remain in TRASH Folder + +**Symptoms**: Files not deleted from TRASH after archival + +**Cause**: Configuration has `KEEP_IN_TRASH='Y'` (retain files in TRASH) + +**Verification**: +```sql +-- Check TRASH policy for configuration +SELECT + SOURCE_FILE_ID, + TABLE_ID, + KEEP_IN_TRASH, + CASE KEEP_IN_TRASH + WHEN 'Y' THEN 'Files RETAINED in TRASH (manual purge required)' + WHEN 'N' THEN 'Files DELETED immediately after archival' + END AS TRASH_BEHAVIOR +FROM CT_MRDS.A_SOURCE_FILE_CONFIG +WHERE TABLE_ID = 'YOUR_TABLE'; +``` + +**Solutions**: +```sql +-- Option A: Change configuration to auto-delete (permanent change) +UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG +SET KEEP_IN_TRASH = 'N' -- Auto-delete from TRASH after archival +WHERE 
TABLE_ID = 'YOUR_TABLE'; +COMMIT; + +-- Option B: Manually purge TRASH for specific table (one-time action) +BEGIN + CT_MRDS.FILE_ARCHIVER.PURGE_TRASH_FOLDER( + pSourceFileConfigKey => :yourConfigKey + ); +END; +/ + +-- Option C: Purge all TRASH system-wide (use with caution) +BEGIN + CT_MRDS.FILE_ARCHIVER.PURGE_TRASH_FOLDER( + pPurgeAll => TRUE + ); +END; +/ +``` + +#### Issue 5: Automatic Rollback Occurred + +**Symptoms**: Files unexpectedly back in INGESTED status, archival process reported failure + +**Cause**: Error during "Move to TRASH" step triggered automatic rollback + +**Investigation**: +```sql +-- Check process logs for rollback events +-- (parentheses are required: AND binds tighter than OR) +SELECT + PROCESS_LOG_KEY, + LOG_LEVEL, + LOG_MESSAGE, + PARAMETERS, + LOG_TIMESTAMP +FROM CT_MRDS.A_PROCESS_LOG +WHERE PROCESS_NAME = 'ARCHIVE_TABLE_DATA' + AND (LOG_MESSAGE LIKE '%rollback%' OR LOG_MESSAGE LIKE '%MOVE_FILE_TO_TRASH_FAILURE%') +ORDER BY LOG_TIMESTAMP DESC +FETCH FIRST 10 ROWS ONLY; + +-- Check files that were rolled back +SELECT + A_SOURCE_FILE_RECEIVED_KEY, + SOURCE_FILE_NAME, + PROCESSING_STATUS, -- Should be INGESTED after rollback + ARCH_PATH, -- Should be NULL after rollback + PARTITION_YEAR, -- Should be NULL after rollback + PARTITION_MONTH -- Should be NULL after rollback +FROM CT_MRDS.A_SOURCE_FILE_RECEIVED +WHERE A_SOURCE_FILE_CONFIG_KEY = :yourConfigKey + AND UPDATED_AT > SYSDATE - 1 -- Last 24 hours +ORDER BY UPDATED_AT DESC; +``` + +**Resolution**: +1. **Investigate root cause**: Check error messages in A_PROCESS_LOG +2. **Fix underlying issue**: OCI permissions, bucket access, wrong credentials, etc. +3.
**Re-run archival**: Call ARCHIVE_TABLE_DATA again after the fix + +#### Issue 6: Archival Not Working as Expected **Symptoms**: Data not being archived according to strategy @@ -495,9 +1025,156 @@ FROM user_objects WHERE object_name = 'FILE_ARCHIVER'; ``` +### Diagnostic Queries for Monitoring + +#### Query 1: Status Distribution Across All Files + +```sql +-- Overall file status distribution +SELECT + PROCESSING_STATUS, + COUNT(*) AS FILE_COUNT, + ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS PERCENTAGE, + MIN(CREATED_AT) AS OLDEST_FILE, + MAX(CREATED_AT) AS NEWEST_FILE +FROM CT_MRDS.A_SOURCE_FILE_RECEIVED +GROUP BY PROCESSING_STATUS +ORDER BY FILE_COUNT DESC; +``` + +#### Query 2: Files in TRASH (Archived but Not Purged) + +```sql +-- Files currently in TRASH folder (status ARCHIVED_AND_TRASHED) +SELECT + SFR.A_SOURCE_FILE_RECEIVED_KEY, + SFC.SOURCE_FILE_ID, + SFC.TABLE_ID, + SFR.SOURCE_FILE_NAME, + SFR.ARCH_PATH, + SFR.PARTITION_YEAR, + SFR.PARTITION_MONTH, + SFR.FILE_SIZE_BYTES, + SFR.UPDATED_AT AS ARCHIVED_AT, + TRUNC(SYSDATE - SFR.UPDATED_AT) AS DAYS_IN_TRASH, + SFC.KEEP_IN_TRASH AS TRASH_POLICY +FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR +JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY +WHERE SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' +ORDER BY SFR.UPDATED_AT DESC; +``` + +#### Query 3: Archival Activity by Configuration + +```sql +-- Archival statistics per table configuration +SELECT + SFC.SOURCE_FILE_ID, + SFC.TABLE_ID, + SFC.ARCHIVAL_STRATEGY, + SFC.ARCHIVE_ENABLED, + SFC.KEEP_IN_TRASH, + COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'INGESTED' THEN 1 END) AS PENDING_ARCHIVE, + COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' THEN 1 END) AS IN_TRASH, + COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'ARCHIVED_AND_PURGED' THEN 1 END) AS PURGED, + -- Conditional aggregate via CASE (Oracle has no aggregate FILTER clause) + MAX(CASE WHEN SFR.PROCESSING_STATUS LIKE 'ARCHIVED%' THEN SFR.UPDATED_AT END) AS LAST_ARCHIVAL +FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC +LEFT JOIN
CT_MRDS.A_SOURCE_FILE_RECEIVED SFR ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY +WHERE SFC.SOURCE_FILE_TYPE = 'INPUT' +GROUP BY + SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY, + SFC.ARCHIVE_ENABLED, SFC.KEEP_IN_TRASH +ORDER BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID; +``` + +#### Query 4: Files Eligible for Archival (MINIMUM_AGE_MONTHS) + +```sql +-- Files that should be archived based on MINIMUM_AGE_MONTHS strategy +SELECT + SFC.SOURCE_FILE_ID, + SFC.TABLE_ID, + SFC.MINIMUM_AGE_MONTHS AS AGE_THRESHOLD, + COUNT(*) AS ELIGIBLE_FILES, + SUM(SFR.FILE_SIZE_BYTES) AS TOTAL_SIZE_BYTES, + SUM(SFR.TOTAL_RECORDS) AS TOTAL_ROWS, + MIN(LH.LOAD_START) AS OLDEST_FILE, + MAX(LH.LOAD_START) AS NEWEST_ELIGIBLE +FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC +JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY +JOIN CT_ODS.A_LOAD_HISTORY LH ON SFR.A_WORKFLOW_HISTORY_KEY = LH.A_WORKFLOW_HISTORY_KEY +WHERE SFC.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS' + AND SFC.ARCHIVE_ENABLED = 'Y' + AND SFR.PROCESSING_STATUS = 'INGESTED' + AND LH.LOAD_START < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -SFC.MINIMUM_AGE_MONTHS) +GROUP BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.MINIMUM_AGE_MONTHS +ORDER BY ELIGIBLE_FILES DESC; +``` + +#### Query 5: Archival Performance Metrics + +```sql +-- Recent archival operations with timing +SELECT + PROCESS_LOG_KEY, + SUBSTR(PARAMETERS, 1, 100) AS CONFIG_INFO, + LOG_TIMESTAMP AS START_TIME, + LEAD(LOG_TIMESTAMP) OVER (PARTITION BY SUBSTR(PARAMETERS, 1, 100) ORDER BY LOG_TIMESTAMP) AS END_TIME, + ROUND((LEAD(LOG_TIMESTAMP) OVER (PARTITION BY SUBSTR(PARAMETERS, 1, 100) ORDER BY LOG_TIMESTAMP) + - LOG_TIMESTAMP) * 24 * 60, 2) AS DURATION_MINUTES, + CASE + WHEN LOG_LEVEL = 'ERROR' THEN 'FAILED' + WHEN LOG_MESSAGE LIKE '%Archival completed%' THEN 'SUCCESS' + ELSE 'IN_PROGRESS' + END AS STATUS +FROM CT_MRDS.A_PROCESS_LOG +WHERE PROCESS_NAME = 'ARCHIVE_TABLE_DATA' + AND LOG_TIMESTAMP > SYSDATE - 7 -- Last 7 days +ORDER 
BY LOG_TIMESTAMP DESC; +``` + +#### Query 6: TRASH Storage Usage + +```sql +-- Estimate TRASH folder storage usage +SELECT + SFC.SOURCE_FILE_ID, + COUNT(*) AS FILES_IN_TRASH, + ROUND(SUM(SFR.FILE_SIZE_BYTES) / 1024 / 1024 / 1024, 2) AS SIZE_GB, + MIN(SFR.UPDATED_AT) AS OLDEST_IN_TRASH, + MAX(SFR.UPDATED_AT) AS NEWEST_IN_TRASH, + SFC.KEEP_IN_TRASH AS POLICY +FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR +JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY +WHERE SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' +GROUP BY SFC.SOURCE_FILE_ID, SFC.KEEP_IN_TRASH +ORDER BY SIZE_GB DESC; +``` + ## Version History -### v3.1.0 (Current - 2026-02-05) +### v3.3.0 (Current - 2026-02-11) +- **BREAKING CHANGE**: Removed `pKeepInTrash` parameter from ARCHIVE_TABLE_DATA +- Added `ARCHIVE_ENABLED` column to A_SOURCE_FILE_CONFIG for selective archiving control +- Added `KEEP_IN_TRASH` column to A_SOURCE_FILE_CONFIG (replaces pKeepInTrash parameter) +- Added batch procedures with 3-level granularity (config/source/all): + - ARCHIVE_ALL - Batch archival procedure + - GATHER_TABLE_STAT_ALL - Batch statistics procedure + - RESTORE_FILE_FROM_TRASH - Restore files from TRASH folder + - PURGE_TRASH_FOLDER - Purge TRASH folder files +- TRASH retention now configuration-based instead of parameter-based +- Enhanced flexibility for archival orchestration and monitoring + +### v3.2.1 (2026-02-10) +- Fixed critical bug: Status update ARCHIVED → ARCHIVED_AND_TRASHED when moving files to TRASH folder +- Ensures proper status tracking for files retained in TRASH + +### v3.2.0 (2026-02-06) +- Added `pKeepInTrash` parameter (DEFAULT TRUE) to ARCHIVE_TABLE_DATA +- TRASH folder retention control for safety and compliance +- Files kept in TRASH subfolder by default for rollback capability + +### v3.1.0 (2026-02-05) - **BREAKING CHANGE**: Removed CURRENT_MONTH_ONLY strategy (replaced by MINIMUM_AGE_MONTHS = 0) - Mathematical equivalence: CURRENT_MONTH_ONLY ≡ 
MINIMUM_AGE_MONTHS = 0 - Updated trigger validation to allow MINIMUM_AGE_MONTHS >= 0 (previously >= 1) @@ -567,9 +1244,7 @@ WHERE object_name = 'FILE_ARCHIVER'; - Example: CSDB securities data (MINIMUM_AGE_MONTHS = 6) 2. **Use THRESHOLD_BASED when**: - - Maintaining backward compatibility with legacy behavior - Simple time-based archival is sufficient - - Migration from FILE_ARCHIVER v2.0.0 3. **Use HYBRID when**: - Complex retention requirements @@ -632,18 +1307,30 @@ WHERE object_name = 'FILE_ARCHIVER'; ### TRASH Folder Retention Best Practices -1. **Default Behavior (pKeepInTrash = TRUE - Recommended)**: +1. **Default Behavior (KEEP_IN_TRASH = 'Y' - Recommended)**: - Keeps CSV files in TRASH folder after archival - Provides safety net for rollback if archival issues occur - Supports compliance and audit requirements - Status: ARCHIVED_AND_TRASHED - Use for: Production environments, regulatory compliance, critical data + - Configuration: + ```sql + UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG + SET KEEP_IN_TRASH = 'Y' + WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID = 'YOUR_TABLE'; + ``` -2. **TRASH Cleanup (pKeepInTrash = FALSE)**: +2. **TRASH Cleanup (KEEP_IN_TRASH = 'N')**: - Deletes CSV files from TRASH folder after successful archival - Reduces storage costs in DATA bucket - Status: ARCHIVED_AND_PURGED - Use for: Non-critical data, storage optimization, test environments + - Configuration: + ```sql + UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG + SET KEEP_IN_TRASH = 'N' + WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID = 'YOUR_TABLE'; + ``` 3. **Monitoring TRASH Folder**: ```sql @@ -676,7 +1363,7 @@ WHERE object_name = 'FILE_ARCHIVER'; ## Author Created by: Grzegorz Michalski -Date: 2026-02-06 +Date: 2026-02-11 Schema: CT_MRDS Package: FILE_ARCHIVER -Version: 3.2.0 +Version: 3.3.0