# FILE_ARCHIVER Configuration Guide

This document describes the archival strategies available in the FILE_ARCHIVER package for managing data lifecycle across OCI buckets (INBOX → ODS → ARCHIVE).

## Overview

The FILE_ARCHIVER package provides flexible archival strategies that accommodate different data retention policies across source systems. It manages the movement of processed data from operational storage (ODS bucket) to long-term archival storage (ARCHIVE bucket) based on configurable strategies.

### Key Features

- **Three Archival Strategies**: THRESHOLD_BASED, MINIMUM_AGE_MONTHS (with 0 = current month only), HYBRID
- **Flexible Configuration**: Per-table archival strategy configuration via A_SOURCE_FILE_CONFIG
- **Workflow Bypass**: IS_WORKFLOW_SUCCESS_REQUIRED flag allows archival of files from non-DBT sources
- **Validation**: Automatic validation of strategy-specific configuration requirements

### Package Information

- **Schema**: CT_MRDS
- **Package**: FILE_ARCHIVER
- **Current Version**: 3.4.0
- **Dependencies**: ENV_MANAGER, FILE_MANAGER, A_SOURCE_FILE_CONFIG, A_SOURCE_FILE_RECEIVED, A_WORKFLOW_HISTORY

### Critical Prerequisites

⚠️ **IMPORTANT**: FILE_ARCHIVER requires data to be registered in the `CT_MRDS.A_SOURCE_FILE_RECEIVED` table.
**For new system data (Airflow + DBT):**

- `A_SOURCE_FILE_RECEIVED` records are automatically created by `FILE_MANAGER.PROCESS_SOURCE_FILE` during file validation
- No additional configuration needed - the standard workflow handles registration

**For legacy data migrated from the Informatica + WLA system:**

- Use `DATA_EXPORTER` with the **`pRegisterExport => TRUE`** parameter to automatically register exported files in `A_SOURCE_FILE_RECEIVED`
- This enables FILE_ARCHIVER to process legacy data exports without manual registration
- Available in both `EXPORT_TABLE_DATA` (single CSV) and `EXPORT_TABLE_DATA_TO_CSV_BY_DATE` (partitioned CSV exports)

**Example - Legacy Data Export with Registration**:

```sql
-- Export legacy data to DATA bucket WITH automatic registration
BEGIN
  CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA_TO_CSV_BY_DATE(
    pSchemaName     => 'OU_TOP',
    pTableName      => 'AGGREGATED_ALLOTMENT',
    pKeyColumnName  => 'A_ETL_LOAD_SET_KEY_FK',
    pBucketArea     => 'DATA',
    pFolderName     => 'legacy_export',
    pMinDate        => DATE '2024-01-01',
    pMaxDate        => DATE '2024-12-31',
    pRegisterExport => TRUE,  -- ✓ Registers files in A_SOURCE_FILE_RECEIVED
    pProcessName    => 'LEGACY_MIGRATION'
  );
END;
/

-- Now FILE_ARCHIVER can process these files
BEGIN
  CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
    pSourceFileConfigKey => vConfigKey
  );
END;
/
```

**Alternative approach**: Export directly to the ARCHIVE bucket using `DATA_EXPORTER.EXPORT_TABLE_DATA_BY_DATE` with `pBucketArea => 'ARCHIVE'` to bypass the archival step entirely.

## Archival Strategies

### Strategy Overview

| Strategy | WHERE Clause Logic | Configuration Required | Primary Use Case |
|----------|-------------------|----------------------|------------------|
| `THRESHOLD_BASED` | Days since workflow start > threshold | ARCHIVE_THRESHOLD_DAYS | Simple time-based archival |
| `MINIMUM_AGE_MONTHS` | Archive data older than X months (0 = current month only) | MINIMUM_AGE_MONTHS (≥0) | All sources - flexible retention (0 for LM, 6 for CSDB) |
| `HYBRID` | Combines month boundary + minimum age | MINIMUM_AGE_MONTHS | Advanced retention scenarios |

### 1. THRESHOLD_BASED (Default)

Archives data based on the number of days since workflow start.

**WHERE Clause**:

```sql
extract(day from (systimestamp - workflow_start)) > ARCHIVE_THRESHOLD_DAYS
```

**Configuration**:

```sql
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'THRESHOLD_BASED',
    ARCHIVE_THRESHOLD_DAYS = 30,
    MINIMUM_AGE_MONTHS = NULL
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'C2D_DATA'
  AND TABLE_ID = 'C2D_TABLE';
```

**Use Case**: Simple time-based archival.

### 2. MINIMUM_AGE_MONTHS

Archives data older than the specified number of months. **Special case**: MINIMUM_AGE_MONTHS = 0 archives all data before the current month.

**WHERE Clause**:

```sql
workflow_start < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS)
-- When MINIMUM_AGE_MONTHS = 0:
workflow_start < TRUNC(SYSDATE, 'MM')
```

**Configuration Examples**:

```sql
-- LM: Keep only current month data (MINIMUM_AGE_MONTHS = 0)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 0
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'DistributeStandingFacilities'
  AND TABLE_ID = 'LM_STANDING_FACILITIES';

-- CSDB: Retain 6 months of data (MINIMUM_AGE_MONTHS = 6)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 6
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID IN ('CSDB_DEBT', 'CSDB_DEBT_DAILY');
```

**Use Cases**:

- **MINIMUM_AGE_MONTHS = 0**: LM dissemination feeds requiring current month only (daily/intraday updates)
- **MINIMUM_AGE_MONTHS = 6**: CSDB securities/ratings data requiring 6-month retention
- **MINIMUM_AGE_MONTHS = N**: Regulatory compliance with specific N-month retention periods

**Behavior Examples**:

- **With MINIMUM_AGE_MONTHS = 0**:
  - January data: Archived on February 1st
  - February data: Remains in ODS bucket during February
  - March 1st: February data archived, March data active
- **With MINIMUM_AGE_MONTHS = 6**:
  - February 2026: Archives data from July 2025 and earlier
  - March 2026: Archives data from August 2025 and earlier
  - Keeps current month + 6 previous months (7 months total) in ODS bucket

### 3. HYBRID

Combines a month boundary check with a minimum age threshold - archives data from previous months AND older than the minimum age.

**WHERE Clause**:

```sql
TRUNC(workflow_start, 'MM') < TRUNC(SYSDATE, 'MM')
AND workflow_start < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS)
```

**Configuration**:

```sql
-- Advanced: Current month + 3 months minimum
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'HYBRID',
    MINIMUM_AGE_MONTHS = 3
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'SPECIAL_SOURCE'
  AND TABLE_ID = 'SPECIAL_TABLE';
```

**Use Case**: Advanced scenarios requiring both current month retention AND a minimum age threshold.

## Archival Triggering Logic

### Strategy-Specific Execution Behavior

The FILE_ARCHIVER package uses **different triggering logic** depending on the configured archival strategy:

#### MINIMUM_AGE_MONTHS Strategy (Threshold-Independent)

**Behavior**: Archives data **immediately** when the age criterion is met, **without checking** archival thresholds.

```sql
-- Executed when MINIMUM_AGE_MONTHS strategy is configured
IF vSourceFileConfig.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS' THEN
  vArchivalTriggeredBy := 'AGE_BASED';
  -- Proceeds with archival regardless of FILES_COUNT, ROWS_COUNT, or BYTES_SUM
END IF;
```

**Why**: This strategy is designed for **strict retention policies** where data **must** be archived based on age alone (e.g., regulatory compliance requiring current month only).

#### THRESHOLD_BASED and HYBRID Strategies (Threshold-Dependent)

**Behavior**: Archives data **only when** at least one of the following thresholds is exceeded:

1. **ARCHIVE_THRESHOLD_FILES_COUNT** - Number of files eligible for archival
2. **ARCHIVE_THRESHOLD_ROWS_COUNT** - Number of rows eligible for archival
3. **ARCHIVE_THRESHOLD_BYTES_SUM** - Total size in bytes eligible for archival

```sql
-- Executed for THRESHOLD_BASED and HYBRID strategies
IF vTableStat.OVER_ARCH_THRESOLD_FILE_COUNT >= vSourceFileConfig.ARCHIVE_THRESHOLD_FILES_COUNT THEN
  vArchivalTriggeredBy := 'FILES_COUNT';
ELSIF vTableStat.OVER_ARCH_THRESOLD_ROW_COUNT >= vSourceFileConfig.ARCHIVE_THRESHOLD_ROWS_COUNT THEN
  vArchivalTriggeredBy := 'ROWS_COUNT';
ELSIF vTableStat.OVER_ARCH_THRESOLD_SIZE >= vSourceFileConfig.ARCHIVE_THRESHOLD_BYTES_SUM THEN
  vArchivalTriggeredBy := 'BYTES_SUM';
END IF;
```

**Why**: These strategies provide **performance optimization** by avoiding unnecessary archival operations when data volume is small.

**Configuration Example**:

```sql
-- Set archival thresholds for THRESHOLD_BASED strategy
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVE_THRESHOLD_FILES_COUNT = 10,      -- Archive when 10+ files eligible
    ARCHIVE_THRESHOLD_ROWS_COUNT = 100000,   -- Archive when 100k+ rows eligible
    ARCHIVE_THRESHOLD_BYTES_SUM = 104857600  -- Archive when 100MB+ eligible
WHERE ARCHIVAL_STRATEGY = 'THRESHOLD_BASED'
  AND TABLE_ID = 'YOUR_TABLE';
```

**Important**: For the **MINIMUM_AGE_MONTHS** strategy, these threshold values are **ignored** - archival proceeds based on age alone.

## Configuration Validation

### Validation Trigger

**Trigger**: `TRG_BI_A_SRC_FILE_CFG_ARCH_VAL`

Automatically validates archival configuration on INSERT/UPDATE to A_SOURCE_FILE_CONFIG.

**Validation Rules**:

1. **MINIMUM_AGE_MONTHS**: Requires `MINIMUM_AGE_MONTHS IS NOT NULL AND MINIMUM_AGE_MONTHS >= 0`
   - Error: "Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set (≥0)"
2. **HYBRID**: Requires `MINIMUM_AGE_MONTHS IS NOT NULL`
   - Error: "Strategy HYBRID requires MINIMUM_AGE_MONTHS to be set"

**Example Validation Error**:

```sql
-- This will fail validation
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = NULL  -- ERROR: Required for this strategy
WHERE ...;
-- Error: ORA-20001: Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set
```

## Archival Control Configuration

### IS_ARCHIVE_ENABLED Column

Controls whether archival is enabled for a specific table configuration.

**Column**: `A_SOURCE_FILE_CONFIG.IS_ARCHIVE_ENABLED` (CHAR(1), DEFAULT 'N' NOT NULL)

**Values**:

- `'Y'` - Table is eligible for archival processing
- `'N'` (default) - Table is excluded from archival (batch operations skip this config)

**Use Cases**:

- Disable archival for specific tables without removing configuration
- Temporarily suspend archival during data migration or troubleshooting
- Selective archival in batch operations

**Configuration Example**:

```sql
-- Disable archival for specific table
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_ARCHIVE_ENABLED = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID = 'CSDB_DEBT';
COMMIT;

-- Re-enable archival
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_ARCHIVE_ENABLED = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID = 'CSDB_DEBT';
COMMIT;

-- Check archival status
SELECT SOURCE_FILE_ID, TABLE_ID, IS_ARCHIVE_ENABLED, ARCHIVAL_STRATEGY
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
ORDER BY SOURCE_FILE_ID, TABLE_ID;
```

### IS_KEPT_IN_TRASH Column

Controls the TRASH folder retention policy for archived files.
**Column**: `A_SOURCE_FILE_CONFIG.IS_KEPT_IN_TRASH` (CHAR(1), DEFAULT 'N' NOT NULL)

**Values**:

- `'Y'` - CSV files kept in TRASH folder after archival (status: ARCHIVED_AND_TRASHED)
- `'N'` (default) - CSV files deleted from TRASH folder after archival (status: ARCHIVED_AND_PURGED)

**Benefits of TRASH Retention (`'Y'`)**:

- Safety net for rollback if archival issues are discovered
- Supports compliance and audit requirements
- Enables file restoration via the `RESTORE_FILE_FROM_TRASH` procedure

**Benefits of TRASH Cleanup (`'N'`)**:

- Reduces storage costs in the DATA bucket
- Simplifies bucket management
- Appropriate for non-critical or test data

**Configuration Example**:

```sql
-- Production: Keep files in TRASH (recommended)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_KEPT_IN_TRASH = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'LM'
  AND TABLE_ID LIKE 'LM_%';
COMMIT;

-- Test environment: Clean up TRASH to save storage
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_KEPT_IN_TRASH = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'TEST_SOURCE';
COMMIT;

-- Bulk configuration by source
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_KEPT_IN_TRASH = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID IN ('CSDB', 'C2D', 'LM');
COMMIT;
```

### IS_WORKFLOW_SUCCESS_REQUIRED Column

Controls whether archival requires `WORKFLOW_SUCCESSFUL='Y'` in A_WORKFLOW_HISTORY. Added in MARS-1409.
**Column**: `A_SOURCE_FILE_CONFIG.IS_WORKFLOW_SUCCESS_REQUIRED` (CHAR(1), DEFAULT 'Y' NOT NULL)

**Values**:

- `'Y'` (default) - Only files with `WORKFLOW_SUCCESSFUL='Y'` are eligible for archival (standard Airflow+DBT flow)
- `'N'` - Archival proceeds regardless of workflow completion status (bypass for manual/non-DBT sources)

**Use Cases**:

- `'Y'`: All standard INBOX-validated sources (LM, CSDB, C2D) - ensures only fully-processed files are archived
- `'N'`: Legacy data migrated via `DATA_EXPORTER`, manual uploads, or any source without DBT workflow tracking

**GATHER_TABLE_STAT behavior**:

- `'Y'`: Statistics (file count, row count, byte sum) counted only from files with `WORKFLOW_SUCCESSFUL='Y'`
- `'N'`: Statistics counted from all INGESTED files regardless of workflow outcome

**Configuration Example**:

```sql
-- Standard source: require DBT workflow completion (default)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_WORKFLOW_SUCCESS_REQUIRED = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'LM';
COMMIT;

-- Non-DBT source: bypass workflow check
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_WORKFLOW_SUCCESS_REQUIRED = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'MANUAL_UPLOAD';
COMMIT;

-- Or set at configuration time via ADD_SOURCE_FILE_CONFIG
CALL CT_MRDS.FILE_MANAGER.ADD_SOURCE_FILE_CONFIG(
  pSourceKey => 'MANUAL',
  pSourceFileType => 'INPUT',
  pSourceFileId => 'MANUAL_UPLOAD',
  pSourceFileDesc => 'Manual data upload without DBT',
  pSourceFileNamePattern => 'manual_*.csv',
  pTableId => 'MY_TABLE',
  pTemplateTableName => 'CT_ET_TEMPLATES.MY_TABLE',
  pIsWorkflowSuccessRequired => 'N'  -- bypass workflow check
);
```

### Status Tracking in A_SOURCE_FILE_RECEIVED

The FILE_ARCHIVER tracks the file lifecycle through the `PROCESSING_STATUS` column in the `CT_MRDS.A_SOURCE_FILE_RECEIVED` table.

**Status Progression**:

```
INGESTED → ARCHIVED_AND_TRASHED → ARCHIVED_AND_PURGED (optional)
                  ↓
              INGESTED (via RESTORE_FILE_FROM_TRASH)
```

**Status Descriptions**:

- **INGESTED**: File successfully processed through Airflow+DBT, residing in ODS bucket
- **ARCHIVED_AND_TRASHED**: File archived to Parquet in ARCHIVE bucket, CSV retained in TRASH folder (DATA bucket)
- **ARCHIVED_AND_PURGED**: File archived to Parquet, CSV deleted from TRASH folder (when IS_KEPT_IN_TRASH='N')

**Associated Columns Updated During Archival**:

```sql
UPDATE CT_MRDS.A_SOURCE_FILE_RECEIVED
SET PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED',  -- Status change
    ARCH_PATH = 'archive_directory_prefix/',     -- Directory with Parquet files
    PARTITION_YEAR = 2026,                       -- Year partition value
    PARTITION_MONTH = 02                         -- Month partition value
WHERE SOURCE_FILE_NAME = 'file.csv';
```

**ARCH_PATH Column**: Contains the **directory prefix** (URI) where archived Parquet files are located in the ARCHIVE bucket. Since `DBMS_CLOUD.EXPORT_DATA` may create multiple Parquet files with parallel execution, the system stores the directory location rather than individual filenames.

**Example ARCH_PATH**:

```
https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/namespace/b/archive/o/ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/
```

### Standard File Processing Flow

```
┌─────────────────────────────────────────────────────────────┐
│                 FILE PROCESSING LIFECYCLE                   │
└─────────────────────────────────────────────────────────────┘

1. INBOX Bucket (Validation)
   ├─ File arrives from source system
   ├─ FILE_MANAGER.PROCESS_SOURCE_FILE validates structure
   ├─ Status: RECEIVED → VALIDATED → READY_FOR_INGESTION
   └─ FILE_MANAGER.MOVE_FILE relocates to ODS bucket

2. ODS Bucket (Operational Data)
   ├─ Active data processing (Airflow + DBT)
   ├─ External tables read data from bucket
   ├─ Status: INGESTED
   ├─ FILE_ARCHIVER.ARCHIVE_TABLE_DATA archives based on strategy
   └─ CSV files moved to TRASH subfolder (ODS → TRASH/)

2.1 TRASH Subfolder (DATA Bucket - File Retention)
   ├─ Located in DATA bucket (e.g., TRASH/LM/TABLE_NAME)
   ├─ Stores CSV files after archival to Parquet
   ├─ Status: ARCHIVED_AND_TRASHED (default, controlled by IS_KEPT_IN_TRASH config)
   ├─ Enables rollback if archival issues occur
   └─ Optional cleanup: ARCHIVED_AND_PURGED (when IS_KEPT_IN_TRASH = 'N')

3. ARCHIVE Bucket (Long-term Storage)
   ├─ Historical data in Parquet format
   ├─ Hive-style partitioning: PARTITION_YEAR=/PARTITION_MONTH=
   ├─ Status: ARCHIVED_AND_TRASHED or ARCHIVED_AND_PURGED
   └─ Optimized for big data analytics (Spark, Hive)
```

**Key Procedures**:

- `ARCHIVE_TABLE_DATA(pSourceFileConfigKey)` - Main archival procedure using the strategy-specific WHERE clause
  - TRASH folder retention controlled by the `IS_KEPT_IN_TRASH` column in A_SOURCE_FILE_CONFIG
- `ARCHIVE_ALL(pSourceFileConfigKey, pSourceKey, pArchiveAll)` - Batch archival with 3-level granularity and error handling
  - **Level 3 (Highest Priority)**: Single configuration via `pSourceFileConfigKey`
  - **Level 2 (Medium Priority)**: All configurations for a source via `pSourceKey`
  - **Level 1 (Lowest Priority)**: All configurations system-wide via `pArchiveAll`
  - **Error Handling**: Continues processing other tables on individual failures
  - **Filtering**: Respects `IS_ARCHIVE_ENABLED='Y'` (skips disabled configurations)
  - **Individual TRASH Policy**: Each table's `IS_KEPT_IN_TRASH` setting applied independently
  - **Summary Reporting**: Returns counts of Archived/Skipped/Failed tables
- `GATHER_TABLE_STAT(pSourceFileConfigKey)` - Calculates archival statistics using strategy logic
- `GATHER_TABLE_STAT_ALL(pSourceFileConfigKey, pSourceKey, pGatherAll, pOnlyEnabled)` - Batch statistics with 3-level granularity
  - `pOnlyEnabled` (DEFAULT TRUE): When TRUE, only processes tables with `IS_ARCHIVE_ENABLED='Y'`
- `RESTORE_FILE_FROM_TRASH(pSourceFileReceivedKey, pSourceFileConfigKey, pRestoreAll)` - Restore archived files from TRASH
- `PURGE_TRASH_FOLDER(pSourceFileReceivedKey, pSourceFileConfigKey, pPurgeAll)` - Purge TRASH folder with 3-level granularity
- `GET_VERSION` / `GET_BUILD_INFO` / `GET_VERSION_HISTORY` - Package version and metadata

**Function Wrappers (Python Integration)**:

All key procedures have `FN_*` function overloads returning `PLS_INTEGER` (SQLCODE: 0 = success, error code on failure) for Python library integration:

- `FN_ARCHIVE_TABLE_DATA`, `FN_GATHER_TABLE_STAT`, `FN_ARCHIVE_ALL`, `FN_GATHER_TABLE_STAT_ALL`
- `RESTORE_FILE_FROM_TRASH` and `PURGE_TRASH_FOLDER` also have function overloads returning PLS_INTEGER

**Internal Functions** (not callable externally):

- `GET_ARCHIVAL_WHERE_CLAUSE` - Returns the WHERE clause based on the configured strategy (private)
- `GET_TABLE_STAT` - Retrieves or auto-generates table statistics (private)

**Archival Execution**:

```sql
-- Single table archival (TRASH retention controlled by IS_KEPT_IN_TRASH config)
BEGIN
  CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
    pSourceFileConfigKey => vSourceFileConfigKey
  );
END;
/

-- Batch archival: All tables for specific source
BEGIN
  CT_MRDS.FILE_ARCHIVER.ARCHIVE_ALL(
    pSourceFileConfigKey => NULL,
    pSourceKey => 'LM',   -- Archive all LM tables
    pArchiveAll => FALSE
  );
END;
/

-- Batch archival: All tables system-wide
BEGIN
  CT_MRDS.FILE_ARCHIVER.ARCHIVE_ALL(
    pSourceFileConfigKey => NULL,
    pSourceKey => NULL,
    pArchiveAll => TRUE   -- Archive all configured tables
  );
END;
/
```

**Strategy-Based Filtering**:

- Package retrieves ARCHIVAL_STRATEGY from A_SOURCE_FILE_CONFIG
- GET_ARCHIVAL_WHERE_CLAUSE generates the appropriate WHERE clause
- Only tables with IS_ARCHIVE_ENABLED = 'Y' are processed
- Data matching the criteria is moved from ODS to ARCHIVE bucket
- CSV files moved to TRASH subfolder in DATA bucket (ODS/ → TRASH/)
- Parquet format with Hive-style partitioning applied in ARCHIVE bucket
- TRASH retention controlled by IS_KEPT_IN_TRASH column in A_SOURCE_FILE_CONFIG

### Automatic Rollback Mechanism

FILE_ARCHIVER implements **automatic rollback** to ensure data integrity if the archival process fails.

**Process Flow**:

1. **Export to ARCHIVE**: Data exported to Parquet format in ARCHIVE bucket
2. **Status Update**: A_SOURCE_FILE_RECEIVED records updated to 'ARCHIVED_AND_TRASHED'
3. **Move to TRASH**: CSV files moved from ODS to TRASH folder (DATA bucket)
4. **Optional Cleanup**: If IS_KEPT_IN_TRASH='N', files deleted from TRASH

**Automatic Rollback Trigger**: If **any error occurs** during step 3 (Move to TRASH), the system:

- **Reverts all files**: Moves successfully processed files from TRASH back to ODS
- **Rolls back status**: Resets A_SOURCE_FILE_RECEIVED status to 'INGESTED'
- **Logs error**: Records detailed error information in A_PROCESS_LOG
- **Raises exception**: Propagates the error to the calling process

**Rollback Logic (simplified from code)**:

```sql
-- If MOVE_FILE_TO_TRASH fails for any file
ELSIF vProcessControlStatus = 'MOVE_FILE_TO_TRASH_FAILURE' THEN
  FOR f IN (/* files already moved to TRASH */) LOOP
    -- Move file back from TRASH to ODS
    DBMS_CLOUD.MOVE_OBJECT(
      source_object_uri => 'TRASH/.../filename',
      target_object_uri => 'ODS/.../filename'
    );
    -- Revert status back to INGESTED
    UPDATE A_SOURCE_FILE_RECEIVED
    SET PROCESSING_STATUS = 'INGESTED'
    WHERE source_file_name = f.filename;
  END LOOP;
END IF;
```

**Why This Matters**: Ensures **all-or-nothing** archival - either all files for a YEAR_MONTH partition are successfully archived, or **none** are (maintains data consistency).
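The all-or-nothing behavior can be sketched outside the database. Below is a minimal Python sketch of the same pattern, assuming a `move_object(src, dst)` callable standing in for `DBMS_CLOUD.MOVE_OBJECT` and a plain dict standing in for the `PROCESSING_STATUS` column - both are illustrative assumptions, not the package's actual API:

```python
def archive_month(files, move_object, statuses):
    """Move a partition's CSV files to TRASH all-or-nothing.

    Mirrors FILE_ARCHIVER's automatic rollback: on any failure,
    files already moved are moved back and their status reverts
    to INGESTED before the error propagates.
    """
    moved = []
    try:
        for f in files:
            move_object(f"ODS/{f}", f"TRASH/{f}")   # step 3: move to TRASH
            moved.append(f)
            statuses[f] = "ARCHIVED_AND_TRASHED"
    except Exception:
        for f in reversed(moved):
            move_object(f"TRASH/{f}", f"ODS/{f}")   # revert the move
            statuses[f] = "INGESTED"                # revert the status
        raise  # propagate to the caller, as the package does
```

If the move fails partway through, every file already moved is returned to ODS and its status reverts to INGESTED - the invariant the automatic rollback guarantees for a YEAR_MONTH partition.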
### TRASH Management Procedures #### RESTORE_FILE_FROM_TRASH Restores files from TRASH folder back to ODS with **3-level granularity**: **Level 3 (Highest Priority)** - Single File Restore: ```sql -- Restore specific file by A_SOURCE_FILE_RECEIVED_KEY CALL FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH( pSourceFileReceivedKey => 12345 ); ``` **Level 2 (Medium Priority)** - Configuration-Based Restore: ```sql -- Restore all files for specific table configuration CALL FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH( pSourceFileConfigKey => 341 ); ``` **Level 1 (Lowest Priority)** - Global Restore: ```sql -- Restore ALL files with ARCHIVED_AND_TRASHED status system-wide CALL FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH( pRestoreAll => TRUE ); ``` **Restore Operations**: - **Moves files**: TRASH folder → ODS folder (using DBMS_CLOUD.MOVE_OBJECT) - **Updates status**: ARCHIVED_AND_TRASHED → INGESTED - **Clears metadata**: Sets ARCH_PATH, PARTITION_YEAR, PARTITION_MONTH to NULL - **Returns files to active processing**: Makes data available for Airflow+DBT pipeline #### PURGE_TRASH_FOLDER Permanently deletes files from TRASH with **3-level granularity**: **Level 3 (Highest Priority)** - Single File Purge: ```sql -- Delete specific file from TRASH CALL FILE_ARCHIVER.PURGE_TRASH_FOLDER( pSourceFileReceivedKey => 12345 ); ``` **Level 2 (Medium Priority)** - Configuration-Based Purge: ```sql -- Delete all TRASH files for specific table configuration CALL FILE_ARCHIVER.PURGE_TRASH_FOLDER( pSourceFileConfigKey => 341 ); ``` **Level 1 (Lowest Priority)** - Global Purge: ```sql -- Delete ALL files with ARCHIVED_AND_TRASHED status system-wide CALL FILE_ARCHIVER.PURGE_TRASH_FOLDER( pPurgeAll => TRUE ); ``` **Purge Operations**: - **Deletes files**: Permanently removes from TRASH folder (using DBMS_CLOUD.DELETE_OBJECT) - **Updates status**: ARCHIVED_AND_TRASHED → ARCHIVED_AND_PURGED - **Warning**: **Irreversible operation** - files cannot be restored after purge - **Use case**: Storage optimization, 
compliance with data retention policies **Important**: Purge is **not automatic** - must be explicitly called. This provides additional safety layer for data retention. ## Configuration Examples ### Example 1: Configure LM Standing Facilities (Current Month Only) ```sql -- Keep only current month data in ODS bucket (MINIMUM_AGE_MONTHS = 0) UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS', MINIMUM_AGE_MONTHS = 0 -- 0 = archives all data before current month WHERE SOURCE_FILE_TYPE = 'INPUT' AND SOURCE_FILE_ID = 'DistributeStandingFacilities' AND TABLE_ID = 'LM_STANDING_FACILITIES'; COMMIT; -- Verify configuration SELECT SOURCE_FILE_ID, TABLE_ID, ARCHIVAL_STRATEGY, MINIMUM_AGE_MONTHS FROM CT_MRDS.A_SOURCE_FILE_CONFIG WHERE SOURCE_FILE_ID = 'DistributeStandingFacilities'; ``` ### Example 2: Configure CSDB Debt (MINIMUM_AGE_MONTHS) ```sql -- Retain 6 months of data in ODS bucket UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS', MINIMUM_AGE_MONTHS = 6 WHERE SOURCE_FILE_TYPE = 'INPUT' AND SOURCE_FILE_ID = 'CSDB' AND TABLE_ID = 'CSDB_DEBT'; COMMIT; -- Verify configuration SELECT SOURCE_FILE_ID, TABLE_ID, ARCHIVAL_STRATEGY, MINIMUM_AGE_MONTHS FROM CT_MRDS.A_SOURCE_FILE_CONFIG WHERE TABLE_ID = 'CSDB_DEBT'; ``` ### Example 3: Bulk Configuration for LM Source ```sql -- Configure all 19 LM tables with MINIMUM_AGE_MONTHS = 0 (current month only) UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS', MINIMUM_AGE_MONTHS = 0 -- 0 = keep only current month WHERE SOURCE_FILE_TYPE = 'INPUT' AND SOURCE_FILE_ID IN ( 'DistributeStandingFacilities', 'DistributeTTS', 'DistributeAdHocAdjustments', 'DistributeBalanceSheet', 'DistributeCSMAdjustments', 'DistributeCurrentAccounts', 'DistributeForecast', 'DistributeQREAdjustments' ); COMMIT; -- Verify bulk configuration SELECT SOURCE_FILE_ID, COUNT(*) AS TABLE_COUNT, MAX(ARCHIVAL_STRATEGY) AS STRATEGY, MAX(MINIMUM_AGE_MONTHS) AS MIN_AGE FROM 
CT_MRDS.A_SOURCE_FILE_CONFIG WHERE SOURCE_FILE_ID LIKE 'Distribute%' GROUP BY SOURCE_FILE_ID ORDER BY SOURCE_FILE_ID; ``` ### Example 4: View Current Archival Configuration ```sql -- All configured tables with their archival strategies SELECT A_SOURCE_KEY, SOURCE_FILE_ID, TABLE_ID, ARCHIVAL_STRATEGY, MINIMUM_AGE_MONTHS, ARCHIVE_THRESHOLD_DAYS FROM CT_MRDS.A_SOURCE_FILE_CONFIG WHERE SOURCE_FILE_TYPE = 'INPUT' ORDER BY A_SOURCE_KEY, SOURCE_FILE_ID, TABLE_ID; -- Summary by strategy SELECT ARCHIVAL_STRATEGY, COUNT(*) AS TABLE_COUNT, MIN(MINIMUM_AGE_MONTHS) AS MIN_AGE_MIN, MAX(MINIMUM_AGE_MONTHS) AS MIN_AGE_MAX FROM CT_MRDS.A_SOURCE_FILE_CONFIG WHERE SOURCE_FILE_TYPE = 'INPUT' GROUP BY ARCHIVAL_STRATEGY ORDER BY ARCHIVAL_STRATEGY; ``` ### Example 5: Configure Archival Control Settings ```sql -- Complete configuration with all archival settings UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS', MINIMUM_AGE_MONTHS = 6, IS_ARCHIVE_ENABLED = 'Y', -- Enable archival IS_KEPT_IN_TRASH = 'Y' -- Keep files in TRASH for safety WHERE SOURCE_FILE_TYPE = 'INPUT' AND SOURCE_FILE_ID = 'CSDB' AND TABLE_ID = 'CSDB_DEBT'; COMMIT; -- Disable archival temporarily for troubleshooting UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET IS_ARCHIVE_ENABLED = 'N' -- Batch operations will skip this table WHERE TABLE_ID = 'CSDB_DEBT'; COMMIT; -- Configure TRASH cleanup for test environment UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET IS_KEPT_IN_TRASH = 'N' -- Delete files from TRASH after archival WHERE SOURCE_FILE_TYPE = 'INPUT' AND SOURCE_FILE_ID = 'TEST_SOURCE'; COMMIT; -- View complete configuration SELECT SOURCE_FILE_ID, TABLE_ID, ARCHIVAL_STRATEGY, MINIMUM_AGE_MONTHS, IS_ARCHIVE_ENABLED, IS_KEPT_IN_TRASH FROM CT_MRDS.A_SOURCE_FILE_CONFIG WHERE SOURCE_FILE_TYPE = 'INPUT' ORDER BY SOURCE_FILE_ID, TABLE_ID; -- Summary by archival status SELECT IS_ARCHIVE_ENABLED, IS_KEPT_IN_TRASH, COUNT(*) AS TABLE_COUNT FROM CT_MRDS.A_SOURCE_FILE_CONFIG WHERE SOURCE_FILE_TYPE = 'INPUT' GROUP BY 
IS_ARCHIVE_ENABLED, IS_KEPT_IN_TRASH ORDER BY IS_ARCHIVE_ENABLED DESC, IS_KEPT_IN_TRASH DESC; ``` ## Release 01 Configuration ### Configured Tables (MARS-828) The following 25 Release 01 tables were configured with archival strategies: **LM Tables (19 total) - MINIMUM_AGE_MONTHS = 0 (current month only)**: - LM_STANDING_FACILITIES - LM_STANDING_FACILITIES_HEADER - LM_TTS_HEADER - LM_TTS_ITEM - LM_ADHOC_ADJUSTMENTS_HEADER - LM_ADHOC_ADJUSTMENTS_ITEM - LM_ADHOC_ADJUSTMENTS_ITEM_HEADER - LM_BALANCESHEET_HEADER - LM_BALANCESHEET_ITEM - LM_CSM_ADJUSTMENTS_HEADER - LM_CSM_ADJUSTMENTS_ITEM - LM_CSM_ADJUSTMENTS_ITEM_HEADER - LM_CURRENT_ACCOUNTS_HEADER - LM_CURRENT_ACCOUNTS_ITEM - LM_FORECAST_HEADER - LM_FORECAST_ITEM - LM_QRE_ADJUSTMENTS_HEADER - LM_QRE_ADJUSTMENTS_ITEM - LM_QRE_ADJUSTMENTS_ITEM_HEADER **CSDB Tables (6 total)**: *MINIMUM_AGE_MONTHS = 6 (6-month retention)*: - CSDB_DEBT - CSDB_DEBT_DAILY *MINIMUM_AGE_MONTHS = 0 (current month only)*: - CSDB_INSTR_RAT_FULL - CSDB_INSTR_DESC_FULL - CSDB_ISSUER_RAT_FULL - CSDB_ISSUER_DESC_FULL **Verification Query**: ```sql -- Check Release 01 configuration SELECT CASE WHEN TABLE_ID LIKE 'LM_%' THEN 'LM' WHEN TABLE_ID LIKE 'CSDB_%' THEN 'CSDB' END AS SOURCE_GROUP, ARCHIVAL_STRATEGY, MINIMUM_AGE_MONTHS, COUNT(*) AS TABLE_COUNT FROM CT_MRDS.A_SOURCE_FILE_CONFIG WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID IN ( -- 25 Release 01 tables 'LM_STANDING_FACILITIES', 'LM_STANDING_FACILITIES_HEADER', 'LM_TTS_HEADER', 'LM_TTS_ITEM', -- ... 
other tables ) GROUP BY CASE WHEN TABLE_ID LIKE 'LM_%' THEN 'LM' WHEN TABLE_ID LIKE 'CSDB_%' THEN 'CSDB' END, ARCHIVAL_STRATEGY, MINIMUM_AGE_MONTHS ORDER BY SOURCE_GROUP, ARCHIVAL_STRATEGY; ``` ## Troubleshooting ### Common Issues #### Issue 1: Validation Error on Configuration Update **Error**: ``` ORA-20001: Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set ``` **Cause**: Trigger validation failed - strategy requires MINIMUM_AGE_MONTHS but value is NULL **Solution**: ```sql -- Provide required MINIMUM_AGE_MONTHS value UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS', MINIMUM_AGE_MONTHS = 6 -- Required for this strategy WHERE ...; ``` #### Issue 2: Archival Not Triggering Despite Configuration **Scenario A**: **MINIMUM_AGE_MONTHS** strategy not archiving ```sql -- Check files that should be archived SELECT SFR.A_SOURCE_FILE_RECEIVED_KEY, SFR.SOURCE_FILE_NAME, SFR.PROCESSING_STATUS, LH.LOAD_START, TRUNC(MONTHS_BETWEEN(SYSDATE, LH.LOAD_START)) AS MONTHS_AGE, SFC.MINIMUM_AGE_MONTHS AS THRESHOLD FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR JOIN CT_ODS.A_LOAD_HISTORY LH ON SFR.A_WORKFLOW_HISTORY_KEY = LH.A_WORKFLOW_HISTORY_KEY JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY WHERE SFC.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS' AND SFR.PROCESSING_STATUS = 'INGESTED' AND SFC.IS_ARCHIVE_ENABLED = 'Y' ORDER BY LH.LOAD_START; -- Note: MINIMUM_AGE_MONTHS archives immediately (threshold-independent) -- If files not archived, check IS_ARCHIVE_ENABLED='Y' and run ARCHIVE_TABLE_DATA ``` **Scenario B**: **THRESHOLD_BASED** or **HYBRID** strategy not archiving ```sql -- Check if threshold reached for specific configuration SELECT SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY, SFC.ARCHIVE_THRESHOLD_FILES_COUNT AS FILE_THRESHOLD, SFC.ARCHIVE_THRESHOLD_ROWS_COUNT AS ROW_THRESHOLD, SFC.ARCHIVE_THRESHOLD_BYTES_SUM AS BYTE_THRESHOLD, COUNT(SFR.A_SOURCE_FILE_RECEIVED_KEY) AS 
CURRENT_FILES, SUM(SFR.TOTAL_RECORDS) AS CURRENT_ROWS, SUM(SFR.FILE_SIZE_BYTES) AS CURRENT_BYTES FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC LEFT JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY AND SFR.PROCESSING_STATUS = 'INGESTED' WHERE SFC.ARCHIVAL_STRATEGY IN ('THRESHOLD_BASED', 'HYBRID') AND SFC.IS_ARCHIVE_ENABLED = 'Y' AND SFC.A_SOURCE_FILE_CONFIG_KEY = :yourConfigKey GROUP BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY, SFC.ARCHIVE_THRESHOLD_FILES_COUNT, SFC.ARCHIVE_THRESHOLD_ROWS_COUNT, SFC.ARCHIVE_THRESHOLD_BYTES_SUM; -- Expected: At least ONE threshold (FILE/ROW/BYTE) must be exceeded -- If no threshold exceeded, archival will NOT trigger (threshold-dependent behavior) ``` #### Issue 3: ARCH_PATH Contains Directory Not Filename **Symptoms**: A_SOURCE_FILE_RECEIVED.ARCH_PATH shows folder path instead of specific file **Explanation**: This is **expected behavior**: ```sql -- Example ARCH_PATH value SELECT ARCH_PATH FROM CT_MRDS.A_SOURCE_FILE_RECEIVED WHERE PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' AND ROWNUM = 1; -- Result (example): -- https://objectstorage.../ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/ -- Reason: DBMS_CLOUD.EXPORT_DATA with parallel execution creates multiple Parquet files: -- - STANDING_FACILITIES_part_00001.parquet -- - STANDING_FACILITIES_part_00002.parquet -- - ... 
-- System stores directory prefix to track ALL generated files
```

**To List Actual Parquet Files**:
```sql
-- Use DBMS_CLOUD.LIST_OBJECTS with ARCH_PATH as prefix
SELECT object_name, bytes, created
FROM TABLE(DBMS_CLOUD.LIST_OBJECTS(
    credential_name => 'OCI$RESOURCE_PRINCIPAL',
    location_uri    => 'https://objectstorage.../b/archive/o/'
))
WHERE object_name LIKE 'ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/%';
```

#### Issue 4: Files Remain in TRASH Folder

**Symptoms**: Files not deleted from TRASH after archival

**Cause**: Configuration has `IS_KEPT_IN_TRASH='Y'` (retain files in TRASH)

**Verification**:
```sql
-- Check TRASH policy for configuration
SELECT SOURCE_FILE_ID, TABLE_ID, IS_KEPT_IN_TRASH,
       CASE IS_KEPT_IN_TRASH
           WHEN 'Y' THEN 'Files RETAINED in TRASH (manual purge required)'
           WHEN 'N' THEN 'Files DELETED immediately after archival'
       END AS TRASH_BEHAVIOR
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'YOUR_TABLE';
```

**Solutions**:
```sql
-- Option A: Change configuration to auto-delete (permanent change)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_KEPT_IN_TRASH = 'N' -- Auto-delete from TRASH after archival
WHERE TABLE_ID = 'YOUR_TABLE';
COMMIT;

-- Option B: Manually purge TRASH for specific table (one-time action)
BEGIN
    CT_MRDS.FILE_ARCHIVER.PURGE_TRASH_FOLDER(
        pSourceFileConfigKey => :yourConfigKey
    );
END;
/

-- Option C: Purge all TRASH system-wide (use with caution)
BEGIN
    CT_MRDS.FILE_ARCHIVER.PURGE_TRASH_FOLDER(
        pPurgeAll => TRUE
    );
END;
/
```

#### Issue 5: Automatic Rollback Occurred

**Symptoms**: Files unexpectedly back in INGESTED status; archival process reported failure

**Cause**: Error during "Move to TRASH" step triggered automatic rollback

**Investigation**:
```sql
-- Check process logs for rollback events
SELECT PROCESS_LOG_KEY, LOG_LEVEL, LOG_MESSAGE, PARAMETERS, LOG_TIMESTAMP
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCESS_NAME = 'ARCHIVE_TABLE_DATA'
  AND (LOG_MESSAGE LIKE '%rollback%'
       OR LOG_MESSAGE LIKE '%MOVE_FILE_TO_TRASH_FAILURE%')
ORDER BY LOG_TIMESTAMP DESC
FETCH FIRST 10 ROWS ONLY;

-- Check files that were rolled back
SELECT A_SOURCE_FILE_RECEIVED_KEY, SOURCE_FILE_NAME,
       PROCESSING_STATUS, -- Should be INGESTED after rollback
       ARCH_PATH,         -- Should be NULL after rollback
       PARTITION_YEAR,    -- Should be NULL after rollback
       PARTITION_MONTH    -- Should be NULL after rollback
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE A_SOURCE_FILE_CONFIG_KEY = :yourConfigKey
  AND UPDATED_AT > SYSDATE - 1 -- Last 24 hours
ORDER BY UPDATED_AT DESC;
```

**Resolution**:
1. **Investigate root cause**: Check error messages in A_PROCESS_LOG
2. **Fix underlying issue**: OCI permissions, bucket access, incorrect credentials, etc.
3. **Re-run archival**: Call ARCHIVE_TABLE_DATA again after the fix

#### Issue 6: Archival Not Working as Expected

**Symptoms**: Data not being archived according to strategy

**Diagnostic Steps**:
```sql
-- 1. Check configuration
SELECT SOURCE_FILE_ID, TABLE_ID, ARCHIVAL_STRATEGY, MINIMUM_AGE_MONTHS, ARCHIVE_THRESHOLD_DAYS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'YOUR_TABLE';

-- 2. Check package version
SELECT CT_MRDS.FILE_ARCHIVER.GET_VERSION() FROM DUAL;
-- Expected: 3.0.0 or higher

-- 3. Check process logs
SELECT PROCESS_LOG_KEY, PROCESS_NAME, LOG_MESSAGE, LOG_LEVEL, LOG_TIMESTAMP
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCESS_NAME LIKE '%ARCHIVE%'
ORDER BY LOG_TIMESTAMP DESC
FETCH FIRST 20 ROWS ONLY;

-- 4. Inspect the configuration that drives WHERE clause generation
DECLARE
    vConfig CT_MRDS.A_SOURCE_FILE_CONFIG%ROWTYPE;
BEGIN
    SELECT * INTO vConfig
    FROM CT_MRDS.A_SOURCE_FILE_CONFIG
    WHERE TABLE_ID = 'YOUR_TABLE' AND ROWNUM = 1;
    -- Note: GET_ARCHIVAL_WHERE_CLAUSE is a private function.
    -- To test WHERE clause logic, check A_PROCESS_LOG entries from ARCHIVE_TABLE_DATA,
    -- which logs the generated WHERE clause at INFO level.
DBMS_OUTPUT.PUT_LINE('Config: ' || vConfig.ARCHIVAL_STRATEGY || ', MIN_AGE=' || vConfig.MINIMUM_AGE_MONTHS); END; / ``` #### Issue 7: Package Compilation Errors After Upgrade **Symptoms**: FILE_ARCHIVER package shows INVALID status **Solution**: ```sql -- Check compilation errors SELECT * FROM USER_ERRORS WHERE NAME = 'FILE_ARCHIVER' AND TYPE IN ('PACKAGE', 'PACKAGE BODY') ORDER BY SEQUENCE; -- Recompile package ALTER PACKAGE CT_MRDS.FILE_ARCHIVER COMPILE SPECIFICATION; ALTER PACKAGE CT_MRDS.FILE_ARCHIVER COMPILE BODY; -- Verify status SELECT object_name, object_type, status FROM user_objects WHERE object_name = 'FILE_ARCHIVER'; ``` ### Diagnostic Queries for Monitoring #### Query 1: Status Distribution Across All Files ```sql -- Overall file status distribution SELECT PROCESSING_STATUS, COUNT(*) AS FILE_COUNT, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS PERCENTAGE, MIN(CREATED_AT) AS OLDEST_FILE, MAX(CREATED_AT) AS NEWEST_FILE FROM CT_MRDS.A_SOURCE_FILE_RECEIVED GROUP BY PROCESSING_STATUS ORDER BY FILE_COUNT DESC; ``` #### Query 2: Files in TRASH (Archived but Not Purged) ```sql -- Files currently in TRASH folder (status ARCHIVED_AND_TRASHED) SELECT SFR.A_SOURCE_FILE_RECEIVED_KEY, SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFR.SOURCE_FILE_NAME, SFR.ARCH_PATH, SFR.PARTITION_YEAR, SFR.PARTITION_MONTH, SFR.FILE_SIZE_BYTES, SFR.UPDATED_AT AS ARCHIVED_AT, TRUNC(SYSDATE - SFR.UPDATED_AT) AS DAYS_IN_TRASH, SFC.IS_KEPT_IN_TRASH AS TRASH_POLICY FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY WHERE SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' ORDER BY SFR.UPDATED_AT DESC; ``` #### Query 3: Archival Activity by Configuration ```sql -- Archival statistics per table configuration SELECT SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY, SFC.IS_ARCHIVE_ENABLED, SFC.IS_KEPT_IN_TRASH, COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'INGESTED' THEN 1 END) AS PENDING_ARCHIVE, COUNT(CASE 
WHEN SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' THEN 1 END) AS IN_TRASH, COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'ARCHIVED_AND_PURGED' THEN 1 END) AS PURGED, MAX(CASE WHEN SFR.PROCESSING_STATUS LIKE 'ARCHIVED%' THEN SFR.UPDATED_AT END) AS LAST_ARCHIVAL FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC LEFT JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY WHERE SFC.SOURCE_FILE_TYPE = 'INPUT' GROUP BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY, SFC.IS_ARCHIVE_ENABLED, SFC.IS_KEPT_IN_TRASH ORDER BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID; ``` #### Query 4: Files Eligible for Archival (MINIMUM_AGE_MONTHS) ```sql -- Files that should be archived based on MINIMUM_AGE_MONTHS strategy SELECT SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.MINIMUM_AGE_MONTHS AS AGE_THRESHOLD, COUNT(*) AS ELIGIBLE_FILES, SUM(SFR.FILE_SIZE_BYTES) AS TOTAL_SIZE_BYTES, SUM(SFR.TOTAL_RECORDS) AS TOTAL_ROWS, MIN(LH.LOAD_START) AS OLDEST_FILE, MAX(LH.LOAD_START) AS NEWEST_ELIGIBLE FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY JOIN CT_ODS.A_LOAD_HISTORY LH ON SFR.A_WORKFLOW_HISTORY_KEY = LH.A_WORKFLOW_HISTORY_KEY WHERE SFC.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS' AND SFC.IS_ARCHIVE_ENABLED = 'Y' AND SFR.PROCESSING_STATUS = 'INGESTED' AND LH.LOAD_START < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -SFC.MINIMUM_AGE_MONTHS) GROUP BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.MINIMUM_AGE_MONTHS ORDER BY ELIGIBLE_FILES DESC; ``` #### Query 5: Archival Performance Metrics ```sql -- Recent archival operations with timing SELECT PROCESS_LOG_KEY, SUBSTR(PARAMETERS, 1, 100) AS CONFIG_INFO, LOG_TIMESTAMP AS START_TIME, LEAD(LOG_TIMESTAMP) OVER (PARTITION BY SUBSTR(PARAMETERS, 1, 100) ORDER BY LOG_TIMESTAMP) AS END_TIME, ROUND((LEAD(LOG_TIMESTAMP) OVER (PARTITION BY SUBSTR(PARAMETERS, 1, 100) ORDER BY LOG_TIMESTAMP) - LOG_TIMESTAMP) * 24 * 60, 2) AS DURATION_MINUTES, CASE WHEN LOG_LEVEL = 
'ERROR' THEN 'FAILED' WHEN LOG_MESSAGE LIKE '%Archival completed%' THEN 'SUCCESS' ELSE 'IN_PROGRESS' END AS STATUS FROM CT_MRDS.A_PROCESS_LOG WHERE PROCESS_NAME = 'ARCHIVE_TABLE_DATA' AND LOG_TIMESTAMP > SYSDATE - 7 -- Last 7 days ORDER BY LOG_TIMESTAMP DESC; ``` #### Query 6: TRASH Storage Usage ```sql -- Estimate TRASH folder storage usage SELECT SFC.SOURCE_FILE_ID, COUNT(*) AS FILES_IN_TRASH, ROUND(SUM(SFR.FILE_SIZE_BYTES) / 1024 / 1024 / 1024, 2) AS SIZE_GB, MIN(SFR.UPDATED_AT) AS OLDEST_IN_TRASH, MAX(SFR.UPDATED_AT) AS NEWEST_IN_TRASH, SFC.IS_KEPT_IN_TRASH AS POLICY FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY WHERE SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' GROUP BY SFC.SOURCE_FILE_ID, SFC.IS_KEPT_IN_TRASH ORDER BY SIZE_GB DESC; ``` ## Version History ### v3.4.0 (Current - 2026-03-17) - **MARS-1409**: Added `IS_WORKFLOW_SUCCESS_REQUIRED` flag to A_SOURCE_FILE_CONFIG - `'Y'` (default) = archivization requires `WORKFLOW_SUCCESSFUL='Y'` in A_WORKFLOW_HISTORY (standard Airflow+DBT flow) - `'N'` = archive regardless of workflow status (bypass for manual/non-DBT sources) - `IS_WORKFLOW_SUCCESS_REQUIRED` stored in A_TABLE_STAT and A_TABLE_STAT_HIST at statistics gather time - GATHER_TABLE_STAT: conditional `WORKFLOW_SUCCESSFUL='Y'` filter controlled by the flag - ARCHIVE_TABLE_DATA: conditional `WORKFLOW_SUCCESSFUL='Y'` filter controlled by the flag - Added `pIsWorkflowSuccessRequired` parameter to FILE_MANAGER.ADD_SOURCE_FILE_CONFIG - FILE_MANAGER updated to v3.6.2+ ### v3.3.0 (2026-02-11) - **BREAKING CHANGE**: Removed `pKeepInTrash` parameter from ARCHIVE_TABLE_DATA - Added `IS_ARCHIVE_ENABLED` column to A_SOURCE_FILE_CONFIG for selective archiving control - Added `IS_KEPT_IN_TRASH` column to A_SOURCE_FILE_CONFIG (replaces pKeepInTrash parameter) - Added batch procedures with 3-level granularity (config/source/all): - ARCHIVE_ALL - Batch archival procedure - 
GATHER_TABLE_STAT_ALL - Batch statistics procedure - RESTORE_FILE_FROM_TRASH - Restore files from TRASH folder - PURGE_TRASH_FOLDER - Purge TRASH folder files - TRASH retention now configuration-based instead of parameter-based - Enhanced flexibility for archival orchestration and monitoring ### v3.2.1 (2026-02-10) - Fixed critical bug: Status update ARCHIVED → ARCHIVED_AND_TRASHED when moving files to TRASH folder - Ensures proper status tracking for files retained in TRASH ### v3.2.0 (2026-02-06) - Added `pKeepInTrash` parameter (DEFAULT TRUE) to ARCHIVE_TABLE_DATA - TRASH folder retention control for safety and compliance - Files kept in TRASH subfolder by default for rollback capability ### v3.1.0 (2026-02-05) - **BREAKING CHANGE**: Removed CURRENT_MONTH_ONLY strategy (replaced by MINIMUM_AGE_MONTHS = 0) - Mathematical equivalence: CURRENT_MONTH_ONLY ≡ MINIMUM_AGE_MONTHS = 0 - Updated trigger validation to allow MINIMUM_AGE_MONTHS >= 0 (previously >= 1) - Simplified architecture from 4 strategies to 3 - Enhanced error handling - All 25 Release 01 tables migrated to MINIMUM_AGE_MONTHS (23 with value 0, 2 with value 6) ### v3.0.0 (MARS-828 - 2026-02-04) - Added ARCHIVAL_STRATEGY configuration column - Implemented four archival strategies (later reduced to three in v3.1.0): - THRESHOLD_BASED (backward compatible) - CURRENT_MONTH_ONLY (deprecated in v3.1.0, use MINIMUM_AGE_MONTHS = 0) - MINIMUM_AGE_MONTHS - HYBRID - Added GET_ARCHIVAL_WHERE_CLAUSE function - Created validation trigger TRG_BI_A_SRC_FILE_CFG_ARCH_VAL - Configured 25 Release 01 tables with appropriate strategies ### v2.0.0 (Legacy) - Initial FILE_ARCHIVER package - THRESHOLD_BASED archival only - Fixed ARCHIVE_THRESHOLD_DAYS configuration ## Related Documentation - [FILE_MANAGER Configuration Guide](FILE_MANAGER_Configuration_Guide.md) - File processing and validation - [Package Deployment Guide](Package_Deployment_Guide.md) - Package deployment standards - [Universal Package Tracking 
System](Universal_Package_Tracking_System.md) - Version tracking - [MARS-828 README](../MARS_Packages/REL01_ADDITIONS/MARS-828/README.md) - Detailed implementation notes ## Dependencies ### Required Packages - **CT_MRDS.ENV_MANAGER** v3.x - Error handling, logging, version tracking - **CT_MRDS.FILE_MANAGER** v3.x - Bucket URI resolution, file processing - **MRDS_LOADER.cloud_wrapper** - DBMS_CLOUD operations wrapper ### Database Objects - **Table**: CT_MRDS.A_SOURCE_FILE_CONFIG - Configuration storage - **Table**: CT_MRDS.A_SOURCE_FILE_RECEIVED - File processing tracking - **Table**: CT_MRDS.A_WORKFLOW_HISTORY - Workflow execution tracking (Airflow + DBT) - **Trigger**: TRG_BI_A_SRC_FILE_CFG_ARCH_VAL - Configuration validation - **Credential**: DEF_CRED_ARN - OCI bucket access ### OCI Buckets - **INBOX**: Incoming file validation (`'INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_NAME}/'`) - **ODS/DATA**: Operational data processing (`'ODS/{SOURCE}/{TABLE_NAME}/'`) - **TRASH**: File retention subfolder in DATA bucket (`'TRASH/{SOURCE}/{TABLE_NAME}/'`) - CSV files after archival - **ARCHIVE**: Historical data storage (`'ARCHIVE/{SOURCE}/{TABLE_NAME}/PARTITION_YEAR=/PARTITION_MONTH=/'`) **Note**: TRASH is NOT a separate bucket - it's a subfolder within the DATA bucket for file retention and rollback capability. ## Best Practices ### Strategy Selection Guidelines 1. **Use MINIMUM_AGE_MONTHS when**: - **MINIMUM_AGE_MONTHS = 0**: Current month only retention - Data updated frequently (daily/intraday) - Historical data access is rare - ODS bucket space is limited - Example: LM dissemination feeds - **MINIMUM_AGE_MONTHS = N (N > 0)**: Multi-month retention - Regulatory compliance requires specific retention period - Analytical workloads need N-month access - Data updates are infrequent - Example: CSDB securities data (MINIMUM_AGE_MONTHS = 6) 2. **Use THRESHOLD_BASED when**: - Simple time-based archival is sufficient 3. 
**Use HYBRID when**: - Complex retention requirements - Combining month boundary check with minimum age threshold - Advanced scenarios not covered by other strategies ### Configuration Best Practices 1. **Test Configuration Changes**: ```sql -- Test on single table first UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS', MINIMUM_AGE_MONTHS = 0 -- 0 = current month only WHERE SOURCE_FILE_ID = 'TEST_FILE' AND TABLE_ID = 'TEST_TABLE'; -- Monitor archival behavior -- Expand to other tables after validation ``` 2. **Verify Before Bulk Updates**: ```sql -- Preview changes with SELECT SELECT SOURCE_FILE_ID, TABLE_ID, 'MINIMUM_AGE_MONTHS' AS NEW_STRATEGY, 0 AS NEW_MIN_AGE, -- 0 = current month only ARCHIVAL_STRATEGY AS OLD_STRATEGY, MINIMUM_AGE_MONTHS AS OLD_MIN_AGE FROM CT_MRDS.A_SOURCE_FILE_CONFIG WHERE SOURCE_FILE_ID LIKE 'Distribute%'; -- Then execute UPDATE ``` 3. **Document Configuration Decisions**: - Record why specific strategy was chosen - Note business requirements driving retention policy - Track configuration changes in version control 4. **Monitor Archival Performance**: ```sql -- Check archival execution logs SELECT PROCESS_NAME, LOG_MESSAGE, LOG_TIMESTAMP FROM CT_MRDS.A_PROCESS_LOG WHERE PROCESS_NAME LIKE '%ARCHIVE%' AND LOG_TIMESTAMP > SYSDATE - 7 ORDER BY LOG_TIMESTAMP DESC; ``` 5. **Regular Configuration Reviews**: - Verify strategies still match business requirements - Check for tables without archival configuration - Optimize MINIMUM_AGE_MONTHS based on actual usage patterns ### TRASH Folder Retention Best Practices 1. 
**Default Behavior (IS_KEPT_IN_TRASH = 'Y' - Recommended)**: - Keeps CSV files in TRASH folder after archival - Provides safety net for rollback if archival issues occur - Supports compliance and audit requirements - Status: ARCHIVED_AND_TRASHED - Use for: Production environments, regulatory compliance, critical data - Configuration: ```sql UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET IS_KEPT_IN_TRASH = 'Y' WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID = 'YOUR_TABLE'; ``` 2. **TRASH Cleanup (IS_KEPT_IN_TRASH = 'N')**: - Deletes CSV files from TRASH folder after successful archival - Reduces storage costs in DATA bucket - Status: ARCHIVED_AND_PURGED - Use for: Non-critical data, storage optimization, test environments - Configuration: ```sql UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG SET IS_KEPT_IN_TRASH = 'N' WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID = 'YOUR_TABLE'; ``` 3. **Monitoring TRASH Folder**: ```sql -- Check files in TRASH retention SELECT SOURCE_FILE_NAME, PROCESSING_STATUS, ARCH_PATH, PARTITION_YEAR, PARTITION_MONTH FROM CT_MRDS.A_SOURCE_FILE_RECEIVED WHERE PROCESSING_STATUS IN ('ARCHIVED_AND_TRASHED', 'ARCHIVED_AND_PURGED') AND RECEPTION_DATE > SYSDATE - 30 ORDER BY PROCESSING_STATUS, RECEPTION_DATE DESC; ``` 4. **TRASH Folder Structure**: ``` DATA Bucket: ├── ODS/LM/STANDING_FACILITIES/file.csv -- Active operational data └── TRASH/LM/STANDING_FACILITIES/file.csv -- Retained after archival ARCHIVE Bucket: └── ARCHIVE/LM/STANDING_FACILITIES/ └── PARTITION_YEAR=2026/ └── PARTITION_MONTH=02/ └── *.parquet -- Archived data ```
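5. **Restoring a File from TRASH (Rollback)**: The TRASH retention above exists to enable rollback, and v3.3.0 added `RESTORE_FILE_FROM_TRASH` for exactly this. The sketch below is illustrative only - the `pSourceFileReceivedKey` parameter name is an assumption based on the package's naming conventions (the changelog documents the procedure but not its signature):

```sql
-- Hypothetical sketch: restore one archived-and-trashed file back to ODS.
-- pSourceFileReceivedKey is an ASSUMED parameter name (not confirmed by the changelog).
DECLARE
    vFileKey NUMBER;
BEGIN
    -- Pick a file currently retained in TRASH
    SELECT A_SOURCE_FILE_RECEIVED_KEY INTO vFileKey
    FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
    WHERE PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED'
      AND ROWNUM = 1;

    CT_MRDS.FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH(
        pSourceFileReceivedKey => vFileKey
    );
END;
/
```

   After a restore, check the file's PROCESSING_STATUS in A_SOURCE_FILE_RECEIVED to confirm the rollback before re-running archival.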