# FILE_ARCHIVER Configuration Guide

This document describes the archival strategies available in the FILE_ARCHIVER package for managing data lifecycle across OCI buckets (INBOX → ODS → ARCHIVE).

## Overview

The FILE_ARCHIVER package provides flexible archival strategies that accommodate different data retention policies across source systems. It manages the movement of processed data from operational storage (ODS bucket) to long-term archival storage (ARCHIVE bucket) based on configurable strategies.

### Key Features

- **Four Archival Strategies**: THRESHOLD_BASED, CURRENT_MONTH_ONLY, MINIMUM_AGE_MONTHS, HYBRID
- **Flexible Configuration**: Per-table archival strategy configuration via A_SOURCE_FILE_CONFIG
- **Backward Compatible**: Default THRESHOLD_BASED strategy maintains existing behavior
- **Validation**: Automatic validation of strategy-specific configuration requirements
- **OCI Integration**: Works seamlessly with DBMS_CLOUD operations via cloud_wrapper

### Package Information

- **Schema**: CT_MRDS
- **Package**: FILE_ARCHIVER
- **Current Version**: 3.1.0
- **Dependencies**: ENV_MANAGER, FILE_MANAGER, cloud_wrapper, A_SOURCE_FILE_CONFIG, A_LOAD_HISTORY

## Archival Strategies

### Strategy Overview

| Strategy | WHERE Clause Logic | Configuration Required | Primary Use Case |
|----------|-------------------|----------------------|------------------|
| `THRESHOLD_BASED` | Days since workflow start > threshold | DAYS_FOR_ARCHIVE_THRESHOLD | Legacy compatibility, simple time-based archival |
| `CURRENT_MONTH_ONLY` | Archive all data from previous months | None | LM/TOP sources - keep only current month active |
| `MINIMUM_AGE_MONTHS` | Archive data older than X months | MINIMUM_AGE_MONTHS | CSDB sources - retain data for specific period |
| `HYBRID` | Combines CURRENT_MONTH_ONLY + MINIMUM_AGE_MONTHS | MINIMUM_AGE_MONTHS | Advanced retention scenarios |

### 1. THRESHOLD_BASED (Default)

Archives data based on number of days since workflow start.

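This strategy compares whole elapsed days against the threshold. As a minimal sketch, the query below uses made-up timestamps and a hard-coded threshold of 30 (illustrative assumptions, not values read from the package) to show that `EXTRACT(DAY FROM ...)` on a timestamp difference drops the partial day:

```sql
-- Illustrative only: a fixed "now" stands in for SYSTIMESTAMP so the result is reproducible.
-- The literal 30 stands in for DAYS_FOR_ARCHIVE_THRESHOLD.
SELECT workflow_start,
       EXTRACT(DAY FROM (TIMESTAMP '2026-02-04 12:00:00' - workflow_start)) AS whole_days,
       CASE
           WHEN EXTRACT(DAY FROM (TIMESTAMP '2026-02-04 12:00:00' - workflow_start)) > 30
           THEN 'ARCHIVE'
           ELSE 'KEEP'
       END AS decision
FROM (
    SELECT TIMESTAMP '2026-01-04 13:00:00' AS workflow_start FROM DUAL  -- 30 days 23 hours old
    UNION ALL
    SELECT TIMESTAMP '2026-01-03 13:00:00' FROM DUAL                    -- 31 days 23 hours old
);
-- First row:  whole_days = 30, decision = KEEP
-- Second row: whole_days = 31, decision = ARCHIVE
```

Because the comparison is strict (`>`), a row only qualifies once at least one full day beyond the threshold has elapsed.
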
**WHERE Clause**:
```sql
extract(day from (systimestamp - workflow_start)) > DAYS_FOR_ARCHIVE_THRESHOLD
```

**Configuration**:
```sql
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'THRESHOLD_BASED',
    DAYS_FOR_ARCHIVE_THRESHOLD = 30,
    MINIMUM_AGE_MONTHS = NULL
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'C2D_DATA'
  AND TABLE_ID = 'C2D_TABLE';
```

**Use Case**: Simple time-based archival, backward compatible with FILE_ARCHIVER v2.0.0 behavior.

### 2. CURRENT_MONTH_ONLY

Archives all data from previous months, keeping only current month data in ODS bucket.

**WHERE Clause**:
```sql
TRUNC(workflow_start, 'MM') < TRUNC(SYSDATE, 'MM')
```

**Configuration**:
```sql
-- LM/TOP sources: Archive everything except current month
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'CURRENT_MONTH_ONLY',
    MINIMUM_AGE_MONTHS = NULL
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'DistributeStandingFacilities'
  AND TABLE_ID = 'LM_STANDING_FACILITIES';
```

**Use Case**:
- LM dissemination feeds (daily/intraday updates)
- TOP operational data requiring current month access
- Sources where historical data is rarely accessed

**Behavior**:
- January data: Archived on February 1st
- February data: Remains in ODS bucket during February
- March 1st: February data archived, March data active

### 3. MINIMUM_AGE_MONTHS

Archives only data older than specified number of months, allowing multi-month retention in ODS bucket.

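The cutoff for this strategy is anchored to the first day of the current month rather than to today's date. A small sketch, using fixed dates in place of SYSDATE and a hard-coded 6 in place of the configured MINIMUM_AGE_MONTHS (both are illustrative assumptions), shows the resulting cutoff:

```sql
-- Illustrative only: run_date stands in for SYSDATE; 6 stands in for MINIMUM_AGE_MONTHS.
SELECT run_date,
       ADD_MONTHS(TRUNC(run_date, 'MM'), -6) AS archive_cutoff
FROM (
    SELECT DATE '2026-02-15' AS run_date FROM DUAL
    UNION ALL
    SELECT DATE '2026-03-01' FROM DUAL
);
-- run_date 2026-02-15 -> archive_cutoff 2025-08-01 (July 2025 and earlier qualifies)
-- run_date 2026-03-01 -> archive_cutoff 2025-09-01 (August 2025 and earlier qualifies)
```

The cutoff does not move within a month: any run date in February 2026 gives the same 2025-08-01 boundary.
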
**WHERE Clause**:
```sql
workflow_start < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS)
```

**Configuration**:
```sql
-- CSDB: Archive only data older than 6 months
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 6
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID IN ('CSDB_DEBT', 'CSDB_DEBT_DAILY');
```

**Use Case**:
- CSDB securities/ratings data requiring 6-month retention
- Regulatory compliance with specific retention periods
- Analytical workloads needing multi-month historical access

**Behavior** (with MINIMUM_AGE_MONTHS = 6):
- February 2026: Archives data from July 2025 and earlier
- March 2026: Archives data from August 2025 and earlier
- Keeps 6 months of data always available in ODS bucket

### 4. HYBRID

Combines CURRENT_MONTH_ONLY and MINIMUM_AGE_MONTHS logic - archives data from previous months AND older than minimum age.

**WHERE Clause**:
```sql
TRUNC(workflow_start, 'MM') < TRUNC(SYSDATE, 'MM')
AND workflow_start < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS)
```

**Configuration**:
```sql
-- Advanced: Current month + 3 months minimum
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'HYBRID',
    MINIMUM_AGE_MONTHS = 3
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'SPECIAL_SOURCE'
  AND TABLE_ID = 'SPECIAL_TABLE';
```

**Use Case**: Advanced scenarios requiring both current month retention AND minimum age threshold.

## Configuration Validation

### Validation Trigger

**Trigger**: `TRG_BI_A_SRC_FILE_CFG_ARCH_VAL`

Automatically validates archival configuration on INSERT/UPDATE to A_SOURCE_FILE_CONFIG:

**Validation Rules**:
1. **CURRENT_MONTH_ONLY**: Requires `MINIMUM_AGE_MONTHS IS NULL`
   - Error: "Strategy CURRENT_MONTH_ONLY requires MINIMUM_AGE_MONTHS to be NULL"
2. **MINIMUM_AGE_MONTHS**: Requires `MINIMUM_AGE_MONTHS IS NOT NULL`
   - Error: "Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set"
3. **HYBRID**: Requires `MINIMUM_AGE_MONTHS IS NOT NULL`
   - Error: "Strategy HYBRID requires MINIMUM_AGE_MONTHS to be set"

**Example Validation Error**:
```sql
-- This will fail validation
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = NULL  -- ERROR: Required for this strategy
WHERE ...;

-- Error: ORA-20001: Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set
```

## Data Lifecycle Workflow

### Standard File Processing Flow

```
┌─────────────────────────────────────────────────────────────┐
│                  FILE PROCESSING LIFECYCLE                  │
└─────────────────────────────────────────────────────────────┘

1. INBOX Bucket (Validation)
   ├─ File arrives from source system
   ├─ FILE_MANAGER.PROCESS_SOURCE_FILE validates structure
   ├─ Status: RECEIVED → VALIDATED → READY_FOR_INGESTION
   └─ FILE_MANAGER.MOVE_FILE relocates to ODS bucket

2. ODS Bucket (Operational Data)
   ├─ Active data processing (Airflow + DBT)
   ├─ External tables read data from bucket
   ├─ Status: INGESTED
   └─ FILE_ARCHIVER.ARCHIVE_TABLE_DATA archives based on strategy

3. ARCHIVE Bucket (Long-term Storage)
   ├─ Historical data in Parquet format
   ├─ Hive-style partitioning: PARTITION_YEAR=/PARTITION_MONTH=
   ├─ Status: ARCHIVED
   └─ Optimized for big data analytics (Spark, Hive)
```

### Archival Process

The FILE_ARCHIVER package automatically manages data movement from ODS to ARCHIVE:

**Key Procedures**:
- `ARCHIVE_TABLE_DATA` - Main archival procedure using strategy-specific WHERE clause
- `GET_ARCHIVAL_WHERE_CLAUSE` - Returns WHERE clause based on configured strategy
- `GATHER_TABLE_STAT` - Calculates archival statistics using strategy logic

**Archival Execution**:
```sql
-- Triggered by FILE_MANAGER or scheduled job
BEGIN
    CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
        pSourceFileConfig => vSourceFileConfigRecord
    );
END;
/
```

**Strategy-Based Filtering**:
- Package retrieves ARCHIVAL_STRATEGY from A_SOURCE_FILE_CONFIG
- GET_ARCHIVAL_WHERE_CLAUSE generates appropriate WHERE clause
- Data matching criteria moved from ODS to ARCHIVE bucket
- Parquet format with Hive-style partitioning applied

## Configuration Examples

### Example 1: Configure LM Standing Facilities (CURRENT_MONTH_ONLY)

```sql
-- Keep only current month data in ODS bucket
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'CURRENT_MONTH_ONLY',
    MINIMUM_AGE_MONTHS = NULL
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'DistributeStandingFacilities'
  AND TABLE_ID = 'LM_STANDING_FACILITIES';

COMMIT;

-- Verify configuration
SELECT SOURCE_FILE_ID, TABLE_ID, ARCHIVAL_STRATEGY, MINIMUM_AGE_MONTHS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_ID = 'DistributeStandingFacilities';
```

### Example 2: Configure CSDB Debt (MINIMUM_AGE_MONTHS)

```sql
-- Retain 6 months of data in ODS bucket
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 6
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID = 'CSDB_DEBT';

COMMIT;

-- Verify configuration
SELECT SOURCE_FILE_ID, TABLE_ID, ARCHIVAL_STRATEGY, MINIMUM_AGE_MONTHS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'CSDB_DEBT';
```

### Example 3: Bulk Configuration for LM Source

```sql
-- Configure all 19 LM tables with CURRENT_MONTH_ONLY strategy
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'CURRENT_MONTH_ONLY',
    MINIMUM_AGE_MONTHS = NULL
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID IN (
      'DistributeStandingFacilities',
      'DistributeTTS',
      'DistributeAdHocAdjustments',
      'DistributeBalanceSheet',
      'DistributeCSMAdjustments',
      'DistributeCurrentAccounts',
      'DistributeForecast',
      'DistributeQREAdjustments'
  );

COMMIT;

-- Verify bulk configuration
SELECT SOURCE_FILE_ID,
       COUNT(*) AS TABLE_COUNT,
       MAX(ARCHIVAL_STRATEGY) AS STRATEGY,
       MAX(MINIMUM_AGE_MONTHS) AS MIN_AGE
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_ID LIKE 'Distribute%'
GROUP BY SOURCE_FILE_ID
ORDER BY SOURCE_FILE_ID;
```

### Example 4: View Current Archival Configuration

```sql
-- All configured tables with their archival strategies
SELECT A_SOURCE_KEY,
       SOURCE_FILE_ID,
       TABLE_ID,
       ARCHIVAL_STRATEGY,
       MINIMUM_AGE_MONTHS,
       DAYS_FOR_ARCHIVE_THRESHOLD
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
ORDER BY A_SOURCE_KEY, SOURCE_FILE_ID, TABLE_ID;

-- Summary by strategy
SELECT ARCHIVAL_STRATEGY,
       COUNT(*) AS TABLE_COUNT,
       MIN(MINIMUM_AGE_MONTHS) AS MIN_AGE_MIN,
       MAX(MINIMUM_AGE_MONTHS) AS MIN_AGE_MAX
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
GROUP BY ARCHIVAL_STRATEGY
ORDER BY ARCHIVAL_STRATEGY;
```

## Release 01 Configuration

### Configured Tables (MARS-828)

The following 25 Release 01 tables were configured with archival strategies:

**LM Tables (19 total) - CURRENT_MONTH_ONLY**:
- LM_STANDING_FACILITIES
- LM_STANDING_FACILITIES_HEADER
- LM_TTS_HEADER
- LM_TTS_ITEM
- LM_ADHOC_ADJUSTMENTS_HEADER
- LM_ADHOC_ADJUSTMENTS_ITEM
- LM_ADHOC_ADJUSTMENTS_ITEM_HEADER
- LM_BALANCESHEET_HEADER
- LM_BALANCESHEET_ITEM
- LM_CSM_ADJUSTMENTS_HEADER
- LM_CSM_ADJUSTMENTS_ITEM
- LM_CSM_ADJUSTMENTS_ITEM_HEADER
- LM_CURRENT_ACCOUNTS_HEADER
- LM_CURRENT_ACCOUNTS_ITEM
- LM_FORECAST_HEADER
- LM_FORECAST_ITEM
- LM_QRE_ADJUSTMENTS_HEADER
- LM_QRE_ADJUSTMENTS_ITEM
- LM_QRE_ADJUSTMENTS_ITEM_HEADER

**CSDB Tables (6 total)**:

*MINIMUM_AGE_MONTHS = 6*:
- CSDB_DEBT
- CSDB_DEBT_DAILY

*CURRENT_MONTH_ONLY*:
- CSDB_INSTR_RAT_FULL
- CSDB_INSTR_DESC_FULL
- CSDB_ISSUER_RAT_FULL
- CSDB_ISSUER_DESC_FULL

**Verification Query**:
```sql
-- Check Release 01 configuration
SELECT
    CASE
        WHEN TABLE_ID LIKE 'LM_%' THEN 'LM'
        WHEN TABLE_ID LIKE 'CSDB_%' THEN 'CSDB'
    END AS SOURCE_GROUP,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS,
    COUNT(*) AS TABLE_COUNT
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND TABLE_ID IN (
      -- 25 Release 01 tables
      'LM_STANDING_FACILITIES', 'LM_STANDING_FACILITIES_HEADER',
      'LM_TTS_HEADER', 'LM_TTS_ITEM',
      -- ... other tables
  )
GROUP BY
    CASE
        WHEN TABLE_ID LIKE 'LM_%' THEN 'LM'
        WHEN TABLE_ID LIKE 'CSDB_%' THEN 'CSDB'
    END,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS
ORDER BY SOURCE_GROUP, ARCHIVAL_STRATEGY;
```

## Troubleshooting

### Common Issues

#### Issue 1: Validation Error on Configuration Update

**Error**:
```
ORA-20001: Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set
```

**Cause**: Trigger validation failed - strategy requires MINIMUM_AGE_MONTHS but value is NULL

**Solution**:
```sql
-- Provide required MINIMUM_AGE_MONTHS value
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 6  -- Required for this strategy
WHERE ...;
```

#### Issue 2: Validation Error on CURRENT_MONTH_ONLY

**Error**:
```
ORA-20001: Strategy CURRENT_MONTH_ONLY requires MINIMUM_AGE_MONTHS to be NULL
```

**Cause**: MINIMUM_AGE_MONTHS has value but strategy doesn't use it

**Solution**:
```sql
-- Set MINIMUM_AGE_MONTHS to NULL for CURRENT_MONTH_ONLY
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'CURRENT_MONTH_ONLY',
    MINIMUM_AGE_MONTHS = NULL  -- Required to be NULL
WHERE ...;
```

#### Issue 3: Archival Not Working as Expected

**Symptoms**: Data not being archived according to strategy

**Diagnostic Steps**:
```sql
-- 1. Check configuration
SELECT SOURCE_FILE_ID, TABLE_ID, ARCHIVAL_STRATEGY,
       MINIMUM_AGE_MONTHS, DAYS_FOR_ARCHIVE_THRESHOLD
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'YOUR_TABLE';

-- 2. Check package version
SELECT CT_MRDS.FILE_ARCHIVER.GET_VERSION() FROM DUAL;
-- Expected: 3.0.0 or higher

-- 3. Check process logs
SELECT PROCESS_LOG_KEY, PROCESS_NAME, LOG_MESSAGE, LOG_LEVEL, LOG_TIMESTAMP
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCESS_NAME LIKE '%ARCHIVE%'
ORDER BY LOG_TIMESTAMP DESC
FETCH FIRST 20 ROWS ONLY;

-- 4. Test WHERE clause generation
DECLARE
    vConfig      CT_MRDS.A_SOURCE_FILE_CONFIG%ROWTYPE;
    vWhereClause VARCHAR2(4000);
BEGIN
    SELECT * INTO vConfig
    FROM CT_MRDS.A_SOURCE_FILE_CONFIG
    WHERE TABLE_ID = 'YOUR_TABLE'
      AND ROWNUM = 1;

    vWhereClause := CT_MRDS.FILE_ARCHIVER.GET_ARCHIVAL_WHERE_CLAUSE(vConfig);
    DBMS_OUTPUT.PUT_LINE('WHERE Clause: ' || vWhereClause);
END;
/
```

#### Issue 4: Package Compilation Errors After Upgrade

**Symptoms**: FILE_ARCHIVER package shows INVALID status

**Solution**:
```sql
-- Check compilation errors
SELECT * FROM USER_ERRORS
WHERE NAME = 'FILE_ARCHIVER'
  AND TYPE IN ('PACKAGE', 'PACKAGE BODY')
ORDER BY SEQUENCE;

-- Recompile package
ALTER PACKAGE CT_MRDS.FILE_ARCHIVER COMPILE SPECIFICATION;
ALTER PACKAGE CT_MRDS.FILE_ARCHIVER COMPILE BODY;

-- Verify status
SELECT object_name, object_type, status
FROM user_objects
WHERE object_name = 'FILE_ARCHIVER';
```

## Version History

### v3.1.0 (Current)
- Enhanced archival strategies
- Improved error handling
- Additional validation features

### v3.0.0 (MARS-828)
- Added ARCHIVAL_STRATEGY configuration column
- Implemented four archival strategies:
  - THRESHOLD_BASED (backward compatible)
  - CURRENT_MONTH_ONLY
  - MINIMUM_AGE_MONTHS
  - HYBRID
- Added GET_ARCHIVAL_WHERE_CLAUSE function
- Created validation trigger TRG_BI_A_SRC_FILE_CFG_ARCH_VAL
- Configured 25 Release 01 tables with appropriate strategies

### v2.0.0 (Legacy)
- Initial FILE_ARCHIVER package
- THRESHOLD_BASED archival only
- Fixed DAYS_FOR_ARCHIVE_THRESHOLD configuration

## Related Documentation

- [FILE_MANAGER Configuration Guide](FILE_MANAGER_Configuration_Guide.md) - File processing and validation
- [Package Deployment Guide](Package_Deployment_Guide.md) - Package deployment standards
- [Universal Package Tracking System](Universal_Package_Tracking_System.md) - Version tracking
- [MARS-828 README](../MARS_Packages/REL01_ADDITIONS/MARS-828/README.md) - Detailed implementation notes

## Dependencies

### Required Packages
- **CT_MRDS.ENV_MANAGER** v3.x - Error handling, logging, version tracking
- **CT_MRDS.FILE_MANAGER** v3.x - Bucket URI resolution, file processing
- **MRDS_LOADER.cloud_wrapper** - DBMS_CLOUD operations wrapper

### Database Objects
- **Table**: CT_MRDS.A_SOURCE_FILE_CONFIG - Configuration storage
- **Table**: CT_ODS.A_LOAD_HISTORY - Workflow tracking
- **Trigger**: TRG_BI_A_SRC_FILE_CFG_ARCH_VAL - Configuration validation
- **Credential**: DEF_CRED_ARN - OCI bucket access

### OCI Buckets
- **INBOX**: Incoming file validation (`'INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_NAME}/'`)
- **ODS/DATA**: Operational data processing (`'ODS/{SOURCE}/{TABLE_NAME}/'`)
- **ARCHIVE**: Historical data storage (`'ARCHIVE/{SOURCE}/{TABLE_NAME}/PARTITION_YEAR=/PARTITION_MONTH=/'`)

## Best Practices

### Strategy Selection Guidelines

1. **Use CURRENT_MONTH_ONLY when**:
   - Data is updated frequently (daily/intraday)
   - Historical data access is rare
   - ODS bucket space is limited
   - Example: LM dissemination feeds

2. **Use MINIMUM_AGE_MONTHS when**:
   - Regulatory compliance requires specific retention period
   - Analytical workloads need multi-month access
   - Data updates are infrequent
   - Example: CSDB securities data (6-month retention)

3. **Use THRESHOLD_BASED when**:
   - Maintaining backward compatibility with legacy behavior
   - Simple time-based archival is sufficient
   - Migration from FILE_ARCHIVER v2.0.0

4. **Use HYBRID when**:
   - Complex retention requirements
   - Combining current month access with minimum age threshold
   - Advanced scenarios not covered by other strategies

### Configuration Best Practices

1. **Test Configuration Changes**:
   ```sql
   -- Test on single table first
   UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
   SET ARCHIVAL_STRATEGY = 'CURRENT_MONTH_ONLY',
       MINIMUM_AGE_MONTHS = NULL
   WHERE SOURCE_FILE_ID = 'TEST_FILE'
     AND TABLE_ID = 'TEST_TABLE';

   -- Monitor archival behavior
   -- Expand to other tables after validation
   ```

2. **Verify Before Bulk Updates**:
   ```sql
   -- Preview changes with SELECT
   SELECT SOURCE_FILE_ID, TABLE_ID,
          'CURRENT_MONTH_ONLY' AS NEW_STRATEGY,
          NULL AS NEW_MIN_AGE,
          ARCHIVAL_STRATEGY AS OLD_STRATEGY,
          MINIMUM_AGE_MONTHS AS OLD_MIN_AGE
   FROM CT_MRDS.A_SOURCE_FILE_CONFIG
   WHERE SOURCE_FILE_ID LIKE 'Distribute%';

   -- Then execute UPDATE
   ```

3. **Document Configuration Decisions**:
   - Record why specific strategy was chosen
   - Note business requirements driving retention policy
   - Track configuration changes in version control

4. **Monitor Archival Performance**:
   ```sql
   -- Check archival execution logs
   SELECT PROCESS_NAME, LOG_MESSAGE, LOG_TIMESTAMP
   FROM CT_MRDS.A_PROCESS_LOG
   WHERE PROCESS_NAME LIKE '%ARCHIVE%'
     AND LOG_TIMESTAMP > SYSDATE - 7
   ORDER BY LOG_TIMESTAMP DESC;
   ```

5. **Regular Configuration Reviews**:
   - Verify strategies still match business requirements
   - Check for tables without archival configuration
   - Optimize MINIMUM_AGE_MONTHS based on actual usage patterns

## Author

- Created by: Grzegorz Michalski
- Date: 2026-02-04
- Schema: CT_MRDS
- Package: FILE_ARCHIVER
- Version: 3.1.0