mars/confluence/FILE_ARCHIVER_Guide.md
Grzegorz Michalski ce9b6eeff6 feat(FILE_MANAGER): Update package version to 3.6.3 and enhance ADD_SOURCE_FILE_CONFIG with new parameters for archival control
- Bump package version to 3.6.3 and update build date.
- Add new parameters: pIsArchiveEnabled, pIsKeepInTrash, pArchivalStrategy, pMinimumAgeMonths to ADD_SOURCE_FILE_CONFIG.
- Include pIsWorkflowSuccessRequired parameter to control workflow success requirement for archival.
- Update version history to reflect changes.

feat(A_SOURCE_FILE_CONFIG): Modify table structure to include new archival control flags

- Add IS_WORKFLOW_SUCCESS_REQUIRED column to A_SOURCE_FILE_CONFIG for workflow bypass functionality.
- Update constraints and comments for new columns.
- Ensure backward compatibility with default values.

fix(A_TABLE_STAT, A_TABLE_STAT_HIST): Extend table structures to accommodate new workflow success tracking

- Add IS_WORKFLOW_SUCCESS_REQUIRED column to both A_TABLE_STAT and A_TABLE_STAT_HIST.
- Update comments to clarify the purpose of new columns.

docs(FILE_ARCHIVER_Guide): Revise documentation to reflect new archival features and configurations

- Document new IS_WORKFLOW_SUCCESS_REQUIRED flag and its implications for archival processes.
- Update examples and configurations to align with recent changes in the database schema.
- Ensure clarity on archival strategies and their configurations.
2026-03-18 18:19:04 +01:00

FILE_ARCHIVER Configuration Guide

This document describes the archival strategies available in the FILE_ARCHIVER package for managing data lifecycle across OCI buckets (INBOX → ODS → ARCHIVE).

Overview

The FILE_ARCHIVER package provides flexible archival strategies that accommodate different data retention policies across source systems. It manages the movement of processed data from operational storage (ODS bucket) to long-term archival storage (ARCHIVE bucket) based on configurable strategies.

Key Features

  • Three Archival Strategies: THRESHOLD_BASED, MINIMUM_AGE_MONTHS (with 0=current month only), HYBRID
  • Flexible Configuration: Per-table archival strategy configuration via A_SOURCE_FILE_CONFIG
  • Workflow Bypass: IS_WORKFLOW_SUCCESS_REQUIRED flag allows archival of files from non-DBT sources
  • Validation: Automatic validation of strategy-specific configuration requirements

Package Information

  • Schema: CT_MRDS
  • Package: FILE_ARCHIVER
  • Current Version: 3.4.0
  • Dependencies: ENV_MANAGER, FILE_MANAGER, A_SOURCE_FILE_CONFIG, A_SOURCE_FILE_RECEIVED, A_WORKFLOW_HISTORY

Critical Prerequisites

⚠️ IMPORTANT: FILE_ARCHIVER requires data to be registered in the CT_MRDS.A_SOURCE_FILE_RECEIVED table.

For new system data (Airflow + DBT):

  • A_SOURCE_FILE_RECEIVED records are automatically created by FILE_MANAGER.PROCESS_SOURCE_FILE during file validation
  • No additional configuration needed - standard workflow handles registration

For legacy data migrated from Informatica + WLA system:

  • Use DATA_EXPORTER with pRegisterExport => TRUE parameter to automatically register exported files in A_SOURCE_FILE_RECEIVED
  • This enables FILE_ARCHIVER to process legacy data exports without manual registration
  • Available in both EXPORT_TABLE_DATA (single CSV) and EXPORT_TABLE_DATA_TO_CSV_BY_DATE (partitioned CSV exports)

Example - Legacy Data Export with Registration:

-- Export legacy data to DATA bucket WITH automatic registration
BEGIN
    CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA_TO_CSV_BY_DATE(
        pSchemaName     => 'OU_TOP',
        pTableName      => 'AGGREGATED_ALLOTMENT',
        pKeyColumnName  => 'A_ETL_LOAD_SET_KEY_FK',
        pBucketArea     => 'DATA',
        pFolderName     => 'legacy_export',
        pMinDate        => DATE '2024-01-01',
        pMaxDate        => DATE '2024-12-31',
        pRegisterExport => TRUE,  -- ✓ Registers files in A_SOURCE_FILE_RECEIVED
        pProcessName    => 'LEGACY_MIGRATION'
    );
END;
/

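-- Now FILE_ARCHIVER can process these files
DECLARE
    -- Example value: use the A_SOURCE_FILE_CONFIG_KEY of the exported table's configuration
    vConfigKey NUMBER := 341;
BEGIN
    CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
        pSourceFileConfigKey => vConfigKey
    );
END;
/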

Alternative approach: Export directly to the ARCHIVE bucket using DATA_EXPORTER.EXPORT_TABLE_DATA_BY_DATE with pBucketArea => 'ARCHIVE' to bypass the archival step entirely.

Archival Strategies

Strategy Overview

| Strategy | WHERE Clause Logic | Configuration Required | Primary Use Case |
|----------|--------------------|------------------------|------------------|
| THRESHOLD_BASED | Days since workflow start > threshold | ARCHIVE_THRESHOLD_DAYS | Simple time-based archival |
| MINIMUM_AGE_MONTHS | Archive data older than X months (0=current month only) | MINIMUM_AGE_MONTHS (≥0) | All sources - flexible retention (0 for LM, 6 for CSDB) |
| HYBRID | Combines month boundary + minimum age | MINIMUM_AGE_MONTHS | Advanced retention scenarios |

1. THRESHOLD_BASED (Default)

Archives data based on the number of whole days elapsed since workflow start.

WHERE Clause:

extract(day from (systimestamp - workflow_start)) > ARCHIVE_THRESHOLD_DAYS

Configuration:

UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'THRESHOLD_BASED',
    ARCHIVE_THRESHOLD_DAYS = 30,
    MINIMUM_AGE_MONTHS = NULL
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'C2D_DATA'
  AND TABLE_ID = 'C2D_TABLE';

Use Case: Simple time-based archival.
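
The day arithmetic in the THRESHOLD_BASED predicate can be tried out without any project tables. The query below is a minimal sketch that uses two fixed sample timestamps (assumed values, standing in for SYSTIMESTAMP and workflow_start) to show that EXTRACT(DAY FROM ...) returns only the whole-day part of the interval:

```sql
-- Whole days between two sample timestamps, as used by the THRESHOLD_BASED predicate.
-- The first literal stands in for SYSTIMESTAMP, the second for workflow_start.
SELECT EXTRACT(DAY FROM (TIMESTAMP '2026-03-18 12:00:00'
                       - TIMESTAMP '2026-02-10 18:00:00')) AS DAYS_SINCE_START
FROM DUAL;
```

The interval here is 35 days and 18 hours, so the query returns 35. With ARCHIVE_THRESHOLD_DAYS = 30 such a file would satisfy the predicate, and the extra 18 hours play no role in the comparison.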

2. MINIMUM_AGE_MONTHS

Archives data older than the specified number of months. Special case: MINIMUM_AGE_MONTHS = 0 archives all data from before the current month.

WHERE Clause:

workflow_start < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS)
-- When MINIMUM_AGE_MONTHS = 0: workflow_start < TRUNC(SYSDATE, 'MM')

Configuration Examples:

-- LM: Keep only current month data (MINIMUM_AGE_MONTHS = 0)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 0
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'DistributeStandingFacilities'
  AND TABLE_ID = 'LM_STANDING_FACILITIES';

-- CSDB: Retain 6 months of data (MINIMUM_AGE_MONTHS = 6)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 6
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID IN ('CSDB_DEBT', 'CSDB_DEBT_DAILY');

Use Cases:

  • MINIMUM_AGE_MONTHS = 0: LM dissemination feeds requiring current month only (daily/intraday updates)
  • MINIMUM_AGE_MONTHS = 6: CSDB securities/ratings data requiring 6-month retention
  • MINIMUM_AGE_MONTHS = N: Regulatory compliance with specific N-month retention periods

Behavior Examples:

  • With MINIMUM_AGE_MONTHS = 0:

    • January data: Archived on February 1st
    • February data: Remains in ODS bucket during February
    • March 1st: February data archived, March data active
  • With MINIMUM_AGE_MONTHS = 6:

    • February 2026: Archives data from July 2025 and earlier
    • March 2026: Archives data from August 2025 and earlier
    • Keeps current month + 6 previous months (7 months total) in ODS bucket
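
The cutoff arithmetic behind these behavior examples can be checked without touching any project table. The query below is a minimal sketch: the sample run dates and the names samples, run_date, min_age and archive_cutoff are illustrative assumptions, and run_date simply stands in for SYSDATE in the WHERE clause shown earlier.

```sql
-- Evaluate the MINIMUM_AGE_MONTHS cutoff for sample run dates.
-- run_date stands in for SYSDATE; min_age stands in for MINIMUM_AGE_MONTHS.
WITH samples (run_date, min_age) AS (
    SELECT DATE '2026-02-15', 0 FROM DUAL UNION ALL
    SELECT DATE '2026-02-15', 6 FROM DUAL UNION ALL
    SELECT DATE '2026-03-15', 6 FROM DUAL
)
SELECT run_date,
       min_age,
       ADD_MONTHS(TRUNC(run_date, 'MM'), -min_age) AS archive_cutoff
FROM samples
ORDER BY run_date, min_age;
```

Files with workflow_start earlier than archive_cutoff qualify for archival. The three rows yield cutoffs of 1 February 2026, 1 August 2025 and 1 September 2025, which corresponds to "January and earlier", "July 2025 and earlier" and "August 2025 and earlier".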

3. HYBRID

Combines month boundary check with minimum age threshold - archives data from previous months AND older than minimum age.

WHERE Clause:

TRUNC(workflow_start, 'MM') < TRUNC(SYSDATE, 'MM')
AND workflow_start < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS)

Configuration:

-- Advanced: Current month + 3 months minimum
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'HYBRID',
    MINIMUM_AGE_MONTHS = 3
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'SPECIAL_SOURCE'
  AND TABLE_ID = 'SPECIAL_TABLE';

Use Case: Advanced scenarios requiring both current month retention AND minimum age threshold.
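
To see how the two HYBRID conditions interact, the predicate can be applied to a handful of sample dates. This is only a sketch: DATE '2026-03-18' is an assumed stand-in for SYSDATE, 3 stands in for MINIMUM_AGE_MONTHS, and the names samples and decision are illustrative.

```sql
-- Apply the HYBRID predicate to sample workflow_start values.
WITH samples (workflow_start) AS (
    SELECT DATE '2026-03-05' FROM DUAL UNION ALL  -- current month
    SELECT DATE '2026-01-20' FROM DUAL UNION ALL  -- earlier month, younger than the cutoff
    SELECT DATE '2025-12-01' FROM DUAL UNION ALL  -- exactly on the cutoff
    SELECT DATE '2025-11-30' FROM DUAL            -- older than the cutoff
)
SELECT workflow_start,
       CASE
           WHEN TRUNC(workflow_start, 'MM') < TRUNC(DATE '2026-03-18', 'MM')
            AND workflow_start < ADD_MONTHS(TRUNC(DATE '2026-03-18', 'MM'), -3)
           THEN 'ARCHIVE'
           ELSE 'KEEP'
       END AS decision
FROM samples
ORDER BY workflow_start;
```

Only the 2025-11-30 row is marked ARCHIVE, because the cutoff is 1 December 2025 and the comparison is strict. For any non-negative MINIMUM_AGE_MONTHS the second condition already implies the first, so the month-boundary check only changes the result when the value is negative.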

Archival Triggering Logic

Strategy-Specific Execution Behavior

The FILE_ARCHIVER package uses different triggering logic depending on the configured archival strategy:

MINIMUM_AGE_MONTHS Strategy (Threshold-Independent)

Behavior: Archives data immediately when the age criterion is met, without checking archival thresholds.

-- Executed when MINIMUM_AGE_MONTHS strategy is configured
IF vSourceFileConfig.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS' THEN
   vArchivalTriggeredBy := 'AGE_BASED';
   -- Proceeds with archival regardless of FILES_COUNT, ROWS_COUNT, or BYTES_SUM
END IF;

Why: This strategy is designed for strict retention policies where data must be archived based on age alone (e.g., regulatory compliance requiring current month only).

THRESHOLD_BASED and HYBRID Strategies (Threshold-Dependent)

Behavior: Archives data only when at least one of the following thresholds is reached or exceeded (the package compares with >=):

  1. ARCHIVE_THRESHOLD_FILES_COUNT - Number of files eligible for archival
  2. ARCHIVE_THRESHOLD_ROWS_COUNT - Number of rows eligible for archival
  3. ARCHIVE_THRESHOLD_BYTES_SUM - Total size in bytes eligible for archival

-- Executed for THRESHOLD_BASED and HYBRID strategies
IF vTableStat.OVER_ARCH_THRESOLD_FILE_COUNT >= vSourceFileConfig.ARCHIVE_THRESHOLD_FILES_COUNT THEN
   vArchivalTriggeredBy := 'FILES_COUNT';
ELSIF vTableStat.OVER_ARCH_THRESOLD_ROW_COUNT >= vSourceFileConfig.ARCHIVE_THRESHOLD_ROWS_COUNT THEN
   vArchivalTriggeredBy := 'ROWS_COUNT';
ELSIF vTableStat.OVER_ARCH_THRESOLD_SIZE >= vSourceFileConfig.ARCHIVE_THRESHOLD_BYTES_SUM THEN
   vArchivalTriggeredBy := 'BYTES_SUM';
END IF;

Why: These strategies provide performance optimization by avoiding unnecessary archival operations when data volume is small.

Configuration Example:

-- Set archival thresholds for THRESHOLD_BASED strategy
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVE_THRESHOLD_FILES_COUNT = 10,    -- Archive when 10+ files eligible
    ARCHIVE_THRESHOLD_ROWS_COUNT = 100000,  -- Archive when 100k+ rows eligible
    ARCHIVE_THRESHOLD_BYTES_SUM = 104857600 -- Archive when 100MB+ eligible
WHERE ARCHIVAL_STRATEGY = 'THRESHOLD_BASED'
  AND TABLE_ID = 'YOUR_TABLE';

Important: For MINIMUM_AGE_MONTHS strategy, these threshold values are ignored - archival proceeds based on age alone.
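
The first-match behavior of the threshold check can be illustrated with a self-contained PL/SQL block. This is a sketch under stated assumptions: triggered_by and its parameters are hypothetical names, and the numbers are sample values. The package itself reads the real figures from vTableStat and vSourceFileConfig.

```sql
-- Sketch of the first-match threshold check with sample numbers.
-- In SQL*Plus or SQLcl, run SET SERVEROUTPUT ON first to see the output.
DECLARE
    FUNCTION triggered_by(
        pFiles     NUMBER, pRows     NUMBER, pBytes     NUMBER,
        pFileLimit NUMBER, pRowLimit NUMBER, pByteLimit NUMBER
    ) RETURN VARCHAR2 IS
    BEGIN
        IF pFiles >= pFileLimit THEN
            RETURN 'FILES_COUNT';
        ELSIF pRows >= pRowLimit THEN
            RETURN 'ROWS_COUNT';
        ELSIF pBytes >= pByteLimit THEN
            RETURN 'BYTES_SUM';
        END IF;
        RETURN NULL;  -- no threshold reached: archival is not triggered
    END triggered_by;
BEGIN
    -- 4 files, 250k rows, 20 MB against limits of 10 files, 100k rows, 100 MB
    DBMS_OUTPUT.PUT_LINE(NVL(triggered_by(4, 250000, 20971520, 10, 100000, 104857600), 'NOT TRIGGERED'));
    -- 4 files, 5k rows, 20 MB against the same limits
    DBMS_OUTPUT.PUT_LINE(NVL(triggered_by(4, 5000, 20971520, 10, 100000, 104857600), 'NOT TRIGGERED'));
END;
/
```

The first call prints ROWS_COUNT, since the row limit is the first one reached, and the second prints NOT TRIGGERED. Because the checks run in a fixed order, only one trigger reason is reported even when several limits are reached at once.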

Configuration Validation

Validation Trigger

Trigger: TRG_BI_A_SRC_FILE_CFG_ARCH_VAL

Automatically validates archival configuration on INSERT/UPDATE to A_SOURCE_FILE_CONFIG:

Validation Rules:

  1. MINIMUM_AGE_MONTHS: Requires MINIMUM_AGE_MONTHS IS NOT NULL AND MINIMUM_AGE_MONTHS >= 0

    • Error: "Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set (≥0)"
  2. HYBRID: Requires MINIMUM_AGE_MONTHS IS NOT NULL

    • Error: "Strategy HYBRID requires MINIMUM_AGE_MONTHS to be set"

Example Validation Error:

-- This will fail validation
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = NULL  -- ERROR: Required for this strategy
WHERE ...;

-- Error: ORA-20001: Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set
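
The two rules can be expressed compactly as a local PL/SQL procedure. The block below is a sketch, not the trigger source: check_config is a hypothetical name, and reusing error number -20001 for the HYBRID rule is an assumption, since the guide only shows that number for the MINIMUM_AGE_MONTHS rule.

```sql
-- Sketch of the two validation rules as a local procedure.
-- check_config is a hypothetical name; the real checks live in the trigger.
DECLARE
    PROCEDURE check_config(pStrategy VARCHAR2, pMinAgeMonths NUMBER) IS
    BEGIN
        IF pStrategy = 'MINIMUM_AGE_MONTHS'
           AND (pMinAgeMonths IS NULL OR pMinAgeMonths < 0) THEN
            RAISE_APPLICATION_ERROR(-20001,
                'Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set (≥0)');
        ELSIF pStrategy = 'HYBRID' AND pMinAgeMonths IS NULL THEN
            RAISE_APPLICATION_ERROR(-20001,
                'Strategy HYBRID requires MINIMUM_AGE_MONTHS to be set');
        END IF;
    END check_config;
BEGIN
    check_config('THRESHOLD_BASED', NULL);   -- passes: no month setting required
    check_config('MINIMUM_AGE_MONTHS', 0);   -- passes: 0 is a valid value
    check_config('HYBRID', NULL);            -- raises ORA-20001
END;
/
```

Running the block ends with the HYBRID error from the third call; the first two calls return silently.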

Archival Control Configuration

IS_ARCHIVE_ENABLED Column

Controls whether archival is enabled for specific table configuration.

Column: A_SOURCE_FILE_CONFIG.IS_ARCHIVE_ENABLED (CHAR(1), DEFAULT 'N' NOT NULL)

Values:

  • 'Y' - Table is eligible for archival processing
  • 'N' (default) - Table is excluded from archival (batch operations skip this config)

Use Cases:

  • Disable archival for specific tables without removing configuration
  • Temporarily suspend archival during data migration or troubleshooting
  • Selective archival in batch operations

Configuration Example:

-- Disable archival for specific table
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_ARCHIVE_ENABLED = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID = 'CSDB_DEBT';
COMMIT;

-- Re-enable archival
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_ARCHIVE_ENABLED = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID = 'CSDB_DEBT';
COMMIT;

-- Check archival status
SELECT 
    SOURCE_FILE_ID,
    TABLE_ID,
    IS_ARCHIVE_ENABLED,
    ARCHIVAL_STRATEGY
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
ORDER BY SOURCE_FILE_ID, TABLE_ID;

IS_KEEP_IN_TRASH Column

Controls TRASH folder retention policy for archived files.

Column: A_SOURCE_FILE_CONFIG.IS_KEEP_IN_TRASH (CHAR(1), DEFAULT 'N' NOT NULL)

Values:

  • 'Y' - CSV files kept in TRASH folder after archival (status: ARCHIVED_AND_TRASHED)
  • 'N' (default) - CSV files deleted from TRASH folder after archival (status: ARCHIVED_AND_PURGED)

Benefits of TRASH Retention ('Y'):

  • Safety net for rollback if archival issues discovered
  • Supports compliance and audit requirements
  • Enables file restoration via RESTORE_FILE_FROM_TRASH procedure

Benefits of TRASH Cleanup ('N'):

  • Reduces storage costs in DATA bucket
  • Simplifies bucket management
  • Appropriate for non-critical or test data

Configuration Example:

-- Production: Keep files in TRASH (recommended)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_KEEP_IN_TRASH = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'LM'
  AND TABLE_ID LIKE 'LM_%';
COMMIT;

-- Test environment: Cleanup TRASH to save storage
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_KEEP_IN_TRASH = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'TEST_SOURCE';
COMMIT;

-- Bulk configuration by source
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_KEEP_IN_TRASH = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID IN ('CSDB', 'C2D', 'LM');
COMMIT;

IS_WORKFLOW_SUCCESS_REQUIRED Column

Controls whether archival requires WORKFLOW_SUCCESSFUL='Y' in A_WORKFLOW_HISTORY. Added in MARS-1409.

Column: A_SOURCE_FILE_CONFIG.IS_WORKFLOW_SUCCESS_REQUIRED (CHAR(1), DEFAULT 'Y' NOT NULL)

Values:

  • 'Y' (default) - Only files with WORKFLOW_SUCCESSFUL='Y' are eligible for archival (standard Airflow+DBT flow)
  • 'N' - Archival proceeds regardless of workflow completion status (bypass for manual/non-DBT sources)

Use Cases:

  • 'Y': All standard INBOX-validated sources (LM, CSDB, C2D) - ensures only fully-processed files are archived
  • 'N': Legacy data migrated via DATA_EXPORTER, manual uploads, or any source without DBT workflow tracking

GATHER_TABLE_STAT behavior:

  • 'Y': Statistics (file count, row count, byte sum) counted only from files with WORKFLOW_SUCCESSFUL='Y'
  • 'N': Statistics counted from all INGESTED files regardless of workflow outcome
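
The effect of the flag on the counted files can be approximated with a diagnostic query. This is a sketch under assumptions: it supposes that A_WORKFLOW_HISTORY lives in the CT_MRDS schema and joins on A_WORKFLOW_HISTORY_KEY, and it ignores the strategy-specific WHERE clause that GATHER_TABLE_STAT also applies.

```sql
-- Count INGESTED files per configuration, honoring IS_WORKFLOW_SUCCESS_REQUIRED.
-- Assumption: CT_MRDS.A_WORKFLOW_HISTORY joins on A_WORKFLOW_HISTORY_KEY.
SELECT SFC.SOURCE_FILE_ID,
       SFC.TABLE_ID,
       SFC.IS_WORKFLOW_SUCCESS_REQUIRED,
       COUNT(*)                 AS FILES_COUNT,
       SUM(SFR.TOTAL_RECORDS)   AS ROWS_COUNT,
       SUM(SFR.FILE_SIZE_BYTES) AS BYTES_SUM
FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC
JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR
    ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY
LEFT JOIN CT_MRDS.A_WORKFLOW_HISTORY WH
    ON WH.A_WORKFLOW_HISTORY_KEY = SFR.A_WORKFLOW_HISTORY_KEY
WHERE SFR.PROCESSING_STATUS = 'INGESTED'
  AND (SFC.IS_WORKFLOW_SUCCESS_REQUIRED = 'N'   -- bypass: count every INGESTED file
       OR WH.WORKFLOW_SUCCESSFUL = 'Y')         -- default: only successful workflows
GROUP BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.IS_WORKFLOW_SUCCESS_REQUIRED
ORDER BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID;
```

The LEFT JOIN keeps files that have no workflow record at all, which matters for configurations set to 'N'.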

Configuration Example:

-- Standard source: require DBT workflow completion (default)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_WORKFLOW_SUCCESS_REQUIRED = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'LM';
COMMIT;

-- Non-DBT source: bypass workflow check
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_WORKFLOW_SUCCESS_REQUIRED = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'MANUAL_UPLOAD';
COMMIT;

-- Or set at configuration time via ADD_SOURCE_FILE_CONFIG
BEGIN
    CT_MRDS.FILE_MANAGER.ADD_SOURCE_FILE_CONFIG(
        pSourceKey                 => 'MANUAL',
        pSourceFileType            => 'INPUT',
        pSourceFileId              => 'MANUAL_UPLOAD',
        pSourceFileDesc            => 'Manual data upload without DBT',
        pSourceFileNamePattern     => 'manual_*.csv',
        pTableId                   => 'MY_TABLE',
        pTemplateTableName         => 'CT_ET_TEMPLATES.MY_TABLE',
        pIsWorkflowSuccessRequired => 'N'   -- bypass workflow check
    );
END;
/

Status Tracking in A_SOURCE_FILE_RECEIVED

The FILE_ARCHIVER tracks file lifecycle through the PROCESSING_STATUS column in CT_MRDS.A_SOURCE_FILE_RECEIVED table:

Status Progression:

INGESTED → ARCHIVED_AND_TRASHED → ARCHIVED_AND_PURGED (optional)
                ↓
            INGESTED (via RESTORE_FILE_FROM_TRASH)

Status Descriptions:

  • INGESTED: File successfully processed through Airflow+DBT, residing in ODS bucket
  • ARCHIVED_AND_TRASHED: File archived to Parquet in ARCHIVE bucket, CSV retained in TRASH folder (DATA bucket)
  • ARCHIVED_AND_PURGED: File archived to Parquet, CSV deleted from TRASH folder (when IS_KEEP_IN_TRASH='N')
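
A quick way to see where files currently sit in this lifecycle is to group A_SOURCE_FILE_RECEIVED by status. The query is a minimal sketch that uses only the table and column named above; FILE_COUNT is an illustrative alias.

```sql
-- Number of files in each archival-related lifecycle status.
SELECT PROCESSING_STATUS,
       COUNT(*) AS FILE_COUNT
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE PROCESSING_STATUS IN ('INGESTED', 'ARCHIVED_AND_TRASHED', 'ARCHIVED_AND_PURGED')
GROUP BY PROCESSING_STATUS
ORDER BY PROCESSING_STATUS;
```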

Associated Columns Updated During Archival:

UPDATE CT_MRDS.A_SOURCE_FILE_RECEIVED
   SET PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED',    -- Status change
       ARCH_PATH = 'archive_directory_prefix/',       -- Directory with Parquet files
       PARTITION_YEAR = 2026,                         -- Year partition value
       PARTITION_MONTH = 02                           -- Month partition value
 WHERE SOURCE_FILE_NAME = 'file.csv';

ARCH_PATH Column: Contains the directory prefix (URI) where archived Parquet files are located in the ARCHIVE bucket. Since DBMS_CLOUD.EXPORT_DATA may create multiple Parquet files with parallel execution, the system stores the directory location rather than individual filenames.

Example ARCH_PATH:

https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/namespace/b/archive/o/ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/
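
The object-name part of such a prefix follows the Hive-style pattern visible in the example. The query below sketches how it could be assembled from a year and a month number. The layout is inferred from the example path only, and ARCH_PREFIX is an illustrative alias, not a package column.

```sql
-- Build a Hive-style partition prefix from sample year and month values.
-- TO_CHAR(2, 'FM00') zero-pads the month to two digits.
SELECT 'ARCHIVE/LM/STANDING_FACILITIES'
       || '/PARTITION_YEAR='  || TO_CHAR(2026)
       || '/PARTITION_MONTH=' || TO_CHAR(2, 'FM00')
       || '/' AS ARCH_PREFIX
FROM DUAL;
```

The result is ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/, the same object prefix that follows /o/ in the example URI.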

Standard File Processing Flow

┌─────────────────────────────────────────────────────────────┐
│                   FILE PROCESSING LIFECYCLE                  │
└─────────────────────────────────────────────────────────────┘

1. INBOX Bucket (Validation)
   ├─ File arrives from source system
   ├─ FILE_MANAGER.PROCESS_SOURCE_FILE validates structure
   ├─ Status: RECEIVED → VALIDATED → READY_FOR_INGESTION
   └─ FILE_MANAGER.MOVE_FILE relocates to ODS bucket

2. ODS Bucket (Operational Data)
   ├─ Active data processing (Airflow + DBT)
   ├─ External tables read data from bucket
   ├─ Status: INGESTED
   ├─ FILE_ARCHIVER.ARCHIVE_TABLE_DATA archives based on strategy
   └─ CSV files moved to TRASH subfolder (ODS → TRASH/)

2.1 TRASH Subfolder (DATA Bucket - File Retention)
   ├─ Located in DATA bucket (e.g., TRASH/LM/TABLE_NAME)
   ├─ Stores CSV files after archival to Parquet
   ├─ Status: ARCHIVED_AND_TRASHED (when IS_KEEP_IN_TRASH = 'Y')
   ├─ Enables rollback if archival issues occur
   └─ Optional cleanup: ARCHIVED_AND_PURGED (when IS_KEEP_IN_TRASH = 'N')

3. ARCHIVE Bucket (Long-term Storage)
   ├─ Historical data in Parquet format
   ├─ Hive-style partitioning: PARTITION_YEAR=/PARTITION_MONTH=
   ├─ Status: ARCHIVED_AND_TRASHED or ARCHIVED_AND_PURGED
   └─ Optimized for big data analytics (Spark, Hive)

**Key Procedures**:
- `ARCHIVE_TABLE_DATA(pSourceFileConfigKey)` - Main archival procedure using strategy-specific WHERE clause
  - TRASH folder retention controlled by `IS_KEEP_IN_TRASH` column in A_SOURCE_FILE_CONFIG
- `ARCHIVE_ALL(pSourceFileConfigKey, pSourceKey, pArchiveAll)` - Batch archival with 3-level granularity and error handling
  - **Level 3 (Highest Priority)**: Single configuration via `pSourceFileConfigKey`
  - **Level 2 (Medium Priority)**: All configurations for source via `pSourceKey`
  - **Level 1 (Lowest Priority)**: All configurations system-wide via `pArchiveAll`
  - **Error Handling**: Continues processing other tables on individual failures
  - **Filtering**: Respects `IS_ARCHIVE_ENABLED='Y'` (skips disabled configurations)
  - **Individual TRASH Policy**: Each table's `IS_KEEP_IN_TRASH` setting applied independently
  - **Summary Reporting**: Returns counts of Archived/Skipped/Failed tables
- `GET_ARCHIVAL_WHERE_CLAUSE` - Returns WHERE clause based on configured strategy
- `GATHER_TABLE_STAT` - Calculates archival statistics using strategy logic
- `GATHER_TABLE_STAT_ALL(pSourceFileConfigKey, pSourceKey, pGatherAll)` - Batch statistics with 3-level granularity
- `RESTORE_FILE_FROM_TRASH(pSourceFileConfigKey, pSourceKey, pRestoreAll)` - Restore archived files from TRASH
- `PURGE_TRASH_FOLDER(pSourceFileConfigKey, pSourceKey, pPurgeAll)` - Purge TRASH folder with 3-level granularity

**Archival Execution**:
```sql
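-- Single table archival (TRASH retention controlled by IS_KEEP_IN_TRASH config)
DECLARE
    -- Example value: use the A_SOURCE_FILE_CONFIG_KEY of the target configuration
    vSourceFileConfigKey NUMBER := 341;
BEGIN
    CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
        pSourceFileConfigKey => vSourceFileConfigKey
    );
END;
/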

-- Batch archival: All tables for specific source
BEGIN
    CT_MRDS.FILE_ARCHIVER.ARCHIVE_ALL(
        pSourceFileConfigKey => NULL,
        pSourceKey => 'LM',  -- Archive all LM tables
        pArchiveAll => FALSE
    );
END;
/

-- Batch archival: All tables system-wide
BEGIN
    CT_MRDS.FILE_ARCHIVER.ARCHIVE_ALL(
        pSourceFileConfigKey => NULL,
        pSourceKey => NULL,
        pArchiveAll => TRUE  -- Archive all configured tables
    );
END;
/
```

Strategy-Based Filtering:

  • Package retrieves ARCHIVAL_STRATEGY from A_SOURCE_FILE_CONFIG
  • GET_ARCHIVAL_WHERE_CLAUSE generates appropriate WHERE clause
  • Only tables with IS_ARCHIVE_ENABLED = 'Y' are processed
  • Data matching criteria moved from ODS to ARCHIVE bucket
  • CSV files moved to TRASH subfolder in DATA bucket (ODS/ → TRASH/)
  • Parquet format with Hive-style partitioning applied to ARCHIVE bucket
  • TRASH retention controlled by IS_KEEP_IN_TRASH column in A_SOURCE_FILE_CONFIG

Automatic Rollback Mechanism

FILE_ARCHIVER implements automatic rollback to preserve data integrity if the archival process fails:

Process Flow:

  1. Export to ARCHIVE: Data exported to Parquet format in ARCHIVE bucket
  2. Status Update: A_SOURCE_FILE_RECEIVED records updated to 'ARCHIVED_AND_TRASHED'
  3. Move to TRASH: CSV files moved from ODS to TRASH folder (DATA bucket)
  4. Optional Cleanup: If IS_KEEP_IN_TRASH='N', files deleted from TRASH

Automatic Rollback Trigger: If any error occurs during step 3 (Move to TRASH), the system:

  • Reverts all files: Moves successfully processed files from TRASH back to ODS
  • Rolls back status: Resets A_SOURCE_FILE_RECEIVED status to 'INGESTED'
  • Logs error: Records detailed error information in A_PROCESS_LOG
  • Raises exception: Propagates error to calling process

Rollback Logic (from code):

-- If MOVE_FILE_TO_TRASH fails for any file
ELSIF vProcessControlStatus = 'MOVE_FILE_TO_TRASH_FAILURE' THEN
   FOR f in (files already moved to TRASH) LOOP
      -- Move file back from TRASH to ODS
      DBMS_CLOUD.MOVE_OBJECT(
         source_object_uri => 'TRASH/.../filename',
         target_object_uri => 'ODS/.../filename'
      );
      
      -- Revert status back to INGESTED
      UPDATE A_SOURCE_FILE_RECEIVED
         SET PROCESSING_STATUS = 'INGESTED'
       WHERE source_file_name = f.filename;
   END LOOP;
END IF;

Why This Matters: Ensures all-or-nothing archival - either all files for a YEAR_MONTH partition are successfully archived, or none are (maintains data consistency).

TRASH Management Procedures

RESTORE_FILE_FROM_TRASH

Restores files from TRASH folder back to ODS with 3-level granularity:

Level 3 (Highest Priority) - Single File Restore:

-- Restore specific file by A_SOURCE_FILE_RECEIVED_KEY
BEGIN
    CT_MRDS.FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH(
        pSourceFileReceivedKey => 12345
    );
END;
/

Level 2 (Medium Priority) - Configuration-Based Restore:

-- Restore all files for specific table configuration
BEGIN
    CT_MRDS.FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH(
        pSourceFileConfigKey => 341
    );
END;
/

Level 1 (Lowest Priority) - Global Restore:

-- Restore ALL files with ARCHIVED_AND_TRASHED status system-wide
BEGIN
    CT_MRDS.FILE_ARCHIVER.RESTORE_FILE_FROM_TRASH(
        pRestoreAll => TRUE  -- BOOLEAN argument: call from PL/SQL rather than with CALL
    );
END;
/

Restore Operations:

  • Moves files: TRASH folder → ODS folder (using DBMS_CLOUD.MOVE_OBJECT)
  • Updates status: ARCHIVED_AND_TRASHED → INGESTED
  • Clears metadata: Sets ARCH_PATH, PARTITION_YEAR, PARTITION_MONTH to NULL
  • Returns files to active processing: Makes data available for Airflow+DBT pipeline

PURGE_TRASH_FOLDER

Permanently deletes files from TRASH with 3-level granularity:

Level 3 (Highest Priority) - Single File Purge:

-- Delete specific file from TRASH
BEGIN
    CT_MRDS.FILE_ARCHIVER.PURGE_TRASH_FOLDER(
        pSourceFileReceivedKey => 12345
    );
END;
/

Level 2 (Medium Priority) - Configuration-Based Purge:

-- Delete all TRASH files for specific table configuration
BEGIN
    CT_MRDS.FILE_ARCHIVER.PURGE_TRASH_FOLDER(
        pSourceFileConfigKey => 341
    );
END;
/

Level 1 (Lowest Priority) - Global Purge:

-- Delete ALL files with ARCHIVED_AND_TRASHED status system-wide
BEGIN
    CT_MRDS.FILE_ARCHIVER.PURGE_TRASH_FOLDER(
        pPurgeAll => TRUE  -- BOOLEAN argument: call from PL/SQL rather than with CALL
    );
END;
/

Purge Operations:

  • Deletes files: Permanently removes from TRASH folder (using DBMS_CLOUD.DELETE_OBJECT)
  • Updates status: ARCHIVED_AND_TRASHED → ARCHIVED_AND_PURGED
  • Warning: Irreversible operation - files cannot be restored after purge
  • Use case: Storage optimization, compliance with data retention policies

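Important: PURGE_TRASH_FOLDER is never run automatically; it must be called explicitly. Files retained with IS_KEEP_IN_TRASH='Y' therefore stay in TRASH until they are purged, which provides an additional safety layer for data retention.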
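
Because a purge cannot be undone, it can help to review what is currently held in TRASH before calling PURGE_TRASH_FOLDER. The query below is a sketch built only from tables and columns used elsewhere in this guide; FILES_IN_TRASH and BYTES_IN_TRASH are illustrative aliases.

```sql
-- Files that a purge (or restore) would affect, summarized per configuration.
SELECT SFC.A_SOURCE_FILE_CONFIG_KEY,
       SFC.SOURCE_FILE_ID,
       SFC.TABLE_ID,
       SFC.IS_KEEP_IN_TRASH,
       COUNT(*)                 AS FILES_IN_TRASH,
       SUM(SFR.FILE_SIZE_BYTES) AS BYTES_IN_TRASH
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR
JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC
    ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY
WHERE SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED'
GROUP BY SFC.A_SOURCE_FILE_CONFIG_KEY, SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.IS_KEEP_IN_TRASH
ORDER BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID;
```

The A_SOURCE_FILE_CONFIG_KEY values returned here are the ones to pass as pSourceFileConfigKey for a configuration-level purge or restore.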

Configuration Examples

Example 1: Configure LM Standing Facilities (Current Month Only)

-- Keep only current month data in ODS bucket (MINIMUM_AGE_MONTHS = 0)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 0  -- 0 = archives all data before current month
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'DistributeStandingFacilities'
  AND TABLE_ID = 'LM_STANDING_FACILITIES';
COMMIT;

-- Verify configuration
SELECT 
    SOURCE_FILE_ID,
    TABLE_ID,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_ID = 'DistributeStandingFacilities';

Example 2: Configure CSDB Debt (MINIMUM_AGE_MONTHS)

-- Retain 6 months of data in ODS bucket
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 6
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID = 'CSDB_DEBT';
COMMIT;

-- Verify configuration
SELECT 
    SOURCE_FILE_ID,
    TABLE_ID,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'CSDB_DEBT';

Example 3: Bulk Configuration for LM Source

-- Configure all 19 LM tables with MINIMUM_AGE_MONTHS = 0 (current month only)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 0  -- 0 = keep only current month
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID IN (
    'DistributeStandingFacilities',
    'DistributeTTS',
    'DistributeAdHocAdjustments',
    'DistributeBalanceSheet',
    'DistributeCSMAdjustments',
    'DistributeCurrentAccounts',
    'DistributeForecast',
    'DistributeQREAdjustments'
  );
COMMIT;

-- Verify bulk configuration
SELECT 
    SOURCE_FILE_ID,
    COUNT(*) AS TABLE_COUNT,
    MAX(ARCHIVAL_STRATEGY) AS STRATEGY,
    MAX(MINIMUM_AGE_MONTHS) AS MIN_AGE
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_ID LIKE 'Distribute%'
GROUP BY SOURCE_FILE_ID
ORDER BY SOURCE_FILE_ID;

Example 4: View Current Archival Configuration

-- All configured tables with their archival strategies
SELECT 
    A_SOURCE_KEY,
    SOURCE_FILE_ID,
    TABLE_ID,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS,
    ARCHIVE_THRESHOLD_DAYS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
ORDER BY A_SOURCE_KEY, SOURCE_FILE_ID, TABLE_ID;

-- Summary by strategy
SELECT 
    ARCHIVAL_STRATEGY,
    COUNT(*) AS TABLE_COUNT,
    MIN(MINIMUM_AGE_MONTHS) AS MIN_AGE_MIN,
    MAX(MINIMUM_AGE_MONTHS) AS MIN_AGE_MAX
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
GROUP BY ARCHIVAL_STRATEGY
ORDER BY ARCHIVAL_STRATEGY;

Example 5: Configure Archival Control Settings

-- Complete configuration with all archival settings
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 6,
    IS_ARCHIVE_ENABLED = 'Y',      -- Enable archival
    IS_KEEP_IN_TRASH = 'Y'         -- Keep files in TRASH for safety
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID = 'CSDB_DEBT';
COMMIT;

-- Disable archival temporarily for troubleshooting
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET IS_ARCHIVE_ENABLED = 'N'  -- Batch operations will skip this table
WHERE TABLE_ID = 'CSDB_DEBT';
COMMIT;

-- Configure TRASH cleanup for test environment
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET IS_KEEP_IN_TRASH = 'N'  -- Delete files from TRASH after archival
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'TEST_SOURCE';
COMMIT;

-- View complete configuration
SELECT 
    SOURCE_FILE_ID,
    TABLE_ID,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS,
    IS_ARCHIVE_ENABLED,
    IS_KEEP_IN_TRASH
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
ORDER BY SOURCE_FILE_ID, TABLE_ID;

-- Summary by archival status
SELECT 
    IS_ARCHIVE_ENABLED,
    IS_KEEP_IN_TRASH,
    COUNT(*) AS TABLE_COUNT
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
GROUP BY IS_ARCHIVE_ENABLED, IS_KEEP_IN_TRASH
ORDER BY IS_ARCHIVE_ENABLED DESC, IS_KEEP_IN_TRASH DESC;

Release 01 Configuration

Configured Tables (MARS-828)

The following 25 Release 01 tables were configured with archival strategies:

LM Tables (19 total) - MINIMUM_AGE_MONTHS = 0 (current month only):

  • LM_STANDING_FACILITIES
  • LM_STANDING_FACILITIES_HEADER
  • LM_TTS_HEADER
  • LM_TTS_ITEM
  • LM_ADHOC_ADJUSTMENTS_HEADER
  • LM_ADHOC_ADJUSTMENTS_ITEM
  • LM_ADHOC_ADJUSTMENTS_ITEM_HEADER
  • LM_BALANCESHEET_HEADER
  • LM_BALANCESHEET_ITEM
  • LM_CSM_ADJUSTMENTS_HEADER
  • LM_CSM_ADJUSTMENTS_ITEM
  • LM_CSM_ADJUSTMENTS_ITEM_HEADER
  • LM_CURRENT_ACCOUNTS_HEADER
  • LM_CURRENT_ACCOUNTS_ITEM
  • LM_FORECAST_HEADER
  • LM_FORECAST_ITEM
  • LM_QRE_ADJUSTMENTS_HEADER
  • LM_QRE_ADJUSTMENTS_ITEM
  • LM_QRE_ADJUSTMENTS_ITEM_HEADER

CSDB Tables (6 total):

MINIMUM_AGE_MONTHS = 6 (6-month retention):

  • CSDB_DEBT
  • CSDB_DEBT_DAILY

MINIMUM_AGE_MONTHS = 0 (current month only):

  • CSDB_INSTR_RAT_FULL
  • CSDB_INSTR_DESC_FULL
  • CSDB_ISSUER_RAT_FULL
  • CSDB_ISSUER_DESC_FULL

Verification Query:

-- Check Release 01 configuration
SELECT 
    CASE 
        WHEN TABLE_ID LIKE 'LM_%' THEN 'LM'
        WHEN TABLE_ID LIKE 'CSDB_%' THEN 'CSDB'
    END AS SOURCE_GROUP,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS,
    COUNT(*) AS TABLE_COUNT
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND TABLE_ID IN (
    -- 25 Release 01 tables
    'LM_STANDING_FACILITIES', 'LM_STANDING_FACILITIES_HEADER',
    'LM_TTS_HEADER', 'LM_TTS_ITEM'
    -- ... add the remaining Release 01 tables here, comma-separated
  )
GROUP BY 
    CASE 
        WHEN TABLE_ID LIKE 'LM_%' THEN 'LM'
        WHEN TABLE_ID LIKE 'CSDB_%' THEN 'CSDB'
    END,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS
ORDER BY SOURCE_GROUP, ARCHIVAL_STRATEGY;

Troubleshooting

Common Issues

Issue 1: Validation Error on Configuration Update

Error:

ORA-20001: Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set

Cause: Trigger validation failed - strategy requires MINIMUM_AGE_MONTHS but value is NULL

Solution:

-- Provide required MINIMUM_AGE_MONTHS value
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 6  -- Required for this strategy
WHERE ...;

Issue 2: Archival Not Triggering Despite Configuration

Scenario A: MINIMUM_AGE_MONTHS strategy not archiving

-- Check files that should be archived
SELECT 
    SFR.A_SOURCE_FILE_RECEIVED_KEY,
    SFR.SOURCE_FILE_NAME,
    SFR.PROCESSING_STATUS,
    LH.LOAD_START,
    TRUNC(MONTHS_BETWEEN(SYSDATE, LH.LOAD_START)) AS MONTHS_AGE,
    SFC.MINIMUM_AGE_MONTHS AS THRESHOLD
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR
JOIN CT_ODS.A_LOAD_HISTORY LH ON SFR.A_WORKFLOW_HISTORY_KEY = LH.A_WORKFLOW_HISTORY_KEY
JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY
WHERE SFC.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS'
  AND SFR.PROCESSING_STATUS = 'INGESTED'
  AND SFC.IS_ARCHIVE_ENABLED = 'Y'
ORDER BY LH.LOAD_START;

-- Note: MINIMUM_AGE_MONTHS archives immediately (threshold-independent)
-- If files not archived, check IS_ARCHIVE_ENABLED='Y' and run ARCHIVE_TABLE_DATA

Scenario B: THRESHOLD_BASED or HYBRID strategy not archiving

-- Check if threshold reached for specific configuration
SELECT 
    SFC.SOURCE_FILE_ID,
    SFC.TABLE_ID,
    SFC.ARCHIVAL_STRATEGY,
    SFC.ARCHIVE_THRESHOLD_FILES_COUNT AS FILE_THRESHOLD,
    SFC.ARCHIVE_THRESHOLD_ROWS_COUNT AS ROW_THRESHOLD,
    SFC.ARCHIVE_THRESHOLD_BYTES_SUM AS BYTE_THRESHOLD,
    COUNT(SFR.A_SOURCE_FILE_RECEIVED_KEY) AS CURRENT_FILES,
    SUM(SFR.TOTAL_RECORDS) AS CURRENT_ROWS,
    SUM(SFR.FILE_SIZE_BYTES) AS CURRENT_BYTES
FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC
LEFT JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR 
    ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY
   AND SFR.PROCESSING_STATUS = 'INGESTED'
WHERE SFC.ARCHIVAL_STRATEGY IN ('THRESHOLD_BASED', 'HYBRID')
  AND SFC.IS_ARCHIVE_ENABLED = 'Y'
  AND SFC.A_SOURCE_FILE_CONFIG_KEY = :yourConfigKey
GROUP BY 
    SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY,
    SFC.ARCHIVE_THRESHOLD_FILES_COUNT, 
    SFC.ARCHIVE_THRESHOLD_ROWS_COUNT,
    SFC.ARCHIVE_THRESHOLD_BYTES_SUM;

-- Expected: At least ONE threshold (FILE/ROW/BYTE) must be exceeded
-- If no threshold exceeded, archival will NOT trigger (threshold-dependent behavior)
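
The "at least one threshold" rule can be tried out in isolation. The following is a minimal sketch, assuming that "exceeded" means strictly greater than and that an unset (NULL) threshold is simply never satisfied; the CTE name sample_cfg and all numbers are illustrative, and the actual comparison inside FILE_ARCHIVER may differ.

```sql
-- Illustrative values only; not taken from a real configuration.
WITH sample_cfg AS (
    SELECT 100                  AS file_threshold,
           1000000              AS row_threshold,
           CAST(NULL AS NUMBER) AS byte_threshold,   -- threshold not configured
           42                   AS current_files,
           1250000              AS current_rows,
           5000000              AS current_bytes
    FROM DUAL
)
SELECT CASE
           -- A comparison against NULL is never TRUE, so an unset threshold
           -- cannot satisfy the condition in this sketch.
           WHEN current_files > file_threshold
             OR current_rows  > row_threshold
             OR current_bytes > byte_threshold
           THEN 'THRESHOLD_REACHED'
           ELSE 'BELOW_THRESHOLD'
       END AS THRESHOLD_STATE
FROM sample_cfg;
```

With these sample values only the row count is above its threshold, so the query returns THRESHOLD_REACHED. Replacing the sample values with the figures from the query above gives a quick read on whether a configuration is waiting on its thresholds.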

Issue 3: ARCH_PATH Contains Directory Not Filename

Symptoms: A_SOURCE_FILE_RECEIVED.ARCH_PATH shows folder path instead of specific file

Explanation: This is expected behavior:

-- Example ARCH_PATH value
SELECT ARCH_PATH 
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED'
  AND ROWNUM = 1;

-- Result (example):
-- https://objectstorage.../ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/

-- Reason: DBMS_CLOUD.EXPORT_DATA with parallel execution creates multiple Parquet files:
-- - STANDING_FACILITIES_part_00001.parquet
-- - STANDING_FACILITIES_part_00002.parquet
-- - ...
-- System stores directory prefix to track ALL generated files

To List Actual Parquet Files:

-- Use DBMS_CLOUD.LIST_OBJECTS with ARCH_PATH as prefix
SELECT object_name, bytes, created
FROM TABLE(DBMS_CLOUD.LIST_OBJECTS(
    credential_name => 'OCI$RESOURCE_PRINCIPAL',
    location_uri => 'https://objectstorage.../b/archive/o/'
))
WHERE object_name LIKE 'ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/%';

Issue 4: Files Remain in TRASH Folder

Symptoms: Files not deleted from TRASH after archival

Cause: Configuration has IS_KEEP_IN_TRASH='Y' (retain files in TRASH)

Verification:

-- Check TRASH policy for configuration
SELECT 
    SOURCE_FILE_ID,
    TABLE_ID,
    IS_KEEP_IN_TRASH,
    CASE IS_KEEP_IN_TRASH
        WHEN 'Y' THEN 'Files RETAINED in TRASH (manual purge required)'
        WHEN 'N' THEN 'Files DELETED immediately after archival'
    END AS TRASH_BEHAVIOR
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'YOUR_TABLE';

Solutions:

-- Option A: Change configuration to auto-delete (permanent change)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_KEEP_IN_TRASH = 'N'  -- Auto-delete from TRASH after archival
WHERE TABLE_ID = 'YOUR_TABLE';
COMMIT;

-- Option B: Manually purge TRASH for specific table (one-time action)
BEGIN
    CT_MRDS.FILE_ARCHIVER.PURGE_TRASH_FOLDER(
        pSourceFileConfigKey => :yourConfigKey
    );
END;
/

-- Option C: Purge all TRASH system-wide (use with caution)
BEGIN
    CT_MRDS.FILE_ARCHIVER.PURGE_TRASH_FOLDER(
        pPurgeAll => TRUE
    );
END;
/

Issue 5: Automatic Rollback Occurred

Symptoms: Files unexpectedly back in INGESTED status, archival process reported failure

Cause: Error during "Move to TRASH" step triggered automatic rollback

Investigation:

-- Check process logs for rollback events
SELECT 
    PROCESS_LOG_KEY,
    LOG_LEVEL,
    LOG_MESSAGE,
    PARAMETERS,
    LOG_TIMESTAMP
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCESS_NAME = 'ARCHIVE_TABLE_DATA'
  AND (LOG_MESSAGE LIKE '%rollback%' OR LOG_MESSAGE LIKE '%MOVE_FILE_TO_TRASH_FAILURE%')  -- parentheses keep the PROCESS_NAME filter applied to both patterns
ORDER BY LOG_TIMESTAMP DESC
FETCH FIRST 10 ROWS ONLY;

-- Check files that were rolled back
SELECT 
    A_SOURCE_FILE_RECEIVED_KEY,
    SOURCE_FILE_NAME,
    PROCESSING_STATUS,  -- Should be INGESTED after rollback
    ARCH_PATH,           -- Should be NULL after rollback
    PARTITION_YEAR,      -- Should be NULL after rollback
    PARTITION_MONTH      -- Should be NULL after rollback
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE A_SOURCE_FILE_CONFIG_KEY = :yourConfigKey
  AND UPDATED_AT > SYSDATE - 1  -- Last 24 hours
ORDER BY UPDATED_AT DESC;

Resolution:

  1. Investigate root cause: Check error messages in A_PROCESS_LOG
  2. Fix underlying issue: OCI permissions, bucket access, wrong credentials, etc.
  3. Re-run archival: Call ARCHIVE_TABLE_DATA again after fix

Issue 6: Archival Not Working as Expected

Symptoms: Data not being archived according to strategy

Diagnostic Steps:

-- 1. Check configuration
SELECT 
    SOURCE_FILE_ID,
    TABLE_ID,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS,
    ARCHIVE_THRESHOLD_DAYS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'YOUR_TABLE';

-- 2. Check package version
SELECT CT_MRDS.FILE_ARCHIVER.GET_VERSION() FROM DUAL;
-- Expected: 3.0.0 or higher

-- 3. Check process logs
SELECT 
    PROCESS_LOG_KEY,
    PROCESS_NAME,
    LOG_MESSAGE,
    LOG_LEVEL,
    LOG_TIMESTAMP
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCESS_NAME LIKE '%ARCHIVE%'
ORDER BY LOG_TIMESTAMP DESC
FETCH FIRST 20 ROWS ONLY;

-- 4. Test WHERE clause generation
DECLARE
    vConfig CT_MRDS.A_SOURCE_FILE_CONFIG%ROWTYPE;
    vWhereClause VARCHAR2(4000);
BEGIN
    SELECT * INTO vConfig
    FROM CT_MRDS.A_SOURCE_FILE_CONFIG
    WHERE TABLE_ID = 'YOUR_TABLE'
      AND ROWNUM = 1;
    
    vWhereClause := CT_MRDS.FILE_ARCHIVER.GET_ARCHIVAL_WHERE_CLAUSE(vConfig);
    DBMS_OUTPUT.PUT_LINE('WHERE Clause: ' || vWhereClause);
END;
/

Issue 7: Package Compilation Errors After Upgrade

Symptoms: FILE_ARCHIVER package shows INVALID status

Solution:

-- Check compilation errors
SELECT * FROM USER_ERRORS 
WHERE NAME = 'FILE_ARCHIVER' 
  AND TYPE IN ('PACKAGE', 'PACKAGE BODY')
ORDER BY SEQUENCE;

-- Recompile package
ALTER PACKAGE CT_MRDS.FILE_ARCHIVER COMPILE SPECIFICATION;
ALTER PACKAGE CT_MRDS.FILE_ARCHIVER COMPILE BODY;

-- Verify status
SELECT object_name, object_type, status 
FROM user_objects 
WHERE object_name = 'FILE_ARCHIVER';
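
USER_ERRORS and USER_OBJECTS only cover objects owned by the connected user. When the session is connected as a different schema, a sketch along the following lines can be used instead. It assumes the package is owned by CT_MRDS (as the ALTER PACKAGE statements above suggest) and that the session has privileges that make the package visible in the ALL_* views.

```sql
-- Compilation errors for the package, regardless of the connected schema
SELECT TYPE, LINE, POSITION, TEXT
FROM ALL_ERRORS
WHERE OWNER = 'CT_MRDS'
  AND NAME = 'FILE_ARCHIVER'
  AND TYPE IN ('PACKAGE', 'PACKAGE BODY')
ORDER BY TYPE, SEQUENCE;

-- Status of specification and body, with the time of the last DDL change
SELECT OBJECT_TYPE, STATUS, LAST_DDL_TIME
FROM ALL_OBJECTS
WHERE OWNER = 'CT_MRDS'
  AND OBJECT_NAME = 'FILE_ARCHIVER'
ORDER BY OBJECT_TYPE;
```

An empty result from the first query together with STATUS = 'VALID' for both rows of the second indicates that the recompilation succeeded.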

Diagnostic Queries for Monitoring

Query 1: Status Distribution Across All Files

-- Overall file status distribution
SELECT 
    PROCESSING_STATUS,
    COUNT(*) AS FILE_COUNT,
    ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 2) AS PERCENTAGE,
    MIN(CREATED_AT) AS OLDEST_FILE,
    MAX(CREATED_AT) AS NEWEST_FILE
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
GROUP BY PROCESSING_STATUS
ORDER BY FILE_COUNT DESC;

Query 2: Files in TRASH (Archived but Not Purged)

-- Files currently in TRASH folder (status ARCHIVED_AND_TRASHED)
SELECT 
    SFR.A_SOURCE_FILE_RECEIVED_KEY,
    SFC.SOURCE_FILE_ID,
    SFC.TABLE_ID,
    SFR.SOURCE_FILE_NAME,
    SFR.ARCH_PATH,
    SFR.PARTITION_YEAR,
    SFR.PARTITION_MONTH,
    SFR.FILE_SIZE_BYTES,
    SFR.UPDATED_AT AS ARCHIVED_AT,
    TRUNC(SYSDATE - SFR.UPDATED_AT) AS DAYS_IN_TRASH,
    SFC.IS_KEEP_IN_TRASH AS TRASH_POLICY
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR
JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY
WHERE SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED'
ORDER BY SFR.UPDATED_AT DESC;

Query 3: Archival Activity by Configuration

-- Archival statistics per table configuration
SELECT 
    SFC.SOURCE_FILE_ID,
    SFC.TABLE_ID,
    SFC.ARCHIVAL_STRATEGY,
    SFC.IS_ARCHIVE_ENABLED,
    SFC.IS_KEEP_IN_TRASH,
    COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'INGESTED' THEN 1 END) AS PENDING_ARCHIVE,
    COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' THEN 1 END) AS IN_TRASH,
    COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'ARCHIVED_AND_PURGED' THEN 1 END) AS PURGED,
    MAX(CASE WHEN SFR.PROCESSING_STATUS LIKE 'ARCHIVED%' THEN SFR.UPDATED_AT END) AS LAST_ARCHIVAL  -- Oracle has no FILTER clause; use conditional aggregation
FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC
LEFT JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY
WHERE SFC.SOURCE_FILE_TYPE = 'INPUT'
GROUP BY 
    SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY, 
    SFC.IS_ARCHIVE_ENABLED, SFC.IS_KEEP_IN_TRASH
ORDER BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID;

Query 4: Files Eligible for Archival (MINIMUM_AGE_MONTHS)

-- Files that should be archived based on MINIMUM_AGE_MONTHS strategy
SELECT 
    SFC.SOURCE_FILE_ID,
    SFC.TABLE_ID,
    SFC.MINIMUM_AGE_MONTHS AS AGE_THRESHOLD,
    COUNT(*) AS ELIGIBLE_FILES,
    SUM(SFR.FILE_SIZE_BYTES) AS TOTAL_SIZE_BYTES,
    SUM(SFR.TOTAL_RECORDS) AS TOTAL_ROWS,
    MIN(LH.LOAD_START) AS OLDEST_FILE,
    MAX(LH.LOAD_START) AS NEWEST_ELIGIBLE
FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC
JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY
JOIN CT_ODS.A_LOAD_HISTORY LH ON SFR.A_WORKFLOW_HISTORY_KEY = LH.A_WORKFLOW_HISTORY_KEY
WHERE SFC.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS'
  AND SFC.IS_ARCHIVE_ENABLED = 'Y'
  AND SFR.PROCESSING_STATUS = 'INGESTED'
  AND LH.LOAD_START < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -SFC.MINIMUM_AGE_MONTHS)
GROUP BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.MINIMUM_AGE_MONTHS
ORDER BY ELIGIBLE_FILES DESC;
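
To see which cutoff date the eligibility predicate in the query above produces, the expression can be evaluated on its own. This is a minimal sketch that reuses the ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS) expression from that query; the CTE name ages and the two sample values (0 and 6) are illustrative, and it is an assumption that the package applies the same expression internally.

```sql
-- Sample MINIMUM_AGE_MONTHS values; 0 and 6 are the values mentioned in this guide
WITH ages AS (
    SELECT 0 AS minimum_age_months FROM DUAL
    UNION ALL
    SELECT 6 AS minimum_age_months FROM DUAL
)
SELECT minimum_age_months,
       -- First day of the current month, shifted back by N months;
       -- loads strictly before this date satisfy the predicate
       ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -minimum_age_months) AS ARCHIVE_CUTOFF
FROM ages
ORDER BY minimum_age_months;
```

Run on 2026-03-18, for example, the cutoffs would be 2026-03-01 for a value of 0 (only the current month stays in ODS) and 2025-09-01 for a value of 6. Because the cutoff is aligned to the first day of a month, it can differ from the MONTHS_BETWEEN-based age shown in the Scenario A query.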

Query 5: Archival Performance Metrics

-- Recent archival operations with timing
SELECT 
    PROCESS_LOG_KEY,
    SUBSTR(PARAMETERS, 1, 100) AS CONFIG_INFO,
    LOG_TIMESTAMP AS START_TIME,
    LEAD(LOG_TIMESTAMP) OVER (PARTITION BY SUBSTR(PARAMETERS, 1, 100) ORDER BY LOG_TIMESTAMP) AS END_TIME,
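    -- CAST to DATE so the subtraction yields a number of days (a TIMESTAMP difference would be an INTERVAL)
    ROUND((CAST(LEAD(LOG_TIMESTAMP) OVER (PARTITION BY SUBSTR(PARAMETERS, 1, 100) ORDER BY LOG_TIMESTAMP) AS DATE)
          - CAST(LOG_TIMESTAMP AS DATE)) * 24 * 60, 2) AS DURATION_MINUTES,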
    CASE 
        WHEN LOG_LEVEL = 'ERROR' THEN 'FAILED'
        WHEN LOG_MESSAGE LIKE '%Archival completed%' THEN 'SUCCESS'
        ELSE 'IN_PROGRESS'
    END AS STATUS
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCESS_NAME = 'ARCHIVE_TABLE_DATA'
  AND LOG_TIMESTAMP > SYSDATE - 7  -- Last 7 days
ORDER BY LOG_TIMESTAMP DESC;

Query 6: TRASH Storage Usage

-- Estimate TRASH folder storage usage
SELECT 
    SFC.SOURCE_FILE_ID,
    COUNT(*) AS FILES_IN_TRASH,
    ROUND(SUM(SFR.FILE_SIZE_BYTES) / 1024 / 1024 / 1024, 2) AS SIZE_GB,
    MIN(SFR.UPDATED_AT) AS OLDEST_IN_TRASH,
    MAX(SFR.UPDATED_AT) AS NEWEST_IN_TRASH,
    SFC.IS_KEEP_IN_TRASH AS POLICY
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR
JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY
WHERE SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED'
GROUP BY SFC.SOURCE_FILE_ID, SFC.IS_KEEP_IN_TRASH
ORDER BY SIZE_GB DESC;

Version History

v3.4.0 (Current - 2026-03-17)

  • MARS-1409: Added IS_WORKFLOW_SUCCESS_REQUIRED flag to A_SOURCE_FILE_CONFIG
    • 'Y' (default) = archival requires WORKFLOW_SUCCESSFUL='Y' in A_WORKFLOW_HISTORY (standard Airflow+DBT flow)
    • 'N' = archive regardless of workflow status (bypass for manual/non-DBT sources)
  • IS_WORKFLOW_SUCCESS_REQUIRED stored in A_TABLE_STAT and A_TABLE_STAT_HIST at statistics gather time
  • GATHER_TABLE_STAT: conditional WORKFLOW_SUCCESSFUL='Y' filter controlled by the flag
  • ARCHIVE_TABLE_DATA: conditional WORKFLOW_SUCCESSFUL='Y' filter controlled by the flag
  • Added pIsWorkflowSuccessRequired parameter to FILE_MANAGER.ADD_SOURCE_FILE_CONFIG
  • FILE_MANAGER updated to v3.6.2+

v3.3.0 (2026-02-11)

  • BREAKING CHANGE: Removed pKeepInTrash parameter from ARCHIVE_TABLE_DATA
  • Added IS_ARCHIVE_ENABLED column to A_SOURCE_FILE_CONFIG for selective archiving control
  • Added IS_KEEP_IN_TRASH column to A_SOURCE_FILE_CONFIG (replaces pKeepInTrash parameter)
  • Added batch procedures with 3-level granularity (config/source/all):
    • ARCHIVE_ALL - Batch archival procedure
    • GATHER_TABLE_STAT_ALL - Batch statistics procedure
    • RESTORE_FILE_FROM_TRASH - Restore files from TRASH folder
    • PURGE_TRASH_FOLDER - Purge TRASH folder files
  • TRASH retention now configuration-based instead of parameter-based
  • Enhanced flexibility for archival orchestration and monitoring

v3.2.1 (2026-02-10)

  • Fixed critical bug: Status update ARCHIVED → ARCHIVED_AND_TRASHED when moving files to TRASH folder
  • Ensures proper status tracking for files retained in TRASH

v3.2.0 (2026-02-06)

  • Added pKeepInTrash parameter (DEFAULT TRUE) to ARCHIVE_TABLE_DATA
  • TRASH folder retention control for safety and compliance
  • Files kept in TRASH subfolder by default for rollback capability

v3.1.0 (2026-02-05)

  • BREAKING CHANGE: Removed CURRENT_MONTH_ONLY strategy (replaced by MINIMUM_AGE_MONTHS = 0)
  • Mathematical equivalence: CURRENT_MONTH_ONLY ≡ MINIMUM_AGE_MONTHS = 0
  • Updated trigger validation to allow MINIMUM_AGE_MONTHS >= 0 (previously >= 1)
  • Simplified architecture from 4 strategies to 3
  • Enhanced error handling
  • All 25 Release 01 tables migrated to MINIMUM_AGE_MONTHS (23 with value 0, 2 with value 6)

v3.0.0 (MARS-828 - 2026-02-04)

  • Added ARCHIVAL_STRATEGY configuration column
  • Implemented four archival strategies (later reduced to three in v3.1.0):
    • THRESHOLD_BASED (backward compatible)
    • CURRENT_MONTH_ONLY (deprecated in v3.1.0, use MINIMUM_AGE_MONTHS = 0)
    • MINIMUM_AGE_MONTHS
    • HYBRID
  • Added GET_ARCHIVAL_WHERE_CLAUSE function
  • Created validation trigger TRG_BI_A_SRC_FILE_CFG_ARCH_VAL
  • Configured 25 Release 01 tables with appropriate strategies

v2.0.0 (Legacy)

  • Initial FILE_ARCHIVER package
  • THRESHOLD_BASED archival only
  • Fixed ARCHIVE_THRESHOLD_DAYS configuration

Dependencies

Required Packages

  • CT_MRDS.ENV_MANAGER v3.x - Error handling, logging, version tracking
  • CT_MRDS.FILE_MANAGER v3.x - Bucket URI resolution, file processing
  • MRDS_LOADER.cloud_wrapper - DBMS_CLOUD operations wrapper

Database Objects

  • Table: CT_MRDS.A_SOURCE_FILE_CONFIG - Configuration storage
  • Table: CT_MRDS.A_SOURCE_FILE_RECEIVED - File processing tracking
  • Table: CT_MRDS.A_WORKFLOW_HISTORY - Workflow execution tracking (Airflow + DBT)
  • Trigger: TRG_BI_A_SRC_FILE_CFG_ARCH_VAL - Configuration validation
  • Credential: DEF_CRED_ARN - OCI bucket access

OCI Buckets

  • INBOX: Incoming file validation ('INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_NAME}/')
  • ODS/DATA: Operational data processing ('ODS/{SOURCE}/{TABLE_NAME}/')
  • TRASH: File retention subfolder in DATA bucket ('TRASH/{SOURCE}/{TABLE_NAME}/') - CSV files after archival
  • ARCHIVE: Historical data storage ('ARCHIVE/{SOURCE}/{TABLE_NAME}/PARTITION_YEAR={YYYY}/PARTITION_MONTH={MM}/')

Note: TRASH is NOT a separate bucket - it's a subfolder within the DATA bucket for file retention and rollback capability.
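
The ARCHIVE layout can be illustrated by assembling a prefix from its parts. The sketch below is only an illustration: the CTE name sample_file and its columns are hypothetical, the sample values mirror the ARCH_PATH example in the troubleshooting section, and which date column the package actually uses to derive the partition is an assumption.

```sql
-- Hypothetical sample row; the real values come from the file configuration and load history
WITH sample_file AS (
    SELECT 'LM'                  AS source_name,
           'STANDING_FACILITIES' AS table_name,
           DATE '2026-02-15'     AS load_date
    FROM DUAL
)
SELECT 'ARCHIVE/' || source_name || '/' || table_name
       || '/PARTITION_YEAR='  || TO_CHAR(load_date, 'YYYY')
       || '/PARTITION_MONTH=' || TO_CHAR(load_date, 'MM')   -- two-digit month, e.g. 02
       || '/' AS ARCHIVE_PREFIX
FROM sample_file;
```

For the sample row this yields ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/, which has the same shape as the directory prefix stored in ARCH_PATH.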

Best Practices

Strategy Selection Guidelines

  1. Use MINIMUM_AGE_MONTHS when:

    • MINIMUM_AGE_MONTHS = 0: Current month only retention
      • Data updated frequently (daily/intraday)
      • Historical data access is rare
      • ODS bucket space is limited
      • Example: LM dissemination feeds
    • MINIMUM_AGE_MONTHS = N (N > 0): Multi-month retention
      • Regulatory compliance requires specific retention period
      • Analytical workloads need N-month access
      • Data updates are infrequent
      • Example: CSDB securities data (MINIMUM_AGE_MONTHS = 6)
  2. Use THRESHOLD_BASED when:

    • Simple time-based archival is sufficient
  3. Use HYBRID when:

    • Complex retention requirements
    • Combining month boundary check with minimum age threshold
    • Advanced scenarios not covered by other strategies

Configuration Best Practices

  1. Test Configuration Changes:

    -- Test on single table first
    UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG 
    SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
        MINIMUM_AGE_MONTHS = 0  -- 0 = current month only
    WHERE SOURCE_FILE_ID = 'TEST_FILE'
      AND TABLE_ID = 'TEST_TABLE';
    
    -- Monitor archival behavior
    -- Expand to other tables after validation
    
  2. Verify Before Bulk Updates:

    -- Preview changes with SELECT
    SELECT 
        SOURCE_FILE_ID,
        TABLE_ID,
        'MINIMUM_AGE_MONTHS' AS NEW_STRATEGY,
        0 AS NEW_MIN_AGE,  -- 0 = current month only
        ARCHIVAL_STRATEGY AS OLD_STRATEGY,
        MINIMUM_AGE_MONTHS AS OLD_MIN_AGE
    FROM CT_MRDS.A_SOURCE_FILE_CONFIG
    WHERE SOURCE_FILE_ID LIKE 'Distribute%';
    
    -- Then execute UPDATE
    
  3. Document Configuration Decisions:

    • Record why specific strategy was chosen
    • Note business requirements driving retention policy
    • Track configuration changes in version control
  4. Monitor Archival Performance:

    -- Check archival execution logs
    SELECT 
        PROCESS_NAME,
        LOG_MESSAGE,
        LOG_TIMESTAMP
    FROM CT_MRDS.A_PROCESS_LOG
    WHERE PROCESS_NAME LIKE '%ARCHIVE%'
      AND LOG_TIMESTAMP > SYSDATE - 7
    ORDER BY LOG_TIMESTAMP DESC;
    
  5. Regular Configuration Reviews:

    • Verify strategies still match business requirements
    • Check for tables without archival configuration
    • Optimize MINIMUM_AGE_MONTHS based on actual usage patterns
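
The "preview, then update" step from item 2 can be made safer by checking the number of affected rows before committing. The block below is a minimal sketch under stated assumptions: vExpectedRows is a hypothetical value that should be replaced with the row count returned by the preview SELECT, and the error number -20999 is an arbitrary choice from the user-defined range, not a code used by the package.

```sql
DECLARE
    vExpectedRows CONSTANT PLS_INTEGER := 23;  -- hypothetical: row count seen in the preview SELECT
    vUpdatedRows  PLS_INTEGER;
BEGIN
    -- Same predicate as the preview query, so both touch the same rows
    UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
    SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
        MINIMUM_AGE_MONTHS = 0  -- 0 = current month only
    WHERE SOURCE_FILE_ID LIKE 'Distribute%';

    vUpdatedRows := SQL%ROWCOUNT;

    IF vUpdatedRows = vExpectedRows THEN
        COMMIT;
    ELSE
        -- Undo the change and report the mismatch instead of committing a surprise
        ROLLBACK;
        RAISE_APPLICATION_ERROR(
            -20999,
            'Expected ' || vExpectedRows || ' rows, updated ' || vUpdatedRows
        );
    END IF;
END;
/
```

Note that ROLLBACK undoes every uncommitted change in the session, so the block is best run in a session with no other pending work.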

TRASH Folder Retention Best Practices

  1. Default Behavior (IS_KEEP_IN_TRASH = 'Y' - Recommended):

    • Keeps CSV files in TRASH folder after archival
    • Provides safety net for rollback if archival issues occur
    • Supports compliance and audit requirements
    • Status: ARCHIVED_AND_TRASHED
    • Use for: Production environments, regulatory compliance, critical data
    • Configuration:
      UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
      SET IS_KEEP_IN_TRASH = 'Y'
      WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID = 'YOUR_TABLE';
      
  2. TRASH Cleanup (IS_KEEP_IN_TRASH = 'N'):

    • Deletes CSV files from TRASH folder after successful archival
    • Reduces storage costs in DATA bucket
    • Status: ARCHIVED_AND_PURGED
    • Use for: Non-critical data, storage optimization, test environments
    • Configuration:
      UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
      SET IS_KEEP_IN_TRASH = 'N'
      WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID = 'YOUR_TABLE';
      
  3. Monitoring TRASH Folder:

    -- Check files in TRASH retention
    SELECT 
        SOURCE_FILE_NAME,
        PROCESSING_STATUS,
        ARCH_FILE_NAME,
        PARTITION_YEAR,
        PARTITION_MONTH
    FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
    WHERE PROCESSING_STATUS IN ('ARCHIVED_AND_TRASHED', 'ARCHIVED_AND_PURGED')
      AND RECEPTION_DATE > SYSDATE - 30
    ORDER BY PROCESSING_STATUS, RECEPTION_DATE DESC;
    
  4. TRASH Folder Structure:

    DATA Bucket:
    ├── ODS/LM/STANDING_FACILITIES/file.csv      -- Active operational data
    └── TRASH/LM/STANDING_FACILITIES/file.csv    -- Retained after archival
    
    ARCHIVE Bucket:
    └── ARCHIVE/LM/STANDING_FACILITIES/
        └── PARTITION_YEAR=2026/
            └── PARTITION_MONTH=02/
                └── *.parquet  -- Archived data