# FILE_ARCHIVER Configuration Guide
This document describes the archival strategies available in the FILE_ARCHIVER package for managing data lifecycle across OCI buckets (INBOX → ODS → ARCHIVE).
## Overview
The FILE_ARCHIVER package provides flexible archival strategies that accommodate different data retention policies across source systems. It manages the movement of processed data from operational storage (ODS bucket) to long-term archival storage (ARCHIVE bucket) based on configurable strategies.
### Key Features
- **Three Archival Strategies**: THRESHOLD_BASED, MINIMUM_AGE_MONTHS (with 0=current month only), HYBRID
- **Flexible Configuration**: Per-table archival strategy configuration via A_SOURCE_FILE_CONFIG
- **Backward Compatible**: Default THRESHOLD_BASED strategy maintains existing behavior
- **Validation**: Automatic validation of strategy-specific configuration requirements
- **OCI Integration**: Runs DBMS_CLOUD operations through cloud_wrapper
### Package Information
- **Schema**: CT_MRDS
- **Package**: FILE_ARCHIVER
- **Current Version**: 3.2.0
- **Dependencies**: ENV_MANAGER, FILE_MANAGER, cloud_wrapper, A_SOURCE_FILE_CONFIG, A_SOURCE_FILE_RECEIVED, A_WORKFLOW_HISTORY
### Critical Prerequisites
⚠️ **IMPORTANT**: FILE_ARCHIVER requires data to be registered in the `CT_MRDS.A_SOURCE_FILE_RECEIVED` table. This table is populated automatically when files are processed through the modern Airflow + DBT workflow via `FILE_MANAGER.PROCESS_SOURCE_FILE`.
**For legacy data migrated from the Informatica + WLA system:**
- Legacy data exported using `DATA_EXPORTER` does NOT automatically create `A_SOURCE_FILE_RECEIVED` records
- Without these records, FILE_ARCHIVER **CANNOT** archive the data
- See [System Migration Guide](System_Migration_Informatica_to_Airflow_DBT.md) for workaround strategies
**Recommendation for legacy data**: Export directly to the ARCHIVE bucket using `DATA_EXPORTER.EXPORT_TABLE_DATA_BY_DATE` with `pBucketArea => 'ARCHIVE'` to bypass this requirement.
## Archival Strategies
### Strategy Overview
| Strategy | WHERE Clause Logic | Configuration Required | Primary Use Case |
|----------|-------------------|----------------------|------------------|
| `THRESHOLD_BASED` | Days since workflow start > threshold | DAYS_FOR_ARCHIVE_THRESHOLD | Legacy compatibility, simple time-based archival |
| `MINIMUM_AGE_MONTHS` | Archive data older than X months (0=current month only) | MINIMUM_AGE_MONTHS (≥0) | All sources - flexible retention (0 for LM, 6 for CSDB) |
| `HYBRID` | Combines month boundary + minimum age | MINIMUM_AGE_MONTHS | Advanced retention scenarios |
### 1. THRESHOLD_BASED (Default)
Archives data based on number of days since workflow start.
**WHERE Clause**:
```sql
extract(day from (systimestamp - workflow_start)) > DAYS_FOR_ARCHIVE_THRESHOLD
```
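To illustrate how this predicate counts days, the sketch below evaluates the same `EXTRACT(DAY FROM ...)` expression on two fixed timestamps. The literals are illustrative stand-ins for `systimestamp` and `workflow_start`, not values from the system. Subtracting two timestamps yields an `INTERVAL DAY TO SECOND`, and `EXTRACT(DAY ...)` returns only its whole-day component:

```sql
-- Illustrative literals stand in for systimestamp and workflow_start
SELECT EXTRACT(DAY FROM (TIMESTAMP '2026-02-06 12:00:00'
                       - TIMESTAMP '2026-01-01 00:00:00')) AS days_elapsed
FROM DUAL;
-- days_elapsed = 36 (the remaining 12 hours are not counted)
```

Because the comparison is strict (`>`), a threshold of 30 selects a row only once at least 31 whole days have elapsed since `workflow_start`.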
**Configuration**:
```sql
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'THRESHOLD_BASED',
    DAYS_FOR_ARCHIVE_THRESHOLD = 30,
    MINIMUM_AGE_MONTHS = NULL
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'C2D_DATA'
  AND TABLE_ID = 'C2D_TABLE';
```
**Use Case**: Simple time-based archival, backward compatible with FILE_ARCHIVER v2.0.0 behavior.
### 2. MINIMUM_AGE_MONTHS
Archives data older than the specified number of months. **Special case**: MINIMUM_AGE_MONTHS = 0 archives all data from before the current month (replaces the CURRENT_MONTH_ONLY strategy, which was removed in v3.1.0).
**WHERE Clause**:
```sql
workflow_start < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS)
-- When MINIMUM_AGE_MONTHS = 0: workflow_start < TRUNC(SYSDATE, 'MM')
```
**Configuration Examples**:
```sql
-- LM: Keep only current month data (MINIMUM_AGE_MONTHS = 0)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 0
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'DistributeStandingFacilities'
  AND TABLE_ID = 'LM_STANDING_FACILITIES';
-- CSDB: Retain 6 months of data (MINIMUM_AGE_MONTHS = 6)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
    MINIMUM_AGE_MONTHS = 6
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'CSDB'
  AND TABLE_ID IN ('CSDB_DEBT', 'CSDB_DEBT_DAILY');
```
**Use Cases**:
- **MINIMUM_AGE_MONTHS = 0**: LM dissemination feeds requiring current month only (daily/intraday updates)
- **MINIMUM_AGE_MONTHS = 6**: CSDB securities/ratings data requiring 6-month retention
- **MINIMUM_AGE_MONTHS = N**: Regulatory compliance with specific N-month retention periods
**Behavior Examples**:
- **With MINIMUM_AGE_MONTHS = 0**:
  - January data: Archived on February 1st
  - February data: Remains in ODS bucket during February
  - March 1st: February data archived, March data active
- **With MINIMUM_AGE_MONTHS = 6**:
  - February 2026: Archives data from July 2025 and earlier
  - March 2026: Archives data from August 2025 and earlier
  - Keeps current month + 6 previous months (7 months total) in ODS bucket
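
The cutoff dates in these examples can be reproduced by putting a fixed run date in place of `SYSDATE`. This is a minimal sketch: the run date `2026-02-15` and the column aliases are illustrative assumptions, and rows whose `workflow_start` falls before `archive_before` would match the WHERE clause.

```sql
-- run_date replaces SYSDATE so the result is reproducible
WITH params AS (
    SELECT DATE '2026-02-15' AS run_date FROM DUAL
),
ages AS (
    SELECT 0 AS min_age FROM DUAL
    UNION ALL
    SELECT 6 AS min_age FROM DUAL
)
SELECT a.min_age,
       TO_CHAR(ADD_MONTHS(TRUNC(p.run_date, 'MM'), -a.min_age), 'YYYY-MM-DD') AS archive_before
FROM params p
CROSS JOIN ages a
ORDER BY a.min_age;
-- min_age = 0 -> archive_before = 2026-02-01
-- min_age = 6 -> archive_before = 2025-08-01
```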
### 3. HYBRID
Combines month boundary check with minimum age threshold - archives data from previous months AND older than minimum age.
**WHERE Clause**:
```sql
TRUNC(workflow_start, 'MM') < TRUNC(SYSDATE, 'MM')
AND workflow_start < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS)
```
**Configuration**:
```sql
-- Advanced: Current month + 3 months minimum
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'HYBRID',
    MINIMUM_AGE_MONTHS = 3
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND SOURCE_FILE_ID = 'SPECIAL_SOURCE'
  AND TABLE_ID = 'SPECIAL_TABLE';
```
**Use Case**: Advanced scenarios requiring both current month retention AND minimum age threshold.
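
To see how the HYBRID predicate behaves next to the plain MINIMUM_AGE_MONTHS predicate, the sketch below evaluates both WHERE clauses, as documented above, against a few sample dates. The run date, the sample `workflow_start` values, and the column aliases are illustrative assumptions; no package code is involved.

```sql
-- Fixed run date and MINIMUM_AGE_MONTHS = 3, so the cutoff is 2025-11-01
WITH params AS (
    SELECT DATE '2026-02-15' AS run_date, 3 AS min_age FROM DUAL
),
samples AS (
    SELECT DATE '2026-02-03' AS workflow_start FROM DUAL UNION ALL
    SELECT DATE '2026-01-20' FROM DUAL UNION ALL
    SELECT DATE '2025-11-01' FROM DUAL UNION ALL
    SELECT DATE '2025-10-31' FROM DUAL
)
SELECT TO_CHAR(s.workflow_start, 'YYYY-MM-DD') AS sample_start,
       CASE WHEN s.workflow_start < ADD_MONTHS(TRUNC(p.run_date, 'MM'), -p.min_age)
            THEN 'Y' ELSE 'N' END AS min_age_match,
       CASE WHEN TRUNC(s.workflow_start, 'MM') < TRUNC(p.run_date, 'MM')
             AND s.workflow_start < ADD_MONTHS(TRUNC(p.run_date, 'MM'), -p.min_age)
            THEN 'Y' ELSE 'N' END AS hybrid_match
FROM samples s
CROSS JOIN params p
ORDER BY s.workflow_start DESC;
-- Only 2025-10-31 matches, under both predicates
```

For any non-negative MINIMUM_AGE_MONTHS, the cutoff is never later than the first day of the current month, so the age condition already implies the month-boundary condition and both predicates select the same rows. With the clauses as written, they can differ only for a negative value.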
## Configuration Validation
### Validation Trigger
**Trigger**: `TRG_BI_A_SRC_FILE_CFG_ARCH_VAL`
Automatically validates archival configuration on INSERT/UPDATE to A_SOURCE_FILE_CONFIG.
**Validation Rules**:
1. **MINIMUM_AGE_MONTHS**: Requires `MINIMUM_AGE_MONTHS IS NOT NULL AND MINIMUM_AGE_MONTHS >= 0`
   - Error: "Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set (≥0)"
2. **HYBRID**: Requires `MINIMUM_AGE_MONTHS IS NOT NULL`
   - Error: "Strategy HYBRID requires MINIMUM_AGE_MONTHS to be set"
**Example Validation Error**:
```sql
-- This will fail validation
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
MINIMUM_AGE_MONTHS = NULL -- ERROR: Required for this strategy
WHERE ...;
-- Error: ORA-20001: Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set
```
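Before changing strategies in bulk, it may help to list rows whose current values would not satisfy these rules, for example rows created before the trigger existed. The query below is a sketch that restates the two documented rules as a SELECT; it does not reproduce the trigger's actual code. An empty result is expected where the trigger has always been active.

```sql
-- Rows whose current values would fail the documented validation rules
SELECT SOURCE_FILE_ID,
       TABLE_ID,
       ARCHIVAL_STRATEGY,
       MINIMUM_AGE_MONTHS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE (ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS'
       AND (MINIMUM_AGE_MONTHS IS NULL OR MINIMUM_AGE_MONTHS < 0))
   OR (ARCHIVAL_STRATEGY = 'HYBRID'
       AND MINIMUM_AGE_MONTHS IS NULL)
ORDER BY SOURCE_FILE_ID, TABLE_ID;
```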
## Data Lifecycle Workflow
### Standard File Processing Flow
```
┌────────────────────────────────────────────────────────────┐
│                 FILE PROCESSING LIFECYCLE                  │
└────────────────────────────────────────────────────────────┘
1. INBOX Bucket (Validation)
   ├─ File arrives from source system
   ├─ FILE_MANAGER.PROCESS_SOURCE_FILE validates structure
   ├─ Status: RECEIVED → VALIDATED → READY_FOR_INGESTION
   └─ FILE_MANAGER.MOVE_FILE relocates to ODS bucket
2. ODS Bucket (Operational Data)
   ├─ Active data processing (Airflow + DBT)
   ├─ External tables read data from bucket
   ├─ Status: INGESTED
   ├─ FILE_ARCHIVER.ARCHIVE_TABLE_DATA archives based on strategy
   └─ CSV files moved to TRASH subfolder (ODS → TRASH/)
   2.1 TRASH Subfolder (DATA Bucket - File Retention)
       ├─ Located in DATA bucket (e.g., TRASH/LM/TABLE_NAME)
       ├─ Stores CSV files after archival to Parquet
       ├─ Status: ARCHIVED_AND_TRASHED (default retention)
       ├─ Enables rollback if archival issues occur
       └─ Optional cleanup: ARCHIVED_AND_PURGED (pKeepInTrash=FALSE)
3. ARCHIVE Bucket (Long-term Storage)
   ├─ Historical data in Parquet format
   ├─ Hive-style partitioning: PARTITION_YEAR=/PARTITION_MONTH=
   ├─ Status: ARCHIVED_AND_TRASHED or ARCHIVED_AND_PURGED
   └─ Optimized for big data analytics (Spark, Hive)
```
**Key Procedures**:
- `ARCHIVE_TABLE_DATA(pSourceFileConfigKey, pKeepInTrash)` - Main archival procedure using strategy-specific WHERE clause
  - `pKeepInTrash` (BOOLEAN, DEFAULT TRUE) - Controls TRASH folder retention
    - TRUE: Files kept in TRASH folder for safety and rollback capability (default)
    - FALSE: Files deleted from TRASH folder after successful archival
- `GET_ARCHIVAL_WHERE_CLAUSE` - Returns WHERE clause based on configured strategy
- `GATHER_TABLE_STAT` - Calculates archival statistics using strategy logic
**Archival Execution**:
```sql
-- vSourceFileConfigKey is a placeholder: declare it, or substitute the key
-- of the target A_SOURCE_FILE_CONFIG row, before running these blocks.
-- Default behavior: Keep files in TRASH folder (ARCHIVED_AND_TRASHED status)
BEGIN
    CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
        pSourceFileConfigKey => vSourceFileConfigKey,
        pKeepInTrash         => TRUE   -- DEFAULT value
    );
END;
/
-- Optional: Delete files from TRASH after archival (ARCHIVED_AND_PURGED status)
BEGIN
    CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
        pSourceFileConfigKey => vSourceFileConfigKey,
        pKeepInTrash         => FALSE  -- Cleanup TRASH folder
    );
END;
/
```
**Strategy-Based Filtering**:
- Package retrieves ARCHIVAL_STRATEGY from A_SOURCE_FILE_CONFIG
- GET_ARCHIVAL_WHERE_CLAUSE generates appropriate WHERE clause
- Data matching criteria moved from ODS to ARCHIVE bucket
- CSV files moved to TRASH subfolder in DATA bucket (ODS/ → TRASH/)
- Parquet format with Hive-style partitioning applied to ARCHIVE bucket
- TRASH retention controlled by pKeepInTrash parameter
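
One way to observe the effect of an archival run is to count tracked files per status. This is a sketch only: it assumes the `PROCESSING_STATUS` and `RECEPTION_DATE` columns of `A_SOURCE_FILE_RECEIVED` that the TRASH monitoring query in this guide uses, and the aliases are illustrative.

```sql
-- File counts per lifecycle status, with the most recent reception date
SELECT PROCESSING_STATUS,
       COUNT(*)            AS FILE_COUNT,
       MAX(RECEPTION_DATE) AS LATEST_RECEPTION
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
GROUP BY PROCESSING_STATUS
ORDER BY PROCESSING_STATUS;
```

Comparing the counts before and after a run should show files moving from INGESTED to ARCHIVED_AND_TRASHED, or to ARCHIVED_AND_PURGED when `pKeepInTrash => FALSE` is used.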
## Configuration Examples
### Example 1: Configure LM Standing Facilities (Current Month Only)
```sql
-- Keep only current month data in ODS bucket (MINIMUM_AGE_MONTHS = 0)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
MINIMUM_AGE_MONTHS = 0 -- 0 = archives all data before current month
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'DistributeStandingFacilities'
AND TABLE_ID = 'LM_STANDING_FACILITIES';
COMMIT;
-- Verify configuration
SELECT
SOURCE_FILE_ID,
TABLE_ID,
ARCHIVAL_STRATEGY,
MINIMUM_AGE_MONTHS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_ID = 'DistributeStandingFacilities';
```
### Example 2: Configure CSDB Debt (MINIMUM_AGE_MONTHS)
```sql
-- Retain 6 months of data in ODS bucket
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
MINIMUM_AGE_MONTHS = 6
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'CSDB'
AND TABLE_ID = 'CSDB_DEBT';
COMMIT;
-- Verify configuration
SELECT
SOURCE_FILE_ID,
TABLE_ID,
ARCHIVAL_STRATEGY,
MINIMUM_AGE_MONTHS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'CSDB_DEBT';
```
### Example 3: Bulk Configuration for LM Source
```sql
-- Configure all 19 LM tables with MINIMUM_AGE_MONTHS = 0 (current month only)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
MINIMUM_AGE_MONTHS = 0 -- 0 = keep only current month
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID IN (
'DistributeStandingFacilities',
'DistributeTTS',
'DistributeAdHocAdjustments',
'DistributeBalanceSheet',
'DistributeCSMAdjustments',
'DistributeCurrentAccounts',
'DistributeForecast',
'DistributeQREAdjustments'
);
COMMIT;
-- Verify bulk configuration
SELECT
SOURCE_FILE_ID,
COUNT(*) AS TABLE_COUNT,
MAX(ARCHIVAL_STRATEGY) AS STRATEGY,
MAX(MINIMUM_AGE_MONTHS) AS MIN_AGE
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_ID LIKE 'Distribute%'
GROUP BY SOURCE_FILE_ID
ORDER BY SOURCE_FILE_ID;
```
### Example 4: View Current Archival Configuration
```sql
-- All configured tables with their archival strategies
SELECT
A_SOURCE_KEY,
SOURCE_FILE_ID,
TABLE_ID,
ARCHIVAL_STRATEGY,
MINIMUM_AGE_MONTHS,
DAYS_FOR_ARCHIVE_THRESHOLD
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
ORDER BY A_SOURCE_KEY, SOURCE_FILE_ID, TABLE_ID;
-- Summary by strategy
SELECT
ARCHIVAL_STRATEGY,
COUNT(*) AS TABLE_COUNT,
MIN(MINIMUM_AGE_MONTHS) AS MIN_AGE_MIN,
MAX(MINIMUM_AGE_MONTHS) AS MIN_AGE_MAX
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
GROUP BY ARCHIVAL_STRATEGY
ORDER BY ARCHIVAL_STRATEGY;
```
## Release 01 Configuration
### Configured Tables (MARS-828)
The following 25 Release 01 tables were configured with archival strategies:
**LM Tables (19 total) - MINIMUM_AGE_MONTHS = 0 (current month only)**:
- LM_STANDING_FACILITIES
- LM_STANDING_FACILITIES_HEADER
- LM_TTS_HEADER
- LM_TTS_ITEM
- LM_ADHOC_ADJUSTMENTS_HEADER
- LM_ADHOC_ADJUSTMENTS_ITEM
- LM_ADHOC_ADJUSTMENTS_ITEM_HEADER
- LM_BALANCESHEET_HEADER
- LM_BALANCESHEET_ITEM
- LM_CSM_ADJUSTMENTS_HEADER
- LM_CSM_ADJUSTMENTS_ITEM
- LM_CSM_ADJUSTMENTS_ITEM_HEADER
- LM_CURRENT_ACCOUNTS_HEADER
- LM_CURRENT_ACCOUNTS_ITEM
- LM_FORECAST_HEADER
- LM_FORECAST_ITEM
- LM_QRE_ADJUSTMENTS_HEADER
- LM_QRE_ADJUSTMENTS_ITEM
- LM_QRE_ADJUSTMENTS_ITEM_HEADER
**CSDB Tables (6 total)**:
*MINIMUM_AGE_MONTHS = 6 (6-month retention)*:
- CSDB_DEBT
- CSDB_DEBT_DAILY
*MINIMUM_AGE_MONTHS = 0 (current month only)*:
- CSDB_INSTR_RAT_FULL
- CSDB_INSTR_DESC_FULL
- CSDB_ISSUER_RAT_FULL
- CSDB_ISSUER_DESC_FULL
**Verification Query**:
```sql
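-- Check Release 01 configuration
SELECT
    CASE
        WHEN TABLE_ID LIKE 'LM_%' THEN 'LM'
        WHEN TABLE_ID LIKE 'CSDB_%' THEN 'CSDB'
    END AS SOURCE_GROUP,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS,
    COUNT(*) AS TABLE_COUNT
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND TABLE_ID IN (
    -- 25 Release 01 tables (19 LM + 6 CSDB), as listed above
    'LM_STANDING_FACILITIES', 'LM_STANDING_FACILITIES_HEADER',
    'LM_TTS_HEADER', 'LM_TTS_ITEM',
    'LM_ADHOC_ADJUSTMENTS_HEADER', 'LM_ADHOC_ADJUSTMENTS_ITEM', 'LM_ADHOC_ADJUSTMENTS_ITEM_HEADER',
    'LM_BALANCESHEET_HEADER', 'LM_BALANCESHEET_ITEM',
    'LM_CSM_ADJUSTMENTS_HEADER', 'LM_CSM_ADJUSTMENTS_ITEM', 'LM_CSM_ADJUSTMENTS_ITEM_HEADER',
    'LM_CURRENT_ACCOUNTS_HEADER', 'LM_CURRENT_ACCOUNTS_ITEM',
    'LM_FORECAST_HEADER', 'LM_FORECAST_ITEM',
    'LM_QRE_ADJUSTMENTS_HEADER', 'LM_QRE_ADJUSTMENTS_ITEM', 'LM_QRE_ADJUSTMENTS_ITEM_HEADER',
    'CSDB_DEBT', 'CSDB_DEBT_DAILY',
    'CSDB_INSTR_RAT_FULL', 'CSDB_INSTR_DESC_FULL',
    'CSDB_ISSUER_RAT_FULL', 'CSDB_ISSUER_DESC_FULL'
  )
GROUP BY
    CASE
        WHEN TABLE_ID LIKE 'LM_%' THEN 'LM'
        WHEN TABLE_ID LIKE 'CSDB_%' THEN 'CSDB'
    END,
    ARCHIVAL_STRATEGY,
    MINIMUM_AGE_MONTHS
ORDER BY SOURCE_GROUP, ARCHIVAL_STRATEGY;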
```
## Troubleshooting
### Common Issues
#### Issue 1: Validation Error on Configuration Update
**Error**:
```
ORA-20001: Strategy MINIMUM_AGE_MONTHS requires MINIMUM_AGE_MONTHS to be set
```
**Cause**: Trigger validation failed - strategy requires MINIMUM_AGE_MONTHS but value is NULL
**Solution**:
```sql
-- Provide required MINIMUM_AGE_MONTHS value
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
MINIMUM_AGE_MONTHS = 6 -- Required for this strategy
WHERE ...;
```
#### Issue 2: Archival Not Working as Expected
**Symptoms**: Data not being archived according to strategy
**Diagnostic Steps**:
```sql
-- 1. Check configuration
SELECT
SOURCE_FILE_ID,
TABLE_ID,
ARCHIVAL_STRATEGY,
MINIMUM_AGE_MONTHS,
DAYS_FOR_ARCHIVE_THRESHOLD
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'YOUR_TABLE';
-- 2. Check package version
SELECT CT_MRDS.FILE_ARCHIVER.GET_VERSION() FROM DUAL;
-- Expected: 3.0.0 or higher (3.1.0 or higher if MINIMUM_AGE_MONTHS = 0 is used)
-- 3. Check process logs
SELECT
PROCESS_LOG_KEY,
PROCESS_NAME,
LOG_MESSAGE,
LOG_LEVEL,
LOG_TIMESTAMP
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCESS_NAME LIKE '%ARCHIVE%'
ORDER BY LOG_TIMESTAMP DESC
FETCH FIRST 20 ROWS ONLY;
-- 4. Test WHERE clause generation
-- (run SET SERVEROUTPUT ON first in SQL*Plus / SQLcl to see the output)
DECLARE
    vConfig      CT_MRDS.A_SOURCE_FILE_CONFIG%ROWTYPE;
    vWhereClause VARCHAR2(4000);
BEGIN
    -- Load one configuration row for the table under investigation
    SELECT * INTO vConfig
    FROM CT_MRDS.A_SOURCE_FILE_CONFIG
    WHERE TABLE_ID = 'YOUR_TABLE'
      AND ROWNUM = 1;
    vWhereClause := CT_MRDS.FILE_ARCHIVER.GET_ARCHIVAL_WHERE_CLAUSE(vConfig);
    DBMS_OUTPUT.PUT_LINE('WHERE Clause: ' || vWhereClause);
END;
/
```
#### Issue 3: Package Compilation Errors After Upgrade
**Symptoms**: FILE_ARCHIVER package shows INVALID status
**Solution**:
```sql
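-- Check compilation errors
-- (USER_ERRORS lists only objects owned by the connected user; ALL_ERRORS
--  with an OWNER filter also works when connected as a user other than CT_MRDS)
SELECT * FROM ALL_ERRORS
WHERE OWNER = 'CT_MRDS'
  AND NAME = 'FILE_ARCHIVER'
  AND TYPE IN ('PACKAGE', 'PACKAGE BODY')
ORDER BY TYPE, SEQUENCE;
-- Recompile package
ALTER PACKAGE CT_MRDS.FILE_ARCHIVER COMPILE SPECIFICATION;
ALTER PACKAGE CT_MRDS.FILE_ARCHIVER COMPILE BODY;
-- Verify status
SELECT object_name, object_type, status
FROM all_objects
WHERE owner = 'CT_MRDS'
  AND object_name = 'FILE_ARCHIVER';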
```
## Version History
### v3.1.0 (2026-02-05)
- **BREAKING CHANGE**: Removed CURRENT_MONTH_ONLY strategy (replaced by MINIMUM_AGE_MONTHS = 0)
- Mathematical equivalence: CURRENT_MONTH_ONLY ≡ MINIMUM_AGE_MONTHS = 0
- Updated trigger validation to allow MINIMUM_AGE_MONTHS >= 0 (previously >= 1)
- Simplified architecture from 4 strategies to 3
- Enhanced error handling
- All 25 Release 01 tables migrated to MINIMUM_AGE_MONTHS (23 with value 0, 2 with value 6)
### v3.0.0 (MARS-828 - 2026-02-04)
- Added ARCHIVAL_STRATEGY configuration column
- Implemented four archival strategies (later reduced to three in v3.1.0):
- THRESHOLD_BASED (backward compatible)
- CURRENT_MONTH_ONLY (deprecated in v3.1.0, use MINIMUM_AGE_MONTHS = 0)
- MINIMUM_AGE_MONTHS
- HYBRID
- Added GET_ARCHIVAL_WHERE_CLAUSE function
- Created validation trigger TRG_BI_A_SRC_FILE_CFG_ARCH_VAL
- Configured 25 Release 01 tables with appropriate strategies
### v2.0.0 (Legacy)
- Initial FILE_ARCHIVER package
- THRESHOLD_BASED archival only
- Fixed DAYS_FOR_ARCHIVE_THRESHOLD configuration
## Related Documentation
- [FILE_MANAGER Configuration Guide](FILE_MANAGER_Configuration_Guide.md) - File processing and validation
- [Package Deployment Guide](Package_Deployment_Guide.md) - Package deployment standards
- [Universal Package Tracking System](Universal_Package_Tracking_System.md) - Version tracking
- [MARS-828 README](../MARS_Packages/REL01_ADDITIONS/MARS-828/README.md) - Detailed implementation notes
## Dependencies
### Required Packages
- **CT_MRDS.ENV_MANAGER** v3.x - Error handling, logging, version tracking
- **CT_MRDS.FILE_MANAGER** v3.x - Bucket URI resolution, file processing
- **MRDS_LOADER.cloud_wrapper** - DBMS_CLOUD operations wrapper
### Database Objects
- **Table**: CT_MRDS.A_SOURCE_FILE_CONFIG - Configuration storage
- **Table**: CT_MRDS.A_SOURCE_FILE_RECEIVED - File processing tracking
- **Table**: CT_MRDS.A_WORKFLOW_HISTORY - Workflow execution tracking (Airflow + DBT)
- **Trigger**: TRG_BI_A_SRC_FILE_CFG_ARCH_VAL - Configuration validation
- **Credential**: DEF_CRED_ARN - OCI bucket access
### OCI Buckets
- **INBOX**: Incoming file validation (`'INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_NAME}/'`)
- **ODS/DATA**: Operational data processing (`'ODS/{SOURCE}/{TABLE_NAME}/'`)
- **TRASH**: File retention subfolder in DATA bucket (`'TRASH/{SOURCE}/{TABLE_NAME}/'`) - CSV files after archival
- **ARCHIVE**: Historical data storage (`'ARCHIVE/{SOURCE}/{TABLE_NAME}/PARTITION_YEAR=/PARTITION_MONTH=/'`)
**Note**: TRASH is NOT a separate bucket - it's a subfolder within the DATA bucket for file retention and rollback capability.
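
As an illustration of the ARCHIVE path pattern, the sketch below assembles a partition prefix from literals. The source, table name, and date are illustrative, and this guide does not specify which date column drives the partition values, so the date here is purely an assumption.

```sql
-- Source, table name, and date are illustrative literals
SELECT 'ARCHIVE/' || 'LM' || '/' || 'STANDING_FACILITIES'
       || '/PARTITION_YEAR='  || TO_CHAR(DATE '2026-02-06', 'YYYY')
       || '/PARTITION_MONTH=' || TO_CHAR(DATE '2026-02-06', 'MM')
       || '/' AS archive_prefix
FROM DUAL;
-- archive_prefix = ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTITION_MONTH=02/
```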
## Best Practices
### Strategy Selection Guidelines
1. **Use MINIMUM_AGE_MONTHS when**:
   - **MINIMUM_AGE_MONTHS = 0**: Current month only retention
     - Data updated frequently (daily/intraday)
     - Historical data access is rare
     - ODS bucket space is limited
     - Example: LM dissemination feeds
   - **MINIMUM_AGE_MONTHS = N (N > 0)**: Multi-month retention
     - Regulatory compliance requires specific retention period
     - Analytical workloads need N-month access
     - Data updates are infrequent
     - Example: CSDB securities data (MINIMUM_AGE_MONTHS = 6)
2. **Use THRESHOLD_BASED when**:
   - Maintaining backward compatibility with legacy behavior
   - Simple time-based archival is sufficient
   - Migration from FILE_ARCHIVER v2.0.0
3. **Use HYBRID when**:
   - Complex retention requirements
   - Combining month boundary check with minimum age threshold
   - Advanced scenarios not covered by other strategies
### Configuration Best Practices
1. **Test Configuration Changes**:
```sql
-- Test on single table first
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
MINIMUM_AGE_MONTHS = 0 -- 0 = current month only
WHERE SOURCE_FILE_ID = 'TEST_FILE'
AND TABLE_ID = 'TEST_TABLE';
-- Monitor archival behavior
-- Expand to other tables after validation
```
2. **Verify Before Bulk Updates**:
```sql
-- Preview changes with SELECT
SELECT
SOURCE_FILE_ID,
TABLE_ID,
'MINIMUM_AGE_MONTHS' AS NEW_STRATEGY,
0 AS NEW_MIN_AGE, -- 0 = current month only
ARCHIVAL_STRATEGY AS OLD_STRATEGY,
MINIMUM_AGE_MONTHS AS OLD_MIN_AGE
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_ID LIKE 'Distribute%';
-- Then execute UPDATE
```
3. **Document Configuration Decisions**:
   - Record why specific strategy was chosen
   - Note business requirements driving retention policy
   - Track configuration changes in version control
4. **Monitor Archival Performance**:
```sql
-- Check archival execution logs
SELECT
PROCESS_NAME,
LOG_MESSAGE,
LOG_TIMESTAMP
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCESS_NAME LIKE '%ARCHIVE%'
AND LOG_TIMESTAMP > SYSDATE - 7
ORDER BY LOG_TIMESTAMP DESC;
```
5. **Regular Configuration Reviews**:
   - Verify strategies still match business requirements
   - Check for tables without archival configuration
   - Optimize MINIMUM_AGE_MONTHS based on actual usage patterns
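
For the review of tables without archival configuration, a query along the following lines could serve as a starting point. It is a sketch under two assumptions: that `ARCHIVAL_STRATEGY` may be NULL for unconfigured rows, and that a THRESHOLD_BASED row without `DAYS_FOR_ARCHIVE_THRESHOLD` counts as incomplete. Neither is confirmed by this guide.

```sql
-- INPUT tables with no strategy, or a THRESHOLD_BASED strategy lacking its threshold
SELECT SOURCE_FILE_ID,
       TABLE_ID,
       ARCHIVAL_STRATEGY,
       MINIMUM_AGE_MONTHS,
       DAYS_FOR_ARCHIVE_THRESHOLD
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
  AND (ARCHIVAL_STRATEGY IS NULL
       OR (ARCHIVAL_STRATEGY = 'THRESHOLD_BASED'
           AND DAYS_FOR_ARCHIVE_THRESHOLD IS NULL))
ORDER BY SOURCE_FILE_ID, TABLE_ID;
```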
### TRASH Folder Retention Best Practices
1. **Default Behavior (pKeepInTrash = TRUE - Recommended)**:
   - Keeps CSV files in TRASH folder after archival
   - Provides safety net for rollback if archival issues occur
   - Supports compliance and audit requirements
   - Status: ARCHIVED_AND_TRASHED
   - Use for: Production environments, regulatory compliance, critical data
2. **TRASH Cleanup (pKeepInTrash = FALSE)**:
   - Deletes CSV files from TRASH folder after successful archival
   - Reduces storage costs in DATA bucket
   - Status: ARCHIVED_AND_PURGED
   - Use for: Non-critical data, storage optimization, test environments
3. **Monitoring TRASH Folder**:
```sql
-- Check files in TRASH retention
SELECT
SOURCE_FILE_NAME,
PROCESSING_STATUS,
ARCH_FILE_NAME,
PARTITION_YEAR,
PARTITION_MONTH
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE PROCESSING_STATUS IN ('ARCHIVED_AND_TRASHED', 'ARCHIVED_AND_PURGED')
AND RECEPTION_DATE > SYSDATE - 30
ORDER BY PROCESSING_STATUS, RECEPTION_DATE DESC;
```
4. **TRASH Folder Structure**:
```
DATA Bucket:
├── ODS/LM/STANDING_FACILITIES/file.csv      -- Active operational data
└── TRASH/LM/STANDING_FACILITIES/file.csv    -- Retained after archival
ARCHIVE Bucket:
└── ARCHIVE/LM/STANDING_FACILITIES/
    └── PARTITION_YEAR=2026/
        └── PARTITION_MONTH=02/
            └── *.parquet                    -- Archived data
```
## Author
Created by: Grzegorz Michalski
Date: 2026-02-06
Schema: CT_MRDS
Package: FILE_ARCHIVER
Version: 3.2.0