feat(FILE_MANAGER): Update package version to 3.6.3 and enhance ADD_SOURCE_FILE_CONFIG with new parameters for archival control

- Bump package version to 3.6.3 and update build date.
- Add new parameters: pIsArchiveEnabled, pIsKeepInTrash, pArchivalStrategy, pMinimumAgeMonths to ADD_SOURCE_FILE_CONFIG.
- Include pIsWorkflowSuccessRequired parameter to control workflow success requirement for archival.
- Update version history to reflect changes.

feat(A_SOURCE_FILE_CONFIG): Modify table structure to include new archival control flags

- Add IS_WORKFLOW_SUCCESS_REQUIRED column to A_SOURCE_FILE_CONFIG for workflow bypass functionality.
- Update constraints and comments for new columns.
- Ensure backward compatibility with default values.

fix(A_TABLE_STAT, A_TABLE_STAT_HIST): Extend table structures to accommodate new workflow success tracking

- Add IS_WORKFLOW_SUCCESS_REQUIRED column to both A_TABLE_STAT and A_TABLE_STAT_HIST.
- Update comments to clarify the purpose of new columns.

docs(FILE_ARCHIVER_Guide): Revise documentation to reflect new archival features and configurations

- Document new IS_WORKFLOW_SUCCESS_REQUIRED flag and its implications for archival processes.
- Update examples and configurations to align with recent changes in the database schema.
- Ensure clarity on archival strategies and their configurations.
Author: Grzegorz Michalski
Date: 2026-03-18 18:19:04 +01:00
parent 896e67bcb9
commit ce9b6eeff6
25 changed files with 986 additions and 389 deletions


@@ -10,13 +10,14 @@ The FILE_ARCHIVER package provides flexible archival strategies that accommodate
- **Three Archival Strategies**: THRESHOLD_BASED, MINIMUM_AGE_MONTHS (with 0=current month only), HYBRID
- **Flexible Configuration**: Per-table archival strategy configuration via A_SOURCE_FILE_CONFIG
- **Workflow Bypass**: IS_WORKFLOW_SUCCESS_REQUIRED flag allows archival of files from non-DBT sources
- **Validation**: Automatic validation of strategy-specific configuration requirements
### Package Information
- **Schema**: CT_MRDS
- **Package**: FILE_ARCHIVER
-- **Current Version**: 3.3.0
+- **Current Version**: 3.4.0
- **Dependencies**: ENV_MANAGER, FILE_MANAGER, A_SOURCE_FILE_CONFIG, A_SOURCE_FILE_RECEIVED, A_WORKFLOW_HISTORY
### Critical Prerequisites
@@ -67,7 +68,7 @@ END;
| Strategy | WHERE Clause Logic | Configuration Required | Primary Use Case |
|----------|-------------------|----------------------|------------------|
-| `THRESHOLD_BASED` | Days since workflow start > threshold | DAYS_FOR_ARCHIVE_THRESHOLD | Simple time-based archival |
+| `THRESHOLD_BASED` | Days since workflow start > threshold | ARCHIVE_THRESHOLD_DAYS | Simple time-based archival |
| `MINIMUM_AGE_MONTHS` | Archive data older than X months (0=current month only) | MINIMUM_AGE_MONTHS (≥0) | All sources - flexible retention (0 for LM, 6 for CSDB) |
| `HYBRID` | Combines month boundary + minimum age | MINIMUM_AGE_MONTHS | Advanced retention scenarios |
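For orientation, the predicates the three strategies produce can be sketched as follows. This is illustrative only: the first clause is quoted from the THRESHOLD_BASED section below, the second mirrors the `ADD_MONTHS` cutoff used in the troubleshooting queries later in this guide, and the exact HYBRID combination is an assumption — `GET_ARCHIVAL_WHERE_CLAUSE` is the authoritative source.

```sql
-- THRESHOLD_BASED: age in days since workflow start
extract(day from (systimestamp - workflow_start)) > ARCHIVE_THRESHOLD_DAYS

-- MINIMUM_AGE_MONTHS: month-boundary cutoff (0 = everything before the current month)
load_start < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -MINIMUM_AGE_MONTHS)

-- HYBRID: assumed to AND the month boundary with the minimum age;
-- consult the package source for the exact predicate
```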
@@ -77,14 +78,14 @@ Archives data based on number of days since workflow start.
**WHERE Clause**:
```sql
-extract(day from (systimestamp - workflow_start)) > DAYS_FOR_ARCHIVE_THRESHOLD
+extract(day from (systimestamp - workflow_start)) > ARCHIVE_THRESHOLD_DAYS
```
**Configuration**:
```sql
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'THRESHOLD_BASED',
-DAYS_FOR_ARCHIVE_THRESHOLD = 30,
+ARCHIVE_THRESHOLD_DAYS = 30,
MINIMUM_AGE_MONTHS = NULL
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'C2D_DATA'
@@ -185,17 +186,17 @@ END IF;
**Behavior**: Archives data **only when** at least one of the following thresholds is exceeded:
-1. **FILES_COUNT_OVER_ARCHIVE_THRESHOLD** - Number of files eligible for archival
-2. **ROWS_COUNT_OVER_ARCHIVE_THRESHOLD** - Number of rows eligible for archival
-3. **BYTES_SUM_OVER_ARCHIVE_THRESHOLD** - Total size in bytes eligible for archival
+1. **ARCHIVE_THRESHOLD_FILES_COUNT** - Number of files eligible for archival
+2. **ARCHIVE_THRESHOLD_ROWS_COUNT** - Number of rows eligible for archival
+3. **ARCHIVE_THRESHOLD_BYTES_SUM** - Total size in bytes eligible for archival
```sql
-- Executed for THRESHOLD_BASED and HYBRID strategies
-IF vTableStat.OVER_ARCH_THRESOLD_FILE_COUNT >= vSourceFileConfig.FILES_COUNT_OVER_ARCHIVE_THRESHOLD THEN
+IF vTableStat.OVER_ARCH_THRESOLD_FILE_COUNT >= vSourceFileConfig.ARCHIVE_THRESHOLD_FILES_COUNT THEN
vArchivalTriggeredBy := 'FILES_COUNT';
-ELSIF vTableStat.OVER_ARCH_THRESOLD_ROW_COUNT >= vSourceFileConfig.ROWS_COUNT_OVER_ARCHIVE_THRESHOLD THEN
+ELSIF vTableStat.OVER_ARCH_THRESOLD_ROW_COUNT >= vSourceFileConfig.ARCHIVE_THRESHOLD_ROWS_COUNT THEN
vArchivalTriggeredBy := 'ROWS_COUNT';
-ELSIF vTableStat.OVER_ARCH_THRESOLD_SIZE >= vSourceFileConfig.BYTES_SUM_OVER_ARCHIVE_THRESHOLD THEN
+ELSIF vTableStat.OVER_ARCH_THRESOLD_SIZE >= vSourceFileConfig.ARCHIVE_THRESHOLD_BYTES_SUM THEN
vArchivalTriggeredBy := 'BYTES_SUM';
END IF;
```
@@ -206,9 +207,9 @@ END IF;
```sql
-- Set archival thresholds for THRESHOLD_BASED strategy
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET FILES_COUNT_OVER_ARCHIVE_THRESHOLD = 10, -- Archive when 10+ files eligible
-ROWS_COUNT_OVER_ARCHIVE_THRESHOLD = 100000, -- Archive when 100k+ rows eligible
-BYTES_SUM_OVER_ARCHIVE_THRESHOLD = 104857600 -- Archive when 100MB+ eligible
+SET ARCHIVE_THRESHOLD_FILES_COUNT = 10, -- Archive when 10+ files eligible
+ARCHIVE_THRESHOLD_ROWS_COUNT = 100000, -- Archive when 100k+ rows eligible
+ARCHIVE_THRESHOLD_BYTES_SUM = 104857600 -- Archive when 100MB+ eligible
WHERE ARCHIVAL_STRATEGY = 'THRESHOLD_BASED'
AND TABLE_ID = 'YOUR_TABLE';
```
@@ -243,15 +244,15 @@ WHERE ...;
## Archival Control Configuration
-### ARCHIVE_ENABLED Column
+### IS_ARCHIVE_ENABLED Column
Controls whether archival is enabled for a specific table configuration.
-**Column**: `A_SOURCE_FILE_CONFIG.ARCHIVE_ENABLED` (VARCHAR2(1), DEFAULT 'Y')
+**Column**: `A_SOURCE_FILE_CONFIG.IS_ARCHIVE_ENABLED` (CHAR(1), DEFAULT 'N' NOT NULL)
**Values**:
-- `'Y'` (default) - Table is eligible for archival processing
-- `'N'` - Table is excluded from archival (batch operations skip this config)
+- `'Y'` - Table is eligible for archival processing
+- `'N'` (default) - Table is excluded from archival (batch operations skip this config)
**Use Cases**:
- Disable archival for specific tables without removing configuration
@@ -262,7 +263,7 @@ Controls whether archival is enabled for specific table configuration.
```sql
-- Disable archival for specific table
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET ARCHIVE_ENABLED = 'N'
+SET IS_ARCHIVE_ENABLED = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'CSDB'
AND TABLE_ID = 'CSDB_DEBT';
@@ -270,7 +271,7 @@ COMMIT;
-- Re-enable archival
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET ARCHIVE_ENABLED = 'Y'
+SET IS_ARCHIVE_ENABLED = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'CSDB'
AND TABLE_ID = 'CSDB_DEBT';
@@ -280,22 +281,22 @@ COMMIT;
SELECT
SOURCE_FILE_ID,
TABLE_ID,
-ARCHIVE_ENABLED,
+IS_ARCHIVE_ENABLED,
ARCHIVAL_STRATEGY
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
ORDER BY SOURCE_FILE_ID, TABLE_ID;
```
-### KEEP_IN_TRASH Column
+### IS_KEEP_IN_TRASH Column
Controls TRASH folder retention policy for archived files.
-**Column**: `A_SOURCE_FILE_CONFIG.KEEP_IN_TRASH` (VARCHAR2(1), DEFAULT 'Y')
+**Column**: `A_SOURCE_FILE_CONFIG.IS_KEEP_IN_TRASH` (CHAR(1), DEFAULT 'N' NOT NULL)
**Values**:
-- `'Y'` (default) - CSV files kept in TRASH folder after archival (status: ARCHIVED_AND_TRASHED)
-- `'N'` - CSV files deleted from TRASH folder after archival (status: ARCHIVED_AND_PURGED)
+- `'Y'` - CSV files kept in TRASH folder after archival (status: ARCHIVED_AND_TRASHED)
+- `'N'` (default) - CSV files deleted from TRASH folder after archival (status: ARCHIVED_AND_PURGED)
**Benefits of TRASH Retention (IS_KEEP_IN_TRASH = 'Y')**:
- Safety net for rollback if archival issues discovered
@@ -311,7 +312,7 @@ Controls TRASH folder retention policy for archived files.
```sql
-- Production: Keep files in TRASH (recommended)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET KEEP_IN_TRASH = 'Y'
+SET IS_KEEP_IN_TRASH = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'LM'
AND TABLE_ID LIKE 'LM_%';
@@ -319,20 +320,64 @@ COMMIT;
-- Test environment: Cleanup TRASH to save storage
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET KEEP_IN_TRASH = 'N'
+SET IS_KEEP_IN_TRASH = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'TEST_SOURCE';
COMMIT;
-- Bulk configuration by source
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET KEEP_IN_TRASH = 'Y'
+SET IS_KEEP_IN_TRASH = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID IN ('CSDB', 'C2D', 'LM');
COMMIT;
```
### IS_WORKFLOW_SUCCESS_REQUIRED Column
Controls whether archival requires `WORKFLOW_SUCCESSFUL='Y'` in A_WORKFLOW_HISTORY. Added in MARS-1409.
**Column**: `A_SOURCE_FILE_CONFIG.IS_WORKFLOW_SUCCESS_REQUIRED` (CHAR(1), DEFAULT 'Y' NOT NULL)
**Values**:
- `'Y'` (default) - Only files with `WORKFLOW_SUCCESSFUL='Y'` are eligible for archival (standard Airflow+DBT flow)
- `'N'` - Archival proceeds regardless of workflow completion status (bypass for manual/non-DBT sources)
**Use Cases**:
- `'Y'`: All standard INBOX-validated sources (LM, CSDB, C2D) - ensures only fully-processed files are archived
- `'N'`: Legacy data migrated via `DATA_EXPORTER`, manual uploads, or any source without DBT workflow tracking
**GATHER_TABLE_STAT behavior**:
- `'Y'`: Statistics (file count, row count, byte sum) counted only from files with `WORKFLOW_SUCCESSFUL='Y'`
- `'N'`: Statistics counted from all INGESTED files regardless of workflow outcome
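The flag-controlled filter can be pictured as a single optional predicate. This is a sketch, not the package source: the join keys and `WORKFLOW_SUCCESSFUL` column come from queries elsewhere in this guide, and the `CT_MRDS` schema prefix on A_WORKFLOW_HISTORY is an assumption.

```sql
-- Files eligible for statistics/archival under the flag.
-- When IS_WORKFLOW_SUCCESS_REQUIRED = 'N', the workflow predicate is skipped;
-- when 'Y', files without a successful workflow row are excluded.
SELECT COUNT(*) AS ELIGIBLE_FILES
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR
JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC
  ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY
LEFT JOIN CT_MRDS.A_WORKFLOW_HISTORY WH
  ON SFR.A_WORKFLOW_HISTORY_KEY = WH.A_WORKFLOW_HISTORY_KEY
WHERE SFR.PROCESSING_STATUS = 'INGESTED'
  AND (SFC.IS_WORKFLOW_SUCCESS_REQUIRED = 'N' OR WH.WORKFLOW_SUCCESSFUL = 'Y');
```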
**Configuration Example**:
```sql
-- Standard source: require DBT workflow completion (default)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_WORKFLOW_SUCCESS_REQUIRED = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'LM';
COMMIT;
-- Non-DBT source: bypass workflow check
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET IS_WORKFLOW_SUCCESS_REQUIRED = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'MANUAL_UPLOAD';
COMMIT;
-- Or set at configuration time via ADD_SOURCE_FILE_CONFIG
CALL CT_MRDS.FILE_MANAGER.ADD_SOURCE_FILE_CONFIG(
pSourceKey => 'MANUAL',
pSourceFileType => 'INPUT',
pSourceFileId => 'MANUAL_UPLOAD',
pSourceFileDesc => 'Manual data upload without DBT',
pSourceFileNamePattern => 'manual_*.csv',
pTableId => 'MY_TABLE',
pTemplateTableName => 'CT_ET_TEMPLATES.MY_TABLE',
pIsWorkflowSuccessRequired => 'N' -- bypass workflow check
);
```
## Data Lifecycle Workflow
### Status Tracking in A_SOURCE_FILE_RECEIVED
@@ -348,7 +393,7 @@ INGESTED → ARCHIVED_AND_TRASHED → ARCHIVED_AND_PURGED (optional)
**Status Descriptions**:
- **INGESTED**: File successfully processed through Airflow+DBT, residing in ODS bucket
- **ARCHIVED_AND_TRASHED**: File archived to Parquet in ARCHIVE bucket, CSV retained in TRASH folder (DATA bucket)
-- **ARCHIVED_AND_PURGED**: File archived to Parquet, CSV deleted from TRASH folder (when KEEP_IN_TRASH='N')
+- **ARCHIVED_AND_PURGED**: File archived to Parquet, CSV deleted from TRASH folder (when IS_KEEP_IN_TRASH='N')
**Associated Columns Updated During Archival**:
```sql
@@ -390,9 +435,9 @@ https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/namespace/b/archive/o/ARC
2.1 TRASH Subfolder (DATA Bucket - File Retention)
├─ Located in DATA bucket (e.g., TRASH/LM/TABLE_NAME)
├─ Stores CSV files after archival to Parquet
-├─ Status: ARCHIVED_AND_TRASHED (default, controlled by KEEP_IN_TRASH config)
+├─ Status: ARCHIVED_AND_TRASHED (controlled by IS_KEEP_IN_TRASH config)
├─ Enables rollback if archival issues occur
-└─ Optional cleanup: ARCHIVED_AND_PURGED (when KEEP_IN_TRASH = 'N')
+└─ Optional cleanup: ARCHIVED_AND_PURGED (when IS_KEEP_IN_TRASH = 'N')
3. ARCHIVE Bucket (Long-term Storage)
├─ Historical data in Parquet format
@@ -402,14 +447,14 @@ https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/namespace/b/archive/o/ARC
**Key Procedures**:
- `ARCHIVE_TABLE_DATA(pSourceFileConfigKey)` - Main archival procedure using strategy-specific WHERE clause
-- TRASH folder retention controlled by `KEEP_IN_TRASH` column in A_SOURCE_FILE_CONFIG
+- TRASH folder retention controlled by `IS_KEEP_IN_TRASH` column in A_SOURCE_FILE_CONFIG
- `ARCHIVE_ALL(pSourceFileConfigKey, pSourceKey, pArchiveAll)` - Batch archival with 3-level granularity and error handling
- **Level 3 (Highest Priority)**: Single configuration via `pSourceFileConfigKey`
- **Level 2 (Medium Priority)**: All configurations for source via `pSourceKey`
- **Level 1 (Lowest Priority)**: All configurations system-wide via `pArchiveAll`
- **Error Handling**: Continues processing other tables on individual failures
-- **Filtering**: Respects `ARCHIVE_ENABLED='Y'` (skips disabled configurations)
-- **Individual TRASH Policy**: Each table's `KEEP_IN_TRASH` setting applied independently
+- **Filtering**: Respects `IS_ARCHIVE_ENABLED='Y'` (skips disabled configurations)
+- **Individual TRASH Policy**: Each table's `IS_KEEP_IN_TRASH` setting applied independently
- **Summary Reporting**: Returns counts of Archived/Skipped/Failed tables
- `GET_ARCHIVAL_WHERE_CLAUSE` - Returns WHERE clause based on configured strategy
- `GATHER_TABLE_STAT` - Calculates archival statistics using strategy logic
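The three granularity levels of ARCHIVE_ALL translate into three call shapes. A sketch only: the parameter names come from the procedure list above, but the config key is hypothetical and the `'Y'` value for `pArchiveAll` is an assumption.

```sql
-- Level 3: one configuration (key value is hypothetical)
BEGIN
  CT_MRDS.FILE_ARCHIVER.ARCHIVE_ALL(pSourceFileConfigKey => :vConfigKey);
END;
/

-- Level 2: every configuration for one source
BEGIN
  CT_MRDS.FILE_ARCHIVER.ARCHIVE_ALL(pSourceKey => 'LM');
END;
/

-- Level 1: system-wide (flag value assumed)
BEGIN
  CT_MRDS.FILE_ARCHIVER.ARCHIVE_ALL(pArchiveAll => 'Y');
END;
/
```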
@@ -419,7 +464,7 @@ https://objectstorage.eu-frankfurt-1.oraclecloud.com/n/namespace/b/archive/o/ARC
**Archival Execution**:
```sql
--- Single table archival (TRASH retention controlled by KEEP_IN_TRASH config)
+-- Single table archival (TRASH retention controlled by IS_KEEP_IN_TRASH config)
BEGIN
CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
pSourceFileConfigKey => vSourceFileConfigKey
@@ -451,11 +496,11 @@ END;
**Strategy-Based Filtering**:
- Package retrieves ARCHIVAL_STRATEGY from A_SOURCE_FILE_CONFIG
- GET_ARCHIVAL_WHERE_CLAUSE generates appropriate WHERE clause
-- Only tables with ARCHIVE_ENABLED = 'Y' are processed
+- Only tables with IS_ARCHIVE_ENABLED = 'Y' are processed
- Data matching criteria moved from ODS to ARCHIVE bucket
- CSV files moved to TRASH subfolder in DATA bucket (ODS/ → TRASH/)
- Parquet format with Hive-style partitioning applied to ARCHIVE bucket
-- TRASH retention controlled by KEEP_IN_TRASH column in A_SOURCE_FILE_CONFIG
+- TRASH retention controlled by IS_KEEP_IN_TRASH column in A_SOURCE_FILE_CONFIG
### Automatic Rollback Mechanism
@@ -465,7 +510,7 @@ FILE_ARCHIVER implements **automatic rollback** to ensure data integrity if arch
1. **Export to ARCHIVE**: Data exported to Parquet format in ARCHIVE bucket
2. **Status Update**: A_SOURCE_FILE_RECEIVED records updated to 'ARCHIVED_AND_TRASHED'
3. **Move to TRASH**: CSV files moved from ODS to TRASH folder (DATA bucket)
-4. **Optional Cleanup**: If KEEP_IN_TRASH='N', files deleted from TRASH
+4. **Optional Cleanup**: If IS_KEEP_IN_TRASH='N', files deleted from TRASH
**Automatic Rollback Trigger**:
If **any error occurs** during step 3 (Move to TRASH), the system:
@@ -655,7 +700,7 @@ SELECT
TABLE_ID,
ARCHIVAL_STRATEGY,
MINIMUM_AGE_MONTHS,
-DAYS_FOR_ARCHIVE_THRESHOLD
+ARCHIVE_THRESHOLD_DAYS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
ORDER BY A_SOURCE_KEY, SOURCE_FILE_ID, TABLE_ID;
@@ -679,8 +724,8 @@ ORDER BY ARCHIVAL_STRATEGY;
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
SET ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS',
MINIMUM_AGE_MONTHS = 6,
-ARCHIVE_ENABLED = 'Y', -- Enable archival
-KEEP_IN_TRASH = 'Y' -- Keep files in TRASH for safety
+IS_ARCHIVE_ENABLED = 'Y', -- Enable archival
+IS_KEEP_IN_TRASH = 'Y' -- Keep files in TRASH for safety
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'CSDB'
AND TABLE_ID = 'CSDB_DEBT';
@@ -688,13 +733,13 @@ COMMIT;
-- Disable archival temporarily for troubleshooting
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET ARCHIVE_ENABLED = 'N' -- Batch operations will skip this table
+SET IS_ARCHIVE_ENABLED = 'N' -- Batch operations will skip this table
WHERE TABLE_ID = 'CSDB_DEBT';
COMMIT;
-- Configure TRASH cleanup for test environment
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET KEEP_IN_TRASH = 'N' -- Delete files from TRASH after archival
+SET IS_KEEP_IN_TRASH = 'N' -- Delete files from TRASH after archival
WHERE SOURCE_FILE_TYPE = 'INPUT'
AND SOURCE_FILE_ID = 'TEST_SOURCE';
COMMIT;
@@ -705,21 +750,21 @@ SELECT
TABLE_ID,
ARCHIVAL_STRATEGY,
MINIMUM_AGE_MONTHS,
-ARCHIVE_ENABLED,
-KEEP_IN_TRASH
+IS_ARCHIVE_ENABLED,
+IS_KEEP_IN_TRASH
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
ORDER BY SOURCE_FILE_ID, TABLE_ID;
-- Summary by archival status
SELECT
-ARCHIVE_ENABLED,
-KEEP_IN_TRASH,
+IS_ARCHIVE_ENABLED,
+IS_KEEP_IN_TRASH,
COUNT(*) AS TABLE_COUNT
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE SOURCE_FILE_TYPE = 'INPUT'
-GROUP BY ARCHIVE_ENABLED, KEEP_IN_TRASH
-ORDER BY ARCHIVE_ENABLED DESC, KEEP_IN_TRASH DESC;
+GROUP BY IS_ARCHIVE_ENABLED, IS_KEEP_IN_TRASH
+ORDER BY IS_ARCHIVE_ENABLED DESC, IS_KEEP_IN_TRASH DESC;
```
## Release 01 Configuration
@@ -829,11 +874,11 @@ JOIN CT_ODS.A_LOAD_HISTORY LH ON SFR.A_WORKFLOW_HISTORY_KEY = LH.A_WORKFLOW_HIST
JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY
WHERE SFC.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS'
AND SFR.PROCESSING_STATUS = 'INGESTED'
-AND SFC.ARCHIVE_ENABLED = 'Y'
+AND SFC.IS_ARCHIVE_ENABLED = 'Y'
ORDER BY LH.LOAD_START;
-- Note: MINIMUM_AGE_MONTHS archives immediately (threshold-independent)
--- If files not archived, check ARCHIVE_ENABLED='Y' and run ARCHIVE_TABLE_DATA
+-- If files not archived, check IS_ARCHIVE_ENABLED='Y' and run ARCHIVE_TABLE_DATA
```
**Scenario B**: **THRESHOLD_BASED** or **HYBRID** strategy not archiving
@@ -843,9 +888,9 @@ SELECT
SFC.SOURCE_FILE_ID,
SFC.TABLE_ID,
SFC.ARCHIVAL_STRATEGY,
-SFC.FILES_COUNT_OVER_ARCHIVE_THRESHOLD AS FILE_THRESHOLD,
-SFC.ROWS_COUNT_OVER_ARCHIVE_THRESHOLD AS ROW_THRESHOLD,
-SFC.BYTES_SUM_OVER_ARCHIVE_THRESHOLD AS BYTE_THRESHOLD,
+SFC.ARCHIVE_THRESHOLD_FILES_COUNT AS FILE_THRESHOLD,
+SFC.ARCHIVE_THRESHOLD_ROWS_COUNT AS ROW_THRESHOLD,
+SFC.ARCHIVE_THRESHOLD_BYTES_SUM AS BYTE_THRESHOLD,
COUNT(SFR.A_SOURCE_FILE_RECEIVED_KEY) AS CURRENT_FILES,
SUM(SFR.TOTAL_RECORDS) AS CURRENT_ROWS,
SUM(SFR.FILE_SIZE_BYTES) AS CURRENT_BYTES
@@ -854,13 +899,13 @@ LEFT JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR
ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY
AND SFR.PROCESSING_STATUS = 'INGESTED'
WHERE SFC.ARCHIVAL_STRATEGY IN ('THRESHOLD_BASED', 'HYBRID')
-AND SFC.ARCHIVE_ENABLED = 'Y'
+AND SFC.IS_ARCHIVE_ENABLED = 'Y'
AND SFC.A_SOURCE_FILE_CONFIG_KEY = :yourConfigKey
GROUP BY
SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY,
-SFC.FILES_COUNT_OVER_ARCHIVE_THRESHOLD,
-SFC.ROWS_COUNT_OVER_ARCHIVE_THRESHOLD,
-SFC.BYTES_SUM_OVER_ARCHIVE_THRESHOLD;
+SFC.ARCHIVE_THRESHOLD_FILES_COUNT,
+SFC.ARCHIVE_THRESHOLD_ROWS_COUNT,
+SFC.ARCHIVE_THRESHOLD_BYTES_SUM;
-- Expected: At least ONE threshold (FILE/ROW/BYTE) must be exceeded
-- If no threshold exceeded, archival will NOT trigger (threshold-dependent behavior)
@@ -903,7 +948,7 @@ WHERE object_name LIKE 'ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTI
**Symptoms**: Files not deleted from TRASH after archival
-**Cause**: Configuration has `KEEP_IN_TRASH='Y'` (retain files in TRASH)
+**Cause**: Configuration has `IS_KEEP_IN_TRASH='Y'` (retain files in TRASH)
**Verification**:
```sql
@@ -911,8 +956,8 @@ WHERE object_name LIKE 'ARCHIVE/LM/STANDING_FACILITIES/PARTITION_YEAR=2026/PARTI
SELECT
SOURCE_FILE_ID,
TABLE_ID,
-KEEP_IN_TRASH,
-CASE KEEP_IN_TRASH
+IS_KEEP_IN_TRASH,
+CASE IS_KEEP_IN_TRASH
WHEN 'Y' THEN 'Files RETAINED in TRASH (manual purge required)'
WHEN 'N' THEN 'Files DELETED immediately after archival'
END AS TRASH_BEHAVIOR
@@ -924,7 +969,7 @@ WHERE TABLE_ID = 'YOUR_TABLE';
```sql
-- Option A: Change configuration to auto-delete (permanent change)
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET KEEP_IN_TRASH = 'N' -- Auto-delete from TRASH after archival
+SET IS_KEEP_IN_TRASH = 'N' -- Auto-delete from TRASH after archival
WHERE TABLE_ID = 'YOUR_TABLE';
COMMIT;
@@ -997,7 +1042,7 @@ SELECT
TABLE_ID,
ARCHIVAL_STRATEGY,
MINIMUM_AGE_MONTHS,
-DAYS_FOR_ARCHIVE_THRESHOLD
+ARCHIVE_THRESHOLD_DAYS
FROM CT_MRDS.A_SOURCE_FILE_CONFIG
WHERE TABLE_ID = 'YOUR_TABLE';
@@ -1087,7 +1132,7 @@ SELECT
SFR.FILE_SIZE_BYTES,
SFR.UPDATED_AT AS ARCHIVED_AT,
TRUNC(SYSDATE - SFR.UPDATED_AT) AS DAYS_IN_TRASH,
-SFC.KEEP_IN_TRASH AS TRASH_POLICY
+SFC.IS_KEEP_IN_TRASH AS TRASH_POLICY
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR
JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY
WHERE SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED'
@@ -1102,8 +1147,8 @@ SELECT
SFC.SOURCE_FILE_ID,
SFC.TABLE_ID,
SFC.ARCHIVAL_STRATEGY,
-SFC.ARCHIVE_ENABLED,
-SFC.KEEP_IN_TRASH,
+SFC.IS_ARCHIVE_ENABLED,
+SFC.IS_KEEP_IN_TRASH,
COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'INGESTED' THEN 1 END) AS PENDING_ARCHIVE,
COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED' THEN 1 END) AS IN_TRASH,
COUNT(CASE WHEN SFR.PROCESSING_STATUS = 'ARCHIVED_AND_PURGED' THEN 1 END) AS PURGED,
@@ -1113,7 +1158,7 @@ LEFT JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR ON SFC.A_SOURCE_FILE_CONFIG_KEY = S
WHERE SFC.SOURCE_FILE_TYPE = 'INPUT'
GROUP BY
SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.ARCHIVAL_STRATEGY,
-SFC.ARCHIVE_ENABLED, SFC.KEEP_IN_TRASH
+SFC.IS_ARCHIVE_ENABLED, SFC.IS_KEEP_IN_TRASH
ORDER BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID;
```
@@ -1134,7 +1179,7 @@ FROM CT_MRDS.A_SOURCE_FILE_CONFIG SFC
JOIN CT_MRDS.A_SOURCE_FILE_RECEIVED SFR ON SFC.A_SOURCE_FILE_CONFIG_KEY = SFR.A_SOURCE_FILE_CONFIG_KEY
JOIN CT_ODS.A_LOAD_HISTORY LH ON SFR.A_WORKFLOW_HISTORY_KEY = LH.A_WORKFLOW_HISTORY_KEY
WHERE SFC.ARCHIVAL_STRATEGY = 'MINIMUM_AGE_MONTHS'
-AND SFC.ARCHIVE_ENABLED = 'Y'
+AND SFC.IS_ARCHIVE_ENABLED = 'Y'
AND SFR.PROCESSING_STATUS = 'INGESTED'
AND LH.LOAD_START < ADD_MONTHS(TRUNC(SYSDATE, 'MM'), -SFC.MINIMUM_AGE_MONTHS)
GROUP BY SFC.SOURCE_FILE_ID, SFC.TABLE_ID, SFC.MINIMUM_AGE_MONTHS
@@ -1173,20 +1218,30 @@ SELECT
ROUND(SUM(SFR.FILE_SIZE_BYTES) / 1024 / 1024 / 1024, 2) AS SIZE_GB,
MIN(SFR.UPDATED_AT) AS OLDEST_IN_TRASH,
MAX(SFR.UPDATED_AT) AS NEWEST_IN_TRASH,
-SFC.KEEP_IN_TRASH AS POLICY
+SFC.IS_KEEP_IN_TRASH AS POLICY
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED SFR
JOIN CT_MRDS.A_SOURCE_FILE_CONFIG SFC ON SFR.A_SOURCE_FILE_CONFIG_KEY = SFC.A_SOURCE_FILE_CONFIG_KEY
WHERE SFR.PROCESSING_STATUS = 'ARCHIVED_AND_TRASHED'
-GROUP BY SFC.SOURCE_FILE_ID, SFC.KEEP_IN_TRASH
+GROUP BY SFC.SOURCE_FILE_ID, SFC.IS_KEEP_IN_TRASH
ORDER BY SIZE_GB DESC;
```
## Version History
-### v3.3.0 (Current - 2026-02-11)
+### v3.4.0 (Current - 2026-03-17)
- **MARS-1409**: Added `IS_WORKFLOW_SUCCESS_REQUIRED` flag to A_SOURCE_FILE_CONFIG
- `'Y'` (default) = archival requires `WORKFLOW_SUCCESSFUL='Y'` in A_WORKFLOW_HISTORY (standard Airflow+DBT flow)
- `'N'` = archive regardless of workflow status (bypass for manual/non-DBT sources)
- `IS_WORKFLOW_SUCCESS_REQUIRED` stored in A_TABLE_STAT and A_TABLE_STAT_HIST at statistics gather time
- GATHER_TABLE_STAT: conditional `WORKFLOW_SUCCESSFUL='Y'` filter controlled by the flag
- ARCHIVE_TABLE_DATA: conditional `WORKFLOW_SUCCESSFUL='Y'` filter controlled by the flag
- Added `pIsWorkflowSuccessRequired` parameter to FILE_MANAGER.ADD_SOURCE_FILE_CONFIG
- FILE_MANAGER updated to v3.6.2+
### v3.3.0 (2026-02-11)
- **BREAKING CHANGE**: Removed `pKeepInTrash` parameter from ARCHIVE_TABLE_DATA
-- Added `ARCHIVE_ENABLED` column to A_SOURCE_FILE_CONFIG for selective archiving control
-- Added `KEEP_IN_TRASH` column to A_SOURCE_FILE_CONFIG (replaces pKeepInTrash parameter)
+- Added `IS_ARCHIVE_ENABLED` column to A_SOURCE_FILE_CONFIG for selective archiving control
+- Added `IS_KEEP_IN_TRASH` column to A_SOURCE_FILE_CONFIG (replaces pKeepInTrash parameter)
- Added batch procedures with 3-level granularity (config/source/all):
- ARCHIVE_ALL - Batch archival procedure
- GATHER_TABLE_STAT_ALL - Batch statistics procedure
@@ -1226,7 +1281,7 @@ ORDER BY SIZE_GB DESC;
### v2.0.0 (Legacy)
- Initial FILE_ARCHIVER package
- THRESHOLD_BASED archival only
-- Fixed DAYS_FOR_ARCHIVE_THRESHOLD configuration
+- Fixed ARCHIVE_THRESHOLD_DAYS configuration
## Related Documentation
@@ -1337,7 +1392,7 @@ ORDER BY SIZE_GB DESC;
### TRASH Folder Retention Best Practices
-1. **Default Behavior (KEEP_IN_TRASH = 'Y' - Recommended)**:
+1. **Recommended Setting (IS_KEEP_IN_TRASH = 'Y')**:
- Keeps CSV files in TRASH folder after archival
- Provides safety net for rollback if archival issues occur
- Supports compliance and audit requirements
@@ -1346,11 +1401,11 @@ ORDER BY SIZE_GB DESC;
- Configuration:
```sql
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET KEEP_IN_TRASH = 'Y'
+SET IS_KEEP_IN_TRASH = 'Y'
WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID = 'YOUR_TABLE';
```
-2. **TRASH Cleanup (KEEP_IN_TRASH = 'N')**:
+2. **TRASH Cleanup (IS_KEEP_IN_TRASH = 'N')**:
- Deletes CSV files from TRASH folder after successful archival
- Reduces storage costs in DATA bucket
- Status: ARCHIVED_AND_PURGED
@@ -1358,7 +1413,7 @@ ORDER BY SIZE_GB DESC;
- Configuration:
```sql
UPDATE CT_MRDS.A_SOURCE_FILE_CONFIG
-SET KEEP_IN_TRASH = 'N'
+SET IS_KEEP_IN_TRASH = 'N'
WHERE SOURCE_FILE_TYPE = 'INPUT' AND TABLE_ID = 'YOUR_TABLE';
```