Documentation update for TRASH and the new file statuses.
@@ -18,7 +18,7 @@ The FILE_ARCHIVER package provides flexible archival strategies that accommodate

- **Schema**: CT_MRDS
- **Package**: FILE_ARCHIVER
- **Current Version**: 3.1.0
- **Current Version**: 3.2.0
- **Dependencies**: ENV_MANAGER, FILE_MANAGER, cloud_wrapper, A_SOURCE_FILE_CONFIG, A_SOURCE_FILE_RECEIVED, A_WORKFLOW_HISTORY

### Critical Prerequisites
@@ -177,30 +177,46 @@ WHERE ...;
├─ Active data processing (Airflow + DBT)
├─ External tables read data from bucket
├─ Status: INGESTED
└─ FILE_ARCHIVER.ARCHIVE_TABLE_DATA archives based on strategy
├─ FILE_ARCHIVER.ARCHIVE_TABLE_DATA archives based on strategy
└─ CSV files moved to TRASH subfolder (ODS → TRASH/)

2.1 TRASH Subfolder (DATA Bucket - File Retention)
├─ Located in DATA bucket (e.g., TRASH/LM/TABLE_NAME)
├─ Stores CSV files after archival to Parquet
├─ Status: ARCHIVED_AND_TRASHED (default retention)
├─ Enables rollback if archival issues occur
└─ Optional cleanup: ARCHIVED_AND_PURGED (pKeepInTrash=FALSE)

3. ARCHIVE Bucket (Long-term Storage)
├─ Historical data in Parquet format
├─ Hive-style partitioning: PARTITION_YEAR=/PARTITION_MONTH=
├─ Status: ARCHIVED
├─ Status: ARCHIVED_AND_TRASHED or ARCHIVED_AND_PURGED
└─ Optimized for big data analytics (Spark, Hive)
```

### Archival Process

The FILE_ARCHIVER package automatically manages data movement from ODS to ARCHIVE:

**Key Procedures**:
- `ARCHIVE_TABLE_DATA` - Main archival procedure using strategy-specific WHERE clause
- `ARCHIVE_TABLE_DATA(pSourceFileConfigKey, pKeepInTrash)` - Main archival procedure using strategy-specific WHERE clause
  - `pKeepInTrash` (BOOLEAN, DEFAULT TRUE) - Controls TRASH folder retention
    - TRUE: Files kept in TRASH folder for safety and rollback capability (default)
    - FALSE: Files deleted from TRASH folder after successful archival
- `GET_ARCHIVAL_WHERE_CLAUSE` - Returns WHERE clause based on configured strategy
- `GATHER_TABLE_STAT` - Calculates archival statistics using strategy logic

**Archival Execution**:
```sql
-- Triggered by FILE_MANAGER or scheduled job
-- Default behavior: Keep files in TRASH folder (ARCHIVED_AND_TRASHED status)
BEGIN
    CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
        pSourceFileConfig => vSourceFileConfigRecord
        pSourceFileConfigKey => vSourceFileConfigKey,
        pKeepInTrash => TRUE -- DEFAULT value
    );
END;
/

-- Optional: Delete files from TRASH after archival (ARCHIVED_AND_PURGED status)
BEGIN
    CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
        pSourceFileConfigKey => vSourceFileConfigKey,
        pKeepInTrash => FALSE -- Cleanup TRASH folder
    );
END;
/
@@ -210,7 +226,9 @@ END;
- Package retrieves ARCHIVAL_STRATEGY from A_SOURCE_FILE_CONFIG
- GET_ARCHIVAL_WHERE_CLAUSE generates appropriate WHERE clause
- Data matching criteria moved from ODS to ARCHIVE bucket
- Parquet format with Hive-style partitioning applied
- CSV files moved to TRASH subfolder in DATA bucket (ODS/ → TRASH/)
- Parquet format with Hive-style partitioning applied to ARCHIVE bucket
- TRASH retention controlled by pKeepInTrash parameter
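
The relationship between `pKeepInTrash` and the resulting file status can be sketched as a small PL/SQL block. This is only an illustration of the documented mapping: `EXPECTED_STATUS` is a hypothetical local helper, not a FILE_ARCHIVER procedure, and the package's internal logic may differ.

```sql
-- Illustrative sketch only. EXPECTED_STATUS is a hypothetical local helper
-- and is NOT part of the FILE_ARCHIVER package.
SET SERVEROUTPUT ON

DECLARE
    FUNCTION EXPECTED_STATUS(pKeepInTrash IN BOOLEAN) RETURN VARCHAR2 IS
    BEGIN
        -- Returns NULL when pKeepInTrash is NULL (no branch matches).
        RETURN CASE
                   WHEN pKeepInTrash     THEN 'ARCHIVED_AND_TRASHED' -- CSV kept in TRASH/
                   WHEN NOT pKeepInTrash THEN 'ARCHIVED_AND_PURGED'  -- CSV deleted from TRASH/
               END;
    END EXPECTED_STATUS;
BEGIN
    DBMS_OUTPUT.PUT_LINE('pKeepInTrash => TRUE  : ' || EXPECTED_STATUS(TRUE));
    DBMS_OUTPUT.PUT_LINE('pKeepInTrash => FALSE : ' || EXPECTED_STATUS(FALSE));
END;
/
```

Running the block in SQL*Plus or SQLcl prints one line per parameter value, which can serve as a quick reminder of which status to expect after a given `ARCHIVE_TABLE_DATA` call.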

## Configuration Examples

@@ -527,8 +545,11 @@ WHERE object_name = 'FILE_ARCHIVER';
### OCI Buckets
- **INBOX**: Incoming file validation (`'INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_NAME}/'`)
- **ODS/DATA**: Operational data processing (`'ODS/{SOURCE}/{TABLE_NAME}/'`)
- **TRASH**: File retention subfolder in DATA bucket (`'TRASH/{SOURCE}/{TABLE_NAME}/'`) - CSV files after archival
- **ARCHIVE**: Historical data storage (`'ARCHIVE/{SOURCE}/{TABLE_NAME}/PARTITION_YEAR=/PARTITION_MONTH=/'`)

**Note**: TRASH is NOT a separate bucket - it's a subfolder within the DATA bucket for file retention and rollback capability.
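
Because TRASH mirrors the ODS layout inside the same DATA bucket, the retained object name differs from the operational one only in its leading prefix. The query below is a minimal sketch of that mapping on a literal example path; it assumes the `ODS/` and `TRASH/` prefixes listed above and does not reproduce how FILE_ARCHIVER actually moves objects.

```sql
-- Illustrative sketch only: derives the TRASH object name from an ODS object name
-- by swapping the leading prefix. The literal path is an example value.
SELECT 'ODS/LM/STANDING_FACILITIES/file.csv' AS ODS_PATH,
       REGEXP_REPLACE('ODS/LM/STANDING_FACILITIES/file.csv',
                      '^ODS/',      -- anchor to the start so only the prefix is replaced
                      'TRASH/') AS TRASH_PATH
FROM DUAL;
```

Anchoring the pattern with `^` keeps a table or file name that happens to contain `ODS/` further along the path from being rewritten.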

## Best Practices

### Strategy Selection Guidelines
@@ -609,10 +630,53 @@ WHERE object_name = 'FILE_ARCHIVER';
- Check for tables without archival configuration
- Optimize MINIMUM_AGE_MONTHS based on actual usage patterns

### TRASH Folder Retention Best Practices

1. **Default Behavior (pKeepInTrash = TRUE - Recommended)**:
   - Keeps CSV files in TRASH folder after archival
   - Provides safety net for rollback if archival issues occur
   - Supports compliance and audit requirements
   - Status: ARCHIVED_AND_TRASHED
   - Use for: Production environments, regulatory compliance, critical data

2. **TRASH Cleanup (pKeepInTrash = FALSE)**:
   - Deletes CSV files from TRASH folder after successful archival
   - Reduces storage costs in DATA bucket
   - Status: ARCHIVED_AND_PURGED
   - Use for: Non-critical data, storage optimization, test environments

3. **Monitoring TRASH Folder**:
   ```sql
   -- Files received in the last 30 days that were archived,
   -- with their TRASH status (retained or purged)
   SELECT
       SOURCE_FILE_NAME,
       PROCESSING_STATUS,
       ARCH_FILE_NAME,
       PARTITION_YEAR,
       PARTITION_MONTH
   FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
   WHERE PROCESSING_STATUS IN ('ARCHIVED_AND_TRASHED', 'ARCHIVED_AND_PURGED')
     AND RECEPTION_DATE > SYSDATE - 30
   ORDER BY PROCESSING_STATUS, RECEPTION_DATE DESC;
   ```

4. **TRASH Folder Structure**:
   ```
   DATA Bucket:
   ├── ODS/LM/STANDING_FACILITIES/file.csv    -- Active operational data
   └── TRASH/LM/STANDING_FACILITIES/file.csv  -- Retained after archival

   ARCHIVE Bucket:
   └── ARCHIVE/LM/STANDING_FACILITIES/
       └── PARTITION_YEAR=2026/
           └── PARTITION_MONTH=02/
               └── *.parquet                  -- Archived data
   ```
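
For a quick overview rather than a per-file listing, the same table can be aggregated by status. The query below is a sketch that uses only the table and columns shown in the monitoring query above; the column aliases are illustrative, and it assumes `RECEPTION_DATE` is a suitable proxy for file age.

```sql
-- Illustrative sketch only: summarizes archived files by TRASH status.
-- Column aliases (FILE_COUNT, OLDEST_RECEPTION, NEWEST_RECEPTION) are made up for this example.
SELECT PROCESSING_STATUS,
       COUNT(*)            AS FILE_COUNT,
       MIN(RECEPTION_DATE) AS OLDEST_RECEPTION,
       MAX(RECEPTION_DATE) AS NEWEST_RECEPTION
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE PROCESSING_STATUS IN ('ARCHIVED_AND_TRASHED', 'ARCHIVED_AND_PURGED')
GROUP BY PROCESSING_STATUS
ORDER BY PROCESSING_STATUS;
```

A steadily growing count for `ARCHIVED_AND_TRASHED` with an old `OLDEST_RECEPTION` value may indicate sources where a rerun with `pKeepInTrash => FALSE` is worth considering.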

## Author

Created by: Grzegorz Michalski
Date: 2026-02-04
Date: 2026-02-06
Schema: CT_MRDS
Package: FILE_ARCHIVER
Version: 3.1.0
Version: 3.2.0