Documentation update for TRASH and the new file statuses.
@@ -18,7 +18,7 @@ The FILE_ARCHIVER package provides flexible archival strategies that accommodate
 
 - **Schema**: CT_MRDS
 - **Package**: FILE_ARCHIVER
-- **Current Version**: 3.1.0
+- **Current Version**: 3.2.0
 - **Dependencies**: ENV_MANAGER, FILE_MANAGER, cloud_wrapper, A_SOURCE_FILE_CONFIG, A_SOURCE_FILE_RECEIVED, A_WORKFLOW_HISTORY
 
 ### Critical Prerequisites
@@ -177,30 +177,46 @@ WHERE ...;
 ├─ Active data processing (Airflow + DBT)
 ├─ External tables read data from bucket
 ├─ Status: INGESTED
-└─ FILE_ARCHIVER.ARCHIVE_TABLE_DATA archives based on strategy
+├─ FILE_ARCHIVER.ARCHIVE_TABLE_DATA archives based on strategy
+└─ CSV files moved to TRASH subfolder (ODS → TRASH/)
 
+2.1 TRASH Subfolder (DATA Bucket - File Retention)
+├─ Located in DATA bucket (e.g., TRASH/LM/TABLE_NAME)
+├─ Stores CSV files after archival to Parquet
+├─ Status: ARCHIVED_AND_TRASHED (default retention)
+├─ Enables rollback if archival issues occur
+└─ Optional cleanup: ARCHIVED_AND_PURGED (pKeepInTrash=FALSE)
+
 3. ARCHIVE Bucket (Long-term Storage)
 ├─ Historical data in Parquet format
 ├─ Hive-style partitioning: PARTITION_YEAR=/PARTITION_MONTH=
-├─ Status: ARCHIVED
+├─ Status: ARCHIVED_AND_TRASHED or ARCHIVED_AND_PURGED
 └─ Optimized for big data analytics (Spark, Hive)
 ```
 
 ### Archival Process
 
 The FILE_ARCHIVER package automatically manages data movement from ODS to ARCHIVE:
 
 **Key Procedures**:
-- `ARCHIVE_TABLE_DATA` - Main archival procedure using strategy-specific WHERE clause
+- `ARCHIVE_TABLE_DATA(pSourceFileConfigKey, pKeepInTrash)` - Main archival procedure using strategy-specific WHERE clause
+  - `pKeepInTrash` (BOOLEAN, DEFAULT TRUE) - Controls TRASH folder retention
+    - TRUE: Files kept in TRASH folder for safety and rollback capability (default)
+    - FALSE: Files deleted from TRASH folder after successful archival
 - `GET_ARCHIVAL_WHERE_CLAUSE` - Returns WHERE clause based on configured strategy
 - `GATHER_TABLE_STAT` - Calculates archival statistics using strategy logic
 
 **Archival Execution**:
 ```sql
--- Triggered by FILE_MANAGER or scheduled job
+-- Default behavior: Keep files in TRASH folder (ARCHIVED_AND_TRASHED status)
 BEGIN
   CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
-    pSourceFileConfig => vSourceFileConfigRecord
+    pSourceFileConfigKey => vSourceFileConfigKey,
+    pKeepInTrash => TRUE -- DEFAULT value
+  );
+END;
+/
+
+-- Optional: Delete files from TRASH after archival (ARCHIVED_AND_PURGED status)
+BEGIN
+  CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
+    pSourceFileConfigKey => vSourceFileConfigKey,
+    pKeepInTrash => FALSE -- Cleanup TRASH folder
   );
 END;
 /
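The diff does not show the declarations behind these calls. As a hypothetical sketch (the parameter name and return type of `GET_ARCHIVAL_WHERE_CLAUSE` are assumptions, not confirmed by this commit), the generated WHERE clause could be previewed before archiving:

```sql
-- Hypothetical sketch: preview the strategy's WHERE clause before running
-- ARCHIVE_TABLE_DATA. Parameter name and VARCHAR2 return are assumed.
DECLARE
  vSourceFileConfigKey NUMBER := 42;  -- example config key
  vWhereClause         VARCHAR2(4000);
BEGIN
  vWhereClause := CT_MRDS.FILE_ARCHIVER.GET_ARCHIVAL_WHERE_CLAUSE(
    pSourceFileConfigKey => vSourceFileConfigKey  -- assumed parameter name
  );
  DBMS_OUTPUT.PUT_LINE(vWhereClause);
END;
/
```

Previewing the clause this way makes dry runs cheap: the same predicate that `ARCHIVE_TABLE_DATA` would apply can first be pasted into a `SELECT COUNT(*)` against the source table.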
@@ -210,7 +226,9 @@ END;
 - Package retrieves ARCHIVAL_STRATEGY from A_SOURCE_FILE_CONFIG
 - GET_ARCHIVAL_WHERE_CLAUSE generates appropriate WHERE clause
 - Data matching criteria moved from ODS to ARCHIVE bucket
-- Parquet format with Hive-style partitioning applied
+- CSV files moved to TRASH subfolder in DATA bucket (ODS/ → TRASH/)
+- Parquet format with Hive-style partitioning applied to ARCHIVE bucket
+- TRASH retention controlled by pKeepInTrash parameter
 
 ## Configuration Examples
 
@@ -527,8 +545,11 @@ WHERE object_name = 'FILE_ARCHIVER';
 ### OCI Buckets
 - **INBOX**: Incoming file validation (`'INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_NAME}/'`)
 - **ODS/DATA**: Operational data processing (`'ODS/{SOURCE}/{TABLE_NAME}/'`)
+- **TRASH**: File retention subfolder in DATA bucket (`'TRASH/{SOURCE}/{TABLE_NAME}/'`) - CSV files after archival
 - **ARCHIVE**: Historical data storage (`'ARCHIVE/{SOURCE}/{TABLE_NAME}/PARTITION_YEAR=/PARTITION_MONTH=/'`)
 
+**Note**: TRASH is NOT a separate bucket - it's a subfolder within the DATA bucket for file retention and rollback capability.
+
 ## Best Practices
 
 ### Strategy Selection Guidelines
@@ -609,10 +630,53 @@ WHERE object_name = 'FILE_ARCHIVER';
 - Check for tables without archival configuration
 - Optimize MINIMUM_AGE_MONTHS based on actual usage patterns
 
+### TRASH Folder Retention Best Practices
+
+1. **Default Behavior (pKeepInTrash = TRUE - Recommended)**:
+   - Keeps CSV files in TRASH folder after archival
+   - Provides safety net for rollback if archival issues occur
+   - Supports compliance and audit requirements
+   - Status: ARCHIVED_AND_TRASHED
+   - Use for: Production environments, regulatory compliance, critical data
+
+2. **TRASH Cleanup (pKeepInTrash = FALSE)**:
+   - Deletes CSV files from TRASH folder after successful archival
+   - Reduces storage costs in DATA bucket
+   - Status: ARCHIVED_AND_PURGED
+   - Use for: Non-critical data, storage optimization, test environments
+
+3. **Monitoring TRASH Folder**:
+   ```sql
+   -- Check files in TRASH retention
+   SELECT
+     SOURCE_FILE_NAME,
+     PROCESSING_STATUS,
+     ARCH_FILE_NAME,
+     PARTITION_YEAR,
+     PARTITION_MONTH
+   FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
+   WHERE PROCESSING_STATUS IN ('ARCHIVED_AND_TRASHED', 'ARCHIVED_AND_PURGED')
+     AND RECEPTION_DATE > SYSDATE - 30
+   ORDER BY PROCESSING_STATUS, RECEPTION_DATE DESC;
+   ```
+
+4. **TRASH Folder Structure**:
+   ```
+   DATA Bucket:
+   ├── ODS/LM/STANDING_FACILITIES/file.csv    -- Active operational data
+   └── TRASH/LM/STANDING_FACILITIES/file.csv  -- Retained after archival
+
+   ARCHIVE Bucket:
+   └── ARCHIVE/LM/STANDING_FACILITIES/
+       └── PARTITION_YEAR=2026/
+           └── PARTITION_MONTH=02/
+               └── *.parquet                  -- Archived data
+   ```
+
 ## Author
 
 Created by: Grzegorz Michalski
-Date: 2026-02-04
+Date: 2026-02-06
 Schema: CT_MRDS
 Package: FILE_ARCHIVER
-Version: 3.1.0
+Version: 3.2.0
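The best-practices section above says TRASH "enables rollback", but no rollback procedure appears in this commit. A hypothetical sketch of what such a rollback could look like, assuming `DBMS_CLOUD` is available and an `OCI_CRED` credential exists (the project's `cloud_wrapper` may be the intended route, but its API is not shown here); the URIs are illustrative placeholders:

```sql
-- Hypothetical rollback sketch: copy a retained CSV from TRASH/ back to ODS/,
-- then remove the TRASH copy. All names and URIs below are examples.
BEGIN
  DBMS_CLOUD.COPY_OBJECT(
    source_credential_name => 'OCI_CRED',
    source_object_uri => 'https://objectstorage.example.com/.../DATA/o/TRASH/LM/STANDING_FACILITIES/file.csv',
    target_object_uri => 'https://objectstorage.example.com/.../DATA/o/ODS/LM/STANDING_FACILITIES/file.csv'
  );
  DBMS_CLOUD.DELETE_OBJECT(
    credential_name => 'OCI_CRED',
    object_uri => 'https://objectstorage.example.com/.../DATA/o/TRASH/LM/STANDING_FACILITIES/file.csv'
  );
END;
/
```

After restoring the file, the corresponding A_SOURCE_FILE_RECEIVED row would presumably need its PROCESSING_STATUS reset as well; that step depends on package internals not shown in this diff.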
@@ -371,11 +371,14 @@ INBOX Bucket - Pattern: 'INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_NAME}/'
         └── {pTableId}/ -- e.g., "A_UC_DISSEM_METADATA_LOADS", "STANDING_FACILITIES"
             └── files matching {pSourceFileNamePattern}
 
-ODS Bucket - Pattern: 'ODS/{SOURCE}/{TABLE_NAME}/'
-└── ODS/
+DATA Bucket - Patterns: 'ODS/{SOURCE}/{TABLE_NAME}/' and 'TRASH/{SOURCE}/{TABLE_NAME}/'
+├── ODS/
+│   └── {pSourceKey}/ -- e.g., "C2D", "LM"
+│       └── {pTableId}/ -- e.g., "A_UC_DISSEM_METADATA_LOADS", "STANDING_FACILITIES"
+│           └── processed files
+└── TRASH/ -- File retention subfolder (not a separate bucket)
     └── {pSourceKey}/ -- e.g., "C2D", "LM"
-        └── {pTableId}/ -- e.g., "A_UC_DISSEM_METADATA_LOADS", "STANDING_FACILITIES"
-            └── processed files
+        └── {pTableId}/ -- CSV files after archival (ARCHIVED_AND_TRASHED status)
 
 ARCHIVE Bucket - Pattern: 'ARCHIVE/{SOURCE}/{TABLE_NAME}/'
 └── ARCHIVE/
@@ -389,9 +392,11 @@ ARCHIVE Bucket - Pattern: 'ARCHIVE/{SOURCE}/{TABLE_NAME}/'
 **Critical Path Pattern Requirements:**
 - **INBOX** requires full 3-level path: `INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_NAME}/`
 - **ODS** uses simplified 2-level path: `ODS/{SOURCE}/{TABLE_NAME}/` (no SOURCE_FILE_ID)
+- **TRASH** uses simplified 2-level path: `TRASH/{SOURCE}/{TABLE_NAME}/` (subfolder in DATA bucket)
 - **ARCHIVE** uses simplified 2-level path: `ARCHIVE/{SOURCE}/{TABLE_NAME}/` (no SOURCE_FILE_ID)
 - **All patterns are mandatory** - no simplified versions allowed
 - File names must match `pSourceFileNamePattern` for automatic processing
+- **Note**: TRASH is NOT a separate bucket - it's a subfolder within the DATA bucket
 
 ## Configuration Management Best Practices
 
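The path patterns above can be checked directly against object storage. A hypothetical sketch, assuming `DBMS_CLOUD` is available and an `OCI_CRED` credential exists (the credential name and bucket URI are illustrative, not from this commit):

```sql
-- Hypothetical sketch: list objects under the 2-level TRASH prefix to
-- verify that retained CSVs follow TRASH/{SOURCE}/{TABLE_NAME}/.
SELECT object_name, bytes, last_modified
FROM DBMS_CLOUD.LIST_OBJECTS(
       'OCI_CRED',
       'https://objectstorage.example.com/.../DATA/o/TRASH/LM/STANDING_FACILITIES/'
     );
```

Any object that appears here but has no ARCHIVED_AND_TRASHED row in A_SOURCE_FILE_RECEIVED would indicate a mismatch between the bucket and the tracking table.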
@@ -693,7 +698,10 @@ SELECT FILE_MANAGER.PROCESS_SOURCE_FILE(
 
 1. **File Arrival**: File is uploaded to Oracle Cloud Storage bucket
 2. **Registration**: FILE_MANAGER.REGISTER_SOURCE_FILE_RECEIVED() creates record
-3. **Status**: RECEIVED → VALIDATED → READY_FOR_INGESTION → INGESTED → ARCHIVED
+3. **Status**: RECEIVED → VALIDATED → READY_FOR_INGESTION → INGESTED → ARCHIVED_AND_TRASHED → ARCHIVED_AND_PURGED (optional)
+   - Legacy ARCHIVED status maintained for backward compatibility
+   - ARCHIVED_AND_TRASHED: Files archived to Parquet and kept in TRASH folder (default)
+   - ARCHIVED_AND_PURGED: Files archived to Parquet and deleted from TRASH folder
 4. **External Table**: Created automatically based on template table
 5. **Data Loading**: Data is loaded into target ODS schema
 6. **Archival**: File is moved to archive bucket after processing
@@ -164,7 +164,9 @@ ORDER BY RECEPTION_DATE DESC;
 | `VALIDATED` | File validation completed successfully | After successful validation |
 | `READY_FOR_INGESTION` | File validated and prepared for Airflow+DBT processing | After successful validation and preparation |
 | `INGESTED` | Data has been consumed/ingested by target system | After data consumption |
-| `ARCHIVED` | Data exported to PARQUET format and file moved to archival storage | Final archival state using FILE_ARCHIVER |
+| `ARCHIVED` | (Legacy) Data exported to PARQUET format and file moved to archival storage | Legacy archival state (backward compatibility) |
+| `ARCHIVED_AND_TRASHED` | Data archived to Parquet, CSV files kept in TRASH folder (default) | Archival with file retention using FILE_ARCHIVER |
+| `ARCHIVED_AND_PURGED` | Data archived to Parquet, CSV files deleted from TRASH folder | Archival with TRASH cleanup (pKeepInTrash=FALSE) |
 | `VALIDATION_FAILED` | File validation failed | After failed validation |
 
 
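With the two new statuses in the table above, a quick distribution check becomes useful during migration. This sketch uses only the table and column names that appear elsewhere in this documentation:

```sql
-- Count received files per processing status, including the new
-- ARCHIVED_AND_TRASHED / ARCHIVED_AND_PURGED states.
SELECT PROCESSING_STATUS, COUNT(*) AS file_count
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
GROUP BY PROCESSING_STATUS
ORDER BY file_count DESC;
```

A lingering population of legacy ARCHIVED rows is expected here, since that status is kept for backward compatibility.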
@@ -68,7 +68,9 @@ ARCH_FILE_NAME VARCHAR2 -- Parquet archive file path
 
 **Status Workflow**:
 ```
-RECEIVED → VALIDATED → READY_FOR_INGESTION → INGESTED → ARCHIVED
+RECEIVED → VALIDATED → READY_FOR_INGESTION → INGESTED → ARCHIVED_AND_TRASHED → ARCHIVED_AND_PURGED (optional)
+
+Note: Legacy ARCHIVED status maintained for backward compatibility
 ```
 
 **Usage Pattern**:
@@ -394,6 +394,9 @@ DATA Bucket:
 ├── ODS/
 │   └── {SOURCE}/
 │       └── {TABLE_NAME}/
+└── TRASH/ -- File retention subfolder (not a separate bucket)
+    └── {SOURCE}/
+        └── {TABLE_NAME}/ -- CSV files after archival (ARCHIVED_AND_TRASHED status)
 
 ARCHIVE Bucket:
 └── ARCHIVE/
@@ -402,6 +405,8 @@ ARCHIVE Bucket:
         └── PARTITION_YEAR=*/
             └── PARTITION_MONTH=*/
                 └── *.parquet
+
+Note: TRASH is a subfolder within the DATA bucket for file retention and rollback capability.
 ```
 
 ### 4. Migration Checklist
@@ -123,7 +123,8 @@ WHEN OTHERS THEN
 ```sql
 -- Added 'VALIDATION_FAILED' to the allowed statuses
 PROCESSING_STATUS IN ('RECEIVED', 'VALIDATED', 'READY_FOR_INGESTION',
-                      'INGESTED', 'ARCHIVED', 'VALIDATION_FAILED')
+                      'INGESTED', 'ARCHIVED', 'ARCHIVED_AND_TRASHED',
+                      'ARCHIVED_AND_PURGED', 'VALIDATION_FAILED')
 ```
 
 ## 📊 Testing
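Where the status list above is enforced by a check constraint, deployment needs a matching DDL change. A hypothetical migration sketch (the constraint name is not given in this commit, so `A_SFR_STATUS_CHK` and the lookup step are assumptions):

```sql
-- Hypothetical migration sketch: find the existing status check constraint,
-- then replace it with one that allows the two new statuses.
SELECT constraint_name
FROM ALL_CONSTRAINTS
WHERE owner = 'CT_MRDS'
  AND table_name = 'A_SOURCE_FILE_RECEIVED'
  AND constraint_type = 'C';

-- Drop the constraint found above, then recreate with the extended list.
-- 'A_SFR_STATUS_CHK' is an example name, not taken from this commit.
ALTER TABLE CT_MRDS.A_SOURCE_FILE_RECEIVED
  ADD CONSTRAINT A_SFR_STATUS_CHK CHECK (
    PROCESSING_STATUS IN ('RECEIVED', 'VALIDATED', 'READY_FOR_INGESTION',
                          'INGESTED', 'ARCHIVED', 'ARCHIVED_AND_TRASHED',
                          'ARCHIVED_AND_PURGED', 'VALIDATION_FAILED')
  );
```

Dropping before adding avoids a window where both constraints exist; run the two steps in one migration script.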