Documentation
@@ -11,25 +11,55 @@ The FILE_ARCHIVER package provides flexible archival strategies that accommodate
- **Three Archival Strategies**: THRESHOLD_BASED, MINIMUM_AGE_MONTHS (with 0 = current month only), HYBRID
- **Flexible Configuration**: Per-table archival strategy configuration via A_SOURCE_FILE_CONFIG
- **Validation**: Automatic validation of strategy-specific configuration requirements
- **OCI Integration**: Works seamlessly with DBMS_CLOUD operations via cloud_wrapper
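The MINIMUM_AGE_MONTHS cutoff can be pictured with plain Oracle date arithmetic. This is a sketch of one reading of the strategy (0 retains only the current month); the actual eligibility check lives inside FILE_ARCHIVER and may differ:

```sql
-- Sketch: with MINIMUM_AGE_MONTHS = 0, data from any month before the
-- current one is eligible for archival; the current month is retained.
SELECT CASE
         WHEN MONTHS_BETWEEN(TRUNC(SYSDATE, 'MM'),
                             TRUNC(DATE '2024-01-15', 'MM')) > 0  -- 0 = configured minimum age
         THEN 'ELIGIBLE'
         ELSE 'RETAINED'
       END AS archival_status
  FROM dual;
```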
### Package Information

- **Schema**: CT_MRDS
- **Package**: FILE_ARCHIVER
- **Current Version**: 3.3.0
- **Dependencies**: ENV_MANAGER, FILE_MANAGER, A_SOURCE_FILE_CONFIG, A_SOURCE_FILE_RECEIVED, A_WORKFLOW_HISTORY
### Critical Prerequisites

⚠️ **IMPORTANT**: FILE_ARCHIVER requires data to be registered in the `CT_MRDS.A_SOURCE_FILE_RECEIVED` table.

**For new system data (Airflow + DBT):**

- `A_SOURCE_FILE_RECEIVED` records are automatically created by `FILE_MANAGER.PROCESS_SOURCE_FILE` during file validation
- No additional configuration needed - the standard workflow handles registration

**For legacy data migrated from the Informatica + WLA system:**

- Use `DATA_EXPORTER` with the **`pRegisterExport => TRUE`** parameter to automatically register exported files in `A_SOURCE_FILE_RECEIVED`
- This enables FILE_ARCHIVER to process legacy data exports without manual registration
- Available in both `EXPORT_TABLE_DATA` (single CSV) and `EXPORT_TABLE_DATA_TO_CSV_BY_DATE` (partitioned CSV exports)

**Example - Legacy Data Export with Registration**:
```sql
-- Export legacy data to the DATA bucket WITH automatic registration
BEGIN
    CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA_TO_CSV_BY_DATE(
        pSchemaName     => 'OU_TOP',
        pTableName      => 'AGGREGATED_ALLOTMENT',
        pKeyColumnName  => 'A_ETL_LOAD_SET_KEY_FK',
        pBucketArea     => 'DATA',
        pFolderName     => 'legacy_export',
        pMinDate        => DATE '2024-01-01',
        pMaxDate        => DATE '2024-12-31',
        pRegisterExport => TRUE,  -- ✓ Registers files in A_SOURCE_FILE_RECEIVED
        pProcessName    => 'LEGACY_MIGRATION'
    );
END;
/

-- Now FILE_ARCHIVER can process these files
DECLARE
    vConfigKey NUMBER;  -- A_SOURCE_FILE_CONFIG_KEY of the source file configuration
BEGIN
    SELECT A_SOURCE_FILE_CONFIG_KEY
      INTO vConfigKey
      FROM CT_MRDS.A_SOURCE_FILE_CONFIG
     WHERE SOURCE_FILE_ID = 'AGGREGATED_ALLOTMENT'
       AND SOURCE_FILE_TYPE = 'INPUT';

    CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
        pSourceFileConfigKey => vConfigKey
    );
END;
/
```
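After an export with `pRegisterExport => TRUE`, the registration can be spot-checked directly. The column names below are taken from the `A_SOURCE_FILE_RECEIVED` usage elsewhere in this guide; filtering on a `PROCESS_NAME` column is an assumption based on the registered metadata listed there:

```sql
-- Hypothetical verification query; confirm column names against the table definition
SELECT SOURCE_FILE_NAME,
       PROCESSING_STATUS,   -- expected: 'INGESTED'
       BYTES,
       RECEPTION_DATE
  FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
 WHERE PROCESS_NAME = 'LEGACY_MIGRATION'
 ORDER BY RECEPTION_DATE;
```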
**Alternative approach**: Export directly to the ARCHIVE bucket using `DATA_EXPORTER.EXPORT_TABLE_DATA_BY_DATE` with `pBucketArea => 'ARCHIVE'` to bypass the archival step entirely

## Archival Strategies
@@ -1,6 +1,6 @@
# System Migration: Informatica + WLA → Airflow + DBT

This document describes the migration from the legacy Informatica + WLA data processing system to the new Airflow + DBT architecture, including control table differences, data export strategies, and known limitations.

## Migration Overview
@@ -13,7 +13,7 @@ The MRDS (Market Reference Data System) is undergoing a fundamental technology m
- Primary Control Table: `CT_ODS.A_LOAD_HISTORY`
- Key Column: `A_ETL_LOAD_SET_KEY`

**New System (Airflow + DBT):**
- Orchestration: Apache Airflow
- Transformation: DBT (Data Build Tool)
- Control Schema: `CT_MRDS` (MRDS Control)
@@ -49,7 +49,7 @@ DQ_FLAG VARCHAR2(5) -- Data quality flag
- Used for temporal partitioning in DATA_EXPORTER
- Referenced via `A_ETL_LOAD_SET_KEY_FK` foreign key in data tables

### New System: CT_MRDS Control Tables

#### 1. A_SOURCE_FILE_RECEIVED
@@ -126,7 +126,7 @@ END;
**Result**: CSV files in the ODS bucket (DATA area), partitioned by LOAD_START from A_LOAD_HISTORY

### Scenario 2: New System Data (Airflow + DBT → ODS → ARCHIVE)

**Use Case**: Ongoing processing with the new Airflow + DBT system
@@ -150,104 +150,91 @@ END;
/
```

## Legacy Data Archival

### FILE_ARCHIVER Requirement
⚠️ **IMPORTANT**: FILE_ARCHIVER requires records in the `A_SOURCE_FILE_RECEIVED` table to track and manage the archival lifecycle.

**For new system data (Airflow + DBT)**:

- Records are automatically created by `FILE_MANAGER.PROCESS_SOURCE_FILE`
- No additional steps needed

**For legacy data (Informatica + WLA)**:

- Historical data requires registration in `A_SOURCE_FILE_RECEIVED`
- ✅ **SOLUTION**: Use DATA_EXPORTER v2.9.0+ with the `pRegisterExport => TRUE` parameter
- Automatically registers exported files with proper metadata (size, checksum, location)

### Export Strategies for Legacy Data

#### Strategy 1: Automatic Registration (Recommended)
✅ **DATA_EXPORTER v2.9.0+** supports automatic file registration via the `pRegisterExport` parameter.

**Benefits**:

- Simple, one-step export with automatic registration
- Files tracked in `A_SOURCE_FILE_RECEIVED` (enables FILE_ARCHIVER processing)
- Proper metadata capture (file size, checksum, location, timestamps)
- Standard workflow integration (archival strategies, status tracking)

**Example - CSV Export with Registration**:

```sql
-- Export with automatic registration (DATA_EXPORTER v2.9.0+)
BEGIN
    CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA_TO_CSV_BY_DATE(
        pSchemaName     => 'OU_TOP',
        pTableName      => 'AGGREGATED_ALLOTMENT',
        pKeyColumnName  => 'A_ETL_LOAD_SET_KEY_FK',
        pBucketArea     => 'DATA',
        pFolderName     => 'legacy_export',
        pMinDate        => DATE '2024-01-01',
        pMaxDate        => DATE '2024-12-31',
        pRegisterExport => TRUE,  -- ✓ Automatically registers files
        pProcessName    => 'LEGACY_MIGRATION'
    );
END;
/

-- Files are now registered in A_SOURCE_FILE_RECEIVED with:
-- - SOURCE_FILE_NAME: Full OCI path
-- - PROCESSING_STATUS: 'INGESTED'
-- - BYTES: Actual file size
-- - CHECKSUM: File ETag from OCI
-- - PROCESS_NAME: 'LEGACY_MIGRATION'

-- Now FILE_ARCHIVER can process these files
DECLARE
    vConfigKey NUMBER;  -- A_SOURCE_FILE_CONFIG_KEY of the source file configuration
BEGIN
    SELECT A_SOURCE_FILE_CONFIG_KEY
      INTO vConfigKey
      FROM CT_MRDS.A_SOURCE_FILE_CONFIG
     WHERE SOURCE_FILE_ID = 'AGGREGATED_ALLOTMENT'
       AND SOURCE_FILE_TYPE = 'INPUT';

    CT_MRDS.FILE_ARCHIVER.ARCHIVE_TABLE_DATA(
        pSourceFileConfigKey => vConfigKey
    );
END;
/
```
**Example - Single CSV Export with Registration**:

```sql
-- For a single file export (not partitioned by date)
BEGIN
    CT_MRDS.DATA_EXPORTER.EXPORT_TABLE_DATA(
        pSchemaName        => 'CT_MRDS',
        pTableName         => 'MY_TABLE',
        pKeyColumnName     => 'A_ETL_LOAD_SET_KEY_FK',
        pBucketArea        => 'DATA',
        pFolderName        => 'legacy_export',
        pFileName          => 'my_table_export.csv',
        pTemplateTableName => 'CT_ET_TEMPLATES.MY_TEMPLATE',
        pRegisterExport    => TRUE,  -- ✓ Registers file
        pProcessName       => 'LEGACY_MIGRATION'
    );
END;
/
```

#### Strategy 2: Direct Archive Export (Bypass ODS)

⚠️ **Use when**: You want to skip the ODS bucket entirely and go straight to ARCHIVE

Skip the ODS/DATA bucket entirely - export directly to the ARCHIVE bucket in Parquet format:

```sql
@@ -411,18 +398,18 @@ CALL FILE_MANAGER.ADD_SOURCE_FILE_CONFIG(
## Known Limitations

### 1. FILE_ARCHIVER Requires A_SOURCE_FILE_RECEIVED

FILE_ARCHIVER cannot archive data without corresponding A_SOURCE_FILE_RECEIVED records.

**Solutions**:

- ✅ **New system data**: Automatically registered via `FILE_MANAGER.PROCESS_SOURCE_FILE`
- ✅ **Legacy data exports**: Use `DATA_EXPORTER` with `pRegisterExport => TRUE` (v2.9.0+)
- ⚠️ **Manual uploads**: Must be registered via `FILE_MANAGER.PROCESS_SOURCE_FILE` or a manual INSERT
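Where the manual-INSERT route is unavoidable, a registration row can be created directly. This sketch reuses the column list from the manual-registration example in earlier revisions of this guide; verify the sequence name and the config lookup against your environment:

```sql
-- Manual registration of one exported CSV (sketch)
INSERT INTO CT_MRDS.A_SOURCE_FILE_RECEIVED (
    A_SOURCE_FILE_RECEIVED_KEY,
    A_SOURCE_FILE_CONFIG_KEY,
    SOURCE_FILE_NAME,
    PROCESSING_STATUS,
    RECEPTION_DATE,
    BYTES,
    CHECKSUM,
    EXTERNAL_TABLE_NAME
) VALUES (
    A_SOURCE_FILE_RECEIVED_KEY_SEQ.NEXTVAL,
    (SELECT A_SOURCE_FILE_CONFIG_KEY
       FROM CT_MRDS.A_SOURCE_FILE_CONFIG
      WHERE SOURCE_FILE_ID = 'AGGREGATED_ALLOTMENT'
        AND SOURCE_FILE_TYPE = 'INPUT'),
    'ODS/legacy_export/AGGREGATED_ALLOTMENT_202401.csv',
    'INGESTED',            -- mark as already ingested so validation is skipped
    DATE '2024-01-15',
    1048576,               -- file size in bytes
    'manual_registration', -- placeholder checksum
    NULL                   -- no external table needed
);
COMMIT;
```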
### 2. Mixed Control Table References

During the migration period, some procedures reference A_LOAD_HISTORY (DATA_EXPORTER) while others reference A_WORKFLOW_HISTORY (FILE_ARCHIVER). This is intentional but requires a careful understanding of data lineage.

### 3. A_WORKFLOW_HISTORY vs A_LOAD_HISTORY Column Mismatch

The control tables have different schemas:

- **A_LOAD_HISTORY**: `LOAD_START`, `A_ETL_LOAD_SET_KEY`
- **A_WORKFLOW_HISTORY**: `WORKFLOW_START`, `A_WORKFLOW_HISTORY_KEY`
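The column mismatch shows up directly in partitioning queries. The A_LOAD_HISTORY query below reflects DATA_EXPORTER's documented behavior; the comment about the A_WORKFLOW_HISTORY equivalent is an assumption, since FILE_ARCHIVER's internal query is not shown here:

```sql
-- Legacy partitioning (DATA_EXPORTER): months derived from A_LOAD_HISTORY
SELECT DISTINCT TO_CHAR(L.LOAD_START, 'YYYY') AS yr,
                TO_CHAR(L.LOAD_START, 'MM')   AS mn
  FROM OU_TOP.AGGREGATED_ALLOTMENT T
  JOIN CT_ODS.A_LOAD_HISTORY L
    ON T.A_ETL_LOAD_SET_KEY_FK = L.A_ETL_LOAD_SET_KEY;

-- A new-system equivalent would read WORKFLOW_START from
-- CT_MRDS.A_WORKFLOW_HISTORY keyed by A_WORKFLOW_HISTORY_KEY instead.
```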
@@ -445,4 +432,8 @@ The migration from Informatica + WLA to Airflow + DBT introduces new control tab
- **Archival Operations**: Ensuring FILE_ARCHIVER has required metadata
- **Testing**: Using correct control tables in test scenarios

**Recommended Approach for Legacy Data Migration**:

1. ✅ **Strategy 1 (Automatic Registration)** - Use `DATA_EXPORTER` with `pRegisterExport => TRUE` to automatically register files in `A_SOURCE_FILE_RECEIVED`, enabling the full FILE_ARCHIVER workflow (archival strategies, status tracking, rollback capabilities)

2. ⚠️ **Strategy 2 (Direct to ARCHIVE)** - Export directly to the ARCHIVE bucket to bypass the ODS bucket entirely and avoid registration requirements (use when tracking is not needed)