Move

Grzegorz Michalski
2026-02-03 13:32:06 +01:00
parent b353fb38f5
commit e3ff1618ce
101 changed files with 0 additions and 0 deletions


# MARS-835: Required External Tables for Smart Column Mapping
## Overview
This document lists all external tables required for MARS-835 data exports using DATA_EXPORTER v2.4.0 with the Smart Column Mapping feature.
**Purpose**: Smart Column Mapping generates CSV files with columns in the EXACT order expected by the external tables. This prevents the NULL values caused by Oracle's positional CSV mapping.
---
## Required External Tables
### Group 1: DATA Bucket (CSV Format) - **CRITICAL**
#### 1. ODS.CSDB_DEBT_DATA_ODS
- **Source Table**: OU_CSDB.LEGACY_DEBT
- **Format**: CSV
- **Bucket**: DATA (mrds_data_dev/ODS/CSDB/CSDB_DEBT/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY (position 2 required)
- **Critical**: Must use Smart Column Mapping to avoid NULL values in A_WORKFLOW_HISTORY_KEY
#### 2. ODS.CSDB_DEBT_DAILY_DATA_ODS
- **Source Table**: OU_CSDB.LEGACY_DEBT_DAILY
- **Format**: CSV
- **Bucket**: DATA (mrds_data_dev/ODS/CSDB/CSDB_DEBT_DAILY/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY (position 2 required)
- **Critical**: Must use Smart Column Mapping to avoid NULL values in A_WORKFLOW_HISTORY_KEY
---
### Group 2: ARCHIVE Bucket (Parquet Format) - **RECOMMENDED**
#### 3. ODS.CSDB_DEBT_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_DEBT
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_DEBT/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY
- **Note**: Parquet maps columns by name (schema-based mapping), so column order is less critical. Smart Column Mapping still keeps the layout consistent.
#### 4. ODS.CSDB_DEBT_DAILY_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_DEBT_DAILY
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_DEBT_DAILY/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY
#### 5. ODS.CSDB_INSTR_RAT_FULL_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_INSTR_RAT_FULL
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_INSTR_RAT_FULL/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY
#### 6. ODS.CSDB_INSTR_DESC_FULL_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_INSTR_DESC_FULL
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_INSTR_DESC_FULL/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY
#### 7. ODS.CSDB_ISSUER_RAT_FULL_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_ISSUER_RAT_FULL
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_ISSUER_RAT_FULL/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY
#### 8. ODS.CSDB_ISSUER_DESC_FULL_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_ISSUER_DESC_FULL
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_ISSUER_DESC_FULL/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY
---
## External Table Column Order Requirements
### **CRITICAL for CSV Tables** (DATA bucket):
All CSV external tables MUST have **A_WORKFLOW_HISTORY_KEY at position 2**:
```
Position 1: A_KEY (NUMBER)
Position 2: A_WORKFLOW_HISTORY_KEY (NUMBER) ← MUST BE HERE!
Position 3+: Other columns in any order
```
**Reason**: Oracle external tables over CSV files use **positional mapping**. Fields are assigned to columns by their position in the record, and the header row is not used to match columns by name. Suppose the source table has A_ETL_LOAD_SET_FK at position 72 and the export writes it to CSV field 72, while the external table declares A_WORKFLOW_HISTORY_KEY at position 2. Oracle then reads CSV field 2 (which might hold a DATE value) into the NUMBER column A_WORKFLOW_HISTORY_KEY. The conversion fails, and the value ends up NULL.
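You can check whether a table is exposed to this mismatch by querying the data dictionary for where the key column sits on each side. The following is a minimal sketch. It assumes the querying user can see both OU_CSDB.LEGACY_DEBT and ODS.CSDB_DEBT_DATA_ODS through ALL_TAB_COLUMNS. The `side` labels are illustrative only.

```sql
-- Where does the workflow key sit on each side of the export?
-- Assumes the current user can see both tables in ALL_TAB_COLUMNS.
SELECT 'SOURCE' AS side, table_name, column_id, column_name, data_type
  FROM all_tab_columns
 WHERE owner = 'OU_CSDB'
   AND table_name = 'LEGACY_DEBT'
   AND column_name = 'A_ETL_LOAD_SET_FK'
UNION ALL
-- Source column that would land in CSV field 2 if the file followed source order
SELECT 'SOURCE@2', table_name, column_id, column_name, data_type
  FROM all_tab_columns
 WHERE owner = 'OU_CSDB'
   AND table_name = 'LEGACY_DEBT'
   AND column_id = 2
UNION ALL
-- Position the external table expects for the workflow key
SELECT 'EXTERNAL', table_name, column_id, column_name, data_type
  FROM all_tab_columns
 WHERE owner = 'ODS'
   AND table_name = 'CSDB_DEBT_DATA_ODS'
   AND column_name = 'A_WORKFLOW_HISTORY_KEY';
```

Consider the case where the EXTERNAL row shows `column_id` 2 and the SOURCE@2 row shows a non-NUMBER data type. A CSV written in source-table order would then hit the conversion failure described above.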
**Solution**: Smart Column Mapping (v2.4.0) generates CSV columns in EXTERNAL TABLE order, ensuring position 2 has the correct NUMBER value.
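The core idea can be illustrated with a dictionary query that builds the export SELECT list in external-table column order. This is a hypothetical sketch, not the DATA_EXPORTER v2.4.0 implementation. It assumes that every external-table column has a same-named source column, except A_WORKFLOW_HISTORY_KEY, which is taken from A_ETL_LOAD_SET_FK.

```sql
-- Hypothetical sketch: generate an export query whose column order
-- follows the external table, applying the documented rename.
SELECT 'SELECT '
       || LISTAGG(
            CASE et.column_name
              WHEN 'A_WORKFLOW_HISTORY_KEY'
                THEN 'A_ETL_LOAD_SET_FK AS A_WORKFLOW_HISTORY_KEY'
              ELSE et.column_name  -- assumed to exist in the source table
            END, ', ')
          WITHIN GROUP (ORDER BY et.column_id)  -- external-table order
       || ' FROM OU_CSDB.LEGACY_DEBT' AS export_query
  FROM all_tab_columns et
 WHERE et.owner = 'ODS'
   AND et.table_name = 'CSDB_DEBT_DATA_ODS';
```

Ordering by the external table's `column_id` is what guarantees that CSV field 2 carries the workflow key. LISTAGG returns a VARCHAR2, so a very wide table could exceed the string length limit. A CLOB-based approach would be safer in that case.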
### **OPTIONAL for Parquet Tables** (ARCHIVE bucket):
Parquet format uses **schema-based mapping** (column names). Column order doesn't matter, but Smart Column Mapping provides consistency.
---
## Creation Script Example
### CSV External Table (CRITICAL - Correct Column Order)
```sql
-- Example: ODS.CSDB_DEBT_DATA_ODS
-- IMPORTANT: A_WORKFLOW_HISTORY_KEY must be at position 2!
BEGIN
    ODS.FILE_MANAGER_ODS.CREATE_EXTERNAL_TABLE(
        pTableName         => 'CSDB_DEBT_DATA_ODS',
        pTemplateTableName => 'CT_ET_TEMPLATES.CSDB_DEBT_TEMPLATE',
        pPrefix            => 'ODS/CSDB/CSDB_DEBT',
        pBucketUri         => CT_MRDS.ENV_MANAGER.gvDataBucketUri,
        pFormat            => 'CSV'  -- Uses positional mapping!
    );
END;
/
-- Verify column order (A_WORKFLOW_HISTORY_KEY should be position 2)
SELECT column_id, column_name, data_type
FROM all_tab_columns
WHERE table_name = 'CSDB_DEBT_DATA_ODS'
AND owner = 'ODS'
ORDER BY column_id;
```
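The query above checks one table at a time. The following sketch checks both CSV tables at once and also reports a table that does not exist. It assumes the tables are visible to the current user through ALL_TAB_COLUMNS. The status labels are illustrative.

```sql
-- Flag any CSV external table whose column 2 is not A_WORKFLOW_HISTORY_KEY.
WITH expected AS (
    SELECT 'CSDB_DEBT_DATA_ODS' AS table_name FROM dual UNION ALL
    SELECT 'CSDB_DEBT_DAILY_DATA_ODS' FROM dual
)
SELECT e.table_name,
       c.column_name AS column_at_position_2,
       CASE
         WHEN c.column_name IS NULL THEN 'MISSING TABLE'
         WHEN c.column_name = 'A_WORKFLOW_HISTORY_KEY' THEN 'OK'
         ELSE 'WRONG ORDER'
       END AS status
  FROM expected e
  LEFT JOIN all_tab_columns c
    ON c.owner = 'ODS'
   AND c.table_name = e.table_name
   AND c.column_id = 2
 ORDER BY e.table_name;
```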
### Parquet External Table (Optional Column Order)
```sql
-- Example: ODS.CSDB_DEBT_ARCHIVE
-- Column order flexible (schema-based mapping)
BEGIN
    ODS.FILE_MANAGER_ODS.CREATE_EXTERNAL_TABLE(
        pTableName         => 'CSDB_DEBT_ARCHIVE',
        pTemplateTableName => 'CT_ET_TEMPLATES.CSDB_DEBT_TEMPLATE',
        pPrefix            => 'ARCHIVE/CSDB/CSDB_DEBT',
        pBucketUri         => CT_MRDS.ENV_MANAGER.gvArchiveBucketUri,
        pFormat            => 'PARQUET'  -- Uses schema-based mapping
    );
END;
/
```
---
## Template Tables Required
All external tables require corresponding template tables in the CT_ET_TEMPLATES schema:
- `CT_ET_TEMPLATES.CSDB_DEBT_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_DEBT_DAILY_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_INSTR_RAT_FULL_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_INSTR_DESC_FULL_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_ISSUER_RAT_FULL_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_ISSUER_DESC_FULL_TEMPLATE`
**Note**: Template tables must be created by the ADMIN or CT_ET_TEMPLATES user (MRDS_LOADER cannot create them).
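Before you create the external tables, you can confirm that all six templates are present. The following is a minimal sketch. ALL_TABLES lists only the tables the current user has privileges on, so a MISSING status may mean "not visible to this user" rather than "does not exist".

```sql
-- Check that every required template table exists in CT_ET_TEMPLATES.
WITH expected AS (
    SELECT 'CSDB_DEBT_TEMPLATE' AS template_name FROM dual UNION ALL
    SELECT 'CSDB_DEBT_DAILY_TEMPLATE' FROM dual UNION ALL
    SELECT 'CSDB_INSTR_RAT_FULL_TEMPLATE' FROM dual UNION ALL
    SELECT 'CSDB_INSTR_DESC_FULL_TEMPLATE' FROM dual UNION ALL
    SELECT 'CSDB_ISSUER_RAT_FULL_TEMPLATE' FROM dual UNION ALL
    SELECT 'CSDB_ISSUER_DESC_FULL_TEMPLATE' FROM dual
)
SELECT e.template_name,
       CASE WHEN t.table_name IS NULL THEN 'MISSING' ELSE 'PRESENT' END AS status
  FROM expected e
  LEFT JOIN all_tables t
    ON t.owner = 'CT_ET_TEMPLATES'
   AND t.table_name = e.template_name
 ORDER BY e.template_name;
```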
---
## Verification Checklist
Before running MARS-835 exports:
- [ ] All 8 external tables exist in the ODS schema
- [ ] CSV tables (DATA bucket) have A_WORKFLOW_HISTORY_KEY at position 2
- [ ] Template tables exist in the CT_ET_TEMPLATES schema
- [ ] MRDS_LOADER has EXECUTE privilege on ODS.FILE_MANAGER_ODS
- [ ] The ODS schema has access to CT_MRDS.ENV_MANAGER for logging
- [ ] DATA_EXPORTER v2.4.0 is deployed with the Smart Column Mapping feature
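You can script the first and fourth checklist items with the following dictionary queries. This is a sketch that assumes it runs as ODS or as a user that can see the ODS objects and their grants. The second query matches only a grant made directly to MRDS_LOADER. A grant made through a role would appear under the role's name instead.

```sql
-- 1. All 8 external tables exist in the ODS schema
WITH expected AS (
    SELECT 'CSDB_DEBT_DATA_ODS' AS table_name FROM dual UNION ALL
    SELECT 'CSDB_DEBT_DAILY_DATA_ODS' FROM dual UNION ALL
    SELECT 'CSDB_DEBT_ARCHIVE' FROM dual UNION ALL
    SELECT 'CSDB_DEBT_DAILY_ARCHIVE' FROM dual UNION ALL
    SELECT 'CSDB_INSTR_RAT_FULL_ARCHIVE' FROM dual UNION ALL
    SELECT 'CSDB_INSTR_DESC_FULL_ARCHIVE' FROM dual UNION ALL
    SELECT 'CSDB_ISSUER_RAT_FULL_ARCHIVE' FROM dual UNION ALL
    SELECT 'CSDB_ISSUER_DESC_FULL_ARCHIVE' FROM dual
)
SELECT e.table_name,
       CASE WHEN x.table_name IS NULL THEN 'MISSING' ELSE 'PRESENT' END AS status
  FROM expected e
  LEFT JOIN all_external_tables x
    ON x.owner = 'ODS'
   AND x.table_name = e.table_name
 ORDER BY e.table_name;

-- 4. MRDS_LOADER has EXECUTE on ODS.FILE_MANAGER_ODS (expect one row)
SELECT grantee, privilege
  FROM all_tab_privs
 WHERE table_schema = 'ODS'
   AND table_name = 'FILE_MANAGER_ODS'
   AND grantee = 'MRDS_LOADER'
   AND privilege = 'EXECUTE';
```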
---
## Testing Verification
After each export, verify that A_WORKFLOW_HISTORY_KEY is populated:
```sql
-- CSV tables (should be 100% populated)
SELECT 'CSDB_DEBT_DATA_ODS' AS TABLE_NAME,
       COUNT(*) AS TOTAL_ROWS,
       COUNT(A_WORKFLOW_HISTORY_KEY) AS NON_NULL_COUNT,
       ROUND(COUNT(A_WORKFLOW_HISTORY_KEY) * 100.0 / NULLIF(COUNT(*), 0), 2) AS SUCCESS_RATE_PCT
  FROM ODS.CSDB_DEBT_DATA_ODS;

SELECT 'CSDB_DEBT_DAILY_DATA_ODS' AS TABLE_NAME,
       COUNT(*) AS TOTAL_ROWS,
       COUNT(A_WORKFLOW_HISTORY_KEY) AS NON_NULL_COUNT,
       ROUND(COUNT(A_WORKFLOW_HISTORY_KEY) * 100.0 / NULLIF(COUNT(*), 0), 2) AS SUCCESS_RATE_PCT
  FROM ODS.CSDB_DEBT_DAILY_DATA_ODS;

-- Parquet tables (should also be 100% populated)
SELECT 'CSDB_DEBT_ARCHIVE' AS TABLE_NAME,
       COUNT(*) AS TOTAL_ROWS,
       COUNT(A_WORKFLOW_HISTORY_KEY) AS NON_NULL_COUNT,
       ROUND(COUNT(A_WORKFLOW_HISTORY_KEY) * 100.0 / NULLIF(COUNT(*), 0), 2) AS SUCCESS_RATE_PCT
  FROM ODS.CSDB_DEBT_ARCHIVE;
```
**Expected Result**: SUCCESS_RATE_PCT = 100.00 for all tables
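The queries above cover three of the eight tables. The following sketch runs the same check over all eight in one pass, using dynamic SQL and DBMS_OUTPUT. It assumes SQL*Plus or SQLcl (for `SET SERVEROUTPUT ON`) and SELECT access to the ODS tables. The table list is taken from this document. A table whose files cannot be read is reported and skipped instead of aborting the loop.

```sql
SET SERVEROUTPUT ON
DECLARE
    TYPE t_names IS TABLE OF VARCHAR2(128);
    v_tables t_names := t_names(
        'CSDB_DEBT_DATA_ODS', 'CSDB_DEBT_DAILY_DATA_ODS',
        'CSDB_DEBT_ARCHIVE', 'CSDB_DEBT_DAILY_ARCHIVE',
        'CSDB_INSTR_RAT_FULL_ARCHIVE', 'CSDB_INSTR_DESC_FULL_ARCHIVE',
        'CSDB_ISSUER_RAT_FULL_ARCHIVE', 'CSDB_ISSUER_DESC_FULL_ARCHIVE');
    v_total    NUMBER;
    v_non_null NUMBER;
BEGIN
    FOR i IN 1 .. v_tables.COUNT LOOP
        BEGIN
            -- SIMPLE_SQL_NAME guards the concatenated identifier
            EXECUTE IMMEDIATE
                'SELECT COUNT(*), COUNT(A_WORKFLOW_HISTORY_KEY) FROM ODS.'
                || DBMS_ASSERT.SIMPLE_SQL_NAME(v_tables(i))
                INTO v_total, v_non_null;
            DBMS_OUTPUT.PUT_LINE(
                RPAD(v_tables(i), 32)
                || ' total=' || v_total
                || ' non_null=' || v_non_null
                || ' pct=' || CASE WHEN v_total = 0 THEN 'n/a'
                                   ELSE TO_CHAR(ROUND(v_non_null * 100 / v_total, 2))
                              END);
        EXCEPTION
            WHEN OTHERS THEN
                -- e.g. missing files behind the external table
                DBMS_OUTPUT.PUT_LINE(RPAD(v_tables(i), 32) || ' ERROR: ' || SQLERRM);
        END;
    END LOOP;
END;
/
```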
---
## Related Documentation
- [DATA_EXPORTER v2.4.0 Smart Column Mapping Examples](../MARS-835-PREHOOK/current_version/v2.3.0/DATA_EXPORTER_v2.4.0_Smart_Column_Mapping_Examples.sql)
- [Oracle External Tables Column Order Issue](../../confluence/additions/Oracle_External_Tables_Column_Order_Issue.md)
- [MARS-835 README](README.md)
---
**Last Updated**: 2026-01-09
**Author**: GitHub Copilot (MARS-835 Update)