# MARS-835: Required External Tables for Smart Column Mapping

## Overview
This document lists all external tables required for MARS-835 data exports using DATA_EXPORTER v2.4.0 with the Smart Column Mapping feature.

**Purpose**: Smart Column Mapping generates CSV files with columns in the exact order the external tables expect. Oracle maps CSV fields to external table columns by position, so this prevents the NULL values that a mismatched column order produces.

---

## Required External Tables

### Group 1: DATA Bucket (CSV Format) - **CRITICAL**

#### 1. ODS.CSDB_DEBT_DATA_ODS
- **Source Table**: OU_CSDB.LEGACY_DEBT
- **Format**: CSV
- **Bucket**: DATA (mrds_data_dev/ODS/CSDB/CSDB_DEBT/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY (must be at position 2)
- **Critical**: Must use Smart Column Mapping to avoid NULL values in A_WORKFLOW_HISTORY_KEY

#### 2. ODS.CSDB_DEBT_DAILY_DATA_ODS
- **Source Table**: OU_CSDB.LEGACY_DEBT_DAILY
- **Format**: CSV
- **Bucket**: DATA (mrds_data_dev/ODS/CSDB/CSDB_DEBT_DAILY/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY (must be at position 2)
- **Critical**: Must use Smart Column Mapping to avoid NULL values in A_WORKFLOW_HISTORY_KEY
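
Since CSV fields are matched by position, every column of a CSV external table needs a source column to fill it. A minimal sketch of that check, assuming the current user can see both OU_CSDB.LEGACY_DEBT and ODS.CSDB_DEBT_DATA_ODS through ALL_TAB_COLUMNS, and assuming (beyond what this document states) that A_WORKFLOW_HISTORY_KEY is the only renamed column:

```sql
-- Sketch: CSV external table columns that have no source column to feed them.
-- Assumption: only A_WORKFLOW_HISTORY_KEY is renamed (from A_ETL_LOAD_SET_FK);
-- every other column keeps its name in OU_CSDB.LEGACY_DEBT.
SELECT et.column_id,
       et.column_name,
       et.data_type
FROM   all_tab_columns et
WHERE  et.owner      = 'ODS'
AND    et.table_name = 'CSDB_DEBT_DATA_ODS'
AND    NOT EXISTS (
         SELECT 1
         FROM   all_tab_columns src
         WHERE  src.owner      = 'OU_CSDB'
         AND    src.table_name = 'LEGACY_DEBT'
         -- Apply the documented key mapping before comparing names
         AND    src.column_name = CASE et.column_name
                                    WHEN 'A_WORKFLOW_HISTORY_KEY' THEN 'A_ETL_LOAD_SET_FK'
                                    ELSE et.column_name
                                  END
       )
ORDER  BY et.column_id;
```

An empty result means every external table column has a source; any row returned is either a column the export must derive some other way or a naming mismatch. For ODS.CSDB_DEBT_DAILY_DATA_ODS, substitute that table and OU_CSDB.LEGACY_DEBT_DAILY.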

---

### Group 2: ARCHIVE Bucket (Parquet Format) - **RECOMMENDED**

#### 3. ODS.CSDB_DEBT_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_DEBT
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_DEBT/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY
- **Note**: Parquet uses schema-based (name) mapping, so column order is less critical; Smart Column Mapping still keeps it consistent.

#### 4. ODS.CSDB_DEBT_DAILY_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_DEBT_DAILY
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_DEBT_DAILY/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY

#### 5. ODS.CSDB_INSTR_RAT_FULL_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_INSTR_RAT_FULL
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_INSTR_RAT_FULL/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY

#### 6. ODS.CSDB_INSTR_DESC_FULL_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_INSTR_DESC_FULL
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_INSTR_DESC_FULL/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY

#### 7. ODS.CSDB_ISSUER_RAT_FULL_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_ISSUER_RAT_FULL
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_ISSUER_RAT_FULL/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY

#### 8. ODS.CSDB_ISSUER_DESC_FULL_ARCHIVE
- **Source Table**: OU_CSDB.LEGACY_ISSUER_DESC_FULL
- **Format**: Parquet with Hive partitioning
- **Bucket**: ARCHIVE (mrds_hist_dev/ARCHIVE/CSDB/CSDB_ISSUER_DESC_FULL/)
- **Key Column Mapping**: A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY
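
The bucket paths above can be cross-checked against what the database actually has registered. This sketch assumes ODS.FILE_MANAGER_ODS creates ordinary Oracle external tables, whose file URIs are exposed through the standard ALL_EXTERNAL_LOCATIONS view; the exact URI text depends on how the package builds it.

```sql
-- Sketch: where each MARS-835 external table reads its files from.
-- Assumption: FILE_MANAGER_ODS registers standard external tables,
-- so their locations are visible in ALL_EXTERNAL_LOCATIONS.
SELECT l.table_name,
       l.location
FROM   all_external_locations l
WHERE  l.owner = 'ODS'
AND    l.table_name IN ('CSDB_DEBT_DATA_ODS',
                        'CSDB_DEBT_DAILY_DATA_ODS',
                        'CSDB_DEBT_ARCHIVE',
                        'CSDB_DEBT_DAILY_ARCHIVE',
                        'CSDB_INSTR_RAT_FULL_ARCHIVE',
                        'CSDB_INSTR_DESC_FULL_ARCHIVE',
                        'CSDB_ISSUER_RAT_FULL_ARCHIVE',
                        'CSDB_ISSUER_DESC_FULL_ARCHIVE')
ORDER  BY l.table_name;
```

A table missing from the result either does not exist yet or is not visible to the current user.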

---

## External Table Column Order Requirements

### **CRITICAL for CSV Tables** (DATA bucket):

All CSV external tables MUST have **A_WORKFLOW_HISTORY_KEY at position 2**:

```
Position 1: A_KEY (NUMBER)
Position 2: A_WORKFLOW_HISTORY_KEY (NUMBER) ← MUST BE HERE!
Position 3+: Other columns in any order (the CSV must follow the same order)
```

**Reason**: Oracle external tables over CSV files use **positional mapping**: field N of each record is loaded into column N, and the header row is ignored rather than used to match names. If the source table has A_ETL_LOAD_SET_FK at position 72 and the CSV is written in source order, that value lands in field 72, while the external table reads field 2 into A_WORKFLOW_HISTORY_KEY. Field 2 then holds whatever source column sits at position 2 (for example a DATE), the conversion to NUMBER fails, and A_WORKFLOW_HISTORY_KEY ends up NULL. If field 2 happens to be numeric, the value loads without error but is simply wrong.
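
The failure mode can be previewed from the data dictionary. This diagnostic sketch pairs each source column with the external table column at the same position, which is where each value would land if the CSV were written in source order; it assumes both tables are visible to the current user through ALL_TAB_COLUMNS.

```sql
-- Sketch: what positional mapping would do with a CSV in SOURCE column order.
-- Each row is a position where the source column and the external table
-- column disagree on name or data type.
SELECT src.column_id,
       src.column_name AS source_column,
       src.data_type   AS source_type,
       et.column_name  AS external_column,
       et.data_type    AS external_type
FROM   all_tab_columns src
JOIN   all_tab_columns et
       ON et.column_id = src.column_id
WHERE  src.owner      = 'OU_CSDB'
AND    src.table_name = 'LEGACY_DEBT'
AND    et.owner       = 'ODS'
AND    et.table_name  = 'CSDB_DEBT_DATA_ODS'
AND    (src.column_name <> et.column_name OR src.data_type <> et.data_type)
ORDER  BY src.column_id;
```

The row for column_id 2, if present, shows which source value would be read into A_WORKFLOW_HISTORY_KEY. Positions that exist in only one of the two tables are dropped by the inner join.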

**Solution**: Smart Column Mapping (v2.4.0) writes CSV columns in the external table's column order, so field 2 carries the A_ETL_LOAD_SET_FK value that A_WORKFLOW_HISTORY_KEY expects.
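
The idea can be illustrated with a query that assembles an export SELECT list in the external table's column order, applying the A_ETL_LOAD_SET_FK → A_WORKFLOW_HISTORY_KEY rename along the way. This is a hypothetical sketch of the concept, not DATA_EXPORTER's actual code, and it assumes every other external table column exists under the same name in OU_CSDB.LEGACY_DEBT.

```sql
-- Sketch: build a SELECT over OU_CSDB.LEGACY_DEBT whose column order follows
-- ODS.CSDB_DEBT_DATA_ODS, renaming the key column on the way.
-- Illustration only; not the DATA_EXPORTER implementation.
SELECT 'SELECT '
       || LISTAGG(
            CASE column_name
              WHEN 'A_WORKFLOW_HISTORY_KEY'
                THEN 'A_ETL_LOAD_SET_FK AS A_WORKFLOW_HISTORY_KEY'
              ELSE column_name
            END,
            ', ') WITHIN GROUP (ORDER BY column_id)
       || ' FROM OU_CSDB.LEGACY_DEBT' AS export_query
FROM   all_tab_columns
WHERE  owner      = 'ODS'
AND    table_name = 'CSDB_DEBT_DATA_ODS';
```

Under the default MAX_STRING_SIZE, LISTAGG raises an error once the result exceeds 4000 bytes, so a very wide table would need the list built in a PL/SQL loop instead.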

### **OPTIONAL for Parquet Tables** (ARCHIVE bucket):

Parquet files carry their own schema, so columns are matched **by name** (schema-based mapping) rather than by position. Column order therefore does not matter, although Smart Column Mapping keeps it consistent. Names still matter: A_ETL_LOAD_SET_FK has to be written under the name A_WORKFLOW_HISTORY_KEY for the external table to find it.

---

## Creation Script Example

### CSV External Table (CRITICAL - Correct Column Order)

```sql
-- Example: ODS.CSDB_DEBT_DATA_ODS
-- IMPORTANT: A_WORKFLOW_HISTORY_KEY must be at position 2!

BEGIN
    ODS.FILE_MANAGER_ODS.CREATE_EXTERNAL_TABLE(
        pTableName         => 'CSDB_DEBT_DATA_ODS',
        pTemplateTableName => 'CT_ET_TEMPLATES.CSDB_DEBT_TEMPLATE',
        pPrefix            => 'ODS/CSDB/CSDB_DEBT',
        pBucketUri         => CT_MRDS.ENV_MANAGER.gvDataBucketUri,
        pFormat            => 'CSV'  -- Uses positional mapping!
    );
END;
/

-- Verify column order (A_WORKFLOW_HISTORY_KEY should be position 2)
SELECT column_id, column_name, data_type
FROM   all_tab_columns
WHERE  table_name = 'CSDB_DEBT_DATA_ODS'
AND    owner = 'ODS'
ORDER  BY column_id;
```

### Parquet External Table (Optional Column Order)

```sql
-- Example: ODS.CSDB_DEBT_ARCHIVE
-- Column order flexible (schema-based mapping)

BEGIN
    ODS.FILE_MANAGER_ODS.CREATE_EXTERNAL_TABLE(
        pTableName         => 'CSDB_DEBT_ARCHIVE',
        pTemplateTableName => 'CT_ET_TEMPLATES.CSDB_DEBT_TEMPLATE',
        pPrefix            => 'ARCHIVE/CSDB/CSDB_DEBT',
        pBucketUri         => CT_MRDS.ENV_MANAGER.gvArchiveBucketUri,
        pFormat            => 'PARQUET'  -- Uses schema-based mapping
    );
END;
/
```

---

## Template Tables Required

Each external table requires a template table in the CT_ET_TEMPLATES schema. The DATA and ARCHIVE tables built from the same source share one template (both creation scripts above use `CT_ET_TEMPLATES.CSDB_DEBT_TEMPLATE`), so six templates cover all eight tables:

- `CT_ET_TEMPLATES.CSDB_DEBT_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_DEBT_DAILY_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_INSTR_RAT_FULL_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_INSTR_DESC_FULL_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_ISSUER_RAT_FULL_TEMPLATE`
- `CT_ET_TEMPLATES.CSDB_ISSUER_DESC_FULL_TEMPLATE`

**Note**: Template tables must be created by the ADMIN or CT_ET_TEMPLATES user; MRDS_LOADER cannot create them.
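
A quick way to confirm the six templates are in place is a left join of the required names against the data dictionary. This sketch assumes it is run by a user with privileges on the CT_ET_TEMPLATES tables, because ALL_TABLES lists only tables the current user can access; DBA_TABLES avoids that limitation for privileged users.

```sql
-- Sketch: report which of the six required template tables are visible.
-- 'MISSING' can also mean "exists but not visible to the current user".
SELECT req.table_name,
       CASE WHEN t.table_name IS NULL THEN 'MISSING' ELSE 'OK' END AS status
FROM  (SELECT 'CSDB_DEBT_TEMPLATE' AS table_name FROM dual UNION ALL
       SELECT 'CSDB_DEBT_DAILY_TEMPLATE'          FROM dual UNION ALL
       SELECT 'CSDB_INSTR_RAT_FULL_TEMPLATE'      FROM dual UNION ALL
       SELECT 'CSDB_INSTR_DESC_FULL_TEMPLATE'     FROM dual UNION ALL
       SELECT 'CSDB_ISSUER_RAT_FULL_TEMPLATE'     FROM dual UNION ALL
       SELECT 'CSDB_ISSUER_DESC_FULL_TEMPLATE'    FROM dual) req
LEFT JOIN all_tables t
       ON  t.owner      = 'CT_ET_TEMPLATES'
       AND t.table_name = req.table_name
ORDER BY req.table_name;
```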

---

## Verification Checklist

Before running MARS-835 exports:

- [ ] All 8 external tables exist in the ODS schema
- [ ] CSV tables (DATA bucket) have A_WORKFLOW_HISTORY_KEY at position 2
- [ ] Template tables exist in the CT_ET_TEMPLATES schema
- [ ] MRDS_LOADER has EXECUTE privilege on ODS.FILE_MANAGER_ODS
- [ ] The ODS schema has access to CT_MRDS.ENV_MANAGER for logging
- [ ] DATA_EXPORTER v2.4.0 is deployed with the Smart Column Mapping feature
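
Several checklist items can be confirmed from the data dictionary. A sketch, assuming it is run as ODS (ALL_TAB_PRIVS then shows grants on objects ODS owns as well as grants made to ODS), that "access" to CT_MRDS.ENV_MANAGER means an EXECUTE grant (an assumption), and that the privileges were granted directly, since grants received only through a role do not appear in ALL_TAB_PRIVS:

```sql
-- Sketch, run as ODS.
-- Checklist item 1: expect external_tables_found = 8.
SELECT COUNT(*) AS external_tables_found
FROM   all_external_tables
WHERE  owner = 'ODS'
AND    table_name IN ('CSDB_DEBT_DATA_ODS', 'CSDB_DEBT_DAILY_DATA_ODS',
                      'CSDB_DEBT_ARCHIVE', 'CSDB_DEBT_DAILY_ARCHIVE',
                      'CSDB_INSTR_RAT_FULL_ARCHIVE', 'CSDB_INSTR_DESC_FULL_ARCHIVE',
                      'CSDB_ISSUER_RAT_FULL_ARCHIVE', 'CSDB_ISSUER_DESC_FULL_ARCHIVE');

-- Checklist items 4 and 5: expect at least one row for each grant.
SELECT grantee, table_schema, table_name, privilege
FROM   all_tab_privs
WHERE  privilege = 'EXECUTE'
AND   ((grantee = 'MRDS_LOADER' AND table_schema = 'ODS'     AND table_name = 'FILE_MANAGER_ODS')
    OR (grantee = 'ODS'         AND table_schema = 'CT_MRDS' AND table_name = 'ENV_MANAGER'));
```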

---

## Testing Verification

After the export, verify that A_WORKFLOW_HISTORY_KEY is not NULL:

```sql
-- CSV tables (should be 100% populated)
SELECT 'CSDB_DEBT_DATA_ODS' AS TABLE_NAME,
       COUNT(*) AS TOTAL_ROWS,
       COUNT(A_WORKFLOW_HISTORY_KEY) AS NON_NULL_COUNT,
       ROUND(COUNT(A_WORKFLOW_HISTORY_KEY) * 100.0 / NULLIF(COUNT(*), 0), 2) AS SUCCESS_RATE_PCT
FROM ODS.CSDB_DEBT_DATA_ODS;

SELECT 'CSDB_DEBT_DAILY_DATA_ODS' AS TABLE_NAME,
       COUNT(*) AS TOTAL_ROWS,
       COUNT(A_WORKFLOW_HISTORY_KEY) AS NON_NULL_COUNT,
       ROUND(COUNT(A_WORKFLOW_HISTORY_KEY) * 100.0 / NULLIF(COUNT(*), 0), 2) AS SUCCESS_RATE_PCT
FROM ODS.CSDB_DEBT_DAILY_DATA_ODS;

-- Parquet tables (should also be 100% populated)
SELECT 'CSDB_DEBT_ARCHIVE' AS TABLE_NAME,
       COUNT(*) AS TOTAL_ROWS,
       COUNT(A_WORKFLOW_HISTORY_KEY) AS NON_NULL_COUNT,
       ROUND(COUNT(A_WORKFLOW_HISTORY_KEY) * 100.0 / NULLIF(COUNT(*), 0), 2) AS SUCCESS_RATE_PCT
FROM ODS.CSDB_DEBT_ARCHIVE;
```

**Expected Result**: SUCCESS_RATE_PCT = 100.00 for all tables
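
The queries above cover three of the eight tables. The same check can be run for all eight in one statement; this is a sketch that assumes every table exposes A_WORKFLOW_HISTORY_KEY, as the key column mappings listed earlier indicate.

```sql
-- Sketch: A_WORKFLOW_HISTORY_KEY population for all eight MARS-835 tables.
SELECT table_name,
       total_rows,
       non_null_count,
       ROUND(non_null_count * 100 / NULLIF(total_rows, 0), 2) AS success_rate_pct
FROM (
  SELECT 'CSDB_DEBT_DATA_ODS' AS table_name, COUNT(*) AS total_rows,
         COUNT(a_workflow_history_key) AS non_null_count
  FROM   ods.csdb_debt_data_ods
  UNION ALL
  SELECT 'CSDB_DEBT_DAILY_DATA_ODS', COUNT(*), COUNT(a_workflow_history_key)
  FROM   ods.csdb_debt_daily_data_ods
  UNION ALL
  SELECT 'CSDB_DEBT_ARCHIVE', COUNT(*), COUNT(a_workflow_history_key)
  FROM   ods.csdb_debt_archive
  UNION ALL
  SELECT 'CSDB_DEBT_DAILY_ARCHIVE', COUNT(*), COUNT(a_workflow_history_key)
  FROM   ods.csdb_debt_daily_archive
  UNION ALL
  SELECT 'CSDB_INSTR_RAT_FULL_ARCHIVE', COUNT(*), COUNT(a_workflow_history_key)
  FROM   ods.csdb_instr_rat_full_archive
  UNION ALL
  SELECT 'CSDB_INSTR_DESC_FULL_ARCHIVE', COUNT(*), COUNT(a_workflow_history_key)
  FROM   ods.csdb_instr_desc_full_archive
  UNION ALL
  SELECT 'CSDB_ISSUER_RAT_FULL_ARCHIVE', COUNT(*), COUNT(a_workflow_history_key)
  FROM   ods.csdb_issuer_rat_full_archive
  UNION ALL
  SELECT 'CSDB_ISSUER_DESC_FULL_ARCHIVE', COUNT(*), COUNT(a_workflow_history_key)
  FROM   ods.csdb_issuer_desc_full_archive
)
ORDER BY table_name;
```

External tables have no indexes, so each branch reads the files behind its table; on large exports this can take as long as a full scan of the exported data.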

---

## Related Documentation

- [DATA_EXPORTER v2.4.0 Smart Column Mapping Examples](../MARS-835-PREHOOK/current_version/v2.3.0/DATA_EXPORTER_v2.4.0_Smart_Column_Mapping_Examples.sql)
- [Oracle External Tables Column Order Issue](../../confluence/additions/Oracle_External_Tables_Column_Order_Issue.md)
- [MARS-835 README](README.md)

---

**Last Updated**: 2026-01-09
**Author**: GitHub Copilot (MARS-835 Update)