PROCESS_SOURCE_FILE Procedure Guide
This guide documents the FILE_MANAGER.PROCESS_SOURCE_FILE procedure, which validates incoming files and prepares them for loading on the Airflow+DBT side using Oracle Cloud Infrastructure (OCI) file management operations.
Overview
PROCESS_SOURCE_FILE is an umbrella procedure: it orchestrates the complete workflow from file registration and validation through OCI storage preparation, so that each file is validated and positioned for consumption by the Airflow+DBT data processing stack.
Key Characteristics:
- File Validation Focus: Comprehensive validation of incoming CSV files against template structures
- Airflow+DBT Preparation: Prepares validated files for loading and processing by Airflow+DBT pipelines
- OCI File Management: Handles file operations and movements within Oracle Cloud Infrastructure
- Umbrella Procedure: Coordinates multiple validation and file preparation sub-procedures in sequence
- Automated Workflow: Requires minimal manual intervention once configured
- Error Resilient: Comprehensive error handling and logging for validation and file operations
- Status Tracking: Updates file processing status throughout validation and preparation workflow
Procedure Signatures
The procedure is available in two variants:
Procedure Version
PROCEDURE PROCESS_SOURCE_FILE(pSourceFileReceivedName IN VARCHAR2);
Purpose: Execute processing workflow without return value
Use Case: Standard automated processing, fire-and-forget scenarios
Function Version
FUNCTION PROCESS_SOURCE_FILE(pSourceFileReceivedName IN VARCHAR2) RETURN PLS_INTEGER;
Purpose: Execute processing workflow and return status code
Use Case: When you need to check processing result programmatically
Parameters
pSourceFileReceivedName
- Type: VARCHAR2
- Required: YES
- Description: Relative path to the file within the cloud storage bucket
- Format:
INBOX/{SOURCE}/{SOURCE_FILE_ID}/{TABLE_ID}/filename.csv
Examples:
'INBOX/C2D/UC_DISSEM/A_UC_DISSEM_METADATA_LOADS/UC_NMA_DISSEM-277740.csv'
'INBOX/TOP/ALLOTMENT/AGGREGATED_ALLOTMENT/allotment_data_20241006.csv'
'INBOX/LM/RATES/INTEREST_RATES/rates_monthly_202410.csv'
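To make the path layout concrete, the following query splits one of the example paths into its components. It is a minimal illustration of the format only, using the standard REGEXP_SUBSTR function; how FILE_MANAGER itself parses the path is not described in this guide, and the column aliases are chosen for this example.

```sql
-- Illustration only: split a received-file path into its five segments.
-- The aliases (ROOT_FOLDER, SOURCE_NAME, ...) are hypothetical names for this sketch.
SELECT
  REGEXP_SUBSTR(FILE_PATH, '[^/]+', 1, 1) AS ROOT_FOLDER,
  REGEXP_SUBSTR(FILE_PATH, '[^/]+', 1, 2) AS SOURCE_NAME,
  REGEXP_SUBSTR(FILE_PATH, '[^/]+', 1, 3) AS SOURCE_FILE_ID,
  REGEXP_SUBSTR(FILE_PATH, '[^/]+', 1, 4) AS TABLE_ID,
  REGEXP_SUBSTR(FILE_PATH, '[^/]+', 1, 5) AS FILE_NAME
FROM (
  SELECT 'INBOX/TOP/ALLOTMENT/AGGREGATED_ALLOTMENT/allotment_data_20241006.csv' AS FILE_PATH
  FROM DUAL
);
```

For the path shown, the five columns are INBOX, TOP, ALLOTMENT, AGGREGATED_ALLOTMENT and allotment_data_20241006.csv.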
Processing Workflow
The procedure executes six main steps in sequence:
Step 1: REGISTER_SOURCE_FILE_RECEIVED
Purpose: Register file in the system and extract metadata
Actions:
- Creates record in CT_MRDS.A_SOURCE_FILE_RECEIVED table
- Determines source configuration based on file path pattern
- Extracts file metadata (size, checksum, creation date)
- Assigns unique A_SOURCE_FILE_RECEIVED_KEY
- Sets initial status to 'RECEIVED'
Step 2: CREATE_EXTERNAL_TABLE
Purpose: Create temporary external table for data access
Actions:
- Generates unique external table name
- Creates external table pointing to the CSV file
- Uses template table structure from CT_ET_TEMPLATES
- Configures appropriate column mappings and data types
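For readers unfamiliar with external tables over object storage, the block below sketches one way such a table can be created with Oracle's DBMS_CLOUD.CREATE_EXTERNAL_TABLE. Whether FILE_MANAGER uses this call internally is an assumption; the table name, column list and bucket URI are hypothetical placeholders, and the credential name is the one used in the bucket listing example in this guide.

```sql
-- Sketch under stated assumptions, not the package's actual implementation.
BEGIN
  DBMS_CLOUD.CREATE_EXTERNAL_TABLE(
    table_name      => 'ET_DEMO_FILE_0001',   -- hypothetical generated name
    credential_name => 'DEF_CRED_ARN',        -- credential used elsewhere in this guide
    file_uri_list   => 'https://your-bucket-uri/INBOX/LM/RATES/INTEREST_RATES/rates_monthly_202410.csv',
    column_list     => 'RATE_DATE VARCHAR2(10), RATE_VALUE NUMBER',  -- hypothetical columns
    format          => JSON_OBJECT('type' VALUE 'csv', 'skipheaders' VALUE '1')
  );
END;
/
```

In the real workflow the column list would come from the matching template table in CT_ET_TEMPLATES rather than being written by hand.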
Step 3: VALIDATE_SOURCE_FILE_RECEIVED
Purpose: Perform comprehensive data validation
Actions:
- Validates CSV column count against template
- Checks data type compatibility
- Verifies required fields are populated
- Performs business rule validations
- Updates status to 'VALIDATED' on success
Step 4: DROP_EXTERNAL_TABLE
Purpose: Clean up temporary external table
Actions:
- Drops the temporary external table created in Step 2
- Releases database resources
- Maintains clean schema state
Step 5: MOVE_FILE
Purpose: Relocate file from INBOX to ODS location
Actions:
- Copies file from INBOX bucket to ODS bucket
- Preserves file metadata
- Deletes original file from INBOX after successful copy
Step 6: SET_SOURCE_FILE_RECEIVED_STATUS
Purpose: Update final processing status
Actions:
- Sets PROCESSING_STATUS to 'READY_FOR_INGESTION'
- Records completion timestamp
- Indicates file is validated and ready for Airflow+DBT processing
Return Values (Function Version)
| Value | Meaning | Description |
|---|---|---|
| 0 | Success | File processed successfully through all steps |
| -20001 | Empty Parameters | Both fileUri and receivedKey parameters are NULL |
| -20002 | No Config Match | No configuration matches the file pattern |
| -20011 | Column Mismatch | CSV has different column count than template |
| -20021 | Processing Error | General processing failure |
| Other negative | Various Errors | Specific error codes for different failure scenarios |
Usage Examples
Basic Processing
-- Simple processing (procedure version)
BEGIN
CT_MRDS.FILE_MANAGER.PROCESS_SOURCE_FILE(
pSourceFileReceivedName => 'INBOX/C2D/UC_DISSEM/A_UC_DISSEM_METADATA_LOADS/data_file.csv'
);
END;
/
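When the outcome needs to be handled in code, the function variant can be called instead. The block below is a sketch that branches on the return codes documented in this guide. It assumes the function returns these codes rather than raising an exception, which should be confirmed against the package body. DBMS_OUTPUT is used only to make the result visible (enable it with SET SERVEROUTPUT ON in SQL*Plus or SQLcl).

```sql
-- Processing with result check (function version); a sketch, not a reference implementation
DECLARE
  vResult PLS_INTEGER;
BEGIN
  vResult := CT_MRDS.FILE_MANAGER.PROCESS_SOURCE_FILE(
    pSourceFileReceivedName => 'INBOX/C2D/UC_DISSEM/A_UC_DISSEM_METADATA_LOADS/data_file.csv'
  );

  -- Branch on the return codes listed under "Return Values (Function Version)"
  CASE vResult
    WHEN 0 THEN
      DBMS_OUTPUT.PUT_LINE('File processed successfully');
    WHEN -20002 THEN
      DBMS_OUTPUT.PUT_LINE('No configuration matches the file pattern');
    WHEN -20011 THEN
      DBMS_OUTPUT.PUT_LINE('CSV column count differs from template');
    ELSE
      DBMS_OUTPUT.PUT_LINE('Processing failed with code ' || vResult);
  END CASE;
END;
/
```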
Prerequisites
Before using PROCESS_SOURCE_FILE, ensure proper system configuration is in place. For detailed setup instructions including source system registration, file type configuration, template table creation, and date format configuration, see the FILE_MANAGER Configuration Guide.
Monitoring and Troubleshooting
Monitoring File Processing Status
-- Check recent file processing activity
SELECT
SOURCE_FILE_NAME,
PROCESSING_STATUS,
RECEPTION_DATE,
EXTERNAL_TABLE_NAME
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE RECEPTION_DATE >= SYSDATE - 1 -- Last 24 hours
ORDER BY RECEPTION_DATE DESC;
Processing Status Values
| Status | Description | Workflow Stage |
|---|---|---|
| RECEIVED | File registered, processing starting | Initial registration |
| VALIDATED | File validation completed successfully | After successful validation |
| READY_FOR_INGESTION | File validated and prepared for Airflow+DBT processing | After successful validation and preparation |
| INGESTED | Data has been consumed/ingested by target system | After data consumption |
| ARCHIVED | Data exported to PARQUET format and file moved to archival storage | Final archival state using FILE_ARCHIVER |
| VALIDATION_FAILED | File validation failed | After failed validation |
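A quick way to see how files are distributed across these statuses is to aggregate the tracking table. The query below is a small sketch that uses only the table and columns already shown in this guide; the 24-hour window and the FILE_COUNT alias are arbitrary choices.

```sql
-- Count files per processing status over the last 24 hours
SELECT
  PROCESSING_STATUS,
  COUNT(*) AS FILE_COUNT
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE RECEPTION_DATE >= SYSDATE - 1
GROUP BY PROCESSING_STATUS
ORDER BY FILE_COUNT DESC;
```

A growing count in RECEIVED or VALIDATED may indicate files that stopped partway through the workflow and are worth checking in the process log.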
Detailed Processing Logs
-- View detailed processing logs
SELECT
LOG_TIMESTAMP,
PROCEDURE_NAME,
LOG_LEVEL,
LOG_MESSAGE,
PROCEDURE_PARAMETERS
FROM CT_MRDS.A_PROCESS_LOG
WHERE PROCEDURE_NAME IN ('PROCESS_SOURCE_FILE', 'REGISTER_SOURCE_FILE_RECEIVED',
'CREATE_EXTERNAL_TABLE', 'VALIDATE_SOURCE_FILE_RECEIVED')
AND LOG_TIMESTAMP >= SYSDATE - 1
ORDER BY LOG_TIMESTAMP DESC;
Common Error Scenarios and Solutions
Error -20002: No Configuration Match
Problem: File path doesn't match any configured pattern
-- Check configured patterns
SELECT
s.A_SOURCE_KEY,
sfc.SOURCE_FILE_ID,
sfc.SOURCE_FILE_NAME_PATTERN,
sfc.TABLE_ID
FROM CT_MRDS.A_SOURCE_FILE_CONFIG sfc
JOIN CT_MRDS.A_SOURCE s ON s.A_SOURCE_KEY = sfc.A_SOURCE_KEY
ORDER BY s.A_SOURCE_KEY, sfc.SOURCE_FILE_ID;
Solution: Add missing configuration or correct file naming
Error -20011: Column Count Mismatch
Problem: CSV file has different number of columns than template table
-- Check template table structure
SELECT column_name, data_type, column_id
FROM user_tab_columns
WHERE table_name = 'YOUR_TEMPLATE_TABLE'
ORDER BY column_id;
-- Analyze validation errors (replace file_key with the file's A_SOURCE_FILE_RECEIVED_KEY)
SELECT CT_MRDS.FILE_MANAGER.ANALYZE_VALIDATION_ERRORS(file_key) FROM DUAL;
Solutions:
- Fix CSV file column count
- Add missing columns to template table
- Remove excess columns from CSV
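USER_TAB_COLUMNS only lists tables owned by the connected schema. If the session runs under a different user, the expected column count can be read from ALL_TAB_COLUMNS instead. The sketch below assumes the template tables are owned by CT_ET_TEMPLATES, as the ALTER TABLE example in this guide suggests; YOUR_TEMPLATE_TABLE is a placeholder.

```sql
-- Expected column count for a template table (owner is an assumption, see above)
SELECT COUNT(*) AS EXPECTED_COLUMN_COUNT
FROM ALL_TAB_COLUMNS
WHERE OWNER = 'CT_ET_TEMPLATES'
AND TABLE_NAME = 'YOUR_TEMPLATE_TABLE';
```

Comparing this number with the column count of the CSV header row shows directly whether the file or the template needs to change.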
File Not Found Errors
Problem: File doesn't exist in expected cloud storage location
-- List files in bucket location
SELECT object_name
FROM DBMS_CLOUD.LIST_OBJECTS(
credential_name => 'DEF_CRED_ARN',
location_uri => 'https://your-bucket-uri/'
)
-- LIST_OBJECTS takes only a credential and a location URI, so filter by prefix here
WHERE object_name LIKE 'INBOX/C2D/UC_DISSEM/%'
AND ROWNUM <= 20;
Solutions:
- Verify file upload to correct location
- Check file naming matches expected pattern
- Verify cloud storage credentials and permissions
Enhanced Error Monitoring and Logging
Error Log Monitoring
The FILE_MANAGER system provides comprehensive error logging for troubleshooting:
-- View recent processing errors
SELECT LOG_TIMESTAMP, LOG_LEVEL, LOG_MESSAGE, PROCEDURE_NAME
FROM CT_MRDS.A_PROCESS_LOG
WHERE LOG_LEVEL = 'ERROR'
AND LOG_TIMESTAMP >= SYSDATE - 1 -- Last 24 hours
ORDER BY LOG_TIMESTAMP DESC;
-- View validation-specific errors
SELECT LOG_TIMESTAMP, LOG_MESSAGE
FROM CT_MRDS.A_PROCESS_LOG
WHERE LOG_MESSAGE LIKE '%EXCESS COLUMNS%'
OR LOG_MESSAGE LIKE '%VALIDATION%'
ORDER BY LOG_TIMESTAMP DESC;
-- Analyze errors for specific file
SELECT sfl.SOURCE_FILE_NAME, pl.LOG_MESSAGE, pl.LOG_TIMESTAMP
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED sfl
JOIN CT_MRDS.A_PROCESS_LOG pl ON pl.LOG_MESSAGE LIKE '%' || sfl.SOURCE_FILE_NAME || '%'
WHERE sfl.SOURCE_FILE_NAME = 'your_file.csv'
AND pl.LOG_LEVEL = 'ERROR';
File Validation and Error Handling
The FILE_MANAGER system includes comprehensive validation features for CSV files during processing:
Pre-Processing Validation
- Column Count Verification: Automatically checks if CSV files match template table structure
- Error Prevention: Validates files before creating external tables to prevent processing failures
- Detailed Error Messages: Provides specific guidance when validation fails
Common Validation Scenarios
Scenario 1: Excess Columns (Error -20011)
EXCESS COLUMNS DETECTED!
CSV file has 8 columns but template expects only 5
Excess columns: 3
Solutions:
- Remove excess columns from CSV file
- Add missing columns to template table:
ALTER TABLE CT_ET_TEMPLATES.{SOURCE}_{TABLE_NAME} ADD (NEW_COLUMN1 VARCHAR2(100), NEW_COLUMN2 NUMBER);
Error Analysis for File Validation
-- Find file key for analysis
SELECT A_SOURCE_FILE_RECEIVED_KEY
FROM CT_MRDS.A_SOURCE_FILE_RECEIVED
WHERE SOURCE_FILE_NAME = 'your_file.csv';
-- Analyze validation errors using wrapper function
SELECT CT_MRDS.FILE_MANAGER.ANALYZE_VALIDATION_ERRORS(file_key) FROM DUAL;
-- Example with specific key:
SELECT CT_MRDS.FILE_MANAGER.ANALYZE_VALIDATION_ERRORS(63) FROM DUAL;
Validation Error Monitoring
-- View recent validation errors
SELECT LOG_TIMESTAMP, LOG_MESSAGE
FROM CT_MRDS.A_PROCESS_LOG
WHERE LOG_LEVEL = 'ERROR'
AND (LOG_MESSAGE LIKE '%EXCESS COLUMNS%' OR LOG_MESSAGE LIKE '%VALIDATION%')
ORDER BY LOG_TIMESTAMP DESC;
Common Error Patterns and Solutions
| Error Code | Pattern | Solution |
|---|---|---|
| ORA-20011 | EXCESS COLUMNS DETECTED | Remove excess columns from CSV or add missing columns to template table |
| ORA-20002 | No match for source file | Configure file pattern in A_SOURCE_FILE_CONFIG |
| ORA-29913 | External table open error | Check bucket paths and file existence |
| ORA-01821 | Date format not recognized | Update date format in ADD_COLUMN_DATE_FORMAT |
Proactive Monitoring Setup
Set up monitoring for critical error patterns:
-- Create monitoring view for critical errors
CREATE OR REPLACE VIEW V_CRITICAL_ERRORS AS
SELECT
LOG_TIMESTAMP,
PROCEDURE_NAME,
CASE
WHEN LOG_MESSAGE LIKE '%ORA-20011%' THEN 'COLUMN_MISMATCH'
WHEN LOG_MESSAGE LIKE '%ORA-20002%' THEN 'CONFIG_MISSING'
WHEN LOG_MESSAGE LIKE '%ORA-29913%' THEN 'FILE_ACCESS'
ELSE 'OTHER_ERROR'
END as ERROR_CATEGORY,
LOG_MESSAGE
FROM CT_MRDS.A_PROCESS_LOG
WHERE LOG_LEVEL = 'ERROR'
AND LOG_TIMESTAMP >= SYSDATE - 7; -- Last week
This enhanced monitoring helps identify and resolve issues quickly, ensuring smooth file processing operations.
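Once the view exists, a summary per category gives a quick health overview. The query below is a sketch built only on the V_CRITICAL_ERRORS view defined above; the ERROR_COUNT and LAST_SEEN aliases are chosen for this example.

```sql
-- Summarize last week's errors by category
SELECT
  ERROR_CATEGORY,
  COUNT(*) AS ERROR_COUNT,
  MAX(LOG_TIMESTAMP) AS LAST_SEEN
FROM V_CRITICAL_ERRORS
GROUP BY ERROR_CATEGORY
ORDER BY ERROR_COUNT DESC;
```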
Best Practices
File Naming Conventions
- Use consistent naming patterns that match SOURCE_FILE_NAME_PATTERN
- Avoid special characters that might cause parsing issues
Related Procedures
The following procedures are called internally by PROCESS_SOURCE_FILE:
- REGISTER_SOURCE_FILE_RECEIVED: File registration and metadata extraction
- CREATE_EXTERNAL_TABLE: External table creation for data access
- VALIDATE_SOURCE_FILE_RECEIVED: Data validation and structure checking
- DROP_EXTERNAL_TABLE: Cleanup of temporary external tables
- MOVE_FILE: File relocation between buckets
- SET_SOURCE_FILE_RECEIVED_STATUS: Status management
For detailed information about individual procedures, refer to the package documentation.
Summary
PROCESS_SOURCE_FILE is the cornerstone of the FILE_MANAGER system: a complete automated workflow that validates files and prepares them for Airflow+DBT processing pipelines. Its umbrella architecture keeps validation and preparation consistent across sources, while its error handling and logging provide the visibility needed to operate the file flows that feed downstream Airflow+DBT data workflows.