In IBM InfoSphere DataStage, the Copy stage and the Modify stage are both simple processing stages used in parallel jobs, but they serve different purposes.
1. Copy Stage
Definition
The Copy Stage is used to pass data from input to output without changing the data.
It simply copies rows from the source to the target.
Example Design
Sequential File → Copy Stage → Dataset
Input:
Emp_ID Name
101 John
102 Mary
Output (same data):
Emp_ID Name
101 John
102 Mary
When Copy Stage is Used
1. Splitting data into multiple outputs
2. Passing data without transformation
3. Improving parallel processing
4. Debugging jobs
Example – Multiple Outputs
              → Target1
Source → Copy
              → Target2
Same data goes to both targets.
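As a rough illustration in plain Python (not DataStage code), the fan-out above behaves like a function that hands each target its own unchanged copy of the rows. The name `copy_stage` is hypothetical, and the sample rows are the ones from the earlier example.

```python
import copy


def copy_stage(rows, n_outputs):
    """Hypothetical stand-in for a Copy stage with several output links.

    Returns n_outputs independent copies of the input rows, with no
    change to column values.
    """
    return [copy.deepcopy(rows) for _ in range(n_outputs)]


# Sample rows taken from the Copy stage example above.
source = [
    {"Emp_ID": 101, "Name": "John"},
    {"Emp_ID": 102, "Name": "Mary"},
]

# One input link, two output links.
target1, target2 = copy_stage(source, 2)
```

Each target receives the same rows, and changing one target afterwards does not affect the other or the source.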
✔ No transformation
✔ Very fast
✔ Minimal processing
2. Modify Stage
Definition
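The Modify Stage is used to alter columns with simple specifications: renaming, keeping or dropping columns, converting data types, handling nulls, and applying basic conversion functions.
It is typically lighter than the Transformer stage because it supports only this limited set of operations.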
Example
Input:
Name = john
Salary = 5000
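Modify stage specification:
Name = uppercase_string(Name)
Output:
Name = JOHN
Salary = 5000
Note: the Modify stage does not evaluate arithmetic, so a derivation such as Salary * 1.10 belongs in a Transformer stage.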
3. Common Modify Stage Operations
Data Type Conversion
Age:int32 = Age (string to integer is a default conversion, so declaring the new type is enough)
Replace Null Values
Salary = handle_null(Salary, 0)
Convert Case
Name = uppercase_string(Name)
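The type-conversion and case-conversion operations above can be mimicked in plain Python to show their effect on a single row. This is only a sketch of the behavior, not DataStage syntax: `modify_stage` is a hypothetical helper, and the sample row (with Age arriving as a string) is made up for illustration.

```python
def modify_stage(row):
    """Hypothetical stand-in for a Modify stage applied to one row.

    Converts Age from string to integer and upper-cases Name.
    The input row is left untouched; a new row is returned.
    """
    out = dict(row)
    out["Age"] = int(out["Age"])       # string -> integer conversion
    out["Name"] = out["Name"].upper()  # case conversion
    return out


# Illustrative input: Age arrives as a string, Name in lowercase.
row = {"Name": "john", "Age": "42"}
result = modify_stage(row)
```

Here `result` holds the retyped and upper-cased values, while `row` keeps its original string Age.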
4. Copy vs Modify Stage

| Feature | Copy Stage | Modify Stage |
|---|---|---|
| Transformation | No | Yes (simple only) |
| Performance | Very fast | Faster than Transformer |
| Expressions | Not allowed | Allowed |
| Use Case | Data duplication | Simple data modification |
5. Real-World Scenario
Scenario
The source sends customer names in lowercase, but the target requires them in uppercase.
Design
Source → Modify Stage → Target
Expression
Customer_Name = uppercase_string(Customer_Name)
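To make the Source → Modify Stage → Target flow concrete, here is a minimal Python sketch that reads delimited source text, upper-cases Customer_Name, and writes the result. It only imitates the job's behavior outside DataStage. The function `run_job`, the Customer_ID column, and the sample customers are assumptions added for illustration.

```python
import csv
import io


def run_job(source_text):
    """Hypothetical end-to-end job: read rows, upper-case Customer_Name, write rows."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        # The only change applied: convert the customer name to uppercase.
        row["Customer_Name"] = row["Customer_Name"].upper()
        writer.writerow(row)
    return out.getvalue()


# Illustrative source data (not from a real system).
source_text = "Customer_ID,Customer_Name\n1,alice smith\n2,bob jones\n"
target_text = run_job(source_text)
```

All other columns pass through unchanged, which mirrors how a Modify stage leaves columns alone unless a specification names them.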