- Basics
- Design
     Document
- Transformations 
- Lookups 
- Joiners 
- Flat Files
- Database
     Connections
- Functions
- Change
     Management
- Workflows 
- Repository
     Manager
- Profiling
- XML
     Sources and Targets
- Problems
     and Solutions
- Tips
- Informatica
     Data Quality
- Administration
All areas to be
formatted, or have other work done, are marked with todo.
Connect to the repository:
- Open Designer
- Menu "add repository": add repository name and user id
- Then right-click on the repository
- Enter user id, pw, server and port
To view: drag into the workspace.
When exiting: close all tools (menu repository > close all tools), otherwise some objects will be locked.
Open the folder with right-click then "open" (even if the contents are visible).
Choose tool with little icons above the grey area
- Source Analyzer
- Warehouse Designer (designs targets)
- Transformation Developer
- Mapplet Designer
- Mapping Designer
Use session
start time for time stamps and not the sysdate. 
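For example, a sketch of a timestamp port (SESSSTARTTIME is the built-in session start time; the format is illustrative):
TO_CHAR(SESSSTARTTIME, 'MM/DD/YYYY HH24:MI:SS')
SYSDATE is evaluated row by row, so it can differ across rows of the same run; SESSSTARTTIME stays constant for the whole session.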
Stored
procedure:
bug in SQL server: remove the "I" checks for output parameters.
The "R" column indicates the returned value. A check should show in the "O" column too.
Remember to set the "connection information" in the next tab ($source or $target).
Then, this variable must be set in the task properties, under mapping > connection $source/$target connection value.
Status codes are also sent back by the stored procedure (success, failure); these are not visible but are acted upon by PowerCenter.
Use an unconnected stored procedure transformation for running a stored procedure before or after a session, pre- or post-load of the source, pre- or post-load of the target, handling null values, or running nested stored procedures.
Call an unconnected procedure with :SP.the_procedure(PROC_RESULT); PROC_RESULT is a placeholder that receives the procedure's return value and does not correspond to a port.
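A minimal sketch of calling an unconnected stored procedure from an expression output port (transformation and port names are hypothetical):
:SP.SP_GET_STATUS(in_customer_id, PROC_RESULT)
The value returned by the procedure is captured through PROC_RESULT and becomes the value of the output port.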
mapping
variable:
menu mapping > parameters and variables
Shared folder:
Copy object to the shared folder. Save. Then copy back to original folder.
Note that with the ctrl key down, it copies instead of creating a shortcut.
Copy mapping:
open the mapping, then use the menu > mappings > copy as .
use small INT for boolean
Debugger:
in menu > mapping: start debugger
Choose "use an existing session".
Use "discard.." so as not to write to the database
Step through. See results in the "session log" tab
When done, stop the debugger.
source
qualifier:
sort: add "number of sorted ports": this will sort the first n ports.
or use SQL override.
If the source has only a few tables, put the join in the "User-Defined Join" field instead of doing a SQL override.
DECODE(TRUE,
  DATATYPE != REPLACESTR(0, DATATYPE, 'char', 'text', 'lob', 'string', ''), 'C',
  DATATYPE != REPLACESTR(0, DATATYPE, 'int', 'decimal', 'long', 'float', 'money', 'numeric', 'number', 'real', 'double', ''), 'N',
  DATATYPE != REPLACESTR(0, DATATYPE, 'datetime', 'date', 'timestamp', ''), 'D',
  '?')
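This classifies a datatype name as character ('C'), numeric ('N') or date ('D'): if stripping the relevant keywords with REPLACESTR changes the string, a keyword was present and the corresponding branch of the DECODE fires; '?' is the fallback.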
Precision (see
Designer Guide)
- Integer in
     Informatica is 4 bytes in size and handles -2,147,483,648 to 2,147,483,647
     (precision of 10, scale of 0)
- Small
     integer: 4 bytes (!), precision of 5, scale of 0
- Bigint: 8
     bytes, precision of 19, scale of 0
- Decimal: 8
     to 20 bytes, up to 28 digits. Precision of 1 to 28 and scale 0 to 28.
When dragging from one folder to another:
| Source folder | Target folder | Result |
| Open, object not saved | Open | Copy object |
| Open, object saved | Open (has to be) | Create a shortcut |
Error
management
Functions that can be used in an expression to handle errors (example after the list):
- ERROR('the
     message')
- ABORT('the
     message')
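For example, in an output port (the port name is hypothetical), skip the row and log a message when the amount is missing:
IIF(ISNULL(amount), ERROR('amount is null'), amount)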
Propagate the
error status CA7 -->
script on Win --> pmcmd .... -paramfile ...
Propagate failure: put the check in "fail parent", then check the status in each link between the sessions.
tip: default
session load type: tools > options > misc > target load type
tasks: sequential
buffer length: the buffer for just one line.
paths for
server, sessions, values for $variables: menu server > server configuration
> choose config > button edit > button advanced
Mapping
Parameters and Variables
Mapping
parameters do not change. Set them at the beginning. The parameter file is
generally used for this. Otherwise set the value as a default value in menu
mappings > parameters and variables > Initial value. To pass workflow
value into the mapping, in the task properties, choose components tab >
"Pre-session variable assignment." 
Keep the column
"IsExprVar" as false. Set to true if the variable contains an
expression to be evaluated, such as a condition or an expression that varies. 
To return a
value up to the workflow level, use a mapping variable (not parameter). Set the
value with SETVARIABLE($$RETURN_VALUE,
'something') in a place
where the value is actually used, not in a port with no input and no output. In
the "Post-session on success variable assignment," assign the value
to a workflow parameter. The mapping variables change in the mapping and the
value is kept from session to session, if not overridden by a parameter file.
Use these functions to change the value: SetMaxVariable, SetMinVariable,
SetCountVariable, and SetVariable.
SetVariable($$VARIABLE_NAME, new_value) (the corresponding port will take on the value).
I can't assign
variables to a reusable task. Need to look into this more.
The string
values do not need quotes around them
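A minimal parameter file sketch (folder, workflow, session, parameter and connection names are all illustrative):
[MyFolder.WF:wf_load_customers.ST:s_m_load_customers]
$$LOAD_DATE=01/31/2020
$$REGION=EMEA
$DBConnection_Target=DWH_TARGET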
Workflow
variables are created in menu workflows > edit > variables tab. They get
initial values and can be defined as persistent. Values can also be assigned in
an "Assignment" task.
A workflow variable can be used as the name of a target file: $$WKFL_THE_FILE_NAME. Then put it in an email (below are the attachment and the simple display of the variable's value):
%a<$PMTargetFileDir/$$WKFL_THE_FILE_NAME>
The value of the variable is $$WKFL_THE_FILE_NAME
The design
document will typically contain the following elements:
- High Level
     Data Flow describes where the data comes from and where it is
     going, which files are used for sources and targets, where the files are
     located.
- Type of
     load, i.e.
     full, partial, deltas. Time interval. 
- Change
     Data Capture method
- Frequency
     and Timing
- Overall dependencies
- Error
     Handling
For each
mapping, provide the following in addition to the mapping spreadsheet:
- Import and
     output formats: flat file, database connection.
- Dependencies on other
     mappings or processes.
- Transformations for each
     mapping: data cleanup, lookups, transformation logic. The details of this
     appear in the mapping spreadsheet.
- Source and
     target layouts if they do not show clearly in the mapping document
- Details of
     error handling
As a developer,
ask these questions (as a designer, make sure the answers are defined):
- what is
     source data (table...)
- what is
     the target data (database, table, ..)
- requirements
     for target table: fields
- transformation
     logic 
- sorting
      needs
- group
      by...
- workflow: 
- times,
      schedule
- pre /
      post SQL statements or stored procedures
Active or passive transformations: active transformations can change the number of rows; passive transformations keep the existing number of rows.
Connected
transformations are connected to the dataflow and to other transformations.
Unconnected transformations are called from within another transformation.
| Name | Active/Passive | Description |
| For all (or most) transformations |  | Most are connected transformations, some can be unconnected too. Most have two groups of ports: input and output. |
| Aggregator | Active | Performs aggregate calculations. Define the expression for the aggregate. Can use local variables (see below). Try to use pre-sorted data if possible. Aggregators do not sort, but use a clustering algorithm that appears to sort the output. |
| Application Source Qualifier | Active | Represents the rows that the Integration Service reads from an application, such as an ERP source, when it runs a session. |
| Custom | Active or Passive | Calls a procedure in a shared library or DLL. |
| Expression | Passive | Calculates a value. Define the row-level calculation. Can use local variables (see below). |
| External Procedure | Passive | Calls a procedure in a shared library or in the COM layer of Windows. Can be unconnected. |
| Filter | Active | Filters data. Define the expression that returns true or false, with true indicating that the data can go through. |
| HTTP Transformation | Passive | Connects to an HTTP server to read or update data. Contains input, output and header groups. |
| Mapplet Input | Passive | Defines mapplet input rows. Available in the Mapplet Designer. |
| Java | Active or Passive | Executes user logic coded in Java. The byte code for the user logic is stored in the repository. |
| Joiner | Active |  |
| Lookup | Passive |  |
| Normalizer | Active | Source qualifier for COBOL sources. Can also be used in the pipeline to normalize data from relational or flat file sources. For bringing n columns into one, define one column as occurring n times. Link the n columns to the n input occurrences of the one column. This goes into one output column. The GCID_col goes from 1 to n and identifies which column was the source. The GK_col outputs a unique identifier, which is probably redundant with a primary key used for the target. |
| Mapplet Output | Passive | Defines mapplet output rows. Available in the Mapplet Designer. |
| Rank | Active | Limits records to a top or bottom range. Define the expression for the ranking. Can use local variables (see below). |
| Router | Active | Routes data into multiple transformations based on group conditions. One input group, multiple output groups, with one default group. Expressions returning true or false define the groups. A row for which all expression results are false goes to the default group. When linking, drag the input fields to get the names right, then drag the appropriate output fields to make the links. |
| Sequence Generator | Passive | Generates primary keys. Normally, "start value" = 0, "Increase by" = 1, "current value" = 1, "reset" = no. To start at 1 at each execution, set "current value" to 1 and put a check in "reset". Best practice is to make sequence generators reusable. Connect only the NEXTVAL port in mappings. |
| Sorter | Active | Sorts data based on a sort key. |
| Source Qualifier | Active | Represents the rows that the Integration Service reads from a relational or flat file source when it runs a session. Only ports that are linked to the next transformation are generated in the select statement. If fields do not show, it is because a link should be made with the next transformation. It is best to first generate the SQL before changing the "from" and "where" clauses so that the ports line up with the fields of the SQL statement. |
| SQL | Active or Passive | Executes SQL queries against a database. |
| Stored Procedure | Passive | Calls a stored procedure. Can be unconnected. Best practice is to make stored procedure transformations reusable. See more information in the "basics" section. |
| Transaction Control | Active | Defines commit and rollback transactions. An expression determines what type of transaction is performed (commit before or after, rollback before or after, or no change). |
| Union | Active | Merges data from different databases or flat file systems. Multiple input groups, one output group. |
| Update Strategy | Active | Determines whether to insert, delete, update, or reject rows. An expression returns a numerical value that determines which of the four strategies to use. Select "forward rejected rows" to pass the rejects to another transformation. Use variables in the "update strategy expression": DD_INSERT (0), DD_UPDATE (1), DD_DELETE (2), DD_REJECT (3) (a sample expression follows this table). For updates, the key is determined by the definition of the target, i.e. define the primary key in the target definition. Session-level update strategy is defined by the session's "treat source rows as" property. The options insert, update, or delete act on all rows. The data driven option uses an update strategy transformation within the mapping. Only inserts or deletes or updates (mutually exclusive): "Treat Source Rows As" = insert or delete or update, and make sure to select the insert/delete/update option for all target instances in the session. Different operations on rows: add an update strategy transformation, make sure that the insert, delete, and one of the update options is selected for each target instance in the session, and select "data driven" for the "treat source rows as" property. |
| XML Generator | Active | Reads data from one or more input ports and outputs XML through a single output port. Multiple input groups and one output group. |
| XML Parser | Active | Reads XML from one input port and outputs data to one or more output ports. One input group and multiple output groups. |
| XML Source Qualifier | Active | Represents the rows that the Integration Service reads from an XML source when it runs a session. Multiple input and output groups. |
| XML Target Definition |  | Multiple input groups. |
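As referenced in the Update Strategy row, a typical update strategy expression (the lookup port name is hypothetical) inserts when the lookup did not find the key and otherwise updates:
IIF(ISNULL(lkp_customer_key), DD_INSERT, DD_UPDATE)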
What you do in
a transformation: 
- Name/rename
     (tab transformation)
- Describe
     the transformation (tab transformation)
- Make
     reusable (tab transformation)
- Define the
     ports: see below
- Add ports
     either by dragging from another transformation (see layout > copy
     columns if this is not enabled) or by clicking button "add port"
- Add groups
- Enter
     SQL-like expressions
- Define
     local variables
- Override default values (strings need single quotes)
- Note that
     input/output ports do not transform the data
- Configure
     the tracing
- Define the
     properties
- Add
     metadata extensions
When renaming
ports, the ports are automatically renamed in the expressions.
Add comments
with "--" or "//", or with the comments button.
Expressions are
linked to the output port.
Ports are
evaluated in the following order: input ports, variable ports and output ports.
Equivalence
between numeric and boolean: true is non-zero, false is zero.
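For example, the condition of an IIF can be any numeric expression:
IIF(5, 'taken', 'not taken') --> 'taken'
IIF(0, 'taken', 'not taken') --> 'not taken'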
Initial values:
0 for numeric, '' for string, a low date for date/time. Input ports are null by
default. 
A blank default value means NULL.
Local
variables: 
- When the
     same expression shows several times, put in a local variable then use the
     local variable.
- Keep
     values from previous rows (when counting or summing)
Lookup:
Define the lookup table.
Then bring in an outside field (or fields) for comparison. Call them "IN_"
Create the condition (see separate tab)
For unconnected lookups, the column that you want back must be checked with the "R". It must be unique.
Lookups should be made reusable.
Unconnected
Lookup:
Import the definition
Define the input ports
Define one of the ports as the return port by clicking in the "R" column
Only one column can be returned for unconnected lookups
Write the following expression: :LKP.the_transf_name(input_ports)
Dynamic lookup:
turn on "dynamic lookup chache" and "insert else update"
Bring in the key and the type 2 fields. Then assoicate the lookup fields with the type 2 fields.
Put a selection on current data in the sql override (end_effective_date is null or end_effetive_date = 31/12/9999)
Only equality is allowed in the condition.
turn on "dynamic lookup chache" and "insert else update"
Bring in the key and the type 2 fields. Then assoicate the lookup fields with the type 2 fields.
Put a selection on current data in the sql override (end_effective_date is null or end_effetive_date = 31/12/9999)
Only equality is allowed in the condition.
Lookup Tips:
Must have at
least one input port.
Properties: SQL override, table name, caching (flat files always cached), cache directory, cache persistence, cache size,
policy on multiple matches, connection information, and many others.
Best practice is to make lookups reusable.
- Minimize
     the amount of rows returned by the lookup by adding filters in the where
     clause
- Enter a
     sufficient cache size so as to prevent paging: (#rows * size of rows) +
     overhead 
- Use an
     unconnected lookup if the lookup is not used for every row (only on some
     rows)
- If the
     input can potentially contain two lines for the same account/ID, then 
- either
      commit interval = 1 and no cache on lookup
- or commit interval > max # rows (to use rollback in case of error) and use a dynamic lookup.
- It is best
     to make lookups reusable. In this way, they appear in the list of
     transformations.
- Note that
     it is also very important to keep track of the tables used by the lookups
     as changes to the underlying tables can only be introduced manually to the
     lookup transformations. Make the change to the lookup transformation and
     then test the change by generating and validating a SQL statement (an
     unknown field will be generated in a SQL statement without any problem,
     but the statement will not pass the validation).
- To keep a
     port but to not select it in the generated SQL statement, remove the check
     for "O" (output port). This also works on files: the unselected
     fields are not brought into memory.
- Case insensitive
     lookups do not work on flat files and are DBMS dependent for databases.
- Lookup on
     a file: use options "strip trailing blanks" otherwise the keys
     will not match. Though this takes care of trailing blanks, it does not
     take care of leading blanks.
- In the
     conditions, put the "=" first, then the conditions based on less
     than, greater than, and finally the conditions based on not equals to.
- Integration
     Service matches null values
- If the
     source and the lookup are on the same database and caching is not
     possible, then join the lookup table to the source.
- In
     uncached lookups, when the condition is false, Integration Service returns
     the default value for connected transformations and NULL for unconnected
     transformations. User defined default values only work with connected
     lookups.
Joins data from
different databases or flat file systems. 
Contains master input group, detail input group and output group. Try to use pre-sorted data if possible.
Define the data set with lower cardinality as the master.
A normal join or a master outer join perform faster.
A master outer join discards unmatched master records and keeps all records from the detail source.
A detail outer join discards unmatched detail records and keeps all records from the master source.
The error "Joiner Transformation has invalid join condition: not all of the ports could be found" disappeared when the join types were tweaked.
| Join Type | Normal | Master Outer | Detail Outer | Full Outer |
| Master (lower cardinality) |  |  | Keep records only in master | Keep both |
| Detail (higher cardinality) |  | Keep records only in detail |  | Keep both |
| Performance | Faster | Faster |  |  |
Create a flat
file definition from a table:
- Create the
     table in the database;
- Import the
     definition into the source;
- In the
     target designer, pull in the source definition: it now becomes a target
     too;
- Change the
     database type to "Flat File";
- Change the
     name of the object; for sources, set the "database name" to
     "FlatFile" as appropriate;
- Define the
     details in the advanced tab;
- Create a
     simple mapping with just source and target;
- Best
     practice is to define all flat file fields as strings and use appropriate
     expression transformations to convert data types;
- Indirect
     files: the file name is in another file (in a list file).
- When using
     fixed width, select the option "Strip Trailing Blanks" in the
     source file properties of the session. This seemed to be the problem with
     one large file, but not sure.
- The option
     "Line Sequential File Format" indicates that the carriage return
     is the end of the record. This can help handle rows that are too short.
- For comma
     separated value files (csv), see functions.
To renew the
definition of a source or a target, simply re-import the table and choose the
"replace" option. Remember to renew the definition of the primary
key.
substr(a_string, start_pos, length)
Returns a substring. start_pos = 0 or 1: first character. start_pos = 2: second character.
start_pos < 0: count from the end of the string.
If length <= 0, an empty string is returned. If length is omitted, the substring runs from start_pos to the end.
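A few illustrative values:
SUBSTR('Informatica', 1, 4) --> 'Info'
SUBSTR('Informatica', -4, 4) --> 'tica'
SUBSTR('Informatica', 6) --> 'matica'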
instr(a_string,
b_string [, start [, occ_num]]) 
Searches in a_string for an occurrence
of b_string. 
Starts at start (1=beginning of string, negative counts from the end)
If occ_num > 1, then take the 2nd, 3rd... occurrence. Must be 1 or more.
instr('whatever', chr(39), 1, 1)
Result: 0 --> not found; 1 or more --> position of b_string in a_string
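An illustrative value, searching from the start for the second occurrence of 'a':
INSTR('Informatica', 'a', 1, 2) --> 11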
is_date(a_date, 'MM/DD/YYYY')
The date format is compulsory: you must provide a format. Be exact: if hours exist, use 'MM/DD/YYYY HH24:MI'. Note that 0:00 is a 24-hour format and does not exist in the 12-hour format. If the month or day has just one digit, MM or DD still works; in fact, just M or just D will not work.
iif(isnull(THE_DATE),
'', iif(IS_DATE(THE_DATE, 'MM/DD/YYYY HH12:MI AM'), TO_CHAR(TO_DATE(THE_DATE,
'MM/DD/YYYY HH12:MI AM'), 'MM/DD/YYYY'), ''))
Conversion functions. To prevent
transformation errors, test the validity of the string as a date before
converting. Notice the 12-hour format with the AM/PM indicator
in(in_port,
'A', 'B', 'C', case_flag)
Returns 1 (true) if the input value is
in the enumeration.
case_flag=0 ==> case INsensitive;
case_flag=1 ==> case sensitive (actually, case_flag != 0)
decode(value,
first_equiv, first_result, second_equiv, second_result [, default])
Returns the first matching result
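An illustrative example (the port name is hypothetical):
DECODE(country_code, 'US', 'United States', 'CA', 'Canada', 'Other')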
variable port: iif(instr(ltrim(in_port), ' ', 1,
1)>0, instr(ltrim(in_port), ' ', 1, 1), length(ltrim(in_port)))
out port: iif(IS_NUMBER(substr(ltrim(in_port), 1, v_the_var_port)),
rtrim(substr(ltrim(in_port), v_the_var_port + 1, length(in_port))),
ltrim(rtrim(in_port)) )
Remove the first word only if it is a
number. The variable is the length of the first word
lpad(string, up_to_length, 'x')
Pad the string on the left with 'x' so that the result is up_to_length characters long.
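For example, left-padding with zeros to a total length of 5:
LPAD('42', 5, '0') --> '00042'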
REPLACECHR(caseFlag,
input_string, old_char_set, new_char_set)
Replace a character with another
caseFlag: 0 --> case INsensitive; not 0 --> case sensitive
Example: REPLACECHR(0, field_name, chr(10) || chr(13), ' ')
REPLACESTR(caseFlag,
input_string, old_string1, [old_string2, ... old_stringn,] new_char_set)
Replace one of several strings with
another
caseFlag: 0 (or null) --> case INsensitive; not 0 --> case sensitive
Example: REPLACESTR(0, field_name, '"', '""')
'My family' ||
CHR(39) || 's house' --> My family's house 
Insert a single quote in a string with
CHR(39)
MAKE_DATE_TIME(year, month, day [, hour, minute, second])
Create a date. Hours, minutes and seconds are optional.
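For example:
MAKE_DATE_TIME(2020, 2, 29) --> 02/29/2020 00:00:00
MAKE_DATE_TIME(2020, 2, 29, 13, 30) --> 02/29/2020 13:30:00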
the_key I O .
the_key
v_the_test . . V iif(the_key=v_prev_key, 'same', 'new') <-- this is a variable!
v_prev_key . . V the_key
o_the_test . O . v_the_test
Compare a value to that of previous row:
remember that output ports are evaluated after variables. You may want to test
for a null value of v_prev_key too.
Comma-separated
values (csv)
First option is
with no text delimiter in the file properties. Selectively surround fields
containing commas and end-of-line characters with double quotes, and repeat any
existing double quotes:
'"' || REPLACESTR(0, descr, '"', '""') || '"'
Handle any fields with leading zeros or spaces by making is a formula (only works with Excel target)
'="' || field_with_leading_zeros || '"'
'"' || REPLACESTR(0, descr, '"', '""') || '"'
Handle any fields with leading zeros or spaces by making is a formula (only works with Excel target)
'="' || field_with_leading_zeros || '"'
Another option is to set the text delimiter to double quotes in the file properties. Repeat any existing double quotes in both the descriptions and the formula, but don't explicitly surround fields with double quotes:
REPLACESTR(0, descr, '"', '""')
Handle any fields with leading zeros or spaces by making it a formula (only works with an Excel target). Use repeated double quotes:
'=""' || field_with_leading_zeros || '""'
Aggregate
functions: 
- AVG 
- COUNT
- FIRST
- LAST
- MAX/MIN
     (date, number and string)
- MEDIAN
- PERCENTILE
     (value, percentile [, filter])
- STDDEV
- SUM
- VARIANCE
Nest up to two
aggregator functions, e.g. SUM(AVG(a_field))
Nulls are treated as either null or 0 depending on the setting of the Integration Service.
All of the aggregate functions take a second argument that is a filter (percentile takes up to three arguments): FCTN(value [, filter condition]). The Integration Service selects the row if the filter evaluates to true, and not if the filter evaluates to false or null.
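An illustrative aggregate with a filter condition (port names are hypothetical):
SUM(amount, status = 'POSTED')
COUNT(order_id, amount > 100)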
There is no
convenient functionality to detect changes to the underlying sources and to
determine the impacts of the changes. (Vive Genio!). A certain amount of
discipline is needed to document the mappings in such a way as to determine
impacts of changes: these are ideas for documentation.
- Sources used: all tables, even if the SQL is overridden, should show in the list of source tables. The name of the object should reflect the name of the table, preceded with the database and/or schema if necessary.
- Define the
     sources first, then drag them into the Target Designer. You can't go
     backwards.
- Targets
     used: idem.
- Keep note
     of the keys of the targets, especially if the keys were modified after the
     target was imported.
- Lookups: make re-usable so that the lookups show in the list of transformations. The name of the lookup transformation should reflect the name of the underlying table (preceded with the database and/or schema if necessary).
- Note where
     source qualifiers have SQL overrides.
- Note where
     target updates have SQL overrides, in particular for type 1 updates (based
     on a natural key for a table where the key is a surrogate key).
- Note where
     lookups have SQL overrides.
- Sequences:
     make re-usable. Note which tables have a related sequence.
Some common
problems:
- The target
     was re-imported and the definition of the keys was lost.
- New
     columns were removed then added again to the source: the links have to be
     redone between the update strategy and the target.
- Not all
     targets are linked the same way, for example when there are several
     branches for the same target.
In the workflow
manager:
create a workflow, then inside that create a task.
There is always a "start" task
In the "mapping" tab of the task, define the source and the target.
In the target, define the "target load type" as normal (not bulk)
Use "truncate target table option" to reload from start each time
There is also a "delete" option, but what does this do?
Do not override in the session
Generally make reusable sessions.
Define source and target in "connections". Define source file location and name in "files, directories and commands".
Use variables $Source and $Target and define the target in the "$Target Connection Value" (put $Target in places where Infa put the actual value).
Schedule a workflow:
In the Workflow Designer, drag the workflow into the "workflow designer" panel. Menu workflows > edit > scheduler tab > click on the button next to the scheduler box. Remember to save.
Workflow
monitor:
execute: connect to the repository then to the scheduler
To find the session log in the Workflow Monitor, right-click on the session (not the workflow, not the green bar).
Partition points are in the second "tab" of: Workflow Manager, task, task properties, tab "mapping", button "partitions" in the lower left corner.
Error
Detection
- Add a
     filter to the links between tasks: $s_m_session_name.Status=Succeeded or $s_m_session_name.Status=Failed
- Use option
     "Fail parent if this task fails"
Attachment in an email:
See attached
file.
%a</u01/usr/target/e_file.txt>
Other codes (a sample message body using them follows this list):
- %a -->
     attachment
- %b -->
     Session start time
- %c -->
     Session completion time
- %e -->
     Session status
- %g -->
     Attach session log
- %l -->
     Total records loaded
- %r -->
     Total records rejected
- %s -->
     Session name
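A sketch of a notification body using these codes (the wording is illustrative):
Session %s completed with status %e.
Started: %b, completed: %c.
Rows loaded: %l, rows rejected: %r.
%g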
Extras in
Tasks
Generally, all
of this happens in the "Components" tab of the tasks editor.
Rename a file
before running the task
Non re-usable pre-session command: 
cp /u01/after_cdc/export_file.out /u01/arch/export_file`date '+%Y_%m_%d_%H_%M_%S'`.out
Or copy the current file to the previous file before doing a Change Data Capture
cp /u01/curr/export_file.out /u01/prev/export_file.out
Send email on
failure
Configure an on-error email in
"components"
command task
In the commands, call a script: /u01/shell_scripts/check_import.sh
Send an email
with an attachment
Create a dummy mapping. In the success
email or the failure email, define the body as:
See attachment: %a</u01/the_file>
Create a
folder:
In repository manager, menu folder > create
For group-by in
auto-profile:
- Create the
     auto-profile
- Edit the
     profile and put in group-by functions on all the fields for which it is
     worth it
- Note that instead
     of scrolling down the list of fields, just press the down arrow
- Modify the name of the profiling function by adding the column name to it (press "next" to get the name of the column, then press "back" to modify the name)
If field is too
long, the auto-profile does not do the min/max and distinct/duplicate analysis.
Under some conditions, the domain analysis is not done. Is it when the profile
runs inside a workflow with other profiles?
This function
(oops, which function?) can infer domains for columns with a numeric datatype
with a precision of 28 digits or less or a string datatype with a precision of
200 characters or less. This function can also infer domains for columns of the
Date/Time datatype.
- Add a
     filename to the XML output: edit the target, right-click on the root view
     and choose the option "Create a Filename Column".
- The generated keys in XML are numbers by default. Explicitly change to string if needed.
- In XML
     source, options when linking data from two hierarchical levels: 
- Create
      two XML sources and join the two related source qualifiers with a joiner.
      Be careful with the generated keys: choose one of two options (I haven't
      tested them): reset or restart.
- Join two
      levels from one source qualifier coming from one XML source. The joiner
      must have the sorted input option set. I tried this and it seems to work.
      It sounds too easy so I will have to continue to monitor it.
- Modify
      the XML view so that the fields are denormalized.
- In XML target, the check in "format output" puts every new element on a new line. Otherwise, everything is on one line.
- Invalid XML definition. Solution: validate in the XML editor: menu XML Views > validate XML definition
- "This
     XML definition is too large. The total length of all columns must not exceed
     500MB." This may be because some strings are defined with length
     "infinite". Solution: define a length for these strings.
- What to do with empty or null values? If the desired result is to output the attribute name with an empty string for the value, then use these settings:
- Null
      content representation = tag with empty content
- Empty
      string content representation = tag with empty content
- Null
      attribute representation = Attribute Name with Empty String
- Empty string attribute representation = Attribute Name with Empty String
Creating an
XML View
- Create the
     view
- Enter the
     name (the column properties should be visible at the bottom of the window)
- Select the
     node in the Schema Navigator from which the view will be created
- Menu
     Components > XPath Navigator
- In the
     upper right of the little window, select view row mode
- Select the
     view in the navigator on the left and drag into the newly-created view
- Create a
     primary key
- error
     "cannot connect to server ...": start the service
- the
     workflow fails and there is no log: is the source file format correct?
- The
     workflow is invalid when trying to execute. Close the folder, reopen the
     folder, use menu > workflow > validate. Or try saving the workflow
     again.
- Debug with
     debugger, or with verbose data (define in the session properties) or look
     at session properties in the workflow manager and look for "first
     error".
- Data being truncated? Data types have changed? Right-click on the port (in layout) and choose "propagate".
- Look at
     the links between the last transformation and the target, just in case a
     change to the fields removed a link.
- When
     updating, the changes do not get put in the database. The log says
     rejected rows, with error about no keys in target table. Define the keys
     in the target table definition.
- Fields
     from lookup do not line up correctly: the fields may not be in the
     original order, so restore the original order.
- The number
     of rows to be skipped at the beginning of a file is defined in the
     advanced properties of the source. See the source analyzer. But once it is
     in a session, look at properties for the source, then click on the
     hyperlink "Set File Properties" on the title bar of the
     properties section.
 "0" --> no rows skipped; "1" --> header row skipped.
- The
     session execution terminates unexpectedly, no error log is produced: 
- Generally
      comes from a file-related issue, in particular because one of the input
      files is not available
- Start
      with the lookup files
- Informatica
      does NOT like removing the option for "case sensitive string comparison"
      on file-based lookups
- For an
      XML target, some of the fields were null that should not have been null.
- For an XML target, it does not like outputting no data to an XML target.
- The
      toughest is targets. If a file does not get written, even with 0 length,
      maybe that is the file that is causing Informatica to stop.
- Invalid
      date formats can make a mapping stop without warning, even in debugger
      mode.
- The error "Joiner Transformation has invalid join condition: not all of the ports could be found": tweak the join types; try a master outer join instead of a full join.
- A lookup
     cannot find a value, and this value just happens to be in the first row.
     Check the number of rows skipped.
- The
     variable values do not change. I created a variable with value A. I modified
     the value to B. When running in debugger, the correct value B showed. When
     running in normal mode, the old value A showed. Solution: created a new
     variable with value B and removed the old variable.
- "ERROR:
     Partitioning option license required to run sessions with user-defined
     partition points" --> Remove the option "Sorted Input"
     in the aggregator
- "Concatenation
     error: Please check the Normalizer concatenation rule": only one
     transformation can follow a normalizer. Output ports from a noramlizer should
     go to only one transfromation.
- Add 1 to
     the offset field in the xml source definitions created from exporting the
     source definition. 
- Datetime fields in a target flat file actually only occupy 10 characters instead of the 19 characters as shown in the target definition.
- Given two
     fields A and B, sort on both A and B then feed into an ∑: it runs and
     produces results but task shows as failed. Remove the sort on B then add
     it again????
- By the
     way, if you cannot find the reason for an error, try checking that what
     you are working on is what is producing the error! 
- A mapping that contains a mapplet will not validate: check that all output ports of the mapplet are linked to another transformation. If needed, just pull an output port to an expression transformation even if nothing is done with it.
- A trigger
     on the target table prevents "bulk" target load type. Change to
     normal target load type.
- Normalizer,
     then filter, then incoming key from sequence, to e-Business Suite target:
     fails because a row still exists virtually despite the filter.
 Put the sequence before the filter: normalizer, then incoming key from sequence, then filter.
- Error
     "ORA-26002: Table ETL_MISC.PROCESSED_FILES has index defined upon
     it...Function Name : Execute Multiple" on a target table with bulk
     load type. This is because there is an index on a field and the bulk load
     type cannot handle it. Set the "Target Load Type" in the session
     target configuration to "Normal." 
- When
     validating an overridden query, an error on the number of returned rows
     can also be because of an invalid syntax. By the way, keep the from and
     where statements and re-generate the SQL.
- E-Business
     Suite Sources: Application Source Qualifiers go with Application Sources,
     not the "regular" source qualifiers
- E-Business
     Suite Targets: to see the rows that are inserted, look for the message
     code "EBWRT..." and the message "inserted rows in Partition
     #1 Group ... "
- E-business Suite Connections (menu connections > application > OracleEBusinessSuiteConnection): set the name to the "apps" database user (schema name); the connect string is the TNSNAMES identifier; the apps schema name is the same as the user.
- No rows in
     the target, the debugger terminates on an error. Cause: aggregation on a
     field with a null result from a lookup. solution: if the lookup does not
     find the value, give a default value, not a null value.
- Break logic that assumes a variable is null in some cases: count lines instead.
- Rename a
     group of ports in a transformation: for example adding
     "..._lkp": key down, key "end", Ctrl+V.
- Build the
     expression for comparing the incoming fields to the lookup fields: go down
     the list of incoming fields, double_click the field, paste "_lkp !=
     " double_click again then click "OR"
 (of course, when the source and target fields names do not match then the validate button will complain --> correct manually).
- Put
     indexes and caches in memory for aggregators, lookups and sorters. 
 Monitor the $XXXX_CACHE while the mapping is running to determine if the default size is too small. This is the case when the cache size increases beyond the default as the transformation runs. Increase the default size in this case.
- It may be
     a good idea to drop or disable indexes during the load.
- Remember
     to put tracing back to normal when moving to QA.
- Check the
     correctness of the Buffer Block Size and Buffer Pool sizes.
- Session
     commit interval: set this to the largest possible, i.e. so that it covers
     the full loading. But do not go beyond 1 million.
- An
     alternative to dynamic lookups: have one fully cached lookup. If the row
     is not found, then use another non-cached lookup that executes a query
     each time. Then if this fails, a new row can be added.
- When
     evaluating a boolean expression in the debugger, 0 is false, 1 is true
- Bring in
     all the ports up to the first "cleanup" function. Try to keep
     the same order.
- Remove
     duplicate records with an aggregator. The source must be sorted. Be sure
     to check the "sorted input" option.
- Check the
     session log for: 
- Errors
      during read indicated by "Read [n] rows, read [e] error rows for
      source table..." where n is the number of rows in the source and e
      is the number of errors. If e>0, then an error occurred when reading.
- Errors in transformations, usually indicated by a multiline description and "Transformation evaluation error"
- Number of rows skipped, indicated by "Skipped [1] lines."
- The
      actual input file from which the data was read
- The
     performance degrades exponentially with the number of targets
- Update
     strategies do not perform well
- Break down
     mappings with more than 50 objects into smaller mappings
- Denormalize
     with an aggregator. Group by the table's key field. For each output field,
     use the expression FIRST(amount, REC_TYPE='A') so as to create different amount
     fields depending on the record type. Amount and rec_type are both ports
     (to be verified).
- Strip off
     the path from a full file name: SUBSTR(full_file_name, INSTR(full_file_name, '/',
     -1, 1)+1)
 Keep the path: SUBSTR(full_file_name, 1, INSTR(full_file_name, '/', -1, 1))
 In windows, replace / with \
- Transform
     number to string without trailing zeros: to_char(to_integer(the_number))
Reset a
sequence generator:
-- Instructions: see below
-- Don't worry: nothing is updated by this script. Copy and paste to update.
declare @the_widget_type numeric(5), @the_attr_id numeric(5), @the_widget_id numeric(10)
set @the_widget_type = 7
set @the_attr_id = 4
-- To use this, set the variable "@the_widget_id" to 0
-- Look at the list and choose
-- Then set the variable to the appropriate ID
set @the_widget_id = 122
print 'Widgets'
select wa.WIDGET_ID
     , left(w.widget_name, 20) as widget_name
     , left(subj.SUBJ_NAME, 20) as subj_name
     , left(a.ATTR_NAME, 15) as attr_name
     , wa.ATTR_VALUE
  from opb_widget_attr wa
 inner join opb_attr a on wa.widget_type = a.object_type_id and wa.attr_id = a.attr_id
 inner join opb_widget w on w.widget_id = wa.widget_id
 inner join opb_subject subj on w.subject_id = subj.subj_id
 where wa.widget_type = @the_widget_type
   and wa.attr_id = @the_attr_id
   and (wa.widget_id = @the_widget_id or 0 = @the_widget_id);
print '--This is the update sql to use:'
print 'update opb_widget_attr set attr_value = ''1'' '
print ' where widget_type = ' + cast(@the_widget_type as varchar(10))
print ' and attr_id = ' + cast (@the_attr_id as varchar(10))
print ' and widget_id = ' + cast(@the_widget_id as varchar(10)) + ';'
print ' '
print 'These queries are just to verify that the widget ID and the attribute ID have not changed'
select left(object_type_name, 20) as object_type_name
     , object_type_id
     , 'assumed to be 7'
  from opb_object_type
 where object_type_id = @the_widget_type;
select left(attr_name, 20) as attr_name
     , attr_id
     , 'assumed to be 4'
  from opb_attr
 where object_type_id = @the_widget_type
   and attr_id = @the_attr_id;
Dependencies
in a mapping:
It is possible to see dependencies between widgets. However, it is not easy to see the link between input ports and output ports inside the widgets. The challenge is with the expressions, which are stored but not parsed. A possible option: select the expressions and manually parse them to create links between input ports and output ports. Another option is to look for where the "analyze" functionality is located in the repository.
A widget is a
transformation. An instance is a widget inside a mapping. Here are some of the
views:
- rep_mapping_conn_ports:
     shows the connections between ports. Objects refer to widgets. Instances
     refer to widgets, but as numbered within a mapping. Object type is one of
     the dozen object types. This view does not have the field_id so use
     rep_widget_dep to get the field id.
- rep_widget_dep:
     instance from and instance to, with field ids. Returns the same number of
     rows as rep_mapping_conn_ports.
- rep_mapping_unconn_ports:
     unconnected ports.
- rep_widget_inst:
     list of instances (widgets).
- rep_widget_field:
     lists the output fields in each of the widgets and the expressions.
     To get the input fields, the expression has to be parsed. Note too that
     only the first line of the expression is shown, which may be incomplete.
- rep_widget_attr: was empty in the studied repository.
Export
Target and Import as Source
- Remove
     UUID in the folder tag
- Remove the
     attributes in the powermart, but keep the attributes in repository.
- Remove the
     TABLEATTRIBUTE tags
- Global
     replace "TARGET" to "SOURCE" 
- Remove
     CONSTRAINT in the SOURCE tag
- Add
     DBDNAME ="FlatFile" (the sub-folder in sources) to the SOURCE
     tag
- Remove
     TABLEOPTIONS ="" and add OWNERNAME ="" to the SOURCE
     tag
- Add
     SHIFTSENSITIVEDATA ="NO" to the flat file tag
- It is
     possible to group several sources into one folder
- Names
     should not start with a number nor contain a dash "-".
- Completeness: fields with "real" values, i.e. values other than null, blank or a default value
- Conformity:
     data is legible and has a determined data type
- Consistency:
     the data types are similar and the data is stored in the correct fields;
     columns are independent
- Accuracy:
     data is correct, in comparison to a reference source
- Duplication:
     information is unique and it is not showing elsewhere in a different form
- Integrity:
     all relevant information for a data record is present in a usable form;
     links join related data
| Data Analysis | Data Enhancement |
| Analysis: identify quality-related features of fields, with emphasis on completeness, conformity, and consistency | Standardization: address issues of completeness, conformity, and consistency |
| Matching: identify equivalent or related data records; identify inaccurate records by comparing to a reference data set | Consolidation: address issues of accuracy, duplication, and integrity |
Scorecarding:
measures data quality for one field
Grading: measures data quality for records
A plan consists
of 
- File-based
     or database data inputs
- Tools to analyze and enhance data, such as numeric field analysis, code field analysis, free text analysis, business rule application for consistency analysis, missing values counter, rule based analyzer, character labeller, token labeller
- File-based
     or report-based sinks
- Independent
     data resources or dictionaries
Modify the
configuration:
infasetup command_name arguments
echo %errorlevel% (if error level > 0 then failure)
To define a
domain:
Create an environment variable called INFA_DOMAINS_FILE and set it to the domains file, e.g. C:\informatica\PowerCenter8.1.0\domains.infa on Windows or /u01/informatica/powercenter/domains.infa on Unix.
To see session
logs in a file format, choose the option for backward compatible session logs.
The default location for session logs is
/u01/informatica/powercenter/server/infa_shared/sessLogs.
The node
configuration, including database connection details, is in /u01/informatica/powercenter/server/config/nodemeta.xml
Informatica
security: permissions of a folder:
- owner
- owner's
     group
- repository=all
     other users
A folder is
linked to one group and users are linked to several groups.
 