Wednesday, June 17, 2009

Query to get the session information from the repository


1) Use the following SQL if you want just the run times for the session or workflow
SELECT
OPB_SUBJECT.SUBJ_NAME SUBJECT_AREA,
OPB_WFLOW_RUN.WORKFLOW_NAME,
OPB_TASK_INST_RUN.SUBJECT_ID,
OPB_TASK_INST_RUN.WORKFLOW_ID,
OPB_TASK_INST_RUN.WORKFLOW_RUN_ID,
OPB_TASK_INST_RUN.WORKLET_RUN_ID,
OPB_TASK_INST_RUN.CHILD_RUN_ID,
OPB_TASK_INST_RUN.INSTANCE_ID,
OPB_TASK_INST_RUN.INSTANCE_NAME,
OPB_TASK_INST_RUN.TASK_ID,
OPB_TASK_INST_RUN.TASK_TYPE,
OPB_TASK_INST_RUN.START_TIME,
OPB_TASK_INST_RUN.END_TIME,
OPB_TASK_INST_RUN.RUN_ERR_CODE,
OPB_TASK_INST_RUN.RUN_ERR_MSG,
OPB_TASK_INST_RUN.RUN_STATUS_CODE,
OPB_TASK_INST_RUN.TASK_NAME,
OPB_TASK_INST_RUN.VERSION_NUMBER TASK_VERSION_NUMBER,
OPB_TASK_INST_RUN.SERVER_ID,
OPB_TASK_INST_RUN.SERVER_NAME
FROM
OPB_TASK_INST_RUN, OPB_WFLOW_RUN, OPB_SUBJECT
WHERE
OPB_SUBJECT.SUBJ_ID = OPB_TASK_INST_RUN.SUBJECT_ID
AND OPB_WFLOW_RUN.WORKFLOW_ID = OPB_TASK_INST_RUN.WORKFLOW_ID
AND OPB_WFLOW_RUN.WORKFLOW_RUN_ID = OPB_TASK_INST_RUN.WORKFLOW_RUN_ID
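
If you also want the elapsed run time, a hedged variant of the same idea (assuming an Oracle repository, where START_TIME and END_TIME are DATE columns):

SELECT
OPB_TASK_INST_RUN.TASK_NAME,
OPB_TASK_INST_RUN.START_TIME,
OPB_TASK_INST_RUN.END_TIME,
ROUND((OPB_TASK_INST_RUN.END_TIME - OPB_TASK_INST_RUN.START_TIME) * 24 * 60, 1) RUN_MINUTES
FROM
OPB_TASK_INST_RUN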

2) Use the following SQL if you want the number of records.
NOTE :: WIDGET_TYPE 3 = source / source qualifier
WIDGET_TYPE 2 = target

Since both row types come back in the same columns, you cannot simply sum the total. You might have to split it into 2 queries, or you can use DECODE, as shown after the query below.

SELECT
OPB_SUBJECT.SUBJ_NAME SUBJECT_AREA,
OPB_WFLOW_RUN.WORKFLOW_NAME,
OPB_TASK_INST_RUN.SUBJECT_ID,
OPB_TASK_INST_RUN.WORKFLOW_ID,
OPB_TASK_INST_RUN.WORKFLOW_RUN_ID,
OPB_TASK_INST_RUN.WORKLET_RUN_ID,
OPB_TASK_INST_RUN.CHILD_RUN_ID,
OPB_TASK_INST_RUN.INSTANCE_ID,
OPB_TASK_INST_RUN.INSTANCE_NAME,
OPB_TASK_INST_RUN.TASK_ID,
OPB_TASK_INST_RUN.TASK_TYPE,
OPB_TASK_INST_RUN.START_TIME,
OPB_TASK_INST_RUN.END_TIME,
OPB_TASK_INST_RUN.RUN_ERR_CODE,
OPB_TASK_INST_RUN.RUN_ERR_MSG,
OPB_TASK_INST_RUN.RUN_STATUS_CODE,
OPB_TASK_INST_RUN.TASK_NAME,
OPB_SWIDGINST_LOG.INSTANCE_NAME,
OPB_SWIDGINST_LOG.WIDGET_TYPE,
APPLIED_ROWS SUCCESSFUL_ROWS,
AFFECTED_ROWS SUCCESSFUL_AFFECTED_ROWS,
REJECTED_ROWS FAILED_ROWS,
LAST_ERR_MSG LAST_ERROR,
LAST_ERR_CODE LAST_ERROR_CODE
FROM
OPB_TASK_INST_RUN, OPB_WFLOW_RUN, OPB_SUBJECT, OPB_SWIDGINST_LOG
WHERE
OPB_SUBJECT.SUBJ_ID = OPB_TASK_INST_RUN.SUBJECT_ID
AND OPB_WFLOW_RUN.WORKFLOW_ID = OPB_TASK_INST_RUN.WORKFLOW_ID
AND OPB_WFLOW_RUN.WORKFLOW_RUN_ID = OPB_TASK_INST_RUN.WORKFLOW_RUN_ID
AND OPB_SWIDGINST_LOG.WORKFLOW_RUN_ID = OPB_TASK_INST_RUN.WORKFLOW_RUN_ID
AND OPB_SWIDGINST_LOG.TASK_INSTANCE_ID = OPB_TASK_INST_RUN.INSTANCE_ID
AND OPB_SWIDGINST_LOG.WIDGET_TYPE IN (3, 2)
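
A hedged sketch of the DECODE approach, collapsing the result to one row per session run with separate source and target counts (same joins as above; APPLIED_ROWS is assumed to come from OPB_SWIDGINST_LOG, as in the query above):

SELECT
OPB_SUBJECT.SUBJ_NAME SUBJECT_AREA,
OPB_WFLOW_RUN.WORKFLOW_NAME,
OPB_TASK_INST_RUN.TASK_NAME,
OPB_TASK_INST_RUN.WORKFLOW_RUN_ID,
SUM(DECODE(OPB_SWIDGINST_LOG.WIDGET_TYPE, 3, OPB_SWIDGINST_LOG.APPLIED_ROWS, 0)) SOURCE_ROWS,
SUM(DECODE(OPB_SWIDGINST_LOG.WIDGET_TYPE, 2, OPB_SWIDGINST_LOG.APPLIED_ROWS, 0)) TARGET_ROWS
FROM
OPB_TASK_INST_RUN, OPB_WFLOW_RUN, OPB_SUBJECT, OPB_SWIDGINST_LOG
WHERE
OPB_SUBJECT.SUBJ_ID = OPB_TASK_INST_RUN.SUBJECT_ID
AND OPB_WFLOW_RUN.WORKFLOW_ID = OPB_TASK_INST_RUN.WORKFLOW_ID
AND OPB_WFLOW_RUN.WORKFLOW_RUN_ID = OPB_TASK_INST_RUN.WORKFLOW_RUN_ID
AND OPB_SWIDGINST_LOG.WORKFLOW_RUN_ID = OPB_TASK_INST_RUN.WORKFLOW_RUN_ID
AND OPB_SWIDGINST_LOG.TASK_INSTANCE_ID = OPB_TASK_INST_RUN.INSTANCE_ID
AND OPB_SWIDGINST_LOG.WIDGET_TYPE IN (3, 2)
GROUP BY
OPB_SUBJECT.SUBJ_NAME,
OPB_WFLOW_RUN.WORKFLOW_NAME,
OPB_TASK_INST_RUN.TASK_NAME,
OPB_TASK_INST_RUN.WORKFLOW_RUN_ID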

Tuesday, June 2, 2009

Teradata External Loaders

FAST LOAD -
- Only on empty tables with no secondary indexes (SI). Use it for truncate-and-reload targets.
- High-volume loads, when you run a session with a single partition
- Each FastLoad job loads one table on the Teradata Database.
- When using FastLoad to load more than one table, you must submit multiple FastLoad jobs, one for each table.

FastLoad acquires table-level locks and does not release them if the loader terminates abnormally. You must drop and recreate the table to release the locks.
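
A hedged sketch of that cleanup in Teradata SQL (database, table and column names are illustrative):

DROP TABLE stage_db.stg_orders;
CREATE TABLE stage_db.stg_orders
( order_id INTEGER,
order_amount DECIMAL(18,2) )
PRIMARY INDEX (order_id);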

TPUMP -
- Performs Insert, Update, Delete and Upsert operations
- Low-volume updates, when you run a session with multiple partitions
- Acquires row-hash locks, so other users can still access the table

MULTILOAD -
- Performs Insert, Update, Delete and Upsert operations
- High-volume updates, when you run a session with multiple partitions
- Acquires table-level locks, so others cannot access the table (offline loading)
- Each Teradata MultiLoad import task can perform multiple data insert, update, and delete
functions on up to five different tables or views

To recover from a failed MultiLoad, you must release the target table from the MultiLoad state and also drop the MultiLoad log table.
MultiLoad can require a large amount of resources on a Teradata system.
By default, Informatica uses one instance of MultiLoad for every target file.
To load multiple tables, the generated MultiLoad script file must be edited.
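
A hedged sketch of that recovery cleanup (names are illustrative; the ML_/WT_/ET_/UV_ prefixes follow the session-level defaults listed under Properties below):

RELEASE MLOAD target_db.orders_fact;
DROP TABLE work_db.ML_orders_fact;  /* MultiLoad log table */
DROP TABLE work_db.WT_orders_fact;  /* work table, if present */
DROP TABLE work_db.ET_orders_fact;  /* error table 1, if present */
DROP TABLE work_db.UV_orders_fact;  /* error table 2, if present */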


Both FastLoad and MultiLoad work at the data block level. In other words, these loaders are much faster than "standard DML" within Teradata.
They both acquire table-level locks, which means they are only appropriate for offline data loading. MultiLoad first writes the data into temporary tables in Teradata, and then it updates the data blocks directly. All changes to a physical data block are made in a single operation.


You can also use data driven mode to perform insert, update, or delete operations based on an Update Strategy or Custom transformation
For a session with multiple partitions, use the round-robin partition type to route data to a single target file.
If you invoke a greater number of sessions than the maximum number of concurrent sessions the database allows, the session may hang. You can set the minimum value for Tenacity and Sleep to ensure that sessions fail rather than hang.

If you use a Teradata external loader to perform update or upsert operations, use the Target Update Override option in the Mapping Designer to override the UPDATE statement in the external loader control file. For upsert, the INSERT statement in the external loader control file remains unchanged

- If you use named pipes, set the checkpoint value to 0 to avoid checkpointing

Overriding Control File
- You can override the control file in the session properties. If you do not override the control file, the Integration Service generates a new control file based on the session and loader properties each time you run a session. If you change a target or loader connection setting after you edit the control file, the control file does not include those changes.
- You can enter PmNullPasswd as the database password to prevent the password from appearing in the control file. Instead, the Integration Service writes an empty string for the password in the control file.


IS Staged -
If not checked, the file is piped/streamed to the loader; in that case, set CHECKPOINT to 0.
If checked, the file is written to disk first and then used for the load.

CHECKPOINT -
Recovery/Restart will not work if you use a named pipe.

If a table has a partition defined, then the key(s) on which the partition is defined should also be marked as primary keys when defining the target table.
If the primary keys are not defined, then any attempt to update, upsert or delete data using the Teradata loaders will result in an error.


Properties

MULTILOAD

Error Limit - 0 - number of rejected records allowed. 0 means no limit. Uniqueness violations do not count.
Checkpoint - 10000 - a value of 60 or more means a number of records (checkpoint every n records)
- a value of 1-59 means minutes (checkpoint every n minutes)
Tenacity - 10000 - hours to keep trying; if a login fails, it waits for x minutes (the Sleep time) and then tries to log in again
Load Mode - Insert, Delete, Update, Upsert or Data Driven.
- If it is Data Driven, Informatica writes a column to the target to indicate the update strategy.
Max Sessions - 1 - maximum number of MultiLoad sessions per MultiLoad job. Valid values are 1 to 32767.
Is Staged - Disabled - if you enable it, Informatica writes a flat file and then loads it; otherwise it uses a named pipe.
Error Database Name, Work Table Database, Log Table Database

At Session Level
Error Table 1 - ET_
Error Table 2 - UV_
Work Table - WT_
Log Table - ML_
Control File Content Override

TPUMP -

Error Limit - 0 - number of rows that can be rejected for errors. If it is exceeded, TPump rolls back the transaction that caused the last error. 0 means stop on error.
Checkpoint - 15 - checkpoint frequency in minutes
Tenacity - 4 - hours to keep trying to log in
Packing Factor - 20 - number of rows that each session buffer holds. Improves network/channel efficiency.
Statement Rate - 0 - initial maximum rate of statements per minute
Serialize - Disabled - enable it if you have multiple updates for the same row

Teradata Client

Teradata BTEQ client software, the Teradata ODBC driver, and any other Teradata client software must be installed on the machine where the Integration Service process runs

Mappings/Transformations
Source - Use ODBC. ODBC is a "native" interface for Teradata; it is not a layer on top of CLI.
Lookup & Stored Procedure - Use ODBC
Target - Use ODBC, FASTLOAD, MLOAD, TPUMP

You can use only fixed-width files when you load to Teradata through loaders.
If you specify delimited files, the Informatica Server fails the session

In the control file, all columns are defined as CHAR.
At run time, these loaders convert the input data to CHAR and then convert it back to the original datatypes.

When calling a pre- or post-load stored procedure without any argument, use a dummy value for the argument in the call text properties.
If you do not configure the call text property, the mapping is invalid.

Setting up ODBC
1) Set the TERADATA_HOME, ODBCHOME, and PATH environment variables.
2) Set the shared library environment variable.
3) Edit the existing odbc.ini file, or copy the odbc.ini file to the home directory and edit it.

To run the loader utilities, make sure the hosts file on the machine where the Teradata client is installed resolves the Teradata Database server name.
In the Teradata data source entry in the odbc.ini file, set: DateTimeFormat=AAA
NoScan=Yes

Session Modes

The default value of SessionMode is Teradata; the value is case insensitive.
Change the SessionMode parameter to 'ANSI' in the ODBC DSN in the .odbc.ini file. (SessionMode = ANSI)
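
A hedged sketch of such a .odbc.ini entry (the DSN name, driver path and DBCName are illustrative; DateTimeFormat and NoScan as noted above):

[TeradataTarget]
Driver=/opt/teradata/client/odbc/lib/tdata.so
DBCName=tdprod
SessionMode=ANSI
DateTimeFormat=AAA
NoScan=Yes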

When you use ANSI session mode, Teradata does not roll back the transaction when it encounters a row error.
If you choose Teradata session mode, Teradata rolls back the transaction when it encounters a row error.
So if you have a session with a Teradata target as well as a lookup, you will want to create a separate .odbc.ini entry (DSN) for the target to avoid this issue.

A stored procedure that was compiled in one mode cannot be run in the other mode. You will get an "Invalid session mode for procedure" error.

ANSI mode:
A transaction initiation is always implicit.
A transaction is opened by the first SQL statement executed in a session or by the first statement executed following the close of a transaction.
A transaction is initiated when no transaction is currently active, and a SQL statement is executed.
The COMMIT [WORK] or ROLLBACK [WORK] statements close a transaction.
MULTISET is the default in the CREATE TABLE statement in ANSI mode.

Teradata mode:
Transactions can be implicit or explicit.
Statement failures roll back the entire transaction. An error result rolls back the entire transaction.
Character truncation causes no error.
The default in character comparison is NOT CASESPECIFIC.
SET is the default in the CREATE TABLE statement


TIMESTAMP -
To load a timestamp with fractional seconds, write the column as a string (e.g., VARCHAR2) formatted as 'YYYY-MM-DD HH24:MI:SS.FF6' (six fractional-second digits).
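
A hedged sketch of that conversion on the Oracle side (table and column names are illustrative):

SELECT TO_CHAR(load_ts, 'YYYY-MM-DD HH24:MI:SS.FF6') load_ts_str
FROM src_orders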

You may also encounter an error writing a PowerCenter date/time port to a Teradata TIMESTAMP(0) field, that is, a Teradata timestamp with no fractional seconds.
To resolve this issue, change the column in Teradata to TIMESTAMP(n) where n > 0,
or edit the generated control file: change the length of the field that corresponds to the TIMESTAMP(0) field from CHAR(26) to CHAR(19).
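
A hedged sketch of that edit in the generated MultiLoad layout (the field name is illustrative):

.FIELD LOAD_TS * CHAR(19);  /* was CHAR(26) */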

Friday, May 8, 2009

Informatica session fails with RR_4035 : SQL Error [ ORA-29275: partial multibyte character


The source database or the target database has multibyte characters.
To resolve this you have to change the Code Page of the Relational Connection

Steps :-
1) Open Workflow Manager
2) Click on Relational Connections and choose the relational connection
3) Change the Code Page to: UTF-8 encoding of Unicode

If the error occurs in any other tool, then make sure the connection you make to the database is with the correct code page which supports Unicode.

Informatica Installation

1) Virtual Name for the Unix Machines - Starting Informatica Server

When you start the Informatica server, it checks the IP address stored in the repository (configured in Workflow Manager) against the IP address of the server machine.

When you want to use a virtual name, the IP address resolved by the client (Workflow Manager) will be different from the actual physical IP address of the server.

To fix: open the file pmserver.cfg, add "ServerAddress=usp-mmsprod" (the virtual host name), and then start the server.


2)Configuring the Environment Variable for Installation
---------------------------------------------------------

a)
LD_LIBRARY_PATH=$LD_LIBRARY_PATH:<path to repository server>:<path to informatica/pc>:/usr/lib:/lib

PATH=$PATH:<path to repository server>:<path to informatica/pc>:/usr/lib:/lib


b)
For Oracle, set NLS_LANG environment variable.
If you are unsure what the value is, use the following query:
select PARAMETER, VALUE from nls_database_parameters
where PARAMETER in ('NLS_LANGUAGE','NLS_TERRITORY','NLS_CHARACTERSET')

Use the format NLS_LANGUAGE_NLS_TERRITORY.NLS_CHARACTERSET
E.g
NLS_LANGUAGE = AMERICAN
NLS_TERRITORY = AMERICA
NLS_CHARACTERSET = WE8ISO8859P1

NLS_LANG = AMERICAN_AMERICA.WE8ISO8859P1; export NLS_LANG

c)
For 32-bit Oracle, set LD_LIBRARY_PATH to look in $ORACLE_HOME/lib:
LD_LIBRARY_PATH=$ORACLE_HOME/lib:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH
Or
For 64-bit Oracle, set LD_LIBRARY_PATH to look in $ORACLE_HOME/lib32 (the 32-bit client libraries):
LD_LIBRARY_PATH=$ORACLE_HOME/lib32:$LD_LIBRARY_PATH; export LD_LIBRARY_PATH

Informatica v7 - Undocumented Feature

Informatica : Unexpected commits

When using target-based commit, PowerMart will commit when the writer thread is completely idle for one minute (the default value of WriterWaitTimeOut), even if the commit interval has not yet been reached.

_undocumented_ Feature -

WriterWaitTimeOut=
Number of seconds the writer can be idle before a commit is issued. The default is 60.
Use a large value to disable it. Don't set it to 0.
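
A hedged sketch of the pmserver.cfg entry (the value is illustrative; any sufficiently large number of seconds effectively disables the idle-writer commit):

WriterWaitTimeOut=86400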



Informatica v7 High precision data


SELECT TO_CHAR(avg_rate) chris
, LENGTH(TO_CHAR(avg_rate)) len , LENGTH(SUBSTR(TO_CHAR(avg_rate),INSTR(TO_CHAR(avg_rate),'.'))) DEC
,LENGTH(TO_CHAR(avg_rate)) - LENGTH(SUBSTR(TO_CHAR(avg_rate),INSTR(TO_CHAR(avg_rate),'.'))) b4
FROM GL_TRANS_RATE_TST r

0.024967355 40 40 0
12345.12346 21 16 5
123456789.1 25 16 9
122.1001221 14 11 3
1.350074254 12 11 1
123457890 9 9 0
1.23457E+14 25 10 15
1.23457E+17 28 10 18
48.34354082 39 37 2

Test results after running the session
(The datatypes specified are defined only in the mapping. In the target table it is defined as Oracle NUMBER datatype )

SELECT set_of_books_id,TO_CHAR(average_rate) ris
, LENGTH(TO_CHAR(average_rate)) len , LENGTH(SUBSTR(TO_CHAR(average_rate),INSTR(TO_CHAR(average_rate),'.'))) DEC
, LENGTH(SUBSTR(TO_CHAR(average_rate),1,INSTR(TO_CHAR(average_rate),'.'))) before1
FROM conv_ref_tst
ORDER BY 1

DECIMAL(28,12)

1 .0249673551830981 17 17 1
2 12345.123456789 15 10 6
3 123456789.123457 16 7 10
4 122.1001221001 14 11 4
5 1.3500742541 12 11 2
6 123457890 9 9
7 123456789012345 15 15
8 123456789012346000 18 18
9 48.3435408203774 16 14 3


DECIMAL(28,12) WITH high PRECISION

1 .024967355183 13 13 1
2 12345.123456789012 18 13 6
3 123456789.123456789012 22 13 10
4 122.1001221001 14 11 4
5 1.3500742541 12 11 2
6 123457890 9 9
7 123456789012345.123456789 25 10 16
9 48.343540820377 15 13 3
9 48.3435408203774 16 14 3


DECIMAL 28,28 WITH high PRECISION

1 .0249673551830980992352499107 29 29 1

DECIMAL 15,15 WITH high PRECISION

1 .024967355183098 16 16 1


DECIMAL 28,15 WITH high PRECISION

1 .024967355183098 16 16 1
5 1.3500742541 12 11 2
9 48.343540820377357 18 16 3


DECIMAL 26,16 WITH high PRECISION

1 .0249673551830981 17 17 1
5 1.3500742541 12 11 2
9 48.3435408203773566 19 17 3

DECIMAL 40,28 WITH high PRECISION
DECIMAL 28,28

1 .0249673551830981 17 17 1
2 12345.123456789 15 10 6
3 123456789.123457 16 7 10
4 122.1001221001 14 11 4
5 1.3500742541 12 11 2
6 123457890 9 9
7 123456789012345 15 15
8 123456789012346000 18 18
9 48.3435408203774 16 14 3


The PowerCenter server processes decimals as doubles or decimals.
If you don't enable high precision, PowerCenter treats the data as a double with a precision of 15.
To enable high precision, the Decimal datatype in the mapping should have a precision of 16 to 28; then select Enable High Precision in the session properties.

Repository Query

Query to find list of sessions for which the target is set to Truncate & Reload

It will display both reusable sessions and non-reusable sessions inside the workflow
Run the following query in the Repository Database

SELECT S.SUBJ_NAME, TI.INSTANCE_ID, TI.INSTANCE_NAME, W.TASK_NAME WORKFLOW_NAME
FROM OPB_EXTN_ATTR EA, OPB_TASK_INST TI , OPB_TASK W, OPB_SUBJECT S
WHERE SESSION_ID IN (31583, 31582, 31589)
AND ATTR_ID = 9 AND ATTR_VALUE = 1
AND TI.INSTANCE_ID = SESSION_INST_ID
AND EA.WORKFLOW_ID = W.TASK_ID
AND W.SUBJECT_ID = S.SUBJ_ID AND SESSION_INST_ID != 0
UNION
SELECT DISTINCT S.SUBJ_NAME, TI.TASK_ID, TI.TASK_NAME, NULL WFLOW_NAME
FROM OPB_EXTN_ATTR EA , OPB_TASK TI, OPB_SUBJECT S
WHERE SESSION_INST_ID = 0 AND SESSION_ID IN (31583, 31582, 31589)
AND ATTR_ID = 9 AND ATTR_VALUE = 1
AND EA.SESSION_ID = TI.TASK_ID
AND TI.SUBJECT_ID = S.SUBJ_ID


Informatica : SQLOverride in the Session
select wad.attr_value override
from opb_session ses,opb_mapping m
,OPB_SWIDGET_INST SW,OPB_WIDGET_ATTR WA, OPB_WIDGET_ATTR WAd
where m.mapping_id=ses.mapping_id
AND SES.SESSION_ID=SW.SESSION_ID
AND SW.WIDGET_ID = WA.WIDGET_ID(+)
AND SW.instance_id= WA.INSTANCE_ID(+)
and wa.session_task_id (+) =sw.session_id
AND SW.WIDGET_ID = WAd.WIDGET_ID
AND 0!= WAd.INSTANCE_ID
and wa.widget_type=3
and wad.attr_id=1
AND SES.SESSION_ID=(
SELECT ti.task_id
FROM
OPB_TASK T,
OPB_SUBJECT S,
OPB_WFLOW_DEP DEP,
OPB_TASK_INST TI
WHERE
T.SUBJECT_ID = S.SUBJ_ID
AND S.SUBJ_NAME = 'JIM' -- folder name
AND T.TASK_NAME = 'wks_DELETE' -- workflow name
AND TI.INSTANCE_NAME='s_mD_MRPDESIGNATOR_DIM' -- session name
AND T.TASK_ID = DEP.WORKFLOW_ID
AND DE

Wednesday, May 6, 2009

Repository SQL

SQL to check if the Verbose Data is specified for any mapping or session

--- Reusable Sessions which have Verbose data enabled.
SELECT S.SUBJ_NAME, T.TASK_ID, T.TASK_NAME, SI.INSTANCE_NAME, SWA.ATTR_VALUE actual_value, T.*
-- SI.WIDGET_ID, SI.WIDGET_TYPE, SI.SESS_WIDG_INST_ID, A.ATTR_NAME, A.ATTR_VALUE default_value,
FROM OPB_SUBJECT S, OPB_TASK T, OPB_SWIDGET_INST SI, OPB_SWIDGET_ATTR SWA, OPB_ATTR A
WHERE S.SUBJ_ID = T.SUBJECT_ID
AND T.IS_REUSABLE=1 AND T.TASK_ID =SI.SESSION_ID
AND SI.SESS_WIDG_INST_ID = SWA.SESS_WIDG_INST_ID
AND SI.SESSION_ID = SWA.SESSION_ID
AND SI.WIDGET_TYPE = A.OBJECT_TYPE_ID AND SWA.ATTR_ID = A.ATTR_ID AND A.ATTR_NAME ='Tracing Level'
AND SUBJ_NAME LIKE '%FIX%' --Specify the informatica folder name


-- Reusable Sessions in Workflow / worklet which have Verbose data enabled.
SELECT S.SUBJ_NAME,W.TASK_NAME WORKFLOW_NAME, T.TASK_ID, T.INSTANCE_NAME, SI.INSTANCE_NAME, SWA.ATTR_VALUE actual_value
-- SI.WIDGET_ID, SI.WIDGET_TYPE, SI.SESS_WIDG_INST_ID, A.ATTR_NAME, A.ATTR_VALUE default_value,
FROM OPB_SUBJECT S,OPB_TASK W, OPB_TASK_INST T, OPB_SWIDGET_INST SI, OPB_SWIDGET_ATTR SWA, OPB_ATTR A
WHERE S.SUBJ_ID = W.SUBJECT_ID
AND W.TASK_ID = T.WORKFLOW_ID
AND T.TASK_ID =SI.SESSION_ID
AND SI.SESS_WIDG_INST_ID = SWA.SESS_WIDG_INST_ID
AND SI.SESSION_ID = SWA.SESSION_ID
AND SI.WIDGET_TYPE = A.OBJECT_TYPE_ID AND SWA.ATTR_ID = A.ATTR_ID AND A.ATTR_NAME ='Tracing Level'
AND SUBJ_NAME LIKE '%FIX%' --Specify the informatica folder name


--- Mappings which have Verbose data enabled.
SELECT S.SUBJ_NAME, M.MAPPING_NAME, WI.INSTANCE_NAME, WI.WIDGET_TYPE, A.ATTR_NAME, WA.ATTR_VALUE actual_value
FROM OPB_SUBJECT S,OPB_MAPPING M, OPB_WIDGET_INST WI, OPB_WIDGET_ATTR WA, OPB_ATTR A
WHERE S.SUBJ_ID = M.SUBJECT_ID
AND M.MAPPING_ID = WI.MAPPING_ID
AND WI.WIDGET_ID = WA.WIDGET_ID
AND wi.WIDGET_TYPE = A.OBJECT_TYPE_ID
AND wa.ATTR_ID = A.ATTR_ID
AND A.ATTR_NAME ='Tracing Level' AND WA.ATTR_VALUE != '2'
AND SUBJ_NAME LIKE '%FIX%' --Specify the informatica folder name


SQL to get the Recovery Strategy Attribute of a session

SELECT s.subj_name, t.task_name, ta.attr_id, ta.attr_value
FROM opb_subject s, opb_task t, opb_Task_attr ta
WHERE s.subj_id = t.subject_id
AND t.task_id = ta.task_id AND ta.attr_id = 18
AND t.task_name LIKE 's_mFIX507%'
ORDER BY 3

SELECT
s.subj_name,w.task_name worfklow, t.instance_id , ta.instance_id ,t.instance_name, ta.attr_id, ta.attr_value
FROM opb_subject s, opb_Task w, opb_task_inst t, opb_Task_attr ta
WHERE s.subj_id = w.subject_id AND w.task_id = t.workflow_id
AND t.task_id = ta.task_id AND (ta.instance_id = t.instance_id OR ta.instance_id = 0)
AND ta.attr_id = 18 AND ta.attr_value != '2'
--AND t.instance_name LIKE 's_mFIX507%'
ORDER BY 3

Wednesday, April 29, 2009

Workflow Manager


Export and Import Objects :-
You can export workflows, worklets, tasks and sessions.
When you export, the export includes child objects such as worklets, sessions, connection objects and session configurations.


List of Tasks :-

Reusable Tasks
- Tasks that you can create in the Task Developer; they can be used in worklets and workflows
- Only Email, Command and Session tasks can be reusable tasks

Email
You specify Email Usernames, Subject and Text.
Command
Fail task if any Command Fails
Recovery Strategy [Fail task and continue workflow / Restart ]

Timer
Event Wait
Event Raise
Assignment
Control
Decision



Connection Objects :-

FTP Connection
- Username, Password, Host Name, Default Remote Directory
- Retry Period
- Instead of specifying a password, you can use a parameter in the password
- Use SFTP. For SFTP you have to specify the Public Key File Name, Private Key File Name and Private Key File Password

Relational Connections
- Username, Password, Connect String and Code Page
- You can use a session parameter $ParamName for the database username and password
To use a session parameter as the password, enable Use Parameter in Password
For a database that doesn't have a username/password, use PmNullUser and PmNullPasswd


- Connection Environment SQL - SQL to run every time a connection is made
- Transaction Environment SQL - SQL to run at the beginning of each transaction
- Connection Retry Period : time in seconds to retry connecting to the database. 0 means infinite.

Sunday, April 26, 2009

Session Properties


General Options

Write backward compatible Session log file
Session log file name
Session log file directory
Parameter file name

Enable test load
Number of rows to test
- will be enabled only if you Enable Test Load

$Source connection value
$Target connection value

Treat source rows as
- Insert / Update / Delete / Data driven
- If the mapping has an Update Strategy transformation, the default is Data driven.

Commit Type [Target]
- Source / Target / User defined
Commit Interval
- commit after x number of records
Commit on End of file
Rollback transaction on errors

Recovery Strategy
- Restart Task / Fail task continue workflow / Resume from last check point

Java classpath

Performance -

DTM Buffer size [Auto]
- You can specify Auto or a value. By default the Integration Service allocates 12MB.

Collect performance data
- Collects performance data; you can use the Workflow Monitor to view it

Write performance data to repository
- Writes the performance details to the repository; you can use the Workflow Monitor to view them

Incremental Aggregation
Reinitialize Aggregate data
Enable High precision
- to process the Decimal datatype with a precision of up to 28

Session retry on deadlock
- applicable only to Normal loads
- On the Integration Service you can specify the number of deadlock retries and the deadlock sleep time period.

Session sort order
- If the Integration Service runs in ASCII mode, it is Binary
- If it runs in Unicode mode, it displays sort orders based on the Integration Service code page

Pushdown Optimization
- None / To Source / To Source with view / To Target / Full / Full with view / $PushdownConfig

Session Configuration


Advanced -
Constraint based load ordering [Unchecked]
- To load data into the targets based on primary key foreign key relationship
Cache Lookup() Function - [Checked]
Default buffer block size [Auto]
Line Sequential Buffer Length [1024]
- Informatica reads a row from the file and holds it in this buffer. If a row is longer than 1024 bytes, increase the value.
Max Memory Allowed for Auto Memory Attributes [512MB]
Max Percentage of Total Memory Allowed for Auto Memory Attributes [5]
Additional Concurrent pipelines for Lookup Cache Creation [Auto]

Log Options -
Save Session Log by [Session runs]
- Session runs - saves the number of session logs specified
- Session timestamp - appends the timestamp to the session log file name
Save session log for these runs
- enabled only when Save Session Log by is set to Session runs
- specify the number of log files to keep; 0 means only the current log file is kept
- you can use $PMSessionLogCount

Error Handling -
Stop on errors
- Specify the number of non-fatal errors to allow (reader, writer, DTM errors)
- 0 means it won't stop; you can use the $PMSessionErrorThreshold service variable

Override tracing
On Stored Procedures error [Stop] - Stop / Continue
On Pre-session command task error [Stop] - Stop / Continue
On Pre-Post SQL error [Stop] - Stop / Continue


Error Log Type
Error Log DB Connection
Error Log Table Name prefix
Error Log File Directory
Error Log File Name
Log Row Data
Log Source Row Data
Data column delimiter

Partitioning Options -
Dynamic Partitions [Disabled]
- Other options are Based on number of partitions , Based on source Partitioning, Based on number of nodes in the grid.
Number of Partitions [1]
- Will be enabled if you choose "Based on number of partitions"
- You can also specify $DynamicPartitionCount session parameter


Sessions on Grid -
Is Enabled [Unchecked]
- To run the sessions in a grid

Saturday, April 25, 2009

Mapping Designer


Source in the Mapping
- Owner Name
- Source Table name - Override with parameter or variable value


Expression
- You can add comments to the expression


Target in the Mapping
- Reject Truncated/Overflowed rows
- Update Override
- Table Name Prefix
- Pre SQL & Post SQL

Repository Manager


Folder
+ Allow Shortcut Option allows you to create shortcuts for the mappings present in the folder
+ Owner - you can specify the owner of the folder
Folder Permissions
For each user/usergroup you can specify RWX permissions
Owner has RWX and you cannot change it.
OS Profile
If the Integration Service uses OS profiles, then you can specify the profile to use.

User Groups
+ Default Groups - Administrator and Public
+ The Administrator group contains 2 users: the user Administrator and the repository database user.
You cannot delete or remove these 2 users from the Administrator group.

Show Locks & Show Connections
- You can see the locks obtained on the repository objects by user , folder, application, host.
- To release a lock, you have to identify the user connection and kill that user connection.

Export Objects and Import Objects:-
- When you export a workflow, it exports all the child objects like mappings and sessions.
- When you import you can choose to import individual objects like mapping or import the whole workflow.


Metadata Extension
The Metadata extension which you create using Repository Manager is ReUsable.
For example if you create it for Source, then it is available for all the sources in the repository

Vendor-defined metadata extensions cannot be changed. Anything you create is user-defined. To change the value, you use the Designer or the Workflow Manager.

You can add a metadata extension for any transformation type or for All.
Data type String / Boolean / Numeric
Default Value
Client Visible
Client Editable
Share Read
Share Write

Constraint based Loading


It is an option available in the session.
If enabled, for each row generated by the source, it loads the record into the primary key table first, and then it loads any foreign key tables. It applies only to inserts, as in the example below.
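
For example, if CUSTOMERS (primary key table) and ORDERS (foreign key to CUSTOMERS) are loaded from the same source qualifier, each customer row is inserted before the order rows that reference it.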

- All the targets should have the same Active source
- You should define the relationships for the Target in the Warehouse Designer
- Targets should be in the same Target connection group
- Treat source rows as "Insert". You cannot use updates with constraint-based loading.


FullCBLOSupport = Yes/No
- It is a custom property. If you set it on the Integration Service, it applies to all workflows. You can also set it in the session, on the Config Object tab, under Custom Properties.
- It allows changed data (updates and deletes) to be loaded to the target in the same transaction control unit (TCU).
- Your mapping should not contain any active transformation that changes the row ID generated by the reader.

Wednesday, April 22, 2009

Update Strategy Transformation

Active Transformation
Used to flag records for insert, update, delete or reject based on a condition.


Forward Rejected Rows -
By default, the rejected rows are forwarded to the next transformation.
All the rejected rows are written to the session reject file. If you disable this option, the records are written to the session log file instead. If you enable row error handling, the rows are written to the row error logs and not to the session log file.

Aggregator after an Update Strategy Transformation
If a record is flagged for delete, the Aggregator subtracts its value; if it is flagged for insert, the value is added.

Router Transformation


Active Transformation.
1 Input Group and Multiple Output group transformation.

Two types of output groups - user-defined and default.
You specify the test condition for each user-defined group you create.

All the groups are processed by Informatica if the default group is connected.
If the default group is not connected, Informatica does not process the groups that are not connected.

If a record satisfies more than one group, Informatica passes the row to every matching group, as in the example below.
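
For example, with user-defined groups AMOUNT > 100 and REGION = 'WEST', a row with AMOUNT = 200 and REGION = 'WEST' satisfies both conditions and is routed to both groups.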

Friday, April 17, 2009

Version 8 - New Features - Workflow Manager

Workflow Manager

Server Configuration is removed from the Workflow Manager and is added to the Repository Administration Console.

Session
Session log file is written in new binary format

Write backward compatible session log file
Recovery Strategy (Restart task, Fail task continue workflow, Resume from lask checkpoint)
Java Classpath
Write performance data to repository

DTM Buffer Size value is set to Auto in v8. In v7 it is 120000000


Queries
Object Types which you can query are Schedulers, Session Configuration, Tasks, Sessions, Worklets and Workflows



Session Configuration

Advanced
Maximum Memory Allowed For Auto Memory Attributes
Maximum Percentage of Total Memory Allowed For Auto Memory Attributes
Additional Concurrent Pipelines for Lookup Cache Creation

The value for Default Buffer block size is "Auto" in v8. In v7 it is 64000.

Error Handling
Enable Recovery Option is removed in version 8

Partitioning Options
Dynamic Partitioning (Disabled or Enabled)
Number of Partitions (You can specify a number when you Enable Dynamic Partitioning)

Relational Connections
Environment SQL, which existed in version 7, was replaced by 2 attributes:
Connection Environment SQL
Transaction Environment SQL
New : Connection Retry Period : Time in seconds to retry connecting to the database. 0 means infinite
[For Oracle] The Rollback Segment attribute, which existed in version 7, was removed.

FTP Connection
Retry Period


Command Tasks
In Version 8
Fail Task if any command fails (checkbox)
Recovery Strategy (Fail Task and continue workflow , Restart)
In Version 7
Run if previous completed (checkbox)

Thursday, April 16, 2009

New Features : Designer : Version 8.1, 8.6

User-defined Functions
- It can be private (callable from a UDF only) or public (callable from any transformation expression)
- You cannot change from Public to private
- You can export/import. If you export a mapping which uses UDF, then it exports UDF too.

"Create and Add Target"
You can right click on any Transformation "

In v8.6
When creating a lookup transformation, you have the option of choosing a source, target, source qualifier or Import.

Source Qualifier
Output is deterministic - Checkbox - Source data does not change during the session run.
When you configure this property, the Integration Service does not stage source data for recovery if transformations in the pipeline always produce repeatable data.

Output is repeatable - Never/Always. Choose Always when the source data is sorted.
When output is deterministic and output is repeatable, the Integration Service does not stage source data for recovery.


Sorter Transformation
Sort Cache Size - By default it is Auto. In version 7 it is 8MB

Sequence Generator
In version 8.6 the datatype of NEXTVAL & CURRVAL is bigint(19) and the ENDVALUE is 9223372036854775807.
In versions 7 and 8.1, the datatype is integer(10) and the ENDVALUE is 2147483647.

Joiner Transformation (v8.6)
Property : Master Sort Order : Auto/Ascending
Specifies the sort order of the master source data. Choose Ascending if the master source data is in ascending order. If you choose Ascending, also enable sorted input. Default is Auto

Lookup Transformation (v8.1)
Lookup policy on multiple match : Use Any Value is added.
Lookup source is static : checkbox
When you choose a dynamic lookup cache, the lookup policy on multiple match defaults to Report Error in v8. In v7 nothing happens.

Lookup Transformation (v8.6)
Lookup Source Filter
Source Type (Database, Flat File, Source Qualifier)
Lookup policy on multiple match : Use Any Value is added.
Lookup source is static : checkbox
When you choose a dynamic lookup cache, the lookup policy on multiple match defaults to Report Error in v8. In v7 nothing happens.
Pre-build lookup cache (Auto, Always allowed, Always disallowed)
Subsecond Precision : default integer 6

Stored Procedure (v8.6)
Subsecond precision
Output is repeatable - Based on Input order
Output is deterministic - Checkbox

Sorter Transformation


- Active Transformation (the Distinct Output Rows option can change the number of rows)

- Sorts data according to the sort key specified.
You can specify multiple columns as sort keys and the order (Ascending or Descending for each column)

Options :-

Distinct Output Rows -
Can output distinct records if you check this option

Null Treated Low -
By default, nulls are treated as high values. If you check this option, they are treated as low.

Case Sensitive - Checked by default. When enabled, uppercase characters sort higher than lowercase characters.

Work Directory - By default it is $PMTempDir

Sorter Cache Size -
The maximum amount of memory used for the sort.
In version 8 it is set to Auto. In version 7 it is 8MB.
If it cannot allocate the memory, the session fails.
If it needs more memory, it pages the data to the Work Directory and writes a warning in the session log file.

Tuesday, April 7, 2009

SQL transformation

- To run ANSI SQL queries in the pipelines

Type : Active/Passive based on the Mode

Two modes :-

1) Script Mode
- Runs scripts that are located externally; you supply the script name to the transformation.
- It outputs one row for each input row.
Ports
ScriptName Input
ScriptResult Output - Returns PASSED or FAILED
ScriptError Output - Returns the error

2) Query Mode
Allows you to run queries that you define in the editor. You can pass strings or parameters to the query (dynamic query). You can output multiple rows when the query has a SELECT statement.
Static
Dynamic :
Full Query - put ~Query_Port~ to pass the full query to the transformation.
You can pass more than one query statement. If the query has a SELECT, then you have to configure the output ports.

Partial Query -
You can create a port, write the SQL around it, and then use ~Table_Port~ to substitute the value at run time, as in the sketch below.
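
A hedged sketch of a partial dynamic query (the port name Table_Port is illustrative). The query defined in the transformation is:

SELECT ORDER_ID, ORDER_AMOUNT FROM ~Table_Port~ WHERE ORDER_DATE > SYSDATE - 1

Each input row supplies a table name (for example, ORDERS_EAST or ORDERS_WEST) in the Table_Port port, so the same SELECT runs against a different table for each row.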

Connection Type : Specify a connection object or pass the database connection to the transformation at run time.

The number of output ports should match the number of columns in the SELECT.
Default ports : SQLError, NumRowsAffected

Version 8 Architecture

Informatica Version 8 architecture

Domain :-
- A collection of nodes and services
- Primary unit of administration

+ Service Manager
It stores domain users and groups in the domain configuration database and periodically synchronizes them to the PowerCenter repository.


+ Alerts
Subscribe to alerts - notification about node events such as failure

Administration Console :-
- It is a web based application.

Security Page
- You can manage all Domain users and groups along with roles and privileges at the Domain level

Users -

Domain users with the appropriate privileges can log into the following PowerCenter applications: Administration Console, PowerCenter Client tools.

Repository Object level permissions are administered using the Repository Manager and are stored in the PowerCenter repository



High Availability :-

+ Resilience for sources and lookup transformations
Resilience to network failures or database unavailability; the service attempts to reconnect for the amount of time configured as the retry period in the connection object.

+ Operating mode
Normal mode
Safe mode - to limit access to IntegServ during deployment or maintenance
You can also configure safe mode on failover

Integration Service :-

+ Resource Thresholds - resource thresholds for each node, used to consider or eliminate a node for the dispatch of a task: maximum CPU run queue length, maximum memory usage percentage, maximum processes.

+ Dispatch mode - the way load balancer dispatches tasks - round robin, metric based or adaptive dispatch mode

+ Service Levels - the order in which to dispatch tasks from the dispatch queue. You have to create service levels in the Admin Console.


Transformations
+ SQL Transformation
+ HTTP Transformation


pmrep
- massupdate (bulk update of session properties)

Non-versioned repository - deployment groups, dynamic deployment groups
The Execute Deployment Groups privilege lets you deploy without having write permission;
you still need read permission on the source.

A domain can have mixed-version services.

Oracle subpartition support - use dynamic partitioning based on the number of source partitions in a session; the Integration Service sets the number of partitions to the total number of subpartitions at the source.

PUSHDOWN Optimization :-
Subquery support


data masking


Performance Improvements
- Sorter uses less cache / temp space
- Aggregator and Joiner are faster
- Faster flat file sources and targets
- Repository Agent Caching
Caches repository objects. By default it caches 10000 objects.

EnableRepAgentCaching=Yes
RACacheCapacity=50000

Use it in production, where there are minimal repository changes. Otherwise it will recache every time an object changes, which might impact performance.


Concurrent workflow - multiple instances of a workflow
Pipeline Lookup - can look up relational, flat file, SAP, JMS and MSMQ sources
Get runtime information - mapping variables, session variables and workflow variables, e.g. mapping name, folder name

Incremental Aggregation -
Updates the B-tree when you run for the first time after the upgrade.
If the upgrade is from version 7, run the session without data to update the B-tree.


Parameters and Variables :-
Can have different sections for Service Variables, service process variables, workflow variables, session parameters, mapping parameters, and mapping variables

1) You can use workflow variables in sessions; the session treats them as parameters and the value does not change during the run.

2) you can use the parameters and variables in the following places
Table owner name for relational sources, table prefix for targets
Ftp file name and directory
Lookup cache file name and directory
Email task(address, subject and body)

Target pre and post session sql
pre and post session commands
call text for unconnected stored procedure
target update override
command tasks
workflow log file names
table name prefix for error logs


Performance Improvements in v8.1.1 :-

+ Sorter - the sorter uses a more compact format, which reduces temp disk space. The sort algorithm is also improved to reduce sort time.
+ Pushdown Optimization

Generate flat file targets by transaction. To write a separate target file for each transaction, you add a FileName port to the flat file target definition. When you connect the FileName port in the mapping, the Integration Service writes a target file at each commit.


Version 8.6.1

Deployment Groups
+ Non versioned repository
+ Execute Deployment Groups privilege - to deploy without having write permission;
you still need read permission on the source
+ Post deployment validation

Partitioning
+ Oracle subpartition support

Transformations
+ Data Masking.


From PowerCenter 8, you can manage all users and groups information along with roles and privileges at the Domain level on the Security page of the Administration Console


1. Move the files to server/infa_shared directory under the PowerCenter Services installation directory.

2. Update $PMRootDir for the Integration Service process in the Administration Console. For more information about configuring $PMRootDir, see Creating and Configuring the Integration Service in the PowerCenter Administrator Guide.


Monday, April 6, 2009

Version 8 Flat file Enhancements

Merge Type
Not at all
Sequentially
Concurrently
File list

Append if Exists
Header Options

Header Command
Footer Command

Input Type
- File
- Command
- Command Type

Target Command
- Informatica writes data to a command instead of a target flat file


FTP :-
- Source can be a file list
- Merge files concurrently on the FTP server

Sunday, March 29, 2009

Active and Passive Transformations

Active Transformations
Any transformation that changes the number of rows passing through it is called an active transformation. If the number of input rows is not equal to the number of output rows, it is active.

Source Qualifier
Joiner
Normalizer
Union
Filter
Router
Sorter
Rank
Aggregator
Transaction Control
Update Strategy

Passive Transformation
Any transformation that does not change the number of records passing through it is called a passive transformation. If the number of input records equals the number of output records, it is passive.

Expression
Lookups
Stored Procedure
External Procedure

Special Cases : -
Mapplet can be Active or Passive based on the transformations inside the mapplet. If it has any Active transformation, then it is active otherwise it is passive.

Interview Questions :-
1) Is a Mapplet Active or Passive - based on the transformations inside the mapplet
2) Why is a Sorter an active transformation - because it has the "Distinct Output Rows" option, which returns distinct values.

Sequence Generator Transformation

What is NEXTVAL
NEXTVAL - generates sequence numbers when you connect this port to a transformation.

If you want to load 2 targets with the same sequence number
- Connect the NEXTVAL port to a transformation and then connect that transformation to the 2 targets.

If you want to load 2 targets with the unique sequence number
- Connect the NEXTVAL port to two different targets, or to two different pipelines which have those 2 targets

What is CURRVAL
CURRVAL = NEXTVAL + Increment by Value property
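
For example, with Increment By = 1, when NEXTVAL returns 10 for a row, CURRVAL returns 11 for that same row.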

If you connect the CURRVAL port without connecting NEXTVAL, then CURRVAL returns a constant value.
If you connect CURRVAL, Informatica processes one row at a time, which can be a performance issue.

Properties :
Start Value : Start value of the sequence. It is used as the start value after the sequence reaches the End Value, if the Cycle option is enabled.

Increment By :

End Value : End Value for the Sequence Generator

Current Value : Current value of the sequence. It is updated either at the end of the session (non-reusable and Number of Cached Values = 0) or during the session (Number of Cached Values > 0). If the Reset property is enabled, Informatica resets the value to its original value for every session run.

Cycle : Enable if you want to cycle through the values after it reaches the End Value.

Reset : If enabled, Informatica updates current value to the original value every time it runs the session. If disabled Informatica updates the current value to last generated value + 1.

Number of Cached Values : Non-reusable - if the value is 0, it doesn't cache values. Informatica reads the start value from the repository and then keeps generating the sequence values. At the end of the session it updates the current value to the Start Value or to the last generated sequence + 1 (based on the Reset option).

If the value is > 0, then Informatica reads the start value, caches the number of values specified by Number of Cached Values, and then updates the Current Value in the repository. It goes back to the repository again once all the cached values are used. At the end of the session it throws away any remaining cached sequence numbers that were not used.
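
For example, if Number of Cached Values is 1000 and a session run uses only 10 values, the remaining 990 cached values are discarded, leaving a gap of 990 in the sequence.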

Interview Questions :-
1) If the End Value is reached and if the Cycle option is not enabled then the session fails
2) Reset Option is disabled for Reusable Sequence Generator
3) Number of Cached Values has to be >= 1 for Reusable Sequence Generator
4) To avoid big gaps between sequence values, set the Number of Cached Values to 0 if it is non-reusable, or to a small number if it is reusable.

Performance Issues :-
1) Do not connect CURRVAL
2) Check Number of Cached Values
Set it to 0 if it is non-reusable; set it to an appropriate value (default 1000) if it is reusable.

ENDVALUE & Datatypes in Versions :-
In version 8.6 the datatype of NEXTVAL & CURRVAL is bigint(19) and the ENDVALUE is 9223372036854775807.

In versions 7 and 8.1, the datatype is integer(10) and the ENDVALUE is 2147483647.