Ab Initio Interview Questions

Q. How can we count the number of records in a flat file using Ab Initio?

A. Use the aggregate function count().
Or
Use a ROLLUP component to count the number of records in the flat file. Use {} (the empty key) as the key specifier: all records then fall into a single group, and the rollup outputs one record holding the total number of records.

Q. How do we use SCD types in Ab Initio graphs?
A.

Q. What is the order of execution of a graph when it runs?

A. Order of Graph Execution
1. Initialisation of Parameters
2. Start script execution
3. Graph execution
4. End script execution

Q. How do you calculate the total number of records in a file using REFORMAT instead of ROLLUP?

A. Via its log port.
Or
Connect a second REFORMAT to the log port of the first one, set its select parameter to event_type == "finish", and use this code:
/* Layout of the event_text written by the "finish" log event */
type reformat_final_msg =
record
  decimal("records") read_count;
  string("read\n") filler_read;
  decimal("records") written_count;
  string("written\n") filler_written;
  decimal("records") rejected_count;
  string("rejected") filler_rejected;
end;

/* Parse the event text and keep only the number of records read */
out :: reformat(in) =
begin
  out.rec_count :1: string_lrtrim(reinterpret_as(reformat_final_msg, in.event_text).read_count);
end;

Q. How do we append records to an already existing file using an Ab Initio graph?

A. Create a graph that uses the existing file as the output file, and set the mode of the output file to Append. Pass the new records from the input file through a REFORMAT to this output file; they will be appended to the existing file.

Q. What is output index? How does it work in REFORMAT?
Does the function below show output index in use?
output:1:if(in.emp.sal<500)in.emp.sal
output:2:force_error("Employee salary is less than 500")

A. The output_index function is used in a REFORMAT that has multiple out ports, to direct each record to one of them. It returns the number of the out port the record is written to, counting from 0 (out0, out1, ...).
For example, for a REFORMAT with 3 out ports, such a function could be:

out :: output_index(in) =
begin
  out :: if (in.value == "A") 0 else if (in.value == "B") 1 else 2;
end;

This means that a record whose field value is "A" comes out of port out0 only, and not out of out1 or out2.
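As a rough illustration of the routing, here is a Python model (not Ab Initio DML) of a REFORMAT with three out ports driven by an output_index function. The field name value, the three-port layout and the numbering of ports from 0 (out0, out1, out2) are assumptions of this sketch.

```python
# Python model of a REFORMAT with several out ports (not Ab Initio DML).
# Ports are plain lists numbered 0..n-1, mirroring out0, out1, out2.

def output_index(record):
    """Pick the out port for a record from its (hypothetical) 'value' field."""
    if record["value"] == "A":
        return 0
    if record["value"] == "B":
        return 1
    return 2


def reformat_with_output_index(records, num_ports=3):
    ports = [[] for _ in range(num_ports)]
    for rec in records:
        idx = output_index(rec)
        if not 0 <= idx < num_ports:
            raise ValueError(f"output_index returned invalid port {idx}")
        ports[idx].append(rec)  # each record goes to exactly one port
    return ports


records = [{"value": "A"}, {"value": "B"}, {"value": "C"}, {"value": "A"}]
out0, out1, out2 = reformat_with_output_index(records)
```

Every record lands on exactly one port, which is the key difference from copying a flow to several outputs.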

Q. How does component folding work?
A.
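Q. What is the advantage of the SORT WITHIN GROUPS component?

A. SORT WITHIN GROUPS refines the sorting of records that are already sorted on one key (the major key): it sorts the records within each group formed by that first sort according to a second key (the minor key). Because it only needs to hold one group at a time, it uses much less memory than re-sorting the whole data set on the combined key.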
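A minimal Python model of the idea (not Ab Initio code), assuming records are dicts already sorted on the major key; the field names cust and date are hypothetical. Each group is sorted on its own, so only one group is held in memory at a time.

```python
from itertools import groupby
from operator import itemgetter


def sort_within_groups(records, major_key, minor_key):
    """Model of SORT WITHIN GROUPS: input must already be sorted on major_key.

    groupby walks consecutive records with the same major key, and only that
    group is materialised and sorted on the minor key before being emitted.
    """
    for _, group in groupby(records, key=itemgetter(major_key)):
        yield from sorted(group, key=itemgetter(minor_key))


rows = [
    {"cust": 1, "date": "2024-03-01"},
    {"cust": 1, "date": "2024-01-15"},
    {"cust": 2, "date": "2024-02-10"},
    {"cust": 2, "date": "2024-01-05"},
]
result = list(sort_within_groups(rows, "cust", "date"))
```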

Q. What are environment variables? Why are they required?

A. Environment variables (also known as Ab Initio environment variables) are set in the stdenv (standard environment) project, under which the private and public projects reside.
Parameters such as $AB_HOME and $AB_AIR_ROOT are defined as environment variables and resolve to the respective paths, so graphs can refer to these locations by name instead of hard-coding them.

Q. What is the need for config variables (ab_job, ab_max_core) in Ab Initio, and where are they defined?
A.

Q. How do you avoid duplicates without using the DEDUP component?

A. To avoid duplicates, use the ROLLUP component.
ROLLUP collapses the duplicates and produces the actual results.
Or
We can avoid duplicates by using the key_change function of the ROLLUP component. The input must be sorted so that duplicate records are adjacent. The code looks like this:
out :: key_change(prev, curr) =
begin
  out :: curr != prev;
end;
out :: rollup(in) =
begin
  out :: in;
end;
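The key_change approach can be modelled in Python (this is not DML). The sketch assumes a group boundary wherever key_change(prev, curr) is true for two consecutive records, and shows why the input has to be sorted first: duplicates that are not adjacent are never compared.

```python
# Python model of ROLLUP driven by key_change(prev, curr) (not DML).

def rollup_with_key_change(records, key_change):
    """Start a new group whenever key_change(prev, curr) is true and emit
    one record per group. Every record in a group is equal here, so it does
    not matter which one represents the group; this sketch keeps the first."""
    out = []
    prev = None
    first = True
    for curr in records:
        if first or key_change(prev, curr):
            out.append(curr)
        prev = curr
        first = False
    return out


def whole_record_changed(prev, curr):
    return curr != prev


data = ["b", "a", "b", "a", "c"]
dedup_sorted_input = rollup_with_key_change(sorted(data), whole_record_changed)
dedup_unsorted_input = rollup_with_key_change(data, whole_record_changed)
```

On the unsorted input no two neighbours are equal, so nothing is removed.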

Q. What happens when we pass a dot or invalid parameters in the input component's layout URL?
A.

Q. What is the use of AB_JOB in Ab Initio?

A. The AB_JOB parameter is set when we want to run the same graph at the same time under different job names.
Or
When you want to run the same graph (deployed in one place) many times, you use AB_JOB. It should be defined as a sandbox parameter. If you don't give it a value, the default job name is used.

Q. How do you use a normal batch graph as a subgraph in a continuous graph?
A.

Q. How do you open Ab Initio in UNIX?

A. We cannot open the Ab Initio GDE in UNIX. In UNIX we can only run graphs, using their deployed .ksh scripts.

Q. How do you do production support for a graph?

A. If the graph fails in production, we usually get emergency access to see the failure and then analyse it. If it is a code bug, we go back to the development environment, fix the bug, test it, deploy the fix back to production and rerun the graph.

Q. How do you check whether a graph completed successfully or not (is it $? in UNIX)?

A. Check $mpjret: if it is 0, the graph succeeded; if it is 1, it failed.

Q. What are the different return values?

A. 0 and 1:
0 is success.
1 is failure.
$? holds the return status of the last executed command.
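A sketch of a wrapper that runs a script and branches on its exit status, which is the value $? reports in the shell. Small Python commands that exit with 0 or 1 stand in for a deployed graph; a real call would pass something like ["ksh", "run/my_graph.ksh"], where that path is hypothetical.

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run a command; return (succeeded, exit_status)."""
    completed = subprocess.run(cmd)
    # returncode is the same value the shell would report in $?
    return completed.returncode == 0, completed.returncode


# Stand-ins for a deployed graph script: Python one-liners exiting 0 and 1.
ok, ok_status = run_and_check([sys.executable, "-c", "import sys; sys.exit(0)"])
failed, fail_status = run_and_check([sys.executable, "-c", "import sys; sys.exit(1)"])
```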

Q. Why and when do we get the "Pipeline Broken" error in Ab Initio?

A. A pipeline broken error indicates the failure of a downstream component.
It commonly occurs when the database runs out of memory, which makes the database components in the graph unavailable.

Q. What are the two types of .dbc files?
A. Generally, .dbc files are classified into two types according to the value of their fixed_size_dml parameter, which can be either true or false.
If the value is false, the database generates delimited types whenever possible (it treats NULL as a zero-length string). If the value is true, it generates fixed-length DML.
Other parameters for .dbc files are:
dbms
db_version
db_home
db_name
db_nodes
User & password
case
generate_dml_with_nulls
fixed_size_dml
treat_blanks_as_null
oldstyle_emptystring_as_null
fully_qualify_dml
delimited_dml_with_maximum_size
interface
environment
direct_parallel
Q. What is the use of the .mfctl and .mdir files in the mfs directory of Ab Initio?

A. .mfctl and .mdir files are both related to the multifile system. .mfctl is the extension of the control file created when we use an MFS; a .mfctl file contains the URLs of all the data partitions. A file with the .mdir extension contains the URL of the control file used by the MFS.

Q. How do you separate duplicate records from a grouped input file without using DEDUP SORTED?

A. Use ROLLUP with in.* (copying the input record to the output).
Or
Use ROLLUP aggregation functions such as first() and last().
Or
You can use ROLLUP to remove duplicate records in an input file (note that this is key-based duplication); it will keep the last record for each key.
Or
ROLLUP helps you avoid duplicates without using the DEDUP component: it takes the first record and rejects the rest.

Q. How do you rerun a graph in UNIX?

A. We can rerun a graph by using the AB_JOB variable.
Or
You can run the graph with the following command in UNIX:
dtm run <recovery file name> -continue
Or
Whenever a graph fails, it creates a .rec file in the working directory, which is usually the directory where the graph's deployed script is stored. Remove that .rec file and then run the graph's deployed script; from UNIX you can use m_rollback -d to do this cleanly.

Q. What do you mean by rerun?

A. Your graph failed and you want to run it again, or you want to run multiple instances of the graph.

Q. How do you pass parameters to a graph in Ab Initio?

A. Using input parameters (graph parameters).
Or
If you want to pass a parameter to your graph, declare a formal parameter in the Edit Parameters dialog.
Or
You can declare parameters with the Edit Parameters option in the GDE; when running the .ksh, you can pass their values on the command line.

Q. Which component does not work in pipeline parallelism?

A. The SORT component does not work in pipeline parallelism; it blocks pipeline parallelism, because it must read all of its input data before it can write any records.
Or
SORT, SORT WITHIN GROUPS and ROLLUP break pipeline parallelism.
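A small Python model of why SORT blocks pipeline parallelism while a REFORMAT does not. Both are written as generators (a stand-in for components), and the sketch counts how many input records each must read before it can emit its first output; the component names are only labels.

```python
# Python model of pipeline behaviour: generators stand in for components.

def reformat(records, log):
    """Streaming component: writes each record right after reading it."""
    for rec in records:
        log.append(rec)
        yield rec * 10


def sort_component(records, log):
    """Blocking component: must read all input before writing anything."""
    buffered = []
    for rec in records:
        log.append(rec)
        buffered.append(rec)
    yield from sorted(buffered)


def reads_before_first_output(component, data):
    """Count how many input records a component consumes before its
    first output record becomes available downstream."""
    log = []
    stream = component(iter(data), log)
    next(stream)
    return len(log)


streaming_reads = reads_before_first_output(reformat, [3, 1, 2])
blocking_reads = reads_before_first_output(sort_component, [3, 1, 2])
```

The downstream component can start after one record from the streaming component, but has to wait for the whole input from the sort.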

Q. How does one use the "Call Web Service" component in the $AB_HOME/connectors/Internet directory of the component selector window of the Ab Initio console? Explain with sample code.

A.

Q. What is a patch database (IPD, etc.)?

A.

Q. How do you check whether the root disk has failed?

A.

Q. How do you restore whole OS backup and a selected single file?

A.
Q. How do you create SCDs (slowly changing dimensions) in Ab Initio?

A. To implement SCDs in Ab Initio, you do delta processing: compare the incoming data with the existing dimension data to identify new and changed records.

Q. How do you join two files with different layouts?

A. If the two files have totally different layouts, you can use the FUSE component (see the Ab Initio Help for details).
Or
To join a serial file with a multifile, put a BROADCAST component after the serial file and before the JOIN.

Q. What is a vector field? Explain.

A. A vector field is a field that holds a sequence of elements of the same type (an array). Vector fields are typically used with the NORMALIZE and DENORMALIZE SORTED components.
NORMALIZE generates multiple output records from each input record: its length function returns the number of elements (often the length of a vector field), and its normalize function is called once for each index to produce one output record per element.
DENORMALIZE SORTED does the reverse: it consolidates each group of related records into a single output record containing a vector field.
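A Python model of the two components (not Ab Initio code), assuming records are dicts and the vector field is a Python list; order_id and items are hypothetical field names. normalize emits one record per vector element, and denormalize_sorted groups consecutive records with the same key back into one record.

```python
from itertools import groupby
from operator import itemgetter


def normalize(record, vector_field):
    """NORMALIZE model: one output record per element of the vector field.
    len() of the vector plays the part of the length function, and each
    loop iteration plays the part of the normalize function for one index."""
    base = {k: v for k, v in record.items() if k != vector_field}
    return [
        {**base, "index": i, vector_field: element}
        for i, element in enumerate(record[vector_field])
    ]


def denormalize_sorted(records, key, vector_field):
    """DENORMALIZE SORTED model: input grouped on key; each group becomes
    one record whose vector field collects the group's values."""
    return [
        {key: k, vector_field: [r[vector_field] for r in grp]}
        for k, grp in groupby(records, key=itemgetter(key))
    ]


order = {"order_id": 7, "items": ["pen", "ink", "pad"]}
rows = normalize(order, "items")
back = denormalize_sorted(rows, "order_id", "items")
```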

Q. Which file should we keep as a lookup file, the large file or the file with fewer records, and why?

A. We should always use the small file (i.e. the file with fewer records) as the lookup. The reason is that the lookup file is kept in main memory (RAM) from the start to the end of the graph run, so the smaller the file, the better the server performance.
Or
A lookup file should always be small. If the data grows every day, performance will become poor; it is not wise to use a big file as a lookup, as it defeats the purpose of a lookup.

Q. How does metadata management take place in Ab Initio?

A. It is done with the help of the EME (Enterprise Meta>Environment), which follows a UNIX-like file structure.

Q. Is there a way to implement a file listener in Ab Initio? It should continuously scan a given directory and, as soon as a file is placed in that directory, copy the file to a working directory and trigger a corresponding Ab Initio graph.

A. You can use the continuous components to build this, although it requires an environment setup. You can read about it in the Ab Initio Help by searching for 'Continuous graphs'.

Q. How many sandboxes can there be for a project?

A. A project can have many sandboxes.
Many developers can work in different sandboxes attached to a single project.
Or
We can have any number of sandboxes. A sandbox is nothing but a user's work area, where each user gets a copy of the project and makes modifications accordingly.
Or
There can be numerous sandboxes for a project, but there should be only one sandbox associated with the EME for a project.

Q. How will you connect two servers?
A. Connecting two different servers in Ab Initio is done through a configuration file, .abinitiorc, which is used for remote connectivity. It contains information such as the server IP (or name), the user name and the password required to connect.
Q. How can you extract and load without transforming?
A. Provided the DML is the same, you can directly connect the input and output datasets and perform an extract and load. For example, if the input dataset is a table and the output is a file, you can connect them directly, making sure the DML of the file is propagated from the table.
Q. If I want to run the graph in UNIX, what command do I need to use?
A. 1. First design the graph.
2. Save it.
3. Run it.
4. Go to the Run tab, then Deploy, and press Deploy.
Ab Initio then automatically generates a .ksh script for the graph in the run folder of your sandbox.
5. Go to the run folder of your sandbox; there you will find your graph's .ksh, which you run from the UNIX command line.

Q. How do I implement insert, update and delete in Ab Initio?

A. To find the records that should be inserted, updated or deleted, use this Ab Initio flow:
a. Unload the master table.
b. Read the delta file.
c. Use an inner join to join a and b. The unused records from a are your delete records (if required), the unused records from b are your insert records, and the joined a and b records are your update records.
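The three-way split produced by the inner join and its unused ports can be modelled with Python dicts keyed on the join key; that dict lookup is an assumption of this sketch, since in the graph the JOIN component does the matching. The field names id and name are hypothetical.

```python
def classify_delta(master, delta, key):
    """Model of the join-based delta flow (a = master, b = delta).

    only in master (unused a) -> deletes
    only in delta  (unused b) -> inserts
    in both        (joined)   -> updates
    """
    master_by_key = {rec[key]: rec for rec in master}
    delta_by_key = {rec[key]: rec for rec in delta}
    deletes = [rec for k, rec in master_by_key.items() if k not in delta_by_key]
    inserts = [rec for k, rec in delta_by_key.items() if k not in master_by_key]
    updates = [rec for k, rec in delta_by_key.items() if k in master_by_key]
    return inserts, updates, deletes


master = [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
delta = [{"id": 2, "name": "Robert"}, {"id": 3, "name": "Cy"}]
inserts, updates, deletes = classify_delta(master, delta, "id")
```

As written, every matched key counts as an update even if nothing changed; comparing the non-key fields before emitting an update is a common refinement.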

Q. How do you view an MFS in UNIX?

A. To view an MFS in UNIX, run the m_expand command.

Q. What is the difference between conditional DML and a conditional component?

A. Conditional DML can be passed as a program variable.
Conditional components are used only when the condition passed to the graph is true.

Q. What is the difference between In-Memory Sort and Inputs must be sorted?

A. The In-memory and Inputs must be sorted options are available in the JOIN, ROLLUP and DEDUP components.
The main difference is that if you select Inputs must be sorted, the input must already be sorted on the key, and the downstream components receive the records in sorted order. If you select In-memory, the input does not need to be sorted (the component holds the data in memory instead), and the downstream components will not receive sorted records.

Q. A graph has failed. How is it handled?
A. There are several reasons a graph can fail. One specific answer is this:

If the graph fails, Ab Initio creates a .rec file in the run directory of your sandbox. If you want to roll back the graph, use the m_rollback command from UNIX, or use the m_cleanup utilities.

Q. What is meant by header and trailer? If the header and trailer contain junk data, how do you delete it, and which components are used?
A. 1. If you know the signature of the header and trailer records, use a FILTER BY EXPRESSION component to filter out the header and trailer records.

2. Use a REFORMAT component and, inside the transform, use the next_in_sequence() function to assign a unique number to each record; then use a FILTER BY EXPRESSION component to filter the records based on the sequence numbers.

3. Follow step 2, but instead of FILTER BY EXPRESSION, use the LEADING RECORDS component to filter out the header and trailer records.

Q. I have 10,000 records and loaded 4,000 of them today; I need to load records 4,001 to 10,000 the next day. How is this done in Type 1, and how in Type 2?
A. Simply use a REFORMAT component and put next_in_sequence() > 4000 in its select parameter.
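A Python model of that select expression, assuming next_in_sequence() numbers the records 1, 2, 3, ... in input order and that the input arrives in the same order on both days. In a parallel layout the sequence is typically counted per partition, so this sketch only covers a serial flow.

```python
def select_after(records, already_loaded):
    """Model of a REFORMAT whose select expression is
    next_in_sequence() > already_loaded.

    The sequence number is modelled as a counter starting at 1 for the
    first record, so records 1..already_loaded are filtered out."""
    return [rec for seq, rec in enumerate(records, start=1) if seq > already_loaded]


records = list(range(1, 10001))  # record numbers 1..10,000
second_day = select_after(records, 4000)
```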

Q. What are the steps in actual Ab Initio graph processing, including general, pre- and post-process settings?
A. 1. Start script
2. Graph components
3. End script

Q. What are .air-project-parameters and .air-sandbox-overrides? What is the relation between them?
A. .air-project-parameters
Contains the parameter definitions of all the parameters
within a sandbox. This file is maintained by the GDE and
the Ab Initio environment scripts.

.air-sandbox-overrides
This file exists only if you are using version 1.11 or a
later version of the GDE. It contains the user's private
values for any parameters in .air-project-parameters that
have the Private Value flag set. It has the same format as
the .air-project-parameters file.

When you edit a value (in the GDE) for a parameter that has the
Private Value flag checked, the value is stored in the
.air-sandbox-overrides file rather than the
.air-project-parameters file.

Q. In the JOIN component, which records go to the unused port and which go to the reject port?
A. In an inner join, all the records that do not match on the specified key go to the respective unused ports; in a full outer join, none of the records goes to the unused ports. Records that do not match the DML come to the reject port.
OR
In an inner join, all the records that do not match on the specified key go to the respective unused ports; in a full outer join, none of the records goes to the unused ports.
All records for which the join transform fails (for example, evaluates to NULL) go to the reject port; the graph aborts only when the number of rejects exceeds limit + ramp * number_of_input_records_so_far.

Q. What is meant by repartitioning, and in how many ways can it be done?
A. Repartitioning means changing one or both of the following:
1) The degree of parallelism of partitioned data
2) The grouping of records within the partitions of partitioned data
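To contrast two common ways of (re)partitioning, here is a Python model of partition by key and partition by round-robin. zlib.crc32 stands in for Ab Initio's own hash function, which is an assumption; the point is only that the same key always maps to the same partition, while round-robin gives even partition sizes without any key grouping. The field name cust is hypothetical.

```python
import zlib


def partition_by_key(records, key, n):
    """PARTITION BY KEY model: equal key values land in the same partition
    (crc32 is a stand-in for the real hash function)."""
    parts = [[] for _ in range(n)]
    for rec in records:
        h = zlib.crc32(str(rec[key]).encode("utf-8"))
        parts[h % n].append(rec)
    return parts


def partition_by_round_robin(records, n):
    """PARTITION BY ROUND-ROBIN model: records are dealt out in turn."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


rows = [{"cust": c} for c in ["a", "b", "a", "c", "b", "a", "d", "e"]]
by_key = partition_by_key(rows, "cust", 4)
by_rr = partition_by_round_robin(rows, 4)
```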

Q. How do you create a surrogate key using Ab Initio?
A. There are many ways to create a surrogate key, and the right one depends on your business logic. You can try these:

1. Use the next_in_sequence() function in your transform.

2. Use the ASSIGN KEYS component (if your GDE version is higher than 1.10).

3. Write a stored procedure that generates the key and call it wherever you need it.
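One caveat with a plain counter in a parallel graph is that each partition typically counts from 1, so keys can collide across partitions. A common workaround is to interleave the counter with the partition number. The Python sketch below models only that arithmetic, assuming the transform can see its partition number (0..n-1) and the partition count n; the start offset (for example, the current maximum key plus one) is also an assumption.

```python
def surrogate_keys(partition_sizes, start=1):
    """Hypothetical scheme for unique keys in an n-way parallel graph.

    Each partition numbers its records seq = 1, 2, 3, ...; interleaving with
    the partition number makes the keys unique across partitions:
        key = (seq - 1) * n + partition + start
    """
    n = len(partition_sizes)
    keys = {}
    for partition, size in enumerate(partition_sizes):
        keys[partition] = [(seq - 1) * n + partition + start for seq in range(1, size + 1)]
    return keys


keys = surrogate_keys([3, 2, 3])
all_keys = [k for part in keys.values() for k in part]
```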
Q. What is a semi-join?
A. In Ab Initio there are three types of join: 1. inner join, 2. outer join and 3. semi-join. For an inner join, the record_requiredN parameter is true for all in ports. For an outer join, it is false for all in ports. For a semi-join, set record_requiredN to true for the required input and false for the other inputs.
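The record_required combinations can be modelled in Python (not Ab Initio code). This sketch assumes each input has at most one record per key value and uses record_required0/record_required1 to decide which keys survive; the field names id, x and y are hypothetical.

```python
def join(in0, in1, key, required0, required1):
    """Model of JOIN with record_required0 / record_required1 flags.

    both True  -> inner join
    both False -> full outer join
    one True   -> the semi-join described above: every key present in the
                  required input is kept, matched or not
    """
    a = {r[key]: r for r in in0}
    b = {r[key]: r for r in in1}
    out = []
    for k in sorted(set(a) | set(b)):
        if (required0 and k not in a) or (required1 and k not in b):
            continue  # a required input has no record for this key
        # fields from in0 win when both inputs have a field of the same name
        out.append({**b.get(k, {}), **a.get(k, {})})
    return out


left = [{"id": 1, "x": "a"}, {"id": 2, "x": "b"}]
right = [{"id": 2, "y": "p"}, {"id": 3, "y": "q"}]
inner = join(left, right, "id", True, True)
outer = join(left, right, "id", False, False)
semi = join(left, right, "id", True, False)
```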
Q. How will you ensure that components created in one version do not malfunction or stop functioning in another version?
A. The runtime behaviour of components remains the same across versions, unless a version requires an additional parameter to be defined. New versions of the ETL tool come with some changes in component-level parameters (observation so far).
or
Components should be compatible to run in previous versions of the GDE. Deprecated components still run in new versions.
Q. What data modelling do you follow while loading data into tables? Also, does the database you are inserting into use a star schema or a snowflake schema?
A.
Q. How does the force_error function work? If we set Never Abort in a REFORMAT, will force_error stop the graph, or will it continue to process the next records?
A. You can set the reject threshold of the REFORMAT component in two ways:
1. If you want the graph to fail, set the reject threshold to Abort on first reject.
2. If you don't want it to fail, set it to Never Abort.

force_error is used to abort a graph if conditions are not met; you can write the error records to a file and then abort the graph. This can be done in different ways.
Or
The force_error() function does not stop the graph by itself: it sends the current record to the reject port, writes the error message to the error port, and the component goes on to the next record. Whether the graph then aborts depends on the reject threshold.
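A Python model of how force_error() and the reject threshold interact, as described above (not Ab Initio code). A raised ValueError stands in for force_error(), and the abort test uses the limit/ramp rule: abort once rejects exceed limit + ramp * records processed so far. The salary check is a hypothetical transform based on the question earlier in this list.

```python
class GraphAborted(Exception):
    pass


def run_reformat(records, transform, limit, ramp):
    """Rejected records go to the reject list and their messages to the error
    list. 'Abort on first reject' behaves like limit=0, ramp=0; 'Never abort'
    behaves like an unlimited limit."""
    out, reject, error = [], [], []
    for processed, rec in enumerate(records, start=1):
        try:
            out.append(transform(rec))
        except ValueError as exc:
            reject.append(rec)
            error.append(str(exc))
            if len(reject) > limit + ramp * processed:
                raise GraphAborted(f"reject threshold exceeded at record {processed}")
    return out, reject, error


def check_salary(rec):
    if rec["sal"] < 500:
        raise ValueError("Employee salary is less than 500")  # stands in for force_error()
    return rec


emps = [{"sal": 900}, {"sal": 100}, {"sal": 700}]
kept, rejected, errors = run_reformat(emps, check_salary, limit=float("inf"), ramp=0)
try:
    run_reformat(emps, check_salary, limit=0, ramp=0)
    aborted = False
except GraphAborted:
    aborted = True
```

With the unlimited limit the bad record is rejected and processing continues; with limit 0 and ramp 0 the first reject stops the run.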
Q. Phase versus checkpoint?
A. A phase breaks the graph into separate blocks that run one after another. It creates temporary files while running and deletes them once the phase completes.

A checkpoint is used for recovery: when the graph is interrupted, instead of rerunning it from the start, execution resumes from the last completed checkpoint.
Q. What is the function of XFR in Ab Initio? Could someone briefly explain what xfr is for (what it does, where it is stored, how it is used)?
A. When you create a new sandbox in the Ab Initio environment, the following directories are created:
1. mp
2. dml
3. xfr
4. db
etc.

xfr is the directory where we can write our own functions and use them in transforms (ROLLUP, REFORMAT, etc.).

For example, you could write a function to convert a string into a decimal or to get the maximum string length. You can put such functions in a file called user_define_function.xfr in the xfr directory; inside this file you can define a function called string_to_integer, get_string_max_length, or both. In any transform component you can include the file like this:
include "<full path>/user_define_function.xfr";

You can then call the function like any other function in Ab Initio.
Q. What is the difference between the flows of the 3 kinds of parallelism?
A. There are 3 types of parallelism:
1. Component parallelism: different program components run simultaneously on different data sets.
2. Pipeline parallelism: program components run simultaneously on the same data set, each working on records already processed by the component upstream. Sort-based components break pipeline parallelism.
Ex: SORT, SORT WITHIN GROUPS, AGGREGATE, ROLLUP, JOIN, etc.
3. Data parallelism: data records are distributed to multiple locations using partition components.
Q. How can I calculate the total memory requirement of a graph?
A. You can roughly calculate memory requirement as:

1. Each partition of a component uses:
~ 7 MB + max-core (if any)

2. Add size of lookup files used in phase (if multiple components use same lookup only count it once)

3. Multiply by degree of parallelism. Add up all components in a phase; that is how much memory is used

in that phase.

4. The total memory requirement of the graph is that of its largest-memory phase, since phases run one after another.
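The rule of thumb can be written as a short Python function. It assumes, as a simplification, that every component in a phase runs at the same degree of parallelism, counts each lookup once per phase as the steps say, and takes the graph's requirement to be that of its largest phase because phases run one after another. The 7 MB figure comes from the text; the example numbers are made up.

```python
PER_PARTITION_OVERHEAD_MB = 7  # rough per-partition figure from the steps above


def phase_memory_mb(max_core_mb, lookups_mb, degree):
    """Estimate one phase's memory in MB.

    max_core_mb: max-core of each component in the phase (0 if none)
    lookups_mb:  {lookup_name: size_mb}; a lookup shared by several
                 components appears once, so it is counted once
    degree:      degree of parallelism shared by the phase's components
    """
    per_partition = sum(PER_PARTITION_OVERHEAD_MB + mc for mc in max_core_mb)
    per_partition += sum(lookups_mb.values())
    return per_partition * degree


def graph_memory_mb(phases):
    """Phases run one after another, so the largest phase sets the total."""
    return max(phase_memory_mb(**p) for p in phases)


phases = [
    {"max_core_mb": [0, 100], "lookups_mb": {"cust": 50, "prod": 20}, "degree": 4},
    {"max_core_mb": [0, 0, 0], "lookups_mb": {}, "degree": 8},
]
```

For the first phase: (7 + 0) + (7 + 100) + 50 + 20 = 184 MB per partition, times 4 partitions gives 736 MB.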

Q. How can I achieve a cumulative summary in Ab Initio other than by using the SCAN component? Is there any built-in function available for that?
A. SCAN is really the simplest way to achieve this. Another way is to use a ROLLUP, since it is a multistage component. You need to put the ROLLUP component into multistage format and write the intermediate results to a temporary array (I think they're called vectors in Ab Initio). The ROLLUP loops through each record in your defined group.

Let's say you want intermediate results by date. First sort your data by {ID; DATE}, then ROLLUP by {ID}. The ROLLUP will execute its transformation for each record per ID, so store your results in a temporary vector, which will need to be initialized to the size of your largest group. Each time the ROLLUP enters the transformation, write to position [i] in the vector and increment i. As long as this is all done in the "rollup" transformation and not the "finalize" transformation, it will run the "initialize" portion before it moves to the next ID.

I have done it this way, but SCAN is easier. I was doing a simpler rollup before I found that I needed cumulative intermediate results, so I just modified my existing ROLLUP. The Ab Initio documentation does not explain this technique in detail, but it can be done.
or
There are three ways:
1) Use the SCAN WITH ROLLUP component.
2) Use the ROLLUP component.
3) Use SCAN followed by DEDUP SORTED and select the last record. That will serve the purpose.
or
Other than SCAN, we can use ROLLUP to do the cumulative summary.
Or
Use the built-in Ab Initio component SCAN WITH ROLLUP.
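A Python model of what SCAN does for a cumulative sum (not Ab Initio code): the running total resets at each new key group, and every input record is emitted with its running value. The field names id, date and amt are hypothetical, and the input is assumed sorted on {id; date} as in the ROLLUP description.

```python
from itertools import groupby
from operator import itemgetter


def scan_cumulative(records, key, value):
    """Emit every record with a running total of `value` per `key` group."""
    out = []
    for _, group in groupby(records, key=itemgetter(key)):
        running = 0                    # reset at the start of each group
        for rec in group:
            running += rec[value]      # accumulate record by record
            out.append({**rec, "cum_" + value: running})
    return out


sales = [
    {"id": 1, "date": "2024-01-01", "amt": 10},
    {"id": 1, "date": "2024-01-02", "amt": 5},
    {"id": 2, "date": "2024-01-01", "amt": 7},
    {"id": 2, "date": "2024-01-03", "amt": 3},
]
result = scan_cumulative(sales, "id", "amt")
```

Keeping only the last record of each id group from this output gives the per-id total, which corresponds to the SCAN followed by DEDUP SORTED (keep last) option.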
Q. I have a file containing 5 unique rows, and I pass them through a SORT component using a null key and then pass the output of the SORT to DEDUP SORTED. What will happen, and what will the output be?
A. With no key (a null key {}), all the records form a single group, so the output of DEDUP SORTED depends on its keep parameter.
If it is set to first, the output has only the first record.
If it is set to last, the output has only the last record.
If it is set to unique_only, there are no records in the output file.
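A Python model of DEDUP SORTED's keep parameter (not Ab Initio code). A null key is modelled as a function that returns the same value for every record, so all five rows fall into one group; the record contents are placeholders.

```python
from itertools import groupby


def dedup_sorted(records, key_fn, keep):
    """Keep one record per group ("first"/"last"), or only records that
    are alone in their group ("unique_only")."""
    out = []
    for _, grp in groupby(records, key=key_fn):
        grp = list(grp)
        if keep == "first":
            out.append(grp[0])
        elif keep == "last":
            out.append(grp[-1])
        elif keep == "unique_only":
            if len(grp) == 1:  # only single-record groups survive
                out.append(grp[0])
        else:
            raise ValueError(f"unknown keep value: {keep}")
    return out


def null_key(rec):
    """Null key {}: every record has the same key."""
    return ()


rows = ["r1", "r2", "r3", "r4", "r5"]  # five unique rows
```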
Q. Can we process 1 GB of data (1 million records) using a lookup? How?
A. I think it is not advisable to use a 1 GB lookup file; it will definitely affect the parallel processing of other applications and hurt performance.

I would prefer to use an MFS lookup file rather than a serial lookup file in this case.

Q. If I have 2 files with fields file1(A,B,C) and file2(A,B,D), and we partition both files on key A using PARTITION BY KEY and pass the output to a JOIN component, will it join if the join key is (A,B), and why?
A.

Q. In my sandbox I have 10 graphs, which I checked in to the EME. I checked out a graph again and modified it, then found that the modifications were wrong. What do I have to do to get the original graph back?
A.
Q. How do I create subgraphs in Ab Initio?

Q. What is a sandbox?
A. A sandbox is a directory structure in which each directory level is assigned a variable name. It is used to manage the check-in and checkout of repository-based objects such as graphs.

fin -------> top level directory ( $AI_PROJECT )
|
|---- dml -------> second level directory ( $AI_DML )
|
|----- xfr -------> second level directory ( $AI_XFR )
|
|----- run --------> second level directory ( $AI_RUN )
|

You'll need a sandbox when you use the EME (repository software) to maintain release control.

Within the EME, an identical structure exists for the same project.

The structure above exists under the operating system (e.g. UNIX); for instance, for the project called fin, fin is usually the name of the top-level directory.

In EME, a similar structure will exist for the project: fin.

When you checkout or check-in a whole project or an object belonging to a project, the information is

exchanged between these two structures.

For instance, if you check out a DML file called fin.dml for the project fin, you need a sandbox with the same structure as the EME project fin. Once you've created it, as shown above, fin.dml (or a copy of it) comes out of the EME and is placed in the dml directory of your sandbox.

Q. I have a job that does the following: FTPs files from a remote server, reformats the data in those files and updates the database, and deletes the temporary files. How do we trap errors generated by Ab Initio when an FTP fails? If I have to re-run or restart the graph, what points should be considered? Does the *.rec file have anything to do with it?

A. Ab Initio has very good restartability and recovery features built in. In your situation, you can do the tasks you mentioned in one graph with phase breaks.

Do the FTP in phase 0, the transformation in phase 1 and the DB update in phase 2. (This is just an example and may not be the best way to do it; the best design depends on various other factors.)

If the graph fails during the FTP, it fails in phase 0 and you can simply restart it. If it fails in phase 1, an AB_JOB.rec file exists, and when you restart the graph you will see a message saying that a recovery file exists, asking whether to start from the last successful checkpoint or restart from the beginning. The same applies if it fails in phase 2.

Phases are expensive from a disk I/O perspective, so be careful not to use too many phases.

Coming back to error trapping: each component has reject, error and log ports. The reject port captures rejected records, the error port captures the corresponding errors, and the log port captures the execution statistics of the component.

You can control the reject behaviour of each component by setting its reject threshold to "Never Abort", "Abort on first reject" or a ramp/limit setting.

Recovery files keep track of crucial information for recovering the graph from a failed state, such as which node each component is executing on. It is a bad idea to just remove the *.rec files; you always want to roll back the recovery files cleanly so that temporary files created during graph execution don't hang around, occupy disk space and cause issues.

Always use m_rollback -d.

Q. What is Ad hoc multifile? How is it used?
A. Here is a description of Ad hoc multifile:

Ad hoc multifiles treat several serial files having the same record format as a single graph component.

Frequently, the input of a graph consists of a set of serial files, all of which have to be processed

as a unit. An Ad hoc multifile is a multifile created 'on the fly' out of a set of serial files,

without needing to define a multifile system to contain it. This enables you to represent the needed

set of serial files with a single input file component in the graph. Moreover, the set of files used by

the component can be determined at runtime. This lets the user customize which set of files the graph

uses as input without having to change the graph itself, even after it goes into production.

Ad hoc multifiles can be used as output, intermediate, and lookup files as well as input files.

The simplest way to define an Ad hoc multifile is to list the files explicitly as follows:

1. Insert an input file component in your graph.
2. Open the Properties dialog and select the Description tab.
3. Select Partitions in the Data Location of the Description tab
4. Click Edit to open the Define multifile Partitions dialog box.
5. Click New and enter the first file name. Click New again and enter the second file name and so on.
6. Click OK.

If you have added n files, the input file now acts much like a file in an n-way multifile system whose data partitions are the n files you listed. Components can run in the layout of the input file component. However, there is no way to run commands such as m_ls or m_dump on the files, because they do not form a real multifile system.

There are other ways than listing the input files explicitly in an Ad hoc multifile.

1. Listing files using wildcards: if the input file names have a common pattern, you can use a wildcard for all the files, e.g. $AI_SERIAL/ad_hoc_input_*.dat. All the files found at runtime that match the wildcard pattern are taken for the ad hoc multifile.

2. Listing files in a variable: you can create a runtime parameter for the graph and list all the files in it, separated by spaces.

3. Listing files using a command, e.g. $(ls $AI_SERIAL/ad_hoc_input_*.dat), which produces the list of files to be used for the ad hoc multifile. This method gives maximum flexibility in choosing the input files, since you can also use complex commands that involve the file owner or date/time stamp.

Q. What is the difference between Replicate and Broadcast?
A. Broadcast and Replicate are similar components, but generally Replicate is used to increase component parallelism, emitting multiple straight flows to separate pipelines. Broadcast is used to increase data parallelism by feeding records to fan-out or all-to-all flows.
Or
Replicate is an older component than Broadcast. You can use Broadcast to feed a JOIN, whereas you can't use Replicate that way. By default, Replicate uses a straight flow and Broadcast uses a fan-out or all-to-all flow.
Broadcast is used for data parallelism, whereas Replicate is used for component parallelism.
Or
Replicate

Supports component parallelism

Input File -------> Replicate --------> Format ---->Output File
|
|
|
--------->Rollup-------> output File

Broadcast

Supports data parallelism

Input File1 (MF) -----------------> JOIN -----------> Output File
^
|
|
Input File 2(Serial)---> Broadcast -->

Input File 2 is a serial file, and it is joined with a multifile, Input File 1, without being partitioned. The Broadcast component writes data to all the partitions of Input File 1's layout, creating an implicit fan-out flow.
Or
The short answer is that the Replicate copies a flow while a Broadcast multiplies it. Broadcast is a

partitioner where Replicate is a simple flow-copy mechanism.

Replicate appears in over 90% of all AI graphs (across the board of all implementations worldwide)

where Broadcast appears in less than 1% of all graphs.

You won't see any difference between the two until you start using data parallelism; then things go south rather quickly. Here's an experiment:

Use a simple serial input file, followed by a broadcast, then a 4-way multifile output file component.

If you run the graph with, say, 100 records from the input file, it will create 400 records in the output file: 100 records for each flow partition encountered.

If you had used a Replicate, it would have read and written 100 records.
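The experiment can be reproduced as a toy Python model (not Ab Initio code). With Broadcast, every record goes to every partition of the 4-way output; with a plain replicated flow, each record is written once. How that single flow is spread over the four partitions is an assumption of this sketch (round-robin).

```python
def broadcast_to_partitions(records, n):
    """BROADCAST model: every record is written to every one of n partitions."""
    return [list(records) for _ in range(n)]


def flow_to_partitions(records, n):
    """Replicated-flow model: each record still lands in exactly one
    partition (dealt out round-robin in this sketch)."""
    parts = [[] for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].append(rec)
    return parts


records = list(range(100))
via_broadcast = broadcast_to_partitions(records, 4)
via_replicate = flow_to_partitions(records, 4)
```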

Hi, I just went through 8 Ab Initio interviews, and some of the tough
questions were as follows.

1. What function would you use to convert a string into a decimal?

2. How many kinds of parallelism are there in Ab Initio? Define the three.

3. What is the difference between a db config file and a cfg file?

4. Have you ever encountered an error called "depth not equal"? (This
apparently occurs when you create graphs extensively... kind of a trick
question.)

5. How do you truncate a table? (Each candidate would give only one of the
several ways to do this.)

6. How do you improve the performance of a graph?

7. What's the difference between partitioning by key and round-robin partitioning?

8. Have you worked with packages?

9. How do you add default rules in a transform?

10. What is a ramp limit?

11. Have you used the ROLLUP component? Describe it.

12. How many components were in your most complicated graph?

13.Do you know what a local lookup is?

Latest Features in Ab Initio - 2.14
Dynamic Script Generation is the latest buzz in the Ab Initio world and one of its finest features. It brings many advantages that were not available in earlier versions of the Ab Initio Co>Operating System. It is available in Co>Operating System version 2.14.46 and above.
This feature typically enables the use of Ab Initio PDL (Parameter Definition Language) and Component Folding.
If we enable this feature by changing the script generation method to Dynamic in Run Settings, we can run a graph without deploying it through the GDE. From then on we execute only the .mp file; there is no need for the .ksh. On the production server, when we run the .mp file with the air sandbox run command, it generates a reduced script on the fly. That script contains only the commands that set up the host environment and does not include any component details of the graph.
You can inspect the .mp file of a graph with dynamic script generation enabled; it is an editable text file.
Component Folding: a feature by which the Co>Operating System combines a group of components and runs them as a single process. Does it improve performance? In most cases, yes: it brings a significant performance boost over the traditional approach of execution.

Prerequisites of Component Folding:
• The components must be foldable.
• They must be in the same phase and layout.
• They must be connected via a straight flow.
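As an illustration only, the three prerequisites can be expressed as a grouping rule over a chain of components joined by straight flows. The `Component` class, its fields, and `fold_groups` are hypothetical. This is not how the Co>Operating System decides folding internally, and which real components are foldable is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    name: str
    foldable: bool
    phase: int
    layout: str


def can_fold(a: Component, b: Component) -> bool:
    """Both foldable, same phase, same layout. The straight flow is implied
    because the input is modeled as a linear chain."""
    return a.foldable and b.foldable and a.phase == b.phase and a.layout == b.layout


def fold_groups(chain: List[Component]) -> List[List[str]]:
    """Split a straight-flow chain into groups that could share one process."""
    groups: List[List[str]] = []
    for i, comp in enumerate(chain):
        if i > 0 and can_fold(chain[i - 1], comp):
            groups[-1].append(comp.name)   # joins the neighbour's process
        else:
            groups.append([comp.name])     # starts a new process
    return groups


if __name__ == "__main__":
    chain = [
        Component("filter", True, 0, "L1"),
        Component("reformat", True, 0, "L1"),
        Component("big_join", False, 0, "L1"),  # hypothetically not foldable
        Component("rollup", True, 1, "L1"),     # different phase
    ]
    print(fold_groups(chain))
```

The folded "filter" and "reformat" pair is also the case the pipeline-parallelism disadvantage below refers to: once they share a process, they no longer run concurrently.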

How it works (Advantages):
1. When folding is enabled by checking the folding option in Run Settings, the Co>Operating System runtime folds the foldable components into a single process. As a result, fewer processes run when the graph executes. Every process carries overhead: process creation, scheduling, memory consumption and so on. This overhead varies from OS to OS; on some, such as MVS, creating and maintaining processes is very costly compared to the various flavors of UNIX.
2. Another major benefit of component folding is reduced interpretation time for the DML between processes, because the graph ends up with folded multitool processes communicating with other multitool or unitool processes.
3. Apart from that, more processes mean more interprocess communication. Moving data between two or more processes consumes not only time but memory too. In a CFG (Continuous Flow Graph), interprocess communication is always very high, so it is worth enabling component folding in a CFG.
Disadvantages of Component Folding:
1. Pipeline Parallelism: because component folding puts different components into a single process, it hurts Ab Initio's pipeline parallelism. Suppose the flow of our graph is Input File -> Filter by Expression -> Reformat -> Output File. In the traditional method, pipeline parallelism lets Filter by Expression (FBE) and Reformat execute concurrently. Once these two components are folded together, there is no chance of parallel execution.
2. Address Space: in a 32-bit OS the maximum address space for a process is 4 GB. So if we combine 4 different components into a single process by component folding, the OS allows only 4 GB of address space for all 4, instead of 4 x 4 = 16 GB in total. We should therefore avoid folding components whose memory use is very high, such as in-memory Rollup, Join, and Reformat with lookup. Some components, like Sort and in-memory Join, cause internal buffering of data; combining them in a single process results in writing to disk (higher I/O).

Set the AB_MULTITOOL_MAXCORE variable to limit the maximum memory allowed for a folded component group.
Excluding any component from Component Folding:
Sometimes you may wish to prevent components from being folded, to allow pipeline parallelism or to give them more address space. Then you need to exclude those components from folding.
Set the AB_FOLD_COMPONENTS_EXCLUDE_MPNAMES configuration variable to a space-separated list of component mpnames. You can set it in your $HOME/.abinitiorc, in the system-wide $AB_HOME/config/abinitiorc file, or in the environment, e.g. export AB_FOLD_COMPONENTS_EXCLUDE_MPNAMES="hash-rollup reformat-transform"
Alternatively, to prevent two particular components from being folded together, right-click the flow between them and uncheck the Allow Component Folding option.
Everything has its cost, so it is always worth benchmarking before making a decision. Prevent or allow component folding for the components of your graph, and tune it for the highest performance.
CPU tracking report of folded components in a graph:
To report the execution details of a folded graph on the console, override the AB_REPORT variable with the show-folding option: AB_REPORT="show-folding flows times interval=180 scroll=true spillage totals file-percentages".
The folded components are displayed as multitool processes in the CPU tracking information. The CPU time for a folded component is shown twice: once for the component itself and once as part of the multitool process.
Parameter Definition Language (PDL):
PDL is used to put inline computation logic in a parameter value, which makes parameter interpretation highly flexible. It supports both $ and ${} substitution. To use it, set the parameter's interpretation to PDL and write the DML expression within $[ ]. This approach is much faster than traditional shell scripting, and it is the way forward to a much more flexible and robust design technique. With it we can abolish the old shell scripting; script-start and script-end have been beaten to death over the last few years. You can also use PDL interpretation for the condition of a component.
NOTE: The PDL documentation within the GDE lacks consistency. Basically, we can use the majority of the Ab Initio DML functions. I would recommend looking at the metaprogramming section for starters, then playing with the parameters editor.

e.g.
Suppose a graph has a conditional component that runs based on the existence of a file called emp.dat.
The FILE_NAME parameter is defined as /home/xyz/emp.dat, and a conditional parameter called EXIST is defined as
$[if (file_information($'FILE_NAME').found) 1 else 0]
We can define types and transform functions for use in parameter expressions with the AB_DML_DEFS parameter.
e.g. Suppose AB_DML_DEFS is defined as
out :: sqrt(in) = begin out :: math_sqrt(in); end;
If a parameter called SQRT is then defined as $[sqrt(16)], its resolved value will be 4.
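To make the two resolved values concrete, here is a plain-Python model of what the EXIST and SQRT expressions above evaluate to. It only mirrors the logic: `os.path.exists` stands in for `file_information(...).found`, `math.sqrt` stands in for `math_sqrt`, and no Ab Initio API is called. The function names are hypothetical.

```python
import math
import os


def resolve_exist(file_name: str) -> int:
    """Mirror of the EXIST expression: 1 if the file is found, else 0."""
    return 1 if os.path.exists(file_name) else 0


def resolve_sqrt(value: float) -> float:
    """Mirror of SQRT = $[sqrt(16)], where sqrt wraps math_sqrt via AB_DML_DEFS."""
    return math.sqrt(value)


if __name__ == "__main__":
    print(resolve_exist("/home/xyz/emp.dat"))  # 1 only if that file exists on this machine
    print(resolve_sqrt(16))                      # 4.0
```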
Ensure your host run settings are checked for dynamic script generation, and read the 2.14 patchset notes for further hints.

18-12-10 Syntel (consider IT done) questions, 2+ years of experience:

Unix:
1. I have a file as:

a
ab
abc
abcd
abcde..
I want the line ‘abc’.
A) grep, e.g. grep -x 'abc' filename (-x matches whole lines only, so abcd and abcde are not returned).

2. I'm in a subdirectory. I want to list all the files in the parent directory without going to that directory, i.e. from the subdirectory.
A) I said ls -lrt <complete path>. (ls -lrt .. also works.)

DWH concepts and Ab Initio:

1. What are the top-down and bottom-up approaches in a DW?
2. What are dimensions, and how many types are there?
3. What is a conformed dimension? (Other dimension types were also asked about.)
4. What is an SCD? How many types are there? Have you heard about Type 3 and Type 4?
5. Have you developed SCD Type 2? If so, how?
6. I have two files: yesterday's file and today's file. Today's file contains 150 records, 100 of which are from yesterday's file, and yesterday's file contains only those 100 records.
How do you filter out the 50 new records without using a Join?
7. If you give unsorted inputs to a Join and use the in-memory option, is the output in sorted or unsorted order?
8. I have 100 records in a multifile, and I want to distribute them into 2 output files. If I use Replicate and Broadcast, is the result the same or different? If different, how many records do the output files contain in each case?
9. How do you strip out headers and trailers if there are no indicators?
10. Which DB components did you use?
11. What do you mean by parallel unloading?
12. Do you use the table layout or a parallel hint in parallel unloading?
13. Why do you need the ABLOCAL() construct?
14. What are conventional loading and direct loading in a DB?
15. Have you used psets, PDL, or .ksh for running graphs?
16. Have you heard of conditional components?
17. I have given a condition in a component in phase 0 that is checked in a component in phase 1.
Is it possible to run the phase 1 component?
18. How do you count the number of records in a file? Which component do you use for that?
I said Rollup.
They asked what I do in the Rollup.
22-12-10, 3:00 PM to 3:30 PM, Capgemini (Pune)

1. Tell me a little about yourself and your project.
2. How many graphs have you developed so far?
3. Which components have you used in your graphs?
4. I have a table that contains a numeric column. I want to filter its records into files based on these conditions:
>40 one file
<40 one file
>60 one file

A) I answered "Partition by Expression".

They asked for another component.
I answered: Reformat.
What will you do in the Reformat?
Ans: I use output_index for that.
What is output_index?
Ans: When you want to send each record to a single output port, we use output_index.
5. I want to distribute my records in an 80:20 ratio.
A) Partition by Percentage.
Q) Alternative? I don't want to partition the data; I just want to filter it.
A) No idea.
6. Have you used Rollup?
What are the 2 phases of Rollup?
I have 2 records in a file as follows:

Sid subname Marks
100 English 90
100 Maths 90

My key is Sid, and I write sum(Marks) in the transform. What is the output?
Ans: 100 Maths 180

Q) Why don't you get 100 English 180?
A) Because, for fields that are not aggregated, Rollup takes the values from the last record in each group.
7. Have you worked with psets?
No.
8. How do you run a graph in the background?
Ans: Using a wrapper script.
9. What is dependency analysis?
10. How do you do dependency analysis?
11. What is the "Capital One" server name?
12. How did you write scripts in your Capital One project?
Ans: I didn't write any scripts in my projects.
Q) Yes, but you got scripts from the server. How did you get them?
Ans: I don't remember.
13. Have you worked with DB components?
Yes.
Q) If I don't give any update statements to my Update Table, will it run?
Ans: Yes, it will run. If you don't provide any update statements to your Update Table, the statements in the 'insert SQL file' are executed.
14. I have an input file, and I want to load it into 2 different DB tables. What does the graph look like?
Ans: Input file ---> Replicate ---> Oracle output table
                         |
                         +---> DB2 output table
Q) Is there anything in between the components?
Nothing.
Q) What about the .dbc files for the 2 tables?
A) The .dbc files are different for the two tables, depending on the database you are using.
15. What is a .dbc file?
16. I have 2 files with only one column each:
1) 1,2,3
2) 1,1,2
If I perform an inner join, what is the output?
A) 1,1.1,2,2
17. What is the .abinitiorc file?
Ans: It contains configuration information; e.g. when one machine wants to connect to another machine, the connection settings must be in the .abinitiorc file.
18. Which variables does it contain?
Ans: AB_HOME, etc.
Q) What does AB_HOME contain?
Ans: The path of the Co>Operating System installation on the host machine.
Q) What does host mean?
Ans: The machine where the Co>Op is installed.
19. What does AB_WORK_DIR contain?
Ans: It is the temporary work directory.
20. What is AB_DATA_DIR?
Ans: It is a temporary data directory.
21. While running the graph, I got the error "no space ……". What does it mean?
Ans: It means your temporary directory is full.
Q) How can you solve the problem?
Ans: I delete unnecessary files in the temporary directory and then run the graph.
Q) How can you delete the temporary files?
A) Using m_rollback or m_cleanup.
Q) What is the difference?
A) m_rollback rolls back a partially completed graph to its previous state.
m_cleanup cleans up the files left over from an unsuccessful graph run.
22. What is a .rec file?
Ans: It is the recovery file; it contains temporary data while the graph is running.
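Three behaviors from this interview can be modeled in plain Python as a sketch: output_index routing (question 4), Rollup's last-record rule (question 6), and the inner join (question 16). The function names and the port numbering are assumptions for illustration, not Ab Initio APIs. In this model, port 2 receives everything not above 40. Under standard inner-join semantics, joining 1,2,3 with 1,1,2 yields three records: key 1 matches both 1s, key 2 matches once, and key 3 has no match.

```python
from itertools import groupby
from typing import Dict, List, Tuple


def output_index(value: int) -> int:
    """Hypothetical routing rule for question 4: each record goes to exactly
    one port, so the >60 test must come before the >40 test."""
    if value > 60:
        return 0
    if value > 40:
        return 1
    return 2


def route(values: List[int], n_ports: int = 3) -> Dict[int, List[int]]:
    """Send every value to the single port chosen by output_index."""
    ports: Dict[int, List[int]] = {p: [] for p in range(n_ports)}
    for v in values:
        ports[output_index(v)].append(v)
    return ports


Row = Tuple[int, str, int]  # (sid, subname, marks)


def rollup_sum_marks(rows: List[Row]) -> List[Row]:
    """Question 6: rows sorted on sid; sum(marks) per sid, while the
    non-aggregated subname comes from the last record of the group."""
    result: List[Row] = []
    for sid, group in groupby(rows, key=lambda r: r[0]):
        group_rows = list(group)
        result.append((sid, group_rows[-1][1], sum(r[2] for r in group_rows)))
    return result


def inner_join(left: List[int], right: List[int]) -> List[Tuple[int, int]]:
    """Question 16: one output record per matching (left, right) pair."""
    return [(a, b) for a in left for b in right if a == b]


if __name__ == "__main__":
    print(route([10, 45, 70, 40, 61]))
    print(rollup_sum_marks([(100, "English", 90), (100, "Maths", 90)]))
    print(inner_join([1, 2, 3], [1, 1, 2]))
```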

UNIX:

1. Have you been involved in scripting?
Ans: Yes.
2. Write a script for the following scenario:
I have two files. If the first file is bigger than the second, print that the first file is greater; else, if the second is bigger, print that the second is greater; else print that both are equal.
Ans:
# $lof1/$lof2 are left unquoted on purpose: some wc versions pad the count with spaces
lof1=$(wc -c < filename1)
lof2=$(wc -c < filename2)
if [ $lof1 -gt $lof2 ]; then
    echo "first file bigger"
elif [ $lof1 -lt $lof2 ]; then
    echo "second is greater"
else
    echo "both are equal"
fi

Q) It is not good, because we don't measure the file size in characters; that is not always correct.
Can you give any alternative?
Ans: No.
3. I have different files in my directory. Among them, I want to print the files that are 0 bytes in size.
Which command do you use?
m_touch command

1. What is a layout?
2. What is a snowflake schema? Give me an example.
3. Explain a CDC graph.
Which type of join do you use in CDC?
A) Inner join.
4. What is the difference between a unique key and a primary key?
5. Have you worked on psets? Tell me about them.
6. Have you worked on PDL? Explain PDL.
7. Can you override sandbox parameters?
8. What is .abinitiorc and what does it contain?
9. Have you worked on the Denormalize component?
10. Have you worked on vectors?
11. How can you create vectors?
12. Have you worked on the Normalize component?
13. What is SCD Type 2? Explain with an example, and explain the graph.
14. What are a phase and a checkpoint? Explain the difference.
15. What is a .rec file? How do we recover from it?
16. Have you used Join with DB?
17. Have you used air commands?
18. I have checked out an object. How can you remove it?
19. I have 5 records. I want to make 100 records from the 5 records for testing purposes. How can you create them?
Ans: I use the Generate Records component.
Q) I don't want to use the Generate Records component. Any alternative?
A) Broadcast.
Q) Any other?
A) Replicate.
Q) Any other?
A) I have no idea.
20. What is the difference between is_valid() and is_defined()?
21. I have one input file with 100 records, and I want to retrieve records 50 to 70.
A) I use Filter by Expression with next_in_sequence()>=50 and next_in_sequence()<71
Q) Can you give a command?
A) m_dump .dml .dat -start 50 -end 70
22. How can you create a multifile system?
A) m_mkfs control URL data partition URLs
23. What is the control URL?
A) It is the URL of the control partition.
Q) How can we see the partitions or the number of partitions?
A) m_expand.
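The Filter by Expression answer above keeps records whose sequence number falls between 50 and 70. A small Python model of that range filter follows. It assumes a 1-based counter, as the >= 50 and < 71 bounds imply, and computes exactly one sequence number per record. `records_between` is a hypothetical helper.

```python
from typing import List, TypeVar

T = TypeVar("T")


def records_between(records: List[T], start: int, end: int) -> List[T]:
    """Keep records whose 1-based position n satisfies start <= n <= end."""
    return [rec for n, rec in enumerate(records, start=1) if start <= n <= end]


if __name__ == "__main__":
    recs = [f"rec{i}" for i in range(1, 101)]   # 100 input records
    picked = records_between(recs, 50, 70)
    print(len(picked), picked[0], picked[-1])   # 21 rec50 rec70
```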
UNIX:
1. I have a file with 3 columns (cid, cname, csal). I want the csal field in a file, in sorted order.
A) First I sort the file on the 3rd field, i.e. csal, and then I cut the 3rd field (assuming a comma-delimited file and numeric csal):
sort -t',' -k3,3n filename | cut -d',' -f3 > csal_sorted.txt
2. I want the first record and the last record from a file.
A) sed -n '1p;$p' filename
DB:
1) I have a table with duplicate records. I want to see the duplicates, and I want to remove them.
Write the query for that.
A) I couldn't answer.
2) I have a table, and I want to create the same table with the same records.
A) create table new_table as select * from old_table;

Capgemini
1. Tell me about yourself.
2. What do you mean by performance tuning? If there are any issues, how will you recover?
3. What is the EME version?
4. How many types of parallelism are supported? Are you using data parallelism, and for what purpose?
5. What are partitioning and departitioning?
6. One file has 100 records: input ---> Reformat ---> output. I set the Reformat's reject threshold to Never Abort; if 99 or more records are rejected, does my graph fail? How can I make my graph fail?
7. What is the difference between $ and ${} parameter evaluation?
8. Using an m_ command, how do you get records 200 to 300 from a 1 GB file? What is the syntax?
9. One file has 100 records (empid, ename, sal……..). I want to delete duplicates using Rollup. What is the transform?
10. What are override and override key in Join?
11. Which component produces intermediate summary records?
12. Which component do you use to generate a surrogate key?
13. I have one file with 4 records and one transaction record, and I have specified the DML for the transaction record. How can I transform the data into 4 separate records?
Using an inline record delimiter?
14. What are the challenges in your project?
15. How can I run multiple instances of a graph from the command prompt (e.g. 10 instances of one graph)?
16. How can I make a conditional graph?
17. E.g. with 2 files, an indicator file and a production file, how can I make the graph conditional?
18. What is the difference between API and utility mode?
19. What is the sequence of evaluation when you run a graph?
20. What is the difference between m_rollback and m_cleanup?
21. In a 1 GB file, how can I replace 80400 with 80500 in UNIX?
22. Dates run from day 1 to day 90; how can I delete records older than the 90th date?
23. I have an input file with 100 records. How do I display the odd and even records in the output?
24. I have an input file; how do I display the header, body, and trailer separately in the output file?
25. One input has 3 records in a table and the next file has 6 records. If the join key is null, how many output records are there?

Mahendra Satyam
1. How much space does a lookup occupy?
2. What is the difference between a project and a sandbox?
3. How do you find the GDE and Co>Op versions?
4. What are private, public, and common projects?
5. How many components are in your package editor?
6. What is a pset?
7. What is commit in the Output Table?
8. What is the layout in the Input Table?
9. Who are your end users?
10. What are all the conditional components?
11. What is conditional DML?
12. How do you check in a graph?
13. How many editors are there in the GDE?
14. What is migration?
15. What is performance tuning?
16. How do you convert a 3-way multifile system to a 5-way one?
17. Some work runs on day 1 and some on day 2; what happens if execution runs into another day?
18. What are shell commands?

1. What are ad hoc multifiles?
2. How can you sort data that is already partitioned (round-robin)?
3. How does Partition by Key internally decide which key to send to which partition?
4. What is PDL? (Give the candidate a shell-type parameter and ask them to convert it to PDL.)
5. As shell-type parameters are not supported by the EME, how can you use a shell-type parameter (if you don't want to use PDL) without hampering the lineage diagram?
6. How can you generate DML from a COBOL copybook?
7. How can you convert from EBCDIC to packed decimal?
8. What is a regex lookup? When should you use it?
9. Why does the creation of temporary files depend on the value of MAX CORE?
10. What is the difference between the abinitiorc and .abinitiorc files?
11. What is the use of allocate()?
12. What is the use of a branch in the EME?
13. How can you break a lock in the EME? How can you lock a file so that no one other than the EME admin can break it?
14. When should you use ablocal()? How can you use ablocal_expr?
15. Why should you not keep the layout as 'default' for the Input Table component?
16. What is a dynamic lookup?
17. What is a dependent parameter?
18. What is BRE? (Business Rules Environment, a recent addition to the Ab Initio package.)
19. What is output_index?
20. How can you track the records that are not selected by the 'select' parameter in the Reformat component?
21. Can we have more than one launcher process for a particular graph? How about agents?
22. A lot of new functions were added in 2.15; you can ask about them.
23. How can you run multiple instances of a graph in parallel?
24. In which scenario will .rec files not get created even if the graph fails?
25. What is the difference between force_error and force_abort?
26. What is the best way of creating huge test feeds?
27. Can you read multiple input files using one Input File component? (Same DML)
28. What is the difference between API and Utility mode? Why do we need to use API mode when Utility mode had both t
29. What is the significance of the record-required indicator of the Join component?
30. What is flow buffering? How does it reduce the chances of deadlock?
31. When should we use 'jobid' for the commit table (Output Table component)?
32. Layout: L1* vs. L1. What is the difference?
33. What is the standard environment?
34. How can we remove the temp files of a failed job if the .rec file is not available?
35. Why should skew always be 0?
36. You can ask about different vector functions.
37. What does mpjret contain?
38. Can you use Scan to generate a sequence number?
39. How can you generate a surrogate key?
40. You can ask about the Meta Pivot, Leading Records, and Read/Write Multiple Files components.
41. What is metaprogramming? Can you generate an .xfr dynamically?
42. How can you delete an object from the EME datastore?
43. How can you delete an object from a tag?
44. If you check in one file twice to the EME, once with a tag and once without, will the version number change?
45. What is the use of creating save files?
46. What is an air-project parameter?
47. Can you run air sandbox run using a pset?
48. How can you create cross-joined output using the Join component?
49. What is the default layout of watcher files?
50. Why do you get a 'too many open files' error?
51. What is the significance of the vnode folder under AB_WORK_DIR?
52. What is the significance of AB_AIR_BRANCH?
53. How can you select only unique records from a set of records?
54. How does next_in_sequence work in a parallel layout?
55. Why would you want to set a modest number for limit?
56. How can you encrypt a password and use it in a .dbc file?
57. Which function should you use if you do not want to process non-printable characters from a feed file?
58. How can you run a component only under certain conditions?
59. What is a catalog, and when should you use it?
60. How can you use Reformat as a router?
61. How many processes get created for an n-way parallel component?
62. Why do delimited DMLs take more time to process than fixed-length DMLs?
63. How can you select only the first and last records from a file without using next_in_sequence?
64. Why should you always keep the largest input as the driving input for Join?
65. What does a catalog file contain?
66. What are private and public projects?
67. Why should you not use a checkpoint after Replicate?
68. Broadcast and Replicate do similar work; what's the difference, then?
69. How can you export a component's internal parameters?
70. How can you test a .dbc file?
71. What is the significance of the 'mp run' command?
72. How can you run a graph continuously without using continuous components?
73. How can you get all the fields from a lookup file? (Which function?)

TCS (Chennai), Bhanu Prakash, 13-01-2010, Thu 4:00 to 4:25
1. Tell me briefly about your education, your skills, and your project roles and responsibilities.
2. What are your current project and your previous project? Explain both of them.
3. What are your roles and responsibilities in your project?
4. What are your daily activities in your project at your company?
5. How proficient are you in Ab Initio, Unix, and SQL, out of 5?
I said 3, 2, and 2½.
6. Are you involved only in development, or in support also?
A) In my 2 years of experience I have been involved only in development; I never got a chance to work in support.
7. Where did you get the source data in your project? Is it a DB or something else?
A) From flat files.
8. What is your project's destination?
A) Oracle DB.
9. Why are you looking for a change?
10. Which components do you prefer for filtering?
A) Filter by Expression, Reformat.
Q) Any alternative?
A) Partition by Expression.
Q) No, no, no. My scenario is that I want to filter the customer records according to branches and regions.
A) In that case you can use Partition by Expression; if you don't want to use Partition by Expression, use Reformat with output_index/output_indexes.
Q) Any alternative?
A) No idea.
11) Have you worked with Join?
A) Yes.
12) When you use an explicit join, which parameters can you see, and what is the behavior of an explicit join? Give me a brief description.
13) Did you observe memory constraints in Sort?
A) Yes, the max-core parameter.
14) Fine. If I give the Sort data that exceeds the max-core value, what is the behavior of the graph?
A) The graph fails.
15. If your graph is running slowly, which performance techniques do you use to improve performance?
16. A graph was developed in Ab Initio; how do you get that graph?
A) By checking it out.
17. How can you check out?
A) From the GDE checkout wizard.
Q) What do you give in the checkout wizard?
A) The path.
Q) Where do you check out to?
A) EME.
18) You developed a graph 10 days back and checked it in to the EME. Today you want to check in the graph, but it was modified by others in the meantime. What do you observe while checking in?
A) We will be prompted with a message showing that the version has changed.
19) While you are checking in the graph, what do you observe?
A) Dependency analysis.
20) After that, what do you do?
A) I modify the graph.
21) Before modifying it, what do you do?
A) I lock the graph.
22) What happens in the background when the graph is locked?
A) It prevents other users from modifying the graph.
23. List some air commands.
A) air project import/export, air object ls/rm
24) Have you heard about the "air dump" command?
A) I have never used air dump; I don't know about it.
25) For performance issues, which component is good?
A) Partition by Round-robin.
26) What are the different departition methods?
A) Concatenate, Gather, Merge, Interleave.
Q) Explain their differences and behavior.
27) I have a file that contains duplicate and unique records, and I want to get only the unique records. What is the Ab Initio graph?
A) Use Dedup Sorted.
Q) What do you set in it?
A) Keep unique only.
28) How can you create an MFS?
A) m_mkfs
29) I have today's date in a graph parameter and Jan 1 in a sandbox parameter.
I want to get today's date while I'm running the graph.
How?
A) We have to override the sandbox parameter.
Q) I don't want that answer. If I run the graph, what do I get: today's date or Jan 1?
A) Jan 1.
(Silence for 10 seconds………….)
30) What do you mean by the .abinitiorc file?
31) I have a file as follows:

C1 t1 1
C1 t2 2
C1 t3 3
C2 t1 1
C2 t2 2
C2 t3 3
. . .
. . .
And so on.
I want to get the sum of the amounts along with the highest three transactions.

Approximately 40 questions in all; I don't remember them all.
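The Dedup Sorted answer in question 27 can be illustrated with a Python sketch of three keep choices on key-sorted input: first, last, and unique-only. The function and the string values for `keep` are illustrative assumptions; check the component's own parameter list for the exact option names.

```python
from itertools import groupby
from typing import Callable, List, Tuple

Rec = Tuple[str, int]


def dedup_sorted(rows: List[Rec], key: Callable[[Rec], str], keep: str = "first") -> List[Rec]:
    """Input must already be sorted on `key`, as Dedup Sorted expects."""
    out: List[Rec] = []
    for _, group in groupby(rows, key=key):
        group_rows = list(group)
        if keep == "first":
            out.append(group_rows[0])
        elif keep == "last":
            out.append(group_rows[-1])
        elif keep == "unique-only":
            if len(group_rows) == 1:          # drop every key that repeats
                out.append(group_rows[0])
        else:
            raise ValueError(f"unknown keep option: {keep}")
    return out


if __name__ == "__main__":
    rows = [("a", 1), ("a", 2), ("b", 3), ("c", 4), ("c", 5), ("d", 6)]
    print(dedup_sorted(rows, lambda r: r[0], "first"))        # [('a', 1), ('b', 3), ('c', 4), ('d', 6)]
    print(dedup_sorted(rows, lambda r: r[0], "unique-only"))  # [('b', 3), ('d', 6)]
```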

Q. How does component folding work?
A.
Q. What is the advantage of Sort within Groups?

A. Sort within Groups refines the sorting of records that are already sorted on one key: it sorts the records within each group formed by that first key according to a second key. Because it works one group at a time, it avoids re-sorting the whole input.
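A minimal Python sketch of the idea follows, assuming the input is already sorted on the major key. Each major-key group is sorted on the minor key separately, so only one group at a time needs to be held and sorted. `sort_within_groups` is a hypothetical name, not an Ab Initio function.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Tuple

Rec = Tuple[str, int]


def sort_within_groups(rows: List[Rec]) -> List[Rec]:
    """rows must already be sorted on field 0 (the major key); sort each
    group on field 1 (the minor key) without re-sorting the whole input."""
    out: List[Rec] = []
    for _, group in groupby(rows, key=itemgetter(0)):
        out.extend(sorted(group, key=itemgetter(1)))
    return out


if __name__ == "__main__":
    rows = [("A", 3), ("A", 1), ("B", 2), ("B", 0)]
    print(sort_within_groups(rows))  # [('A', 1), ('A', 3), ('B', 0), ('B', 2)]
```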

Q. What are environment variables? Why are they required?

A. Environment variables, otherwise known as Ab Initio environment variables, are set in the stdenv (standard environment), under which the private and public projects sit.
Parameters like $AB_HOME and $AB_AIR_ROOT are present as environment variables and point to the respective paths.

Q. What is the need for configuration variables (ab_job, ab_max_core) in Ab Initio, and where do we define them?
A.

Q. How do you avoid duplicates without using the Dedup component?

A. To avoid duplicates, use the Rollup component.
Rollup outputs one record per key group, so records that are duplicates on the key collapse into a single record.
or
We can avoid duplicates by using the key_change function of the Rollup component.
The code will look like this:
out :: key_change(prev, curr) =
begin
  /* true when the current record differs from the previous one, i.e. a new group starts */
  out :: curr != prev;
end;

out :: rollup(in) =
begin
  out :: in;
end;
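To see what the key_change approach does, here is a Python model. A new group starts whenever key_change(previous, current) is true. The model assumes, as another answer in this document states, that Rollup keeps the last record of each group, and it emits that record. The function name is illustrative only.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def rollup_by_key_change(records: List[T], key_change: Callable[[T, T], bool]) -> List[T]:
    """Emit the last record of each run; a run ends whenever
    key_change(previous, current) returns True."""
    out: List[T] = []
    for i in range(1, len(records)):
        if key_change(records[i - 1], records[i]):
            out.append(records[i - 1])   # the previous group just ended
    if records:
        out.append(records[-1])          # flush the final group
    return out


if __name__ == "__main__":
    print(rollup_by_key_change(["a", "a", "b", "b", "b", "c"], lambda p, c: p != c))
    pairs = [(1, "x"), (1, "y"), (2, "z")]
    print(rollup_by_key_change(pairs, lambda p, c: p[0] != c[0]))  # keeps (1, 'y')
```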

Q. What happens when we pass a dot or invalid parameters in the input component's layout URL?
A.

Q. What is the use of AB_JOB in Ab Initio?

A. The AB_JOB parameter is set when we want to run the same graph at the same time under different job names.
or
When you want to run the same graph many times from one location, you use AB_JOB. It should be defined as a sandbox parameter. If you don't give it a value, AB_JOB takes its default value.

Q. How do you use a normal batch graph as a subgraph in a continuous graph?
A.

Q. How do you open Ab Initio in UNIX?

A. We cannot open Ab Initio (the GDE) in UNIX. We can only run graphs in UNIX, using the deployed .ksh.

Q. How do you do production support for a graph?

A. If the graph fails in production, we usually get emergency access to see the failure and analyze it. If it is a code bug, we go back to the development environment, fix the bug, test it, deploy it back to production, and run it again.

Q. How do you check whether a graph completed successfully or not (is it Unix's $?)?

A. If $mpjret is 0, the graph succeeded; if it is 1, it failed.

Q. What are the different return values?

A. 0 and 1:
0 is success
1 is failure
$? holds the return status of the last executed command.

Q. Why and when do we get the "Pipeline Broken" error in Ab Initio?

A. A pipeline broken error actually indicates the failure of a downstream component.
It commonly occurs when the database runs out of memory, which makes the database components in the graph unavailable.

Q. What are the two types of .dbc files?
A. Generally, .dbc files are classified into 2 types according to the value of their fixed_size_dml parameter, which can be either true or false.
If the value is false, the database generates delimited types whenever possible (it recognizes null as a zero-length string). If it is true, it uses fixed-length DML.
Other parameters for .dbc files are:
dbms
db_version
db_home
db_name
db_nodes
User & password
case
generate_dml_with_nulls
fixed_size_dml
treat_blanks_as_null
oldstyle_emptystring_as_null
fully_qualify_dml
delimited_dml_with_maximum_size
interface
environment
direct_parallel

Q. What is the use of the .mfctl and .mdir files in the mfs directory of Ab Initio?

A. .mfctl and .mdir are both related to the multifile system. .mfctl is the extension of the control file created when we use an MFS; the .mfctl file contains the URLs of all the data partitions. The file with the .mdir extension contains the URL of the control file used by the MFS.

Q. How do you separate duplicate records from a grouped input file without Dedup Sorted?

A. Use Rollup with in.*
or
Use Rollup functions such as first() and last().
Or
You can use Rollup to remove duplicate records in an input file (note that these are key-based duplicates); it will keep the last record for each key.
Or
Rollup helps you avoid duplicates without using the Dedup component.
It takes the first record and rejects the rest.

Q. How do you rerun a graph in UNIX?

A. We can rerun a graph by using the AB_JOB variable.
Or
You can rerun the graph by giving the following command in UNIX:
dtm run <recovery file name> -continue
or
Whenever a graph fails, it creates a .rec file in the working directory (which may be the directory where your deployed graph script is stored). Remove that .rec file and then run the deployed script of the graph; from UNIX you may use m_rollback -d.

Q. What do you mean by rerun?

A. Either your graph failed and you want to run it again, or you want to run multiple instances of the graph.

Q. How do you pass parameters to a graph in Ab Initio?

A. Using input parameters / graph parameters.
Or
If you want to pass a parameter to your graph, declare a formal parameter in the Edit Parameters dialog.
Or
You can declare parameters with the Edit Parameters option in the GDE; when running the .ksh, you can pass the values on the command line.

Q. Which component does not work with pipeline parallelism?

A. The Sort component; it blocks pipeline parallelism.
Or
Sort does not support pipeline parallelism because it must read all of its input before it can write any records.
Or
Sort, Sort within Groups and Rollup break pipeline parallelism.
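The blocking behaviour can be illustrated with Python generators, as an analogy only (the Co>Operating System does not work this way internally). The hypothetical reformat_component emits each record as soon as it reads it, while sort_component has to pull every record before it can emit the first one.

```python
def read_file(n, log):
    """Upstream component: yields n records in descending order."""
    for i in range(n, 0, -1):
        log.append(f"read {i}")
        yield i


def reformat_component(records, log):
    """Streams: each record is transformed and passed on immediately."""
    for r in records:
        log.append(f"reformat {r}")
        yield r * 10


def sort_component(records, log):
    """Blocks: sorted() consumes the whole input before anything is emitted."""
    for r in sorted(records):
        log.append(f"sort emits {r}")
        yield r


log = []
list(reformat_component(read_file(3, log), log))
print(log)  # reads and reformats interleave

log = []
list(sort_component(read_file(3, log), log))
print(log)  # all reads happen before the first emit
```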

Q. How does one make use of the "Call Web Service" component in the $AB_HOME/connectors/Internet directory of the component selector window of the Ab Initio Console? Explain with sample code.

A.

Q. What is patch database (IPD etc)?

A.

Q. How do you check root disk failed?

A.

Q. How do you restore whole OS backup and a selected single file?

A.
Q. How do you create SCDs (slowly changing dimensions) in Ab Initio?

A. To implement SCDs in Ab Initio, you need to do delta processing, that is, compare the incoming data with the existing dimension data to find new and changed records.

Q. How do you join two files with different layouts?

A. If the two files have totally different layouts, you can use the Fuse component. Read about it in the Ab Initio Help.
Or
To join a serial file and a multifile, use a Broadcast component after the serial file and before the Join.

Q. What is a vector field? Explain.

A. A vector field is an array-valued field: it holds a sequence of elements of one type, and its length can be fixed or given by another field. Vectors are typically used with the Normalize and Denormalize Sorted components.
Normalize generates multiple output records from each input record, one per vector element; its length function tells it how many elements (and therefore how many output records) there are.
Denormalize Sorted does the reverse: it collects a group of related records into a single output record with a vector field.
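A rough Python model of how a vector field drives Normalize and Denormalize Sorted, assuming a record is a dict and the vector is a Python list. The field names cust, phones, index, element and values are hypothetical.

```python
def normalize(record, vector_field):
    """One output record per vector element, like Normalize.

    The number of outputs equals the vector length (Normalize's length function).
    """
    base = {k: v for k, v in record.items() if k != vector_field}
    return [dict(base, index=i, element=e)
            for i, e in enumerate(record[vector_field])]


def denormalize(records, key, value_field):
    """Collect consecutive records sharing `key` into one record with a vector,
    like Denormalize Sorted (input must be grouped on `key`)."""
    out = []
    for r in records:
        if out and out[-1][key] == r[key]:
            out[-1]["values"].append(r[value_field])
        else:
            out.append({key: r[key], "values": [r[value_field]]})
    return out


rec = {"cust": "C1", "phones": ["111", "222"]}
flat = normalize(rec, "phones")              # two records, one per phone
print(denormalize(flat, "cust", "element"))  # back to one record with a vector
```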

Q. Which file should we use as a lookup file, the large file or the one with fewer records, and why?

A. We should always use the small file (the one with fewer records) as the lookup. The lookup file is kept in main memory (RAM) from the start to the end of the graph run, so the smaller the file, the better the performance.
Or
A lookup file should always be small. If its data grows every day, performance will degrade, and it is not wise to use a big file as a lookup; that defeats the purpose of a lookup.

Q. How does metadata management take place in Ab Initio?

A. Through the EME, which organizes its objects in a UNIX-like directory structure.

Q. Is there a way to implement a file listener in Ab Initio? It should continuously scan a given directory and, as soon as a file is placed in that directory, copy that file to a working directory and trigger a corresponding Ab Initio graph.

A. You can use the continuous components to build this. It requires an environment setup, though. You can read about it in the Ab Initio Help by searching for 'Continuous graphs'.
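If the continuous components are not available, the polling behaviour the question describes can be sketched generically. This is plain Python, not an Ab Initio feature; trigger is a hypothetical callback standing in for launching the graph (for example its deployed .ksh).

```python
import shutil
import time
from pathlib import Path


def poll_once(watch_dir, work_dir, seen, trigger):
    """Scan watch_dir once; copy each file not seen before to work_dir and
    call trigger(copied_path). Returns the names handled in this pass."""
    handled = []
    for path in sorted(Path(watch_dir).iterdir()):
        if path.is_file() and path.name not in seen:
            seen.add(path.name)
            target = Path(work_dir) / path.name
            shutil.copy2(path, target)   # copy, keeping timestamps
            trigger(target)              # hypothetical: start the graph here
            handled.append(path.name)
    return handled


def listen(watch_dir, work_dir, trigger, interval_seconds=30):
    """Poll forever (stop with Ctrl-C)."""
    seen = set()
    while True:
        poll_once(watch_dir, work_dir, seen, trigger)
        time.sleep(interval_seconds)
```

A production listener would also have to wait until a file is completely written before copying it (for example by waiting for a marker file); this sketch does not handle that.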

Q. How many sandboxes can there be for a project?

A. A project can have many sandboxes. Many developers can work in different sandboxes attached to a single project.
Or
We can have any number of sandboxes. A sandbox is simply a user's work area, where each user gets a copy of the project and makes modifications accordingly.
Or
There can be numerous sandboxes for a project, but there should be only one sandbox associated with the EME for a project.

Q. How will you connect two servers?
A. Connections to a different server in Ab Initio are configured through the .abinitiorc file, which is used for remote connectivity. This file contains information such as the server IP (or name), the user name and the password required to connect.
Q. How can you extract and load without transforming?
A. Provided the DML is the same, you can directly connect the input and output datasets and perform an extract and load. For example, if the input dataset is a table and the output is a file, you can connect them directly, making sure the DML of the file is propagated from the table.
Q. If I want to run the graph in Unix, what command do I need to use?
A. 1. Design the graph.
2. Save it.
3. Run it.
4. Go to the Run tab, then to Deploy, and press Deploy.
Ab Initio then generates a .ksh script for the graph in the run folder of your sandbox.
5. Go to the run folder of the sandbox, where you will find your graph.ksh, and execute that script from the Unix shell.

Q. How can I implement insert, update and delete in Ab Initio?

A. To find the records that should be inserted, updated or deleted, use this Ab Initio flow:
a. Unload the master table.
b. Read the delta file.
c. Use an inner Join to join a and b. The unused records from a are your delete records (if required), the unused records from b are your insert records, and the joined records of a and b are your update records.
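A minimal Python sketch of how the three inner-Join outputs map to insert, update and delete sets. It uses dict lookups instead of a sorted merge and assumes unique keys; the field names are hypothetical. Every matched key is treated as an update here; a real flow would usually also compare the non-key fields to skip unchanged rows.

```python
def classify_delta(master, delta, key):
    """Split records the way an inner Join's ports would.

    master plays in0 (the unloaded table), delta plays in1 (the delta file).
    Matched keys -> updates, unused0 -> deletes, unused1 -> inserts.
    """
    m = {r[key]: r for r in master}
    d = {r[key]: r for r in delta}
    inserts = [d[k] for k in d if k not in m]   # unused1
    updates = [d[k] for k in d if k in m]       # joined records
    deletes = [m[k] for k in m if k not in d]   # unused0
    return inserts, updates, deletes


master = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]
delta = [{"id": 2, "v": "B"}, {"id": 3, "v": "c"}]
print(classify_delta(master, delta, "id"))
```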

Q. How do you view an MFS in Unix?

A. To view an MFS in Unix, run the m_expand command.

Q. What is the difference between conditional DML and a conditional component?

A. Conditional DML can be passed as a program variable, whereas a conditional component is used only when the condition passed to the graph is true.

Q. What is the difference between In-memory Sort and Inputs must be sorted?

A. The In-memory sort and Inputs must be sorted options are available in the Join, Rollup and Dedup components.
The main difference is that if you select Inputs must be sorted in these components, the downstream components get the records in sorted order. If you select In-memory sort, the downstream components do not get sorted records.

Q. A graph failed; how do you recover it?
A. There are several reasons a graph can fail. One specific answer:

If the graph fails, Ab Initio creates a .rec file in the run directory of your sandbox. To roll back the graph, use the m_rollback command at the Unix prompt, or use the m_cleanup utilities.

Q. What is meant by header and trailer? Suppose the header and trailer contain junk data; how do you delete the junk data, and which components are used?
A. 1. If you know the signature of the header and trailer records, use a Filter by Expression component to filter out the header and trailer records.

2. Use a Reformat component and, inside the transformation, use the next_in_sequence() function to assign a unique number to each record; then use a Filter by Expression component to filter the records based on the sequence numbers.

3. Follow step 2, but use the Leading Records component instead of Filter by Expression to filter out the header and trailer records.
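Option 2 relies on record positions. A short Python sketch, assuming 1-based sequence numbers like next_in_sequence() and a known number of header and trailer records. Dropping the trailer by position requires the total record count, so in a graph you would need that count first.

```python
def strip_header_trailer(records, header_count=1, trailer_count=1):
    """Keep records whose 1-based sequence number lies between the header
    and the trailer, as a next_in_sequence()-based filter would."""
    total = len(records)  # needed to locate the trailer by position
    kept = []
    for seq, rec in enumerate(records, start=1):
        if header_count < seq <= total - trailer_count:
            kept.append(rec)
    return kept


print(strip_header_trailer(["HDR", "r1", "r2", "TRL"]))  # ['r1', 'r2']
```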

Q. There are 10,000 records and I loaded 4,000 of them today; I need to load records 4001 to 10,000 the next day. How is this done in Type 1 and in Type 2?
A. Simply take a Reformat component and put next_in_sequence() > 4000 in the select parameter.

Q. What are the steps in actual Ab Initio graph processing, including general, pre- and post-process settings?
A. 1. Start script
2. Graph components
3. End script

Q. What are .air-project-parameters and .air-sandbox-overrides? What is the relation between them?
A. .air-project-parameters
Contains the parameter definitions of all the parameters
within a sandbox. This file is maintained by the GDE and
the Ab Initio environment scripts.

.air-sandbox-overrides
This file exists only if you are using version 1.11 or a
later version of the GDE. It contains the user's private
values for any parameters in .air-project-parameters that
have the Private Value flag set. It has the same format as
the .air-project-parameters file.

When you edit a value (in the GDE) for a parameter that has the
Private Value flag checked, the value is stored in the
.air-sandbox-overrides file rather than the
.air-project-parameters file.

Q. In the Join component, which records go to the unused ports and which go to the reject port?
A. In an inner join, all the records that do not match on the specified key go to the respective unused ports; in a full outer join, none of the records go to the unused ports. Records that do not match the DML go to the reject port.
OR
In an inner join, all the records that do not match on the specified key go to the respective unused ports; in a full outer join, none of the records go to the unused ports.
Records for which the join transform fails (for example, an output field evaluates to NULL) go to the reject port. The component aborts only when the number of rejected records exceeds limit + ramp * number_of_input_records_so_far.
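The ramp/limit rule can be checked with a small Python sketch. In this model, Abort on first reject corresponds to limit 0 and ramp 0.0, and Never Abort to an infinite limit; the function names are hypothetical.

```python
def should_abort(rejects_so_far, records_so_far, limit, ramp):
    """True once rejected records exceed limit + ramp * records processed so far."""
    return rejects_so_far > limit + ramp * records_so_far


def first_abort(reject_flags, limit, ramp):
    """Return the 1-based record number at which the component would abort,
    or None if it never does. reject_flags holds one bool per input record
    (True means that record was rejected)."""
    rejects = 0
    for n, rejected in enumerate(reject_flags, start=1):
        rejects += rejected
        if should_abort(rejects, n, limit, ramp):
            return n
    return None


print(first_abort([False, True, False], limit=0, ramp=0.0))  # 2
```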

Q. What is meant by repartitioning, and in how many ways can it be done?
A. Repartitioning means changing one or both of the following:
1) The degree of parallelism of partitioned data
2) The grouping of records within the partitions of
partitioned data

Q. How do you create a surrogate key using Ab Initio?
A. There are many ways to create a surrogate key, and the right one depends on your business logic. Here are some ways you can try:

1. Use the next_in_sequence() function in your transform.

2. Use the Assign Key Values component (if your GDE version is higher than 1.10).

3. Write a stored procedure for this and call it wherever you need it.
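next_in_sequence() restarts in every partition of a parallel flow, so on its own it produces duplicate keys across partitions. One common pattern, sketched here in Python on the assumption that the DML functions this_partition() (0-based) and number_of_partitions() are available in your version, scales the sequence by the partition count and adds the partition number.

```python
def surrogate_keys(partition_sizes):
    """Simulate next_in_sequence() * number_of_partitions() + this_partition().

    partition_sizes[p] is the record count of partition p (0-based).
    Keys are unique across partitions but not contiguous.
    """
    n = len(partition_sizes)                     # number_of_partitions()
    keys = []
    for p, size in enumerate(partition_sizes):   # p plays this_partition()
        for seq in range(1, size + 1):           # next_in_sequence() restarts at 1
            keys.append(seq * n + p)
    return keys


keys = surrogate_keys([3, 2, 3])
print(len(keys), len(set(keys)))  # 8 8: no duplicates
```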
Q. What is a semi-join?
A. In Ab Initio there are three types of join: 1. inner join, 2. outer join and 3. semi-join. For an inner join, the record_requiredN parameter is true for all in ports. For an outer join, it is false for all the in ports. If you want a semi-join, set record_requiredN to true for the required in port and false for the other in ports.
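A Python sketch of how the record_requiredN flags select the join type, assuming unique keys on each input (a dict stands in for the sorted merge). A missing side is returned as None; the function name and fields are hypothetical.

```python
def join(in0, in1, key, record_required0=True, record_required1=True):
    """Return (key, in0_record, in1_record) tuples.

    True/True   -> inner join
    False/False -> full outer join
    True/False  -> keep every in0 key (the "semi-join" described above)
    """
    left = {r[key]: r for r in in0}
    right = {r[key]: r for r in in1}
    out = []
    for k in sorted(set(left) | set(right)):
        l, r = left.get(k), right.get(k)
        if (l is None and record_required0) or (r is None and record_required1):
            continue  # a required input has no record for this key
        out.append((k, l, r))
    return out


a = [{"id": 1}, {"id": 2}]
b = [{"id": 2}, {"id": 3}]
print([k for k, _, _ in join(a, b, "id", True, False)])  # [1, 2]
```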
Q. How will you ensure that components created in one version do not malfunction or stop working in another version?
A. The runtime behaviour of components remains the same across versions unless a version requires an additional parameter to be defined. In my observation so far, each new version of the ETL tool comes with some changes in component-level parameters.
Or
Components should be compatible with previous versions of the GDE, and deprecated components still run in new versions.
Q. What data modelling do you follow while loading data into tables? Also, does the DB you are inserting the data into have a star schema or a snowflake schema?
A.
Q. How does the force_error function work? If we set Never Abort in a Reformat, will force_error stop the graph, or will it continue to process the next set of records?
A. You can set the reject threshold of the Reformat component in two ways:
1. If you want the graph to fail, set the reject threshold to Abort on first reject.
2. If you don't want it to fail, set it to Never Abort.

force_error is used to abort a graph if conditions are not met: you can write the error records to a file and then abort the graph. This can be done in different ways.
Or
With Never Abort, the force_error() function does not stop the graph; it writes the error message to the error port for that record and processes the next record.
Q. Phase versus checkpoint?
A. Phases break the graph into separate blocks that run one after another. A phase creates temporary files while running and deletes them once it completes.

A checkpoint is used for recovery. When the graph is interrupted, instead of rerunning it from the start, execution resumes from the last completed checkpoint.
Q. What is the function of XFR in Ab Initio? It would be great if someone could briefly explain what xfr is (what it does, where it is stored, how it is used).
A. When you create a new sandbox in the Ab Initio environment, the following directories are created:
1. mp
2. dml
3. xfr
4. db
etc.

xfr is the directory where we can write our own functions and use them in transformations (Rollup, Reformat, etc.).

For example, you can write a function to convert a string into a decimal or to get the maximum length of a string. You can put such functions in a file called user_define_function.xfr in the xfr directory and, inside this file, define functions called string_to_integer, get_string_max_length or both. In any transform component you can include the file like this:
include "<full path>/user_define_function.xfr";

You can then call the function like any other function in Ab Initio.
Q. What is the difference between the three types of parallelism?
A. There are three types of parallelism:
1. Component parallelism: different components run simultaneously on different data.
2. Pipeline parallelism: connected components run simultaneously on the same data stream, each downstream component processing records as soon as the upstream component emits them. Sort-based components break pipeline parallelism.

Ex: Sort, Sort within Groups, Aggregate, Rollup, Join, etc.
3. Data parallelism: records are distributed into multiple partitions using partition components, and the partitions are processed in parallel.
Q. How can I calculate the total memory requirement of a graph?
A. You can roughly calculate the memory requirement as follows:

1. Each partition of a component uses about 7 MB + max-core (if any).

2. Add the size of the lookup files used in the phase (if multiple components use the same lookup, count it only once).

3. Multiply by the degree of parallelism. Add up all the components in a phase; that is how much memory is used in that phase.

4. The total memory requirement of the graph is that of its largest-memory phase, because phases run one after another.
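The rule of thumb above, translated into Python arithmetic. The 7 MB per-partition overhead is the document's rough figure; the sketch assumes a single degree of parallelism per phase and counts each distinct lookup file once per partition. All names are hypothetical.

```python
def phase_memory_mb(max_cores_mb, lookup_sizes_mb, degree_of_parallelism,
                    overhead_mb=7):
    """Rough per-phase estimate.

    max_cores_mb: max-core of each component in the phase (0 if none).
    lookup_sizes_mb: sizes of the distinct lookup files used in the phase.
    """
    per_partition = sum(overhead_mb + mc for mc in max_cores_mb) + sum(lookup_sizes_mb)
    return per_partition * degree_of_parallelism


def graph_memory_mb(phases):
    """Phases run one after another, so the largest phase sets the requirement."""
    return max(phase_memory_mb(*phase) for phase in phases)


# Phase 0: a Reformat (no max-core) and a Sort (100 MB max-core), a 50 MB lookup, 4-way.
# Phase 1: one Reformat, no lookups, 4-way.
print(graph_memory_mb([([0, 100], [50], 4), ([0], [], 4)]))  # 656
```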

Q. How can I achieve a cumulative summary in Ab Initio other than by using the Scan component? Is there any built-in function available for that?
A. Scan is really the simplest way to achieve this. Another way is to use a Rollup, since it is a multistage component. You need to put the Rollup component into multistage format and write the intermediate results to a temporary array (a vector, in Ab Initio terms). The Rollup loops through each record in your defined group.

Let's say you want intermediate results by date. First sort your data by {ID; DATE}, then roll up by {ID}. The Rollup executes its transformation for each record per ID, so store your results in a temporary vector initialized to the size of your largest group. Each time the Rollup enters the transformation, write to position [i] in the vector and increment i. As long as all of this is done in the "rollup" transformation and not in the "finalize" transformation, the Rollup runs the "initialize" portion before it moves to the next ID.

I have done it this way, but Scan is easier. I was doing a simpler rollup before I found that I needed cumulative intermediate results, so I just modified my existing Rollup. The Ab Initio documentation does not explain this technique in detail, but it can be done.
or
There are three ways:
1) Use the Scan with Rollup component.
2) Use the Rollup component.
3) Use Scan followed by Dedup Sorted and select the last record. That will serve the purpose.
or
Other than Scan, we can use Rollup to produce the cumulative summary.
Or
Use the built-in Ab Initio component Scan with Rollup.
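For reference, this is what Scan computes, modelled in Python: a running total per key that restarts for each new key, with one output record per input record. The input is assumed to be sorted on the key and the ordering field (for example {ID; DATE}); the field names are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def scan_running_total(records, key, value):
    """Emit every record with a running total that restarts per key group."""
    out = []
    for _, group in groupby(records, key=itemgetter(key)):
        total = 0                      # like Scan's initialize
        for r in group:
            total += r[value]          # like Scan's scan step
            out.append(dict(r, running_total=total))  # one output per input
    return out


rows = [
    {"id": "A", "date": 1, "amt": 10},
    {"id": "A", "date": 2, "amt": 5},
    {"id": "B", "date": 1, "amt": 7},
]
print([r["running_total"] for r in scan_running_total(rows, "id", "amt")])  # [10, 15, 7]
```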
Q. I have a file containing 5 unique rows, and I am passing them through a Sort component with a null key and passing the output of the Sort to Dedup Sorted. What will happen, and what will the output be?
A. If no key is used in the Sort component and the Dedup Sorted key is also empty, all records form a single group, so the output depends on the keep parameter:
If it is set to first, the output has only the first record.
If it is set to last, the output has only the last record.
If it is set to unique_only, the output file has no records.
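A Python model of the keep parameter of Dedup Sorted, assuming the Dedup Sorted key is also empty ({}), so all five records fall into one group. If the key were the whole record instead, all five unique rows would pass through.

```python
from itertools import groupby


def dedup_sorted(records, key_fields, keep):
    """keep is 'first', 'last' or 'unique_only', applied per key group.

    key_fields=() models the empty key {}: every record is in one group.
    """
    out = []
    for _, grp in groupby(records, key=lambda r: tuple(r[f] for f in key_fields)):
        grp = list(grp)
        if keep == "first":
            out.append(grp[0])
        elif keep == "last":
            out.append(grp[-1])
        elif keep == "unique_only" and len(grp) == 1:
            out.append(grp[0])
    return out


rows = [{"n": i} for i in range(5)]
print(dedup_sorted(rows, (), "first"))        # [{'n': 0}]
print(dedup_sorted(rows, (), "last"))         # [{'n': 4}]
print(dedup_sorted(rows, (), "unique_only"))  # []
```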
Q. Can we process 1 GB of data (1 million records) using a lookup? How?
A. I think it is not advisable to use a 1 GB lookup file; it will affect the parallel processing of other applications and degrade performance.

I would prefer a multifile (MFS) lookup file rather than a serial lookup file in this case.

Q. If I have two files with fields file1(A,B,C) and file2(A,B,D), and we partition both files on key A using Partition by Key and pass the output to a Join component, will the join work if the join key is (A,B), and why?
A.

Q. In my sandbox I have 10 graphs, and I checked those graphs into the EME. Then I checked out a graph, made modifications and found that the modifications were wrong. What do I have to do to get the original graph back?
A.
Q. How do I create subgraphs in Ab Initio?

Q.What is a sandbox?
A. A sandbox is a directory structure in which each directory level is assigned a variable name. It is used to manage check-in and checkout of repository-based objects such as graphs.

fin -------> top level directory ( $AI_PROJECT )
|
|---- dml -------> second level directory ( $AI_DML )
|
|----- xfr -------> second level directory ( $AI_XFR )
|
|----- run --------> second level directory ( $AI_RUN )
|

You'll need a sandbox when you use the EME (repository software) for release control.

An identical structure will exist within the EME for the same project.

The structure above will exist under the OS (e.g. Unix), for instance for the project called fin, which is usually the name of the top-level directory.

In the EME, a similar structure will exist for the project fin.

When you check out or check in a whole project or an object belonging to a project, the information is exchanged between these two structures.

For instance, if you check out a dml called fin.dml for the project called fin, you need a sandbox with the same structure as the EME project called fin. Once you've created that, as shown above, fin.dml (or a copy of it) will come out of the EME and be placed in the dml directory of your sandbox.

Q. I have a job that does the following: it FTPs files from a remote server, reformats the data in those files and updates the database, then deletes the temporary files. How do we trap errors generated by Ab Initio when an FTP fails? If I have to rerun or restart a graph, what points should be considered? Does the *.rec file have anything to do with it?

A. Ab Initio has very good restartability and recovery features built in. In your situation, you can do the tasks you mentioned in one graph with phase breaks.

Do the FTP in phase 0, the transformation in the next phase and the DB update in another phase. (This is just an example and may not be the best way to do it; the best design depends on various other factors.)

If the graph fails during the FTP in phase 0, you can simply restart the graph. If it fails in phase 1, an AB_JOB.rec file exists, and when you restart the graph you will see a message saying that a recovery file exists and asking whether you want to start from the last successful checkpoint or restart from the beginning. The same applies if it fails in phase 2.

Phases are expensive from a disk I/O perspective, so be careful not to use too many phases.

Coming back to error trapping: each component has reject, error and log ports. The reject port captures rejected records, the error port captures the corresponding errors, and the log port captures the execution statistics of the component. You can control the reject behaviour of each component by setting the reject threshold to "Never Abort", "Abort on first reject" or a "ramp/limit" setting.

Recovery files keep track of crucial information for recovering the graph from a failed state, such as which node each component is executing on. It is a bad idea to just remove the *.rec files; you always want to roll back the recovery files cleanly so that temporary files created during graph execution don't hang around, occupy disk space and cause issues.

Always use m_rollback -d.

Q. What is Ad hoc multifile? How is it used?
A. Here is a description of Ad hoc multifile:

Ad hoc multifiles treat several serial files having the same record format as a single graph component.

Frequently, the input of a graph consists of a set of serial files, all of which have to be processed

as a unit. An Ad hoc multifile is a multifile created 'on the fly' out of a set of serial files,

without needing to define a multifile system to contain it. This enables you to represent the needed

set of serial files with a single input file component in the graph. Moreover, the set of files used by

the component can be determined at runtime. This lets the user customize which set of files the graph

uses as input without having to change the graph itself, even after it goes into production.

Ad hoc multifiles can be used as output, intermediate, and lookup files as well as input files.

The simplest way to define an Ad hoc multifile is to list the files explicitly as follows:

1. Insert an input file component in your graph.
2. Open the properties dialog. Select Description tab.
3. Select Partitions in the Data Location of the Description tab
4. Click Edit to open the Define multifile Partitions dialog box.
5. Click New and enter the first file name. Click New again and enter the second file name and so on.
6. Click OK.

If you have added n files, the input file now acts something like a file in an n-way multifile system whose data partitions are the n files you listed. Components can run in the layout of the input file component. However, there is no way to run commands such as m_ls or m_dump on the files, because they do not make up a real multifile system.

There are other ways than listing the input files explicitly in an Ad hoc multifile.

1. Listing files using wildcards - If the input file names have a common pattern then you can use a

wild card for all the files. E.g. $AI_SERIAL/ad_hoc_input_*.dat. All the files that are found at the

runtime matching the wild card pattern will be taken for the Ad hoc multifile.

2. Listing files in a variable. You can create a runtime parameter for the graph and inside the

parameter you can list all the files separated by spaces.

3. Listing files using a command, e.g. $(ls $AI_SERIAL/ad_hoc_input_*.dat), which produces the list of files to be used for the ad hoc multifile. This method gives maximum flexibility in choosing the input files, since you can also use complex commands that involve the owner of a file or its date/time stamp.

Q. What is the difference between Replicate and Broadcast?
A. Broadcast and Replicate are similar components, but generally Replicate is used to increase component parallelism, emitting multiple straight flows to separate pipelines. Broadcast is used to increase data parallelism by feeding records to fan-out or all-to-all flows.
Or
Replicate is an older component than Broadcast. You can use Broadcast to feed a Join (for example, to send a serial file to every partition of a multifile join), whereas you can't use Replicate that way. By default, Replicate uses a straight flow, and Broadcast uses a fan-out or all-to-all flow.
Broadcast is used for data parallelism, whereas Replicate is used for component parallelism.
Or
Replicate

Supports component parallelism

Input File -------> Replicate --------> Format ---->Output File
|
|
|
--------->Rollup-------> output File

Broadcast

Supports data parallelism

Input File1 (MF) -----------------> JOIN -----------> Output File
^
|
|
Input File 2(Serial)---> Broadcast -->

Input File 2 is a serial file, and it is being joined with a multifile, Input File 1, without being partitioned. The Broadcast component writes its data to all partitions of Input File 1's layout, creating an implicit fan-out flow.
Or
The short answer is that Replicate copies a flow while Broadcast multiplies it. Broadcast is a partitioner, whereas Replicate is a simple flow-copy mechanism.

Replicate appears in over 90% of all Ab Initio graphs (across all implementations worldwide), whereas Broadcast appears in less than 1% of all graphs.

You won't see any difference between the two until you start using data parallelism; then things go south rather quickly. Here's an experiment:

Use a simple serial input file, followed by a Broadcast, then a 4-way multifile output file component. If you run the graph with, say, 100 records in the input file, it will create 400 records in the output file: 100 records for each flow partition encountered.

If you had used Replicate, it would have read and written 100 records.
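The counts in that experiment, and the reason Broadcast is placed before a Join with a multifile, can be modelled in Python. This is a sketch of the data movement only; the function names are hypothetical and partitions are plain lists.

```python
def broadcast(records, n_partitions):
    """Fan-out: every downstream partition receives a full copy of the input."""
    records = list(records)
    return [list(records) for _ in range(n_partitions)]


def replicate(records, n_ports):
    """Flow copy: each connected output port gets the same, unpartitioned records."""
    records = list(records)
    return [list(records) for _ in range(n_ports)]


def partitioned_join(mf_partitions, serial_records, key):
    """Join each multifile partition against its own broadcast copy of the
    serial file, so every partition can find its matches locally."""
    copies = broadcast(serial_records, len(mf_partitions))
    out = []
    for part, copy in zip(mf_partitions, copies):
        lookup = {r[key]: r for r in copy}
        out.extend((r, lookup[r[key]]) for r in part if r[key] in lookup)
    return out


rows = range(100)
print(sum(len(p) for p in broadcast(rows, 4)))  # 400 records in the 4-way file
print(sum(len(p) for p in replicate(rows, 1)))  # 100 records
```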

Hi, I just went through 8 Ab Initio interviews, and some of the tough
questions were as follows.

1. What function would you use to convert a string into a decimal?

2. How many types of parallelism are there in Ab Initio? Define the three.

3. What is the difference between a db config (.dbc) file and a .cfg file?

4. Have you ever encountered an error called "depth not equal"? (This
apparently occurs when you create graphs extensively... kind of a trick
question.)

5. How do you truncate a table? (Each candidate would give only one of the
several ways to do this.)

6. How do you improve the performance of a graph?

7. What's the difference between partitioning by key and round robin?

8. Have you worked with packages?

9. How do you add default rules in a transform?

10. What is a ramp limit?

11. Have you used the Rollup component? Describe it.

12. How many components are in your most complicated graph?

13. Do you know what a local lookup is?
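For question 7, a Python sketch contrasting the two partitioning schemes. zlib.crc32 is only a stand-in hash (Ab Initio's own hash function differs), and block_size mirrors the idea of dealing records out in blocks; the field name cust is hypothetical.

```python
import zlib


def partition_by_key(records, key, n):
    """Equal key values always land in the same partition (hash modulo n)."""
    parts = [[] for _ in range(n)]
    for r in records:
        parts[zlib.crc32(str(r[key]).encode()) % n].append(r)
    return parts


def partition_round_robin(records, n, block_size=1):
    """Deal records out in turn: balanced sizes, but equal keys may be split."""
    parts = [[] for _ in range(n)]
    for i, r in enumerate(records):
        parts[(i // block_size) % n].append(r)
    return parts


rows = [{"cust": c} for c in "AABBBC"]
print([[r["cust"] for r in p] for p in partition_round_robin(rows, 2)])
# [['A', 'B', 'B'], ['A', 'B', 'C']]: customer A is split across partitions
```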

Latest Features in Ab Initio - 2.14
Dynamic Script Generation is the latest buzz in the Ab Initio world and one of its finest features. It comes with many advantages that were not there in earlier versions of the Ab Initio Co>Operating System. It is available in Co>Operating System version 2.14.46 and above.
This feature enables the use of Ab Initio PDL (Parameter Definition Language) and Component Folding.
If we enable this feature by changing the script generation method to Dynamic in Run Settings, we can run a graph without deploying it through the GDE. From then on we execute only the mp file; there is no need for the ksh. On the production server, when we run the mp file with the air sandbox run command, it generates on the fly a reduced script that contains the commands to set up the host environment. It doesn't include the component details of the graph at all.
You can check the mp file of a graph with dynamic script generation enabled; it is an editable text file.
Component Folding: a feature by which the Co>Operating System combines a group of components and runs them as a single process. Does it improve performance? Yes, in most cases it brings a significant performance boost over the traditional approach of execution.

Prerequisites of Component Folding:
• The components must be foldable.
• They must be in the same phase and layout.
• They must be connected via a straight flow.

How it works (Advantages):
1. When folding is enabled by checking the folding option in Run Settings, the Co>Operating System runtime folds the foldable components into a single process. As a result, fewer processes are created when a graph executes. Every process has overheads for creation, scheduling, memory consumption, etc. These overheads vary from OS to OS; in some, such as MVS, creating and maintaining processes is very costly compared with the various flavors of UNIX.
2. Another major benefit of component folding is reduced DML interpretation time between processes, because the folded components end up as multitool processes communicating with other multitool or unitool processes.
3. Apart from that, more processes mean more interprocess communication. Moving data between two or more processes consumes not only time but also memory. In a CFG (Continuous Flow Graph), interprocess communication is always very high, so it is worth enabling component folding in a CFG.
Disadvantages of Component Folding:
1. Pipeline Parallelism: because component folding puts different components into a single process, it hurts Ab Initio's pipeline parallelism. Suppose the flow of our graph is Input File -> Filter By Expression -> Reformat -> Output File. In the traditional method, thanks to pipeline parallelism, FBE and Reformat execute concurrently. Once these two components are folded together, there is no chance of parallel execution.
2. Address Space: on a 32-bit OS, the maximum address space for a process is 4 GB. So if we combine 4 different components into a single process by component folding, the OS allows only 4 GB of address space for all 4 instead of 4 x 4 = 16 GB in total. So we should avoid folding components whose memory use is very high, such as in-memory Rollup, Join, and Reformat with lookups. Some components, such as Sort and in-memory Join, buffer data internally; combining them in a single process results in writing to disk (higher I/O).

Set the AB_MULTITOOL_MAXCORE variable to limit the maximum allowable memory for the folded component group.
Excluding any component from Component Folding:
Sometimes you may wish to prevent components from being folded, to allow pipeline parallelism or to access more address space. Then you need to exclude those components from folding.
Set the AB_FOLD_COMPONENTS_EXCLUDE_MPNAMES configuration variable to a space-separated list of the mpnames of those components, either in your $HOME/.abinitiorc or the system-wide $AB_HOME/config/abinitiorc file, or as an environment variable, e.g.
export AB_FOLD_COMPONENTS_EXCLUDE_MPNAMES="hash-rollup reformat-transform"
Another way to prevent two components from being folded together is to right-click the flow between them and uncheck the Allow Component Folding option.
Everything has its cost, so it is always worth benchmarking before making a decision. Prevent and allow component folding for the components of your graph, and tune it for the highest performance.
CPU tracking report of folded components in a graph:
To report the execution details of a folded graph on the console, override the AB_REPORT variable with the show-folding option, e.g. AB_REPORT="show-folding flows times interval=180 scroll=true spillage totals file-percentages".
Folded components are displayed as multitool processes in the CPU tracking information. The CPU time of a folded component is shown twice: once for the component itself and once as part of the multitool component.
Parameter Definition Language (PDL):
PDL is used to put logic for inline computation into a parameter value. It provides great flexibility in terms of interpretation and supports both $ and ${} substitution. To use it, set the parameter's interpretation to PDL and write the DML expression within $[ ]. This approach is much faster than traditional shell scripting and is the way forward to a more flexible and robust design technique. With it we can abolish the old shell scripting, as script-start and script-end have been beaten to death over the last few years. You can also use PDL interpretation for the condition of a component.
NOTE: The documentation of PDL within the GDE lacks consistency. Basically, we can use the majority of the Ab Initio DML functions. I would recommend the metaprogramming section for starters, then playing with the parameters editor.

e.g.
Suppose a graph has a conditional component that runs based on the existence of a file called emp.dat.
The FILE_NAME parameter is defined as /home/xyz/emp.dat, and a condition parameter called EXIST is defined as
$[if (file_information($'FILE_NAME').found) 1 else 0]
We can define types and transform functions for use in parameters with the help of the AB_DML_DEFS parameter.
e.g. Suppose AB_DML_DEFS is defined as
out :: sqrt(in) = begin out :: math_sqrt(in); end;
If a parameter called SQRT is then defined as $[sqrt(16)], its resolved value will be 4.
Ensure dynamic script generation is selected in your host run settings, and read the 2.14 patchset notes for details.

18-12-10 Syntel (consider IT done) questions for 2+ years' experience

Unix:
1. I have a file as

a
ab
abc
abcd
abcde..
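I want the line 'abc'.
A) grep. Note that a plain grep abc also matches abcd and abcde; to get only the line that is exactly 'abc', anchor the pattern: grep -x 'abc' file (or grep '^abc$' file).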

2. I am in a subdirectory. I want to list all the files in the parent directory without going to that directory, i.e. from the subdirectory.
A) I said ls -lrt with the complete path (ls -lrt .. also works).

DWH concepts: AND Abinitio

1. What are the top-down and bottom-up approaches in DW?
2. What are dimensions, and how many types are there?
3. What is a conformed dimension? (Other dimensions were also asked.)
4. What is SCD? How many types are there? Have you heard about Type 3 and Type 4?
5. Have you developed SCD Type 2? If so, how?
6. I have two files, yesterday's file and today's file. Today's file contains 150 records, 100 of which are from yesterday's file, and yesterday's file contains only those 100 records.
How do you filter out the 50 new records without using a join?
7. If you give unsorted inputs to a join and use the in-memory option, is the output in sorted or unsorted order?
8. I have 100 records in a multifile. I want to distribute these records into 2 output files. If I use Replicate and Broadcast, is the result the same or different? If different, how many records do the output files contain in each case?
9. How do you strip out the header and trailer if there are no indicators?
10. Which DB components did you use?
11. What do you mean by parallel unloading?
12. Do you use a table layout or a parallel hint in parallel unloading?
13. Why do you need the ABLOCAL() construct?
14. What are conventional loading and direct loading in a DB?
15. Have you used psets, PDL or .ksh scripts for running graphs?
16. Have you heard of conditional components?
17. I have given a condition in a component in phase 0 which is checked in a component in phase 1.
Is it possible to run the phase 1 component?
18. How do you count the number of records in a file? Which component do you use for that?
I said Rollup.
They asked what I do in the Rollup.
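Outside Ab Initio, questions 6 and 9 have short Unix analogues. This is a minimal sketch, assuming newline-delimited files in which a record is byte-for-byte identical in both days' files. The function names are hypothetical. `grep -vxFf` reads yesterday's records as whole-line fixed-string patterns and prints today's records that match none of them. No join is needed, and today's record order is kept. `sed '1d;$d'` drops the first and last lines.

```sh
#!/bin/sh
# Q6 analogue: records in today's file ($2) that are not in yesterday's ($1).
# -f: patterns from file, -F: fixed strings, -x: whole line, -v: invert.
new_records() {
  grep -vxFf "$1" "$2"
}

# Q9 analogue: without indicator fields, the header is simply the first
# line and the trailer the last line, so delete those two positions.
strip_header_trailer() {
  sed '1d;$d' "$1"
}
```

Usage: `new_records yesterday.dat today.dat > new_records.dat`.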
22-12-10, 3:00 PM-3:30 PM, Capgemini (Pune)

1. Tell me a little about yourself and your project.
2. How many graphs have you developed so far?
3. Which components have you used in your graphs?
4. I have a table which contains a numeric column. I want to filter the records into files based on these conditions:
>40 one file
<40 one file
>60 one file

A) I answered "Partition by Expression".

They asked for another component.
I answered: Reformat.
What will you do in the Reformat?
Ans: I use output_index for that.
What is output_index?
Ans: When you want to send each record to one particular output port, you use output_index.
5. I want to distribute my records in an 80:20 ratio.
A) Partition by Percentage.
Q) Any alternative? I don't want to partition the data; I just want to filter it.
A) No idea.
6. Have you used Rollup?
What are the 2 phases of Rollup?
I have 2 records in a file, as follows:

Sid subname Marks
100 English 90
100 Maths 90

My key is Sid and I write the sum(Marks) function. What is the output?
Ans: 100 Maths 180

Why don't you get 100 English 180?
It isn't possible, because Rollup gives the last record in each group.
7. Have you worked with psets?
No.
8. How do you run a graph in the background?
Ans: Using a wrapper script.
9. What is dependency analysis?
10. How do you do dependency analysis?
11. What is the "Capital One" server name?
12. How did you write scripts in your Capital One project?
Ans: I didn't write any scripts in my projects.
Q) Yes, but you got the scripts from the server. How did you get them?
Ans: I don't remember.
13. Have you worked with DB components?
Yes.
Q) If I don't give any update statements to my Update Table component, will it run?
Ans: Yes, it will run. If you don't provide any inputs to your Update Table, the statements in the 'insert SQL file' are executed.
14. I have an input file. I want to load it into 2 different DB tables. What does the graph look like?
Ans: input file --> Replicate --> Oracle output table
                              --> DB2 output table
Q) Is there anything in between the components?
Nothing.
Q) What about the .dbc files for the 2 tables?
A) The .dbc files are different for the two tables, depending on the DB you are using.
15. What is a .dbc file?
16. I have 2 files with only one column each:
1) 1,2,3
2) 1,1,2
If I perform an inner join, what is the output?
A) 1,1.1,2,2
17. What is the .abinitiorc file?
Ans: It contains configuration information, e.g. when one machine wants to connect to another machine, it must have a .abinitiorc file.
18. What variables does it contain?
Ans: AB_HOME etc.
Q) What does AB_HOME contain?
Ans: The path on the host machine.
Q) What does "host" mean?
Ans: The machine where the Co>Op is installed.
19. What does AB_WORK_DIR contain?
Ans: It is the temporary work directory.
20. What is AB_DATA_DIR?
Ans: It is a temporary data directory.
21. When I run the graph I get an error saying "no space ...". What does it mean?
Ans: It means your temporary directory is full.
Q) How can you solve the problem?
Ans: I delete unnecessary files in the temporary directory and then run the graph again.
Q) How can you delete the temporary files?
A) Using m_rollback or m_cleanup.
Q) What is the difference?
A) m_rollback rolls back a partially completed graph to its previous state.
m_cleanup cleans up the files left over from an unsuccessful run of the graph.
22. What is a .rec file?
Ans: It is the recovery file; it holds temporary data while the graph is running.
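The inner-join question (files 1,2,3 and 1,1,2) can be checked with the Unix `join` command. Like an inner join, `join` emits one output line for every pair of matching records. This is a minimal sketch and a Unix analogue, not the Ab Initio Join component itself; the temporary file names are illustrative. Key 1 appears once in the first file and twice in the second, so it produces two records. Key 2 produces one record, and key 3 has no match.

```sh
#!/bin/sh
# Write the two single-column inputs from the question (one value per
# line, already sorted, as join requires) and inner-join them on column 1.
inner_join_demo() {
  dir=$(mktemp -d)
  printf '1\n2\n3\n' > "$dir/in0"
  printf '1\n1\n2\n' > "$dir/in1"
  join "$dir/in0" "$dir/in1"   # prints 1, 1 and 2 on separate lines
  rm -rf "$dir"
}

inner_join_demo
```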

UNIX:

1. Have you been involved in scripting?
Ans: Yes.
2. Write a script for the following scenario.
I have two files. If the first file is bigger than the second, print that the first file is greater; else if the second is bigger, print that the second is greater; else print that both are equal.
Write a script for that.
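Ans:
# wc -c < file prints only the byte count, with no file name
lof1=$(wc -c < filename1)
lof2=$(wc -c < filename2)
if [ "$lof1" -gt "$lof2" ]; then
    echo "First file is bigger"
elif [ "$lof1" -lt "$lof2" ]; then
    echo "Second file is bigger"
else
    echo "Both are equal"
fi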

Q) It is not good, because we don't measure file size in characters; that is not always correct.
Can you suggest an alternative?
Ans: No.
3. I have different files in my directory. Among them I want to print the files that are 0 bytes in size.
Which command do you use?
m_touch command.
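A standard Unix way to list zero-byte files is `find` with `-size 0`. This is a minimal sketch; the function name is hypothetical. The `-maxdepth 1` option keeps the search to the one directory. GNU and BSD `find` support it, but POSIX does not require it.

```sh
#!/bin/sh
# List regular files of exactly 0 bytes in a directory (default: current),
# without descending into subdirectories.
zero_byte_files() {
  find "${1:-.}" -maxdepth 1 -type f -size 0
}
```

Usage: `zero_byte_files /path/to/dir`.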

1. What is a layout?
2. What is a snowflake schema? Give me an example.
3. Explain a CDC graph.
Which type of join do you use in CDC?
A) Inner join.
3. What is the difference between a unique key and a primary key?
4. Have you worked on psets? Tell me about them.
5. Have you worked on PDL? Explain PDL.
6. Can you override sandbox parameters?
7. What is .abinitiorc and what does it contain?
8. Have you worked on the Denormalize component?
9. Have you worked on vectors?
10. How can you create vectors?
11. Have you worked on the Normalize component?
12. What is SCD Type 2? Explain it with an example and explain the graph.
13. What is a phase? What is a checkpoint? Explain the difference.
14. What is a .rec file? How do we recover from it?
15. Have you used Join with a DB?
16. Have you used air commands?
17. I have checked out an object. How can you remove it?
18. I have 5 records. I want to make 100 records from those 5 records for testing purposes. How can you create them?
Ans: I use the Generate Records component.
Q) I don't want to use the Generate Records component. Any alternative?
A) Broadcast.
Q) Any other?
A) Replicate.
Q) Any other?
A) I have no idea.
19. What is the difference between is_valid() and is_defined()?
20. I have one input file with 100 records and I want to retrieve records 50 to 70.
A) I use Filter by Expression with next_in_sequence() >= 50 and next_in_sequence() < 71.
Q) Can you name a command?
A) m_dump <file>.dml <file>.dat -start 50 -end 70
21. How can you create a multifile?
A) m_mkfs <control URL> <data partition URLs>
22. What is a control URL?
A) It is the URL of the control partition, which holds the URLs of the data partitions.
Q) How can we see the partitions or the number of partitions?
A) m_expand.
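Two of the tasks above also have plain-Unix counterparts for newline-delimited files. This is a minimal sketch; the function names are hypothetical. `sed -n '50,70p'` prints only lines 50 to 70, and quitting at line 70 avoids reading the rest of a large file. Repeating a 5-record sample 20 times gives 100 test records.

```sh
#!/bin/sh
# Print records $2 through $3 of file $1, then stop reading the file.
records_range() {
  sed -n "$2,$3p;$3q" "$1"
}

# Write the records of file $1 to stdout $2 times; a 5-record sample
# repeated 20 times yields 100 records.
repeat_records() {
  i=0
  while [ "$i" -lt "$2" ]; do
    cat "$1"
    i=$((i + 1))
  done
}
```

Usage: `records_range input.dat 50 70` and `repeat_records sample.dat 20 > test_100.dat`.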
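UNIX:
1. I have a file with 3 columns (cid, cname, csal). I want the csal field in a file, in sorted order.
A) First I sort the file on the 3rd field, i.e. csal, and then I cut out the 3rd field. The output goes to a new file, because redirecting into the input file would truncate it before sort reads it:
sort -k3,3n filename | cut -f3 > csal_sorted
2. I want the first record and the last record from a file.
A) sed -n '1p;$p' filename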
DB:
1) I have a table with duplicate records. I want to see the duplicates and I want to remove them.
Write the query for that.
A) I didn't answer.
2) I have a table and I want to create another table with the same structure and records.
A) create table new_table as select * from old_table;

Capgemini
1. Tell me about yourself.
2. What do you mean by performance tuning? If you face any performance issues, how will you resolve them?
3. What is an EME version?
4. How many types of parallelism are supported? Are you using data parallelism, and for what purpose?
5. What are partitioning and departitioning?
6. A file has 100 records: input --> Reformat --> output. The Reformat's reject threshold is set to Never abort, and 99 or more records are rejected; does my graph fail? How can I make my graph fail?
7. What is the difference between $ and ${} parameter evaluation?
8. How do you use an m_ command to get records 200 to 300 from a 1 GB file? What is the syntax?
9. A file has 100 records (empid, ename, sal, ...). I want to delete duplicates using Rollup. What is the transform?
10. What are override and override key in Join?
11. Which component produces intermediate summary records?
12. Which component do you use for a surrogate key?
13. I have one file with 4 records held in one transaction record, as specified in the transaction record's DML. How can I transform the data into 4 separate records? Using a record with an inline delimiter?
14. What are the challenges in your project?
15. How can I run multiple instances of a graph (e.g. 10 instances) from the command prompt?
16. How can I make a graph conditional?
17. Example: with 2 files, an indicator file and a production file, how can I make the graph conditional?
18. What is the difference between API and utility mode?
19. What is the sequence of evaluation when you run a graph?
20. What is the difference between m_rollback and m_cleanup?
21. I have a 1 GB file. How can I replace 80400 with 80500 in it, in Unix?
22. I have data from day 1 to day 90. How can I delete everything older than 90 days?
23. I have an input file with 100 records. How do I output the odd and even records?
24. I have an input file. How do I write the header, body and trailer separately to the output?
25. One input has 3 records and the other has 6 records. If the join key is null (empty), how many output records are produced?
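For the Unix questions about replacing 80400 with 80500 in a 1 GB file and splitting odd and even records, here is a minimal sketch. The function names are hypothetical. `sed` streams the file line by line, so file size is not a memory problem, and the result is written to a new file. Note that the substitution also changes 80400 when it appears inside a longer number. The split is by record position (odd-numbered vs even-numbered records).

```sh
#!/bin/sh
# Replace every occurrence of 80400 with 80500 in file $1, writing file $2.
# Caution: also rewrites 80400 inside longer numbers such as 1804001.
replace_value() {
  sed 's/80400/80500/g' "$1" > "$2"
}

# Write odd-numbered records of $1 to file $2 and even-numbered ones to $3.
split_odd_even() {
  awk -v odd="$2" -v even="$3" 'NR % 2 == 1 { print > odd; next } { print > even }' "$1"
}
```

Usage: `replace_value big.dat big_fixed.dat` and `split_odd_even input.dat odd.dat even.dat`.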

Mahindra Satyam
1. How much space does a lookup occupy?
2. What is the difference between a project and a sandbox?
3. How do you find the GDE and Co>Op versions?
4. What are private, public and common projects?
5. How many components are in your package editor?
6. What is a pset?
7. What is commit in an output table?
8. What is the layout in an input table?
9. Who are your end users?
10. What are the conditional components?
11. What is conditional DML?
12. How do you check in the graph?
13. How many editors are there in the GDE?
14. What is migration?
15. What is performance tuning?
16. How do you convert a 3-way system to a 5-way system?
17. If some work runs on day 1 and some on day 2, what happens at execution time on another day?
18. What are shell commands?

1. What are ad hoc multifiles?
2. How can you sort data that is already partitioned (round-robin)?
3. How does Partition by Key internally decide which key to send to which partition?
4. What is PDL? (Give the candidate a shell-type parameter and ask them to convert it to PDL.)
5. As shell-type parameters are not supported by the EME, how can you use a shell-type parameter (if you don't want to use PDL) without hampering the lineage diagram?
6. How can you generate a DML from a COBOL copybook?
7. How can you convert from EBCDIC to packed decimal?
8. What is a regex lookup? When should you use it?
9. Why does the creation of temporary files depend on the value of max_core?
10. What is the difference between the abinitiorc and .abinitiorc files?
11. What is the use of allocate()?
12. What is the use of a branch in the EME?
13. How can you break a lock in the EME? How can you lock a file so that no one other than the EME admin can break the lock?
14. When should you use ablocal()? How can you use ablocal_expr?
15. Why should you not keep the layout as 'default' for an Input Table component?
16. What is a dynamic lookup?
17. What is a dependent parameter?
18. What is BRE? (Business Rules Environment, a recent addition to the Ab Initio package)
19. What is output index?
20. How can you track the records that are not selected by the 'select' in a Reformat component?
21. Can we have more than one launcher process for a particular graph? What about agents?
22. A lot of new functions were added in 2.15; you can ask about them.
23. How can you run multiple instances of a graph in parallel?
24. In which scenario will .rec files not get created even if the graph fails?
25. What is the difference between force_error and force_abort?
26. What is the best way of creating huge test feeds?
27. Can you read multiple input files using one Input File component? (Same DML)
28. What is the difference between API and Utility mode? Why do we need to use API mode when Utility mode had both t
29. What is the significance of the record-required indicator of the Join component?
30. What is flow buffering? How does it reduce the chances of a deadlock?
31. When should we use 'jobid' for the commit table (Output Table component)?
32. Layout L1* vs L1: what is the difference?
33. What is the standard environment?
34. How can we remove the temp files of a failed job if the .rec file is not available?
35. Why should skew always be 0?
36. You can ask about the different vector functions.
37. What does mpjret contain?
38. Can you use Scan to generate a sequence number?
39. How can you generate a surrogate key?
40. You can ask about the Meta Pivot, Leading Records, and Read and Write Multiple Files components.
41. What is metaprogramming? Can you generate an xfr dynamically?
42. How can you delete an object from the EME datastore?
43. How can you delete an object from a tag?
44. If you check in one file twice in the EME, once with a tag and once without a tag, will the version number change?
43. What is the use of creating save files?
44. What is the air-project parameter?
45. Can you run air sandbox run using a pset?
46. How can you create a cross-joined output using the Join component?
47. What is the default layout of watcher files?
48. Why do you get a 'too many open files' error?
49. What is the significance of the vnode folder under AB_WORK_DIR?
50. What is the significance of AB_AIR_BRANCH?
51. How can you select only unique records from a set of records?
52. How does next_in_sequence work in a parallel layout?
53. Why would you want to set a modest number in limit?
54. How can you encrypt a password and use it in a .dbc file?
55. Which function should you use if you do not want to process non-printable characters from a feed file?
56. How can you run a component only under certain conditions?
56. What is a catalog and when should you use it?
57. How can you use Reformat as a router?
58. How many processes get created for an n-way parallel component?
59. Why do delimited DMLs take more time to process than fixed-length DMLs?
60. How can you select only the first and last records from a file without using next_in_sequence?
61. Why should you always keep the largest input as the driving input for Join?
62. What does a catalog file contain?
63. What are a private project and a public project?
64. Why should you not use a checkpoint after Replicate?
65. Broadcast and Replicate do similar work; what's the difference then?
66. How can you export a component's internal parameters?
67. How can you test a .dbc file?
68. What is the significance of the 'mp run' command?
69. How can you run a graph continuously without using continuous components?
70. How can you get all the fields from a lookup file? (Which function?)
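"Unique records" in question 51 can mean two different things, and the Unix versions make the distinction concrete. This is a minimal sketch; the function names are hypothetical. `sort -u` keeps one copy of every record. `sort | uniq -u` keeps only the records that occur exactly once.

```sh
#!/bin/sh
# One copy of every distinct record (duplicates collapsed to one).
distinct_records() {
  sort -u "$1"
}

# Only records that occur exactly once; duplicated records are dropped
# entirely. uniq compares adjacent lines, so the input must be sorted.
only_unique_records() {
  sort "$1" | uniq -u
}
```

Which one the interviewer means is worth clarifying before answering.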

TCS (Chennai), Bhanu Prakash, 13-01-2010, Thu 4:00 to 4:25
1. Tell me briefly about your education, your skills, and your project roles and responsibilities.
2. What are your current project and your previous project? Explain both of them.
3. What are your roles and responsibilities in your project?
4. What are your daily activities in your project at your company?
5. How proficient are you in Ab Initio, Unix and SQL, out of 5?
I said 3, 2 and 2 ½.
6. Are you involved only in development, or in support also?
A) In my 2 years of experience I have been involved only in development; I have never had a chance to work in support.
7. Where do you get the source data in your project? Is it a DB or something else?
A) From flat files.
8. What is your project's destination?
A) Oracle DB.
9. Why are you looking for a change?
10. For filtering, which components do you prefer?
A) Filter by Expression, Reformat.
Q) Any alternative?
A) Partition by Expression.
Q) No, no, no. My scenario is: I want to filter the customer records according to branches and regions.
A) In that case you can use Partition by Expression; if you don't want to use Partition by Expression, use a Reformat with output index/indexes.
Q) Any alternative?
A) No idea.
11) Have you worked with Join?
A) Yes.
12) When do you use an explicit join? Which parameters can you see, and what is the behavior of an explicit join? Give me a brief description.
13) Did you observe any memory constraint in Sort?
A) Yes, the max_core parameter.
14) Fine. If I give Sort data that exceeds the max_core value, what is the behavior of the graph?
A) The graph fails.
15. If your graph is running slowly, which performance techniques do you use to improve performance?
16. A graph was developed in Ab Initio. How do you get that graph?
A) By checking it out.
17. How can you check it out?
A) From the GDE check-out wizard.
Q) What do you give in the check-out wizard?
A) The path.
Q) Where do you check it out to?
A) EME.
18) You developed a graph 10 days back and checked it in to the EME. Today you want to check in the graph, but it has been modified by others. What do you observe while checking in?
A) We will be prompted with a message saying that the version has changed.
19) While you are checking in the graph, what do you observe?
A) Dependency analysis.
20) After that, what do you do?
A) I modified the graph.
21) Before modifying, what do you do?
A) I lock the graph.
22) What is the background operation when the graph is locked?
A) It prevents other users from modifying the graph.
23. List some air commands.
A) air project import/export, air object ls/rm
24) Have you heard about the "air dump" command?
A) I have never used air dump; I don't know about it.
25) For performance issues, what is a good component?
A) Partition by Round-robin.
26) What are the different types of departition methods?
A) Concatenate, Gather, Merge, Interleave.
Q) Explain the differences and behavior of each.
27) I have a file that contains duplicate records and unique records, and I want only the unique records. What is the Ab Initio graph?
A) Use Dedup Sorted.
Q) What do you do in it?
A) Keep unique.
28) How can you create an MFS?
A) m_mkfs
29) I have today's date in a graph parameter and Jan 1 in a sandbox parameter.
I want to get today's date while I am running the graph.
How?
A) We have to override the sandbox parameter.
Q) I don't want that answer. If I run the graph, what do I get, today's date or Jan 1?
A) Jan 1.
(silence for 10 seconds...)
30) What do you mean by the .abinitiorc file?
31) I have a file as follows:

C1 t1 1
C1 t2 2
C1 t3 3
C2 t1 1
C2 t2 2
C2 t3 3
. . .
. . .
And so on.
I want to get the sum of the amounts along with the highest three transactions.
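This sketch shows one way to produce that result in Unix. It reads the question as "per customer, the three largest transactions plus the total amount" and assumes whitespace-separated customer, transaction and amount fields, as in the sample. The function name is hypothetical. Sorting by customer and then by amount descending puts each customer's largest transactions first. `awk` then prints the first three rows per customer and a total line whenever the customer changes.

```sh
#!/bin/sh
# Input on stdin: customer txn amount, e.g. "C1 t1 1".
# Output: up to three largest transactions per customer, followed by a
# "<customer> total <sum of all that customer's amounts>" line.
top3_with_total() {
  sort -k1,1 -k3,3nr |
  awk '
    NR == 1 || $1 != cust {            # first row of a new customer
      if (NR > 1) print cust, "total", total
      cust = $1; total = 0; n = 0
    }
    {
      total += $3                      # every row counts toward the sum
      if (++n <= 3) print $1, $2, $3   # only the 3 largest are listed
    }
    END { if (NR > 0) print cust, "total", total }
  '
}
```

Usage: `top3_with_total < transactions.txt`. In Ab Initio terms this corresponds roughly to a sort on customer and amount followed by a per-key aggregation, but the component choice is left open here.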

Approximately 40 questions in all... I don't remember them all.

Sunday, 25 September 2016
