Each component is divided into three parts: begin, main and end. The begin and end parts are executed only once in a row flow, while the main part can be executed inside a loop.
In the first simple example, the three parts of the three components are assembled as follows:
tFOX begin (open an output stream)
tMap begin (initialize useful constants)
tFID begin (open an input stream, start the loop, read line by line and extract fields)
----------
tFID main (instantiate the main row, copy the data from row1 to main)
tMap main (the complex Java code generated by the mapper)
tFOX main (write an XML line, encapsulating each field)
----------
tFID end (stop the loop, close the input stream)
tMap end (does nothing)
tFOX end (write the last XML line, close the output stream)
For a row stream, the order rules are:
The trick is that the tFID begin part is responsible for starting the loop: the loop actually starts in the tFID_begin.perljet (or tFID_begin.javajet) template file.
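The ordering shown in the listing can be modelled in a few lines. This is only an illustrative sketch and not Talend's code generator. It assumes a row flow is a plain list of component names from input to output. Begin parts are emitted from the last component back to the first, so the input's begin part (which opens the loop) comes last. Main and end parts follow the flow order.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: components are plain strings listed from the
// input component (first) to the output component (last).
class RowFlowOrder {
    static List<String> generationOrder(List<String> flow) {
        List<String> order = new ArrayList<>();
        // begin parts: last component first, so the input's begin (loop start) is emitted last
        for (int i = flow.size() - 1; i >= 0; i--) {
            order.add(flow.get(i) + " begin");
        }
        // main parts: in flow order, inside the loop
        for (String component : flow) {
            order.add(component + " main");
        }
        // end parts: in flow order; the input's end closes the loop
        for (String component : flow) {
            order.add(component + " end");
        }
        return order;
    }

    public static void main(String[] args) {
        generationOrder(List.of("tFID", "tMap", "tFOX")).forEach(System.out::println);
    }
}
```

Running it with tFID, tMap and tFOX reproduces the nine lines of the listing above.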
We keep the previous example for the row stream, but this time we add a tFileList (tFL) component linked to the tFID with an iterate link.
tFL begin (open a loop on a file list)
----------
tFOX begin
tMap begin
tFID begin
----------
tFID main
tMap main
tFOX main
----------
tFID end
tMap end
tFOX end
----------
tFL end (close the file list loop)
The begin part of the component with the iterate output comes first, and the end part of the same component comes last; its main part is not used. In a flow, you cannot have an iterate link after a row link: you can chain as many iterate links followed by row links as you want, but not iterate/row/iterate.
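Both rules can be sketched with plain Java. This is a hypothetical model, not Talend's API: the iterate source's begin and end parts wrap the row flow's parts, and a chain of links is rejected as soon as an ITERATE link follows a ROW link. The Link enum is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: component names are strings and link types are a
// two-value enum; this is not Talend's code generator.
class IterateOrder {
    enum Link { ITERATE, ROW }

    // Wraps the row flow's parts in the begin/end parts of the component that
    // owns the iterate output; that component's main part is not used.
    static List<String> generationOrder(String iterateSource, List<String> rowFlow) {
        List<String> order = new ArrayList<>();
        order.add(iterateSource + " begin");                 // opens the outer loop
        for (int i = rowFlow.size() - 1; i >= 0; i--) {
            order.add(rowFlow.get(i) + " begin");
        }
        for (String c : rowFlow) order.add(c + " main");
        for (String c : rowFlow) order.add(c + " end");
        order.add(iterateSource + " end");                   // closes the outer loop
        return order;
    }

    // A chain of links is accepted only if no ITERATE link follows a ROW link.
    static boolean isValidChain(List<Link> links) {
        boolean seenRow = false;
        for (Link link : links) {
            if (link == Link.ROW) {
                seenRow = true;
            } else if (seenRow) {
                return false;                                // iterate after row
            }
        }
        return true;
    }

    public static void main(String[] args) {
        generationOrder("tFL", List.of("tFID", "tMap", "tFOX")).forEach(System.out::println);
        System.out.println(isValidChain(List.of(Link.ITERATE, Link.ITERATE, Link.ROW))); // true
        System.out.println(isValidChain(List.of(Link.ITERATE, Link.ROW, Link.ITERATE))); // false
    }
}
```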
Talend also supports another generation order, used when several input branches join in a merge node, as in the following case:
tUnite begin (open an output stream)
----------
tRG begin
tRG main (does nothing)
tUnite main (output data)
tRG end
-----
tFID begin
tFID main (does nothing)
tUnite main (output data)
tFID end
----------
tUnite end (close the output stream)
The begin part of the merge component (here tUnite) comes first and its end part comes last. Its main part is inside the loop of every branch of the merge node.
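The merge ordering can be sketched in the same way. This is an illustrative model only. It assumes each branch is a single input component, as in the tUnite example; a real branch may contain several components. The merge node's main part is emitted inside each branch, between that branch's main and end parts.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only: each branch is one input component name.
class MergeOrder {
    static List<String> generationOrder(String mergeNode, List<String> branches) {
        List<String> order = new ArrayList<>();
        order.add(mergeNode + " begin");           // e.g. open the output stream
        for (String branch : branches) {
            order.add(branch + " begin");          // branch loop starts
            order.add(branch + " main");
            order.add(mergeNode + " main");        // merge main runs inside this branch's loop
            order.add(branch + " end");            // branch loop ends
        }
        order.add(mergeNode + " end");             // e.g. close the output stream
        return order;
    }

    public static void main(String[] args) {
        generationOrder("tUnite", List.of("tRG", "tFID")).forEach(System.out::println);
    }
}
```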
When you create a job by linking several components together, Talend Open Designer isolates a list of subjobs. A subjob is a list of components linked by row or iterate links.
Subjobs are linked together with trigger links. In the above screenshot, there are two subjobs: the first one is composed of a tRowGenerator and a tLogRow, the second one is the tSendMail.
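The subjob rule amounts to grouping the connected components while ignoring trigger links. Here is a minimal union-find sketch of it. The Link and LinkType types are hypothetical and do not come from Talend. The example also assumes the trigger link in the screenshot goes from tRowGenerator to tSendMail.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative model only: hypothetical link types, not Talend classes.
class SubjobFinder {
    enum LinkType { ROW, ITERATE, TRIGGER }

    static final class Link {
        final String from;
        final String to;
        final LinkType type;

        Link(String from, String to, LinkType type) {
            this.from = from;
            this.to = to;
            this.type = type;
        }
    }

    // Components joined by ROW or ITERATE links share a subjob;
    // TRIGGER links only connect subjobs and are ignored here.
    static List<Set<String>> subjobs(List<String> components, List<Link> links) {
        Map<String, String> parent = new HashMap<>();
        for (String c : components) parent.put(c, c);
        for (Link link : links) {
            if (link.type != LinkType.TRIGGER) {
                parent.put(find(parent, link.from), find(parent, link.to)); // union
            }
        }
        Map<String, Set<String>> groups = new LinkedHashMap<>();
        for (String c : components) {
            groups.computeIfAbsent(find(parent, c), k -> new LinkedHashSet<>()).add(c);
        }
        return new ArrayList<>(groups.values());
    }

    // Follows parent pointers up to the group's representative.
    private static String find(Map<String, String> parent, String c) {
        while (!parent.get(c).equals(c)) c = parent.get(c);
        return c;
    }

    public static void main(String[] args) {
        List<Set<String>> result = subjobs(
            List.of("tRowGenerator", "tLogRow", "tSendMail"),
            List.of(new Link("tRowGenerator", "tLogRow", LinkType.ROW),
                    new Link("tRowGenerator", "tSendMail", LinkType.TRIGGER)));
        System.out.println(result);
    }
}
```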
A component is a directory in plugins/org.talend.designer.core_<version>/components. Each component is made of 6 files:
TOS can generate code in two languages: Java and Perl.
We take tFileInputRegex as an example.
<COMPONENT>
  <HEADER
    PLATEFORM="ALL"
    SERIAL=""
    VERSION="0.102"
    STATUS="ALPHA"
    COMPATIBILITY="ALL"
    AUTHOR="Talend"
    RELEASE_DATE="20050320A"
    STARTABLE="true"
    SCHEMA_AUTO_PROPAGATE="true"
    DATA_AUTO_PROPAGATE="false"
  >
    <SIGNATURE/>
  </HEADER>
  <DOCUMENTATION>
    <URL/>
  </DOCUMENTATION>
SCHEMA_AUTO_PROPAGATE="true" means that when you link another component to the input of your component, your component's schema is initialized with the schema of the previous component.
DATA_AUTO_PROPAGATE="false" means that you don't want TOS to automatically copy data from the input to the current component.
<CONNECTORS>
  <CONNECTOR CTYPE="FLOW" MAX_INPUT="0"/>
  <CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="1" MAX_INPUT="1"/>
  <CONNECTOR CTYPE="REFERENCE"/>
  <CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1" />
  <CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1" />
  <CONNECTOR CTYPE="COMPONENT_OK" />
  <CONNECTOR CTYPE="COMPONENT_ERROR" />
  <CONNECTOR CTYPE="RUN_IF" />
</CONNECTORS>
For each connector, you set a CTYPE, a MAX_INPUT and a MAX_OUTPUT. Each component must have the given list of connectors. If you want to deactivate the ITERATE connector, set both MAX_INPUT and MAX_OUTPUT to "0".
A connector defines the properties of every link (connection) of its type in the job design, and it can hold one or several input or output connections.
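The limits can be read as a simple check made before a link is attached. This is a hypothetical model and not TOS code. In particular, treating an omitted MAX_INPUT/MAX_OUTPUT as unlimited is an assumption of this sketch. The values in main() come from the tFileInputRegex fragment above.

```java
// Illustrative model of <CONNECTOR> limits, not Talend's classes. An omitted
// MAX_INPUT/MAX_OUTPUT attribute is represented here as UNLIMITED; that
// default is an assumption of this sketch.
class ConnectorLimits {
    static final int UNLIMITED = -1;

    static final class Connector {
        final String ctype;
        final int maxInput;
        final int maxOutput;

        Connector(String ctype, int maxInput, int maxOutput) {
            this.ctype = ctype;
            this.maxInput = maxInput;
            this.maxOutput = maxOutput;
        }

        // Can one more incoming link of this type be attached?
        boolean acceptsInput(int existingInputs) {
            return maxInput == UNLIMITED || existingInputs < maxInput;
        }

        // Can one more outgoing link of this type be attached?
        boolean acceptsOutput(int existingOutputs) {
            return maxOutput == UNLIMITED || existingOutputs < maxOutput;
        }

        // Both limits at 0: the connector is effectively deactivated.
        boolean isDeactivated() {
            return maxInput == 0 && maxOutput == 0;
        }
    }

    public static void main(String[] args) {
        Connector flow = new Connector("FLOW", 0, UNLIMITED);  // FLOW, MAX_INPUT="0"
        Connector iterate = new Connector("ITERATE", 1, 1);    // ITERATE, MAX_INPUT/MAX_OUTPUT="1"
        System.out.println(flow.acceptsInput(0));     // false: no incoming row link
        System.out.println(iterate.acceptsInput(0));  // true
        System.out.println(iterate.acceptsInput(1));  // false: MAX_INPUT="1" reached
    }
}
```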
TOS provides enhanced connectors for some special cases, configured as in the following fragment:
<CONNECTORS>
  <CONNECTOR CTYPE="FLOW" MAX_OUTPUT="0" MAX_INPUT="1"/>
  <CONNECTOR NAME="UNIQUE" CTYPE="FLOW" COLOR="086438" BASE_SCHEMA="FLOW" />
  <CONNECTOR NAME="DUPLICATE" CTYPE="FLOW" LINE_STYLE="2" COLOR="f36300" BASE_SCHEMA="FLOW" />
  <CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="1" MAX_INPUT="0"/>
  <CONNECTOR CTYPE="REFERENCE"/>
  <CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1" />
  <CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1" />
  <CONNECTOR CTYPE="COMPONENT_OK" />
  <CONNECTOR CTYPE="COMPONENT_ERROR" />
  <CONNECTOR CTYPE="RUN_IF" />
</CONNECTORS>
Note: the connectors "UNIQUE" and "DUPLICATE" are of the "FLOW" type (CTYPE="FLOW"), and the property BASE_SCHEMA="FLOW" ties their schema to the FLOW connector.
The BASE_SCHEMA property indicates that the schema will always be based on the FLOW connector. This means there can be several output schemas, but the main schema is always FLOW: the others always have the same columns as FLOW, but can add some specific columns by default (see, for example, the REJECT connector and SCHEMA_TYPE in any DB output component).
If one connector is based on another, a single SCHEMA_TYPE parameter is enough. In this case, several SCHEMA_TYPE parameters are useful only to add columns to another schema (see, again, the REJECT connector and SCHEMA_TYPE in any DB output component). If a SCHEMA_TYPE is added for a specific connector (besides the main one), it MUST be either hidden or read-only. Only the main schema should allow modification.
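Here is a minimal sketch of the BASE_SCHEMA idea, with schemas reduced to lists of column names. The derived schema always starts with the FLOW columns and appends its own specific columns. The extra column names used below (errorCode, errorMessage) are hypothetical, chosen only to evoke a REJECT-style connector. Only the main schema is edited, and the derived one is recomputed from it.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of BASE_SCHEMA="FLOW": columns are plain names.
class BaseSchema {
    // A connector based on FLOW always starts with the FLOW columns, then
    // appends its own specific columns (skipping any already present).
    static List<String> derivedSchema(List<String> flowColumns, List<String> extraColumns) {
        List<String> columns = new ArrayList<>(flowColumns);
        for (String extra : extraColumns) {
            if (!columns.contains(extra)) {
                columns.add(extra);
            }
        }
        return columns;
    }

    public static void main(String[] args) {
        List<String> flow = List.of("id", "name");   // main (editable) schema
        System.out.println(derivedSchema(flow, List.of("errorCode", "errorMessage")));
        // [id, name, errorCode, errorMessage]
    }
}
```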
In the template file, we can use it like this:
<% List<? extends IConnection> connsUnique = node.getOutgoingConnections("UNIQUE"); List<? extends IConnection> connsDuplicate = node.getOutgoingConnections("DUPLICATE"); %>
As seen above, several connectors can be based on another one, but each connector can also be independent. For example:
<CONNECTORS>
  <CONNECTOR CTYPE="FLOW" MAX_INPUT="0" MAX_OUTPUT="0"/>
  <CONNECTOR NAME="CONN1" CTYPE="FLOW"/>
  <CONNECTOR NAME="CONN2" CTYPE="FLOW"/>
  <CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="1" MAX_INPUT="0"/>
  <CONNECTOR CTYPE="REFERENCE"/>
  <CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1" />
  <CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1" />
  <CONNECTOR CTYPE="COMPONENT_OK" />
  <CONNECTOR CTYPE="COMPONENT_ERROR" />
  <CONNECTOR CTYPE="RUN_IF" />
</CONNECTORS>
Note that this system currently requires a first "blank" FLOW connector (here the one with MAX_INPUT="0" and MAX_OUTPUT="0"), which helps schema propagation; this blank connector could be removed in a future version. So here we have two connectors of type FLOW, but each one can hold a different schema. Each schema can have several connections, but it is recommended to set at most one output connection per connector.
In the template file, we can use it like this:
<% List<? extends IConnection> connsConn1 = node.getOutgoingConnections("CONN1"); List<? extends IConnection> connsConn2 = node.getOutgoingConnections("CONN2"); %>
A first way, quite specific to tMap:
<CONNECTOR BUILTIN="true" CTYPE="FLOW" MIN_INPUT="1" MIN_OUTPUT="1"/>
If BUILTIN is set to true, this connector holds several schemas, with one connection for each schema.
Note that this type is only useful for external components, that is, components set up in another plugin. BUILTIN should not be used for standard components.
Standard components should use MULTI_SCHEMA instead, as in tFileInputEBCDIC (in version 3.1):
<CONNECTORS>
  <CONNECTOR MULTI_SCHEMA="true" CTYPE="FLOW" MAX_INPUT="0" MIN_OUTPUT="1" />
  <CONNECTOR CTYPE="ITERATE" MAX_OUTPUT="0" MAX_INPUT="1" />
  <CONNECTOR CTYPE="SUBJOB_OK" MAX_INPUT="1" />
  <CONNECTOR CTYPE="SUBJOB_ERROR" MAX_INPUT="1" />
  <CONNECTOR CTYPE="COMPONENT_OK" />
  <CONNECTOR CTYPE="COMPONENT_ERROR" />
  <CONNECTOR CTYPE="RUN_IF" />
</CONNECTORS>
Keep in mind that with this system a component can have any number of connections, each holding a different schema.
To handle this connector, the simple SCHEMA_TYPE parameter can't be used, since the number of schemas in the component is dynamic. Instead, you need a table parameter, for example:
<PARAMETER NAME="SCHEMAS" FIELD="TABLE" NUM_ROW="2" NB_LINES="6">
  <ITEMS>
    <ITEM NAME="SCHEMA" FIELD="SCHEMA_TYPE" />
    <ITEM NAME="CODE" FIELD="TEXT" />
  </ITEMS>
</PARAMETER>
One of the table's fields is of type SCHEMA_TYPE, which means that each line of the table holds a different schema. When the component is used, there is therefore one line for each schema, that is, for each connection.
In the template file, you get the connections with the standard function:
<% List<? extends IConnection> conns = node.getOutgoingSortedConnections(); %>
Later in the code, you can get each connection's schema and the specific name given to this schema, for example:
// The SCHEMAS table does not change per connection, so read it once.
List<Map<String, String>> schemas = (List<Map<String, String>>) ElementParameterParser.getObjectValue(node, "__SCHEMAS__");
for (int i = 0; i < conns.size(); i++) {
    IConnection conn = conns.get(i);
    for (Map<String, String> schemaMap : schemas) {
        if (schemaMap.get("SCHEMA").equals(conn.getMetadataTable().getLabel())) {
            /* The table line for this connection has been found, so we can
               generate code specific to this connection, for example using
               the CODE field of this line. */
        }
    }
}
Remember that you cannot directly use, for example, node.getOutgoingConnections("CONN1"), simply because all connections belong to a single connector.
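Outside the template, the lookup reduces to matching each connection's schema label against the SCHEMA column of the table. This is a self-contained simulation, not Talend code. It assumes that each table line is a Map with "SCHEMA" and "CODE" entries, which mirrors the List<Map<String, String>> cast used above. The labels "header" and "detail" are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative simulation of the SCHEMAS table lookup; connections are reduced
// to their schema labels.
class MultiSchemaLookup {
    static Map<String, String> codeByConnection(List<String> connectionSchemaLabels,
                                                List<Map<String, String>> schemasTable) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String label : connectionSchemaLabels) {
            for (Map<String, String> line : schemasTable) {
                if (label.equals(line.get("SCHEMA"))) {
                    result.put(label, line.get("CODE")); // line-specific value
                    break;                                // one table line per schema
                }
            }
        }
        return result; // labels without a matching line are simply absent
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = List.of(
            Map.of("SCHEMA", "header", "CODE", "H"),
            Map.of("SCHEMA", "detail", "CODE", "D"));
        System.out.println(codeByConnection(List.of("detail", "header"), table));
        // {detail=D, header=H}
    }
}
```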
Mixing standard mode and multi-schema mode has not really been tested and is not recommended; it is not really useful for components anyway.
Then we come to parameter definitions. Each component has zero or more parameters, and each parameter is a property in the graphical user interface (GUI).
Let's see some examples:
<PARAMETER NAME="PROPERTY" FIELD="PROPERTY_TYPE" SHOW="true" NUM_ROW="1" > <DEFAULT/> </PARAMETER>
The NAME attribute of the parameter is its identifier. We use this identifier to set its label in the component.properties file and to retrieve its value in the templates.
The FIELD attribute is the type of parameter. The PROPERTY_TYPE field type is described later in this page.
The NUM_ROW attribute is the line number where the property will appear in the GUI.
<PARAMETER NAME="FILENAME" FIELD="FILE" NUM_ROW="2" REQUIRED="true" > <DEFAULT>"__COMP_DEFAULT_FILE_DIR__/in.csv"</DEFAULT> </PARAMETER>
The next parameter is of FILE type. It shows a text field with a browse button, making it easier to choose a file in your local directory tree.
Here we have a default value, which is displayed in the text field when the component is dropped onto the job.
Here is the list of available parameter types:
<PARAMETER NAME="REMOVE_EMPTY_ROW" FIELD="CHECK" REQUIRED="true" NUM_ROW="7" > <DEFAULT>true</DEFAULT> </PARAMETER>
The CHECK parameter type creates a checkbox in the GUI. If you want it checked by default, use “true” as the default value.
<PARAMETER NAME="OUTPUT" FIELD="CLOSED_LIST" NUM_ROW="2" > <ITEMS DEFAULT="OUTPUT_TO_CONSOLE"> <ITEM NAME="OUTPUT_TO_CONSOLE" VALUE="OUTPUT_TO_CONSOLE" /> <ITEM NAME="RETRIEVE_OUTPUT" VALUE="RETRIEVE_OUTPUT" /> </ITEMS> </PARAMETER>
A list of ITEM elements is associated with the CLOSED_LIST. We have to state explicitly that the list of items is for the “perl” language. The DEFAULT of the ITEMS list is the VALUE of one ITEM, and the VALUE is what you retrieve in the template for this property.
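Because DEFAULT must name one of the ITEM VALUEs, a descriptor can be checked mechanically. This sketch uses the JDK's standard DOM parser (javax.xml.parsers) on the OUTPUT fragment above. It is a hypothetical validation helper, not the way TOS itself loads component descriptors.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: returns the ITEM VALUEs of a CLOSED_LIST parameter and
// fails if ITEMS/@DEFAULT is not one of them.
class ClosedListCheck {
    static List<String> itemValues(String parameterXml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(parameterXml.getBytes(StandardCharsets.UTF_8)));
        Element items = (Element) doc.getElementsByTagName("ITEMS").item(0);
        NodeList itemNodes = items.getElementsByTagName("ITEM");
        List<String> values = new ArrayList<>();
        for (int i = 0; i < itemNodes.getLength(); i++) {
            values.add(((Element) itemNodes.item(i)).getAttribute("VALUE"));
        }
        String defaultValue = items.getAttribute("DEFAULT");
        if (!values.contains(defaultValue)) {
            throw new IllegalArgumentException("DEFAULT " + defaultValue + " is not an ITEM VALUE");
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<PARAMETER NAME=\"OUTPUT\" FIELD=\"CLOSED_LIST\" NUM_ROW=\"2\">"
            + "<ITEMS DEFAULT=\"OUTPUT_TO_CONSOLE\">"
            + "<ITEM NAME=\"OUTPUT_TO_CONSOLE\" VALUE=\"OUTPUT_TO_CONSOLE\"/>"
            + "<ITEM NAME=\"RETRIEVE_OUTPUT\" VALUE=\"RETRIEVE_OUTPUT\"/>"
            + "</ITEMS></PARAMETER>";
        System.out.println(itemValues(xml)); // [OUTPUT_TO_CONSOLE, RETRIEVE_OUTPUT]
    }
}
```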
<PARAMETER NAME="DIRECTORY" FIELD="DIRECTORY" NUM_ROW="1" > <DEFAULT/> </PARAMETER>
Displays a text field and a “browse” button for selecting a directory.
<PARAMETER NAME="FILENAME" FIELD="FILE" NUM_ROW="2" REQUIRED="true" REPOSITORY_VALUE="FILE_PATH" > <DEFAULT>"__COMP_DEFAULT_FILE_DIR__/in.csv"</DEFAULT> </PARAMETER>
Displays a text field and a “browse” button for selecting a file.
<PARAMETER NAME="MESSAGE" FIELD="MEMO" REQUIRED="false" NB_LINES="10" NUM_ROW="5" > <DEFAULT>"Hello"</DEFAULT> </PARAMETER>
A multiline text field. Specify the number of lines with the NB_LINES attribute.
<PARAMETER NAME="CODE" FIELD="MEMO_PERL" REQUIRED="false" NUM_ROW="1" NB_LINES="9" > <DEFAULT>String myfoo = "bar";</DEFAULT> </PARAMETER>
Another multiline text field, with Perl syntax highlighting.
<PARAMETER NAME="QUERY" FIELD="MEMO_SQL" NUM_ROW="6" > <DEFAULT>"select id, name from employee"</DEFAULT> </PARAMETER>
Yet another multiline text field, here for SQL queries. Its difference from a simple MEMO will become obvious in the future.
<PARAMETER NAME="PROCESS" FIELD="PROCESS_TYPE" NUM_ROW="1"/>
Lets you choose another job and an associated context.
<PARAMETER NAME="SCHEMA" FIELD="SCHEMA_TYPE" NUM_ROW="5" > <DEFAULT /> </PARAMETER>
Displays a list box with “Built-in” and “Repository”. Choosing “Repository” means that you want to use a predefined schema: just choose the schema you want in the new list box.
If you modify the schema with the “View schema” button, the schema becomes “Built-in”.
Note: by default this parameter is linked directly to the “FLOW” connector, but you can also set it up to use another connector by adding an attribute with the connector name, for example CONTEXT="REJECT" (see any DB output component).
<PARAMETER NAME="FILES" FIELD="TABLE" REQUIRED="false" NUM_ROW="6" NB_LINES="5" > <ITEMS CODE_LANGUAGE="perl"> <ITEM NAME="FILEMASK" /> <ITEM NAME="NEWNAME" SHOW_IF="ACTION=='RENAME'"/> </ITEMS> </PARAMETER>
A multi-line, multi-column property. In this example, the NB_LINES attribute gives 5 lines, and there are two columns, FILEMASK and NEWNAME (NEWNAME is shown only when ACTION is 'RENAME', as set by SHOW_IF).
<PARAMETER NAME="REMOTEDIR" FIELD="TEXT" REQUIRED="false" NUM_ROW="4" > <DEFAULT>"/share/ftp"</DEFAULT> </PARAMETER>
The simplest field type: a simple text box.
As described in the “Code generation model” section, the code of a component is divided into three templates: begin, main and end.
Templates are written in Java and output Perl or Java code.
<%@ jet imports=" org.talend.core.model.process.INode org.talend.core.model.process.ElementParameterParser org.talend.core.model.metadata.IMetadataTable org.talend.core.model.metadata.IMetadataColumn org.talend.designer.codegen.config.CodeGeneratorArgument java.util.List " %>
What may change between components and parts is the class name.
<% CodeGeneratorArgument codeGenArgument = (CodeGeneratorArgument) argument; INode node = (INode)codeGenArgument.getArgument();
We need a CodeGeneratorArgument to retrieve values from defined properties of the component.
The INode is the Java name for “component”.
List<IMetadataTable> metadatas = node.getMetadataList(); String cid = node.getUniqueName(); if ((metadatas!=null)&&(metadatas.size()>0)) { IMetadataTable metadata = metadatas.get(0); if (metadata!=null) { String host = ElementParameterParser.getValue(node, "__HOST__"); String port = ElementParameterParser.getValue(node, "__PORT__"); String user = ElementParameterParser.getValue(node, "__USERNAME__"); String pass = ElementParameterParser.getValue(node, "__PASSWORD__"); %>
This code extract is the same in every template. The “cid” is the component identifier; we use it in many variable names to make them unique in the final script.
The Java variables host, port, user and pass hold the values the user entered in the GUI for the parameters HOST, PORT, USERNAME and PASSWORD, which are TEXT parameters.
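The "__NAME__" keys passed to ElementParameterParser.getValue wrap the NAME attribute of the XML descriptor in double underscores. The stand-in below is hypothetical and uses a plain Map instead of an INode, just to make the convention explicit. As the TEXT defaults on this page show (for example "/share/ftp"), a value is a Java expression, so a string literal keeps its quotes.

```java
import java.util.Map;

// Hypothetical stand-in for ElementParameterParser.getValue(node, "__HOST__"):
// parameters are a Map keyed by the descriptor's NAME attribute.
class ParameterLookup {
    static String getValue(Map<String, String> parameters, String key) {
        if (key.length() <= 4 || !key.startsWith("__") || !key.endsWith("__")) {
            throw new IllegalArgumentException("expected __NAME__, got " + key);
        }
        String name = key.substring(2, key.length() - 2); // "__HOST__" -> "HOST"
        return parameters.get(name);
    }

    public static void main(String[] args) {
        Map<String, String> params = Map.of("HOST", "\"localhost\"", "PORT", "\"3306\"");
        System.out.println(getValue(params, "__HOST__")); // "localhost" (quotes included)
    }
}
```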
Here is an example in Java:
<% public void useShareConnection(INode node) { String sharedConnectionName = ElementParameterParser.getValue(node, "__SHARED_CONNECTION_NAME__"); %> String sharedConnectionName_<%=cid%> = <%=sharedConnectionName%>; conn_<%=cid%> = SharedDBConnection.getDBConnection("<%=this.getDirverClassName(node)%>",url_<%=cid%>,userName_<%=cid%> , password_<%=cid%> , sharedConnectionName_<%=cid%>); <% } %>
In the template, the template's own Java code is surrounded by “<% … %>”. The lines of the sample above that are outside “<% … %>” are not template code but Java output code, that is, the code the template generates. In this output code, you can insert the value of a template variable with “<%= javaVariable %>”.
In this Java output code, you can see that the DB connection variable uses the component id in its name (conn_<%=cid%>). This keeps the name unique when the same component is used more than once in a job.
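The effect of suffixing with cid can be shown with plain string building, which is essentially what the "<%= cid %>" expressions do. This sketch is illustrative only. The unique names tMysqlOutput_1 and tMysqlOutput_2 are example cid values, and the generated line is a simplified connection declaration rather than the template's actual output.

```java
// Illustrative only: builds the generated Java line for one component instance.
class CidNaming {
    static String connectionDeclaration(String cid, String urlVariable) {
        return "java.sql.Connection conn_" + cid
            + " = java.sql.DriverManager.getConnection(" + urlVariable + ");";
    }

    public static void main(String[] args) {
        // Two instances of the same component in one job get distinct variables.
        System.out.println(connectionDeclaration("tMysqlOutput_1", "url_tMysqlOutput_1"));
        System.out.println(connectionDeclaration("tMysqlOutput_2", "url_tMysqlOutput_2"));
    }
}
```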
At the end of the template, we have to close the opened blocks:
<% } } %>
1. Output connections: when writing an input component that outputs data to the next component, you may need to check the connection types like this:
List<? extends IConnection> conns = node.getOutgoingSortedConnections();
...
if (conn.getLineStyle().hasConnectionCategory(IConnectionCategory.DATA)) {
    // output here
}
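The check keeps only the connections that carry data and skips trigger links such as OnComponentOk. The sketch below reproduces that filtering with hypothetical Category and Connection types. They stand in for Talend's IConnectionCategory and IConnection, whose real definitions may differ.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;

// Illustrative stand-in for the check above: Category and Connection are
// hypothetical types, not Talend's IConnectionCategory or IConnection.
class DataConnectionFilter {
    enum Category { DATA, TRIGGER }

    static final class Connection {
        final String name;
        final EnumSet<Category> categories;

        Connection(String name, EnumSet<Category> categories) {
            this.name = name;
            this.categories = categories;
        }

        boolean hasConnectionCategory(Category category) {
            return categories.contains(category);
        }
    }

    // Keeps only the connections that carry data rows; trigger links are skipped.
    static List<String> dataConnectionNames(List<Connection> conns) {
        List<String> names = new ArrayList<>();
        for (Connection conn : conns) {
            if (conn.hasConnectionCategory(Category.DATA)) {
                names.add(conn.name); // a template would generate output code here
            }
        }
        return names;
    }

    public static void main(String[] args) {
        List<Connection> conns = List.of(
            new Connection("row1", EnumSet.of(Category.DATA)),
            new Connection("OnComponentOk1", EnumSet.of(Category.TRIGGER)));
        System.out.println(dataConnectionNames(conns)); // [row1]
    }
}
```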
Subversion revision r3673 (available in TOS 2.1.0M2) introduced a set of default icons in org.talend.designer.components.localprovider/icons; take the one that corresponds to your component.