Introduction to using .NET in Talend Studio

What is this thing?

It's a Java and Visual C++ library built to simplify integration between Java and .NET. In the past, this was only feasible by exposing the dlls as web services or by going through stubs/proxies and other complicated procedures. To streamline this, we've taken advantage of some interesting capabilities of the two technologies, so you no longer need any of those workarounds to use your dlls from Java. Note that the screen shots were taken before the components had custom icons.

Getting started

  • Obtain the janet dll (janet-win<32 or 64>.dll) here for .NET 3.5 or here for .NET 4.0
  • Either place it in a directory on the Path (e.g. C:\Program Files\Java\jre6\bin or C:\Windows\System32), or place it anywhere on the system and pass -Djava.library.path=<path to directory containing the dll> to the JVM.
  • Place the janet jar somewhere handy. If you're using the components, they already know where it is; otherwise, you'll have to import it.
  • A working knowledge of a .NET language such as C#, VB, or VC++ is handy but not required. It can help you prototype what you need to do, and the Express edition of Visual Studio is enough for that.
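
If the dll can't be found at run time, it helps to check what the JVM actually sees in java.library.path. The following is a minimal sketch in plain Java, not part of janet; the class name LibraryPathCheck and the default file name janet-win64.dll are only illustrative, so substitute the dll you actually downloaded.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

// Illustrative helper (not part of janet): looks for a file in a path-style list of directories.
final class LibraryPathCheck {

    /** Returns the first directory in searchPath that contains fileName, if any. */
    static Optional<Path> findOnPath(String searchPath, String fileName) {
        if (searchPath == null || searchPath.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : searchPath.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, fileName);
                if (Files.isRegularFile(candidate)) {
                    return Optional.of(Paths.get(dir));
                }
            } catch (InvalidPathException malformedEntry) {
                // Skip entries that are not valid paths on this platform.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed file name; pass the real one as the first argument.
        String dllName = args.length > 0 ? args[0] : "janet-win64.dll";
        String searchPath = System.getProperty("java.library.path");
        Optional<Path> hit = findOnPath(searchPath, dllName);
        System.out.println(hit.map(p -> dllName + " found in " + p)
                .orElse(dllName + " not found on java.library.path"));
    }
}
```

Run it with the same -Djava.library.path value you give the job to confirm that the directory is being picked up.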

With the prerequisites completed, the actual implementation can begin.

Working with .NET

Assuming there's already a .NET Object with which to integrate, the first thing that needs to be done is to determine the correct way to load that assembly.

  1. If it lives in a system assembly (e.g. System::Data::OleDb::OleDbConnection, which is in System.Data), you can use the full name of the assembly, such as “System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089”.
  2. If it's a custom dll, you can use the absolute path to the dll (e.g. “C:\\WINDOWS\\system32\\ClassLibrary1.dll”).
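
To make the difference between the two forms concrete, here is a small sketch that tells them apart. It is a heuristic written for illustration only and is an assumption on our part, not the check that the library itself performs; the class name DllReference is made up. Note that the doubled backslashes in the path are just Java string-literal escaping.

```java
// Illustrative only: a simple heuristic, not the logic used by the library.
final class DllReference {

    enum Kind { ASSEMBLY_NAME, ABSOLUTE_PATH }

    /** Treats a drive-letter or UNC prefix as a file path and anything else as an assembly name. */
    static Kind classify(String dllToLoad) {
        String s = dllToLoad.trim();
        boolean drivePath = s.matches("^[A-Za-z]:[\\\\/].*");
        boolean uncPath = s.startsWith("\\\\");
        return (drivePath || uncPath) ? Kind.ABSOLUTE_PATH : Kind.ASSEMBLY_NAME;
    }

    public static void main(String[] args) {
        String system = "System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089";
        // In a Java string literal each backslash must be escaped, hence the doubling.
        String custom = "C:\\WINDOWS\\system32\\ClassLibrary1.dll";
        System.out.println(classify(system) + " <- " + system);
        System.out.println(classify(custom) + " <- " + custom);
    }
}
```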

Now let's get started with the integration. From this point forward there are two possible paths: the first is to use the components in the DotNET family of the palette, and the second is to write custom code. The components in the DotNET family serve two different purposes.

  • tDotNETInstantiate is intended to instantiate a .NET Object for later reuse.
  • tDotNETRow is for interacting with .NET Object methods, either static or instance methods.

tDotNETInstantiate

In order to instantiate a .NET Object, some information is required: the assembly name or dll absolute path that was determined above, the fully qualified class name, and all parameters to pass to the constructor, if any are necessary. Note that .NET does not automatically provide a parameterless default constructor for a class that declares constructors of its own, so the parameters must match a constructor that actually exists. Look at the component settings to see where all of these can be provided.

[Screen shot: tDotNETInstantiate settings]

The top parameter, "Dll to load", contains either the absolute path or the assembly name; the underlying technology determines which of the two it is. The example shown in the screen shot above uses a small dll that simply concatenates two strings and returns the result. The second parameter, "Fully qualified class name", contains the class name along with any namespace in which the class resides. This class doesn't take any constructor parameters, so leave that list empty.
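
If you haven't worked with this style of instantiation before, Java's own reflection API is a close analogy: you name a class by its fully qualified name and supply the constructor arguments as a list. The sketch below uses only java.lang.reflect and standard Java classes; it is not the janet API, and InstantiateSketch is a made-up name.

```java
import java.lang.reflect.Constructor;

// Analogy in plain Java reflection; this does not call into .NET.
final class InstantiateSketch {

    /** Creates an instance of className using the first public constructor that accepts args. */
    static Object instantiate(String className, Object... args) {
        try {
            Class<?> type = Class.forName(className);
            for (Constructor<?> ctor : type.getConstructors()) {
                if (ctor.getParameterCount() != args.length) {
                    continue;
                }
                try {
                    return ctor.newInstance(args);
                } catch (IllegalArgumentException mismatch) {
                    // Argument types did not fit this overload; try the next one.
                }
            }
            throw new IllegalStateException("No matching public constructor on " + className);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not instantiate " + className, e);
        }
    }

    public static void main(String[] args) {
        // Empty parameter list: the no-argument constructor is used.
        Object list = instantiate("java.util.ArrayList");
        // One parameter: a constructor that accepts a String is used.
        Object builder = instantiate("java.lang.StringBuilder", "Hello");
        System.out.println(list.getClass().getName() + " / " + builder);
    }
}
```

As with the component, an empty parameter list only works when the class really has a constructor that takes no arguments.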

Instantiate a system class with a parameterized constructor

This screen shot shows the component configured for a system class.

tDotNETRow

Now that an Object has been instantiated, its instance methods can be invoked. That is the purpose of the tDotNETRow component. There are a lot more options on this component because there are a lot more ways to invoke a method than there are to instantiate an object. Some of these options are on the basic settings pane for the tDotNETRow component.

[Screen shot: default basic settings view for tDotNETRow]

Some of these options are familiar as this component will also allow object instantiation - both at the beginning and at each row if necessary. Starting at the top of the parameters:

  1. Schema: you can customize the output to add a column for the result, or you can simply propagate the same schema.
  2. Use a static method: the method is static, so no instance of the class is needed to invoke it.
  3. Propagate data to output: propagates the data from the input to the output.
  4. Use an existing instance: allows use of an instance created in a tDotNETInstantiate or in another tDotNETRow.
  5. Dll to load: only relevant when either a static method is invoked or the tDotNETRow is creating its own instance. Same purpose as in the tDotNETInstantiate.
  6. Fully qualified class name: again, only relevant for a static method or a new instantiation. Same purpose as in the tDotNETInstantiate.
  7. Method Name: the name of the method to invoke.
  8. Value(s) to pass to the constructor: only relevant for a new instance. Same purpose as in the tDotNETInstantiate.
  9. Method Parameters: the values to pass to the method. Assuming these come from the input row, this can contain input_row.<column name> (e.g. input_row.First_Name).
  10. Output value target column: when the method returns a value, this is the column that should contain the value.
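
The split between static and instance invocation in the list above maps closely onto Java reflection, which makes for a handy mental model. The following sketch is an analogy in plain Java and not the code the component generates; InvokeSketch and the sample values are made up, with "John" and " Doe" standing in for values that would normally come from input_row columns.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Analogy in plain Java reflection; this does not call into .NET.
final class InvokeSketch {

    /** Invokes a public method by name; pass target == null to select a static method. */
    static Object invoke(Class<?> type, Object target, String methodName, Object... params) {
        for (Method m : type.getMethods()) {
            boolean isStatic = Modifier.isStatic(m.getModifiers());
            if (!m.getName().equals(methodName)
                    || m.getParameterCount() != params.length
                    || isStatic != (target == null)) {
                continue;
            }
            try {
                return m.invoke(target, params);
            } catch (IllegalArgumentException mismatch) {
                // Parameter types did not fit this overload; try the next one.
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Invocation of " + methodName + " failed", e);
            }
        }
        throw new IllegalStateException("No matching method " + methodName);
    }

    public static void main(String[] args) {
        // Instance method on an existing object: "John".concat(" Doe")
        Object fullName = invoke(String.class, "John", "concat", " Doe");
        // Static method, no instance needed: Integer.parseInt("42")
        Object number = invoke(Integer.class, null, "parseInt", "42");
        System.out.println(fullName + " / " + number);
    }
}
```

The returned Object plays the role of the value that ends up in the output value target column.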

If electing to reuse an existing instance, the parameters that are necessary to create one are hidden and the exact instance to use can be selected instead.

[Screen shot: reuse an existing instance]

[Screen shot: list of available instances]

If electing to invoke a static method, the parameters again adjust accordingly. It requires the dll absolute path or assembly name and the fully qualified class name again.
Loaded assemblies and dlls are cached so that the same one is not loaded more than once.
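
The caching behaviour can be pictured as a map keyed by the "Dll to load" value. The sketch below shows the general load-once pattern in Java under that assumption; it is not the library's actual cache, and AssemblyCache and the string "handle" are placeholders.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Generic load-once cache; a stand-in for whatever the library does internally.
final class AssemblyCache<T> {
    private final Map<String, T> loaded = new ConcurrentHashMap<>();
    private final Function<String, T> loader;

    AssemblyCache(Function<String, T> loader) {
        this.loader = loader;
    }

    /** Returns the cached value for dllToLoad, running the loader only on first use. */
    T get(String dllToLoad) {
        return loaded.computeIfAbsent(dllToLoad, loader);
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        AssemblyCache<String> cache = new AssemblyCache<>(name -> {
            loads.incrementAndGet();
            return "handle for " + name; // placeholder for a loaded assembly
        });
        cache.get("C:\\WINDOWS\\system32\\ClassLibrary1.dll");
        cache.get("C:\\WINDOWS\\system32\\ClassLibrary1.dll");
        System.out.println("loader ran " + loads.get() + " time(s)"); // prints "loader ran 1 time(s)"
    }
}
```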

[Screen shot: use a static method]

The Advanced settings pane exposes even more of the flexibility of method invocation.

[Screen shot: advanced settings pane]

  1. Method doesn't return a value: allows invocation of a void method.
  2. Returns an instance of a .NET Object: disables implicit conversion between .NET and Java types. This is required for types that Java and .NET don't share.
    Types such as String, Integer, Float, etc. can all be implicitly converted, whereas a complex type most likely can't be.
  3. Store the returned value for later use: stores the result of the method in the global map for use in later tDotNETRow components.
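
Storing a result for later use boils down to a put and a get on a shared map. The sketch below assumes the global map behaves like a plain java.util.Map<String, Object>; the key name is hypothetical (the real key is chosen by the component), and GlobalMapSketch is a made-up class.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the store/fetch pattern; key name and class are illustrative.
final class GlobalMapSketch {
    // Hypothetical key; check the component for the key it actually uses.
    static final String KEY = "tDotNETRow_1_RESULT";

    /** Stores a method result under KEY so that a later component can pick it up. */
    static void store(Map<String, Object> globalMap, Object result) {
        globalMap.put(KEY, result);
    }

    /** Fetches the stored result, failing fast if it is not of the expected type. */
    static <T> T fetch(Map<String, Object> globalMap, Class<T> expected) {
        return expected.cast(globalMap.get(KEY));
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        store(globalMap, "John Doe");
        String fullName = fetch(globalMap, String.class);
        System.out.println(fullName);
    }
}
```

Values of simple types come back as ordinary Java objects; a result kept as a .NET Object would stay opaque to Java and only be useful as input to another .NET call.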

If the method doesn't return a value and the "Method doesn't return a value" parameter is set to true, the basic settings change to reflect this.

[Screen shot: basic settings for a void method]

The tDotNETRow component can be used mid-flow, at the start of a flow, or at the end of a flow.

When a new instance of an Object is necessary for each row of data, set "Use an existing instance" to false. The advanced settings pane then offers a "Create a new instance at each row" parameter that allows this.
[Screen shot: the "Create a new instance at each row" parameter]
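
Whether you need a fresh instance per row usually depends on whether the object keeps state between calls. The sketch below illustrates the difference in plain Java, with a StringBuilder standing in for a stateful .NET Object; the class and method names are made up.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// StringBuilder stands in for any object that accumulates state across calls.
final class PerRowInstanceSketch {

    /** One instance shared by all rows: state from earlier rows leaks into later ones. */
    static List<String> withSharedInstance(List<String> rows) {
        StringBuilder shared = new StringBuilder();
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(shared.append(row).toString());
        }
        return out;
    }

    /** A new instance for each row: every row starts from a clean object. */
    static List<String> withNewInstancePerRow(List<String> rows, Supplier<StringBuilder> factory) {
        List<String> out = new ArrayList<>();
        for (String row : rows) {
            out.add(factory.get().append(row).toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("a", "b", "c");
        System.out.println(withSharedInstance(rows));                        // prints [a, ab, abc]
        System.out.println(withNewInstancePerRow(rows, StringBuilder::new)); // prints [a, b, c]
    }
}
```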

With that covered, enjoy the newfound ability to integrate .NET classes in Talend Studio!

Below are some screen shots that show the tDotNETInstantiate and tDotNETRow components in various configurations in some Talend jobs.

The first is a simple one and will look familiar if you've seen the use case scenario written for the component guide.
This job passes a first name and a last name into a .NET class that simply concatenates the strings together, in this instance forming a full name.
[Screen shot: a simple job using the .NET components]

The second one uses the components to select data out of SQL Server using OLE DB. This one entails a lot of instantiation and reuse, and it also shows that the tDotNETRow can be used by itself or in a subjob as a start component.
[Screen shot: a more complicated job using .NET Objects]

 
doc/dotnet.txt · Last modified: 2011/12/30 13:34 by rbaldwin
 
 