<?xml version="1.0" encoding="utf-8" standalone="yes" ?>
<feed xmlns="http://www.w3.org/2005/Atom">

	<title>Planet Talend</title>
	<link rel="self" href="http://talendforge.org/atom.xml"/>
	<link href="http://talendforge.org/planet"/>
	<id>http://talendforge.org/atom.xml</id>
	<updated>2008-11-22T01:30:06+00:00</updated>
	<generator uri="http://www.planetplanet.org/">Planet/2.0 +http://www.planetplanet.org</generator>

	<entry xml:lang="en">
		<title type="html">Datamining type</title>
		<link href="http://scorreiait.wordpress.com/2008/11/04/datamining-type/"/>
		<id>http://scorreiait.wordpress.com/?p=102</id>
		<updated>2008-11-04T20:42:48+00:00</updated>
		<content type="html">&lt;div class=&quot;snap_preview&quot;&gt;&lt;br /&gt;&lt;p&gt;In Talend Open Profiler, when you create a column analysis, you can see a combo box near each column in the editor which represents the data mining type of the column. What is it? And what is it useful for? &lt;/p&gt;
&lt;p&gt;The available data mining types are &lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;nominal&lt;/li&gt;
&lt;li&gt;interval&lt;/li&gt;
&lt;li&gt;unstructured text&lt;/li&gt;
&lt;li&gt;other&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;Not all indicators (or metrics) can be computed on all kinds of data. The data mining type helps Talend Open Profiler choose the appropriate metrics for each column. &lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html#nomdat&quot;&gt;Nominal&lt;/a&gt; (sometimes also called &lt;a href=&quot;http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html#catdat&quot;&gt;&amp;#8220;categorical&amp;#8221;&lt;/a&gt;) means that the data serve as labels. For example, a column called &amp;#8220;WEATHER&amp;#8221; with the values &amp;#8220;sun&amp;#8221;, &amp;#8220;cloud&amp;#8221; and &amp;#8220;rain&amp;#8221; would be nominal. In Talend Open Profiler, textual data are assigned the nominal data mining type. &lt;/p&gt;
&lt;p&gt;However, data such as &amp;#8220;52200&amp;#8221; or &amp;#8220;75014&amp;#8221; are nominal too, although they are written as numbers. A column called &amp;#8220;POSTAL_CODE&amp;#8221; could hold these values. It is clear to the user that these data are nominal because they identify postal codes in France, and computing mathematical quantities such as the average on them makes no sense. In that case, the user should set the data mining type of the column to &amp;#8220;nominal&amp;#8221;, because Talend Open Profiler currently has no way to guess the correct type automatically.&lt;br /&gt;
The same is true for primary and foreign key data. Keys are usually numerical, but their data mining type is &amp;#8220;nominal&amp;#8221;. &lt;/p&gt;
&lt;p&gt;The &lt;a href=&quot;http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html#intscale&quot;&gt;&amp;#8220;interval&amp;#8221;&lt;/a&gt; data mining type is used for numerical data and time data. Differences between two values and averages can be computed on this kind of data. In databases, numerical quantities are sometimes stored in textual fields. With Talend Open Profiler, it is possible to declare a textual column (e.g. a column of type VARCHAR) as an interval. In that case, the data should be treated as numerical data and summary statistics should be available. This is not implemented yet, because there is no interface that lets the user specify the format of the data, but the feature is planned for a future release. &lt;/p&gt;
&lt;p&gt;The other two types, &amp;#8220;unstructured text&amp;#8221; and &amp;#8220;other&amp;#8221;, are not standard data mining types. In data mining, we sometimes find the types &lt;a href=&quot;http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html#orddat&quot;&gt;&amp;#8220;ordinal&amp;#8221;&lt;/a&gt; and &lt;a href=&quot;http://www.statistics.com/resources/glossary/r/ratioscale.php&quot;&gt;&amp;#8220;ratio&amp;#8221;&lt;/a&gt;. &lt;/p&gt;
&lt;p&gt;Those two are left out because the indicators currently available in Talend Open Profiler do not need them. Instead, we added two other types to handle textual data. For example, a &amp;#8220;COMMENT&amp;#8221; column which contains free text is not nominal data, but we could still be interested in seeing its duplicate values. We could also implement metrics specific to text mining (but not in the current release&amp;#8230;). &lt;/p&gt;
&lt;p&gt;Finally, the type &amp;#8220;other&amp;#8221; designates data that Talend Open Profiler does not yet know how to handle. &lt;/p&gt;
&lt;/div&gt;</content>
		<author>
			<name>scorreia</name>
			<uri>http://scorreiait.wordpress.com</uri>
		</author>
		<source>
			<title type="html">My IT Weblog</title>
			<subtitle type="html">Just another WordPress.com weblog</subtitle>
			<link rel="self" href="http://scorreiait.wordpress.com/feed/atom/"/>
			<id>http://scorreiait.wordpress.com/feed/atom/</id>
			<updated>2008-11-04T21:30:03+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en">
		<title type="html">How to compute a median in SQL</title>
		<link href="http://scorreiait.wordpress.com/2008/10/28/how-to-compute-a-median-in-sql/"/>
		<id>http://scorreiait.wordpress.com/?p=105</id>
		<updated>2008-10-28T21:46:47+00:00</updated>
		<content type="html">&lt;div class=&quot;snap_preview&quot;&gt;&lt;br /&gt;&lt;p&gt;In Talend Open Profiler, we generate SQL queries to get statistical informations. Among the currently available indicators, the median is one of the most difficult to compute. Nevertheless this indicator is worth computing because it is more stable than the mean indicator (average). By stable, I mean that it is less influenced by extremal values. This is not the case with the average which can vary a lot when extremal values exist.&lt;/p&gt;
&lt;p&gt;I found several ways to compute the median depending on the database type. The most simple is for example with Oracle 10g which provides a MEDIAN function, so that your query writes&lt;br /&gt;
&lt;code&gt;SELECT MEDIAN(salary) FROM employee&lt;/code&gt;&lt;/p&gt;
&lt;p&gt;For other databases, things get trickier. Let&amp;#8217;s take &lt;strong&gt;MySQL&lt;/strong&gt; first. One way to compute the median is the following:&lt;br /&gt;
&lt;code&gt;SELECT AVG(salary) FROM (&lt;br /&gt;
SELECT salary FROM employee&lt;br /&gt;
WHERE salary IS NOT NULL&lt;br /&gt;
ORDER by salary ASC&lt;br /&gt;
LIMIT p, n) T&lt;/code&gt;&lt;br /&gt;
where p is the offset of the first row to keep and n is the number of rows to keep: n=2 and p=N/2-1 when the number of non-null rows N is even, or n=1 and p=(N-1)/2 when N is odd.&lt;/p&gt;
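&lt;p&gt;As a cross-check of the offset arithmetic, here is a minimal Python sketch of the same idea: sort the non-null values, keep the middle window, and average it. It assumes that p is the zero-based offset and n the number of rows kept, as in MySQL&amp;#8217;s LIMIT p, n. The function names are made up for this illustration and are not part of Talend Open Profiler.&lt;/p&gt;
&lt;pre&gt;
```python
def median_window(row_count):
    """Return (p, n): zero-based offset and size of the middle window."""
    if row_count % 2 == 0:
        return row_count // 2 - 1, 2    # even count: the two middle rows
    return (row_count - 1) // 2, 1      # odd count: the single middle row


def median_by_window(values):
    """Mimic SELECT AVG(...) FROM (... ORDER BY ... LIMIT p, n)."""
    # WHERE salary IS NOT NULL ORDER BY salary ASC
    ordered = sorted(v for v in values if v is not None)
    if not ordered:
        raise ValueError("no non-null rows")
    p, n = median_window(len(ordered))
    window = ordered[p:p + n]           # LIMIT p, n
    return sum(window) / len(window)    # AVG over the window


if __name__ == "__main__":
    print(median_window(4), median_window(5))
    print(median_by_window([3, 1, None, 2, 4]))
```
&lt;/pre&gt;
&lt;p&gt;Note that N has to be known before the LIMIT clause can be written, for instance from a preliminary COUNT(salary) query.&lt;/p&gt;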
&lt;p&gt;For &lt;strong&gt;PostgreSQL&lt;/strong&gt;, the query is similar to the MySQL query and also uses LIMIT.&lt;br /&gt;
&lt;code&gt;SELECT AVG(salary) FROM (&lt;br /&gt;
SELECT salary FROM employee&lt;br /&gt;
WHERE salary IS NOT NULL&lt;br /&gt;
ORDER by salary ASC&lt;br /&gt;
LIMIT n OFFSET p) T&lt;/code&gt;&lt;br /&gt;
This query can also be used on MySQL, but not on old versions (before 5.0).&lt;br /&gt;
For &lt;strong&gt;Oracle 9i&lt;/strong&gt;, the MEDIAN function does not exist and we must use the PERCENTILE_CONT function:&lt;br /&gt;
&lt;code&gt;SELECT PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY salary)&lt;br /&gt;
FROM employee&lt;/code&gt;&lt;br /&gt;
For &lt;strong&gt;DB2&lt;/strong&gt;, the following query is used in Talend Open Profiler:&lt;br /&gt;
&lt;code&gt;SELECT AVG(salary) FROM (&lt;br /&gt;
SELECT salary, COUNT(*) OVER( ) total, CAST(COUNT(*) OVER( ) AS DECIMAL)/2 mid, CEIL(CAST(COUNT(*) OVER( ) AS DECIMAL)/2) next, ROW_NUMBER() OVER ( ORDER BY salary) rn FROM employee&lt;br /&gt;
) x&lt;br /&gt;
WHERE ( MOD(total,2) = 0     AND rn IN ( mid, mid+1 ) )&lt;br /&gt;
OR&lt;br /&gt;
( MOD(total,2) = 1 AND rn = next )&lt;/code&gt;&lt;br /&gt;
For &lt;strong&gt;Microsoft SQL Server&lt;/strong&gt;, we use the TOP clause as follows:&lt;br /&gt;
&lt;code&gt;SELECT AVG(CAST(salary AS NUMERIC)) FROM (&lt;br /&gt;
SELECT TOP n salary FROM (&lt;br /&gt;
SELECT TOP m salary FROM employee&lt;br /&gt;
WHERE salary IS NOT NULL ORDER BY salary ASC&lt;br /&gt;
) AS FOO&lt;br /&gt;
ORDER BY salary DESC&lt;br /&gt;
) AS BAR&lt;br /&gt;
&lt;/code&gt;&lt;br /&gt;
where n and p are defined as in the MySQL case and m=n+p.&lt;/p&gt;
&lt;p&gt;So far, the only way I have found to compute the median on &lt;strong&gt;Sybase ASE&lt;/strong&gt; is the following:&lt;br /&gt;
&lt;code&gt;SELECT AVG(CAST (salary AS NUMERIC)) FROM (&lt;br /&gt;
SELECT DISTINCT salary FROM (&lt;br /&gt;
SELECT salary FROM employee&lt;br /&gt;
UNION ALL&lt;br /&gt;
SELECT salary FROM employee&lt;br /&gt;
) STT&lt;br /&gt;
WHERE&lt;br /&gt;
(SELECT COUNT(salary) FROM employee) &amp;lt;= (SELECT COUNT(salary) FROM (&lt;br /&gt;
SELECT salary FROM employee&lt;br /&gt;
UNION ALL&lt;br /&gt;
SELECT salary FROM employee&lt;br /&gt;
) AS SOU&lt;br /&gt;
WHERE SOU.salary &amp;lt;= STT.salary)&lt;br /&gt;
AND&lt;br /&gt;
(SELECT COUNT(salary) FROM employee) &amp;lt;= (SELECT COUNT(salary) FROM (&lt;br /&gt;
SELECT salary FROM employee&lt;br /&gt;
UNION ALL&lt;br /&gt;
SELECT salary FROM employee&lt;br /&gt;
) AS SUR&lt;br /&gt;
WHERE SUR.salary &amp;gt;= STT.salary) ) T&lt;/code&gt;&lt;br /&gt;
This query makes heavy use of correlated subqueries and I hope to find a more efficient way to compute a median on this database.&lt;/p&gt;
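&lt;p&gt;To make the logic of the Sybase query easier to follow, here is a small Python sketch of my reading of it. A value is kept when at least half of the non-null rows are at or below it and at least half are at or above it; the UNION ALL doubles each count, so the test can be written without dividing N by two. The surviving distinct values are then averaged. The function name is invented for this illustration.&lt;/p&gt;
&lt;pre&gt;
```python
def median_by_counting(values):
    """Median via rank counting, mirroring the correlated subqueries."""
    rows = [v for v in values if v is not None]   # COUNT(salary) skips NULLs
    if not rows:
        raise ValueError("no non-null rows")
    total = len(rows)
    candidates = set()                            # SELECT DISTINCT salary
    for v in rows:
        at_or_below = sum(1 for x in rows if v >= x)
        at_or_above = sum(1 for x in rows if x >= v)
        # Doubled counts (the UNION ALL) must both reach the row count.
        if 2 * at_or_below >= total and 2 * at_or_above >= total:
            candidates.add(v)
    return sum(candidates) / len(candidates)      # AVG over the candidates


if __name__ == "__main__":
    print(median_by_counting([1, 2, 3, 4]))
    print(median_by_counting([5, None, 1, 3]))
```
&lt;/pre&gt;
&lt;p&gt;Each row triggers two full scans, which is the quadratic cost that the correlated subqueries also pay.&lt;/p&gt;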
&lt;p&gt;The median can be computed in other ways, for example with temporary tables or cursors. But Talend Open Profiler must use only SELECT statements, because a data profiler may not have permission to create tables in a database, and cursors are too complex for this tool.&lt;/p&gt;
&lt;/div&gt;</content>
		<author>
			<name>scorreia</name>
			<uri>http://scorreiait.wordpress.com</uri>
		</author>
		<source>
			<title type="html">My IT Weblog</title>
			<subtitle type="html">Just another WordPress.com weblog</subtitle>
			<link rel="self" href="http://scorreiait.wordpress.com/feed/atom/"/>
			<id>http://scorreiait.wordpress.com/feed/atom/</id>
			<updated>2008-11-04T21:30:03+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en">
		<title type="html">Talend Free Webinars, Case Studies and White Papers</title>
		<link href="http://ocarbone.free.fr/blog/?p=339"/>
		<id>http://ocarbone.free.fr/blog/?p=339</id>
		<updated>2008-10-22T15:07:03+00:00</updated>
		<content type="html">&lt;p&gt;New sections have been added on our website that you will probably find of interest:&lt;/p&gt;
&lt;p&gt;- the new &lt;strong&gt;&lt;a href=&quot;http://www.talend.com/webinar/archive&quot; title=&quot;On Demand Webinars&quot; target=&quot;_blank&quot;&gt;On Demand Webinars&lt;/a&gt; section&lt;/strong&gt;, where replays of selected past webinars are now available. The free Talend webinars are recorded, so you can watch them whenever you want. This section complements the &lt;a href=&quot;http://www.talend.com/webinar/index.php?src=oca&quot; title=&quot;Free Talend Webinar Calendar&quot; target=&quot;_blank&quot;&gt;calendar of live webinars&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;- a reformatted &lt;strong&gt;Talend Customers section&lt;/strong&gt;, with links to &lt;a href=&quot;http://www.talend.com/open-source-provider/reference.php&quot; title=&quot;Customer Case Studies&quot; target=&quot;_blank&quot;&gt;case studies&lt;/a&gt; (most of them are available in several languages) and a rotating display of customer and partner logos.&lt;/p&gt;
&lt;p&gt;- the new &lt;strong&gt;&lt;a href=&quot;http://www.talend.com/library/reflibrary.php&quot; title=&quot;White Paper Library&quot; target=&quot;_blank&quot;&gt;white paper library&lt;/a&gt;&lt;/strong&gt;, featuring white papers and analyst reports.&lt;/p&gt;
&lt;p&gt;All the documents are provided in French and in English.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://ocarbone.free.fr/blog/images/website.png&quot; title=&quot;Talend.com&quot; alt=&quot;Talend.com&quot; vspace=&quot;0&quot; width=&quot;503&quot; border=&quot;0&quot; height=&quot;383&quot; hspace=&quot;0&quot; /&gt;&lt;/p&gt;
&lt;a href=&quot;http://ocarbone.free.fr/blog/index.php?lang=&amp;amp;tag=talend&quot; rel=&quot;tag&quot;&gt;talend&lt;/a&gt;&lt;p&gt;Source: &lt;a href=&quot;http://ocarbone.free.fr/blog&quot; title=&quot;Formateur elearning ; Responsable Formation, Ingénieur Recherche et Développement&quot;&gt;ocarbone.free.fr&lt;/a&gt;&lt;/p&gt;
</content>
		<author>
			<name>Olivier CARBONE</name>
			<uri>http://ocarbone.free.fr/blog</uri>
		</author>
		<source>
			<title type="html">ocarbone.free.fr</title>
			<subtitle type="html">Ingénieur Recherche et Développement - membre de la communauté Talend</subtitle>
			<link rel="self" href="http://feeds.feedburner.com/ocarbone-cat3"/>
			<id>http://feeds.feedburner.com/ocarbone-cat3</id>
			<updated>2008-11-18T01:30:03+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en">
		<title type="html">When memory matters, multiple keys hash</title>
		<link href="http://le-gall.net/pierrick/en/blog/index.php?post/2008/10/17/When-memory-matters-multiple-keys-hash"/>
		<id>urn:md5:1fb5fee93e2ebfcb65cd899332e57d5d</id>
		<updated>2008-10-16T22:41:00+00:00</updated>
		<content type="html">&lt;p&gt;&lt;img src=&quot;http://le-gall.net/pierrick/en/blog/public/post_120/corsaire_memory.jpg&quot; alt=&quot;corsaire_memory.jpg&quot; /&gt;&lt;/p&gt;


&lt;p&gt;Once again I need to load a huge amount of data into memory in a Perl hash (see my previous post on the subject, &lt;a href=&quot;http://le-gall.net/pierrick/en/blog/index.php?post/2008/07/16/When-memory-matters&quot; hreflang=&quot;en&quot;&gt;When memory matters&lt;/a&gt;). This time the hash key is made of several values, and the value attached to each key is an array of scalars: 6 strings for the key and 10 floating-point numbers for the value.&lt;/p&gt;    &lt;p&gt;My data looks like this:&lt;/p&gt;

&lt;pre&gt;
x5;7Sm;aI8w;2bNuW;lcDogi;pBTAdSKxha;2.73718;6.86649;0.49441;6.97707;2.62658;0.29478;3.97692;1.76468;4.86533;7.82656
sR;Ndu;PvlX;k3IJu;ZTwnAN;jTmtvmlqru;3.03941;1.84579;3.66076;7.14519;8.18409;5.87612;0.56569;1.49385;8.77644;6.98281
jm;cQL;ZXul;J0NIC;UwBz4S;O9VfAA74ao;6.20567;0.16487;3.02578;9.16104;0.14110;8.24097;8.23658;3.53050;7.11429;8.31212
3s;gId;wjXk;jJ3X2;7C833a;tzlTSHet4Y;8.08147;2.62935;9.00534;9.98493;5.17147;4.15576;8.60537;0.41931;9.79758;6.75664
&lt;/pre&gt;


&lt;p&gt;Here is my basic script. The hash key is made of the first 6 columns, and the values are the next 10 columns. In the following examples, only the code between mark 1 and mark 2 changes.&lt;/p&gt;

&lt;pre&gt;
#!/usr/bin/perl

use strict;
use warnings;

use Time::HiRes qw(gettimeofday tv_interval);
use Sys::Statistics::Linux::Processes;

my $lxs = Sys::Statistics::Linux::Processes-&amp;gt;new;
$lxs-&amp;gt;init;

my $start = [gettimeofday];
my %cache = ();

open(my $ifh, '&amp;lt;', $ARGV[0])
    or die &amp;quot;cannot open input file: $!&amp;quot;;

while (&amp;lt;$ifh&amp;gt;) {
    chomp;
    my @fields = split ';', $_;

    # mark 1
    # one slot per aggregated column: operation1 .. operation10
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation1}{sum} = $fields[6];
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation2}{sum} = $fields[7];
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation3}{sum} = $fields[8];
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation4}{sum} = $fields[9];
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation5}{sum} = $fields[10];
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation6}{sum} = $fields[11];
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation7}{sum} = $fields[12];
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation8}{sum} = $fields[13];
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation9}{sum} = $fields[14];
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]}{operation10}{sum} = $fields[15];
    # mark 2
}

# use Data::Dumper;
# print Dumper(\%cache);

close($ifh);

my $stop = [gettimeofday];
my $stat = $lxs-&amp;gt;get;

printf(
    &amp;quot;time: %.1f seconds, memory : %uM\n&amp;quot;,
    tv_interval($start, $stop),
    $stat-&amp;gt;{$$}{vsize} / (1024 * 1024)
);
&lt;/pre&gt;


&lt;p&gt;This corresponds to the way Talend Open Studio currently stores the aggregation hash, which explains the &quot;operation1&quot; and &quot;sum&quot; names :-). Here is a second way to do it:&lt;/p&gt;

&lt;pre&gt;
    # mark 1
    $cache{$fields[0]}{$fields[1]}{$fields[2]}{$fields[3]}{$fields[4]}{$fields[5]} = [
        $fields[6],
        $fields[7],
        $fields[8],
        $fields[9],
        $fields[10],
        $fields[11],
        $fields[12],
        $fields[13],
        $fields[14],
        $fields[15],
    ];
    # mark 2
&lt;/pre&gt;


&lt;p&gt;and a third one, very compact:&lt;/p&gt;

&lt;pre&gt;
    # mark 1
    $cache{join($;, @fields[0..5])} = join($;, @fields[6..15]);
    # mark 2
&lt;/pre&gt;
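&lt;p&gt;For readers less familiar with Perl, here is a rough Python sketch of what the compact variant does: one flat dictionary, with both the key and the value packed into single strings. The separator is the default value of Perl's $; variable (the character with octal code 034). The function names are invented for this illustration, and the sketch says nothing about how Python's memory use would compare.&lt;/p&gt;
&lt;pre&gt;
```python
SUBSEP = "\x1c"   # default value of Perl's $; (octal 034)


def load_flat(lines):
    """Build one flat dict: packed 6-part key, packed 10-part value."""
    cache = {}
    for line in lines:
        fields = line.rstrip("\n").split(";")
        key = SUBSEP.join(fields[0:6])            # join($;, @fields[0..5])
        cache[key] = SUBSEP.join(fields[6:16])    # join($;, @fields[6..15])
    return cache


def lookup(cache, key_parts):
    """Unpack the stored string back into a list of floats."""
    packed = cache[SUBSEP.join(key_parts)]
    return [float(x) for x in packed.split(SUBSEP)]


if __name__ == "__main__":
    sample = ("x5;7Sm;aI8w;2bNuW;lcDogi;pBTAdSKxha;2.73718;6.86649;0.49441;"
              "6.97707;2.62658;0.29478;3.97692;1.76468;4.86533;7.82656\n")
    cache = load_flat([sample])
    print(lookup(cache, ["x5", "7Sm", "aI8w", "2bNuW", "lcDogi", "pBTAdSKxha"]))
```
&lt;/pre&gt;
&lt;p&gt;The price of this layout is that every read has to split the packed value again, and that the separator character must never appear in the data itself.&lt;/p&gt;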


&lt;p&gt;I ran the test on a file of 1 million lines:&lt;/p&gt;
&lt;pre&gt;
$ perl hash_memory1.pl /tmp/agg.in; perl hash_memory2.pl /tmp/agg.in; perl hash_memory3.pl /tmp/agg.in
time: 35.4 seconds, memory : 991M
time: 22.1 seconds, memory : 1205M
time: 15.7 seconds, memory : 197M
&lt;/pre&gt;


&lt;p&gt;As hash_memory1.pl is the current &quot;way to do it&quot; in Talend Open Studio, I take it as the reference. The second method is 1.6 times faster but uses 1.2 times more memory. The third method is 2.25 times faster and uses 5 times less memory. A huge improvement, really!&lt;/p&gt;


&lt;p&gt;Given the limit of 3GB per process on a 32-bit Linux x86 architecture, this means that where we could previously hold about 3 million keys in memory, with this improvement we can hold about 15 million.&lt;/p&gt;</content>
		<author>
			<name>Pierrick LE GALL</name>
			<uri>http://le-gall.net/pierrick/en/blog/index.php</uri>
		</author>
		<source>
			<title type="html">Pierrick Le Gall - talend</title>
			<link rel="self" href="http://le-gall.net/pierrick/en/blog/index.php?feed/tag/talend/rss2"/>
			<id>http://le-gall.net/pierrick/en/blog/index.php?feed/tag/talend/rss2</id>
			<updated>2008-10-17T07:30:05+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en">
		<title type="html">krunner history</title>
		<link href="http://scorreiait.wordpress.com/2008/10/18/krunner-history/"/>
		<id>http://scorreiait.wordpress.com/?p=79</id>
		<updated>2008-10-16T07:36:09+00:00</updated>
		<content type="html">&lt;div class=&quot;snap_preview&quot;&gt;&lt;br /&gt;&lt;p&gt;&lt;a href=&quot;http://www.linuxpedia.fr/doku.php/kde/krunner&quot;&gt;krunner&lt;/a&gt; is the small linux application launcher that pops-up when you hit Alt+F2. Its behavior in kde4 changed from the behavior of &lt;a href=&quot;http://www.linuxpedia.fr/doku.php/kde/minicli&quot;&gt;minicli&lt;/a&gt; of kde-3.5. In order to get the previous behavior back, right-click on the text field of krunner and change the text completion mode to &amp;#8220;short automatic&amp;#8221;.&lt;/p&gt;
&lt;p&gt;I never knew this menu before today. That&amp;#8217;s why I give this tip here.&lt;/p&gt;
&lt;/div&gt;</content>
		<author>
			<name>scorreia</name>
			<uri>http://scorreiait.wordpress.com</uri>
		</author>
		<source>
			<title type="html">My IT Weblog</title>
			<subtitle type="html">Just another WordPress.com weblog</subtitle>
			<link rel="self" href="http://scorreiait.wordpress.com/feed/atom/"/>
			<id>http://scorreiait.wordpress.com/feed/atom/</id>
			<updated>2008-11-04T21:30:03+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en">
		<title type="html">Want to create a startup?</title>
		<link href="http://scorreiait.wordpress.com/2008/10/16/want-to-create-a-startup/"/>
		<id>http://scorreiait.wordpress.com/?p=93</id>
		<updated>2008-10-16T07:35:17+00:00</updated>
		<content type="html">&lt;div class=&quot;snap_preview&quot;&gt;&lt;br /&gt;&lt;p&gt;Before you start to create your startup, listen to (or read) &lt;a href=&quot;http://www.b-eye-network.com/view/index.php?cid=8628&quot;&gt;Bill Inmon&amp;#8217;s advice&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Some things that you must keep in mind are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;be pragmatic: &lt;em&gt;&amp;#8220;build what  you can sell&amp;#8221;&lt;/em&gt;, &lt;em&gt;&amp;#8220;95% of the startup resources will go to marketing and sales.&amp;#8221;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;protect your work &lt;em&gt;&amp;#8220;intellectual property is the backbone of the startup&amp;#8221;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;be ready for competition &amp;#8220;&lt;em&gt;competition is lurking behind every shadow and every bush&amp;#8221;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;business is business, don&amp;#8217;t keep unproductive people &lt;em&gt;&amp;#8220;Get rid of unproductive people quickly&amp;#8221;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;Remember that a lot of money is needed to start because there is a &lt;em&gt;&amp;#8220;long way to go before there is any income&amp;#8221;&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;But &lt;em&gt;avoid venture capitalists at all costs&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;&amp;#8220;be flexible&amp;#8221;&lt;/em&gt; and adapt your business plan&lt;/li&gt;
&lt;li&gt;&amp;#8230;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;I have not created a startup, but as I work in one, I can see how this advice is more or less followed by my employers. It applies to open source software vendors too, as soon as they need to make money.&lt;/p&gt;
&lt;/div&gt;</content>
		<author>
			<name>scorreia</name>
			<uri>http://scorreiait.wordpress.com</uri>
		</author>
		<source>
			<title type="html">My IT Weblog</title>
			<subtitle type="html">Just another WordPress.com weblog</subtitle>
			<link rel="self" href="http://scorreiait.wordpress.com/feed/atom/"/>
			<id>http://scorreiait.wordpress.com/feed/atom/</id>
			<updated>2008-11-04T21:30:03+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en">
		<title type="html">Talend Open Studio v3.0.0: discover the essential knowledge</title>
		<link href="http://ocarbone.free.fr/blog/?p=334"/>
		<id>http://ocarbone.free.fr/blog/?p=334</id>
		<updated>2008-10-14T23:01:20+00:00</updated>
		<content type="html">&lt;p&gt;Talend Open Studio v3.0.0 has just been released. Please &lt;a href=&quot;http://www.talend.com/download.php&quot; title=&quot;Download Talend Open Studio&quot; target=&quot;_blank&quot;&gt;download it now&lt;/a&gt; and check all that this new release has to offer! To discover all the new features, please visit the &lt;a href=&quot;http://talendforge.org/bugs/changelog_page.php&quot; title=&quot;Talend Open Studio Changelog&quot; target=&quot;_blank&quot;&gt;changelog&lt;/a&gt; page.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://ocarbone.free.fr/blog/images/talend/tos3.0.0.png&quot; title=&quot;Talend Open Studio V3.0.0&quot; alt=&quot;Talend Open Studio V3.0.0&quot; vspace=&quot;0&quot; width=&quot;435&quot; border=&quot;0&quot; height=&quot;250&quot; hspace=&quot;0&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To cover the essentials, the &lt;a href=&quot;http://www.talend.com/resources/documentation.php&quot; title=&quot;Talend Open Studio Documentation&quot; target=&quot;_blank&quot;&gt;Talend Open Studio documentation&lt;/a&gt; is split into two guides:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The User Guide provides general usage information and describes the graphical user interface&lt;/li&gt;
&lt;li&gt;The Reference Guide of Talend components describes the use of all components and includes several use cases&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;strong&gt;To turn theory into practice, Talend offers several resources:&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Video tutorials published on &lt;a href=&quot;http://www.talendforge.org/tutorials/menu.php&quot; title=&quot;Talend Open Studio Tutorials&quot; target=&quot;_blank&quot;&gt;TalendForge&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Job Design Scenarios described in our documentation&lt;/li&gt;
&lt;li&gt;A Demo Project included in Talend Open Studio&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;These three resources are independent: they do not share example files, and their use cases differ.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;To have ready-to-use Job Designs at your disposal, import the Demo Project included in Talend Open Studio.&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;In this release, the demo projects have been updated.&lt;/p&gt;
&lt;h1&gt;How to import the Demo Project?&lt;/h1&gt;
&lt;p&gt;Simply follow the steps below:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Launch Talend Open Studio&lt;/li&gt;
&lt;li&gt;In the login window, click on Manage and click on Demo&lt;/li&gt;
&lt;li&gt;Choose the Java or Perl Demo Project&lt;/li&gt;
&lt;li&gt;Click on Finish to complete the operation&lt;/li&gt;
&lt;li&gt;Choose TALENDDEMOJAVA or TALENDDEMOPERL in the &amp;#8220;Existing&amp;#8221; list&lt;/li&gt;
&lt;li&gt;Click on Open&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;&lt;img src=&quot;http://ocarbone.free.fr/blog/images/talend/talenddemoproject-open-demo.png&quot; title=&quot;Discover the essential knowledge&quot; alt=&quot;Discover the essential knowledge&quot; vspace=&quot;0&quot; width=&quot;578&quot; border=&quot;0&quot; height=&quot;315&quot; hspace=&quot;0&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;How to install the example files?&lt;/h1&gt;
&lt;p&gt;In the Repository view, open the Job Designs and Contexts nodes:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;The default folder used to store the example files is &amp;#8220;C// &amp;#8220;. To change this value, double-click &lt;em&gt;globalContext&lt;/em&gt; (under the Contexts node), click &lt;em&gt;Next&lt;/em&gt;, and in the &lt;em&gt;Value as Tree&lt;/em&gt; panel change the value of the &amp;#8220;defaultDir&amp;#8221; variable&lt;/li&gt;
&lt;li&gt;Open the job named beforeRunJobs and execute it to install the example files&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Additional setting: open Metadata/Db Connections/demoMysql and change the settings to connect to your own database.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://ocarbone.free.fr/blog/images/talend/talenddemoproject-context.png&quot; title=&quot;How to install the example files&quot; alt=&quot;How to install the example files&quot; vspace=&quot;0&quot; width=&quot;500&quot; border=&quot;0&quot; height=&quot;291&quot; hspace=&quot;0&quot; /&gt;&lt;/p&gt;
&lt;h1&gt;How to take advantage of the Demo Project?&lt;/h1&gt;
&lt;p&gt;Open each Job Design and press F6 to run the job. If you want help with a component, select it and press F1. In the Run Job view, two features are available: &lt;em&gt;Traces&lt;/em&gt; and &lt;em&gt;Statistics&lt;/em&gt;.&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://ocarbone.free.fr/blog/images/talend/talenddemoproject-runjob.png&quot; title=&quot;How to take advantage of the Demo Project&quot; alt=&quot;How to take advantage of the Demo Project&quot; vspace=&quot;0&quot; width=&quot;500&quot; border=&quot;0&quot; height=&quot;97&quot; hspace=&quot;0&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Statistics feature lets you observe the distribution of the data:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://ocarbone.free.fr/blog/images/talend/talenddemoproject-statistics.png&quot; title=&quot;How to take advantage of the Demo Project&quot; alt=&quot;How to take advantage of the Demo Project&quot; vspace=&quot;0&quot; width=&quot;458&quot; border=&quot;0&quot; height=&quot;248&quot; hspace=&quot;0&quot; /&gt;&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;The Traces feature lets you observe the transformation of the data:&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;&lt;img src=&quot;http://ocarbone.free.fr/blog/images/talend/talenddemoproject-traces.png&quot; title=&quot;How to take advantage of the Demo Project&quot; alt=&quot;How to take advantage of the Demo Project&quot; vspace=&quot;0&quot; width=&quot;500&quot; border=&quot;0&quot; height=&quot;399&quot; hspace=&quot;0&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Want to talk about Talend Open Studio? Need help using the software? Want to make suggestions and discuss them with the dev team? &lt;strong&gt;&lt;a href=&quot;http://www.talendforge.org/forum/&quot; title=&quot;Make suggestions and discuss with the dev team!&quot; target=&quot;_blank&quot;&gt;The forum is the right place!&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;
&lt;a href=&quot;http://ocarbone.free.fr/blog/index.php?lang=&amp;amp;tag=talend&quot; rel=&quot;tag&quot;&gt;talend&lt;/a&gt;&lt;p&gt;Source: &lt;a href=&quot;http://ocarbone.free.fr/blog&quot; title=&quot;Formateur elearning ; Responsable Formation, Ingénieur Recherche et Développement&quot;&gt;ocarbone.free.fr&lt;/a&gt;&lt;/p&gt;
</content>
		<author>
			<name>Olivier CARBONE</name>
			<uri>http://ocarbone.free.fr/blog</uri>
		</author>
		<source>
			<title type="html">ocarbone.free.fr</title>
			<subtitle type="html">Ingénieur Recherche et Développement - membre de la communauté Talend</subtitle>
			<link rel="self" href="http://feeds.feedburner.com/ocarbone-cat3"/>
			<id>http://feeds.feedburner.com/ocarbone-cat3</id>
			<updated>2008-11-18T01:30:03+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en">
		<title type="html">Talend Open Studio: free technical workshops</title>
		<link href="http://ocarbone.free.fr/blog/?p=332"/>
		<id>http://ocarbone.free.fr/blog/?p=332</id>
		<updated>2008-10-14T00:16:55+00:00</updated>
		<content type="html">&lt;p&gt;The best way to demonstrate to our potential users the value that Talend’s technology can bring to them is to have them try it themselves. &lt;strong&gt; So we are getting on the road and organizing a series of free technical workshops&lt;/strong&gt; in a number of cities in France, Switzerland and Belgium.&lt;/p&gt;
&lt;p&gt;Over three hours, workshop participants will receive complimentary training on Talend Open Studio, and at the end of the workshop they will leave with a USB key containing the data integration jobs they have developed.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Next year, we’ll be taking this concept to other countries and continents&lt;/strong&gt; - we have also been running several workshops in Germany, which have been very successful.&lt;/p&gt;
&lt;p align=&quot;center&quot;&gt;&lt;img src=&quot;http://ocarbone.free.fr/blog/images/talend/bus-talend.jpg&quot; title=&quot;Talend Roadshow: discovery workshop for Talend software&quot; alt=&quot;Talend Roadshow: discovery workshop for Talend software&quot; vspace=&quot;0&quot; width=&quot;531&quot; border=&quot;0&quot; height=&quot;278&quot; hspace=&quot;0&quot; /&gt;&lt;/p&gt;
&lt;p&gt;Check out the calendar and &lt;a href=&quot;http://www.surveymonkey.com/s.aspx?sm=viPBdDXcy6FdxZu_2fdPY2Hw_3d_3d&quot; target=&quot;_blank&quot; title=&quot;Talend Roadshow - sign up&quot;&gt;sign up now&lt;/a&gt;:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;23rd of October - Lille&lt;/li&gt;
&lt;li&gt;24th of October - Luxembourg City&lt;/li&gt;
&lt;li&gt;28th of October - Genève&lt;/li&gt;
&lt;li&gt;29th of October - Genève&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Following the numerous requests received, I am pleased to announce that the Talend Discovery Roadshow will make a &lt;strong&gt;stop in Brussels on the 25th of November&lt;/strong&gt; 2008. We will hold two workshops there, in French and English.&lt;/p&gt;
&lt;p&gt;If you wish to register for this workshop, or simply suggest another location, please click on &lt;a href=&quot;http://www.surveymonkey.com/s.aspx?sm=viPBdDXcy6FdxZu_2fdPY2Hw_3d_3d&quot; target=&quot;_blank&quot; title=&quot;Talend Roadshow - sign up&quot;&gt;this link&lt;/a&gt;.&lt;/p&gt;
&lt;a href=&quot;http://ocarbone.free.fr/blog/index.php?lang=&amp;amp;tag=talend&quot; rel=&quot;tag&quot;&gt;talend&lt;/a&gt;&lt;p&gt;Source: &lt;a href=&quot;http://ocarbone.free.fr/blog&quot; title=&quot;Formateur elearning ; Responsable Formation, Ingénieur Recherche et Développement&quot;&gt;ocarbone.free.fr&lt;/a&gt;&lt;/p&gt;
</content>
		<author>
			<name>Olivier CARBONE</name>
			<uri>http://ocarbone.free.fr/blog</uri>
		</author>
		<source>
			<title type="html">ocarbone.free.fr</title>
			<subtitle type="html">Ingénieur Recherche et Développement - membre de la communauté Talend</subtitle>
			<link rel="self" href="http://feeds.feedburner.com/ocarbone-cat3"/>
			<id>http://feeds.feedburner.com/ocarbone-cat3</id>
			<updated>2008-11-18T01:30:03+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en">
		<title type="html">Data quality blogs and resources</title>
		<link href="http://scorreiait.wordpress.com/2008/10/07/data-quality-blogs-and-resources/"/>
		<id>http://scorreiait.wordpress.com/?p=84</id>
		<updated>2008-10-10T13:41:22+00:00</updated>
		<content type="html">&lt;div class=&quot;snap_preview&quot;&gt;&lt;br /&gt;&lt;p&gt;The best data quality blogs are now listed in one place: &lt;a href=&quot;http://www.dataqualitypro.com/data-quality-blogs&quot;&gt;Data Quality Pro Blog Finder&lt;br /&gt;
&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;http://www.dataqualitypro.com/data-quality-home/who-are-the-data-quality-blogging-heroes.html&quot;&gt;This post&lt;/a&gt; explains how the blogs are evaluated and shows that independent bloggers have more interactions with the community than vendor bloggers.&lt;/p&gt;
&lt;p&gt;Data Quality Tools vendors are listed &lt;a href=&quot;http://www.dataqualitypro.com/data-quality-technology/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The latest news is aggregated &lt;a href=&quot;http://www.dataqualitypro.com/data-quality-quick-link/&quot;&gt;here&lt;/a&gt;, and data quality events are listed &lt;a href=&quot;http://www.dataqualitypro.com/data-quality-events-planner/&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;
&lt;/div&gt;</content>
		<author>
			<name>scorreia</name>
			<uri>http://scorreiait.wordpress.com</uri>
		</author>
		<source>
			<title type="html">My IT Weblog</title>
			<subtitle type="html">Just another WordPress.com weblog</subtitle>
			<link rel="self" href="http://scorreiait.wordpress.com/feed/atom/"/>
			<id>http://scorreiait.wordpress.com/feed/atom/</id>
			<updated>2008-11-04T21:30:03+00:00</updated>
		</source>
	</entry>

	<entry xml:lang="en">
		<title type="html">What&amp;#8217;s in your databases?</title>
		<link href="http://scorreiait.wordpress.com/2008/10/02/whats-in-your-databases/"/>
		<id>http://scorreiait.wordpress.com/?p=68</id>
		<updated>2008-10-02T20:54:37+00:00</updated>
		<content type="html">&lt;div class=&quot;snap_preview&quot;&gt;&lt;br /&gt;&lt;p&gt;Often you only know approximately what&amp;#8217;s in your databases. Data profiling tools can help you to get a better idea of your database content. The goal of a data profiler is not to analyze your data in depth but to give you at a glance the main features of your data. Especially, data profilers can give you information about missing data, duplicates, badly formatted data, invalid data (out of range, incorrect business pattern&amp;#8230;)&lt;/p&gt;
&lt;p&gt;Talend Open Profiler (TOP) can help you to explore your data. The latest version is &lt;a href=&quot;http://www.talend.com/download.php?src=BlogSCorreia&quot;&gt;1.1.0&lt;/a&gt;. Its official &lt;a href=&quot;http://www.talend.com/resources/documentation.php&quot;&gt;documentation&lt;/a&gt; is available &lt;a href=&quot;http://www.talend.com/resources/documentation.php&quot;&gt;here&lt;/a&gt;. Much more information can be found on the &lt;a href=&quot;http://www.dataqualitypro.com/&quot;&gt;Data Quality Pro website&lt;/a&gt;, which also published a &lt;a href=&quot;http://www.dataqualitypro.com/data-quality-home/data-profiling-for-beginners-download-a-complete-tutorial-in.html&quot;&gt;21-page tutorial for addressing your data quality issues&lt;/a&gt; with Talend Open Profiler and their free &lt;a href=&quot;http://www.dataqualitypro.com/data-quality-home/need-to-trap-data-defects-in-oracle-download-this-free-data.html&quot;&gt;DQ Pattern analyser&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;This tutorial shows you how to use TOP to explore your data and gives many tips on how to interpret your profiling results. This is really important: you can easily profile your data and produce nice graphics with TOP, but if you don&amp;#8217;t know what to do once you have obtained the results, then profiling your data has not really helped you to enhance your data quality. The tutorial also presents a very useful function called &amp;#8220;DQ Pattern analyser&amp;#8221; that lists the patterns existing in the data. It helps you quickly see what&amp;#8217;s wrong with your data and lets you identify rare occurrences.&lt;br /&gt;
This function does not exist yet in TOP, but it will be implemented in the next version along with other new features.&lt;/p&gt;
&lt;p&gt;By the way, if you are missing a feature, now is the time to &lt;a href=&quot;http://www.talendforge.org/forum/viewforum.php?id=12&quot;&gt;tell Talend&amp;#8217;s team&lt;/a&gt; which new feature you would like to see in TOP.&lt;/p&gt;
&lt;/div&gt;</content>
		<author>
			<name>scorreia</name>
			<uri>http://scorreiait.wordpress.com</uri>
		</author>
		<source>
			<title type="html">My IT Weblog</title>
			<subtitle type="html">Just another WordPress.com weblog</subtitle>
			<link rel="self" href="http://scorreiait.wordpress.com/feed/atom/"/>
			<id>http://scorreiait.wordpress.com/feed/atom/</id>
			<updated>2008-11-04T21:30:03+00:00</updated>
		</source>
	</entry>

</feed>
