Scheduling in Talend Open Studio
ETL jobs are created to pull fresh data from one or more sources, apply some transformations to it, and push the cleansed and modified data into the target database. Once you have created your job, you will naturally want it to run more than once. You may want it to run periodically (e.g. weekly, fortnightly, or monthly) or, more likely, daily at a specific time. Without a scheduler, the only way to do that is to run the job by hand every single day, which quickly becomes tiresome and pointless. Wouldn’t it be wonderful if we could make the job run on its own at a certain time?
Fortunately, Talend provides a way to achieve exactly this, and the process is referred to as scheduling in Talend Open Studio.
Talend Export Types
If you are using the enterprise edition of Talend, you would use the Talend Administration Center (TAC) to schedule your jobs. If you are using Open Studio, however, you need to export your job in one of the available export types. The Talend export types are:
- Standalone Jobs
- Axis WebService (WAR)
- Axis WebService (ZIP)
- JBoss ESB
- Petals ESB
- OSGI Bundle for ESB
For this tutorial, we will consider a simple job that does no more than read a file and print a few lines of it to the console. The design of this job is shown below:
To export your job as a Standalone Job, right-click your job in the Talend Repository browser and select Export Items. This opens the Export Items dialog box.
Of the two options, we are interested in the “Select archive file” option, as it creates a zip file of your job and its dependencies (note the Export Dependencies checkbox).
Building Jobs in Talend
The export can also vary with the kind of deployment we need. For example, if we wish to deploy our job as a web service, a plain zip archive may not be useful; we need a web archive (WAR) instead. To create one, select the Build Job option rather than Export Items. This opens an entirely different window:
The Build Type option determines the type of archive to create and which additional files to include in it. In this tutorial, however, we will focus on creating standalone jobs. Selecting Standalone as the build type changes the window as shown below.
Here, the Shell Launcher option lets you specify whether to create shell launcher scripts for your job. These are Unix and Windows scripts for launching the job.
The Context Script option exports the scripts for every context variable present in your job. By default, the Default context of the job is selected, but this can easily be changed, as the above diagram shows.
There is a small checkbox at the side labelled “Apply Context to children Jobs”. It only has an effect if the current job has child jobs. If it is checked and a child job is called, the child job takes on the context of the parent rather than the one defined in its own configuration.
Selecting Override Parameter’s values displays a dialog that lets you override the value of any context variable, irrespective of the context in which your job executes. This dialog also allows you to specify new parameters that are passed to your job. When you override a parameter’s value, no change is made to your context scripts; the parameters are overridden by passing them as command-line arguments to your job.
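To make the command-line side of this concrete, here is a minimal sketch of how such overrides could be assembled. It assumes the generated launcher takes one --context_param name=value pair per overridden variable; the helper name context_args and the variable names input and limit are hypothetical and used only for illustration.

```shell
#!/bin/sh
# Sketch only: context_args is a hypothetical helper, not something Talend generates.
# For every name=value argument it prints the two words "--context_param" and
# "name=value" on separate lines, and warns about arguments that lack an "=".
context_args() {
  for pair in "$@"; do
    case "$pair" in
      *=*) printf '%s\n%s\n' '--context_param' "$pair" ;;
      *)   echo "ignoring malformed override: $pair" >&2 ;;
    esac
  done
}

# Show the words that would be appended to the launcher call.
# "input" and "limit" are made-up context variable names.
context_args input=/data/customers.csv limit=10
```

The printed words are what you would append after the launcher script name. Values that contain spaces would need extra quoting, which this sketch does not handle.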
There are two more small checkboxes at the bottom of the window. Java sources, when checked, specifies that the Java source code generated by Talend for your job should be exported into the archive. Similarly, Items represents the Talend definition of your job, i.e. the files Talend uses to display and edit the job in Open Studio. If this option is checked, these files are exported into the archive too.
The purpose of all this is to extract as many files as possible so that deployment, and hence scheduling, becomes simpler. Let us export a job to build an archive and then extract it to see what files are bundled into it. The image below shows some of the files you will get after extracting your job. Please note that my job name is tSimpleCSVRead2.
The archive contains an executable batch script for Windows platforms and a shell script for Linux platforms. If you run the batch script, you will see the output of your program on screen (provided your job writes its output rows to the console through tLogRow). This shows that the job was run by the batch script. Similarly, on a Linux machine, running the shell script executes your job.
Scheduling Jobs in a Windows Environment
As seen in the example above, double-clicking the <jobName>_run.bat file in the built archive executes the job. The same file can also be run from the command line with the same result. If your job uses a dynamic context parameter, it can be called as follows:
<jobName>_run.bat --context_param input=<not standard>
Now that we know how to execute our job from the command line, we need a way to run it periodically. On Windows this is easily done with the Task Scheduler, which you will find under Control Panel > System and Maintenance > Administrative Tools > Task Scheduler.
On the right side of the screen there is a Create Basic Task link. Clicking it lets you choose how often the task should run and at what time.
After you select the schedule, the wizard asks what you would like to run. Select “Start a program” and provide the path to your batch file in the Program/script box on the following screen.
On the next screen, review your choices and click Finish to create the scheduled task that runs your job.
Scheduling Jobs in a Linux/Unix Environment
As before, to run the job from the terminal you need the following command:
bash <jobName>_run.sh --context_param input=<not standard>
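When a job runs unattended, nobody is watching the console, so it can help to wrap the launcher in a small script that keeps a log. The following is only a sketch under assumptions: run_job is a hypothetical helper, and the paths in the usage comment are placeholders rather than locations Talend creates for you.

```shell
#!/bin/sh
# Sketch only: run_job is a hypothetical wrapper around the generated launcher.
# Usage: run_job LAUNCHER LOGFILE [ARGS...]
# It runs the launcher with bash, appends everything the job prints to LOGFILE,
# records start and end timestamps, and returns the job's exit code.
run_job() {
  launcher=$1
  logfile=$2
  shift 2
  echo "$(date '+%Y-%m-%d %H:%M:%S') starting $launcher" >> "$logfile"
  bash "$launcher" "$@" >> "$logfile" 2>&1
  status=$?
  echo "$(date '+%Y-%m-%d %H:%M:%S') finished with exit code $status" >> "$logfile"
  return "$status"
}

# Example call (placeholder paths, adjust to where you extracted the archive):
# run_job /path/to/tSimpleCSVRead2/tSimpleCSVRead2_run.sh /path/to/logs/job.log
```

Returning the job's exit code means a scheduler or a calling script can still tell whether the run succeeded.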
To set up a recurring task on Linux, we need to create a cron job.
To edit your list of cron jobs, type "crontab -e" in the terminal.
Here you can schedule tasks on each line with the following format:
1 2 3 4 5 /path/to/command arg1 arg2
Each number stands for one field:
1: Minute, 2: Hour, 3: Day of the Month, 4: Month, 5: Day of the Week
The following entry runs every day at 11:00 PM:
0 23 * * * /path/to/command arg1 arg2
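As a sketch of how the five fields combine for a daily schedule, the helper below builds a crontab line from an hour, a minute and a command. The name daily_cron_line and the job path are hypothetical; the asterisks in the day, month and weekday fields mean "every".

```shell
#!/bin/sh
# Sketch only: daily_cron_line is a hypothetical helper.
# Usage: daily_cron_line HOUR MINUTE COMMAND [ARGS...]
# Prints a crontab line that runs COMMAND every day at HOUR:MINUTE (24-hour clock).
daily_cron_line() {
  hour=$1
  minute=$2
  shift 2
  if [ "$hour" -lt 0 ] || [ "$hour" -gt 23 ] || [ "$minute" -lt 0 ] || [ "$minute" -gt 59 ]; then
    echo "hour must be 0-23 and minute 0-59" >&2
    return 1
  fi
  # Field order in crontab: minute, hour, day of month, month, day of week.
  printf '%s %s * * * %s\n' "$minute" "$hour" "$*"
}

# 11:00 PM is hour 23, minute 0. The path is a placeholder.
daily_cron_line 23 0 /path/to/tSimpleCSVRead2/tSimpleCSVRead2_run.sh
# prints: 0 23 * * * /path/to/tSimpleCSVRead2/tSimpleCSVRead2_run.sh

# To append the line to your crontab without opening an editor, one common idiom is:
# ( crontab -l 2>/dev/null; daily_cron_line 23 0 /path/to/job_run.sh ) | crontab -
```

Printing the line first and installing it as a separate step makes it easy to check the schedule before it takes effect.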
At TalendExpert.com, we have helped many organisations develop and deploy Talend solutions, and we have received excellent feedback from our customers.
If you are looking for help and assistance, check our existing consulting packages. We can work together with your internal teams, which will save you time and cost and give you access to the best expertise.