Call of Papers for Current Volume **************** OnLine Submission of Paper

Volume & Issue no: Volume 3, Issue 12, December 2014


ETL Transformations Performance Optimization
Author Name:
Sunil Kumar, Dr. M.P. Thapliyal and Dr. Harish Chaudhary
ABSTRACT The ETL data load process is mainly depends on how effectively the ETL transformations are used and configured properly at appropriate place. The proper usage of transformation and with proper setting will always give the best data transformation and loading into respective staging area. A transformation is the defined as object that generate, modify or pass the data to next stage of process. The transformation performs specific function in ETL mapping. The Integration Services (IS) will perform the required task based on type of transformation used in mapping. The transformations are divided into active and passive types of two categories only. The transformation may be connected, unconnected in mapping and mapplets. Generally an unconnected transformation is called within another transformation, and returns a value to that transformation then that returned values are used in subsequent transformations. In ETL data load process, mapping-level optimization takes time to implement, but can significantly boost performance. Sometimes the mapping is the biggest bottleneck in the load process due to business rules that determine the number and complexity of transformations in a mapping. Before deciding on the best route to optimize the mapping architecture, we need to resolve some basic issues. Tuning mappings is a tiered and iterative process. Some factors to consider when choosing tuning processes at the mapping level include the specific environment, software/ hardware limitations, and the number of records going through a mapping. If there is a change in number of rows that pass through the transformation, changes in rows type, change the transaction boundary then this kind of transformations are called active transformations. If a transformation does not change the number of rows that pass through the transformation, maintains the transaction boundary, and maintains the row type then these are called as passive transformation. The use of too many lookups and aggregators will slow down performance because each requires index cache and data cache to process the data. Since both are fighting for memory space, decreasing the number of these transformations in a mapping can help improve performance. Splitting this kind of mappings is another option to gain the performance. Limiting the number of Aggregators in a mapping will also give performance gain because a high number of Aggregators can increase I/O activity on the cache directory of Integration Services unless the seek/access time is fast on the directory itself, having too many aggregators can also cause a bottleneck. Similarly, too many Lookups in a mapping causes contention of disk and memory, that can lead to thrashing, leaving insufficient memory to run a mapping efficiently and smoothly. Keywords: ETL (Extract, Transform, Load), ETL Transformation, Session and Workflow Optimization, Infrastructure , Application and Database Server Optimization.
Cite this article:
Sunil Kumar, Dr. M.P. Thapliyal and Dr. Harish Chaudhary , " ETL Transformations Performance Optimization " , International Journal of Application or Innovation in Engineering & Management (IJAIEM) , Volume 3, Issue 12, December 2014 , pp. 315-321 , ISSN 2319 - 4847.
Full Text [PDF]                           Back to Current Issue