This part of the series focuses on optimizing the data flow components of a data flow task. Before getting into the nitty-gritty, it is important to understand the data flow's architecture.

[Figure: the SSIS data flow]

The data flow consists of three types of components: sources, transformations, and destinations. Source components extract data, transformation components transform data, and destination components load data. Each of these three phases of data flow processing can be tuned for performance. However, there isn't a "silver bullet" that solves all data flow performance problems. Below are a multitude of strategies, methods, and techniques that can be employed to tune the performance of the data flow components.
Extraction tuning
- Increase the connection manager's PacketSize property: A larger packet size (up to the SQL Server maximum of 32767) benefits bulk loads. Use a separate connection manager with a smaller packet size for OLE DB Command transformations.
- Affinitize network connections: This can be accomplished if a machine has multiple cores and multiple NICs.
- Tune Queries
- Select only needed columns
- Use the NOLOCK (READ UNCOMMITTED) table hint so that the select takes no shared locks (the query can potentially read uncommitted data). Use this only if the query must have the best possible performance and dirty reads are acceptable.
- Lookups
- Select only needed columns.
- Use the “Shared Lookup Cache” (introduced in SSIS 2008).
- Sorting: The Merge and Merge Join transformations require sorted inputs. Source data for these transformations that is already sorted obviates the need for an upstream Sort transformation and improves data flow performance. The following properties must be configured on a source component if the source data is already sorted:
- IsSorted: The outputs of a source component have a property called IsSorted. The value of this property must be true.
- SortKeyPosition: Each output column of a source component has this property, which indicates whether a column is sorted, the column’s sort order, and the sequence in which multiple columns are sorted. This property must be set for each column of sorted data.
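The extraction guidance above can be combined into a single source query. As a sketch (the table and column names here are hypothetical), select only the needed columns, read without shared locks, and sort at the source; note that SSIS does not detect the ORDER BY automatically, so IsSorted and SortKeyPosition must still be set manually in the source component's Advanced Editor:

```sql
-- Hypothetical source query: only needed columns, no shared locks,
-- sorted at the source so no Sort transformation is required downstream.
SELECT  CustomerID,      -- set SortKeyPosition = 1 on this output column
        CustomerName,
        ModifiedDate
FROM    dbo.Customer WITH (NOLOCK)   -- no shared locks; may read uncommitted data
ORDER BY CustomerID;                 -- set IsSorted = True on the source output
```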
Transformation tuning
- Partially Blocking (asynchronous): The work of Merge, Merge Join, and Union All can often be pushed into the source query instead (a T-SQL JOIN or UNION ALL).
- Use SSIS 2008
- Improved data flow task scheduler
- Union All transforms are no longer necessary to split up and parallelize execution trees.
- Blocking Transforms (asynchronous): Aggregate, Sort, Pivot, UnPivot should be limited to one per data flow on the same data.
- Aggregate Transformation: This transformation includes the Keys, KeysScale, CountDistinctKeys, and CountDistinctScale properties, which improve performance by enabling the transformation to preallocate the amount of memory that the transformation needs for the data that the transformation caches. If the exact or approximate number of groups that are expected to result from a Group by operation is known, then set the Keys and KeysScale properties, respectively. If the exact or approximate number of distinct values that are expected to result from a Distinct count operation is known, then set the CountDistinctKeys and CountDistinctScale properties, respectively.
If the creation of multiple aggregations in a data flow is necessary, then consider the creation of multiple aggregations that use one Aggregate transformation instead of creating multiple transformations. Performance is improved with this approach because when one aggregation is a subset of another aggregation, the transformation’s internal storage is optimized by scanning the incoming data only once. For example, if an aggregation uses a GROUP BY clause and an AVG aggregation, then performance can be improved by combining them into one transformation. However, aggregation operations are serialized when multiple aggregations are performed within one Aggregate transformation. Therefore, performance might not be improved when multiple aggregations must be computed independently.
- Merge Join Transformation
- MaxBuffersPerInput: This property specifies the maximum number of buffers that can be active for each input at one time. It can be used to tune the amount of memory that the buffers consume and, consequently, the performance of the transformation. As the number of buffers increases, the transformation uses more memory, which generally improves performance. The default value of MaxBuffersPerInput is 5, which works well in most scenarios. Performance can be tuned by trying a slightly different number of buffers, such as 4 or 6. Avoid a very small number of buffers if possible; for example, there is a significant impact on performance when MaxBuffersPerInput is set to 1 instead of 5. Also, do not set MaxBuffersPerInput to 0 or less: in that range throttling does not occur, and depending on the data load and the amount of memory available, the package may not complete.
- Slowly Changing Dimension Wizard: This wizard creates a set of data flow transformation components which work together with the slowly changing dimension transformation component. This wizard creates OLE DB Command transformation components that perform UPDATEs against a single row at a time. Performance can be improved by replacing these transformation components with destination components that save all rows to be updated to a staging table. Then, an Execute SQL task can be added that performs a single set-based T-SQL UPDATE statement against all rows at the same time.
- Data Types:
- Use the smallest possible data types in the data flow.
- Use the CAST or CONVERT functions in the source query if possible.
- Miscellaneous:
- Sort in the source query if possible.
- If possible, use the T-SQL MERGE statement instead of the Slowly Changing Dimension transformation.
- If possible, use a set-based T-SQL INSERT INTO ... SELECT statement instead of the data flow.
- A full data reload may perform better than a delta refresh.
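As a sketch of the MERGE alternative to the Slowly Changing Dimension transformation mentioned above (the dimension and staging table names are hypothetical), a single set-based statement can handle both inserts and Type 1 updates once the data flow has landed the rows in a staging table:

```sql
-- Hypothetical Type 1 SCD load: the staging table populated by the data flow
-- is merged into the dimension in one set-based operation, replacing the
-- row-by-row OLE DB Command updates generated by the SCD wizard.
MERGE dbo.DimCustomer AS tgt
USING dbo.StageCustomer AS src
    ON tgt.CustomerID = src.CustomerID
WHEN MATCHED AND tgt.CustomerName <> src.CustomerName THEN
    UPDATE SET tgt.CustomerName = src.CustomerName   -- Type 1: overwrite in place
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, CustomerName)
    VALUES (src.CustomerID, src.CustomerName);
```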
Load Tuning
- Use the SQL Server Destination
- Only helps if the data flow and the destination database are on the same machine.
- Weaker error handling than the OLE DB Destination.
- Set Commit Size = 0
- Use the OLE DB Destination
- Set Commit Size = 0
- Drop Indexes based on the expected % load growth.
- Don't drop an index if it's the table's clustered index: Data in a table is sorted by its clustered index, and by default a primary key is enforced with a clustered index. Loading with the clustered index in place will always be faster than dropping and re-creating a primary key, and will usually be faster than dropping and re-creating a clustered index.
- Drop a non-clustered index if the load will cause a 100% increase in table size: This is a rule of thumb.
- Don't drop non-clustered indexes if the load increase is under 10%: This threshold is not a hard rule; experiment to find the optimal value.
- Use partitions if necessary.
- Use SQL Server Profiler to trace the performance.
- Use the T-SQL TRUNCATE statement instead of the T-SQL DELETE statement: DELETE is a fully logged, row-by-row operation, and so performs more slowly than the minimally logged TRUNCATE.
- Affinitize the network
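For example, clearing a staging table before a full reload (the table name is hypothetical) is much cheaper with TRUNCATE than with DELETE:

```sql
-- TRUNCATE deallocates the table's data pages with minimal logging.
TRUNCATE TABLE dbo.StageCustomer;

-- Equivalent result, but every deleted row is individually logged,
-- which is much slower on large tables:
-- DELETE FROM dbo.StageCustomer;
```

Note that TRUNCATE requires higher permissions and cannot be used on a table referenced by a foreign key, so DELETE remains necessary in those cases.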