At TechEd (https://lowendahl.net/showShout.aspx?id=169) last year this feature got a massive round of applause: the ability to compress backups as well as data and log files. Kalen gave a great talk on this today where she showcased the benefits you get from the new compression features.
Since large databases are expensive in terms of storage, performance, and maintenance costs, Microsoft decided to help DBAs out and make sure a database can be as small as possible. To get the best advantage both for data in memory and on disk, they decided to look at how rows are stored in the page. The trade-off for the smaller size is CPU cycles, since compression / decompression adds load to the processors.
There are three compression options:
Row compression
- Compresses the data in the columns and makes every row use less space. The compression algorithm used can't take advantage of values repeating across the whole page. This is an extension of the vardecimal storage that was introduced in SQL Server 2005 SP2, which tried to store decimals in as few bytes as possible. Row compression gives you a reasonable compression with a small CPU overhead.
Page compression
- Compresses the whole page and can use a compression algorithm that takes advantage of data being repeated across the whole page. Page compression also includes row compression. Page compression gives you the best compression, at a higher CPU cost.
Backup compression
- Compresses the backup file, which leads to less disk I/O and smaller backup files. (A short T-SQL sketch of enabling all three follows below.)
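To make the three options concrete, here is a minimal T-SQL sketch of how each one is enabled in SQL Server 2008. The table and database names (Sales.OrderDetails, SalesDb) and the backup path are just placeholders for illustration, not anything from the session.

-- Row compression: rebuild the table so each row is stored in as few bytes as possible
ALTER TABLE Sales.OrderDetails
REBUILD WITH (DATA_COMPRESSION = ROW);

-- Page compression: includes row compression plus prefix/dictionary compression
-- that exploits values repeating across the whole page
ALTER TABLE Sales.OrderDetails
REBUILD WITH (DATA_COMPRESSION = PAGE);

-- Backup compression: smaller backup files and less disk I/O during the backup
BACKUP DATABASE SalesDb
TO DISK = N'D:\Backups\SalesDb.bak'
WITH COMPRESSION;

There is also a stored procedure, sp_estimate_data_compression_savings, that lets you estimate how much space you would gain before actually rebuilding anything.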
One really interesting option for the compression is the ability to have different compression for different data partitions, which makes it possible to compress data based on rules. Maybe you have a year's worth of data where only the last couple of months are frequently accessed. To get the best performance / compression trade-off you can partition that data and apply row compression to the recent partitions while the older data gets the full page compression. I can see how this will be very important in data warehouse scenarios.
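As a sketch of what that per-partition setup could look like (the table name and partition numbers are just assumptions for illustration):

-- Assume partition 4 holds the current, frequently accessed months: light row compression
ALTER TABLE Sales.OrderDetails
REBUILD PARTITION = 4 WITH (DATA_COMPRESSION = ROW);

-- Older, rarely touched partitions get the heavier page compression
ALTER TABLE Sales.OrderDetails
REBUILD PARTITION = 1 WITH (DATA_COMPRESSION = PAGE);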
Speaking of data warehouses: Kalen stressed that the design focus for the compression feature was large data warehouse installations, where we have to handle a vast amount of data and the trade-off in CPU power is therefore acceptable. For scenarios where CPU is more important, just stay away from compression.
Kalen dug deep into how the compression works. Being an application developer with limited knowledge of how pages are laid out, I did get that it is effective, but the exact details eluded me. But that's fine, I'm not going to switch careers and start hunting bytes in the data pages of SQL Server.
Overall, this session gave great insight into the problems compression is set out to solve and how SQL Server actually does the compression.
Thank you Kalen :)
--
Sql Summit 2008 Event site:
http://www.expertzone.se/sql2k8/
More on the compression features:
http://blogs.msdn.com/sqlserverstorageengine/archive/2007/11/12/types-of-data-compression-in-sql-server-2008.aspx
Kalen's blog:
http://sqlblog.com/blogs/kalen_delaney/