Performance is a topic we often discuss with BP3 customers, and we use the phrase “layers of an onion” to describe the process: just when you peel back one layer, there's another one below (also about to make you cry). Recently, I started to peel back another layer of the performance onion: database compression and how it might impact applications like BPM and Brazos Portal. When this issue was first raised, we simply did not know enough about how database compression might affect the performance of a system, so we started researching.
What is database compression?
In simple terms, database compression reduces the amount of disk space a table space needs. As with everything, though, there is a cost, and in this case the cost is the CPU cycles spent compressing and decompressing the data as needed. Unfortunately, as this article (http://enterprisesystemsmedia.com/article/top-10-ways-to-waste-cpu-using-db2#sr=g&m=o&cp=or&ct=-tmc&st=%28opu%20qspwjefe%29&ts=1465583683) points out:
“DB2 compression often is used everywhere without any verification there’s a benefit and not a detriment. Often, shops have a standard to compress everything.”
It can be beneficial in some cases, but first, we should understand why there is a trade-off between compression and CPU performance.
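To make the disk-space vs. CPU trade-off concrete, here is a minimal Python sketch using `zlib` as a stand-in for DB2's dictionary compression (the mechanisms differ; the trade-off is the same). The sample row data is invented for illustration.

```python
import time
import zlib

# Repetitive row data, like a table space full of similar records.
rows = b"order_status=OPEN;region=EMEA;priority=NORMAL;" * 1000

start = time.perf_counter()
compressed = zlib.compress(rows)   # pay CPU to shrink the data...
restored = zlib.decompress(compressed)  # ...and again to read it back
elapsed = time.perf_counter() - start

print(f"original:   {len(rows)} bytes")
print(f"compressed: {len(compressed)} bytes")
print(f"CPU time for one compress/decompress round trip: {elapsed:.6f}s")
```

The disk savings are real, but so is the CPU cost, and that cost is paid on every read and write of compressed data.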
To compress or not to compress...that is the question.
There are several factors to consider for database compression. We will not be discussing them all, but rather focus on just the most common practices:
- Database row size
- Short rows – compression can yield diminishing returns because of the 8 bytes of overhead required to store each record in a data page, meaning some “compressed” rows are actually bigger than the original. In this case, DB2 does not compress the row.
- Long rows – compression may not yield a reduction in size because DB2 rows cannot span data pages. You can work around this with larger page sizes, but that can be problematic too: as the page size grows, so does the CPU and memory overhead.
- Table space size – compression tends to work well for table spaces that are large and relatively static.
- This is because if the table space is compressed, INSERT, UPDATE, and DELETE operations against it are expensive: the data must be decompressed, the statement run, and the data recompressed.
- If, on the other hand, the table space mostly serves SELECTs, the data is relatively static and would likely be a good candidate for compression.
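The short-row point above can be demonstrated with a small sketch. This is illustrative only (again using `zlib`, not DB2's algorithm): compressing a very short row produces output larger than the input because of fixed per-record overhead, which is the same reason DB2 leaves a row uncompressed when the “compressed” form would be bigger.

```python
import zlib

def store_row(row: bytes) -> tuple[bytes, bool]:
    """Keep the compressed form only if it is actually smaller."""
    compressed = zlib.compress(row)
    if len(compressed) < len(row):
        return compressed, True
    return row, False  # fall back to the original, as DB2 does

short_row = b"id=1"
long_row = b"status=OPEN;" * 50  # 600 bytes of repetitive data

_, short_compressed = store_row(short_row)
_, long_compressed = store_row(long_row)
print(f"short row compressed? {short_compressed}")  # False: overhead dominates
print(f"long row compressed?  {long_compressed}")   # True
```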
What is the data flow in BPM and Brazos Portal?
To understand whether or not using compression is a good idea, we need to understand the data flow from the BPM tables and the Brazos Portal tables. In general, there are two parts to this process:
- Getting data from BPM to Brazos Portal
- Sending data from Brazos Portal to the connected clients (browsers)
It is really the first part of this process (getting data from BPM to Brazos Portal) we need to look at, and this can be done in two ways (depending on a customer's specific needs):
- REST API – This is the default setup. There are three pollers that generate database events; they run at different intervals and for different reasons. With this method, the Brazos Portal TASK table is updated, and a trigger on that table then updates the EVENT_LOG table. So at a high level we have:
- Poller 1 – looks for open tasks in active instances and runs every 2 seconds.
- Poller 2 – looks for closed tasks in active instances (and can update business data) and runs every 30 seconds.
- Poller 3 – looks for closed or completed instances and does not look at tasks; it runs every 30 minutes.
- The above pollers update the TASK table in Brazos Portal, which also fires a DB trigger.
- The DB trigger then updates the EVENT_LOG table.
- Database Triggers
- In this case, triggers are placed on the BPM tables so that when updates or inserts occur on the LSW_TASK table, new entries are put into the STAGED_EVENT table.
- A thread on the server periodically checks the STAGED_EVENT table and updates the Brazos Portal TASK table or, if business data changed, the EVENT_LOG table. This runs every 500ms.
All that data flow means that there will likely be a high level of INSERT, UPDATE, and DELETE events going through those tables.
When is it appropriate to apply compression to my BPM or Brazos Portal databases?
As IBM notes (http://www.ibm.com/developerworks/data/library/techarticle/dm-1205db210compression/):
- Tables that are read only are excellent candidates for row compression. Tables with a read/write ratio of 70 percent or more reads and 30 percent or less writes are good candidates for row compression.
- Tables that experience only a limited number of updates are likely to be good candidates for row compression.
- Tables that undergo heavy updates might not be as good candidates for row compression.
- Large tables accessed mostly through table scans rather than index scans are good candidates for compression. This generally includes large fact tables in data warehouses, where a significant number of queries perform heavy aggregations.
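IBM's 70/30 rule of thumb from the list above is easy to encode as a quick check. The helper and counter names here are illustrative; in a real DB2 system you would pull read/write figures from the database's monitoring facilities.

```python
def is_compression_candidate(reads: int, writes: int,
                             read_threshold: float = 0.70) -> bool:
    """Apply the ~70% reads / ~30% writes rule of thumb."""
    total = reads + writes
    if total == 0:
        return False  # no activity data; cannot judge
    return reads / total >= read_threshold

# A reporting table that is mostly read:
print(is_compression_candidate(reads=900, writes=100))  # True
# A busy TASK-style table with constant updates:
print(is_compression_candidate(reads=400, writes=600))  # False
```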
Bearing this in mind, compression is not something we generally recommend here: the constant influx of tasks and data keeps changing these tables, resulting in a high percentage of writes.
So now we know that tables with generally static data that are doing lots of SELECTS are good candidates for compression. Those tables that are highly transactional or often undergo UPDATES, DELETES, and INSERTS, are not.
Thus, with Brazos Portal and the constantly changing data in the TASK and EVENT_LOG tables, compression doesn't make sense. Other tables within the Brazos Portal database may be candidates for compression, but in general, compression is not recommended.