Data retention in BPM


PROBLEM:

Some companies need to retain data for years because of compliance requirements and/or government regulations.  This can lead to increasingly large databases full of completed instance data, which results in poor runtime performance for users.  The need for good performance can therefore conflict with the data retention policies.

It is commonly said that BPM should not be considered a system of record (SOR).

However, we still need to balance performance with compliance.

REASONS:

IBM BPM makes it very easy to construct large variable types and pass that data throughout a process.  This ease of design leads many people to infer that the system is a SOR. 

That inference holds for active instances and their associated tasks, but problems may arise once those instances are completed or terminated and are retained purely for historical purposes.  As completed items accumulate, searches (including the portal inbox) must evaluate those rows in the database as well.  Performance therefore degrades over time: the more BPM instances and tasks there are, the more rows each search has to examine.

To mitigate the performance degradation over time, we suggest using a second database, either an archive database or an SOR database.
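
As a rough illustration of the SOR idea, the sketch below keeps the full business object in a separate database and lets the process carry only a key to it. It uses Python's built-in sqlite3 module purely so that it runs anywhere. The table name business_record, the function names, and the applicationId variable are all hypothetical and are not part of IBM BPM.

```python
import json
import sqlite3


def open_sor(path=":memory:"):
    """Open the (hypothetical) SOR database and make sure its table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS business_record ("
        "id INTEGER PRIMARY KEY, payload TEXT NOT NULL)"
    )
    return conn


def save_to_sor(conn, business_data):
    """Store the full business object in the SOR and return its key."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO business_record (payload) VALUES (?)",
            (json.dumps(business_data),),
        )
    return cur.lastrowid


def load_from_sor(conn, key):
    """Fetch the business object by key, or None if it does not exist."""
    row = conn.execute(
        "SELECT payload FROM business_record WHERE id = ?", (key,)
    ).fetchone()
    return json.loads(row[0]) if row else None


sor = open_sor()
application = {
    "customer": "Example Corp",
    "amount": 125000,
    "documents": ["id.pdf", "income.pdf"],
}

# The process keeps only the key, not the large object itself.
process_variables = {"applicationId": save_to_sor(sor, application)}

# Any step that needs the details reads them back from the SOR.
details = load_from_sor(sor, process_variables["applicationId"])
```

With this shape, deleting a completed instance from the runtime database does not remove the business data, because that data was never stored there in the first place.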

POSSIBLE ARCHITECTURES:

There are a number of possible scenarios to store historical data.

First, recall that the Performance Data Warehouse (PDW) stores timing intervals and tracked data, so the information needed to meet compliance requirements may already be available there.

If the PDW data is not acceptable for compliance, here are some alternatives:

  1. Architect your solution with an SOR database.
    • When designing your BPM solution, factor in an SOR database for your important data.  In that manner:
      • the data source on the BPM system can change if the SOR has to move, change, etc.
      • the SOR database is completely separate from the Process DB, which means the Process DB needs to hold only current instances and tasks.  It performs better because searches no longer have to examine old instances and tasks.
      • the SOR can still be reached for reporting or data recovery needs.
  2. Create an Archive database specifically for completed or terminated instances.
    • If you have a solution in place, but are lacking an SOR, then this is another possible path to consider.  The idea is to take the instances and tasks you would delete and move them to another database.  In this manner, you "archive" the instances and tasks, but keep the data in a second data source.
      • You can modify the LSW_BPD_INSTANCE_DELETE stored procedure and the stored procedures associated with it (LSW_ERASE_TASK, LSW_ERASE_BPD_INSTANCE, etc.) so that they first move the instances to another database and only then delete them.
      • Your instance data is then stored in a separate, searchable "Archive" database, which reduces the performance hit on the runtime Process database that holds the active instances and tasks.
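
The "move first, then delete" ordering described for the Archive database can be sketched as follows. This is a minimal illustration, not the modified stored procedures themselves. It uses two in-memory SQLite databases as stand-ins for the runtime and archive databases, and the instance and task tables, their columns, and the status values are hypothetical simplifications rather than the actual Process database schema.

```python
import sqlite3

# Hypothetical status values for instances that are safe to archive.
FINISHED = ("Completed", "Terminated")


def create_schema(conn):
    """Create simplified, hypothetical instance and task tables."""
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS instance (
            id INTEGER PRIMARY KEY, status TEXT, data TEXT);
        CREATE TABLE IF NOT EXISTS task (
            id INTEGER PRIMARY KEY, instance_id INTEGER, subject TEXT);
        """
    )


def archive_finished(runtime, archive):
    """Copy finished instances and their tasks to the archive, then delete them."""
    instances = runtime.execute(
        "SELECT id, status, data FROM instance WHERE status IN (?, ?)", FINISHED
    ).fetchall()
    if not instances:
        return 0
    ids = [(row[0],) for row in instances]
    tasks = []
    for instance_id in ids:
        tasks += runtime.execute(
            "SELECT id, instance_id, subject FROM task WHERE instance_id = ?",
            instance_id,
        ).fetchall()

    # Step 1: write to the archive and commit there first.
    with archive:
        archive.executemany("INSERT OR REPLACE INTO instance VALUES (?, ?, ?)", instances)
        archive.executemany("INSERT OR REPLACE INTO task VALUES (?, ?, ?)", tasks)

    # Step 2: only after the archive commit succeeded, delete from the runtime side.
    with runtime:
        runtime.executemany("DELETE FROM task WHERE instance_id = ?", ids)
        runtime.executemany("DELETE FROM instance WHERE id = ?", ids)
    return len(instances)


runtime = sqlite3.connect(":memory:")
archive = sqlite3.connect(":memory:")
create_schema(runtime)
create_schema(archive)

runtime.executemany(
    "INSERT INTO instance VALUES (?, ?, ?)",
    [(1, "Active", "a"), (2, "Completed", "b"), (3, "Terminated", "c")],
)
runtime.executemany(
    "INSERT INTO task VALUES (?, ?, ?)",
    [(10, 1, "Review"), (20, 2, "Approve"), (30, 3, "Cancel")],
)
runtime.commit()

moved = archive_finished(runtime, archive)
```

Committing to the archive before deleting means that a failure between the two steps leaves a duplicate rather than a gap, and INSERT OR REPLACE makes a retry harmless.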

CONCLUSIONS:

Data retention is difficult, but the main things to remember are:

  1. Use a separate data store for the SOR or for old instance and task data.
  2. Keep the runtime Process database clean so that performance stays high.
  3. BPM is not an SOR database.

 

As always, should you have any questions about this or any article from BP3, please contact our labs group and we will be happy to assist.

 


Comments

    Rahul Pisal

    In your article, one of the approaches you mention is creating an SOR database, and you state that "the SOR can still be reached for reporting or data recovery needs".
    How would IBM BPM as a product report on a DB that is not under its control? What are the ways to build reports or queries that get data from the SOR database? Would this be an out-of-the-box feature, or would it require custom coding?

    Thanks,
    Rahul.

    Dave Rosen

    Good question Rahul! The SOR would be an additional database to the standard databases that BPM uses. For a review of what those are, look here:
    https://bp3.zendesk.com/hc/en-us/articles/200518608-Databases-History-and-descriptions

    Since this database is specific to your process application or solution, you would have to work with your solution architects to design and implement the database schema for your needs. The database itself would be accessed via JDBC by creating a Data Source in WebSphere.

    As this new SOR database is specific to your process application, it can be structured to suit your needs and queried to create dashboards and reports for your BPM services.

    If you'd like additional information or assistance, BP3 can help. If you are a BPLabs customer, simply submit a ticket via:
    support@bp-3.com

    or contact us at:
    http://www.bp-3.com/#contact
