The Data Charmer: Tungsten Replicator Filters: A trove of golden secrets unveiled

UPDATE: This post is obsolete, for most of the URLs referenced here have been removed by the contents owner.

Since I joined the company in late 2010, I have known that one of the strong points of Tungsten Replicator is its ability to set filters. The amazing capabilities offered by Tungsten filters cannot be fully grasped unless we explain how stage replication works.

There are several default stages in the replication stream. Every stage has an extraction task and an apply task. The extraction task gets data from the previous step's repository, and the apply task saves the data to the next repository, which can be either temporary storage (memory queue, THL file) or the final destination (slave database server). Consider that the architecture allows developers to add stages, and you will appreciate its full power. For every stage, we can insert one or more filters between the two tasks. This architecture gives us ample freedom to design our pipelines.
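The stage layout described above can be sketched in a few lines of Python. This is only a conceptual model, not Tungsten code: the event shape, the `run_stage` function, and the two filter factories are hypothetical names invented for illustration, and the queues stand in for the binary log, the THL, and the slave.

```python
from collections import deque
from typing import Callable, Iterable, Optional

# Hypothetical event shape: {"seqno": int, "schema": str, "sql": str}
Event = dict
# A filter returns the (possibly modified) event, or None to drop it.
Filter = Callable[[Event], Optional[Event]]


def run_stage(source: deque, filters: Iterable[Filter], target: deque) -> None:
    """Drain source, run each event through the filters in order, store survivors."""
    filters = list(filters)
    while source:
        event = source.popleft()      # extraction task
        for f in filters:             # zero or more filters between the two tasks
            event = f(event)
            if event is None:         # a filter dropped the event
                break
        if event is not None:
            target.append(event)      # apply task


def exclude_schema(name: str) -> Filter:
    """Drop every event that belongs to the given schema."""
    return lambda e: None if e["schema"] == name else e


def rename_schema(old: str, new: str) -> Filter:
    """Rewrite the schema name on matching events."""
    return lambda e: {**e, "schema": new} if e["schema"] == old else e


# Illustrative repositories: source log, intermediate storage, final destination.
binlog = deque([
    {"seqno": 1, "schema": "sales", "sql": "INSERT ..."},
    {"seqno": 2, "schema": "test", "sql": "INSERT ..."},
    {"seqno": 3, "schema": "sales", "sql": "UPDATE ..."},
])
thl = deque()
replica = deque()

run_stage(binlog, [exclude_schema("test")], thl)
run_stage(thl, [rename_schema("sales", "sales_copy")], replica)
```

After both stages run, `replica` holds events 1 and 3 with the schema renamed to `sales_copy`. Each stage only knows its source, its target, and its own filter list, which is why the same filter can be moved from one stage to another.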

What’s more interesting, filters in Tungsten can be of two types:

built-in filters, written in Java, which require users to compile and build the software before the filter is available;
Javascript filters, which can do almost everything the built-in filters can do, but don’t require recompiling. All you need to do is deploy the filter in the appropriate folder, and run a tpm command to install or update the replicator.

The Tungsten team has developed a large number of filters, either to achieve general purpose pipeline control (such as renaming objects, excluding or including schemas and tables from replication) or to perform very specific tasks that were needed to meet customer requirements or to help implement heterogeneous pipelines (such as converting ENUM and SET columns to strings, dropping UPDATE and DELETE events, normalizing DDL, etc.). There are 23 built-in and 18 Javascript filters available. They were largely undocumented, or sparsely documented in the code. Now, thanks to my colleague MC Brown, Continuent director of documentation, the treasure is there for the taking.

First off, there is a general guide to filter usage, which tells you how to enable or disable filters.
Then, we have a full list of the built-in filters with their purpose, required parameters, and some applicability notes.
For Javascript filters there is a section that explains in detail how to create and use Javascript filters, followed by the reference list of all available filters.

Tungsten filters are a beautiful feature to explore, but also a powerful tool for users with special needs who want to create their own customized pipeline. A word of warning, though. Filters are powerful and sometimes fun to implement and use, but they don’t come free. There are two main things to keep in mind:

Using more than one filter for the same stage may result in performance loss. The amount of performance loss depends on the stage and on your system’s resources. For example, if the weakest point of your replication is data transfer across the network, adding two or three filters to this stage (remote-to-thl) will make things worse. If you can apply the filter to the previous or next stages, you may achieve your purpose without many side effects. As always, the advice is: “test, benchmark, repeat.”
Filters that alter data (such as excluding schemas or tables, or dropping events) will make the slave unsuitable for promotion. You should always ask yourself if the filter may make the slave a lousy master, and if it does, make sure you have at least one other slave that is replicating without filters.
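The promotion warning can be seen in a toy model. The sketch below is a simplified illustration, not Tungsten behavior captured from a real system: `apply_events` and the tuple-based event format are hypothetical, and a plain dictionary stands in for a table.

```python
def apply_events(events, drop_deletes=False):
    """Replay (operation, key, value) events into a dict that stands in for a table."""
    rows = {}
    for op, key, value in events:
        if op == "DELETE":
            if drop_deletes:
                continue              # the filter discards the event
            rows.pop(key, None)
        else:                         # INSERT or UPDATE
            rows[key] = value
    return rows


events = [
    ("INSERT", 1, "a"),
    ("INSERT", 2, "b"),
    ("UPDATE", 1, "c"),
    ("DELETE", 2, None),
]

master = apply_events(events)
unfiltered_slave = apply_events(events)
filtered_slave = apply_events(events, drop_deletes=True)
```

Here `unfiltered_slave` ends up identical to `master`, while `filtered_slave` still contains the row with key 2 that the master deleted. Promoting the filtered slave would bring that row back, which is why at least one slave should replicate without data-altering filters.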

As a final parting thought, be aware that yours truly and the above-mentioned MC Brown will be speaking at Percona Live London 2013, with a full, take-no-prisoners, six-hour Tungsten Replicator tutorial, where we will cover filters and give some explosive and sensational examples.

2 comments:

Unknown said...: Data Charmer,

I have clicked on all of the provided links which redirects me to error 404.

Thank you.
Sai; January 14, 2019 at 8:57:00 PM GMT+1
Giuseppe Maxia said...: Unfortunately, the URLs used in this article were removed by the contents owner. Sorry for the inconvenience.; January 14, 2019 at 9:05:00 PM GMT+1