pgAdmin 1.6 online documentation
14. Log Shipping - Slony-I with Files
One of the new features for 1.1, that only really stabilized as
of 1.2.11, is the ability to serialize the updates to go out into log
files that can be kept in a spool directory.
The spool files could then be transferred via whatever means
was desired to a “slave system,” whether that be via FTP,
rsync, or perhaps even by pushing them onto a 1GB “USB
key” to be sent to the destination by clipping it to the ankle
of some sort of “avian transport” system.
There are plenty of neat things you can do with a data stream
in this form, including:
Replicating to nodes that
aren't securable
Replicating to destinations where it is not
possible to set up bidirection communications
Supporting a different form of PITR
(Point In Time Recovery) that filters out read-only transactions and
updates to tables that are not of interest.
-
If some disaster strikes, you can look at the logs
of queries in detail
This makes log shipping potentially useful even though you
might not intend to actually create a log-shipped
node.
This is a really slick scheme for building load for
doing tests
We have a data “escrow” system that
would become incredibly cheaper given log shipping
-
You may apply triggers on the “disconnected
node ” to do additional processing on the data
For instance, you might take a fairly “stateful”
database and turn it into a “temporal” one by use of
triggers that implement the techniques described in
[Developing Time-Oriented Database Applications in SQL
] by Richard T. Snodgrass.
- 14.1. Where are the “spool files” for a
subscription set generated?
- 14.2. What takes place when a FAILOVER/ MOVE SET takes
place?
- 14.3. What if we run out of “spool
space”?
- 14.4. How do we set up a subscription?
- 14.5. What are the limitations of log
shipping?
|
14.1.
|
Where are the “spool files” for a
subscription set generated? |
|
Any slon subscriber node
can generate them by adding the -a option.
Note
Notice that this implies that in order to use log
shipping, you must have at least one subscriber node.
|
|
14.2.
|
What takes place when a FAILOVER/ MOVE SET takes
place? |
|
Nothing special. So long as the archiving node remains
a subscriber, it will continue to generate logs. |
|
Warning
If the archiving node becomes the origin, on
the other hand, it will continue to generate logs.
|
|
14.3.
|
What if we run out of “spool
space”? |
|
The node will stop accepting SYNCs
until this problem is alleviated. The database being subscribed to
will also fall behind. |
|
14.4.
|
How do we set up a subscription? |
|
The script in tools called
slony1_dump.sh is a shell script that dumps
the “present” state of the subscriber node.
You need to start the slon for the subscriber node with logging turned on.
At any point after that, you can run
slony1_dump.sh, which will pull the state
of that subscriber as of some SYNC event. Once the
dump completes, all the SYNC logs generated from
the time that dump started may be added to the
dump in order to get a “log shipping subscriber.”
|
|
14.5.
|
What are the limitations of log
shipping? |
|
In the initial release, there are rather a lot of
limitations. As releases progress, hopefully some of these
limitations may be alleviated/eliminated. |
|
The log shipping functionality amounts to
“sniffing” the data applied at a particular subscriber
node. As a result, you must have at least one “regular”
node; you cannot have a cluster that consists solely of an origin and
a set of “log shipping nodes.”. |
|
The “log shipping node” tracks the
entirety of the traffic going to a subscriber. You cannot separate
things out if there are multiple replication sets. |
|
The “log shipping node” presently only
fully tracks SYNC events. This should be
sufficient to cope with some changes in cluster
configuration, but not others.
A number of event types are handled in
such a way that log shipping copes with them:
SYNC events are, of course,
handled.
DDL_SCRIPT is handled.
-
UNSUBSCRIBE_SET
This event, much like SUBSCRIBE_SET is not
handled by the log shipping code. But its effect is, namely that
SYNC events on the subscriber node will no longer
contain updates to the set.
Similarly, SET_DROP_TABLE,
SET_DROP_SEQUENCE,
SET_MOVE_TABLE,
SET_MOVE_SEQUENCE DROP_SET,
MERGE_SET, will be handled
“appropriately”.
-
SUBSCRIBE_SET
Unfortunately, there is some “strangeness” in the
handling of this... When SUBSCRIBE_SET occurs, it
leads to an event being raised, and processed, purely on the
subscriber, called ENABLE_SUBSCRIPTION.
SUBSCRIBE_SET is really quite a simple
event; it merely declares that a node is
subscribing to a particular set via a particular provider. It doesn't copy data!
The meat of the subscription work is done by
ENABLE_SUBSCRIPTION, which is an event that is
raised on the local node, not in the same sequence as the other events
coming from other nodes (notably the data provider).
With revisions in sequencing of logs that took place in 1.2.11,
this now presents no problem for the user.
The various events involved in node configuration are
irrelevant to log shipping:
STORE_NODE,
ENABLE_NODE,
DROP_NODE,
STORE_PATH,
DROP_PATH,
STORE_LISTEN,
DROP_LISTEN
Events involved in describing how particular sets are
to be initially configured are similarly irrelevant:
STORE_SET,
SET_ADD_TABLE,
SET_ADD_SEQUENCE,
STORE_TRIGGER,
DROP_TRIGGER,
TABLE_ADD_KEY
|
|
It would be nice to be able to turn a “log
shipped” node into a fully communicating Slony-I node that you
could failover to. This would be quite useful if you were trying to
construct a cluster of (say) 6 nodes; you could start by creating one
subscriber, and then use log shipping to populate the other 4 in
parallel.
This usage is not supported, but presumably one could add the
Slony-I configuration to the node, and promote it into being a new
node. Again, a Simple Matter Of Programming (that might not
necessarily be all that simple)...
|
Note
Here are some more-or-less disorganized notes about how
you might want to use log shipping...
-
You don't want to blindly apply
SYNC files because any given
SYNC file may not be the right
one. If it's wrong, then the result will be that the call to
setsyncTracking_offline() will fail, and your
psql session will ABORT, and then run through the remainder of that
SYNC file looking for a COMMIT
or ROLLBACK so that it can try to move on to the
next transaction.
But we know that the entire remainder of
the file will fail! It is futile to go through the parsing effort of
reading the remainder of the file.
Better idea:
Read the first few lines of the file, up to and
including the setsyncTracking_offline()
call.
Try to apply it that far.
If that failed, then it is futile to continue;
ROLLBACK the transaction, and perhaps consider
trying the next file.
If the setsyncTracking_offline() call succeeds, then you have the right next
SYNC file, and should apply it. You should
probably ROLLBACK the transaction, and then use
psql to apply the entire file full of
updates.
In order to support the approach of grabbing just the first few
lines of the sync file, the format has been set up to have a line of
dashes at the end of the “header” material:
-- Slony-I log shipping archive
-- Node 11, Event 656
start transaction;
select "_T1".setsyncTracking_offline(1, '655', '656', '2005-09-23 18:37:40.206342');
-- end of log archiving header
-
Note that the header includes a timestamp indicating
SYNC time.
-- Slony-I log shipping archive
-- Node 11, Event 109
start transaction;
select "_T1".setsyncTracking_offline(1, '96', '109', '2005-09-23 19:01:31.267403');
-- end of log archiving header
This timestamp represents the time at which the
SYNC was issued on the origin node.
The value is stored in the log shipping configuration table
sl_setsync_offline.
If you are constructing a temporal database, this is likely to
represent the time you will wish to have apply to all of the data in
the given log shipping transaction log.
You may find information on how relevant activity is
logged in Section 24.4.1, “ Log Messages Associated with Log Shipping ”.
As of 1.2.11, there is an even better idea
for application of logs, as the sequencing of their names becomes more
predictable.
-
The table, on the log shipped node, tracks which log
it most recently applied in table
sl_archive_tracking.
Thus, you may predict the ID number of the next file by taking
the latest counter from this table and adding 1.
-
There is still variation as to the filename,
depending on what the overall set of nodes in the cluster are. All
nodes periodically generate SYNC events, even if
they are not an origin node, and the log shipping system does generate
logs for such events.
As a result, when searching for the next file, it is necessary
to search for files in a manner similar to the following:
ARCHIVEDIR=/var/spool/slony/archivelogs/node4
SLONYCLUSTER=mycluster
PGDATABASE=logshipdb
PGHOST=logshiphost
NEXTQUERY="select at_counter+1 from \"_${SLONYCLUSTER}\".sl_archive_tracking;"
nextseq=`psql -d ${PGDATABASE} -h ${PGHOST} -A -t -c "${NEXTQUERY}"
filespec=`printf "slony1_log_*_%20d.sql"
for file in `find $ARCHIVEDIR -name "${filespec}"; do
psql -d ${PGDATABASE} -h ${PGHOST} -f ${file}
done
14.2. slony_logshipper Tool
As of version 1.2.12, Slony-I has a tool designed to help
apply logs, called slony_logshipper. It is
run with three sorts of parameters:
-
Options, chosen from the following:
-
h
display this help text and exit
-
v
display program version and exit
-
q
quiet mode
-
l
cause running daemon to reopen its logfile
-
r
cause running daemon to resume after error
-
t
cause running daemon to enter smart shutdown mode
-
T
cause running daemon to enter immediate shutdown mode
-
c
destroy existing semaphore set and message queue (use with caution)
-
f
stay in foreground (don't daemonize)
-
w
enter smart shutdown mode immediately
-
A specified log shipper configuration file
This configuration file consists of the following specifications:
-
logfile = './offline_logs/logshipper.log';
Where the log shipper will leave messages.
-
cluster name = 'T1';
Cluster name
-
destination database = 'dbname=slony_test3';
Optional conninfo for the destination database. If given, the log shipper will connect to thisdatabase, and apply logs to it.
-
archive dir = './offline_logs';
The archive directory is required when running in “database-connected” mode to have a place to scan for missing (unapplied) archives.
-
destination dir = './offline_result';
If specified, the log shipper will write the results of data massaging into result logfiles in this directory.
-
max archives = 3600;
This fights eventual resource leakage; the daemon will enter “smart shutdown” mode automatically after processing this many archives.
-
ignore table "public"."history";
One may filter out single tables from log shipped replication
-
ignore namespace "public";
One may filter out entire namespaces from log shipped replication
-
rename namespace "public"."history" to "site_001"."history";
One may rename specific tables.
-
rename namespace "public" to "site_001";
One may rename entire namespaces.
-
post processing command = 'gzip -9 $inarchive';
Pre- and post-processign commands are executed via system(3).
An “@” as the first character causes the exit code to be ignored. Otherwise, a nonzero exit code is treated as an error and causes processing to abort.
Pre- and post-processing commands have two further special variables defined:
-
error command = ' ( echo
"archive=$inarchive" echo "error messages:" echo "$errortext" ) | mail
-s "Slony log shipping failed" postgres@localhost ';
The error command indicates a command to execute upon encountering an error. All logging since the last successful completion of an archive is available in the $errortext variable.
In the example shown, this sends an email to the DBAs upon
encountering an error.
|
|