资源同步的一种思路

发表于 2018-07-24 | 分类于技术

在两个节点之间或两个时刻之间需要同步有顺序的资源，例如防火墙策略，如何用比较小的代价来实现资源一致。

有一种思路是类似文本比较，用diff的方法来修改待同步一侧的资源。

需要使用的python的lib库：difflib

difflib

difflib实现了一个类：SequenceMatcher，它可以比较任意两个类型的数据（前提是它们可以hash）组成的list。并给出一个转变的方案。

然通过get_opcodes()方法可以直接获取转变过程中所有的改动。它返回一个5元tuple的列表，描述如何从一个列表转变成另外一个列表。每个tuple的形式：(tag, i1, i2, j1, j2) ，其中tag的解释如下：

Value	Meaning
'replace'	a[i1:i2] should be replaced by b[j1:j2].
'delete'	a[i1:i2] should be deleted. Note that j1 == j2 in this case.
'insert'	b[j1:j2] should be inserted at a[i1:i1]. Note that i1 == i2 in this case.
'equal'	a[i1:i2] == b[j1:j2] (the sub-sequences are equal).

官方描述：

This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are hashable. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980’s by Ratcliff and Obershelp under the hyperbolic name “gestalt pattern matching.” The idea is to find the longest contiguous matching subsequence that contains no “junk” elements (the Ratcliff and Obershelp algorithm doesn’t address junk). The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that “look right” to people.
Timing: The basic Ratcliff-Obershelp algorithm is cubic time in the worst case and quadratic time in the expected case. SequenceMatcheris quadratic time for the worst case and has expected-case behavior dependent in a complicated way on how many elements the sequences have in common; best case time is linear.
Automatic junk heuristic: SequenceMatcher supports a heuristic that automatically treats certain sequence items as junk. The heuristic counts how many times each individual item appears in the sequence. If an item’s duplicates (after the first one) account for more than 1% of the sequence and the sequence is at least 200 items long, this item is marked as “popular” and is treated as junk for the purpose of sequence matching. This heuristic can be turned off by setting the autojunk argument to False when creating the SequenceMatcher.

demo代码：

import difflib


class Resource(object):

    def __hash__(self):
        return hash(self.name)

    def __init__(self, name, property):
        self.name = name
        self.property = property

    def __eq__(self, other_obj):
        return self.name == other_obj.name and self.property == other_obj.property

    def __str__(self):
        return "%s:%s" % (self.name, self.property)

    def __repr__(self):
        return self.__str__()


resource_list_old = [Resource('huawei', 'nova'),
                     Resource("apple", "apple9")]

resource_list_new = [Resource("apple", "apple11"),
                     Resource("huawei", "nova"),
                     Resource("apple", "apple10")]


def do_resource_insert(old_start, old_end, new_start, new_end):
    print '[insert ] insert resource_list_new[%d:%d] into resource_list_old at index %d' % (
        new_start, new_end, old_start)
    resource_list_old[old_start:old_end] = resource_list_new[new_start:new_end]


def do_resource_replace(old_start, old_end, new_start, new_end):
    print '[replace] replace resource_list_old[%d:%d] to resource_list_new[%d:%d]' % (
        old_start, old_end, new_start, new_end)
    resource_list_old[old_start:old_end] = resource_list_new[new_start:new_end]


def do_resource_delete(old_start, old_end, new_start, new_end):
    print '[delete ] remove resource_list_old[%d:%d]' % (old_start, old_end)
    del resource_list_old[old_start:old_end]


def do_resource_equal(old_start, old_end, new_start, new_end):
    print '[equal  ] resource_list_old[%d:%d] equal resource_list_new[%d:%d]' % (
        old_start, old_end, new_start, new_end)


def diff_resources():
    print 'Initial data:'
    print 'resource_list_old =', resource_list_old
    print 'resource_list_new =', resource_list_new
    print 'resource_list_old == resource_list_new:', resource_list_old == resource_list_new
    print ""

    matcher = difflib.SequenceMatcher(None, resource_list_old, resource_list_new)
    for tag, old_start, old_end, new_start, new_end in reversed(matcher.get_opcodes()):

        if tag == 'delete':
            do_resource_delete(old_start, old_end, new_start, new_end)
        elif tag == 'equal':
            do_resource_equal(old_start, old_end, new_start, new_end)
        elif tag == 'insert':
            do_resource_insert(old_start, old_end, new_start, new_end)
        elif tag == 'replace':
            do_resource_replace(old_start, old_end, new_start, new_end)

    print ""
    print 'Finish data:'
    print 'resource_list_old =', resource_list_old
    print 'resource_list_new =', resource_list_new
    print 'resource_list_old == resource_list_new:', resource_list_old == resource_list_new

diff_resources()

最终的执行结果是：

Initial data:
resource_list_old = [huawei:nova, apple:apple9]
resource_list_new = [apple:apple11, huawei:nova, apple:apple10]
resource_list_old == resource_list_new: False

[replace] replace resource_list_old[1:2] to resource_list_new[2:3]
[equal  ] resource_list_old[0:1] equal resource_list_new[1:2]
[insert ] insert resource_list_new[0:1] into resource_list_old at index 0

Finish data:
resource_list_old = [apple:apple11, huawei:nova, apple:apple10]
resource_list_new = [apple:apple11, huawei:nova, apple:apple10]
resource_list_old == resource_list_new: True

参考：

difflib — Helpers for computing deltas

python list

< AMQP学习笔记

cloudinit解惑 - 简介 >

仅有 1 条评论

太专业了表示看不懂

腾讯云 July 28th, 2018 at 02:23 pm回复

资源同步的一种思路

difflib

仅有 1 条评论

发表新评论