Setting up a redundant MySQL with HAST and CARP

Common Address Redundancy Protocol ( CARP )

  1. 角色說明
    主要會有兩種角色, 一種是Host(  Primary ), 一種是Provider ( Secondary )
    Host 主要就是我們提供服務的機器
    Provider 就是當Host 掛掉後, 接手服務的機器
  2. 參數說明 ( 這部份看原文吧, 這是我看OpenBSD上找來的, FreeBSD上敘述的比較少, 請大家搭配著一起服用 )
    vhid
    The Virtual Host ID. This is a unique number that is used to identify the redundancy group to other nodes on the network. Acceptable values are from 1 to 255.
    password
    The authentication password to use when talking to other CARP-enabled hosts in this redundancy group. This must be the same on all members of the group.
    carpdev
    This optional parameter specifies the physical network interface that belongs to this redundancy group. By default, CARP will try to determine which interface to use by looking for a physical interface that is in the same subnet as the ipaddress and mask combination given to the carp(4) interface.
    advbase
    This optional parameter specifies how often, in seconds, to advertise that we’re a member of the redundancy group. The default is 1 second. Acceptable values are from 1 to 255.
    advskew
    This optional parameter specifies how much to skew the advbase when sending CARP advertisements. By manipulating advskew, the master CARP host can be chosen. The higher the number, the less preferred the host will be when choosing a master. The default is 0. Acceptable values are from 0 to 254.
    state
    Force a carp(4) interface into a certain state. Valid states are init, backup, and master.
    group, -group
    Add or remove a carp(4) interface to a certain interface group. By default all carp(4) interfaces are added to the carp group. Each group has a carpdemote counter affecting all carp(4) interfaces belonging to that group. As described below, it can be useful to group certain interfaces together for failover purposes.
    ipaddress
    This is the shared IP address assigned to the redundancy group. This address does not have to be in the same subnet as the IP address on the physical interface (if present). This address needs to be the same on all hosts in the group, however.
    mask
    The subnet mask of the shared IP.
  3. 實作
    要用CARP的功能有兩個方法
    一個是 rebuild kernel

    device carp

    另一個是load Kernel Module, 在/boot/loader.conf加入下面這行

    if_carp_load="YES"

    我個人是傾向用Kernel Module, 這以用freebsd-update 昇級時比較方便.

    接下來先在hasta ( Host ) 這台機器中, 編輯/etc/rc.conf 加入下面的設定

    hostname="hasta.example.org"
    ifconfig_fxp0="inet 192.168.1.51 netmask 255.255.255.0"
    cloned_interfaces="carp0"
    ifconfig_carp0="vhid 1 pass testpass 192.168.1.50/24"

    再編輯 hastb ( Provider ) 的 /etc/rc.conf

    hostname="hastb.example.org"
    ifconfig_fxp0="inet 192.168.1.52 netmask 255.255.255.0"
    cloned_interfaces="carp0"
    ifconfig_carp0="vhid 1 advskew 100 pass testpass 192.168.1.50/24"

    然後Host重開機, 等個幾秒再重開 Provider, 不重開機的話, 也可以試試

    # kldload if_carp.ko
    # ifconfig carp0 create
    # ifconfig carp0 down && ifcofig carp0 up ( 兩台都要, 先Host, 再Provider )

    這樣Host 應該就會拿到192.168.1.50的IP了, 從同網段的機器ping 一下, 看是不是有成功

  4. failback
    在 Host 那台執行

    # sysctl net.inet.carp.preempt=1

    就可以了, 若搭配了HAST, 會滿不建議這樣處理的, 最好還是人工去看狀況怎麼樣, 再做調整及切換

Highly Available Storage  ( HAST )

  1. 角色說明
    primary ( carp 中的 Host ) => 會把資料傳給slave 叫他乖乖的寫進去
    secondary ( carp 中的 provider ) => 乖乖的收 master 的資料來同步
  2. 實作
    編輯兩台機器上的/etc/hast.conf, 加入以下資料

    resource test {
        on hasta {
            local /dev/ad6 ( 看你那顆HD要來做同步 )
            remote 192.168.1.52
        }
        on hastb {
            local /dev/ad6
            remote 192.168.1.51
        }
    }

    及執行下列指令

    # hastctl create test
    # /etc/rc.d/hastd onestart

    在 “primay” 機器上

    hastctl role primary test

    在 “secondary” 機器上

    # hastctl role secondary test

    接下來在 primary 就newfs, mount

    # newfs -U /dev/hast/test
    # mkdir -p /hast/test
    # mount /dev/hast/test /hast/test

    最後在編輯兩台機器/etc/rc.conf

    # hastd_enable="YES"

    這樣系統Boot後HAST就會自已啟動了…

  3. Failover
    在之前的Wiki上是要搭配ifstated一起服用
    新的文件上則是透過devd來做
    作法如下
    編輯兩台機器的 /etc/devd.conf

    notify 30 {
        match "system" "IFNET";
        match "subsystem" "carp0";
        match "type" "LINK_UP";
        action "/usr/local/sbin/carp-hast-switch master";
    };
    
    notify 30 {
        match "system" "IFNET";
        match "subsystem" "carp0";
        match "type" "LINK_DOWN";
        action "/usr/local/sbin/carp-hast-switch slave";
    };

    詳細的意思可以man devd.conf
    接著重啟devd

    # /etc/rc.d/devd restart

    再來就是編輯/usr/local/sbin/carp-hast-switch 了( 兩台都要有 )

    #!/bin/sh
    
    # Original script by Freddie Cash <fjwcash@gmail.com>
    # Modified by Michael W. Lucas <mwlucas@BlackHelicopters.org>
    # and Viktor Petersson <vpetersson@wireload.net>
    
    # The names of the HAST resources, as listed in /etc/hast.conf
    resources="test"
    
    # delay in mounting HAST resource after becoming master
    # make your best guess
    delay=3
    
    # logging
    log="local0.debug"
    name="carp-hast"
    
    # wait_count
    wait_count=7
    
    # end of user configurable stuff
    
    case "$1" in
        master)
            logger -p $log -t $name "Switching to primary provider for ${resources}."
            sleep ${delay}
    
            # Wait for any "hastd secondary" processes to stop
            for disk in ${resources}; do
                while $( pgrep -lf "hastd: ${disk} (secondary)" > /dev/null 2>&1 && [ $wait_count -gt 0 ] ); do
    								logger -p $log -t $name "countdown => ${wait_count}."
    								wait_count=`expr $wait_count - 1`
                    sleep 1
                done
    						wait_count=7
    
                # Switch role for each disk
                hastctl role primary ${disk}
                if [ $? -ne 0 ]; then
                    logger -p $log -t $name "Unable to change role to primary for resource ${disk}."
                    exit 1
                fi
            done
    
            # Wait for the /dev/hast/* devices to appear
            for disk in ${resources}; do
                for I in $( jot 60 ); do
                    [ -c "/dev/hast/${disk}" ] && break
                    sleep 0.5
                done
    
                if [ ! -c "/dev/hast/${disk}" ]; then
                    logger -p $log -t $name "GEOM provider /dev/hast/${disk} did not appear."
                    exit 1
                fi
            done
    
            logger -p $log -t $name "Role for HAST resources ${resources} switched to primary."
    
            logger -p $log -t $name "Mounting disks."
            for disk in ${resources}; do
                mkdir -p /hast/${disk}
                fsck -p -y -t ufs /dev/hast/${disk}
                mount /dev/hast/${disk} /hast/${disk}
    	    ## start mysql server
    	    logger -p $log -t $name "start mysql"
    	    /bin/sh /usr/local/etc/rc.d/mysql-server start
            done
    
        ;;
    
        slave)
            logger -p $log -t $name "Switching to secondary provider for ${resources}."
    
            # Switch roles for the HAST resources
            for disk in ${resources}; do
                if ! mount | grep -q "^/dev/hast/${disk} on "
                then
                else
    	        ## stop mysql server
                    logger -p $log -t $name "stop mysql"
    	        /bin/sh /usr/local/etc/rc.d/mysql-server stop
    	        sleep 0.5
    	        ## umount
                    logger -p $log -t $name "umount ${disk}."
                    umount -f /hast/${disk}
                fi
                sleep $delay
                hastctl role secondary ${disk} 2>&1
                if [ $? -ne 0 ]; then
                    logger -p $log -t $name "Unable to switch role to secondary for resource ${disk}."
                    exit 1
                fi
                logger -p $log -t $name "Role switched to secondary for resource ${disk}."
            done
        ;;
    esac

    這個script我有改了一小部份, 加入了mysql 的啟動, 和最大等待次數

  4. 測試方法
    # ifconfig carp0 down && ifconfig carp0 up

    然後用 hastctl status test 觀看吧

  5. 錯誤回復
    確定那一台的資料比較新, 在舊的那台執行

    # hastctl role init <resource>
    # hastctl create <resource>
    # hastctl role secondary <resource>

    然後觀看 primary 的 HAST status 應該會發現 dirty: 的部份會重跑

MySQL

  1. 請把DB 的 dir 設到HAST 上面
    我的/etc/rc.conf 設定如下

    ## MYSQL
    mysql_enable="YES"
    mysql_dbdir="/hast/test/mysql"
    mysql_args="--bind-address=192.168.1.50 --skip-name-resolve"

    然後把/usr/local/etc/rc.d/mysql-server 的權限設成000
    怕mysql 在開機後自已啟了

注意事項

  1. 不要在 dirty: 還有值的情況下去切換 primary, secondary 會爆炸的
  2. 請打開 promiscuous model 不然完全ping不到vip
  3. 可以把 secondary 這台機器的開機加上delay, 以防整個機房跳電, primary, secondary 同時開機造成split-brain
    編輯/boot/loader.conf

    ## delay secondary boot
    autoboot_delay="240"

參考資料

  1. http://www.freebsd.org/doc/handbook/carp.html
  2. http://www.freebsd.org/doc/handbook/disks-hast.html
  3. http://www.openbsd.org/faq/pf/carp.html
  4. http://developer.51cto.com/art/200509/3863.htm
  5. man hast
  6. man hastctl
  7. man carp
  8. man devd.conf
  9. /usr/src/sys/netinet/ip_carp.c ( 因為我man carp 怎麼樣都沒看到他說 net.inet.carp.preempt 設成1 就會failback……, 所以就開這個來看, 裡面的 http://paste.plurk.com/show/394406/ 有寫到 )

debug

  1. /var/log/message
  2. /var/log/debug.log

截圖
pic

發佈留言

這個網站採用 Akismet 服務減少垃圾留言。進一步了解 Akismet 如何處理網站訪客的留言資料