Tuesday, July 31, 2012

Notes on configuring a two-node Proxmox cluster with DRBD-backed storage

We had a task to deploy two new virtualization servers with live migration and highly available data. The latter doesn't mean that failed VMs should be powered up automagically on another node when a physical server fails, only that you can start them there by hand within five minutes.

We decided to use Proxmox VE 2 because it's free, because we have experience maintaining Proxmox 1.9 systems, and because it supports live migration without shared storage.

So we configured two nodes with four additional LVM volume groups each: n1vz and n2vz for VZ data (n1vz holds one logical volume mounted on /var/lib/vz on the first node, n2vz holds one mounted on /var/lib/vz on the second), and n1kvm and n2kvm as VM disk storage (n1kvm is used by VMs that normally run on the first node, n2kvm by VMs that normally run on the second). Four DRBD volumes in a primary-primary configuration were created, one for each of the four volume groups. Using a separate pair of DRBD devices for each node's VM disks makes split-brain recovery easier, as explained here. Note also that we can't use a DRBD-mirrored (quasi-shared) disk for VZ storage, because the final step of VZ migration runs "rm -rf" on the container private area after rsyncing it.

With this configuration we can live-migrate both KVM VMs and VZ containers. We also have a copy of every VM and container for emergencies (failure of one node).

Some of the difficulties we met were related to LVM and DRBD startup ordering. The first one was that LVM locked the DRBD backing storage, so DRBD couldn't use it. It was solved with a correct filter in lvm.conf. The second one was more difficult. The n1vz and n2vz volume groups, whose physical volumes are available only over DRBD, couldn't be mounted the usual way: they have to be mounted after initial system startup. Normally LVM starts first (its init script runs vgchange -ay, activating volume groups), then DRBD, and at that point we have additional VGs that are not active.

The standard solution to this problem would be Heartbeat, but I was too lazy to study it. So I adopted tools more familiar to me: the automounter (autofs) to mount /var/lib/vz and udev to activate the volume groups when drbd* devices appear. I added a "/- /etc/auto.direct" line to /etc/auto.master and created the /etc/auto.direct file containing:

/var/lib/vz              -fstype=ext4            :/dev/mapper/n1vz-data

Configuring udev consisted of creating the /etc/udev/rules.d/80-drbd-lvm.rules file containing:

ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="drbd*", RUN+="/bin/sh -c '/sbin/lvm vgscan; /sbin/lvm vgchange -a y'"

I consider this more elegant than just putting "vgchange -a y && mount ..." into rc.local.
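For comparison, the rc.local approach could look roughly like the following minimal sketch. It is an illustration under assumptions, not what we run: it supposes that the n1vz volume group sits on /dev/drbd0 (the actual DRBD minor numbers aren't given here), uses an arbitrary 60-second timeout, and the helper names wait_for_device and activate_and_mount are hypothetical.

```shell
#!/bin/sh
# Hypothetical rc.local-style alternative to the udev rule + autofs pair.
# Assumption: the n1vz volume group lives on /dev/drbd0; adjust as needed.

# wait_for_device PATH TIMEOUT
# Poll once per second until PATH exists; return 1 after TIMEOUT seconds.
wait_for_device() {
    dev=$1
    timeout=$2
    i=0
    while [ ! -e "$dev" ]; do
        if [ "$i" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        i=$((i + 1))
    done
    return 0
}

# Activate the DRBD-backed VG and mount the VZ data volume by hand.
activate_and_mount() {
    wait_for_device /dev/drbd0 60 || return 1
    /sbin/lvm vgscan
    /sbin/lvm vgchange -a y n1vz
    mount -t ext4 /dev/mapper/n1vz-data /var/lib/vz
}
```

This has to run after DRBD has started and hard-codes device and volume names, whereas the udev rule and the autofs map simply react whenever the devices appear.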

Friday, June 1, 2012

php-fpm troubles

Life is boring without troubles. We were almost ready to put the new servers into production when php-fpm started crashing randomly with the following message:
Jun  1 04:14:36 srv2 kernel: [566115.463835] php5-fpm[29696]: segfault at 0 ip 00007f3b191e5558 sp 00007fff70f193c8 error 4 in libc-2.15.so[7f3b190b3000+1b3000]
Jun  1 04:14:36 srv2 kernel: [566115.463847] php5-fpm/29696: potentially unexpected fatal signal 11.
Jun  1 04:14:36 srv2 kernel: [566115.463850] 
Jun  1 04:14:36 srv2 kernel: [566115.463851] CPU 5 
Jun  1 04:14:36 srv2 kernel: [566115.463853] Modules linked in: vesafb psmouse i7core_edac edac_core ioatdma dca serio_raw joydev mac_hid lp parport usbhid hid e1000e megaraid_sas
Jun  1 04:14:36 srv2 kernel: [566115.463868] 
Jun  1 04:14:36 srv2 kernel: [566115.463871] Pid: 29696, comm: php5-fpm Not tainted 3.2.0-24-generic #39-Ubuntu Supermicro X8DTT-H/X8DTT-H
Jun  1 04:14:36 srv2 kernel: [566115.463876] RIP: 0033:[<00007f3b191e5558>]  [<00007f3b191e5558>] 0x7f3b191e5557
Jun  1 04:14:36 srv2 kernel: [566115.463882] RSP: 002b:00007fff70f193c8  EFLAGS: 00010206
Jun  1 04:14:36 srv2 kernel: [566115.463885] RAX: 0000000000000000 RBX: 00007f3b1b53f000 RCX: 0000000000000011
Jun  1 04:14:36 srv2 kernel: [566115.463888] RDX: 0000000000000066 RSI: 0000000000af8b15 RDI: 0000000000000000
Jun  1 04:14:36 srv2 kernel: [566115.463890] RBP: 0000000002705b08 R08: 0000000000000011 R09: 0000000000000000
Jun  1 04:14:36 srv2 kernel: [566115.463893] R10: eea633fc2a689ca0 R11: 00007f3b192344d0 R12: 0000000000000001
Jun  1 04:14:36 srv2 kernel: [566115.463896] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000002705bd0
Jun  1 04:14:36 srv2 kernel: [566115.463899] FS:  00007f3b1b531700(0000) GS:ffff8803332a0000(0000) knlGS:0000000000000000
Jun  1 04:14:36 srv2 kernel: [566115.463902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun  1 04:14:36 srv2 kernel: [566115.463904] CR2: 0000000000000000 CR3: 0000000329cb2000 CR4: 00000000000006e0
Jun  1 04:14:36 srv2 kernel: [566115.463907] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun  1 04:14:36 srv2 kernel: [566115.463910] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun  1 04:14:36 srv2 kernel: [566115.463913] Process php5-fpm (pid: 29696, threadinfo ffff88025be20000, task ffff880326120000)
Jun  1 04:14:36 srv2 kernel: [566115.463916] 
Jun  1 04:14:36 srv2 kernel: [566115.463917] Call Trace:

After examining the core dump, I got the following backtrace:
(gdb) bt
#0  __strstr_sse42 (s1=0x0, s2=) at ../sysdeps/x86_64/multiarch/strstr.c:175
#1  0x0000000000736d13 in fpm_status_handle_request () at /home/alp/build/php5-5.3.10/sapi/fpm/fpm/fpm_status.c:128
#2  0x000000000042b4ab in main (argc=11237155, argv=0x0) at /home/alp/build/php5-5.3.10/sapi/fpm/fpm/fpm_main.c:1809

At first I blamed libc. However, nothing else crashed. I rebuilt PHP from source, and the result was the same. So, after a magic kick from my boss, I sighed deeply and looked at php5-5.3.10/sapi/fpm/fpm/fpm_status.c:
               /* full status ? */
                full = SG(request_info).request_uri && strstr(SG(request_info).query_string, "full");
                short_syntax = short_post = NULL;
                full_separator = full_pre = full_syntax = full_post = NULL;
                encode = 0;
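It looks like a copy-paste error: the code checks SG(request_info).request_uri for NULL but then passes SG(request_info).query_string to strstr(), which crashes when query_string is NULL (s1=0x0 in the backtrace). This is the code path that processes php-fpm status requests. As a first step I disabled php-fpm status monitoring, and there have been no new segfaults since then. Then I patched the file as follows: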
 
--- php5-5.3.10/sapi/fpm/fpm/fpm_status.c       2012-06-01 04:00:43.492744472 -0400
+++ php5-5.3.10/sapi/fpm/fpm/fpm_status.c       2012-06-01 04:03:59.233040497 -0400
@@ -125,7 +125,7 @@
                }
 
                /* full status ? */
-               full = SG(request_info).request_uri && strstr(SG(request_info).query_string, "full");
+               full = SG(request_info).query_string && strstr(SG(request_info).query_string, "full");
                short_syntax = short_post = NULL;
                full_separator = full_pre = full_syntax = full_post = NULL;
                encode = 0;

and recompiled PHP. It seems I caught one more bug. Interestingly, no one else seems to have hit it. I reported bug 62205, and the PHP team reacted quickly.

P.S. The same error appeared in two more places in fpm_status.c. The bug report mentioned above is now resolved, and the fix was committed to PHP head.

Saturday, April 28, 2012

Qt fonts in Ubuntu 12.04

After upgrading from 10.04 to 12.04 and adapting a bit to Unity (the adaptation included creating a setxkbmap startup script), I found an annoying problem: fonts in Skype looked ugly...

I tried using qtconfig to set the fonts, but it was useless: the settings were ignored. However, they did take effect when the application was run with sudo.

The reason turned out to be this: applications run with sudo couldn't communicate with D-Bus and so used the font settings from ~/.config/Trolltech.conf. When run normally, they communicated with gconfd-2 and tried to use the default system font (Ubuntu 11). But Qt doesn't know anything about this font (I don't know why). So I just changed the default system fonts to the DejaVu family, and now Qt applications (including Skype) look better.

Friday, April 13, 2012

OmniTI announced OmniOS

I've just read about OmniOS. Cool, it seems that in the near future we'll have as many illumos-based distros as Linux-based ones :)
I really hope the OmniTI guys will collaborate productively with the illumos and illumos-userland teams. By the way, I really like OmniTI: they develop and support my favorite DBMS and one of my favorite server operating systems :) The only thing that frightens me is possible fragmentation. IMHO, without a big developer community it is vital to stay as united as possible and to be friendly to beginning users and developers.

Thursday, April 5, 2012

How large a root do you need?

If you have separate /var, /usr and /home partitions, what should the size of / be in Ubuntu 10.04 Desktop?
I've always thought that 3 GB is more than enough. I was wrong: today my laptop told me that it has only 138 MB free in the / partition.
I was rather shocked. My first suspect was /tmp. Yes, it is not cleared on boot in Ubuntu, and this is unusual for me (I usually enable /tmp cleanup on my FreeBSD desktop systems or use mfs for it, and on my OpenSolaris desktop it was in RAM/swap by default and didn't bother me). But cleaning it didn't help. Maybe I just didn't wait long enough for the filesystem counters to update.
But that made me continue the investigation: what could be using 3 GB in the root filesystem?
The short answer is /boot. I had about 20 kernel versions installed: from a manually compiled 2.6.31 kernel (I suddenly realized that I had recompiled it several years ago to get SunSPOT support, and it was still booted by default) to 2.6.32.41. A kernel plus its initrd image occupies just over 20 MB. But with about 20 of them, it becomes significant for the root filesystem...
Of course, I don't want my old kernel versions to disappear after a system update. But I would appreciate it if the package manager said something like: "Hey, guy, you have 10 kernel versions installed, are you sure you need them?" :)
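Until the package manager learns that trick, a small helper can at least print the candidates. This is a minimal sketch, assuming GNU sort (-V) and GNU head (negative -n), as shipped with Ubuntu; old_kernels is a hypothetical name, not a packaged tool. It keeps the KEEP newest versions plus the running kernel and only prints the rest, removing nothing.

```shell
#!/bin/sh
# old_kernels KEEP CURRENT
# Reads kernel version strings (one per line) on stdin and prints the ones
# that are neither among the KEEP newest (by version sort) nor CURRENT.
# Relies on GNU extensions: sort -V and head -n -N.
old_kernels() {
    keep=$1
    current=$2
    sort -V | head -n -"$keep" | grep -vxF -e "$current"
}

# Example (Ubuntu); dpkg-query may also list removed packages whose
# configuration files are still present:
# dpkg-query -W -f='${Package}\n' 'linux-image-[0-9]*' \
#     | sed 's/^linux-image-//' | old_kernels 2 "$(uname -r)"
```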

P.S. Just found out: /tmp is cleaned by default if it is on a separate filesystem (look at /etc/init/mounted-tmp.conf). I understand that keeping /tmp on the root filesystem is a bad habit, but I don't think it's such an uncommon setup that startup scripts should ignore it.

Tuesday, April 3, 2012

PostgreSQL mostly online REINDEX

What do you do when you want to REINDEX online? Build a new index concurrently, drop the old one and rename the new one. Only you can't do this for indexes that support primary keys and other constraints without dropping the constraint. So you have to REINDEX those anyway, and make everyone wait...
And if you have several hundred indexes, doing it by hand is not very convenient. Also note that CREATE INDEX CONCURRENTLY can't be EXECUTE'd from a PL/pgSQL function. I wish we had a concurrent reindex in PostgreSQL, or at least autonomous transactions to deal with the latter problem... But we have to work with the functionality we have...
So my solution was to write a PL/pgSQL script that generates the necessary SQL commands, and to feed them to psql.
The script itself is the following:

DO
$$
DECLARE
    ind record;
    str text;
    str_drop text;
    str_rename text;
BEGIN
    -- Plain indexes: build a copy concurrently, drop the old one, rename the copy
    FOR ind IN (SELECT i.oid, i.relname
                FROM pg_class r, pg_class i, pg_index ir, pg_namespace ns
                WHERE ir.indexrelid = i.oid AND ir.indrelid = r.oid
                  AND ns.oid = r.relnamespace
                  AND ns.nspname = 'public' AND NOT ir.indisprimary
                  AND i.oid NOT IN (SELECT conindid FROM pg_constraint)) LOOP
        str := replace(
            pg_get_indexdef(ind.oid),
            'INDEX ' || ind.relname || ' ON ',
            'INDEX CONCURRENTLY ' || ind.relname || '_X999 ON ');
        str_drop := 'DROP INDEX ' || ind.relname;
        str_rename := 'ALTER INDEX ' || ind.relname || '_X999 RENAME TO ' || ind.relname;
        RAISE NOTICE '%', str;
        RAISE NOTICE '%', str_drop;
        RAISE NOTICE '%', str_rename;
    END LOOP;
    -- Indexes backing primary keys and other constraints: plain REINDEX
    FOR ind IN (SELECT i.oid, i.relname
                FROM pg_class r, pg_class i, pg_index ir, pg_namespace ns
                WHERE ir.indexrelid = i.oid AND ir.indrelid = r.oid
                  AND ns.oid = r.relnamespace
                  AND ns.nspname = 'public'
                  AND (ir.indisprimary
                       OR i.oid IN (SELECT conindid FROM pg_constraint))) LOOP
        str := 'REINDEX INDEX ' || ind.relname;
        RAISE NOTICE '%', str;
    END LOOP;
END;
$$ LANGUAGE plpgsql;


If the script is saved as generate_reindex.plpgsql, then with something like this:

$ psql -d DBNAME -f generate_reindex.plpgsql 2>&1 | awk -F 'NOTICE:' '{ if (NF==2) { print $2 ";" } }' | psql -d DBNAME

we can rebuild all indexes in database DBNAME, mostly concurrently. The last thing to remember is to add
\set ON_ERROR_STOP on
to your ~/.psqlrc file, just to be safe...
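The generate-and-execute pipeline can also be wrapped in a couple of shell functions, as in the following sketch. The function names notices_to_sql and reindex_db are hypothetical, and the sketch assumes the DO block above is saved as generate_reindex.plpgsql in the current directory. Here ON_ERROR_STOP is passed explicitly with -v instead of relying on ~/.psqlrc (which -X skips), so a failed CREATE INDEX CONCURRENTLY stops the run before the matching DROP INDEX.

```shell
#!/bin/sh
# notices_to_sql: keep only the text after "NOTICE:" and terminate it with ";".
notices_to_sql() {
    awk -F 'NOTICE:' 'NF == 2 { print $2 ";" }'
}

# reindex_db DBNAME: generate the statements and run them one by one,
# stopping at the first error. -X skips ~/.psqlrc on both psql calls.
reindex_db() {
    db=$1
    psql -X -d "$db" -f generate_reindex.plpgsql 2>&1 \
        | notices_to_sql \
        | psql -X -v ON_ERROR_STOP=1 -d "$db"
}
```

Usage would then be just "reindex_db DBNAME".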

Thursday, March 22, 2012

Redmine/Webrick rc script

I've just installed Redmine 1.3.2 on a FreeBSD server and ran into several inconveniences.

  1. It is not in ports.

  2. It depends on old versions of various Ruby gems that are no longer in ports.
    In particular, it didn't want to work with rack 1.4.1, and I had to do

    # gem install rack -v=1.1.0


  3. It doesn't have an rc script to start the WEBrick web server (I didn't want to install Apache on this host).


So I took the mongrel_cluster rc script and made a Redmine rc script out of it. Here it is:

#!/bin/sh
# PROVIDE: redmine
# REQUIRE: DAEMON
# KEYWORD: shutdown
#
# This script is modified by placing the following variables inside
# /etc/rc.conf:
#
# redmine_enable (bool):
# Set it to YES to enable this service.
# Default: NO
# redmine_dir (path):
# The directory containing redmine
# Default: /usr/local/redmine/
# redmine_user (username):
# The user to run redmine as
# Default: redmine
# redmine_args (string):
# Additional command flags for ruby
# Default: "-e production -d"

. /etc/rc.subr

name=redmine
rcvar=redmine_enable

command="/usr/local/bin/ruby"

load_rc_config $name

: ${redmine_enable="NO"}
: ${redmine_dir="/usr/local/redmine/"}
: ${redmine_args="-e production -d"}
: ${redmine_user="redmine"}

start_cmd="redmine_cmd start"
stop_cmd="redmine_cmd stop"
restart_cmd="redmine_cmd restart"
status_cmd="redmine_cmd status"

redmine_cmd()
{
    if [ ! -d "${redmine_dir}/." ]; then
        warn "${redmine_dir} is not a directory."
        return 1
    fi

    case $1 in
    "start")
        su -l ${redmine_user} -c "cd ${redmine_dir} && pwd && eval $command script/server webrick $redmine_args"
        ;;
    "stop")
        if [ -f ${redmine_dir}/tmp/pids/server.pid ] ; then
            PID=$(/usr/bin/head -1 ${redmine_dir}/tmp/pids/server.pid)
            # WEBrick shuts down cleanly on SIGINT, not SIGTERM
            /bin/kill -s int $PID 2>/dev/null
            sleep 3
            # Force-kill if it is still alive after 3 seconds
            /bin/kill -s 0 $PID 2>/dev/null && /bin/kill -s kill $PID;
        fi
        ;;
    "restart")
        redmine_cmd stop
        redmine_cmd start
        ;;
    "status")
        if [ -f ${redmine_dir}/tmp/pids/server.pid ] ; then
            echo "PID file exists"
            PID=$(/usr/bin/head -1 ${redmine_dir}/tmp/pids/server.pid)
            /bin/kill -s 0 $PID 2>/dev/null && echo "Server is running with pid $PID" && exit 0;
            echo "But server is not running..." && exit 1;
        fi
        echo "Server is not running" && exit 0;
        ;;
    esac
}

run_rc_command "$1"


Note that to terminate WEBrick correctly you have to send it SIGINT, not SIGTERM.
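To use the script, it would typically be installed as an executable file under /usr/local/etc/rc.d/ (for example /usr/local/etc/rc.d/redmine) and enabled in /etc/rc.conf with the variables documented in its header. A sample set of /etc/rc.conf lines, repeating the script's own defaults apart from redmine_enable:

```shell
# /etc/rc.conf fragment for the redmine rc script above.
# Only redmine_enable is required; the other lines repeat the defaults.
redmine_enable="YES"
redmine_dir="/usr/local/redmine/"
redmine_user="redmine"
redmine_args="-e production -d"
```

After that, "service redmine start" (or stop, restart, status) is dispatched to redmine_cmd.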