пятница, 1 июня 2012 г.

php-fpm troubles

Life is boring without troubles. We were almost ready to push new servers in production, however, php-fpm started falling randomly with the following message:
Jun  1 04:14:36 srv2 kernel: [566115.463835] php5-fpm[29696]: segfault at 0 ip 00007f3b191e5558 sp 00007fff70f193c8 error 4 in libc-2.15.so[7f3b190b3000+1b3000]
Jun  1 04:14:36 srv2 kernel: [566115.463847] php5-fpm/29696: potentially unexpected fatal signal 11.
Jun  1 04:14:36 srv2 kernel: [566115.463850] 
Jun  1 04:14:36 srv2 kernel: [566115.463851] CPU 5 
Jun  1 04:14:36 srv2 kernel: [566115.463853] Modules linked in: vesafb psmouse i7core_edac edac_core ioatdma dca serio_raw joydev mac_hid lp parport usbhid hid e1000e megaraid_sas
Jun  1 04:14:36 srv2 kernel: [566115.463868] 
Jun  1 04:14:36 srv2 kernel: [566115.463871] Pid: 29696, comm: php5-fpm Not tainted 3.2.0-24-generic #39-Ubuntu Supermicro X8DTT-H/X8DTT-H
Jun  1 04:14:36 srv2 kernel: [566115.463876] RIP: 0033:[<00007f3b191e5558>]  [<00007f3b191e5558>] 0x7f3b191e5557
Jun  1 04:14:36 srv2 kernel: [566115.463882] RSP: 002b:00007fff70f193c8  EFLAGS: 00010206
Jun  1 04:14:36 srv2 kernel: [566115.463885] RAX: 0000000000000000 RBX: 00007f3b1b53f000 RCX: 0000000000000011
Jun  1 04:14:36 srv2 kernel: [566115.463888] RDX: 0000000000000066 RSI: 0000000000af8b15 RDI: 0000000000000000
Jun  1 04:14:36 srv2 kernel: [566115.463890] RBP: 0000000002705b08 R08: 0000000000000011 R09: 0000000000000000
Jun  1 04:14:36 srv2 kernel: [566115.463893] R10: eea633fc2a689ca0 R11: 00007f3b192344d0 R12: 0000000000000001
Jun  1 04:14:36 srv2 kernel: [566115.463896] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000002705bd0
Jun  1 04:14:36 srv2 kernel: [566115.463899] FS:  00007f3b1b531700(0000) GS:ffff8803332a0000(0000) knlGS:0000000000000000
Jun  1 04:14:36 srv2 kernel: [566115.463902] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jun  1 04:14:36 srv2 kernel: [566115.463904] CR2: 0000000000000000 CR3: 0000000329cb2000 CR4: 00000000000006e0
Jun  1 04:14:36 srv2 kernel: [566115.463907] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Jun  1 04:14:36 srv2 kernel: [566115.463910] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Jun  1 04:14:36 srv2 kernel: [566115.463913] Process php5-fpm (pid: 29696, threadinfo ffff88025be20000, task ffff880326120000)
Jun  1 04:14:36 srv2 kernel: [566115.463916] 
Jun  1 04:14:36 srv2 kernel: [566115.463917] Call Trace:
After examining core dump I got the following backtrace:
(gdb) bt
#0  __strstr_sse42 (s1=0x0, s2=) at ../sysdeps/x86_64/multiarch/strstr.c:175
#1  0x0000000000736d13 in fpm_status_handle_request () at /home/alp/build/php5-5.3.10/sapi/fpm/fpm/fpm_status.c:128
#2  0x000000000042b4ab in main (argc=11237155, argv=0x0) at /home/alp/build/php5-5.3.10/sapi/fpm/fpm/fpm_main.c:1809
Firstly I started to blame libc. However, nothing else crashed. I rebuilt php from sources. The result was the same. So after getting magic kick from my chief I sighed deeply and looked at php5-5.3.10/sapi/fpm/fpm/fpm_status.c:
               /* full status ? */
                full = SG(request_info).request_uri && strstr(SG(request_info).query_string, "full");
                short_syntax = short_post = NULL;
                full_separator = full_pre = full_syntax = full_post = NULL;
                encode = 0;
It seems to be just a copy-paste or something like that. It was a piece of code from path processing php-fpm status. Firstly, I've disabled php-fpm monitoring and didn't have new segfaults since then. So, I patched the file with the following:
 
--- php5-5.3.10/sapi/fpm/fpm/fpm_status.c       2012-06-01 04:00:43.492744472 -0400
+++ php5-5.3.10/sapi/fpm/fpm/fpm_status.c       2012-06-01 04:03:59.233040497 -0400
@@ -125,7 +125,7 @@
                }
 
                /* full status ? */
-               full = SG(request_info).request_uri && strstr(SG(request_info).query_string, "full");
+               full = SG(request_info).query_string && strstr(SG(request_info).query_string, "full");
                short_syntax = short_post = NULL;
                full_separator = full_pre = full_syntax = full_post = NULL;
                encode = 0;
and recompiled php... It seems I caught one more bug. Interesting thing to note is that no one else hited it. I've just reported bug 62205 and PHP team reacted quickly.

P.S. The same error appeared in two more places in fpm_status.c. The above mentioned bug report is resolved now, fix was committed to php head.