Task #11218
EOLE distribution - Scenario #11062: Publish the 2.3.16 Stable update
Zéphir backend crashes after the 2.3.16 RC update
Description
Following tests of the latest release-candidate packages, frequent crashes of the Zéphir backend have been reported.
Several symptoms have been reported:
- segfaults of the process with errors such as the following (seen in Lyon and on the MEDDE Zéphir servers). In particular, some of the segfaults seem to occur in the etree.so library:
[824254.914007] twistd[3747]: segfault at 4 ip 0810d0a1 sp a1454410 error 6 in python2.6[8048000+1e1000] (besançon)
[ 164.798183] twistd[3961]: segfault at 7200000061 ip 00000000004d71d0 sp 00007f1cfeedf330 error 4 in python2.6[400000+21d000] (thierry)
[ 1087.216732] twistd[12042]: segfault at 8 ip 00000000004d6a0e sp 00007f7b6b2bc808 error 6 in python2.6[400000+21d000]
Lyon (karim[ac-lyon]):
[968853.042550] twistd[3460]: segfault at 8 ip 00000000004d6a0e sp 00007f0215630598 error 6 in python2.6[400000+21d000]
[1149075.256246] twistd[15499]: segfault at 7500000041 ip 0000000000456dea sp 00007f10824d1600 error 4 in python2.6[400000+21d000]
[1749261.295040] twistd[24456]: segfault at 7f584dd19a70 ip 00007f581c7cda6d sp 00007f58013ad5a0 error 4 in libc-2.11.1.so[7f581c755000+17f000]
[1761839.561921] twistd[9513]: segfault at 7200000065 ip 00007ff775d2c56e sp 00007ff75b1f9800 error 4 in etree.so[7ff775d02000+108000]
[2352852.289447] twistd[9336]: segfault at 8 ip 00000000004d6a0e sp 00007f852e7f9808 error 6 in python2.6[400000+21d000]
[2366609.420815] twistd[15071]: segfault at 74000000f4 ip 0000000000456dea sp 00007fdc177fafc0 error 4 in python2.6[400000+21d000]
[2366843.524214] twistd[13503]: segfault at 8 ip 00000000004d6a0e sp 00007f8fa5ff8808 error 6 in python2.6[400000+21d000]
- an apparently different problem, reported by Versailles (Arnaud Bougeard):
Traceback (most recent call last):
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 File "/usr/lib/python2.6/dist-packages/twisted/web/http.py", line 1371, in dataReceived
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 finishCallback(data[contentLength:])
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 File "/usr/lib/python2.6/dist-packages/twisted/web/http.py", line 1585, in _finishRequestBody
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 self.allContentReceived()
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 File "/usr/lib/python2.6/dist-packages/twisted/web/http.py", line 1641, in allContentReceived
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 req.requestReceived(command, path, version)
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 File "/usr/lib/python2.6/dist-packages/twisted/web/http.py", line 807, in requestReceived
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 self.process()
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011--- <exception caught here> ---
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 File "/usr/lib/python2.6/dist-packages/twisted/web/server.py", line 125, in process
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 self.render(resrc)
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 File "/usr/lib/python2.6/dist-packages/twisted/web/server.py", line 132, in render
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 body = resrc.render(self)
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 File "/usr/lib/python2.6/dist-packages/zephir/backend/xmlrpceole.py", line 137, in render
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011 cx = PgSQL.connect(database=config.DB_NAME,user=config.DB_USER,password=config.DB_PASSWD)
Apr 10 14:25:21 zephir zephir_backend: [HTTPChannel,333894,213.41.210.90] #011psycopg2.OperationalError: could not create socket: Too many open files
Apr 10 14:25:23 zephir zephir_backend: [twisted.web.server.Site] Could not accept new connection (EMFILE)
Apr 10 14:25:23 zephir zephir_backend: last message repeated 9 times
Apr 10 14:25:23 zephir zephir_backend: [twisted.web.server.Site] Could not accept new connection (EMFILE)
Apr 10 14:25:23 zephir zephir_backend: last message repeated 467 times
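The EMFILE errors above mean the backend process ran out of file descriptors: each PostgreSQL connection and each accepted HTTP connection holds one socket. The following is a minimal, Linux-only sketch for checking how close a running process is to its limit, assuming /proc is mounted. `fd_usage` is a hypothetical helper for diagnosis, not part of Zéphir.

```python
import os


def fd_usage(pid=None):
    """Return (open_fd_count, soft_limit) for a process, read from /proc.

    Hypothetical diagnostic helper (Linux only). soft_limit is None when
    the "Max open files" soft limit is "unlimited".
    """
    pid = os.getpid() if pid is None else pid
    # Each entry in /proc/<pid>/fd is one open descriptor (files, sockets, pipes).
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    soft = None
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                # Columns after the label: soft limit, hard limit, units.
                value = line[len("Max open files"):].split()[0]
                soft = None if value == "unlimited" else int(value)
                break
    return open_fds, soft
```

Calling `fd_usage(<twistd pid>)` periodically (as root or as the backend's user) should show whether the descriptor count grows steadily, which would point to a leak rather than a one-off load spike.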
Associated revisions
Fix the closing of database connections when errors occur.
ref #11218 @1h
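If a connection is opened per request and not closed when the query fails, each failed request presumably leaks one socket until the process hits EMFILE. The sketch below shows the general try/finally pattern for read queries with any DB-API 2.0 driver; `run_query` and its `connect` argument are hypothetical names, not the code of the revision above.

```python
def run_query(connect, sql, params=()):
    """Run one read query on a fresh connection and always release it.

    `connect` is any zero-argument callable returning a DB-API 2.0
    connection (hypothetical wrapper around the backend's connect call).
    """
    cx = connect()
    try:
        cur = cx.cursor()
        try:
            cur.execute(sql, params)
            return cur.fetchall()
        finally:
            # Release the cursor even if execute() or fetchall() raised.
            cur.close()
    finally:
        # Runs on success and on error, so the connection's socket/file
        # descriptor is freed even when the query fails.
        cx.close()
```

In the backend, `connect` could be something like `lambda: PgSQL.connect(database=config.DB_NAME, user=config.DB_USER, password=config.DB_PASSWD)`, mirroring the call in the traceback. An explicit `finally` is used rather than `with cx:` because a psycopg2 connection used as a context manager only ends the transaction; it does not close the connection.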
History
#1 Updated by Bruno Boiget almost 8 years ago
- Remaining (hours) changed from 10.0 to 6.0
#2 Updated by Bruno Boiget almost 8 years ago
- Status changed from New to In Progress
- Remaining (hours) changed from 6.0 to 2.0
In its current state, the problem is not fixed, but a workaround seems possible: disable the use of threads in the Zéphir backend.
-> gen_config (expert mode) -> personnalisation -> set "Activer l'utilisation de threads" (enable the use of threads) to "non"
This appears to have stabilized the situation on one of the MEDDE Zéphir servers.
Check with Bourritux whether the server is still running tomorrow, then post an announcement on the Zéphir mailing list (possibly also as a reply to the message on the Amon list) recommending this workaround for the time being.
#3 Updated by Joël Cuissinat almost 8 years ago
- Status changed from In Progress to Postponed
- Parent task set to #11062
- Remaining (hours) changed from 2.0 to 0.0