
Task #8567

Distribution EOLE - Scenario #8830: Close the tasks => High-availability improvements

Sphynx: timeout problem with high availability

Added by Karim Ayari almost 7 years ago. Updated over 6 years ago.

Status: Closed
Priority: Normal
Assigned To:
Start date: 11/24/2014
Due date:
% Done: 100%
Estimated time: 0.25 h
Spent time:
Remaining (hours): 0.0

Description

Hello,

Once again this morning, all our 2.3 tunnels went down because of a crash on our master Sphynx 2.3.
It seems that a Corosync timeout kills the ipsec resource without even failing over to the slave Sphynx. We had to perform the failover manually!

Here is what I find in the haute-dispo.warn.log log on the master side:

Jul  6 20:41:45 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1010 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 6 22:19:41 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 7 10:32:26 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 350 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 7 11:19:45 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 7 11:23:15 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 7 11:29:17 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 540 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 7 16:29:39 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 600 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 7 19:57:47 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 350 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 7 21:46:05 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 820 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 8 04:29:19 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 8 12:15:38 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 290 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 8 15:11:21 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 540 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 8 15:14:27 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1010 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 8 15:38:17 sphynx7 lrmd: [26996]: WARN: G_SIG_dispatch: Dispatch function for SIGCHLD was delayed 1000 ms (> 100 ms) before being called (GSource: 0x1f877f0)
Jul 9 07:06:15 sphynx7 lrmd: [26996]: WARN: ipsec_rsc:monitor process (PID 14766) timed out (try 1). Killing with signal SIGTERM (15).
Jul 9 07:06:15 sphynx7 lrmd: [26996]: WARN: operation monitor[196] on lsb::ipsecSphynx::ipsec_rsc for client 26999, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.1] CRM_meta_timeout=[30000] CRM_meta_disabled=[false] CRM_meta_interval=[10000] disabled=[false] : pid [14766] timed out
Jul 9 07:06:15 sphynx7 lrmd: [15075]: WARN: For LSB init script, no additional parameters are needed.
Jul 9 07:06:16 sphynx7 lrmd: [15086]: WARN: For LSB init script, no additional parameters are needed.
Jul 9 07:06:25 sphynx7 lrmd: [15199]: WARN: For LSB init script, no additional parameters are needed.
Jul 9 07:06:25 sphynx7 lrmd: [15227]: WARN: For LSB init script, no additional parameters are needed.
Jul 9 07:07:05 sphynx7 lrmd: [26996]: WARN: ipsec_rsc:monitor process (PID 15407) timed out (try 1). Killing with signal SIGTERM (15).
Jul 9 07:07:05 sphynx7 lrmd: [26996]: WARN: operation monitor[205] on lsb::ipsecSphynx::ipsec_rsc for client 26999, its parameters: CRM_meta_name=[monitor] crm_feature_set=[3.0.1] CRM_meta_timeout=[30000] CRM_meta_disabled=[false] CRM_meta_interval=[10000] disabled=[false] : pid [15407] timed out
Jul 9 07:07:05 sphynx7 lrmd: [15716]: WARN: For LSB init script, no additional parameters are needed.
Jul 9 07:07:06 sphynx7 lrmd: [15727]: WARN: For LSB init script, no additional parameters are needed.
Jul 9 07:07:26 sphynx7 lrmd: [26996]: WARN: ipsec_rsc:stop process (PID 15727) timed out (try 1). Killing with signal SIGTERM (15).
Jul 9 07:07:26 sphynx7 lrmd: [26996]: WARN: operation stop[212] on lsb::ipsecSphynx::ipsec_rsc for client 26999, its parameters: crm_feature_set=[3.0.1] CRM_meta_timeout=[20000] : pid [15727] timed out
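To see at a glance which operations timed out and which limits were in force, the warnings above can be summarized with a small script. This is a minimal sketch in Python, assuming the log lines follow exactly the lrmd format shown above; the regular expressions and the `summarize` function are illustrative and are not part of EOLE or Pacemaker.

```python
import re

# Hypothetical helper, not part of EOLE/Pacemaker; assumes the lrmd line format above.
# "... WARN: ipsec_rsc:monitor process (PID 14766) timed out (try 1). ..."
TIMEOUT_RE = re.compile(
    r"WARN: (?P<rsc>\w+):(?P<op>\w+) process \(PID (?P<pid>\d+)\) timed out"
)
# "... operation monitor[196] on lsb::ipsecSphynx::ipsec_rsc ... CRM_meta_timeout=[30000] ..."
META_TIMEOUT_RE = re.compile(
    r"operation (?P<op>\w+)\[\d+\] on \S+::(?P<rsc>\w+)\s.*CRM_meta_timeout=\[(?P<ms>\d+)\]"
)
# "... Dispatch function for SIGCHLD was delayed 1010 ms ..."
SIGCHLD_RE = re.compile(r"SIGCHLD was delayed (?P<ms>\d+) ms")


def summarize(lines):
    """Collect timed-out operations, their configured timeouts and SIGCHLD delays."""
    timed_out = []       # (resource, operation, pid) in log order
    configured_ms = {}   # (resource, operation) -> CRM_meta_timeout in milliseconds
    delays = []          # SIGCHLD dispatch delays in milliseconds
    for line in lines:
        if m := TIMEOUT_RE.search(line):
            timed_out.append((m["rsc"], m["op"], int(m["pid"])))
        elif m := META_TIMEOUT_RE.search(line):
            configured_ms[(m["rsc"], m["op"])] = int(m["ms"])
        elif m := SIGCHLD_RE.search(line):
            delays.append(int(m["ms"]))
    return timed_out, configured_ms, delays
```

Passing the open log file (for example `summarize(open("haute-dispo.warn.log"))`) should report that the ipsec_rsc monitor operation hit its 30000 ms limit and the stop operation its 20000 ms limit, while the earlier SIGCHLD warnings show dispatch delays of up to about one second.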

The output of the crm_mon command showed only the IPSEC resource as FAILED.

Here is the same log on the slave Sphynx side: http://pastebin.com/mMdQhGEY

In the crm configuration, the ipsec resource has a 30s timeout:

primitive ipsec_rsc lsb:ipsecSphynx \
    op monitor interval="10s" timeout="30s" disabled="false" \
    meta target-role="started"

I raised it to 60s without even knowing whether this will fix the problem above. If the appliquer_hautedispo script is run again, we will be back to the initial configuration.

I am therefore opening this request so that this parameter can be configured, if that can fix the problem.

Associated revisions

Revision 5c3eeeb7 (diff)
Added by Philippe Caseiro over 6 years ago

dicos/02_haute_dispo.xml: Add the service_resource_timeout variable
scripts/appliquer_hautedispo: Use the service_resource_timeout variable

Allows the user to set a timeout for resources of type "Service".

Fixes #8567 @1h

History

#1 Updated by Joël Cuissinat over 6 years ago

  • Parent task set to #8830

#2 Updated by Fabrice Barconnière over 6 years ago

  • Tracker changed from Task to Idea box
  • Project changed from Distribution EOLE to eole-pacemaker
  • Distribution changed from EOLE 2.3 to EOLE 2.4

No further improvements in 2.3.
Expose the timeout (and possibly other parameters) as Creole variables for each primitive, in expert mode.

#3 Updated by Philippe Caseiro over 6 years ago

  • Status changed from New to Resolved
  • % Done changed from 0 to 100

#4 Updated by Philippe Caseiro over 6 years ago

  • Assigned To set to Philippe Caseiro

#5 Updated by Emmanuel GARETTE over 6 years ago

  • Start date set to 11/24/2014
  • Estimated time set to 0.25 h
  • Remaining (hours) set to 0.25

#6 Updated by Fabrice Barconnière over 6 years ago

  • Status changed from Resolved to Closed
  • Remaining (hours) changed from 0.25 to 0.0
root@sphynx:~# CreoleGet service_resource_name
ipsec_rsc
arv_rsc
root@sphynx:~# CreoleGet service_resource_timeout
40
60

root@sphynx:~# crm configure show ipsec_rsc
primitive ipsec_rsc lsb:ipsecSphynx \
    op monitor interval="20s" timeout="40s" disabled="false" \
    meta target-role="started" 
root@sphynx:~# crm configure show arv_rsc
primitive arv_rsc lsb:arv \
    op monitor interval="30s" timeout="60s" disabled="false" \
    meta target-role="started" 
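The CreoleGet output above suggests that service_resource_name and service_resource_timeout are multi-valued variables read index by index (ipsec_rsc with 40, arv_rsc with 60). The following hypothetical Python sketch shows that pairing and renders primitives in the same shape as `crm configure show`. It is not the code of scripts/appliquer_hautedispo; the LSB script names and monitor intervals are passed in explicitly because this ticket does not say which variables hold them.

```python
# Hypothetical sketch; NOT the actual scripts/appliquer_hautedispo implementation.

def lsb_primitive(name, lsb_script, interval_s, timeout_s):
    """Render a crm 'primitive' definition for an LSB resource with a monitor op."""
    return (
        f"primitive {name} lsb:{lsb_script} \\\n"
        f'    op monitor interval="{interval_s}s" timeout="{timeout_s}s" disabled="false" \\\n'
        f'    meta target-role="started"'
    )


def render_all(names, scripts, intervals, timeouts):
    """Pair multi-valued variables by index, one primitive per resource."""
    if not (len(names) == len(scripts) == len(intervals) == len(timeouts)):
        raise ValueError("multi-valued variables must all have the same length")
    return [lsb_primitive(*row) for row in zip(names, scripts, intervals, timeouts)]
```

Called as `render_all(["ipsec_rsc", "arv_rsc"], ["ipsecSphynx", "arv"], [20, 30], [40, 60])`, it reproduces the two primitives displayed above. The length check reflects the assumption that a mismatch between the lists is a configuration error rather than something to fill with defaults.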
