Task #29642

Scenario #29652: Express processing MEN (10-12)

Problems with z_stats and dhcp coexistence

Added by Thierry Bertrand over 3 years ago. Updated almost 3 years ago.

Status:
Closed
Priority:
Normal
Assigned To:
-
Start date:
03/04/2020
Due date:
% Done:

100%

Estimated time:
0.00 h
Remaining (hours):
0.0

Related issues

Related to eole-ead3 - Scenario #20753: Visibility of remaining DHCP leases (EAD3 and agent) Done (Sprint) 12/16/2019 01/17/2020

Associated revisions

Revision 240b3fee (diff)
Added by Joël Cuissinat over 3 years ago

zstats.cfg : add "activer_dhcp"

Ref: #29642

Revision 7cb991fa (diff)
Added by Joël Cuissinat over 3 years ago

agentmanager/config.py : add ACTIVER_DHCP

Ref: #29642

Revision e3bf99d7 (diff)
Added by Joël Cuissinat over 3 years ago

Do not load the agent if dhcp is not activated

Ref: #29642

History

#1 Updated by Thierry Bertrand over 3 years ago

On seth 2.7.1 servers, z_stats reports errors whether or not the dhcpd feature is enabled.
Beyond the errors themselves, this causes communication outages with zephir.

Server 1:

root@set-30-09:~# systemctl status z_stats.service 
● z_stats.service - Agent zephir
   Loaded: loaded (/lib/systemd/system/z_stats.service; enabled; vendor preset: enabled)
   Active: active (running) since Wed 2020-02-19 11:26:06 CET; 1 day 6h ago
 Main PID: 11397 (twistd)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/z_stats.service
           └─11397 /usr/bin/python2 /usr/bin/twistd --nodaemon --no_save --pidfile /run/z_stats.pid zephiragents --tmp=data --data=stats --archive=/tmp --static=static --actions=actions

févr. 20 17:23:10 set-30-09 zephiragents[11397]:         
févr. 20 17:23:10 set-30-09 zephiragents[11397]: [-]         Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 1.
févr. 20 17:23:10 set-30-09 zephiragents[11397]: [-] Unhandled error in Deferred:
févr. 20 17:23:10 set-30-09 zephiragents[11397]: [-] Unhandled Error
févr. 20 17:23:10 set-30-09 zephiragents[11397]: [-]         Traceback (most recent call last):
févr. 20 17:23:10 set-30-09 zephiragents[11397]: [-]         Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 1.
févr. 20 17:26:12 set-30-09 zephiragents[11397]: 2020-02-20T17:26:12+0100 [-] Erreur remontée par /usr/share/eole/sbin/dhcp-tool
févr. 20 17:26:12 set-30-09 zephiragents[11397]: 2020-02-20T17:26:12+0100 [-] got stderr: 'Traceback (most recent call last):\n  File "/usr/share/eole/sbin/dhcp-tool", line 102, in <module>\n    ranges = get_ranges()\n  File "/usr/shar
févr. 20 17:26:12 set-30-09 zephiragents[11397]: [-] Erreur remontée par /usr/share/eole/sbin/dhcp-tool
févr. 20 17:26:12 set-30-09 zephiragents[11397]: [-] got stderr: 'Traceback (most recent call last):\n  File "/usr/share/eole/sbin/dhcp-tool", line 102, in <module>\n    ranges = get_ranges()\n  File "/usr/share/eole/sbin/dhcp-tool", l
root@set-30-09:~# CreoleGet eole_release
2.7.1
root@set-30-09:~# CreoleGet activer_dhcp
oui
root@set-30-09:~# dhcp-tool 
ddtm30 30 51

Server 2:

root@seth-49-04:/usr/lib/nagios/plugins# systemctl status z_stats.service 
● z_stats.service - Agent zephir
   Loaded: loaded (/lib/systemd/system/z_stats.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-02-20 05:33:54 CET; 11h ago
 Main PID: 1765 (twistd)
    Tasks: 1 (limit: 4915)
   CGroup: /system.slice/z_stats.service
           └─1765 /usr/bin/python2 /usr/bin/twistd --nodaemon --no_save --pidfile /run/z_stats.pid zephiragents --tmp=data --data=stats --archive=/tmp --static=static --actions=actions

févr. 20 17:23:07 seth-49-04 zephiragents[1765]: [-]         Traceback (most recent call last):
févr. 20 17:23:07 seth-49-04 zephiragents[1765]: [-]         Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 1.
févr. 20 17:23:07 seth-49-04 zephiragents[1765]: [-] Unhandled error in Deferred:
févr. 20 17:23:07 seth-49-04 zephiragents[1765]: [-] Unhandled Error
févr. 20 17:23:07 seth-49-04 zephiragents[1765]: [-]         Traceback (most recent call last):
févr. 20 17:23:07 seth-49-04 zephiragents[1765]: [-]         Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 1.
févr. 20 17:23:59 seth-49-04 zephiragents[1765]: 2020-02-20T17:23:59+0100 [-] Erreur remontée par /usr/share/eole/sbin/dhcp-tool
févr. 20 17:23:59 seth-49-04 zephiragents[1765]: 2020-02-20T17:23:59+0100 [-] got stderr: 'Traceback (most recent call last):\n  File "/usr/share/eole/sbin/dhcp-tool", line 102, in <module>\n    ranges = g
févr. 20 17:23:59 seth-49-04 zephiragents[1765]: [-] Erreur remontée par /usr/share/eole/sbin/dhcp-tool
févr. 20 17:23:59 seth-49-04 zephiragents[1765]: [-] got stderr: 'Traceback (most recent call last):\n  File "/usr/share/eole/sbin/dhcp-tool", line 102, in <module>\n    ranges = get_ranges()\n  File "/usr
root@seth-49-04:/usr/lib/nagios/plugins# CreoleGet eole_release
2.7.1
root@seth-49-04:/usr/lib/nagios/plugins# CreoleGet activer_dhcp
non
root@seth-49-04:/usr/lib/nagios/plugins# dhcp-tool 
Traceback (most recent call last):
  File "/usr/share/eole/sbin/dhcp-tool", line 102, in <module>
    ranges = get_ranges()
  File "/usr/share/eole/sbin/dhcp-tool", line 26, in get_ranges
    for ip, netmask in zip(list(cfg.creole.dhcp.adresse_network_dhcp.adresse_network_dhcp),
  File "/usr/lib/python2.7/dist-packages/tiramisu/config.py", line 248, in __getattr__
    return self.getattr(name)
  File "/usr/lib/python2.7/dist-packages/tiramisu/config.py", line 316, in getattr
    raise props
tiramisu.error.PropertiesOptionError: ne peut accéder à l'optiondescription "dhcp" a cause de la propriété disabled (la valeur de "Activer le serveur DHCP" est "non")

#2 Updated by Thierry Bertrand over 3 years ago

  • Tracker changed from Scenario to Scenario proposal

#3 Updated by Philippe Carre over 3 years ago

According to my tests, enabling DHCP does make a difference.
On an eSSL 2.7.1:

Demande de synchronisation auprès du service z_stats : Traceback (most recent call last):
  File "/usr/bin/synchro_zephir", line 66, in <module>
    sys.stdout.write(z_stats_proxy.archive_for_upload())
  File "/usr/lib/python2.7/xmlrpclib.py", line 1243, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1602, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1283, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1316, in single_request
    return self.parse_response(response)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1493, in parse_response
    return u.close()
  File "/usr/lib/python2.7/xmlrpclib.py", line 800, in close
    raise Fault(**self._stack[0])
xmlrpclib.Fault: <Fault 8002: 'error'>

root@essl-271-6371:~# dhcp-tool 
Traceback (most recent call last):
  File "/usr/share/eole/sbin/dhcp-tool", line 102, in <module>
    ranges = get_ranges()
  File "/usr/share/eole/sbin/dhcp-tool", line 26, in get_ranges
    for ip, netmask in zip(list(cfg.creole.dhcp.adresse_network_dhcp.adresse_network_dhcp),
  File "/usr/lib/python2.7/dist-packages/tiramisu/config.py", line 248, in __getattr__
    return self.getattr(name)
  File "/usr/lib/python2.7/dist-packages/tiramisu/config.py", line 316, in getattr
    raise props
tiramisu.error.PropertiesOptionError: ne peut accéder à l'optiondescription "dhcp" a cause de la propriété disabled (la valeur de "Activer le serveur DHCP" est "non")

DHCP enabled, then reconfigure run.
Everything is OK.

root@essl-271-6371:~# synchro_zephir 
Demande de synchronisation auprès du service z_stats : ok

root@essl-271-6371:~# dhcp-tool
plage0 10 10
lugdunum 767 767

root@essl-271-6371:~# systemctl status z_stats.service
● z_stats.service - Agent zephir
   Loaded: loaded (/lib/systemd/system/z_stats.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-02-27 11:20:31 CET; 21min ago

If I disable DHCP again, it is broken again.

root@essl-271-6371:~# systemctl status z_stats.service
● z_stats.service - Agent zephir
   Loaded: loaded (/lib/systemd/system/z_stats.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2020-02-27 11:50:10 CET; 2h 41min ago
 Main PID: 2923 (twistd)
    Tasks: 1 (limit: 3546)
   CGroup: /system.slice/z_stats.service
           └─2923 /usr/bin/python2 /usr/bin/twistd --nodaemon --no_save --pidfile /run/z_stats.pid zephiragents --tmp=data --data=stats --archive=/tmp --static=static 

févr. 27 14:27:27 essl-271-6371 zephiragents[2923]: [-]         Traceback (most recent call last):
févr. 27 14:27:27 essl-271-6371 zephiragents[2923]: [-]         Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition:
févr. 27 14:28:27 essl-271-6371 zephiragents[2923]: 2020-02-27T14:28:27+0100 [-] KernelMaintenance : pas de dernière mesure disponible.
févr. 27 14:28:27 essl-271-6371 zephiragents[2923]: [-] KernelMaintenance : pas de dernière mesure disponible.
févr. 27 14:30:27 essl-271-6371 zephiragents[2923]: 2020-02-27T14:30:27+0100 [-] KernelMaintenance : pas de dernière mesure disponible.
févr. 27 14:30:27 essl-271-6371 zephiragents[2923]: [-] KernelMaintenance : pas de dernière mesure disponible.
févr. 27 14:30:30 essl-271-6371 zephiragents[2923]: 2020-02-27T14:30:30+0100 [-] Erreur remontée par /usr/share/eole/sbin/dhcp-tool
févr. 27 14:30:30 essl-271-6371 zephiragents[2923]: [-] Erreur remontée par /usr/share/eole/sbin/dhcp-tool
févr. 27 14:30:30 essl-271-6371 zephiragents[2923]: 2020-02-27T14:30:30+0100 [-] got stderr: 'Traceback (most recent call last):\n  File "/usr/share/eole/sbin/dhcp-too
févr. 27 14:30:30 essl-271-6371 zephiragents[2923]: [-] got stderr: 'Traceback (most recent call last):\n  File "/usr/share/eole/sbin/dhcp-tool", line 102, in <module>

root@essl-271-6371:~# systemctl start z_stats.service
No error message,
but the service is still broken.

If I re-enable DHCP, everything works again.

In the end, it does seem that dhcp-tool (a new feature present only in 2.7.1) is what causes the error, depending on whether DHCP is enabled.
It also only affects the zephir stats: sending the configuration and backing up to zephir both work.
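
To make the failure mode concrete, here is a minimal sketch of a tool that checks the activation flag before reading the DHCP options. It is not the actual dhcp-tool code: `FakeDhcpConfig`, its `networks` property and this `get_ranges(cfg)` signature are hypothetical, and `PropertiesOptionError` is a local stand-in for the tiramisu exception shown in the traceback above.

```python
class PropertiesOptionError(AttributeError):
    """Local stand-in for tiramisu.error.PropertiesOptionError (not the real class)."""


class FakeDhcpConfig(object):
    """Hypothetical stand-in for the configuration object read by dhcp-tool."""

    def __init__(self, activer_dhcp, networks=()):
        self.activer_dhcp = activer_dhcp
        self._networks = list(networks)

    @property
    def networks(self):
        # Mimic the reported behaviour: the "dhcp" option group is
        # inaccessible whenever "activer_dhcp" is not "oui".
        if self.activer_dhcp != "oui":
            raise PropertiesOptionError('optiondescription "dhcp" is disabled')
        return self._networks


def get_ranges(cfg):
    """Return the configured networks, or an empty list when DHCP is off."""
    if cfg.activer_dhcp != "oui":
        # Never touch the disabled option group.
        return []
    return cfg.networks
```

With such a guard the tool would exit cleanly on a server without DHCP instead of returning exit code 1 to the agent.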

#4 Updated by équipe eole Academie d'Orléans-Tours over 3 years ago

Hello,

I second this request, for up-to-date AMON 2.7.1 servers.
On my test setup I can reproduce this problem, which appears on all our AMON servers deployed in private schools: none of them has DHCP. Nothing to report on our AMON servers in public EPLEs: DHCP is enabled everywhere there.

On the test setup, disabling the dhcp service (no reconfigure needed, just CreoleSet + restart z_stats) produces the following in /var/log/rsyslog/local/zephiragents/zephiragents.info.log:

2020-02-28T17:00:06.941839+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: 2020-02-28T17:00:06+0100 [-] Erreur remontée par /usr/share/eole/sbin/dhcp-tool
2020-02-28T17:00:06.942239+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents: [-] Erreur remontée par /usr/share/eole/sbin/dhcp-tool
2020-02-28T17:00:06.942606+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: 2020-02-28T17:00:06+0100 [-] got stderr: 'Traceback (most recent call last):\n'
2020-02-28T17:00:06.942807+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents: [-] got stderr: 'Traceback (most recent call last):\n'
2020-02-28T17:02:51.171601+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: 2020-02-28T17:02:51+0100 [twisted.internet.defer#critical] Unhandled error in Deferred:
2020-02-28T17:02:51.173427+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: 2020-02-28T17:02:51+0100 [twisted.internet.defer#critical]
2020-02-28T17:02:51.173608+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011Traceback (most recent call last):
2020-02-28T17:02:51.173806+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011Failure: twisted.internet.error.ProcessTerminated: A process has ended with a probable error condition: process ended with exit code 1.
2020-02-28T17:02:51.173937+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011
2020-02-28T17:02:58.268781+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: 2020-02-28T17:02:58+0100 [_GenericHTTPChannelProtocol,373,127.0.0.1] Unhandled Error
2020-02-28T17:02:58.269502+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011Traceback (most recent call last):
2020-02-28T17:02:58.269627+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 195, in process
2020-02-28T17:02:58.269746+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    self.render(resrc)
2020-02-28T17:02:58.269862+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/twisted/web/server.py", line 255, in render
2020-02-28T17:02:58.270001+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    body = resrc.render(self)
2020-02-28T17:02:58.270144+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/twisted/web/resource.py", line 250, in render
2020-02-28T17:02:58.270282+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    return m(request)
2020-02-28T17:02:58.270404+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/twisted/web/xmlrpc.py", line 172, in render_POST
2020-02-28T17:02:58.270540+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    d = defer.maybeDeferred(function, *args)
2020-02-28T17:02:58.270672+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011--- <exception caught here> ---
2020-02-28T17:02:58.270797+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/twisted/internet/defer.py", line 150, in maybeDeferred
2020-02-28T17:02:58.270912+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    result = f(*args, **kw)
2020-02-28T17:02:58.271038+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/zephir/monitor/agentmanager/zephirservice.py", line 456, in xmlrpc_archive_for_upload
2020-02-28T17:02:58.271172+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    self.wakeup_for_upload(False)
2020-02-28T17:02:58.271304+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/zephir/monitor/agentmanager/zephirservice.py", line 281, in wakeup_for_upload
2020-02-28T17:02:58.271428+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    agent.archive()
2020-02-28T17:02:58.271547+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/zephir/monitor/agentmanager/agent.py", line 415, in archive
2020-02-28T17:02:58.271678+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    self.ensure_data_uptodate()
2020-02-28T17:02:58.271792+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/zephir/monitor/agentmanager/agent.py", line 426, in ensure_data_uptodate
2020-02-28T17:02:58.271916+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    self.write_data()
2020-02-28T17:02:58.272029+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011  File "/usr/lib/python2.7/dist-packages/zephir/monitor/agents/dhcp.py", line 49, in write_data
2020-02-28T17:02:58.272141+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011    self.table.table_data = self.last_measure.value['leases']
2020-02-28T17:02:58.272253+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011exceptions.TypeError: 'NoneType' object has no attribute '__getitem__'
2020-02-28T17:02:58.272367+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: #011
2020-02-28T17:02:58.278959+01:00 amon36-0999999f.etab-maquette36-dsi.lan zephiragents[2472]: 2020-02-28T17:02:58+0100 [twisted.python.log#info] 127.0.0.1 - - [28/Feb/2020:16:02:56 +0000] "POST /xmlrpc HTTP/1.1" 200 263 "-" "xmlrpclib.py/1.0.1 (by www.pythonware.com)" 

And synchro_zephir then stops working.

Nicolas
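
The `TypeError` at the end of the log above comes from subscripting `self.last_measure.value` while it is `None`, which is what happens when the tool never produced a measurement. A minimal sketch of a defensive read, assuming a hypothetical `Measure` container and a hypothetical helper `leases_from` (neither is taken from the zephir code):

```python
class Measure(object):
    """Hypothetical container for one agent measurement."""

    def __init__(self, value):
        self.value = value


def leases_from(last_measure):
    """Return the 'leases' rows, or an empty list when nothing was measured."""
    if last_measure is None or last_measure.value is None:
        # No measurement available yet (or the tool failed): nothing to archive.
        return []
    return last_measure.value.get('leases', [])
```

This only avoids the crash during archiving; it does not make the agent useful on a server without DHCP.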

#5 Updated by Joël Cuissinat over 3 years ago

  • Related to Scenario #20753: Visibility of remaining DHCP leases (EAD3 and agent) added

#6 Updated by Joël Cuissinat over 3 years ago

  • Subject changed from problems with z_stats and dhcp coexistence to Problems with z_stats and dhcp coexistence
  • Parent task set to #29652

#7 Updated by Joël Cuissinat over 3 years ago

  • Status changed from New to In progress
  • Assigned To set to Joël Cuissinat
  • Start date set to 03/04/2020

The "dhcp" agent added in 2.7.1 by #20753 was not supposed to be loaded when the DHCP service is not enabled.
=> 3 packages rebuilt and added to the current release candidate
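
The idea of the fix can be sketched as follows, assuming a hypothetical helper `select_agents` that filters the agents to load based on the `activer_dhcp` value. The function name and the agent names other than "dhcp" are illustrative only; the real change lives in zstats.cfg and agentmanager/config.py, per the associated revisions.

```python
def select_agents(available_agents, activer_dhcp):
    """Return the agent names to load, skipping 'dhcp' when DHCP is disabled."""
    dhcp_enabled = (activer_dhcp == "oui")
    return [name for name in available_agents
            if name != "dhcp" or dhcp_enabled]
```

For example, `select_agents(["sysinfo", "dhcp"], "non")` leaves only `"sysinfo"`, so dhcp-tool is never invoked on a server without DHCP.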

#8 Updated by Joël Cuissinat over 3 years ago

  • Status changed from In progress to Resolved
  • % Done changed from 0 to 100

#9 Updated by Daniel Dehennin over 3 years ago

On an Amon 2.7.1 with up-to-date packages, this works correctly:

  • eole-server 2.7.1-59
  • eole-dhcp 2.7.1-10
  • zephir-client 2.7.1-16

#10 Updated by Daniel Dehennin over 3 years ago

  • Status changed from Resolved to Closed
  • Remaining (hours) set to 0.0

#11 Updated by Thierry Bertrand almost 3 years ago

  • Status changed from Closed to New
  • Assigned To deleted (Joël Cuissinat)
  • Estimated time set to 0.00 h

#12 Updated by Thierry Bertrand almost 3 years ago

  • Status changed from New to Closed
