Wednesday, September 11, 2019

Ansible-playbook (in comments) failing when run against too many Cisco hosts.

I have a playbook that does the following tasks:

  • Enter enable mode (this is written as a task because we have a non-standard 'enable' prompt and the 'become' commands do not work)

  • Make a backup of the running config

  • Copy a change script to flash

  • Copy a backout script to flash
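
For reference, the tasks above could be sketched roughly as follows. This is a minimal sketch, not the actual playbook: the group name `cisco_hosts` and the file names are hypothetical, the modules are referred to by their short names (`ios_config`, `net_put`), and the custom enable-mode task is left out because it depends on the non-standard prompt. `net_put` also assumes SCP is enabled on the devices.

```yaml
- name: Back up config and stage change/backout scripts
  hosts: cisco_hosts                # hypothetical inventory group
  gather_facts: false
  connection: network_cli
  vars:
    ansible_network_os: ios
  tasks:
    # The custom "enter enable mode" task would go here (omitted: site-specific prompt)

    - name: Make a backup of the running config
      ios_config:
        backup: yes                 # saved under ./backup on the control node

    - name: Copy the change script to flash
      net_put:
        src: files/change_script.txt        # hypothetical file name
        dest: flash:/change_script.txt

    - name: Copy the backout script to flash
      net_put:
        src: files/backout_script.txt       # hypothetical file name
        dest: flash:/backout_script.txt
```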

I dialed in this playbook using a couple of lab hosts. It works great and does exactly what I need. Last night I ran it against 50 hosts and it failed miserably on about 45 of them. I re-ran it a couple of times, and each time anywhere from 40 to 45 hosts failed. I re-ran it against only 10 hosts and again it worked as expected. Based on some googling, I adjusted some settings in the ansible.cfg file, because I believe the Ansible connection is timing out with the default settings.

[persistent_connection]

connect_timeout = 100

command_timeout = 60

After adjusting the above settings, I re-ran the playbook against all 50 hosts and it worked fine. I've got to ramp this playbook up to run against 4000 hosts over the next couple of days. Will I just need to keep adjusting the above settings to keep things from timing out? Should I expect to have to increase the timeout values linearly with the number of hosts I'm running this against?
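
One option that might avoid scaling the timeouts at all is to keep the batch size fixed at a size that is known to work. The following is a minimal sketch under the unconfirmed assumption that the failures are tied to how many hosts the play handles at once. It uses Ansible's `serial` play keyword; the group name `cisco_hosts` is hypothetical, and the single task stands in for the full task list.

```yaml
- name: Run the play in fixed-size batches
  hosts: cisco_hosts        # hypothetical inventory group holding all 4000 hosts
  gather_facts: false
  connection: network_cli
  serial: 50                # finish the play on 50 hosts before starting the next 50
  vars:
    ansible_network_os: ios
  tasks:
    - name: Make a backup of the running config
      ios_config:
        backup: yes
```

With `serial: 50`, each batch looks the same to Ansible as the 50-host run that worked. Separately, the `forks` setting in ansible.cfg (default 5) limits how many hosts Ansible talks to in parallel within a batch.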

Admittedly, I'm a beginner when it comes to Ansible/Python. I can figure things out through trial and error, but I would love it if someone who has already figured this out could point me in the right direction.


