Prevent authentication timeouts during long Chef runs


If you have ever had a Chef Client run fail with a mysterious "Authentication Failed" error after a few minutes of execution, then this post is for you! Because of the way the Chef Client loads cookbook files, it is possible for authentication to "time out" in the middle of a run. Thankfully there is an easy solution!

Why does the timeout occur?

In order to understand why the timeout occurs, we must first discuss the anatomy of a Chef Client run. During the Chef Client run, the list of required cookbooks is compiled and synchronized with the remote Chef Server. In the past there were concerns about bandwidth consumption and disk space during this phase. As a workaround, the Chef Client was changed to selectively download files as they are requested in a recipe. Instead of downloading the entire cookbook at the start of the Chef Client run, only the cookbook's metadata, attributes, libraries, recipes, resources, and providers were downloaded. The templates and files were then downloaded "on demand", as they appeared in a recipe.

To put it another way, each call to cookbook_file or template in a recipe is effectively a remote_file call whose remote host is the Chef Server or S3. To download those files, a signed URL is generated at the start of the Chef Client run. That signed URL is only valid for a short period of time. Once it expires, any request that uses it returns a 403 Forbidden.

Because the signed URL is generated once, at the start of the Chef Client run, and reused for every download during that run, a long run can outlive it. In a long Chef Client run (such as one that compiles software from source or performs an initial bootstrap), the URL may expire before all the required files have been downloaded from S3 or Bookshelf. Even more bizarre, a second Chef Client run will likely not reproduce the issue, since the "long" part of the run has already completed. You may also see the same error again, but on a different resource. Needless to say, tracking down this error can be difficult.
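The expiry behaviour described above can be sketched in a few lines of plain Ruby. This is only an illustration, not the signing scheme S3 or Bookshelf actually uses: the secret, the 15-minute lifetime, the file path, and the sign_url and status_for helpers are all hypothetical.

```ruby
require "openssl"

# Hypothetical shared secret; a real object store manages its own keys.
SECRET = "example-secret"

# Build a toy "signed URL" for +path+ that is valid until +expires_at+ (epoch seconds).
def sign_url(path, expires_at)
  digest = OpenSSL::Digest.new("SHA256")
  signature = OpenSSL::HMAC.hexdigest(digest, SECRET, "#{path}:#{expires_at}")
  { path: path, expires: expires_at, signature: signature }
end

# Return the HTTP status a server might answer with at time +now+.
def status_for(url, now)
  digest = OpenSSL::Digest.new("SHA256")
  expected = OpenSSL::HMAC.hexdigest(digest, SECRET, "#{url[:path]}:#{url[:expires]}")
  return 403 unless url[:signature] == expected # signature does not match
  return 403 if now > url[:expires]             # URL has expired
  200
end

run_start = 1_000_000 # pretend epoch time at the start of the run
# Assumed 15-minute lifetime; the real value depends on the server.
url = sign_url("/cookbooks/app/templates/app.conf.erb", run_start + 900)

puts status_for(url, run_start + 60)    # one minute into the run    => 200
puts status_for(url, run_start + 1_800) # thirty minutes into the run => 403
```

The same URL that worked one minute into the run is rejected thirty minutes in, which is why the failure only shows up on runs with a slow step in the middle.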

Avoiding the timeout

Thankfully, avoiding the authentication timeout is simple. Add the following to your /etc/chef/client.rb:

# Do not lazy-load files/templates
no_lazy_load true

Setting the no_lazy_load option to true forces the Chef Client to download every file in the required cookbooks, templates and files included, at the beginning of the Chef Client run in one sweep!
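To see why downloading up front helps, here is a toy simulation of a run in plain Ruby. It is a sketch under stated assumptions, not how the Chef Client is implemented: the failed_downloads helper, the step list, and the 15-minute URL lifetime are all made up for illustration.

```ruby
# Toy model of a Chef Client run; every name here is hypothetical.
URL_TTL = 900 # seconds a signed URL stays valid (assumed value)

# Return the names of files whose download would fail with an expired URL.
# +steps+ is an ordered list of [:work, seconds] or [:file, name] entries.
def failed_downloads(steps, no_lazy_load:)
  clock = 0            # seconds since the start of the run
  expires_at = URL_TTL # URLs are signed once, at the start of the run
  download_times = {}

  steps.each do |kind, value|
    if kind == :work
      clock += value   # long-running resource, e.g. compiling from source
    elsif kind == :file
      # Eager mode fetches everything at time 0; lazy mode fetches on demand.
      download_times[value] = no_lazy_load ? 0 : clock
    end
  end

  download_times.select { |_name, time| time > expires_at }.keys
end

steps = [
  [:file, "build.sh"],
  [:work, 1_800],      # 30 minutes of compilation
  [:file, "app.conf.erb"]
]

p failed_downloads(steps, no_lazy_load: false) # => ["app.conf.erb"]
p failed_downloads(steps, no_lazy_load: true)  # => []
```

With lazy loading, the template requested after the 30-minute step is fetched with a stale URL and fails. With no_lazy_load, every file is fetched while the URL is still fresh, at the cost of downloading files the run may never use.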

Update August 11, 2014 - no_lazy_load true is now the default in Chef 12.

About Seth

Seth Vargo is an engineer at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises non-profits.