I recently gave a talk at Philly.rb, the Ruby meetup in Philadelphia, PA, entitled "The Cleanroom Pattern - More safely evaluating DSLs with Ruby". You can watch the full Cleanroom DSL video online, but I decided to also write up the contents of the talk as a blog post.
Background on DSLs
Most Ruby-based DSLs are created using a simple instance_eval. While slightly less dangerous than eval, instance_eval still opens the system up to dangerous circumstances. Consider the following DSL class, Project, which has a name attribute:
class Project
  NULL = Object.new.freeze

  def name(val = NULL)
    if val.equal?(NULL)
      @name
    else
      @name = sanitize(val)
    end
  end

  private

  def sanitize(string)
    string.gsub(/\s+/, '-').downcase
  end
end
There are a few things to take note of here:
- A new, frozen object is created to represent "NULL". While it is true that Ruby has a native implementation of nil, having a default value of nil would actually prevent the user from setting the value to nil (since passing nil would be indistinguishable from passing "nothing"). You may have seen this problem when working with some Chef resources.
- There is a DSL method called name, which is essentially overloaded as two methods. When given no parameters, the method simply returns the instance variable @name. When given a value, the value is sanitized and then set on the @name instance variable.
- There is a private sanitize method that replaces all space-like characters with a dash (because otherwise the world explodes!).
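To make the first point concrete, here is a minimal sketch comparing a nil default with the sentinel approach. The class names WithNilDefault and WithSentinel are hypothetical and exist only for this illustration:

```ruby
# Hypothetical classes, used only to contrast the two default-value styles.
class WithNilDefault
  def name(val = nil)
    # nil doubles as "no argument given", so name(nil) is treated as a read
    val.nil? ? @name : (@name = val)
  end
end

class WithSentinel
  NULL = Object.new.freeze

  def name(val = NULL)
    # Only the sentinel means "no argument given"; nil is a real value
    val.equal?(NULL) ? @name : (@name = val)
  end
end

a = WithNilDefault.new
a.name("hamlet")
a.name(nil) # looks like a getter call, so nothing is cleared
a.name #=> "hamlet"

b = WithSentinel.new
b.name("hamlet")
b.name(nil) # explicitly resets the value
b.name #=> nil
```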
Inside your system, you would likely load this DSL file as such:
path = '/path/to/dsl.rb'
contents = File.read(path)

project = Project.new
project.instance_eval(contents, File.basename(path), 0)
project
So, given a DSL file like:
The loading process would result in a #<Project> object with a name of "hamlet":
project.name #=> "hamlet"
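As a rough end-to-end sketch of this loading process, the snippet below evaluates DSL contents from a String instead of reading them from disk. The DSL contents (name "Hamlet") and the "dsl.rb" label are assumptions made for illustration; they are not the original DSL file:

```ruby
class Project
  NULL = Object.new.freeze

  def name(val = NULL)
    if val.equal?(NULL)
      @name
    else
      @name = sanitize(val)
    end
  end

  private

  def sanitize(string)
    string.gsub(/\s+/, '-').downcase
  end
end

# Assumed DSL contents; in a real system these would come from File.read
contents = 'name "Hamlet"'

project = Project.new
# The second and third arguments only label backtraces (file name, first line)
project.instance_eval(contents, "dsl.rb", 1)

project.name #=> "hamlet"
```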
Problem 1 - Private Methods
When using instance_eval (or instance_exec), the entire instance is exposed to the user - it is just as if you were writing code directly in project.rb in a text editor. That means private methods are all accessible by the user:
Project.new.instance_eval do
  sanitize("String Here")
end #=> "string-here"
This is not "terrible", since Rubyists are quite familiar with the ability to use send (or __send__) to call these methods anyway. However, it leaves it very unclear to the DSL author which methods are public and which are private.
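A small sketch of the three ways to reach a private method, using a trimmed-down Project class that contains only sanitize. A normal call with an explicit receiver is rejected, while instance_eval and send both get through:

```ruby
class Project
  private

  def sanitize(string)
    string.gsub(/\s+/, '-').downcase
  end
end

project = Project.new

# A regular call with an explicit receiver respects the private visibility
blocked = false
begin
  project.sanitize("String Here")
rescue NoMethodError
  blocked = true
end

# Both of these bypass the visibility check
via_eval = project.instance_eval { sanitize("String Here") }
via_send = project.send(:sanitize, "String Here")

blocked  #=> true
via_eval #=> "string-here"
via_send #=> "string-here"
```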
Problem 2 - Scope Creep
The sanitize method has a pretty generic name. Since these are Ruby DSLs, it is feasible that a savvy developer may create a "helper" method to ease the development process. Consider the following DSL file:
#
# Define a +sanitize+ method that uppercases the value for ...
#
# @param [#to_s] string
#   the string to parameterize
#
# @return [String]
#
def sanitize(string)
  string.to_s.upcase
end

name "Some String"
The resulting output would be:
project.name #=> "SOME STRING"
The DSL author unintentionally changed the behavior of the instance by making a simple helper method. While this is a contrived example, consider larger DSL-based projects like Chef or Omnibus which have hundreds of tiny helper methods - the possibility of collision is much higher.
Thankfully, since this is instance_eval, the change to the sanitize method is scoped to this one instance (meaning changing the method here does not change it for future evaluations). We only edited the eigenclass.
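The scoping can be checked with a short sketch: a def inside instance_eval lands on the singleton class (eigenclass) of that one object, so a second Project instance still uses the original sanitize. The variable names first and second are only for illustration:

```ruby
class Project
  NULL = Object.new.freeze

  def name(val = NULL)
    val.equal?(NULL) ? @name : (@name = sanitize(val))
  end

  private

  def sanitize(string)
    string.gsub(/\s+/, '-').downcase
  end
end

first = Project.new
first.instance_eval(<<~DSL)
  # This def is added to the singleton class of `first` only
  def sanitize(string)
    string.to_s.upcase
  end

  name "Some String"
DSL

second = Project.new
second.instance_eval('name "Some String"')

first.name              #=> "SOME STRING"
second.name             #=> "some-string"
first.singleton_methods #=> [:sanitize]
```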
Problem 3 - Bypassing Validation
Consider a user who really wants to have spaces in their project name. They could easily bypass the entire system by just setting the instance variable manually:
@name = "My Custom Name"
When this file is evaluated in the context of the Project object:
project.name #=> "My Custom Name"
The user has completely circumvented our sanitize method by just accessing the instance variable directly. Worse, this is an intentional design in Ruby:
In order to set the context, the variable self is set to obj while the code is executing, giving the code access to obj's instance variables.
Problem 4 - Persisted Changes
The biggest problem with instance_eval is that it gives you access to self, an instance of the Project class in these examples. self has access to its class, so truly malicious code could permanently change the behavior of future instance_evals. A very clever developer could permanently change the behavior of sanitize for all future instances of this class (until project.rb is reloaded from disk):
Project.new.instance_eval do
  self.class.class_eval do
    def sanitize(val)
      val.upcase
    end
  end
end
Project.new.sanitize("foo") #=> "FOO"
Project.new.sanitize("foo") #=> "FOO"
Project.new.sanitize("foo") #=> "FOO"
Project.new.sanitize("foo") #=> "FOO"
This code has permanently changed the behavior of the class's sanitize method (note how I am creating a new instance each time). If you are writing a Ruby application that accepts a user-given DSL or dealing with a long-running Ruby process, a malicious user could alter the underlying state of the system in memory.
Explaining the Cleanroom Pattern
The cleanroom pattern is an idiomatic way to evaluate Ruby DSLs in an isolated environment while restricting the methods and level of access a user has. I want to be clear: I did not invent the cleanroom pattern! It can be found in Metaprogramming Ruby books, various blog posts, and popular community projects. I actually learned of the cleanroom pattern from my good friend and fellow Berkshelf-core-team member Jamie Winsor, so thanks!
The general pattern for a cleanroom looks like this:
- The class defines which methods should be exposed on its DSL instances.
- During evaluation, a new, anonymous instance is created which only has those defined methods. This object is created in the top-level Object space to prevent leaking.
- The Ruby file is instance_evaled against this anonymous instance, which has very restricted access to the parent instance.
- The anonymous instance then proxies data back to the original instance using public_send.
Thus there are three areas of protection:
- The class defines which values are public within the DSL. Only those methods exist on the anonymous instance, thus preventing namespace collisions.
- The anonymous instance is created fresh, each time. Even if a malicious attacker is able to craft something to permanently modify the class, it would only persist for that anonymous instance, which is cleaned up during the next GC run.
- The anonymous instance proxies back to the "real" instance using public_send. So, even if an attacker was able to bypass all the existing mitigations, they would only be able to call public methods on the instance.
The code for creating the cleanroom object is a bit complex and meta:
def cleanroom
  Class.new(Object) do # <1>
    define_method(:initialize) do |instance| # <2>
      define_singleton_method(:__instance__) do # <3>
        # caller is an Array of "file:line:in ..." strings, so inspect the
        # immediate caller's entry rather than testing Array membership
        unless caller.first.include?(__FILE__) # <4>
          raise Cleanroom::InaccessibleError.new(:__instance__, self)
        end

        instance # <5>
      end
    end

    exposed.each do |exposed_method| # <6>
      define_method(exposed_method) do |*args, &block|
        __instance__.public_send(exposed_method, *args, &block)
      end
    end
  end
end
- First we create a new anonymous class inheriting from Object.
Next we dynamically define an #initialize method on the class which accepts an instance as the parameter. In normal Ruby:
def initialize(instance)
  # ...
end
During initialization, a new singleton method is created on the eigenclass of the instance. This is basically the same as a regular def method, but it only exists inside the context of this instance. Furthermore, we create it during initialization, thus allowing us to bind to the parent, giving us access to the given instance parameter. Basically we are doing this:
def initialize(instance)
  @instance = instance
end

def __instance__
  @instance
end
But since this anonymous class is what gets instance_evaled, exposing the real instance as an instance variable would allow an attacker to completely bypass the system (remember, instance variables are within scope during an instance_eval).
Instead, we are creating a dynamic method at runtime that refers to the parameter given to the #initialize method. This allows us to "store" the value in a method, but not expose it in an instance variable.
Inside the aforementioned method, we add an extra guard that only permits the method to be called from inside self. This is a major hack, but we inspect the caller stack and make sure that whoever called the __instance__ method lives in the file we are currently running (not a DSL file).
If an error was not raised, we return the instance that was given to us in the #initialize method.
For each exposed method (which I have just called exposed in the code snippet), we define a method of the same name that uses public_send to forward the call to the real instance.
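Putting the steps together, here is a simplified, self-contained sketch of a cleanroom wired into the Project class from earlier. It is not the gem's implementation: the EXPOSED constant and the evaluate method are hypothetical stand-ins, and the caller guard from step 4 is left out to keep the example short.

```ruby
class Project
  NULL = Object.new.freeze

  # Hypothetical whitelist of DSL methods for this sketch
  EXPOSED = [:name].freeze

  def name(val = NULL)
    if val.equal?(NULL)
      @name
    else
      @name = sanitize(val)
    end
  end

  # Evaluate DSL source against a throwaway proxy instead of against self
  def evaluate(contents)
    cleanroom.new(self).instance_eval(contents)
    self
  end

  private

  def sanitize(string)
    string.gsub(/\s+/, '-').downcase
  end

  def cleanroom
    exposed = EXPOSED

    Class.new(Object) do
      define_method(:initialize) do |instance|
        # Keep the real instance in a closure, not in an instance variable
        define_singleton_method(:__instance__) { instance }
      end

      exposed.each do |exposed_method|
        define_method(exposed_method) do |*args, &block|
          __instance__.public_send(exposed_method, *args, &block)
        end
      end
    end
  end
end

project = Project.new
project.evaluate(<<~DSL)
  # Both of these only touch the anonymous proxy, not the real Project
  def sanitize(string)
    string.to_s.upcase
  end
  @name = "My Custom Name"

  name "Some String"
DSL

project.name #=> "some-string"
```

The helper method and the instance variable from Problems 2 and 3 end up on the anonymous proxy, so the real Project still sanitizes the value. Note that without the caller guard, DSL code in this sketch can still call __instance__ directly, which is exactly the hole step 4 is meant to close.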
Using the Cleanroom
Fortunately, you do not need to understand all of this to utilize a cleanroom in your projects! I have wrapped all this logic, plus tests and custom RSpec matchers, into the cleanroom gem. The gem is already in use in popular projects like Omnibus and Berkshelf, and you can easily use it too!
After you have added the cleanroom gem to your Gemfile and executed the bundle command to install, simply require and include the Cleanroom module in any DSL:
# my_dsl_file.rb
require 'cleanroom'

class MyDSLFile
  include Cleanroom
end
Immediately, without writing any code, you have been given access to the following methods:
- MyDSLFile.evaluate_file - evaluate a file against an instance
- MyDSLFile.evaluate - evaluate raw Ruby (as a String) or a block against an instance
- MyDSLFile#evaluate_file - evaluate a file against this instance
- MyDSLFile#evaluate - evaluate raw Ruby (as a String) or a block against this instance
dsl = MyDSLFile.new
dsl.evaluate_file('/path/to/file.rb')
dsl #=> #<MyDSLFile:0xabc123>
For each method you want to be exposed as part of the DSL API (which may be separate from the public API), simply call expose:
require 'cleanroom'

class MyDSLFile
  include Cleanroom

  def some_dsl_method
    # ...
  end
  expose :some_dsl_method
end
With just that one additional line of code for the expose method, you get all of the features and magic described before. Go ahead and try it out!
- Example cleanroom method
The slides and video from my talk are linked above, but I have included them here as well:
On a final note - there is still much exploration to be done in this area. Perhaps the DSL evaluation should set Ruby's $SAFE level or guard against %x calls... maybe it should not. The cleanroom pattern and gem have been especially useful in my daily work, and I really hope you benefit from them as well!
Seth Vargo is an engineer at Google. Previously he worked at HashiCorp, Chef Software, CustomInk, and some Pittsburgh-based startups. He is the author of Learning Chef and is passionate about reducing inequality in technology. When he is not writing, working on open source, teaching, or speaking at conferences, Seth advises non-profits.