We make heavy use of a Web caching proxy server for outgoing HTTP requests, and squid is our preferred tool. I implemented squid a number of years ago, and its squid.conf configuration file has historically (or histerically) grown out of reasonable proportions, with all manner of access control lists (ACLs) strewn throughout. (If you need an overview of squid’s ACL language, it is covered in the squid documentation.)

We make heavy use of user authentication with LDAP, but for certain URLs or domains, we offer un-authenticated access. We had dozens of ACLs such as these:

    acl Custom1 url_regex -i ^http://www\.example\.org/
    http_access allow Custom1
    acl MS_something url_regex -i ^http://www\.microsoft\.com/
    http_access allow MS_something
    acl goog_whatvr dstdomain google.com
    http_access allow CONNECT goog_whatvr

You get the idea. It is hideous.

I forget whether this feature didn’t exist at the time, or whether I’d simply overlooked it, but squid can (obviously – it does almost anything you need it to do) read lists of “things” from files on the file system, using the

    acl acl_name acl_type "file"

syntax. acl is a keyword, acl_name is the name I give this particular access control list, and acl_type is one of the allowed types of ACL, such as dst for destination address, dstdomain for DNS domains, url_regex for matching URLs against regular expressions, etc. (the squid documentation lists them all). The last argument, file, must be in quotes and specifies the filename from which squid reads the values.

So I painstakingly created three new files containing our ACLs for destination domains, addresses and regular expressions, making sure I retained the comments indicating why each particular ACL was entered, and when, and by whom.
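To give an idea of what goes into the regular-expression and address files, here is a made-up sketch. The entries, dates and names are invented for illustration and are not our real rules; the two files are shown in one listing, with a header comment marking where each begins. It assumes one value per line and comment lines starting with #.

```
# ---- urlregex.rules: one regular expression per line ----
# 2009-03-02 jane: product documentation, requested by support
^http://www\.example\.org/docs/
# 2009-06-15 joe: software downloads
^http://downloads\.example\.com/

# ---- dstaddress.rules: one address or address/netmask per line ----
# 2009-04-20 jane: partner network
192.0.2.0/24
# 2009-07-01 joe: single monitoring host
198.51.100.17
```

Keeping the comment directly above the entry it describes makes it easy to see later why a rule exists.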

These value files, as I call them, I then reference in squid.conf. The next few lines of configuration replace an endless list of ACLs in squid’s configuration:

    acl f_urlregex url_regex -i "/etc/squid/urlregex.rules"
    http_access allow f_urlregex
    acl f_dstaddress dst "/etc/squid/dstaddress.rules"
    http_access allow f_dstaddress
    ## domain names:
    #    names in this list are allowed for HTTP *and* HTTPS
    acl f_dstdomain dstdomain "/etc/squid/dstdomain.rules"
    http_access allow f_dstdomain
    http_access allow CONNECT f_dstdomain

The result is a much cleaner squid.conf, which we don’t have to touch. Alterations (additions, usually) are made in the rule files, and we do this semi-mechanically using make.
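One thing worth spelling out: squid evaluates http_access lines in order and stops at the first match, so these file-based allow rules only grant un-authenticated access if they come before the rule that demands authentication. A minimal sketch of that ordering follows. The ACL name ldap_users is hypothetical, and the sketch assumes an auth_param scheme (for example an LDAP helper) is already configured elsewhere in squid.conf.

```
# Un-authenticated exceptions first: the list is read from a file.
acl f_dstdomain dstdomain "/etc/squid/dstdomain.rules"
http_access allow f_dstdomain

# Everything else must authenticate.
# "ldap_users" is a hypothetical name; authentication itself is
# assumed to be set up with auth_param elsewhere in squid.conf.
acl ldap_users proxy_auth REQUIRED
http_access allow ldap_users

# Anything not matched above is refused.
http_access deny all
```

If the proxy_auth rule came first, squid would ask for credentials before it ever looked at the file-based lists.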

The value files contain lists of things: IP address/netmask ranges, destination domains, and URL regular expressions. For example, the file dstdomain.rules looks like this:
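The entries below are invented for illustration (the domains, dates and names are not our real rules), but they show the general shape: one domain per line, a leading dot to cover a domain and all of its sub-domains, and a comment recording who added the entry, when, and why.

```
# Hypothetical entries, for illustration only.
# One domain per line; lines starting with '#' are comments.
#
# 2009-08-03 joe: partner portal, this exact host only
portal.example.net
#
# 2009-09-18 jane: vendor update servers, including all sub-domains
.update.example.com
```

Avoid listing both a dotted domain and one of its own sub-domains in the same file, since squid treats that as an overlap and complains about it.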


There are two things that you must keep in mind, if you organize your ACLs thusly:

  • Make liberal use of squid -k parse to have squid check the contents of these files before you attempt to squid -k reconfigure (or restart) the service. This will detect dud DNS domains, illegal address ranges for dst-type ACLs, etc.
  • Unfortunately (though probably just as well – avoids fubar), squid doesn’t reload these files automatically, so whenever you change one of the files, you’ll still have to instruct squid to reload itself, manually.

I reloaded the proxies early this morning, and almost everything went well. (I say “almost”, because Michael was quick to report two sites that didn’t work without authentication. I fixed that quickly enough, and all seems well.)

The copying, pasting and massaging of our ACLs into this structure was boring, but I’m pleased I did it. It’ll make life easier, long-term. :-)

Linux, Database, MacOSX, CLI, squid, proxy, and ACL :: 18 Sep 2009