This approach has some advantages:
- on a congested network you can still open webpages faster, because some content is served from a local cache (within your local network) instead of being retrieved from the internet;
- you can install a parental control and/or an antivirus to check which pages can be opened from the computers on the network (those properly configured to use the proxy).
Obviously there are some disadvantages. You can’t be sure that the cached objects are fresh (unchanged on the server), so you may run into strange problems with websites, and you can also have problems with audio/video content. Some of these problems can be avoided with a proper configuration.
Let’s start with the installation and configuration of Squid on an ArchLinux-based home server; the procedure is almost the same on other distributions.
First, install the package with your package manager (pacman if you’re using ArchLinux):
pacman -S squid
Then configure Squid: open /etc/squid/squid.conf and read the comments carefully. There are a lot of options, but you only need to check and change a few of them:
After these initial settings, of which only two (cache_mem and cache_dir) directly affect efficiency, come some really important directives, refresh_pattern, that Squid uses to decide what to cache and for how long.
Each directive has a regular expression that matches the object’s URL (for example by extension and/or name), followed by a minimum lifetime, a percentage and a maximum lifetime. Both lifetimes are in minutes, and they only apply to objects that the server sends without an explicit expiry time. The percentage is not a probability: it is a "last-modified factor", meaning that an object stays fresh while the time it has spent in the cache is less than that percentage of its age when it was fetched (the time since it was last modified). For example:
- 10080 90% 43200: the item is always stale once it is older than 43200 minutes (30 days); before that, an item last modified 10 days before it was fetched stays fresh for 9 days (90% of 10 days), while an item without a Last-Modified date is fresh only for the first 10080 minutes (7 days);
- 1440 20% 10080: same as above, but the item is always stale after 10080 minutes (7 days), an item last modified 10 days before it was fetched stays fresh for only 2 days (20%), and an item without a Last-Modified date is fresh for 1440 minutes (1 day).
A high percentage keeps an object fresh for longer, so use it for items that are unlikely to change; a low percentage suits items that change often. This is not an exact science: if an element changes (such as a new CSS file or a newer version of a JavaScript file), your browser may still load the older version, so keep the minimum lifetime of these files short (a day, not more). Be sure to read the official documentation for a more in-depth explanation: http://www.squid-cache.org/Doc/config/refresh_pattern/
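Here’s how I’d summarise that decision as a small Python sketch. It is my simplified reading of the refresh_pattern documentation, not Squid’s source code: it only covers objects without an explicit expiry (no Expires or Cache-Control max-age), ignores options such as override-expire, and the names RefreshRule and is_fresh are hypothetical. As in squid.conf, all times are in minutes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RefreshRule:
    min_minutes: int  # "min" field of refresh_pattern, in minutes
    percent: int      # last-modified factor, in percent
    max_minutes: int  # "max" field of refresh_pattern, in minutes


def is_fresh(rule: RefreshRule, age: float, lm_delta: Optional[float]) -> bool:
    """Simplified heuristic for an object WITHOUT an explicit expiry time.

    age:      minutes the object has spent in the cache since it was fetched
    lm_delta: minutes between its Last-Modified date and the fetch,
              or None if the server sent no Last-Modified header
    """
    if age > rule.max_minutes:
        return False  # older than max: always stale
    if lm_delta is not None and lm_delta > 0:
        # Last-modified factor: fresh for percent% of the object's age at fetch time
        return age < lm_delta * rule.percent / 100
    # No Last-Modified date: fresh only while younger than min
    return age < rule.min_minutes


DAY = 24 * 60  # minutes in a day
images = RefreshRule(10080, 90, 43200)

# Modified 10 days before it was fetched: fresh for 9 days (90% of 10 days)
print(is_fresh(images, 8 * DAY, 10 * DAY))    # True
print(is_fresh(images, 9.5 * DAY, 10 * DAY))  # False
# No Last-Modified header: fresh only for the 7-day minimum
print(is_fresh(images, 6 * DAY, None))        # True
print(is_fresh(images, 8 * DAY, None))        # False
```

Note that the maximum always wins: with this sketch an image last modified 100 days before it was fetched would get 90 days from the factor, but it is still stale after 30 days.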
My configuration is:
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern -i \.(gif|png|jpg|jpeg|ico)$ 10080 90% 43200 override-expire ignore-no-cache ignore-no-store ignore-private
refresh_pattern -i \.(iso|avi|wav|mp3|mp4|mpeg|swf|flv|x-flv)$ 43200 90% 432000 override-expire ignore-no-cache ignore-no-store ignore-private
refresh_pattern -i \.(deb|rpm|exe|zip|tar|tgz|ram|rar|bin|ppt|doc|tiff)$ 10080 90% 43200 override-expire ignore-no-cache ignore-no-store ignore-private
refresh_pattern -i /index\.(html|htm)$ 0 40% 10080
refresh_pattern -i \.(html|htm|css|js)$ 1440 40% 40320
refresh_pattern . 0 40% 40320
Let’s examine it line by line:
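- FTP objects have a minimum lifetime of 1440 minutes (1 day) and a maximum of 10080 minutes (7 days), with a low 20% factor because they are likely to change;
- gopher objects have a 0% factor, and both the minimum and the maximum are 1440 minutes (1 day);
- dynamic content (anything under /cgi-bin/ or with a query string, such as PHP scripts) is never considered fresh because, you know, it changes every time…
- images have a minimum lifetime of 10080 minutes (7 days) and a maximum of 43200 minutes (30 days), with a 90% factor because they are unlikely to change;
- audio/video files and ISO images have a minimum of 43200 minutes (30 days) and a maximum of 432000 minutes (300 days), with a 90% factor because they are unlikely to change;
- archives, packages and documents have a minimum of 10080 minutes (7 days) and a maximum of 43200 minutes (30 days), with a 90% factor because they are unlikely to change;
- index pages (index.html/index.htm) have no minimum lifetime, a 40% factor and a maximum of 10080 minutes (7 days);
- other HTML, CSS and JavaScript files have a minimum of 1440 minutes (1 day), a 40% factor and a maximum of 40320 minutes (28 days);
- everything else has no minimum lifetime, a 40% factor and a maximum of 40320 minutes (28 days).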
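Since every number in these rules is in minutes, converting them to days makes the lifetimes much easier to compare. This is a small Python sketch that parses the rules from my configuration; I left out the trailing options because they don’t change min, percent or max, and `lifetimes` is just a hypothetical helper, not something that ships with Squid.

```python
# refresh_pattern lines from the configuration above, trailing options omitted
CONFIG = r"""
refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern -i \.(gif|png|jpg|jpeg|ico)$ 10080 90% 43200
refresh_pattern -i \.(iso|avi|wav|mp3|mp4|mpeg|swf|flv|x-flv)$ 43200 90% 432000
refresh_pattern -i \.(deb|rpm|exe|zip|tar|tgz|ram|rar|bin|ppt|doc|tiff)$ 10080 90% 43200
refresh_pattern -i /index\.(html|htm)$ 0 40% 10080
refresh_pattern -i \.(html|htm|css|js)$ 1440 40% 40320
refresh_pattern . 0 40% 40320
"""

MINUTES_PER_DAY = 24 * 60


def lifetimes(config):
    """Return (regex, min_days, percent, max_days) for every refresh_pattern line."""
    rows = []
    for line in config.splitlines():
        fields = line.split()
        if not fields or fields[0] != "refresh_pattern":
            continue
        args = fields[1:]
        if args[0] == "-i":  # case-insensitive flag, not part of the regex
            args = args[1:]
        regex, min_m, pct, max_m = args[:4]  # anything after max is an option
        rows.append((regex, int(min_m) / MINUTES_PER_DAY,
                     int(pct.rstrip("%")), int(max_m) / MINUTES_PER_DAY))
    return rows


for regex, min_d, pct, max_d in lifetimes(CONFIG):
    print(f"{regex:<60} min {min_d:6.1f} d  {pct:3d}%  max {max_d:6.1f} d")
```

The conversion makes the scale obvious: 432000 minutes in the audio/video rule is 300 days.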
For special cases, such as Windows Update archives, you can find the lines you need to add on the internet. Keep in mind that Squid uses the first rule that matches, so order the rules from the most specific to the most generic, with the catch-all "." rule last.
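To see why the order matters, here’s a rough Python model of the first-match lookup. It assumes that Python’s `re.search` behaves like the POSIX regular expressions Squid uses, which holds for these simple patterns but isn’t guaranteed in general. `RULES` and `first_match` are hypothetical names, and the index rule is written as `/index\.(html|htm)$` so that it matches URLs such as /index.html.

```python
import re
from typing import List, Optional, Tuple

# (regex, case_insensitive) in the same order as the refresh_pattern lines
RULES: List[Tuple[str, bool]] = [
    (r"^ftp:", False),
    (r"^gopher:", False),
    (r"(/cgi-bin/|\?)", True),
    (r"\.(gif|png|jpg|jpeg|ico)$", True),
    (r"\.(iso|avi|wav|mp3|mp4|mpeg|swf|flv|x-flv)$", True),
    (r"\.(deb|rpm|exe|zip|tar|tgz|ram|rar|bin|ppt|doc|tiff)$", True),
    (r"/index\.(html|htm)$", True),
    (r"\.(html|htm|css|js)$", True),
    (r".", False),
]


def first_match(url: str, rules: List[Tuple[str, bool]] = RULES) -> Optional[str]:
    """Return the regex of the first rule matching url, scanning top to bottom."""
    for regex, nocase in rules:
        if re.search(regex, url, re.IGNORECASE if nocase else 0):
            return regex  # later rules are never consulted
    return None


for url in ("http://example.com/index.html",
            "http://example.com/style.CSS",
            "http://example.com/logo.png?v=2",
            "http://example.com/"):
    print(url, "->", first_match(url))

# Same rules with the catch-all first: every URL stops at "."
print(first_match("http://example.com/logo.png", list(reversed(RULES))))
```

In this model /index.html hits the index rule only because it comes before the generic HTML/CSS/JS rule, and an image URL with a query string is caught by the dynamic-content rule. With the catch-all moved to the top, it swallows every URL and the specific rules are never reached.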
Finally, enable and start the daemon. On ArchLinux, which uses systemd, you can do this with two commands:
systemctl enable squid
systemctl start squid
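Once the daemon is running, a quick way to check that objects are really served from the cache is to request the same plain-HTTP URL twice through the proxy and look at the X-Cache response header that Squid adds (for example "MISS from myserver"). This is a Python sketch that assumes Squid listens on its default port 3128 on the same machine and that your Squid version still sends X-Cache. The URL is only a placeholder, and `cache_status`/`fetch_via_proxy` are hypothetical helpers.

```python
import urllib.request
from typing import Optional

# Assumption: Squid runs on this machine with the default http_port 3128
PROXY = "http://127.0.0.1:3128"


def cache_status(x_cache: Optional[str]) -> str:
    """Turn an X-Cache value such as 'HIT from myserver' into 'HIT' or 'MISS'."""
    if not x_cache:
        return "UNKNOWN"  # header missing (not added by this proxy/version)
    return x_cache.split()[0]


def fetch_via_proxy(url: str) -> str:
    """Fetch a plain-HTTP URL through the proxy and report the cache status."""
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({"http": PROXY}))
    with opener.open(url, timeout=10) as response:
        response.read()  # read the whole response before closing the connection
        return cache_status(response.headers.get("X-Cache"))


if __name__ == "__main__":
    url = "http://example.com/"  # placeholder: any cacheable plain-HTTP page
    try:
        for attempt in (1, 2):
            print(f"request {attempt}: {fetch_via_proxy(url)}")
    except OSError as err:  # urllib.error.URLError is a subclass of OSError
        print(f"could not reach the proxy at {PROXY}: {err}")
```

For a cacheable object the second request should usually report HIT. HTTPS URLs can’t be checked this way because they go through the proxy as an encrypted tunnel.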
A few final considerations:
- install a tool like Webmin on your server, so you can check Squid’s statistics and see the cache-hit %;
- remember that the browser cache can skew the statistics, because those objects are retrieved locally instead of from the Squid cache: for testing purposes disable the browser cache, then set it to a smaller size (SSDs will benefit and you save space);
- the more computers use the cache, the more likely it is that a requested object is already there, so you can expect a higher cache-hit %;
- to avoid overkill, Squid should be used on networks with at least 2-3 computers; otherwise the only benefit is that you can have a huge cache (gigabytes, not megabytes);
- cache-hit should be at least 15-20%, but don’t expect values such as 80-90%, because HTTPS traffic goes through the proxy as an encrypted tunnel and is never cached (and it’s better this way, since caching it means intercepting encrypted connections, which is better not done) and because not all objects can be cached (such as PHP pages).
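If you don’t want to install Webmin, you can estimate the cache-hit % directly from Squid’s access log. This rough Python sketch assumes the default native log format (the result code, such as TCP_MEM_HIT/200, in the fourth field and the method in the sixth) and the log path /var/log/squid/access.log. It counts any result code containing HIT as a hit, which is an approximation, and also reports the ratio without CONNECT requests, i.e. the HTTPS tunnels that can never be hits. `hit_ratios` is a hypothetical helper.

```python
import sys
from typing import Iterable, Tuple

# Assumption: default log location on ArchLinux; pass another path as argument
DEFAULT_LOG = "/var/log/squid/access.log"


def hit_ratios(lines: Iterable[str]) -> Tuple[float, float]:
    """Return (hit % of all requests, hit % of non-CONNECT requests)."""
    total = tunnels = hits = 0
    for line in lines:
        fields = line.split()
        if len(fields) < 7:
            continue  # not a native-format access.log line
        result = fields[3].split("/")[0]  # e.g. "TCP_MEM_HIT/200" -> "TCP_MEM_HIT"
        method = fields[5]
        total += 1
        if method == "CONNECT":
            tunnels += 1  # HTTPS tunnel: encrypted, never served from the cache
        elif "HIT" in result:
            hits += 1
    plain = total - tunnels
    overall = 100.0 * hits / total if total else 0.0
    http_only = 100.0 * hits / plain if plain else 0.0
    return overall, http_only


if __name__ == "__main__":
    path = sys.argv[1] if len(sys.argv) > 1 else DEFAULT_LOG
    try:
        with open(path, errors="replace") as log:
            overall, http_only = hit_ratios(log)
        print(f"cache hits: {overall:.1f}% of all requests, "
              f"{http_only:.1f}% excluding HTTPS tunnels")
    except OSError as err:
        print(f"cannot read {path}: {err}")
```

The gap between the two numbers gives a rough idea of how much of your traffic is HTTPS that Squid simply passes through.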
Next time I’ll show you how to configure and install an antivirus layer using clamav. As always if you have any questions feel free to contact me using the comments below 🙂