Set up a cache proxy with Squid

Today I’m going to explain how to set up a cache proxy within your local network. A cache proxy is a system that stores frequently accessed web objects for fast retrieval. It works well with static content such as HTML pages, CSS stylesheets, JavaScript files, images and even downloaded files if correctly configured.

This approach has some advantages:

  • on a congested network you can still open webpages faster because some content doesn’t need to be retrieved from the internet but comes from a local cache (within your local network);
  • you can install parental control and/or an antivirus to check which pages can be opened from the computers within the network (provided they are properly configured to use the proxy).

Obviously there are some disadvantages: you can’t be sure that the cached objects are fresh (unchanged), so you may run into strange problems with some websites, and you may also have trouble with audio/video content. Some of these problems can be avoided with a proper configuration.

Let’s start with the installation and configuration of Squid on a home-server based on ArchLinux: the procedure is almost the same with other distributions.

First download the package using your package manager (pacman if you’re using ArchLinux):

pacman -S squid

Then you need to configure Squid: open /etc/squid/squid.conf and read the comments carefully. There are a lot of options, but you really need to check and change only a few of them:

  • http_port: the port where squid will listen for requests, usually 3128 but you can change it without problems;
  • http_access: these lines define the access permissions to the proxy. Usually you want to allow access for localhost and localnet and then deny access for everything else. To do so (it should already be in the default configuration file):
    # Define what is localnet
    acl localnet src 10.0.0.0/8
    acl localnet src 172.16.0.0/12
    acl localnet src 192.168.0.0/16
    acl localnet src fc00::/7
    acl localnet src fe80::/10
    # Enable localhost and localnet
    http_access allow localnet
    http_access allow localhost
    # And finally deny all other access to this proxy
    http_access deny all
  • cache_mgr: the email address of the cache manager;
  • shutdown_lifetime: how long Squid waits before stopping when it is asked to shut down;
  • cache_mem: the memory (RAM) used as a buffer for requests; use at least 256-512 MB to have decent performance;
  • visible_hostname: the hostname of your server;
  • fqdncache_size: the size of the resolved domain name cache, use at least 1024 entries;
  • maximum_object_size: the maximum size of objects in the cache. Set this to at least 10 MB, otherwise you’ll only cache small files (no large images, for example);
  • cache_dir: the location of the cache; this parameter is quite complex. It’s defined as:
    cache_dir ufs /var/cache/squid 20000 16 256: first the storage type (ufs), then the location of the cache (/var/cache/squid), then the maximum size (20000 MB, or ~20 GB), then the number of folders at the first level (16) and finally the number of folders at the second level (256). To be honest you just have to change the maximum size to a serious amount such as 20-100 GB. More cache means more files that don’t need to be retrieved from the internet.
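To make the field order of cache_dir easier to remember, here is a small shell sketch that splits such a line into named parts. The function name describe_cache_dir and the labels it prints are my own, not something Squid provides:

```shell
#!/bin/sh
# Hypothetical helper: label the fields of every cache_dir line read from stdin.
describe_cache_dir() {
  awk '$1 == "cache_dir" {
    printf "type=%s path=%s size_mb=%s level1=%s level2=%s\n", $2, $3, $4, $5, $6
  }'
}

# Example with the line discussed above
echo 'cache_dir ufs /var/cache/squid 20000 16 256' | describe_cache_dir
```

You can also feed it your real squid.conf on standard input to see which cache_dir lines are active; commented lines are skipped because they don’t start with cache_dir.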

After this initial configuration, where only two options directly affect efficiency (cache_mem and cache_dir), there are some really important settings that Squid uses to decide what to cache and for how long.

The refresh_pattern directives use a pattern (a regular expression) that matches the objects by extension and/or name, followed by a minimum lifetime, a percentage and a maximum lifetime. Both lifetimes are expressed in minutes, and all three values are only used when the server doesn’t send an explicit expiry time. For example:

  • 10080 90% 43200: the item is considered fresh if its age is below 10080 minutes (7 days) and stale if it is older than 43200 minutes (30 days); in between, the item is fresh as long as its age is less than 90% of the time that had passed since its last modification when it was fetched;
  • 1440 20% 10080: same as above, if the age is below 1440 minutes (1 day) the item is fresh, if it is above 10080 minutes (7 days) the item is stale, and in between it is fresh only while its age is less than 20% of the time since its last modification.

A high % suits objects that are unlikely to change, a low % should be used for items that will probably change often. This is not an exact science: if an element changes (such as a new CSS file or a newer version of a JavaScript file) your browser may still load the older version, so don’t use overly long lifetimes for content that changes often. Be sure to read the official documentation for a more in-depth explanation.
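As a rough illustration of how the three values interact, here is a simplified sketch of the heuristic in shell. It is my own reading of the documented rules, not Squid’s actual code, and the function name is_fresh is made up. All times are in minutes:

```shell
#!/bin/sh
# Simplified sketch of the refresh_pattern heuristic (not Squid's real code).
#   $1 age    : how long the object has been in the cache
#   $2 lm_age : time between Last-Modified and the moment it was fetched
#               (0 if the server sent no Last-Modified header)
#   $3 min, $4 percent, $5 max : the three refresh_pattern values
is_fresh() {
  age=$1; lm_age=$2; min=$3; pct=$4; max=$5
  if [ "$age" -gt "$max" ]; then
    echo stale    # older than max: always stale
  elif [ "$lm_age" -gt 0 ] && [ $((age * 100)) -lt $((lm_age * pct)) ]; then
    echo fresh    # age is below pct% of the last-modified age
  elif [ "$age" -lt "$min" ]; then
    echo fresh    # younger than min
  else
    echo stale
  fi
}

# An image cached 2 days ago (2880), last modified 10 days (14400) before the fetch
is_fresh 2880 14400 10080 90 43200
# The same image after 40 days (57600) in the cache
is_fresh 57600 14400 10080 90 43200
```

The first call prints fresh and the second prints stale, because 57600 minutes is beyond the maximum of 43200.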

My configuration is:

refresh_pattern ^ftp: 1440 20% 10080
refresh_pattern ^gopher: 1440 0% 1440
refresh_pattern -i (/cgi-bin/|\?) 0 0% 0
refresh_pattern -i \.(gif|png|jpg|jpeg|ico)$ 10080 90% 43200 override-expire ignore-no-cache ignore-no-store ignore-private
refresh_pattern -i \.(iso|avi|wav|mp3|mp4|mpeg|swf|flv|x-flv)$ 43200 90% 432000 override-expire ignore-no-cache ignore-no-store ignore-private
refresh_pattern -i \.(deb|rpm|exe|zip|tar|tgz|ram|rar|bin|ppt|doc|tiff)$ 10080 90% 43200 override-expire ignore-no-cache ignore-no-store ignore-private
refresh_pattern -i /index\.(html|htm)$ 0 40% 10080
refresh_pattern -i \.(html|htm|css|js)$ 1440 40% 40320
refresh_pattern . 0 40% 40320

Let’s examine it line by line:

  • ftp objects are fresh under 1440 minutes (1 day) and stale after 10080 minutes (7 days), with a low percentage (20%) because they are likely to change;
  • gopher objects are fresh under 1440 minutes and then stale;
  • cgi-bin URLs and URLs with a query string (dynamic pages such as PHP scripts) are never cached because, you know, they change every time…
  • images are fresh under 10080 minutes (7 days) and stale after 43200 minutes (30 days), and they are unlikely to change (90%);
  • videos are fresh under 43200 minutes (30 days) and stale after 432000 minutes (300 days), and they are unlikely to change (90%);
  • archives are fresh under 10080 minutes and stale after 43200 minutes, and they are unlikely to change (90%);
  • index pages of sites have no minimum lifetime and are stale after 10080 minutes (7 days);
  • other html pages, css and javascript files are fresh under 1440 minutes (1 day) and stale after 40320 minutes (28 days);
  • everything else has no minimum lifetime and is stale after 40320 minutes.

For special cases, such as Windows Update archives, you can find the lines you need to add on the internet. Keep in mind that the first line that matches is used, so you need to put the most specific rules first and the catch-all rule (.) last.
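The effect of the order can be tried out without touching Squid. This sketch uses grep -E as a stand-in for Squid’s own matching, with patterns modelled on the configuration above; the function name first_match and the example URL are made up:

```shell
#!/bin/sh
# Hypothetical helper: print the first pattern (in the order given) that matches the URL.
first_match() {
  url=$1
  shift
  for pattern in "$@"; do
    if printf '%s\n' "$url" | grep -Eiq -- "$pattern"; then
      printf '%s\n' "$pattern"
      return 0
    fi
  done
  return 1
}

# Specific rule first: the index rule wins
first_match 'http://example.com/index.html' '/index\.(html|htm)$' '\.(html|htm|css|js)$' '.'
# Catch-all first: it shadows every rule after it
first_match 'http://example.com/index.html' '.' '/index\.(html|htm)$' '\.(html|htm|css|js)$'
```

The first call prints the index pattern, the second prints only the dot: once the catch-all comes first, the more specific rules are never reached.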

Finally enable and start the daemon. On ArchLinux, which uses systemd, this can be accomplished with these two commands:

systemctl enable squid
systemctl start squid
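Once the daemon is running, the computers on the network have to be told to use it. Browsers have their own proxy settings, while many command line tools (curl and wget among them) read the http_proxy environment variable. A minimal sketch, assuming a hypothetical server address of 192.168.1.10 and the usual port 3128; the function name use_proxy is my own:

```shell
#!/bin/sh
# Hypothetical helper: point command line tools in this shell at the proxy.
use_proxy() {
  proxy_host=$1
  proxy_port=${2:-3128}    # Squid's usual http_port
  http_proxy="http://$proxy_host:$proxy_port"
  export http_proxy
}

use_proxy 192.168.1.10 3128    # example address, replace it with your server
echo "$http_proxy"
```

The setting only lasts for the current shell session; to make it permanent it would have to go into the user’s shell profile.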

A few final considerations:

  • install a tool like webmin on your server, this way you can check squid’s statistics to see the cache hit %;
  • remember that the browser cache may alter the statistics, since objects served from it never reach the squid cache; for testing purposes disable the browser cache, and afterwards set it to a smaller size (ssd disks will benefit and you save space);
  • the more computers use the cache, the fresher it stays and the higher the cache-hit % you can expect;
  • to avoid overkill, Squid should be used on networks that have at least 2-3 computers, otherwise the only benefit is that you can have a huge cache (gigabytes, not megabytes);
  • the cache-hit rate should be at least 15-20%, but don’t expect values such as 80-90% because https is never cached (and it’s better this way, since caching https requires things that are better left undone) and because not all objects can be cached (such as php pages).
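If you prefer the command line to webmin, a rough cache-hit % can also be computed from Squid’s access log. This sketch assumes the default (native) log format, where the fourth field looks like TCP_HIT/200 or TCP_MISS/200; the function name hit_ratio is my own and the sample lines are made up:

```shell
#!/bin/sh
# Read access log lines from stdin and print the percentage of requests
# whose result code contains HIT (TCP_HIT, TCP_MEM_HIT, ...).
hit_ratio() {
  awk '{ total++; if ($4 ~ /HIT/) hits++ }
       END { if (total > 0) printf "%.1f\n", hits * 100 / total; else print "0.0" }'
}

# Tiny made-up sample: one hit out of four requests
hit_ratio <<'EOF'
1366000000.123 45 192.168.1.20 TCP_MISS/200 5120 GET http://example.com/ - DIRECT/93.184.216.34 text/html
1366000001.456 2 192.168.1.20 TCP_HIT/200 20480 GET http://example.com/logo.png - NONE/- image/png
1366000002.789 60 192.168.1.21 TCP_MISS/200 1024 GET http://example.com/a.css - DIRECT/93.184.216.34 text/css
1366000003.012 75 192.168.1.21 TCP_MISS/304 310 GET http://example.com/b.js - DIRECT/93.184.216.34 -
EOF
```

On a real server you would redirect the access log into the function instead of the sample; its location depends on the distribution and on your configuration.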

Next time I’ll show you how to configure and install an antivirus layer using clamav. As always if you have any questions feel free to contact me using the comments below 🙂


A simple script to back up all mysql databases into separate files

Hi, today I’m going to present a very simple script that I use to back up all my mysql databases into separate files compressed with gzip. This way, to restore a database you just need to extract the dump from the file and restore it with this command:

mysql -uUSER -pPASSWORD DB_NAME < db_dump.sql

The script is the following:


#!/bin/bash
#
#       croma25td simple mysql backup script v1.0
#       croma25td at gmail dot com

# User defined variables
# Mysql user with read privileges on all databases
# (USER and PASSWORD are placeholders, replace them with your own values)
MYSQL_USER=USER
MYSQL_PASS=PASSWORD

# The current date
date=$(date +%Y-%m-%d_%H:%M:%S)

# The folder that will contain all the backups
BACKUP_DIR=/backup/$date
# End user defined variables

# If the output folder doesn't exist create it
test -d "$BACKUP_DIR" || mkdir -p "$BACKUP_DIR"
# Get the database list, excluding some db names
for db in $(mysql -B -s -u "$MYSQL_USER" --password="$MYSQL_PASS" -e 'show databases' | grep -vE '(information_schema|performance_schema|mysql)')
do
  # dump each database in a separate file
  mysqldump -u "$MYSQL_USER" --password="$MYSQL_PASS" "$db" | gzip > "$BACKUP_DIR/$db.sql.gz"
done

First there are some user defined variables:

  • MYSQL_USER and MYSQL_PASS: the username and the password of a user with read privileges on all the databases to include in the backup;
  • date: it’s simply the current date/time, used to easily identify different backups;
  • BACKUP_DIR: the folder where the backups will be saved. I used a /backup folder, and every backup goes into a subfolder named after the current date, so each dump ends up in a path like /backup/<date>/<database name>.sql.gz.

Then comes the script:

  • first there is a check on the output folder, to verify that it exists and otherwise create it;
  • then we obtain all the database names, in one of two ways:
    • if you want to exclude some databases, include their names within the grep -vE command using | as separator: mysql -B -s -u $MYSQL_USER --password=$MYSQL_PASS -e 'show databases' | grep -vE '(information_schema|performance_schema|mysql)'
    • otherwise, to get all the names: mysql -B -s -u $MYSQL_USER --password=$MYSQL_PASS -e 'show databases'
  • lastly mysqldump creates a dump of each database and gzip compresses it, so every database ends up in its own .sql.gz file.
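Since a backup is only useful if it can be restored, it may be worth checking the compressed dumps after the script has run. A minimal sketch using gzip -t, which tests the integrity of a compressed file; the function name check_dumps is my own and it expects the backup folder as its argument:

```shell
#!/bin/sh
# Hypothetical helper: test every .sql.gz file in the given folder.
# Prints the name of each damaged file and returns 1 if any was found.
check_dumps() {
  dir=$1
  status=0
  for dump in "$dir"/*.sql.gz; do
    [ -e "$dump" ] || continue    # the folder contains no dumps
    if ! gzip -t "$dump" 2>/dev/null; then
      echo "damaged: $dump"
      status=1
    fi
  done
  return $status
}
```

Call it with the folder created by the backup script, for example check_dumps "$BACKUP_DIR" at the end of the script. Note that this only proves the file is a valid gzip archive, not that the dump inside is complete.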

As I mentioned, you need a user with read privileges on all the databases, and forget about using the root user 🙂

So, to create a user just use these commands:

  • open a mysql console using the root account: mysql -uroot -p
  • create the user with name USER and password PASSWORD: GRANT LOCK TABLES, SELECT ON *.* TO 'USER'@'%' IDENTIFIED BY 'PASSWORD'; (on MySQL 8.0 and later GRANT no longer accepts IDENTIFIED BY, so the user has to be created first with CREATE USER)
  • then finally flush the privileges: flush privileges;
  • close the mysql console with \q, make the script executable with chmod +x followed by the name of the script file, and test the script.

You may also want to add a crontab entry to execute it automatically.

As always, if you have any questions or suggestions just use the comments below :)


A simple backup script using rsync

Today I’m going to present a simple backup script that I wrote a while ago to maintain backups of my home computers and my server.

This script requires only one program to be installed, rsync, and optionally ssh if you want to store your backups on a remote machine.

With rsync you can do a lot of things, but in this post I’m going to explain only a simple way to copy a folder to another one located:

  • on the same computer (for example an external hard drive);
  • on a remote computer (for example a server or a NAS).

Let’s start by examining the contents of the following script:


#!/bin/bash
#
#       croma25td simple backup script v1.0
#       croma25td at gmail dot com

#user defined variables
#(the values below are only placeholders, change them to fit your setup)

#destination folder on the local or remote machine
DEST=/path/to/backup

#server parameter -- use this only if you need to backup on a remote machine (via internet or local network)
#the ssh host with username@address
HOST=username@hostname
#the ssh port, 22 by default
PORT=22
#the path for the private key, with this you don't have to provide a password during the connection via ssh, usually stored in ~/.ssh/id_rsa.
KEY_PATH=$HOME/.ssh/id_rsa

#the log file, usually in /var/log
LOG_FILE=/var/log/backup.log

#the header for the new log
echo '-----------------------------------------------------------------------------' >> "$LOG_FILE"
date >> "$LOG_FILE"

#rsync command list
#keep only one of the two versions below
#the actual commands -- remote machine version
rsync -aR --stats --delete --rsh="ssh -p $PORT -i $KEY_PATH" /etc "$HOST:$DEST" &>> "$LOG_FILE"
rsync -aR --stats --delete --rsh="ssh -p $PORT -i $KEY_PATH" /home "$HOST:$DEST" &>> "$LOG_FILE"

#the actual commands -- local machine version
rsync -aR --stats --delete /etc "$DEST" &>> "$LOG_FILE"
rsync -aR --stats --delete /home "$DEST" &>> "$LOG_FILE"

The first section of the script will contain the user defined variables:

  • DEST: the destination folder of your backup; this is the full path on your local or remote machine;
  • HOST: used only if you need to save the data on a remote machine. It’s the combination of the username (with write permissions on the DEST folder) on the remote machine plus the hostname; the format is: username@hostname;
  • PORT: the ssh port, by default 22;
  • KEY_PATH: the full path to the private key of the local user who executes the script;
  • LOG_FILE: the full path to a log file on the local machine (logs are usually stored in /var/log/).

The second section only prints a small header to the log file at each run of the script, to separate the old outputs and record the date of the last backup.

The third and most important section issues the rsync commands, one per folder to back up. In this example we are going to back up two folders, /etc and /home, which store the system configuration and the users’ data respectively. Add one line per folder you want to back up.
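If the list of folders grows, the repeated lines can be replaced by a loop. This is only a sketch of the local version: the function name backup_folders and the RSYNC variable are my own additions, the latter so that the command can be swapped with echo to only print what would be executed:

```shell
#!/bin/sh
# Program used for the copy; run with RSYNC=echo to only print the commands.
RSYNC=${RSYNC:-rsync}

# Usage: backup_folders DEST FOLDER...
backup_folders() {
  dest=$1
  shift
  for folder in "$@"; do
    # stop at the first folder that fails to copy
    "$RSYNC" -aR --stats --delete "$folder" "$dest" || return 1
  done
}
```

A call such as backup_folders /mnt/backup /etc /home (where /mnt/backup is just an example destination) then does the same job as the two local lines of the script. The output redirection to the log file is left out here and would be added to the call itself.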

Each rsync command has some parameters:

  • -a: ‘archive’ activates recursion into the folders and preserves most of the files’ metadata (permissions, owners, timestamps, symbolic links);
  • -R: ‘relative’ recreates the full path of the source inside the destination, so the backup has the same folder structure as the original;
  • --stats: rsync will write a small report at the end of the job;
  • --delete: propagates file deletions: if a file is deleted on the source it will also be deleted in the backup; remove this if you want to preserve all your old data!

Then there are the connection parameters (if applicable), the source folder and the destination folder:

  • to store the data on a remote machine: rsync -aR --stats --delete --rsh="ssh -p $PORT -i $KEY_PATH" /home $HOST:$DEST
    We are using ssh as the transfer protocol, with the private key for the authentication, then we specify the source folder and the destination folder on the server.
  • to store the data on the same local machine: rsync -aR --stats --delete /home $DEST
    We are telling rsync to copy the data from /home to $DEST.

At the end of each command there is the &>> $LOG_FILE redirection, which tells bash to append both the normal output and the error messages of the command to our log file.
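The difference between redirecting only the normal output and redirecting everything is easy to try. A small sketch: it uses the portable spelling >> file 2>&1, which the bash manual describes as equivalent to &>> file, and a temporary file instead of a real log. The function noisy is just a made-up stand-in for rsync:

```shell
#!/bin/sh
log_file=$(mktemp)    # temporary stand-in for $LOG_FILE

# A command that writes one line to stdout and one to stderr
noisy() {
  echo "normal output"
  echo "error output" >&2
}

noisy >> "$log_file" 2>&1          # both lines are appended to the file
noisy >> "$log_file" 2>/dev/null   # only the normal output is appended

cat "$log_file"
```

The file ends up with two "normal output" lines and a single "error output" line. For a backup you want the first form, otherwise the errors of a failed transfer never reach the log.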


Now a three-step how-to to create a private/public set of keys to authenticate via ssh:

  • execute this command with the user that will launch the script (manually or with cron):
    ssh-keygen

    Then follow the instructions: you have just created your public and private keys. Don’t use any passphrase if you want to run this script from a cron job, otherwise the script will wait for a passphrase that nobody can enter… so just press enter two times to leave the passphrase blank;

  • save your public key on the server so you can authenticate on it by using your private key (do not share the private key with anyone!):
    ssh-copy-id -i ~/.ssh/ remote-host

    Change remote-host to your server host;

  • try to login with:
    ssh remote-host

    If it’s all ok it shouldn’t be necessary to write a password.

As always, if you have any questions or suggestions just use the comments below 🙂