If you do pretty much anything with data, learning the tar command is going to be worth your while. But suppose you work with lots and lots of heavy data?
Well, one trick you’ll want to add to your arsenal is the --exclude option.
Let’s say you want to package up a website with a ton of images located in an image directory, and there are so many images in that directory that it would take all night to tar it. We’re talking REALLY big.
Where there’s a will, there’s a way. You can simply exclude the unwanted baggage and tar the rest by using --exclude.
It works like this…
Let’s say you want to tar an entire website located in /httpdocs, but you want to exclude the directory located at /httpdocs/files/images.
It’s really just this simple…
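As a minimal sketch of the idea, the snippet below rebuilds the layout in a scratch directory so it can be run anywhere. The site variable and the two sample files are assumptions made for the demo, not part of the real site.

```shell
# Scratch copy of the layout described above; "site" stands in for the
# parent of httpdocs (an assumption for this demo).
site=$(mktemp -d)
mkdir -p "$site/httpdocs/files/images"
echo '<html></html>' > "$site/httpdocs/index.html"
echo 'fake image'    > "$site/httpdocs/files/images/photo.jpg"

# Archive httpdocs but leave out files/images.
# -C changes into $site first, so the archive stores relative paths.
tar -czf "$site/mysitename.tar.gz" -C "$site" \
    --exclude='httpdocs/files/images' httpdocs

# List the archive to confirm the images directory is absent.
tar -tzf "$site/mysitename.tar.gz"
```

On the server itself, the equivalent would be something along the lines of tar -czvf mysitename.tar.gz --exclude='/httpdocs/files/images' /httpdocs (the exact flags here are an assumption).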
This will tar the whole site (which is located in /httpdocs) into a tar.gz file called mysitename.tar.gz but won’t include the directory /httpdocs/files/images.
The same works for individual files. To exclude more than one file or directory, just add more --exclude options.
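A rough sketch of stacking several --exclude options, again in a scratch directory. The cache directory and error.log file are hypothetical names chosen only to have something extra to skip.

```shell
# Scratch layout; cache/ and error.log are made-up examples.
site=$(mktemp -d)
mkdir -p "$site/httpdocs/files/images" "$site/httpdocs/cache"
echo 'home'  > "$site/httpdocs/index.html"
echo 'log'   > "$site/httpdocs/error.log"
echo 'image' > "$site/httpdocs/files/images/photo.jpg"
echo 'tmp'   > "$site/httpdocs/cache/page.tmp"

# One --exclude per directory or file to leave out.
tar -czf "$site/mysitename.tar.gz" -C "$site" \
    --exclude='httpdocs/files/images' \
    --exclude='httpdocs/cache' \
    --exclude='httpdocs/error.log' \
    httpdocs

# Only index.html and its parent directories should remain.
tar -tzf "$site/mysitename.tar.gz"
```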
For reference, here’s a bit more about some of the more common tar options.
- c - create a new archive.
- t - list the contents of an archive.
- x - extract the contents of an archive.
- f - the archive file name is given on the command line (required whenever the tar output is going to a file).
- p - preserve file permissions.
- v - print verbose output (lists file names as they are processed).
- u - add files to the archive if they are newer than the copy in the tar file.
- z - compress or decompress the archive with gzip.
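To see a few of these options working together, here is a small sketch that creates, lists, and extracts an archive in a scratch directory. The work variable and the project/notes.txt names are assumptions for the demo.

```shell
# Throwaway workspace with one sample file.
work=$(mktemp -d)
mkdir -p "$work/project" "$work/restore"
echo 'hello' > "$work/project/notes.txt"

# c + z + f: create a gzip-compressed archive with the given file name.
tar -czf "$work/project.tar.gz" -C "$work" project

# t + z + f: list what the archive contains.
tar -tzf "$work/project.tar.gz"

# x + z + p + f: extract into restore/, preserving permissions.
tar -xzpf "$work/project.tar.gz" -C "$work/restore"

cat "$work/restore/project/notes.txt"   # prints: hello
```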
When doing anything from the command line, like using Mac's Terminal or running commands in Linux, dealing with spaces can be problematic.
Spaces function as a separator, so anything that comes after a space is treated as a separate argument on Unix-based systems such as macOS and Linux. For example, if there were a folder named "Open Active" and you wanted to navigate inside using the cd command, you might be tempted to type cd Open Active.
However, this only tells the machine to try to navigate inside a folder called "Open" in the current directory while "Active" sort of hangs off to the side like a sixth toe. Assuming you don't have a folder called "Open," you'll probably get a message along the lines of "No such file or directory."
What gives? It's that pesky space. Rather than rename your directory, there are a couple of easy ways to deal with this little problem.
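To make the splitting visible, the sketch below uses a tiny helper that reports how many arguments the shell handed it. The count_args name is made up for this illustration.

```shell
# Hypothetical helper: prints the number of arguments it received.
count_args() { echo "$#"; }

count_args Open Active      # prints 2: the space splits the name in two
count_args 'Open Active'    # prints 1: quoted, so it stays one argument
count_args Open\ Active     # prints 1: escaped space, also one argument
```

The last two forms are the two fixes covered next.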
Quoting
My favorite, for its utter simplicity and universal applicability, is to simply put file names in single quotes (') like this: cd 'Open Active'
Single quotes are pretty much a universal way to tell the machine to treat what's inside the quotes exactly as it is, without doing any fancy stuff.
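As a quick check of the quoting approach, here is a minimal sketch that assumes a throwaway directory created with mktemp. The scratch variable exists only for the demo.

```shell
# Build a scratch folder containing a directory with a space in its name.
# (Double quotes are used here only so that $scratch is expanded.)
scratch=$(mktemp -d)
mkdir "$scratch/Open Active"
cd "$scratch"

# Single quotes keep the name together as one argument.
cd 'Open Active'
pwd
```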
Escaping
There is another method you should be aware of, which works for spaces and also handles a whole variety of other scenarios: using the backslash (\) as an escape. This tells the machine to treat the next character exactly as it is, without doing any fancy stuff.
So, you can also deal with the aforementioned space in the file name like this: cd Open\ Active
The reason that it's not quite as elegant is that if you have a directory or file name with a lot of spaces, you'll need to do a lot of escaping. For example, if you wanted to remove a file (using the rm command) called "general instructions for how to escape a blank space.txt", you would have to escape 8 times, once for each space: rm general\ instructions\ for\ how\ to\ escape\ a\ blank\ space.txt
Irritating, right? Life is much easier when you just do the following: rm 'general instructions for how to escape a blank space.txt'
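Both forms can be compared side by side with a small sketch run in a scratch directory (the scratch variable is an assumption for the demo), so nothing real gets deleted.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Create the file, then remove it by escaping every space.
touch 'general instructions for how to escape a blank space.txt'
rm general\ instructions\ for\ how\ to\ escape\ a\ blank\ space.txt

# Create it again, then remove it with a single pair of quotes.
touch 'general instructions for how to escape a blank space.txt'
rm 'general instructions for how to escape a blank space.txt'

# The directory is empty again, so this prints nothing.
ls -A
```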
That's all there is to it. With these two arrows in your quiver, you can defeat spaces in file and directory names without breaking a sweat.
If you're running Apache (and you probably are), it ships with a simple load-testing tool called ab that you can run to get a basic idea of how your site / server will perform. From the command line, you run ab with a request count (-n), a concurrency level (-c), and the URL to test.
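A minimal sketch of such a run follows. The request count of 100 and concurrency of 10 are assumed values picked for illustration, and the RUN_AB switch is a made-up guard so the script only prints the command unless you opt in.

```shell
requests=100       # -n: total number of requests (assumed value)
concurrency=10     # -c: how many requests run at once (assumed value)
target="http://www.greencrescent.com/"   # replace with your own site

# Show the command that will be run.
echo "ab -n $requests -c $concurrency $target"

# Set RUN_AB=1 to actually send the requests (requires ab to be installed).
if [ "${RUN_AB:-0}" = "1" ]; then
    ab -n "$requests" -c "$concurrency" "$target"
fi
```

Note that ab refuses a concurrency level larger than the total number of requests, so keep -c at or below -n.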
- ab = apache bench
- n = number of requests
- c = the number to run concurrently (at the same time)
- http://www.greencrescent.com/ is the domain we're testing (replace with your target domain)
When the run finishes, ab prints a summary report. For reference, its usage message lists the available options.
Options are:
- -n requests: Number of requests to perform
- -c concurrency: Number of multiple requests to make
- -t timelimit: Seconds to max. wait for responses
- -b windowsize: Size of TCP send/receive buffer, in bytes
- -p postfile: File containing data to POST. Remember also to set -T
- -u putfile: File containing data to PUT. Remember also to set -T
- -T content-type: Content-type header for POSTing, eg. 'application/x-www-form-urlencoded'. Default is 'text/plain'
- -v verbosity: How much troubleshooting info to print
- -w: Print out results in HTML tables
- -i: Use HEAD instead of GET
- -x attributes: String to insert as table attributes
- -y attributes: String to insert as tr attributes
- -z attributes: String to insert as td or th attributes
- -C attribute: Add cookie, eg. 'Apache=1234' (repeatable)
- -H attribute: Add Arbitrary header line, eg. 'Accept-Encoding: gzip'. Inserted after all normal header lines. (repeatable)
- -A attribute: Add Basic WWW Authentication, the attributes are a colon separated username and password.
- -P attribute: Add Basic Proxy Authentication, the attributes are a colon separated username and password.
- -X proxy:port: Proxyserver and port number to use
- -V: Print version number and exit
- -k: Use HTTP KeepAlive feature
- -d: Do not show percentiles served table.
- -S: Do not show confidence estimators and warnings.
- -g filename: Output collected data to gnuplot format file.
- -e filename: Output CSV file with percentages served
- -r: Don't exit on socket receive errors.
- -h: Display usage information (this message)
- -Z ciphersuite: Specify SSL/TLS cipher suite (See openssl ciphers)
- -f protocol: Specify SSL/TLS protocol (SSL2, SSL3, TLS1, or ALL)
We recently had to remove a file with a pretty odd file name: "\ я\ 005-2.jpg".
Normally, you can deal with oddities (such as spaces in file names) by just putting the file name in quotes and removing it as usual.
However, when it comes to backslashes and other special characters in file names, things get tricky.
The way to remove a file like the one above is to first find its inode number. You can do this in one of two ways:
First, using \ я\ 005-2.jpg as an example, you can run stat on the file from the command line.
A second option is to run ls -li in the directory that contains the file.
In the ls -li output, the inode number is the first column; in the stat output, it is the value labeled Inode.
In either event, you can see that the inode value is 13770830.
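Both lookups can be tried safely on a stand-in file. The sketch below assumes the name consists of literal backslashes, spaces, and a Cyrillic letter, and creates it in a scratch directory, so the inode number it reports will differ from the one above.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Single quotes keep the backslashes and spaces literal.
touch '\ я\ 005-2.jpg'

# Option 1: stat prints the file's metadata, including its inode number
# (the layout differs between Linux and macOS). The glob avoids typing the name.
stat ./*005-2.jpg

# Option 2: ls -li puts the inode number in the first column.
ls -li
```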
To get rid of the troublesome \ я\ 005-2.jpg (a.k.a. 13770830), use the find command with -inum to select the file by inode and -exec rm -i to remove it. For example: find . -inum 13770830 -exec rm -i {} \;
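Here is a self-contained sketch of the whole routine on a stand-in file in a scratch directory. The keep.txt file is a made-up bystander, and the demo uses plain rm rather than rm -i so that it runs without a confirmation prompt.

```shell
scratch=$(mktemp -d)
cd "$scratch"
touch '\ я\ 005-2.jpg'   # the awkward name, kept literal by single quotes
touch keep.txt           # hypothetical neighbor that should survive

# ls -i prints the inode number first; keep that field for the odd file.
inode=$(ls -i | grep '005-2.jpg' | awk '{print $1}')
echo "inode: $inode"

# Remove whatever has that inode number (add -i to rm for a prompt).
find . -inum "$inode" -exec rm {} \;

# Only keep.txt is left.
ls
```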
Farewell \ я\ 005-2.jpg, we hardly knew ya.