Unix - Quota program

Disk quotas for user corin (uid 11035):

Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
      /Sucia/u4    2650   60000   80000             295       0       0        

On these Alphas, blocks are 1 KB, so I see that I'm using a bit less than 3 MB of space out of my total allotment of 60 MB.
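
To get a listing like the one above yourself, run the quota command; on most systems, adding the -v flag reports your usage on every filesystem where you have a quota (the exact options vary a little from system to system):

quota -v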

Now, let's say that, instead of using less than 5% of your quota, you're at nearly 95% of it, and you'd like to find out where all your space has gone. A tool that will come in handy here is the du command (short for disk usage). My usage on the IWS machines isn't very interesting, so I'll show you what I see on my GWS machine:

tobiko% du -k
[...]
120     ./142/section
12      ./142/old-RCS
204     ./142/mac-mw
596     ./142
288     ./558/proj1/fig
776     ./558/proj1
16      ./558/proj2/lsys/CVS
448     ./558/proj2/lsys
16      ./558/proj2/smooth/CVS
3600    ./558/proj2/smooth
4052    ./558/proj2
5012    ./558
4       ./tmp
105952  .
tobiko% 

Of course, I omitted a lot of output here. Each line shows a directory and the size, in KB, of the files in that directory (that's what the -k option flag told du to do; by default, du reports sizes in 512-byte blocks). The last line shows the total usage of all files in and below the current directory. Here, you see that I'm using not quite 106 MB of space.
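
As an aside, if all you care about is that final total, du's -s flag summarizes, printing just one line for each argument you give it:

du -sk .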

Now it's time to introduce the first major emphasis of this tutorial: the power of pipes. We'll talk about pipes now, and the power will become evident by the end of the tutorial (I hope...). Anyway, let's run du again, but this time, let's see it a page at a time:

du -k | more

We talked about more in the first UNIX tutorial, so this isn't very exciting. What would be neat would be to see our disk usage sorted in order of greatest to least. We can do just that using the sort utility.

du -k | sort -rn | more

The -n flag tells sort to sort the entries numerically. The -r says to output in reverse (descending) order. The first several entries that I see now are:

105952  .
35932   ./archives
23232   ./www
20660   ./archives/research
13332   ./archives/courses
9640    ./557
8264    ./mail

I can immediately see that about a third of my space is dedicated to holding old coursework and research, and another quarter is my web pages. The next two entries, however, aren't really useful. My home directory space is tree-like, and I'm really only concerned with how the space is distributed one level down -- I don't care about how the archive space is split between research and courses. A solution to this problem is to select only the lines in du's output that match a pattern of a certain form -- a regular expression. We'll use the egrep tool (cousin to grep) to do just that:

tobiko% du -k | egrep '\./[^/]*$' | sort -rn | head
35932   ./archives
23232   ./www
9640    ./557
8264    ./mail
6960    ./acm
5872    ./research
5012    ./558
2780    ./elisp
1860    ./.netscape
856     ./sw
tobiko%

I now see just the top-level directories that are taking up most of the space in my home directory. The pattern '\./[^/]*$' matches only those lines whose path has no further slash after the leading ./ -- that is, the top-level directories themselves. Note, also, that I'm using the head command to grab just the first 10 lines of output; head -n retrieves the first n lines.
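
For example, to see only the five largest:

du -k | egrep '\./[^/]*$' | sort -rn | head -5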

I think that we're pretty well warmed up now. Let's move on to some more interesting applications, including more fun UNIX utilities and really cool pipelines.

Number of hits to web pages:

The first thing we'll try is to find out how many hits the department's course web pages had yesterday. Each entry in the log file is a separate hit, so we'll just count the lines. UNIX's wc tool is the best bet here. The -l option asks wc (word count) to report only the number of lines in a file (the default, with no options, reports the number of lines, words, and characters):

tobiko% wc -l log
  18784 log
tobiko%

Number of hits to HTML pages:

As a first cut, we can grep for html in the log and count the matching lines:

tobiko% grep html log | wc -l
  11095
tobiko%

Unfortunately, this method still counts hits to some non-HTML pages. In particular, if you look at a single line in the access log, you'll see that there are two URLs listed -- the destination and the referring URL. If a web page has an inlined image, then that web page is listed as the referring URL for the access of the image, so grep matches the image's line as well. We don't want to count that page twice. What do we do? I'm glad you asked!

List of machine, referrer, and destination:

The solution to the dilemma above is to select only certain fields of the access log to consider. For the moment, let's select the host that accessed the page, the page accessed, and the referring URL. The tool that we'll use here is awk, which splits each input line into whitespace-separated fields named $1, $2, and so on:



tobiko% awk '{ print $1 "\t" $11 "\t->\t" $7 }' log | head

orcas.cs.washington.edu "http://www.cs.washington.edu/education/courses/551/CurrentQtr/Messages/paper9/0003.html"   ->      /education/courses/551/CurrentQtr/Messages/paper9/0004.html
tide16.microsoft.com    "-"     ->      /education/courses/401/98sp/
ohaton.cs.ualberta.ca   "-"     ->      /education/courses/401/CurrentQtr/
ohaton.cs.ualberta.ca   "-"     ->      /education/courses/401/CurrentQtr/
dhcp133i.ee.washington.edu      "http://www.cs.washington.edu/education/courses/143/98sp/homework/hw1/solution/index.html"  ->      /education/courses/143/98sp/homework/hw1/solution/lmatrix.cpp
orcas.cs.washington.edu "http://www.cs.washington.edu/education/courses/551/CurrentQtr/Messages/paper9/0004.html"   ->      /education/courses/551/CurrentQtr/Messages/paper9/0005.html
cs210-81.student.washington.edu "http://www.cs.washington.edu/education/courses/551/CurrentQtr/Papers/index.html"   ->      /education/courses/551/CurrentQtr/Papers/paper_9_index.html
cs210-81.student.washington.edu "http://www.cs.washington.edu/education/courses/551/CurrentQtr/Papers/paper_9_index.html"   ->      /education/courses/551/CurrentQtr/Messages/paper9
cs210-81.student.washington.edu "http://www.cs.washington.edu/education/courses/551/CurrentQtr/Papers/paper_9_index.html"   ->      /education/courses/551/CurrentQtr/Messages/paper9/
cs210-81.student.washington.edu "http://www.cs.washington.edu/education/courses/551/CurrentQtr/Messages/paper9/"    ->      /education/courses/551/CurrentQtr/Messages/paper9/0000.html

tobiko% 


Number of hits to HTML pages:

Now that we know how to select only the fields that we want, we can grep for html in just the accessed-page field. Let's do that:

tobiko% awk '{print $7}' log | grep html | wc -l
   5464
tobiko%

Number of hits to HTML pages, including directory hits:

We now have a new problem. Recall that there are two ways to access the main index.html file in a directory:

http://www.cs/people/acm/
http://www.cs/people/acm/index.html

When counting hits to HTML pages, we'd like to also count hits to URLs that are just the directory, as above. What we need is some way to canonicalize the two URLs into the same form. We'll use the sed (stream editor) tool to append index.html to any URL that ends in a / (i.e., is a directory).

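
One way to do this -- a sketch, reusing the $7 destination field from the awk example above -- is to rewrite a trailing / as /index.html before grepping and counting:

awk '{print $7}' log | sed 's|/$|/index.html|' | grep html | wc -l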
