Tuesday, July 31, 2012

Cache Issues in the I/O Performance Test

Sometimes you have to benchmark I/O performance, such as file read/write speed. The first time you do this kind of task, you will almost certainly notice that the read speed is very high, hundreds of MB per second, even on a slow hard disk such as a 5400 rpm mechanical drive. That is the effect of file caching.

Here are some techniques to overcome or minimize the cache effects.

1. Re-mount


The easiest way is to unmount and re-mount the corresponding mount point. For example,

# umount /home
# mount /home

2. Clear Cache of the OS


On Linux, you can drop the caches of the OS with the following command.
# sync && echo 3 > /proc/sys/vm/drop_caches

The first part of the above command writes dirty (modified) cached data to disk, and the second part tells the OS to drop the clean caches immediately. Without the sync, dirty pages would stay in memory because they cannot be dropped.

There are three values you can write, each dropping a different set of caches.
1 - Free the page cache.
2 - Free dentries and inodes.
3 - Free the page cache, dentries, and inodes.
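If the benchmark itself is a C program, the same two steps can be done from code. The following is only a sketch: drop_caches_at() is a hypothetical helper name, and the path is passed as a parameter (normally "/proc/sys/vm/drop_caches") just so the function can be tried against an ordinary file without root privileges.

```c
#include <stdio.h>
#include <unistd.h>

/* Write `level` (1, 2, or 3) to the drop_caches control file after a sync().
 * `path` is normally "/proc/sys/vm/drop_caches"; it is a parameter only so
 * the helper can be tried against an ordinary file. Returns 0 or -1. */
int
drop_caches_at(const char *path, int level)
{
  FILE *fp;
  int ok;

  if (level < 1 || level > 3)
    return -1;

  sync();                            /* write dirty pages to disk first */
  if ((fp = fopen(path, "w")) == NULL)
    return -1;                       /* fails for non-root users on the real file */
  ok = fprintf(fp, "%d\n", level) > 0;
  if (fclose(fp) != 0)
    ok = 0;
  return ok ? 0 : -1;
}
```

A benchmark running as root could call drop_caches_at("/proc/sys/vm/drop_caches", 3) before each timed read.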

3. Use Direct I/O for POSIX


When you open a file with the O_DIRECT flag, you bypass the I/O buffers of the operating system and therefore avoid its cache effects. Most file systems that provide a POSIX-compatible API also support the O_DIRECT flag, with the exception of some parallel distributed file systems such as PanFS.

Benchmark tools such as IOR (-B), IOzone (-I), and dd (iflag=direct or oflag=direct) have an option for this feature. In your own program, open the file as in the following example.

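/* O_DIRECT is combined with an access mode; on Linux, define _GNU_SOURCE before including <fcntl.h>. */
fd = open(filename, O_RDONLY | O_DIRECT);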
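One detail that is easy to miss is that direct I/O usually requires the user buffer, the file offset, and the transfer size to be aligned. Here is a minimal sketch of a full read loop under that constraint. The function name read_file_direct(), the 4096-byte alignment, and the 1 MiB chunk size are assumptions for illustration; the real alignment requirement depends on the device and the file system.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE                  /* O_DIRECT is a Linux extension */
#endif
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define ALIGNMENT 4096               /* assumed; depends on device and file system */
#define CHUNK (1 << 20)              /* 1 MiB per read, a multiple of ALIGNMENT */

/* Read a whole file with O_DIRECT. Returns the bytes read, or -1 on error. */
long long
read_file_direct(const char *filename)
{
#ifndef O_DIRECT
  (void)filename;
  errno = ENOTSUP;                   /* O_DIRECT not visible at compile time */
  return -1;
#else
  void *buf = NULL;
  long long total = 0;
  ssize_t n;
  int fd, err;

  if ((fd = open(filename, O_RDONLY | O_DIRECT)) == -1)
    return -1;                       /* e.g. EINVAL if the file system lacks O_DIRECT */
  if ((err = posix_memalign(&buf, ALIGNMENT, CHUNK)) != 0) {
    close(fd);
    errno = err;                     /* posix_memalign() returns the error number */
    return -1;
  }
  while ((n = read(fd, buf, CHUNK)) > 0) {
    total += n;
    if (n < CHUNK)
      break;                         /* short read: treat as end of file */
  }
  err = errno;
  free(buf);
  close(fd);
  if (n == -1) {
    errno = err;
    return -1;
  }
  return total;
#endif
}
```

The loop stops at the first short read so that it never issues another read from an unaligned offset at the end of the file.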

4. Clear Cache for a Specific File


If you want to clear the cache for specific files only, you can use the following program, clearcache.c, which asks the OS to drop the cached pages of every file given as an argument.

/* clearcache.c - mrcuongnv */

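#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Ask the OS to drop the cached pages of one file.
   Returns 0 on success, 1 on failure. */
int
clear_file_cache(const char *filename)
{
  int fd, rs;

  printf("%s", filename);
  if ((fd = open(filename, O_RDONLY)) == -1) {
    printf(" --> %s (%d)\n", strerror(errno), errno);
    return 1;
  }
  /* posix_fadvise() returns the error number directly; it does not set errno. */
  rs = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
  close(fd);
  if (rs != 0) {
    printf(" --> %s (%d)\n", strerror(rs), rs);
    return 1;
  }
  printf(" --> Cleared\n");
  return 0;
}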

int
main(int argc, char *argv[])
{
  if (argc < 2) {
    fprintf(stderr, "Syntax: %s FILES\n", argv[0]);
    return 1;
  }

  int i, rs = 0;
  for (i = 1; i < argc; i++) {
    rs += clear_file_cache(argv[i]);
  }

  return rs;
}

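The most important part of the above code is posix_fadvise() with POSIX_FADV_DONTNEED, which tells the OS that the specified data will not be accessed in the near future, so its cached pages can be freed. Pages that have not yet been written back to disk are not dropped, so run sync first if the file was modified recently.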
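To check whether clearing actually worked, you can ask the kernel how many pages of a file are currently resident in memory. The sketch below uses mmap() and mincore() for that. It is not part of clearcache.c, and cached_pages() is a hypothetical helper name.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count the pages of a file that are resident in the page cache.
 * Stores the file's total page count in *total_pages.
 * Returns the resident count, or -1 on error. */
long
cached_pages(const char *filename, long *total_pages)
{
  long pagesize = sysconf(_SC_PAGESIZE);
  long resident = -1, npages, i;
  unsigned char *vec = NULL;
  void *map = MAP_FAILED;
  struct stat st;
  int fd;

  if ((fd = open(filename, O_RDONLY)) == -1)
    return -1;
  if (fstat(fd, &st) == -1)
    goto out;
  npages = (st.st_size + pagesize - 1) / pagesize;
  *total_pages = npages;
  if (npages == 0) {
    resident = 0;                    /* empty file: nothing to map */
    goto out;
  }
  map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
  vec = malloc(npages);
  if (map == MAP_FAILED || vec == NULL)
    goto out;
  if (mincore(map, st.st_size, vec) == 0) {
    resident = 0;
    for (i = 0; i < npages; i++)
      resident += vec[i] & 1;        /* lowest bit set means the page is in memory */
  }
out:
  free(vec);
  if (map != MAP_FAILED)
    munmap(map, st.st_size);
  close(fd);
  return resident;
}
```

Running it on a file before and after clearcache should show the resident count fall, although the kernel is free to keep some pages, for example when another process has the file mapped.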

Thursday, July 19, 2012

Git: Ignore All Contents of a Directory but the Directory Itself

You sometimes want Git to ignore all the contents of a directory but still keep the directory itself in the repository, such as a cache/ directory. This keeps temporary files out of the repository and reduces the amount of data stored in it.

With Git, all you need to do is create a .gitignore file inside the cache/ directory with the following content.

*
!.gitignore

The first line tells Git to ignore everything in the directory, and the second line tells Git to keep the .gitignore file itself. Because Git does not track empty directories, this one tracked file is what keeps the directory in the repository.