Tuesday, October 12, 2010

Overhaul

As none of you have noticed (as I have no readers), I've given my blog a major overhaul. Along with a different look, I'm also now using SyntaxHighlighter, which allows me to make some really pretty posts.

I've also renamed my blog from "Russell Harmon's Blog" to "Page Fault", in reference to what occurs when you attempt to access a virtual address which is not resident in memory.

EDIT 2010-10-13: I appear to be working on yet another blog overhaul. This one is far from done. There must be something wrong with me.

Friday, October 1, 2010

Object Oriented C

So, thanks to blocks, Apple's new extension to C, you can now do basic object-orientation. Have a look over at GitHub for a short example of how to do it.

To break it down, an object is a struct that contains both fields and blocks, where the blocks act as the object's methods.
typedef struct {
    Object super;
    char *_value;
    const char *(^getValue)( void );
    void (^setValue)( const char * );
} String;
This defines a String type which inherits from Object and has a field _value, and methods getValue and setValue.

Because a String's first field is an Object, a pointer to a String can be safely cast to a pointer to an Object (an upcast).
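
As a minimal sketch of why that cast is safe: C guarantees that a pointer to a struct, suitably converted, points to its first member. The Object definition, its class_name field, and the helper functions below are hypothetical stand-ins rather than the code from the GitHub example, and the block-typed methods are left out so that this compiles without blocks support.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical base type, for illustration only. */
typedef struct {
    const char *class_name;
} Object;

/* Same shape as the String above, minus the block-typed methods. */
typedef struct {
    Object super;   /* must stay the first member for the upcast to be valid */
    char *_value;
} String;

/* Only touches the Object prefix, so it accepts any "subclass". */
const char *object_class_name(const Object *obj) {
    return obj->class_name;
}

/* Builds a String, views it as an Object, and returns its class name. */
const char *upcast_demo(void) {
    static char text[] = "hello";
    String s = { .super = { .class_name = "String" }, ._value = text };

    Object *as_object = (Object *)&s;   /* upward cast */

    /* The upcast pointer and the address of the first member coincide. */
    assert((void *)as_object == (void *)&s.super);
    assert(strcmp(s._value, "hello") == 0);

    return object_class_name(as_object);
}
```

If super were moved below _value, the cast would still compile but would read the wrong bytes, which is why the convention only works with the parent as the first field.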

More to come... maybe.

Monday, July 5, 2010

Bash Scripters: Stop using subshells to call functions.

When writing in bash, zsh, sh, etc., stop using subshells to call functions. There is a significant speed overhead to using a subshell and there is a much better alternative. Instead, you should just have a convention where a particular variable always holds the return value of the function (I use retval). This has the added benefit of also allowing you to return arrays from your functions.

If you don't know what a subshell is: it is a child copy of the shell process, forked whenever you use $() or `` (command substitution), and it executes the code you put inside.

I did some simple testing to allow you to observe the overhead. For two functionally equivalent scripts:

This one uses a subshell:
#!/bin/bash
function a() {
    echo hello
}
for (( i = 0; i < 10000; i++ )); do
    echo "$(a)"
done
This one uses a variable:
#!/bin/bash
function a() {
    retval="hello"
}
for (( i = 0; i < 10000; i++ )); do
    a
    echo "$retval"
done
The speed difference between these two is significant: in the run below, the subshell version took more than 30 times as long in wall-clock time (11.9s versus 0.37s).
$ for i in variable subshell; do
> echo -e "\n$i"; time ./$i > /dev/null
> done

variable

real 0m0.367s
user 0m0.346s
sys 0m0.015s

subshell

real 0m11.937s
user 0m3.121s
sys 0m0.359s

Monday, May 24, 2010

Reclaimable Userspace Cache Memory

Caches are used all over your computer and for a huge variety of purposes. From Apache to your physical CPU, cache is everywhere. Normally, when you want to cache something in memory, you malloc(3) a chunk of memory and store data in that. This works well at small scale, but when you and 30+ others want to cache some information, that can quickly turn into a large amount of memory taken up by information which can be (easily, or not so easily) regenerated, and there is no way for the operating system to reclaim that memory when it really needs it.

In Java, that's not the case. In Java, you can create SoftReference objects which are collected by the garbage collector when the VM is running out of memory. This exact idea is what I'd like to see in an operating system.

I propose a system whereby you can allocate memory which the operating system can reclaim at its own discretion. This would work by using malloc(3) to get some memory, then using madvise(2) to advise the kernel that this is reclaimable memory. Then, before you read or write to the memory, you lock the memory (for read or write) using reclock, during which time the kernel guarantees not to reclaim the memory. Then, when you are done reading or writing to that memory, you recunlock it.

The function prototypes for the reclock and recunlock functions (which don't exist) would be:
// Returns 0 on success, -1 if the memory
// is no longer available
int reclock( const void *addr, int perms );
void recunlock( const void *addr );
Under the hood, what would happen is that when you madvise(2) the kernel that a particular space is reclaimable, it would add it to a list of reclaimable addresses. Then, when the system is low on memory, it would scan the list for a chunk of memory large enough, check that the memory isn't locked (see the next paragraph), mark that element in the list as reclaimed, record the pid that it was taken from, and give the memory to someone else.

Before simply giving a chunk of memory to someone else, however, the kernel has to check whether the memory is in use. In order to do that, there has to be a lock bit somewhere. I had originally thought to put it in the kernel's memory, but Clockfort noted that locking and unlocking would then require a system call, which would be quite slow. Therefore, the bit can be kept in the process's memory space, and simply read by the kernel before reclaiming memory. That way, reclock and recunlock can be implemented entirely without syscalls.
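
To make the syscall-free idea a little more concrete, here is a rough userspace sketch. It rests on several assumptions that are not part of the proposal above: the lock state lives in a header placed just before the cached data, the kernel records a reclaim by writing a marker into that same header, and read and write locks are not distinguished. The names rec_header, rec_alloc, rec_free, simulate_kernel_reclaim and demo are all hypothetical, and the "kernel" here is just an ordinary function standing in for the real thing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define REC_RECLAIMED (-1)

/* Hypothetical header in the process's own memory, just before the data.
 * state: 0 = unlocked, >0 = number of holders, REC_RECLAIMED = taken away.
 * The alignment keeps the data that follows suitably aligned for any type. */
typedef struct {
    _Alignas(max_align_t) atomic_int state;
} rec_header;

static rec_header *header_of(const void *addr) {
    return (rec_header *)addr - 1;
}

/* Hypothetical allocator: stands in for malloc(3) followed by madvise(2). */
void *rec_alloc(size_t size) {
    rec_header *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    atomic_init(&h->state, 0);
    return h + 1;
}

void rec_free(void *addr) {
    if (addr != NULL)
        free(header_of(addr));
}

/* Returns 0 on success, -1 if the memory is no longer available.
 * perms is ignored in this sketch. */
int reclock(const void *addr, int perms) {
    (void)perms;
    rec_header *h = header_of(addr);
    int seen = atomic_load(&h->state);
    do {
        if (seen == REC_RECLAIMED)
            return -1;
        /* On failure, seen is refreshed with the current state and we retry. */
    } while (!atomic_compare_exchange_weak(&h->state, &seen, seen + 1));
    return 0;
}

void recunlock(const void *addr) {
    atomic_fetch_sub(&header_of(addr)->state, 1);
}

/* Stand-in for the kernel: reclaims only if nobody holds the lock.
 * Returns 1 if the region was marked reclaimed, 0 if it was in use.
 * (Nothing is actually freed here; only the marker is written.) */
int simulate_kernel_reclaim(void *addr) {
    int expected = 0;
    return atomic_compare_exchange_strong(&header_of(addr)->state,
                                          &expected, REC_RECLAIMED);
}

/* Walks the lifecycle; returns the result of the final reclock. */
int demo(void) {
    char *cache = rec_alloc(64);
    assert(cache != NULL);

    assert(reclock(cache, 0) == 0);               /* lock before touching it */
    cache[0] = 'x';
    assert(simulate_kernel_reclaim(cache) == 0);  /* refused while locked */
    recunlock(cache);

    assert(simulate_kernel_reclaim(cache) == 1);  /* unlocked, so it is taken */
    int rc = reclock(cache, 0);                   /* caller must now regenerate */
    rec_free(cache);
    return rc;
}
```

Using a single compare-and-swap word avoids the window where the kernel checks the lock and the process takes it a moment later. A real kernel would also have to keep the header itself resident after reclaiming the data, which this sketch sidesteps by never actually freeing anything on reclaim.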