Page Fault

Tuesday, October 12, 2010

Overhaul

As none of you have noticed (since I have no readers), I've given my blog a major overhaul. Along with a different look, I'm also now using SyntaxHighlighter, which lets me make some really pretty posts.

I've also renamed my blog from "Russell Harmon's Blog" to "Page Fault", in reference to what occurs when you attempt to access a virtual address whose page is not currently resident in physical memory.

EDIT 2010-10-13: I appear to be working on yet another blog overhaul. This one is far from done. There must be something wrong with me.

Friday, October 1, 2010

Object Oriented C

So, thanks to blocks, Apple's new extension to C, you can now do basic object orientation. Have a look over at GitHub for a short example of how to do it.

To break it down, an object is a struct, which contains both fields and blocks which act as the object's methods.

typedef struct {
    Object super;
    char *_value;
    const char *(^getValue)(void);
    void (^setValue)( const char * );
} String;

This defines a String object type which inherits from Object and has one field, _value, and two methods, getValue and setValue.

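Because String's first field is an Object, a pointer to a String can safely be cast to a pointer to an Object (an upcast): C guarantees that a pointer to a struct, suitably converted, points to its first member.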

More to come... maybe.

Monday, July 5, 2010

Bash Scripters: Stop using subshells to call functions.

When writing in bash, zsh, sh, etc., stop using subshells to call functions. Subshells carry a significant speed overhead, and there is a much better alternative: adopt a convention where a particular variable always holds the return value of the function (I use retval). This has the added benefit of letting you return arrays from your functions.
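Here is a minimal sketch of returning an array through the retval convention. The function split_on_colons is hypothetical, made up for this example, and it assumes bash (plain sh has no arrays). Note that the function is called directly, with no $() around it.

```bash
#!/bin/bash
# split_on_colons is a hypothetical example function. It "returns" an
# array by filling the global variable retval, so no subshell is needed.
split_on_colons() {
    local IFS=':'             # split only on colons, and only inside this function
    read -ra retval <<< "$1"  # -a stores the fields in the (global) array retval
}

split_on_colons "/usr/local/bin:/usr/bin:/bin"
echo "${#retval[@]} fields, second is ${retval[1]}"
```

Running it prints "3 fields, second is /usr/bin". Since IFS is only set to ':' here, fields containing spaces stay whole.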

If you don't know what a subshell is: a subshell is a child copy of the current shell, created by forking a new process whenever you use $() or backticks (``), which executes the code you put inside.
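Because a subshell is a separate process, it also cannot change the caller's variables, which is why retval only works when the function is called directly. A minimal sketch (the names count and increment are just for illustration):

```bash
#!/bin/bash
count=0
increment() {
    count=$((count + 1))
    retval=$count
}

# Inside $( ) the function runs in a forked child shell, so its
# changes to count and retval vanish when that child exits.
out=$(increment; echo "$retval")
echo "after command substitution: out=$out count=$count"   # out=1 count=0

# Called directly, it runs in the current shell and its changes stick.
increment
echo "after direct call: retval=$retval count=$count"       # retval=1 count=1
```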

I did some simple testing so you can observe the overhead. Here are two functionally equivalent scripts, saved as subshell and variable:

This one uses a subshell:

#!/bin/bash
function a() {
    echo hello
}
for (( i = 0; i < 10000; i++ )); do
    echo "$(a)"
done

This one uses a variable:

#!/bin/bash
function a() {
    retval="hello"
}
for (( i = 0; i < 10000; i++ )); do
    a
    echo "$retval"
done

The speed difference between these two is significant: in my test, the subshell version took more than 30 times as long (11.9 s versus 0.37 s of real time).

$ for i in variable subshell; do
> echo -e "\n$i"; time ./$i > /dev/null
> done

variable

real 0m0.367s
user 0m0.346s
sys 0m0.015s

subshell

real 0m11.937s
user 0m3.121s
sys 0m0.359s

Monday, May 24, 2010

Reclaimable Userspace Cache Memory

Caches are used all over your computer for a huge variety of purposes; from Apache to your CPU, cache is everywhere. Normally, when you want to cache something in memory, you malloc(3) a chunk of memory and store data in it. This works well on a small scale, but when you and 30+ other programs all cache information, that can quickly add up to a large amount of memory holding data which can be (more or less easily) regenerated, and the operating system has no way to reclaim that memory when it really needs it.

In Java, that's not the case: you can create SoftReference objects, whose referents the garbage collector may clear when the VM is running low on memory. This is exactly the idea I'd like to see in an operating system.

I propose a system whereby you can allocate memory which the operating system can reclaim at its own discretion. This would work by using malloc(3) to get some memory (in practice the region would need to be page-aligned, since madvise(2) operates on whole pages), then using madvise(2) to advise the kernel that this memory is reclaimable. Then, before you read or write the memory, you lock it (for reading or writing) using reclock; while it is locked, the kernel guarantees not to reclaim it. When you are done reading or writing that memory, you release it with recunlock.

The function prototypes for the reclock and recunlock functions (which don't exist) would be:

// Returns 0 on success, -1 if the memory
// is no longer available
int reclock( const void *addr, int perms );
void recunlock( const void *addr );

Under the hood, when you tell the kernel via madvise(2) that a particular region is reclaimable, it would add that region to a list of reclaimable addresses. Then, when the system is low on memory, it would scan the list for a large enough chunk, check that the memory isn't locked (see the next paragraph), mark that list entry as reclaimed along with the pid it was taken from, and give the memory to someone else.

Before giving a chunk of memory to someone else, however, the kernel has to check whether the memory is in use, so there has to be a lock bit somewhere. I had originally thought to put it in the kernel's memory, but Clockfort noted that locking and unlocking would then each require a system call, which would be quite slow. Instead, the bit can be kept in the process's own memory space and simply read by the kernel before reclaiming memory. That way, reclock and recunlock can be implemented entirely without syscalls.

Thursday, May 28, 2009

Pure bash cat

So, just to see if I could, I wrote a version of cat in pure bash. A pure bash script uses nothing but bash builtins to accomplish its goal. To determine whether a particular command is a builtin, you can use type -t "command" (type is itself a builtin). Some notable builtins are echo, read, exec, and return; some notable commands which are not builtins are cat and grep. Here is my implementation of cat in pure bash:
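A quick sketch for checking the examples above yourself. type -t prints "builtin" for shell builtins and "file" for executables found in PATH, so on a typical Linux system cat and grep should report file.

```bash
#!/bin/bash
# Print how bash classifies each command name.
for cmd in type echo read exec return cat grep; do
    printf '%-7s ' "$cmd"
    type -t "$cmd" || echo "(not found)"   # type -t prints nothing if the name is unknown
done
```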

#!/bin/bash
INPUTS=( "${@:-"-"}" )
for i in "${INPUTS[@]}"; do
    if [[ "$i" != "-" ]]; then
        exec 3< "$i" || exit 1
    else
        exec 3<&0
    fi
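    # "|| [[ -n $REPLY ]]" keeps a last line that has no trailing newline;
    # printf '%s\n' prints lines such as "-n" literally, where echo would not.
    while read -ru 3 || [[ -n "$REPLY" ]]; do
        printf '%s\n' "$REPLY"
    done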
done

Now, keep reading if you want a small lesson in advanced bash. I'll go line by line to explain what this is doing.

#!/bin/bash
INPUTS=( "${@:-"-"}" )

Line 1 is the shebang.

#!/bin/bash
INPUTS=( "${@:-"-"}" )
for i in "${INPUTS[@]}"; do

Line 2 assigns to the array variable INPUTS either the arguments provided on the command line, if there are any, or the single string "-". Here is how: $@ is the variable that references the positional parameters (the arguments to your program). If you have not heard of $*, read this. I reference the positional parameters as ${@} because the braces allow me to add a "default value" to the variable: the value the variable will seem to have if it is unset or empty. The way to use a default value is with :-, like so: ${@:-"hello"}. So if there are no positional parameters, $@ will seem to have the value "hello". Finally, notice that all of this is enclosed in ( ). That makes an array out of the positional parameters (the first argument to the program becomes the first element of the array, the second argument becomes the second element, etc.).
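To see the expansion on its own, here is a small sketch. show_inputs is a hypothetical helper, not part of the cat script; it builds its array with the same "${@:-"-"}" expansion and prints the element count followed by the elements.

```bash
#!/bin/bash
# show_inputs (hypothetical): with no arguments the array gets the single
# element "-"; otherwise it gets one element per argument.
show_inputs() {
    local inputs=( "${@:-"-"}" )
    echo "${#inputs[@]}: ${inputs[*]}"
}

show_inputs                 # 1: -
show_inputs a.txt b.txt     # 2: a.txt b.txt
show_inputs "my file.txt"   # 1: my file.txt
```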

INPUTS=( "${@:-"-"}" )
for i in "${INPUTS[@]}"; do
    if [[ "$i" != "-" ]]; then

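Line 3 begins a for loop which assigns to i each value stored in the INPUTS array discussed above. Indexing an array with @ works just like $@ does for the positional parameters: inside double quotes, "${INPUTS[@]}" expands each element to a separate word, so file names containing spaces stay intact.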
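A small sketch of why the quotes around "${INPUTS[@]}" matter (the file names are made up): quoted, each element stays one word; unquoted, an element containing a space is split apart.

```bash
#!/bin/bash
files=( "notes 2009.txt" "todo.txt" )

quoted=0
for f in "${files[@]}"; do quoted=$((quoted + 1)); done     # one word per element

unquoted=0
for f in ${files[@]}; do unquoted=$((unquoted + 1)); done   # elements get word-split

echo "quoted: $quoted, unquoted: $unquoted"   # quoted: 2, unquoted: 3
```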

Maybe I'll explain more when I'm less lazy.

Sunday, March 22, 2009

Chromium on Linux

So today I decided to take a shot at compiling Google Chrome on Linux... aaand after a number of compile errors, it works! Here's a screenshot of what you see when you start it up:

Some things I noted about it:

It took a long time to connect to many web sites.
It crashed a lot.
There was no tab interface: opening a new tab worked, but you couldn't close it or get back to any old tabs.
It caused Google to block me.
No dialog boxes worked: I couldn't open the options pane, the about pane, etc.

In short, the browser is not usable yet.

P.S. As you can see in my screenshot, there is a big disclaimer that this browser is NOT READY YET, so DON'T judge the quality of the Linux port of Chromium using any information you can get about it today!

Friday, January 23, 2009

Fixed lighttpd

So I made a patch to lighttpd that lets a Content-Type stored in a file's extended attributes (xattr) override anything in the configuration file. Here it is.

EDIT: A similar patch has been applied to the lighttpd trunk at r2425