Blog
Rise Of The Machines
Date: 22/4/2005
After yesterday's spike in website hits I thought I'd look into where it was coming from, but no referrer spiked up to explain it. However, the number of "bots" hitting the site had gone up to some 30% of the page loads on www.memecode.com, and I don't know about you, but that's, er, kinda high, isn't it?

I've started tracking the bots by counting their hits per user agent string. Obviously Googlebot and Msnbot are leading the race early on, but I suspect a rogue bot, 'telnet0.1 noone@example.org', is responsible for yesterday's spike. I have in the past banned IPs due to the sheer number of incoming hits for no apparent reason. And of course Googlebot itself decided to hit my site over 61000 times in a 24 hour period some time ago now.
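The tracking itself is nothing fancy; it amounts to something like this sketch, where the log file name and the "user agent is the last quoted field" format are assumptions for illustration, not how this site's stats actually work:

    #include <iostream>
    #include <fstream>
    #include <map>
    #include <string>

    // Count hits per user agent from a web server log where each line
    // ends with the user agent string in double quotes (an assumption;
    // adjust the parsing to your actual log format).
    int main()
    {
        std::ifstream log("access.log"); // hypothetical log file name
        std::map<std::string, int> hits;
        std::string line;
        while (std::getline(log, line))
        {
            // Find the last quoted field on the line.
            size_t end = line.rfind('"');
            size_t start = (end == std::string::npos || end == 0)
                ? std::string::npos
                : line.rfind('"', end - 1);
            if (start != std::string::npos && end > start)
                hits[line.substr(start + 1, end - start - 1)]++;
        }
        for (std::map<std::string, int>::iterator i = hits.begin();
             i != hits.end(); ++i)
            std::cout << i->second << "\t" << i->first << "\n";
        return 0;
    }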

Is it just me, or does 30% of your site traffic being eaten away by bots just annoy you?

On a somewhat related note, I've also instituted a kill file for porn / scam sites that spam my referrer log to help boost their Google ranking. I might add that I've also set my robots.txt file to stop crawlers scanning the stats pages, so even if you evade the kill file you won't receive any benefit from getting listed as a referrer. I suspect that these sites insert themselves as the referrer by infecting machines with spyware that frigs with IE's outgoing Referer field, littering the web's stats pages with their URL, which in turn makes their Google ranking grow. I have no conclusive evidence that this is what happens, but it's my current theory.
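For what it's worth, the robots.txt side of it only takes a couple of lines; the /stats/ path here is just an illustration, not this site's actual layout:

    User-agent: *
    Disallow: /stats/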
(2) Comments | Add Comment

Site Update
Date: 16/4/2005
I've added a new feature to the forums that allows you to receive email notification of replies to a thread. A simple thing, but it should save people some time checking back on a thread.

Also, I've fixed old posts not appearing in the forums. When I redid the threading code I killed the old posts, which showed up as error messages in the tables.

I'm wondering: if I set up an RSS feed for software releases, would people be interested? Currently I have email notifications set up, but some people might prefer an RSS feed.
(2) Comments | Add Comment

Built In Obsolescence In Consumer Goods
Date: 11/4/2005
Recently in our household we've had a rash of devices whose interfaces, mostly buttons, have failed on us. First, one of our Doro cordless landline phones stopped responding to some of the number keys, making it useless for calling anyone. When we bought it two-odd years ago I was expecting good things from it, partly because it wasn't the cheapest off the rack and partly because it seemed European and maybe better quality. So far we're pretty disappointed with it: the answering machine doesn't have a mode whereby you can screen calls, and you can't force it not to ring, a useful feature if you're trying to sleep and want the answering machine to take the call silently. And now the other handset is showing signs of dying. Don't get a Doro phone if you're in the market, they suck. Their only saving grace is that, being digital, the sound quality is very good.

Then there is the pair of Nokia 3105 phones we got from Orange when we moved to a cheaper plan. After just a bit less than a year, one of the handsets is not responding to some of the keys and is hanging every now and then. It hasn't been abused at all, but it has had fairly regular use. Pretty disappointing that it didn't even last a year. A black mark for Nokia.

Now the remote for the VHS recorder is failing as well: the play button is non-functional, and the recorder spits out tapes it doesn't like, which is highly annoying. The player is now at least 5 years old, so it has lasted a little longer than the others I've mentioned.

It seems I'm in the market for a new cordless phone, and I'm a bit hesitant to buy any old device off the shelf. I want something that will last, not some flimsy throwaway appliance. But how do I know something will still be working in 5, or even 10, years? Is it unfair to expect a phone to still work after that long? I would have kept our cordless phones in operation for many years yet if they hadn't up and died on us, so I really only got half the value, or less, out of the A$300 we spent on them. It's no surprise that most companies warrant their products for only a year. It seems consumer goods manufacturers are taking us for a ride.

Anyone had some good experiences with a cordless phone?
(5) Comments | Add Comment

Optimizing Memory Usage
Date: 7/4/2005
It occurred to me the other day that Scribe has definitely lost some of that "lightweight" character it used to have. In fact it was downright scary when I looked at the memory usage the other day after Scribe had been running for a few hours and it was 110MB. What the? Huh?

After I calmed down I decided to get to the bottom of it. For starters I did a leak test and fixed every damn leak. But the memory usage would still rocket up to around 100MB, and it would do it right after the first mail receive. Alright, what gets loaded during a mail receive? The Bayesian spam word tables... bugger. They're only a few MB on disk, so why are they adding 60MB to my memory image?

Good question.

So firstly I looked at the hash table sizes, and lo and behold they were much larger than needed, because some of the word counts were way out, which threw off the preallocation of hash table space. Fixed that. But quite a lot of memory was still unaccounted for.

One thing that bothered me about the hash table implementation I'm using is that it does a separate allocation for each stored value to hold the key name (a string). On a small table that's no big deal, but on a hash table of half a million entries it really hurts, both in allocation / free time overhead and in the extra memory used to track all those blocks. As a side effect of the extra time spent freeing 500k blocks of memory, you risk stalling the program for minutes on end if that memory has been swapped out to disk. I assume the memory manager wants to "touch" each piece of memory it frees, which means swapping all those 500k bits of memory back into physical RAM just to free them. Nasty nasty nasty.

So I've given the hash table the option of using a string pool, which works by doing one big allocation and packing lots of strings end to end inside it. This has two very important benefits: firstly it's very fast to allocate and free, and secondly it doesn't require swapping vast amounts of memory into physical RAM to free.

The downside, of course, is that if you delete a key from the hash table it leaves a hole in the string pool's memory, which is wasted space. But for a large, mostly static hash table it's perfect.
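In case the idea isn't clear, here's a minimal sketch of a string pool. This isn't the Lgi implementation, just an illustration of the approach; the block size is an arbitrary choice:

    #include <cstring>
    #include <vector>

    // Minimal string pool sketch: strings are copied end to end into
    // large blocks, so the whole pool frees in a handful of
    // deallocations instead of one per string. Deleted keys simply
    // leave holes (wasted space).
    class StringPool
    {
        static const size_t BlockSize = 64 << 10; // 64KB per block
        std::vector<char*> Blocks;
        size_t Used; // bytes used in the current (last) block

    public:
        StringPool() : Used(BlockSize) {}
        ~StringPool()
        {
            for (size_t i = 0; i < Blocks.size(); i++)
                delete [] Blocks[i];
        }

        // Copy 's' into the pool and return the pooled copy.
        const char *Alloc(const char *s)
        {
            size_t len = strlen(s) + 1;
            if (len > BlockSize)
                return 0; // sketch only handles strings < one block
            if (Used + len > BlockSize)
            {
                // Current block is full; start a new one.
                Blocks.push_back(new char[BlockSize]);
                Used = 0;
            }
            char *p = Blocks.back() + Used;
            memcpy(p, s, len);
            Used += len;
            return p;
        }
    };

The hash table then stores the pointer returned by Alloc instead of strdup'ing the key, and the pool is torn down in one go with the table.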

Now, in doing all this work I decided to keep track of the numbers involved, to find the optimal hash table size and the effect it has on overall memory usage. Check it out:

Table Size   Allocs (MB)   Load Time (s)   VM Size (KB)
--------------------------------------------------------
Before:
  x2             16.08          35             79672
  x2.5           18.60          10             82252
  x3             21.13          13             84844
  x4             26.19           4             90024
After adding string pooling:
  x2             16.08          20             38124
  x2.5           18.60           5             40720
  x3             21.13           7             43304
  x4             26.19           1.3           48484


The hash table is preallocated to a multiple of the number of words it has to store; that's the first column. After the string pooling optimization the memory image is drastically smaller, because the overhead of maintaining 500k separate blocks is gone.

I've settled on a multiplier of 2.5 because it seems to have the best memory/speed trade-off. For reference, Scribe uses about 20000KB without the word lists loaded, in debug mode. So it's getting close to just the raw data for the strings and hash tables, with little or no overhead.

I think there are better solutions yet, but they'd take a lot of coding and testing, so I'll leave them for another day. I like the idea of using a tree structure for the word lists to avoid duplicate storage of letters: e.g. if there are 1000 words starting with 'a' then it should be possible to store all of them in a container marked 'a' and store just the rest of the string minus the 'a', saving more memory. But that's an idea for later; a sketch of it follows.
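Sketched purely to illustrate the idea (this isn't planned Scribe code), the tree would look something like a trie, where shared prefixes are stored once:

    #include <map>

    // Trie sketch: each node holds one letter's worth of branching,
    // so the 'a' shared by "apple" and "ant" is stored exactly once.
    // Word counts (for the spam tables) live on the node where a
    // word ends.
    struct TrieNode
    {
        std::map<char, TrieNode*> Children;
        int WordCount; // > 0 if a word ends here

        TrieNode() : WordCount(0) {}
        ~TrieNode()
        {
            for (std::map<char, TrieNode*>::iterator i = Children.begin();
                 i != Children.end(); ++i)
                delete i->second;
        }

        void Insert(const char *word)
        {
            TrieNode *n = this;
            for (; *word; word++)
            {
                TrieNode *&child = n->Children[*word];
                if (!child)
                    child = new TrieNode;
                n = child;
            }
            n->WordCount++;
        }
    };

The trade-off is per-node overhead and pointer chasing, which is why it needs measuring before replacing the flat hash table.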
(3) Comments | Add Comment

Got Anti-Virus?
Date: 1/4/2005
Man, what a long 3 days it's been. Scribe may be keeping the world safe from viruses, but I need some anti-virus that'll run on my body. On Tuesday at about 3am I woke up feeling oh so not good. I lay there for 15 minutes groggily wondering "huh?", then the chucking started... and didn't stop for 17 hours or so. Aggh. Not fun. In the meantime both adults in the house were sick and there was no one available to look after the children, so we had to do it ourselves. I think the defining moment was at about 11am, when I was crawling down the hall as fast as I could go and was overtaken by my smiling little 10 month old son, also crawling.

But more to the point, now that I'm feeling better I'll update you all on the progress of "The Software(tm)". Firstly, Scribe is almost ready for another test release, and this time it's got a swath of fixes for obscure crash bugs, especially to do with displaying text. These fixes are already in the current release of LgiRes. The Outlook import/export functionality also got some charset fixes. It won't be perfect, but it will suck a whole lot less.

Also, while I'm talking updates, Lgi is due for a release as well. I've got a whole bunch of cool stuff and bug fixes incorporated, so I really should finalise that and upload it. I'll also be putting the documentation online this time for Google to index. API documentation is usually much better with a good search engine indexing it, and what better engine than Google? Even if it's only me using it ;)
(2) Comments | Add Comment

Scribe and LgiRes Bugs
Date: 22/3/2005
Firstly, I've just fixed a rather interesting bug in Lgi that is almost humorous in its simplicity. Lgi does glyph substitution when displaying text that can't be rendered in the current font, by borrowing characters (glyphs) from another font installed on the system. This means maintaining lookup tables of characters and which font you can find them in. I implemented that as a table of bytes (64KB), one for every unicode character I wanted to map (0 -> 0xffff), each holding an index into a font cache (0 -> 255). And immediately you can see what my problem was... "what happens when there are more than 256 fonts in the system?" Crash bang splat.

For the moment I've just limited the search through fonts to stop when it runs out of table space. But ultimately I'd like to handle the full unicode range as well as more than 256 fonts, and that would cost more memory, so I'm trying to think of a better solution. Increasing the per-character font index from 8 bits up to, say, 10 bits might fix the font limitation, but it adds overhead to the glyph sub code for setting/getting non byte aligned bits of memory. Supporting the whole unicode range for just 256 fonts (8 bit indexes) would be about 1MB... which is a lot of memory to sacrifice for this feature. Thoughts and suggestions are welcome.
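To make the bug and the stop-gap fix concrete, here's a sketch of the table; the names are invented for illustration, not Lgi's actual code:

    #include <stdint.h>

    // One byte per BMP character = 64KB, each entry holding a font
    // cache index. 0 is reserved for "no substitute known", so only
    // 255 fonts fit; the original bug was letting the font index run
    // past what the byte could hold.
    static uint8_t GlyphMap[0x10000];

    // Record that the font at cache position 'font' can display 'ch'.
    // The fix: return false once the index space is exhausted instead
    // of wrapping past 255 and corrupting the table.
    bool MapGlyph(uint32_t ch, uint32_t font)
    {
        if (ch > 0xffff)   // outside the mapped range (BMP only)
            return false;
        if (font >= 255)   // out of table space: stop searching here
            return false;
        if (!GlyphMap[ch]) // first font found wins
            GlyphMap[ch] = (uint8_t)(font + 1);
        return true;
    }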

On another note, LgiRes has a new version out to cope with the new Scribe lr8 format, but it crashes on Win9x/ME. If you'd like to help, download this and unzip it into a v1.80 install. Run (and crash) LgiRes, gather any .txt files it creates, and send them to me. The Scribe lr8 file has been getting corrupted utf-8 strings, and the new version of LgiRes does some consistency checking. If you intend to do some translation work, I suggest waiting for the next builds of Scribe (test8) and LgiRes and using those as a base for your work, because I've fixed all (most?) of the corrupted strings in the lr8 file.
(3) Comments | Add Comment