The iPad – Less than it should have been

It has to be said that, in my search for Gadget Nirvana, I’ve frequently flip-flopped over which vendor or sector was more likely to provide the ultimate, all in one solution to … well … everything!

Circa 5 years ago my money was on open source (hardware, software, flapjacks, whatever!) to combine my massive list of entertainment, communication and connectivity requirements into one, affordable, hackable, workable device.

With the purchase of my first MacBook Pro I shifted over to Apple as the possible saviour of we the fickle, techno devout members of the Nintendo generation.

With the release of Android, Google OS, the Nexus One and their awesome web-based tool kit, my allegiances shifted to Mountain View for a while – they promised the answer to all my needs (well, almost all) in a shiny fun a clearly “Not Evil” package.

And now?

Read more

ColdFusion 8.0.1 – server stability and memory usage issue

The following is a log of a recurring issue with our ColdFusion 8 Standard server.

The JRun server is currently crashing every other day with the active thread error (error 1).

If the ColdFusion service is not restarted the system will then proceed to run out of memory (error 2) despite Windows still having 0.5 – 1GB to spare

A couple of things worth noting:

  • The server normally runs with a thread count of 170 – 190
  • When a crash is in progress this jumps to > 400
  • There doesn’t appear to be a specific script or schedule to the crashes
  • When tweaking the JVM Config attempts to assign more than 1024MB of heap space result in failure to start JRun

Preventative action

  • We’ve switched the server to use Oracle’s JRockit JVM due to the monitoring tools available with this VM
  • We’ve tuned the heap to use 1024MB static heap size with a 100M nursery (see VM Config below)
  • We’re using an aggressively tuned GC (see VM Config below)

Questions

  • Why won’t the JVM address more than 1024MB of RAM? Is this a limitation of 32 bit Windows / Java?
  • Why doesn’t the GC handle the thread count before it gets out of hand?
  • Is this expected from a 32 bit server once load reaches a certain level?
  • Is there any way to identify the CF scripts that are causing the thread count to climb?
  • Would switching to 64bit OS address these issues?

Answers

  • On a postcard please …

Read more

Don’t use Query of Queries for simple datasets!

I’ve just finished a whole batch of optimisation routines for an Excel import/export that had to run on our CF8 based CRM system (thanks to Ben Nadel for his amazing POI CFC).

The export had a tricky requirement whereby each row (of some 56,000 rows) had n items which needed to be set according to a lookup in a link table in the DB. Rather that running each compairson as a query against the live DB (56,000 rows x 20+ items = 1,120,000 queries!) I figured I could pull the whole link table into memory using a simple query and then use Query of Queries to save a bit of time.

Trouble is, benchmarking the initial run, the QoQ was restricting throughput to some 500 rows a minute and, whilst this was an improvement on querying the live DB server, it was still way too slow.

Bear in mind the queries were on 2 integer based primary keys. E.g:

SELECT col1
FROM linkTable
WHERE col1 = #val1#
AND col2 = #val2#

So it occurred to me that I could look at pulling the whole link table into an array as a series of values in the form “val1-val2″ and using the indexOf method to determine if the link was present. The results were amazing! Throughput jumped from 500 a minute to 15,000 a minute and memory usage dropped massively despite in the new array variable holding all values from the link table.

I decided to benchmark Arrays vs. Query of Queries over a series of data set sizes running from 1000 to 10,000 in 1000 item increments and recorded the following results:

Seeding source query done in 7812ms

Val
Count
Array
Seed
Array
Query
Array
Total
QofQ
Total
Ratio
(Array to QoQ)
Total
1000 578ms 63ms 641ms 3703ms 17% 4344ms
2000 390ms 172ms 562ms 7672ms 7% 8234ms
3000 438ms 531ms 969ms 12281ms 8% 13250ms
4000 453ms 1125ms 1578ms 16547ms 10% 18125ms
5000 422ms 2172ms 2594ms 22515ms 12% 25109ms
6000 406ms 3484ms 3890ms 27235ms 14% 31125ms
7000 343ms 5485ms 5828ms 32890ms 18% 38718ms
8000 359ms 8282ms 8641ms 39062ms 22% 47703ms
9000 344ms 12375ms 12719ms 45046ms 28% 57765ms
10000 344ms 19703ms 20047ms 52062ms 39% 72109ms

Run of 10 iterations completed in 324.31 seconds

As you can see, the difference in processing times is rather amazing!

Of course QoQ is still a better choice when dealing with “proper” tables rather than 2 value link tables but this technique is definitely going into my “batch” toolkit for CF8.

Things to bear in mind when considering the above

  • Whilst the differences in processing times are significant this is still a pretty slow process with a 10,000 matches against 10,000 rows taking 20 seconds using the array method.
  • This benchmark was performed on our dev server, a quad core Xeon 1.60GHz with 1GB of RAM
  • This is only a portion of the processing that needed to be done on the original report.
  • I’ve also run this benchmark on my local workstation which runs CF9 and found the results to be similar

For those who like this kind of thing the benchmark script is included below

<cfset scriptStartTime = gettickcount() />
<cfflush interval="1" />

<!--- Turn on some debug and increase timeout --->
<cfsetting showdebugoutput="true" requestTimeOut="6000" />
<cfoutput>
<html>
<head>
<title>Array vs Query of Query Benchmark</title>
</head>
<body>

<!--- Set working Directory --->
<cfset workingDir = GetDirectoryFromPath(getcurrenttemplatepath()) />

<!--- Test for a file to allow us to kill this script mid run --->
<cfif fileExists("#workingDir#kill.die")>
	<p>Die File Found - Aborting</p><cfabort />
</cfif>

<!--- Setup the seed query --->
<cfset demoQuery = queryNew('col1, col2') />
<p>Seeding source query
<!--- Seed our source Query with 100,000 rows --->
<cfset tickBegin = GetTickCount() />
<cfloop from="1" to="100000" index="i">

	<!--- Generate 2 random values --->
	<cfset rand1 = randRange(1,999) />
	<cfset rand2 = randRange(1,99999) />

	<!--- Add new row to our source query --->
	<cfset QueryAddRow(demoQuery) />
	<cfset demoQuery["col1"][demoQuery.RecordCount] = rand1 />
	<cfset demoQuery["col2"][demoQuery.RecordCount] = rand2 />
</cfloop>

<cfset tickEnd = GetTickCount() /><cfset loopTime = tickEnd - tickBegin />

done in #loopTime#ms</p>

<!--- Setup our demo array --->
<cfset demoArray = arrayNew(1) />
<table border="1" cellspacing="0" cellpadding="2">
	<tr>
		<th>Val Count</th>
		<th>Array Seed</th>
		<th>Array Query</th>
		<th>Array Total</th>
		<th>QofQ Total</th>
		<th>Ratio (array to QoQ)</th>
		<th>Total</th>
	</tr>

	<cfset loopCount = 0 />
<!--- Start to loop to 10,000 at 1,000 intervals --->
<cfloop from="1000" to="10000" step="1000" index="i">
	<cfset iterationStartTime = gettickcount() />
	<tr>
	<!--- Test to see if the kill file has been created --->
	<cfif fileExists("#workingDir#kill.die")>
		<td colspan="7">Die File Found - Aborting</td></tr></table>
		<cfabort />
	</cfif>
	<td>#i#</td>

	<cfset arrayTickBegin = GetTickCount() />

	<cfquery name="getDemoDataSet" dbtype="query" maxrows="#i#">
		SELECT *
		FROM demoQuery
	</cfquery>

	<!--- Populate our demo array --->
	<cfloop query="getDemoDataSet">
		<cfset arrayValue = "#getDemoDataSet.col1#-#getDemoDataSet.col2#" />
		<cfset arrayAppend(demoArray,arrayValue) />
	</cfloop>
	<cfset tickEnd = GetTickCount() />
	<cfset popArrayTime = tickEnd - arrayTickBegin />

	<!--- Search the array--->
	<cfset arraySearchTickBegin = gettickcount() />
	<cfloop from ="1" to="#i#" index="s">
		<cfset searchVal1 = randRange(1,999) />
		<cfset searchVal2 = randRange(1,99999) />

		<cfset searchValCat = "#searchVal1#-#searchVal2#" />

		<cfset foundKey = demoArray.indexOf(searchValCat) />

	</cfloop>
	<cfset tickEnd = GetTickCount() />
	<cfset searchArrayTime = tickEnd - arraySearchTickBegin />
	<cfset totalArrayTime = tickend - arrayTickBegin />

	<td>#popArrayTime#ms</td>
	<td>#searchArrayTime#ms</td>
	<td>#totalArrayTime#ms</td>	

	<!---Now to loop through and search our query--->
	<cfset querySearchTickBegin = gettickcount() />
	<cfloop from ="1" to="#i#" index="s">
		<cfset searchVal1 = randRange(1,999) />
		<cfset searchVal2 = randRange(1,99999) />

		<cfquery name="searchQuery" dbtype="query">
			SELECT 	col1
			FROM 	getDemoDataSet
			WHERE 	col1 = #searchVal1#
			AND 	col2 = #searchVal2#
		</cfquery>
	</cfloop>
	<cfset tickEnd = GetTickCount() />
	<cfset searchQueryTime = tickEnd - querySearchTickBegin />
	<cfset iterationTime = tickEnd - iterationStartTime />
	<td>#searchQueryTime#ms</td>

	<!--- Calc the ratio of times --->

	<cfset iterationRatio = round((totalArrayTime/searchQueryTime)*100) />

	<td>#iterationRatio#%</td>

	<td>#iterationTime#ms</td>

	</tr>
	<cfset loopCount++ />
</cfloop>
</table>

<cfset tickEnd = GetTickCount() />
<cfset runTime = (tickEnd - scriptStartTime)/1000 />
<p>Run of #loopCount# iterations completed in #runTime# seconds</p>
</body>
</html>
</cfoutput>

Not one, not two but 6 Servers per Server

For the past 2.5 years I’ve maintained a dedicated server for hosting various sites, both of my own and for clients. I’d been running the same box for a while when, in January, I finally decided I needed to upgrade to a bigger, better machine with an OS I could actually patch (the old server was a Fedora Core 5 machine and Yum was borked beyond repair).

For a previous project I had been looking at various virtualisation options for building a small cloud (cloudlette?) for a managed service and had come across KVM which had taken my interest for several reasons:

  1. It’s free
  2. It’s the chosen virtualisation platform of Ubuntu
  3. It’s pretty easy to setup

So when the option arose to build my new dedicated server I actually took the opportunity to build 6 servers all running Ubuntu JEOS:

  • vm1 is my own “private” space which runs this site (along with others) as well as being my OpenVPN endpoint and a general hack about space. This is also the only ColdFusion server of the lot
  • vm2 has been resold as a VPS to a friend of mine who hosts a few sites.
  • vm3 and vm4 are dedicated to individual clients who need a full vm of hosting power.
  • vm5 is the general shared host for all of the small sites and freebies I host. This is the only server with FTP access.
  • vm6 (aka mx) is my mail relay server which keeps the loads down on the other 5 and also relays mail for various sites / applications.

This setup has a number of advantages – I can keep my high worth clients separate from the little and freebies in terms of resource usage and potential security issues whilst not having to run to the expense of hosting multiple physical servers. I can also manage / patch the servers as a group using Cluster SSH and the rather snazzy virt-viewer app under Ubuntu and I can install specific packages for the big clients without worrying about bloating my own sites.

If you’re interested in running KVM you could do worse than grab yourself a copy of Ubuntu Server and read through the documentation (and the manual) on their website.

Please bear in mind I was able to do this from the initial build – trying to add virtualisation (certainly KVM) to anything other than a clean, updated install is very likely to end in tears!