OSGalaxy

published by noreply@blogger.com (milek) on 2010-02-28 14:23:09
I didn't know that Windows has a technology similar to ZFS L2ARC, called ReadyBoost. Nice.

I'm building my new home NAS server and I'm seriously considering putting the OS on a USB pen drive, leaving all SATA disks for data only. It looks like with modern USB drives the OS should actually boot faster than from SATA disks thanks to much better seek times. I'm planning on doing some experiments first.



published by noreply@blogger.com (milek) on 2010-02-25 06:27:16
When you create a ZFS volume its write cache is disabled by default meaning that all writes to the volume will be synchronous. Sometimes it might be handy though to be able to enable a write cache for a particular zvol. I wrote a small C program which allows you to check if WC is enabled or not. It also allows you to enable or disable write cache for a specified zvol.

First let's check if the write cache is disabled for the zvol rpool/iscsi/vol1:

milek@r600:~/progs# ./zvol_wce /dev/zvol/rdsk/rpool/iscsi/vol1
Write Cache: disabled

Now let's issue 1000 writes:

milek@r600:~/progs# ptime ./sync_file_create_loop /dev/zvol/rdsk/rpool/iscsi/vol1 1000

real 12.013566363
user 0.003144874
sys 0.104826470

So it took 12s and I also confirmed that the writes were actually being issued to a disk drive. Let's enable the write cache now and repeat the 1000 writes:

milek@r600:~/progs# ./zvol_wce /dev/zvol/rdsk/rpool/iscsi/vol1 1
milek@r600:~/progs# ./zvol_wce /dev/zvol/rdsk/rpool/iscsi/vol1
Write Cache: enabled

milek@r600:~/progs# ptime ./sync_file_create_loop /dev/zvol/rdsk/rpool/iscsi/vol1 1000

real 0.239360231
user 0.000949655
sys 0.019019552

Worked fine.

The zvol_wce program is not idiot-proof and it doesn't check if the operation succeeded or not. You should be able to compile it by issuing: gcc -o zvol_wce zvol_wce.c

milek@r600:~/progs# cat zvol_wce.c

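/* Robert Milkowski
   http://milek.blogspot.com
*/

#include <stdio.h>      /* printf */
#include <stdlib.h>     /* atoi, exit */
#include <unistd.h>
#include <stropts.h>    /* ioctl */
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <sys/dkio.h>   /* DKIOCGETWCE, DKIOCSETWCE */


int main(int argc, char **argv)
{
    char *path;
    int wce = 0;
    int rc;     /* ioctl return code, deliberately not checked */
    int fd;

    path = argv[1];

    if ((fd = open(path, O_RDONLY|O_LARGEFILE)) == -1)
        exit(2);

    if (argc > 2) {
        /* second argument given: non-zero enables the write cache, 0 disables it */
        wce = atoi(argv[2]) ? 1 : 0;
        rc = ioctl(fd, DKIOCSETWCE, &wce);
    }
    else {
        /* no second argument: just report the current setting */
        rc = ioctl(fd, DKIOCGETWCE, &wce);
        printf("Write Cache: %s\n", wce ? "enabled" : "disabled");
    }

    close(fd);
    exit(0);
}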



published by noreply@blogger.com (milek) on 2010-02-16 03:03:05

published by noreply@blogger.com (milek) on 2010-02-12 06:25:08

published by noreply@blogger.com (milek) on 2010-02-10 08:07:01
Third-party drives not permitted:
"[...]
Is Dell preventing the use of 3rd-party HDDs now?
[....]
Howard_Shoobe at Dell.com:

Thank you very much for your comments and feedback regarding exclusive use of Dell drives. It is common practice in enterprise storage solutions to limit drive support to only those drives which have been qualified by the vendor. In the case of Dell's PERC RAID controllers, we began informing customers when a non-Dell drive was detected with the introduction of PERC5 RAID controllers in early 2006. With the introduction of the PERC H700/H800 controllers, we began enabling only the use of Dell qualified drives. There are a number of benefits for using Dell qualified drives in particular ensuring a positive experience and protecting our data. While SAS and SATA are industry standards there are differences which occur in implementation. An analogy is that English is spoken in the UK, US and Australia. While the language is generally the same, there are subtle differences in word usage which can lead to confusion. This exists in storage subsystems as well. As these subsystems become more capable, faster and more complex, these differences in implementation can have greater impact. Benefits of Dell's Hard Disk and SSD drives are outlined in a white paper on Dell's web site at http://www.dell.com/downloads/global/products/pvaul/en/dell-hard-drives-pov.pdf"

I understand they won't support 3rd party disk drives but blocking a server (a RAID card) from using such disks is something new - an interesting comment here.


published by noreply@blogger.com (milek) on 2010-02-09 16:29:39
From my own experience their sales people are very aggressive, with an attitude of sell first and let someone else worry later. While I always take any vendor's claims with a grain of salt, I have learnt to double or even triple check any of IBM's claims.



published by noreply@blogger.com (milek) on 2010-02-09 03:53:45
Now let's wait for some benchmarks. I only wish Solaris ran on them as well; right now you need to go either the legacy AIX route or the not-so-mature Linux route, which is not an ideal choice.


published by noreply@blogger.com (milek) on 2010-02-05 04:32:35
We came across an interesting issue with data corruption and I think it might be interesting to some of you. While preparing a new cluster deployment and filling it up with data we suddenly started to see the messages below:

XXX cl_runtime: [ID 856360 kern.warning] WARNING: QUORUM_GENERIC: quorum_read_keys error:
Reading the registration keys failed on quorum device /dev/did/rdsk/d7s2 with error 22.

The d7 quorum device was marked as being offline and we could not bring it online again. There isn't much in the documentation about the above message except that it is probably a firmware problem on a disk array and we should contact the vendor. But let's first investigate what is really going on.

By looking at the source code I found that the above message is printed from within quorum_device_generic_impl::quorum_read_keys() and it will only happen if quorum_pgre_key_read() returns with return code 22 (actually with any code other than 0 or EACCES, but from the syslog message we already suspect that the return code is 22).

The quorum_pgre_key_read() calls quorum_scsi_sector_read() and passes its return code as its own. The quorum_scsi_sector_read() will return with an error only if quorum_ioctl_with_retries() returns with an error or if there is a checksum mismatch.

This is the relevant source code:

int
quorum_scsi_sector_read(
[...]
    error = quorum_ioctl_with_retries(vnode_ptr, USCSICMD, (intptr_t)&ucmd,
        &retval);
    if (error != 0) {
        CMM_TRACE(("quorum_scsi_sector_read: ioctl USCSICMD "
            "returned error (%d).\n", error));
        kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
        return (error);
    }

    //
    // Calculate and compare the checksum if check_data is true.
    // Also, validate the pgres_id string at the beg of the sector.
    //
    if (check_data) {
        PGRE_CALCCHKSUM(chksum, sector, iptr);

        // Compare the checksum.
        if (PGRE_GETCHKSUM(sector) != chksum) {
            CMM_TRACE(("quorum_scsi_sector_read: "
                "checksum mismatch.\n"));
            kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
            return (EINVAL);
        }

        //
        // Validate the PGRE string at the beg of the sector.
        // It should contain PGRE_ID_LEAD_STRING[1|2].
        //
        if ((os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING1,
            strlen(PGRE_ID_LEAD_STRING1)) != 0) &&
            (os::strncmp((char *)sector->pgres_id, PGRE_ID_LEAD_STRING2,
            strlen(PGRE_ID_LEAD_STRING2)) != 0)) {
            CMM_TRACE(("quorum_scsi_sector_read: pgre id "
                "mismatch. The sector id is %s.\n",
                sector->pgres_id));
            kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);
            return (EINVAL);
        }

    }
    kmem_free(ucmd.uscsi_rqbuf, (size_t)SENSE_LENGTH);

    return (error);
}

With a simple DTrace script I could verify if the quorum_scsi_sector_read() does indeed return with 22 and also I could print what else is going on within the function:

56 -> __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555744942019 enter
56 -> __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555744957176 enter
56 <- __1cZquorum_ioctl_with_retries6FpnFvnode_ilpi_i_ 6308555745089857 rc: 0
56 -> __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745108310 enter
56 -> __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745120941 enter
56 -> __1cCosHsprintf6FpcpkcE_v_ 6308555745134231 enter
56 <- __1cCosHsprintf6FpcpkcE_v_ 6308555745148729 rc: 2890607504684
56 <- __1cNdbg_print_bufLdbprintf_va6Mbpcrpv_v_ 6308555745162898 rc: 1886718112
56 <- __1cNdbg_print_bufIdbprintf6MpcE_v_ 6308555745175529 rc: 1886718112
56 <- __1cXquorum_scsi_sector_read6FpnFvnode_LpnLpgre_sector_b_i_ 6308555745188599 rc: 22

From the above output we know that quorum_ioctl_with_retries() returns 0, so the EINVAL must come from one of the data checks, most likely a checksum mismatch! As CMM_TRACE() is being called above and there are only three of them in the code, let's check with DTrace which one it is:

21 -> __1cNdbg_print_bufIdbprintf6MpcE_v_ 6309628794339298 quorum_scsi_sector_read: checksum mismatch.

So now I knew exactly which part of the code was causing the quorum device to be marked offline. The issue might have been caused by many things: a bug in the disk array firmware, a problem on the SAN, a bug in an HBA's firmware, a bug in the qlc driver, a bug in the SC software, or... However, because the issue suggests data corruption and we are loading the cluster with a copy of a database, we might have a bigger issue than just an offline quorum device.

The configuration is such that we are using ZFS to mirror between two disk arrays. We had been restoring a couple of TBs of data into the pool and had read almost nothing back. Thankfully it is ZFS, so we can force a re-check of all data in the pool (a scrub), and I did. ZFS found 14 corrupted blocks and even identified which file was affected. The interesting thing here is that for all blocks both copies, on both sides of the mirror, were affected. This almost eliminates the possibility of a firmware problem on the disk arrays and suggests that the issue was caused by something misbehaving on the host itself. There is still a possibility of an issue on the SAN as well. It is very unlikely to be a bug in ZFS, as the corruption also affected the reservation keys, which have basically nothing to do with ZFS at all.

Since then we have kept writing more and more data into the pool and I keep repeating scrubs, but I'm not getting any new corrupted blocks, nor is the quorum device misbehaving (I fixed it by temporarily adding another quorum device, removing the original, re-adding it, and then removing the temporary one).

While I still have to find what caused the data corruption, the most important thing here is ZFS. Just think about it - what would have happened if we were running on any other file system like UFS, VxFS, ext3, ext4, JFS, XFS, ...? Well, almost anything: some data could be corrupted, some files lost, the system could crash, fsck could be forced to run for many hours and still not be able to fix the filesystem, and it definitely wouldn't be able to detect any data corruption within files. Or everything would be running fine for days or months and then suddenly the system would panic when an application tried to access the corrupted blocks for the first time.

Thanks to ZFS, what has actually happened? All corrupted blocks were identified. Unfortunately both mirrored copies were affected so ZFS can't fix them, but it did identify the single file which was affected by all these blocks. We can just remove the file, which is only 2GB, and restore it again. And all of this while the system was running: we haven't even stopped the restore and didn't have to start from the beginning. Most importantly, there is no uncertainty about the state of the filesystem or the data within it.

The other important conclusion is that DTrace is a sysadmin's best friend :)





published by noreply@blogger.com (milek) on 2010-01-21 03:36:33
The European Commission clears Oracle's proposed acquisition of Sun Microsystems:
"The European Commission has approved under the EU Merger Regulation the proposed acquisition of US hardware and software vendor Sun Microsystems Inc. by Oracle Corporation, a US enterprise software company. After an in-depth examination, launched in September 2009 (see IP/09/1271 ), the Commission concluded that the transaction would not significantly impede effective competition in the European Economic Area (EEA) or any substantial part of it."



published by noreply@blogger.com (milek) on 2010-01-15 05:00:56
I need to observe MySQL load from time to time and DTrace is one of the tools to use. Usually I'm using one-liners or I come up with a short script. This time I thought it would be nice to write a script which other people, like DBAs, could use without understanding how it actually works. The script prints basic statistics for each client connecting to a database. It gives a nice overview of which clients are using the database and how.

CLIENT IP          CONN CONN/s    QRS  QRS/s   TIME  VTIME
10.10.10.25           7      0     48      0      0      0
10.10.10.100          8      0  14789    146      0      0
xx-cms-1.portal      17      0     88      0      1      0
xx-www-11.portal     18      0     53      0      0      0
10.10.10.23          36      0    183      1      1      0
xx-www-12.portal     56      0    133      1      0      0
xx-www-8.portal      75      0    216      2      0      0
xx-www-6.portal      97      0    312      3      2      1
xx-www-5.portal     113      1    550      5      0      0
xx-www-2.portal     122      1   1095     10      0      0
xx-www-1.portal     129      1    529      5      0      0
xx-www-10.portal    136      1    414      4      0      0
xx-www-9.portal     169      1    381      3      0      0
xx-www-4.portal     180      1    510      5      2      2
xx-www-7.portal     237      2    574      5     41     40
xx-www-3.portal     363      3   1027     10      3      2
                  =====  =====  =====  =====  =====  =====
                   1748     17  20865    206     57     50
Running for 101 seconds.

CONN total number of connections
CONN/s average number of connections per second
QRS total number of queries
QRS/s average number of queries per second
TIME total clock time in seconds for all queries
VTIME total CPU time in seconds for all queries

If values of VTIME are very close to values of TIME it means that queries are mostly CPU bound. On the other hand, the bigger the difference between them, the more time is spent on I/O. Another interesting thing to watch is how evenly the load is coming from different clients, especially in environments where the clients are identical web servers behind a load balancer and should be generating about the same traffic to the database.

All values are measured since the script was started. There might be some discrepancies with totals in the summary line - this is due to rounding errors. The script should work for MySQL versions 5.0.x, 5.1.x and perhaps for other versions as well. The script doesn't take into account connections made over a socket file - only tcp/ip connections.

The script requires PID of a mysql database as its first argument and a frequency at which output should be refreshed as a second argument, for example to monitor mysql instance with PID 12345 and refresh output every 10s:

./mysql_top.d 12345 10s


# cat mysql_top.d
#!/usr/sbin/dtrace -qCs

/*
Robert Milkowski
*/

#pragma D option dynvarsize=100000

#define CLIENTS self->client_ip == "10.10.10.11" ? "xx-www-1.portal" : \
self->client_ip == "10.10.10.12" ? "xx-www-2.portal" : \
self->client_ip == "10.10.10.13" ? "xx-www-3.portal" : \
self->client_ip == "10.10.10.14" ? "xx-www-4.portal" : \
self->client_ip == "10.10.10.15" ? "xx-www-5.portal" : \
self->client_ip == "10.10.10.16" ? "xx-www-6.portal" : \
self->client_ip == "10.10.10.17" ? "xx-www-7.portal" : \
self->client_ip == "10.10.10.18" ? "xx-www-8.portal" : \
self->client_ip == "10.10.10.19" ? "xx-www-9.portal" : \
self->client_ip == "10.10.10.20" ? "xx-www-10.portal" : \
self->client_ip == "10.10.10.21" ? "xx-www-11.portal" : \
self->client_ip == "10.10.10.22" ? "xx-www-12.portal" : \
self->client_ip == "10.10.10.29" ? "xx-cms-1.portal" : \
self->client_ip


BEGIN
{
start = timestamp;
total_queries = 0;
total_conn = 0;
total_time = 0;
total_vtime = 0;

}

syscall::getpeername:entry
/ pid == $1 /
{
self->in = 1;

self->arg0 = arg0; /* int s */
self->arg1 = arg1; /* struct sockaddr * */
self->arg2 = arg2; /* size_t len */
}

syscall::getpeername:return
/ self->in /
{
this->len = *(socklen_t *) copyin((uintptr_t)self->arg2, sizeof(socklen_t));
this->socks = (struct sockaddr *) copyin((uintptr_t)self->arg1, this->len);
/* sa_data is an array of char, so mask each byte to 0-255 to avoid sign extension */
this->hport = (uint_t)(this->socks->sa_data[0]) & 0xff;
this->lport = (uint_t)(this->socks->sa_data[1]) & 0xff;
this->hport <<= 8; this->port = this->hport + this->lport;

this->a1 = lltostr((uint_t)this->socks->sa_data[2] & 0xff);
this->a2 = lltostr((uint_t)this->socks->sa_data[3] & 0xff);
this->a3 = lltostr((uint_t)this->socks->sa_data[4] & 0xff);
this->a4 = lltostr((uint_t)this->socks->sa_data[5] & 0xff);
this->s1 = strjoin(this->a1, ".");
this->s2 = strjoin(this->s1, this->a2);
this->s1 = strjoin(this->s2, ".");
this->s2 = strjoin(this->s1, this->a3);
this->s1 = strjoin(this->s2, ".");
self->client_ip = strjoin(this->s1, this->a4);

@conn[CLIENTS] = count();
@conn_ps[CLIENTS] = count();

total_conn++;

self->arg0 = 0;
self->arg1 = 0;
self->arg2 = 0;
}

pid$1::*mysql_parse*:entry
/ self->in /
{
self->t = timestamp;
self->vt = vtimestamp;

@query[CLIENTS] = count();
@query_ps[CLIENTS] = count();

total_queries++;
}

pid$1::*mysql_parse*:return
/ self->in /
{
@time[CLIENTS] = sum(timestamp-self->t);
@vtime[CLIENTS] = sum(vtimestamp-self->vt);

total_time += (timestamp - self->t);
total_vtime += (vtimestamp - self->vt);

self->t = 0;
self->vt = 0;
}

tick-$2
{
/* clear the screen and move cursor to top left corner */
printf("\033[H\033[J");

this->seconds = (timestamp - start) / 1000000000;

normalize(@conn_ps, this->seconds);
normalize(@query_ps, this->seconds);
normalize(@time, 1000000000);
normalize(@vtime, 1000000000);

printf("%-16s %s %s %s %s %s %s\n", "CLIENT IP", "CONN", "CONN/s", "QRS", "QRS/s", "TIME", "VTIME");
printa("%-16s %@5d %@5d %@5d %@5d %@5d %@5d\n", @conn, @conn_ps, @query, @query_ps, @time, @vtime);
printf("%-16s %s %s %s %s %s %s\n", "", "=====", "=====", "=====", "=====", "=====", "=====");
printf("%-16s %5d %5d %5d %5d %5d %5d\n", "",
total_conn, total_conn/this->seconds, total_queries, total_queries/this->seconds, total_time/1000000000, total_vtime/1000000000);

/*
denormalize(@conn_ps);
denormalize(@query_ps);
denormalize(@total_time);
denormalize(@total_vtime);
*/

printf("Running for %d seconds.\n", this->seconds);
}



published by noreply@blogger.com (milek) on 2010-01-14 08:13:03
When doing MySQL performance tuning on a live server it is often hard to tell what impact a change will have across all queries: by increasing one of the MySQL caches you can make some queries execute faster while others actually get slower. However, depending on your environment, that might not necessarily be a bad thing. For example, in web serving, if most queries execute within 0.1s but some odd queries need 5s to complete, it is generally very bad, as a user would need to wait at least 5s to get a web page. If by some tuning you manage to get these long queries down to below 1s, at the cost of some sub-0.1s queries taking more time but still less than 1s, it would generally be a very good thing to do. Of course in other environments the time requirements might be different, but the principle is the same.

With DTrace it is actually very easy to get a distribution of query execution times for a given MySQL instance over a given period of time.

  1s resolution
           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4700573
               1 |                                         6366
               2 |                                         35
               3 |                                         23
               4 |                                         39
               5 |                                         8
               6 |                                         6
               7 |                                         5
               8 |                                         7
               9 |                                         4
           >= 10 |                                         9

Running for 73344 seconds.

The above histogram shows that 4.7 million queries each took less than 1s to execute, another 6366 queries took between 1s and 2s each, and so on. Now let's do some tuning and see the results again (of course you want to measure for a similar amount of time during a similar period of activity - these are just examples):

  1s resolution
           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 4686051
               1 |                                         2972
               2 |                                         0

Running for 73024 seconds.

That is much better. It is of course very easy to change the resolution of the histogram - but I will leave it for you.

The script requires 2 arguments - PID of a database and how often it should refresh its output, for example in order to get an output every 10s for a database running with PID 12345 run the script as:

./mysql_query_time_distribution.d 12345 10s

The script doesn't distinguish between cached and non-cached queries, it doesn't detect bad (wrong syntax) queries either - however it is relatively easy to extend it to do so (maybe another blog entry one day). It should work fine with all MySQL versions 5.0.x and 5.1.x, possibly with other versions as well.


# cat mysql_query_time_distribution.d
#!/usr/sbin/dtrace -qs


BEGIN
{
start=timestamp;
}

pid$1::*mysql_parse*:entry
{
self->t=timestamp;
}

pid$1::*mysql_parse*:return
/ self->t /
{
@["1s resolution"]=lquantize((timestamp-self->t)/1000000000,0,10);

self->t=0;
}

tick-$2
{
printa(@);
printf("Running for %d seconds.\n", (timestamp-start)/1000000000);
}



published by noreply@blogger.com (milek) on 2010-01-06 20:25:26
Yesterday I was looking at some performance issues with a mysql database. The database is version 5.1.x so there are no built-in DTrace SDT probes, but much can still be done even without them. What I quickly noticed is that mysql was issuing several hundred thousand syscalls per second and most of them were pread()s and read()s. The databases are using the MyISAM engine so mysql does not have a data buffer cache and leaves all the caching to the filesystem. I was interested in how many reads were performed per given query so I wrote a small dtrace script. The script takes as arguments a time after which it will exit and a threshold which represents the minimum number of [p]read()s per query for the query to be printed.

So let's see an example output where we are interested only in queries which cause at least 10000 reads to be issued:
# ./m2.d 60s 10000
### read() count: 64076 ###
SELECT * FROM clip WHERE parent_id=20967 AND type=4 ORDER BY quality ASC

### read() count: 64076 ###
SELECT * FROM clip WHERE parent_id=14319 AND type=4 ORDER BY quality ASC

### read() count: 64076 ###
SELECT * FROM clip WHERE parent_id=20968 AND type=4 ORDER BY quality ASC

There are about 60k entries in the parent_id column, which suggests that mysql is doing a full table scan when executing the above queries. A quick check within mysql revealed that there was no index on the parent_id column, which confirmed it. After the index was created:

# ./m2.d 60s 1
[filtered out all unrelated queries]
### read() count: 6 ###
SELECT * FROM clip WHERE parent_id=22220 AND type=4 ORDER BY quality ASC

### read() count: 8 ###
SELECT * FROM clip WHERE parent_id=8264 AND type=4 ORDER BY quality ASC

### read() count: 4 ###
SELECT * FROM clip WHERE parent_id=21686 AND type=4 ORDER BY quality ASC

### read() count: 4 ###
SELECT * FROM clip WHERE parent_id=21687 AND type=4 ORDER BY quality ASC

So now each query is issuing roughly four orders of magnitude fewer read()s (from about 64000 down to fewer than 10)!

Granted, all these reads were satisfied from the ZFS ARC cache, but the index still saves hundreds of thousands of unnecessary context switches and memory copies, making the queries *much* quicker to execute and saving valuable CPU cycles. The real issue I was working on was a little bit more complicated but you get the idea.

The point I'm trying to make here is that although MySQL lacks good tools to analyze its workload you have a very powerful tool called dtrace which allows you to relatively quickly identify what queries are causing an issue and why. And all of that on a running live service without having to reconfigure or restart mysql. I know there is the MySQL Query Analyzer (or whatever it is called) but it requires a mysql proxy to be deployed... In this case it was much quicker and easier to use dtrace.

Below you will find the script. Please note that I hard-coded the PID of the database and that the script could be cleaned up, etc. - it is the working copy I used. The script can be easily modified to provide lots of additional useful information or it can be limited to only a specific MyISAM file, etc.

# cat m2.d
#!/usr/sbin/dtrace -qs

#pragma D option strsize=8192


pid13550::*mysql_parse*:entry
{
self->a=1;
self->query=copyinstr(arg1);
self->count=0;

}

pid13550::*mysql_parse*:return
/ self->a && self->count > $2 /
{
printf("### read() count: %d ###\n%s\n\n", self->count, self->query);

self->a=0;
self->query=0;

}

pid13550::*mysql_parse*:return
/ self->a /
{
self->a=0;
self->query=0;
}

syscall::*read*:entry
/ self->a /
{
self->count++;
}

tick-$1
{
exit(0);
}



published by noreply@blogger.com (milek) on 2010-01-05 02:40:35
PSARC/2009/511 zpool split:
OVERVIEW:

Some practices in data centers are built around the use of a volume
manager's ability to clone data. An administrator will attach a set of
disks to mirror an existing configuration, wait for the resilver to
complete, and then physically detach and remove those disks to a new
location.

Currently in zfs, the only way to achieve this is by using zpool offline
to disable a set of disks, zpool detach to permanently remove them after
they've been offlined, move the disks over to a new host, zpool
force-import of the moved disks, and then zpool detach the disks that were
left behind.

This is cumbersome and prone to error, and even then the new pool
cannot be imported on the same host as the original.

PROPOSED SOLUTION:

Introduce a "zpool split" command. This will allow an administrator to
extract one disk from each mirrored top-level vdev and use them to create
a new pool with an exact copy of the data. The new pool can then be
imported on any machine that supports that pool's version.
The new feature should be available in build 131.




published by noreply@blogger.com (milek) on 2009-12-28 06:28:06
Adam Leventhal wrote an interesting article about the future of RAID.


published by noreply@blogger.com (milek) on 2009-12-17 15:30:54
Yesterday I did a presentation at the London OpenSolaris User Group on the backup platform I implemented. It uses open source technologies like OpenSolaris, ZFS and rsync on commodity hardware to give us a better backup solution than NetBackup at a fraction of the cost. You can download the slides here. Before you do so it might be worth reading my two previous blog entries: 1 2 which should provide some additional background.

