[ltp] Slightly OT: Filesystem performance on SSD vs HDD: test data request

Joerg Bruehe linux-thinkpad@linux-thinkpad.org
Mon, 17 Aug 2009 22:05:32 +0200


Richard,


I don't have any SSD drive, so I can't help you with measurements.

However, I'm sceptical whether your test is representative of
PostgreSQL (or any other DBMS):

Richard Neill wrote:
> Dear All,
>
> I've been looking into filesystem performance for postgresql on various
> configurations. [[...]]
>
> * Postgresql's main issue is when it is doing writes, and is frequently
> very very much i/o-bound. The important measurement is fdatasync()
> speed, not write throughput.

Ok, that makes sense, because any DBMS running a transaction needs to
have at least the log entries written to stable storage before it
replies "ok" to a commit.

>
> [[...]]
>
>     Test                  Time on X60    Time on T60p
>
>     hdparm -t             94 MB/s        48 MB/s
>
>     syncspeed (ext2)      3.71 s         0.78 s
>     syncspeed (ext3)      11.4 s         2.1 s
>     syncspeed (ext4)      5.9 s          n/a
>

I don't know the internals of PostgreSQL - does it use the file system
to store the data? I'll comment on the times below.

>
> The T60p is running a kernel which doesn't support ext4
> syncspeed is a simple c program - see below

Taking the essential part of your program:

> #define NUM_ITER 1024
> [[...]]
> 	const char data[] = "Liberate";
> 	size_t data_len = strlen ( data );
> 	[[...]]
>
> 	for ( i = 0 ; i < NUM_ITER ; i++ ) {
> 		if ( write ( fd, data, data_len ) != data_len ) {
> 			fprintf ( stderr, "Could not write: %s\n",
> 				  strerror ( errno ) );
> 			exit ( 1 );
> 		}
> 		if ( fdatasync ( fd ) != 0 ) {
> 			fprintf ( stderr, "Could not fdatasync: %s\n",
> 				  strerror ( errno ) );
> 			exit ( 1 );
> 		}
> 	}
> 	return 0;

So your program runs a loop 1024 times; each iteration writes eight (8)
bytes of data to the file and then syncs it.

I would be very surprised if PostgreSQL (or any other DBMS) really
issued writes as short as that in typical operation.
IMO, you should determine your typical log entry size and use that in
the test (assuming the data pages themselves are cached and written
later, so their write latency doesn't really affect the transaction
turnaround time).
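
Just to illustrate what I mean - the 8 kB record size below is purely
an assumption of mine, not a value taken from PostgreSQL - the modified
test could look roughly like this:

/* syncspeed_big.c - sketch only; RECORD_SIZE is an assumed "typical
 * log entry" size, not anything PostgreSQL actually uses.
 * Build: gcc -O2 -o syncspeed_big syncspeed_big.c
 */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define NUM_ITER    1024
#define RECORD_SIZE 8192	/* assumption: one "log record" per sync */

int main ( void )
{
	static char data[RECORD_SIZE];
	int fd, i;

	memset ( data, 'L', sizeof ( data ) );

	fd = open ( "testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644 );
	if ( fd < 0 ) {
		fprintf ( stderr, "Could not open: %s\n", strerror ( errno ) );
		exit ( 1 );
	}

	for ( i = 0 ; i < NUM_ITER ; i++ ) {
		/* write one record, then force it to stable storage */
		if ( write ( fd, data, sizeof ( data ) ) !=
		     (ssize_t) sizeof ( data ) ) {
			fprintf ( stderr, "Could not write: %s\n",
				  strerror ( errno ) );
			exit ( 1 );
		}
		if ( fdatasync ( fd ) != 0 ) {
			fprintf ( stderr, "Could not fdatasync: %s\n",
				  strerror ( errno ) );
			exit ( 1 );
		}
	}

	close ( fd );
	return 0;
}

Even with the bigger records every iteration still has to wait for the
fdatasync, so on rotating media the result should be dominated by
rotational latency rather than by transfer time.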

Also, some DBMS are prepared to write to the raw disk, and for this they
use blocks which are a multiple of the disk block size (AFAIK, typically
4 kB, 8 kB, or 16 kB) and aligned (in memory) at an address which is
also such a multiple. This often gives an enormous speedup of the
transfer (it avoids buffering in the kernel and allows a direct copy).
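
Only as a sketch (the 4 kB block size is an assumption of mine, O_DIRECT
needs _GNU_SOURCE on Linux, and not every kernel/file system combination
accepts it), such an aligned direct write could look like this:

/* directwrite.c - sketch of one aligned write with O_DIRECT.
 * Build: gcc -O2 -o directwrite directwrite.c
 */
#define _GNU_SOURCE		/* for O_DIRECT on Linux */
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BLOCK_SIZE 4096		/* assumed disk block size */

int main ( void )
{
	void *buf;
	int fd;

	/* O_DIRECT requires buffer address, file offset and transfer
	 * size to be multiples of the (logical) block size */
	if ( posix_memalign ( &buf, BLOCK_SIZE, BLOCK_SIZE ) != 0 ) {
		fprintf ( stderr, "Could not allocate aligned buffer\n" );
		exit ( 1 );
	}
	memset ( buf, 'L', BLOCK_SIZE );

	fd = open ( "testfile", O_WRONLY | O_CREAT | O_TRUNC | O_DIRECT,
		    0644 );
	if ( fd < 0 ) {
		fprintf ( stderr, "Could not open: %s\n", strerror ( errno ) );
		exit ( 1 );
	}

	/* this write bypasses the page cache and goes to the device */
	if ( write ( fd, buf, BLOCK_SIZE ) != BLOCK_SIZE ) {
		fprintf ( stderr, "Could not write: %s\n", strerror ( errno ) );
		exit ( 1 );
	}

	close ( fd );
	free ( buf );
	return 0;
}

A real DBMS would presumably query the device for its block size and
reuse a pool of such aligned buffers instead of allocating one per
write.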


I guess changing your test program accordingly would make it represent
the PostgreSQL workload better.


Regarding your times: You report that on a 7200 rpm disk your program
took 0.78 s (2.1 s) when run on an ext2 (ext3) file system.
7200 rpm is 120 revolutions per second, so about 8.3 ms per revolution.
0.78 s (2.1 s) for 1024 iterations means about 0.76 ms (2.05 ms) per
write-plus-fdatasync.

If every fdatasync really had to wait for the same disk block to come
around again, 1024 iterations should take on the order of
1024 * 8.3 ms, roughly 8.5 s - far more than you measured.

I fail to see how any file system can complete repeated short writes to
the same block in less time than one disk revolution. My only
explanation is a write cache somewhere, hopefully in the drive, and
hopefully one whose dirty data still reaches the platters even in the
case of a crash or a power loss.


Regards,
Jörg

--
Joerg Bruehe  - persoenliche Aeusserung / speaking only for himself
mailto:joerg.bruehe@web.de