[ltp] Thinkpad T43 cached read Speed(hdparm -T) mystery

Wed, 25 Jun 2008 14:05:48 +0200


On Wednesday, 25.06.2008, at 06:52 -0400, jqian@physics.harvard.edu
wrote:
[...]
> update), with vanilla 2.6.18 kernel, hdparm -T shows about 1800MB/sec.
> However, now I'm in Hardy, with Ubuntu 2.6.24-18 kernel, hdparm -T
> shows less than 900MB/sec. The disk read speed (hdparm -t) shows the
> same result (about 37-38MB/sec).
> 	My questions are:
> 	0. If you have a T43(p) with a comparable setup, what's your
> number?
I don't have a T43p. On a 2GHz Core 2 Duo (T7200, FSB667), I get
approximately 2200 MB/sec with kernel 2.6.25, an x86-64 kernel built
with CONFIG_MCORE2.

> 	1. Does this mean my computer's bus speed is suddenly cut in
> half, as seen by Linux? What is the performance implication?
> 	2. What exactly does hdparm -T measure (processor bus speed?)?
> Is it a good gauge for performance? If it is not a good gauge for
> performance, what is?
> 	3. What causes this dramatic reduction and performance penalty?
> Kernel? Fglrx?
Collective answer to the three questions: hdparm -T measures how fast
data is copied from the disk cache (in memory) to application memory,
so it is a "simple" memory-to-memory copy. I put "simple" in quotes
because one has to be careful to use the processor's first- and
second-level caches optimally for the use case. For example, when
copying from the disk cache, it makes no sense to pull the disk cache
data into the processor caches, as it will not be used again soon,
whereas the destination buffer is quite useful to have in the cache,
as most applications go on to access the data returned by the read()
system call (that's what hdparm measures; I don't know whether hdparm
actually touches the data).
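A rough way to reproduce this measurement without hdparm is sketched below. The helper name `cached_read_mib_per_s` is hypothetical, and a regular file stands in for the block device that hdparm -T reads, so this is an approximation of the idea rather than hdparm's actual code. It warms the page cache with one read, then repeatedly rewinds and re-reads with unbuffered read() calls into one preallocated buffer, reporting MiB/s.

```python
import os
import tempfile
import time


def cached_read_mib_per_s(path: str, size: int, seconds: float = 1.0) -> float:
    """Time repeated read() calls over the first `size` bytes of `path`.

    After the warm-up read the data should sit in the kernel's page cache,
    so the timed loop mostly measures the kernel copying it into `buf`.
    """
    buf = bytearray(size)
    # buffering=0 gives a raw FileIO: each readinto() is a single read() syscall.
    with open(path, "rb", buffering=0) as f:
        f.readinto(buf)  # warm-up pass: pulls the data into the page cache
        total = 0
        start = time.perf_counter()
        while True:
            f.seek(0)
            total += f.readinto(buf) or 0
            elapsed = time.perf_counter() - start
            if elapsed >= seconds:
                break
    return total / elapsed / (1 << 20)


if __name__ == "__main__":
    size = 32 << 20  # 32 MiB test file (an arbitrary choice)
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as out:
            out.write(os.urandom(size))
        print(f"cached reads: {cached_read_mib_per_s(path, size):.0f} MiB/s")
    finally:
        os.unlink(path)
```

Running it under both kernels on the same machine shows whether the ratio between them matches what hdparm -T reported; the absolute numbers will differ from hdparm's.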

You really are seeing memory-to-memory copy performance drop to one
half. The probable cause is that optimal cache management depends on
using the right SSE/SSE2/SSE3/SSE4 instructions, and for really
memory-intensive tasks, each Intel processor prefers a slightly
different way of getting optimal performance. Your vanilla kernel was
probably optimized for the Pentium M architecture, whereas the Ubuntu
kernel is optimized for another architecture. Compile your own kernel
for the Pentium M (processor family CONFIG_MPENTIUMM) to get the old
performance back.
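To check which processor family a kernel was built for, you can look at its build configuration; Ubuntu installs it as /boot/config-<release>. The sketch below scans such a file for a few Kconfig symbols from the x86 "Processor family" menu. The symbol list is deliberately partial and the helper name `processor_family` is hypothetical; on other distributions the config may live elsewhere (for example /proc/config.gz, if the kernel was built with CONFIG_IKCONFIG_PROC).

```python
import platform
import re

# A partial list of Kconfig symbols from the x86 "Processor family" menu;
# extend it for other CPUs.
FAMILY_OPTIONS = (
    "CONFIG_M386",
    "CONFIG_M586",
    "CONFIG_M686",
    "CONFIG_MPENTIUMM",
    "CONFIG_MCORE2",
    "CONFIG_GENERIC_CPU",
)


def processor_family(config_text: str) -> list[str]:
    """Return the FAMILY_OPTIONS that are set to 'y' in a kernel .config."""
    enabled = set(re.findall(r"^(CONFIG_\w+)=y$", config_text, re.MULTILINE))
    return [opt for opt in FAMILY_OPTIONS if opt in enabled]


if __name__ == "__main__":
    # Ubuntu installs the build configuration as /boot/config-<release>.
    path = f"/boot/config-{platform.release()}"
    try:
        with open(path) as f:
            found = processor_family(f.read())
        print(path, "->", found or "none of the listed options")
    except FileNotFoundError:
        print("no kernel config found at", path)
```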

The performance implications in day-to-day use are quite small (as
long as your day-to-day use is not copying gigabytes of data or cutting
video), so you won't notice the difference. Memory-to-memory copies are
avoided where possible, for example by mapping all program code
read-only with mmap. The processor is set up so that applications
cannot overwrite their own code (they get a segfault instead), and
thus it is safe to make the disk cache directly visible to the
application instead of making a copy first. The same holds for
swap-in/swap-out: the hard disk controller directly accesses the data
in the memory range that is visible to the application, so no
memory-to-memory copy is needed.
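The difference between a copying read() and mapping the page cache directly can be sketched from user space. In the sketch below (all helper names hypothetical), one function checksums a file through read() into a private buffer, so the kernel copies the data; another checksums it through a read-only mmap, so the page-cache pages are mapped rather than copied; a third shows that writing through the read-only mapping is refused. Note that Python's mmap module rejects the write with a TypeError before it reaches the CPU; in C, storing through a PROT_READ mapping would instead raise SIGSEGV, which is the segfault mentioned above.

```python
import mmap
import os
import tempfile
import zlib


def crc_via_read(path: str, chunk: int = 1 << 20) -> int:
    """Checksum a file via read(): the kernel copies page-cache data into buf."""
    buf = bytearray(chunk)
    view = memoryview(buf)
    crc = 0
    with open(path, "rb", buffering=0) as f:
        while n := f.readinto(buf):
            crc = zlib.crc32(view[:n], crc)
    return crc


def crc_via_mmap(path: str) -> int:
    """Checksum a file through a read-only mapping of its page-cache pages."""
    if os.path.getsize(path) == 0:
        return zlib.crc32(b"")  # mmap refuses to map an empty file
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        return zlib.crc32(m)


def mmap_write_refused(path: str) -> bool:
    """Try to modify a read-only mapping of a non-empty file; report rejection."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
        try:
            m[0] = 0
        except TypeError:  # Python's own check; in C this store would be a SIGSEGV
            return True
        return False


if __name__ == "__main__":
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as out:
            out.write(os.urandom(3 << 20))
        print("same checksum:", crc_via_read(path) == crc_via_mmap(path))
        print("write refused:", mmap_write_refused(path))
    finally:
        os.unlink(path)
```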

Memory-to-memory copies within applications (or memory-to-video-memory
copies in the X server) do not use the kernel's copy functions and are
thus independent of kernel compilation options (but a suitably
optimized libc might help there).
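To check whether application-level copies are affected, one could time a plain user-space copy under each kernel and compare it with the hdparm -T figure; if the explanation above holds, this number should stay roughly the same across kernels. The sketch below (hypothetical helper name) copies between two same-sized bytearrays; CPython implements a same-length slice assignment as a plain memcpy, so it mostly measures the libc copy routine. The 64 MiB default is an assumption meant to exceed the CPU caches, and the result counts each byte copied once, like a read throughput figure.

```python
import time


def memcpy_mib_per_s(size: int = 64 << 20, seconds: float = 1.0) -> float:
    """Time user-space copies between two same-sized buffers (no kernel involved)."""
    src = bytearray(size)
    dst = bytearray(size)
    dst[:] = src  # warm-up copy before timing
    copied = 0
    start = time.perf_counter()
    while True:
        # A same-length slice assignment is a single memcpy in CPython.
        dst[:] = src
        copied += size
        elapsed = time.perf_counter() - start
        if elapsed >= seconds:
            break
    return copied / elapsed / (1 << 20)


if __name__ == "__main__":
    print(f"user-space copy: {memcpy_mib_per_s():.0f} MiB/s")
```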

> 	Thanks in advance for enlightenment!
I hope the explanations cause enlightenment.

Regards,
  Michael Karcher
