QuartzGL ?!
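This is reportedly what was called Quartz 2D Extreme in Tiger; in Leopard it has been renamed QuartzGL. I just noticed an option to enable it in the Tools menu of Quartz Debug, so I turned it on and ran Xbench. To my surprise, performance was worse than with it off, and it also caused compatibility problems in some applications (the Quicksilver splash screen turned into a semi-transparent white box, and Dashboard reportedly has problems too), so I turned it back off.
To enable it: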
$ sudo defaults write /Library/Preferences/com.apple.windowserver QuartzGLEnabled -boolean YES
To turn it back off:
$ sudo defaults write /Library/Preferences/com.apple.windowserver QuartzGLEnabled -boolean NO
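Logging out and back in is enough for the change to take effect, but I rebooted to keep the test results accurate.
Results with QuartzGL enabled: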
Results                         179.93
  System Info
    Xbench Version              1.3
    System Version              10.5.5 (9F33)
    Physical RAM                2048 MB
    Model                       MacBookPro4,1
    Drive Type                  FUJITSU MHY2200BH
  Quartz Graphics Test          144.47
    Line                        181.54    12.09 Klines/sec [50% alpha]
    Rectangle                   145.47    43.43 Krects/sec [50% alpha]
    Circle                      235.46    19.19 Kcircles/sec [50% alpha]
    Bezier                      78.55     1.98 Kbeziers/sec [50% alpha]
    Text                        190.48    11.92 Kchars/sec
  OpenGL Graphics Test          164.98
    Spinning Squares            164.98    209.29 frames/sec
  User Interface Test           271.05
    Elements                    271.05    1.24 Krefresh/sec
Results with QuartzGL disabled:
Results                         206.56
  System Info
    Xbench Version              1.3
    System Version              10.5.5 (9F33)
    Physical RAM                2048 MB
    Model                       MacBookPro4,1
    Drive Type                  FUJITSU MHY2200BH
  Quartz Graphics Test          192.86
    Line                        176.63    11.76 Klines/sec [50% alpha]
    Rectangle                   232.50    69.41 Krects/sec [50% alpha]
    Circle                      189.12    15.42 Kcircles/sec [50% alpha]
    Bezier                      189.01    4.77 Kbeziers/sec [50% alpha]
    Text                        185.73    11.62 Kchars/sec
  OpenGL Graphics Test          167.72
    Spinning Squares            167.72    212.76 frames/sec
  User Interface Test           296.17
    Elements                    296.17    1.36 Krefresh/sec
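The complete test results are at: http://db.xbench.com/merge.xhtml?doc1=327066&doc2=327063
Note: the graphics-only runs and the full test were not done at the same time, so the numbers differ somewhat.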
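To make the two runs easier to compare, here is a small Python sketch that turns the scores into percentage changes. The numbers are copied from the two result listings above; the helper name percent_change is only an illustrative choice, not part of Xbench.

```python
# Xbench scores copied from the two runs above: (QuartzGL on, QuartzGL off).
scores = {
    "Overall": (179.93, 206.56),
    "Quartz Graphics": (144.47, 192.86),
    "Line": (181.54, 176.63),
    "Rectangle": (145.47, 232.50),
    "Circle": (235.46, 189.12),
    "Bezier": (78.55, 189.01),
    "Text": (190.48, 185.73),
    "OpenGL Graphics": (164.98, 167.72),
    "User Interface": (271.05, 296.17),
}


def percent_change(on, off):
    """Score change with QuartzGL enabled, relative to the run with it off."""
    return (on - off) / off * 100.0


changes = {name: percent_change(on, off) for name, (on, off) in scores.items()}

for name, delta in changes.items():
    print(f"{name:<16} {delta:+6.1f}%")
```

Seen this way, Bezier and Rectangle lose the most, while Line, Circle and Text actually come out ahead with QuartzGL on; the overall score still drops.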
Optimizing DRI for double the performance
X has a number of options that make it work better, but most of the time it simply runs with generic settings chosen to work on most computers.
You can tweak them, though. In the best case, you will get double the performance.
My system is Gentoo 2006.1 with X.org Server 1.1.1-r3 and xf86-video-ati 6.6.3 with DRI (Direct Rendering Infrastructure). I also have Beryl 0.1.4 and AIGLX for the 3D desktop test.
The hardware is a PowerBook (the model before the high-definition one) with a 1.67 GHz G4 and an ATI 9700 with 128 MB.
Let's look at the default display adapter configuration.
Section "Device"
Identifier "Card0"
Driver "ati"
VendorName "ATI Technologies Inc"
BoardName "RV350 [Mobility Radeon 9600 M10]"
EndSection
I ran glxgears with this configuration to get a baseline frames-per-second figure.
Now add some options to this Device section to tweak it.
Option "AGPMode" "4"
Option "UseFBDev" "false"
Option "AGPFastWrite" "true"
Option "EnablePageFlip" "true"
Option "DynamicClocks" "true"
Option "RenderAccel" "true"
Let's benchmark again and see.
Wow, double the performance!
How do these options behave under Beryl with full hardware acceleration?
Oh, it drops back to half and becomes unstable. But I think that is a problem with the free ATI driver.
I find Intel's open source GMA driver much better and more stable, and I will test it too if I have time.
OK, so what do these options and values mean?
Option "AGPMode" "4"
AGPMode defines the bandwidth of the AGP bus. A value of 4 means four times the base AGP bandwidth: base (1x) AGP runs at 66 MHz with 266 MB/s of bandwidth, and Xorg selects 1x by default. You can raise this option for more bandwidth. The highest value is 8, but you need to confirm that your hardware supports it.
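As a rough illustration of what the multiplier means, the Python sketch below scales the 266 MB/s 1x figure quoted above. These are theoretical peak numbers only; the function name agp_bandwidth_mb_s is hypothetical, and real throughput depends on the chipset and driver.

```python
# 1x AGP bandwidth in MB/s, as quoted in the text above (66 MHz, 32-bit bus).
BASE_1X_MB_S = 266

# AGP transfer modes defined by the bus; other values are not valid.
VALID_MODES = (1, 2, 4, 8)


def agp_bandwidth_mb_s(mode):
    """Theoretical peak bandwidth for a given AGPMode value."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported AGP mode: {mode}")
    return BASE_1X_MB_S * mode


for mode in VALID_MODES:
    print(f'Option "AGPMode" "{mode}" -> about {agp_bandwidth_mb_s(mode)} MB/s peak')
```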
Option "UseFBDev" "false"
This is an IMPORTANT option.
Most display adapters support a universal driver mode called framebuffer. It creates a buffer in memory and maps it to the screen pixels; this is the FB device. Xorg writes data into this buffer, and the driver reads from it and writes the data to the screen. This path is not fully optimized.
The ATI driver supports a DMA (Direct Memory Access) mode in which Xorg writes data straight into the memory of the display adapter, so Xorg can draw to the screen without going through the FB device's memory buffer.
Option "AGPFastWrite" "true"
The fast write option is similar to UseFBDev in that it is also a memory access mode of the display adapter.
It skips system memory and writes data to the display adapter directly.
Option "EnablePageFlip" "true"
Enabling page flipping increases performance by changing the memory access mode.
This option is disabled if the EXA acceleration architecture is in use.
Option "DynamicClocks" "true"
DynamicClocks is like Intel SpeedStep technology for display chips: it adjusts the performance of the display adapter, purely to save energy.
Option "RenderAccel" "true"
RenderAccel does what its name says: it enables hardware Render acceleration.
This option is enabled by default.
Other options can increase performance further. I will experiment with them and blog about the results.
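Kernel 2.6.17 vs. 2.6.18: performance on a SATA disk
The changelog of the latest Linux kernel, 2.6.18, lists a large number of SATA fixes and enhancements. So does the new kernel really improve SATA disk throughput? The only way to know is to test.
The test platform is a small DELL M1210 laptop with 1 GB of RAM. The disk is a Toshiba MK4032GS laptop drive: 2.5", 40 GB, SATA, 16 MB, 5400 rpm, 9.5 mm. -_-#
The results:
Everest 0.2 update 1 with the 2.6.17.13-36smp kernel: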
# hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 4712 MB in 2.00 seconds = 2359.41 MB/sec
Timing buffered disk reads: 96 MB in 3.05 seconds = 31.43 MB/sec
Everest 0.2 with a self-compiled 2.6.18smp kernel:
# hdparm -Tt /dev/sda
/dev/sda:
Timing cached reads: 4788 MB in 2.00 seconds = 2395.98 MB/sec
Timing buffered disk reads: 96 MB in 3.03 seconds = 31.68 MB/sec
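These results were taken on AC power, after running hdparm three times in a row so that the disk was spinning at full speed. The 2.6.18 kernel does improve SATA performance a little, but the effect is small, and the physical speed of the disk is still the bottleneck.
When I find time I will repeat the test on a desktop machine, probably with iozone.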
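To put a number on "a little", here is a short Python sketch that parses the two hdparm outputs quoted above and prints the relative gain. The regular expression and the names parse_hdparm and gains are my own illustrative choices, and the sketch assumes the output format shown above.

```python
import re

# Matches the two timing lines printed by "hdparm -Tt" in the format shown above.
RATE_RE = re.compile(r"Timing (cached|buffered disk) reads:.*=\s*([\d.]+) MB/sec")


def parse_hdparm(output):
    """Map 'cached' / 'buffered disk' to the MB/sec figure in the output."""
    return {kind: float(rate) for kind, rate in RATE_RE.findall(output)}


KERNEL_2_6_17 = """\
/dev/sda:
Timing cached reads: 4712 MB in 2.00 seconds = 2359.41 MB/sec
Timing buffered disk reads: 96 MB in 3.05 seconds = 31.43 MB/sec
"""

KERNEL_2_6_18 = """\
/dev/sda:
Timing cached reads: 4788 MB in 2.00 seconds = 2395.98 MB/sec
Timing buffered disk reads: 96 MB in 3.03 seconds = 31.68 MB/sec
"""

old = parse_hdparm(KERNEL_2_6_17)
new = parse_hdparm(KERNEL_2_6_18)

# Relative gain of 2.6.18 over 2.6.17, in percent.
gains = {kind: (new[kind] - old[kind]) / old[kind] * 100.0 for kind in old}

for kind, gain in gains.items():
    print(f"{kind} reads: {old[kind]:.2f} -> {new[kind]:.2f} MB/sec ({gain:+.2f}%)")
```

Both gains come out below 2%, which fits the conclusion that the disk itself is the limit.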
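Another interesting observation: when the dual-core machine runs a multithreaded task at full load, the usage of the two cores adds up to roughly 100%. If I use the yes trick to generate a single-threaded full load, CPU1 shows 100%. I have not yet tested running two instances of yes at the same time.
I will test further when I find time.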
Cpu0 : 76.1% us, 2.7% sy, 0.0% ni, 16.6% id, 0.0% wa, 4.7% hi, 0.0% si
Cpu1 : 13.0% us, 1.7% sy, 0.0% ni, 79.7% id, 5.0% wa, 0.0% hi, 0.7% si
Cpu0 : 47.8% us, 2.0% sy, 0.0% ni, 49.2% id, 1.0% wa, 0.0% hi, 0.0% si
Cpu1 : 48.0% us, 2.0% sy, 0.0% ni, 46.3% id, 3.7% wa, 0.0% hi, 0.0% si
Follow-up test: with two instances of yes running at the same time, usage pretty much reaches 200% :-)
Cpu0 : 82.0% us, 18.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 88.2% us, 9.8% sy, 0.0% ni, 2.0% id, 0.0% wa, 0.0% hi, 0.0% si
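Adding up the cores by hand gets tedious, so here is a small Python sketch that does it for the three pairs of top lines above. It counts everything that is not idle (including I/O wait) as busy, which is my own simplification; the names busy_percent and total_busy are illustrative.

```python
import re

# Captures the idle percentage from a per-CPU line in the format printed above.
IDLE_RE = re.compile(r"([\d.]+)% id")


def busy_percent(cpu_line):
    """Return 100 minus the idle percentage of one 'CpuN : ...' line."""
    match = IDLE_RE.search(cpu_line)
    if match is None:
        raise ValueError(f"no idle field in: {cpu_line!r}")
    return 100.0 - float(match.group(1))


def total_busy(cpu_lines):
    """Sum the busy percentage over all CPU lines of one sample."""
    return sum(busy_percent(line) for line in cpu_lines)


SAMPLES = {
    "first pair": [
        "Cpu0 : 76.1% us, 2.7% sy, 0.0% ni, 16.6% id, 0.0% wa, 4.7% hi, 0.0% si",
        "Cpu1 : 13.0% us, 1.7% sy, 0.0% ni, 79.7% id, 5.0% wa, 0.0% hi, 0.7% si",
    ],
    "second pair": [
        "Cpu0 : 47.8% us, 2.0% sy, 0.0% ni, 49.2% id, 1.0% wa, 0.0% hi, 0.0% si",
        "Cpu1 : 48.0% us, 2.0% sy, 0.0% ni, 46.3% id, 3.7% wa, 0.0% hi, 0.0% si",
    ],
    "two yes processes": [
        "Cpu0 : 82.0% us, 18.0% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si",
        "Cpu1 : 88.2% us, 9.8% sy, 0.0% ni, 2.0% id, 0.0% wa, 0.0% hi, 0.0% si",
    ],
}

for label, lines in SAMPLES.items():
    print(f"{label}: {total_busy(lines):.1f}% busy out of 200%")
```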
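nbench performance tests on different platforms
nbench is a very interesting benchmark tool.
It measures the memory, integer, and floating-point performance of a single CPU core.
The baseline for the scores is an AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38; each index is relative to that system.
Whenever I get a new machine I will casually benchmark it and record the result here.
Of course, supporting only a single core is also a weakness of nbench, but the program is more about measuring how efficiently the Linux system itself performs certain computations.
Its home page and download are at: http://www.tux.org/~mayer/linux/bmark.html
Zou Pengcheng (邹鹏程) also has a similar test report at: http://pczou.blogchina.com/1529012.html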
Test platform | Distribution | Integer | Floating point | Memory |
Macbook 063 (note 5) | Mac OS X 10.5.1 (Leopard) | 13.635 | 24.792 | 21.704 |
PowerBook 17" (note 1) | Gentoo 2006.0 (gcc 4.1.1) | 14.516 | 10.259 | 9.502 |
PowerBook 17" (note 1) | Mac OS X 10.4.9 | 11.931 | 12.749 | 13.649 |
Intel 2.4G development platform (note 2) | Everest 0.1 | 16.060 | 30.186 | 18.684 |
Intel 2.4G development platform (note 2) | Gentoo | 18.387 | 16.692 | 21.326 |
Intel Core Duo*2 (4 cores total) 3.2G | RFDC 5.0 | 13.480 | 13.488 | 15.706 |
DELL Latitude D820@1GHz (note 3) | RFDT 5.0 (2.6.17.1-7smp) | 8.507 | 16.480 | 10.019 |
DELL Latitude D820@1.6GHz (note 3) | RFDT 5.0 (2.6.17.1-7smp) | 8.469 | 16.362 | 9.986 |
DELL Precision 690 Xeon 3.2G | RFDT 5.0 (2.6.17.1-7) | 9.098 | 17.261 | 15.180 |
Red Flag Alpha download server (note 4) | Red Flag Linux 2.0 for Alpha | 2.322 | 2.847 | 3.247 |
Lemote box (龙梦盒子) | Debian etch | 3.351 | 2.727 | 2.469 |
Lemote box (龙梦盒子) | Everest 0.5 (64 bits) | 3.183 | 4.754 | 2.728 |
Macbook Pro (T8300) | Mac OS X 10.6.0 Beta1 | 20.190 | 37.898 | 22.963 |
HP XW4600 (E7200) | Fedora 11 | 17.013 | 36.669 | 16.797 |
RHEL in Xen (AMD 8356) | RHEL 5.3 running in Xen | 11.694 | 14.412 | 9.201 |
Note: a green background marks the highest score in each test, and a red background marks the lowest.
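Since the colour highlighting does not show up in plain text, here is a small Python sketch that works out which rows would be marked. The scores are copied from the table above; the row labels are shortened, and the helper name extremes is just an illustrative choice.

```python
# (platform, OS, integer, floating point, memory) copied from the table above.
ROWS = [
    ("Macbook 063", "Mac OS X 10.5.1", 13.635, 24.792, 21.704),
    ('PowerBook 17"', "Gentoo 2006.0", 14.516, 10.259, 9.502),
    ('PowerBook 17"', "Mac OS X 10.4.9", 11.931, 12.749, 13.649),
    ("Intel 2.4G dev platform", "Everest 0.1", 16.060, 30.186, 18.684),
    ("Intel 2.4G dev platform", "Gentoo", 18.387, 16.692, 21.326),
    ("Intel Core Duo*2 3.2G", "RFDC 5.0", 13.480, 13.488, 15.706),
    ("DELL Latitude D820@1GHz", "RFDT 5.0", 8.507, 16.480, 10.019),
    ("DELL Latitude D820@1.6GHz", "RFDT 5.0", 8.469, 16.362, 9.986),
    ("DELL Precision 690 Xeon 3.2G", "RFDT 5.0", 9.098, 17.261, 15.180),
    ("Red Flag Alpha download server", "Red Flag Linux 2.0 for Alpha", 2.322, 2.847, 3.247),
    ("Lemote box", "Debian etch", 3.351, 2.727, 2.469),
    ("Lemote box", "Everest 0.5 (64 bits)", 3.183, 4.754, 2.728),
    ("Macbook Pro (T8300)", "Mac OS X 10.6.0 Beta1", 20.190, 37.898, 22.963),
    ("HP XW4600 (E7200)", "Fedora 11", 17.013, 36.669, 16.797),
    ("RHEL in Xen (AMD 8356)", "RHEL 5.3", 11.694, 14.412, 9.201),
]

# Position of each score inside a row tuple.
COLUMNS = {"integer": 2, "floating point": 3, "memory": 4}


def extremes(rows, index):
    """Return (highest row, lowest row) for the score at the given position."""
    best = max(rows, key=lambda row: row[index])
    worst = min(rows, key=lambda row: row[index])
    return best, worst


for name, index in COLUMNS.items():
    best, worst = extremes(ROWS, index)
    print(f"{name}: highest {best[index]} ({best[0]}, {best[1]}), "
          f"lowest {worst[index]} ({worst[0]}, {worst[1]})")
```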
nbench and linpack results for the PowerBook 17"
My machine configuration is:
Machine Name: PowerBook G4 17"
Machine Model: PowerBook5,7
CPU Type: PowerPC G4 (1.2)
Number Of CPUs: 1
CPU Speed: 1.67 GHz
L2 Cache (per CPU): 512 KB
Memory: 1GB
Bus Speed: 167 MHz
linpack results:
My machine: 448679 Kflops
Zou Pengcheng's (邹鹏程) recorded results, quoted from his post:
My ASUS V6800V (Pentium M 2.0G) linpack result: 340 Mflops
My desktop (Celeron 1.7G): 200 Mflops
builder (4 Xeon 3.4G): 680 Mflops
Blue Gene, currently number one in the world: 70720 Gflops (Rmax)
Dawning 4000A (曙光4000A): 8061 Gflops (Rmax)
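The figures above mix Kflops, Mflops and Gflops, so here is a quick Python sketch that brings them all to Mflops for comparison. The values are the ones listed above; the dictionary labels and the variable names are my own.

```python
KFLOPS_PER_MFLOPS = 1000.0
MFLOPS_PER_GFLOPS = 1000.0

POWERBOOK = 'PowerBook G4 17" (1.67 GHz)'

# All linpack results from the list above, converted to Mflops.
results_mflops = {
    POWERBOOK: 448679 / KFLOPS_PER_MFLOPS,
    "ASUS V6800V (Pentium M 2.0G)": 340.0,
    "desktop (Celeron 1.7G)": 200.0,
    "builder (4 Xeon 3.4G)": 680.0,
    "Blue Gene (Rmax)": 70720 * MFLOPS_PER_GFLOPS,
    "Dawning 4000A (Rmax)": 8061 * MFLOPS_PER_GFLOPS,
}

baseline = results_mflops[POWERBOOK]

# Print slowest to fastest, with each result relative to the PowerBook.
for name, mflops in sorted(results_mflops.items(), key=lambda item: item[1]):
    print(f"{name:<30} {mflops:>12.1f} Mflops  ({mflops / baseline:.2f}x PowerBook)")
```

In the same unit, the PowerBook's 448679 Kflops is about 449 Mflops, which places it between the Pentium M laptop and the four-Xeon builder.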
nbench results:
My machine:
MEMORY INDEX : 12.511
INTEGER INDEX : 10.897
FLOATING-POINT INDEX: 12.605
For Zou Pengcheng's (邹鹏程) test results, see: here