I put the perf binary under /data/bin on the phone.

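adb shell /data/bin/perf list prints all available performance events, grouped into four classes: hardware events, software events, hardware cache events, and tracepoint events. adb shell /data/bin/perf stat ls prints the performance counter statistics collected while ls runs. adb shell /data/bin/perf stat -e event ls prints the statistics for the single event named event.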

Running adb shell /data/bin/perf stat ls produces the following output:

    Error: open_counter returned with 19 (No such device). /bin/dmesg may provide additional information.
    Fatal: Not all events could be opened.

Following the hint and running adb shell dmesg reveals this error message:

hw perfevents: unable to reserve pmu

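Running adb shell /data/bin/perf stat -e event ls for each event in turn shows a pattern: whenever the event is a hardware event or a hardware cache event, the command fails with exactly the error above, while software events and tracepoint events succeed. In other words, the PMU hardware is not being used, and none of the hardware performance counters can be read.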

The Galaxy Nexus uses an OMAP CPU (ARM Cortex-A9). I had already downloaded the corresponding kernel source into the omap directory:

    git clone
    cd omap
    git checkout remotes/origin/android-omap-tuna-3.0 -b tuna

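Searching the web for the error message above turns up many discussions about perf stat failing on OMAP. Some people say the chip has a design flaw that prevents the PMU interrupt from firing. Whether the hardware is really at fault can be checked with gator, which is provided by ARM. gator-driver chooses its profiling method according to the kernel version: below 3.0.0 it uses ARM's own PMU access code, otherwise it uses the Linux perf infrastructure. The kernel on this phone is 3.0.31, so gator uses Linux perf. Running an experiment with DS-5 Streamline, the same message does appear in the dmesg output: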

hw perfevents: unable to reserve pmu

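If the kernel version check in gator-driver is changed so that perf is used only for versions above 3.1.0, then on this phone the gator module reads the counters through its own PMU code instead of relying on the device driver in the Linux kernel. The result: the "hw perfevents: unable to reserve pmu" message disappears from dmesg, and the hardware performance counter values are read back. This shows that the hardware is fine and that attention should go to the kernel code.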
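The version switch described above can be pictured with a small userspace sketch. This is not gator's source code: the names model_pick_backend and MODEL_KERNEL_VERSION are hypothetical, and only the version encoding is taken from the kernel's KERNEL_VERSION(a, b, c) macro.

```c
/* Same encoding as the kernel's KERNEL_VERSION(a, b, c) macro. */
#define MODEL_KERNEL_VERSION(a, b, c) (((a) << 16) + ((b) << 8) + (c))

/* Hypothetical names for the two ways of reading the counters. */
enum model_backend {
    MODEL_BACKEND_OWN_PMU, /* gator's own PMU register access */
    MODEL_BACKEND_PERF     /* the Linux perf infrastructure */
};

/* Kernels older than the threshold use the driver's own PMU code. */
enum model_backend model_pick_backend(int running, int threshold)
{
    return running < threshold ? MODEL_BACKEND_OWN_PMU : MODEL_BACKEND_PERF;
}
```

With a running kernel of 3.0.31, a threshold of 3.0.0 selects perf, while raising the threshold to 3.1.0 selects the driver's own PMU code, which is the change made in the experiment.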

Searching the kernel source for "unable to reserve pmu" shows that it appears only in the armpmu_reserve_hardware() function in omap/arch/arm/kernel/perf_event.c. The warning is printed when reserve_pmu(ARM_PMU_DEVICE_CPU) returns an error code.

    static int
    armpmu_reserve_hardware(void)
    {
        struct arm_pmu_platdata *plat;
        irq_handler_t handle_irq;
        int i, err = -ENODEV, irq;

        pmu_device = reserve_pmu(ARM_PMU_DEVICE_CPU);
        if (IS_ERR(pmu_device)) {
            pr_warning("unable to reserve pmu\n");
            return PTR_ERR(pmu_device);
        }
        ……

reserve_pmu(enum arm_pmu_type device) is defined in omap/arch/arm/kernel/pmu.c:

    struct platform_device *
    reserve_pmu(enum arm_pmu_type device)
    {
        struct platform_device *pdev;

        if (test_and_set_bit_lock(device, &pmu_lock)) {
            pdev = ERR_PTR(-EBUSY);
        } else if (pmu_devices[device] == NULL) {
            clear_bit_unlock(device, &pmu_lock);
            pdev = ERR_PTR(-ENODEV);
        } else {
            pdev = pmu_devices[device];
        }

        return pdev;
    }

This shows that when no device is found, the function returns -ENODEV, which matches the error 19 (No such device) reported by perf stat ls.
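The three outcomes of reserve_pmu() can be reproduced in a small userspace model. This is only a sketch: plain variables stand in for struct platform_device and for the kernel's atomic bit operations, and every model_ name is hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Userspace model only; not the kernel's types. */
enum model_pmu_type { MODEL_PMU_DEVICE_CPU = 0, MODEL_PMU_MAX };

/* NULL means that no PMU device of that type was ever registered. */
static const char *model_pmu_devices[MODEL_PMU_MAX];
/* One bit per device type, set while the PMU is reserved. */
static unsigned long model_pmu_lock;

/* Returns 0 on success, -EBUSY if already reserved, -ENODEV if no device. */
int model_reserve_pmu(enum model_pmu_type device)
{
    unsigned long bit = 1UL << device;

    if (model_pmu_lock & bit)
        return -EBUSY;
    if (model_pmu_devices[device] == NULL)
        return -ENODEV; /* the lock bit stays clear, as in the kernel code */
    model_pmu_lock |= bit;
    return 0;
}

/* Stand-in for platform code registering a PMU device. */
void model_register_pmu(enum model_pmu_type device, const char *name)
{
    model_pmu_devices[device] = name;
}
```

Before any registration, model_reserve_pmu(MODEL_PMU_DEVICE_CPU) returns -ENODEV, which corresponds to the situation on the phone. After model_register_pmu() has been called, the first reservation succeeds and a second one returns -EBUSY.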

The PMU device is initialized and registered in omap/arch/arm/mach-omap2/devices.c:

    static void omap_init_pmu(void)
    {
        if (cpu_is_omap24xx())
            omap_pmu_device.resource = &omap2_pmu_resource;
        else if (cpu_is_omap34xx())
            omap_pmu_device.resource = &omap3_pmu_resource;
        else
            return;

        platform_device_register(&omap_pmu_device);
    }

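The dmesg output shows that the Galaxy Nexus CPU is an OMAP 4460. According to the source, when the CPU is a 4460 no resource is assigned and no device is registered at all. The resources for omap24xx and omap34xx are defined as follows: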

    static struct resource omap2_pmu_resource = {
        .start  = 3,
        .end    = 3,
        .flags  = IORESOURCE_IRQ,
    };

    static struct resource omap3_pmu_resource = {
        .start  = INT_34XX_BENCH_MPU_EMUL,
        .end    = INT_34XX_BENCH_MPU_EMUL,
        .flags  = IORESOURCE_IRQ,
    };

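As can be seen, the resource is an interrupt number. So what is the PMU interrupt number on the OMAP 4460? The OMAP 4460 has two cores (Cortex-A9 MPCore), each core has its own PMU, and each PMU has its own interrupt, so there should be two interrupt numbers. Searching the web for OMAP 4460 and PMU gives these two: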

    54 + OMAP44XX_IRQ_GIC_START
    55 + OMAP44XX_IRQ_GIC_START

In omap/arch/arm/mach-omap2/omap_hwmod_44xx_data.c:

#define OMAP44XX_IRQ_GIC_START  32
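Adding this offset to the two GIC-relative numbers gives the absolute interrupt numbers. The helper below is a hypothetical illustration of that arithmetic, not a function from the kernel; only the value 32 is taken from the source file quoted above.

```c
/* Value quoted from omap_hwmod_44xx_data.c. */
#define OMAP44XX_IRQ_GIC_START 32

/* Hypothetical helper: GIC-relative interrupt number -> absolute IRQ number. */
int model_omap44xx_irq(int gic_relative)
{
    return gic_relative + OMAP44XX_IRQ_GIC_START;
}
```

Here model_omap44xx_irq(54) evaluates to 86 and model_omap44xx_irq(55) to 87.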

So the two interrupt numbers are 86 and 87. The modified omap/arch/arm/mach-omap2/devices.c is as follows:

    static struct resource omap2_pmu_resource = {
        .start  = 3,
        .end    = 3,
        .flags  = IORESOURCE_IRQ,
    };

    static struct resource omap3_pmu_resource = {
        .start  = INT_34XX_BENCH_MPU_EMUL,
        .end    = INT_34XX_BENCH_MPU_EMUL,
        .flags  = IORESOURCE_IRQ,
    };

    static struct resource omap446x_pmu_resource = {
        .start  = 86,
        .end    = 87,
        .flags  = IORESOURCE_IRQ,
    };

    static struct platform_device omap_pmu_device = {
        .name       = "arm-pmu",
        .id     = ARM_PMU_DEVICE_CPU,
        .num_resources  = 1,
    };

    static void omap_init_pmu(void)
    {
        if (cpu_is_omap24xx())
            omap_pmu_device.resource = &omap2_pmu_resource;
        else if (cpu_is_omap34xx())
            omap_pmu_device.resource = &omap3_pmu_resource;
        else if (cpu_is_omap446x())
            omap_pmu_device.resource = &omap446x_pmu_resource;
        else
            return;

        platform_device_register(&omap_pmu_device);
    }

Or, as git diff output:

    diff --git a/arch/arm/mach-omap2/devices.c b/arch/arm/mach-omap2/devices.c
    index cf7a0ba..fce5cbc 100644
    --- a/arch/arm/mach-omap2/devices.c
    +++ b/arch/arm/mach-omap2/devices.c
    @@ -577,6 +577,12 @@ static struct resource omap3_pmu_resource = {
            .flags  = IORESOURCE_IRQ,
     };
     
    +static struct resource omap446x_pmu_resource = {
    +       .start  = 86,
    +       .end    = 87,
    +       .flags  = IORESOURCE_IRQ,
    +};
    +
     static struct platform_device omap_pmu_device = {
            .name           = "arm-pmu",
            .id             = ARM_PMU_DEVICE_CPU,
    @@ -589,6 +595,8 @@ static void omap_init_pmu(void)
                    omap_pmu_device.resource = &omap2_pmu_resource;
            else if (cpu_is_omap34xx())
                    omap_pmu_device.resource = &omap3_pmu_resource;
    +       else if (cpu_is_omap446x())
    +               omap_pmu_device.resource = &omap446x_pmu_resource;
            else
                    return;
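One detail may be worth checking in this patch. Assuming the perf code requests one interrupt per resource entry and reads only the .start field of each entry (which is how platform_get_irq() is understood to behave in kernels of this generation), a single entry spanning 86..87 with num_resources = 1 would yield only IRQ 86. The userspace sketch below illustrates that assumption; the model_ types and functions are hypothetical stand-ins, not kernel APIs.

```c
#include <stddef.h>

/* Hypothetical stand-ins for struct resource and struct platform_device. */
struct model_resource { int start; int end; };
struct model_device {
    const struct model_resource *resource;
    size_t num_resources;
};

/* Assumed lookup: entry number num yields its .start, or -1 if absent. */
int model_get_irq(const struct model_device *dev, size_t num)
{
    if (num >= dev->num_resources)
        return -1;
    return dev->resource[num].start;
}

/* Collects the IRQs a driver would request, one per resource entry. */
size_t model_collect_irqs(const struct model_device *dev, int *out, size_t cap)
{
    size_t n = 0;

    for (size_t i = 0; i < dev->num_resources && n < cap; i++) {
        int irq = model_get_irq(dev, i);
        if (irq >= 0)
            out[n++] = irq;
    }
    return n;
}

/* Layout used in the patch: one entry spanning 86..87. */
static const struct model_resource model_range[] = { { 86, 87 } };
const struct model_device model_range_dev = { model_range, 1 };

/* Alternative layout: one entry per core. */
static const struct model_resource model_per_core[] = { { 86, 86 }, { 87, 87 } };
const struct model_device model_per_core_dev = { model_per_core, 2 };
```

Under this assumption, model_collect_irqs() finds one IRQ (86) for the range layout and two IRQs (86 and 87) for the per-core layout. If the real kernel behaves the same way, describing the two interrupts as two separate resource entries would be the safer choice.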

After rebuilding the kernel and flashing it to the phone, run perf stat ls:

    hzh@fangtian:~/android/omap$ adb shell /data/bin/perf stat ls /sdcard
    /sdcard

     Performance counter stats for 'ls /sdcard':

            9.735107 task-clock                #    0.761 CPUs utilized
                   7 context-switches          #    0.001 M/sec
                   0 CPU-migrations            #    0.000 M/sec
                 127 page-faults               #    0.013 M/sec
             3351924 cycles                    #    0.344 GHz
                   0 stalled-cycles-frontend   #    0.00% frontend cycles idle
                   0 stalled-cycles-backend    #    0.00% backend  cycles idle
                   0 instructions              #    0.00  insns per cycle
                   0 branches                  #    0.000 M/sec
                   0 branch-misses             #    0.00% of all branches

         0.012786865 seconds time elapsed
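The cycles counter now returns a value, although instructions, branches and the stalled-cycle counters still read 0 in this run. The derived figures in the right-hand column can be cross-checked by hand. The sketch below assumes they are simple ratios against task-clock (in milliseconds) and elapsed time, which reproduces the numbers shown. The function names are hypothetical and are not part of perf.

```c
/* CPUs utilized: CPU time consumed divided by wall-clock time. */
double model_cpus_utilized(double task_clock_ms, double elapsed_s)
{
    return (task_clock_ms / 1000.0) / elapsed_s;
}

/* Event rate in millions of events per second of CPU time. */
double model_rate_mega_per_sec(unsigned long long count, double task_clock_ms)
{
    /* events per millisecond, divided by 1000, equals M events per second */
    return (double)count / task_clock_ms / 1000.0;
}

/* Cycle rate in GHz: the same ratio scaled by another factor of 1000. */
double model_rate_ghz(unsigned long long cycles, double task_clock_ms)
{
    return model_rate_mega_per_sec(cycles, task_clock_ms) / 1000.0;
}
```

With the values above, model_rate_ghz(3351924, 9.735107) is about 0.344, model_cpus_utilized(9.735107, 0.012786865) is about 0.761, and model_rate_mega_per_sec(127, 9.735107) is about 0.013, in line with the output.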