Follow the WeChat official account DumpStack so you don't get lost

Contents
- 1. Extracting C state information from the device tree
- 2. Example 1: cpuidle-arm - ARM32 platform
- 2.1 How the arm32 cpuidle_ops structures are organized
- 2.2 arm_idle_init - cpuidle_arm driver initialization flow
- 2.3 arm_idle_init_cpu - initialize the cpuidle_driver for a given cpu
- 2.4 arm_cpuidle_init - read platform-specific C state information from the device tree
- 2.5 Entering a given C state
- 2.5.1 cpu_do_idle - execute wfi to enter idle 0
- 2.5.2 arm_cpuidle_suspend - enter a given idle level
- 2.5.2.1 ARM64
- 2.5.2.2 ARM32
- 2.5.2.2.1 amx3_idle_enter - enter a given C state
- 2.5.2.2.2 am33xx_pm_probe
- 2.5.2.2.3 am33xx_ops - the global pm_ops points to this structure
- 2.5.2.2.4 am33xx_do_sram_idle - the global idle_fn, enters C states of different levels
- 2.5.2.2.5 am33xx_cpu_suspend
- 2.5.2.2.6 cpu_suspend - call the given callback to enter the given C state
- 2.5.2.2.7 __cpu_suspend
- 2.5.2.2.8 am33xx_do_wfi_sram - enter a given C state
- 2.5.2.2.9 amx3_get_sram_addrs - compute pm_sram, i.e. where the executable code is stored
- 2.5.2.2.10 am33xx_pm_sram - describe a struct am33xx_pm_sram_addr in assembly
- 2.5.2.2.11 am33xx_do_wfi - enter C states of different levels; in practice this shuts down on-chip peripherals, flushes caches, etc. before wfi
- 2.5.2.2.12 Where is the global variable am33xx_pm_ro_sram_data defined?
- 3. Example 2: cpuidle-big_little - ARM32 platform
- 3.1 The cpuidle_driver for big and little cores
- 3.2 bl_idle_init - driver initialization
- 3.3 Entering a given C state
- 3.3.1 arm_cpuidle_simple_enter - execute wfi to enter idle0
- 3.3.2 bl_enter_powerdown - power off the cpu and cluster to enter idle1
- 4. Example 3: cpuidle-psci - ARM64 platform
- 4.1 A primer on SMC Calling
- 4.2 How the arm64 cpu_operations structures are organized
- 4.3 psci_cpuidle_probe - walk all cpus and initialize the cpuidle driver
- 4.4 psci_idle_init_cpu - initialize the cpuidle driver for a given cpu
- 4.5 psci_cpu_init_idle - parse psci's custom C state information
- 4.6 Entering a given C state
- 5. Summary
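The driver layer interfaces directly with the hardware. It answers the following questions:
- How is a given C state entered? By writing the corresponding registers, or by executing a specific instruction?
- When is idle exited, and what does it take to exit it? Does the cpu leave idle automatically when an interrupt arrives, or must a register be written?
- How should the different C state levels be understood?
The driver layer is therefore strongly platform-dependent: cpus on different platforms have different drivers.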
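1. Extracting C state information from the device tree
Note: the properties extracted from the device tree here are the standard C state properties supported by the kernel. Some platforms also define properties of their own, which are parsed in arm_cpuidle_init.
File location: W:\opensource\linux-5.10.61\drivers\cpuidle\dt_idle_states.c
1.1 Device tree information
A device tree describes C state information in one of the following two formats:
1.1.1 msm8916 device tree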
W:\opensource\linux-5.10.61\arch\arm64\boot\dts\qcom\msm8916.dtsi
cpus {
	#address-cells = <1>;
	#size-cells = <0>;

	CPU0: cpu@0 {
		device_type = "cpu";
		compatible = "arm,cortex-a53";
		reg = <0x0>;
		next-level-cache = <&L2_0>;
		enable-method = "psci";
		clocks = <&apcs>;
		operating-points-v2 = <&cpu_opp_table>;
		#cooling-cells = <2>;
		power-domains = <&CPU_PD0>;	/* power domain this cpu belongs to */
		power-domain-names = "psci";
	};

	CPU1: cpu@1 {
		device_type = "cpu";
		compatible = "arm,cortex-a53";
		reg = <0x1>;
		next-level-cache = <&L2_0>;
		enable-method = "psci";
		clocks = <&apcs>;
		operating-points-v2 = <&cpu_opp_table>;
		#cooling-cells = <2>;
		power-domains = <&CPU_PD1>;
		power-domain-names = "psci";
	};

	CPU2: cpu@2 {
		device_type = "cpu";
		compatible = "arm,cortex-a53";
		reg = <0x2>;
		next-level-cache = <&L2_0>;
		enable-method = "psci";
		clocks = <&apcs>;
		operating-points-v2 = <&cpu_opp_table>;
		#cooling-cells = <2>;
		power-domains = <&CPU_PD2>;
		power-domain-names = "psci";
	};

	CPU3: cpu@3 {
		device_type = "cpu";
		compatible = "arm,cortex-a53";
		reg = <0x3>;
		next-level-cache = <&L2_0>;
		enable-method = "psci";
		clocks = <&apcs>;
		operating-points-v2 = <&cpu_opp_table>;
		#cooling-cells = <2>;
		power-domains = <&CPU_PD3>;
		power-domain-names = "psci";
	};

	L2_0: l2-cache {
		compatible = "cache";
		cache-level = <2>;
	};

	/* C states of different levels; this example has only one level */
	idle-states {
		entry-method = "psci";

		/* One C state */
		CPU_SLEEP_0: cpu-sleep-0 {
			compatible = "arm,idle-state";
			idle-state-name = "standalone-power-collapse";
			arm,psci-suspend-param = <0x40000002>;
			entry-latency-us = <130>;	/* latency to enter this state */
			exit-latency-us = <150>;	/* latency to exit this state */
			min-residency-us = <2000>;	/* minimum residency */
			local-timer-stop;		/* the local timer stops in this state */
		};
	};

	/* Entry/exit latencies and minimum residencies of the power-domain idle states */
	domain-idle-states {
		CLUSTER_RET: cluster-retention {
			compatible = "domain-idle-state";
			arm,psci-suspend-param = <0x41000012>;
			entry-latency-us = <500>;	/* latency to enter this state */
			exit-latency-us = <500>;	/* latency to exit this state */
			min-residency-us = <2000>;	/* minimum residency */
		};

		CLUSTER_PWRDN: cluster-gdhs {
			compatible = "domain-idle-state";
			arm,psci-suspend-param = <0x41000032>;
			entry-latency-us = <2000>;	/* latency to enter this state */
			exit-latency-us = <2000>;	/* latency to exit this state */
			min-residency-us = <6000>;	/* minimum residency */
		};
	};
};

/* Power domains */
psci {
	compatible = "arm,psci-1.0";
	method = "smc";

	CPU_PD0: power-domain-cpu0 {
		#power-domain-cells = <0>;
		power-domains = <&CLUSTER_PD>;
		domain-idle-states = <&CPU_SLEEP_0>;	/* C states of this power domain */
	};

	CPU_PD1: power-domain-cpu1 {
		#power-domain-cells = <0>;
		power-domains = <&CLUSTER_PD>;
		domain-idle-states = <&CPU_SLEEP_0>;
	};

	CPU_PD2: power-domain-cpu2 {
		#power-domain-cells = <0>;
		power-domains = <&CLUSTER_PD>;
		domain-idle-states = <&CPU_SLEEP_0>;
	};

	CPU_PD3: power-domain-cpu3 {
		#power-domain-cells = <0>;
		power-domains = <&CLUSTER_PD>;
		domain-idle-states = <&CPU_SLEEP_0>;
	};

	CLUSTER_PD: power-domain-cluster {
		#power-domain-cells = <0>;
		domain-idle-states = <&CLUSTER_RET>, <&CLUSTER_PWRDN>;
	};
};
1.1.2 msm8998 device tree
File location: W:\opensource\linux-5.10.61\arch\arm64\boot\dts\qcom\msm8998.dtsi
cpus {
	#address-cells = <2>;
	#size-cells = <0>;

	CPU0: cpu@0 {
		device_type = "cpu";
		compatible = "qcom,kryo280";
		reg = <0x0 0x0>;
		enable-method = "psci";
		/* idle levels supported by this cpu: two levels here */
		cpu-idle-states = <&LITTLE_CPU_SLEEP_0 &LITTLE_CPU_SLEEP_1>;
		next-level-cache = <&L2_0>;
		L2_0: l2-cache {
			compatible = "arm,arch-cache";
			cache-level = <2>;
		};
		L1_I_0: l1-icache { compatible = "arm,arch-cache"; };
		L1_D_0: l1-dcache { compatible = "arm,arch-cache"; };
	};

	CPU1: cpu@1 {
		device_type = "cpu";
		compatible = "qcom,kryo280";
		reg = <0x0 0x1>;
		enable-method = "psci";
		cpu-idle-states = <&LITTLE_CPU_SLEEP_0 &LITTLE_CPU_SLEEP_1>;
		next-level-cache = <&L2_0>;
		L1_I_1: l1-icache { compatible = "arm,arch-cache"; };
		L1_D_1: l1-dcache { compatible = "arm,arch-cache"; };
	};

	CPU2: cpu@2 {
		device_type = "cpu";
		compatible = "qcom,kryo280";
		reg = <0x0 0x2>;
		enable-method = "psci";
		cpu-idle-states = <&LITTLE_CPU_SLEEP_0 &LITTLE_CPU_SLEEP_1>;
		next-level-cache = <&L2_0>;
		L1_I_2: l1-icache { compatible = "arm,arch-cache"; };
		L1_D_2: l1-dcache { compatible = "arm,arch-cache"; };
	};

	CPU3: cpu@3 {
		device_type = "cpu";
		compatible = "qcom,kryo280";
		reg = <0x0 0x3>;
		enable-method = "psci";
		cpu-idle-states = <&LITTLE_CPU_SLEEP_0 &LITTLE_CPU_SLEEP_1>;
		next-level-cache = <&L2_0>;
		L1_I_3: l1-icache { compatible = "arm,arch-cache"; };
		L1_D_3: l1-dcache { compatible = "arm,arch-cache"; };
	};

	CPU4: cpu@100 {
		device_type = "cpu";
		compatible = "qcom,kryo280";
		reg = <0x0 0x100>;
		enable-method = "psci";
		cpu-idle-states = <&BIG_CPU_SLEEP_0 &BIG_CPU_SLEEP_1>;	/* C states of the big cores */
		next-level-cache = <&L2_1>;
		L2_1: l2-cache {
			compatible = "arm,arch-cache";
			cache-level = <2>;
		};
		L1_I_100: l1-icache { compatible = "arm,arch-cache"; };
		L1_D_100: l1-dcache { compatible = "arm,arch-cache"; };
	};

	CPU5: cpu@101 {
		device_type = "cpu";
		compatible = "qcom,kryo280";
		reg = <0x0 0x101>;
		enable-method = "psci";
		cpu-idle-states = <&BIG_CPU_SLEEP_0 &BIG_CPU_SLEEP_1>;
		next-level-cache = <&L2_1>;
		L1_I_101: l1-icache { compatible = "arm,arch-cache"; };
		L1_D_101: l1-dcache { compatible = "arm,arch-cache"; };
	};

	CPU6: cpu@102 {
		device_type = "cpu";
		compatible = "qcom,kryo280";
		reg = <0x0 0x102>;
		enable-method = "psci";
		cpu-idle-states = <&BIG_CPU_SLEEP_0 &BIG_CPU_SLEEP_1>;
		next-level-cache = <&L2_1>;
		L1_I_102: l1-icache { compatible = "arm,arch-cache"; };
		L1_D_102: l1-dcache { compatible = "arm,arch-cache"; };
	};

	CPU7: cpu@103 {
		device_type = "cpu";
		compatible = "qcom,kryo280";
		reg = <0x0 0x103>;
		enable-method = "psci";
		cpu-idle-states = <&BIG_CPU_SLEEP_0 &BIG_CPU_SLEEP_1>;
		next-level-cache = <&L2_1>;
		L1_I_103: l1-icache { compatible = "arm,arch-cache"; };
		L1_D_103: l1-dcache { compatible = "arm,arch-cache"; };
	};

	cpu-map {
		cluster0 {
			core0 { cpu = <&CPU0>; };
			core1 { cpu = <&CPU1>; };
			core2 { cpu = <&CPU2>; };
			core3 { cpu = <&CPU3>; };
		};

		cluster1 {
			core0 { cpu = <&CPU4>; };
			core1 { cpu = <&CPU5>; };
			core2 { cpu = <&CPU6>; };
			core3 { cpu = <&CPU7>; };
		};
	};

	/* Information about the different C states */
	idle-states {
		entry-method = "psci";

		LITTLE_CPU_SLEEP_0: cpu-sleep-0-0 {
			compatible = "arm,idle-state";
			idle-state-name = "little-retention";
			arm,psci-suspend-param = <0x00000002>;
			entry-latency-us = <81>;	/* latency to enter this C state */
			exit-latency-us = <86>;		/* latency to exit this C state */
			min-residency-us = <200>;	/* minimum residency */
		};

		LITTLE_CPU_SLEEP_1: cpu-sleep-0-1 {
			compatible = "arm,idle-state";
			idle-state-name = "little-power-collapse";
			arm,psci-suspend-param = <0x40000003>;
			entry-latency-us = <273>;
			exit-latency-us = <612>;
			min-residency-us = <1000>;
			local-timer-stop;		/* the local timer stops in this C state */
		};

		BIG_CPU_SLEEP_0: cpu-sleep-1-0 {
			compatible = "arm,idle-state";
			idle-state-name = "big-retention";
			arm,psci-suspend-param = <0x00000002>;
			entry-latency-us = <79>;
			exit-latency-us = <82>;
			min-residency-us = <200>;
		};

		BIG_CPU_SLEEP_1: cpu-sleep-1-1 {
			compatible = "arm,idle-state";
			idle-state-name = "big-power-collapse";
			arm,psci-suspend-param = <0x40000003>;
			entry-latency-us = <336>;
			exit-latency-us = <525>;
			min-residency-us = <1000>;
			local-timer-stop;
		};
	};
};
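1.2 Extracting C state information from the device tree
1.2.1 dt_init_idle_driver - called once for each cpu
This function is called once for each cpu to initialize that cpu's C state information.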
/**
 * dt_init_idle_driver() - Parse the DT idle states and initialize the
 *			   idle driver states array
 * @drv:	  Pointer to CPU idle driver to be initialized
 * @matches:	  Array of of_device_id match structures to search in for
 *		  compatible idle state nodes. The data pointer for each valid
 *		  struct of_device_id entry in the matches array must point to
 *		  a function with the following signature, that corresponds to
 *		  the CPUidle state enter function signature:
 *
 *		  int (*)(struct cpuidle_device *dev,
 *			  struct cpuidle_driver *drv,
 *			  int index);
 *
 * @start_idx:    First idle state index to be initialized
 *
 * If DT idle states are detected and are valid the state count and states
 * array entries in the cpuidle driver are initialized accordingly starting
 * from index start_idx.
 *
 * Return: number of valid DT idle states parsed, <0 on failure
 */
int dt_init_idle_driver(struct cpuidle_driver *drv,		/* driver whose C states are parsed */
			const struct of_device_id *matches,	/* match table; .data holds the enter function */
			unsigned int start_idx)			/* index of the first C state to initialize */
{
	struct cpuidle_state *idle_state;
	struct device_node *state_node, *cpu_node;
	const struct of_device_id *match_id;
	int i, err = 0;
	const cpumask_t *cpumask;
	unsigned int state_idx = start_idx;

	/* 1. At most CPUIDLE_STATE_MAX (10) C states are supported */
	if (state_idx >= CPUIDLE_STATE_MAX)
		return -EINVAL;
	/*
	 * We get the idle states for the first logical cpu in the
	 * driver mask (or cpu_possible_mask if the driver cpumask is not set)
	 * and we check through idle_state_valid() if they are uniform
	 * across CPUs, otherwise we hit a firmware misconfiguration.
	 */
	/* 2. CPUs controlled by this driver; all possible CPUs if none is set */
	cpumask = drv->cpumask ? : cpu_possible_mask;

	/* 3. Device tree node of the first CPU in the mask */
	cpu_node = of_cpu_device_node_get(cpumask_first(cpumask));

	/* 4. Walk the C states level by level */
	for (i = 0; ; i++) {
		/* 4.1 DT node describing idle level i of this CPU */
		state_node = of_get_cpu_state_node(cpu_node, i);
		if (!state_node)
			break;

		/* 5. Look the node up in the match table, e.g. "arm,idle-state" */
		match_id = of_match_node(matches, state_node);
		if (!match_id) {
			err = -ENODEV;
			break;
		}

		/* 6. Skip nodes that are not available */
		if (!of_device_is_available(state_node)) {
			of_node_put(state_node);
			continue;
		}

		/*
		 * 7. Check that every CPU in cpumask uses the same node for
		 *    level i, i.e. all CPUs served by one driver must share
		 *    exactly the same idle levels
		 */
		if (!idle_state_valid(state_node, i, cpumask)) {
			pr_warn("%pOF idle state not valid, bailing out\n",
				state_node);
			err = -EINVAL;
			break;
		}

		/* 8. Stop once the static states[] array (10 entries) is full */
		if (state_idx == CPUIDLE_STATE_MAX) {
			pr_warn("State index reached static CPU idle driver states array size\n");
			break;
		}

		/*
		 * 9. Take the cpuidle_state slot for this level; it is filled
		 *    from the properties of the DT node
		 */
		idle_state = &drv->states[state_idx++];
		err = init_state_node(idle_state, match_id, state_node);
		if (err) {
			pr_err("Parsing idle state node %pOF failed with err %d\n",
			       state_node, err);
			err = -EINVAL;
			break;
		}
		of_node_put(state_node);
	}

	of_node_put(state_node);
	of_node_put(cpu_node);
	if (err)
		return err;
	/*
	 * Update the driver state count only if some valid DT idle states
	 * were detected
	 */
	if (i)
		drv->state_count = state_idx;

	/*
	 * Return the number of present and valid DT idle states, which can
	 * also be 0 on platforms with missing DT idle states or legacy DT
	 * configuration predating the DT idle states bindings.
	 */
	return i;
}
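The control flow of dt_init_idle_driver() can be tried out with a small user-space sketch. It is not kernel code. The device-tree nodes are replaced by a hypothetical array of mock_state_node entries, and the match-table lookup and the idle_state_valid() check are left out. What it keeps are the details that are easy to misread: unavailable nodes are skipped but still counted in the return value, filling stops once the 10-entry states[] array is full, and state_count is only updated when at least one node was seen.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define CPUIDLE_STATE_MAX 10

/* Hypothetical stand-in for one DT idle-state node (not a kernel type). */
struct mock_state_node {
	const char *name;
	bool available;   /* result of of_device_is_available() */
	bool parse_ok;    /* whether init_state_node() would succeed */
};

/* Hypothetical, trimmed-down cpuidle_driver: only names and the count. */
struct mock_driver {
	const char *state_names[CPUIDLE_STATE_MAX];
	int state_count;
};

/*
 * Sketch of dt_init_idle_driver()'s loop: returns the number of DT nodes
 * visited (the kernel's "i"), or a negative errno on a parse failure.
 */
static int mock_dt_init_idle_driver(struct mock_driver *drv,
				    const struct mock_state_node *nodes,
				    size_t n_nodes, unsigned int start_idx)
{
	unsigned int state_idx = start_idx;
	size_t i;

	if (state_idx >= CPUIDLE_STATE_MAX)
		return -EINVAL;

	for (i = 0; i < n_nodes; i++) {
		if (!nodes[i].available)
			continue;              /* skipped, yet still counted in i */
		if (state_idx == CPUIDLE_STATE_MAX)
			break;                 /* static states[] array is full */
		if (!nodes[i].parse_ok)
			return -EINVAL;        /* state_count is left untouched */
		drv->state_names[state_idx++] = nodes[i].name;
	}

	if (i)
		drv->state_count = (int)state_idx;
	return (int)i;
}
```

With start_idx = 1, as cpuidle-arm passes it, slot 0 stays reserved for the WFI state from the driver template, and the DT states fill slots 1, 2, and so on.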
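1.2.2 of_get_cpu_state_node - find the device tree nodes that describe the idle levels
This function finds the C state nodes of a given cpu. Two device tree formats can describe which C states a cpu may use:
- In the first format, the cpu node names the power domain it belongs to through the power-domains and #power-domain-cells properties, and the domain-idle-states property of that power domain then lists the corresponding C state nodes.
- In the other format, the cpu node lists the C state nodes it supports directly, through the cpu-idle-states property.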
/**
 * of_get_cpu_state_node - Get CPU's idle state node at the given index
 *
 * @cpu_node: The device node for the CPU
 * @index: The index in the list of the idle states
 *
 * Two generic methods can be used to describe a CPU's idle states, either via
 * a flattened description through the "cpu-idle-states" binding or via the
 * hierarchical layout, using the "power-domains" and the "domain-idle-states"
 * bindings. This function checks for both and returns the idle state node for
 * the requested index.
 *
 * In case an idle state node is found at @index, the refcount is incremented
 * for it, so call of_node_put() on it when done. Returns NULL if not found.
 */
struct device_node *of_get_cpu_state_node(struct device_node *cpu_node,	/* which cpu */
					  int index)			/* idle level */
{
	struct of_phandle_args args;
	int err;

	/*
	 * 1. Hierarchical format (msm8916): resolve the CPU_PDx
	 *    power-domain node into args
	 */
	err = of_parse_phandle_with_args(cpu_node, "power-domains",
					 "#power-domain-cells", 0, &args);
	if (!err) {
		/* 1.1 Node for the requested idle level in that power domain */
		struct device_node *state_node =
			of_parse_phandle(args.np, "domain-idle-states", index);

		of_node_put(args.np);
		if (state_node)
			return state_node;
	}

	/* 2. Flat format (msm8998) */
	return of_parse_phandle(cpu_node, "cpu-idle-states", index);
}
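The lookup order, hierarchical layout first and then the flat cpu-idle-states list, can be modeled in a few lines. This is a hypothetical user-space sketch: mock_pd and mock_cpu_node stand in for DT nodes, and strings stand in for phandles. It only mirrors the fallback logic of of_get_cpu_state_node() and does no reference counting.

```c
#include <stddef.h>

/* Hypothetical stand-in for a power-domain node with "domain-idle-states". */
struct mock_pd {
	const char *const *domain_idle_states;
	size_t n_domain_idle_states;
};

/* Hypothetical stand-in for a cpu node. */
struct mock_cpu_node {
	const struct mock_pd *power_domain;   /* "power-domains", or NULL */
	const char *const *cpu_idle_states;   /* "cpu-idle-states" list */
	size_t n_cpu_idle_states;
};

static const char *mock_get_cpu_state_node(const struct mock_cpu_node *cpu,
					   size_t index)
{
	/* 1. Hierarchical layout: cpu -> power domain -> domain-idle-states[index] */
	if (cpu->power_domain && index < cpu->power_domain->n_domain_idle_states)
		return cpu->power_domain->domain_idle_states[index];

	/* 2. Otherwise fall back to the flat cpu-idle-states[index] */
	if (index < cpu->n_cpu_idle_states)
		return cpu->cpu_idle_states[index];

	return NULL;   /* no state at this level: the caller's loop stops here */
}
```

For an msm8916-style cpu, index 0 resolves through CPU_PDx to cpu-sleep-0 and index 1 returns NULL. For an msm8998-style cpu, indices 0 and 1 come straight from cpu-idle-states.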
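1.2.3 idle_state_valid - check that the idle state at level idx is consistent
This function checks, for every cpu in cpumask, whether the device tree node of its idle state at level idx is state_node. In other words, all cpus in cpumask must have the same idle state at level idx.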
/*
 * Check that the idle state is uniform across all CPUs in the CPUidle driver
 * cpumask
 */
static bool idle_state_valid(struct device_node *state_node,	/* DT node of this idle level */
			     unsigned int idx,			/* idle level */
			     const cpumask_t *cpumask)
{
	int cpu;
	struct device_node *cpu_node, *curr_state_node;
	bool valid = true;

	/*
	 * Compare idle state phandles for index idx on all CPUs in the
	 * CPUidle driver cpumask. Start from next logical cpu following
	 * cpumask_first(cpumask) since that's the CPU state_node was
	 * retrieved from. If a mismatch is found bail out straight
	 * away since we certainly hit a firmware misconfiguration.
	 */
	/* 1. Walk the remaining CPUs in cpumask */
	for (cpu = cpumask_next(cpumask_first(cpumask), cpumask);
	     cpu < nr_cpu_ids; cpu = cpumask_next(cpu, cpumask)) {

		/* 2. Find this cpu's DT node for idle level idx */
		cpu_node = of_cpu_device_node_get(cpu);
		curr_state_node = of_get_cpu_state_node(cpu_node, idx);

		/* 3. A different node means this cpu's level idx is not state_node */
		if (state_node != curr_state_node)
			valid = false;

		of_node_put(curr_state_node);
		of_node_put(cpu_node);
		if (!valid)
			break;
	}

	return valid;
}
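The uniformity rule matters on big.LITTLE parts such as msm8998, where cpus 0-3 and 4-7 point at different idle-state nodes. Below is a hypothetical user-space sketch of the check, with a per-cpu table of strings in place of the phandles and an unsigned long bitmask in place of cpumask_t. A single driver covering all eight cpus would be rejected, while one driver per cluster passes.

```c
#include <stdbool.h>
#include <stddef.h>

#define MOCK_NR_CPUS 8
#define MOCK_MAX_LEVELS 4

/* Hypothetical table: [cpu][idx] plays the role of of_get_cpu_state_node(). */
typedef const char *mock_state_table[MOCK_NR_CPUS][MOCK_MAX_LEVELS];

/* Every cpu in the mask must use the same node as the first cpu at level idx. */
static bool mock_idle_state_valid(mock_state_table state_of, unsigned int idx,
				  unsigned long cpumask)
{
	int first = -1;

	for (int cpu = 0; cpu < MOCK_NR_CPUS; cpu++) {
		if (!(cpumask & (1UL << cpu)))
			continue;
		if (first < 0) {
			first = cpu;   /* reference cpu, like cpumask_first() */
			continue;
		}
		if (state_of[cpu][idx] != state_of[first][idx])
			return false;  /* firmware/DT misconfiguration for this driver */
	}
	return true;
}
```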
1.2.4 init_state_node - parse one C state from the device tree
static int init_state_node(struct cpuidle_state *idle_state,		/* where the result is stored */
			   const struct of_device_id *match_id,		/* matching match-table entry */
			   struct device_node *state_node)		/* DT node of the C state to parse */
{
	int err;
	const char *desc;

	/*
	 * CPUidle drivers are expected to initialize the const void *data
	 * pointer of the passed in struct of_device_id array to the idle
	 * state enter function.
	 */
	/* 1. Callback used to enter this C state */
	idle_state->enter = match_id->data;
	/*
	 * Since this is not a "coupled" state, it's safe to assume interrupts
	 * won't be enabled when it exits allowing the tick to be frozen
	 * safely. So enter() can be also enter_s2idle() callback.
	 */
	idle_state->enter_s2idle = match_id->data;

	/*
	 * 2. Exit latency of this C state. of_property_read_u32() returns 0
	 *    if "wakeup-latency-us" exists; otherwise the if branch falls
	 *    back to the other two latency properties.
	 */
	err = of_property_read_u32(state_node, "wakeup-latency-us",
				   &idle_state->exit_latency);
	if (err) {
		u32 entry_latency, exit_latency;

		/* 2.1 Read "entry-latency-us" and "exit-latency-us" */
		err = of_property_read_u32(state_node, "entry-latency-us",
					   &entry_latency);
		if (err) {
			pr_debug(" * %pOF missing entry-latency-us property\n",
				 state_node);
			return -EINVAL;
		}

		err = of_property_read_u32(state_node, "exit-latency-us",
					   &exit_latency);
		if (err) {
			pr_debug(" * %pOF missing exit-latency-us property\n",
				 state_node);
			return -EINVAL;
		}
		/*
		 * If wakeup-latency-us is missing, default to entry+exit
		 * latencies as defined in idle states bindings
		 */
		/* 2.2 Their sum is used as the exit latency of this C state */
		idle_state->exit_latency = entry_latency + exit_latency;
	}

	/* 3. "min-residency-us": the minimum residency of this C state */
	err = of_property_read_u32(state_node, "min-residency-us",
				   &idle_state->target_residency);
	if (err) {
		pr_debug(" * %pOF missing min-residency-us property\n",
			     state_node);
		return -EINVAL;
	}

	/* 4. Description; falls back to the node name if "idle-state-name" is absent */
	err = of_property_read_string(state_node, "idle-state-name", &desc);
	if (err)
		desc = state_node->name;

	/* 5. Flags: does the local timer stop in this C state? */
	idle_state->flags = 0;
	if (of_property_read_bool(state_node, "local-timer-stop"))
		idle_state->flags |= CPUIDLE_FLAG_TIMER_STOP;
	/*
	 * TODO:
	 *	replace with kstrdup and pointer assignment when name
	 *	and desc become string pointers
	 */
	/* 6. Fill in the name and desc of this C state */
	strncpy(idle_state->name, state_node->name, CPUIDLE_NAME_LEN - 1);
	strncpy(idle_state->desc, desc, CPUIDLE_DESC_LEN - 1);
	return 0;
}
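The latency rule in init_state_node() is easiest to see with numbers. The hypothetical sketch below models only the property logic: wakeup-latency-us wins if present, otherwise exit_latency = entry + exit, min-residency-us is mandatory, and local-timer-stop sets a timer-stop flag. The mock_* types and the flag value are illustrative, not kernel definitions. For CPU_SLEEP_0 in the msm8916 tree this gives 130 + 150 = 280 us of exit latency and 2000 us of target residency.

```c
#include <errno.h>
#include <stdbool.h>

/* Illustrative flag value; the kernel's CPUIDLE_FLAG_TIMER_STOP is defined in linux/cpuidle.h. */
#define MOCK_FLAG_TIMER_STOP (1U << 2)

/* Hypothetical view of one idle-state DT node: which properties exist and their values. */
struct mock_dt_props {
	bool has_wakeup, has_entry, has_exit, has_min_residency;
	unsigned int wakeup_latency_us, entry_latency_us, exit_latency_us;
	unsigned int min_residency_us;
	bool local_timer_stop;
};

/* Hypothetical subset of struct cpuidle_state. */
struct mock_idle_state {
	unsigned int exit_latency;      /* us */
	unsigned int target_residency;  /* us */
	unsigned int flags;
};

static int mock_init_state_node(struct mock_idle_state *s,
				const struct mock_dt_props *p)
{
	if (p->has_wakeup) {
		s->exit_latency = p->wakeup_latency_us;
	} else {
		if (!p->has_entry || !p->has_exit)
			return -EINVAL;
		/* default: wakeup latency = entry + exit latency */
		s->exit_latency = p->entry_latency_us + p->exit_latency_us;
	}

	if (!p->has_min_residency)
		return -EINVAL;
	s->target_residency = p->min_residency_us;

	s->flags = p->local_timer_stop ? MOCK_FLAG_TIMER_STOP : 0;
	return 0;
}
```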
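2. Example 1: cpuidle-arm - ARM32 platform
Enabling CONFIG_ARM_CPUIDLE enables cpuidle-arm. Both arm32 and arm64 may enable this option, but in the open-source Linux 5.10.61 kernel only the omap2 pm33xx platform on arm32 actually puts cpuidle-arm to use.
The reason is that cpuidle-arm reaches arm_cpuidle_init during initialization, and that function looks up an ops structure. On arm32, only the omap2 pm33xx platform actually registers such ops. On arm64, the ops that are found do not set cpu_suspend, so arm_cpuidle_init returns -EOPNOTSUPP straight away. As a result, cpuidle-arm never registers a cpuidle_driver (except on the arm32 omap2 pm33xx platform) and has no real effect.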
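So how do arm32 and arm64 enter idle? In practice:
- On arm32, most platforms have only two states, running and idle, with no idle levels, so executing wfi is enough.
- On arm64, almost all platforms use the cpuidle-psci driver, which is covered later.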
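PS: I ran an experiment on a phone: I made arm_idle_init return immediately on entry. Everything on the phone kept working, and it still entered and left idle normally. This confirms that the cpuidle-arm driver plays no role at all on arm64 platforms.
Below we use the arm32 omap2 pm33xx platform as the main example for analyzing the cpuidle-arm driver.
File location: W:\opensource\linux-5.10.61\drivers\cpuidle\cpuidle-arm.c
2.1 How the arm32 cpuidle_ops structures are organized
arm32 defines the cpuidle_ops data structure. Where it is defined (under arch/arm) shows that it exists only for arm32. It defines two methods: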
- suspend: how this cpu enters a given C state
- init: initializes the given cpu
W:\opensource\linux-5.10.61\arch\arm\include\asm\cpuidle.h
struct cpuidle_ops {
	int (*suspend)(unsigned long arg);		/* how this cpu enters a given C state */
	int (*init)(struct device_node *, int cpu);	/* initializes the given cpu */
};
2.1.1 Global array cpuidle_ops[NR_CPUS]
One cpuidle_ops variable is defined per cpu:
W:\opensource\linux-5.10.61\arch\arm\kernel\cpuidle.c
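static struct cpuidle_ops cpuidle_ops[NR_CPUS] __ro_after_init;
Where is the global cpuidle_ops[] array filled in, and how is the cpuidle_ops of a given cpu obtained?
2.1.2 Populating the __cpuidle_method_of_table section
CPUIDLE_METHOD_OF_DECLARE below places amx3_cpuidle_ops into a section named __cpuidle_method_of_table. When needed, the ops are read back out of that section and copied into the global cpuidle_ops. This is a common Linux idiom; here is a brief look at how it works.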
U:\linux-5.10.61\arch\arm\mach-omap2\pm33xx-core.c
static struct cpuidle_ops amx3_cpuidle_ops __initdata = {
	.init = amx3_idle_init,
	.suspend = amx3_idle_enter,
};

CPUIDLE_METHOD_OF_DECLARE(pm33xx_idle, "ti,am3352", &amx3_cpuidle_ops);
CPUIDLE_METHOD_OF_DECLARE(pm43xx_idle, "ti,am4372", &amx3_cpuidle_ops);
Step 1: put the ops into the __cpuidle_method_of_table section. CPUIDLE_METHOD_OF_DECLARE is implemented as follows:
#define CPUIDLE_METHOD_OF_DECLARE(name, _method, _ops)			\
	static const struct of_cpuidle_method __cpuidle_method_of_table_##name \
	__cpuidle_method_section = { .method = _method, .ops = _ops }
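/* where: */
#define __cpuidle_method_section __used __section("__cpuidle_method_of_table")
Step 2: the linker script defines the symbol __cpuidle_method_of_table at the start of the section and places the __cpuidle_method_of_table_end section right after it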
T:\arch\arm64\kernel\vmlinux.lds
...
. = ALIGN(8);
__cpuidle_method_of_table = .;
KEEP(*(__cpuidle_method_of_table))
KEEP(*(__cpuidle_method_of_table_end))
...
Step 3: declare the section's start symbol as an external array in C, so that C code can use it directly
U:\linux-5.10.61\arch\arm\kernel\cpuidle.c
extern struct of_cpuidle_method __cpuidle_method_of_table[];
Step 4: use the __cpuidle_method_of_table variable to obtain the cpuidle_ops
/**
 * arm_cpuidle_get_ops() - find a registered cpuidle_ops by name
 * @method: the method name
 *
 * Search in the __cpuidle_method_of_table array the cpuidle ops matching the
 * method name.
 *
 * Returns a struct cpuidle_ops pointer, NULL if not found.
 */
static const struct cpuidle_ops *__init arm_cpuidle_get_ops(const char *method)
{
	struct of_cpuidle_method *m = __cpuidle_method_of_table;

	/*
	 * Walk every cpuidle_ops entry in the section until the one named
	 * by method is found; an entry with a NULL method ends the table
	 */
	for (; m->method; m++)
		if (!strcmp(m->method, method))
			return m->ops;

	return NULL;
}
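The section-table trick can be reproduced in user space. The sketch below is an analogue, not the kernel mechanism. Instead of the kernel's linker-script symbol and NULL-terminated table, it relies on the __start_<section> and __stop_<section> symbols that GNU ld and LLD generate for sections whose names are valid C identifiers. That assumes GCC or Clang on an ELF target such as Linux. All mock_* names and the am3352_* stubs are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space analogues of struct cpuidle_ops / of_cpuidle_method. */
struct mock_cpuidle_ops {
	int (*suspend)(unsigned long arg);
	int (*init)(int cpu);
};

struct mock_cpuidle_method {
	const char *method;
	const struct mock_cpuidle_ops *ops;
};

/*
 * Like CPUIDLE_METHOD_OF_DECLARE: drop one entry into a dedicated section.
 * "used" keeps the otherwise unreferenced object from being discarded.
 */
#define MOCK_CPUIDLE_METHOD_DECLARE(name, _method, _ops)		\
	static const struct mock_cpuidle_method mock_method_##name	\
	__attribute__((used, section("mock_cpuidle_methods"))) =	\
		{ .method = _method, .ops = _ops }

/* Provided by GNU ld / LLD because the section name is a valid C identifier. */
extern const struct mock_cpuidle_method __start_mock_cpuidle_methods[];
extern const struct mock_cpuidle_method __stop_mock_cpuidle_methods[];

static int am3352_suspend(unsigned long arg) { return (int)arg; }
static int am3352_init(int cpu) { (void)cpu; return 0; }

static const struct mock_cpuidle_ops am3352_ops = {
	.suspend = am3352_suspend,
	.init = am3352_init,
};

MOCK_CPUIDLE_METHOD_DECLARE(pm33xx_idle, "ti,am3352", &am3352_ops);
MOCK_CPUIDLE_METHOD_DECLARE(pm43xx_idle, "ti,am4372", &am3352_ops);

/* Like arm_cpuidle_get_ops(), but bounded by __stop_ instead of a NULL sentinel. */
static const struct mock_cpuidle_ops *mock_get_ops(const char *method)
{
	for (const struct mock_cpuidle_method *m = __start_mock_cpuidle_methods;
	     m < __stop_mock_cpuidle_methods; m++)
		if (!strcmp(m->method, method))
			return m->ops;
	return NULL;
}
```

No code has to call a registration function. Each MOCK_CPUIDLE_METHOD_DECLARE use simply adds one more element to the array the linker assembles, which is the property the kernel relies on.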
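2.1.3 arm_cpuidle_get_ops - find the matching cpuidle_ops in the section by name
This function was introduced in the previous section and is not repeated here.
2.1.4 arm_cpuidle_read_ops - get the cpuidle_ops of a cpu and store it in the global array cpuidle_ops[]
File location: W:\opensource\linux-5.10.61\arch\arm\kernel\cpuidle.c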
/**
 * arm_cpuidle_read_ops() - Initialize the cpuidle ops with the device tree
 * @dn: a pointer to a struct device node corresponding to a cpu node
 * @cpu: the cpu identifier
 *
 * Get the method name defined in the 'enable-method' property, retrieve the
 * associated cpuidle_ops and do a struct copy. This copy is needed because all
 * cpuidle_ops are tagged __initconst and will be unloaded after the init
 * process.
 *
 * Return 0 on success, -ENOENT if no 'enable-method' is defined, -EOPNOTSUPP if
 * no cpuidle_ops is registered for the 'enable-method', or if either init or
 * suspend callback isn't defined.
 */
static int __init arm_cpuidle_read_ops(struct device_node *dn,	/* DT node of this cpu */
				       int cpu)
{
	const char *enable_method;
	const struct cpuidle_ops *ops;

	/* 1. Read "enable-method" from the DT; it selects the cpuidle_ops in the section */
	enable_method = of_get_property(dn, "enable-method", NULL);
	if (!enable_method)
		return -ENOENT;

	/*
	 * 2. Look up the cpuidle_ops for that enable-method.
	 *    Most ARM32 platforms actually return -EOPNOTSUPP here: only one
	 *    omap platform registers these ops, so on the others cpuidle-arm
	 *    initialization fails because no ops are found.
	 */
	ops = arm_cpuidle_get_ops(enable_method);
	if (!ops) {
		pr_warn("%pOF: unsupported enable-method property: %s\n",
			dn, enable_method);
		return -EOPNOTSUPP;
	}

	/* 3. On ARM32 both callbacks must be set */
	if (!ops->init || !ops->suspend) {
		pr_warn("cpuidle_ops '%s': no init or suspend callback\n",
			enable_method);
		return -EOPNOTSUPP;
	}

	/* 4. Copy into the global array for later use */
	cpuidle_ops[cpu] = *ops; /* structure copy */

	pr_notice("cpuidle: enable-method property '%s'"
		  " found operations\n", enable_method);

	return 0;
}
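The error codes of arm_cpuidle_read_ops() map onto three distinct situations, which a hypothetical user-space sketch makes explicit. Here a plain NULL-terminated array (mock_methods) plays the role of the linker section, and the string argument plays the role of the enable-method property. The "vendor,no-init" entry is an invented misconfigured entry used only to exercise the callback check.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MOCK_NR_CPUS 4

/* Hypothetical analogue of struct cpuidle_ops. */
struct mock_cpuidle_ops {
	int (*suspend)(unsigned long arg);
	int (*init)(int cpu);
};

struct mock_cpuidle_method {
	const char *method;
	struct mock_cpuidle_ops ops;
};

static int stub_suspend(unsigned long arg) { (void)arg; return 0; }
static int stub_init(int cpu) { (void)cpu; return 0; }

/* Hypothetical registry, terminated by a NULL method like the kernel table. */
static const struct mock_cpuidle_method mock_methods[] = {
	{ "ti,am3352", { stub_suspend, stub_init } },
	{ "vendor,no-init", { stub_suspend, NULL } },   /* invented broken entry */
	{ NULL, { NULL, NULL } },
};

/* Per-cpu copies, like the kernel's cpuidle_ops[NR_CPUS]. */
static struct mock_cpuidle_ops mock_cpuidle_ops[MOCK_NR_CPUS];

static int mock_read_ops(const char *enable_method, int cpu)
{
	const struct mock_cpuidle_method *m;

	if (!enable_method)
		return -ENOENT;          /* no "enable-method" property */

	for (m = mock_methods; m->method; m++)
		if (!strcmp(m->method, enable_method))
			break;
	if (!m->method)
		return -EOPNOTSUPP;      /* nobody registered this method */

	if (!m->ops.init || !m->ops.suspend)
		return -EOPNOTSUPP;      /* both callbacks are mandatory */

	/* struct copy; the kernel copies because the source ops are init-only data */
	mock_cpuidle_ops[cpu] = m->ops;
	return 0;
}
```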
2.2 arm_idle_init - cpuidle_arm driver initialization flow
This function is called when the cpuidle_arm driver initializes, and it sets up a cpuidle_driver for every cpu.
/*
 * arm_idle_init - Initializes arm cpuidle driver
 *
 * Initializes arm cpuidle driver for all CPUs, if any CPU fails
 * to register cpuidle driver then rollback to cancel all CPUs
 * registration.
 */
static int __init arm_idle_init(void)
{
	int cpu, ret;
	struct cpuidle_driver *drv;
	struct cpuidle_device *dev;

	/* Walk every possible cpu in the system */
	for_each_possible_cpu(cpu) {
		ret = arm_idle_init_cpu(cpu);
		if (ret)
			goto out_fail;
	}

	return 0;

out_fail:
	while (--cpu >= 0) {
		dev = per_cpu(cpuidle_devices, cpu);
		drv = cpuidle_get_cpu_driver(dev);
		cpuidle_unregister(drv);
		kfree(drv);
	}

	return ret;
}
device_initcall(arm_idle_init);	/* runs when the cpuidle_arm driver initializes */
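arm_idle_init() follows an all-or-nothing pattern: set up each cpu in turn, and on the first failure undo every cpu that was already set up, in reverse order. Below is a hypothetical user-space sketch of just that pattern. mock_init_cpu and mock_unregister_cpu stand in for arm_idle_init_cpu and the unregister/kfree pair, and fail_on_cpu is an invented fault-injection knob.

```c
#include <stdbool.h>

#define MOCK_NR_CPUS 4

static bool registered[MOCK_NR_CPUS];
static int fail_on_cpu = -1;   /* hypothetical fault injection for the demo */

static int mock_init_cpu(int cpu)
{
	if (cpu == fail_on_cpu)
		return -1;
	registered[cpu] = true;
	return 0;
}

static void mock_unregister_cpu(int cpu)
{
	registered[cpu] = false;
}

static int mock_idle_init_all(void)
{
	int cpu, ret = 0;

	for (cpu = 0; cpu < MOCK_NR_CPUS; cpu++) {
		ret = mock_init_cpu(cpu);
		if (ret)
			goto out_fail;
	}
	return 0;

out_fail:
	/* Undo only the cpus below the failing one, highest first. */
	while (--cpu >= 0)
		mock_unregister_cpu(cpu);
	return ret;
}
```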
2.3 arm_idle_init_cpu - initialize the cpuidle_driver for a given cpu
U:\linux-5.10.61\drivers\cpuidle\cpuidle-arm.c
/*
 * arm_idle_init_cpu
 *
 * Registers the arm specific cpuidle driver with the cpuidle
 * framework. It relies on core code to parse the idle states
 * and initialize them using driver data structures accordingly.
 */
static int __init arm_idle_init_cpu(int cpu)
{
	int ret;
	struct cpuidle_driver *drv;

	/* 1. Each cpu has its own driver, so start from a copy of the template */
	drv = kmemdup(&arm_idle_driver, sizeof(*drv), GFP_KERNEL);
	if (!drv)
		return -ENOMEM;

	/* 2. The mask contains a single cpu: one driver per cpu */
	drv->cpumask = (struct cpumask *)cpumask_of(cpu);

	/*
	 * Initialize idle states data, starting at index 1.  This
	 * driver is DT only, if no DT idle states are detected (ret
	 * == 0) let the driver initialization fail accordingly since
	 * there is no reason to initialize the idle driver if only
	 * wfi is supported.
	 */
	/*
	 * 3. Extract the C states from the DT; state 1 and above must come
	 *    from the DT. The start index 1 means every state except state 0
	 *    is read from the DT; state 0 is the default WFI state from the
	 *    static arm_idle_driver template below.
	 */
	ret = dt_init_idle_driver(drv, arm_idle_state_match, 1);
	if (ret <= 0) {
		ret = ret ? : -ENODEV;
		goto out_kfree_drv;
	}

	/*
	 * Call arch CPU operations in order to initialize
	 * idle states suspend back-end specific data
	 */
	/*
	 * 4. Give the platform a chance to read its own C state information
	 *    from the DT. Note: on ARM64 this returns -EOPNOTSUPP.
	 */
	ret = arm_cpuidle_init(cpu);

	/*
	 * Allow the initialization to continue for other CPUs, if the
	 * reported failure is a HW misconfiguration/breakage (-ENXIO).
	 *
	 * Some platforms do not support idle operations
	 * (arm_cpuidle_init() returning -EOPNOTSUPP), we should
	 * not flag this case as an error, it is a valid
	 * configuration.
	 */
	/*
	 * 5. Note: ARM64 returns here without registering the driver. On
	 *    ARM64 the driver is actually registered by the psci cpuidle
	 *    driver; see psci_idle_init_cpu.
	 */
	if (ret) {
		if (ret != -EOPNOTSUPP)
			pr_err("CPU %d failed to init idle CPU ops\n", cpu);
		ret = ret == -ENXIO ? 0 : ret;
		goto out_kfree_drv;
	}

	/* 6. Only ARM32 gets here (ARM64 bailed out above): register the driver */
	ret = cpuidle_register(drv, NULL);
	if (ret)
		goto out_kfree_drv;

	/* 7. Not analyzed here */
	cpuidle_cooling_register(drv);

	return 0;

out_kfree_drv:
	kfree(drv);
	return ret;
}
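The return-value handling in arm_idle_init_cpu() decides whether a cpu gets a driver and whether the whole initialization fails. A hypothetical condensed model, in which dt_ret and arch_ret stand for the results of dt_init_idle_driver() and arm_cpuidle_init():

```c
#include <errno.h>

/*
 * Hypothetical condensed model of arm_idle_init_cpu(): *registered says
 * whether cpuidle_register() would have run; the return value is what
 * arm_idle_init() sees for this cpu.
 */
static int mock_idle_init_cpu(int dt_ret, int arch_ret, int *registered)
{
	*registered = 0;

	if (dt_ret <= 0)
		return dt_ret ? dt_ret : -ENODEV;   /* no DT states beyond WFI */

	if (arch_ret)
		/* -ENXIO: skip this cpu but let the others continue */
		return arch_ret == -ENXIO ? 0 : arch_ret;

	*registered = 1;
	return 0;
}
```

Note that -EOPNOTSUPP, the arm64 case, is not converted to 0. It only suppresses the error message, so arm_idle_init() still takes its failure path and no cpuidle-arm driver stays registered.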
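2.3.1 arm_idle_driver - the cpuidle_driver template
As the comment below states, every ARM platform is expected to provide the default WFI standby state as idle state 0. Any exception must be handled separately in the DTS.
Note that for state0 both exit latency and target residency are 1 (the minimum), and power usage is the largest unsigned integer (UINT_MAX). These are not real measurements, because the driver cannot know the WFI characteristics of every ARM platform. They are relative values: every other state has a larger exit latency and target residency than state0, and a lower power usage.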
static struct cpuidle_driver arm_idle_driver __initdata = {
	.name = "arm_idle",
	.owner = THIS_MODULE,
	/*
	 * State at index 0 is standby wfi and considered standard
	 * on all ARM platforms. If in some platforms simple wfi
	 * can't be used as "state 0", DT bindings must be implemented
	 * to work around this issue and allow installing a special
	 * handler for idle state index 0.
	 */
	.states[0] = {
		.enter                  = arm_enter_idle_state,	/* how to enter this C state */
		.exit_latency           = 1,			/* exit latency */
		.target_residency       = 1,			/* minimum residency */
		.power_usage		= UINT_MAX,		/* power usage */
		.name                   = "WFI",
		.desc                   = "ARM WFI",
	}
};
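A deliberately simplified selection loop shows why relative values are enough for state0. This is a hypothetical sketch, not the menu or teo governor. It picks the deepest state whose target_residency fits the predicted idle time and whose exit_latency fits a latency limit. Because every DT state has larger values than state 0, state 0 stays the fallback whenever nothing deeper fits. The cpu-sleep-0 numbers (280 us = 130 + 150 exit latency, 2000 us residency) come from the msm8916 tree in 1.1.1.

```c
#include <limits.h>

/* Hypothetical subset of struct cpuidle_state. */
struct mock_state {
	const char *name;
	unsigned int exit_latency;      /* us */
	unsigned int target_residency;  /* us */
};

/* Deepest state whose residency and latency both fit; states are ordered shallow to deep. */
static int mock_select(const struct mock_state *s, int count,
		       unsigned int predicted_us, unsigned int latency_limit_us)
{
	int idx = 0;   /* state 0 (WFI, 1 us / 1 us) is always the fallback */

	for (int i = 1; i < count; i++) {
		if (s[i].target_residency > predicted_us ||
		    s[i].exit_latency > latency_limit_us)
			break;
		idx = i;
	}
	return idx;
}
```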
2.3.2 arm_idle_state_match - match table used to extract C state information from the device tree
static const struct of_device_id arm_idle_state_match[] __initconst = {
	{ .compatible = "arm,idle-state",
	  .data = arm_enter_idle_state },	/* .data: function that enters the C state */
	{ },
};
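2.3.3 arm_enter_idle_state - how a given C state is entered
arm_enter_idle_state puts the given cpu into the given C state, where idx is the C state level. Depending on idx it calls one of the following functions, both covered separately in 2.5:
- when idx is 0, cpu_do_idle is called
- when idx is not 0, arm_cpuidle_suspend is called
When the function returns, the cpu has already left idle. The return value is the index of the C state that was entered, or -1 on failure.
Implementation location: U:\linux-5.10.61\drivers\cpuidle\cpuidle-arm.c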
/*
 * arm_enter_idle_state - Programs CPU to enter the specified state
 *
 * dev: cpuidle device
 * drv: cpuidle driver
 * idx: state index
 *
 * Called from the CPUidle framework to program the device to the
 * specified target state selected by the governor.
 */
static int arm_enter_idle_state(struct cpuidle_device *dev,	/* cpu entering idle */
				struct cpuidle_driver *drv,	/* driver used to enter idle */
				int idx)			/* idle level to enter */
{
	/*
	 * Pass idle state index to arm_cpuidle_suspend which in turn
	 * will call the CPU ops suspend protocol with idle index as a
	 * parameter.
	 */
	/* arm_cpuidle_suspend is covered in 2.5 */
	return CPU_PM_CPU_IDLE_ENTER(arm_cpuidle_suspend, idx);
}
2.3.4 CPU_PM_CPU_IDLE_ENTER
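If idle state 0 (WFI) is requested, the traditional cpu_do_idle interface is called.
For any other state, cpu_pm_enter is called first to announce that the CPU is about to enter a low power state. If that succeeds, the given low_level_idle_enter callback is called to put the cpu into the requested idle state: arm_cpuidle_suspend (arm32) or psci_cpu_suspend_enter (arm64). On return from idle, cpu_pm_exit sends the notification that the low power state has been left. For retention states (is_retention set) both notifications are skipped.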
#define __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter,	/* callback */		\
				idx,		/* C state level to enter */	\
				state,		/* platform data for the state */ \
				is_retention)					\
({									\
	int __ret = 0;							\
									\
	if (!idx) {	/* level 0: plain WFI via cpu_do_idle() */	\
		cpu_do_idle();						\
		return idx;						\
	}								\
									\
	if (!is_retention)						\
		__ret =  cpu_pm_enter();				\
	if (!__ret) {							\
		/* the callback enters the requested idle level */	\
		__ret = low_level_idle_enter(state);			\
		if (!is_retention)					\
			cpu_pm_exit();					\
	}								\
									\
	__ret ? -1 : idx;						\
})

#define CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx)	\
	__CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, idx, 0)
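The macro's branches are easier to follow when written as an ordinary function with stand-in hooks. In this hypothetical sketch, the mock_* functions replace cpu_do_idle(), cpu_pm_enter(), cpu_pm_exit() and the low-level enter callback, and the counters record which of them ran. The behavior follows the macro: state 0 bypasses the CPU PM notifiers, cpu_pm_exit() runs only if cpu_pm_enter() succeeded, and any failure turns the result into -1.

```c
/* Counters and knobs for the hypothetical hooks below. */
static int wfi_count, pm_enter_count, pm_exit_count;
static int pm_enter_ret;       /* what mock_cpu_pm_enter() returns */
static int low_level_ret;      /* what the low-level enter callback returns */
static int last_state = -1;

static void mock_cpu_do_idle(void) { wfi_count++; }
static int mock_cpu_pm_enter(void) { pm_enter_count++; return pm_enter_ret; }
static void mock_cpu_pm_exit(void) { pm_exit_count++; }
static int mock_low_level_enter(int state) { last_state = state; return low_level_ret; }

/* Function form of __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, is_retention). */
static int mock_cpu_pm_cpu_idle_enter(int idx, int state, int is_retention)
{
	int ret = 0;

	if (!idx) {                        /* state 0: plain WFI, no notifiers */
		mock_cpu_do_idle();
		return idx;
	}

	if (!is_retention)
		ret = mock_cpu_pm_enter();     /* announce the low power entry */
	if (!ret) {
		ret = mock_low_level_enter(state);
		if (!is_retention)
			mock_cpu_pm_exit();    /* announce the low power exit */
	}

	return ret ? -1 : idx;
}
```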
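2.4 arm_cpuidle_init - read platform-specific C state information from the device tree
2.4.1 arm64 implementation: returns -EOPNOTSUPP directly
The arm64 implementation is shown below. In the open-source linux-5.10.61 kernel, the ops obtained on arm64 implement neither cpu_suspend nor cpu_init_idle (analyzed later), so this function simply returns -EOPNOTSUPP.
File location: W:\opensource\linux-5.10.61\arch\arm64\kernel\cpuidle.c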
int arm_cpuidle_init(unsigned int cpu)
{
	const struct cpu_operations *ops = get_cpu_ops(cpu);
	int ret = -EOPNOTSUPP;

	/*
	 * Note: on ARM64 the ops obtained above are cpu_psci_ops, which
	 * define neither cpu_suspend nor cpu_init_idle, so this function
	 * returns -EOPNOTSUPP.
	 */
	if (ops && ops->cpu_suspend && ops->cpu_init_idle)
		ret = ops->cpu_init_idle(cpu);

	return ret;
}
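2.4.2 arm32 implementation: mostly returns -EOPNOTSUPP; only one omap platform gets further
The arm32 implementation is shown below. When the ops are looked up, only the omap2 pm33xx platform actually registers them. Most platforms therefore return -EOPNOTSUPP here, and only omap2 pm33xx carries on.
File location: W:\opensource\linux-5.10.61\arch\arm\kernel\cpuidle.c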
/**
 * arm_cpuidle_init() - Initialize cpuidle_ops for a specific cpu
 * @cpu: the cpu to be initialized
 *
 * Initialize the cpuidle ops with the device for the cpu and then call
 * the cpu's idle initialization callback. This may fail if the underlying HW
 * is not operational.
 *
 * Returns:
 *  0 on success,
 *  -ENODEV if it fails to find the cpu node in the device tree,
 *  -EOPNOTSUPP if it does not find a registered and valid cpuidle_ops for
 *  this cpu,
 *  -ENOENT if it fails to find an 'enable-method' property,
 *  -ENXIO if the HW reports a failure or a misconfiguration,
 *  -ENOMEM if the HW reports a memory allocation failure
 */
int __init arm_cpuidle_init(int cpu)
{
	/* 1. DT node of this cpu */
	struct device_node *cpu_node = of_cpu_device_node_get(cpu);
	int ret;

	if (!cpu_node)
		return -ENODEV;

	/*
	 * 2. Read this cpu's cpuidle_ops into the global cpuidle_ops[] array.
	 *    Returns -EOPNOTSUPP if the platform registered no ops.
	 */
	ret = arm_cpuidle_read_ops(cpu_node, cpu);

	/*
	 * 3. Call the ops' init function for platform-specific setup. On
	 *    omap2, for example, init reads further custom C state
	 *    information from the DT. The platform uses it to decide which
	 *    peripherals to shut down before executing wfi, and that is how
	 *    its C state levels differ.
	 */
	if (!ret)
		ret = cpuidle_ops[cpu].init(cpu_node, cpu);

	of_node_put(cpu_node);

	return ret;
}
2.4.2.1 amx3_idle_init - custom C state information of the omap2 platform
As analyzed above, on the omap2 platform cpuidle_ops[cpu].init is set to amx3_idle_init, which is implemented as follows:
W:\opensource\linux-5.10.61\arch\arm\mach-omap2\pm33xx-core.c
static int __init amx3_idle_init(struct device_node *cpu_node, int cpu)
{
	struct device_node *state_node;
	struct amx3_idle_state states[CPUIDLE_STATE_MAX];
	int i;
	int state_count = 1;

	/*
	 * 1. Check the DT for the platform-specific property and set up
	 *    wfi_flags, which tell the idle code what to do before wfi,
	 *    e.g. flush caches or shut down on-chip peripherals.
	 */
	for (i = 0; ; i++) {
		/* 2. Get the node of each C state this cpu supports */
		state_node = of_parse_phandle(cpu_node, "cpu-idle-states", i);
		if (!state_node)
			break;

		if (!of_device_is_available(state_node))
			continue;

		if (i == CPUIDLE_STATE_MAX) {
			pr_warn("%s: cpuidle states reached max possible\n",
				__func__);
			break;
		}

		states[state_count].wfi_flags = 0;

		/* 3. Read the platform-specific C state property from the DT */
		if (of_property_read_bool(state_node, "ti,idle-wkup-m3"))
			states[state_count].wfi_flags |= WFI_FLAG_WAKE_M3 |
							 WFI_FLAG_FLUSH_CACHE;

		state_count++;
	}

	/* 4. Allocate the global idle_states[] array, which holds per-state idle properties */
	idle_states = kcalloc(state_count, sizeof(*idle_states), GFP_KERNEL);
	if (!idle_states)
		return -ENOMEM;

	for (i = 1; i < state_count; i++)
		idle_states[i].wfi_flags = states[i].wfi_flags;

	return 0;
}
The omap2 device tree looks like this:
cpus {
	#address-cells = <1>;
	#size-cells = <0>;
	cpu@0 {
		compatible = "arm,cortex-a8";
		enable-method = "ti,am3352";
		device_type = "cpu";
		reg = <0>;

		operating-points-v2 = <&cpu0_opp_table>;

		clocks = <&dpll_mpu_ck>;
		clock-names = "cpu";

		clock-latency = <300000>; /* From omap-cpufreq driver */
		cpu-idle-states = <&mpu_gate>;	/* only one C state is supported */
	};

	idle-states {
		mpu_gate: mpu_gate {
			compatible = "arm,idle-state";
			entry-latency-us = <40>;
			exit-latency-us = <90>;
			min-residency-us = <300>;
			ti,idle-wkup-m3;
		};
	};
};
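How amx3_idle_init() turns the mpu_gate node into per-state wfi_flags can be modeled in user space. This is a hypothetical sketch. The mock_* types stand in for DT nodes and struct amx3_idle_state, the flag values are illustrative rather than the real WFI_FLAG_* constants, and it allocates n + 1 entries directly instead of going through a temporary array as the kernel does. Entry 0 stays plain WFI with no flags. Each available node adds one entry, and ti,idle-wkup-m3 requests an M3 wakeup plus a cache flush.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative flag values only; not the real TI WFI_FLAG_* constants. */
#define MOCK_WFI_FLAG_FLUSH_CACHE (1U << 0)
#define MOCK_WFI_FLAG_WAKE_M3     (1U << 1)

/* Hypothetical stand-in for one "cpu-idle-states" DT node. */
struct mock_state_node {
	bool available;     /* of_device_is_available() */
	bool idle_wkup_m3;  /* "ti,idle-wkup-m3" present */
};

struct mock_amx3_idle_state {
	unsigned int wfi_flags;
};

/*
 * Builds an idle_states[]-style array: entry 0 is WFI (flags 0), and
 * entries 1.. follow the available DT nodes in order. The caller frees it.
 */
static struct mock_amx3_idle_state *
mock_amx3_idle_init(const struct mock_state_node *nodes, int n, int *state_count)
{
	struct mock_amx3_idle_state *states = calloc((size_t)n + 1, sizeof(*states));
	int count = 1;

	if (!states)
		return NULL;

	for (int i = 0; i < n; i++) {
		if (!nodes[i].available)
			continue;
		if (nodes[i].idle_wkup_m3)
			states[count].wfi_flags |= MOCK_WFI_FLAG_WAKE_M3 |
						    MOCK_WFI_FLAG_FLUSH_CACHE;
		count++;
	}

	*state_count = count;
	return states;
}
```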
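2.5 Entering a given C state
As the analysis of arm_enter_idle_state above shows, cpuidle_arm enters C states of different levels through these interfaces:
- when idx is 0, cpu_do_idle is called
- when idx is not 0, arm_cpuidle_suspend is called
Here idx is the C state level.
2.5.1 cpu_do_idle - execute wfi to enter idle 0
The arm32 implementation is as follows:
Function location: W:\opensource\linux-5.10.61\arch\arm\include\asm\glue-proc.h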
#ifdef CONFIG_CPU_V7M
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_v7m
# endif
#endif

#define cpu_do_idle			__glue(CPU_NAME,_do_idle)
Taking cpu_v7m as the example, cpu_do_idle becomes cpu_v7m_do_idle.
W:\opensource\linux-5.10.61\arch\arm\mm\proc-v7m.S
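/*
 * cpu_v7m_do_idle()
 *
 * Idle the processor (eg, wait for interrupt).
 *
 * IRQs are already disabled.
 */
ENTRY(cpu_v7m_do_idle)
	wfi		@ wfi enters C state level 0
	ret	lr	@ return to the caller and continue after wakeup
ENDPROC(cpu_v7m_do_idle)
2.5.2 arm_cpuidle_suspend - enter a given idle level
arm_cpuidle_suspend simply calls the suspend callback (cpu_suspend on arm64) from the ops set. The arm32 and arm64 implementations differ slightly, so they are covered separately below.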
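The CPU_NAME indirection relies on two-level token pasting, and a small hypothetical sketch shows why that is needed. GLUE_PASTE and GLUE imitate ____glue() and __glue() from arch/arm/include/asm/glue.h, which to our understanding are defined this way. The operands of ## are not macro-expanded, so only the outer macro turns CPU_NAME into cpu_v7m before pasting. Both functions are user-space stand-ins that return a marker instead of executing wfi.

```c
/* Two-level pasting: the outer macro expands its arguments before ## sees them. */
#define GLUE_PASTE(name, fn) name##fn
#define GLUE(name, fn)       GLUE_PASTE(name, fn)

#define CPU_NAME    cpu_v7m
#define cpu_do_idle GLUE(CPU_NAME, _do_idle)

/* Stand-in for the assembly routine: returns a marker, executes no wfi. */
static int cpu_v7m_do_idle(void) { return 7; }

/* What a single-level paste would reach instead: CPU_NAME left unexpanded. */
static int CPU_NAME_do_idle(void) { return 0; }
```

Here cpu_do_idle() ends up calling cpu_v7m_do_idle(), while GLUE_PASTE(CPU_NAME, _do_idle)() pastes the literal token CPU_NAME and calls CPU_NAME_do_idle().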
2.5.2.1 ARM64
As analyzed earlier, no arm64 platform enters idle through the interface below, because none of the ops define the cpu_suspend callback.
File: W:\opensource\linux-5.10.61\arch\arm64\kernel\cpuidle.c
/**
 * arm_cpuidle_suspend() - function to enter a low-power idle state
 * @arg: argument to pass to CPU suspend operations
 *
 * Return: 0 on success, -EOPNOTSUPP if CPU suspend hook not initialized, CPU
 * operations back-end error code otherwise.
 */
int arm_cpuidle_suspend(int index)
{
	int cpu = smp_processor_id();
	const struct cpu_operations *ops = get_cpu_ops(cpu);

	// enter the requested sleep level
	return ops->cpu_suspend(index);
}
2.5.2.2 ARM32
As the analysis above shows, only the pm33xx platform under omap2 enters idle through this interface.
Open question: how do the other arm32 platforms enter the different idle levels? Does arm32 simply not support multi-level idle?
W:\opensource\linux-5.10.61\arch\arm\kernel\cpuidle.c
/**
 * arm_cpuidle_suspend() - function to enter low power idle states
 * @index: an integer used as an identifier for the low level PM callbacks
 *
 * This function calls the underlying arch specific low level PM code as
 * registered at the init time.
 *
 * Returns the result of the suspend callback.
 */
int arm_cpuidle_suspend(int index)
{
	int cpu = smp_processor_id();

	return cpuidle_ops[cpu].suspend(index);
}
As registered below, the suspend callback here is amx3_idle_enter. Let us analyze it to see how arm32 actually enters the different idle levels.
W:\opensource\linux-5.10.61\arch\arm\mach-omap2\pm33xx-core.c
static struct cpuidle_ops amx3_cpuidle_ops __initdata = {
	.init = amx3_idle_init,
	.suspend = amx3_idle_enter,
};

CPUIDLE_METHOD_OF_DECLARE(pm33xx_idle, "ti,am3352", &amx3_cpuidle_ops);
CPUIDLE_METHOD_OF_DECLARE(pm43xx_idle, "ti,am4372", &amx3_cpuidle_ops);
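CPUIDLE_METHOD_OF_DECLARE puts each (enable-method string, ops) pair into a table that the arm32 cpuidle code searches with the CPU node's enable-method ("ti,am3352" in the device tree above); the matching ops are copied into the per-CPU cpuidle_ops[] that arm_cpuidle_suspend dispatches through. The sketch below models that lookup in user space with an ordinary array. The struct, the table, read_ops and its -1 error value are hypothetical simplifications, and the CPU number is passed explicitly instead of coming from smp_processor_id().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of struct cpuidle_ops: only the two hooks. */
struct cpuidle_ops_model {
	int (*init)(int cpu);
	int (*suspend)(unsigned long index);
};

static unsigned long last_index;

static int fake_init(int cpu) { (void)cpu; return 0; }
static int fake_suspend(unsigned long index) { last_index = index; return 0; }

static const struct cpuidle_ops_model amx3_ops_model = { fake_init, fake_suspend };

/* Stand-in for the table that CPUIDLE_METHOD_OF_DECLARE populates. */
static const struct {
	const char *method;
	const struct cpuidle_ops_model *ops;
} method_table[] = {
	{ "ti,am3352", &amx3_ops_model },
	{ "ti,am4372", &amx3_ops_model },
};

#define NR_CPUS_MODEL 2
static struct cpuidle_ops_model cpuidle_ops[NR_CPUS_MODEL];

/* Match a CPU's enable-method string and copy the ops into its slot. */
static int read_ops(int cpu, const char *enable_method)
{
	size_t i;

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	for (i = 0; i < sizeof(method_table) / sizeof(method_table[0]); i++) {
		if (strcmp(method_table[i].method, enable_method) == 0) {
			cpuidle_ops[cpu] = *method_table[i].ops;
			return 0;
		}
	}
	return -1;	/* no matching enable-method */
}

/* Same shape as the arm32 arm_cpuidle_suspend(): dispatch via the per-CPU ops. */
static int model_arm_cpuidle_suspend(int cpu, int index)
{
	return cpuidle_ops[cpu].suspend(index);
}
```

Keeping the ops per CPU is what lets different CPUs of one system use different enable-methods.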
2.5.2.2.1 amx3_idle_enter - enter the specified C state
What follows is extended material: let us see what the so-called "different C state levels" actually are on this platform.
static int amx3_idle_enter(unsigned long index)	// the C state to enter
{
	//1. Use index to find the data structure of the requested C state
	struct amx3_idle_state *idle_state = &idle_states[index];

	if (!idle_state)
		return -EINVAL;

	//2. idle_fn performs the actual idle entry; where is it assigned?
	if (idle_fn)
		idle_fn(idle_state->wfi_flags);

	return 0;
}
Where is idle_fn above initialized?
2.5.2.2.2 am33xx_pm_probe
As shown below, it is actually set when the pm33xx driver is initialized.
File: U:\linux-5.10.61\drivers\soc\ti\pm33xx.c
static int am33xx_pm_probe(struct platform_device *pdev)
{
	struct device *dev = &pdev->dev;
	int ret;

	if (!of_machine_is_compatible("ti,am33xx") &&
	    !of_machine_is_compatible("ti,am43"))
		return -ENODEV;

	pm_ops = dev->platform_data;
	...

	//1. Find out where the executable code is stored
	pm_sram = pm_ops->get_sram_addrs();
	...

	//2. Copy the binary code of the callback into SRAM
	ret = am33xx_push_sram_idle();
	...

	//3. Initialize; am33xx_do_sram_idle is handed over to become idle_fn
	ret = pm_ops->init(am33xx_do_sram_idle);
	if (ret) {
		dev_err(dev, "Unable to call core pm init!\n");
		ret = -ENODEV;
		goto err_put_wkup_m3_ipc;
	}

	return 0;
}
2.5.2.2.3 am33xx_ops - the structure the global pm_ops points to
pm_ops above points to the ops below. am33xx_suspend_init (the .init callback) sets the idle_fn seen earlier, which ends up being am33xx_do_sram_idle.
static struct am33xx_pm_platform_data am33xx_ops = {
	.init = am33xx_suspend_init,
	.deinit = amx3_suspend_deinit,
	.soc_suspend = am33xx_suspend,
	.cpu_suspend = am33xx_cpu_suspend,
	.begin_suspend = amx3_begin_suspend,
	.finish_suspend = amx3_finish_suspend,
	.get_sram_addrs = amx3_get_sram_addrs,
	.save_context = am33xx_save_context,
	.restore_context = am33xx_restore_context,
	.check_off_mode_enable = am33xx_check_off_mode_enable,
};
2.5.2.2.4 am33xx_do_sram_idle - the global idle_fn, enters the different C state levels
static int am33xx_do_sram_idle(u32 wfi_flags)
{
	int ret = 0;

	if (!m3_ipc || !pm_ops)
		return 0;

	if (wfi_flags & WFI_FLAG_WAKE_M3)
		ret = m3_ipc->ops->prepare_low_power(m3_ipc, WKUP_M3_IDLE);

	return pm_ops->cpu_suspend(am33xx_do_wfi_sram, wfi_flags);
}
2.5.2.2.5 am33xx_cpu_suspend
static int am33xx_cpu_suspend(int (*fn)(unsigned long),	// callback that enters the C state
			      unsigned long args)		// wfi_flags is passed in here
{
	int ret = 0;

	if (omap_irq_pending() || need_resched())
		return ret;

	//1. cpu_suspend actually calls the fn callback passed in to enter the C state.
	//   Note: args is wfi_flags, which says which peripherals to shut down before
	//   executing wfi; this is how the different C state levels are distinguished
	ret = cpu_suspend(args, fn);

	return ret;
}
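Putting 2.5.2.2.1 to 2.5.2.2.5 together, the pm33xx idle path is a chain of function pointers that ends in the code copied to SRAM. A minimal user-space model of that chain follows; every function in it is a hypothetical stand-in (no context save, no SRAM, no wfi), and only the WFI_FLAG_* bit values come from the kernel. It shows how the idle-state index selects wfi_flags and how those flags travel unchanged down to the final callback.

```c
#include <assert.h>

#define WFI_FLAG_FLUSH_CACHE	(1u << 0)
#define WFI_FLAG_WAKE_M3	(1u << 3)

typedef int (*finisher_t)(unsigned long);

struct amx3_idle_state {
	unsigned int wfi_flags;
};

/* State 0 is plain WFI; state 1 mirrors the mpu_gate node (ti,idle-wkup-m3). */
static struct amx3_idle_state idle_states[] = {
	{ 0 },
	{ WFI_FLAG_WAKE_M3 | WFI_FLAG_FLUSH_CACHE },
};

static unsigned long seen_flags;	/* what the "SRAM code" was handed */

/* Stand-in for am33xx_do_wfi_sram; the real code runs from SRAM and executes wfi. */
static int fake_do_wfi_sram(unsigned long flags)
{
	seen_flags = flags;
	return 1;	/* am33xx_do_wfi also returns 1 when wfi falls through */
}

/* Stand-in for cpu_suspend(arg, fn): the real one saves CPU context first. */
static int model_cpu_suspend(unsigned long arg, finisher_t fn)
{
	return fn(arg);
}

/* Stand-in for am33xx_cpu_suspend (the pending-IRQ check is omitted). */
static int model_am33xx_cpu_suspend(finisher_t fn, unsigned long args)
{
	return model_cpu_suspend(args, fn);
}

/* Stand-in for am33xx_do_sram_idle, which ends up installed as idle_fn. */
static int model_do_sram_idle(unsigned int wfi_flags)
{
	return model_am33xx_cpu_suspend(fake_do_wfi_sram, wfi_flags);
}

static int (*idle_fn)(unsigned int) = model_do_sram_idle;

/* Same shape as amx3_idle_enter: index -> wfi_flags -> idle_fn. */
static int model_amx3_idle_enter(unsigned long index)
{
	assert(index < sizeof(idle_states) / sizeof(idle_states[0]));
	if (idle_fn)
		idle_fn(idle_states[index].wfi_flags);
	return 0;
}
```

Like the original, the model ignores the value returned by idle_fn, so amx3_idle_enter reports 0 whatever the low-level code returned.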
2.5.2.2.6 cpu_suspend - call the given callback to enter the specified C state
The arm32 implementation is as follows:
U:\linux-5.10.61\arch\arm\kernel\suspend.c
#ifdef CONFIG_MMU
int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
{
	struct mm_struct *mm = current->active_mm;
	u32 __mpidr = cpu_logical_map(smp_processor_id());
	int ret;

	if (!idmap_pgd)
		return -EINVAL;

	/*
	 * Function graph tracer state gets incosistent when the kernel
	 * calls functions that never return (aka suspend finishers) hence
	 * disable graph tracing during their execution.
	 */
	pause_graph_tracing();

	/*
	 * Provide a temporary page table with an identity mapping for
	 * the MMU-enable code, required for resuming.  On successful
	 * resume (indicated by a zero return code), we need to switch
	 * back to the correct page tables.
	 */
	// call __cpu_suspend to enter idle sleep
	ret = __cpu_suspend(arg, fn, __mpidr);

	unpause_graph_tracing();

	if (ret == 0) {
		cpu_switch_mm(mm->pgd, mm);
		local_flush_bp_all();
		local_flush_tlb_all();
		check_other_bugs();
	}

	return ret;
}
#else
int cpu_suspend(unsigned long arg, int (*fn)(unsigned long))
{
	u32 __mpidr = cpu_logical_map(smp_processor_id());
	int ret;

	pause_graph_tracing();
	// call __cpu_suspend to enter idle sleep
	ret = __cpu_suspend(arg, fn, __mpidr);
	unpause_graph_tracing();

	return ret;
}
#define	idmap_pgd	NULL
#endif
2.5.2.2.7 __cpu_suspend
Location: U:\linux-5.10.61\arch\arm\kernel\sleep.S
/*
 * Save CPU state for a suspend.  This saves the CPU general purpose
 * registers, and allocates space on the kernel stack to save the CPU
 * specific registers and some other data for resume.
 *  r0 = suspend function arg0
 *  r1 = suspend function
 *  r2 = MPIDR value the resuming CPU will use
 */
ENTRY(__cpu_suspend)
	stmfd	sp!, {r4 - r11, lr}
#ifdef MULTI_CPU
	ldr	r10, =processor
	ldr	r4, [r10, #CPU_SLEEP_SIZE] @ size of CPU sleep state
#else
	ldr	r4, =cpu_suspend_size
#endif
	mov	r5, sp			@ current virtual SP
	add	r4, r4, #12		@ Space for pgd, virt sp, phys resume fn
	sub	sp, sp, r4		@ allocate CPU state on stack
	ldr	r3, =sleep_save_sp
	stmfd	sp!, {r0, r1}		@ save suspend func arg and pointer
	ldr	r3, [r3, #SLEEP_SAVE_SP_VIRT]
	ALT_SMP(ldr r0, =mpidr_hash)
	ALT_UP_B(1f)
	/* This ldmia relies on the memory layout of the mpidr_hash struct */
	ldmia	r0, {r1, r6-r8}	@ r1 = mpidr mask (r6,r7,r8) = l[0,1,2] shifts
	compute_mpidr_hash	r0, r6, r7, r8, r2, r1
	add	r3, r3, r0, lsl #2
1:	mov	r2, r5			@ virtual SP
	mov	r1, r4			@ size of save block
	add	r0, sp, #8		@ pointer to save block
	bl	__cpu_suspend_save
	badr	lr, cpu_suspend_abort
	ldmfd	sp!, {r0, pc}		@ call suspend fn
ENDPROC(__cpu_suspend)
2.5.2.2.8 am33xx_do_wfi_sram - enter the specified C state
am33xx_do_wfi_sram is what actually enters the specified C state, but it is a global function pointer:
U:\linux-5.10.61\drivers\soc\ti\pm33xx.c
static int (*am33xx_do_wfi_sram)(unsigned long unused);
The global function pointer am33xx_do_wfi_sram is assigned in am33xx_push_sram_idle; the call chain is:
am33xx_pm_probe -> am33xx_push_sram_idle
static int am33xx_push_sram_idle(void)
{
	...
	am33xx_do_wfi_sram = sram_exec_copy(sram_pool, (void *)ocmcram_location,
					    pm_sram->do_wfi,
					    *pm_sram->do_wfi_sz);
	if (!am33xx_do_wfi_sram) {
		dev_err(pm33xx_dev,
			"PM: %s: am33xx_do_wfi copy to sram failed\n",
			__func__);
		return -ENODEV;
	}
	...
}
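sram_exec_copy is declared as below. Here it copies *pm_sram->do_wfi_sz bytes of executable code from pm_sram->do_wfi to ocmcram_location and returns the address of the copy, which can then be called through a function pointer (NULL on failure):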
void *sram_exec_copy(struct gen_pool *pool, void *dst, void *src, size_t size);
2.5.2.2.9 amx3_get_sram_addrs - compute pm_sram, i.e. where the executable code is stored
As the probe function analyzed above shows, pm_sram is obtained through get_sram_addrs:
pm_sram = pm_ops->get_sram_addrs();
The get_sram_addrs callback is amx3_get_sram_addrs, implemented as below.
It is clearly platform-specific; we pick one branch, am33xx_pm_sram, and follow it.
static struct am33xx_pm_sram_addr *amx3_get_sram_addrs(void)
{
	if (soc_is_am33xx())
		return &am33xx_pm_sram;
	else if (soc_is_am437x())
		return &am43xx_pm_sram;
	else
		return NULL;
}
2.5.2.2.10 am33xx_pm_sram - an am33xx_pm_sram_addr structure described in assembly
am33xx_pm_sram is declared as follows:
U:\linux-5.10.61\arch\arm\mach-omap2\pm.h
extern struct am33xx_pm_sram_addr am33xx_pm_sram;
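am33xx_pm_sram actually corresponds to the assembly below, which simply lays out a data structure: each .word is one member. The first is the address of the am33xx_do_wfi code; the others are addresses of data.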
U:\linux-5.10.61\arch\arm\mach-omap2\sleep33xx.S
ENTRY(am33xx_pm_sram)
	.word am33xx_do_wfi
	.word am33xx_do_wfi_sz
	.word am33xx_resume_offset
	.word am33xx_emif_sram_table
	.word am33xx_pm_ro_sram_data

resume_addr:
.word  cpu_resume - PAGE_OFFSET + 0x80000000
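The type of this data structure is am33xx_pm_sram_addr, so pm_sram->do_wfi is am33xx_do_wfi: calling it to enter the specified C state really runs am33xx_do_wfi (more precisely, its copy in SRAM).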
struct am33xx_pm_sram_addr {
	void (*do_wfi)(void);
	unsigned long *do_wfi_sz;
	unsigned long *resume_offset;
	unsigned long *emif_sram_table;
	unsigned long *ro_sram_data;
	unsigned long resume_address;
};
2.5.2.2.11 am33xx_do_wfi - enter the different C state levels; in practice, shut down on-chip peripherals, flush caches, etc. before wfi
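As the analysis above shows, am33xx_do_wfi is what really implements entering the different C state levels. On the omap2 platform, entering a given C state level means using the wfi_flags argument to decide what to do before executing wfi, e.g. which on-chip peripherals to shut down and whether to flush the caches.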
Note: this is only how the omap2 platform on arm32 defines "different C state levels"; other platforms define them differently.
The available wfi_flags values are:
/*
 * WFI Flags for sleep code control
 *
 * These flags allow PM code to exclude certain operations from happening
 * in the low level ASM code found in sleep33xx.S and sleep43xx.S
 *
 * WFI_FLAG_FLUSH_CACHE: Flush the ARM caches and disable caching. Only
 *			 needed when MPU will lose context.
 * WFI_FLAG_SELF_REFRESH: Let EMIF place DDR memory into self-refresh and
 *			  disable EMIF.
 * WFI_FLAG_SAVE_EMIF: Save context of all EMIF registers and restore in
 *		       resume path. Only needed if PER domain loses context
 *		       and must also have WFI_FLAG_SELF_REFRESH set.
 * WFI_FLAG_WAKE_M3: Disable MPU clock or clockdomain to cause wkup_m3 to
 *		     execute when WFI instruction executes.
 * WFI_FLAG_RTC_ONLY: Configure the RTC to enter RTC+DDR mode.
 */
#define WFI_FLAG_FLUSH_CACHE		BIT(0)
#define WFI_FLAG_SELF_REFRESH		BIT(1)
#define WFI_FLAG_SAVE_EMIF		BIT(2)
#define WFI_FLAG_WAKE_M3		BIT(3)
#define WFI_FLAG_RTC_ONLY		BIT(4)
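To make "a C state level is a set of wfi_flags" concrete, the hypothetical helper below lists the steps am33xx_do_wfi (shown next) would take for a given flag set, in the order the assembly tests the bits. It is a reading aid, not kernel code; the step names are invented labels, and WFI_FLAG_RTC_ONLY is ignored because the am33xx_do_wfi code shown here never tests it. For the mpu_gate state (ti,idle-wkup-m3, i.e. WAKE_M3 | FLUSH_CACHE) it yields: flush caches, gate the MPU clock, wfi.

```c
#include <assert.h>

#define BIT(n)			(1u << (n))
#define WFI_FLAG_FLUSH_CACHE	BIT(0)
#define WFI_FLAG_SELF_REFRESH	BIT(1)
#define WFI_FLAG_SAVE_EMIF	BIT(2)
#define WFI_FLAG_WAKE_M3	BIT(3)
#define WFI_FLAG_RTC_ONLY	BIT(4)

/* Hypothetical labels for the blocks of am33xx_do_wfi. */
enum wfi_step {
	STEP_FLUSH_CACHE,	/* flush L1/L2, clear SCTLR.C */
	STEP_ENTER_SELF_REFRESH,/* EMIF puts DDR into self-refresh */
	STEP_SAVE_EMIF,		/* save EMIF register context */
	STEP_DISABLE_EMIF,	/* gate the EMIF clock */
	STEP_GATE_MPU_CLOCK,	/* so wkup_m3 sees the MPU WFI */
	STEP_WFI,
};

/* Fill out[] with the steps for this flag set; returns how many were written. */
static int wfi_steps(unsigned int flags, enum wfi_step out[], int max)
{
	int n = 0;

	assert(max >= 6);	/* room for every possible step */
	if (flags & WFI_FLAG_FLUSH_CACHE)
		out[n++] = STEP_FLUSH_CACHE;
	if (flags & WFI_FLAG_SELF_REFRESH)
		out[n++] = STEP_ENTER_SELF_REFRESH;
	if (flags & WFI_FLAG_SAVE_EMIF)
		out[n++] = STEP_SAVE_EMIF;
	if (flags & WFI_FLAG_SELF_REFRESH)
		out[n++] = STEP_DISABLE_EMIF;	/* only after self-refresh */
	if (flags & WFI_FLAG_WAKE_M3)
		out[n++] = STEP_GATE_MPU_CLOCK;
	out[n++] = STEP_WFI;			/* always executed */
	return n;
}
```

A flag value of 0 reduces to a bare wfi, which is the same thing idle level 0 does through cpu_do_idle.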
The implementation is as follows:
U:\linux-5.10.61\arch\arm\mach-omap2\sleep33xx.S
ENTRY(am33xx_do_wfi)
	stmfd	sp!, {r4 - r11, lr}	@ save registers on stack

	/* Save wfi_flags arg to data space */
	@ 1. First keep the wfi_flags argument in r4
	mov	r4, r0

	@ 2. The following only saves wfi_flags, because some of the calls
	@    made later may clobber r4
	@ 2.1 am33xx_pm_ro_sram_data is space reserved in assembly
	adr	r3, am33xx_pm_ro_sram_data

	@ 2.2 Read amx3_pm_sram_data_virt from am33xx_pm_ro_sram_data;
	@     it points to a structure of type am33xx_pm_sram_data
	ldr	r2, [r3, #AMX3_PM_RO_SRAM_DATA_VIRT_OFFSET]

	@ 2.3 Store the incoming wfi_flags into am33xx_pm_sram_data->wfi_flags
	str	r4, [r2, #AMX3_PM_WFI_FLAGS_OFFSET]

	/* Only flush cache is we know we are losing MPU context */
	@ 3. From here on, perform different operations depending on wfi_flags
	tst	r4, #WFI_FLAG_FLUSH_CACHE
	beq	cache_skip_flush

	/*
	 * Flush all data from the L1 and L2 data cache before disabling
	 * SCTLR.C bit.
	 */
	ldr	r1, kernel_flush
	blx	r1

	/*
	 * Clear the SCTLR.C bit to prevent further data cache
	 * allocation. Clearing SCTLR.C would make all the data accesses
	 * strongly ordered and would not hit the cache.
	 */
	mrc	p15, 0, r0, c1, c0, 0
	bic	r0, r0, #(1 << 2)	@ Disable the C bit
	mcr	p15, 0, r0, c1, c0, 0
	isb

	/*
	 * Invalidate L1 and L2 data cache.
	 */
	ldr	r1, kernel_flush
	blx	r1

	@ 4. Restore the wfi_flags argument saved above
	adr	r3, am33xx_pm_ro_sram_data
	ldr	r2, [r3, #AMX3_PM_RO_SRAM_DATA_VIRT_OFFSET]
	ldr	r4, [r2, #AMX3_PM_WFI_FLAGS_OFFSET]

cache_skip_flush:
	/* Check if we want self refresh */
	@ 5. Continue acting on wfi_flags
	tst	r4, #WFI_FLAG_SELF_REFRESH
	beq	emif_skip_enter_sr

	adr	r9, am33xx_emif_sram_table

	ldr	r3, [r9, #EMIF_PM_ENTER_SR_OFFSET]
	blx	r3

emif_skip_enter_sr:
	/* Only necessary if PER is losing context */
	@ 6. Continue acting on wfi_flags
	tst	r4, #WFI_FLAG_SAVE_EMIF
	beq	emif_skip_save

	ldr	r3, [r9, #EMIF_PM_SAVE_CONTEXT_OFFSET]
	blx	r3

emif_skip_save:
	/* Only can disable EMIF if we have entered self refresh */
	@ 7. Continue acting on wfi_flags
	tst	r4, #WFI_FLAG_SELF_REFRESH
	beq	emif_skip_disable

	/* Disable EMIF */
	ldr	r1, virt_emif_clkctrl
	ldr	r2, [r1]
	bic	r2, r2, #AM33XX_CM_CLKCTRL_MODULEMODE_DISABLE
	str	r2, [r1]

	ldr	r1, virt_emif_clkctrl
wait_emif_disable:
	ldr	r2, [r1]
	mov	r3, #AM33XX_CM_CLKCTRL_MODULESTATE_DISABLED
	cmp	r2, r3
	bne	wait_emif_disable

emif_skip_disable:
	@ 8. Continue acting on wfi_flags
	tst	r4, #WFI_FLAG_WAKE_M3
	beq	wkup_m3_skip

	/*
	 * For the MPU WFI to be registered as an interrupt
	 * to WKUP_M3, MPU_CLKCTRL.MODULEMODE needs to be set
	 * to DISABLED
	 */
	ldr	r1, virt_mpu_clkctrl
	ldr	r2, [r1]
	bic	r2, r2, #AM33XX_CM_CLKCTRL_MODULEMODE_DISABLE
	str	r2, [r1]

wkup_m3_skip:
	/*
	 * Execute an ISB instruction to ensure that all of the
	 * CP15 register changes have been committed.
	 */
	isb

	/*
	 * Execute a barrier instruction to ensure that all cache,
	 * TLB and branch predictor maintenance operations issued
	 * have completed.
	 */
	dsb
	dmb

	/*
	 * Execute a WFI instruction and wait until the
	 * STANDBYWFI output is asserted to indicate that the
	 * CPU is in idle and low power state. CPU can specualatively
	 * prefetch the instructions so add NOPs after WFI. Thirteen
	 * NOPs as per Cortex-A8 pipeline.
	 */
	@ 9. Execute wfi to enter idle
	wfi

	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop
	nop

	/* We come here in case of an abort due to a late interrupt */

	/* Set MPU_CLKCTRL.MODULEMODE back to ENABLE */
	ldr	r1, virt_mpu_clkctrl
	mov	r2, #AM33XX_CM_CLKCTRL_MODULEMODE_ENABLE
	str	r2, [r1]

	/* Re-enable EMIF */
	ldr	r1, virt_emif_clkctrl
	mov	r2, #AM33XX_CM_CLKCTRL_MODULEMODE_ENABLE
	str	r2, [r1]
wait_emif_enable:
	ldr	r3, [r1]
	cmp	r2, r3
	bne	wait_emif_enable

	/* Only necessary if PER is losing context */
	@ 10. Undo the corresponding operations according to wfi_flags
	tst	r4, #WFI_FLAG_SELF_REFRESH
	beq	emif_skip_exit_sr_abt

	adr	r9, am33xx_emif_sram_table
	ldr	r1, [r9, #EMIF_PM_ABORT_SR_OFFSET]
	blx	r1

emif_skip_exit_sr_abt:
	@ 11. Continue undoing operations according to wfi_flags
	tst	r4, #WFI_FLAG_FLUSH_CACHE
	beq	cache_skip_restore

	/*
	 * Set SCTLR.C bit to allow data cache allocation
	 */
	mrc	p15, 0, r0, c1, c0, 0
	orr	r0, r0, #(1 << 2)	@ Enable the C bit
	mcr	p15, 0, r0, c1, c0, 0
	isb

cache_skip_restore:
	/* Let the suspend code know about the abort */
	@ 12. Return 1: execution only gets here when wfi falls through (e.g. it
	@     was aborted by a late interrupt), not when resuming via cpu_resume
	mov	r0, #1
	ldmfd	sp!, {r4 - r11, pc}	@ restore regs and return
ENDPROC(am33xx_do_wfi)
2.5.2.2.12 Where the global am33xx_pm_ro_sram_data is defined
Space for a struct am33xx_pm_ro_sram_data is reserved in assembly:
U:\linux-5.10.61\arch\arm\mach-omap2\sleep33xx.S
	.align 3
ENTRY(am33xx_pm_ro_sram_data)
	.space AMX3_PM_RO_SRAM_DATA_SIZE
AMX3_PM_RO_SRAM_DATA_SIZE is defined as follows; the mechanism is explained below.
U:\linux-5.10.61\arch\arm\mach-omap2\pm-asm-offsets.c
int main(void)
{
	ti_emif_asm_offsets();

	DEFINE(AMX3_PM_WFI_FLAGS_OFFSET,
	       offsetof(struct am33xx_pm_sram_data, wfi_flags));
	DEFINE(AMX3_PM_L2_AUX_CTRL_VAL_OFFSET,
	       offsetof(struct am33xx_pm_sram_data, l2_aux_ctrl_val));
	DEFINE(AMX3_PM_L2_PREFETCH_CTRL_VAL_OFFSET,
	       offsetof(struct am33xx_pm_sram_data, l2_prefetch_ctrl_val));
	DEFINE(AMX3_PM_SRAM_DATA_SIZE, sizeof(struct am33xx_pm_sram_data));

	BLANK();

	DEFINE(AMX3_PM_RO_SRAM_DATA_VIRT_OFFSET,
	       offsetof(struct am33xx_pm_ro_sram_data, amx3_pm_sram_data_virt));
	DEFINE(AMX3_PM_RO_SRAM_DATA_PHYS_OFFSET,
	       offsetof(struct am33xx_pm_ro_sram_data, amx3_pm_sram_data_phys));
	DEFINE(AMX3_PM_RTC_BASE_VIRT_OFFSET,
	       offsetof(struct am33xx_pm_ro_sram_data, rtc_base_virt));
	DEFINE(AMX3_PM_RO_SRAM_DATA_SIZE,
	       sizeof(struct am33xx_pm_ro_sram_data));

	return 0;
}
The am33xx_pm_ro_sram_data and am33xx_pm_sram_data structures are defined as follows:
struct am33xx_pm_ro_sram_data {
	u32 amx3_pm_sram_data_virt;
	u32 amx3_pm_sram_data_phys;
	void __iomem *rtc_base_virt;
} __packed __aligned(8);

struct am33xx_pm_sram_data {
	u32 wfi_flags;
	u32 l2_aux_ctrl_val;
	u32 l2_prefetch_ctrl_val;
} __packed __aligned(8);
Extension:
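Note how these constants are produced: values computed in C are used from assembly. The build compiles pm-asm-offsets.c only as far as an assembly file (pm-asm-offsets.s); each DEFINE() leaves a marker line in that .s file, and the filechk offsets rule in the Makefile below turns those markers into the header pm-asm-offsets.h, on which sleep33xx.o and sleep43xx.o depend. The main above is a "dummy function": it is never called, but compiling it is what makes the compiler evaluate every offsetof/sizeof.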
U:\linux-5.10.61\arch\arm\mach-omap2\Makefile
$(obj)/pm-asm-offsets.h: $(obj)/pm-asm-offsets.s FORCE
	$(call filechk,offsets,__TI_PM_ASM_OFFSETS_H__)

$(obj)/sleep33xx.o $(obj)/sleep43xx.o: $(obj)/pm-asm-offsets.h

targets += pm-asm-offsets.s
clean-files += pm-asm-offsets.h
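A user-space way to see the same numbers: the program below applies offsetof/sizeof to copies of the two structures and prints #define lines shaped like those in the generated header. This is a swapped-in technique: the kernel's DEFINE() prints nothing at run time; it emits marker strings into pm-asm-offsets.s at compile time. GCC/Clang __attribute__((packed, aligned(8))) stands in for __packed __aligned(8) and uint32_t for u32. The sizes come out the same on 32- and 64-bit hosts because aligned(8) rounds both structures up to 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* User-space copies of the kernel structures (u32 -> uint32_t, __iomem dropped). */
struct am33xx_pm_sram_data {
	uint32_t wfi_flags;
	uint32_t l2_aux_ctrl_val;
	uint32_t l2_prefetch_ctrl_val;
} __attribute__((packed, aligned(8)));

struct am33xx_pm_ro_sram_data {
	uint32_t amx3_pm_sram_data_virt;
	uint32_t amx3_pm_sram_data_phys;
	void *rtc_base_virt;
} __attribute__((packed, aligned(8)));

/* The asm reserves .space AMX3_PM_RO_SRAM_DATA_SIZE, so the size must be stable. */
static_assert(sizeof(struct am33xx_pm_sram_data) == 16, "12 bytes rounded up to 16");

/* Print one constant in the shape of a line of the generated header. */
#define EMIT(sym, val) printf("#define %s %zu\n", #sym, (size_t)(val))

static void emit_pm_offsets(void)
{
	EMIT(AMX3_PM_WFI_FLAGS_OFFSET,
	     offsetof(struct am33xx_pm_sram_data, wfi_flags));
	EMIT(AMX3_PM_L2_AUX_CTRL_VAL_OFFSET,
	     offsetof(struct am33xx_pm_sram_data, l2_aux_ctrl_val));
	EMIT(AMX3_PM_L2_PREFETCH_CTRL_VAL_OFFSET,
	     offsetof(struct am33xx_pm_sram_data, l2_prefetch_ctrl_val));
	EMIT(AMX3_PM_SRAM_DATA_SIZE, sizeof(struct am33xx_pm_sram_data));
	EMIT(AMX3_PM_RO_SRAM_DATA_VIRT_OFFSET,
	     offsetof(struct am33xx_pm_ro_sram_data, amx3_pm_sram_data_virt));
	EMIT(AMX3_PM_RO_SRAM_DATA_PHYS_OFFSET,
	     offsetof(struct am33xx_pm_ro_sram_data, amx3_pm_sram_data_phys));
	EMIT(AMX3_PM_RTC_BASE_VIRT_OFFSET,
	     offsetof(struct am33xx_pm_ro_sram_data, rtc_base_virt));
	EMIT(AMX3_PM_RO_SRAM_DATA_SIZE, sizeof(struct am33xx_pm_ro_sram_data));
}
```

The point of the kernel's approach is that the assembly never hard-codes these numbers, so changing a structure cannot silently desynchronize sleep33xx.S from the C side.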
III. Example 2: cpuidle-big_little - ARM32 platform
This driver is enabled when CONFIG_ARM_BIG_LITTLE_CPUIDLE is set.
File: U:\linux-5.10.61\drivers\cpuidle\cpuidle-big_little.c
3.1 The cpuidle_driver for big and LITTLE cores
In cpuidle-big_little, big and LITTLE cores alike have only two C states, which we will call idle0 and idle1.
3.1.1 bl_idle_little_driver - LITTLE cores
/*
 * NB: Owing to current menu governor behaviour big and LITTLE
 * index 1 states have to define exit_latency and target_residency for
 * cluster state since, when all CPUs in a cluster hit it, the cluster
 * can be shutdown. This means that when a single CPU enters this state
 * the exit_latency and target_residency values are somewhat overkill.
 * There is no notion of cluster states in the menu governor, so CPUs
 * have to define CPU states where possibly the cluster will be shutdown
 * depending on the state of other CPUs. idle states entry and exit happen
 * at random times; however the cluster state provides target_residency
 * values as if all CPUs in a cluster enter the state at once; this is
 * somewhat optimistic and behaviour should be fixed either in the governor
 * or in the MCPM back-ends.
 * To make this driver 100% generic the number of states and the exit_latency
 * target_residency values must be obtained from device tree bindings.
 *
 * exit_latency: refers to the TC2 vexpress test chip and depends on the
 * current cluster operating point. It is the time it takes to get the CPU
 * up and running when the CPU is powered up on cluster wake-up from shutdown.
 * Current values for big and LITTLE clusters are provided for clusters
 * running at default operating points.
 *
 * target_residency: it is the minimum amount of time the cluster has
 * to be down to break even in terms of power consumption. cluster
 * shutdown has inherent dynamic power costs (L2 writebacks to DRAM
 * being the main factor) that depend on the current operating points.
 * The current values for both clusters are provided for a CPU whose half
 * of L2 lines are dirty and require cleaning to DRAM, and takes into
 * account leakage static power values related to the vexpress TC2 testchip.
 */
static struct cpuidle_driver bl_idle_little_driver = {
	.name = "little_idle",
	.owner = THIS_MODULE,
	.states[0] = ARM_CPUIDLE_WFI_STATE,	// how idle0 is entered
	.states[1] = {
		.enter			= bl_enter_powerdown,	// how idle1 is entered
		.exit_latency		= 700,
		.target_residency	= 2500,
		.flags			= CPUIDLE_FLAG_TIMER_STOP,
		.name			= "C1",
		.desc			= "ARM little-cluster power down",
	},
	.state_count = 2,
};
3.1.1.1 ARM_CPUIDLE_WFI_STATE - execute the wfi instruction to enter idle level 0
/* Common ARM WFI state */
#define ARM_CPUIDLE_WFI_STATE_PWR(p) {\
	.enter                  = arm_cpuidle_simple_enter, /* how idle0 is entered */\
	.exit_latency           = 1,\
	.target_residency       = 1,\
	.power_usage		= p,\
	.name                   = "WFI",\
	.desc                   = "ARM WFI",\
}

/*
 * in case power_specified == 1, give a default WFI power value needed
 * by some governors
 */
#define ARM_CPUIDLE_WFI_STATE ARM_CPUIDLE_WFI_STATE_PWR(UINT_MAX)
3.1.2 bl_idle_big_driver - big cores
static struct cpuidle_driver bl_idle_big_driver = {
	.name = "big_idle",
	.owner = THIS_MODULE,
	.states[0] = ARM_CPUIDLE_WFI_STATE,	// how idle0 is entered
	.states[1] = {
		.enter			= bl_enter_powerdown,	// how idle1 is entered
		.exit_latency		= 500,
		.target_residency	= 2000,
		.flags			= CPUIDLE_FLAG_TIMER_STOP,
		.name			= "C1",
		.desc			= "ARM big-cluster power down",
	},
	.state_count = 2,
};
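exit_latency and target_residency (both in microseconds) are what a governor weighs when it picks a state. The sketch below uses a deliberately simplified rule, not the menu governor's algorithm: choose the deepest state whose target_residency fits in the predicted idle time and whose exit_latency fits the latency limit. The numbers are copied from bl_idle_big_driver above; idle_state_model and pick_state are hypothetical names.

```c
#include <assert.h>

/* Hypothetical subset of struct cpuidle_state; times in microseconds. */
struct idle_state_model {
	const char *name;
	unsigned int exit_latency;
	unsigned int target_residency;
};

/* Values from bl_idle_big_driver: states[0] is ARM_CPUIDLE_WFI_STATE. */
static const struct idle_state_model big_states[] = {
	{ "WFI", 1, 1 },
	{ "C1", 500, 2000 },
};

/*
 * Simplified rule: walk from shallow to deep and keep the deepest state
 * that pays off within the predicted idle time and wakes up within the
 * latency limit. State 0 is always acceptable.
 */
static int pick_state(const struct idle_state_model *s, int count,
		      unsigned int predicted_us, unsigned int latency_limit_us)
{
	int i, best = 0;

	assert(count > 0);
	for (i = 1; i < count; i++) {
		if (s[i].target_residency > predicted_us ||
		    s[i].exit_latency > latency_limit_us)
			break;
		best = i;
	}
	return best;
}
```

With these numbers, a big CPU expected to idle for 1.5 ms stays in WFI, while one expected to idle for 3 ms may enter C1 as long as a 500 us wakeup is acceptable. This is also why the comment above calls the cluster-level values "somewhat overkill" for a single CPU: they push C1 out of reach more often than a per-CPU state would.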
3.2 bl_idle_init - driver initialization
static int __init bl_idle_init(void)
{
	int ret;

	//1. Find the root node of the device tree
	struct device_node *root = of_find_node_by_path("/");
	const struct of_device_id *match_id;

	if (!root)
		return -ENODEV;

	/*
	 * Initialize the driver just for a compliant set of machines
	 */
	//2. The bl driver is only used if the machine matches the table
	match_id = of_match_node(compatible_machine_match, root);

	of_node_put(root);

	if (!match_id)
		return -ENODEV;

	if (!mcpm_is_available())
		return -EUNATCH;

	/*
	 * For now the differentiation between little and big cores
	 * is based on the part number. A7 cores are considered little
	 * cores, A15 are considered big cores. This distinction may
	 * evolve in the future with a more generic matching approach.
	 */
	//3. Initialize the drivers and set driver->cpumask:
	//   LITTLE cores are A7, big cores are A15
	ret = bl_idle_driver_init(&bl_idle_little_driver,
				  ARM_CPU_PART_CORTEX_A7);
	if (ret)
		return ret;

	ret = bl_idle_driver_init(&bl_idle_big_driver, ARM_CPU_PART_CORTEX_A15);
	if (ret)
		goto out_uninit_little;

	/* Start at index 1, index 0 standard WFI */
	//4. Parse the C state information from the device tree.
	//   The start index passed here is 1: as the comment above says, the
	//   states parsed from the device tree fill idx >= 1, while idx 0 is
	//   reserved for WFI
	ret = dt_init_idle_driver(&bl_idle_big_driver, bl_idle_state_match, 1);
	if (ret < 0)
		goto out_uninit_big;

	/* Start at index 1, index 0 standard WFI */
	ret = dt_init_idle_driver(&bl_idle_little_driver,
				  bl_idle_state_match, 1);
	if (ret < 0)
		goto out_uninit_big;

	//5. Register the drivers
	ret = cpuidle_register(&bl_idle_little_driver, NULL);
	if (ret)
		goto out_uninit_big;

	ret = cpuidle_register(&bl_idle_big_driver, NULL);
	if (ret)
		goto out_unregister_little;

	return 0;

out_unregister_little:
	cpuidle_unregister(&bl_idle_little_driver);
out_uninit_big:
	kfree(bl_idle_big_driver.cpumask);
out_uninit_little:
	kfree(bl_idle_little_driver.cpumask);

	return ret;
}
device_initcall(bl_idle_init);
3.2.1 compatible_machine_match - the match table
static const struct of_device_id compatible_machine_match[] = {
	{ .compatible = "arm,vexpress,v2p-ca15_a7" },
	{ .compatible = "samsung,exynos5420" },
	{ .compatible = "samsung,exynos5800" },
	{},
};
3.2.2 bl_idle_state_match
static const struct of_device_id bl_idle_state_match[] __initconst = {
	{ .compatible = "arm,idle-state",
	  .data = bl_enter_powerdown },
	{ },
};
3.2.3 bl_idle_driver_init
static int __init bl_idle_driver_init(struct cpuidle_driver *drv, int part_id)
{
	struct cpumask *cpumask;
	int cpu;

	cpumask = kzalloc(cpumask_size(), GFP_KERNEL);
	if (!cpumask)
		return -ENOMEM;

	for_each_possible_cpu(cpu)
		if (smp_cpuid_part(cpu) == part_id)
			cpumask_set_cpu(cpu, cpumask);

	drv->cpumask = cpumask;

	return 0;
}
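bl_idle_driver_init keeps only the CPUs whose part number matches, so each driver ends up owning one cluster type. The model below redoes that with a plain bitmask. The ARM_CPU_PART_* and ARM_CPU_PART_MASK values are assumed to match arch/arm/include/asm/cputype.h, and the MIDR table describing a TC2-like system (two A15 cores, three A7 cores) is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to match arch/arm/include/asm/cputype.h. */
#define ARM_CPU_PART_MASK	0xff00fff0u
#define ARM_CPU_PART_CORTEX_A7	0x4100c070u
#define ARM_CPU_PART_CORTEX_A15	0x4100c0f0u

/* Hypothetical MIDR values: CPUs 0-1 are A15, CPUs 2-4 are A7. */
static const uint32_t cpu_midr[] = {
	0x412fc0f3u, 0x412fc0f3u,
	0x410fc075u, 0x410fc075u, 0x410fc075u,
};
#define NR_CPUS_MODEL (sizeof(cpu_midr) / sizeof(cpu_midr[0]))

/* Model of smp_cpuid_part(): keep the implementer and part-number fields only. */
static uint32_t cpuid_part(unsigned int cpu)
{
	return cpu_midr[cpu] & ARM_CPU_PART_MASK;
}

/* Model of the loop in bl_idle_driver_init(), with a uint32_t as the cpumask. */
static uint32_t build_cpumask(uint32_t part_id)
{
	uint32_t mask = 0;
	unsigned int cpu;

	assert(NR_CPUS_MODEL <= 32);
	for (cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		if (cpuid_part(cpu) == part_id)
			mask |= 1u << cpu;
	return mask;
}
```

Masking off the variant and revision fields is what lets r0p5 A7s and r2p3 A15s still match the generic part constants.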
3.3 Entering a specified C state
3.3.1 arm_cpuidle_simple_enter - execute wfi to enter idle0
/**
 * arm_cpuidle_simple_enter() - a wrapper to cpu_do_idle()
 * @dev: not used
 * @drv: not used
 * @index: not used
 *
 * A trivial wrapper to allow the cpu_do_idle function to be assigned as a
 * cpuidle callback by matching the function signature.
 *
 * Returns the index passed as parameter
 */
int arm_cpuidle_simple_enter(struct cpuidle_device *dev,
		struct cpuidle_driver *drv,
		int index)	// wfi has no notion of levels, so this is ignored
{
	cpu_do_idle();

	return index;
}
3.3.1.1 cpu_do_idle - each architecture has its own idle method
W:\opensource\linux-5.10.61\arch\arm\include\asm\glue-proc.h
#define cpu_do_idle			__glue(CPU_NAME,_do_idle)
Taking cpu_v7m as an example, CPU_NAME is defined as follows:
#ifdef CONFIG_CPU_V7M
# ifdef CPU_NAME
#  undef  MULTI_CPU
#  define MULTI_CPU
# else
#  define CPU_NAME cpu_v7m
# endif
#endif
cpu_do_idle then becomes cpu_v7m_do_idle:
W:\opensource\linux-5.10.61\arch\arm\mm\proc-v7m.S
/*
 *	cpu_v7m_do_idle()
 *
 *	Idle the processor (eg, wait for interrupt).
 *
 *	IRQs are already disabled.
 */
ENTRY(cpu_v7m_do_idle)
	wfi			@ execute wfi directly to go to sleep
	ret	lr		@ after waking up, return to the caller and continue
ENDPROC(cpu_v7m_do_idle)
For more interfaces, see:
3.3.2 bl_enter_powerdown - power off the CPU and cluster, entering idle1
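As the driver definitions above show, in cpuidle-big_little big and LITTLE cores alike have only two C states, idle0 and idle1. This function enters idle1; because the target is always idle1, the idx passed in is not used.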
/**
 * bl_enter_powerdown - Programs CPU to enter the specified state
 * @dev: cpuidle device
 * @drv: The target state to be programmed
 * @idx: state index
 *
 * Called from the CPUidle framework to program the device to the
 * specified target state selected by the governor.
 */
static int bl_enter_powerdown(struct cpuidle_device *dev,
				struct cpuidle_driver *drv,
				int idx)	// idle level to enter; not used in this function
{
	cpu_pm_enter();

	// Call bl_powerdown_finisher to enter the idle level.
	// Note: the 0 passed here is not used by bl_powerdown_finisher either,
	// because it always enters idle1
	cpu_suspend(0, bl_powerdown_finisher);

	/* signals the MCPM core that CPU is out of low power state */
	mcpm_cpu_powered_up();

	cpu_pm_exit();

	return idx;
}
3.3.2.1 bl_powerdown_finisher - power off the CPU and cluster, entering idle1
/*
 * notrace prevents trace shims from getting inserted where they
 * should not. Global jumps and ldrex/strex must not be inserted
 * in power down sequences where caches and MMU may be turned off.
 */
static int notrace bl_powerdown_finisher(unsigned long arg)	// argument not used
{
	/* MCPM works with HW CPU identifiers */
	unsigned int mpidr = read_cpuid_mpidr();
	unsigned int cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
	unsigned int cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);

	//1. Set where the CPU resumes execution after leaving idle
	mcpm_set_entry_vector(cpu, cluster, cpu_resume);

	//2. Enter idle1
	mcpm_cpu_suspend();

	/* return value != 0 means failure */
	return 1;
}
3.3.2.2 mcpm_cpu_suspend
mcpm stands for Multi-Cluster PM (power management).
void mcpm_cpu_suspend(void)
{
	if (WARN_ON_ONCE(!platform_ops))
		return;

	/* Some platforms might have to enable special resume modes, etc. */
	//1. Some platforms need to do some preparation first
	if (platform_ops->cpu_suspend_prepare) {
		unsigned int mpidr = read_cpuid_mpidr();
		unsigned int cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
		unsigned int cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
		arch_spin_lock(&mcpm_lock);
		platform_ops->cpu_suspend_prepare(cpu, cluster);
		arch_spin_unlock(&mcpm_lock);
	}

	//2. Enter idle
	mcpm_cpu_power_down();
}
3.3.2.2.1 tc2_pm_cpu_suspend_prepare - set where the CPU fetches instructions from when woken
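On the vexpress platform the cpu_suspend_prepare callback is implemented as below. It writes "the address the CPU should execute when it leaves idle" into a register; once woken, the CPU reads the address from this register and starts fetching instructions there.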
U:\linux-5.10.61\arch\arm\mach-vexpress\tc2_pm.c
static void tc2_pm_cpu_suspend_prepare(unsigned int cpu, unsigned int cluster)
{
	ve_spc_set_resume_addr(cluster, cpu, __pa_symbol(mcpm_entry_point));
}
3.3.2.2.2 ve_spc_set_resume_addr - set where the CPU fetches instructions from when woken
/**
 * ve_spc_set_resume_addr() - set the jump address used for warm boot
 *
 * @cluster: mpidr[15:8] bitfield describing cluster affinity level
 * @cpu: mpidr[7:0] bitfield describing cpu affinity level
 * @addr: physical resume address
 */
void ve_spc_set_resume_addr(u32 cluster, u32 cpu, u32 addr)
{
	void __iomem *baseaddr;

	if (cluster >= MAX_CLUSTERS)
		return;

	// A15 and A7 clusters use different registers
	if (cluster_is_a15(cluster))
		baseaddr = info->baseaddr + A15_BX_ADDR0 + (cpu << 2);
	else
		baseaddr = info->baseaddr + A7_BX_ADDR0 + (cpu << 2);

	// write the wakeup address into the register
	writel_relaxed(addr, baseaddr);
}
3.3.2.3 mcpm_cpu_power_down - power off the CPU and cluster, entering idle1
void mcpm_cpu_power_down(void)
{
	unsigned int mpidr, cpu, cluster;
	bool cpu_going_down, last_man;
	phys_reset_t phys_reset;

	//1. Read this CPU's MPIDR register and extract the cluster and cpu numbers
	mpidr = read_cpuid_mpidr();
	cpu = MPIDR_AFFINITY_LEVEL(mpidr, 0);
	cluster = MPIDR_AFFINITY_LEVEL(mpidr, 1);
	pr_debug("%s: cpu %u cluster %u\n", __func__, cpu, cluster);
	if (WARN_ON_ONCE(!platform_ops))
		return;
	BUG_ON(!irqs_disabled());

	setup_mm_for_reboot();

	__mcpm_cpu_going_down(cpu, cluster);
	arch_spin_lock(&mcpm_lock);
	BUG_ON(__mcpm_cluster_state(cluster) != CLUSTER_UP);

	//2. mcpm_cpu_use_count records the state of this CPU:
	//   a) 0: the CPU is going down, i.e. it will be powered off
	//   b) 1: the CPU stays up, i.e. it remains powered
	mcpm_cpu_use_count[cluster][cpu]--;
	BUG_ON(mcpm_cpu_use_count[cluster][cpu] != 0 &&
	       mcpm_cpu_use_count[cluster][cpu] != 1);
	cpu_going_down = !mcpm_cpu_use_count[cluster][cpu];

	//3. Check whether every CPU in this cluster is now unused,
	//   i.e. whether this CPU is the last one running in the cluster
	last_man = mcpm_cluster_unused(cluster);

	//4. Power down the cluster, or just the CPU
	if (last_man && __mcpm_outbound_enter_critical(cpu, cluster)) {
		//4.1 This is the last CPU in the cluster: once it is off, no other
		//    CPU will be running, so both the CPU and the cluster are powered off
		platform_ops->cpu_powerdown_prepare(cpu, cluster);
		platform_ops->cluster_powerdown_prepare(cluster);
		arch_spin_unlock(&mcpm_lock);
		platform_ops->cluster_cache_disable();
		__mcpm_outbound_leave_critical(cluster, CLUSTER_DOWN);
	} else {
		//4.2 Other CPUs in the cluster keep running after this one is
		//    powered off, so the cluster must stay powered
		if (cpu_going_down)
			platform_ops->cpu_powerdown_prepare(cpu, cluster);
		arch_spin_unlock(&mcpm_lock);
		/*
		 * If cpu_going_down is false here, that means a power_up
		 * request raced ahead of us.  Even if we do not want to
		 * shut this CPU down, the caller still expects execution
		 * to return through the system resume entry path, like
		 * when the WFI is aborted due to a new IRQ or the like..
		 * So let's continue with cache cleaning in all cases.
		 */
		platform_ops->cpu_cache_disable();
	}

	__mcpm_cpu_down(cpu, cluster);

	/* Now we are prepared for power-down, do it: */
	//5. Execute wfi to go to sleep
	if (cpu_going_down)
		wfi();

	//6. Reaching here means we have come back out of idle

	/*
	 * It is possible for a power_up request to happen concurrently
	 * with a power_down request for the same CPU. In this case the
	 * CPU might not be able to actually enter a powered down state
	 * with the WFI instruction if the power_up request has removed
	 * the required reset condition.  We must perform a re-entry in
	 * the kernel as if the power_up method just had deasserted reset
	 * on the CPU.
	 */
	//7. Call the cpu_reset assembly code to do this CPU's post-wakeup work
	phys_reset = (phys_reset_t)(unsigned long)__pa_symbol(cpu_reset);
	phys_reset(__pa_symbol(mcpm_entry_point), false);

	/* should never get here */
	BUG();
}
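Stripped of locking, cache maintenance and the outbound critical section, the decision mcpm_cpu_power_down makes is a small piece of bookkeeping. The user-space model below reproduces it with the same use-count convention and the same "all counts zero" test that mcpm_cluster_unused performs. MAX_CPUS_PER_CLUSTER, MAX_NR_CLUSTERS and the pd_action enum are assumptions for the sketch. In the real code, a last man that fails __mcpm_outbound_enter_critical falls back to the CPU-only branch; the model does not cover that case.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limits for the sketch. */
#define MAX_CPUS_PER_CLUSTER	4
#define MAX_NR_CLUSTERS		2

/* Same convention as mcpm_cpu_use_count: 1 = CPU in use, 0 = CPU down. */
static int use_count[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];

/* Same test as mcpm_cluster_unused(): true when every CPU's count is 0. */
static bool cluster_unused(unsigned int cluster)
{
	int i, cnt = 0;

	for (i = 0; i < MAX_CPUS_PER_CLUSTER; i++)
		cnt |= use_count[cluster][i];
	return !cnt;
}

enum pd_action { PD_CPU_STAYS_UP, PD_CPU_DOWN, PD_CLUSTER_DOWN };

/* Decision part of mcpm_cpu_power_down() only. */
static enum pd_action model_power_down(unsigned int cpu, unsigned int cluster)
{
	bool cpu_going_down, last_man;

	use_count[cluster][cpu]--;
	/* mirrors the BUG_ON: after the decrement the count must be 0 or 1 */
	assert(use_count[cluster][cpu] == 0 || use_count[cluster][cpu] == 1);
	cpu_going_down = !use_count[cluster][cpu];
	last_man = cluster_unused(cluster);

	if (last_man)
		return PD_CLUSTER_DOWN;		/* power off the CPU and the cluster */
	return cpu_going_down ? PD_CPU_DOWN : PD_CPU_STAYS_UP;
}
```

A count that is still 1 after the decrement models the race described in the kernel comment: a power_up request arrived first, so the CPU skips wfi and re-enters through mcpm_entry_point instead.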
3.3.2.4 cpu_reset - the work the CPU does after it is woken
W:\opensource\linux-5.10.61\arch\arm\include\asm\glue-proc.h
#define cpu_reset			__glue(CPU_NAME,_reset)
Again, take cpu_v7m as the example.
File: W:\opensource\linux-5.10.61\arch\arm\mm\proc-v7m.S
/*
 *	cpu_v7m_reset(loc)
 *
 *	Perform a soft reset of the system.  Put the CPU into the
 *	same state as it would be if it had been reset, and branch
 *	to what would be the reset vector.
 *
 *	- loc   - location to jump to for soft reset
 */
	.align	5
ENTRY(cpu_v7m_reset)
	ret	r0		@ jump straight to the address held in r0
ENDPROC(cpu_v7m_reset)
3.3.2.5 mcpm_entry_point - where the CPU starts executing when it leaves idle
W:\opensource\linux-5.10.61\arch\arm\common\mcpm_head.S
ENTRY(mcpm_entry_point)

 ARM_BE8(setend        be)
 THUMB(	badr	r12, 1f		)
 THUMB(	bx	r12		)
 THUMB(	.thumb			)
1:
	@ Step 1: check that the ID of the CPU being woken is valid
	mrc	p15, 0, r0, c0, c0, 5		@ MPIDR
	ubfx	r9, r0, #0, #8			@ r9 = cpu
	ubfx	r10, r0, #8, #8			@ r10 = cluster
	mov	r3, #MAX_CPUS_PER_CLUSTER

	@ mla computes r4 <- r3 * r10 + r9
	mla	r4, r3, r10, r9			@ r4 = canonical CPU index
	cmp	r4, #(MAX_CPUS_PER_CLUSTER * MAX_NR_CLUSTERS)
	blo	2f

	/* We didn't expect this CPU.  Try to cheaply make it quiet. */
	@ this path is never expected to be taken
1:	wfi
	wfe
	b	1b

	@ Reaching here means this CPU is about to start
2:	pr_dbg	"kernel mcpm_entry_point\n"

	/*
	 * MMU is off so we need to get to various variables in a
	 * position independent way.
	 */
	@ As the comment says, the MMU is off, so variable addresses must be
	@ obtained position-independently; label 3 below is a literal pool
	@ holding offsets relative to that label
	adr	r5, 3f
	ldmia	r5, {r0, r6, r7, r8, r11}
	add	r0, r5, r0			@ r0 = mcpm_entry_early_pokes
	add	r6, r5, r6			@ r6 = mcpm_entry_vectors
	ldr	r7, [r5, r7]			@ r7 = mcpm_power_up_setup_phys
	add	r8, r5, r8			@ r8 = mcpm_sync
	add	r11, r5, r11			@ r11 = first_man_locks

	@ Perform an early poke, if any
	add	r0, r0, r4, lsl #3
	ldmia	r0, {r0, r1}
	teq	r0, #0
	strne	r1, [r0]

	mov	r0, #MCPM_SYNC_CLUSTER_SIZE
	mla	r8, r0, r10, r8			@ r8 = sync cluster base

	@ Signal that this CPU is coming UP:
	mov	r0, #CPU_COMING_UP
	mov	r5, #MCPM_SYNC_CPU_SIZE
	mla	r5, r9, r5, r8			@ r5 = sync cpu address
	strb	r0, [r5]

	@ At this point, the cluster cannot unexpectedly enter the GOING_DOWN
	@ state, because there is at least one active CPU (this CPU).

	mov	r0, #VLOCK_SIZE
	mla	r11, r0, r10, r11		@ r11 = cluster first man lock
	mov	r0, r11
	mov	r1, r9				@ cpu
	bl	vlock_trylock			@ implies DMB

	cmp	r0, #0				@ failed to get the lock?
	bne	mcpm_setup_wait		@ wait for cluster setup if so

	ldrb	r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
	cmp	r0, #CLUSTER_UP			@ cluster already up?
	bne	mcpm_setup			@ if not, set up the cluster

	@ Otherwise, release the first man lock and skip setup:
	mov	r0, r11
	bl	vlock_unlock
	b	mcpm_setup_complete

mcpm_setup:
	@ Control dependency implies strb not observable before previous ldrb.

	@ Signal that the cluster is being brought up:
	mov	r0, #INBOUND_COMING_UP
	strb	r0, [r8, #MCPM_SYNC_CLUSTER_INBOUND]
	dmb

	@ Any CPU trying to take the cluster into CLUSTER_GOING_DOWN from this
	@ point onwards will observe INBOUND_COMING_UP and abort.

	@ Wait for any previously-pending cluster teardown operations to abort
	@ or complete:
mcpm_teardown_wait:
	ldrb	r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
	cmp	r0, #CLUSTER_GOING_DOWN
	bne	first_man_setup
	wfe
	b	mcpm_teardown_wait

first_man_setup:
	dmb

	@ If the outbound gave up before teardown started, skip cluster setup:

	cmp	r0, #CLUSTER_UP
	beq	mcpm_setup_leave

	@ power_up_setup is now responsible for setting up the cluster:

	cmp	r7, #0
	mov	r0, #1		@ second (cluster) affinity level
	blxne	r7		@ Call power_up_setup if defined
	dmb

	mov	r0, #CLUSTER_UP
	strb	r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
	dmb

mcpm_setup_leave:
	@ Leave the cluster setup critical section:

	mov	r0, #INBOUND_NOT_COMING_UP
	strb	r0, [r8, #MCPM_SYNC_CLUSTER_INBOUND]
	dsb	st
	sev

	mov	r0, r11
	bl	vlock_unlock	@ implies DMB
	b	mcpm_setup_complete

	@ In the contended case, non-first men wait here for cluster setup
	@ to complete:
mcpm_setup_wait:
	ldrb	r0, [r8, #MCPM_SYNC_CLUSTER_CLUSTER]
	cmp	r0, #CLUSTER_UP
	wfene
	bne	mcpm_setup_wait
	dmb

mcpm_setup_complete:
	@ If a platform-specific CPU setup hook is needed, it is
	@ called from here.

	cmp	r7, #0
	mov	r0, #0		@ first (CPU) affinity level
	blxne	r7		@ Call power_up_setup if defined
	dmb

	@ Mark the CPU as up:

	mov	r0, #CPU_UP
	strb	r0, [r5]

	@ Observability order of CPU_UP and opening of the gate does not matter.

mcpm_entry_gated:
	ldr	r5, [r6, r4, lsl #2]		@ r5 = CPU entry vector
	cmp	r5, #0
	wfeeq
	beq	mcpm_entry_gated
	dmb

	pr_dbg	"released\n"
	bx	r5

	.align	2

	@ Literal pool holding the (relative) addresses of specific variables and functions
3:	.word	mcpm_entry_early_pokes - .
	.word	mcpm_entry_vectors - 3b
	.word	mcpm_power_up_setup_phys - 3b
	.word	mcpm_sync - 3b
	.word	first_man_locks - 3b

ENDPROC(mcpm_entry_point)
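4. Example 3: cpuidle-psci - ARM64 platform
PSCI stands for Power State Coordination Interface, an interface implemented by the firmware; the kernel calls functions in that firmware through SMC Calling.
The cpuidle-psci driver is enabled by CONFIG_ARM_PSCI_CPUIDLE; most arm64 platforms take this path.
4.1 SMC Calling basics
An SMC Call can be understood as calling a function inside the firmware: the caller executes the smc instruction, selects the function with r0 (x0 on AArch64) and passes the arguments in r1~r3 (x1~x3). For more on SMC Calling, see 《DEN0028E_SMC_Calling_Convention-1_4alp0.pdf》.
Below, the CPU_SUSPEND call is used as the example.
4.1.1 Passing arguments and return values
The arguments and return values of an SMC Call are described in sections 2.6 and 2.7 of 《DEN0028E_SMC_Calling_Convention-1_4alp0.pdf》. For SMC64: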
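- The function ID to call is passed in x0
- The function's arguments are passed in x1~x17
- On return, the function's results are returned in x0~x17
How many of these registers are actually used depends on the function ID. For CPU_SUSPEND, section 5.2.1 of 《Power_State_Coordination_Interface_PDD_v1_1_DEN0022D.pdf》 specifies:
- The arguments are passed in the four registers x0~x3, with x0 holding the function ID
- The return value is passed in x0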
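4.1.2 CPU_SUSPEND arguments and return value
《Power_State_Coordination_Interface_PDD_v1_1_DEN0022D.pdf》 describes CPU_SUSPEND as follows:
CPU_SUSPEND moves the calling core into the requested low-power state.
The specification also defines the function ID of CPU_SUSPEND and the meaning of each argument: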
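| Parameter | Meaning |
| --- | --- |
| function ID | First argument: selects the function to call (CPU_SUSPEND). |
| power_state | Second argument: the C state to enter. This is the custom arm,psci-suspend-param value parsed from the device tree (see 4.5). Each CPU has its own psci_states array: slot 0 belongs to WFI, and the following slots hold that CPU's DT idle states, i.e. {0, 0x00000002, 0x40000003} (WFI, retention, power collapse) for both the little and the big CPUs of msm8998. The fields are described in section 5.4.2 of the specification. |
| entry_point_address | Third argument: where the CPU starts fetching instructions when it wakes up from a powerdown state; a 32/64-bit physical address. |
| context_id | Fourth argument: an opaque value that is only meaningful to the caller. When the CPU wakes up at entry_point_address, the firmware hands it back in x0 (r0 on AArch32). It is not the stack pointer: Linux passes 0 here and finds its saved context through sleep_save_stash in cpu_resume (see 4.6.2.5). |
| return | Return value, decoded in Linux by psci_to_linux_errno: SUCCESS (0), INVALID_PARAMETERS (-2), INVALID_ADDRESS (-9), DENIED (-3). |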
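4.1.2.1 function ID - selects the interface to call
The function ID of CPU_SUSPEND is 0x84000001 under the SMC32 calling convention and 0xc4000001 under SMC64.
4.1.2.2 power_state - selects the idle level to enter
See section 5.4.2 of 《Power_State_Coordination_Interface_PDD_v1_1_DEN0022D.pdf》.
power_state is exactly the arm,psci-suspend-param value that psci_dt_cpu_init_idle (below) parses from the device tree into the per-CPU psci_states array. On msm8998 every CPU ends up with {0, 0x00000002, 0x40000003}: slot 0 for WFI, then the retention and power-collapse states.
power_state has two formats, the Original format and the Extended StateID format, defined as follows:
Note: the platform analyzed in this article uses the Extended StateID format.
StateType and StateID have the same meaning in both formats; the Original format additionally carries a PowerLevel field, whereas in the Extended format the power level is implied by StateID. The fields are briefly described below; see the specification for details.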
4.1.2.2.1 PowerLevel
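PowerLevel specifies the power level the request applies to:
- 0: the CPU's own power domain only
- 1: the cluster's power domain
- 2: the whole system's power domain
The power domain topology is described as follows:
4.1.2.2.2 StateType - whether the CPU loses power
0: standby (retention); the CPU keeps its context
1: powerdown; the CPU is powered off and its context is lost. In this case entry_point_address must tell the firmware where the CPU resumes execution after wakeup, and context_id is handed back to the caller at that point so it can restore its state.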
4.1.2.2.3 StateID
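StateID identifies the requested state; the valid combinations of the fields are given in the specification.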
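To make the two layouts concrete, here is a minimal user-space sketch that splits a power_state value into its fields. The bit positions are my reading of the PSCI_0_2_POWER_STATE_* and PSCI_1_0_EXT_POWER_STATE_* masks in include/uapi/linux/psci.h (original format: StateID[15:0], StateType[16], PowerLevel[25:24]; extended format: StateID[27:0], StateType[30]). The struct and both decode functions are hypothetical helpers, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoded view of a PSCI power_state value (not a kernel type). */
struct power_state_fields {
    uint32_t state_id;
    bool     powerdown;    /* StateType: 0 = standby/retention, 1 = powerdown */
    int      power_level;  /* only present in the original format; -1 otherwise */
};

/* Original format (assumed): StateID[15:0], StateType[16], PowerLevel[25:24]. */
static struct power_state_fields decode_original(uint32_t ps)
{
    struct power_state_fields f = {
        .state_id    = ps & 0xffffu,
        .powerdown   = (ps >> 16) & 0x1u,
        .power_level = (int)((ps >> 24) & 0x3u),
    };
    return f;
}

/* Extended StateID format (assumed): StateID[27:0], StateType[30], no PowerLevel. */
static struct power_state_fields decode_extended(uint32_t ps)
{
    struct power_state_fields f = {
        .state_id    = ps & 0x0fffffffu,
        .powerdown   = (ps >> 30) & 0x1u,
        .power_level = -1,
    };
    return f;
}
```

With the msm8998 values from 4.5.2, decode_extended(0x40000003) yields StateID 3 with StateType powerdown, and decode_extended(0x00000002) yields StateID 2 with StateType standby.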
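4.1.2.3 entry_point_address - where the CPU runs after power is restored
Specifies where the CPU starts executing when it wakes from idle. It must be a physical address, because the CPU comes back up with the MMU disabled.
4.1.2.4 context_id - a value handed back to the caller on wakeup
Because the StateType in power_state may power the CPU off, the caller needs a way to locate its saved state after wakeup. The firmware passes context_id back in x0 (r0 on AArch32) when the CPU restarts at entry_point_address; the firmware itself saves nothing there. Linux on arm64 passes 0 and restores its context from sleep_save_stash in cpu_resume instead.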
4.1.2.5 psci_to_linux_errno - decoding PSCI return values
Section 5.2.2 of 《Power_State_Coordination_Interface_PDD_v1_1_DEN0022D.pdf》 lists the possible return values of the call.
Linux translates them into errno values with psci_to_linux_errno:
static int psci_to_linux_errno(int errno)
{
    switch (errno) {
    case PSCI_RET_SUCCESS:
        return 0;
    case PSCI_RET_NOT_SUPPORTED:
        return -EOPNOTSUPP;
    case PSCI_RET_INVALID_PARAMS:
    case PSCI_RET_INVALID_ADDRESS:
        return -EINVAL;
    case PSCI_RET_DENIED:
        return -EPERM;
    };

    return -EINVAL;
}
where:
/* PSCI return values (inclusive of all PSCI versions) */
#define PSCI_RET_SUCCESS            0
#define PSCI_RET_NOT_SUPPORTED      -1
#define PSCI_RET_INVALID_PARAMS     -2
#define PSCI_RET_DENIED             -3
#define PSCI_RET_ALREADY_ON         -4
#define PSCI_RET_ON_PENDING         -5
#define PSCI_RET_INTERNAL_FAILURE   -6
#define PSCI_RET_NOT_PRESENT        -7
#define PSCI_RET_DISABLED           -8
#define PSCI_RET_INVALID_ADDRESS    -9
4.1.3 __invoke_psci_fn_smc - issuing an SMC Call into the PSCI firmware with the smc instruction
__invoke_psci_fn_smc is implemented as follows:
File: U:\linux-5.10.61\drivers\firmware\psci\psci.c
static unsigned long __invoke_psci_fn_smc(
        unsigned long function_id,  /* function ID to call */
        unsigned long arg0,         /* power_state: the idle level to enter */
        unsigned long arg1,         /* entry point used when leaving idle */
        unsigned long arg2)         /* context_id (0 for CPU_SUSPEND) */
{
    struct arm_smccc_res res;

    /* 1. Make the SMC Call */
    arm_smccc_smc(function_id, arg0, arg1, arg2, 0, 0, 0, 0, &res);

    /* 2. The return value comes back in x0 */
    return res.a0;
}
where:
#define arm_smccc_smc(...) __arm_smccc_smc(__VA_ARGS__, NULL)
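The register-level contract from 4.1.1 and 4.1.2 can be modelled on a host machine. In the sketch below, a hypothetical fake_firmware() (not a kernel or firmware API) stands in for the smc instruction: it receives a0..a7 exactly as x0..x7 would be set, fills a struct shaped like struct arm_smccc_res, and answers CPU_SUSPEND. invoke_psci_fn_sketch() then returns res.a0, as __invoke_psci_fn_smc does. The firmware's checks (extended power_state format, a powerdown state needs a nonzero entry point) are simplifying assumptions.

```c
#include <assert.h>

#define FN64_CPU_SUSPEND          0xc4000001UL
#define PSCI_RET_SUCCESS          0L
#define PSCI_RET_NOT_SUPPORTED    (-1L)
#define PSCI_RET_INVALID_PARAMS   (-2L)
#define PSCI_RET_INVALID_ADDRESS  (-9L)

/* Same shape as the kernel's struct arm_smccc_res: x0..x3 after the call. */
struct arm_smccc_res { unsigned long a0, a1, a2, a3; };

/* Hypothetical stand-in for the secure firmware; a0..a7 play the role of x0..x7. */
static void fake_firmware(unsigned long a0, unsigned long a1, unsigned long a2,
                          unsigned long a3, unsigned long a4, unsigned long a5,
                          unsigned long a6, unsigned long a7,
                          struct arm_smccc_res *res)
{
    (void)a3; (void)a4; (void)a5; (void)a6; (void)a7;  /* unused in this model */
    res->a1 = res->a2 = res->a3 = 0;
    if (a0 != FN64_CPU_SUSPEND) {
        res->a0 = (unsigned long)PSCI_RET_NOT_SUPPORTED;
        return;
    }
    /* Assumed extended format: only StateID[27:0] and StateType[30] may be set. */
    if (a1 & ~0x4fffffffUL)
        res->a0 = (unsigned long)PSCI_RET_INVALID_PARAMS;
    else if ((a1 & (1UL << 30)) && a2 == 0)   /* powerdown without entry point */
        res->a0 = (unsigned long)PSCI_RET_INVALID_ADDRESS;
    else
        res->a0 = PSCI_RET_SUCCESS;            /* a real CPU would sleep first */
}

/* Mirrors __invoke_psci_fn_smc: pass the arguments, "call", return x0. */
static unsigned long invoke_psci_fn_sketch(unsigned long function_id,
                                           unsigned long arg0, unsigned long arg1,
                                           unsigned long arg2)
{
    struct arm_smccc_res res;

    fake_firmware(function_id, arg0, arg1, arg2, 0, 0, 0, 0, &res);
    return res.a0;
}
```

Only x0 carries information back for CPU_SUSPEND, which is why __invoke_psci_fn_smc can simply return res.a0.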
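4.1.4 __arm_smccc_smc - the assembly that performs the SMC Call into the PSCI firmware
As the comment below states, the arguments must be in the eight registers x0~x7 before the smc instruction executes (the AAPCS64 calling convention already passes the first eight arguments there), and when smc returns, the called function's results are in x0~x3.
In addition, an optional quirk structure lets vendors add vendor-specific behavior.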
/**
 * __arm_smccc_smc() - make SMC calls
 * @a0-a7: arguments passed in registers 0 to 7
 * @res: result values from registers 0 to 3
 * @quirk: points to an arm_smccc_quirk, or NULL when no quirks are required.
 *
 * The content of the supplied param are copied to registers 0 to 7 prior
 * to the SMC instruction. The return values are updated with the content
 * from register 0 to 3 on return from the SMC instruction. An optional
 * quirk structure provides vendor specific behavior.
 */
asmlinkage void __arm_smccc_smc(unsigned long a0, unsigned long a1,
            unsigned long a2, unsigned long a3, unsigned long a4,
            unsigned long a5, unsigned long a6, unsigned long a7,
            struct arm_smccc_res *res, struct arm_smccc_quirk *quirk);
W:\opensource\linux-5.10.61\arch\arm64\kernel\smccc-call.S
    .macro SMCCC instr
    \instr  #0              // smc #0 or hvc #0; for CPU_SUSPEND the CPU may sleep here

    // Reaching here means the SMC Call has completed (for CPU_SUSPEND:
    // the CPU has woken up). The results are in x0~x3. CPU_SUSPEND only
    // uses x0, but __arm_smccc_smc is a generic interface, so it stores
    // all four registers into *res.
    ldr     x4, [sp]        // res: 9th argument, passed on the stack
    stp     x0, x1, [x4, #ARM_SMCCC_RES_X0_OFFS]    // res->a0 = x0, res->a1 = x1
    stp     x2, x3, [x4, #ARM_SMCCC_RES_X2_OFFS]    // res->a2 = x2, res->a3 = x3

    // Check for a vendor-specific quirk structure
    ldr     x4, [sp, #8]    // quirk: 10th argument, next stack slot
    cbz     x4, 1f /* no quirk structure */
    ldr     x9, [x4, #ARM_SMCCC_QUIRK_ID_OFFS]      // quirk->id
    cmp     x9, #ARM_SMCCC_QUIRK_QCOM_A6
    b.ne    1f
    str     x6, [x4, ARM_SMCCC_QUIRK_STATE_OFFS]    // Qualcomm A6 quirk: save x6 in quirk->state
1:  ret
    .endm

/*
 * void arm_smccc_smc(unsigned long a0, unsigned long a1, unsigned long a2,
 *                    unsigned long a3, unsigned long a4, unsigned long a5,
 *                    unsigned long a6, unsigned long a7, struct arm_smccc_res *res,
 *                    struct arm_smccc_quirk *quirk)
 */
SYM_FUNC_START(__arm_smccc_smc)
    SMCCC   smc
SYM_FUNC_END(__arm_smccc_smc)
EXPORT_SYMBOL(__arm_smccc_smc)
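The immediate offsets in the macro (ARM_SMCCC_RES_X0_OFFS, ARM_SMCCC_RES_X2_OFFS, ARM_SMCCC_QUIRK_ID_OFFS, ARM_SMCCC_QUIRK_STATE_OFFS) are produced by the kernel's asm-offsets mechanism from offsetof() on the C structures. The host-side sketch below reproduces the two structures for illustration (assumed to match include/linux/arm-smccc.h in 5.10) and computes the same offsets. On an LP64 target it shows why two stp instructions at offsets 0 and 16 fill all four result fields; res and quirk are read from [sp] and [sp, #8] because, as the 9th and 10th arguments, AAPCS64 passes them on the stack.

```c
#include <assert.h>
#include <stddef.h>

/* Layouts reproduced for illustration from include/linux/arm-smccc.h (5.10). */
struct arm_smccc_res {
    unsigned long a0;
    unsigned long a1;
    unsigned long a2;
    unsigned long a3;
};

struct arm_smccc_quirk {
    int id;
    union {
        unsigned long a6;
    } state;
};

/* What asm-offsets would emit for these symbols (values shown for LP64). */
enum {
    RES_X0_OFFS      = offsetof(struct arm_smccc_res, a0),    /* 0:  stp x0, x1 */
    RES_X2_OFFS      = offsetof(struct arm_smccc_res, a2),    /* 16: stp x2, x3 */
    QUIRK_ID_OFFS    = offsetof(struct arm_smccc_quirk, id),  /* 0 */
    QUIRK_STATE_OFFS = offsetof(struct arm_smccc_quirk, state), /* 8 */
};
```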
4.1.5 The smc instruction
References:
https://developer.arm.com/documentation/ddi0597/2021-12/Base-Instructions/SMC--Secure-Monitor-Call-?lang=en
https://blog.csdn.net/u011280717/article/details/77395675
https://www.cnblogs.com/arnoldlu/p/14175126.html
https://blog.csdn.net/chenying126/article/details/78638944
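smc stands for Secure Monitor Call. The official description is as follows:
Its machine encoding is as follows:
Note: SMC has a Thumb encoding (T1) and an ARM encoding (A1). Both carry a 4-bit immediate, imm4, whose use the ARMv7-A architecture does not define; software may use it however it likes, typically much like the immediate that follows SVC. In the encoding diagram the Thumb instruction's second halfword is drawn on the right; when the instruction is read from memory as one little-endian 32-bit word, the first halfword (which holds imm4) lands in the low 16 bits. imm4 therefore sits in the lowest 4 bits of the word for both encodings, so code that extracts imm4 does not need to check whether the instruction is ARM or Thumb.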
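A small sketch of the imm4 point. It assumes the A1 encoding cond:0001:0110:0000:0000:0000:0111:imm4 (0xE1600070 | imm4 with cond = AL) and the T1 encoding 1111:0111:1111:imm4 followed by the halfword 0x8000, as in the ARM architecture reference manual. The helper names are hypothetical; they write the instruction bytes as they would appear in little-endian memory and then read them back as one 32-bit word.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: A1 (ARM) SMC with cond = AL, stored little-endian. */
static void emit_smc_arm_a1(uint8_t out[4], unsigned imm4)
{
    uint32_t insn = 0xE1600070u | (imm4 & 0xFu);
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(insn >> (8 * i));
}

/* Hypothetical helper: T1 (Thumb) SMC, first halfword holds imm4. */
static void emit_smc_thumb_t1(uint8_t out[4], unsigned imm4)
{
    uint16_t hw1 = (uint16_t)(0xF7F0u | (imm4 & 0xFu));
    uint16_t hw2 = 0x8000u;
    out[0] = (uint8_t)hw1; out[1] = (uint8_t)(hw1 >> 8);
    out[2] = (uint8_t)hw2; out[3] = (uint8_t)(hw2 >> 8);
}

/* Read the 4 bytes as a little-endian word and take bits [3:0]. */
static unsigned smc_imm4(const uint8_t bytes[4])
{
    uint32_t word = (uint32_t)bytes[0] | (uint32_t)bytes[1] << 8 |
                    (uint32_t)bytes[2] << 16 | (uint32_t)bytes[3] << 24;
    return word & 0xFu;
}
```

The same mask works for both encodings, which is the property the note above relies on.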
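Setting up the SMC exception vector table
Like SVC, SMC enters a software exception mode, so using SMC requires an exception vector table for it. Monitor mode has its own vector table and does not share vectors with the other exception/interrupt modes; its vector base address is held in the MVBAR (Monitor Vector Base Address) register. MVBAR can only be read or written at Secure PL1, so it must be set up by low-level software.
Monitor mode also has its own stack, separate from the stacks of other modes such as SVC mode, so the Monitor-mode stack must be set up during system initialization.
I won't dig deeper into this here; it makes my head hurt.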
4.2 How cpu_operations is organized on arm64
arm64 defines the cpu_operations structure; as its location under arch/arm64 shows, it exists only for arm64.
W:\opensource\linux-5.10.61\arch\arm64\include\asm\cpu_ops.h
/**
 * struct cpu_operations - Callback operations for hotplugging CPUs.
 *
 * @name:        Name of the property as appears in a devicetree cpu node's
 *               enable-method property. On systems booting with ACPI, @name
 *               identifies the struct cpu_operations entry corresponding to
 *               the boot protocol specified in the ACPI MADT table.
 * @cpu_init:    Reads any data necessary for a specific enable-method for a
 *               proposed logical id.
 * @cpu_prepare: Early one-time preparation step for a cpu. If there is a
 *               mechanism for doing so, tests whether it is possible to boot
 *               the given CPU.
 * @cpu_boot:    Boots a cpu into the kernel.
 * @cpu_postboot: Optionally, perform any post-boot cleanup or necessary
 *               synchronisation. Called from the cpu being booted.
 * @cpu_can_disable: Determines whether a CPU can be disabled based on
 *               mechanism-specific information.
 * @cpu_disable: Prepares a cpu to die. May fail for some mechanism-specific
 *               reason, which will cause the hot unplug to be aborted. Called
 *               from the cpu to be killed.
 * @cpu_die:     Makes a cpu leave the kernel. Must not fail. Called from the
 *               cpu being killed.
 * @cpu_kill:    Ensures a cpu has left the kernel. Called from another cpu.
 * @cpu_init_idle: Reads any data necessary to initialize CPU idle states for
 *               a proposed logical id.
 * @cpu_suspend: Suspends a cpu and saves the required context. May fail owing
 *               to wrong parameters or error conditions. Called from the
 *               CPU being suspended. Must be called with IRQs disabled.
 */
struct cpu_operations {
    const char  *name;
    int         (*cpu_init)(unsigned int);
    int         (*cpu_prepare)(unsigned int);
    int         (*cpu_boot)(unsigned int);
    void        (*cpu_postboot)(void);
#ifdef CONFIG_HOTPLUG_CPU
    bool        (*cpu_can_disable)(unsigned int cpu);
    int         (*cpu_disable)(unsigned int cpu);
    void        (*cpu_die)(unsigned int cpu);
    int         (*cpu_kill)(unsigned int cpu);
#endif
#ifdef CONFIG_CPU_IDLE
    int         (*cpu_init_idle)(unsigned int);
    int         (*cpu_suspend)(unsigned long);  /* enter the selected C state */
#endif
};
4.2.1 The global array cpu_ops[NR_CPUS]
Each CPU has its own ops pointer:
W:\opensource\linux-5.10.61\arch\arm64\kernel\cpu_ops.c
static const struct cpu_operations *cpu_ops[NR_CPUS] __ro_after_init;
4.2.2 init_cpu_ops - filling in the global cpu_ops[NR_CPUS] array
/*
 * Read a cpu's enable method and record it in cpu_ops.
 */
int __init init_cpu_ops(int cpu)
{
    /* 1. Read the enable method (how this CPU is brought up) from DT or ACPI */
    const char *enable_method = cpu_read_enable_method(cpu);

    /* 2. No enable method specified: bail out */
    if (!enable_method)
        return -ENODEV;

    /* 3. Look up the cpu_operations matching the method;
     *    -EOPNOTSUPP if no ops match */
    cpu_ops[cpu] = cpu_get_ops(enable_method);
    if (!cpu_ops[cpu]) {
        pr_warn("Unsupported enable-method: %s\n", enable_method);
        return -EOPNOTSUPP;
    }

    return 0;
}
4.2.3 cpu_read_enable_method
static const char *__init cpu_read_enable_method(int cpu)
{
    const char *enable_method;

    if (acpi_disabled) {
        struct device_node *dn = of_get_cpu_node(cpu, NULL);

        if (!dn) {
            if (!cpu)
                pr_err("Failed to find device node for boot cpu\n");
            return NULL;
        }

        /* 1. Read the "enable-method" property of the cpu node */
        enable_method = of_get_property(dn, "enable-method", NULL);
        if (!enable_method) {
            /*
             * The boot CPU may not have an enable method (e.g.
             * when spin-table is used for secondaries).
             * Don't warn spuriously.
             */
            if (cpu != 0)
                pr_err("%pOF: missing enable-method property\n", dn);
        }
        of_node_put(dn);
    } else {
        enable_method = acpi_get_enable_method(cpu);
        if (!enable_method) {
            /*
             * In ACPI systems the boot CPU does not require
             * checking the enable method since for some
             * boot protocol (ie parking protocol) it need not
             * be initialized. Don't warn spuriously.
             */
            if (cpu != 0)
                pr_err("Unsupported ACPI enable-method\n");
        }
    }

    return enable_method;
}
4.2.4 acpi_get_enable_method
static inline const char *acpi_get_enable_method(int cpu)
{
    if (acpi_psci_present())
        return "psci";

    if (acpi_parking_protocol_valid(cpu))
        return "parking-protocol";

    return NULL;
}
4.2.5 cpu_get_ops - mapping a method name to its cpu_operations
U:\linux-5.10.61\arch\arm64\kernel\cpu_ops.c
static const struct cpu_operations * __init cpu_get_ops(const char *name)
{
    const struct cpu_operations *const *ops;

    /* 1. Two global ops tables are predefined: one for DT, one for ACPI */
    ops = acpi_disabled ? dt_supported_cpu_ops : acpi_supported_cpu_ops;

    while (*ops) {
        /* 2. Pick the ops whose name matches */
        if (!strcmp(name, (*ops)->name))
            return *ops;

        ops++;
    }

    return NULL;
}
4.2.6 Where the cpu_ops[NR_CPUS] entries come from
Each cpu_ops entry points to an element of one of the two global arrays below; for PSCI it is cpu_psci_ops.
static const struct cpu_operations *const dt_supported_cpu_ops[] __initconst = {
    &smp_spin_table_ops,
    &cpu_psci_ops,
    NULL,
};

static const struct cpu_operations *const acpi_supported_cpu_ops[] __initconst = {
#ifdef CONFIG_ARM64_ACPI_PARKING_PROTOCOL
    &acpi_parking_protocol_ops,
#endif
    &cpu_psci_ops,
    NULL,
};
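The lookup in cpu_get_ops() is a linear scan of a NULL-terminated array of pointers, keyed by the enable-method string. Below is a self-contained sketch of the same pattern; the struct, the three ops instances and the acpi_disabled parameter are hypothetical stand-ins, and only the name field of struct cpu_operations is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical reduced version of struct cpu_operations: only the name. */
struct cpu_operations_sketch {
    const char *name;                /* matches the DT "enable-method" string */
};

static const struct cpu_operations_sketch spin_table_ops = { .name = "spin-table" };
static const struct cpu_operations_sketch psci_ops_stub  = { .name = "psci" };
static const struct cpu_operations_sketch parking_ops    = { .name = "parking-protocol" };

/* NULL-terminated tables, like dt_supported_cpu_ops / acpi_supported_cpu_ops. */
static const struct cpu_operations_sketch *const dt_ops[] =
    { &spin_table_ops, &psci_ops_stub, NULL };
static const struct cpu_operations_sketch *const acpi_ops[] =
    { &parking_ops, &psci_ops_stub, NULL };

static const struct cpu_operations_sketch *get_ops_by_name(const char *name,
                                                           bool acpi_disabled)
{
    const struct cpu_operations_sketch *const *ops = acpi_disabled ? dt_ops : acpi_ops;

    for (; *ops; ops++)
        if (strcmp(name, (*ops)->name) == 0)
            return *ops;
    return NULL;                     /* the kernel turns this into -EOPNOTSUPP */
}
```

The NULL terminator lets the tables be extended under #ifdef (as with the parking protocol) without keeping a separate length in sync.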
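4.2.7 cpu_psci_ops - the cpu_operations used by PSCI on arm64
Note: this ops structure does not define a cpu_suspend callback; PSCI idle entry goes through psci_ops.cpu_suspend instead (see 4.6.2).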
const struct cpu_operations cpu_psci_ops = {
    .name            = "psci",
    .cpu_init        = cpu_psci_cpu_init,
    .cpu_prepare     = cpu_psci_cpu_prepare,
    .cpu_boot        = cpu_psci_cpu_boot,
#ifdef CONFIG_HOTPLUG_CPU
    .cpu_can_disable = cpu_psci_cpu_can_disable,
    .cpu_disable     = cpu_psci_cpu_disable,
    .cpu_die         = cpu_psci_cpu_die,
    .cpu_kill        = cpu_psci_cpu_kill,
#endif
};
4.2.8 get_cpu_ops - getting the cpu_operations of a given CPU
On arm64, get_cpu_ops is implemented as follows:
const struct cpu_operations *get_cpu_ops(int cpu)
{
    return cpu_ops[cpu];
}
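4.3 psci_cpuidle_probe - iterating over all CPUs to initialize the cpuidle driver
Notice how closely this probe function resembles arm_idle_init, the initialization function of the cpuidle-arm driver. Could psci have simply copied cpuidle-arm? Ha!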
File: W:\opensource\linux-5.10.61\drivers\cpuidle\cpuidle-psci.c
/*
 * psci_idle_probe - Initializes PSCI cpuidle driver
 *
 * Initializes PSCI cpuidle driver for all CPUs, if any CPU fails
 * to register cpuidle driver then rollback to cancel all CPUs
 * registration.
 */
static int psci_cpuidle_probe(struct platform_device *pdev)
{
    int cpu, ret;
    struct cpuidle_driver *drv;
    struct cpuidle_device *dev;

    /* 1. Initialize the cpuidle driver of every possible CPU */
    for_each_possible_cpu(cpu) {
        ret = psci_idle_init_cpu(&pdev->dev, cpu);
        if (ret)
            goto out_fail;
    }

    psci_idle_init_cpuhp();
    return 0;

out_fail:
    /* 2. Roll back the CPUs that were already initialized */
    while (--cpu >= 0) {
        dev = per_cpu(cpuidle_devices, cpu);
        drv = cpuidle_get_cpu_driver(dev);
        cpuidle_unregister(drv);
        psci_cpu_deinit_idle(cpu);
    }

    return ret;
}
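psci_cpuidle_probe() follows a common "initialize every CPU, unwind on the first failure" pattern: after a failure at cpu N, `while (--cpu >= 0)` visits exactly cpus N-1 down to 0. The minimal sketch below uses hypothetical init_one()/deinit_one() stubs and assumes CPU ids are contiguous from 0, which for_each_possible_cpu does not guarantee in general.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS_SKETCH 4

static bool initialized[NR_CPUS_SKETCH];
static int fail_at = -1;               /* hypothetical: which CPU's init fails */

/* Hypothetical stand-in for psci_idle_init_cpu(). */
static int init_one(int cpu)
{
    if (cpu == fail_at)
        return -1;
    initialized[cpu] = true;
    return 0;
}

/* Hypothetical stand-in for cpuidle_unregister() + psci_cpu_deinit_idle(). */
static void deinit_one(int cpu)
{
    initialized[cpu] = false;
}

static int probe_all(void)
{
    int cpu, ret = 0;

    for (cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
        ret = init_one(cpu);
        if (ret)
            goto out_fail;
    }
    return 0;

out_fail:
    /* Undo only the CPUs that were set up before the failing one. */
    while (--cpu >= 0)
        deinit_one(cpu);
    return ret;
}
```

The failing CPU itself is not unwound because its init function is expected to clean up after its own error.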
4.4 psci_idle_init_cpu - initializing the cpuidle driver of one CPU
This function is called from the PSCI driver's probe function.
static int psci_idle_init_cpu(struct device *dev, int cpu)
{
    struct cpuidle_driver *drv;
    struct device_node *cpu_node;
    const char *enable_method;
    int ret = 0;

    /* 1. Get this CPU's device tree node */
    cpu_node = of_cpu_device_node_get(cpu);
    if (!cpu_node)
        return -ENODEV;

    /*
     * Check whether the enable-method for the cpu is PSCI, fail
     * if it is not.
     */
    /* 2. Give up unless the CPU's "enable-method" is "psci" */
    enable_method = of_get_property(cpu_node, "enable-method", NULL);
    if (!enable_method || (strcmp(enable_method, "psci")))
        ret = -ENODEV;

    of_node_put(cpu_node);
    if (ret)
        return ret;

    drv = devm_kzalloc(dev, sizeof(*drv), GFP_KERNEL);
    if (!drv)
        return -ENOMEM;

    /* 3. Fill in the cpuidle_driver; each CPU gets its own driver */
    drv->name = "psci_idle";
    drv->owner = THIS_MODULE;
    drv->cpumask = (struct cpumask *)cpumask_of(cpu);

    /*
     * PSCI idle states relies on architectural WFI to be represented as
     * state index 0.
     */
    /* 4. State 0 is WFI, entered through psci_enter_idle_state */
    drv->states[0].enter = psci_enter_idle_state;
    drv->states[0].exit_latency = 1;
    drv->states[0].target_residency = 1;
    drv->states[0].power_usage = UINT_MAX;
    strcpy(drv->states[0].name, "WFI");
    strcpy(drv->states[0].desc, "ARM WFI");

    /*
     * If no DT idle states are detected (ret == 0) let the driver
     * initialization fail accordingly since there is no reason to
     * initialize the idle driver if only wfi is supported, the
     * default archictectural back-end already executes wfi
     * on idle entry.
     */
    /*
     * 5. Parse the idle states from the device tree, starting at index 1.
     *    a) only the standard C state properties are parsed here
     *    b) ret is the number of DT idle states found (WFI not included)
     */
    ret = dt_init_idle_driver(drv, psci_idle_state_match, 1);
    if (ret <= 0)
        return ret ? : -ENODEV;

    /*
     * Initialize PSCI idle states.
     */
    /* 6. Parse the PSCI-specific C state properties */
    ret = psci_cpu_init_idle(dev, drv, cpu, ret);
    if (ret) {
        pr_err("CPU %d failed to PSCI idle\n", cpu);
        return ret;
    }

    /* 7. Register the driver */
    ret = cpuidle_register(drv, NULL);
    if (ret)
        goto deinit;

    /* 8. Not analyzed here */
    cpuidle_cooling_register(drv);

    return 0;
deinit:
    psci_cpu_deinit_idle(cpu);
    return ret;
}
4.4.1 psci_idle_state_match - the match table
psci_idle_state_match is defined as:
static const struct of_device_id psci_idle_state_match[] = {
    { .compatible = "arm,idle-state",
      .data = psci_enter_idle_state },  /* data: function that enters the C state */
    { },
};
4.4.2 psci_cpuidle_data
struct psci_cpuidle_data {
    u32 *psci_states;   /* power_state value per idle state, indexed like drv->states[] */
    struct device *dev;
};
4.4.3 psci_enter_idle_state - entering a C state through PSCI
static int psci_enter_idle_state(
        struct cpuidle_device *dev,     /* the CPU that is going idle */
        struct cpuidle_driver *drv,
        int idx)                        /* index of the C state to enter */
{
    /*
     * 1. Get this CPU's PSCI-specific C state parameters.
     *    psci_cpuidle_data.psci_states comes from the device tree's
     *    arm,psci-suspend-param property, parsed by psci_dt_parse_state_node
     *    (see 4.5.2).
     */
    u32 *state = __this_cpu_read(psci_cpuidle_data.psci_states);

    return psci_enter_state(idx, state[idx]);
}
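4.4.4 psci_enter_state - how the selected C state is entered
psci_enter_state puts the current CPU into the selected C state by calling one of the two functions below, which are covered separately in 4.6:
- idx == 0: cpu_do_idle
- idx != 0: psci_cpu_suspend_enter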
static inline int psci_enter_state(int idx, u32 state)
{
    return CPU_PM_CPU_IDLE_ENTER_PARAM(psci_cpu_suspend_enter, idx, state);
}
CPU_PM_CPU_IDLE_ENTER_PARAM is defined as:
#define CPU_PM_CPU_IDLE_ENTER_PARAM(low_level_idle_enter, idx, state) \
    __CPU_PM_CPU_IDLE_ENTER(low_level_idle_enter, idx, state, 0)
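CPU_PM_CPU_IDLE_ENTER_PARAM expands to __CPU_PM_CPU_IDLE_ENTER in include/linux/cpuidle.h. As I read the 5.10 source, its logic is: index 0 just runs cpu_do_idle(); any other index calls cpu_pm_enter(), then the low-level enter function, then cpu_pm_exit(), and reports -1 on failure or idx on success (is_retention is 0 here). The sketch below reproduces that control flow as an ordinary function with hypothetical stub callbacks; it is not the kernel macro itself.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stubs that only record what happened. */
static int wfi_calls, pm_enter_calls, pm_exit_calls;
static uint32_t last_state;

static void cpu_do_idle_stub(void)  { wfi_calls++; }
static int  cpu_pm_enter_stub(void) { pm_enter_calls++; return 0; }
static void cpu_pm_exit_stub(void)  { pm_exit_calls++; }

static int suspend_enter_ok(uint32_t state)   { last_state = state; return 0; }
static int suspend_enter_fail(uint32_t state) { last_state = state; return -22; }

/* Same control flow as __CPU_PM_CPU_IDLE_ENTER(fn, idx, state, is_retention = 0). */
static int cpu_pm_idle_enter(int (*low_level_enter)(uint32_t), int idx, uint32_t state)
{
    int ret;

    if (!idx) {                     /* state 0 is architectural WFI */
        cpu_do_idle_stub();
        return idx;
    }

    ret = cpu_pm_enter_stub();      /* run the CPU PM notifier chain */
    if (!ret) {
        ret = low_level_enter(state);
        cpu_pm_exit_stub();
    }
    return ret ? -1 : idx;
}
```

cpuidle treats a negative return value as "the state was not entered", which is why the error code is collapsed to -1.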
4.5 psci_cpu_init_idle - parsing the PSCI-specific C state information
static int psci_cpu_init_idle(
        struct device *dev,
        struct cpuidle_driver *drv,
        unsigned int cpu,           /* the CPU to initialize */
        unsigned int state_count)   /* number of DT C states of this CPU */
{
    struct device_node *cpu_node;
    int ret;

    /*
     * If the PSCI cpu_suspend function hook has not been initialized
     * idle states must not be enabled, so bail out
     */
    /* 1. Bail out if psci_ops.cpu_suspend is not set; it is set in
     *    U:\linux-5.10.61\drivers\firmware\psci\psci.c (see 4.6.2.3) */
    if (!psci_ops.cpu_suspend)
        return -EOPNOTSUPP;

    /* 2. Get the CPU's device tree node */
    cpu_node = of_cpu_device_node_get(cpu);
    if (!cpu_node)
        return -ENODEV;

    /* 3. Parse the information from the device tree */
    ret = psci_dt_cpu_init_idle(dev, drv, cpu_node, state_count, cpu);

    of_node_put(cpu_node);

    return ret;
}
4.5.1 psci_dt_cpu_init_idle - parsing PSCI-specific C state information from the device tree into the per-CPU variable psci_cpuidle_data
static int psci_dt_cpu_init_idle(
        struct device *dev,
        struct cpuidle_driver *drv,
        struct device_node *cpu_node,   /* this CPU's device tree node */
        unsigned int state_count,       /* number of DT idle states of this CPU */
        int cpu)                        /* the CPU whose idle states are parsed */
{
    int i, ret = 0;
    u32 *psci_states;
    struct device_node *state_node;

    /* 1. Per-CPU data that will hold this CPU's idle state parameters */
    struct psci_cpuidle_data *data = per_cpu_ptr(&psci_cpuidle_data, cpu);

    /* 2. Add one slot for state 0 (WFI), which is not described in the DT */
    state_count++; /* Add WFI state too */

    /* 3. Allocate one u32 per idle state (zero-filled) */
    psci_states = devm_kcalloc(dev, state_count, sizeof(*psci_states),
                               GFP_KERNEL);
    if (!psci_states)
        return -ENOMEM;

    /*
     * 4. Walk this CPU's DT idle states. The loop starts at 1
     *    because slot 0 belongs to WFI.
     */
    for (i = 1; i < state_count; i++) {
        /* 4.1 Find the device tree node of idle state i - 1 */
        state_node = of_get_cpu_state_node(cpu_node, i - 1);
        if (!state_node)
            break;

        /*
         * 4.2 Read its arm,psci-suspend-param into psci_states[i].
         *     For an msm8998 CPU this yields
         *     psci_states[] = {0, 0x00000002, 0x40000003}
         *     (WFI, retention, power collapse), for little and big CPUs alike.
         */
        ret = psci_dt_parse_state_node(state_node, &psci_states[i]);
        of_node_put(state_node);

        if (ret)
            return ret;

        pr_debug("psci-power-state %#x index %d\n", psci_states[i], i);
    }

    /* 5. Fail unless every idle state was found and parsed */
    if (i != state_count)
        return -ENODEV;

    /* Initialize optional data, used for the hierarchical topology. */
    /* 6. Set up the hierarchical (OSI) topology, see 4.5.3 */
    ret = psci_dt_cpu_init_topology(drv, data, state_count, cpu);
    if (ret < 0)
        return ret;

    /* Idle states parsed correctly, store them in the per-cpu struct. */
    /* 7. Publish the array through the per-CPU psci_cpuidle_data */
    data->psci_states = psci_states;
    return 0;
}
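The indexing in psci_dt_cpu_init_idle() is easy to misread, so here is a small sketch of the resulting per-CPU array. It assumes a CPU whose cpu-idle-states phandles point at the two msm8998 states shown in 4.5.2 (0x00000002 and 0x40000003); build_psci_states() is a hypothetical helper using calloc in place of devm_kcalloc. Slot 0 belongs to WFI and stays 0, and DT state i-1 lands in slot i, matching drv->states[i].

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: dt_params[k] = arm,psci-suspend-param of the k-th
 * entry in the CPU's cpu-idle-states list. Caller frees the result. */
static uint32_t *build_psci_states(const uint32_t *dt_params, unsigned int dt_count,
                                   unsigned int *out_count)
{
    unsigned int state_count = dt_count + 1;           /* add WFI state too */
    uint32_t *states = calloc(state_count, sizeof(*states));

    if (!states)
        return NULL;
    for (unsigned int i = 1; i < state_count; i++)      /* slot 0 stays 0 for WFI */
        states[i] = dt_params[i - 1];
    *out_count = state_count;
    return states;
}
```

Because the array is per CPU, the little and big CPUs of msm8998 each get their own three-entry array rather than sharing one four-entry table.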
4.5.2 psci_dt_parse_state_node - reading the custom arm,psci-suspend-param property from the device tree
int psci_dt_parse_state_node(struct device_node *np, u32 *state)
{
    /* Read the arm,psci-suspend-param property of the idle state node */
    int err = of_property_read_u32(np, "arm,psci-suspend-param", state);

    if (err) {
        pr_warn("%pOF missing arm,psci-suspend-param property\n", np);
        return err;
    }

    if (!psci_power_state_is_valid(*state)) {
        pr_warn("Invalid PSCI power state %#x\n", *state);
        return -EINVAL;
    }

    return 0;
}
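psci_power_state_is_valid() rejects values with bits set outside the fields of the format in use. The sketch below assumes the masks from include/uapi/linux/psci.h (original format: StateID 0xffff, StateType 1 << 16, PowerLevel 3 << 24; extended format: StateID 0x0fffffff, StateType 1 << 30), and passes the format as a hypothetical ext_format flag instead of calling the kernel's psci_has_ext_power_state(). It shows why the msm8998 value 0x40000003 is only valid in the extended format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed field masks, mirroring include/uapi/linux/psci.h. */
#define PS_ORIG_ID_MASK     0x0000ffffu
#define PS_ORIG_TYPE_MASK   (0x1u << 16)
#define PS_ORIG_AFFL_MASK   (0x3u << 24)
#define PS_ORIG_VALID_MASK  (PS_ORIG_ID_MASK | PS_ORIG_TYPE_MASK | PS_ORIG_AFFL_MASK)

#define PS_EXT_ID_MASK      0x0fffffffu
#define PS_EXT_TYPE_MASK    (0x1u << 30)
#define PS_EXT_VALID_MASK   (PS_EXT_ID_MASK | PS_EXT_TYPE_MASK)

/* Valid if no bit is set outside the fields of the chosen format. */
static bool power_state_is_valid(uint32_t state, bool ext_format)
{
    uint32_t valid_mask = ext_format ? PS_EXT_VALID_MASK : PS_ORIG_VALID_MASK;

    return !(state & ~valid_mask);
}
```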
Device tree example:
U:\linux-5.10.61\arch\arm64\boot\dts\qcom\msm8998.dtsi
idle-states {
    entry-method = "psci";

    LITTLE_CPU_SLEEP_0: cpu-sleep-0-0 {
        compatible = "arm,idle-state";
        idle-state-name = "little-retention";
        arm,psci-suspend-param = <0x00000002>;
        entry-latency-us = <81>;
        exit-latency-us = <86>;
        min-residency-us = <200>;
    };

    LITTLE_CPU_SLEEP_1: cpu-sleep-0-1 {
        compatible = "arm,idle-state";
        idle-state-name = "little-power-collapse";
        arm,psci-suspend-param = <0x40000003>;
        entry-latency-us = <273>;
        exit-latency-us = <612>;
        min-residency-us = <1000>;
        local-timer-stop;
    };

    BIG_CPU_SLEEP_0: cpu-sleep-1-0 {
        compatible = "arm,idle-state";
        idle-state-name = "big-retention";
        arm,psci-suspend-param = <0x00000002>;
        entry-latency-us = <79>;
        exit-latency-us = <82>;
        min-residency-us = <200>;
    };

    BIG_CPU_SLEEP_1: cpu-sleep-1-1 {
        compatible = "arm,idle-state";
        idle-state-name = "big-power-collapse";
        arm,psci-suspend-param = <0x40000003>;
        entry-latency-us = <336>;
        exit-latency-us = <525>;
        min-residency-us = <1000>;
        local-timer-stop;
    };
};
4.5.3 psci_dt_cpu_init_topology - setting up the hierarchical topology (OSI mode)
static int psci_dt_cpu_init_topology(
        struct cpuidle_driver *drv,
        struct psci_cpuidle_data *data,
        unsigned int state_count, int cpu)
{
    /* Currently limit the hierarchical topology to be used in OSI mode. */
    if (!psci_has_osi_support())
        return 0;

    data->dev = psci_dt_attach_cpu(cpu);
    if (IS_ERR_OR_NULL(data->dev))
        return PTR_ERR_OR_ZERO(data->dev);

    /*
     * Using the deepest state for the CPU to trigger a potential selection
     * of a shared state for the domain, assumes the domain states are all
     * deeper states.
     */
    /*
     * 1. Only the deepest CPU state gets the domain-aware enter function:
     *    as the comment above says, domain (cluster) states are assumed to
     *    be deeper than every CPU state, so only this state may lead to a
     *    shared domain state being selected.
     */
    drv->states[state_count - 1].enter = psci_enter_domain_idle_state;
    psci_cpuidle_use_cpuhp = true;

    return 0;
}
4.5.4 psci_enter_domain_idle_state - the CPU enters the selected C state (domain-aware)
static int psci_enter_domain_idle_state(
        struct cpuidle_device *dev,
        struct cpuidle_driver *drv,
        int idx)                        /* index of the C state to enter */
{
    struct psci_cpuidle_data *data = this_cpu_ptr(&psci_cpuidle_data);
    u32 *states = data->psci_states;
    struct device *pd_dev = data->dev;
    u32 state;
    int ret;

    ret = cpu_pm_enter();
    if (ret)
        return -1;

    /* Do runtime PM to manage a hierarchical CPU toplogy. */
    RCU_NONIDLE(pm_runtime_put_sync_suspend(pd_dev));

    /*
     * 1. Read the per-CPU domain_state. It depends on
     *    CONFIG_ARM_PSCI_CPUIDLE_DOMAIN; for simplicity we assume it is 0.
     */
    state = psci_get_domain_state();

    /*
     * 2. Otherwise use the power_state of the C state selected by idx,
     *    i.e. the arm,psci-suspend-param value parsed from the device
     *    tree (psci_states[] = {0, 0x00000002, 0x40000003} on msm8998).
     */
    if (!state)
        state = states[idx];

    /* 3. Enter the idle state; this returns once the CPU has left idle
     *    (or if the request failed) */
    ret = psci_cpu_suspend_enter(state) ? -1 : idx;

    /* 4. Reaching here means the CPU has left idle */
    RCU_NONIDLE(pm_runtime_get_sync(pd_dev));

    cpu_pm_exit();

    /* Clear the domain state to start fresh when back from idle. */
    psci_set_domain_state(0);
    return ret;
}
4.6 Entering the selected C state
4.6.1 cpu_do_idle - executing wfi to enter idle 0
On arm64, cpu_do_idle executes the wfi instruction to enter the idle state:
W:\opensource\linux-5.10.61\arch\arm64\kernel\process.c
/*
 * cpu_do_idle()
 *
 * Idle the processor (wait for interrupt).
 *
 * If the CPU supports priority masking we must do additional work to
 * ensure that interrupts are not masked at the PMR (because the core will
 * not wake up if we block the wake up signal in the interrupt controller).
 */
void noinstr cpu_do_idle(void)
{
    if (system_uses_irq_prio_masking())
        __cpu_do_idle_irqprio();
    else
        __cpu_do_idle();
}
4.6.1.1 __cpu_do_idle - executing wfi directly
static void noinstr __cpu_do_idle(void)
{
    dsb(sy);
    wfi();
}
4.6.1.2 __cpu_do_idle_irqprio - adjusting interrupt masking before wfi
static void noinstr __cpu_do_idle_irqprio(void)
{
    unsigned long pmr;
    unsigned long daif_bits;

    daif_bits = read_sysreg(daif);
    write_sysreg(daif_bits | PSR_I_BIT, daif);

    /*
     * Unmask PMR before going idle to make sure interrupts can
     * be raised.
     */
    /* Before wfi, open the GIC priority mask (PMR) so that an interrupt
     * can still wake the core; IRQs remain masked in PSTATE.DAIF. */
    pmr = gic_read_pmr();
    gic_write_pmr(GIC_PRIO_IRQON | GIC_PRIO_PSR_I_SET);

    /* Execute wfi and sleep */
    __cpu_do_idle();

    gic_write_pmr(pmr);
    write_sysreg(daif_bits, daif);
}
4.6.2 psci_cpu_suspend_enter - entering the selected level
/*
 * Note: state is the power_state value of the selected C state, i.e. the
 * arm,psci-suspend-param value parsed from the device tree
 * (psci_states[] = {0, 0x00000002, 0x40000003} on msm8998).
 */
int psci_cpu_suspend_enter(u32 state)
{
    int ret;

    if (!psci_power_state_loses_context(state))
        /*
         * Retention state: the CPU keeps its context and the call
         * simply returns, so no entry point is needed (0).
         */
        ret = psci_ops.cpu_suspend(state, 0);
    else
        ret = cpu_suspend(state, psci_suspend_finisher);

    return ret;
}
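The branch in psci_cpu_suspend_enter() depends only on the StateType bit of power_state. A sketch, assuming the extended StateID format used by msm8998 (StateType at bit 30; the kernel chooses the bit via psci_has_ext_power_state()); the enum and both functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EXT_POWER_STATE_TYPE_MASK (0x1u << 30)   /* assumed extended-format StateType */

/* Hypothetical labels for the two paths taken by psci_cpu_suspend_enter(). */
enum suspend_path {
    PATH_DIRECT_CALL,   /* retention: psci_ops.cpu_suspend(state, 0) returns normally */
    PATH_CPU_SUSPEND,   /* powerdown: cpu_suspend(state, psci_suspend_finisher) */
};

static bool loses_context(uint32_t state)
{
    return (state & EXT_POWER_STATE_TYPE_MASK) != 0;
}

static enum suspend_path choose_path(uint32_t state)
{
    return loses_context(state) ? PATH_CPU_SUSPEND : PATH_DIRECT_CALL;
}
```

Only the powerdown path needs cpu_suspend()'s save/restore machinery, because only there do the CPU registers not survive.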
4.6.2.1 cpu_suspend
The arm64 implementation:
U:\linux-5.10.61\arch\arm64\kernel\suspend.c
/*
 * cpu_suspend
 *
 * arg: argument to pass to the finisher function
 * fn: finisher function pointer
 *
 */
int cpu_suspend(
        unsigned long arg,              /* power_state of the idle level to enter */
        int (*fn)(unsigned long))       /* finisher that actually enters idle */
{
    int ret = 0;
    unsigned long flags;
    struct sleep_stack_data state;

    /*
     * From this point debug exceptions are disabled to prevent
     * updates to mdscr register (saved and restored along with
     * general purpose registers) from kernel debuggers.
     */
    flags = local_daif_save();

    /*
     * Function graph tracer state gets incosistent when the kernel
     * calls functions that never return (aka suspend finishers) hence
     * disable graph tracing during their execution.
     */
    pause_graph_tracing();

    if (__cpu_suspend_enter(&state)) {
        /* Call the suspend finisher */
        /* 1. Call the finisher to enter the idle level */
        ret = fn(arg);

        /*
         * Never gets here, unless the suspend finisher fails.
         * Successful cpu_suspend() should return from cpu_resume(),
         * returning through this code path is considered an error
         * If the return value is set to 0 force ret = -EOPNOTSUPP
         * to make sure a proper error condition is propagated
         */
        /*
         * 2. Reaching here means the finisher returned, i.e. the suspend
         *    failed. After a successful suspend the CPU comes back through
         *    cpu_resume, __cpu_suspend_enter() appears to return 0, and the
         *    else branch runs instead.
         */
        if (!ret)
            ret = -EOPNOTSUPP;
    } else {
        RCU_NONIDLE(__cpu_suspend_exit());
    }

    unpause_graph_tracing();

    /*
     * Restore pstate flags. OS lock and mdscr have been already
     * restored, so from this point onwards, debugging is fully
     * renabled if it was enabled when core started shutdown.
     */
    local_daif_restore(flags);

    return ret;
}
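cpu_suspend() relies on __cpu_suspend_enter() "returning twice": it returns nonzero right after saving the context, so the finisher runs; after a successful powerdown the CPU comes back through cpu_resume, which ends with `mov x0, #0; ret` (see 4.6.2.5) and so appears to return 0 from the same call. User space has the same shape in setjmp()/longjmp(). The sketch below is only an analogy: setjmp() returns 0 first and nonzero on the second return (the opposite polarity to __cpu_suspend_enter()), the jmp_buf stands in for sleep_save_stash, and the two finishers are hypothetical stubs that longjmp or fail instead of powering the CPU off.

```c
#include <assert.h>
#include <errno.h>
#include <setjmp.h>

static jmp_buf saved_ctx;               /* stands in for sleep_save_stash */

/* Hypothetical finisher: "power off", then wake up through the resume path. */
static int finisher_resumes(unsigned long arg)
{
    (void)arg;
    longjmp(saved_ctx, 1);              /* never returns, like a successful suspend */
}

/* Hypothetical finisher whose firmware call fails and simply returns 0. */
static int finisher_fails(unsigned long arg)
{
    (void)arg;
    return 0;
}

static int cpu_suspend_sketch(unsigned long arg, int (*fn)(unsigned long))
{
    int ret = 0;

    if (setjmp(saved_ctx) == 0) {       /* first return: context saved */
        ret = fn(arg);
        /* Only reached if the finisher returned, i.e. the suspend failed. */
        if (!ret)
            ret = -EOPNOTSUPP;
    } else {
        /* Second return: "resumed" (the kernel runs __cpu_suspend_exit here). */
        ret = 0;
    }
    return ret;
}
```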
4.6.2.2 psci_suspend_finisher - entering the C state through PSCI
static int psci_suspend_finisher(unsigned long state)
{
    u32 power_state = state;

    /*
     * power_state is the arm,psci-suspend-param value of the selected C
     * state (psci_states[] = {0, 0x00000002, 0x40000003} on msm8998).
     * The second argument, the physical address of cpu_resume, is where
     * the CPU starts executing when it leaves idle.
     */
    return psci_ops.cpu_suspend(power_state, __pa_symbol(cpu_resume));
}
4.6.2.3 Where psci_ops.cpu_suspend is set
Call path (ACPI boot): setup_arch -> psci_acpi_init -> psci_probe -> psci_0_2_set_functions. On device tree systems, psci_dt_init reaches psci_probe in the same way.
static void __init psci_0_2_set_functions(void)
{
    ...
    /* Record the function ID */
    psci_function_id[PSCI_FN_CPU_SUSPEND] =
                    PSCI_FN_NATIVE(0_2, CPU_SUSPEND);

    /* Install the callback */
    psci_ops.cpu_suspend = psci_cpu_suspend;
    ...
}
PSCI_FN_NATIVE simply pastes tokens together to form the name of the function ID macro:
/*
 * While a 64-bit OS can make calls with SMC32 calling conventions, for some
 * calls it is necessary to use SMC64 to pass or return 64-bit values.
 * For such calls PSCI_FN_NATIVE(version, name) will choose the appropriate
 * (native-width) function ID.
 */
#ifdef CONFIG_64BIT
#define PSCI_FN_NATIVE(version, name)   PSCI_##version##_FN64_##name
#else
#define PSCI_FN_NATIVE(version, name)   PSCI_##version##_FN_##name
#endif
So on a 64-bit kernel psci_function_id[PSCI_FN_CPU_SUSPEND] is PSCI_0_2_FN64_CPU_SUSPEND, defined as:
#define PSCI_0_2_FN64_CPU_SUSPEND   PSCI_0_2_FN64(1)
PSCI_0_2_FN64 is defined as follows; for n = 1 it evaluates to 0xc4000001:
#define PSCI_0_2_FN_BASE        0x84000000
#define PSCI_0_2_FN(n)          (PSCI_0_2_FN_BASE + (n))
#define PSCI_0_2_64BIT          0x40000000
#define PSCI_0_2_FN64_BASE      (PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
#define PSCI_0_2_FN64(n)        (PSCI_0_2_FN64_BASE + (n))
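Therefore PSCI_0_2_FN64_CPU_SUSPEND = 0x84000000 + 0x40000000 + 1 = 0xc4000001. This is the function ID of an SMC Call into the PSCI firmware, namely CPU_SUSPEND, which was analyzed above and is not repeated here.
Table 6-4 of 《DEN0028E_SMC_Calling_Convention-1_4alp0.pdf》 shows that 0xC4000000-0xC400001F is the range assigned to the power management (PSCI) calls, and 0xc4000001 within it is CPU_SUSPEND.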
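The arithmetic can be checked mechanically. The macros below are copied from the definitions quoted above. The 32-bit variant PSCI_0_2_FN_CPU_SUSPEND, defined as PSCI_0_2_FN(1) in include/uapi/linux/psci.h, is included for comparison with the SMC32 value from 4.1.2.1. Only the 64-bit branch of PSCI_FN_NATIVE is reproduced, as on an arm64 kernel.

```c
#include <assert.h>

#define PSCI_0_2_FN_BASE            0x84000000
#define PSCI_0_2_FN(n)              (PSCI_0_2_FN_BASE + (n))
#define PSCI_0_2_64BIT              0x40000000
#define PSCI_0_2_FN64_BASE          (PSCI_0_2_FN_BASE + PSCI_0_2_64BIT)
#define PSCI_0_2_FN64(n)            (PSCI_0_2_FN64_BASE + (n))

#define PSCI_0_2_FN_CPU_SUSPEND     PSCI_0_2_FN(1)      /* SMC32 calling convention */
#define PSCI_0_2_FN64_CPU_SUSPEND   PSCI_0_2_FN64(1)    /* SMC64 calling convention */

/* 64-bit branch of PSCI_FN_NATIVE: pastes PSCI_ 0_2 _FN64_ CPU_SUSPEND. */
#define PSCI_FN_NATIVE(version, name) PSCI_##version##_FN64_##name
```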
4.6.2.4 psci_cpu_suspend - the implementation behind psci_ops.cpu_suspend, entering the C state
Note: the state argument below is the power_state value of the selected C state, i.e. the arm,psci-suspend-param value parsed from the device tree (psci_states[] = {0, 0x00000002, 0x40000003} for every msm8998 CPU: WFI, retention, power collapse).
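static int psci_cpu_suspend(
        u32 state,                      /* the C state (power_state) to enter */
        unsigned long entry_point)      /* where to resume when leaving idle */
{
    int err;
    u32 fn;

    /*
     * 1. Look up the function ID in the global psci_function_id[] table,
     *    i.e. the SMC Calling function ID introduced above: 0xc4000001.
     */
    fn = psci_function_id[PSCI_FN_CPU_SUSPEND];

    /*
     * 2. Make the call. For the four arguments see 4.1.2 above;
     *    invoke_psci_fn is a global function pointer (see 4.6.2.4.1).
     */
    err = invoke_psci_fn(fn, state, entry_point, 0);

    /*
     * 3. Reached after a retention state ends or if the call fails;
     *    after a successful powerdown the CPU resumes at entry_point instead.
     */
    return psci_to_linux_errno(err);
}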
4.6.2.4.1 Where invoke_psci_fn is set
static void set_conduit(enum arm_smccc_conduit conduit)
{
    switch (conduit) {
    case SMCCC_CONDUIT_HVC:
        invoke_psci_fn = __invoke_psci_fn_hvc;
        break;
    case SMCCC_CONDUIT_SMC:
        invoke_psci_fn = __invoke_psci_fn_smc;
        break;
    default:
        WARN(1, "Unexpected PSCI conduit %d\n", conduit);
    }

    psci_conduit = conduit;
}
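set_conduit() picks the call mechanism once, at probe time, by storing a function pointer; every later PSCI call goes through invoke_psci_fn without re-checking the conduit. Below is a host-side sketch of that pattern. fake_hvc() and fake_smc() are hypothetical stand-ins for __invoke_psci_fn_hvc and __invoke_psci_fn_smc (the real ones execute hvc #0 / smc #0), and the sketch returns -1 where the kernel WARNs.

```c
#include <assert.h>

enum conduit { CONDUIT_NONE, CONDUIT_SMC, CONDUIT_HVC };

typedef unsigned long (*psci_fn)(unsigned long, unsigned long,
                                 unsigned long, unsigned long);

static int last_conduit_used;           /* records which path handled the call */

/* Hypothetical stand-in for __invoke_psci_fn_hvc. */
static unsigned long fake_hvc(unsigned long id, unsigned long a0,
                              unsigned long a1, unsigned long a2)
{
    (void)id; (void)a0; (void)a1; (void)a2;
    last_conduit_used = CONDUIT_HVC;
    return 0;
}

/* Hypothetical stand-in for __invoke_psci_fn_smc. */
static unsigned long fake_smc(unsigned long id, unsigned long a0,
                              unsigned long a1, unsigned long a2)
{
    (void)id; (void)a0; (void)a1; (void)a2;
    last_conduit_used = CONDUIT_SMC;
    return 0;
}

static psci_fn invoke_fn;               /* plays the role of invoke_psci_fn */

static int set_conduit_sketch(enum conduit c)
{
    switch (c) {
    case CONDUIT_HVC: invoke_fn = fake_hvc; return 0;
    case CONDUIT_SMC: invoke_fn = fake_smc; return 0;
    default:          return -1;        /* the kernel WARNs here instead */
    }
}
```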
4.6.2.5 cpu_resume - where the CPU resumes when it leaves idle
    .pushsection ".idmap.text", "awx"
SYM_CODE_START(cpu_resume)
    bl      el2_setup           // if in EL2 drop to EL1 cleanly
    bl      __cpu_setup
    /* enable the MMU early - so we can access sleep_save_stash by va */
    adrp    x1, swapper_pg_dir
    bl      __enable_mmu
    ldr     x8, =_cpu_resume
    br      x8
SYM_CODE_END(cpu_resume)
    .ltorg
    .popsection

SYM_FUNC_START(_cpu_resume)
    mrs     x1, mpidr_el1
    adr_l   x8, mpidr_hash      // x8 = struct mpidr_hash virt address

    /* retrieve mpidr_hash members to compute the hash */
    ldr     x2, [x8, #MPIDR_HASH_MASK]
    ldp     w3, w4, [x8, #MPIDR_HASH_SHIFTS]
    ldp     w5, w6, [x8, #(MPIDR_HASH_SHIFTS + 8)]
    compute_mpidr_hash x7, x3, x4, x5, x6, x1, x2

    /* x7 contains hash index, let's use it to grab context pointer */
    ldr_l   x0, sleep_save_stash
    ldr     x0, [x0, x7, lsl #3]
    add     x29, x0, #SLEEP_STACK_DATA_CALLEE_REGS
    add     x0, x0, #SLEEP_STACK_DATA_SYSTEM_REGS
    /* load sp from context */
    ldr     x2, [x0, #CPU_CTX_SP]
    mov     sp, x2
    /*
     * cpu_do_resume expects x0 to contain context address pointer
     */
    bl      cpu_do_resume

#ifdef CONFIG_KASAN
    mov     x0, sp
    bl      kasan_unpoison_task_stack_below
#endif

    ldp     x19, x20, [x29, #16]
    ldp     x21, x22, [x29, #32]
    ldp     x23, x24, [x29, #48]
    ldp     x25, x26, [x29, #64]
    ldp     x27, x28, [x29, #80]
    ldp     x29, lr, [x29]      // restore x29 and lr from the saved callee registers
    mov     x0, #0              // return 0: __cpu_suspend_enter() appears to return 0
    ret                         // back into cpu_suspend(), which takes the resume branch
SYM_FUNC_END(_cpu_resume)
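_cpu_resume must find this CPU's slot in sleep_save_stash before any per-CPU data is usable, so it compresses MPIDR_EL1 into a small dense index with compute_mpidr_hash. As I read the macro, it masks MPIDR with mpidr_hash.mask, shifts each 8-bit affinity field right by its own precomputed shift, and ORs the results; `ldr x0, [x0, x7, lsl #3]` then indexes an array of 8-byte pointers. The C rendering below is a sketch: struct mpidr_hash_sketch is hypothetical, and the mask 0x103 with shifts {0, 6, 0, 0} is an assumed example for a 2-cluster system with 4 cores per cluster (Aff0 in 0..3, Aff1 in 0..1), chosen by hand rather than read from a real machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of struct mpidr_hash: a mask plus one shift per level. */
struct mpidr_hash_sketch {
    uint64_t mask;
    unsigned shift_aff[4];
};

/* C rendering of compute_mpidr_hash: pack the masked affinity fields densely. */
static uint64_t mpidr_hash_index(uint64_t mpidr, const struct mpidr_hash_sketch *h)
{
    uint64_t m = mpidr & h->mask;

    return ((m & 0xffULL)         >> h->shift_aff[0]) |   /* Aff0 */
           ((m & 0xff00ULL)       >> h->shift_aff[1]) |   /* Aff1 */
           ((m & 0xff0000ULL)     >> h->shift_aff[2]) |   /* Aff2 */
           ((m & 0xff00000000ULL) >> h->shift_aff[3]);    /* Aff3 */
}
```

With the assumed mask and shifts, Aff1 bit 8 is moved down to bit 2, directly above the two Aff0 bits, so the eight CPUs map to indices 0..7.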
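5. Summary
Let's revisit the questions raised at the beginning of the article.
5.1 How a C state is described
Different platforms describe "C states of different levels" differently, and the way to enter a given C state differs as well; this depends on the chip design. In the examples of this article: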
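- cpuidle-arm: every idle level is entered with the wfi instruction; what differs is the work done beforehand depending on the level, such as shutting down some peripherals or flushing caches.
- cpuidle-big_little: there are only two idle levels, idle0 and idle1, and both are again entered with wfi. For idle1 the driver decides whether to power off the CPU and the cluster to save energy; because leaving idle1 then requires powering the CPU and cluster back up, idle1 has a larger exit latency.
- cpuidle-psci: used by almost all current arm64 platforms. It calls the CPU_SUSPEND interface of the PSCI firmware through an SMC Call, and the firmware's CPU_SUSPEND implementation decides from the power_state argument whether to power off the cluster or the CPU.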
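5.2 How idle is entered
On ARM platforms there are mainly two approaches:
- Execute wfi, possibly after powering off some on-chip peripherals, the CPU, or the cluster; this is what distinguishes the different C state levels.
- Call an interface of the PSCI firmware through an SMC Call. (I have not yet found the firmware-side implementation of the SMC Call; if you know this area, feel free to leave a comment or submit an article to this site.)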