Published 2017-02-23 in Linux dev

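An Analysis of an acpi_pad Bug in Linux 3.4

Today a colleague got bitten by Bug #42981. After reading the write-up he sent around, I thought the bug deserved a closer look. This post mainly walks through how acpi_pad works.

Module registration

When a kernel module is loaded, its init function runs first; for acpi_pad that function is acpi_pad_init. acpi_pad_init ultimately calls driver_register to register acpi_pad_driver.drv with the system.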

static struct acpi_driver acpi_pad_driver = {
    .name = "processor_aggregator",
    .class = ACPI_PROCESSOR_AGGREGATOR_CLASS,
    .ids = pad_device_ids,
    .ops = {
        .add = acpi_pad_add,
        .remove = acpi_pad_remove,
    },
};

There is no .drv field here? Look at the definition of struct acpi_driver:

struct acpi_driver {
        char name[80];
        char class[80];
        const struct acpi_device_id *ids; /* Supported Hardware IDs */
        unsigned int flags;
        struct acpi_device_ops ops;
        struct device_driver drv;
        struct module *owner;
};

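Note that acpi_driver embeds a device_driver struct directly instead of referring to one through a pointer. This matters later.

But acpi_pad_driver.drv is never initialized in the definition above! After some digging, I found the initialization code in acpi_bus_register_driver: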

driver->drv.name = driver->name;
driver->drv.bus = &acpi_bus_type;
driver->drv.owner = driver->owner;

At this point, driver is a pointer to acpi_pad_driver.
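To make the registration step concrete, here is a minimal user-space sketch of the same idea. Every `toy_` name is hypothetical and the structs are cut down to the fields used; it only models the first two of the three assignments quoted above and is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used here. */
struct toy_bus_type {
    const char *name;
};

struct toy_device_driver {
    const char *name;
    const struct toy_bus_type *bus;
};

struct toy_acpi_driver {
    char name[80];
    struct toy_device_driver drv;   /* embedded, not a pointer */
};

static const struct toy_bus_type toy_acpi_bus_type = { .name = "acpi" };

/* Fill in the generic part from the ACPI-specific part at registration
 * time, as the quoted assignments do. Returns 0 on success. */
static int toy_bus_register_driver(struct toy_acpi_driver *driver)
{
    assert(driver != NULL);
    driver->drv.name = driver->name;
    driver->drv.bus = &toy_acpi_bus_type;
    return 0;
}

/* Statically initialised like acpi_pad_driver: .drv starts out zeroed. */
static struct toy_acpi_driver toy_pad_driver = {
    .name = "processor_aggregator",
};
```

Calling `toy_bus_register_driver(&toy_pad_driver)` is what turns the zeroed `drv` member into something the generic layer can use, which is why the static initializer can leave it out.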

acpi_bus_type is defined as follows:

struct bus_type acpi_bus_type = {
        .name           = "acpi",
        .suspend        = acpi_device_suspend,
        .resume         = acpi_device_resume,
        .match          = acpi_bus_match,
        .probe          = acpi_device_probe,
        .remove         = acpi_device_remove,
        .uevent         = acpi_device_uevent,
};

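Once the driver is registered, the function to watch is acpi_device_probe. It is the path that actually creates the idlecpus file in sysfs (the file through which users control acpi_pad).

static int acpi_device_probe(struct device * dev) is invoked by the kernel, so it is effectively a callback: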

static int acpi_device_probe(struct device * dev)
{
        struct acpi_device *acpi_dev = to_acpi_device(dev);
        struct acpi_driver *acpi_drv = to_acpi_driver(dev->driver);
        int ret;

        ret = acpi_bus_driver_init(acpi_dev, acpi_drv);
        // ...
        return ret;
}

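to_acpi_driver is just the container_of macro: it converts the address of the drv member of a struct acpi_driver into the address of the acpi_driver itself (that is, it derives the parent struct's address from the address of an embedded member):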

#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})
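The pointer arithmetic is easier to see in a small user-space example. The sketch below uses a simplified `toy_container_of` (an assumption of this example: it drops the typeof-based type check, which relies on GNU extensions) and two hypothetical structs.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of: same pointer arithmetic as the kernel macro,
 * but without the typeof-based type check. */
#define toy_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct toy_inner {
    int id;
};

struct toy_outer {
    char name[80];
    struct toy_inner inner;     /* embedded member */
};

/* Given only a pointer to the embedded member, recover the parent. */
static struct toy_outer *toy_outer_from_inner(struct toy_inner *in)
{
    assert(in != NULL);
    return toy_container_of(in, struct toy_outer, inner);
}
```

This only works because `inner` lives inside `toy_outer`. If the parent held a pointer to a separately allocated struct instead, subtracting the member offset would not lead back to the parent, which is why the embedding in struct acpi_driver matters.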

Inside acpi_bus_driver_init, acpi_device_probe ends up calling acpi_pad_driver.ops.add, that is, acpi_pad_add. Finally, acpi_pad_add_sysfs attaches idlecpus to sysfs:

static int acpi_pad_add_sysfs(struct acpi_device *device)
{
        int result;
        result = device_create_file(&device->dev, &dev_attr_idlecpus);
        // ...
        return 0;
}
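The call chain from the bus-level probe down to the driver's own add callback can be modeled in a few lines of user-space C. This is a rough sketch with hypothetical `toy_` types; a counter stands in for the sysfs file, and the container_of step is written out by hand.

```c
#include <assert.h>
#include <stddef.h>

struct toy_device;                      /* forward declaration */

struct toy_device_driver {
    const char *name;
};

struct toy_acpi_driver {
    int (*add)(struct toy_device *dev); /* stands in for ops.add */
    struct toy_device_driver drv;       /* embedded generic driver */
};

struct toy_device {
    struct toy_device_driver *driver;   /* set when a driver is bound */
    int files_created;                  /* stands in for sysfs state */
};

/* Driver-specific callback, playing the role of acpi_pad_add. */
static int toy_pad_add(struct toy_device *dev)
{
    dev->files_created++;               /* "create idlecpus" */
    return 0;
}

/* Bus-level probe: it sees only the generic driver, recovers the
 * enclosing toy_acpi_driver, then calls its add callback. */
static int toy_bus_probe(struct toy_device *dev)
{
    struct toy_acpi_driver *acpi_drv;

    assert(dev != NULL && dev->driver != NULL);
    acpi_drv = (struct toy_acpi_driver *)
        ((char *)dev->driver - offsetof(struct toy_acpi_driver, drv));
    if (acpi_drv->add == NULL)
        return -1;
    return acpi_drv->add(dev);
}
```

The generic layer never needs to know about the ACPI-specific struct; it only stores a pointer to the embedded member, and the bus code recovers the rest.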

The definition of dev_attr_idlecpus:

static DEVICE_ATTR(idlecpus, S_IRUGO|S_IWUSR,
        acpi_pad_idlecpus_show,
        acpi_pad_idlecpus_store);

This expands to the definition of a variable, struct device_attribute dev_attr_idlecpus.

The file's read and write handlers are acpi_pad_idlecpus_show and acpi_pad_idlecpus_store, respectively.
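As an illustration of how such a macro can turn a name into a variable plus a file name, here is a cut-down sketch. `TOY_DEVICE_ATTR` and `struct toy_attribute` are hypothetical and much simpler than the kernel's struct device_attribute; the callbacks are placeholders.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, cut-down attribute type: name, permission bits and
 * the two callbacks. The real struct device_attribute differs. */
struct toy_attribute {
    const char *name;
    unsigned int mode;
    int (*show)(char *buf, size_t len);
    int (*store)(const char *buf, size_t len);
};

/* Token pasting builds the variable name; stringification the file name. */
#define TOY_DEVICE_ATTR(_name, _mode, _show, _store) \
    struct toy_attribute toy_attr_##_name = {        \
        .name = #_name, .mode = (_mode),             \
        .show = (_show), .store = (_store),          \
    }

/* Placeholder read handler: returns an empty string. */
static int toy_idlecpus_show(char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    return 0;
}

/* Placeholder write handler: claims to consume the whole buffer. */
static int toy_idlecpus_store(const char *buf, size_t len)
{
    (void)buf;
    return (int)len;
}

/* 0644 is the numeric value of S_IRUGO|S_IWUSR. */
static TOY_DEVICE_ATTR(idlecpus, 0644, toy_idlecpus_show, toy_idlecpus_store);
```

After preprocessing, the last line becomes a definition of `toy_attr_idlecpus` whose `name` is the string "idlecpus", mirroring how the quoted DEVICE_ATTR line yields dev_attr_idlecpus.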

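At this point the acpi_pad module is fully loaded and the idlecpus file exists in sysfs.

Changing CPU state through acpi_pad

According to the reproduction steps in the bug report: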

to make the failure more likely:

# echo 1 > rrtime
# echo 31 > idlecpus; echo 0 > idlecpus
# echo 31 > idlecpus; echo 0 > idlecpus
# echo 31 > idlecpus; echo 0 > idlecpus

(it usually takes only a few attempts)

etc. until the echo does not return

Through the idlecpus node we first idle 31 CPUs, then bring them back. Repeating this a few times reproduces the bug.

Each write invokes acpi_pad_idlecpus_store:

static ssize_t acpi_pad_idlecpus_store(struct device *dev,
        struct device_attribute *attr, const char *buf, size_t count)
{
        unsigned long num;
        if (strict_strtoul(buf, 0, &num))
                return -EINVAL;
        mutex_lock(&isolated_cpus_lock);
        acpi_pad_idle_cpus(num);
        mutex_unlock(&isolated_cpus_lock);
        return count;
}

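It converts the user-supplied buf to an unsigned long and takes isolated_cpus_lock (the lock that leads to the bug mentioned earlier). It then calls acpi_pad_idle_cpus to idle the requested number of CPUs: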
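The shape of a store handler (parse, reject bad input with -EINVAL, otherwise report the whole buffer as consumed) can be sketched in user space. This is an approximation: `toy_parse_ulong` uses strtoul rather than the kernel's strict_strtoul, and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>   /* ssize_t */

static unsigned long toy_idle_cpus;    /* stands in for driver state */

/* Parse the whole buffer as an unsigned number; one trailing newline
 * is tolerated because that is what `echo` sends. */
static int toy_parse_ulong(const char *buf, unsigned long *out)
{
    char *end;
    unsigned long val;

    assert(buf != NULL && out != NULL);
    errno = 0;
    val = strtoul(buf, &end, 0);
    if (errno != 0 || end == buf)
        return -EINVAL;
    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;
    *out = val;
    return 0;
}

static ssize_t toy_idlecpus_store(const char *buf, size_t count)
{
    unsigned long num;

    if (toy_parse_ulong(buf, &num))
        return -EINVAL;
    /* the real handler takes isolated_cpus_lock around this step */
    toy_idle_cpus = num;
    return (ssize_t)count;   /* tell the writer everything was consumed */
}
```

Returning `count` is what lets `echo` finish; in the bug discussed here the handler never gets that far because it blocks before returning.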

static void acpi_pad_idle_cpus(unsigned int num_cpus)
{
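        // lock the CPU-related data (the set of online CPUs)
        get_online_cpus();

        // take the smaller of the online CPU count and the requested count
        num_cpus = min_t(unsigned int, num_cpus, num_online_cpus());
        // bring the number of idled CPUs to num_cpus
        set_power_saving_task_num(num_cpus);
        // unlock the CPU-related data
        put_online_cpus();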
}

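The logic of set_power_saving_task_num is simple: it compares the target with the current number of power_saving_thread threads, creates the missing ones with create_power_saving_task, and removes the surplus ones with destroy_power_saving_task.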
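The clamp-then-reconcile logic described above can be sketched as follows. The counters and `toy_` functions are hypothetical; creating or destroying a task here just adjusts a number instead of starting or stopping a kernel thread.

```c
#include <assert.h>

#define TOY_MAX_TASKS 64

static unsigned int toy_task_num;       /* worker threads currently alive */
static unsigned int toy_creates, toy_destroys;

static int toy_create_task(void)
{
    if (toy_task_num >= TOY_MAX_TASKS)
        return -1;
    toy_task_num++;
    toy_creates++;
    return 0;
}

static void toy_destroy_task(void)
{
    assert(toy_task_num > 0);
    toy_task_num--;
    toy_destroys++;
}

/* Grow or shrink the pool until it matches the requested size. */
static void toy_set_task_num(unsigned int num)
{
    while (toy_task_num < num)
        if (toy_create_task())
            return;                     /* creation failed: give up */
    while (toy_task_num > num)
        toy_destroy_task();
}

/* Clamp the request to the number of online CPUs first. */
static void toy_idle_cpus(unsigned int num_cpus, unsigned int online)
{
    if (num_cpus > online)
        num_cpus = online;
    toy_set_task_num(num_cpus);
}
```

With 32 CPUs online, `toy_idle_cpus(31, 32)` followed by `toy_idle_cpus(0, 32)` creates 31 tasks and then destroys all 31, which is the pattern the reproduction steps exercise repeatedly.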

destroy_power_saving_task calls kthread_stop to end a surplus power_saving_thread. kthread_stop sets should_stop to 1 on the corresponding kthread and then waits for that kthread to exit:

kthread->should_stop = 1;
wake_up_process(k);
wait_for_completion(&kthread->exited);

But while it waits for the kthread to exit, it is still holding isolated_cpus_lock!

Let's see what power_saving_thread actually does to cause the bug.

static int power_saving_thread(void *data)
{
        // ...

        while (!kthread_should_stop()) {
                int cpu;
                u64 expire_time;

                try_to_freeze();

                /* round robin to cpus */
                if (last_jiffies + round_robin_time * HZ < jiffies) {
                        last_jiffies = jiffies;
                        round_robin_cpu(tsk_index);
                }
                // ...
        }
        // ...
}

That looks fine so far. Now look at round_robin_cpu:

static void round_robin_cpu(unsigned int tsk_index)
{
        // ...
        mutex_lock(&isolated_cpus_lock);
        cpumask_clear(tmp);
        // ...
        mutex_unlock(&isolated_cpus_lock);

        set_cpus_allowed_ptr(current, cpumask_of(preferred_cpu));
}

The code takes isolated_cpus_lock, and that is what causes the bug.

How the bug occurs

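On one side, acpi_pad_idlecpus_store holds isolated_cpus_lock and calls kthread_stop (through destroy_power_saving_task), waiting for a power_saving_thread to exit.

On the other side, the kthread being stopped is inside round_robin_cpu, waiting for isolated_cpus_lock. The writer and the kthread wait for each other, which is a deadlock.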

One question remains: why does CPU usage spike while the system is deadlocked? I will look into that when I get the chance.

https://robberphex.com/acpi_pad-bug-in-linux-3-4/

Author: Robert Lu

#Linux
