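iOS Crash, Part 6

Issues found when comparing against KSCrash

Detection coverage

According to the description in the KSCrash GitHub repository, it supports a broader set of features: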

Mach kernel exceptions
Fatal signals
C++ exceptions
Objective-C exceptions
Main thread deadlock (experimental)
Custom crashes (e.g. from scripting languages)

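The crashes detected in the current test phase are:

Objective-C exceptions
Fatal signals

By comparison this looks like much less coverage, but in practice it is not. Ranked by their share of the crashes actually reported: Objective-C exceptions > fatal signals >>> C++ exceptions > Mach kernel exceptions. Seen this way, KSCrash adds more detection points, but the extra benefit they bring is small. From a monitoring standpoint, though, KSCrash's design is still very comprehensive.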

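Crash information collected

KSCrash collects roughly the same information as the snapshot saved by the system. It includes:

Hardware information
Software information
Thread information
Stack traces (per thread)
Load addresses of binary images
Register information
An entry point for custom information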

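The information included in the current test case is:

Hardware information
Software information
Stack trace (crashing thread only)
Load addresses of binary images (partial)
Register information
An entry point for custom information

For locating a crash, the stack of the crashing thread plus the reason is usually enough to find the faulty code. The other threads contribute very little, and not every image the system lists is actually needed, so the test report uploads only the load addresses of the images whose names appear in the crash stack. As a result, the custom crash tool uploads far less data.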
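
The filtering described above can be sketched in a few lines. This is only an illustration under assumed field names (`image`, `address`, `name`, `base` are hypothetical, not the actual report schema): keep the images referenced by the crashing thread's frames and drop the rest before upload.

```python
def images_for_crash_stack(frames, images):
    """Keep only the binary images referenced by the crashing thread's frames."""
    # Image names that actually appear in the crash stack.
    used = {frame["image"] for frame in frames}
    # Preserve the original order of the image list.
    return [img for img in images if img["name"] in used]


# Hypothetical sample data; the field names are assumptions.
frames = [
    {"image": "MyApp", "address": "0x1000a4f2c"},
    {"image": "UIKit", "address": "0x18e2b1a40"},
    {"image": "MyApp", "address": "0x1000a1c10"},
]
images = [
    {"name": "MyApp", "base": "0x100098000"},
    {"name": "UIKit", "base": "0x18e200000"},
    {"name": "libobjc.A.dylib", "base": "0x180a00000"},
]

slim = images_for_crash_stack(frames, images)
```

Here `libobjc.A.dylib` is dropped because no frame of the crashing thread refers to it.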

Delivery

KSCrash integrates several delivery channels:

Email
File upload
…….

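The current custom tool supports upload only.

Deletion policy

KSCrash supports several.

The current custom approach supports only one: delete the local file once it has been sent successfully.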
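
A minimal sketch of that single policy, written in Python for illustration only (the real client runs on iOS). The `upload` callable is a hypothetical stand-in for whatever transport is used; the point is that the file is removed only after a confirmed success and kept for a later retry otherwise.

```python
import os
import tempfile


def send_then_delete(path, upload):
    """Upload one report; delete the local file only if the upload succeeded."""
    with open(path, "rb") as f:
        payload = f.read()
    try:
        ok = bool(upload(payload))  # `upload` is a hypothetical transport callable
    except Exception:
        ok = False                  # treat any transport error as a failed send
    if ok:
        os.remove(path)             # delete only after a confirmed success
    return ok


# Demo with a temporary file and fake transports.
fd, report = tempfile.mkstemp(suffix=".json")
os.close(fd)
with open(report, "w") as f:
    f.write('{"reason": "demo"}')

failed = send_then_delete(report, lambda data: False)
kept_after_failure = os.path.exists(report)

sent = send_then_delete(report, lambda data: True)
removed_after_success = not os.path.exists(report)
```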

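Symbolication

KSCrash supports uploading reports as JSON, which is essentially the system-format crash file broken down into corresponding JSON. It can also convert that JSON back into the system file format. For symbolication you can pick the matching tool:

JSON format: use atos
System-format crash file: use the symbolicatecrash script

The custom crash file uploads its data as JSON, which is much easier to parse; its symbolication uses atos.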

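Conclusions

Both approaches have their own strengths and weaknesses. Overall, KSCrash is built to cover many scenarios, and some of its features are not needed right now, for example:

Delivery channels
Deletion policies
Redundant file content

The current test case has a problem of its own:

Insufficient monitoring coverage

Knowing the strengths and weaknesses, the two can be combined: extract the core monitoring module from KSCrash, customize the delivery and deletion policy, and filter redundant fields out of the uploaded data. The result is a crash tool tailored to the company's business scenario.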

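atos vs. symbolicatecrash: efficiency

Symbolication in the test case is done with atos: it takes the stack array from the reported JSON and resolves it line by line. This is not very efficient, so with that question in mind I looked at the symbolicatecrash script.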

......
# run atos
sub symbolize_frames {
    my ($images,$bt,$is_spindump_report) = @_;

    # create mapping of framework => address => bt frame (adjust for slid)
    # and for framework => arch
    my %frames_to_lookup = ();
    my %arch_map = ();
    my %base_map = ();
    my %image_map = ();

    for my $k (keys %$bt) {
        my $frame = $$bt{$k};
        my $lib = $$images{$$frame{bundle}};
        unless($lib) {
            # don't know about it, can't symbol
            # should have already been warned about this!
            # print STDERR "Skipping unknown $$frame{bundle}\n";
            delete $$bt{$k};
            next;
        }

        # list of address to lookup, mapped to the frame object, for
        # each library
        $frames_to_lookup{$$lib{symbol}}{$$frame{address}} = $frame;
        $arch_map{$$lib{symbol}} = $$lib{arch};
        $base_map{$$lib{symbol}} = $$lib{base};
        $image_map{$$lib{symbol}} = $lib;
    }

    # run atos for each library
    while(my($symbol,$frames) = each(%frames_to_lookup)) {
        # escape the symbol path if it contains single quotes
        my $escapedSymbol = $symbol;
        $escapedSymbol =~ s/\'/\'\\'\'/g;

        # run atos with the addresses and binary files we just gathered
        my $arch = $arch_map{$symbol};
        my $base = $base_map{$symbol};
        my $lib = $image_map{$symbol};
        my $cmd = "'$atos' -arch $arch -l $base -o '$escapedSymbol' @{[ keys %$frames ]} | ";

        print STDERR "Running $cmd\n" if $opt_verbose;

        open my($ph),$cmd or die $!;
        my @symbolled_frames = map { chomp; $_ } <$ph>;
        close $ph or die $!;

        my $references = 0;

        foreach my $symbolled_frame (@symbolled_frames) {

            my ($library, $source) = ($symbolled_frame =~ /\s*\(in (.*?)\)(?:\s*\((.*?)\))?/);
            $symbolled_frame =~ s/\s*\(in .*?\)//; # clean up -- don't need to repeat the lib here

            if ($is_spindump_report) {
                # Source is formatted differently for spindump
                $symbolled_frame =~ s/\s*\(.*?\)//; # remove source info from symbol string

                # Spindump may not have had library names, pick them up here
                if (defined $library && !(defined $$lib{path} && length($$lib{path})) && !(defined $$lib{new_path} && length($$lib{new_path})) ) {
                    $$lib{new_path} = $library;
                    print STDERR "Found new name for $$lib{uuid}: $$lib{new_path}\n" if ( $opt_verbose );
                }
            }


            # find the correct frame -- the order should match since we got the address list with keys
            my ($k,$frame) = each(%$frames);

            if ( $symbolled_frame !~ /^\d/ ) {
                # only symbolicate if we fetched something other than an address

                my $offset = $$frame{offset};
                if (defined $offset) {
                    # add offset from unsymbolicated frame after symbolicated name
                    $symbolled_frame =~ s|(.+)\(|$1."+ ".$offset." ("|e;
                }

                if ($is_spindump_report) {
                    # Spindump formatting
                    if (defined $library) {
                        $symbolled_frame .= " (";
                        if (defined $source) {
                            $symbolled_frame .= "$source in ";
                        }
                        $symbolled_frame .= "$library + " . (hex($$frame{raw_address}) - hex($base)) . ")";
                    }
                    $symbolled_frame .= " [$$frame{raw_address}]";
                }

                $$frame{symbolled} = $symbolled_frame;
                $references++;
            }

        }

        if ( $references == 0 ) {
            if ( ! $is_spindump_report) { # Bad addresses aren't uncommon in microstackshots and stackshots
                print STDERR "## Warning: Unable to symbolicate from required binary: $symbol\n";
            }
        }
    }

    # just run through and remove elements for which we didn't find a
    # new mapping:
    while(my($k,$v) = each(%$bt)) {
        delete $$bt{$k} unless defined $$v{symbolled};
    }
}

......
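
The excerpt above groups frames by binary and then passes all addresses of one binary to a single atos call. A rough Python sketch of the same grouping follows; it only builds the argument lists and does not run anything. The field names (`image`, `address`, `arch`, `base`, `symbol_path`) are assumptions for illustration, while the `-arch`, `-l` and `-o` flags are the ones used in the excerpt.

```python
from collections import defaultdict


def atos_commands(frames, images, atos="atos"):
    """Build one atos argument list per binary image instead of one per frame."""
    by_image = defaultdict(list)
    for frame in frames:
        # Skip frames whose image is unknown, as the excerpt does.
        if frame["image"] in images:
            by_image[frame["image"]].append(frame["address"])

    commands = []
    for name, addresses in by_image.items():
        img = images[name]
        commands.append(
            [atos, "-arch", img["arch"], "-l", img["base"],
             "-o", img["symbol_path"], *addresses]
        )
    return commands


# Hypothetical sample data.
images = {
    "MyApp": {"arch": "arm64", "base": "0x100098000",
              "symbol_path": "MyApp.app.dSYM/Contents/Resources/DWARF/MyApp"},
    "UIKit": {"arch": "arm64", "base": "0x18e200000",
              "symbol_path": "UIKit"},
}
frames = [
    {"image": "MyApp", "address": "0x1000a4f2c"},
    {"image": "UIKit", "address": "0x18e2b1a40"},
    {"image": "MyApp", "address": "0x1000a1c10"},
    {"image": "Unknown", "address": "0x0"},
]

commands = atos_commands(frames, images)
```

Each list could then be handed to `subprocess.run`, so a stack touching two images costs two processes rather than one per frame. Like the excerpt, a caller would have to rely on atos returning its results in the same order as the addresses it was given.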

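At present symbolicatecrash takes about 2 s to symbolicate one system crash file. Judging from its contents, the script does much the same work:

Parse the header
Parse the threads
Parse the stack frames
Parse the binary image addresses
Run atos as the final step to produce readable symbols (in the excerpt, one atos call per binary image, carrying all of that image's addresses)
Assemble the output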

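Summary

The parsing approach in the test case is sound. The likely causes of the long run time are:

Time spent by the find command searching for the required file
The resource cost of starting a new Process for every command

Optimization idea: change how the file is located and how the Swift server executes shell commands, then check whether efficiency improves.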
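
One possible way to change the file lookup, sketched in Python rather than in the Swift server's own code: walk the symbol directory once, keep a name-to-path index in memory, and answer later lookups from the index instead of running find each time. It assumes the file being searched for is a `.dSYM` bundle identified by name; the directory layout and names below are hypothetical.

```python
import os
import tempfile


def build_index(root, suffix=".dSYM"):
    """Walk `root` once and map each matching entry name to its full path."""
    index = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if name.endswith(suffix):
                # Keep the first match if the same name appears twice.
                index.setdefault(name, os.path.join(dirpath, name))
    return index


# Demo on a temporary directory tree with a hypothetical layout.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "builds", "1.0", "MyApp.app.dSYM"))
os.makedirs(os.path.join(root, "builds", "1.1", "Other.app.dSYM"))

index = build_index(root)                 # one directory walk up front
path = index.get("MyApp.app.dSYM")        # later lookups are dictionary hits
missing = index.get("Missing.app.dSYM")   # None when nothing matches
```

Whether this helps depends on how often the directory changes; the index would need rebuilding when new symbol files arrive.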