Beware the Risks of ContentProvider

Background

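Many Android apps today run in more than one process. Take a music app with a main process (Main Process) and a playback process (Play Process) that exchange large amounts of data through a ContentProvider. The former hosts the ContentProvider; the latter accesses it.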

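Normally, processes running on an operating system have independent address spaces and do not interfere with one another. When one process exits or crashes, the others are unaffected; the operating system guarantees this isolation. On Android, however, things are not quite that intuitive.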

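Suppose the Main Process is killed or crashes, whether because of the Low Memory Killer, a user action, or something else, while the Play Process is accessing the ContentProvider. What happens to the Play Process? Starting from a single log line, this article follows the trail to uncover what really happens between a ContentProvider and a killed process.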

A Case of Playback Stopping

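Users of this app have long reported playback pausing or stopping. Not long ago, while tracking a playback-stop problem on one particular phone, I happened to witness the “crime scene”: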

06-22 16:03:04.148   998  2071 I ActivityManager: Killing 12398:net.poemcode.music:playservice/u0a59 (adj 200): depends on provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl in dying proc net.poemcode.music (adj 200)

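The case is simple: a song was playing normally and then suddenly stopped. The log line left at the scene is our clue. It is terse but carries a good deal of information. Read literally, it says roughly the following:

  1. The system is about to kill the process net.poemcode.music:playservice
  2. net.poemcode.music:playservice depends on the provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl
  3. net.poemcode.music/.sharedfileaccessor.ContentProviderImpl is hosted (“parasitically”) in the process net.poemcode.music
  4. The process net.poemcode.music is dying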

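That is all the log tells us. It boils down to three points:

  1. Process A is dying
  2. Process B is using a ContentProvider hosted in process A
  3. Process B is therefore killed as well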
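
The reading above can also be checked mechanically. The following is a minimal sketch, assuming the message layout is exactly the one seen in the quoted log line; `KillLogParser` and its regular expression are hypothetical helpers written for this article, not part of AOSP.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of AOSP) that splits the kill message into its fields. */
final class KillLogParser {
    // Pattern written to match the layout of the log line quoted above.
    private static final Pattern KILL_LINE = Pattern.compile(
            "Killing (\\d+):(\\S+)/(\\S+) \\(adj (-?\\d+)\\): "
            + "depends on provider (\\S+) in dying proc (\\S+) \\(adj (\\S+)\\)");

    /**
     * Returns {pid, killedProcess, uid, killedAdj, provider, dyingProcess, dyingAdj},
     * or null when the line is not a "depends on provider" kill message.
     */
    static String[] parse(String line) {
        Matcher m = KILL_LINE.matcher(line);
        if (!m.find()) {
            return null;
        }
        String[] fields = new String[m.groupCount()];
        for (int i = 0; i < fields.length; i++) {
            fields[i] = m.group(i + 1);
        }
        return fields;
    }

    public static void main(String[] args) {
        String line = "06-22 16:03:04.148   998  2071 I ActivityManager: "
                + "Killing 12398:net.poemcode.music:playservice/u0a59 (adj 200): "
                + "depends on provider "
                + "net.poemcode.music/.sharedfileaccessor.ContentProviderImpl "
                + "in dying proc net.poemcode.music (adj 200)";
        String[] f = parse(line);
        System.out.println("process B (killed): " + f[1] + ", pid " + f[0]);
        System.out.println("provider in use:    " + f[4]);
        System.out.println("process A (dying):  " + f[5]);
    }
}
```

Running such a parser over a full logcat capture is one way to count how often a client process is killed for this reason.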

Tracing the Log Backwards

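ContentProvider is a data access mechanism that the vast majority of apps use, and, as noted above, multiple processes are common too. It seems unreasonable that when two processes are linked by a ContentProvider, the termination of one should take the other down with it. Let's start from the code and find out why.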

Searching the AOSP source shows that the log line above is closely tied to ActivityManagerService:

capp.kill("depends on provider "
        + cpr.name.flattenToShortString()
        + " in dying proc " + (proc != null ? proc.processName : "??")
        + " (adj " + (proc != null ? proc.setAdj : "??") + ")", true);

In the log line above, this string corresponds to the following part:

depends on provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl in dying proc net.poemcode.music (adj 200)

If that is not convincing enough, consider it together with the following code from ProcessRecord:

void kill(String reason, boolean noisy) {
    if (!killedByAm) {
        Trace.traceBegin(Trace.TRACE_TAG_ACTIVITY_MANAGER, "kill");
        if (noisy) {
            Slog.i(TAG, "Killing " + toShortString() + " (adj " + setAdj + "): " + reason);
        }
        EventLog.writeEvent(EventLogTags.AM_KILL, userId, pid, processName, setAdj, reason);
        Process.killProcessQuiet(pid);
        ActivityManagerService.killProcessGroup(uid, pid);
        if (!persistent) {
            killed = true;
            killedByAm = true;
        }
        Trace.traceEnd(Trace.TRACE_TAG_ACTIVITY_MANAGER);
    }
}

Note the Slog.i call, which matches the first half of the log line:

Killing 12398:net.poemcode.music:playservice/u0a59 (adj 200):

Having found where the log comes from, let's go back to ActivityManagerService. A fuller excerpt of the code:

private final boolean removeDyingProviderLocked(ProcessRecord proc,
        ContentProviderRecord cpr, boolean always) {
    // ......
    for (int i = cpr.connections.size() - 1; i >= 0; i--) {
        ContentProviderConnection conn = cpr.connections.get(i);
        // ......
        ProcessRecord capp = conn.client;
        conn.dead = true;
        if (conn.stableCount > 0) {
            if (!capp.persistent && capp.thread != null
                    && capp.pid != 0
                    && capp.pid != MY_PID) {
                capp.kill("depends on provider "
                        + cpr.name.flattenToShortString()
                        + " in dying proc " + (proc != null ? proc.processName : "??")
                        + " (adj " + (proc != null ? proc.setAdj : "??") + ")", true);
            }
        }
        // ......
    }
    // ......
}
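
The key condition in this loop is `conn.stableCount > 0`: a non-persistent client that holds a stable reference to the dying provider is killed. The sketch below models just that decision in plain Java. The `Client` and `Connection` classes are simplified stand-ins invented for illustration, not the AOSP `ProcessRecord` and `ContentProviderConnection`, and the `thread` and `pid` checks from the real code are left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Simplified model of the loop above; these classes are stand-ins, not the AOSP ones. */
final class ProviderDeathModel {
    static final class Client {
        final String processName;
        final boolean persistent;
        boolean killed;

        Client(String processName, boolean persistent) {
            this.processName = processName;
            this.persistent = persistent;
        }
    }

    static final class Connection {
        final Client client;
        final int stableCount;
        boolean dead;

        Connection(Client client, int stableCount) {
            this.client = client;
            this.stableCount = stableCount;
        }
    }

    /** Marks every connection dead and "kills" non-persistent clients with a stable reference. */
    static List<String> removeDyingProvider(List<Connection> connections) {
        List<String> killed = new ArrayList<>();
        for (int i = connections.size() - 1; i >= 0; i--) {
            Connection conn = connections.get(i);
            conn.dead = true;
            // The real code also checks capp.thread and capp.pid; omitted in this sketch.
            if (conn.stableCount > 0 && !conn.client.persistent) {
                conn.client.killed = true;
                killed.add(conn.client.processName);
            }
        }
        return killed;
    }

    public static void main(String[] args) {
        // The second and third process names are made up for the example.
        List<Connection> connections = Arrays.asList(
                new Connection(new Client("net.poemcode.music:playservice", false), 1),
                new Connection(new Client("hypothetical.client.nostable", false), 0),
                new Connection(new Client("hypothetical.client.persistent", true), 2));
        System.out.println(removeDyingProvider(connections));
    }
}
```

In this model only the play service ends up in the killed list: the second client has no stable reference, and the third is persistent.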

removeDyingProviderLocked is called from four places. Which of them made the call in our scenario? Here is the first:

final boolean forceStopPackageLocked(String packageName, int appId,
        boolean callerWillRestart, boolean purgeCache, boolean doit,
        boolean evenPersistent, boolean uninstalling, int userId, String reason) {
    // ... ...
    ArrayList<ContentProviderRecord> providers = new ArrayList<>();
    if (mProviderMap.collectPackageProvidersLocked(packageName, null, doit, evenPersistent,
            userId, providers)) {
        if (!doit) {
            return true;
        }
        didSomething = true;
    }
    for (i = providers.size() - 1; i >= 0; i--) {
        removeDyingProviderLocked(null, providers.get(i), true);
    }
    // ... ...
}

Here is the second:

private void cleanupDisabledPackageComponentsLocked(
        String packageName, int userId, boolean killProcess, String[] changedClasses) {
    // ......
    // Clean-up disabled providers.
    ArrayList<ContentProviderRecord> providers = new ArrayList<>();
    mProviderMap.collectPackageProvidersLocked(
            packageName, disabledClasses, true, false, userId, providers);
    for (int i = providers.size() - 1; i >= 0; i--) {
        removeDyingProviderLocked(null, providers.get(i), true);
    }
    // ......
}

Is it either of these two? In fact, no. Look at the part of removeDyingProviderLocked that builds the log message:

" (adj " + (proc != null ? proc.setAdj : "??")

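If the proc argument of removeDyingProviderLocked is null, the log output should contain something like (adj ??). The actual log shows a number, and both call sites above pass null for proc, so both can be ruled out.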
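
To make the elimination concrete, here is a minimal sketch that rebuilds the reason string the same way the quoted expression does. `KillReason` and its nested `Proc` class are hypothetical stand-ins holding only the two fields the message uses; they are not the AOSP `ProcessRecord`.

```java
/** Hypothetical stand-in that rebuilds the kill reason the way the quoted expression does. */
final class KillReason {
    /** Holds only the two ProcessRecord fields that appear in the message. */
    static final class Proc {
        final String processName;
        final int setAdj;

        Proc(String processName, int setAdj) {
            this.processName = processName;
            this.setAdj = setAdj;
        }
    }

    /** Builds the reason string; a null proc yields "??" in both places. */
    static String build(String provider, Proc proc) {
        return "depends on provider " + provider
                + " in dying proc " + (proc != null ? proc.processName : "??")
                + " (adj " + (proc != null ? proc.setAdj : "??") + ")";
    }

    public static void main(String[] args) {
        String provider = "net.poemcode.music/.sharedfileaccessor.ContentProviderImpl";
        // What the first two call sites would log, since they pass null for proc.
        System.out.println(build(provider, null));
        // What the captured log actually contains.
        System.out.println(build(provider, new Proc("net.poemcode.music", 200)));
    }
}
```

The first line printed ends with "in dying proc ?? (adj ??)", which does not appear in the captured log; the second matches it.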

That leaves two of the four. Here is the third:

boolean cleanupAppInLaunchingProvidersLocked(ProcessRecord app, boolean alwaysBad) {
    // Look through the content providers we are waiting to have launched,
    // and if any run in this process then either schedule a restart of
    // the process or kill the client waiting for it if this process has
    // gone bad.
    boolean restart = false;
    for (int i = mLaunchingProviders.size() - 1; i >= 0; i--) {
        ContentProviderRecord cpr = mLaunchingProviders.get(i);
        if (cpr.launchingApp == app) {
            if (!alwaysBad && !app.bad && cpr.hasConnectionOrHandle()) {
                restart = true;
            } else {
                removeDyingProviderLocked(app, cpr, true);
            }
        }
    }
    return restart;
}

And the fourth:

private final boolean cleanUpApplicationRecordLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart, int index, boolean replacingPid) {
    Slog.d(TAG, "cleanUpApplicationRecord -- " + app.pid);          
    // ......
    // Remove published content providers.
    for (int i = app.pubProviders.size() - 1; i >= 0; i--) {
        ContentProviderRecord cpr = app.pubProviders.valueAt(i);
        final boolean always = app.bad || !allowRestart;
        boolean inLaunching = removeDyingProviderLocked(app, cpr, always);
        if ((inLaunching || always) && cpr.hasConnectionOrHandle()) {
            // We left the provider in the launching list, need to
            // restart it.
            restart = true;
        }
 
        cpr.provider = null;
        cpr.proc = null;
    }
    app.pubProviders.clear();
    // ......
}

The third and fourth call sites both pass a non-null proc, so the check above cannot tell them apart. Instead, look at the first line of cleanUpApplicationRecordLocked:

Slog.d(TAG, "cleanUpApplicationRecord -- " + app.pid);

If this message also appears in the log, the call site can be identified.

06-22 16:03:04.146   998  2071 D ActivityManager: cleanUpApplicationRecord -- 19826
06-22 16:03:04.147   998  2071 W ActivityManager: Scheduling restart of crashed service net.poemcode.music/.service.MainService in 1000ms
06-22 16:03:04.147   998  2071 W ActivityManager: Scheduling restart of crashed service net.poemcode.music/.business.lockscreen.LockScreenService in 11000ms
06-22 16:03:04.148   998  2071 I ActivityManager: Killing 12398:net.poemcode.music:playservice/u0a59 (adj 200): depends on provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl in dying proc net.poemcode.music (adj 200)

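The message is there, on the same thread (2071) and just before the kill, so the call must have come from the fourth site. From this point on, tracing further back becomes much harder.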

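The comment on cleanUpApplicationRecordLocked explains that the method is called not only when a process has died, but also directly when a process is stopped in single process mode.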

/**
 * Main code for cleaning up a process when it has gone away.  This is
 * called both as a result of the process dying, or directly when stopping
 * a process when running in single process mode.
 *
 * @return Returns true if the given process has been restarted, so the
 * app that was passed in must remain on the process lists.
 */

So let's keep digging through the log:

06-22 16:03:04.145   998  2071 I ActivityManager: Process net.poemcode.music (pid 19826) has died
06-22 16:03:04.146   998  2071 D ActivityManager: cleanUpApplicationRecord -- 19826
06-22 16:03:04.147   998  2071 W ActivityManager: Scheduling restart of crashed service net.poemcode.music/.service.MainService in 1000ms
06-22 16:03:04.147   998  2071 W ActivityManager: Scheduling restart of crashed service net.poemcode.music/.business.lockscreen.LockScreenService in 11000ms
06-22 16:03:04.148   998  2071 I ActivityManager: Killing 12398:net.poemcode.music:playservice/u0a59 (adj 200): depends on provider net.poemcode.music/.sharedfileaccessor.ContentProviderImpl in dying proc net.poemcode.music (adj 200)

The log shows that the Main Process has already died, so cleanUpApplicationRecordLocked must have been called from handleAppDiedLocked:

/**
 * Main function for removing an existing process from the activity manager
 * as a result of that process going away.  Clears out all connections
 * to the process.
 */
private final void handleAppDiedLocked(ProcessRecord app,
        boolean restarting, boolean allowRestart) {
    int pid = app.pid;
    boolean kept = cleanUpApplicationRecordLocked(app, restarting, allowRestart, -1,
            false /*replacingPid*/);
    // ... ...      
}

Following the thread, appDiedLocked turns out to be the caller of that method:

final void appDiedLocked(ProcessRecord app, int pid, IApplicationThread thread,
        boolean fromBinderDied) {
    // ... ...  
        if (!app.killedByAm) {
            Slog.i(TAG, "Process " + app.processName + " (pid " + pid
                    + ") has died");
            mAllowLowerMemLevel = true;
        } else {
            // Note that we always want to do oom adj to update our state with the
            // new number of procs.
            mAllowLowerMemLevel = false;
            doLowMem = false;
        }
        EventLog.writeEvent(EventLogTags.AM_PROC_DIED, app.userId, app.pid, app.processName);
        if (DEBUG_CLEANUP) Slog.v(TAG_CLEANUP,
            "Dying app: " + app + ", pid: " + pid + ", thread: " + thread.asBinder());
        handleAppDiedLocked(app, false, true);
    // ......   
}

Tracing all the way back to the source, we arrive here:

private final class AppDeathRecipient implements IBinder.DeathRecipient {
    // ......
    @Override
    public void binderDied() {
        if (DEBUG_ALL) Slog.v(
            TAG, "Death received in " + this
            + " for thread " + mAppThread.asBinder());
        synchronized(ActivityManagerService.this) {
            appDiedLocked(mApp, mPid, mAppThread, true);
        }
    }
    // ......
}
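
Putting the excerpts together, the path from the Main Process dying to the Play Process being killed can be summarized as a call chain. The sketch below only records the order of the calls reconstructed in this article. The method bodies are stubs and `DeathChainSketch` is a hypothetical class, so it says nothing about what else the real methods do.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical trace of the call chain reconstructed above; all method bodies are stubs. */
final class DeathChainSketch {
    private static final List<String> TRACE = new ArrayList<>();

    /** Replays the chain from the binder death notification and returns the call order. */
    static List<String> run() {
        TRACE.clear();
        binderDied();
        return new ArrayList<>(TRACE);
    }

    private static void binderDied() {
        TRACE.add("AppDeathRecipient.binderDied");
        appDiedLocked();
    }

    private static void appDiedLocked() {
        TRACE.add("appDiedLocked");
        handleAppDiedLocked();
    }

    private static void handleAppDiedLocked() {
        TRACE.add("handleAppDiedLocked");
        cleanUpApplicationRecordLocked();
    }

    private static void cleanUpApplicationRecordLocked() {
        TRACE.add("cleanUpApplicationRecordLocked");
        removeDyingProviderLocked();
    }

    private static void removeDyingProviderLocked() {
        // In the real code this is where capp.kill("depends on provider ...") is reached.
        TRACE.add("removeDyingProviderLocked");
    }

    public static void main(String[] args) {
        for (String step : run()) {
            System.out.println(step);
        }
    }
}
```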
