Background

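The wings process on one Pterodactyl backend server was crashing frequently. The logs contained no error records, yet the process's resource usage was noticeably high.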

Troubleshooting

First, switch the log output to debug level.

...
  "logger": {
    "path": "logs/",
    "src": false,
    "level": "debug",
    "period": "1d",
    "count": 3
  },
...

Starting the program then produced the following error report.

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

<--- Last few GCs --->

[14696:0x34b0970]   276848 ms: Scavenge 1396.0 (1423.6) -> 1395.4 (1424.1) MB, 12.7 / 0.0 ms  (average mu = 0.136, current mu = 0.083) allocation failure 
[14696:0x34b0970]   276869 ms: Scavenge 1396.2 (1424.1) -> 1395.5 (1424.6) MB, 12.1 / 0.0 ms  (average mu = 0.136, current mu = 0.083) allocation failure 
[14696:0x34b0970]   276905 ms: Scavenge 1396.4 (1424.6) -> 1395.7 (1425.1) MB, 26.1 / 0.0 ms  (average mu = 0.136, current mu = 0.083) allocation failure 


<--- JS stacktrace --->

==== JS stack trace =========================================

    0: ExitFrame [pc: 0x3142f05dbe1d]
    1: StubFrame [pc: 0x3142f05934b0]
Security context: 0x39d13a39e6c1 <JSObject>
    2: /* anonymous */(aka /* anonymous */) [0x3cb5c7702201] [/srv/daemon/src/controllers/fs.js:~478] [pc=0x3142f0865edb](this=0x3df4da2826f1 <undefined>,item=0x1c9ff7af9611 <String[17]: r.-7678.26706.mca>,eachCallback=0x3ec27852af91 <JSFunction (sfi = 0x25a91d9d45b1)>)
    3: /* anonymous */(aka /* anonymous */) [0x3cb5...

 1: 0x8fb090 node::Abort() [node]
 2: 0x8fb0dc  [node]
 3: 0xb031ce v8::Utils::ReportOOMFailure(v8::internal::Isolate*, char const*, bool) [node]
 4: 0xb03404 v8::internal::V8::FatalProcessOutOfMemory(v8::internal::Isolate*, char const*, bool) [node]
 5: 0xef7462  [node]
 6: 0xef7568 v8::internal::Heap::CheckIneffectiveMarkCompact(unsigned long, double) [node]
 7: 0xf03642 v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) [node]
 8: 0xf03f74 v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) [node]
 9: 0xf06be1 v8::internal::Heap::AllocateRawWithRetryOrFail(int, v8::internal::AllocationSpace, v8::internal::AllocationAlignment) [node]
10: 0xed0064 v8::internal::Factory::NewFillerObject(int, bool, v8::internal::AllocationSpace) [node]
11: 0x11701ee v8::internal::Runtime_AllocateInNewSpace(int, v8::internal::Object**, v8::internal::Isolate*) [node]
12: 0x3142f05dbe1d

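The report says the process ran out of memory, yet the host still had plenty of free memory. The message refers to the V8 JavaScript heap rather than to host memory: the GC trace shows the heap stuck at around 1396 MB right before the abort. Since the Pterodactyl backend is written in Node.js (just like the 5E anti-cheat system), I suspected that the default heap allocation was too small. I tried adding the startup flag --max_old_space_size=32768 (the value is in MiB). Starting the program again produced a different error report (sensitive information redacted).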
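To check whether a flag such as --max_old_space_size actually reaches the process, the heap ceiling can be read from inside Node.js. The following is only a minimal sketch: the helper name heapLimitMiB is made up for illustration and is not part of wings; it relies on the v8.getHeapStatistics() call from Node's built-in v8 module.

```javascript
// Sketch: print the V8 heap ceiling of the current Node.js process.
// heapLimitMiB is a hypothetical helper, not something wings provides.
const v8 = require('v8');

function heapLimitMiB() {
  // heap_size_limit is reported in bytes.
  const bytes = v8.getHeapStatistics().heap_size_limit;
  return Math.round(bytes / (1024 * 1024));
}

console.log(`V8 heap limit: ${heapLimitMiB()} MiB`);
```

Running the script once without the flag and once with --max_old_space_size=32768 should show a clearly larger limit in the second run, which confirms that the flag was picked up.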

{"name":"wings","hostname":"localhost.localdomain","pid":47492,"server":"x-x-x-x-x","level":50,"err":{"message":"stream.push() after EOF","name":"Error [ERR_STREAM_PUSH_AFTER_EOF]","stack":"Error [ERR_STREAM_PUSH_AFTER_EOF]: stream.push() after EOF\n    at readableAddChunk (_stream_readable.js:257:32)\n    at SSH2Stream.Readable.push (_stream_readable.js:224:10)\n    at SSH2Stream.Transform.push (_stream_transform.js:151:32)\n    at SSH2Stream.push (/srv/daemon/node_modules/ssh2-streams/lib/ssh.js:257:18)\n    at send_ (/srv/daemon/node_modules/ssh2-streams/lib/ssh.js:5349:18)\n    at send (/srv/daemon/node_modules/ssh2-streams/lib/ssh.js:5274:12)\n    at SSH2Stream.ping (/srv/daemon/node_modules/ssh2-streams/lib/ssh.js:833:10)\n    at Timeout.Manager._timerfn [as _onTimeout] (/srv/daemon/node_modules/ssh2/lib/keepalivemgr.js:30:13)\n    at ontimeout (timers.js:436:11)\n    at tryOnTimeout (timers.js:300:5)","code":"ERR_STREAM_PUSH_AFTER_EOF"},"stack":"Error [ERR_STREAM_PUSH_AFTER_EOF]: stream.push() after EOF\n    at readableAddChunk (_stream_readable.js:257:32)\n    at SSH2Stream.Readable.push (_stream_readable.js:224:10)\n    at SSH2Stream.Transform.push (_stream_transform.js:151:32)\n    at SSH2Stream.push (/srv/daemon/node_modules/ssh2-streams/lib/ssh.js:257:18)\n    at send_ (/srv/daemon/node_modules/ssh2-streams/lib/ssh.js:5349:18)\n    at send (/srv/daemon/node_modules/ssh2-streams/lib/ssh.js:5274:12)\n    at SSH2Stream.ping (/srv/daemon/node_modules/ssh2-streams/lib/ssh.js:833:10)\n    at Timeout.Manager._timerfn [as _onTimeout] (/srv/daemon/node_modules/ssh2/lib/keepalivemgr.js:30:13)\n    at ontimeout (timers.js:436:11)\n    at tryOnTimeout (timers.js:300:5)","identifier":"x","msg":"An exception was encountered while handling the SFTP subsystem.","time":"2020-04-15T08:47:49.101Z","v":0}
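The entry above is a single JSON object on one line, so the relevant fields can be pulled out programmatically instead of being read by eye. This is a rough sketch under assumptions: summarize is a hypothetical helper, and the sample string is a trimmed copy of the entry above that keeps only the fields being read.

```javascript
// Sketch: reduce one JSON log line to the fields that matter for triage.
// summarize is a hypothetical helper; sample is a trimmed copy of the entry above.
function summarize(line) {
  const entry = JSON.parse(line);
  return {
    time: entry.time,
    level: entry.level,
    msg: entry.msg,
    // err is only present on entries that carry an exception.
    code: entry.err ? entry.err.code : undefined,
  };
}

const sample = JSON.stringify({
  name: 'wings',
  level: 50,
  err: { code: 'ERR_STREAM_PUSH_AFTER_EOF' },
  msg: 'An exception was encountered while handling the SFTP subsystem.',
  time: '2020-04-15T08:47:49.101Z',
  v: 0,
});

console.log(summarize(sample));
```

Applied line by line to a log file, a helper like this makes it easy to see which subsystem the exceptions cluster around, here the SFTP subsystem named in msg.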

At this point it is clear that the problem lies in Pterodactyl's built-in SFTP system, which opens up several options.

Solution

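My solution was to run SFTP as a standalone service instead of using the one built into wings. That way, if SFTP runs into trouble again, only the standalone SFTP service blows up, and wings keeps running normally.
See the Pterodactyl documentation for how to enable it.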

Unless otherwise stated, all posts on this blog are licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.