Dramatically improve subprocess I/O for large buffers

Authored by epriestley <git@epriestley.com> on Dec 15 2013, 18:42.

Description

Summary:
Ref T4189. When we write() a large buffer to an ExecFuture, the fwrite() loop currently looks like this:

$bytes_written = fwrite($socket, $buffer);
$buffer = substr($buffer, $bytes_written);

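This is normally fine, but substr() is approximately O(N) in the size of the new string, since it has to allocate and copy it. When a large buffer drains a small chunk at a time, each iteration copies nearly the whole remaining buffer, so the total cost approaches O(N^2).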

Instead, add PhutilRope, which stores a string as a list of small buffers. This allows us to remove bytes from the beginning of the string very cheaply.
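To illustrate the idea, here is a minimal sketch of a rope-style buffer. It is not the PhutilRope source: the class name SimpleRope, its method names, the 4096-byte segment size, and the write_rope() helper are all hypothetical and chosen only for this example.

```php
<?php

// Hypothetical sketch of a rope-style buffer; NOT the PhutilRope source.
final class SimpleRope {
  private $segments = array();  // chunk index => chunk bytes
  private $head = 0;            // index of the first live chunk
  private $tail = 0;            // index the next appended chunk will get
  private $length = 0;          // total bytes currently stored
  private $segmentSize;

  public function __construct($segment_size = 4096) {
    $this->segmentSize = $segment_size;
  }

  // Split the string into small chunks and add them to the end.
  public function append($string) {
    $n = strlen($string);
    for ($pos = 0; $pos < $n; $pos += $this->segmentSize) {
      $this->segments[$this->tail++] =
        substr($string, $pos, $this->segmentSize);
    }
    $this->length += $n;
    return $this;
  }

  public function getByteLength() {
    return $this->length;
  }

  // Return the first chunk (at most one segment), or '' if empty.
  public function getHead() {
    return $this->length ? $this->segments[$this->head] : '';
  }

  // Drop bytes from the front. Whole chunks are unset; at most one
  // small chunk is trimmed with substr(), so the cost depends on the
  // bytes removed, not on the total size of the buffer.
  public function removeFromHead($count) {
    $count = max(0, min($count, $this->length));
    $this->length -= $count;
    while ($count > 0) {
      $size = strlen($this->segments[$this->head]);
      if ($size <= $count) {
        unset($this->segments[$this->head]);
        $this->head++;
        $count -= $size;
      } else {
        $this->segments[$this->head] =
          substr($this->segments[$this->head], $count);
        $count = 0;
      }
    }
  }
}

// Hypothetical write loop: the same shape as the fwrite() loop above,
// but with the rope standing in for the single large string.
function write_rope($stream, SimpleRope $rope) {
  while ($rope->getByteLength()) {
    $bytes_written = fwrite($stream, $rope->getHead());
    if (!$bytes_written) {
      // Error, or nothing could be written. A real caller would wait
      // for the stream to become writable and retry.
      break;
    }
    $rope->removeFromHead($bytes_written);
  }
}
```

The sketch tracks the head with an integer index rather than shifting the array, so dropping a consumed chunk does not touch the chunks behind it.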

The slow path shows up in particular when you git push a very large repository. If we read off the network faster than we write to git receive-pack, we end up with a very large internal buffer, which is expensive and slow to write through.

Test Plan:
I ran this test script before and after the changes:

<?php

require_once 'scripts/__init_script__.php';

$large = str_repeat('x', 1024 * 1024 * $argv[1]);

echo "OK...\n";

$future = new ExecFuture('cat');
$future->write($large);
$future->resolvex();

echo "OK.\n";

A 32MB write took 16s before the change and 400ms afterward. Generally, cost in the size of the buffer is close to O(N) now and was close to O(N^2) before.
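A rough way to see the old quadratic behavior is to count the bytes substr() has to copy while a buffer drains. This is a sketch of the cost model only, not a measurement. The function name is hypothetical, and the 8192-byte chunk size is an assumption; the amount a real fwrite() accepts per call varies.

```php
<?php

// Bytes copied by substr() when a buffer of $size bytes is drained in
// writes of $chunk bytes: each iteration copies whatever remains.
function bytes_copied_by_substr_loop($size, $chunk) {
  $copied = 0;
  $remaining = $size;
  while ($remaining > 0) {
    $remaining -= min($chunk, $remaining);
    $copied += $remaining;
  }
  return $copied;
}

$chunk = 8192; // ASSUMED bytes accepted per fwrite(); the real figure varies
foreach (array(1, 2, 4) as $mb) {
  $size = $mb * 1024 * 1024;
  printf(
    "%dMB buffer: %d bytes copied\n",
    $mb,
    bytes_copied_by_substr_loop($size, $chunk));
}
```

Under this model, doubling the buffer size roughly quadruples the bytes copied, which matches the O(N^2) behavior seen before the change.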

Reviewers: btrahan, zeeg

Reviewed By: btrahan

CC: aran, FacebookPOC

Maniphest Tasks: T4189

Differential Revision: https://secure.phabricator.com/D7768

Details

Committed: epriestley <git@epriestley.com>, Dec 15 2013, 18:42

Pushed: aubort, Mar 17 2017, 12:03

Parents: rPHU73367c774eb7: Add updateEnv() to ExecFuture for adjusting environmental variables

Branches: Unknown

Tags: Unknown

Event Timeline

Dec 15 2013, 18:42: epriestley <git@epriestley.com> committed rPHU59114868de66: Dramatically improve subprocess I/O for large buffers (authored by epriestley <git@epriestley.com>).