Homec4science

Reduce initial discovery from O(branches * commits) to O(commits)

Authored by epriestley <git@epriestley.com> on Feb 28 2014, 22:02.

Description

Reduce initial discovery from O(branches * commits) to O(commits)

Summary:
Fixes T4414. Currently, when we discover a new repository, we do something like this:

foreach (branch) {
  foreach (commit on this branch) {
    do_something();
  }
}

In cases where there are a lot of branches which mostly just branch master, this leads to us doing roughly O(branches * commits) work.

We have a commitCache to prevent this, but it has two problems:

  • It only fills out of the DB, and we do this whole thing before writing to the DB, which is the entire point.
  • It has a fixed size (2048) and on initial discovery we're usually dealing with way more commits than that.

Instead, also stop doing work if we hit a commit which is known already.

Test Plan:

  • Added print on the number of discovered refs and number of unique refs.
  • Ran bin/repository discover --repair X on a repo with several branches.
  • Before the patch, got 397 refs and 135 unique refs.
  • After the patch, got 135 refs and 135 unique refs.

Reviewers: btrahan

Reviewed By: btrahan

CC: aran

Maniphest Tasks: T4414

Differential Revision: https://secure.phabricator.com/D8374

Details

Committed
epriestley <git@epriestley.com>Feb 28 2014, 22:02
Pushed
aubortJan 31 2017, 17:16
Parents
rPHe28b78f5ebe3: Fix two issues with ref discovery
Branches
Unknown
Tags
Unknown

Event Timeline

epriestley <git@epriestley.com> committed rPHd86bb086cac3: Reduce initial discovery from O(branches * commits) to O(commits) (authored by epriestley <git@epriestley.com>).Feb 28 2014, 22:02