Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High Memory Usage extended diffing two arrays of quite differently ordered results #77

Open
tysonkerridge opened this issue Apr 21, 2021 · 0 comments

Comments

@tysonkerridge
Copy link

tysonkerridge commented Apr 21, 2021

Hello, I've come across a particular case where I'm seeing a really high memory usage and wanted to make sure I wasn't doing something wrong or extended diffing in some unintended way.

I was extended diffing two seemingly small arrays of items which happened to be the same (or very similar, give or take a few items) but in two different orders. Though my data was from a different source, I've reproduced the issue with the following example but just creating an array of 5,000 random Ints, uniquing and then shuffling the first array of items to create the second set of items to diff.

DispatchQueue(label: "test").async {
   let pool = (0...999_999)
   let getRandom: () -> [Int] = {
       (0..<5_000).map { _ in
           pool.randomElement()!
       }
   }
   let previous = Set(getRandom()).shuffled()
   let current = previous.shuffled()
   let before = Date()
   let _ = previous.extendedDiff(current)
   print("finished. took:", Date().timeIntervalSince(before))
}

Running this code appears to reach a memory usage of around ~1GB, and take ~27 seconds.
For reference, 10,000 items seems to hit ~6GB and take ~111 seconds.

Also for reference, using two random sets of 10,000 items also hits about ~5GB and takes ~113 seconds. eg.:

let previous = Set(getRandom()).sorted()
let current = Set(getRandom()).sorted()

The reason this is a problem is that iOS (or at least debugging device in Xcode) seems to have a memory limit of about 2GB before you get terminated.

In my case, all I really care about is the deleted indexes and not the order, so I've found that in the case where the two arrays are very similar but in different orders, if I sort both arrays of items beforehand so that they're as similar as possible and then run the diff, it takes negligible memory and finishes in no time at all even with 10,000 items. Unfortunately in the case of two random sets of items, sorting won't help and will hit those limits/take a while.

I just wanted to make sure that I'm not doing something else wrong here, or if there's a different approach I should be taking for extended diffing two sets of items that could for all I know be two sets of different items.

Edit: Clarify that I was extended diffing (both seem to be problematic, but still)

@tysonkerridge tysonkerridge changed the title High Memory Usage diffing two arrays of quite differently ordered results High Memory Usage extended diffing two arrays of quite differently ordered results Apr 22, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant