Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use custom d-ary heap implementation #7017

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
11 changes: 8 additions & 3 deletions .github/workflows/osrm-backend.yml
Original file line number Diff line number Diff line change
Expand Up @@ -731,6 +731,8 @@ jobs:
sudo umount ~/benchmarks | true
rm -rf ~/benchmarks
mkdir -p ~/benchmarks

for i in {1..15}; do sudo cset shield --reset && break || echo "cset shield reset attempt $i failed"; sleep 1; done
# see https://llvm.org/docs/Benchmarking.html
- name: Run PR Benchmarks
run: |
Expand All @@ -744,7 +746,9 @@ jobs:

sudo cset shield --exec -- ./pr/scripts/ci/run_benchmarks.sh -f ~/benchmarks -r $(pwd)/pr_results -s $(pwd)/pr -b ~/benchmarks/build -o ~/data.osm.pbf -g ~/gps_traces.csv
sudo umount ~/benchmarks
sudo cset shield --reset

for i in {1..15}; do sudo cset shield --reset && break || echo "cset shield reset attempt $i failed"; sleep 1; done

- name: Run Base Benchmarks
run: |
sudo cset shield -c 2-3 -k on
Expand All @@ -760,9 +764,10 @@ jobs:
cp base/src/benchmarks/portugal_to_korea.json ~/benchmarks/test/data/portugal_to_korea.json
fi
# we intentionally use scripts from PR branch to be able to update them and see results in the same PR
sudo cset shield --exec -- cset shield --exec -- ./pr/scripts/ci/run_benchmarks.sh -f ~/benchmarks -r $(pwd)/base_results -s $(pwd)/pr -b ~/benchmarks/build -o ~/data.osm.pbf -g ~/gps_traces.csv
sudo cset shield --exec -- ./pr/scripts/ci/run_benchmarks.sh -f ~/benchmarks -r $(pwd)/base_results -s $(pwd)/pr -b ~/benchmarks/build -o ~/data.osm.pbf -g ~/gps_traces.csv
sudo umount ~/benchmarks
sudo cset shield --reset

for i in {1..15}; do sudo cset shield --reset && break || echo "cset shield reset attempt $i failed"; sleep 1; done
- name: Post Benchmark Results
run: |
python3 pr/scripts/ci/post_benchmark_results.py base_results pr_results
Expand Down
113 changes: 113 additions & 0 deletions include/util/d_ary_heap.hpp
Original file line number Diff line number Diff line change
@@ -0,0 +1,113 @@
#pragma once

#include <boost/assert.hpp>
#include <functional>
#include <limits>
#include <utility>
#include <vector>

namespace osrm::util
{
template <typename HeapData, int Arity, typename Comparator = std::less<HeapData>> class DAryHeap
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the reorder handler as a template parameter seems somewhat confusing.

Do have a need to specify multiple ones at all? If this is needed, should the be implemented inside the heap?

Can we possibly aim for a simpler implementation of this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the reorder handler as a template parameter seems somewhat confusing.

Well, the intention here was to be able to pass lambda and avoid using std::function which has certain performance impact... Such template parameters is frequent pattern for such things - if you want to pass lambda, compiler will infer type of this lambda for you in a most efficient way. At the same time I agree that probably having the fact that we pass only single type of function in there worth trying to make this more obvious (not sure it will be simple though, but I'll try 😄 )

If this is needed, should the be implemented inside the heap?

What do you mean by "inside the heap"? In my initial implementation DAryHeap wasn't a separate class and all code from here "lived" directly in QueryHeap, but I found it a bit messy and decided to decouple things a bit to keep code structure a bit similar to what we have now (QueryHeap and separate boost::d_ary_heap before and QueryHeap and separate DAryHeap after). I could try to return back to initial implementation...

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or you mean that implementation of reorderHandler should be a part of DAryHeap? I am not sure it is also doable without merging DAryHeap to QueryHeap - this reorderHandler is kind of a "glue" between DAryHeap and QueryHeap...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another option I see is... make it super straightforward. DAryHeap now is kind of generic DAryHeap implementation, but we could "couple" it to our needs, i.e. just take reference to inserted_nodes from QueryHeap and do all things we are currently doing in reorderHandler directly on that vector. Should also work without issues as well.

Anyway @DennisOSRM would love to hear your opinion on this please 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And one more blogpost with concrete example of why template can be more preferable here… https://wolchok.org/posts/cxx-trap-2-std-function/

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@DennisOSRM WDYT? Tbh I am not sure I have good ideas how to remove template parameter here - all ideas have their own drawbacks. May be you have some? Or may be worth introducing a bit of complexity in favor of performance? 😀

{
public:
using HeapHandle = size_t;

static constexpr HeapHandle INVALID_HANDLE = std::numeric_limits<size_t>::max();

public:
const HeapData &top() const { return heap[0]; }

size_t size() const { return heap.size(); }

bool empty() const { return heap.empty(); }

const HeapData &operator[](HeapHandle handle) const { return heap[handle]; }

template <typename ReorderHandler>
void emplace(HeapData &&data, ReorderHandler &&reorderHandler)
{
heap.emplace_back(std::forward<HeapData>(data));
heapifyUp(heap.size() - 1, std::forward<ReorderHandler>(reorderHandler));
}

template <typename ReorderHandler>
void decrease(HeapHandle handle, HeapData &&data, ReorderHandler &&reorderHandler)
{
BOOST_ASSERT(handle < heap.size());

heap[handle] = std::forward<HeapData>(data);
heapifyUp(handle, std::forward<ReorderHandler>(reorderHandler));
}

void clear() { heap.clear(); }

template <typename ReorderHandler> void pop(ReorderHandler &&reorderHandler)
{
BOOST_ASSERT(!heap.empty());
heap[0] = std::move(heap.back());
heap.pop_back();
if (!heap.empty())
{
heapifyDown(0, std::forward<ReorderHandler>(reorderHandler));
}
}

private:
size_t parent(size_t index) { return (index - 1) / Arity; }

size_t kthChild(size_t index, size_t k) { return Arity * index + k + 1; }

template <typename ReorderHandler> void heapifyUp(size_t index, ReorderHandler &&reorderHandler)
{
HeapData temp = std::move(heap[index]);
while (index > 0 && comp(temp, heap[parent(index)]))
{
size_t parentIndex = parent(index);
heap[index] = std::move(heap[parentIndex]);
reorderHandler(heap[index], index);
index = parentIndex;
}
heap[index] = std::move(temp);
reorderHandler(heap[index], index);
}

template <typename ReorderHandler>
void heapifyDown(size_t index, ReorderHandler &&reorderHandler)
{
HeapData temp = std::move(heap[index]);
size_t child;
while (kthChild(index, 0) < heap.size())
{
child = minChild(index);
if (!comp(heap[child], temp))
{
break;
}
heap[index] = std::move(heap[child]);
reorderHandler(heap[index], index);
index = child;
}
heap[index] = std::move(temp);
reorderHandler(heap[index], index);
}

size_t minChild(size_t index)
{
size_t bestChild = kthChild(index, 0);
for (size_t k = 1; k < Arity; ++k)
{
size_t pos = kthChild(index, k);
if (pos < heap.size() && comp(heap[pos], heap[bestChild]))
{
bestChild = pos;
}
}
return bestChild;
}

private:
Comparator comp;
std::vector<HeapData> heap;
};
} // namespace osrm::util
77 changes: 49 additions & 28 deletions include/util/query_heap.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,10 @@
#include <boost/assert.hpp>
#include <boost/heap/d_ary_heap.hpp>

#include "d_ary_heap.hpp"
#include <algorithm>
#include <cstdint>
#include <cstdlib>
#include <limits>
#include <map>
#include <optional>
Expand Down Expand Up @@ -133,20 +135,17 @@ class QueryHeap
Weight weight;
Key index;

bool operator>(const HeapData &other) const
bool operator<(const HeapData &other) const
{
if (weight == other.weight)
{
return index > other.index;
return index < other.index;
}
return weight > other.weight;
return weight < other.weight;
}
};
using HeapContainer = boost::heap::d_ary_heap<HeapData,
boost::heap::arity<4>,
boost::heap::mutable_<true>,
boost::heap::compare<std::greater<HeapData>>>;
using HeapHandle = typename HeapContainer::handle_type;
using HeapContainer = DAryHeap<HeapData, 4>;
using HeapHandle = typename HeapContainer::HeapHandle;

public:
using WeightType = Weight;
Expand Down Expand Up @@ -178,11 +177,31 @@ class QueryHeap

void Insert(NodeID node, Weight weight, const Data &data)
{
checkInvariants();

BOOST_ASSERT(node < std::numeric_limits<NodeID>::max());
const auto index = static_cast<Key>(inserted_nodes.size());
const auto handle = heap.emplace(HeapData{weight, index});
inserted_nodes.emplace_back(HeapNode{handle, node, weight, data});
inserted_nodes.emplace_back(HeapNode{heap.size(), node, weight, data});

heap.emplace(HeapData{weight, index},
[this](const auto &heapData, auto new_handle)
{ inserted_nodes[heapData.index].handle = new_handle; });
node_index[node] = index;

checkInvariants();
}

void checkInvariants()
{
#ifndef NDEBUG
for (size_t handle = 0; handle < heap.size(); ++handle)
{
auto &in_heap = heap[handle];
auto &inserted = inserted_nodes[in_heap.index];
BOOST_ASSERT(in_heap.weight == inserted.weight);
BOOST_ASSERT(inserted.handle == handle);
}
#endif // !NDEBUG
}

Data &GetData(NodeID node)
Expand Down Expand Up @@ -216,16 +235,7 @@ class QueryHeap
{
BOOST_ASSERT(WasInserted(node));
const Key index = node_index.peek_index(node);

// Use end iterator as a reliable "non-existent" handle.
// Default-constructed handles are singular and
// can only be checked-compared to another singular instance.
// Behaviour investigated at https://lists.boost.org/boost-users/2017/08/87787.php,
// eventually confirmation at https://stackoverflow.com/a/45622940/151641.
// Corrected in https://github.com/Project-OSRM/osrm-backend/pull/4396
auto const end_it = const_cast<HeapContainer &>(heap).end(); // non-const iterator
auto const none_handle = heap.s_handle_from_iterator(end_it); // from non-const iterator
return inserted_nodes[index].handle == none_handle;
return inserted_nodes[index].handle == HeapContainer::INVALID_HANDLE;
}

bool WasInserted(const NodeID node) const
Expand Down Expand Up @@ -276,26 +286,30 @@ class QueryHeap
{
BOOST_ASSERT(!heap.empty());
const Key removedIndex = heap.top().index;
heap.pop();
inserted_nodes[removedIndex].handle = heap.s_handle_from_iterator(heap.end());
inserted_nodes[removedIndex].handle = HeapContainer::INVALID_HANDLE;

heap.pop([this](const auto &heapData, auto new_handle)
{ inserted_nodes[heapData.index].handle = new_handle; });
return inserted_nodes[removedIndex].node;
}

HeapNode &DeleteMinGetHeapNode()
{
BOOST_ASSERT(!heap.empty());
checkInvariants();
const Key removedIndex = heap.top().index;
heap.pop();
inserted_nodes[removedIndex].handle = heap.s_handle_from_iterator(heap.end());
inserted_nodes[removedIndex].handle = HeapContainer::INVALID_HANDLE;
heap.pop([this](const auto &heapData, auto new_handle)
{ inserted_nodes[heapData.index].handle = new_handle; });
checkInvariants();
return inserted_nodes[removedIndex];
}

void DeleteAll()
{
auto const none_handle = heap.s_handle_from_iterator(heap.end());
std::for_each(inserted_nodes.begin(),
inserted_nodes.end(),
[&none_handle](auto &node) { node.handle = none_handle; });
[&](auto &node) { node.handle = HeapContainer::INVALID_HANDLE; });
heap.clear();
}

Expand All @@ -305,20 +319,27 @@ class QueryHeap
const auto index = node_index.peek_index(node);
auto &reference = inserted_nodes[index];
reference.weight = weight;
heap.increase(reference.handle, HeapData{weight, static_cast<Key>(index)});
heap.decrease(reference.handle,
HeapData{weight, static_cast<Key>(index)},
[this](const auto &heapData, auto new_handle)
{ inserted_nodes[heapData.index].handle = new_handle; });
}

void DecreaseKey(const HeapNode &heapNode)
{
BOOST_ASSERT(!WasRemoved(heapNode.node));
heap.increase(heapNode.handle, HeapData{heapNode.weight, (*heapNode.handle).index});
heap.decrease(heapNode.handle,
HeapData{heapNode.weight, heap[heapNode.handle].index},
[this](const auto &heapData, auto new_handle)
{ inserted_nodes[heapData.index].handle = new_handle; });
}

private:
std::vector<HeapNode> inserted_nodes;
HeapContainer heap;
IndexStorage node_index;
};

} // namespace osrm::util

#endif // OSRM_UTIL_QUERY_HEAP_HPP
Loading