Algorithm combining std::unique with a reduce step?
Can anyone think of a clean (and fast) solution to the following problem:
- I have a sequence of entries that basically hold a key and a value, e.g.
struct Value {
    int index = 0;
    int cost = 0;
};
- I now want to merge the entries so that each key is contained only once, but with the values combined: i.e. each index should occur only once in the sequence, and the cost should be accumulated over all duplicates of that index (see the small example right below).
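For example (entries written as {index, cost}), the sequence

{1, 2}, {3, 4}, {1, 3}

should be merged into

{1, 5}, {3, 4}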
The basic solution I came up with sorts the sequence, and whenever the BinaryPredicate passed to std::sort detects two equal entries, the cost of rhs is added to the cost of lhs and the cost of rhs is set to 0. A remove_if afterwards deletes the zero-cost values. See the example here:
#include <cstdlib>
#include <vector>
#include <algorithm>
#include <iostream>
struct Value
{
    int index = 0;
    int cost = 0;
};
// generate a bunch of random values in a vector
// values will have indices and costs in range [0..9]
std::vector<Value> generator()
{
    std::vector<Value> v(20);
    std::generate(v.begin(), v.end(), []() { return Value{std::rand() % 10, std::rand() % 10}; });
    return v;
}
void print(const std::vector<Value> &values)
{
    for (auto v : values)
        std::cout << "{i=" << v.index << ", c=" << v.cost << "}, ";
    std::cout << "\n";
}
//
void merge(std::vector<Value> &values)
{
    // sort values and merge costs
    std::sort(values.begin(), values.end(), [](auto &lhs, auto &rhs) {
        if (lhs.index == rhs.index) {
            lhs.cost += rhs.cost;
            rhs.cost = 0;
        }
        return lhs.index < rhs.index;
    });
    // remove entries with empty cost
    auto it = std::remove_if(values.begin(), values.end(), [](const auto &v) { return v.cost == 0; });
    values.erase(it, values.end());
}
int main()
{
    auto v = generator();
    std::cout << "generated values: ";
    print(v);
    merge(v);
    std::cout << "merged values: ";
    print(v);
}
Live example on Compiler Explorer
The thing is: while the example above produces the correct result, I don't think it conforms to the C++ standard. A BinaryPredicate "shall not apply any non-constant function through the dereferenced iterator" (http://eel.is/c++draft/algorithms.requirements#8.sentence-4), and the comparison is a BinaryPredicate (http://eel.is/c++draft/alg.sorting#general-2.sentence-1).
Does this mean my only option is to roll a custom inplace_unique_reduce or something similar, or is there another elegant way to solve this? I would rather not have to write my own non-trivial algorithm for this.
Thanks
Answer
Assuming you are ok with additional allocations, I would use std::map (or std::unordered_map):
#include <algorithm>
#include <map>
#include <vector>

auto merge_entries(std::vector<Value>& original_values) {
    // accumulate costs per index; the map keeps each key unique
    auto values = std::map<int, int>();
    for (const auto [index, cost] : original_values) {
        values[index] += cost;
    }
    // overwrite the front of the original vector with the merged entries
    const auto end_of_merged_values = std::transform(
        values.cbegin(), values.cend(), original_values.begin(),
        [](const auto entry) {
            return Value{entry.first, entry.second};
        }
    );
    // erase the leftover tail that was not overwritten
    original_values.erase(end_of_merged_values, original_values.end());
}
Apart from one for() loop (which could be replaced with std::for_each, although in my opinion such a change would introduce unnecessary boilerplate and make the code harder to read), this solution uses only the STL.
We first merge all the entries using the map, and then we overwrite the front of the original std::vector so that it holds the merged entries. What's super convenient here is that std::transform returns an iterator pointing past the last element it wrote. Why is that useful? Because, apart from the unlikely scenario where no merging occurs at all, we end up with fewer elements than were originally passed in. Using that iterator we can erase the rest of the vector (the non-overwritten elements), keeping everything in a clean, STL-like style.
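For illustration, a minimal sketch of how merge_entries might be used (the input values here are made up; the Value struct and the includes from the question are assumed):

auto values = std::vector<Value>{{1, 1}, {2, 2}, {2, 7}, {1, 2}};
merge_entries(values);
// values now holds {1, 3}, {2, 9}; the entries come out sorted by index,
// since std::map iterates over its keys in ascending order
for (const auto [index, cost] : values) {
    std::cout << index << ' ' << cost << '\n';
}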
Assuming you are not ok with additional allocations, but you are ok with strengthening your iterator requirements (to bidirectional), I would use std::partial_sum and std::unique:
#include <algorithm>
#include <iterator>
#include <numeric>

template <class BiDirIt, class BinaryPredicateCompare, class BinaryOpReduce>
auto inplace_unique_reduce(
    BiDirIt first, BiDirIt last,
    BinaryPredicateCompare cmp,
    BinaryOpReduce reduce
) {
    // accumulate from the back so that the first element of each group
    // of equivalent elements ends up holding the group's total
    std::partial_sum(
        std::make_reverse_iterator(last), std::make_reverse_iterator(first),
        std::make_reverse_iterator(last),
        [cmp, reduce](auto acc, const auto& elem) {
            if (cmp(acc, elem)) {
                // same key: fold the current element into the running value
                return reduce(acc, elem);
            } else {
                // new key: restart the accumulation from this element
                acc = elem;
            }
            return acc;
        }
    );
    // keep only the first element of each group, which now holds the total
    return std::unique(first, last, cmp);
}
used like so:
auto values = std::vector<Value>{
    {1, 1}, {2, 2}, {2, 7}, {0, 5},
    {3, 3}, {1, 2}, {3, 10}
};
auto comparator = [](const auto& lhs, const auto& rhs) {
    return lhs.index == rhs.index;
};
auto reducer = [](const auto& lhs, const auto& rhs) {
    return Value{lhs.index, lhs.cost + rhs.cost};
};
auto to_remove = inplace_unique_reduce(
    values.begin(), values.end(),
    comparator,
    reducer
);
values.erase(to_remove, values.end());
for (const auto [index, cost] : values) {
    std::cout << index << ' ' << cost << '\n';
}
Just like your original solution, this will not merge non-adjacent elements; to do that you either have to sort the sequence by index first, or use something like the map from the first part of my answer.
The std::make_reverse_iterator calls are necessary because std::partial_sum accumulates the merged value in the rightmost element of each group of consecutive, equivalent elements, whereas std::unique preserves only the first element of such a group. Because of this, you want to merge the elements in the reverse order compared to the one you will be std::unique-ing in.
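To make the direction of accumulation concrete, here is a small hand-traced sketch of my own (reusing the comparator and reducer defined above):

auto v = std::vector<Value>{{2, 2}, {2, 7}, {3, 3}};
// reversed view:               {3, 3}, {2, 7}, {2, 2}
// after partial_sum over it:   {3, 3}, {2, 7}, {2, 9}
// i.e. in forward order:       {2, 9}, {2, 7}, {3, 3}
v.erase(inplace_unique_reduce(v.begin(), v.end(), comparator, reducer), v.end());
// unique kept the first of each group: v now holds {2, 9}, {3, 3}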
You raised some concerns about situations where copying or moving is expensive. In such cases you are either left with a custom solution that takes your specific constraints into consideration, or you ease those constraints. Here we move-assign the merged entries, but that is the only potential bottleneck. If your move-assignment operator is expensive, I fear no standard solution will work for you and you will have to roll your own, as you suspected.