A little puzzle for CppCon

As CppCon begins, Stevens Capital Management is running an SCM Challenge quiz with questions provided by some CppCon speakers.(Creating a little login is required, in part so you can save progress, but they promise not to spam you.)

I’ve contributed a simple little question that’s directly related to my CppCon closing plenary session on Friday. By “simple little” I mean that my question itself is simple and can be illustrated well in a handful of short lines of code. Solving it with a valid answer is another matter, however. I wonder how many people will be able to give reasonable solutions[*] that pass the three short test cases…

Take a look, think about the question, and see if you can come up with a solution that works. Feel free to use the comments below to collaborate with others to solve the puzzle; if you’re at CppCon, use the great face time with other attendees to share possible solutions. If you succeed in writing a solution that prints “true” three times, they’ll email it to me and I’ll review it.

In my talk on Friday, I’ll present solutions to this and other lifetime problems. For those who aren’t here at CppCon in person, you’ll be able to catch the talk online on YouTube about a week later. (As always, all CppCon talks are being recorded and will be posted, but it will take a month or so to process and post all of the sessions — it takes a while to do professional-grade production for over 100 talks! Processing the plenary session videos is being expedited so that we can get those ones up quickly.)

I hope you enjoy the puzzle!

 

[*] No fair hardcoding a hack that artificially prints the right results for just these three cases. I know how to do that too, but that’s evading the problem, not solving it. :)

 

40 thoughts on “A little puzzle for CppCon

  1. will this be acceptable ?

    class MyGraph {
    public:
        class Node : public Counter {
            vector<shared_ptr<Node>> children;
            vector<weak_ptr<Node>> o_children;
            unsigned int id;
            static unsigned int acc; 
        public:
            Node() : id( acc++ ) {}
            void AddChild(const shared_ptr<Node>& node) {
                if ( id < node->id ) children.push_back(node);
                else o_children.push_back( node );
            }
            void RemoveChild(const shared_ptr<Node>& node) {
                auto it = std::find( children.begin(), children.end(), node );
                if ( it != children.end() ) children.erase( it );
                o_children.erase(
                    std::remove_if( o_children.begin(), o_children.end(), 
                    [node](const weak_ptr<Node> & i ){ 
                        auto sp_i = i.lock();
                        if ( !sp_i ) return true;
                        return node.get() == sp_i.get();
                        }),
                    o_children.end() );
            }
        };
        void SetRoot(const shared_ptr<Node>& node) {root = node;    }
        void ShrinkToFit() {}
        static auto MakeNode() { return make_shared<MyGraph::Node>(); }
    private:
        shared_ptr<Node> root;
    };
    unsigned int MyGraph::Node::acc = 0;
    
  2. I agree with intvnut. This API seems contrived to make a clean solution impossible (without introducing something with a program-global view, like a GC). If I were building an API for cyclic directed graphs, I wouldn’t set it up this way.

    Making Add- and RemoveChild members of Node rather than MyGraph prevents any sort of graph-wide analysis at those steps. They cannot even queue things up for the ShrinkToFit method. Likewise, making MakeNode a static member of MyGraph (why not a static member of Node?).

    But graph-wide analysis is not enough, so why is ShrinkToFit a member of MyGraph rather than a global function (or static method)? After all, a single MyGraph won’t have the visibility necessary to deal with things the current API allows, like splicing two distinct MyGraphs into one.

    The fact that you can change the root at any time also means you could disconnect a subtree (that you would then expect to go away), but then change the root to a node within that subtree and completely change the set of Nodes that need to be cleaned up.

    The proposed API has weird separation (and conflation) of concerns. For example, creating a node is completely distinct from inserting one into the graph, yet removing one from the graph probably should be associated with the node’s destruction.

    Solutions that depend on arbitrarily deep stacks (for recursive cleanup) or ones that depend on a clean-up step that takes an unpredictable amount of time are both objectionable in any sort of real-time domain.

    It seems wisest to provide a single graph API (rather than graph + node + global static APIs). With a single API, you can enforce the graph invariants at every step. The current design requires sprinkling enforcement of the invariant across classes that don’t have the necessary visibility into each other to actually do that.

    For example, you might decide that an invariant for your graph should be that there is no more than one edge from node A to node B. You might decide against splicing two distinct MyGraphs together.

    I’d propose a graph class that owns the nodes and hands back some sort of handle to them. It can then offer methods to create and destroy (directed) edges between nodes. This provides a useful, usable API, that has a clean separation of concerns, allows for a variety of implementations, and can prevent (or explicitly allow) strange operations like splicing together otherwise distinct graphs.

  3. @Marco: But, the graph is a -> c -> b, and so the count should remain 3.

  4. @intvnut, adding a->RemoveChild(b) means that the remaining count should be 2 (a & c). If you change return Counter::count() == 2 to return Counter::count() == 2 it passes.

  5. @intvnut, when you do a->RemoveChild(b) you need to update the count check to be == 2 instead 3. It doesn’t fail for me.

  6. Fundamentally, solutions involving shared_ptr and weak_ptr I think will be flawed without some sort of outside structure that has a view of the graph.

    Reachability of a given node from the root is a property of the graph as a whole, not a node and its immediate neighbors. Sure, a node is reachable from the root if one of its parents is reachable from the root, but which parent(s) could change over time. Likewise for whether your parent participated in a cycle involving you.

    Unless you have some sort of propagating update to upgrade/downgrade between shared_ptr/weak_ptr based on changes to the connectivity in the graph overall, I think you’re going to run into problems.

  7. @Marco: You’re missing a->RemoveChild(b).

    When I add that, it fails.

  8. @Marco: Isn’t ‘parents’ a bad name here? They’re still children. Maybe ‘weak_children’.

    Also, isn’t this sensitive to the order of operations? Some orders of operations succeed, others fail. Sure, Herb’s test might cases pass, but…

    For example:

    {
        auto a = MyGraph::MakeNode();
        g.SetRoot(a);
        auto b = MyGraph::MakeNode();
        a->AddChild(b);
        auto c = MyGraph::MakeNode();
        a->AddChild(c);
        b->AddChild(c);
        c->AddChild(b);
        a->RemoveChild(b);
        b->RemoveChild(c);
    }
    

    This sequence should result in a -> c -> b. But, ‘a’ will only hold a weak_ptr to ‘c’.

    Sure, you could do some careful shuffling to ensure you always have one shared_ptr in the vector<shared_ptr>, but I’m pretty sure that just means you need a slightly larger graph to show the problem.

  9. class Node : public Counter {
      vector<shared_ptr<Node>> children;
      vector<weak_ptr<Node>> parents;
      bool hasParent;
      void 
      // ...      
    void AddChild(const shared_ptr<Node>& node) {
       if(node->hasParent) {
         parents.push_back(node);
       } else {
          node->hasParent = true;
          children.push_back(node);
       }
    }
    
    

    Not sure if I’m missing a use case, but adding a flag to the Node class to determine if it’s been added before, and adding it to another collection of weak_ptrs solves the problem.

    You might still want to use ShrinkToFit to queue up the nodes for deletion to avoid overflowing the stack.

  10. The solution was:
    class MyGraph {
    public:
    class Node : public Counter, public std::enable_shared_from_this {
    vector<shared_ptr> children;
    vector<weak_ptr> parents;
    public:
    void AddChild(const shared_ptr& node) {
    children.push_back(node);
    std::weak_ptr wp = shared_from_this();
    node->AddParent(wp);
    }
    void RemoveChild(const shared_ptr& node) {
    auto it(std::find(children.begin(), children.end(), node));

    if(it != children.end())
    {
    node->RemoveParent(this);
    children.erase(it);
    }
    // child has only me as the parent or my only parent is my child
    if(!node->parents.size() || node->MyOnlyparentIsMyChild())
    node->RemoveAllChildren();
    }

    void AddParent(const weak_ptr& node) {
    parents.push_back(node);
    }

    void RemoveParent(Node* node) {
    auto it = parents.begin();
    auto end = parents.end();

    for(; it!=end;)
    {
    shared_ptr cur = it->lock();
    if(cur && cur.get() == node)
    {
    parents.erase(it);
    break;
    }
    else
    ++it;
    }
    }

    void RemoveAllChildren() {
    auto tmpchildren = children;
    for(const auto& child : tmpchildren)
    {
    RemoveChild(child);
    }
    }

    bool MyOnlyparentIsMyChild()
    {
    if(parents.size()>1)
    return false;
    auto parent = parents.begin()->lock();
    auto it(std::find(children.begin(), children.end(), parent));
    if(it != children.end())
    return true;
    return false;
    }
    };

    void SetRoot(const shared_ptr& node) {
    root = node;
    }

    void ShrinkToFit() {
    }

    static auto MakeNode() { return make_shared(); }

    private:
    shared_ptr root;
    };

  11. I was just hinting at using weak_ptrs and pulling the ownership out to another structure to resolve the circular references. You don’t need a full garbage collector to pass the tests. Someone using everything you can imagine out of those APIs might require it, but getting a quick and dirty solution just for the presented tests isn’t that hard.

  12. Based on the problem description and alteryxned’s little hint, I think there must be some way to do this without implementing a mark and sweep garbage collector. However, it depends on knowing some feature of the C++ standard library.

  13. @Juan: The circular reference is *the* problem. It’s the piece not handled automagically by shared_ptr.

    If the graph were acyclic, the normal reference counting of shared_ptr takes care of garbage collecting nodes. The moment you remove the last parent to a node, the node gets deleted. If that node was the last parent of any of its children, those children get deleted, etc. recursively. All this, with no additional code from you. It’s all in shared_ptr’s reference counting.

    The moment you introduce a cycle, that breaks.

    The reference count in shared_ptr keeps track of how many parents a node has, but not whether any of those parents is reachable from the root without passing through the node itself.

    In some circular data structures, you can use a weak_ptr to break this cyclic dependence. But you can’t really do that here easily. Consider this graph:

    auto a = MyGraph::MakeNode();
    auto b = MyGraph::MakeNode();
    auto c = MyGraph::MakeNode();
    g.SetRoot(a);
    a->AddChild(b);
    a->AddChild(c);
    b->AddChild(c);
    c->AddChild(b);
    

    You now have ‘b’ and ‘c’ both as children of ‘a’, and ‘b’ and ‘c’ form a cycle. You might attempt to replace one of the shared_ptr between ‘b’ and ‘c’ with a weak_ptr instead. (Note, I’m still referring just to child pointers; no parent pointers involved here.)

    Now remove ‘b’ and ‘c’ from ‘a’ in an arbitrary order:

    if (rand() & 1) {
      a->RemoveChild(b);
      a->RemoveChild(c);
    } else {
      a->RemoveChild(c);
      a->RemoveChild(b);
    }
    

    If you had put the weak_ptr on c => b, but remove the link ‘a => b’ first, you’re broken. If you put the weak_ptr on b => c, but remove the link ‘a => c’ first, you’re broken.

    Also, nothing in the API forces you to construct the graph in any particular order, including when you call SetRoot, or how many times you’re permitted to call SetRoot.

    To me, this all suggests a solution that uses some form of reachability analysis in ShrinkToFit() to figure things out.

  14. @intvnut

    I get your point, there’s a circular reference there. The circle can be broken by removing C from B or B from C.

    Going further, the last line remove A from list of parents of B, leaving that list with only one element: C

    The list of parent elements of C also contains only B.

    Besides the circular reference I don’t see any problems. Of course this kind of issue must be treated with care but I still think the reference to the parent is needed in order to prevent from erasing all the childs of an element when it is being removed by its parent.

  15. @Motti I caught the same problem and I ragequit and went to sleep. :) My testcase is only a bit different.
    {
    auto a = MyGraph::MakeNode();
    g.SetRoot(a);
    auto b = MyGraph::MakeNode();
    a->AddChild(b);
    auto c = MyGraph::MakeNode();
    b->AddChild(c);
    auto d = MyGraph::MakeNode();
    b->AddChild(d);
    d->AddChild(b);
    b->RemoveChild(c);
    }

    But my code passes all the Herb’s tests. :P
    My guess is that Herb will try to use my failure as excuse to propose gcnew for C++. :)
    I know it was his pet peeve for long time, including I think some lock free algs being easy to do if you have gc.
    But anyway here is how I passed Herb’s tests:

    SPOILER SPOILER SPOILER
    Did you use weak_ptr to parrent like I did? It works oksih, but on destruction of Graph if you have shared_ptr cycle it can not be GCed.

  16. @Juan: A mental exercise: What do the child and parent pointers look like for ‘b’ and ‘c’ after this sequence?

            a->AddChild(b);
            b->AddChild(c);
            c->AddChild(b);
            a->RemoveChild(b);
    
  17. Hi there,

    I think each node should have a list of weak pointers to his parent node.

    When a child node is removed the parent is removed from that list too. If the list of parents is empty, the node is considered orphan and then cleared.

  18. Having MakeNode as a static function, but ShrinkToFit not opens this up to a lot of problems:
    – User can construct cycles without adding them to a graph causing leaks
    – Nodes could belong to multiple graphs
    – It’s much harder to optimize the graph by using identifiers/compact storage (avoiding allocation, etc)

    I would normally change all functions to be direct (non-static) member function of the graph, including AddChild/RemoveChild:
    – That prevents the user from constructing cycles without a graph
    – Makes it easy to optimize storage of nodes
    – Makes it easier to keep track of cycles

  19. I submitted a solution, but it’s not very satisfactory. I took the existence of the ShrinkToFit method to be a special garbage collection routine which the user is expected to trigger, and which will clean up all nodes which are not part of the final graph. Given the API, I think that makes sense; the user can make and connect nodes in an arbitrary order, and maybe only assign a root node at the end. Until the root node is assigned, we don’t know what’s garbage and what’s not.

    However, lots of aspects seem clunky to me. Firstly, the user can still generate memory leaks: they can wire up a cycle of nodes, and then not make any of them part of a MyGraph: ShrinkToFit has no knowledge of these nodes and can do nothing about them. The user can also break the system by assigning a node to more than instance of MyGraph. Also, the existence of a special graph class with a manual garbage collection routine seems unwieldy; it ought to be possible to do this without needing the user to trigger a special clean-up step.

  20. Why is the MakeNode function static? That seems a bit weird. Also, I guess we’re supposed to do the cleanup once in the ShrinkToFit function, and not after each RemoveChild call?

  21. The solutions here (so far) seem to all have in common that they delete too much. All that really shows is that test driven development is very difficult and the tests are not a proxy for knowing that a bit of code is working correctly. Integration testing often pick up additional issues.

    Motti Lanzkron’s test should be added. That is clearly the intention of the problem. Not giving too big a hint – fortunately C++ has exactly the feature that is needed to accomplish this task.

  22. I have the same solution that Cooper posted, with the simplification that Motti Lanzkron mentioned. And my ShrinkToFit is also empty. It seems a bit strange that I did not need that. Perhaps I delete to much or so?

  23. Here’s a test case that I think should be added:

    bool TestCase4() {
    	MyGraph g;
    	{
    		auto a = MyGraph::MakeNode();
    		g.SetRoot(a);
    		auto b = MyGraph::MakeNode();
    		a->AddChild(b);
    		auto c = MyGraph::MakeNode();
    		b->AddChild(c);
    		auto d = MyGraph::MakeNode();
    		b->AddChild(d);
    		d->AddChild(b);
    		d->RemoveChild(b);
    	}
    	g.ShrinkToFit();
    	return Counter::count() == 4;
    }
    
  24. @Cooper your solution can be simplified to calling

    node->children.clear()

    in

    RemoveChild

    , this is what I initially wrote and it passes all the test cases but it’s too eager since it will clear a nodes children even if it has remaining parents.

  25. @Jason Hise: I figured that’s what Herb was getting at with his simple example, but I wanted to avoid spoiling the fun for folks who might be encountering it the first time. I was immediately reminded of one of the MIT AI Hacker Koans when I saw it:

    —-

    One day a student came to Moon and said: “I understand how to make a better garbage collector. We must keep a reference count of the pointers to each cons.”

    Moon patiently told the student the following story:

    “One day a student came to Moon and said: ‘I understand how to make a better garbage collector…

  26. Yeah, it was either smaller stack footprint, or tail recursion elimination if I was [i]really[/i] lucky.

    It was an inverted tree structure (children point to parents) in a solver. Whenever I killed a potential solution, it would automatically unwind to the branch-point in the tree. IIRC, the parent link was the first member of the class, so it would also be the last thing destroyed. I didn’t have a dtor of my own in that class either; just the compiler-generated dtor.

    I have no way to measure the code now. It remains at my previous employer.

    And, since we were always chronically deprived for time and developer resource (part of the reason that’s my [i]previous[/i] employer), I didn’t work on solving the recursion problem. I just raised the ulimit instead and moved on to solving some rather larger problems that remained. :-P

  27. That sort of test driven development brought out the evil minimalist coder in me. The result is horrible and I know it, but it’s fun to apply the “Beyonce Rule” I just learned from Titus Winters in the last cppcast episode (“If you want to file an issue there should’ve been a test for it.”).

    Thanks for some fun on a bad weather Sunday.

  28. What about this? The recursion in not what we want but it works. Still I think it’s not the solution we’re looking for because ShrinkToFit() is empty. I guess Herb wanted to help us with this call:)

    class MyGraph {
    public:
        class Node : public Counter {
            vector<shared_ptr<Node>> children;
     
        public:
            void AddChild(const shared_ptr<Node>& node) {
                children.push_back(node);
            }
            void RemoveChild(const shared_ptr<Node>& node) {
                auto it = std::find(cbegin(children), cend(children), node);
                if(*it)
                {
                    (*it)->RemoveAllChildren();
                }
                children.erase(it);
            }
        private:
            void RemoveAllChildren()
            {
                for(auto& child : children)
                    RemoveChild(child);
            }
        };
     
        void SetRoot(const shared_ptr<Node>& node) {
            root = node;
        }
     
        void ShrinkToFit() {
        }
     
        static auto MakeNode() { return make_shared<MyGraph::Node>(); }
     
    private:
        shared_ptr<Node> root;
    };
  29. You should add a test case that detects overly destructive solutions. I cleared the children member of the removed node prior to removing it from its parent and this passed all the tests :(

  30. The problem posed here is equivalent to the problem of writing a general purpose garbage collector, capable of cleaning up islands of references as soon as they are no longer reachable from any variable on the stack. Reference counting as a solution breaks down when ownership relationships can be circular; any algorithm that determines which nodes can be freed will require traversing the graph to identify the reachable objects. If this operation has to happen immediately when the link is broken, the most efficient algorithm I can think of would have to traverse and tag everything downstream of the broken link.

  31. PSA: webpage is buggy(It caches the result of the run and you can not force it to rerun stuff even if you delete almost all the code) and it will not let you c/p the code so have fun debugging anything with this.

  32. @ intvnut probably opt code had smaller stack usage. If you still have this code you can verify this by looking at the difference of the addresses between local variables at depth n and n+1 of recursion.

  33. My implementation is definitely a reflection of Greenspuns Tenth Rule Of Programming:
    Any sufficiently complicated C or Fortran program contains an ad-hoc, informally-specified, bug-ridden, slow implementation of half of CommonLisp.

    I would have liked my solution better had the Node factory function not been static.

  34. I actually hit the case Bender describes in on program I wrote. I hadn’t internalized the recursive nature of the destructors, but it was obvious in retrospect. Interestingly, the program was fine with optimization enabled, and broke when compiling for debug.

    In any case, for this brain-teaser: Can SetRoot be called at any time? That can alter node reachability rather differently than just deleting an edge in a directed graph. Not that any of the test cases do that…

  35. Herb I hate to be the one to tell you this, but your code is bad.
    Any deep tree will stack overflow on destruction of root or any other node that has enough children for recursive dtor calls to cause stack overflow.
    I know it is a borderline case to have tree this deep, but still I would not ever use library like this.
    That aside off I go to try to solve the problem with shared ptr.

Comments are closed.