Graph-based solutions


Tenks

Recently I decided I want to use Apache Giraph on some of the big data we have at my company. The problem is I don't want to reinvent the wheel and run into mistakes that I could weed out by learning about graph-based solutions at a fundamental level. For example, are there de facto best algorithms for linking data sets that are connected via an N-tier "foreign key" relationship? Are there low-memory and high-memory solutions for these problems? What are the trade-offs?
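
To make the linking question concrete, here is a minimal sketch, assuming "linked" means two records can reach each other through any chain of foreign keys (that is, they sit in the same connected component). It uses union-find, a standard textbook structure, and is not Giraph code. The function name `link_records` and the record ids are hypothetical, made up purely for illustration.

```python
from collections import defaultdict


def link_records(edges):
    """Group record ids connected through any chain of foreign-key pairs.

    `edges` is an iterable of (record_id, referenced_record_id) pairs.
    Hypothetical helper for illustration only.
    """
    parent = {}  # one entry per record id; edges themselves are not stored

    def find(x):
        # Register unseen ids as their own root.
        parent.setdefault(x, x)
        root = x
        while parent[root] != root:
            root = parent[root]
        # Path compression: point every node on the path at the root.
        while parent[x] != root:
            parent[x], x = root, parent[x]
        return root

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Edges can be streamed; each one is consumed once and discarded.
    for a, b in edges:
        union(a, b)

    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    # Made-up ids: order -> customer -> account is a 2-tier chain.
    sample = [
        ("order:1", "customer:7"),
        ("customer:7", "account:3"),
        ("order:2", "customer:9"),
    ]
    for group in link_records(sample):
        print(group)
```

The trade-off this illustrates: union-find holds one dictionary entry per record and lets the edges stream past, so memory scales with the number of records rather than the number of links, but it runs on a single machine. A vertex-centric system such as Giraph would instead pass component labels between neighbors over several supersteps, trading extra network traffic for the ability to spread the graph across a cluster.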

I'm having trouble finding information, but it could be that I don't know what I'm looking for. Is there a place to learn about theoretical graph-based solutions and algorithms? I came up with my own homegrown algorithm with a low memory footprint that I think is pretty slick, but I have no idea if it's just a commonly used pattern.

Also, if anyone knows people who work or have worked at Facebook and has documentation on using Giraph to feed into a Hive metastore, with Presto for low-latency queries and possibly Solr/Lucene as a search engine on top of the metastore, that would be cool too.