IT/Software career thread: Invert binary trees for dollars.

Vinen

God is dead
2,783
490
Pretty much, relational databases are great at a certain set of problems, but for the most part they are shoehorned into EVERY problem. I'd rather use a much, much, much lower-level abstraction and only build the functionality I actually need on top of it.
Because NoSQL database is useful for everything.

I find it hilarious that every week I read about a "Hip Startup" switching to Postgres.

BTW, the DBA problem is a company problem, not a technology problem.
 

Tenks

Bronze Knight of the Realm
14,163
606
BTW, the DBA problem is a company problem, not a technology problem.
It is a problem with SQL and an international company. There is so much tweaking and tuning required to make SQL perform at scale that it pretty much requires a council of DBAs to bless your schema.
 

Khane

Got something right about marriage
19,847
13,358
It is a problem with SQL and an international company. There is so much tweaking and tuning required to make SQL perform at scale that it pretty much requires a council of DBAs to bless your schema.
Damn dude... how big is the data you work with?

Because it sounds like you're proving Vinen's point.
 

Tenks

Bronze Knight of the Realm
14,163
606
Depends on the table, obviously. I think the one that broke Oracle is now causing problems (on the app side) because its IDs are approaching the signed int limit. I haven't run a count, but I'd say around a billion, maybe? Then each row has anywhere between 10 and 100 columns. But that isn't even the biggest table I operate on. The table in charge of keeping a transaction history against that table is far, far larger.
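
For anyone wondering how much headroom that leaves, here is a rough sketch. It assumes the app-side ID type is a signed 32-bit integer (the post does not say which type is actually used), and `remaining_ids` and `fits_in_int32` are made-up helper names for illustration.

```python
# Assumption: IDs are stored app-side in a signed 32-bit integer.
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1  # 2,147,483,647


def remaining_ids(current_max_id: int, limit: int = INT32_MAX) -> int:
    """Return how many more IDs can be handed out before hitting the limit."""
    return max(limit - current_max_id, 0)


def fits_in_int32(value: int) -> bool:
    """True if the value is representable as a signed 32-bit integer."""
    return INT32_MIN <= value <= INT32_MAX
```

If the current maximum ID is already around 2 billion, `remaining_ids` shows there are fewer than 150 million values left before the app-side type overflows, regardless of how many rows the table actually holds.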
 

Tenks

Bronze Knight of the Realm
14,163
606
I usually read this as "We were unable to attract and retain the talent to make [Cassandra, Riak, etc.] work, so switched to what we could pay the least for." Or they started with MongoDB and realized what a clusterfuck that is (I seriously do not understand Mongo; as far as I can tell, there's no use case for Mongo where there isn't a clearly better alternative).

Though as far as SQL goes, I do love Postgres.

All in all I'm pretty happy that the "NoSQL" revolution finally happened. I still end up using SQL in a good number of places where it makes sense, but (pulling out of ass) I feel like roughly 7/10 times all a specific program really needs is a nested hash-map with a handful of very specific (very performant) queries.
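
To make the "nested hash-map with a handful of very specific queries" idea concrete, here is a minimal in-memory sketch. The user/order data model and the function names are hypothetical, chosen only for illustration; a real store like Cassandra or Riak adds distribution and persistence on top of this kind of access pattern.

```python
from collections import defaultdict

# Hypothetical data model: user_id -> order_id -> order record.
store = defaultdict(dict)


def put_order(user_id, order_id, record):
    """Write one record under its two-level key."""
    store[user_id][order_id] = record


def get_order(user_id, order_id):
    """Point lookup by full key; returns None if either key is missing."""
    return store.get(user_id, {}).get(order_id)


def orders_for_user(user_id):
    """The one 'range' query this model supports: everything under a user."""
    return list(store.get(user_id, {}).values())
```

Every query here is a direct key lookup, which is why it stays fast and is easy to shard by the outer key. Anything outside those two access paths (say, "all orders over $100") would need a full scan or a second structure.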

After having worked with it for so long I really believe it's best to build with scalability and simplicity in mind first, and then pull back towards relational only if you're running into data barriers that your NoSQL choice can't solve efficiently, and even then... to me it makes more sense to take your NoSQL data and put it into SQL rather than switch to SQL as your prime data store. I usually have a Cassandra or Riak cluster as prime and then just mirror the data to Postgres for relational queries if needed.
As long as you have the hardware you can make NoSQL queries performant with solutions like Impala and Presto. If you don't need instant performance you can do it pretty easily with Hive. I don't think (doubt) Impala or Hive works with Cassandra since they're Apache projects. I'd assume Presto would since that came out of Facebook (even though they've since switched to HBase as well). I know CnCGod on this forum deals with Cassandra so he may know.

SQL is still good for some stuff, like heavily related data or if you know for a fact your solution won't go to massive scale. Take the old project I was on: it was basically building a whole bunch of instructions, which were just Groovy code to be run against a well-known XML schema. I pretty much required a SQL backbone for this problem because, although for the most part the rows were just ID, code, and metadata, one instruction could call another instruction. So while I could have used a simple K/V pair and baked the nested recursion into the business logic, it was easier to just write a SQL statement that did it for me. That way, when you'd get a List (a collection of instructions) that had instructions 1, 2, 3 in it, but 3 called 4 and 4 called 5, I didn't have to perform a bunch of random reads against an HBase table, which seem to take ~80ms a read. That could add up, where the single SQL statement seemed to take only about a second. But I also knew I wouldn't scale up to millions of instructions and billions of lists made out of the instructions. And I solved the problem by making it run on Hadoop anyway, without having to convert it to an HBase table.
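
For illustration, the "let one SQL statement resolve the nested calls" approach might look something like the sketch below. This is not the original project's query: it uses SQLite's recursive CTE as a stand-in, and the `instruction` and `instruction_call` tables and their columns are hypothetical.

```python
import sqlite3


def reachable_instructions(conn, root_ids):
    """Return every instruction ID reachable from root_ids via calls."""
    placeholders = ",".join("?" for _ in root_ids)
    sql = f"""
        WITH RECURSIVE reach(id) AS (
            SELECT id FROM instruction WHERE id IN ({placeholders})
            UNION  -- UNION (not UNION ALL) drops duplicates, so cycles terminate
            SELECT c.callee_id
            FROM instruction_call c
            JOIN reach r ON c.caller_id = r.id
        )
        SELECT id FROM reach ORDER BY id
    """
    return [row[0] for row in conn.execute(sql, list(root_ids))]


def demo():
    """Build the 1,2,3 list from the post, where 3 calls 4 and 4 calls 5."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE instruction (id INTEGER PRIMARY KEY, code TEXT, metadata TEXT);
        CREATE TABLE instruction_call (caller_id INTEGER, callee_id INTEGER);
        INSERT INTO instruction (id, code)
            VALUES (1, 'a'), (2, 'b'), (3, 'c'), (4, 'd'), (5, 'e'), (6, 'f');
        INSERT INTO instruction_call VALUES (3, 4), (4, 5);
    """)
    return reachable_instructions(conn, [1, 2, 3])
```

Calling `demo()` resolves the list 1, 2, 3 to instructions 1 through 5 in a single round trip, leaving out the unrelated instruction 6. With a plain K/V store the same traversal would be one read per hop, done in application code.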
 

Cad

<Bronze Donator>
24,492
45,417
AT&T has an oracle database with 1.9 trillion rows, your paltry billion is unimpressive.
 

Cad

<Bronze Donator>
24,492
45,417
One upping? In a programming thread? Shocking.
Didn't really mean it like that, just meaning whatever you did to break oracle isn't an oracle limitation, as there are oracle tables much bigger than that. What was the issue with a billion rows that broke oracle?
 

Vinen

God is dead
2,783
490
Didn't really mean it like that, just meaning whatever you did to break oracle isn't an oracle limitation, as there are oracle tables much bigger than that. What was the issue with a billion rows that broke oracle?
Something that is obviously solved with a NoSQL database.

LOLZ

I bet there needs to be some BIG DATA ANALYSIS GOIN ON THERE! Gotta mapreduce it and shit hardcore
GOTS ME SOME BIG DATA PROBLEMS.
 

Tenks

Bronze Knight of the Realm
14,163
606
Didn't really mean it like that, just meaning whatever you did to break oracle isn't an oracle limitation, as there are oracle tables much bigger than that. What was the issue with a billion rows that broke oracle?
When I said "break" I meant break for our current hardware platform. At a certain point RDBMS becomes so incredibly costly to scale that it makes dollars and cents for your company to jump to a NoSQL approach, where you lose all the rich features RDBMS provides but you can warehouse a near limitless amount of information on relatively cheap server infrastructure, which is easier to add to in the future as well.

Obviously Oracle works and it works well. But it is expensive. And obviously NoSQL as your data vault is making more sense to many companies or else so many wouldn't be trying to hire for the positions and switching away from RDBMS.
 

Lendarios

Trump's Staff
<Gold Donor>
19,360
-17,424
What I learned from reading about big data problems, and about sites that handle massive numbers of logins/concurrent users, is the following.
I compare it with physics and approaching the speed of light. In our everyday context, computing gravity is a simple F = (M1 * M2 * K) / R^2. Now try computing that near the speed of light: it becomes a clusterfuck and quite complex.
You will not use special relativity to compute a human-scale gravity problem.

The same thing happens with big data and really heavy-usage sites. Normal rules get thrown out the window. If you take a solution that works with < 1 mill rows and try to apply it to 100 mill rows, well, it will simply not work, to put it mildly.
One concrete example: normalization is awesome, except when your tables are so big, and your access is so frequent, that the cost of the extra join really fucks you up. It does not mean normalization is bad, simply that in that context it creates other problems that you don't see in most common apps.
Now, regarding NoSQL: maybe it works, maybe it doesn't, but the companies that are experimenting with it really have the money to test, so we'll find out down the line.
In the same way, applying NoSQL at a level where a simple SQL Express DB will work is also madness.
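
A toy sketch of the normalization trade-off mentioned above, using SQLite. The `users`, `logins`, and `logins_flat` tables are hypothetical, and at this size both queries are instant; the point is only to show the shape of the two designs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized: the user name lives in exactly one place.
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE logins (user_id INTEGER, ts TEXT);
    -- Denormalized: the name is copied onto every login row.
    CREATE TABLE logins_flat (user_id INTEGER, user_name TEXT, ts TEXT);

    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO logins VALUES (1, '2015-01-01'), (2, '2015-01-02');
    INSERT INTO logins_flat VALUES
        (1, 'alice', '2015-01-01'), (2, 'bob', '2015-01-02');
""")

# Normalized read: needs a join on every request.
normalized = conn.execute(
    "SELECT u.name, l.ts FROM logins l "
    "JOIN users u ON u.id = l.user_id ORDER BY l.ts"
).fetchall()

# Denormalized read: single-table scan, no join.
denormalized = conn.execute(
    "SELECT user_name, ts FROM logins_flat ORDER BY ts"
).fetchall()
```

Both reads return the same rows. The flat table avoids the join at the cost of duplicated names, which then have to be kept in sync on every rename. That cost only starts to look worth paying once the join itself becomes the bottleneck.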
 

Voyce

Shit Lord Supreme
<Donor>
7,165
23,432
I know little to nothing about NoSQL, so my question is this:

What's so different about using NoSQL as a solution compared to a plain old keyed flat file or VSAM, like I deal with on the Mainframe? Is NoSQL employing some type of superior relational notation?
 

Lendarios

Trump's Staff
<Gold Donor>
19,360
-17,424
Caveat: I know nothing of either one.

Is VSAM highly coupled with IBM, meaning you can't use it with other applications/OSes/environments? Is VSAM hardware dependent? Does it use a unique query language?
 

Tuco

I got Tuco'd!
<Gold Donor>
45,454
73,543
I don't really know if excel is necessary for my purposes. I just put the tables I need into powerpoint spreadsheets.

 

Voyce

Shit Lord Supreme
<Donor>
7,165
23,432
Caveat: I know nothing of either one.

Is VSAM highly coupled with IBM, meaning you can't use it with other applications/OSes/environments? Is VSAM hardware dependent? Does it use a unique query language?
-VSAM is proprietary.
-Superficially; I don't think you'll find VSAM on non-Mainframe architecture, though not because of a physical limitation.
-There is some type of structuring logic built into VSAM. It can be set up in a lot of different ways, but generally you're wrapping a high-level language around it as an end user. If you're in CICS you might use something like the god module setup above to access the records. As the end user of VSAM (Programmer), I don't bother setting up the VSAM structure (it's left to the System Programmers, similar to DBAs).

From what I know of the history of technology (given I'm 27), keyed flat files and less proprietary setups similar to VSAM have been circulating since before the 70's.

To me it's just (and I could be completely wrong, which is why I asked) that I get the impression that "NoSQL" is kind of a buzzword for otherwise legitimate alternatives for data storage that have been used for longer than SQL, and it's less of an evolutionary new thing and more of a recycling of an old idea to meet the needs of a new problem, or an old problem that's resurfaced. Which is fine; I'm just trying to understand the term for what it is.
 

Voyce

Shit Lord Supreme
<Donor>
7,165
23,432
This is more or less CS as a field, period. CS as a field is actually pretty well traversed despite being new. Most of the low-level concepts were discovered back in the 60s and 70s and we've just built layers of abstraction over them; there are almost no "new" breakthroughs in CS, just new combinations of old ones (at least until we get new architectures, and then it's a new wild west!). So you're pretty much spot on. NoSQL itself doesn't mean anything, since it covers about 20 different categories of databases, but the thing they have in common is that they tend to deal with flatter structures and more immutable data by default (because these two things are the cornerstone of being able to easily distribute the data and load).

Kinda like Functional Programming. FP has been very popular the last few years despite it being basically one of the first types of programming invented. Many researchers back in the 60s/70s considered most of the FP concepts to be "superior" but in reality too expensive to use. So all that's changed is that computers have become so fast compared to then that the inherent penalty for that style was entirely eaten up by the increased performance of the hardware, so now we're seeing a cultural revival even though it's 50 years old.
Yeah ISWIM was thought of in 1966, and of course Lambda Calculus was conceptualized in the 30's.

You're a bastion of valuable information, have I netted you yet? I should net you if I haven't.

I just wanted to conceptualize what I was looking at with this MEAN developer stack stuff.
 

Palum

what Suineg set it to
23,551
33,979
NoSQL is worthless in at least 90%+ of implementations I've seen thus far since there's no reason for it except A) because it sounds sexy B) apparently you might as well add a random middleware file system layer instead of just defining data structures in your application like normal or C) because someone thinks that dumping a clean and efficient relational database is a good idea when they are dead wrong.

There are a lot of good use cases for a legitimate NoSQL 'database', but very few companies seem to have them. Once you get out of the strict data science realm into the application arena, it starts to become a bit fuzzier as to the practical use. What's the difference between, as you say, storing shit in flat files and coding your application to use them as you wish? I suppose some measure of consistency, but it's really just a whole bunch of resource libraries at that point. About the only time I've considered it for projects are ones where we have vast amounts of data which needs to be reorganized 'quickly' by analysts instead of DBAs. Ultimately, though, user/stakeholder (not to mention developer) knowledge of SQL and derivatives is so ubiquitous it's hard to sometimes figure out times where throwing more hardware at building expensive views on a relational database isn't more cost effective than building custom NoSQL solutions to lead to the same end result. Sometimes an extra DBA is cheaper than all the headaches and hassle of starting fresh.

I haven't really played with some of the more recent SQL Server 'integrations' with Hadoop file system, do they cooperate fairly well now?
 

Tenks

Bronze Knight of the Realm
14,163
606
I was out for my friend's bachelor party and was talking with another one of his friends who also works in the IT field. He is working for a startup now in the medical space who have 15m of VC they need to burn through this year, and he asked if I'd be interested in joining once I told him about working in the realm of Hadoop and graphical databases. I'm just not sure if I feel like leaving the cubicle and the safety of my guaranteed work hours to jump towards a "sexy" startup. He guaranteed he could beat my current salary, assuming I wasn't just making up my knowledge set, but I've always had a somewhat negative outlook on companies with on-site chefs, hacky sack circles, kegs instead of water coolers and unlimited vacation. All that sounds cool in a vacuum until you realize you're working 70 hours a week. He claimed to work 9-5 the vast majority of his weeks with just a few 9-7/9 crunch-time weeks. Has anyone worked in one of these startups, and what was your experience? Did you like it? Hate it? Learn to hate it?