YarcData's Urika Shows Big Data is More than Hadoop and Data Warehouses (Carl Claunch, Sept 11, 2012)
- Why data discovery is hard
- Analyst validation
- Discover unknown linkages
- Purpose built
- The power of graph analytics
- What fits into 512 TB?
- Easy to deploy
Top challenges for real-time data discovery…and how we solve them

Analysts set out to reach those “Eureka!” moments through big data discovery. They endeavor to use data to iteratively search for relationships and patterns that lead to new insights, new questions, and different paths of inquiry. In short, they embark on a mission of discovery using data.
They will inevitably deal with significant challenges if they attempt big data discovery using traditional data analytics processing. You see, traditional analytics processing is all about orderly data. Know what information you’ll need in advance…and line it all up uniformly in rows and columns. Build a schema; predict the sources; plan the reports; and control the ad-hoc queries.
Discovery, on the other hand, is “messy”. Always has been. That’s why the big “Eureka!” moments from history are mostly accidents. Purposeful discovery done today with data is no different in this regard. It’s answering one question to come up with another; going down one path of inquiry to end up on a new one; postulating a new theory after validating, or not, the idea that came before.
That’s why this thing called data discovery is hard. Traditional data analytics constrain it. And that’s why we built Urika, an appliance that melds what data discovery truly is with the hardware and software combination to truly deliver it.
Here’s how Urika addresses the top three challenges of big data discovery.
- Discovery cannot know all the data relationships in advance. That’s the essence of discovery: to find them out as you go, as you add more and more diverse sources of data to your analytics engine, and as you run pattern-matching queries that surface previously unknown linkages in the data. Urika solves this with a schema-free, in-memory graph analytic database. You can add structured, semi-structured, and unstructured data as you ingest it…Urika’s powerful graph processing engine brings your growing set of data relationships to the surface in response to your queries.
- Discovery wants to ask questions followed by more questions…an iterative process that depends on real-time responses to explore data relationships and patterns. These queries are by definition ad hoc, crafted on the fly, with no regard for being “well-behaved” within a schema. Urika has a special hardware accelerator tuned to extract maximum processing power from its large shared memory and massively multithreaded architecture. Query results are returned in real time, and performance remains predictable even as the data model grows, freeing the inquisitor to follow breadcrumbs seamlessly to “Eureka!”
- Discovery doesn’t access data in a predictable way. Its access patterns can’t be anticipated, so there is nothing to pre-fetch or cache. Yet there’s a great deal of data access involved in finding all those relationships and exposing all those patterns…all within massive amounts of constantly changing data. And real-time response is still required. Urika meets this demand with a data model held completely in memory…one that can scale up to 512 TB if needed. But it doesn’t stop there. Because a lot of fetching and processing is needed to surface all that linkage between the data, Urika can scale up to 8,192 graph accelerator processors, each running 128 independent threads of work at the same time.
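The schema-free graph model behind the first two points can be illustrated with a toy in-memory triple store: facts arrive with no predeclared schema, and ad-hoc pattern queries with wildcards surface linkages nobody planned for. This is a conceptual sketch in Python with made-up names and data, not Urika’s actual interface, which does this at hardware scale:

```python
# Toy schema-free triple store -- illustrates the discovery idea only.
# All class/function names and example data here are hypothetical.

class TripleStore:
    """Holds (subject, predicate, object) facts with no predeclared schema."""
    def __init__(self):
        self.triples = set()

    def add(self, subject, predicate, obj):
        # New kinds of entities and relationships can be ingested at any time.
        self.triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        # Ad-hoc pattern query: None acts as a wildcard.
        return [t for t in self.triples
                if (subject is None or t[0] == subject)
                and (predicate is None or t[1] == predicate)
                and (obj is None or t[2] == obj)]

store = TripleStore()
# Ingest facts from different sources -- no schema was designed up front.
store.add("alice", "works_at", "acme")
store.add("bob", "works_at", "acme")
store.add("alice", "coauthored_with", "carol")
store.add("carol", "works_at", "globex")

# Iterative discovery: one answer prompts the next question.
colleagues = {s for s, _, _ in store.match(predicate="works_at", obj="acme")}
# Follow a breadcrumb: who do Acme people collaborate with outside Acme?
external = [(s, o) for s in colleagues
            for _, _, o in store.match(subject=s, predicate="coauthored_with")
            if o not in colleagues]
print(external)  # [('alice', 'carol')]
```

The second query was only thinkable after the first one answered; that follow-the-breadcrumbs loop is what the bullets above describe, and what requires real-time response at scale.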
The essence of what discovery is cannot be changed to fit an analytics engine and still be discovery. We get it. That’s why we built Urika: a purpose-built, real-time platform for big data discovery.