LogicBlox User Days 2013 Sessions

Keynote

Welcome and Introduction

Speaker: Molham Aref, CEO, LogicBlox

Client Experiences

Planixs - The World is Not Enough

Speaker: Stuart Houghton, COO, Planixs

Vitamin vs. Pain Killer: Why user adoption is so important

Speaker: Iain Watson, EVP Product Management, Predictix

It’s All About Performance

TPC-H: A Deep Dive into the Standard Benchmark for Decision Support

Speaker: Wael Sinno
Affiliation: LogicBlox

Abstract:
TPC-H is a standard benchmark for OLAP and decision support systems that has enjoyed broad acceptance by database vendors. This talk will discuss the implementation of the TPC-H specification on LogicBlox and on several other systems, including leading in-memory, column-oriented, and OldSQL databases. It will also include a discussion of the results, highlighting the strengths and weaknesses of LogicBlox. Areas of the benchmark where the relative performance of LogicBlox differed significantly from the average were examined in detail, and some of that analysis will be presented.

This talk will cover the following areas:

Explanation of the core components of the benchmark specification.
Brief overview of the implementation.
Comparison of some selected queries in SQL vs LogiQL.
Performance results compared to competitor systems.
Analysis of significant queries.
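
To give a flavor of the kind of query the benchmark contains, the following is a minimal Python sketch of TPC-H Query 6 (the forecasting revenue change query), evaluated over in-memory rows. The column names follow the TPC-H lineitem table; the function name `q6_revenue`, the dictionary row layout, and the default parameter values are assumptions made for illustration and are not taken from the talk or from any LogicBlox implementation.

```python
from datetime import date
from decimal import Decimal


def q6_revenue(lineitems, start=date(1994, 1, 1),
               discount=Decimal("0.06"), max_quantity=24):
    """Sum l_extendedprice * l_discount over the qualifying line items.

    A row qualifies if it shipped within one year of `start`, its discount
    is within 0.01 of `discount`, and its quantity is below `max_quantity`.
    """
    end = date(start.year + 1, start.month, start.day)
    low = discount - Decimal("0.01")
    high = discount + Decimal("0.01")
    return sum(
        (row["l_extendedprice"] * row["l_discount"]
         for row in lineitems
         if start <= row["l_shipdate"] < end
         and low <= row["l_discount"] <= high
         and row["l_quantity"] < max_quantity),
        Decimal("0"),
    )


# Hypothetical sample rows, invented for this sketch.
sample_rows = [
    {"l_extendedprice": Decimal("1000.00"), "l_discount": Decimal("0.06"),
     "l_quantity": 10, "l_shipdate": date(1994, 3, 1)},
    {"l_extendedprice": Decimal("500.00"), "l_discount": Decimal("0.10"),
     "l_quantity": 10, "l_shipdate": date(1994, 3, 1)},   # discount too high
    {"l_extendedprice": Decimal("200.00"), "l_discount": Decimal("0.05"),
     "l_quantity": 23, "l_shipdate": date(1994, 12, 31)},
]
```

Decimal arithmetic is used here so that the discount bounds compare exactly; a real benchmark run would of course execute the query inside the database rather than in application code.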

Understanding Graph Databases and the Strengths of LogicBlox

Speakers: George Kollias, Spyros Hadjichristodoulou, Martin Bravenboer
Affiliation: LogicBlox

Abstract:
Graph databases (e.g. Virtuoso, Neo4j), graph engines (e.g. GraphLab, GraphChi), and semantic web (RDF) query languages (SPARQL) have become increasingly popular in recent years as part of the NoSQL movement. The main claim of graph databases is that general-purpose databases cannot store graphs in a way that allows these graphs to be efficiently queried.

The LogicBlox team suspected that LogicBlox 4 is actually very suitable for graph problems, due to the foundation of Datalog as a recursive query language, the highly normalized schemas we encourage, and the novel join algorithm of LogicBlox 4. This summer, the benchmark team at LogicBlox conducted a detailed study of the strengths and weaknesses of LogicBlox 4 when applied to graph problems, and compared the performance and user experience to many graph solutions that are currently being advertised.

In this talk, we report on the results of this assessment. We discuss the arguments used to develop specialized databases for graphs and explain how these relate to features of LogicBlox. We present the results of a wide range of benchmarks, covering semantic web applications (LUBM), fundamental graph problems (graph traversal, N-clique and triangle counting), and recursive graph analyses, such as PageRank and shortest-path. These recursive algorithms can be expressed in LogicBlox 4 due to the introduction of support for aggregation in recursion. We discuss how systems compare when dealing with graphs with drastically different statistical properties (e.g. Twitter follower graph vs road networks). We will cover the state-of-the-art in handling graphs that do not fit in memory, and how LogicBlox could handle such problems in the future. Finally, we will analyze the trade-offs between using bottom-up and top-down evaluation by a comparison to Prolog implementations. A theme throughout our talk is how conveniently a problem can be expressed in different systems, and what kind of expertise is necessary to tune the implementation for performance.
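
As a rough illustration of two of the problem classes mentioned above, the following Python sketch counts triangles in an undirected graph and computes single-source shortest paths by bottom-up fixpoint iteration with a min aggregation, which mirrors the idea of aggregation in recursion. This is a plain-Python sketch only; the function names and data layout are assumptions made for illustration, it does not show LogiQL or the LogicBlox join algorithm, and the shortest-path loop assumes non-negative edge weights.

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    count = 0
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Each triangle u < v < w is counted exactly once.
                count += sum(1 for w in adj[u] & adj[v] if w > v)
    return count


def shortest_paths(weighted_edges, source):
    """Bottom-up fixpoint: dist[v] = min over edges (u, v, w) of dist[u] + w."""
    dist = {source: 0}
    changed = True
    while changed:
        changed = False
        for u, v, w in weighted_edges:
            if u in dist and dist[u] + w < dist.get(v, float("inf")):
                dist[v] = dist[u] + w  # a shorter path to v was derived
                changed = True
    return dist
```

The shortest-path loop repeatedly applies one rule until no distance improves, in the spirit of bottom-up evaluation; a top-down evaluator would instead start from a query for a specific target node.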

LogicBlox Performance Puzzlers

Speaker: Soeren Oleson
Affiliation: LogicBlox

Abstract:
As with most mainstream database systems, working with non-trivial data sets in LogicBlox 3.x required some basic knowledge of query evaluation, indexing, concurrency, and cost estimation. Over time, the LogicBlox team has assisted many of our customers with such performance challenges. In this talk, we present a series of such puzzles from recent applications. We discuss why each example did not perform as expected on LogicBlox 3.x and how we analyzed the issue to come up with a recommendation.

With LogicBlox 4.x, our database system has been drastically revised with novel methods for auto-tuning, based on very precise statistical information, and strong theoretical guarantees that queries do not derail. To give a perspective on the future of LogicBlox application development, we discuss how the puzzles we have found perform in LogicBlox 4.x, highlighting how the auto-tuning features have removed the need for a detailed understanding of the database architecture. We also highlight the challenges that remain to be addressed, based on the puzzles from applications as well as our own benchmarks.

Services & Integration

Multi-dimensional Queries and Updates on LogicBlox

Speakers: Ruy Ley-Wild, Jeff Vaughan, Geoff Washburn
Affiliation: LogicBlox

Abstract:
The Measure Service allows developers to define a multi-dimensional view of their data, and supports queries and updates on that view through a web service interface. While the query capabilities of the Measure Service are better known, the update capabilities were added only in the second half of 2013. In this talk, the Compiler Team will introduce the Measure Service approach to representing analytical data and its multi-dimensional expressions for queries and, in particular, updates. We will also give the audience a taste of how these are realized in LogiQL.

Dynamic, Accurate, and Multi-dimensional: What’s New in LogicBlox Data Exchange Services

Speaker: Rafael Lotufo
Affiliation: LogicBlox

Abstract:
Since their first release in LogicBlox 3.9.6, the LogicBlox Data Exchange Services have gained many new features: they have gone dynamic, supporting the exchange of delimited files without pre-configuration using active logic in the workspace; they support error reporting, helping users catch incorrect data early and precisely; and they support multi-dimensional data export, allowing users to export multi-dimensional query results through the Measure Service. In this talk, we provide an overview of these new features and their uses. Users who are interested in getting their hands dirty are encouraged to attend the tutorial!

Integrating LogicBlox with 3rd Party Tools

Speaker: Andy Dean
Affiliation: LogicBlox

Abstract:
Modern enterprises often have a heterogeneous mixture of different systems. Historically, the integration of such systems has been very challenging, requiring custom code to be written for a variety of proprietary protocols. In recent years the trend has been toward integrating through web services. The LogicBlox Data Exchange Services make integration with other systems easy. In this session we look at how popular third-party systems for data integration (Kettle), reporting (BIRT), and data analysis (Excel) can be easily integrated with solutions built on the LogicBlox platform. It is important to note that the product names are simply examples, and that similar products can generally be integrated in a similar fashion.

Building & Deploying LogiQL Applications

Speakers: Eelco Dolstra, Shea Levy, Rob Vermaas
Affiliation: LogicBlox

Abstract:
We present the LogicBlox application build and deployment service, alpha version. The build and deployment service allows a developer to quickly take an application from source code to a deployed, publicly accessible instance hosted on AWS EC2. We will demonstrate how a developer can configure a code repository to be built using the build service, which produces a build artifact. We will then show how to invoke the deployment service using the build artifact as an input, along with other configuration parameters.

Salty Crackers

Knowledge-base Systems and Their Relation to LogicBlox

Speaker: Joachim Jansen
Affiliation: KU Leuven

Abstract:
IDP3 is a knowledge-base system, offering a rich, declarative knowledge representation language, a range of inferences and a built-in interaction with a procedural language. In this presentation I will give an overview of the knowledge-base system paradigm, and try to sketch its relation to LogicBlox. In this comparison I will show an instance of IDP that simulates LogicBlox execution (albeit inefficiently) and I will identify aspects offered by the KBS approach that could be of interest for LogicBlox.

An Overview of the Ciao Language

Speaker: Jose Morales
Affiliation: LogicBlox and IMDEA Software Institute

Abstract:
Ciao is a general purpose logic programming language that has been designed from the ground up to be small and extensible in a modular way. An important aspect of Ciao is that it provides the programmer with a large number of useful features from different programming paradigms and styles, and that the use of each of these features can be turned on and off at will for each program module. Thus, a given module may be using e.g. higher order functions and constraints, while another module may be using objects, predicates, and concurrency. Like other Prolog-based systems, the Ciao kernel is built on an efficient and well-tested WAM-based bytecode emulator, which enables fast compilation of portable and small executables. But contrary to other systems, Ciao includes a robust module system, which allows module-based automatic incremental compilation, and modular global program analysis, debugging and optimization. In this talk I will give an informal overview of the language and illustrate its design philosophy.

A Brief Introduction to Mercury

Speaker: Zoltan Somogyi
Affiliation: LogicBlox

Abstract:
Mercury is a general-purpose logic programming language intended to support the creation of large, reliable and efficient applications. To achieve this objective, Mercury asks programmers to provide information about several aspects of their programs, in the form of, e.g., type and mode declarations. The compiler then uses this information both to check the consistency of the code with the declarations, and to improve the speed of the generated code.

With a few carefully controlled exceptions, Mercury is purely declarative; even input/output is done without side effects. This also makes programs easier to read, to understand and to maintain, and gives a lot of freedom to the compiler to perform aggressive optimizations.

Mercury has a suite of associated development tools. These include an advanced profiler that can help the compiler automatically parallelize Mercury code, and a debugger that can itself direct the search for a bug, with programmers having to do nothing more than answer questions.

Tools & Reusable Components

LayerBlox: Managing Function Variability Using Modular, Reusable LogiQL Components

Speaker: Kurt Stirewalt
Affiliation: LogicBlox

Abstract:
A common problem in application development concerns the need to implement and maintain multiple variants of core computations. Variability arises to support different deployment contexts---e.g., weekly batch vs. online, incrementally maintained vs. on-demand service---as well as different feature sets. We developed LayerBlox to address variability management in such large development projects. Our work differs from prior attempts to develop reusable assets in the nature and structure of the components in our library and the manner and rigor with which they compose. Specifically, components in our library are hierarchical and compose by layered assembly in a manner analogous to stacking together LEGO® bricks.

In this talk, we will introduce the key ideas behind layered design, briefly overview the layered-assembly architecture of forecast generation, and illustrate the ease with which we can specify different variants. Near the end of the talk, we will explain the implementation of LayerBlox itself. LayerBlox implements component assembly in a novel way, via schema decoration, which is a notational convention that is used to modularize logical specifications in languages such as Z. Schema decoration turns out to be a useful user-visible concept in the specification of layered assemblies. One interesting open question concerns how we might support it more directly in LogiQL.

Components for Creating a Rich Web User Experience over Multi-dimensional Data

Speaker: Ivar Pruijn
Affiliation: Cloud9

Abstract:
Pivot tables, filters, and charts are common data visualization components for analytical applications. Programming them so that they are reactively connected to one another, and performant enough to handle the large amounts of data typical of LogicBlox applications, is a time-consuming task. Cloud9 partnered with LogicBlox in 2013 to produce these components in a reusable way. These components can be easily configured through JSON data structures, and they work natively with the LogicBlox Measure Service. In this talk, we will show the capabilities of these components. We will also sketch out a vision, and show the initial steps, toward incorporating these components into the future modeling environment for LogicBlox.

LogicBlox: Looking Forward

Evaluating LogiQL Queries on GPU

Speaker: Haicheng Wu
Affiliation: Georgia Institute of Technology

Abstract:
Large-scale, multi-GPU cluster systems potentially present a vehicle for major improvements in throughput and consequently overall performance of query evaluation. This talk introduces the design, implementation, and evaluation of an alternative evaluation runtime for LogiQL queries. We report the performance on the full set of industry-standard TPC-H queries on a single-node GPU. We will compare the performance of these queries on GPU with their performance on the CPU-based LogicBlox database. We point out key bottlenecks, propose potential solutions, and analyze the GPU implementation of these queries. To the best of our knowledge, this is the first reported end-to-end compilation and execution infrastructure that supports the full set of TPC-H queries on commodity GPUs.

The Future of LogicBlox Forecasting

Speaker: Nik Vasiloglou
Affiliation: LogicBlox

Abstract:
Current forecasting systems in retail utilize local regression models to produce forecasts, using a limited number of factors. In an effort to utilize more global information, they resort to aggregations on different scales and ad hoc combinations of forecasting models that improve performance. The aggregations and the combinations are customized by domain experts and significantly increase the overall setup time and cost of a forecasting application. Modern data science has shown that highly segmented and customized models tend to be less accurate than big data models that rely on more sophisticated, domain-agnostic algorithms.

The current LogicBlox forecasting system, which is under development, utilizes nonlinear regression models that achieve better performance than traditional regression. We also use unsupervised methods to ingest different sources of information. We believe that domain experts should be able to inject knowledge into the general model and validate its effect on accuracy. We will show how our design allows users to express expertise with logical rules and evaluate their significance.

LogicBlox 2014: Roadmap & Value Proposition

Speaker: Shan Shan Huang
Affiliation: LogicBlox

Abstract:
We discuss the overall LogicBlox value proposition by comparing and contrasting building and operating enterprise hybrid applications on three different technology stacks, of which LogicBlox is one. We discuss the value proposition in terms of cost of infrastructure, software, and the staffing cost of software construction and maintenance. We lay out the LogicBlox 2014 roadmap in this context, and discuss how the roadmap will enhance LogicBlox’s value proposition in 2014.

Tutorial

Getting Dirty with LogicBlox Data Exchange

Speakers: Thiago Bartolomei, Rafael Lotufo, Laurent Oget, Trevor Paddock
Affiliation: LogicBlox

Abstract:
LogicBlox Data Exchange Services provide high-level ways of getting your data in and out of the database, in either tabular (e.g. delimited files) or hierarchical (e.g. JSON) format. This tutorial will give users a hands-on opportunity to define static Tabular Data Exchange (TDX) as well as dynamic TDX.

The tutorial will start by teaching users the basics of static Tabular Data Exchange (TDX) services, including configuration using joins, entity accumulation policies, column transform functions, filters, and error handling. We will exercise the GET, PUT, and POST methods for import as well as export. After working with static TDX services, we will move on to dynamic TDX services, where users learn that they can accomplish the same data exchange activities by encapsulating the exchange configuration as parameters in a service request.

For information on LogicBlox User Days, please email userdays@logicblox.com