Blogs

2017 Summer Release

Summary: A little bit late, but DataFibers has completed its 2017 summer release at about the right time. This release applies more than 30 change requests and features a preview of the new web interface. In addition, a couple of connectors were added or updated in preparation for a later demo. Details: Below is the list of key changes in this release. Support for the Flink Table API and SQL API. Flink upgrade to v1.

Continue reading

Simplify Big Data Streaming

Here is the free training we offered during the 2017 summer meetup in Toronto, Canada.

Continue reading

Spark Word Count Tutorial

It is quite common to set up an Apache Spark development environment in an IDE. Since my Spark course does not cover IDE setup in much detail, here are detailed steps for developing the well-known Spark word count example using the Scala API in Eclipse. Environment: Apache Spark v1.6, Scala 2.10.4, Eclipse Scala IDE. Download Software Needed: Download the proper Scala version and install it. Download the Eclipse Scala IDE from the link above. Create a Scala Project: Open the Scala Eclipse IDE.
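As a rough illustration of what the word count example computes, here is a minimal sketch using plain Scala collections, with no Spark dependency. The object name `WordCountSketch` and the sample input are made up for illustration. In Spark the same chain would typically run on an RDD read with `SparkContext.textFile`, with `reduceByKey(_ + _)` taking the place of the `groupBy` and sum steps.

```scala
object WordCountSketch {
  // Split lines into words, pair each word with 1, then sum the 1s per word.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(line => line.split("\\s+"))   // one element per whitespace-separated token
      .filter(word => word.nonEmpty)          // drop the empty token produced by leading whitespace
      .map(word => (word, 1))                 // same (word, 1) pairs the Spark version builds
      .groupBy { case (word, _) => word }     // local stand-in for the shuffle behind reduceByKey
      .map { case (word, pairs) => (word, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    // Hypothetical sample input; a real job would read lines from a file.
    val sample = Seq("to be or not to be", "that is the question")
    wordCount(sample).toSeq.sortBy(_._1).foreach {
      case (word, count) => println(word + "\t" + count)
    }
  }
}
```

The sketch runs as an ordinary Scala application, which makes it a convenient way to check that the Scala project in Eclipse compiles before adding the Spark libraries.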

Continue reading

One Platform Initiative for Spark

Early this September, Cloudera's Chief Strategy Officer, Mike Olson, announced Cloudera's next major initiative, One Platform, to advance its investment in Apache Spark. Spark was originally created by a few people who later founded Databricks. It went on to attract the most attention from big data communities and companies for its high-performance in-memory computing framework, which can run on top of Hadoop YARN.

Continue reading

Constructor - Scala vs. Java

1. Constructor With Parameters

Java Code

    public class Foo {
        public Bar bar;

        public Foo(Bar bar) {
            this.bar = bar;
        }
    }

Scala Code

    class Foo(val bar: Bar)

2. Constructor With Private Attribute

Java Code

    public class Foo {
        private final Bar bar;

        public Foo(Bar bar) {
            this.bar = bar;
        }
    }

Scala Code

    class Foo(private val bar: Bar)

3. Call Super Constructor

Java Code

    public class Foo extends SuperFoo {
        public Foo(Bar bar) {
            super(bar);
        }
    }

Scala Code
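For the third case, a Scala version is usually written by passing the argument to the parent class in the `extends` clause. The sketch below is an illustration, not the post's own code: the definitions of `Bar` and `SuperFoo` are hypothetical stand-ins added so that the example compiles on its own.

```scala
// Hypothetical stand-ins so the example is self-contained.
class Bar(val name: String)
class SuperFoo(val bar: Bar)

// The constructor argument is forwarded to the superclass constructor,
// which plays the role of super(bar) in the Java version.
class Foo(bar: Bar) extends SuperFoo(bar)

object ConstructorSketch {
  def main(args: Array[String]): Unit = {
    val foo = new Foo(new Bar("example"))
    println(foo.bar.name) // reads the field defined on SuperFoo
  }
}
```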

Continue reading

2016 Winter Release

Summary: Before the new year, DataFibers completed its 2016 winter release, with more than 20 change requests applied. This release features a new API document and landing pages. In addition, a preview version of stream processing (powered by Flink) is ready. Details: Below is the list of key changes in this release. Integrated the REST API document into the DF application. Added a landing welcome page.

Continue reading

When to Disable Speculative Execution

Background: This is the link from Wikimedia that explains what speculative execution is. In Hadoop, the following parameters control this setting, and both are true by default: mapred.map.tasks.speculative.execution and mapred.reduce.tasks.speculative.execution. When to Disable: Most of the time, it helps. However, I am collecting here some scenarios in which we do not need it. Of course, whenever your cluster is really short of resources, or for the purpose of an experiment, we can disable both by setting them to false, since "SE" is a big resource consumer. It is generally advisable to turn off "SE" for MapReduce jobs that use HBase as a source.
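One common way to apply these settings to a single job is to pass them as `-D` generic options on the command line. This only takes effect when the job's driver parses generic options, for example through Hadoop's `ToolRunner`. The small helper below is a sketch that builds those flags from the two parameter names above; the object and method names are hypothetical.

```scala
object SpeculativeExecutionFlags {
  // The two parameter names quoted in the post.
  val speculativeKeys: Seq[String] = Seq(
    "mapred.map.tasks.speculative.execution",
    "mapred.reduce.tasks.speculative.execution"
  )

  // Build one -Dkey=false generic option per parameter.
  def disableFlags: Seq[String] =
    speculativeKeys.map(key => "-D" + key + "=false")

  def main(args: Array[String]): Unit = {
    // Print the flags so they can be pasted into a job submission command.
    println(disableFlags.mkString(" "))
  }
}
```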

Continue reading