HBase Best Practices - Bulk Loading: Principles and a Spark Implementation

Mar 12 2020 Database 13 minutes read (About 1910 words)

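Apache HBase provides random, real-time read/write access to large datasets. In practice, however, our original applications are often not built on HBase. In that case, how to import large volumes of data (possibly TB or even PB in size) into HBase becomes the first problem to solve. The most basic options that come to mind are using the Client APIs or a MapReduce job that writes through TableOutputFormat. Neither of these is the most efficient approach, though; when importing large-scale datasets into HBase, the first thing to consider is the Bulk Loading method that HBase provides.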

HBase Best Practices - HBase Filter Source Code Analysis and Custom Filters

Mar 6 2020 Database 21 minutes read (About 3177 words)

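This post first walks through the source code of HBase filters and explains the role of each function in the abstract base class Filter. It then gives a simple custom filter example and uses it to analyze the execution flow of the methods in Filter; once they understand this example, readers can write whatever custom filter they need. The source code discussed in this post is based on HBase 1.4.x.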

The Byzantine Generals Problem

Feb 14 2020 Distributed System 17 minutes read (About 2573 words)

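The Byzantine Generals Problem offers a scenario-based description of the distributed consensus problem and was first published by Leslie Lamport et al. in 1982. This post first describes the problem with illustrations and then, building on that understanding, classifies existing distributed consensus algorithms.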

Tinyflow - A Simple Neural Network Framework

Jul 23 2019 Machine Learning 7 minutes read (About 1004 words)

In recent years, thanks to the rapid growth of computing power, deep learning has blossomed. That growth is largely due to GPUs, and today's popular deep learning frameworks such as TensorFlow, PyTorch, and MXNet all support GPU acceleration. To explore the implementation principles behind these frameworks, this blog post attempts to build a simple deep learning framework: Tinyflow. We will build a general automatic differentiation framework to which you can add any custom operator. To keep it simple, Tinyflow implements only the operators necessary for multilayer perceptron (MLP) models (such as MatMulOp, ReluOp, and SoftmaxCrossEntropyOp), but it supports the addition of other operators (such as ConvOp). At the lowest level, we will use GPUs to accelerate matrix operations. Although Tinyflow is very simple compared with mature deep learning frameworks, it does have the two core elements a deep learning framework needs: automatic differentiation and GPU-accelerated operations.

Automatic Differentiation Based on Computation Graph

Jul 22 2019 Machine Learning 21 minutes read (About 3154 words)

Automatic differentiation (AD), also called algorithmic differentiation or simply “autodiff”, is one of the basic algorithms behind deep learning frameworks such as TensorFlow, PyTorch, and MXNet. It is AD that allows us to focus on designing the model structure without paying much attention to gradient calculations during training. This blog post focuses on the principle and implementation of AD. Finally, we will implement an AD framework based on computational graphs and use it for logistic regression. You can find all the code here.
