Programming Impala Applications

The core development language with Impala is SQL. You can also use Java or other languages to interact with Impala through the standard JDBC and ODBC interfaces used by many business intelligence tools. For specialized kinds of analysis, you can supplement the SQL built-in functions by writing user-defined functions (UDFs) in C++ or Java.

Overview of the Impala SQL Dialect

The Impala SQL dialect is descended from the SQL syntax used in the Apache Hive component (HiveQL). As such, it will feel familiar to users who already run SQL queries on the Hadoop infrastructure. Currently, Impala SQL supports a subset of HiveQL statements, data types, and built-in functions.

For users coming to Impala from traditional database backgrounds, the following aspects of the SQL dialect might seem familiar or unusual:

  • Impala SQL is focused on queries and includes relatively little DML. There is no UPDATE or DELETE statement. Stale data is typically discarded (by DROP TABLE or ALTER TABLE ... DROP PARTITION statements) or replaced (by INSERT OVERWRITE statements).
  • All data loading is done by INSERT statements, which typically insert data in bulk by querying from other tables. There are two variations: INSERT INTO, which appends to the existing data, and INSERT OVERWRITE, which replaces the entire contents of a table or partition (similar to TRUNCATE TABLE followed by a new INSERT). There is no INSERT ... VALUES syntax to insert a single row.
  • You often construct Impala table definitions and data files in some other environment, and then attach Impala so that it can run real-time queries. The same data files and table metadata are shared with other components of the Hadoop ecosystem.
  • Because Hadoop and Impala are focused on data warehouse-style operations on large data sets, Impala SQL includes some idioms that you might find in the import utilities for traditional database systems. For example, you can create a table that reads comma-separated or tab-separated text files, specifying the separator in the CREATE TABLE statement. You can create external tables that read existing data files but do not move or transform them.
  • Because Impala reads large quantities of data that might not be perfectly tidy and predictable, it does not impose length constraints on string data types. For example, you can define a database column as STRING with unlimited length, rather than CHAR(1) or VARCHAR(64). (In Impala 2.0 and later, you can also use length-constrained CHAR and VARCHAR types.)
  • For query-intensive applications, you will find familiar notions such as joins, built-in functions for processing strings, numbers, and dates, aggregate functions, subqueries, and comparison operators such as IN() and BETWEEN.
  • From the data warehousing world, you will recognize the notion of partitioned tables.
  • In Impala 1.2 and higher, UDFs let you perform custom comparisons and transformation logic during SELECT and INSERT...SELECT statements.
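
The statements named in the list above can be sketched as plain SQL text. The following is a minimal illustration only: the helper functions, the table names (sales_raw, sales, staging), the column list, and the HDFS path are all hypothetical and do not come from the Impala documentation. The helpers simply assemble statement strings that a client could submit through impala-shell, JDBC, or ODBC.

```python
# Sketch only: these helpers build Impala SQL text; they do not connect to a cluster.
# All table names, columns, and paths used below are hypothetical examples.

def create_external_text_table(table, columns, separator, location):
    """Build a CREATE EXTERNAL TABLE statement for delimited text files.

    The data files under `location` are read in place, not moved or transformed.
    """
    cols = ", ".join(f"{name} {ctype}" for name, ctype in columns)
    return (
        f"CREATE EXTERNAL TABLE {table} ({cols}) "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{separator}' "
        f"LOCATION '{location}'"
    )


def insert_overwrite_partition(target, partition_spec, select_sql):
    """Build an INSERT OVERWRITE that replaces the contents of one partition."""
    return f"INSERT OVERWRITE TABLE {target} PARTITION ({partition_spec}) {select_sql}"


def drop_partition(table, partition_spec):
    """Build an ALTER TABLE ... DROP PARTITION statement to discard stale data."""
    return f"ALTER TABLE {table} DROP PARTITION ({partition_spec})"


if __name__ == "__main__":
    print(create_external_text_table(
        "sales_raw",
        [("id", "INT"), ("customer", "STRING"), ("amount", "DOUBLE")],
        ",",
        "/user/demo/sales_raw",
    ))
    print(insert_overwrite_partition(
        "sales", "year=2013",
        "SELECT id, customer, amount FROM staging WHERE year = 2013",
    ))
    print(drop_partition("sales", "year=2012"))
```

The helpers interpolate their arguments directly into the SQL text, so they are suitable only for trusted, hard-coded identifiers such as the ones shown here.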

Related information: Impala SQL Language Reference, especially SQL Statements and Built-in Functions

Overview of Impala Programming Interfaces

You can connect and submit requests to the Impala daemons through:

  • The impala-shell interactive command interpreter.
  • The Apache Hue web-based user interface.
  • JDBC.
  • ODBC.

With these options, you can use Impala in heterogeneous environments, with JDBC or ODBC applications running on non-Linux platforms. You can also use Impala in combination with various Business Intelligence tools that use the JDBC and ODBC interfaces.

Each impalad daemon process, running on a separate node in the cluster, listens on several ports for incoming requests. Requests from impala-shell and Hue are routed to the impalad daemons through the same port. The impalad daemons listen on separate ports for JDBC and ODBC requests.
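
As a rough illustration of how a client might address these ports, the sketch below builds an impala-shell command line and a JDBC connection URL. The port numbers are assumptions based on commonly documented defaults (21000 for impala-shell, 21050 for JDBC through the Hive JDBC driver) and may differ on a given cluster. The host name is hypothetical, and the auth=noSasl setting assumes an unsecured test cluster.

```python
# Sketch only: builds connection strings, does not open any connection.
# ASSUMED_DEFAULT_PORTS holds assumed defaults; check your cluster's configuration.
ASSUMED_DEFAULT_PORTS = {
    "impala-shell": 21000,  # assumed default port for impala-shell requests
    "jdbc": 21050,          # assumed default port for JDBC requests
}


def impala_shell_command(host, port=ASSUMED_DEFAULT_PORTS["impala-shell"], query=None):
    """Return the argument list for starting impala-shell against one impalad."""
    args = ["impala-shell", "-i", f"{host}:{port}"]
    if query is not None:
        args += ["-q", query]  # run a single query and exit
    return args


def jdbc_url(host, port=ASSUMED_DEFAULT_PORTS["jdbc"], database=""):
    """Return a Hive-driver-style JDBC URL for an unsecured cluster (assumption)."""
    return f"jdbc:hive2://{host}:{port}/{database};auth=noSasl"


if __name__ == "__main__":
    host = "impalad1.example.com"  # hypothetical host name
    print(" ".join(impala_shell_command(host)))
    print(jdbc_url(host))
```

Because any impalad in the cluster can accept requests, the host argument can name whichever node the client should connect to.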

Time: 2024-08-09 09:41:10
