11.integrating Sap Hana And Hadoop.pdf

Uploaded by: Anisha
0
0

March 2021
PDF

This document was uploaded by user and they confirmed that they have the permission to share it. If you are author or own the copyright of this book, please report to us by using this DMCA report form. Report DMCA

Overview

Download & View 11.integrating Sap Hana And Hadoop.pdf as PDF for free.

More details

Words: 23,599
Pages: 99

Preview
Full text

Loading documents preview...

What You'll Learn Learn to integrate SAP HANA and Hadoop in three integration scenarios: Smart Data Access , MapReduce, and SAP HANA XS. Find out how to create a Hadoop test environment, and walk away with a compl ete technical understanding of how to integrate SAP HANA and Hadoop! 1

Creating a Hadoop Test Environment . . . . . . . . . . . . . . . . . . . . . 1. 1 1.2 1.3 1.4 1.5

2

. . . . .

6 9 16 23 27

Consuming Hadoop Data within SAP HANA Smart Data Access . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

34

Setting up SAP HANA Smart Data Access. . . . . . . . . Create Remote Data Sources and Virtual Tables . . . . Execute Simple Queries . . . . . . . . . . . . . . . . . . . . . . Example SDA Performance w ith Three Billion Rows .

. . . .

34 40 44 46

Developing and Executi ng Hadoop M apRedu ce Tasks from SAP HANA . . ........... . ......... . ........... . ......

50

2.1 2.2 2.3 2.4 3

Considerati ons for Setting Up a Hadoop Cluster. . . . Instal l Cloudera Direct or on AWS. . . . . . . . . . . . . . . Deploy a Hadoop Cluster on AWS . . . . . . . . . . . . . . Launch a Developer Edition of SAP HANA on AWS . Set up a Hadoop Developer User in SAP HANA . . . .

3. 1 3.2 3.3

. . . . .

. . . .

. . . . .

. . . .

. . . . .

. . . .

. . . . .

5

. . . .

Configuring Environments . . . . . . . . . . . . . . . . . . . . . . . . . Execute a Simple MapRed uce Job (File Read). . . . . . . . . . . Build and Execute a Custom MapRed uce Job (Document Scan) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

50 57

Utilizing Hadoop REST Services within SAP HANA XS . . . . . . . .

85

Accessi ng the Hadoop Fi le System (HDFS).... . . . ...... Accessing HBase. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

85 90

5

Future Outlook . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

96

6

What's Next?. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

98

4

4. 1 4.2

62

4

1

Creating a Hadoop Test Environment

The modern Enterprise Architecture needs to balance the storage, processing, and fast analytic reporting requirements combining both Transactional and Behavioral data. » Transactional data is best categorized by enterprise resource plannin g » Behavioral data represents the rapid growth of data in such areas as machine-generated logging and social media Figure 1 illustrates how these two different types of data can be harnessed using SAP HANA and Hadoop. .---Front End Servi ces Fiori/SAP Ul, 81, Mobile, Lumira, Predictive Analysis

SAF

I Federati o n Layer SAP HANA Real-Time Analytics. Virtualization , Predictive Analytic Libraries, Custom Applications SAF

1

1 ,.__

Enterprise Dat a ERP, Data Warehouse (HANA/Third Party Database)

Transactional

.

u

::I >.

-.; c:

< ~

Open Source HADOOP Massively Scalable Storage and Compute Batch Processing, In-Memory Engine, Ingest, Predictive Analyt ics Library

~

~

'E

:c 1-

Hadoop

-+ SAP

...

0 {].

Non SAP Sources Remote Sensors, Internet of Th ings, Social M edia, Logs. Remote A PI 's '---

Behavorial

Figure 1 M anaging Transactional and Behavioral Data

5

Creating a Hadoop Test Environment

This E-Bite demonstrates, step by step, how prototype test environments can be set up combining both SAP HANA and Hadoop. If you already have either an SAP HANA system or a Hadoop cluster, then the steps will still be helpful, since they will cover the additional setup required to ensure the systems can communicate with each other.

1.1

Considerations for Setting Up a Hadoop Cluster

A Hadoop cluster is a type of computational cluster that is designed to store and analyze huge amounts of unstructured data. Clusters exist in a distributed computing environment. These clusters run Hadoop's open source software, which is designed to run on low-cost commodity hardware. Hadoop is made up of a large collection of projects or libraries, covering areas from real-time data ingestion, document storage, and in-memory reporting engines. It's constantly evolving, and choosing the right project for your needs requires careful consideration. At Hadoop's core are several projects that are covered by this E-Bite: » HDFS: Hadoop Distributed File System » HBase: NoSQL Distributed Database » YARN: Second Generation Map/Reduce Framework » Spark: In-Memory Data Processing Engine » Hive: SQL-like interface to Hadoop » Impala: Cloudera's proprietary SQL-like interface to Hadoop Similar to Linux there are a number of companies that help you to install and manage your Hadoop cluster. Cloudera, Hortonworks, and MapReduce are the current leaders in this space. Their business models differ slightly: some believe in providing purely open source solutions whereas others favor selling their own custom applications as add-ons. If you are interested in just trying out Hadoop without connecting to SAP HANA, the easiest way to start is by downloading and running

6

I

1

Creating a Hadoop Test Environment

virtual machines on your own PC, provided by most of the main Hadoop vendors. Remember that a Hadoop cluster is designed to run on tens to thousands of machines (nodes). If you wish to test out integration with SAP HANA then we recommend that you set up your own Hadoop cluster. Broadly speaking your Hadoop cluster's performance improves linearly with the number of nodes added. Running it on a single machine enables you to test integration, but any runtime stats you record will greatly improve as your cluster grows. For building a scalable productive environment it's best that you contact the leading vendors, who have consulting services to help you determine the best hardware and software configuration to meet your needs. On-Premise Cluster

If you were to build an on-premise cluster, as a starting point you might consider 10 to 50 machines (nodes). Depending on your requirements each machine might have:

» 12- 24 1- 4 TB hard disks » Two quad-/hex-/octo-core CPUs, running at least 2- 2.5 GHz

» 64-512 GB of RAM Cloud-Based Cluster

Increasingly organizations are looking to cloud-based providers for their hardware hosting needs. This is a great place to start for both start-up companies and organizations taking their first tentative steps. Cloud providers such as Amazon Web Services (AWS) give the flexibility of launching a cluster for a limited time, on the hardware configuration of your choice, specific to your budget and test objectives. In this E-Bite AWS is used to host the cloud machines for both Hadoop and SAP HANA. If you wish to follow along with the examples then you should sign up with AWS (http://aws.amazon.com/) and the SAP Cloud

7

I

1

Creating a Hadoop Test Environment

Appliance Library (http://cal.sap.com). SAP HANA One and SAP HANA Developer Editions also deploy into AWS , so co-locating your Hadoop cluster and SAP HANA on the same cloud provider simplifies network configuration. Figure 2 gives an overview of the AWS system landscape (instances) and the steps followed in this E-Bite.

AWS PoC Landscape

Step 3: Launch HANA SAP HANA SAP (via SAP Cloud Appliance Library) ' - - - - - - ---r--------....;:;:.::....J

Clo udera Enterprise Hadoop Cluster

~ Cloudera Manager [port 7180]

Hadoop User Interface (HUE) [port 8888] Hadoop Cloudera

Step 1: Install Cloudera Director

Step 2: Deploy Cluster (1 .. n Nodes) Amazon

Web Services

Fi g ure 2 AWS PoC Landscape

Create the Test landscape In this E-Bite the following steps are performed to create the test landscape. 1. Launch an instance and install Cloudera Director. Cloudera Director is

a tool that simplifies the creation of a Hadoop cluster on AWS.

8

I

1

Creating a Hadoop Test Environment

2 Use Cloudera Director to create a Hadoop cluster on AWS with 1 ton

nodes. 3. Launch a Developer Edition of SAP HANA on AWS . If you already have a SAP HANA environment that can be exposed externally to the Internet then this step may not be required. The Cloudera distribution of Hadoop has been used in this E-Bite because it has tools such as Cloudera Director, which greatly simplifies the creation of a Hadoop cluster on AWS. Cloudera also has a Quick Start solution on AWS (http://aws.amazon.com/quickstart/) that will automatically launch a 10-node cluster; however, for testing purposes the flexibility of using Cloudera Director enables various cluster configurations to be launched, with more control over AWS hosting charges.

1.2

Install Cloudera Director on AWS

This section covers a highly summarized version of the steps necessary to install Cloudera Director on a new AWS instance. As a perquisite please sign up for AWS . Important Please ensure you also create and safely store an AWS Access Key and an

AWS Secret Key because they will be used in later steps.

For further details of AWS sign-up steps, please see http://aws.amazon.com/ and http://docs.aws.amazon.com/AWSSimpleQueueService/ latest/SQSGettingStartedGuid e/AWSCredentials.html. For more detailed information of the Cloudera installation steps, please refer to http://www.cloudera.com/content/cloudera/en/documentation/ cloudera-director/latest/topics/director_deployment_start_launcher.html.

9

I

1

Creating a Hadoop Test Environment

1

First, log into AWS and enter the EC2 Dashboard. To simplify all subsequent steps in this E-Bite, make sure you are working with instances in Oregon (US West 2), as seen in Figure 3. Other locations can be used but only the Amazon Machine Images (AMis) in this location have been used in this document. •-.

AWS v

Services v

~

Tags Reports Limits • INST1NCES

0 0 0 2

Elast•c IPs Snapshots Load Balancers Secunty Groups

US West {N Callfom•a)

EU (Ireland)

EU (f rankfurt)

Instances

0

Spot Requests

Reserved Instances

' utes C"

US East (N. Virginia)

Runnmg Instances Volumes Key Patrs Placement Groups

As1a Pae~f1c (Su-;gapore)

Easily deploy IRuoy. PHP. Java . .NET. Pytnon. NOde.js & DocKer applications with Elastic Beansta!k.

Asia Pacific (Tokyo) ide

A sia Pacific (Sydney)

- IM>GES

AMIS

Suwort •

I US West (Oregonj

You are using the following Amazon EC2 resources in the US West (Or)l

0 7 2 0

~

Aron M
Resources

EC2 Dosh boord Events

E~1· v

I

Create Instance

Bundle Tasks

South Amenca (S3o Paulo)

Figure 3 Set EC2 Dashboard for Oregon (US West 2)

Next we launch a new Linux instance (Figure 4), where we will install Cloudera Director. .-.

A WS v

EC2 Dashboard Events

ServiceS v

E""' t

"

1¥!iidN¥fiPI

Arun M..K:Dofld d.,.

Or~pn

v

Actio ns v

Tags

None found

Reports

limns .. INSTANCES

1nst.1ncts

Spot Requests Reserved Instances

Suwurt ...

You do not have any running instances in this region First time using EC2? Cneck out the GettJng Started Guide . Click the Launch Instance button to start your own server

Hf!ih@§b!JI

Fi gure 4 l aunch Instance

Search for AMI "ami-b8a63b88" and select it (Figure 5).

10

Creating a Hadoop Test Environment

.-.

A WS v

1. ChooseAMI

Setveces

E: 1t

v

2 C!\oos.e nstance Type

"

Ar on MacDonald...

3 Conftgure lns:ance

4 Add Slorage

5

T~

Instance

1

~pport ...

OrCQOn,

6. Configure Securt; Gtoup

Canc•l and Exit

Step 1: Choose an Amazon Machine Image (AMI)

Art AMI is a template that contains the software configuration (operating system. appiK-aUon server, and applications) required to launch your instance. .... You can select an AMI provided by AWS, our user community, or the AWS Marketplace; or you can select one o f your own M ils.

.,,.

Quick Stan

1 to 1 01 1 AMIS X

MyAMIS RHEL-6.4_GA-x86_64-10-Hourly2 . am• b8a63b88

AWS Marketplace

64 -b~

Community AMis

Figure 5 Search for an AMI and Select

Choose the "c3.xlarge" INSTANCE TYPE (Figure 6) . .-.

AW S

• . ChOoseAt.ll

v

Services

E~ 1t

v

2. Choou 1ns1anc• Type

Aron MacOoni!d..

v

3 Contgure Instance

4 Add Storage

5 Tag Instance

Support ...

Oregon..

6 connourt StcuM'/ Group

7

Re.,ltH~

Step 2: Choose an Instance Type 0

~<.>!':'P.~\~ -~P!1!':'.~!'!!

c4 4XIarge

16

30

EBSonly

Yes

Hog~

0

~!':'P.~\~ .~P!1!':'.'?!'!!

c4 8XIarge

36

60

EBSonly

Yes

1Q.~I9.~~!1

~<.>!':'P.~\~ RP!i!':'.~~-~

c3 1arge

3. 7~

2 X 16 (550)

~~P.":!1~ .~P!if!l.~~-~

c3.xlarge

7.5

2 x40 (SSO)

Yes

MOderate

~<.>!':'P.~ \~ .~P!i!':'!?~.~

c3 2xtarge

8

15

2 xSO (SSO)

Yes

H lg~

!<~P.~\~Rp!i!l)!~~.~

c3 4XIarge

16

30

2 x 160 (SSO)

Yes

Hlg~

!<<.>!':'P.~\~RP!i!':'!~~

c3.8xtarge

32

60

2 x320 (SSO)

1Q.~ig_~~!l

~<.>!':'P.~\~.~p!i!':'!?~.~

cl .mecHum

1.7

1 x350

Mooerate

•

Can eel

Prtv iOU$

Revrew and Launch

Moderate

I

Next: configure l nst:.ne,t Details

Fi gure 6 Select c3 .xlarge

11

Creating a Hadoop Test Environment

J

1

Under Configure Instance Details, ensure NETWORK is assigned to a Virtual Private Cloud (VPC). Create a new one if necessary (Figure 7). .......

A W S ....

, Cnoose AMI

Services ..,

E~1·

2 Choose lnstan'e T~pe

....

Aron .,.,}(.:Don d kt ...

3. ConfiQure Instance

4 Add SIOraoe

5 Tao Instance

Oreo:)O"l...

6 Confioure Setunt/ Gtoup

Sup·port"'

7 Re ...leN

Step 3: Configure Instance Details Configure the instance to suit your requirements. You ~tan launch multiple instances from the same Afv11, request Spot Instances to take advantage of the lower pricing assign an access management role to the instance and more

•

Number of i n 5t;~.nce5

Request Spot Instances

Pu rchasing option Network

1

c.:fVii0',_ -7,_,a::c 61c:_t7:_:1_,_ 2 -" (1c:_72,;:.3::.:1...::0'-" .0'-' 11_, 6)'-'(0::ce:.:.:la,un"-')--~---' ·

Auto-assign Publle IP

C

No preference (defautt su~net in any Avail.lbili!)' Zo •

Sub not i

Pl;tcement g roup

lAM ro le

Create newVPC Create new subnet

vse suDnet senlng (EnaDie)

No placement group

• C

None

Cancel

Previous

Create new lAM role

Rev1ew and Launch

N8xt: Add Stol'3gt

Figure 7 Configure Network Settings

Under CONFIGURE SECURITY GROUP (Figure 8), select an existing group or create a new group. Note In subsequent sections the TCP Ports for Cloudera Manager [7180], HUE [8888]. SPARK [10001]. WebHDFS [50070]. WebHCAT [50111] and HBase Stargate [20550] may need to be added to this group.

Create a new Key Pair file (.PEM), as seen in Figure 9, which is used to access your AWS instances via an SSH tool such as PuTTY.

12

Creating a Hadoop Test Environment

.-.

AW S

, Choose AMI

v

ServiceS

Eu 1t

v

2 Choose Instance Type

Ar on M
...,

3_Contioure Instance

4 Mel Storage

s Tao lnstanc e

Or egon ...

6. Configure Security Gfou.p

1

SVpport "'

7 Re VIew

Step 6: Configure Security Group A security group is a set of firewall rules that control the ltatflt tor your instance. on thls page. you can add rules to allow specific. tratflt to reach your instance. For example, if you want to set up a "•eb server and allow Internet traffic to reach your instance. add rules that allow unrestricled access to the HTTP and HTTPS ports. You can create a new security group or select from an existing one below. Learn more about Amazon EC2 security groups. Assign :t s•curlty g roup:

Cre.ate a new security group • Select an existing security group

•

Type

S.eurity Group 10

N:tmt

Otscription

Actions

sg-2ceeOe43

detaun

defaun VPC sewrity group

Copy to new

;

Source

Protocol

Port R.1nge

Af1 traftic

All

All

sg-2cee0e43 (defaun)

SSH

TCP

22

0.0.0 0/0 Cancel

i

Previous

Revu?w and L1.unch

Figure 8 Review and Launch

Select an existing key pair or create a new key pair

X

A key p air consists of a public key that AWS stores, and a private key f i le that you store. Together, they allow you to connect to your Instance securely. For Windows AM Is. the p rivate key file Is required to obtain the password us~ to log into your instance. For U nux AtAis. the private key file allows you to securely SSH into your instance Note: The selected key pair will be added to the set of keys authOrized for this instance. Learn more

about f@movtng ex1st1ng key pa1rs from a pub lit AM I. Create a new key pair Key p air name

IHAOOOP_.AWS_VPCI Down load Key Pai r

•

You have to downlOad the private key file (" pem file) before ~ou can continue Store i t in a secure and accessi ble location. You wtll not be able to dovmload the file aga1n aner 1rs createCI.

Cancel

Figure 9 Creat e a Key Pair

13

Creating a Hadoop Test Environment

1

After several minutes the n ew instance will be created, as seen in Figure 10. •- .

AWS

ServiceS

v

v

E" 1t

Aron

"'

I£!1!.!JN§£!.BJ

EC2 Dasn~oaro Events

Tags

Q

Conntct

MilCOon<~old ..

N V1 rg •n lil •

Su~port "'

Actions v

seorch : m31arge-

1to 1 of 1

Reports

•

limits -

N

•

Name

•

ANCE

In stances

Instance 10 ..

Instance Type ..

Availability Zon e ..

i-ee2e7 113

m3.1arge

us-east·1d

Instance State ..

Stah iS Chec

() 212 thee

running

Spot Requests Reserveo Instances - IMAr-cs AlAIS

Bundle Tasks

- ELJ>S IC BLOC' Volumes

lORE

lnst.lnct : I i·tt2t711 3

Public ONS: te2·54·90·140·73.compute·1 .;~~zonaws.com

l!!l!!!:l •

Snapshots Oescr1ptlo n - NE rwor

c. SECURITY

Status cneci<S Instance 10

security Groups

Monitoring

Tags Public ONS

i-ee2e7 113

Etasuc IPs

1.amazon3".\<S.com

Placement Groups

Instance state

Loao Bal
lnsumce type

Key Pairs

Private DNS

Netv.ork Interfaces on.,,.toiD• •

Feedbac k

ecl·54·90-14(). 73compule·

Q

running

Public IP E1os1ic IP

m3 131ge

rp-10.233-91· 188.ee2intem31 10

')'1.~

Q 1 t AA

54 90 140.73

Availability zone C"O.,.,.,.;f\, nrn .. n•

Eng liSh

us-east-1d O..,tl4.,_1=,..,onvi••

TetmSofUse

Pnvacy Pohq

Fi gure 10 New Instance Running

Select the n ewly created instance and click the

CONNECT

button.

You should be presented with details similar to those shown in Figure 11. In this E-Bite the PuTTY tool is used as the SSH client in conjunction with Pageant (an SSH authentication agent for PuTTY). In order to use PuTTY the PEM file must first be converted to a public/private key file (.PPK) using PuTTYgen.

14

Creating a Hadoop Test Environment

Connect To Your Instance I wou ld like t o conn ect with

X

• A standalone SSH client A Java SSH Client directly from my browser (Java required)

To access you r instance: 1. Open an SSH client. (find out how to connect usrng PuTTY ) 2. Locate your privale key file (HADOOP_AWS.pem). The wizard automatically detects the key you used to launch the instance. 3. Your key must not be publicly viewable for SSH to work. Use this command ff needed: chmod 400 HADOOP_AI-IS.pem

4. Connect to y our instance using Hs Publi c DNS:

ec2-54-90-140-73 . compute-l.am.azonaws . com Ex>tmple: ssh - i HADOOP_AI-IS. pem ec2-use r-~ec2-54-90-140-73 . compute-!. amazonaws. com

Please note that in most cases the username above will be correct, however please ensure that you read your AMI usage instructions to ensure that the AMI owner has not changed the aetault AMI username If you need any assistance connecting to your instance, please see our connectron documentation .

Figure 11 Connection Details

PuTTY, PuTTYgen, and Pageant are available to download at http:// www.chiark.greenend.org.uk/~sgtatham/putty/download.html.

Next log into the newly created instance using PuTTY. The userid is "ec2user" (Figure 12). - __ . ~-

·::__ -

·. .: .__

~~ =

..--_ -- -

,.,._

-

-,~=

•

-

·,~

--

...:...:~

;

''

--.··-- ~

-

_--·

,.. •

- ~

"- ~-

[ _~

... -"

-

"

···"

.-- -- - . ··- ·-+- ~ - ,.-

•

Figure 12 Log in as ec2- user

Execute the following Linux commands to download Cloudera Director (Figure 13):

15

I

1

Creating a Hadoop Test Environment

sudo wget http : //archive .cloudera . com/director/redhat/6/x86_64/ director/c l oudera-director . repo -0 /etc/yum . repos .d/c l ouderadirector . repo

Figure 13 Download Cloudera Director Installs

Install Cloudera Director using the following command: s udo yum install cloudera-director-client

After several messages you should be complete (Figure 14). =--~.,._.::.

-~

l

__ -::-~ : J.~::.~-~--,_..---_--

__ .... . .- . ...

r-

~. -

: _ . _

._-_ . :1 _::_~

..-

.: :_ __

~ . :.__

-~ · '"""'- ~'"

,.... - .....

Fig!lre 14 Cloudera Director Successfully Installed

For more detailed information about the Cloudera Director installation steps please refer to http://www.cloudera.com/content/clouderalen/ d acumen ta ti on/cloud era-director/Ia test/to pies/director_d ep laymen t_start_ launcher.html.

1.3

Deploy a Hadoop Cluster on AWS

After Cloudera Director has been installed it is now possible deploy a new Hadoop cluster on AWS, but first some configuration files need to be

16

I

1

Creating a Hadoop Test Environment

maintained; for example, setting the number of nodes that should be created and specifYing which Hadoop libraries to install. Using an FTP tool (e.g., FileZilla) copy the PEM file created earlier and copy it to the instance in the following directory location: /home/ec2 -user/Hadoop_AWS_VPC . pem

At the Linux command line, copy an example configuration file and open it, ready to make the necessary changes: sudo cp /usr/l i b64/cloudera-d irector/client/aws .s i mple . conf /usr/ li b64/ cloudera-d irector/client/aws . conf sudo vi /usr/l i b64/cloudera -di rector/client/aws .conf

Within the newly created config file (aws.conf) edit the file to the settings shown in bold: accessKeyl d - "[Your AWS #Ma ke sure t he value i s secretAccessKey- "[Your #Ma ke sure t he value is

Access Key goes here)" enc l osed in do uble quot es . AWS Secret Key goes here]". enc l osed i n do uble quotes .

ssh use rname : ec2 - user # for RHE L i mage pr i va t eKey : / home/ ec2 -user /Hadoop_AI4S_VPC . pem ins t ances I ml x I t ype : m3.1arge image : ami -18a23f28 cluster I products ( CDH : 5 # in cludes Impa l a and Spark se rvices : [ HOFS , YARN , ZOOKEEPER , HBASE . HIVE . IMPALA , OOZIE . SPARK, HU E] nodes I count : 5 #Th i s can be i ncreased for more real i stic Cl uster II tes ti ng i nstance : ${instances .mlxl Listing 1 Example aws.conf File Setti ngs

17

I

1

Creating a Hadoop Test Environment

Note This will launch 6 instances on AWS [a 5-node cluster and an additional node running Cloudera Manager]. Each instance will have instance type m3.1arge, and will be based 0111 AMI ami-18a23f28. With this configuration file it i s very easy to launch different size clusters as required.

With the configuration in place we are ready to create and launch a Hadoop cluster. Creating a cluster may take many minutes to execute. s udo cl oude ra-director bootstrap /usr/lib64/c l oudera-director/c li ent/ aws .conf

While the cluster is being created you will see status updates similar to those in Figure 15.

Figure 15 Cl uster Created

If you view instances in AWS EC2 dashboard you will gradually see new instances being created (Figure 16).

18

I

1

Creating a Hadoop Test Envi ronme nt

.-.

AWS

Services "'

v

E_ t

Aron M.J<.'Oon.Jid "

...

Ff!lld@@ii,!.M

EC2 Dashboard

Connect

Actions

OrCQOn "

Supt)Ort ...

v

0

0

Events Tags

~

Repons

.

Name

Instance 10 •

1

~

1 to 7 of 7

Instance Type

Availability Zone

,..9e6f1cS8

m3 1arge

tn··west-2a

i-74770482

c3.xlarge

us-west-2a

d oudera
:«)532016

m3 1arge

us~"-est -2a

eloude
i-e1532017

m3 1;vge

l$- .\'t~St·23

eloude-0 t c0-995f-9cfo.837~4d

o-de532028

m3 1arge

UVo\'e$1·23

AM IS

eloude<3-diree I0<·1a.
i-dc53202a

m3 1;vge

us....,-est·2a

eunate Tasks

eloudt
kld$3202b

m3 13fQi

us·west·23

Limits

cloudera-d.reclor-tcab5930-30bf-45ba·be tS-Sc bad1bSeb69

- INST.4NCES

•

lnst~nets

Spot Requests Resti'Ved Instances

A.

C

()

0

vorumes lnst.:mot: Ji-74770482

Snapshots - NWOP.SIJH

security Groups

Public ONS; te2·52· 24·141·34.us·wnt·2.eomputt,.lm.uon.lws.eom

SUltU$ CheckS

Desc ription

Monitoring

Tags

erasuc IPs

Figure 16 Instance Launched in AWS E.C2

If you have not done so already, ensure that the security group of y our cluster allows inbound TCP traffic on 7180 and 8888 (Figure 17) . • -.

AWS

v

EC2 OaSI11>0arO

events

S@TVIC@'S

v

::

t

Aron MacOonil',d"

...

i§IB+h!f'9r!i!J

Orego'l ..

Support "'

Actions v

Tags

1 to 1 of 1

Reports

Stcu rity Group; sg·2ceaou~

Limits - INS

Description

NC...S

Inbound

Outbound

Tags

Instances Spot Requests

Edit

Reserved 1ns1ances T'fpt

Protocol

Pon Rangt

sourct

All tratr~

All

All

sg-2eeeoe• 3 (Oetauft)

custom TCP Rule

TCP

7180

000010

SSH

TCP

22

0000'0

1

- IMAGI

AMI$

S<Jnote Tasks IS lC

0'

VOlumes SnapshOts - N

nv,

SF -UR1TY

Security G roups

ElaStiC IPs

Fi gure 17 Open Port s on Security Group

19

Creating a Hadoop Test Environment

I

1

During the process (Figure 18) it may also be possible to access Cloudera Manager using your Internet browser with the public IP address of the first newly created instance, on port 7180, e.g., http://:7180/. Note The initial userid/password is ADMIN/ADMIN.

Check the progress of the cluster installation, as seen in Figure 18. CJOUdera· manager

f l0

Home Clusters .. Hosts D1agnost1cs - Audits Charts -

Backup -

~

Support · .1. admln •

Adm1n1stratton -

Last Retresneo May 10 20 15 3·03·25 Al.l EDT

Command Details: First Run Comm3nd

.t

Context

St.lt US

First Ru n

Finished

SUrtt d 3t

Endtd .1t

r.tay 10. 2015 2 54 31 Al.l EDT

May 10. 2015 3 03 24 AM EOT

Command Progress Compreteo 17 ol 17 steps.

../ lni1jarizing ZooKeeper Service C~plctcd

1 sU:p$

~v<:cu:ofu lly .

../ Starting ZooKeeper Service Coapletcd 1 steps successfu lly.

~ rJ'

../ Cneckong i!tne name Olrectorres o! rne Namer-aooe are empty Formatting HDFS only~ empty Successfull

fo,...att ed fl.alllt-Uode.

Fi gure 18 CIDudera Manager Showing Cluster Install Progress

Once completed, view the status of your Hadoop cluster using Cloudera Manager (Figure 19). Traffic lights indicate the status of each of the main services running.

20

Creating a Hadoop Test Environment

fJ0 ~ Sup port ·

CIOUdera· manager Home Clusters .. Hosts OragnostJcs ·

Audrts Charts •

1

J

.1. admln •

Backup - AdminiStration -

30 minutes preteoi n~ May 12 20 15 810AI.I EDT

Home

Stotus

All Heolth Issues [!D

• C5-Simple-AWS

0 0

tCLIH 53 l t>arte s)

Configurotl on

ED •

Charts

iii Hosts

AOO ClUster

All Rec,ent Commonds

30m 1h 2h 6h 12h 1d 70 300

Cluster CPU

ti HBASE·1

•

Cl HDF5-1

0

HIVE· I

0

i'II HUE-1

0

'/ IMPALA· I

-----JA~------~A--

:.\" 1

0

B OOZIE-1

0 0

¢ SPARK· I ~~~ YARN-1

l

0

j ZOOKEEPER· t

~ 3: s

Cluster Oisk 10

Ji

Cloudera Management Service

~I

' \

li

Figure 19 Cluster Running

Note With small clusters it is not uncommon for there to be some errors, indicated with red traffic lights. Typically you need at least three nodes, and each node should have a minimum of 15 GB memory to remove most warnings and errors on HDFS. Your cluster may still operate, but increasing your cluster hardware specification will help future tests.

Next test that the Hadoop User Interface, or HUE, is running. Access HUE using your Internet browser with the public IP address of the second newly created instance by Cloudera Director, on port 8888, e.g., http:// :8888/. Note For larger clusters you may need to navigate within Cloudera Manager to find the node on which HUE is running.

21

G1 •

Creating a Hadoop Test Envi ronment

When you first log on, you will be asked to create an admin ID and password. Once logged in, install the Hive Table examples from the QuiCK START menu. (Figure 20). H

ue H~

ft

Query Edttors ..,

About Hue

O.tll BtOWMIS

ouick St>rt

.,

Wo-rtcfiOW!I

conr~guratiOn

v

Search

Security ..,

•

II

cc v

0

'!I!']

se"'"' Logs

Quick Start Wizard - Hue ,,. 3. 7.0 - The H adoop Ul

Step 1

0: ct\er( k Conflgurat 'Yt

Step 2 8 Examples

Step J W Users

Step 4

)Ill Go1

Install all the application examples .!. All

Install individual application examples

.!. Impala Editor

S.ek

Ill

Figure 20 Install All Examples

When the installation is complete access the Hive query tool, with the QUERY EDITOR to EXECUTE

the following run SQ L script (Figure 21);

selec t count(*) from samp l e_07 Note Shutting down your Hadoop cluster will likely invalidate al l your cluster settings because the internal IP setting may change. Your cluster w ill be unl ikely to operate properly when restarted after a shutdown. Terminate you r test cluster when no longer required to avoid excessive AWS charges. It's best to start out testing with small one-node clusters, and only create larger clusters when you are comfortable operating a more realistic Hadoop environment.

22

I

1

Creating a Hadoop Test Environment

~

*

Ue

Cluefy EdotoB v

>\ Hive Editor Assis t

~Browsers v

auory Ed itor

Worldlows v

My Queries

5Mrch v

saveo Queries

Security v

•

•

C:C v

C 0

Table name

!!J

Ill sample 08

·-

-

Save as. ..

Recent qoones

Explai n

Query

log

or crea1e a

New query

Columns

Results

Chan

_co 823

Fi gure 21 Execute Hive SOL

Your Hadoop cluster is now ready to use.

1.4

8 ,_

0

defauft

llllsample_07

1

History

Sethngs

OATAB...

J

Launch a Developer Edition of SAP HANA on AWS

Next we need to launch an SAP HANA system. In the following steps we have used the SAP Cloud Application Library. See https://cal.sap.com for more details. Choose a solution to launch on AWS such as SAP HANA Developer Edition 1.0 SPS09 (Figure 22). Under general properties please ensure the same region (US West 2) an d network (VPC) created earlier are used (Figure 23). It's not essential to use the same region and network, but it will simplify network security.

23

l'O

Creating a Hadoop Test Environment

~ SAPCioudApplance Ubfary

HOME

INSTANCES

SOLUTlONS

ACCOUNTS

USERS

Solutions: Filtered (1 of 57)

IJ: S

f)

View: All

Name

Publisher

Activated By

Operations

~ SAP HANA developer edtbOn 1.0 SPS 09

SAPSE

Aron MacDonald (SO

Create lnstanct

Figure 22 Select SAP HANA SPS09 Solution

II

Cost per Hour

Define Geotral Properties

USD 0.48 Cantel

Next 30 Days

USD 25.45

Olscl.aimer

Enter me oenef'31properties of me solution instance: Name.

*

HANA SPSo-9

Atwunt

*

TEST1 {Amazon Web $eNices)

Reg on:

*I:!IElllfj

* Subnet *

NeMo~:

v

I j Information tor addi!lonal reQions

vpc-7;)611712 VPC subnet 1172.31.0.0120 ( 4091 treeiP addresses PubiJC Stabe IP Addren

Figure 23 Choose EC2 Region

24

1

Creating a Hadoop Test Environment

Finally a new SAP HANA instance will be created (Figure 24). 1::..:;'

SAPCIOuciAI)I)Iianeet.iClrety

HOME

IN STANCES

SOLUTIONS

ACCOUNTS

USERS

Instances: Filtered (1 of 3)

u=

V~ew: All

tns.tanee Name

~

•

HANASPS09

SOIUUOn

NANASP$09

Created By

Created On

X

Operations COMtCI

Slit

Figure 24 SAP HANA Instance Created

Connection details of the SAP HANA instance are available here, as shown in Figure 25.

Figure 25 Get Connection Details

25

1

Creating a Hadoop Test Envi ronment

Open the SAP HANA instance in the browser to see all the logon information of the newly created instance (Figure 26). r!

C

52.10.211.82/sap/deV$/edition/index.html ~l1 ~1 \1 1..: ~ 1 .-, I C,M OI

\AJV CJI"\MMC:t\ \J:>t:l

ll'::t l~U

Ut:IIJW,

Sample Appl ications

Workshop Materials

Server Connectivity

~ Internet ot Th•nas (loT)

~ What's new '" SP$9

Hostname: 52.10211.82 Servec- lnstbnce: 00

~ Ptr$ona l Spend AnaylSIS (P$A)

~

~ Geo Location

Sotuttons to the Nat•ve

Users

Development Workshop

Ao!e: System

~ Ooc:umont~tlon 10f'

U~er. SYSTEM

tho OpenSAP

Password:

COUr$• ~ SFLIGHT from t he ASAP world

~ Sot ubons to t he OpenSAP course

Role: Develo~r U:•r. COOEJAMMER Password. Cod9Jam2015

~ ACME Catalog

~ Mallmg List (enlry.html for .nput)

Role: Predlc-•ive Analytics. User: AFLUSE.R Password: CodeJam201S

Figure 26 SAP HANA Connection Properties

You will now be able to see the instance running in the AWS EC2 dashboard (Figure 27) . •-.

AWS ..,

Servl~

v

E

v

A.'0'1

FW,J@jihiji

EC2 oasnDoOard

connect

Action s

MacDooad ..

O"eQ0'1 ..

~~xYt

"'

v

Events ~ SNreh KANA SP$09

Tags

1 to 1 of 1

Reports

Limns .!!~

• lN

....

•

Nam e

a

""NA SPS09(SAP HANA)

lns~ net 10

.....

lnst.an.ce Type

Availab ility Zo no

lnstancu

SpotRequesiS Reserved Instances - IMA<

•

AMIS

Instance: I i ~e$9ca 70f (HAN A SPS09(SAP HANA ))

Buncne Tasks

2.computt ,.tmJ.zon.lW'1.COm

t;

.S lL B;.V'.J

OtscripUon

Votlmes

• Nr'\'INIJ

Securily Groops

Status Che<:k.S Ins-tance 10

snapsnOts 1Rr

ln..-onCA! s t3te In s tance type

Monitoring

i<69ca70f

"""""9 r'3 xlat9e

Pu blic ON S: ec2-52·10 ·211· 82.us-west-

Tags Public ONS PubliciP

ec:2-S.2-t0-211-82 m-west·

2 compute amazonaws com S2 10 211 82

Elastic IP

ElaS!IC IPS

Figure 27 Shows SAP HANA Run ning in EC2

26

1

Creating a Hadoop Test Environment

Note Be sure to shut down the SAP HANA instance when not in use to avoid expensive AWS charges.

1.5

Set up a Hadoop Developer User in SAP HANA

In the rest of this E-Bite you could follow the SAP HANA steps using the SYSTEM user, but in a realistic scenario a different developer user wou ld be used, with the necessary authorizations granted. Following are the steps used to log into SAP HANA for the first time and create a developer user. Note It's assumed you have already installed SAP HANA Studio and have added the newly created system in Figu re 26, using the SYSTEM user, with the password specified during the AWS installation phase.

Create Hadoop Developer User in SAP HANA

Log in as SYSTEM and execute the following SQL to create the Hadoop Developer "HDEV01 ". CREATE USER HDEVOl PASSWORD Password! ; I NSERT INTO _SYS_REPO . PACKAGE_CATALOG(PACKAGE_ID , SRC_SYSTEM , SRC_ TENANT , DESCRIPTION . RESPONSI BLE, IS_STRUCTURAL) VALUES ( ' HOEVO l '.' HOB ', '' , ' HD EVO l ' , ' HOE VOl ' , 0) ; CALL GRANT_AC TI VATEO_ROLE( ' sap . han a . xs . ide . ro 1es : : Oeve 1oper ' . ' HOEVOl ' ) ; GRANT EXECUTE ON REPOSITORY_REST TO HOEVOl ; GRANT EXECUTE ON GRANT_ACT IVAT ED_RO LE TO HDEVO l ; GRANT EXECUTE ON REVOKE_ACT !VATED_ROLE TO HO EVOl ; GRANT REPO .READ , REPO . EDI T_NATIVE_OBJECTS , REPO .ACTIVATE_NAT IVE_ OBJECTS, REPO . MAINTAIN_NATIVE_PACKAGES ON "H DEVO l" TO HDEVOl ; GRANT REPO .EDIT_IMPORTED_OBJ ECTS, REPO . ACTIVAT E_I MPORTE D_ OBJECTS, REPO . MAINTAIN_ IMPORTED_PACKAGES ON "HDEVOl " TO HDEVO l;

27

I

1

Creating a Hadoop Test Environment

1

GRANT CREATE REMOTE SOURCE TO HDEVOl ; Listing 2 SQL Script to Create SAP HANA User

Next navigate to ADD SYSTEM WITH DIFFERENT USER. Under AUTHENTICATION BY DATABASE USER type in the username "HDEV01" and password "Password1 defined in the SQL script (Figure 28). Click FINISH to log in. "I

ll:i

• Con- 1 Sys<em'

Fole Edo< NMg•te 9..,«1 W1ndow Help

r.l ·

.

~

· ':? ¢> ·

=

~ Sys<ems !:!

~ · I C3 U · a ~ s ~

v

itb HDB (HDEVOI)

C

.

!!:! •HOB • SQL Console I

0

•Us.age, Cu<..~

1

HOB (SYSTEM)

!:!

'

9! I I~ SAP HANAAdrrMnistrationConsolc I

QuickAcu.ss

.

e "

=o

P SKMR.sql

f'.l o I !!.

52~0.90~35 00

-®

Add Sys:tem with Diffe-rent U$er - 'HOB 52.10.90.135 00'

J.ofJool

Conne
"'

. . .

~(PACkAGf_IO~ SRC_SYSTE.H, SRC_ Tf.IWIT , O£SCRIPTIQt4, AfSPOJfSI8 LI

'"''""'.,e);

Specify the properti~ fot conn«ting to the system.

~ $ . ide . rol es : : Otvelo~r · , 'HDEV91' ) ;

Authentication can be carried out using the current operJting system user or 11 var.cl SAP HANA databas.e user

~ ""'""''

TO HOf\101;

E TO HOfV91;

~ECTS1 REPO.AC TIVATE_riA.TM_OSJE(TS, REPO.MAINTAIN_riAUVf_PACI

J Authentication b)' curttnt op~ating syntm user

jfo.ACT IVATE_IWOATE0_08lE
o Auttwrtiution by datablise user UstrName HOM !

Password:

········~

~ Stort uw n•me •nd ~ssword m st
"'

t:)ConnKt using SSL

'

-

.

~ Enablt SAP start s.e:rvi
f] UstHnPS

-

c:1

®

< ~
I

N"">

II

f inish

II

Caned

v=o

I

'

V•lut

. r-

'"

I

if!,, HOB (SVST!M)

3-

Description

Conn«bon Propcrbes

.

Palh

/ S1.10.cxi.13S«<:SlNGLEOS:SVSTEM

..

' '

Figure 28 Connect HDEV01 User

Now the HDEV01 user should be added and schema HDEV01 should be visible (Figure 29).

28

Creating a Hadoop Test Environment

fo S~tms

t3

iO ' HD8·SQlConsolel ~

0

i!t • l !
tb H:>8(HOMI) 4

a. Cmlog

a

ID SQl
9ubbc ~nonyms

l HANIV<S_SASE

~

!L

PASSWMO huli!Ot"dl;

_

.

"" Proudum

GaAHl REPO . EDIT..I~HO...OSlEClS, RIEPO . ACllVATE... IMI'OaTEO...oeJECTS, AIEPO.KA.IIITAlH...l.HPOfiTED...PACICAGES

~ a. ~nonyms ~ c.

0

vALues ( 'I«V.t', ' HOe'. ·· , 't()(Wt ' , ·•otvet· ,e);

at UIMModffs W f'PM Qut.ySourcts W. F\lfl
..

~OEVGl

0

•I

INSERT INTO _SVS_SttPO.P.\CKA6E_UTAlOG(PACIWiE_IO, SfiC_SYSTVt, SRC_THW4T, DESCRIPTION, RESPONSI8 ll

- . column Vi~

~

lA XlPBkMR~

suooomoo

HDB (SYSTEM)

(;AAHT
l•bi«S

it- Triggers -. Views

...... ~ lOT

oi iAPJVINA_AOMIN .I SAPJiANA_OEMO ,. SAP)'i~NA_EPM_NOO

J

SAPjiANI;_TEST

o rel="nofollow"> SAP)iANA.,fESl,..OEMO ,& SAP_REST.;\PI

MI& SAP)<S_LM ~ SAP..XS_lM_PE

C] Propelbes Sl 0 •

.J SVS

..,.

.,a svsnM ~

.I UIS

4 WllE J WORKSH~

"'""""' Figure 29 HDEV01 Connected

Create Repository Workspace Next we need to create a repository workspace on the local machine, to store custom development files_ Choose the SAP HANA DEVELOPMENT perspective, select the Hadoop developer user, and create the workspace (Figure 30). Jj $AP HAHA ~loptMnt • $AJI HAHA S.t\ldio File

Cl ·

E••

N~ltg~e

Search PfOJC'Cl Window '!). · 0 . ~ ... . 0

"""

fO Project bplorer 11) Rtpoiit0tic:5

SI ~ Sy$tem1 .,~

-

0

G

"' ~

6

llil__ C>

.

0

.

• I>

. .

Q\IJC'i(A<
rs I 4IC w HANA Admini:Stration Console 18' SAP HANA Development I G

0

v

1b H08(HO~ ~ H06(SVSl ~ CruteRepolltoryWotlup.l<e

l.;

Odtte F~" UsC'I''s Wo~acu RtfrHh

f5

l ogOff

Figure 30 Create Repository Workspace

29

1

Creating a Hadoop Test Environment

Choose a local directory to store custom fil es (Figure 31). ICI Create New Reposito.y Workspace

'

~

@)

Create Workspace Create a new workspace in a loution based on the specified workspace name and root.

SAP HANA System ~

I Add System... I

(Fh HOB (HOEVOI)

[BJ HOB (SYSTEM)

l og On..

fZI Use Default Workspac.e Worbp~c

N.jr e.

(Oefauk)

I

C:\ Users\Aron.MacOonald\hana_work\ HOB_OO_HOEVOl

Workspace Root:

Browse...

I

Worbpace location: C:\Users\Aron.MacDonald\hana_work\ HOB_OO_HOEVOl \_empty_

<1)

I

Finish

II

Can
I

Figure 31 Set Lo cal File Locat ion

The repository is now ready to use (Figure 32). D

SAP HAW. On~loptMnt. SAP HANA Studio

Fil~

Edit Nlli?"t: St¥ch

Proj«t

"""

~

WW~dow

H"p

(> · 0 · 2- 4 · 0 0 0 &

_t'j •

. . .

. . Q\lid:Accc:ss

fD Ptc,«t &fllottr

=o llli'.l o '4 v

~ Reposdonts &3 ~ Systems

e ~

ld I ~ SAPHANAAdi1Wii:!.t.r~tionConU~If: I:IEJ WHAHAOeveiOpm«~e: l

=o

• 1/, HOO (HDEWI) LJ' (Dd•uh) • (:llheni,Aron.MacOorwld'·.Mru_v.oli: HOB_ 4

(8 HDE\'01

lJ

l!! u HD8 (SYSTEM)

Figure 32 Rep ository Defined

Create Hadoop Examples Project

Next create a SAP HANA XS Project, under the Project explorer tab, to store all the custom files (Figure 33). Choose the appropriate project, in this case XS PROJECT, under APPLICATION DEVELOPMENT (Figure 34).

30

I

1

Creat ing a Hadoop Test Environment

iCi SAJ>HANA........,... • w -

N·

--- --

--

Filt Edit N.wg,ct 5(.,(1\ Ptoj«t Run

--

--

---

ol> · O · ~ ,H' · Cl OG S' ·

• t;>

-

C> .

QuickAccm

fO Pn:,t
-----

-------------------------------

--

Wll'tdow Http

=

C

8 ~

~

f{ll ~t.q>osconts ~ Syltcm~

..,

Now

a! I 4S SAPHANAAdn-.n~tutionConwlt fO'SAP HANAI?we!opmeM: I

:

("j Proj«t...

''""

"

Refruh

£•• Show In

AII·Shifi:·VI •

Copy

Stat1c Wtb ~toj«t

("j &..,.,.....

n

Ct"•N

O:h«...

Ctti•C

..........

Copy Quihf•c:d N•mt

II!

Ctri•Y

Dtliftt

......... ....'"" ........ t'

rs

Rdruh

Figure 33 Add a New Project to Store Custom Java Code

lei New Project

13 liiiii3iilj

Select a wizard

=~

r

Create an XS proj«t

-

-l.

Wizards:

I

type fifttr text ~

G). General t> ((p Eclipst Modl!ling Framework 1> (d. Java ~ ~

JavaScript

fat, Plug-in Development • ~ SAP HANA 1>

• ca. Application Development

& XSProject

~ ~ Web ~

C!!, Xtext

~ I#

<1>

Examples

I

Next>

IL

F•nish

I

Cancel

I

Figure 34 Choose XS Proj ect

Input the project name "HadoopExamples" (Figure 35).

31

1

Creating a Hadoop Test Environment

XS Project

Create an XS project. You can share your pcoject or save it lo.c:ally.

0

Share project in SAP repository

Project name Hldoopb:ample~ Use dtfault loc.hon 4t,.;~r

8roW1-~-

C:\Apps\ GBAMHB\HANA.WS..918K2\HodoopExamples

Worting sets

C) Add project to working sets 1.

q

S.l«t...

;;~

Noxt>

Figure 35 HadoopExamples

Add the project as a subpackage in the SAP HANA repository by checking the ADD PROJECT FOLDER AS SUBPACKAGE box. Then click NEXT (Figure 36) .

•

lei N.,. XS Project

·I El ~

Shart Projt
E7 -

1Rt:pository Worlupaces.: li'iD.iouft) • C\U•.,•\A•on.MocDonold\h
<

'"

Re-positoty Package:

IAdd Wookspoct•.• l

'

I

HOEVOl.HadoopExampiH

Browse...

I

~ Add Project Folder u Subpackage

Ntw Project loution:

C:\U~trs\Aion.MacOontJid\hant1_work\HOB_OO_HOEVOl\_tmpty_\ H

1-

(1)

I

< Baclc

II

Noxt>

I

Fm1sh

I

Caned

I

Figure 36 Add Subpackage

32

1

Creating a Hadoop Test Environment

1

Now we can create the project by clicking FINISH (Figure 37). I @J ~

I

~ New XS Project

Create Objects You ciJn choose or enter a name for objects to be created in your project. All obj~cts chosen ar~ cr~ated and activa t ~ automatically upon creation of the

a -

Common Objects

..,.

Schema

Enter r

DOL

After creating a schema, enter a name to create a DOL

XSJavaScript Entt

1e to ere. tea sc

n

.. '

ocr

.hdbschema hdbdd

~~( I~

.xsjs

Access Objects XS Application Access (.xs.ccess)

0 0

XS Application Descriptor (.xsapp)

I

<7>

I

< Back

I

Next>

Finish

II

I

Cancel

Figure 37 Create Access Objects

The new project should be visible in the PROJECT EXPLORER tab (Figure 38). il SAP HANA ~mt.nt • C:\Windows\sy$tem32\SQl Con.501c 2 S)$tem: H08 Host 52.10.90.US lnst•ncc: 00 Conn«ted U~er: HOE'Y01Sf$tc:m Uwge Custom Syste~. File

Ecfrt

Navigate

. r:j •

~arc h

Proje
f!. · O ·

~

, .
He-lp

Window

e. .

• o;;> ¢> .

4

Q H.sdoop&les IHOB 1HOEV01 11o

=

0

8 ~

v

'I<> Systtms

HOE'M .Hadoopfumplc

M

JavaSc:ript Resources It\ SAP HANA System library

lil liioiali(

.

Quick Access

£0 Projtc:t Exploftr 13 !ill R<pos~orits

"'

(I! I W SAP HANA Administration Console:

IEJ SAP HANA Dcvc:lopment. l

=

a:! H08 • SQL Con~lt 2 I3

HOB (HDEVOl) S2l0.!10l3S OO

":.I O I !i

0

. . .

"' SQl

m. .xsaccess

1!/, .mpp

Figure 38 Project Created

33

Consuming Hadoop Data with in SAP HANA Smart Data Access

Consuming Hadoop Data within SAP HANA

2

Smart Data Access SAP HANA Smart Data Access (SDA) is a virtualization technique that enables SAP HANA to access Hadoop data (and other remote data) as if it were local tables (i.e., without copying data into SAP HANA). This section guides you through the steps in using SDA to give SAP HANA access to Hadoop data.

2.1

Setting up SAP HANA Smart Data Access

Setting up SAP HANA Smart Data Access requires that the necessary OCBC drivers be downloaded and installed on your SAP HANA system. In this section the following drivers will be installed: » Hive » Spark » Impala (Cloudera distributions only) The latest drivers and full installation steps can be found at: http://www.simba. com/connectors http://www.cloudera.com/content/cloudera/en/downloads/connectors/ impala/odbc/impala-odbc-v2-5-28.html Note The Simba drivers have a 30-day f ree trial.

The following are the simplified installation steps for SAP HANA.

34

I

2

Consuming Hadoop Data within SAP HANA Smart Data Access

Download ODBC Drivers

Log onto SAP HANA Linux command as root user, and execute the following statements to install the drivers: mkdir download ##HIVE ODBC Drivers cd $HOM E/download wget http ://www . simba . com/wp-content/uploads/2014/09/SimbaHiveODBC_ Linux-1 . 4 . 13 . zip unzip Si mbaHiveODBC_Linux- l . 4 . 13 . zip -d /opt cd /opt gunzip -d SimbaHiveODBCDr i ver_L inux -1 .4.1 3 . 1013 . tar . gz tar -xvf SimbaHiveODBCOr i ver_Linux- 1. 4 . 13 .1013 . tar

1/ff SPARK Ori vers cd $HOM E/download wget http : //www . simba . com/wp - content/uploads/2015/01/SimbaSparkODBC_ Linux-l . 0 . 4 . zip unzip SimbaSparkODBC_Linux-1 . 0 . 4. zip -d /opt cd /opt gunzip -d "SimbaSparkOOBC_l . 0 . 4. tar . gz " tar -xvf "SimbaSparkODBC_1 . 0 . 4 . tar " ## IMPALA Drivers cd $HOM E/download Wget https : //downloads . cloudera . com/connectors/impala_odbc_2 . 5. 28 . 1008/ Linux/SLES11/Clouderalmpa l a0DBC-2 . 5. 28.1008·1.x86_64 . rpm zypper in stall ClouderalmpalaODBC - 2. 5. 28 . 1008· l . x86_64 . rpm ##Check Installed ls /opt!simba ls /opt/cloudera Listi ng 3 Linus ODBC Drive r Installation Steps

The /opt/ directory should contain the subdirectories hiveodbc, sparkodbc and impalaodbc, shown in Figure 39. -"'

~-

"'_--~

-~

1=..-

__

0

~ J-~

Fi gure 39 Drivers Installed

35

I

2

Consuming Hadoop Data wi t hin SAP HANA Smart Data Access

Configure ODBC Drivers ini Files

Log in to SAP HANA as the root user, switch to the hdbadm user, an d then edit the example odbc.ini files using the following Linux statements: su cd vi vi vi

- hdba dm $HOME /opt/s imba/hiveodbc/l i b/64/simba .h iveodbc .i ni /opt/simba/sparkodbc/lib/64/simba . sparkodbc . ini /opt/cloudera/impalaodbc/lib/64/cloudera . i mpalaodbc . ini

Ensure the following are true for each ini file:

» Diver encoding is set to UTF-16 » Error message path is prefixed with /opt

» iODBC driver is commented out in favor of unixODBC For example they need to be generically set as: DriverManagerEncoding=UTF -16 Erro r MessagesPath=/opt//ErrorMessages/ #Gener i c ODBCinstL i b # i OOBC #ODBC!nstlib= # SimbaOM I unixODBC ODBCi nst li b= Listin g 4 Generic ODBC in i File Settings

For the Simba ODBC drivers it is necessary to store the .lie files in the hdbadm $HOME directory (Figure 40). The .lie files are sent via email when signing up for the trial.

Fi gure 40 Simba l ie Files Installed

J

2

Consuming Hadoop Data wi t hin SAP HANA Smart Data Access

Define Data Source Names

In this step we will configure the .odbc.ini file and create three data source names (DSNs):

» hive1 » spark1 » impala1 Log in to SAP HANA Linux as root then execute the following statements to edit each main ODBC ini file: su - hdbadm cd $HOME

vi $HOMEI . odbc . ini

Edit .odbc.ini as follows: [hi vel) Oriver=/opt/simba/hiveodbc/lib/64/libsimbahiveodbc64 . so Host~ Port=l OOOO Hi veServerType=2 Auth l~ech~2

[sparkl) Driver~/opt/simba/sparkodbc/lib/64/libsimbasparkodbc64 . so

HOST= PORT=lOOOl SparkServerType=3 Auth l~ech=3 UID~hive

PWO=hive ### SPARKSOL & HANA on l y support data from single schema at the moment ### create a new database in HIVE, avoid the default Schema=sparkhana [ i mpalal ] Driver• /opt/cloudera/impa la odbc/lib/64/libc louderaimpalaodbc64 .so HOST= PORT=21050 Listing 5 .odbc.ini DSN Settings

37

I

2

Consuming Hadoop Data within SAP HANA Smart Data Access

To enable the DSN settings to be picked up after a restart, the following file needs to be created: vi $HOME/ . custome r . sh

Ensure it has the following contents: export OOBCINI=$HOME/ . odbc . in i export SIMBASPARKINI=/opt/simba/sparkodbc/l i b/64/s i mba . sparkodbc .i ni expo rt CLOUOERAIMPALAINI=/opt/cloudera/impa l aodbc/l i b/64/ cloudera .i mpalaodbc . ini expor t OOBC!nstlib=/usr/l i b64/l i bodbcinst . so expo rt LO_LIBRARY_PATH=$LO_LI BRARY_PATH : /usr/ l oca l /lib expo r t LO_L IBRARY_PATH=$LO_L IBRARY_PATH : /opt/s imba /impa l aodbc/l i b/64 Listing 6 Define .customer.sh

After making changes, SAP HANA may need to be restarted. Execute the following to restart, if in doubt: . /HOB stop . /HOB start Activate Spark SQL on Hive Thrift Server 2 (Optional)

Spark SQL is a recent addition to Hadoop's list of projects. Spark is Hadoop's in-memory engine. Spark SQL enables data to be loaded into Hadoop memory for faster query handling. Spark is fast becoming the

most popular project for analytic reporting and predictive analysis in Hadoop. In future releases of SAP HANA (SPS10 and beyond) Spark will become the main integration point (see Section 5 for more information). Note The following steps are applicable for CDH 5.3, and may differ for later versions and alternative distributions of Hadoop as this area evolves.

Log in to the Name Node of the Hadoop cluster and execute the following Linux commands:

I

2

Consuming Hadoop Data wi thin SAP HANA Smart Data Access

sudo su cp /var/ run /cloudera-scm-agent/process/30-impa l a-IMPALAO/hive-conf/ hive -site . xml /etc/spar k/conf chmod +r /etc/spa r k/conf/h ive-site . xml vi /opt/cloudera/parce l s/CDH/ lib/spark/bi n/compute -classpath .s h

Edit the spark compute-classpath.sh and add the Hive library to the class path as follows: #Bu ild up classpath CLASSPATH="$ SPARK_CLASSPATH " CLASSPA TH=" $CLASS PATH : SSPARK_SUB~H T_CLASSPATH " CLASS PATH="$CLASSPATH : $FWDIR/conf " ##Add HIVE classpat h: CLASSPATH="$CLASSPATH: /op t /clouder a/parcels/COH/l ib/ hive/ l ib/* "

Now shut down the Hive thrift server and restart with the port 10001 , for Spark SQL to use. SPARK_HOME=/opt/cloudera/parcels/COH/lib/spark/ expo r t JAVA_HOME=/usr/java/latest cd SSPARK_HOME . /sb i n/stop-thriftserver . sh sleep 3 . /sb i n/start-t hri fts erver .sh \ --master spark : //$(hostnamel : 7077 \ -- hiveconf hiv e . server2 . thrift . port• 10001 -- hiveconf hive . serv er2 .t hrift . bind . host S(hostnamel

Check that the thrift server is running on port 10001: ne t s tat -nl I grep 1000 1

You should see something similar to Figure 41: ~ +-

-..__

:- -.-

-.

-. :

"'l

-~

.:-

~:.!:!:

-

- .

-~-"'-~-

- . -

---

.-

•-

-; -

: • : •

:_ :_ ~::..

Figure 41 Spark SQL Run ning on Thrift Server

You should test that both Hiv·e and Spark SQL are accessible via the thrift server with the following beeline commands:

39

I

2

Consuming Hadoop Data with in SAP HANA Smart Data Access

##H i ve test /opt/cloude r a/parce l s/COH/ li b/spark/bin/bee l ine -u "jdbc : hive2:// $(hostnamel :l0000/defau lt hive hive " tiff Spark test /opt /cloudera/parce l s/CDH/ lib/spark/bin/bee l i ne -u "jdbc : hive2 : // $(hostname) : l0001/defau lt hive hive " Test DSN from Command Prompt

Before trying to add a remote source in SAP HANA you should check that DSNs are accessible. Log in to SAP HANA Linux as the root user, then execute: su i sql isq l i sql

hdba dm -v hi ve l -v sparkl -v impal al

Next test a basic SQL statement for each DSN (Figure 42): show tables select count(*) f rom samp l e_07

Fi gure 42 Test Connect ion and Basic SQL

2.2

Create Remote Data Sources and Virtual Tables

Now let's create the three remote data sources shown in Table 1.

40

J

2

Consuming Hadoop Data within SAP HANA Smart Data Access

Source Name

Adapter

Data Source Name

CDH _HIVE

Hadoop(ODBC)

hive1

CDH_IMPALA

Hadoop(ODBC)

impala1

CDH_SPARK

Hadoop(ODBC)

spark1

Table 1 Remote Data Sources

Note For these examples the username and password are "hive".

In SAP HANA Studio create a NEW REMOTE SOURCE (Figure 43). ~ SAP HANA Development • C:\Windows\Syst.,.,32\SQL Console 3 System: HOB Host: 52.10.90

Edit

file

C:S ·

Navigate Search

""

CD Project ...

Project Run Window Help

"tl- · 0 · 0 4 • "0 00 ,!; ~ Reposito... ?-o Systems ~

= l:l

HOB SQl Con:.ole 1

..,.

a:

HOB (HDEVOl) S2.lo.90.

• ~ HOB (HOEVOl) ~ liiP Catalog 4

· ~ <:

Quiclc Access

!f;f · I ~ il • llll @ 8 ~ ~

. .

O SQLL

C:. Content G:io Provisioning ~ Smart Data Access • 'G;; R ;-·• -<·.-·~ ~ tTl 1:3 Open SQLConsole

~ ~

Secu Q.

{fj; HOB (SY

£]

N!W Remote Source...

Rtfr6h

FS

I

Fi gure 43 New Remote Source

Enter SOURCE NAME, ADAPTER NAME, DSN, USER NAME, and PASSWORD for each, as shown in Figure 44. Once you have saved all three they should appear in the Remote Sources list. Next we will create three virtual tables for each of the Remote Sources, using sample_07 (Figure 45).

41

J

2

Consuming Hadoop Data within SAP HANA Smart Data Access

1m: SAP MANA Devdop~t • Remott File £dit

Sc~rch

N4Vigotc

[j · ,..~

SourcH Sysum: HOB Host: S.2.10.90.135lrutanc:t: 00 Conne-cted User: HDEVOlSysttm Usage-: Custom System· SAP HANA Studio

Proj«t Run

Window

f1. · 0 · 1:!> 4' • ' C)C> ~ Syttems !:!

R•

'!
llfi · l eJ 11 · 1111

• lfH HOB (HOOOl)

~

Cl

.:3 '4

... Catalog

@)

~

i!i •

&. ·HOB · New Rtmote Source 1

;

IJt I ~ SAP HANAAdmink.llationConsolc IB SAP HANAOcvdopmtnt I~J Java

Zl a:3 "HOB· SQL Console 1

Gl H08 - SQL Console 3

0

Cl

HOB (HOEYOl) S2.10.90J3SOO Adapt~ Name:

Source Nome:

U. Content

c:.

H(:lp

Qu•ckAcc~s

I ''· P

2

COH_HIV~

Source location: ""' inde:uervtr

HAOOOP (008()

,. . . Provisioning

&g Smart Data Access "' Q. Rtmott Sources ~

ill

Property name

, Q. HAOOOP. SOURCE,COH W. Security

Voluc

"' Conoe
HOB (SYSTEM)

Connection Mode • Otta Source Neme: Data Source Name • hivel "' CredC'fttiah: Credtnti.als Mode • Technlcal User Ustr Name •

hiv e!

Password •

Cl 0 tf"rors. 2 warnings.. 0 othm.. Rtsc

Description ~

Searc.h

A Warnings (2 itM'Is)

t l,

Fi gure 44 Sample Remote Source Settings

[Q ProjKt bplott-~

f01 Repo~itotit>S Jo Syuems

£3

, :t1! HOB (HOEVDI) a. 1ii.t C•t•log W. Conttnt

"' .:to Pf'O\·isioning ~ SmattO.taAcceu "' • Remo~e Sour<~

,. (l. COH_I·UVE "' 0 HIVE "' 8. default ; fU> sa~~

1::J ,.) lt! 4

CL COt~JM~~~ ::!

• 0

Optn SQL Comole Add n Virtu.t

-

T~le

IMPAlA :-' ~'-R_<~_"_'•

___________r_s_.

"' & def1ult f:) wmple_07

F S

sampae,..06

spa rkhana

• ~ COI{.SPARK

• 0 SP.WC £ dd•ult !!. SPARJ(_SVSTEM S spaMiana

.~ ~Ann£2 ;.~~t;O~nl. Fi gure 45 Add V irtual Table

42

Consuming Hadoop Data within SAP HANA Smart Data Access

Note For CDH _SPARK use t he copy of sample_07 under sparkhana database.

Create virtual tables in the HDEV01 Schema (Figure 46). Ia Create Virtual Table

I

I liD liooiJ.II

Create Virtua l Table

CD

Enter table name: and select target schema

Table Name:

CDH_HIVE_sample_07

S
IHDEVOl

· I

(1)

I

Create

II

Cancel

I

Figure 46 Set Virtual Table Name and Schema

The three virtual tables should appear under HDEV01 schema (Figure 4 7). £D Project bplorer 4

ll}) Repositories

~" Systems E:!

~ill HOB (HDEVOl) • B Catalog ~ Public Synonyms ~ ~ HANA_XS_BASE

e

•

a HOEVOJ. ~

e, Column Views

~

e, EPM Models

~

11:. EPM Query Sources

~

a ~ a

Functions Indexes

t> "

Procedures

~ I<;

Sequences

~ ~ Synonyms

• _. Tables 1:61 CDH_HNE_>ample_07 CDH_HIVE 1:61 CDH_IMPALA_sample_07 CDH_IMPALA liJ, CDH_SPARK_sample_07 COH_SPARK

Fi gure 47 Three Virtual Tables

43

I

2

Consuming Hadoop Data within SAP HANA Smart Data Access

2.3

Execute Simple Queries

With the virtual tables created we can now test whether they are returning results. Execute the basic SQL SELECT statement (Figure 48). select* from "HDEVOl "." CDH_HIVE_sample_07" : select* from "HDEVOl "." CDH_IMPALA_sample_07 " : selec t * from "HDEVOl " . "CDH_SPARK_sample_07 ": HOB (HDEVOl) 52.10.90135 00 IIll SQL

(f: R•sult l~ R•~uk

select • froo '"HDEV91"' . •cott_SPARK_sample_e7"

,.

cod ~

description

00·0000 11·0000 11-1011 11-1021 11-1031 11-2011 11·2021

All Occupations

1 1 -~ 11

ll.dmi.nictum..-. VIVi ,.. - c . _ ]:tQ___¥10

total_emp

s.alary

134,354,250 Management occupations 6,003.930 Chid executiv~ 299,160 General and operations ... 1,655,410 61,110 legislators Advertising and p romoti... 36,300 Marketing managers 165,240 322,170 11·2022 Sales managers 47,210 11·2031 Public reJations managers

40,690 96.150 151,370 103,780 33,880 91.,100 113,400 106,790 97,170 _J£,~7fl

State&ent ' sel ect • fro. "HDEV91"' . "COH _HIVE!sa~r.p le_e7 • '

s uccessfully executed in 20 . 507 seconds i {U rver processing ti.Mie: 19. 367 seconds) Fetched 823 ro...·(s) i n 4.646 secondS ( s ef ver processi ng tin:e: e ms 517 }lS ) Stateunt ' sel ect • free" oevet• , "cOH_~LA sample_e7•• s uccessfully executed in 602 s 284 .,as (server processing tice : 347 ms 506 v..s) Fetched 823 row( s) i n 982 111s 255 IJ-S (server processi ng t i.n:e: e ra.s 446 }IS) State.:ent · sel ect • f roe " oeve ~ "CDH SeARK. sample 07"· s uccessfully executed in l _Ll_3_8-....$econds- (se r.;er proCessing tir.e : 909 ms 369 v.s) Duration of 3 statements : 22 . 248 seconds Fetched 823 row(s) in 826 MS 699 J.lS (server processing time: 6 I'IS 529 }IS)

Figure 48 Run times for Basic SQL

Now run a slightly more complex SQL statement (Figure 49): select sum( "salary " ) from "HDEV0l "." CDH_HIVE_sample_07 " ; select sum( "salary ") from "HDEV0l "." CDH_!MPALA_sample_07 ": select sum( "salary " ) from "HDEVOl ". "CDH_SPARK_samp l e_07 " :

44

2

Consuming Hadoop Dat a within SAP HANA Smart Data Access

HOB (HDEVOl) 52J0.90J35 00

I::e

IE! SQL Resu~ Res~ ~~ select sum( "salary" ) from "HDEV91" . "CDH_SPARK_ sample_97"

_@

SUM(solary)

39,282.210

l Stateme nt · select sum(" salar y" ) from "HDEV91" . "CDH_ HIVE_ s ample_07" • successful l y execute d i n 28 . 143 seconds ( server processing time: 27 . 766 seconds ) Fetche d 1 r ow(s ) i n 0 ms 14 ~s (se rve r p r ocessing time : 9 ms 9 ~s ) Statenoe nt ' select sum(" salary "} f r om "HDEV01" . "CDH-t'' IPAl A.sample_07 " ' succe ssfully executed in 758 O'S) 969 ~s (server processi ng ti11e : 503 ms 398 Fetche d 1 r ow( s ) i n 0 IllS 19 flS ~ server pr<>cessine time: e ms 0 flS)

~s}

Stateme nt ' sele ct sum(" sal a r y· ) from "ljDEV01 •. " CDH_SPARK_s a~ple_97" ' successful l y executed i n L411 secondsj ( serve r pr<~cessing t ime : 1.113 seconds ) Duration of 3 staten:ents: 30 . 320 seconds f etche d 1 row( s ) i n 0 ms 19 IJS ( serve r pr ocessi ng time : 0 ms 0 IJS)

I

Figure 49 SQL Runtimes

Figure 49 shows the results on a tiny single-node Hadoop cluster; Impala and Spark significantly outperform Hive. Note As of SPS07, SDA with Hive has supported a hint that instructs Hadoop to store/cache the resu lts of the hive query so subsequent SQL calls read the cache, rather than re-executing a Hive MapReduce. An example would be:

select sum( "salary ") from "HDEVO l" . "CDH_HIV E_samp l e_07 " with hi nt (USE_REMOTE_CACHE) .

Note In newer releases of Hadoop, Hive can also be configu red to use Tez, which promises significantly faster performance over existing Hive runtimes, but is not available on CDH to bench mark yet. With the rising importance of Spark SOL as t he new de facto standard for analytics on Hadoop, it is not clear whether Hive on Tez will be a common source for SDA.

45

I

2

Consuming Hadoop Data wi t hin SAP HANA Smart Data Access

2.4

Example SDA Performance with Three Billion Rows

The earlier steps used very small files. Small files are great for providing simple examples but Hadoop is designed for big data. The following steps illustrate how to create much larger datasets for testing purposes. Fortunately TPC-DS provides tools for generating a large dataset for benchmarking. Cloudera has created simplified steps for generating a TPC-DS in Hadoop, and also for converting to Parquet file format for optimized reads. https://github.com/cloudera/impala-tpcds-kit The following demonstrates how the remote data source and the Hadoop file type (text file vs Parquet[Column store] file) greatly impacts runtimes. Create 12-Node Clusters and 3 Billion Records of Data

First, re-execute the steps in Section 1.3 to create a larger 12-node cluster using AWS instance type r3 .2xlarge (60Gb Mem and 8 vCPU per node). Next the SAP HANA DSN hostnames, defined in Section 2.1, were repointed to the new cluster. Then log in to the NameNode and execute the following statements: ed $HOME sudo yum -y insta ll gee make f l ex bison byacc git git clone https : //g ithub . com/grahn/tpcds - kit . gi t ed tpcds - kit/too l s ma ke -f Makefile . s uite cd $HOME git clone https : //github . com/cloudera/impa la - t pcds - kit cd impala-tpcds-kit s udo -u hdfs hdfs dfs -mkdir /user/SUSER sudo -u hdfs hdfs dfs -chown $USER /user/SUSER sudo -u hdfs hd fs dfs -chmod 777 /user/SUSE R List ing 7 Statements to Prepare for TPC-DS

I

2

Consuming Hadoop Data within SAP HANA Smart Data Access

Then copy your AWS .PEM file, used to create the cluster, to the $HOME directory, using an FTP tool , and execute the following statement: chmod 400 $HOME/HAOOOP _MJS_VPC . pem

Edit the tpcds-env.sh file for 3 billion records, for a 12-node cluster. vi tpcds · env . s h ##set f ollow ing in tpcds -env. sh export TPCOS_SCALE_FACTOR=l OOO expor t OSOGEN_NOOES=l2 expor t OSOGEN_THREAOS_PER_NODE=8

Edit dn. txt with the internal n ode names created by Cloudera Director. Note If using AWS the following files may also need to be updated to use the PEM file and ec2-user to enable copying files across nodes in your cluster: ..,. push-bits.sh ..,. run -gen-facts.sh ..,. set-nodenum.sh Add the follow statement in the above Shell Scripts files, as appropriate:

- i $HOME/HAOOOP_AWS_VPC . pem ec2 - user@

Next, run the following statements to copy the code across nodes and generate the data: . /push-b i ts .sh . /se t -nodenum. s h . /hdfs -mkd irs . s h . /gen-d ims .sh . /run -gen - facts . s h

Then run the following statements to create Parquet format tables, and populate them: . /impala-create-externa l -tab l es . sh . / i mpala · l oad · dims . sh . /impala- l oad-store_sa l es .sh

47

I

2

Consuming Hadoop Data within SAP HANA Smart Data Access

Note Apache Parquet is a columnar storage format available to any project in the Hadoop ecosystem, regard less of the choice of data processing framework, data model, or programming language.

Test Runtimes

Once the data has been loaded to Hadoop, refresh the remote source tables in SAP HANA (Figure 50). £D Project Expl...

l{j) Re.positories

't-o Systems !:!

= El

® ~ l eJ u ~ D rB a ~ ~

&

Remot~ Sources

• Q.

COH_HM

• 0 ~ ~

HIVE

& S

ddault tpcds_parquet

~ customer

0 custome:r_addre:ss I:J customer_demographics f":l date_dim C et_cus-tome.r ~ et_custome.r_addrtss ~ et_customer_demographics

rJ

et_date,dim

~ et_household_de.mographics

~ et_inventory

!:! et_item C et_promotion I:J et_st ore tLL'i et_store_sales

e!J et_time_dim f::l household_demographics El item

5

[:l promotion

!::l store Fi store_sales l'l time..dim ~ b

Q. Q.

COHJMPALA COH_SPARK

r---~·wft~~~·~Qw O~ O~ D~ CO ~l~ JOw C<~C~Q~ U~-----, Ill ~ t

( (

Figure 50 TPC-DS Tables Are Listed

Next create virtual tables for the store_sales [parquet format] and et_ store_sales [table delimited] tables (Figure 51).

2

Consuming Hadoop Data within SAP HANA Smart Data Access

[Q Proje
Y.o Systems !:!

= 1:1

® · I G!! U · tm !S CJ ~ .,.

• [P'U HOB (HOEVOl) • & Catalog flo

Q. Public S-ynonyms

~ ~ HANA_XS.BASE

• J:l HOEVOl t> ~ Column Views ~ Iii;

EPM Models

~

ro. EPM Query Sources

t>

do Fundions

~

r;;. lndtxes

t> . , Procedures 1> ; .

Sequences

~ "' Synonyms • lii; lablos l;al COH_H!VE_et_store_sales CDH.HIVE ::i) COH_H!VE_sample_07 COH_HNE

I;& COH.HIVE_Itorc.wlc$ COH.HNE i:.9l COH_IMPALA_et_store_sales COHJMPALA ::i) COH_IMPALA_sample_07 COH_IMPALA !;{jl COH_IMPALA.store.sales COH_IMPALA

l;al

COH.SPARK.sample.07 COH_$PARK

mJ MR_RESUl lS Figure 51 Virtual Tables for Store Sales Created

The Store Sales table has approximately 3 billion records. Finally test SDA fo r each of the Remote Sources by executing the following sample SQL statements: select count(*) from "HOEVOl" . "_store_sales ":

select "ss_store_sk ", sum( "ss_net_profit ") sum_agg from "HDEVOl "." _store_sales " where ("ss_so l d_date_sk " bet~1een 2451149 or "ss_sold_date_s k" beb1een 24515 14 or "ss_sold_date_sk " between 2451880 or "ss_sold_date_sk " beh1een 2452245 or "ss sold_date_sk " beh1een 24526 10 group by "ss_store_sk " order by "ss_store_sk " limit 100;

et_store_sales and 2451179 and 2451544 and 2451910 and 2452275 and 2452640)

Listing 8 Sample SQL Statements

49

I

2

Developing and Execut ing Hadoop Map Reduce Tasks from SAP HANA

The runtimes of the queries a.re listed in Table 2 . Text File

Parquet

et_store_sales

store_sales

Simple SQL

25 sec

12 sec

More complex SQL

45 sec

7 sec

Si mple SQL

10 min

26 sec

More complex SQL

3.3 min

22 sec

Impala

Spark SQL

Tab le 2 SQL Query Runtimes

As you can see, there is a significant difference in runtimes for a text file versus a Parquet file. This demonstrates the importance of using Parquet file formats in HDFS for SDA to get optimal runtimes.

3

Developing and Executing Hadoop MapReduce Tasks from SAP HANA

MapReduce is the main programming model within Hadoop. This section guides the reader through the steps for creating and executing a custom MapReduce task from within SAP HANA.

3.1

Configuring Environments

In order to build and launch MapReduce jobs from SAP HANA it is necessary to prepare both your SAP HANA system and Hadoop system to enable smooth integration between them.

50

I

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

Prepare Hadoop

As per the SAP HANA Adminstration Guide, Section 6.1 .2.4 (http:// help.sap.com/hana/sap _hana_administration_guide_en. pdf) there are several prerequsite Java libraries needed by the HANA controller before it can execute. It is necessary to download the following open source libraries and copy them to the HDFS:

» solr-commons-csv-3. 5.O.jar » joda-time-2.3.jar Via Linux command line, on the NameNode of your Hadoop cluster, execute the following commands: wget http : //central . maven . org/maven2/joda - time/joda -time/2 . 3/joda -time 2. 3 . jar hdfs dfs -put joda-time-2 .3 . jar /sap/hana/mapred/ l ib wget http : //central . maven . org/maven2/org/apache/solr/solr -commons -csv/ 3 . 5. 0/solr-commons-csv-3 . 5. 0 . jar hdfs dfs -put solr-commons-csv-3 . 5. 0. jar /sap/ hana/mapred/lib Listing 9 Commands Requ ired for Jar Files

You also need to ensure the Hive Metastore server is running. On a Cloudera Hadoop cluster you may need to enable the server. In Cloudera Manager (refer to Section 1.3) access the Hive Instance settings and add a role instance (Figure 52): Assign a cluster node to be used as the WebHCAT server (Figure 53).

51

I

3

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

cloudera·manager

~ Support -

fie

Home Cluster.> .. Hosts D1agnosbcs • Aud1ts Charts •

Backup •

.L admln -

Admm1strat1on •

C5-Simo!e-AWS

HIVE-1

Stotus

l nstonces

Conllgurotion

Commonds

Chorts ~lbr.uy

Audits

Role Instances

0

Ill ACtions •

MO Role Instances

Filters

Ill ACtions tor Selecteo •

Role Groups

25

Display

• Enttles

v SEAR CH

• Role Type

0

• sme

NIA

Gateway

STA TUS

v

• Role Group

IQ· 112 -~ 1 -2Q-1~ !.!~-\~~~-

Gateway Oetaun Group

2 ~QmQyt ~ int~rnSll

0

0 None

0

• Host

t::l~~ M~i~lgrt

~

0

z~21DQ~I~ iDi~mal

Hive Meta st ore s erver o eraun GrCiup

lg-172-~1 -2Q-1~ !.!$:·\~~~ -

HlveServer2 C!etaun Group

Starteo

2 '2au;um: I!Utulat

Good He•llh !:::!~~~r2

) DECOMM ISS IONEO

!Q-l ZZ-al -22-l~ !J~-~~~ Started

) MAl NTE NANCE MODE

) RACK RO LE GROU P

Figure 52 Click Add Role Instance to Add WebHCAT Server

cloudera·manager

Support - .L admln -

Add Role Instances to HIVE-1 Customize Role Assignments You can specl1y the role assignments tor your new roles here You can also view the role assignments by host.

AM:@fiji

Hive ~-Ns

Htve Metastore Server

setect hosts

1(

1

'1'11-Cs w ebHCat server

x::

1 NeVI

ip·172·3 1·20 -158.US·WeS1·2.comp

H Back

HU

Htveserver2

5eiect hOSts

1(

1

G Gatew·ay

1(

1

select hosts

DfJEJ

Figure 53 Assign Host to W ebHCAT Server

52

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

3

J

The WebHCAT server should now be running (Figure 54). Ill ACtiOns ror seaected • • Role Type

0 0

0

Display

... St3tt

25

• Role Group

• Host

Gateway

N/A

!p-172-31-20-1~8

ys-west-2 compute IOJemal us -v~est.2

compute internal

Gateway oerault Group

H ive MetastOfe Server

Startecl

ip-172-31 -20- 158

Hivesetyer2

Sta rted

ip-172-31-20-158 us-\•1@$1-2 tompute lntemal

Hiveserver.! oerault Group

W<;;bHcat server

Started

jp-112-31-20-158 us-y:est-2 tommue internal

WeDHCat server oeraun Group

Hive Metastore server Default Group

Figure 54 WebHCAT Server Running

Note A productive Hadoop cluster may have more restrictive access requiremen ts so the following settings may need t o be finet uned furt her to meet requirements.

Depending on your existing duster configuration the following settings may also be needed: » HDFS hdf s -site . xml df s . permissions . enab l ed

• Entries

f al se

core -s ite . xml hadoop . proxyuser . hi ve . hosts = * hadoop. proxyuser . hive . groups = *

» HIVE hi ve -site . xml hive .se rv er 2. ena ble .i mpe rsonatio n = true webhcat- si te . xml (Cloude ra On ly : Sa fe t y Valve )

temple t on . li bjars /opt/cloudera / parcels/C DH/ja r s/zoo keeper -3 . 4 . 5cdh5 . 3. 4 . ja r< / val ue> <desc ription>Jars t o add to t he classpa th .

53

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

Note

After these settings have been made then the Hadoop Services will need to be restarted to take effect. Prepare SAP HANA

In SAP HANA SPS09 , Hadoop Map Reduce Integration is an optional addon feature, not included as part of the base installation. First you need to download the add-on from the SAP Software Download Center (Figure 55): https://support.sap.com/software.html. SAP SOFTWARE DOWNLOAD CENTER SEARCH RESl1TS IN SN> SOffiVAAE DOWNLOAD CENTER

HAN& HAQQQPCON!BOllfB I 9

t'..tll:~flal'
C (4 HAI!!AHA{)()QfiOJ 809

l · Zoo
P~t "

Figure 55 SAP Service Market Pl ace Download Center

Download and extract the ZIP file on your local machine. The delivery unit then needs to be imported into SAP HANA.

Go the SAP HANA application Lifecyle Management tool. Under the Products/Delivery Units menu import the delivery unit (Figure 56) (http:/ l/sap/hana/xs/lm/). Once successfully imported it should be visible under the list of delivery units installed (Figure 57).

54

J

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

Figure 56 Import Delivery Unit

&' HAllA AJ)pll(.auon trrecycte uanagement HOME

PROOUCTS

PrOducts

+

[E] [E] [Ei] [5) [E) [E] (E:i) [Ei] (E) [E] [E] [Ei] [§] (5] [Ei]

[E] [E] [Ei] [E] [Ei]

TRANSPORT

Delivery Untts

Create

R Sa-.re

SETIINGS

MANAGE SYSTEIA

Packages

tii Oetete 0,

! 1moon y

HANA_HAOOOP_CnR (sap.com)

~

Expon

Details N3me 1-«ANA._HAOOOP_Clt.R

HANA,.IDE_CORE (n o.eom) HANA..STUOIOJID8 tsao.coml

~Mor

HANA_ TA_CONFIG (sap com)

Oescriphon

Hadoop Controller

s ap.com

Version

SAP Sul)port Info

1.9 .1

HANA,.TEST_TOOlS(sap.eom) HANA_UI_I>ITEGRATION_OONTENl

HAW.._UI_ItlT~GAATION_SVC (sap. HANA..XS_BASE (sao com)

HANA..)(S_OBUll.S(sap.com)

Dependencies Outgoing Oependenci&s

HANA,.XS_EOOOR (sap.COm)

Status

Incoming Oeptndendes

Delivery Unit

HANA,.XS_IOE ($30 com) HANA,.XS_Liol (sap.eom) HCO_OEMOCONTENT (U~.tom)

No data

HCO_OEI.IOCONTEr.rT_NEXT(sapciO HCO_WU (..p.eom) HOC_.ADIJlN ($ap,eom)

Assigned Packages

HOC_IOE_OORE (sap.com) HOC.J(S_8ASE (sap.com) HOC_XS_u.t (sap.oom)

I < Assign I

SAPOEVS_I
tl

[5] SAPOEVS_IOT (develoot-rs.sap.com

,.v. ""' . . .,,

1=1...

X

lkiaSSIOA

Cl\tct for Unassigned

Orional L-anguage

Name

Original language

s.ap.hanaJh3dOOP.OOt\tfOIItr

••

J

Figure 57 SAP HANA Hadoop Delivery Unit Installed

55

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

Create a Remote Data Source

Now with SAP HANA and Hadoop prepared, the next step is to create a Remote Data Source. This will enable SAP HANA to communicate with the Hadoop cluster. In order for this to work you need to ensure that both the following Hadoop services are running:

» WebHDFS: Used for accessing the HDFS (e.g., reading and writing files on the Hadoop cluster)

» WebHCAT:

Used for accessing Hadoop metadata and executing and monitoring MapReduce jobs

Run the following SQL: DROP REMOTE SOURCE Hadoop_SOURC E_CDH CASCADE ; CREATE REMOTE SOURCE Hadoop_SOURCE_CDH ADAPTER "hadoop " CONF IGURATI ON ' webhd fs_url =http : //: 50070 ;webhcat_url =http: // :50111 ' WITH CREDENTIAL TYPE ' PASSWORD ' US ING ' user=hdfs ; password=hdfs '; List ing 10 SOL to Create a Remote Data Source

Note WebHDFS is typically running on the Hadoop NameNode on PORT 50070; WebHCAT typically uses port 50111 on the node specified in the Hive configuration.

After running the SQL the Remote Source should appear in the list (Figure 58).

I

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

3

1D SAP KANA Administration File Edit

NMVig, te:

Projt
Window HeJp Qurd: A<
f!l': · E!J II · 1!1 ~ ' 1!1. HOB (HOEVOt)

~

8 '!0 "

QJ. Catalog

in HADOOP BKMR.s.ql

g! I I~

SAP HANA Admini~ttation Console

IB SAP HANA Developm ent

C 'H08 • SQL Console 1 ~

0

HOB (HDEVOl) 5U0.90.BSOO

----......, IO SQl

a.. cont tnt

DROP REKITE SOURCE HAOOOP_ SOURCE_COH CASCADE ;

, Qt Provisioning Smart Data Acc6s "' ~ Remote Sources

5f

CAfAT£ REfo'I)Te SOURCE HADOOP SOURCE CDH

ADAPTER ..h adoop"

-

-

CONfiGURAriOH 'webhdfs_url • http: //: 50078;wtbhcat_url• tlttp: I /52.24.199.97: 58111 '

CL HAOOOP_SOURCE_COH ll. HAOOOP_SOURCE.H2

WITH CREDENTIAl TYPE 'PASSWORD'

USING 'use:r•hdfs;pass-f,rd•hdfs';

a. HIV£1

fl. IMPAtAl fl. SPAIUO.

~ · ~cunty

itJ

HOB (sYSTEM)

D

Proptttit1 $:3 0 £rrotlog

Property

"' Session Auto Commit Clie:nt Oistllibution holation l evel Single Fetch for Rtsult

On On A.e:ad Committed Off

"' Systtm

;ctb<:sap://52.10!JJ....5.l20:3001S:HO£V01

Writable

Smartlnsttt

7 : 23 '

Figure 58 Hadoop Remote Sources

3.2

Execute a Simple MapReduce Job (File Read)

SAP HANA has incorporated a very simple Map Reduce job in the Hadoop Controller that allows you to r ead a delimited file stored on the HDFS and display the results in SAP HANA. Create Comma- Delimited Test Files on HDFS

Initially make a few comma-delimited sample files containing the items listed in Table 3.

57

0

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

Code

Description

Total Integer

Value Decimal

aaa

Row A

30

23.56

bbb

Row B

20

11

I

3

Table 3 Comma-Delimited Sample Files

Execute the following statements on the Hadoop NameNode to create two files with example data (file01.csv and file02.csv): echo "aaa . Row A. 30 . 23 . 56 " SimpleF il e/fileOl . csv echo "bbb , Row B, 20 , 11 . 00 " SimpleFile/file02 . csv

sudo hdfs dfs -put

/user/admi n/

sudo hd fs dfs -put

/ us er/admin /

Alternatively you could create the files using HUE. As you can see in Figure 59, once execution has been completed FILE01.csv and FILE02.csv should exist in the HDFS directory /user/admin/ SimpleFile. A

f-1 1Ue

Out:ry Ed1tors

v

Metntore Manager Workflows v

•

•

1!3 File Browser

0 Action$ "'

ft Home

user I admm

SimpleFi le

,. Name

Show

x Mo ~to trash

v

• H1story

-'

Size

..

..

J

Cl

file01.csv

21 1>,1105

Cl

file02.csv

21 1>,1105

45

User

Pennisslons

Oate

admin

drwxr·xr·x

July 09. 2015 06 36 AM

1001

adm1n

drwxr-xr-x

July 09 2015 06 51 AM

rool

admin

-t'l.'-f· · (-

July 09 2015 06 36 AM

1001

admin

. (W..(• .(-

July 09. 2015 06 51 AM

admin

• of 2 rtems

Group

e Trash

Page

Figure 59 Simple Files in HDFS

Double click on FILE01.csv to make its contents visible (Figure 60).

58

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

tC Ue

it

Query Ed1tors v

Metastore Manager

Workflows v

•

a C:C v

1!.1 File Browser ACTIONS

* Home

I user I admrn

SrmpleFrle

file01 .csv

P•ge

IVrewas Dlnary , EOrt trle

.!. oownroad

l:i View file IotabOn

Z Rerresh INFO

l ast modified

July 9 2015

User

root G roup

Fi gure 6o File Contents in HDFS

Create and Execute a SAP HANA Virtual Function

Now in SAP HANA let's create a virtual function, using the Hadoop Remote Source to read the file(s) (Listing 11). CREATE virtual FU NC T ION HOEVOl . HOFS_FI LE_REAO() RETURNS TABLE ( " code " VARCHAR(255), " description " VARCHAR(255 ), " total_INT" integer . " va l ue_OEC " decimal( l 0 . 2l l CONFIGURATION ' enable_remote_caching - true ; cache_va l idity- 3600 ; hdfs_ 1 o c a t i o n• Ius e rIa d mi nISi mp 1 e Fi 1 e ' AT Ha doop_SOURCE_COH ; listi ng

11

SOL to Create a Virtual Function

The function will now appear under the HDEV01 schema (Figure 61).

59

J

3

f)~

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

t1:;

SAP HANA A
fde f.d•t

t:faviga1e frojtd

~ndow

I

~ (~ SAP HANA Administration Consote

f.o Systems E3

0

fBl · ( !ill il · lm iS CJ ~

• !h. HOB (HOE\'01)

@I

c:J

IEJ SAP HANA Development

S •HOB • SQl Console 1 &3 1"1 HAOOOP BK MR.sql

HOB (HDEVOl) S2.!0.90.13SOO

"' . . Catalog

b Public Synonyms

0: HANA_XS.BAS£ "' _: HOEVOl Do

a. (olumn VitwS

~ &j.

EPMModtfs.

~ Qj, EPM Quf.()' Sources "' fii, Functions

['j} HOFS_fi.U!AO t> Sij. lndexes

ito Proceduns & Stqutnces C~o Qt. Synonyms t> Sit- Tables ~ &t Triggers ~ &to Views ~

Do

p.

;a iOT

.., PSA > .0:: SAPJ .;& SAP.)<S_LM_PE

- sYS

p. ~ SYSTEM

.l. t~t

_.'l UIS

A> Wn!

t1

!::felp

..description'" , • ,

successfully

ne:ce,.~ t: ed

c:J Properties &3

0

l

tOr

in 329 •s 464 J.lS (ser ver processing time: 67 ms 534 ps) • Rows Affect ed:

Lo~

@~

Proputy

Value

"' Session Auto Comm it Cli~nt Distribution bolation t~l Single frlch fOt Resutt

On On Read Committ~d

f:1

Off

"' System Smart ln~rt

Fi gure 61 Virtual Function Appears in HDEV01 Schema

Next execute the SAP HANA SQL to launch the Simple Map reduce task using the following statement: SELECT * from HDEVO l . HD FS_FILE_READ()

The results shown in Figure 62 should appear.

60

e

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

~ SAP HANA Administrarion Console • C:\Wi:ndows\syste:ml2\SQl Con~ 1 System H08 Host S2.10.90.US lnstan<~ 00 ConnKted U~ HOEVOlSystem Us.gr. Cus.. c

file fdit f!!Mgate f rojKt

~ndow

Quici:.Acces ~ Systems

!:1

0

!:i! U · o r.Ei s ~

®·

• 1l. HOB (HOEVOl) a. Cat•log

S IIWSAP HANA.AdministrationConsole IEJ SAPHANAO~tlopment =

C •HOB • SQL Console 1 S3: M HAOOOP BK MR.sql

HOB (HDEVOl)

a

4

~ Iii- Pvbi!C Synonyms • ,a! HANA)(S_BASE

SQl

...

code

~ EPMMoclc:ls

... EPM Qut.ry Sources

bbb

G Function5

0

52.10.90~3500

~ Rtsull

SELECT • f ....

,G HOE\'01 ... Column View$

4

@) ~

Jjelp

HOMt.HO•s.•ne_•oool

description

tot41jNT value;_OEC

Row A

30

23.56

RowS

20

11

lj

HOFS_ALE._R£AD liio lndetts .-. Procedures 1iit> S«!uencts

Q> Syl'lonyms Qt. Tables

.

iit Tri99"'

W VitwS -& lOT ~ 4 SAP_HANA_AOMIN ;(; SAP_HANA_IlEMO J,: SAP_HANA_EPM.NEXT ~ SAP.HANA..TEST all SAP.HANA_TEST.OEMO

.I SAP.REST_API Jl SAP)CS. LM

successfully execut~ in 6se IllS 41 J.lS (ser n r prc>cusing ti«e: 344 ms 497 JlS) Fe-tched 2 r
0

,I SAP,.XS..LM..PE

Propettln &3 0 Error log

Property

~ ~ svs

4

• ,a! SYSTEM

On On

Auto Commit

• .a! ttst ~

Value

Sess•on

.: urs

• ,G YnLE 4

Clitnt Distribution lsolation levd Single fete h for R~lt System

;dbc::sap-J /Sl.10.90.•. .5.220:3001S:HO£Wl

Read Committed Off Writable

] ,1 .

Smart lnsert

Fi g ure 62 Results of Simple Remote Fu nction

In Hadoop you can see that a MapReduce task was executed by the SUCCEDDED label that appears (Figure 63). t4l

ue

•

..

Query EditOrs v

•

•

e~

Job Browser

Usemame j3t..rch for us.emame

logs

Melutofe Man.er Wofttflows v

10

Text Seartt1 for text

N ame

1435991080510_0061 TempletonCon1rolledob

Status

Ustr

Maps

Rtduon

Queue root hdfs

Pri ority

Ouratio

20s

N/A

Fi g ure 63 Hadoop Job Browser

61

Developing and Execut ing Hadoop MapReduce Tasks from SAP HANA

3.3

Build and Execute a Custom Map Reduce Job (Document Scan)

In SAP HANA, version SPS09 and onwards, it is possible to write and deploy a custom Hadoop MapReduce job on a Hadoop cluster directly from SAP HANA. As a brief overview, the custom Java MapReduce is prepared in SAP HANA studio, activated and stored in SAP HANA as a packaged MapReduce Jar file. A virtual function is then created, linked to the MapReduce job and attached to the Hadoop Remote Source. When a se 1ect statement is executed against the virtual function then : 1 The packaged MapReduce Jar file is sent to Hadoop (via WebHDFS Service) along with an XMLjob instruction file. 2. The MapReducejob is executed in Hadoop (via WebHCAT Service), and

the results of the job are stored in a temporary directory on the HDFS, under /sap/hana/mapred/<SID>!!/output.

3. While the Map Reduce job is running, SAP HANA is continuously monitoring the job using WebHCAT Service. 4. When the job is complete, SAP HANA returns the results in the HDFS I sap/hana/mapred!<SID>///output directory. The files are then deleted. Create Sample Files to Scan

Initially make two sample files containing a list of names. These files will be used as input for the custom Map Reduce job. The custom MapReduce job will be a slightly more advanced version of the typical word count example. First, create two files and their contents:

» file01 : Jack Jill Bill Betty Betty Betty » file02: Jack Jill Harry Hillary Tess Tony Jack Jill

62

I

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

J

3

The execute the following statements on the Hadoop NameNode: echo "Jack Jill Bil l Betty Betty Bet ty " 1 sudo hdfs dfs - put - f / user/ adm i n/input/fileOl echo · J ac k Jill Harry Hi ll ary Tess To ny Jack Ji ll" 1 s udo hdfs df s - put -f - /user/adm i n/ i nput/f i l e02

Alternatively you could create the files using HUE. Once done, the FILE01 and FILE02 should exist in the HDFS directory /user/admin!input (Figure 64) .

~

File Browser

Search foe file name

ft Home

O Actons..,

I user I adm1n

input

v

• H1story

"

Size

~ Name

•

x Mo-.etotr3sh

1

•

Group

actmm

drwxr-xr-x

July~.

20t5 07 12 AM

1001

admin

drwxr-xr-x

July~.

20t 5 08 38AM

.(v.1..f-f-

July~.

20t 5 08.36 AM

July~ .

20t5 08 38 AM

fil eOt

33 bytes

root

adnun

D

m ao~

45 bytes

rool

odmm

• of 2 items

Figure 64 Sample

Dote

admin

D

Show 45

Penni55ions

9 Trash

Page

File~

The contents of FILE01 should then be visible (Figure 65). Create a Custom Java Project

Now we will create a custom Java project. Create a new Java project in SAP HANA Studio Developer perspective (Figure 66) by right-clicking anywhere in the PROJECT tab, then selecting NEW • PROJECT.

Developing and Executing Hadoop MapReduce Tasks from SAP HANA

it Que
t1 1Ue

l!l1

•

•

e ,;;-{;]

F e Browser

ACTIONS

it Home

user

admm

mput fil e02

of 1

P>9<

~

.c

,.

..c

JVIewas bOiary

r! VI<w ... lOCation

C Rerresn. INFO

modified

l~.st

J"' 9. l01S

User

'""'

Group

Figure 65 Sample File Contents

II§ SAP HAm OtYtolopme!lt • SAP HANA Slud10 file (dit tt~e s.... frcjo
10 ....... ~

-

~ · 0 · ~ 10' ·

13

'1ll"-···

j, s,....

...

. t!tlp

¢0v !<

. .

"'""'" Copy

Copy Quhhed N4Mt

-·

~ Ptstt

P.steSpec:..L.

RtnHne...

t.

,_..._

I

~

~

N~

...

H.AH6. Ocwlopmtnt

= c

Go into

,

IS W

=c

t '4

,...,.

gt I 4C SAP HANA Adtnitliie,.l.oOI'I C~lc

Q\lo
Cf H.doop&:.mplei (HOB (HDMil. 'HO£Wl.t

X

.

· ~ ¢ ·

F lllt•Shift•W •

-·

Ctri•V

-

O.ltlti.H4 l tblt

.J

"litll

Ctrf•C

Pr~t...

....

FCIIdtr

XS J~tScfliiC filt XSOOJUfilt

~ J• v•Scripc SoloWct ne ~

"

()Qmplc:•••

~ Oct'ltr-

CtrJ.N

[q)olt,••

Rdrt111

FS

Clo1oeProjc<1 CkneUnrtltted Proj«b Vtli.s.tt O.StbltJSlint

Profile As

"""''"' ·~.,

RtStort fwn lo
j

Lot off from W HANA System

r ..m Compare with

--<

Ccl'lftgurt

.. - . . .. .

Figure 66 Create New Java Project

jho!)tltt

~

e

C

RHourc.

G.t. (tl.it~gt Mtn.gtr ~

T • IC -' ~

til

~-

c

iJ

Surch ~tml>!

.

Ch• f19tiD:

-

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

Choose JAVA PROJECT from the list (Figure 67). I

~(~ New

S..lect • wiurd

i @l ~

=

r

Create a Java project

....J

Wizards.:

type filter text " (Cp Java

~

~ Annotation

~

~ Class

(J Enum (i lllterface ~ Java Project

;I? Java Project from Existing Ant Buildfile ~ Java Working Sot

!If Package 53 Source Folder ~

tal> Java Run/Debug

.

~ ~ JU nit

-

®

< Back

I

Next>

I

Fmish

I

C.ancet

I

Figure 67 Cho ose Java Project

Name the project "javaMR_DocScan" and ensure you choose JAVASE-1!.7 in the dropdown menu next to the USE AN EXECUTION ENVIRONMENT JRE radio button. This will ensure that the Java project aligns with the Hadoop cluster version we are working with (Figure 68). Now, under JAVAMR_DocSCAN in the PROJECTS tab, you should see a folder called SRC. Right-click on this folder and select NEw • PACKAGE. Create a package called "org.hana.hadoop.mapred.sample" for the Java code (Figure 69).

I

3

Developing and Execut ing Hadoop Map Reduce Tasks from SAP HANA

ICi New Java Proje
'" =

a Java ProjKt

Create a Java project in the workspace or in an external ! ocation.

Project name: javaMR_OocScan [1) Use d~autt location Lo,.toon

Co\Apps\GBAMHB\HANA_WS_91BK2\j•v•MR..DocSc•n

Browse ...

JRE o Use an execution environment JRE: ..J

~I

IJavaSE-1.7

Use a project specrfic JRE:

com.sap.tdt.sapjvm.jre.win32.x86_64_81.0.0

{: Use d!fautt JRE (currently 'com.sap.ide.sapjvm.jre.win32.x86_64_81.0.0')

Configure JRf5.

Project layout Use project folder as root for sources and class files. t.O Create separat e folders for sources and class files

Configure default...

Working stts

0

Add project to worlcing sets

Select...

The default compiler compliance level for the current workspace is 1.8. The new project will use a project specific compiler compliance lev~l of 1.7.

L__<_B:..:•..:.ck'--..JL_:..: Next::.::..:>_....JI (..__F:..:in:..:is:..:h_..JI

_C.:.:•:..:"'.:.:•:..:'_....J

Ll

Figure 68 Project Name

66

3

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

lil:i J.va • SAP HANA Studio fil~

f.d"it

Source Retaqor f!!Mgate Sejrch etoj«t

·n ·

& ~ · ~ · o · (!I

~ltldow

Run

•

!:!tip

· <::> <::> ·

•

QuickAcceu

tl

Pacbge Explor~ t3

" 4

9

~

~ I ~ SAP HANA Administration Cons.ole ~ SAP HANA OevelopmMt ~

v c::::~ 1:1

<=>

0

g: outli ... t3

.

f,J javaMR_OocS<.tn

. ;.._

New GolntC)

' Ill • Ill

Open in New Window

Ql

Ope.n Type H1e.rarchy Show In

F4

Alt•Shift•W •

{ij (opy

(trl•(

!lil

Copy Quahfu:d Name

0

Ptstt

(tri •V

X

Ocktc

Otld.t

Bu~d

Path

'

Source

Alt•Shift•S •

Rdactor

Alt•Shift+T •

.......

Import...

<8'

Refresh

Export••• FS

...

Assign Working Sd:s" .

Profile As Debug As

Run As

0

Tum Comp.!lrewith

· L;

An outhne is not avatlable

~ Java Project

r3 !If

ProjKt... Pbckilgt

~ ( las.s (1 Interface (1 Enum

re

Annotation

es

Source Folder

~

hva Working Set

0

Folder

. 0

u

file

....

Untitltd Text File

rt:

JUnit TtstC.ISt

r3

b-•m~e:...

r3

Othtr...

~..

Res•ore from lo
Ctrt+N

Problems

~

0 c~ 2 waming~het~

'

Oe-cb at1

Rtsourcc

Otscrip6on

.. .

Validate

~ •

'

P.tth

W~rning.s (2 itt:ms)

[

"'

Fi gure 69 Create Package

The Java project should now be created (Figure 70). ~ Package Explorer ~

0

~

v ~ c:J

~ HadoopExamples (HOB (HOEVOI, 'HOE'IOI.Hadoop Examples')(

., fJ javaMR_Ooc'Scan 4

c:: rel="nofollow">

Hfdoopf.xample:s !HOB •HOEVOt HOE'VOl.Hodoopbamp!e$'1]

0

src '~

~ ~

org.hana.hadoop.mapred.sample JR£ Syst<m Library (JavaSf.l.7)

Fi gure 70 Package Created

Add External Hadoop Jar Files to Project

Before the custom MapReduce code can be written several Hadoop Jar files need to be added to the project.

LO<.rtion

Developing and Execut ing Hadoop Map Reduce Tasks from SAP HANA

The relevant Jar files can be copied from your Hadoop cluster to your local machine. Alternatively you can download from the Hadoop distributor's website. In this example the Jar files needed were downloaded from http :// search.maven.org/. Download hadoop-common and hadoop-mapreduce-client-core using the following links: http://search.maven .org/remotecontent?filepath=org/apache/hadoop/ hadoop-common/2.6.0/hadoop-common-2. 6.0.jar http://search.maven.org/remotecontent?filepath=org/apache/hadoop/ hadoop-mapreduce-client-core/2.6.0/hadoop-mapreduce-client-core2.6.0.jar In the PROJECT EXPLO RER tab add a new folder called "lib" and add the downloaded Jar files to this folder. The project should now look similar to Figure 71. ~ Package Explorer £3

~

d

~

v

c::)

~

"! HadoopElcampl•s (HOB (HDEWI, HDEWI.HadoopExamplos1) f£:J javaMR_OocScan • " src

e, org.hana.hadoop.maprt
21, JRE Syst•m libra')' (JavaSE-1.7)

• ~ lib

t!J hadoop-common-2.6.0.jar ~

hadoop•mapreduce·client•core•2.6.0.jar

Fi gure 71 Hadoop Jar Files

Next add the Jars to the build path. Select the downloaded Jar files, rightclick, then choose BUILD PATH • Aoo TO BUILD PATH (Figure 72).

68

I

3

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

Edit SoUtcc

File

Refodor

Navi91tc

Proj«t

ScOlrch

· t~> e · · ~ · O ·

[3 ·

-

·-

1m Java • SAP HANA Studio

1\un

. .

~ .

1!1

Win dow

H
<::> ·

-

.

~ J ~ SAP HANA Administration Consol~

Qu1d:.Acct1s ~

F~

Padcage b:plorer l:t

v

~

= EJ SAP HANA Oevdopmtftt 0

1:1

(OJ . . . . .

1:1

g:: Outi. ~

&3

~ 0

1:1

~ HadoopExamplts [HOB (HOEVOl. 'HDEVOl.Hadoopb.amples))

4

ffJ javaMR_OocSun ~

f1J src 16 JRESystun l ibrarylh'-·aSE·l.7)

4

~ lib

~

An .outline is not avaJ~bl e.

I

~hadoop·common·2.6.0.jar ~

h•doop·mtprtcluct·clitn

New

'

Op~n

F3

Show In

Alt•Shift•W •

t!) Copy 1\,l Copy Qualifitcl Name

Ctri•C

~ Paste

Ctft•V

X

Oel~t:

Oelde Build Path

...

.u .,_<>

'

Rdactor

I

:l Add to Build Path

Alt•Shift•T •

Import...

&port... R.efr~h

FS

Assign Worting Sets...

c.

Validotc

... ..

Profile As

'

Debug As 1\un As

Team Com~rt with

R.t:placc Wrth

~~

Res.ourct

m> i-

'"

The proj ect should now appear as shown in Figure 73. Package Explorer !:!

8

~

"' =

El

~ HadoopExamples (HOB (HOEVOl, HDEVOl.HadoopExamples')(

• fJ:J jav~MR_OocScan • 0 src e, org.hana.hadoop.maprr:d.sampfe ~ ~

JRE System library (JavaSE-1.7(

• 5I\ Referenced libraries ~ t>

•

9 S

hadoop-common-1.6DJar · C:\Apps\GBAMHB'HANA_WS. hadoop· mapreduce·client· core· l.6.0.jar • C:\Apps\GBAMt

~ lib

II!) hadoop-common-1.6D.jar a!!) hadoop· mapreduce· d ient· core· 2.6.0.jar

Figure 73 Project Ready to Use

....

-

= 1:1

--

l ocation

)

Figure 72 Add Jars to Build Path

1:

v

Oe
. :

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

Create Document Scan Java MapReduce

Now let's create the five class files shown in Table 4. These will contain all the code needed for the custom MapReduce logic: Class

Purpose

customKey

Custom key, containing filename and word

customOut

Custom output, containing word count and word length

SimpleDocumentScanApp

Main excutable of Map/Reduce (not used by SAP HANA; used for manual execution on Hadoop if necessary for testing)

SimpleDocumentScanMapper

Collects the words per document

SimpleDocumentScanReducer

Summarizes the words used

Table 4 Classes Containing Map Redu ce Logic

Class files can be created by right-clicking on the pacakge ORG.HANA.HAOOOP.MAPREO.SAMPLE, and then navigting to NEW • CLASS (Figure 74). This will create an empty class file which you can rename according to Table 4. 8 ~

... = 0

1:

Package Explorer !:3

•

~ HodoopExomples (HOB (HDE\'01, 'HDE'IOI.HodoopExomp es')) [? iovoMR..DocS
"' G src 1

e, org.h ana.hadoop.maprHI.sampl

• 8 JRE System library (JovoSE·1.7J

" a\ Referenced librari6 1> 1>

9

r.=

Showln

hadoop·mapreduce·client·core·

I!! hadoop·common·2.6.0.jar 1(!.1 had oop· mapreduce· client ··co re·

• rB rS

Open in New Window

h&d oop· common · 2.6.0.jar • C:\

• a, lib

I

New

IB

Copy

Alt+Shift+W •

Ctrl•C

~ Copy Quo l ~ied Name

1'1'

Paste

Ctri•V

Java Proj&t

Project...

~ Package

&

Class

0' G'

Inte rface

Enum

Fi gure 74 Create Java Class

Repeat these steps for each of the five classes or simply copy and rename the first class file you created five times. The necessary code for each file will be inserted in subsequent steps.

70

I

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

The fil e structure should now look like that shown in Figure 75. I:

Packag• Explor
9

~

v

~ l:l

~ HadoopExamples (HOB (HOEVOJ. 'HOEVOJ.HadoopExampl•s'))

"' ~ javaMR_DocSc.an " ~ src

" morg.hana.hadoop.mapred.samplt ~

Ql

t>

Q) Q) Q) Q)

p.

t> t> ~

customKey.java customOut.java SimpleDocume.ntScanAppJava SimpleOocumentScanMapper.java SimpleDocume.ntScanReductr.java

!!!l JRE System Library (JavaSE-1.7]

t> a\ Rde.renced Libraries • "" lib

Figure 75 Java Class Fi les Created

Next populate each of the five class files with the code examples in Listing 12 to Listing 16. custom Key package org . han a . hadoop . mapred . samp le ; impor t impor t import import

java . io . IOExcept ion : java . i o. Datalnput ; j ava . io . DataOutput : org . apache . hadoop .i o .WritableComparable;

& Word public class cus t omKey i mplements Wr i tabl eComparable<customKey> private Str ing filename; private Str ing word;

II Custom Key for Mapper. made up of Filen ame

pub lic customKey(String filename. String word) { this . filename ~f ile n ame; th is . word ~word : public customKey( ) 1 th i s . filename c new String() ; this . word ~new String(); public String getWord ( ) return 1~ord :

71

I

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

publ i c void write(OataOutput out) th rows IOException I out .writeUTF(filename); out .writeUTF(word) ;

pub li c void readFields(Oatalnput in) throws IOException { filename = in . readUTF () ; word= in . readUTF() : pub li c int compareTo(customKey otherKey> 1 if (otherKey == nu l 1 ) return 0; i nt intent i f ( intent return else { return

• f i lename .compareTo(otherKey . f il ename) ; ! = 0) I

intent ; word .compareTo(ot herKey . word);

pub li c int hashCode() I return this . toStri ng() . hashCode() ; @Override pub li c String toString() I return fi l ena me+ • .• +word :

Listing 12

customKey.java

custom Out package org . hana . hadoop .ma pred .sample : impor t java . i o . IOException ; import java . io .Oatalnput ; import java .i o .OataOutput : import org .apache .h adoop .i o .Wri table ; //Custom Output fo r Reducer . in cluded Word Count and Word Length pub l ic cl ass customOut implements Writabl e I II Some data pr ivate int count :

72

J

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

pr ivate int length ; publ i c customOut() t this . count = new Integer{O) ; this . lengt h = new Integer(Q) ;

publ i c customOut(int count, int length) t this . count = co unt ; this . lengt h - length ; publ i c vo id write(D ataOutput out) throws IOException { out . writelnt(count) ; out . writelnt(length) ; publ i c vo id read Fi elds ( Da tal nput in) t hrows IOException ( count = in . read l nt() ; length= in . read l nt(} ;

Listing 13

customOut.java

SimpleDocumentScanApp

package org . han a . hadoop . map red .sample ; import impor t import import impor t import import import impor t impor t

org . apache . ha doop . conf . Configu ration ; org . apache . hadoop . conf . Configured ; org . apache . hadoop . fs . Path ; org . apache . hadoop . mapreduce .Job ; org . apac he . hadoop . mapreduce . li b. input . Fi le ln putFormat ; org . apac he . hadoop . mapreduce . lib . input .Textl nputFormat ; org . apac he . hadoop . mapreduce . lib . output . FileOutputFormat; org . apac he . hadoop . mapreduce . li b. output . TextOutputFormat ; org . apac he . hadoop . ut il . Tool ; org . apache . hadoop . ut il . ToolRunner ;

publ ic cla ss Si mp leDocumentScanApp extends Configu red implements Too l t publ i c static void main(St rin g[] args) throws Exception t i nt res = ToolRunne r . run(new Conf i gu ration() , new Simp l eDocumentScanApp() . a rg s) ; System . exit(res) ;

73

J

3

Developing and Executing Hadoop Map Reduce Tasks from SAP HANA

public int run(String[) argsl t hrows Exception I Job job ~ Job .get l nstance(new Con f i gurat i on()) ; job . setJobName( "Simple Document Scan ") : job . setOutputKeyClass ( customKey .class); job . setOutputValueC la ss(cus t omOut .classl : job . setMapperClass(S i mpleDocumentScanMapper . class) ; job.setCombinerClass(SimpleDocumentScanReducer . cl ass) : job . setReducerClass(S i mpleDocumentScanReducer .classl ; job . setlnput FormatClass(Text lnputFormat .cl assl ; job . setOutputFo rmatClass ( TextOutputFormat .class) ; FilelnputFormat .setlnputPaths(job . new Path(args[O)ll ; Fi leOutput Format . setOutputPath(job . new Path(args[l))) : job . setJa r ByClass(SimpleDocumentScanApp .class) ; j ob . submit() ; return 0 ;

Listing 14

SimpleDocumentScanApp.java

SimpleDocumentScanMapper

package org . hana .hadoop . mapred .sample ; impor t java . io . IOExcept i on; import java . util . regex .Pa ttern ; import java . util . regex . Matcher; import org .apache . hadoop.io .Text : import org .apache . hadoop.i o .I ntWritable ; import org .apache . hadoop . mapreduce . lib . input . Fi leSplit ; impor t org .apache . hadoop . mapreduce . Mapper ; ////////////////////l/1111111

//MAPPER //Read document , ignore some special characters and collect //words used in document File Name+ Word //Custom Key is /////////////////////////////

public class SimpleDocumentScanMapper extends Mapper

11.integrating Sap Hana And Hadoop.pdf

Overview

More details

Related Documents

Sap Hana Abap

11.integrating Sap Hana And Hadoop.pdf

Custom Abap Code Migrating To Sap Hana And S4 Hana.pdf

Sap Abap On Hana By Ravi Anand

115 Practical Sap Hana Abap Interview Q&a

Hadita-hana

More Documents from "Ariana Cazarin"

11.integrating Sap Hana And Hadoop.pdf

Murabahah