Thursday, August 29, 2019

Azure 2# - Join On-Premise and Azure data via SQL 2017 PolyBase

Starting with SQL Server 2016 PolyBase was introduced as an easy way to query semi-structured and non-structured data sources with T-SQL. In today's post I'm creating a demo to show this feature.

Step 1. Install and configure PolyBase

First of all, we have to install PolyBase. It is a feature that can be checked during SQL Server installation.


Make sure you install Java Runtime Environment (JRE 7 or above) in advance or your PolyBase installation will fail. JRE is needed to connect to Hadoop. In SQL Server 2019 RC1 the prerequisite of JRE has been removed. Instead you're taken to a page that downloads and install JRE 11. 


After SQL Server gets installed go into a query session and run the following commands:

EXEC sp_configure 'show advanced option', '1';
RECONFIGURE;
GO

EXEC sp_configure 'hadoop connectivity', 4;
GO

Note you must restart SQL Server service in order for the above configuration to take effect (run value turns from 1 into 4). You can confirm PolyBase is installed by executing the select query below. It should return 1.

select SERVERPROPERTY ('IsPolyBaseInstalled')

Step 2. Download and import source file

We wish to use unstructured non-relational data source. In searching for free samples of comma delimited csv files I stumbled upon this web site http://eforexcel.com/wp/downloads-18-sample-csv-files-data-sets-for-testing-sales/

It seems 1 million records is suitable for our demo. So go ahead downloading that file highlighted in yellow circle.


In SSMS, connect to your SQL 2017 server. You can use any database. I happen to have a restored copy of AdventureWorks2017 so let's import the downloaded CSV file into table named "1000000 Sales Records". Make sure data source is flat file and tick the file types to include .csv files. Check columns and preview to make sure the result looks OK.








Step 3. Store CSV file in Azure container

Login in Microsoft Azure Portal. Create a storage account by following this order:

Storage -> Storage Accounts -> New storage Account "polybasestoragesqlshack" -> Blob Containers -> mycontainer01

When the container is ready, click on Upload button to transfer the csv file to cloud.



Step 4. Configure PolyBase query

Follow the steps below to set up PolyBase. Note the password is shown as pound signs for obvious reason. Replace it with your own password.

USE AdventureWorks2017
GO

CREATE MASTER KEY ENCRYPTION BY PASSWORD='########';
GO

CREATE DATABASE SCOPED CREDENTIAL mycredential
WITH IDENTITY = 'credential', Secret = 'YMmkZJR2P4XwmIUhElJeZ88SpYaI3l/LR6BE7vbbPoyREvMYYHb5ioisNu6SXh1BeUbF1KJKuvlPc/RCfGbQYw=='
GO

The secret hash is copied from Azure Portal. You can obtain it by either way:

Portal: Click on storage account access keys, then copy the values from Key1.

MASE (Microsoft Azure Storage Explorer): Click on storage account properties, then copy the values of Primary Key.





CREATE EXTERNAL DATA SOURCE mycustomers
WITH (
    TYPE = HADOOP,
    LOCATION = 'wasbs://mycontainer01@polybasestoragesqlshack.blob.core.windows.net/',
    CREDENTIAL = mycredential
);


CREATE EXTERNAL FILE FORMAT csvformat 
WITH ( 
    FORMAT_TYPE = DELIMITEDTEXT, 
    FORMAT_OPTIONS ( 
        FIELD_TERMINATOR = ','
    ) 
);

CREATE EXTERNAL TABLE ExtTblSalesRecords ( 
Region varchar(2000),
Country varchar(2000),
[Item Type] varchar(2000),
[Sales Channel] varchar(2000),
[Order Priority] varchar(2000),
[Order Date] nvarchar(100),
[Order ID] nvarchar(100),
[Ship Date] nvarchar(100),
[Units Sold] nvarchar(100),
[Unit Price] nvarchar(100),
[Unit Cost] nvarchar(100),
[Total Revenue] nvarchar(100),
[Total Cost] nvarchar(100),
[Total Profit] nvarchar(100) ) 
WITH ( 
    LOCATION = '/1000000 Sales Records.csv', --  file is in the root
    DATA_SOURCE = mycustomers, 
    FILE_FORMAT = csvformat)

Finally we can run a query that brings local SQL table and Azure Blob data table together. The below record set is consisted of 5 rows of sales orders from Asia (Azure Blob) and 10 rows of sales orders from other non-Asia regions (on-premise).

SELECT TOP 5 * 
FROM [AdventureWorks2017].[dbo].[ExtTblSalesRecords] 
WHERE Region IN ('Asia')
UNION 
SELECT TOP 10 * FROM [AdventureWorks2017].[dbo].[1000000 Sales Records] 
WHERE Region NOT IN ('Asia')



Saturday, August 24, 2019

Azure #1 - Create a Windows 2019 VM in Microsoft Azure

This is the first post of an upcoming series for Microsoft Azure.

Assume you've got an Azure subscription account. The next first thing to do is to create a resource group. Login in to Azure Portal page https://portal.azure.com/. Click to create a resource at upper left-hand corner. A "New" page appears in the right pane. Click "Windows Server 2016 Datacenter" to launch vm creation page. This page has many vm related tabs. I'll go over each tab in the following sections.


On the first "Basics" tab, choose subscription. Underneath it either pick a resource group from the drop down list or click on "Create new" to create a new resource group. We are creating a new resource so fill in the pop-up windows with Name "ResGrp_Win2019". Hit OK. 


Next we need to give the VM a machine name. Note there're some restrictions for naming. You'll see them on the right side. I'd like to call it "TstSrvSQL2019_A". Unfortunately it didn't seem to accept underscore inside the name. The guide only says it doesn't allow an underscore at the beginning. More interestingly, the red error message does not sound like an error. If I remove "_" then all is green. I guess we just hit a small bug but let's move on.



Choose a Region from a list of 30+ options. We also want to create an Availability Set. Since no existing set exists we click on "Create new". In the right side pane fill in set name as "Set_Win_US_East" and leave default numbers of machines as is.




Next choose the right size for this VM. Then click OK.



In setting up administrator account, notice you cannot use "Admin" or "Administrator" as they are considered reserved words. For password, it must be between 12 - 123 characters long. So we create  "VMADMIN" to allow RDP access to all VMs we create. Select RDP port to allow inbound traffic. 

So much for the basics. Now click on "Next:Disks >" at the bottom of the page to move on to next section. 


On the "Disks" tab, there're fewer items to choose. We pick Standard hard disks instead of SSD disks. Then switch to networking tab. 



On networking tab, create a new vNetwork. Accept default vnet name. Choose both address space and subnets. Click OK.




Click on "Next:Management >" to move on to Mgt. tab. Accept default account for diagnostics storage. Enable backup. Create a new vault "vault888" and assign existing BackupGrp to it, Use Daily Backup Policy.


Temporarily pick rd as a tag on the "Tags" tab for now. Click "Review + create" to review all settings before creating the actual VM.



At last click on "Create" to kick off the VM building process. A few minutes later the deployment is completed.






Now we need to connect to the VM. On Azure portal page click on Virtual Machines then choose the new VM we just build. You can click on "download RDP file" or use your own RDP file. All we need are the machine's IP address and login name. Please use machine name\login name as the full login.



Upon connection we can confirm the VM specs. Surprisingly it's not Windows 2019 as claimed. It is still Win 2016 image.