How I Setup External Data Sharing for a Customer

December 8, 2021December 8, 2021hopefoley 1 Comment

I worked with a customer recently who had a need to share CSVs for an auditing situation. They had a lot of external customers that they needed to collect CSVs from for the audit process. There were a lot of discussions happening on how to best do it, whether we’d pull data from their environment or have them push them into theirs. Folks weren’t sure on that so I tried to come up with something that would work for both.

What I came up with is something that will grab CSVs in an Azure storage account, no matter where it lives. It then is pulled into their data lake in a consolidated ADLS Gen 2 location in a parquet format so they can do all the fun big data things on it in Azure Synapse in an aggregated data lake. It setups up an Azure Key Vault where you can store the connection string for the Azure storage account that contains the CSV files. NOTE – YOU NEED ELEVATED ACCESS TO STORAGE ACCOUNTS. You may need approval or separate storage accounts both on the sending side with the CSVs and the receiving end in your Azure subscription. GET PROPER APPROVAL/BLESSINGS before setting this up.

If you want to setup the full solution including the components in Azure. I’ve published the code to build out everything at https://aka.ms/DataSharePipeline. All you need to do is store the files located in the DataShare Full Sample folder locally. You’ll then run some or all the PowerShell scripts below.

Update the paramfile03.json file with the values of what you want to build or use. You can use existing pieces if you’d like, the script checks if it exists before creating all the pieces.
Load the CSVs to a storage account that you can get the connection string with the storage keys. Note the connection string as you’ll get prompted for it during the run.
Run ./”01 – Create Resources DataShare.ps1″ -filepath ./paramfile03.json to create all the Azure pieces.
Run ./”02 – GrantRightsDataShareCLI.ps1″ -filepath ./paramfile03.json to grant rights needed to admin and MSI for Synapse to storage and Azure Key Vault.
Run ./”03 – Create Pipeline Parts DataShare.ps1″ -filepath ./paramfile03.json to build the Synapse pipeline components.
Run the pipeline with the parameters filled in below and verify your CSVs have been processed into your data lake as parquet files. This pattern allows you to setup separate key vault secrets to pull CSV files from as many storage accounts as you need. There are a couple sample CSVs within the folder as well if you’d like to use them. They contain data for hospital capacity that were extracted from the site https://healthdata.gov/dataset/COVID-19-Reported-Patient-Impact-and-Hospital-Capa/4cnb-m4rz.

Parameter Name	Value details
ParamSecret	Name of the secret containing connection string
Container	Name of container holding CSV files
FilePrefix	All CSVs that begin with this text will be processed
DirectoryPath	directory path (i.e. foldername1/foldername2/)

Let me know if this helps you or you have suggestions to make it better/clearer.

How I Start my Synapse Analytics POCs

November 2, 2021hopefoley 2 Comments

I love my job! One of the things I do for a living is to help customers get started with new services in Azure to finagle their data. Many times we’ll start with a small POC to just start to understand the parts and pieces, and I teach them along the way. I work with a lot of customers so being quick and nimble helps. Lately I’ve been using PowerShell to setup the pieces needed for a full Synapse Analytics environment, including an example set of 4 pipelines (2 to extract to ADLS, 2 to upload to dedicated SQL pool). Pulling data out of large relational databases into the data lake became a request I heard over and over so I automated it. I’ve added and tweaked this over the years into a project I called “Synapse Load” and put a version out in my github.

https://aka.ms/SynapseLoadV2

I want to pick up where I left off in my last post (https://hopefoley.com/2021/09/27/powershell-in-the-clouds/) and show how I can use the code in the repo above to setup a full environment for Synapse including the pipelines below.

Download a SynapseLoadV2-master.zip from the repo above and extract them locally. We’ll setup the full sample so all the files we’ll need are in the SynapseLoadV2-master\03 Sample folder. Open the paramfile01.json file and edit all the values that contain <text>. For example here’s the values for the Azure SQL Server and DB after editing. You can supply existing resources and the create will skip them.

“azsqlserver” : “yoursqlservername”,

“azsqlDB” : “yoursqldbname”,

Open up Azure Storage Explorer and navigate to the file share Azure storage location for running CLI/PowerShell. I create a folder called SynapseLoad. We need to load all the files from 03 Sample folder except the *.sql files (SQL scripts will be run from SQL client of preference). Copy those files and put them into the Azure file share location for your CLI. Here’s a view of what you should see in your PowerShell session in Azure portal.

The paramfile01.json file we updated drives everything we’ll create in Azure. Let’s do one last look at what’s in it from the CloudShell to be safe. Type in cmd: get-content ./paramfile01.json

After you’ve validated it, we’re ready to run the scripts. The 01 – CreateSynLoadResources.ps1 will create all the components in Azure. We’ll run this in the CLI:

./”01 – CreateSynLoadResources.ps1″ -filepath ./paramfile01.json

Detailed instructions can be found in the readme at https://github.com/hfoley/SynapseLoadV2/tree/master/03%20Sample as well.

You’ll get a couple prompts to create SQL admin user/passwords and when complete verify in Azure portal the components completed. We need to run a script to update some rights needed to view data in Synapse. You run that in CLI:

./”02 – SynLoadGrantRights.ps1″ -filepath ./paramfile01.json

The Synapse pipelines are created by running the following script. This script updates the json files we need to build the pipelines and then runs after updates occur.

./”03 – CreateSynLoadPipelineParts.ps1″ -filepath ./paramfile01.json

You can now go into Synapse Studio (link in Azure Portal if you click on the resource). You should see the following pipelines.

You will need to run the scripts to create the metadata tables, the source table, and destination tables inside of the dedicated SQL pool. The steps in the https://github.com/hfoley/SynapseLoadV2/blob/master/03%20Sample/readme.md walk through this in detail. After you’ve set the sources in the metadata tables. The SQL Date Based Extract PL pulls base on date range. SQL Not Date Based Extract PL is driven by a where clause extract (i.e where Animal = 9111 in the sample). You can then upload the parquet files after updating the [ADF].[MetadataLoad] table with your parquet filename(s). The filename is the match to the table field values that are used to get target table information. One is for truncating/reloading patterns and one is for incremental loading.

I have a million ideas to make this better. What’s yours? Please let me know if you have feedback and if this helps you, I so want to hear it!

PowerShell in the Clouds

September 27, 2021November 2, 2021hopefoley 2 Comments

Part of my job is doing POCs with customers to help with Azure Data Services. Anything that helps me move quicker is helpful so I’m a tad bit obsessed with automating things. I have used PowerShell for more years than I’m willing to admit to help me automate what I can. There are a lot of ways to automate things like ARM templates and DevOps, but PoSH has been my preferred hammer. As much as I love it, I’ve ran into issue sometimes with installing modules locally on folks machines and not to mention if they have a Mac. I wondered recently if Azure Cloud Shell would help make things easier, and it very much did and I’m super pumped to share! This post will help run through how to get setup to run PowerShell scripts in Azure.

Navigate to and login to the Azure portal (azure.portal.com). Start Azure Cloud Shell by clicking the >_ icon in top right. I overlooked it forever so here’s a view of it below

If necessary change from Bash to PowerShell. It will require us to connect some Azure blob storage to it or create a storage account to use if you haven’t already. This post below gives further details on this (https://docs.microsoft.com/en-us/azure/cloud-shell/persisting-shell-storage).

If you’re not prompted to setup storage you may have already done so, and like me didn’t remember it. You can figure out the details of the storage account by using the command: get-clouddrive

You can navigate the same as you would locally. Type the command: dir and you should see at least one folder called clouddrive. Navigate to it by typing the command: cd clouddrive This is the location where you’ll be able to upload and interact with your files. Next we’ll upload a PowerShell script to this location to check for modules needed for upcoming blogs to create items in Azure.

Navigate to the storage account that is tied to Azure Cloud Shell. I like to use Azure Storage Explorer (https://azure.microsoft.com/en-us/features/storage-explorer). You’ll expand File Shares and you should see the file share listed when you ran get-clouddrive. You may see a folder for system use called .cloudconsole. Download the PowerShell script from my github repo at https://github.com/hfoley/AzureCloudShell called CSModuleCheck.ps1. Then upload that file to your Azure Cloud Shell file share location. Now within the Azure Cloud Shell console check to make sure the file is seen by running another dir command.

Now we confirmed we can see the file, we’ll run that script by using the command below:

./CSModuleCheck.ps1

What this script does is check and list the versions of the modules you see here.

If you do not see one listed like Az.Synapse, then we can install it. All you need to do is run the command below:

Install-Module -Name Az.Synapse

And that’s all there is to it! You’re all set to run PowerShell scripts in Azure Cloud Shell. Up next I’ll go through how I use Azure Cloud Shell and PowerShell scripts help my customers setup a full Synapse environment including pipelines (https://hopefoley.com/2021/11/02/how-i-start-my-synapse-analytics-pocs/).

Precision Learning – Umm What?

March 1, 2021hopefoley Leave a comment

What does “precision learning” even mean? I think the first time I heard “precision” used was in relation to “precision medicine”. That was a term tied to pivoting treatment for patients based on their specific needs based an extensive data analysis. There are many factors that can impact treatment such as gender, size, family history, and a massive list of other attributes. It also seems there’s a trend toward a big data project when you throw in the term “precision”.

In education, everything is focused on a student’s outcome. So in precision learning, you’d be pivoting to focus on the student and what behaviors help achieve the desired success criteria like grades, test scores, etc. That’s a tough nut to crack from a data perspective though right? It is one of those times where I have to admit, this seemed a far away fuzzy concept. The data capture of data for that seemed beyond this data gal’s realm of thinking. In a recent project, I found the most interesting data set that immediately brought me to thinking of the term “precision learning”.

I worked with a customer at a university that had an e-textbook and learning system in place called VitalSource. There are extracts from this tool that are made available in a few methods that contain all the telemetry of all interactivity from within the application. So imagine now imagine having the ability to analyze a student’s interactivity with the learning text! When I learned what data I was to work with on this POC, the data nerd in me felt like I won the lottery!

After a few pinches to myself for landing into this project/data, I did get brought back down to earth a notch when I heard the data came in gzipped nested json files. I realized I was staring down the barrel of a full on big data project! I don’t get many of these in education so I was insanely excited. And I’m even more excited to share what I learned with everyone, especially my friends in the education data world. I intend to do deeper dive posts into this data and the architecture below. I also created scripts you can use at https://aka.ms/PrecisionLearning to setup the entire solution and sample data (pulled from sample data posted on VitalSource documentation). Please comment or reach out if this solution is helpful to you or you have suggestions to make it better.

Only Took a Pandemic, But She’s Back and Brought PoSH Automation Presents

October 13, 2020hopefoley 1 Comment

I’ve threatened to come out of blog retirement for a long time but in the triage of Hope’s life, it just never happened (outside a few posts for SQLStarter track). Now that COVID 19 has upended travel for both work and SQL community events, seems the time has found me. And also not that I seriously thought my blogging again would trigger some terrible apocalypse like zombies or Godzilla showing up, but when better to do it than in 2020 when folks may not even notice?

So what’s got me writing besides more time? I’ve been helping customers at Microsoft with data challenges of all sizes for the past 5 years with the last 1.5 focused on the Education sector. A lot of my job is helping folks figure out and implement data things in Azure. If I think something is harder to do, I tend to try to help automate it in some way. I’ve built up a lot of scripts, how to information, and a billion links to info over the years but I haven’t been great about putting it out in the world. So here’s my attempt at fixing that with the help of some better Git/Github skills (thanks again Drew).

So what’s the hard thing I want to help make easier in this post? Metadata driven pipelines in Azure Data Factory! I had the opportunity awhile back to work with a customer who was pulling data out of large SQL Servers to eventually land data into Azure Synapse SQL pools back when they were still Azure SQL DW. We created a couple load pattern pipelines that used metadata in Azure SQL DB to load Synapse sql pool tables from parquet files in Azure Data Lake Storage (ADLS) Gen 2.

Not gonna lie, the pipelines weren’t easy for me to learn to setup initially. Big thanks to Catherine for your blog which was a life preserver in the hardest parts! So I wanted to see if I could automate it in my old friend PowerShell. That grew into what is now published out on https://aka.ms/SynapseLoad. This is the bones to create all the pieces in Azure (diagram below) including the ADF pipelines. I intend to keep that solution as all the pieces are using GA (general availability) Azure Data Services.

I also have a new incarnation of that work out now as well. The new solution is using as much of the new Synapse Preview workspaces that PoSH cmdlets would allow (no pipelines in workspace yet). I also have some additional pipelines to drive the extraction of SQL tables to parquet files in ADLS. These pipelines can extract portions from date based criteria or other non date columns using metadata too. Oh and I also threw in some custom logging back to Azure SQL DB as well. This version can be found at https://aka.ms/SynapseLoadV2.

I hope to make this better each time I work with a customer. I have a laundry list of updates in mind already like adding the extract pipelines to the GA version. If you use it, and have suggestions, issues, or just find it helpful throw out a tweet to me using hashtag #HopeSynapseLoad

SQLStarter Track Back Again In Indy

May 25, 2017hopefoley 1 Comment

Last year I pulled together an idea that had been haunting me for a long time and started the SQLStarter track in Indy. You can refer back to my blog post on the creation/inspiration. I can tell you, the day of the event I watched intently the attendance for the sessions in the track. So glad to see it was really well attended and I got great feedback! I would really like to thank all the speakers who helped get this going. Thank you so much to Josh Fennessy, Dave Leininger, David Maxwell, John Deardurff, Traci Marcero, Kevin Kline, and Jim Bennett! You guys will never know how grateful I am for your help getting this off the ground!

***

I’d like to bring it back again this year so I’ll need speakers again. All the sessions are intended to be introductions to different tracks in data related fields. If you would like to be involved, please let me know. I think the tracks below worked well but if you have thoughts/suggestions, I’m all ears. Also please help get the word out to people thinking about getting into a new career or starting out.

Intro to Big Data
Intro to Data Science
Intro to Database Administration
Intro to Database Development
Intro to Business Intelligence

And another pic from a time when my judgement was questionable (see previous blog).

PASS Summit 2015 Highlights

November 13, 2015hopefoley 1 Comment

I’m back home from PASS Summit 2015 and reflecting on the experience. I think I even still have a little bit of the goofy grin that I couldn’t wipe off my face all week. I do so love PASS Summit! This was my 5th consecutive one. I went back and looked through some of my previous blog posts and I still have all the same giddy, giggling idiot feelings I have had all these years. Below are my top 10 highlights of this year’s PASS Summit.

***

10. Calories don’t count at PASS Summit – Ok this isn’t really true but I pretended it did. After I had an egg white omelet at breakfast the first day, it was all downhill from there. It’s difficult to eat well at a conference and I missed eating like a truck driver so I took a healthy eating sabbatical while there. The Morning After burger at Tap House is very good by the way.

My egg white omelet was at one of the “hey pull up a chair” breakfasts at Daily Grill.

9. Great Sessions – As with every Summit, I didn’t catch as many sessions as I would have liked (highly recommend purchasing DVD/download of sessions for this reason). I did catch my dear friend, Mark Vaillancourt’s (Twitter | Blog) session “A Bigger Boat: Data Visualization Lessons from the Movie Theater”. Mark always cracks me up and I always enjoy his sessions. Great information on perception with the right balance of humor mixed in to it.

Mark caught a fish this big!

I also caught Kasper de Jonge’s (Twitter | Blog) session “The Analysis Services Evolution”. Poor Kasper had all the presentation planets align against him. He couldn’t get the screen to display except setting up like he is in the picture below. He handled all the issues VERY well and it was a great session. Folks were excited to hear of the new features in SSAS so that helped too.

Poor Kasper!

8. Birds of a pitiful feather – On Friday they had the Birds of a Feather lunch. This is where they will setup tables with specified subjects. It’s always nice for good discussions. I went with my friend and former co-worker, Meagan Longoria (Twitter | Blog) . I was thinking I would sit at the Datazen table but turned out there wasn’t one. I did find a table though on geo-spatial. There were only a couple folks sitting there at the time so we sat there. After chatting with them for a couple minutes, they left and Meagan and I manned that table together alone (cue the All By Myself song). We had fun but that was a lonely subject table(reason I said pitiful)! The whole Birds of a Feather lunch seemed to be scaled down a bit more than in years past too.

The masses at the WIT Luncheon

I also missed the lunch where they setup tables for regions. I always met new people at each lunch from my area who weren’t familiar with the local user group. I’ve connected with several folks over the years at that lunch. Not sure of others experience but I missed it.

***

7. SQL Mystery – Doug Lane (Twitter | Blog) is a brave man! He had a couple of sessions at Summit where he uses other speakers as cast members in his sessions. It’s like incorporating community theater into his technical sessions. I was asked to be a cast member in “SQL Server Mystery: Dead Reports Don’t Talk”. He sent the script several days ahead of time but I was more nervous about this than my technical session. I feel I do well with conversations. I feel I do well in a rehearsed technical setting. Anything off the cuff, in front of an audience, yeah not so much. That and I hated the thought of me impacting his Summit session if I messed up. I survived but definitely could have done a better job. Sorry Doug!

Doug’s super fun SSRS Mystery session

6. Kasper knowing who I was and said looked through my session slides! – I still get star struck at PASS Summit. I was introduced to Kasper and he had a question about one of MY slides! I am still giddy about that!

***

5. Tim Chapman is a helluva guy! – Knowing I was new to Microsoft, Tim (Twitter ) reached out and pointed me to great resources to help navigate the oodles of tools and info within Microsoft. Not only that but he introduced me to some great folks at Summit. Thank you much Tim! Very much appreciate it!

***

4. Speaker Idol – I was asked by Denny Cherry (Twitter | Blog) if I would be a judge in this year’s Speaker Idol. For those unfamiliar, it is a contest for speakers to win a speaking slot at the next year’s Summit. I said yes and was very glad to be a part of it and honored to be in the line up of judges (Mark Simms (Twitter), Andre Kamman (Twitter | Blog), Allan Hirt (Twitter | Blog), Joey D’Antoni(Twitter | Blog), and Karen Lopez (Twitter | Blog)). All of the speakers were great! Due to that, it came down to getting way nit picky on the tiniest of details of the sessions. I felt little uncomfortable doing that since many of the speakers I consider friends. Still was a great process to be a part of and I picked up some tips for myself as a speaker. The winner was David Maxwell (Twitter | Blog) . Congrats David!

No Summit trip is complete without a trip to Bush Garden. Many times it’s Jason twisting my arm to go 🙂

3. Marco Russo / Alberto Ferrari in my session – My session was over SSAS Tabular performance. I reference Marco Russo (Twitter | Blog) and Alberto Ferrari (Twitter | Blog) many times during it. During practice run throughs of my session, it dawned on me that this was the only time I would give the session and they could show up. Guess what?! They came to MY session!! That still cracks me up a bit and was great to meet them in person.

***

2. Seeing all my #SQLFamily all week – One of the best parts of Summit is that you can’t throw a rock all week and not hit a member of #SQLFamily. Over the years I have met so many wonderful SQL people from all over the world. Summit is the one time of year when the majority of those folks are together. Makes me feel all SQL warm and fuzzy just thinking about it.

My favorite pic of Summit!

Stephanie Bruno moment – I help babies in Africa! – I went to dinner with a group on Thursday evening at Tap House. I met Stephanie Bruno (Twitter) a couple years ago when she attended my spatial data session. We became friends that year and kept up some after that Summit. She works for the Elizabeth Glaser Pediatric AIDS Foundation (www.pedaids.org). She was at that dinner and began talking about how I helped her with some Datazen work she was doing. And by this act of me getting information to her, I was also in a way helping babies in Africa! I get teary eyed again just thinking about it. I mean we speak and hope some bit of info we relay helps folks do their jobs or helps them with an issue they run into but that’s usually the end of the thought. It really hit me in the gut to be able to draw a line to a concrete example of something I did, doing some real good in the world! One session, one connection did that! It’s really amazing to think about and helps push more passion into my speaking in the community, my push to drive new speakers, and my push to get folks to become part of this amazing community!

Goodbye MVP and yoga pants

September 28, 2015October 1, 2015hopefoley 4 Comments

Life is funny you know. So many times you are moving right along and fate throws you new paths you weren’t expecting. I recently had a very surprising phone call. It was from a friend and colleague at Microsoft who was hoping to move to a new role. He said he wanted to give me a heads up as he had recommended me as a good candidate to replace his potentially vacant spot and I would start getting phone calls. I was very glad he did as I was stunned and sounded like a mumbling idiot while the information processed. Microsoft!? Moi?! I was very happy at BlueGranite and wasn’t looking at all. But everyone has that short list of companies where if they call, you at least pick up the phone.

***

Fast forward a few weeks and I have just accepted a TSP position with Microsoft! I’m giddy with excitement for this opportunity! And there the tears well up again, but I’m so sad to be leaving BlueGranite! This company is an amazing group of people who are astounding in the BI/Data Analytics field. I have joked in the past that there may have been some deals with the devil within upper management to have acquired the talent that they have. And they took a chance hiring this DBA with less than stellar track record in BI, but wanted to learn. Part of the reason too that I went to work for BlueGranite was I had many friends who work for them. That is what makes leaving so hard…I’m going to miss working with these folks so much! I am very proud to have been able to call myself a part of this team. Thank you for everything BlueGranite!

***

The other painful piece of this is that if I’m renewed for MVP in a few days, it won’t last very long. Being an MVP has been such a surreal experience! It’s still unreal I was able to be a part of this group of crazy smart, crazy passionate, and crazy generous people. Being an MVP is, was, and will always be a great source of pride in my career.

***

And dear yoga pants, I will miss spending most of my time with you. We’ll still be together, just not as often.

Cleveland Rocks – SQLSaturday Cleveland 2015

February 17, 2015hopefoley 1 Comment

Heading home now from SQLSaturday Cleveland. I know I say all the SQLSaturdays are good but this one really was fantastic! I am a little amazed at how many passionate people are in that area that help with that event. I think a lot of that has to do with Mr. Allen White (Twitter|blog) and his constant encouragement of others in the community. It shows and is inspiring!

***

So I head out from Indy on Friday with Warren (Twitter|blog) who graciously agreed to drive. Which thanks to that and technology, I was able to finish up work while on the way. He then drops me off at Harness Cycle spin studio. Real life Super Woman Erin Stellato (Twitter|blog) set up Joey (Twitter|blog) and I to do a spin class there that she instructed! Is there anything she can’t do!? 🙂 Thanks Erin for setting that up! Was really nice to have a fitness activity that for a change that wasn’t running related.

***

From there we head to Ironwood Café to the speaker dinner. They setup a feast! When the waitresses started bringing food, they didn’t stop! Well done there organizers, I was impressed. After that I head back and chat for a good long time with the dynamic duo Data Chix, Audrey (Twitter|blog) and Julie (Twitter|blog).

***

Event day comes and I get myself over to Hyland Software at a reasonable time in the morning. I was lucky again and have a session right after lunch so wasn’t rushed. The Cleveland folks are lucky with that venue they have at Hyland. It’s a great facility! I am super jealous! After getting there I head to the speaker room which was setup with an amazing spread! Brewnuts (donuts with BEER in them!!), healthy options, and homemade goodies including caramels from Erin (see I told you…Super Woman!).

***

I was able to sit in on Joey’s lunch session and show support to one of the great SQL community sponsors, SIOS. After that I had my session which was on spatial data again. I really enjoy that one as it shows the potential analysis with floor plans and spatial data. I had several say they enjoyed it and even two who told me it was their favorite of the day!

***

After my session was over I went to watch my homegirl, Julie’s session on SSIS. She was doing a decent amount of tweaking to it so was there for some morale support in addition to learning. She’s a pro and it was great. I know I have run into some of the same things she went over. SSIS can be, well let’s just say non-intuitive at times. She goes over many of these spots that can make folks bang their heads on their desk.

***

Last session of the day I caught Michael J. Swart’s (Twitter|blog) Drawing Conclusions talk. He is such a great artist and he used his illustrations to convey some SQL Server lessons. Good stuff and glad I was able to catch that one. That and I got to listen to him say “about” (he’s Canadian).

Michael Swart caught a join this big 🙂

After the prize drawings we all went to Ambrosia to decompress and have some dinner. It was a Hungarian place. The food and company was great! And I found out I do like stuffed cabbage…who knew?! The staff seemed a bit overwhelmed but it happens. After that a few of us went over to Bar Louie for one more beverage and chatting.

The crowd at the end of the day for prize drawings.

Thank you Cleveland folks for having me speak again! Looking back on the weekend I really do feel blessed to be a part of this community. Good people, good times!

Spatial Data – Outside the Map slides

February 13, 2015hopefoley Leave a comment

Wanted to make sure to get these slides up for the user groups I have presented to lately. Hope you enjoyed the session!

SpatialDataLookingOutsideTheMap

Hope Foley's Blog

The balancing of my karma with some stuff I figured out about data that I wanna share.

Category Uncategorized