How I Set Up External Data Sharing for a Customer

I recently worked with a customer who needed to share CSVs for an auditing situation.  They had a lot of external customers they needed to collect CSV files from for the audit process.  There were a lot of discussions about how best to do it: whether we’d pull data from the external environments or have the external customers push files into theirs.  Folks weren’t sure, so I tried to come up with something that would work either way.

What I came up with grabs CSVs from an Azure storage account, no matter where that account lives.  The files are then pulled into a consolidated ADLS Gen 2 location in the customer’s data lake in Parquet format, so they can do all the fun big data things on them in Azure Synapse against an aggregated data lake.  The solution also sets up an Azure Key Vault where you can store the connection string for the Azure storage account that contains the CSV files.  NOTE – YOU NEED ELEVATED ACCESS TO THE STORAGE ACCOUNTS.  You may need approval, or separate storage accounts, both on the sending side with the CSVs and on the receiving end in your Azure subscription.  GET PROPER APPROVAL/BLESSINGS before setting this up.

If you want to set up the full solution, including the components in Azure, I’ve published the code to build out everything.  All you need to do is store the files located in the DataShare Full Sample folder locally.  You’ll then run some or all of the PowerShell scripts below.

  1. Update the paramfile03.json file with the values of what you want to build or use.  You can use existing pieces if you’d like; the script checks whether each piece exists before creating it.
  2. Load the CSVs to a storage account for which you can get the connection string with the storage keys.  Note the connection string, as you’ll be prompted for it during the run.
  3. Run ./"01 – Create Resources DataShare.ps1" -filepath ./paramfile03.json to create all the Azure pieces.
  4. Run ./"02 – GrantRightsDataShareCLI.ps1" -filepath ./paramfile03.json to grant the rights needed by the admin and the Synapse MSI to storage and Azure Key Vault.
  5. Run ./"03 – Create Pipeline Parts DataShare.ps1" -filepath ./paramfile03.json to build the Synapse pipeline components.
  6. Run the pipeline with the parameters filled in below and verify your CSVs have been processed into your data lake as Parquet files.  This pattern allows you to set up separate Key Vault secrets so you can pull CSV files from as many storage accounts as you need.  There are a couple of sample CSVs within the folder as well if you’d like to use them; they contain hospital capacity data extracted from a public site.
Parameter Name – Value details
  • ParamSecret – Name of the secret containing the connection string
  • Container – Name of the container holding the CSV files
  • FilePrefix – All CSVs that begin with this text will be processed
  • DirectoryPath – Directory path (i.e. foldername1/foldername2/)
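If you’re wondering how the connection string lands in Key Vault in the first place, here’s a minimal sketch using the Az PowerShell module.  The vault name and secret name are placeholders I made up; use whatever you set in paramfile03.json:

```powershell
# Store a storage account connection string as a Key Vault secret.
# "kv-datashare" and "CustomerAConnString" are placeholder names.
$conn = Read-Host -Prompt "Paste the storage connection string" -AsSecureString
Set-AzKeyVaultSecret -VaultName "kv-datashare" `
                     -Name "CustomerAConnString" `
                     -SecretValue $conn
```

The ParamSecret pipeline parameter would then be CustomerAConnString.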

Let me know if this helps you or if you have suggestions to make it better/clearer.


How I Start my Synapse Analytics POCs

I love my job!  One of the things I do for a living is help customers get started with new services in Azure to finagle their data.  Many times we’ll start with a small POC just to start to understand the parts and pieces, and I teach them along the way.  I work with a lot of customers, so being quick and nimble helps.  Lately I’ve been using PowerShell to set up the pieces needed for a full Synapse Analytics environment, including an example set of 4 pipelines (2 to extract to ADLS, 2 to upload to a dedicated SQL pool).  Pulling data out of large relational databases into the data lake became a request I heard over and over, so I automated it.  I’ve added to and tweaked this over the years into a project I call “Synapse Load” and put a version out on my GitHub.

I want to pick up where I left off in my last post and show how I can use the code in the repo above to set up a full environment for Synapse, including the pipelines below.

Download a copy of the repo above and extract it locally.  We’ll set up the full sample, so all the files we’ll need are in the SynapseLoadV2-master\03 Sample folder.  Open the paramfile01.json file and edit all the values that contain <text>.  For example, here are the values for the Azure SQL server and DB after editing.  You can supply existing resources and the create step will skip them.

"azsqlserver" : "yoursqlservername",

"azsqlDB" : "yoursqldbname",

Open up Azure Storage Explorer and navigate to the Azure file share storage location used for running CLI/PowerShell.  I created a folder called SynapseLoad.  We need to load all the files from the 03 Sample folder except the *.sql files (the SQL scripts will be run from your SQL client of preference).  Copy those files and put them into the Azure file share location for your CLI.  Here’s a view of what you should see in your PowerShell session in the Azure portal.

The paramfile01.json file we updated drives everything we’ll create in Azure.  Let’s do one last look at what’s in it from Cloud Shell to be safe.  Type the command:  Get-Content ./paramfile01.json

After you’ve validated it, we’re ready to run the scripts.  The 01 – CreateSynLoadResources.ps1 script will create all the components in Azure.  We’ll run this in the CLI:

./"01 – CreateSynLoadResources.ps1" -filepath ./paramfile01.json

Detailed instructions can be found in the repo’s readme as well.

You’ll get a couple of prompts to create the SQL admin user/password, and when the script completes, verify in the Azure portal that the components were created.  Next we need to run a script to grant some rights needed to view data in Synapse.  You run that in the CLI:

./"02 – SynLoadGrantRights.ps1" -filepath ./paramfile01.json

The Synapse pipelines are created by running the following script.  It updates the JSON files we need to build the pipelines and then deploys them after the updates occur.

./"03 – CreateSynLoadPipelineParts.ps1" -filepath ./paramfile01.json

You can now go into Synapse Studio (there’s a link in the Azure portal if you click on the workspace resource).  You should see the following pipelines.

You will need to run the scripts to create the metadata tables, the source table, and the destination tables inside of the dedicated SQL pool.  The steps in the readme walk through this in detail.  After you’ve set the sources in the metadata tables, you can run the extract pipelines: SQL Date Based Extract PL pulls based on a date range, while SQL Not Date Based Extract PL is driven by a where-clause extract (i.e. where Animal = 9111 in the sample).  You can then upload the parquet files after updating the [ADF].[MetadataLoad] table with your parquet filename(s).  The filename is the match to the table field values that are used to get target table information.  Of the two load pipelines, one is for truncate/reload patterns and one is for incremental loading.
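As a rough illustration of that metadata update, registering a parquet file could look something like the snippet below.  The column names and values here are my assumptions; check the .sql scripts in the sample folder for the real schema:

```powershell
# Hypothetical columns and values; see the sample's SQL scripts for the actual schema.
Invoke-Sqlcmd -ServerInstance "yoursqlservername.database.windows.net" `
              -Database "yoursqldbname" -Username "youradmin" -Password "yourpassword" `
              -Query "INSERT INTO [ADF].[MetadataLoad] (ParquetFileName, TargetSchemaName, TargetTableName, LoadType)
                      VALUES ('Animal_extract.parquet', 'dbo', 'Animal', 'Incremental');"
```

You could just as easily run the INSERT from your SQL client of preference; the point is that one metadata row per parquet file is what drives the load pipelines.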

I have a million ideas to make this better.  What are yours?  Please let me know if you have feedback, and if this helps you, I so want to hear it!

PowerShell in the Clouds

Part of my job is doing POCs with customers to help with Azure Data Services.  Anything that helps me move quicker is helpful, so I’m a tad bit obsessed with automating things.  I have used PowerShell for more years than I’m willing to admit to help me automate what I can.  There are a lot of ways to automate things, like ARM templates and DevOps, but PoSH has been my preferred hammer.  As much as I love it, I’ve run into issues sometimes with installing modules locally on folks’ machines, not to mention if they have a Mac.  I wondered recently if Azure Cloud Shell would help make things easier, and it very much did, and I’m super pumped to share!  This post will run through how to get set up to run PowerShell scripts in Azure.

Navigate to the Azure portal and log in.  Start Azure Cloud Shell by clicking the >_ icon in the top right.  I overlooked it forever, so here’s a view of it below.

If necessary, change from Bash to PowerShell.  It will require us to connect some Azure blob storage to it, or create a storage account to use if you haven’t already.  There’s a separate post that gives further details on this.

If you’re not prompted to set up storage, you may have already done so and, like me, didn’t remember it.  You can figure out the details of the storage account by using the command:  Get-CloudDrive

You can navigate the same as you would locally.  Type the command dir and you should see at least one folder called clouddrive.  Navigate to it by typing the command cd clouddrive.  This is the location where you’ll be able to upload and interact with your files.  Next we’ll upload a PowerShell script to this location that checks for the modules needed in upcoming blogs to create items in Azure.

Navigate to the storage account that is tied to Azure Cloud Shell.  I like to use Azure Storage Explorer.  Expand File Shares and you should see the file share that was listed when you ran Get-CloudDrive.  You may also see a folder for system use called .cloudconsole.  Download the PowerShell script called CSModuleCheck.ps1 from my GitHub repo, then upload that file to your Azure Cloud Shell file share location.  Now, within the Azure Cloud Shell console, check to make sure the file is seen by running another dir command.

Now that we’ve confirmed we can see the file, we’ll run the script from the clouddrive folder:

./CSModuleCheck.ps1
What this script does is check for and list the versions of a set of Az modules.
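If you’re curious what a check like that looks like, here’s a minimal sketch.  The module list is my assumption; CSModuleCheck.ps1 may look for a different set:

```powershell
# List the installed version of each module we care about (assumed list).
$needed = "Az.Accounts", "Az.Storage", "Az.KeyVault", "Az.Synapse"
foreach ($name in $needed) {
    $mod = Get-Module -ListAvailable -Name $name |
           Sort-Object -Property Version -Descending |
           Select-Object -First 1
    if ($mod) { Write-Output "$name $($mod.Version)" }
    else      { Write-Output "$name not installed" }
}
```

Get-Module -ListAvailable scans the module path without importing anything, so this runs quickly even in Cloud Shell.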

If you do not see one listed, like Az.Synapse, then we can install it.  All you need to do is run the command below:

Install-Module -Name Az.Synapse

And that’s all there is to it!  You’re all set to run PowerShell scripts in Azure Cloud Shell.  Up next, I’ll go through how I use Azure Cloud Shell and PowerShell scripts to help my customers set up a full Synapse environment, including pipelines.

Precision Learning – Umm What?

What does “precision learning” even mean?  I think the first time I heard “precision” used was in relation to “precision medicine”.  That was a term tied to pivoting treatment for patients based on their specific needs, driven by extensive data analysis.  There are many factors that can impact treatment, such as gender, size, family history, and a massive list of other attributes.  It also seems a project trends toward big data whenever you throw in the term “precision”.

In education, everything is focused on a student’s outcome.  So in precision learning, you’d be pivoting to focus on the student and what behaviors help achieve the desired success criteria, like grades, test scores, etc.  That’s a tough nut to crack from a data perspective though, right?  It is one of those times where I have to admit this seemed a faraway, fuzzy concept.  Capturing the data for it seemed beyond this data gal’s realm of thinking.  Then in a recent project, I found the most interesting data set, which immediately brought me back to thinking of the term “precision learning”.

I worked with a customer at a university that had an e-textbook and learning system in place called VitalSource.  Extracts from this tool are made available in a few ways and contain the telemetry of all interactivity from within the application.  So now imagine having the ability to analyze a student’s interactivity with the learning text!  When I learned what data I was to work with on this POC, the data nerd in me felt like I had won the lottery!

After a few pinches to myself for landing this project/data, I did get brought back down to earth a notch when I heard the data came in gzipped, nested JSON files.  I realized I was staring down the barrel of a full-on big data project!  I don’t get many of these in education, so I was insanely excited.  And I’m even more excited to share what I learned with everyone, especially my friends in the education data world.  I intend to do deeper dive posts into this data and the architecture below.  I also created scripts you can use to set up the entire solution and sample data (pulled from sample data posted in the VitalSource documentation).  Please comment or reach out if this solution is helpful to you or you have suggestions to make it better.

Only Took a Pandemic, But She’s Back and Brought PoSH Automation Presents

I’ve threatened to come out of blog retirement for a long time, but in the triage of Hope’s life, it just never happened (outside a few posts for the SQLStarter track).  Now that COVID-19 has upended travel for both work and SQL community events, it seems the time has found me.  It’s not that I seriously thought my blogging again would trigger some terrible apocalypse, like zombies or Godzilla showing up, but when better to do it than in 2020, when folks may not even notice?

So what’s got me writing besides more time?  I’ve been helping customers at Microsoft with data challenges of all sizes for the past 5 years, with the last 1.5 focused on the education sector.  A lot of my job is helping folks figure out and implement data things in Azure.  If I think something is hard to do, I tend to try to help automate it in some way.  I’ve built up a lot of scripts, how-to information, and a billion links to info over the years, but I haven’t been great about putting it out in the world.  So here’s my attempt at fixing that, with the help of some better Git/GitHub skills (thanks again, Drew).

So what’s the hard thing I want to help make easier in this post?  Metadata-driven pipelines in Azure Data Factory!  I had the opportunity awhile back to work with a customer who was pulling data out of large SQL Servers to eventually land it in Azure Synapse SQL pools, back when they were still Azure SQL DW.  We created a couple of load-pattern pipelines that used metadata in Azure SQL DB to load Synapse SQL pool tables from parquet files in Azure Data Lake Storage (ADLS) Gen 2.

Not gonna lie, the pipelines weren’t easy for me to learn to set up initially.  Big thanks to Catherine for your blog, which was a life preserver in the hardest parts!  So I wanted to see if I could automate it with my old friend PowerShell.  That grew into what is now published on my GitHub.  This is the bones to create all the pieces in Azure (diagram below), including the ADF pipelines.  I intend to keep that solution using only GA (general availability) Azure Data Services.

I also have a new incarnation of that work out now as well.  The new solution uses as much of the new Synapse preview workspaces as the PoSH cmdlets would allow (no pipelines in the workspace yet).  I also have some additional pipelines to drive the extraction of SQL tables to parquet files in ADLS.  These pipelines can extract portions based on date criteria, or on other non-date columns, using metadata too.  Oh, and I also threw in some custom logging back to Azure SQL DB as well.  This version is out on my GitHub too.

I hope to make this better each time I work with a customer.  I have a laundry list of updates in mind already, like adding the extract pipelines to the GA version.  If you use it and have suggestions, issues, or just find it helpful, throw a tweet at me using the hashtag #HopeSynapseLoad.

SQLStarter Track – Of Course It’s Back!

Seems the SQLStarter track is the only thing that brings me out of blog hiatus anymore.  I keep meaning to blog, but my life is triage and blogging just tends to fall off the vitals list.  One day I’ll get back into gear.  But today, my friends, I wanted to get the word out again for something I’m passionate about, SQLStarter.  If you’re inclined, you can go back and read the posts from the past; they are here:

SQLFamily, I Have An Idea and I Need Your Help

SQLStarter Track Back Again In Indy

I still truly feel that these career paths are a great way to change lives.  One day I’m hoping someone will come up and say “I attended a session that day that changed my life”.  I’m passionate enough that I’m willing to yearly share embarrassing pictures from my days of questionable judgment (see older posts).



It’s hard to get your hair to poof with a maid hat on, but by god I did it!


And what do I ask in return for this good laugh?  I am looking for speakers to fill the spots.  If you’re interested in helping, you can hit me up on Twitter/email/carrier pigeon.  And for everyone else, helping to get the word out would be great!  There’s an info page set up on the event site, and you can RT messages on the #SQLStarter hashtag.

SQLStarter Track Back Again In Indy

Last year I pulled together an idea that had been haunting me for a long time and started the SQLStarter track in Indy.  You can refer back to my blog post on the creation/inspiration.  I can tell you, the day of the event I intently watched the attendance for the sessions in the track.  So glad to see it was really well attended, and I got great feedback!  I would really like to thank all the speakers who helped get this going.  Thank you so much to Josh Fennessy, Dave Leininger, David Maxwell, John Deardurff, Traci Marcero, Kevin Kline, and Jim Bennett!  You guys will never know how grateful I am for your help getting this off the ground!


I’d like to bring it back again this year so I’ll need speakers again.  All the sessions are intended to be introductions to different tracks in data related fields.  If you would like to be involved, please let me know.  I think the tracks below worked well but if you have thoughts/suggestions, I’m all ears.  Also please help get the word out to people thinking about getting into a new career or starting out.

  • Intro to Big Data
  • Intro to Data Science
  • Intro to Database Administration
  • Intro to Database Development
  • Intro to Business Intelligence



And another pic from a time when my judgement was questionable (see previous blog). 


SQLFamily, I Have An Idea and I Need Your Help

Have you ever had an idea for something and it completely haunts you? Something that pops in your brain all the time and keeps you up at night? Something where it feels like the universe NEEDS you to do it? I have had this idea brewing for a long time and I now have some key pieces in place to finally get it out there so it can stop haunting me.


First a little history to understand a portion of the inspiration behind this. Back in middle school and early high school, I hung out with a group who got into a LOT of trouble. There are some situations I look back on where my future was sitting on the edge of a knife. If I had zigged many days, instead of zagging, my life could be completely different. I have a lot of friends who didn’t finish high school and went down paths that are hard to reverse, living paycheck to paycheck. My home town of Anderson, IN and hell my family is completely full of people I’d like to help get on a different path.


If you can imagine, this was an era of questionable judgement.  Photo isn’t the best quality but gets across the point. 🙂


So with those folks in the back of my mind, I also see the greatness of the SQL community and the WEALTH of resources out there for free. I also hear over and over that we don’t have enough people to fill these data related jobs. I know that recruiters hit me up all the time. So if I can reach someone out there who is ready to go down a different path and is motivated to put in the work, we can change some lives!



Now SQLSaturdays are already reaching people and changing lives. They changed my life, without a doubt. But I know I have met beginners at those events who are intimidated and scared when they attend. Can you imagine going into a SQL internals class when you are brand new to the scene? Hell, they may give up right then and say screw it, this stuff is way over my head. So at SQLSaturday Indy I’m going to set up a total beginner track. I’ll be giving the first session of the day, which will be an easy intro into the different data-related career paths you can take. Then the rest of the sessions in that track will dig a bit deeper into each one. These sessions will explain high-level concepts and the kinds of work you would be doing. And of course, they’ll give details on where to find more information after attendees figure out which avenue intrigues them.


This is where I need your help, SQLFamily! I need people to assist with these area-focused sessions. I’ll need 5-6 sessions to fill out the rest of the track. These are the ones I’m thinking of now, but I’m open to suggestions (beggars can’t be choosers).

  1. Database Admin
  2. Database Development
  3. Business Intelligence
  4. Big Data/Analytics
  5. Other??

I also want to start a way to help keep in touch and connect with these people as they go out into the world after SQLSaturday Indy. Seeing how well Twitter and some hashtags work, I’m starting another one called #SQLStarter. We can use it to help answer questions, and point these beginners to new resources. I also would like to start making recruiters aware of it as well so they can post jobs appropriate for beginners.


So there…the idea is out there and hopefully the universe will stop haunting me to do this. Hoping this is the spark and the beginning of something that spreads to other SQLSaturdays/user groups/whatever. And as kumbaya froo-froo as it sounds, it feels great to put some good out into the world!

PASS Summit 2015 Highlights

I’m back home from PASS Summit 2015 and reflecting on the experience.  I think I even still have a little bit of the goofy grin that I couldn’t wipe off my face all week.  I do so love PASS Summit!  This was my 5th consecutive one.  I went back and looked through some of my previous blog posts and I still have all the same giddy, giggling idiot feelings I have had all these years.  Below are my top 10 highlights of this year’s PASS Summit.


10.  Calories don’t count at PASS Summit – Ok, this isn’t really true, but I pretended it was. After I had an egg white omelet at breakfast the first day, it was all downhill from there. It’s difficult to eat well at a conference, and I missed eating like a truck driver, so I took a healthy eating sabbatical while there. The Morning After burger at Tap House is very good, by the way.

My egg white omelet was at one of the "hey pull up a chair" breakfasts at Daily Grill.


9.  Great Sessions – As with every Summit, I didn’t catch as many sessions as I would have liked (I highly recommend purchasing the DVD/download of sessions for this reason). I did catch my dear friend Mark Vaillancourt’s (Twitter | Blog) session “A Bigger Boat: Data Visualization Lessons from the Movie Theater”. Mark always cracks me up and I always enjoy his sessions.  Great information on perception with the right balance of humor mixed into it.

Mark caught a fish this big!


I also caught Kasper de Jonge’s (Twitter | Blog) session “The Analysis Services Evolution”.  Poor Kasper had all the presentation planets align against him.  He couldn’t get the screen to display except by setting up like he is in the picture below.  He handled all the issues VERY well and it was a great session.  Folks were excited to hear of the new features in SSAS, so that helped too.

Poor Kasper!


8.  Birds of a pitiful feather – On Friday they had the Birds of a Feather lunch, where they set up tables with specified subjects. It’s always nice for good discussions. I went with my friend and former co-worker, Meagan Longoria (Twitter | Blog). I was thinking I would sit at the Datazen table, but it turned out there wasn’t one. I did find a table on geo-spatial, though. There were only a couple folks sitting there at the time, so we sat there. After chatting with them for a couple minutes, they left, and Meagan and I manned that table together alone (cue the All By Myself song). We had fun, but that was a lonely subject table (the reason I said pitiful)! The whole Birds of a Feather lunch seemed to be scaled down a bit more than in years past too.

The masses at the WIT Luncheon


I also missed the lunch where they setup tables for regions.  I always met new people at each lunch from my area who weren’t familiar with the local user group.  I’ve connected with several folks over the years at that lunch.  Not sure of others experience but I missed it.


7.  SQL Mystery – Doug Lane (Twitter | Blog) is a brave man!  He had a couple of sessions at Summit where he uses other speakers as cast members in his sessions. It’s like incorporating community theater into his technical sessions. I was asked to be a cast member in “SQL Server Mystery: Dead Reports Don’t Talk”. He sent the script several days ahead of time but I was more nervous about this than my technical session. I feel I do well with conversations. I feel I do well in a rehearsed technical setting.  Anything off the cuff, in front of an audience, yeah not so much. That and I hated the thought of me impacting his Summit session if I messed up. I survived but definitely could have done a better job. Sorry Doug!

Doug's super fun SSRS Mystery session


6.  Kasper knowing who I was and saying he looked through my session slides! – I still get star struck at PASS Summit. I was introduced to Kasper and he had a question about one of MY slides! I am still giddy about that!


5.  Tim Chapman is a helluva guy! – Knowing I was new to Microsoft, Tim (Twitter ) reached out and pointed me to great resources to help navigate the oodles of tools and info within Microsoft. Not only that but he introduced me to some great folks at Summit. Thank you much Tim! Very much appreciate it!


4.  Speaker Idol – I was asked by Denny Cherry (Twitter | Blog) if I would be a judge in this year’s Speaker Idol. For those unfamiliar, it is a contest for speakers to win a speaking slot at the next year’s Summit. I said yes and was very glad to be a part of it and honored to be in the lineup of judges (Mark Simms (Twitter), Andre Kamman (Twitter | Blog), Allan Hirt (Twitter | Blog), Joey D’Antoni (Twitter | Blog), and Karen Lopez (Twitter | Blog)). All of the speakers were great! Because of that, it came down to getting way nitpicky on the tiniest details of the sessions. I felt a little uncomfortable doing that, since I consider many of the speakers friends.  Still, it was a great process to be a part of, and I picked up some tips for myself as a speaker.  The winner was David Maxwell (Twitter | Blog). Congrats David!

No Summit trip is complete without a trip to Bush Garden. Many times it's Jason twisting my arm to go :)


3.  Marco Russo / Alberto Ferrari in my session – My session was over SSAS Tabular performance. I reference Marco Russo (Twitter | Blog) and Alberto Ferrari (Twitter | Blog) many times during it. During practice run throughs of my session, it dawned on me that this was the only time I would give the session and they could show up.  Guess what?! They came to MY session!! That still cracks me up a bit and was great to meet them in person.


2.  Seeing all my #SQLFamily all week – One of the best parts of Summit is that you can’t throw a rock all week and not hit a member of #SQLFamily.  Over the years I have met so many wonderful SQL people from all over the world. Summit is the one time of year when the majority of those folks are together. Makes me feel all SQL warm and fuzzy just thinking about it.

My favorite pic of Summit!


  1.  Stephanie Bruno moment – I help babies in Africa! – I went to dinner with a group on Thursday evening at Tap House. I met Stephanie Bruno (Twitter) a couple years ago when she attended my spatial data session.  We became friends that year and kept in touch some after that Summit. She works for the Elizabeth Glaser Pediatric AIDS Foundation. She was at that dinner and began talking about how I helped her with some Datazen work she was doing.  And by this act of getting information to her, I was also, in a way, helping babies in Africa!  I get teary eyed again just thinking about it. I mean, we speak and hope some bit of info we relay helps folks do their jobs or helps them with an issue they run into, but that’s usually the end of the thought. It really hit me in the gut to be able to draw a line from something I did to a concrete example of real good in the world!  One session, one connection did that! It’s really amazing to think about, and it pushes more passion into my speaking in the community, my push to drive new speakers, and my push to get folks to become part of this amazing community!

Goodbye MVP and yoga pants

Life is funny you know. So many times you are moving right along and fate throws you new paths you weren’t expecting. I recently had a very surprising phone call. It was from a friend and colleague at Microsoft who was hoping to move to a new role. He said he wanted to give me a heads up as he had recommended me as a good candidate to replace his potentially vacant spot and I would start getting phone calls. I was very glad he did as I was stunned and sounded like a mumbling idiot while the information processed. Microsoft!? Moi?! I was very happy at BlueGranite and wasn’t looking at all. But everyone has that short list of companies where if they call, you at least pick up the phone.


Fast forward a few weeks, and I have just accepted a TSP position with Microsoft! I’m giddy with excitement for this opportunity! And there the tears well up again, because I’m so sad to be leaving BlueGranite! This company is an amazing group of people who are astounding in the BI/data analytics field. I have joked in the past that there may have been some deals with the devil within upper management to have acquired the talent that they have. And they took a chance hiring this DBA with a less than stellar track record in BI but a desire to learn. Part of the reason I went to work for BlueGranite was that I had many friends who worked there. That is what makes leaving so hard…I’m going to miss working with these folks so much! I am very proud to have been able to call myself a part of this team. Thank you for everything, BlueGranite!


The other painful piece of this is that if I’m renewed for MVP in a few days, it won’t last very long. Being an MVP has been such a surreal experience! It’s still unreal I was able to be a part of this group of crazy smart, crazy passionate, and crazy generous people. Being an MVP is, was, and will always be a great source of pride in my career.


And dear yoga pants, I will miss spending most of my time with you. We’ll still be together, just not as often.