Power BI – Problem, Design, Solution (Folder Source Option for JSON Files)

Hello Everyone! In this episode of the “Problem, Design, Solution” series we will be talking about the “Folder” option inside the list of sources for Power BI Desktop. This is a fantastic feature that works very well with your standard Delimited/CSV (Comma Separated Value) files, making life easier by importing whatever files you have in a directory. The only prerequisite for leveraging this feature is that the files must be of the same file type and structure (same number of columns and the same column names). There is one issue though: even when files meet these minimal standards, the “Folder” option sometimes just doesn’t load them the way we would want, and that is the case with JSON files. We will explore how to work around this issue by essentially recreating what the “Folder” option achieves while taking more control over the process. For those of you who appreciate a more visual approach there is a detailed video at the end of this blog that you can skip to; even so, this post is a detailed and visual written companion to the video. I hope you enjoy!

Problem

Traditionally the “Folder” source option gracefully handles all of your delimited text files that are stored in a directory, which makes it extremely easy and quick to load that data into Power BI. Unfortunately there are some common formats or structures used in text files that this option does not handle well, namely those in the JSON format. I will not be going into depth on an explanation of the JSON format; since this whole blog is about how to load these types of text files into Power BI, it is assumed that you know what I mean when I say JSON. As you can see in the screenshot below, using the “Folder” option is very simple but the outcome is not desirable: we can immediately see that, among other things, we need to pivot the information to better suit our needs.

FolderJSON

In the image below, after the “Folder” option performs all of its operations to bring the information into one table, you can see all of the steps it generated in the queries pane. You can actually work through these steps and reshape the output the way you want it, but this can be time consuming. The reason I do not recommend it is that the steps have an order of dependencies, and usually when you make one change you then need to address that change in another step that depended on the previous one. Instead, let’s talk about a way that we can take more control of this process from the beginning and still take advantage of being able to pull together all the data that exists in multiple files in a folder!

CombineFile

Design

Since we now understand how awesome the “Folder” option is inside of Power BI, we need to figure out how to make this concept work for our JSON files. To start we can borrow from what we see in the previous screenshot. We won’t copy everything that the “Folder” option does, but definitely a large majority of it. Let’s break it down:

1. We can see that there is an item that is pulling a sample file

2. We also can note that there is a table where transforms are being applied

3. Lastly the “Folder” option is using a function to pass through the values of all files found in the specified directory

So let’s now take these ideas and do them ourselves in four easy steps!

Step 1 – Point to One of the JSON Files in the Directory in Order to Apply All the Transforms We Want

The way we will achieve this is by first locating the “JSON” option inside of the list of source options in Power BI. Once selected, we will simply navigate to the directory we eventually want to load and select any of the JSON files that are there. It really doesn’t matter which of the files you choose, because as we have already discussed, the files in this folder should all be of the same file type and structure. Once we have pointed to the file in question and hit “Okay”, the file will be loaded into the Query Editor and we really only have one option here: to convert the data into a table format, as seen below.

JSONFile

Once it has been converted to a table you can now do whatever transformations need to occur. In the video example the only step we will take is to pivot the table and choose not to aggregate any of the values, as seen below. I have also gone ahead and renamed the table “JsonFile”. We have now completed step one by pointing to a sample file and applying whatever transforms are necessary. Now all we need to do is find a way to get a list of all the files that exist in the folder that we want to process…
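For reference, here is a minimal sketch of roughly the M that the Query Editor generates for this sample-file query at this point; the folder path matches the one used in the function later in this post, “Sample.json” is just a placeholder for whichever file you picked, and your auto-generated step names may differ slightly:

let
     // Read the chosen JSON file and parse it into a record ("Sample.json" is a placeholder file name)
     Source = Json.Document(File.Contents("C:\Users\Manuel Quintana\Desktop\PDS JSON\Files\Sample.json")),
     // Convert the record into a two-column Name/Value table
     #"Converted to Table" = Record.ToTable(Source),
     // Pivot so each JSON property becomes its own column, with no aggregation
     #"Pivoted Column" = Table.Pivot(#"Converted to Table", List.Distinct(#"Converted to Table"[Name]), "Name", "Value")
in
     #"Pivoted Column"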

Step 2 – Use the Folder Option to Point to the Directory in Question

WHAT?!?!?! We just talked about doing this in the problem portion above, so why would we want to do this? The answer is simple: we have already decided that the data isn’t in the right format when using the “Folder” option, but there is one thing it does do right. The first column that is created is meant to capture the source the data was brought from, which in our case is the file name, and that is exactly what we want. So after we select the folder to import from, we can delete all columns except for “Source.Name”. We aren’t done just yet though, because as you will quickly see the filename is repeated for every column we had in the file, but Power BI provides us an easy solution. Go ahead and right-click the “Source.Name” column and you will see that there is an option to “Remove Duplicates”. Let’s also rename this column to something friendly like “Filename”, as we see below. Also rename the table to something friendly if needed; in my example I have left it as “Files”.

FileName

Voila! Now we have a distinct list of the filenames that appear in the directory. Because of the way this connection is set up, if files are added and/or removed it will keep up with those changes, so every time we refresh the connection we will have an accurate list of what is in the designated folder. Step number 2 now has a checkmark next to it, but we still need to figure out how to tie both these steps together: we want all the transforms we set up in step 1 to apply to whatever files have been discovered in step 2. The answer, use a function!
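If you prefer to see the M behind this, here is a minimal sketch of an equivalent “Files” query. Note that this is an assumption on my part: it reads the names straight from Folder.Files instead of reproducing every auto-generated combine step, but it ends with the same single distinct “Filename” column, and the folder path is assumed to be the same one used in the function below:

let
     // One row per file in the directory (path assumed to match the one in the JsonFile function)
     Source = Folder.Files("C:\Users\Manuel Quintana\Desktop\PDS JSON\Files"),
     // Keep only the column that holds each file's name
     #"Removed Other Columns" = Table.SelectColumns(Source, {"Name"}),
     // Distinct list of files, with a friendlier column name
     #"Removed Duplicates" = Table.Distinct(#"Removed Other Columns"),
     #"Renamed Columns" = Table.RenameColumns(#"Removed Duplicates", {{"Name", "Filename"}})
in
     #"Renamed Columns"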

Step 3 – Create a Function from the First Table Where We Transformed Our Sample JSON File

Here we actually need to revisit the first table we created, where we pivoted our table. We are going to modify the M query that was written in the background for us for this table. As you can see in the screenshot below, there is a line of M that indicates the type of source connection that was used and the specific path of the file we chose.

UseMQuery

This is where we are going to make a change, but we must first declare a variable/parameter at the beginning of this query. The syntax will be as follows:

(file as text) =>


let
     Source = Json.Document(File.Contents("C:\Users\Manuel Quintana\Desktop\PDS JSON\Files\" & file)),
     #"Converted to Table" = Record.ToTable(Source),
     #"Pivoted Column" = Table.Pivot(#"Converted to Table", List.Distinct(#"Converted to Table"[Name]), "Name", "Value")
in
     #”Pivoted Column”

As you can see in the query, we have removed the actual path of the file and replaced it with the parameter called “file”. Once we select the “Done” button the table will change into a function and give you the option to pass a value to the parameter in order to invoke the function; leave this alone for now. We now have all the pieces of the puzzle; we just need to put it all together.

FinishedM

Step 4 – Invoke a Custom Function to Create a New Column

This is definitely the easiest and final part of setting up this design. Go ahead and select the table that was created in step 2. Now select “Add Column” in the ribbon at the top and we will be using the “Invoke Custom Function” option. As you can see in the image below, this will prompt a menu where we will need to give a name for the newly created column. In my case I have called it “Details”, but this really doesn’t matter as we will see very shortly. Next we must specify which function we will be using from the drop down; mine is called “JsonFile”. Lastly we must decide what value we are going to pass to the parameter for this function. If we look back at our function, we want to pass through the names of the files that exist inside of our directory, which is exactly what we have in this table within the “Filename” column! After selecting the correct column and hitting OK, the new column is created and we can see that it has some packed data inside of it, as seen below.

InvokeFunction
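Behind the scenes the “Invoke Custom Function” dialog simply adds a Table.AddColumn step to the “Files” query, roughly like the sketch below (the previous step name comes from my earlier sketch, so treat the exact names as assumptions):

// Call the JsonFile function once per row, passing that row's Filename value
#"Invoked Custom Function" = Table.AddColumn(#"Renamed Columns", "Details", each JsonFile([Filename]))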

All you need to do is select the icon which appears inside of the newly created “Details” column and you will be shown a list of all the columns that are packed inside. I have opted to deselect the “id” column in my example because it is a duplicate of my “Filename” column. Also, at the very bottom you will see the option “Use original column name as prefix”. If you leave this checked when you hit OK, each new column will be named like “Details.ColumnName”, where ColumnName is replaced by each field coming from the JSON file. I personally do not use this option, as you can see below.

image
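In M terms, clicking that expand icon produces a Table.ExpandTableColumn step roughly like this sketch; the field names below are placeholders for whatever fields your JSON files actually contain, and with “Use original column name as prefix” checked the second list would hold “Details.ColumnA” style names instead:

// Expand the nested tables in the Details column into top-level columns
// ("ColumnA" and "ColumnB" are placeholder field names)
#"Expanded Details" = Table.ExpandTableColumn(#"Invoked Custom Function", "Details",
     {"ColumnA", "ColumnB"}, {"ColumnA", "ColumnB"})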

Take a step back and look at the beautiful table you have created which contains all the details of the JSON files inside of your chosen directory. What is also fantastic is that moving forward when you refresh this Power BI report it will always bring back exactly what is in your directory, no additional effort needed!

Solution

With our design fully implemented we can now hit “Close & Apply”, bring our data into the model, and start visualizing and analyzing it. This design pattern can also be used in other scenarios; I personally have used it to cycle through different pages of a website in order to bring back all the information I was looking for. Functions can be an extremely powerful ally in Power BI! Have a great day!

Resources


Power BI Visualizations Follow Up

First, let me give a thank you to everyone who attended my webinar and for all of your positive comments. For those of you coming to my blog who did not get a chance to attend the webinar, you can see it here. There were quite a few submitted questions and I have chosen a couple of really good ones to share in this follow-up video blog. There were some other questions that were a bit more specific to the individuals who asked them, and I will be working on those and reaching out personally. I hope you enjoy the video below, and I have included links afterwards for the items you will see in it. Enjoy!

– Link for Color Blind Friendly site here

– Link for creating a JSON file for a color palette here

– Blog for using UNICHAR in Power BI here

– Pragmatic Works On Demand Training here

Introduction to SSIS – Building Your First Package

Greetings everyone. I recently delivered a live webinar covering some introductory items within SSIS. In this session I walked through some of the common items that are used in packages at the junior level but are also commonly used in packages of any level. I do talk briefly about the design process and a little about performance-related items. If you missed the live webinar it can be viewed here. You will need to create an account to view this video, but then you will have access to all the other free content we have available. Any feedback would be appreciated, and if you have any questions do not hesitate to ask. I do recommend reading my blog on the Sort Transform after you watch the webinar, which takes a focused look at that component and its impact on package performance. You can read that blog here.

The topic of an introduction to SSIS was a bit difficult for me to present only because there is so much in SSIS, and some of the most basic components can be used in more advanced ways. I am also learning that there is usually more than one way to achieve a goal in SSIS; initially you need to find the way that is easiest for you. As you increase in ability, though, you will have to leave your comfort zone and try new methods to ensure that you are developing packages in the most efficient way possible, which is known as performance tuning. If you have not already done so, I do recommend looking at Mitchell Pearson’s webinar on performance tuning here.

Task Factory Secure FTP – Using the Get List of Files Option

Greetings everyone. Today I will be doing a blog on a specific component and a specific action within the Task Factory Secure FTP task. The “Get List of Files” action for this component utilizes an object variable; if you did not see my blog on object variables please go here. Quite often we have clients that would like to use this option for the Secure FTP component in Task Factory but are unaccustomed to using object variables. In the video associated with this blog I will show you how to properly set up this component using a couple of different options. The main point of this blog is that the object variable must be parsed out, and after that is accomplished there are a number of ways you can view or write this information to a table. Just underneath the video you will see the scripts used for the two script tasks seen in the package. Shortly after this blog I will be doing a small write-up on how to leverage our File Filter option correctly, which is essential if you are using the “Download Directory from Server” option and you need to pull just certain items. This File Filter option is also available for the “Get List of Files” option.

**It should be known that all these scripts are written in C#. Also any items italicized and in bold represent items that will change depending on the user**

First script, which uses a message box to show how many items exist within the ArrayList housed in the object variable:

MessageBoxScript

// Requires "using System.Collections;" and "using System.Windows.Forms;" at the top of the Script Task code
var files = (ArrayList)Dts.Variables["User::objFilesList"].Value;

MessageBox.Show(files.Count.ToString());

Dts.TaskResult = (int)ScriptResults.Success;

Here is the second script, used in the Foreach Loop container, which also uses a message box, this time to display the file name:

MessageBoxScript2

MessageBox.Show(Dts.Variables["strFileName"].Value.ToString());

Dts.TaskResult = (int)ScriptResults.Success;

SSIS Performance Tuning Webinar–Q & A

A great webinar to follow up my Intro to SSIS webinar

Mitchellsql

Thank you everyone for attending my webinar on SSIS Performance Tuning. If you missed that presentation you can watch it here: http://pragmaticworks.com/LearningCenter/FreeTrainingWebinars/PastWebinars.aspx?ResourceId=683

Below are some of the questions that I received from the Webinar:

Q: Can you give some performance tips in using script tasks too?

A: Yea, don’t use them! Script tasks can be really bad for performance, and many times I have been able to replace someone else’s code with native SSIS components. For example, I had a situation where the client was using some very complicated .NET code to parse out columns based on logic. The package was taking over an hour to process 1 million records. I replaced this with some conditional splits and derived columns and it now runs in 3 minutes.

Q: I am assuming that the file formats must be the same for all files when using the MultiFlatFile transform, correct?

A: You are…

View original post 272 more words

SSIS Execute SQL Task…How it’s used

A great blog on a frequently used component in SSIS

kwoodstechtalk

pic1

Recently, I received a request from a user to explain the purpose of the Execute SQL Task and how it’s used. Though this is not a Task Factory component, it is still very useful! Task Factory has proven time and time again to be a valuable source for business solutions, but there are times when the standard components of SSIS are needed in conjunction with Task Factory to address your business solution needs.

There will be times when a user will need to utilize a Foreach Loop Container to run a series of events inside of Task Factory components (as is the case in my video blog on the Advanced Email and SMS Task from Task Factory). A user may need to use a Merge Join to join tables together within a project when using a TF Data Flow Nugget. Task Factory offers an unbelievable amount of functionality and adds an incredible…

View original post 319 more words

Just because it’s called the Sort Transform doesn’t mean you need it to sort! Just say no to this blocking transform!

Hello and good day to everyone! Today I wanted to take the time to talk about the fully blocking, asynchronous Sort Transform in SSIS. The purpose of this transformation is to ensure that your data flows down the data path in a certain order determined by you. Here is what it looks like:

SortSettings

Why is this a problem you ask? What the transform accomplishes is not bad; how it does it is the problem. Because this component is asynchronous it uses a new buffer for the data coming into it. This means that it will wait until all the data from the components preceding the Sort Transform has arrived and only then release it to move on to other parts of the Data Flow. So if you are pulling 1 million records from a SQL Server table and then have a Sort Transform, it will wait until all 1 million rows are present and then release them in batches (usually 10,000 rows) down the rest of the package, as shown below:

SortTransform

This only occurs with asynchronous components (yes, there are others!); normally the records would flow through whatever transform in batches. Do not worry though, this is not a blog merely to gripe about blocking transforms; I will show you how to accomplish this same goal without using the Sort Transform. This solution should help out with performance tuning on packages of any size. Here are the easy steps to get it done:

1. IMMEDIATELY DELETE THE SORT TRANSFORM!

2. Open the OLE DB source and switch the Data access mode to “SQL command”:

OLEDBsource

3. Once this is done, you will need to use T-SQL to select which columns you would like from the appropriate table. The important item in this step is that you will want to add an ORDER BY clause on the necessary column (most likely the identity column).

**Here is an example from the AdventureWorks 2012 Database using the Production.Product table**

SQLCommand

4. Once this SQL statement is set the data is now sorted, but we need to mark it as such so that SSIS recognizes the sorting. To do this, right-click that same OLE DB source and select the Show Advanced Editor option.

ShowAdvancedEditor

5. You will now be presented with a window that has four tabs, and we will be selecting the Input and Output Properties tab. There should only be two options available to you; highlight the OLE DB Source Output choice and its properties will be populated in the window to the right. The property we are going to manipulate is the IsSorted option, and you will want to set it to True; this is how we tell SSIS about the sorting.

SourceOutput

6. Now we need to indicate which column we are sorting by. Inside the same window as the previous step, expand the OLE DB Source Output item to reveal two options: External Columns and Output Columns. Here we will be expanding the Output Columns option and be presented with the entire list of columns inside of our source. Simply locate the column that you selected in your ORDER BY clause and highlight it; its properties will appear in the window to the right. Locate the field labeled SortKeyPosition and input the value of 1. It should be known that there may be times when you ORDER BY more than one column. In the advanced editor you will just want to go to each column you ordered by and place a value according to its priority, with 1 being the highest, moving downwards to 2 and so forth. So if the SQL statement had a clause as follows: ORDER BY ProductID, SalesID, then ProductID would have a value of 1 placed in the SortKeyPosition field while the SalesID column would have the value of 2.

SortKeyPosition

Once this is done you can hit OK and everything is set! Although this method is a bit longer than simply adding a component and checking a box, this solution will provide a huge boost to package performance when dealing with larger data sets. If you are pulling down fewer than 100,000 rows you can really use either of the two options for sorting. I do hope that this walk-through was helpful and that you see good performance gains for your SSIS packages!