Skip to end of metadata
Go to start of metadata

You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 16 Next »

Excel Power Query is amazing, and powerful, and a fabulous tool for anyone serious about getting data in or out of Salesforce. 

Unfortunately if you use a OSx you are out of luck. Try installing windows in a VM just to get Power Query goodness. 

I will go through a few scenarios with examples and screen shots of how to use Power Query simply. 

There is so much you can do with Power Query, and it's related products Power Pivot and Power BI, but I am only scratching the surface.

Download

Power Query is a free download from Microsoft for Office 2010 and 2013 running on Windows 7 plus. I am running Office 2010 on Windows 7 64 bit at the moment and it is working great. 

Download it here https://www.microsoft.com/en-au/download/details.aspx?id=39379

Note, it does get updated often, so if you have downloaded it previously, then re-download it again. 

Introduction

See this video

This was announced at Dreamforce '14 http://blogs.msdn.com/b/powerbi/archive/2014/10/13/announcing-power-query-support-for-salesforce-com.aspx

But wait, that's not all...

Power BI for Office 365 and Excel integrations with Salesforce. With these new integrations, customers will be able to bi-directionally load data to Salesforce and Excel to build reports, visualize information and discover new insights. Salesforce integration with Power Query for Excel is in preview now, with general availability anticipated for the first half of 2015. Power BI integration with Salesforce is anticipated for the first half of 2015. A Salesforce app for Excel is anticipated for the second half of 2015.

From http://www.salesforce.com/company/microsoft-faq.jsp

Wait, where is this bi-directionality? How? Why? Do we know anything about Salesforce for Excel yet? #askforce on twitter doesn't know about it, so it will be interesting to see. Hopefully it will overcome the 2000 row limit.  

What I love about Power Query

  • It's almost like my first programming love - Excel 4 macro language (XLM). This is the programming language I taught myself on, which is probably why I'm not much of a coder now. http://www.brighthub.com/computing/windows-platform/articles/84345.aspx
  • You just do things in the workbook and it records what you are doing
  • You can go back and edit earlier steps
  • You can rename steps to document as you go
  • You can insert new steps
  • You can do very powerful functions or very simple things. 
  • You can repeat and repeat over and over again without losing the source data. It is so cool!
  • You can run Salesforce reports right in Excel! Whenever you open up the spreadsheet there it is - your latest data * (caveats apply, as per usual). 

Uses

  • When doing a major import into Salesforce, you will want to do more than one import - eg one into the sandbox, and one into live. The data will change in that time frame. Power Query allows you to clean up the data once, then when the new export is done just before go-live you can run the power query again to format the new data. 
  • You can practice practice practice and get more and more sophisticated with your data cleansing
  • You should be able to do a good dedupe before importing the main data. 

Import

Here's the scenario. (This is a real scenario, but I'm using sample data to protect the innocent). 

I have a heap of data I need to import into Salesforce, and it is coming from another horrible database that has no data validation and has suffered years of neglect. The current situation is that they need to import the data into Salesforce on a weekly basis and import ALL the data because their database has no way of exporting just the new or updated records (I know, right??!?!? - People who work on these sort of databases need to reevaluate their life choices, I reckon).

The data horrors that need to be cleaned in this data set include:

  • Deleting unneeded columns
  • Extracting the year from the date field because the dates are not entered in a consistent way
  • Merging 3 columns into one separated by semicolons so they go into a multi select picklist. 
  • Making all the ID numbers the same length by prepending zeros
  • Data consistency eg Female entered as Female, female, f, F or fem.
  • V-Lookups to replace data
  • Combine the area code and the phone number into one field and attempt to format it nicely. 
  • Ensure all the states are entered as abbreviations rather than full words
  • Ensure Countries are entered as Australia and not Aus
  • Turn yes and no values into true and false (or 1 and 0)
  • Remove duplicate records
  • Remove test or dummy data (entered with an ID number beginning with Z - at least it is easy to find). 

Things you could also do:

  • Check for valid email addresses using regex
  • Try to turn phone numbers into international format (though this is difficult and would only be useful if you have a limited set of data - eg you know the majority of your entries are Australian).
  • Merge two lists into one. 
  • Separate out the data into separate import files for Accounts and Contacts. (I'm using Skyvva so I don't need to do that in this instance). 

So, all of this would take around 20 mins to do by hand every week. With Power Query it takes about 1 minute, and that's just to paste in the data from the downloaded spreadsheet, and run the query. 

Sample Data

I'm using the Sample Data I found for a temporal data set of employees. This is a data set that contains the full history of employee data over time. In this scenario we want to bring the employees in as contacts (yes, I know, that might not be what you would do in real life), and the old historic data will stay in the old system (again, may not happen in real life). 

The sample data - we can see that the start and end date appear for every data point. We will simplify this data to produce just a list of employees, their departments. 

Steps

Data Cleansing

  • Open the Power Query Ribbon
  • Click From Table
  • First we want to get rid of the data we don't need - so all the tend and start columns
  • Just go right ahead and delete the columns - don't be afraid, you won't damage anything!
    • Right Click on the column heading > Remove. 
  • That looks better already
  • Now see the side panel to see what it has done. 
    • It's a good idea to rename Removed Columns to something more descriptive. 
    • eg Removed date columns
  • Now Change M and F to Male and Female
    • Right click on the sex column 
      • Replace Values
      • Value to Find = M
      • Replace with = Male
      • Match entire cell contents = true
    • Oops, I made a mistake - I typed replace M with F. Ok that's easy. 
      • Just click the X next to Replaced Value in the panel and start again
    • Oops, another mistake - I typed Femals rater than Female. Easy. 
      • Just click the cog next to the action in the side panel and make the adjustment
    • Again rename your labels in the side panel
    • Now, if you have more than a few values you won't use find ad replace. 
  • Notice how some steps get combined (eg all the deletions) and some steps get added as each one (eg the replacements).
  • At this stage look at the advanced editor so you can see what is happening
    • Click Advanced Editor in the ribbon. 
    • You can edit this, but beware it does get confusing with a big data set. 
  • Now, let's format the dates
    • The date format depends on which data loader you use to do the Import.
    • We will use MM\DD\YYYY  
  • Right click on birthdate
    • Choose Change Type > Date > By Locale
    • Choose Date then English (United States)
  • Do the same for Hiredate
  • Remove Salary since Salesforce is not our payroll system
  • OK, things are looking pretty good now, it may be time to remove the duplicates. 
  • You can remove the duplicates based on one or more columns https://support.office.com/en-nz/article/Remove-duplicates-Power-Query-d9cffc69-dc5d-4d94-8b66-72779688874d?ui=en-US&rs=en-NZ&ad=NZ
    • In our case we will right click on EmpNo. and choose remove Duplicates.
  • At this stage we will close and load and see what we've got. 
  • Oops! We have lost our Title and Department data. 
    • Right click on Table 1 in the side panel and click Edit
    • Remove the Removed Duplicates action
    • Phew! no data lost! 
  • Now highlight empno deptno and title and choose remove duplicates. 
  • Easy! Our data now looks like this

  • Close, but no cigar!
    • We still have duplicate titles (as it's sample data, I don't know which one is correct) 
    • And the department is not on the same row as the titles. 
  • On deptno and title right click and fill down. 
  • Now we can remove the duplicates again, but better still, why don't we move the fill down action before the remove duplicates action
  • That looks better. 
  • Now, I'm just going to cheat here and assume that the first row for each is the correct one, so I will now remove duplicates on just the empno column. 

  • Does this look like something that can be imported into Salesforce yet? Nope, not net. 

VLookup 

The next step is to replace the deptno with the Salesforce ID of the lookup to the Departments object in Salesforce. 

I have a departments spreadsheet 

yes, this is made up data and they are lead ID's. 

As this is a simple vlookup we are going to use the merge function https://support.office.com/en-nz/article/Merge-queries-Power-Query-fd157620-5470-4c0f-b132-7ca2616d17f9?ui=en-US&rs=en-NZ&ad=NZ but see the articles below for a vlookup on steroids example. 

  • Add a new source - the Departments file.
  • Back on the Employees table choose Merge Queries in the ribbon
  • Choose Departments, then choose the deptno and DepartmentCode as the matching columns. Click Ok
  • Expand the Table 
    • Choose only the DepartmentID and to not append the table name
  • Voila! The Department ID is there. 
  • Now remove the column for Deptno. 

Getting ready for Salesforce

Now we need to add the data in for the salesforce import. 

  • Add Column > Add Custom Column
    • Name = OwnerID 
    • Formula ="005U0000000QNtS" (my owner ID). 
  • Finally, rename the columns (It's up to you if you rename the columns first or last - depending on how bad your data is). 
    • Right Click on each column > Rename
    • Rename them to match the API names of your fields in Salesforce - it makes it easier to add to map in the dataloader. 
  • Close & Load

Our data ready for Salesforce import

yeah, I know, the dates are not US format. I have to work out what is happening there. 

Now we can replicate that with the other data sources, or combine the data sources first to handle them all together. 

Export

Again a real scenario. My client has KPIs on number of cases answered per month. They want to do a chart to present to management overlaying KPIs with the count of Cases for the year, by quarter. How can you do that in Salesforce. It is virtually impossible (Well, there may be a big hack way of doing it, but I have not found one. The new Charts from Apsona may be able to do it http://apsona.com/pages/sfdc/apsona-charting.html). 

I ended up using http://www.klipfolio.com/ which is a brilliant brilliant, and cheap solution. But Klipfolio has a very high learning curve and is not for your average punter. 

I want to replicate this in Excel using Power Query. 

Steps

  • On the Power Query ribbon click From Other Sources
  • From Salesforce Reports
  • Sign into Salesforce
  • The list of reports will be there. 
  • Open the report you want to look at. 

You can also do this for Objects

  • Reports have a limit of 2000 rows. Objects don't
  • You can't do SOQL queries (Or can you? Maybe you can). 

Technical

Getting an object from Salesforce does a REST query on the Salesforce database. 

https://na12.salesforce.com/services/data/v29.0/sobjects/Contact/describe

Notice that is uses v29.0 not v33.0 that we are (as of today) up to. 

You can enter that query into the rest explorer in workbench to see what it does. 

So technically you think you could do custom SOQL queries via the rest API. I'm sure there is a way to do it but the articles that I've seen only relate to non authenticated APIs. 

Issues

  • Of course you can only use this if you have the APIs enabled (from Professional Edition, if you pay, up to Enterprise, Developer and Ultimate). 
  • Beware of security. DO NOT LOG INTO EXCEL AS AN ADMINISTRATOR! All of your data is then visible in Excel by anyone who opens up your PC. Use an Integration account that specifically only has read only access to the data you need it to have access to. 
  • The user remains logged in when you close excel, so it is logged in next time you open excel. 
  • Attachment object is not supported. 
  • Of course, this is one of my biggest issues - it keeps people reliant on Excel rather than trusting and using the data in Salesforce. 

Next Steps

Please comment below as to how far you have taken Power Query. 

If you are a visual person, you might like this series of YouTube videos https://www.youtube.com/playlist?list=PLrRPvpgDmw0m3ohSvgwoHvd0KO8QsQdiK

Or if you like text, then see this series of articles http://markvsql.com/category/decathlon/

I haven't covered functions - eg simple excel functions like IF and CONCATENATE and they are really basic - exactly like excel but different syntax (TIP: Case is important!). See the article about functions below. 

Then there is another whole area of coded functions that I haven't even looked into yet. 

Just search for Excel Power Query X Function to find what you want. Oh, and if you find a hyperlink function, let me know in the comments! 

And then there is the whole other layer of Power BI. That is for another page later. 

 

  • No labels