MIS 650 Grand Canyon University User Defined Aggregate Functions Responses
Response 1: David Lundholm
Within R, aggregate functions split data into subsets and computes a summary of statistics for each subset in an easy-to-read formant. I used the below steps to create a connect with SQL and establish a connection to the AdventureWork database to form a data frame. Using the aggregate function to return a subtotal (sum) for aggregated columns Color and Quantity. The results will show the total inventory by color in stock.
library(odbc)
# Create a connection object
con <- DBI::dbConnect(odbc::odbc(),
Driver=”ODBC Driver 17 for SQL Server”,
Server=”DESKTOP-G2UGCQR”,
Database=”AdventureWorks2017″,
UID = “dave1”,
PWD = “hercules24”)
# Create query
Example1 = dbGetQuery(con,
“Select P.Color, I.Quantity
From Production.Product P
INNER JOIN Production.ProductInventory I
On P.ProductID=I.ProductID”)
# aggregate of product by color
total_stock_color <- aggregate(Color$Quantity, by =list(Color$ProductID), FUN = sum)
#Create column headers
colnames(total_stock_color)<-c(“ProductID”,”Quantity”)
#Print Results
total_stock_color
Response 2: Tyler Klingler
In the example below, I calculated the total sales-driven-revenue for each year (from 2011 to 2014). This information and similar information can be incredibly beneficial to a company so I definitely believe that the aggregate function is a great one to learn.
#Load libraries
pacman::p_load(“dplyr”, “odbc”, “openxlsx”, “lubridate”)
#Connect to the database
con = DBI::dbConnect(odbc::odbc(), driver=”SQL Server”, server=”DESKTOP-VBB26S1\GCUSQLSERVER”, Database=”AdventureWorks2016″, Trusted_Connection=”True”)
#Import query exactly as written in SQL
query <- dbGetQuery(con, “Select AdventureWorks2016.Sales.SalesOrderDetail.SalesOrderID, AdventureWorks2016.Sales.SalesOrderDetail.OrderQty,
AdventureWorks2016.Sales.SalesOrderDetail.ProductID, AdventureWorks2016.Sales.SalesOrderDetail.UnitPrice,
AdventureWorks2016.Sales.SalesOrderHeader.SubTotal, AdventureWorks2016.Sales.SalesOrderHeader.TaxAmt,
AdventureWorks2016.Sales.SalesOrderHeader.Freight, AdventureWorks2016.Sales.SalesOrderHeader.TotalDue,
AdventureWorks2016.Sales.SalesOrderHeader.OrderDate
From Sales.SalesOrderDetail
Inner Join Sales.SalesOrderHeader
On Sales.SalesOrderHeader.SalesOrderID = Sales.SalesOrderDetail.SalesOrderID”)
#aggregate -> Column you want to group by, column you want aggregated, aggregate function
total_sales_by_year <- aggregate(query$TotalDue, by = list(year(query$OrderDate)), sum)
#rename each column for readability
colnames(total_sales_by_year) <- c(“Year”, “Total Sales Revenue”)
#write to an xlsx file to open up in Excel
write.xlsx(total_sales_by_year, “TotalSalesByYear.xlsx”)
Response 3: Arcelia Rael
DataScience(n.d.) writes that the aggregate function works like the “Group By” statement in SQL in that it allows the aggregation of data to compute statistical analysis. In the R sample below, we pull our dataset from the AdventureWorks 2019 database using the dbGetQuery function. We then create a truncated version of the dataset to remove product names from the calculations. Using the aggregate function with the method argument equal to “max”, we return the max unit price for each product.
library(odbc)
#create connection
con = DBI::dbConnect(odbc::odbc(),
Driver=”SQL Server”,
Server=”DESKTOP-64VBUAO”,
Database=”AdventureWorks2019″,
Trusted_Connection=”True”)
#create dataframe using our connection to SQL Server
q2 = dbGetQuery(con,
” SELECT
Name,
UnitPrice
FROM [AdventureWorks2019].[Sales].[SalesOrderDetail] sod
INNER JOIN [AdventureWorks2019].[Sales].[SalesOrderHeader] soh
ON soh.SalesOrderID = sod.SalesOrderDetailID
INNER JOIN [AdventureWorks2019].[Production].[Product] p
ON p.ProductID = sod.ProductID”)
#aggregate must only have numerics, so we take out the variable name
q2_num = q2[-1]
str(q2_num)
#we use aggregate function to calculate max of q2_num data frame variables
q2Agg= aggregate(q2_num,
by = list(q2$Name),
FUN = max)
#we rename the column headers
colnames(q2Agg) <- c(“Product Name”, “Max Unit Price”)
#we print the output
q2Agg
References
DataScience. (n.d.). Aggregate() function in R. DataScience Made Simple. Retrieved August 19, 2021, from https://www.datasciencemadesimple.com/aggregate-function-in-r/
- TK