Partial Caching of DataFrame by Vertical and Horizontal Partitioning

The sample Jupyter Scala notebook described in this blog can be downloaded from https://github.com/hdinsight/spark-jupyter-notebooks/blob/master/Scala/PartialCachingByVerticallyAndHorizontallyPartitioningDataFrame.ipynb In many Spark applications, performance benefit is obtained from caching the data if reused several times in the applications instead of reading them each time from persistent storage. However, there can be situations when the entire data cannot be cached in…

0

Appending an Index Column to Distributed DataFrame based on another Column with Non-unique Entries

The sample Jupyter Scala notebook described in this blog can be downloaded from https://github.com/hdinsight/spark-jupyter-notebooks/blob/master/Scala/AddIndexColumnToDataFrame.ipynb In many Spark applications a common user scenario is to add an index column to each row of a Distributed DataFrame (DDF) during data preparation or data transformation stages. This blog describes one of the most common variations of this scenario…

0

Saving Spark Distributed Data Frame (DDF) To PowerBI

The sample Jupyter Scala notebook described in this blog can be downloaded from https://github.com/hdinsight/spark-jupyter-notebooks/blob/master/Scala/SparkDataFrameToPowerBI.ipynb. Spark PowerBI connector source code is available at https://github.com/hdinsight/spark-powerbi-connector. Data visualization is often the most important part of data processing as it can surface up data patterns and trends in data that cannot be otherwise easily perceptible by humans. PowerBI (https://powerbi.microsoft.com/en-us/)…

0

PySpark: Appending columns to DataFrame when DataFrame.withColumn cannot be used

The sample Jupyter Python notebook described in this blog can be downloaded from https://github.com/hdinsight/spark-jupyter-notebooks/blob/master/Python/AppendDataFrameColumn.ipynb In many Spark applications, there are common use cases in which columns derived from one or more existing columns in a DataFrame are appended during the data preparation or data transformation stages. DataFrame provides a convenient method of form DataFrame.withColumn([string] columnName,…

0