One of the common questions our team gets is how to output a Hive table to CSV. Hive does not provide a direct method to use the query language to dump to a file as CSV. Using the command
INSERT OVERWRITE will output the table as TSV. We then have to manually convert it to a CSV.
Let’s take an example to understand this better.
Every HDInsight cluster comes with
hivesampletable as a default sample table on which you can perform basic Hive query operations. If you run the following command in the Hive console, you will see the output of the table printed on the screen.
SELECT * FROM hivesampletable;
If you want to output this to a CSV, first step is we will convert this to a TSV using the
INSERT OVERWRITE command as shown below. Please replace
accountname with your container name and blob store name of your Azure Blob.
INSERT OVERWRITE DIRECTORY 'wasb://firstname.lastname@example.org/outputdir/' SELECT * FROM hivesampletable;
This will generate a file names 000000_0 in TSV format. Next, you can either download this file locally and convert it to CSV locally or log in to the Headnode and then convert it to CSV using the code snippet shown below:
hdfs dfs -cat /outputdir/000000_0 |tr "1" "," >>output.csv
Hope this helps you. If you have a better way to output as CSV, feel free to share it with us in the comments section.