Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
253 views
in Technique[技术] by (71.8m points)

How to creat a pyspark DataFrame inside of a loop?

How to creat a pyspark DataFrame inside of a loop? In this loop in each iterate I am printing 2 values print(a1,a2). now I want to store all these value in a pyspark dataframe.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)

Initially, before the loop, you could create an empty dataframe with your preferred schema. Then, create a new df for each loop with the same schema and union it with your original dataframe. Refer the code below.

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType,StructField, StringType

spark = SparkSession.builder.getOrCreate()

schema = StructType([
  StructField('a1', StringType(), True),
  StructField('a2', StringType(), True)
  ])

df = spark.createDataFrame([],schema)

for i in range(1,5):
    a1 = i
    a2 = i+1
    newRow = spark.createDataFrame([(a1,a2)], schema)
    df = df.union(newRow)

print(df.show())

This gives me the below result where the values are appended to the df in each loop.

+---+---+
| a1| a2|
+---+---+
|  1|  2|
|  2|  3|
|  3|  4|
|  4|  5|
+---+---+

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
...