Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

apache pig - How to convert a human-readable date string to a date using Pig?

I have the following human-readable dates stored in a text file:

Wed Oct 15 09:26:09 BST 2014
Wed Oct 15 19:26:09 BST 2014
Wed Oct 18 08:26:09 BST 2014
Wed Oct 23 10:26:09 BST 2014
Sun Oct 05 09:26:09 BST 2014
Wed Nov 20 19:26:09 BST 2014

How can I convert these dates so that they are compatible with Pig's ToDate() function? I then want to use GetHour(), GetYear(), GetDay() and GetMonth() to apply date range constraints and logic to my queries.

1 Answer

1. Pig supports only a limited set of date patterns, so you need to rearrange your date and time to match one of them.
Time format reference: http://docs.oracle.com/javase/6/docs/api/java/text/SimpleDateFormat.html

2. Your input uses BST as its time zone, but Pig does not support the abbreviation BST, so you need to choose a supported time zone that is equivalent to BST.
The available time zones are listed here: http://joda-time.sourceforge.net/timezones.html

Examples:

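  1. I chose the format "EEE, d MMM yyyy HH:mm:ss" (for example, "Wed, 4 Jul 2001 12:08:56") because it closely matches your input data.
  2. The BST time zone is not available, so I chose 'GMT' as the time zone; you can change it to suit your needs. Note that if BST here means British Summer Time, GMT is one hour behind it, and 'Europe/London' would be the closer match.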

input.txt

Wed Oct 15 09:26:09 BST 2014
Wed Oct 15 19:26:09 BST 2014
Wed Oct 18 08:26:09 BST 2014
Wed Oct 23 10:26:09 BST 2014
Sun Oct 05 09:26:09 BST 2014
Wed Nov 20 19:26:09 BST 2014

PigScript:

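-- Split each line on spaces into its six fields.
A = LOAD 'input.txt' USING PigStorage(' ') AS (day:chararray, month:chararray, date:chararray, time:chararray, tzone:chararray, year:chararray);
-- Rebuild the fields as 'Wed, 15 Oct 2014 09:26:09' to match the pattern used below.
B = FOREACH A GENERATE CONCAT(day, ', ', date, ' ', month, ' ', year, ' ', time) AS mytime;
-- Parse the string into a datetime, interpreting it in the 'GMT' zone.
C = FOREACH B GENERATE ToDate(mytime, 'EEE, d MMM yyyy HH:mm:ss', 'GMT') AS newTime;
-- Extract the parts needed for range checks.
D = FOREACH C GENERATE GetMonth(newTime), GetDay(newTime), GetYear(newTime), GetHour(newTime), GetMinute(newTime);
DUMP D;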

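Output (the day values in rows 3, 4 and 6 differ from the input, apparently because the weekday name "Wed" in those rows does not match the calendar date and the parser follows the weekday):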

(10,15,2014,9,26)
(10,15,2014,19,26)
(10,15,2014,8,26)
(10,22,2014,10,26)
(10,5,2014,9,26)
(11,19,2014,19,26)
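
If the weekday name is not needed, one way to sidestep any conflict between it and the calendar date is to leave it out of the string passed to ToDate(). The following is only a sketch under a few assumptions: the 'Europe/London' zone ID is my guess at what BST means in your data, the FILTER condition is a made-up example of a range constraint, and the flat CONCAT call relies on a Pig version whose CONCAT accepts more than two arguments (as the script above already does).

```pig
-- Load the space-separated fields; the weekday and zone columns are read but not used.
A = LOAD 'input.txt' USING PigStorage(' ')
    AS (day:chararray, month:chararray, date:chararray,
        time:chararray, tzone:chararray, year:chararray);

-- Build a string such as '18 Oct 2014 08:26:09' (no weekday name).
B = FOREACH A GENERATE CONCAT(date, ' ', month, ' ', year, ' ', time) AS mytime;

-- Assumption: 'Europe/London' stands in for BST (British Summer Time).
C = FOREACH B GENERATE ToDate(mytime, 'd MMM yyyy HH:mm:ss', 'Europe/London') AS newTime;

-- Hypothetical range constraint: keep October rows at or after 09:00.
D = FILTER C BY GetMonth(newTime) == 10 AND GetHour(newTime) >= 9;

E = FOREACH D GENERATE GetMonth(newTime), GetDay(newTime), GetYear(newTime),
                       GetHour(newTime), GetMinute(newTime);
DUMP E;
```

Because the pattern no longer contains EEE, the day reported by GetDay() comes only from the day-of-month field in the input.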
