Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Extract JavaScript Variables from Scraped HTML Page using PHP (Regex)

I'm struggling to extract JavaScript variables from a scraped webpage's HTML dump.

I'm currently using this regex:

    $re = '/window.universal_variable\s*=\s*{(.*?)}/ms';

but it only returns the first set of values (the page block). I'm trying to get all the variables and values under product (id, product_id, sku, etc.). Here is the relevant part of the scraped HTML:

    <script type="text/javascript">
    window.universal_variable = {
    page: {
    category: "product" ,
    searchTerm: "sony",
    environment: "production",
    variation: "production",
    revision: "1.1"
    },
    user: {
    otb: "",
    ATG_FO_IND: "A",
    ooops_preference: "false",
    registered_today: false,
    registration_date: "",
    registered_in_current_session: false,idv_verified: true,
    last_order_date: "",
    start_date: "",
    first_order: false,returning: false,
    last_transaction_payment_type: "",
    unicaSegment: "",
    targetedPromos :"",
    cva:"0",
    cvb:"1",
    cvc:""
    }// end of user
    ,
    product:{
    id: "KEN6C",
    product_id: "prod1086433641",
    sku: "KEN6C",
    manufacturer: "",
    category: "Televisions",
    category_facet: "4740",
    department: "Electricals",
    subcategory: "electricals_televisions",
    currency: "GBP",
    unit_price: "",
    unit_sale_price: "319.0",
    rating: "4.3",
    ratingCount: "2048"
    }// end of product
    }// end of window.universal_variable
    window.sdgGA = {
    environment: "production",
    device: "desktop",
    userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.70 Safari/537.36",
    currency: "GBP",
    page: {
    PID: "test : PRODUCT",
    loggedInState: "not logged in",
    category:"product",
    customerStatus: "new"
    },

    </script>

Any suggestions?



1 Answer


Rather than trying to do this with a regex, which would be extremely brittle against nested braces and inline comments, I would suggest using a transpiler such as this one. I tested it on your example code and it worked well.
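For context on why the pattern in the question returns only the first block: the lazy `(.*?)}` stops at the first `}`, which is the one that closes `page`. If pulling in a transpiler is not an option, one alternative is a small hand-written scanner that tracks brace depth. The following is only a minimal sketch, not a general JavaScript parser. The helper name `extractJsObject` is hypothetical, the inline sample is a trimmed copy of the page from the question, and the code assumes double-quoted strings, `//` comments, unquoted keys and no trailing commas. It walks the text from the opening brace, finds the matching closing brace, and rewrites the literal into JSON for `json_decode`.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: find "<name> = { ... }" in $html and decode the
 * object literal as an associative array. Returns null if it is not found
 * or cannot be decoded.
 */
function extractJsObject(string $html, string $name): ?array
{
    // Locate "<name> = {" and remember where the match starts.
    $pattern = '/' . preg_quote($name, '/') . '\s*=\s*\{/';
    if (!preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    $i = $m[0][1] + strlen($m[0][0]) - 1; // index of the opening "{"
    $len = strlen($html);
    $depth = 0;
    $json = '';

    while ($i < $len) {
        $ch = $html[$i];
        if ($ch === '"') {
            // Copy a double-quoted string verbatim, honouring backslash escapes.
            $start = $i++;
            while ($i < $len && $html[$i] !== '"') {
                $i += ($html[$i] === '\\') ? 2 : 1;
            }
            $json .= substr($html, $start, $i - $start + 1);
            $i++;
        } elseif ($ch === '/' && $i + 1 < $len && $html[$i + 1] === '/') {
            // Skip a // comment up to (not including) the end of the line.
            $eol = strpos($html, "\n", $i);
            $i = ($eol === false) ? $len : $eol;
        } elseif (preg_match('/\G[A-Za-z_$][\w$]*(?=\s*:)/', $html, $k, 0, $i)) {
            // A bare identifier followed by ":" is an unquoted key: quote it.
            $json .= '"' . $k[0] . '"';
            $i += strlen($k[0]);
        } else {
            $json .= $ch;
            $i++;
            if ($ch === '{') {
                $depth++;
            } elseif ($ch === '}' && --$depth === 0) {
                break; // this brace closes the outermost object
            }
        }
    }

    return json_decode($json, true);
}

// Trimmed copy of the sample page from the question.
$html = <<<'HTML'
<script type="text/javascript">
window.universal_variable = {
page: {
category: "product" ,
revision: "1.1"
},
user: {
registered_in_current_session: false,idv_verified: true,
targetedPromos :"",
cvc:""
}// end of user
,
product:{
id: "KEN6C",
product_id: "prod1086433641",
sku: "KEN6C",
unit_sale_price: "319.0"
}// end of product
}// end of window.universal_variable
window.sdgGA = {
environment: "production",
</script>
HTML;

$vars = extractJsObject($html, 'window.universal_variable');
echo $vars['product']['product_id'], "\n"; // prints prod1086433641
```

Everything under `product` is then available as `$vars['product']`. Single-quoted strings, trailing commas, block comments or computed values would need extra handling, which is where a real parser or transpiler becomes the safer choice.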

