regex - Creating structured hive table with unstructured GPS packets in csv format -
I have a CSV file as outlined below.
VTS, 51,0071,9739 9 65515, nm, gp, inf 01, v, 19, 7 9, 19, 291014, 0000.0000, n, 00000.0000, e, 07 AE VTS, 01.0097 9 7 9 9 65515, SP, GP, 18,07,2253, V, 0000.0000, N, 00000.0000, e, 0.0,0.0,291014,0000,00,4000,11,999,169, B205 VTS, 51, 0071.9739965515, NM, GP, INF0l, V, 18,072311,291014,0000.0000, N, 00000.0000, e, C24E VTS, 01, 0097.9739965515, NM, GP, 19.072311 , V, 0,000.0000, N, 00,000.0000, E, 0.0,0.0,291014,0000,00,4000,11,999,171, B358 VTS, 51,0071,9739965515, knots away, GP, INF01, V, 18,072319,291014,0000.0000, N, 00,000.0000, E, 012F VTS, 51,0071,9739965515, Mono miles away, GP, INF01, V, 19,072326,291014,0000.0000, N, 00,000.0000, E, B2E6 VTS, 0170097, 79, 965515, nm, gp, 18,072326, V, 0000.0000, N, 00000.0000, e, 0.0,0.0,291014,0000,00,4000,11,999,173, EAA 00 VTS, 5l,007l 9 739 9 65 515 nm, GP, INF 01, V, 18,072333,291014, 0000.0000, N, 00000.0000, e 98 9 6 VTS, 5l,007l 9 7 9 9 65 515 nm, GP, INF 01, V, 18,072340,291014,0000.0000, N, 00000.0000, e, 9 B 23
it is to be mapped with the field:
Pkt_header, Gprs_pkt_id, Pkt_length, Sim_no, Msg_id, Gprs_pkt, Gsm_sig_strength, Utc_time, Pkt_validation, latitude, Direction_n_s, longitude, direction_e_w, speed, track_angle, u Tc_date, fuel_adc_values, ignition, odometer_values, supply_int, battery_adc, pkt_id, check_sum
Valid packets are shown in the second field ie GPRS_PKT_ID value 01. The case I used is filtering the CSV data only for valid packets, for which I am using regex, but I am not able to get the full data. Any help would be greatly appreciated
The used hive query is shown below.
Create external table sky_track_testing1 (pkt_header STRING, gprs_pkt_id STRING, pkt_length STRING, sim_no STRING, msg_id STRING, gprs_pkt STRING, gsm_sig_strength STRING, utc_time STRING, pkt_validation STRING, Latitude STRING, STRING longitude STRING, direction_e_w STRING, speed STRING, track_angle STRING, utc_date STRING, fuel_adc_values STRING, ignition STRING, odometer_values STRING, supply_int STRING, battery_adc STRING, pkt_id STRING, check_sum STRING direction_n_s) row format SERDE 'org.apache.hadoop.hive.contrib .serde2.RegexSerDe with SERDEPROPERTIES ("input.regex" = "^ (VTS, 01). * $") Stored as textfile location '/ user / root / sky_track';
This is definitely a wrong query Please help me.
I recommend that you use it for:
A = Load '/ user / root / sky_track' (pkt_header, gprs_pkt_id, pkt_length, sim_no, msg_id, gprs_pkt, gsm_sig_strength, utc_time, pkt_validation, latitude, direction_n_, longitude, direction_a_va, speed, track_elegal, utc_date, fuel_adc_value, ignition, Odometer_value, supply_int, battery_adc, pkt_id, check_sum); B = Filter gprs_pkt_id == '01'; Store 'b' / user / root / sky_track_valid ';
Comments
Post a Comment