bash - awk multiple passes on each line -

- June 15, 2015

I have to process some large files and run multiple tests on each line. I am currently using awk to run each test individually, with a "while read line; do" loop that passes each line to more than a dozen awk commands to validate its content and test for errors. Lines that pass all the tests are appended to a .VALID file.

The problem is that the process is painfully slow. After reading many other posts on Stack Overflow, what I gather is that the main culprit is the "while read line" loop, which does not lend itself to large files (about 100K lines each).

I was hoping someone could help me understand a better way to implement this so that I can get awk-like performance. Here is a simplified version of my code:

  while read line || [[ -n "$line" ]]; do
      echo "$line" | awk -F\; '{if (($3 != "P") && ($3 != "0")) {print $0 ";ERROR;" >> "INPUT_FILE.ERRORS"} else print $0 >> "INPUT_FILE.OK"}'
      echo "$line" | awk -F\; '{if (($7 < 10) || ($7 > 3)) {print $0 ";ERROR;" >> "INPUT_FILE.ERRORS"} else print $0 >> "INPUT_FILE.OK"}'
      echo "$line" | awk -F\; '{if (($36 < 0) || ($36 > 1000)) {print $0 ";ERROR;" >> "INPUT_FILE.ERRORS"} else print $0 >> "INPUT_FILE.OK"}'
  done < INPUT_FILE.txt

Ideally I'm trying to come up with a solution that gives me multiple passes per line using an awk-based loop.

Thanks in advance.

Don't pass the lines to awk one at a time; awk processes a file line by line for you. The code in your question can be reduced to:

  awk -F\; '($3 != "P" && $3 != "0") || ($7 < 10 || $7 > 3) || ($36 < 0 || $36 > 1000) {
      print $0 ";ERROR;" >> "INPUT_FILE.ERRORS"; next }
  { print >> "INPUT_FILE.OK" }' INPUT_FILE.txt

I suspect it will be a lot faster.

The structure of an awk program is condition {action}, so you rarely need an explicit if / else. Instead, you can put next at the end of the first block, which makes awk move on to the next line instead of running the second block.

The output will be slightly different, because lines that fail more than one test will not be duplicated in the error log. I assumed that was acceptable, because the output was the same for each of your checks.
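The condition {action} structure and the effect of next can be tried on a tiny sample. This is only a sketch: the three-field records, the sample.* file names and the err/ok variables are made up for illustration and are not part of the original files.

```shell
# Hypothetical sample: three ';'-separated fields, the third must be "P" or "0".
tmp=$(mktemp -d)
printf '%s\n' 'a;b;P' 'c;d;X' 'e;f;0' > "$tmp/sample.txt"

awk -F';' -v err="$tmp/sample.ERRORS" -v ok="$tmp/sample.OK" '
    # Condition with an action: runs only for lines that fail the check,
    # then next skips the remaining rules for that line.
    $3 != "P" && $3 != "0" { print $0 ";ERROR;" > err; next }
    # No condition: runs for every line that was not skipped by next.
    { print > ok }
' "$tmp/sample.txt"

cat "$tmp/sample.OK"      # a;b;P and e;f;0
cat "$tmp/sample.ERRORS"  # c;d;X;ERROR;
```

A single awk process reads the whole file here, instead of one process per line per test.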

For a further improvement in performance, you could consider arranging the tests in order of likelihood, because that makes it more likely that the condition will short-circuit before every test has been evaluated.

Note that > and >> have different meanings in awk than they do in the shell. In awk, > creates (or truncates) the file the first time it is written to and appends on subsequent writes, so you may want to use that instead. If the file does not already exist, then it does not really matter.
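A minimal sketch of the difference, using throwaway files in a temporary directory (the file names and the out variable are hypothetical):

```shell
tmp=$(mktemp -d)

# ">" inside awk: the file is truncated once, when awk first opens it,
# and every later print to the same name in that run appends.
printf '%s\n' 'stale line' > "$tmp/trunc.txt"
printf '%s\n' one two three | awk -v out="$tmp/trunc.txt" '{ print > out }'
cat "$tmp/trunc.txt"    # one, two, three (the stale line is gone)

# ">>" inside awk: existing content is kept and new lines are appended.
printf '%s\n' 'stale line' > "$tmp/append.txt"
printf '%s\n' one two | awk -v out="$tmp/append.txt" '{ print >> out }'
cat "$tmp/append.txt"   # stale line, one, two
```

With >>, results from an earlier run stay in the file, so rerunning a validation script would mix old and new output unless the files are removed first.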

As pointed out in the comments, there appears to be a logic error in $7 < 10 || $7 > 3, because it is always true. Perhaps you have > and < mixed up?

If you want to write a separate line of output for each error, then you can change the structure slightly to do something like this:
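A quick way to check this is to count how many values satisfy each form of the test. The sketch below runs both over the integers 0 to 12; the variable names are illustrative only.

```shell
result=$(awk 'BEGIN {
    for (v = 0; v <= 12; v++) {
        if (v < 10 || v > 3) always++    # test as written: every value matches
        if (v < 3 || v > 10) outside++   # operators swapped: only 0,1,2,11,12 match
    }
    print always, outside
}')
echo "$result"   # 13 5
```

Every number is either below 10 or above 3, so the test as written flags all 13 values, while the swapped form flags only the 5 that fall outside the 3 to 10 range.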

  awk -F\; '{ f = 0 }
  $3 != "P" && $3 != "0" { print $0 ";ERROR;" >> "INPUT_FILE.ERRORS"; f = 1 }
  $7 < 3 || $7 > 10 { print $0 ";ERROR;" >> "INPUT_FILE.ERRORS"; f = 1 }
  $36 < 0 || $36 > 1000 { print $0 ";ERROR;" >> "INPUT_FILE.ERRORS"; f = 1 }
  !f { print >> "INPUT_FILE.OK" }' INPUT_FILE.txt

Each test is run separately, and if any test matches then f is set to true. If f is still false after all the tests have been run on the line, the line is printed to the OK file. I also changed your second test so that it is not always true.
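To see the flag at work, here is a sketch on a made-up four-field sample, where field 4 stands in for the numeric range check (the field positions, file names and err/ok variables are assumptions for illustration):

```shell
tmp=$(mktemp -d)
# Hypothetical records: field 3 must be "P" or "0", field 4 must be between 3 and 10.
printf '%s\n' 'a;b;P;5' 'c;d;X;5' 'e;f;X;42' > "$tmp/sample.txt"

awk -F';' -v err="$tmp/sample.ERRORS" -v ok="$tmp/sample.OK" '
    { f = 0 }                                                   # reset the flag for each line
    $3 != "P" && $3 != "0" { print $0 ";ERROR;" > err; f = 1 }  # bad status field
    $4 < 3 || $4 > 10      { print $0 ";ERROR;" > err; f = 1 }  # value out of range
    !f { print > ok }                                           # no test matched
' "$tmp/sample.txt"

cat "$tmp/sample.OK"       # a;b;P;5
cat "$tmp/sample.ERRORS"   # c;d;X;5 once, e;f;X;42 twice
```

The last record fails both tests, so it is logged once per failed test, which mirrors what the original loop of separate awk commands did.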



















