Thanks Harry! It worked. However, if I run this in Debug mode (in the IDE), the program exits with
this exception. Is it significant, or can it be ignored in production environments?
Cheers,
Pratik
-----Original Message-----
From: Hyperscan <hyperscan-bounces(a)lists.01.org> On Behalf Of Chang, Harry
Sent: 22 March 2019 11:59
To: hyperscan(a)lists.01.org
Subject: Re: [Hyperscan] Hyperscan Digest, Vol 33, Issue 3
Hi Pratik,
These exceptions come from an internal literal-parsing function. Non-literal patterns
throw these exceptions, which are then caught internally so that the compiler takes the
right path. They are not problems and they are not Windows-specific: Hyperscan follows the
same process on Linux. These exceptions don't matter, and you can safely ignore them.
I think the matching problem comes from your patterns. If you change them to
char *pattern[] = {"\\b(?i)alert\\b",
"\\b(?i)anchor\\b",
"\\b(?i)anchors\\b",
"\\b(?i)blur\\b"};
then the matching should be correct.
Thanks,
Harry
-----Original Message-----
Date: Wed, 20 Mar 2019 14:27:21 +0000
From: Pratik Khade <pkhade(a)virsec.com>
To: Hyperscan regular expression matching library <hyperscan(a)lists.01.org>
Subject: Re: [Hyperscan] Hyperscan on windows - issues with hs_compile_multi
The simplegrep app that works does not include the following updates:
Cheers,
Pratik
From: Hyperscan <hyperscan-bounces(a)lists.01.org> On Behalf Of Pratik Khade
Sent: 20 March 2019 19:53
To: hyperscan(a)lists.01.org
Subject: [Hyperscan] Hyperscan on windows - issues with hs_compile_multi
Hi Folks,
I have compiled Hyperscan on Windows (2012 R2) with Boost 1.66. However, Hyperscan is unable
to match the input. Here is the updated simplegrep application, which takes 4 patterns
and an input file:
Input Patterns:
char *pattern[] = {"\b(?i)alert\b",
"\b(?i)anchor\b",
"\b(?i)anchors\b",
"\b(?i)blur\b"};
Input string file:
alert me now
The same pattern works with the original simplegrep example when it is invoked as:
$ ./simplegrep.exe "\b(?i)alert\b" input
Here are the updated simplegrep sources:
/*
* Copyright (c) 2015, Intel Corporation
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions are met:
*
* * Redistributions of source code must retain the above copyright notice,
* this list of conditions and the following disclaimer.
* * Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
* * Neither the name of Intel Corporation nor the names of its contributors
* may be used to endorse or promote products derived from this software
* without specific prior written permission.
*
* THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
* AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
* IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
* ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
* LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
* CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
* SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
* INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
* CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
* ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
* POSSIBILITY OF SUCH DAMAGE.
*/
/*
* Hyperscan example program 1: simplegrep
*
* This is a simple example of Hyperscan's most basic functionality: it will
* search a given input file for a pattern supplied as a command-line argument.
* It is intended to demonstrate correct usage of the hs_compile and hs_scan
* functions of Hyperscan.
*
* Patterns are scanned in 'DOTALL' mode, which is equivalent to PCRE's '/s'
* modifier. This behaviour can be changed by modifying the "flags" argument to
* hs_compile.
*
* Build instructions:
*
* gcc -o simplegrep simplegrep.c $(pkg-config --cflags --libs libhs)
*
* Usage:
*
* ./simplegrep <pattern> <input file>
*
* Example:
*
* ./simplegrep int simplegrep.c
*
*/
#include <errno.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <hs.h>
/**
* This is the function that will be called for each match that occurs. @a ctx
* is to allow you to have some application-specific state that you will get
* access to for each match. In this example we pass in the array of patterns
* and use the match id to look up and print the pattern that matched.
*/
static int eventHandler(unsigned int id, unsigned long long from,
unsigned long long to, unsigned int flags, void *ctx) {
char **patterns = (char **)ctx;
/* ids[] in main() are 1..4, so pattern id N is patterns[N - 1]. */
printf("Match for pattern \"%s\" (id %u) at offset %llu\n",
patterns[id - 1], id, to);
return 0;
}
/**
* Fill a data buffer from the given filename, returning it and filling @a
* length with its length. Returns NULL on failure.
*/
static char *readInputData(const char *inputFN, unsigned int *length) {
FILE *f = fopen(inputFN, "rb");
if (!f) {
fprintf(stderr, "ERROR: unable to open file \"%s\": %s\n",
inputFN,
strerror(errno));
return NULL;
}
/* We use fseek/ftell to get our data length, in order to keep this example
* code as portable as possible. */
if (fseek(f, 0, SEEK_END) != 0) {
fprintf(stderr, "ERROR: unable to seek file \"%s\": %s\n",
inputFN,
strerror(errno));
fclose(f);
return NULL;
}
long dataLen = ftell(f);
if (dataLen < 0) {
fprintf(stderr, "ERROR: ftell() failed: %s\n", strerror(errno));
fclose(f);
return NULL;
}
if (fseek(f, 0, SEEK_SET) != 0) {
fprintf(stderr, "ERROR: unable to seek file \"%s\": %s\n",
inputFN,
strerror(errno));
fclose(f);
return NULL;
}
/* Hyperscan's hs_scan function accepts length as an unsigned int, so we
* limit the size of our buffer appropriately. */
if ((unsigned long)dataLen > UINT_MAX) {
dataLen = UINT_MAX;
printf("WARNING: clipping data to %ld bytes\n", dataLen);
} else if (dataLen == 0) {
fprintf(stderr, "ERROR: input file \"%s\" is empty\n",
inputFN);
fclose(f);
return NULL;
}
char *inputData = malloc(dataLen);
if (!inputData) {
fprintf(stderr, "ERROR: unable to malloc %ld bytes\n", dataLen);
fclose(f);
return NULL;
}
char *p = inputData;
size_t bytesLeft = dataLen;
while (bytesLeft) {
size_t bytesRead = fread(p, 1, bytesLeft, f);
bytesLeft -= bytesRead;
p += bytesRead;
if (ferror(f) != 0) {
fprintf(stderr, "ERROR: fread() failed\n");
free(inputData);
fclose(f);
return NULL;
}
}
fclose(f);
*length = (unsigned int)dataLen;
return inputData;
}
int main(int argc, char *argv[]) {
if (argc != 3) {
fprintf(stderr, "Usage: %s <pattern> <input file>\n",
argv[0]);
return -1;
}
/* hs_compile_multi() expects arrays of unsigned int for flags and ids. */
unsigned int flags[] = {HS_FLAG_DOTALL, HS_FLAG_DOTALL, HS_FLAG_DOTALL,
HS_FLAG_DOTALL};
unsigned int ids[] = {1, 2, 3, 4};
char *pattern[] = {"\b(?i)alert\b",
"\b(?i)anchor\b",
"\b(?i)anchors\b",
"\b(?i)blur\b"};
char *inputFN = argv[2];
/* First, we attempt to compile the hard-coded patterns above.
* We assume 'DOTALL' semantics, meaning that the '.' meta-character will
* match newline characters. The compiler will analyse the given patterns and
* either return a compiled Hyperscan database, or an error message
* explaining why a pattern didn't compile.
*/
hs_database_t *database;
hs_compile_error_t *compile_err;
if (hs_compile_multi((const char *const *)pattern,
flags,
ids,
4,
HS_MODE_BLOCK,
NULL,
&database,
&compile_err) != HS_SUCCESS) {
/* compile_err->expression is the zero-based index of the pattern that
* failed, or negative if the error is not specific to one pattern. */
fprintf(stderr, "ERROR: Unable to compile pattern %d: %s\n",
compile_err->expression, compile_err->message);
hs_free_compile_error(compile_err);
return -1;
}
/* Next, we read the input data file into a buffer. */
unsigned int length;
char *inputData = readInputData(inputFN, &length);
if (!inputData) {
hs_free_database(database);
return -1;
}
/* Finally, we issue a call to hs_scan, which will search the input buffer
* for the pattern represented in the bytecode. Note that in order to do
* this, scratch space needs to be allocated with the hs_alloc_scratch
* function. In typical usage, you would reuse this scratch space for many
* calls to hs_scan, but as we're only doing one, we'll be allocating it
* and deallocating it as soon as our matching is done.
*
* When matches occur, the specified callback function (eventHandler in
* this file) will be called. Note that although it is reminiscent of
* asynchronous APIs, Hyperscan operates synchronously: all matches will be
* found, and all callbacks issued, *before* hs_scan returns.
*
* In this example, we provide the array of patterns as the context pointer
* so that the callback is able to print out the pattern that matched on
* each match event.
*/
hs_scratch_t *scratch = NULL;
if (hs_alloc_scratch(database, &scratch) != HS_SUCCESS) {
fprintf(stderr, "ERROR: Unable to allocate scratch space. Exiting.\n");
free(inputData);
hs_free_database(database);
return -1;
}
printf("Scanning %u bytes with Hyperscan\n", length);
if (hs_scan(database, inputData, length, 0, scratch, eventHandler,
pattern) != HS_SUCCESS) {
fprintf(stderr, "ERROR: Unable to scan input buffer. Exiting.\n");
hs_free_scratch(scratch);
free(inputData);
hs_free_database(database);
return -1;
}
/* Scanning is complete, any matches have been handled, so now we just
* clean up and exit.
*/
hs_free_scratch(scratch);
free(inputData);
hs_free_database(database);
return 0;
}
I also see the following C++ exceptions when hs_compile_multi is invoked:
Exception thrown at 0x00007FFD1EB78E6C in simplegrep.exe: Microsoft C++ exception: ue2::ConstructLiteralVisitor::NotLiteral at memory location 0x000000402C3EE7F0.
Exception thrown at 0x00007FFD1EB78E6C in simplegrep.exe: Microsoft C++ exception: ue2::ConstructLiteralVisitor::NotLiteral at memory location 0x000000402C3EE7F0.
Exception thrown at 0x00007FFD1EB78E6C in simplegrep.exe: Microsoft C++ exception: ue2::ConstructLiteralVisitor::NotLiteral at memory location 0x000000402C3EE7F0.
Exception thrown at 0x00007FFD1EB78E6C in simplegrep.exe: Microsoft C++ exception: ue2::ConstructLiteralVisitor::NotLiteral at memory location 0x000000402C3EE7F0.
Any pointers on what could be going wrong here? Any help would be much appreciated!
Cheers,
Pratik