Using uri_spider to parse file metadata

The uri_spider task, when given a Uri entity such as http://www.acme.com, will spider a site to a specified depth (max_depth), a specified maximum number of pages (limit), and, if configured, a specified URL pattern (spider_whitelist). When configured – and by default – it will extract DnsRecord, PhoneNumber, and EmailAddress entities from the content of each page. All spidered Uris can be created as entities using the extract_uris option.

Further, the spider will identify any files of the types listed below, and parse their content and metadata for the same types. Because this file parsing uses the excellent Apache Tika under the hood, the number and type of supported file formats is huge – over 300 file formats are supported including common formats like doc, docx and pdf – as well as more exotic types like application/ogg and many video formats. To enable this, simply enable the parse_file_metadata option.

Below, see a screenshot of the task’s configuration:

uri_spider task configuration

Note that you can also take advantage of Intrigue Core’s file parsing capabilities on a Uri-by-Uri basis by pointing the uri_extract_metadata task at a specific Uri with a file you’d like parsed, such as https://acme.com/file.pdf.


Docker One-Liner

Assuming you have Docker installed, this will pull the latest standalone image from DockerHub, and start it listening on :7777.

IMAGE="intrigueio/intrigue-core:latest"
docker pull $IMAGE && \
docker run -e LANG=C.UTF-8 -v ~/intrigue-core-data:/data -p 0.0.0.0:7777:7777 -i -t $IMAGE 

UPDATE 20200130: We’ve now added the ability to preserve your instance data between container runs, thanks to Docker’s volume support. If you’d like to change where the data is stored, adjust the -v option in the docker run command above, replacing the “~/intrigue-core-data” folder with wherever you’d like to store the data on your host system.
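For example, to keep the data under a different directory, you might run something like the following – the /opt/intrigue-data path here is purely illustrative; the rest mirrors the one-liner above:

```shell
IMAGE="intrigueio/intrigue-core:latest"
# Same run command as above, but storing instance data
# under /opt/intrigue-data on the host (example path only)
docker run -e LANG=C.UTF-8 \
  -v /opt/intrigue-data:/data \
  -p 0.0.0.0:7777:7777 -i -t $IMAGE
```

Anything written to /data inside the container will then survive container restarts.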

IMPORTANT!!! If you want to save your data between container runs, you have to map a volume for /data, as specified above, or use the EC2 image.

Once downloaded, you’ll see the startup happen automatically, and a password will be generated for you. You’ll be able to access the interface over SSL on :7777.

Now that you have an instance running, check out the Up and Running with Intrigue Core guide.

Using Intrigue Ident for Application Fingerprinting

Chatting with folks at RSA and BsidesSF, I realized it’d be helpful to share more information about the new application fingerprinter behind Intrigue Core, Ident.

Ident is a new standalone project with a clear focus: be the most complete, flexible, and extensible software for fingerprinting application-layer technologies and vulnerabilities. Given that it’s launching with over 300 checks reflecting current and widespread technology, and that it’s simple to craft a new check (see details below), it’s well on the way toward fulfilling this mission.

You might wonder… why not just integrate with nmap, recog, wappalyzer, or others? A couple of key reasons: 1) a focus on freedom (the code is BSD licensed) and 2) a razor-sharp focus on app-layer technology. To give you a more detailed view, here’s what I highlighted at BsidesSF this year – these are the key qualities I was looking for:

Compare this with Recog – another excellent fingerprinting library – which is more static in its format (not a good fit for Ident, though also a strength) and focused on infrastructure. Recog’s focus on infrastructure actually makes it a great complement to Ident, and thus it’s been integrated into Intrigue Core as well.

And while there are many tools and libraries out there, each had a licensing or technology limitation against these criteria, making them incompatible with the focus of the project.

In addition, spinning up a standalone fingerprinter has a number of benefits:

  • It makes the project easier to use and to contribute to – checks are pretty simple to create and test.
  • It opens up new use cases … If you have a set of known applications, but want to know if they’re running a given version, or if they’re configured properly, Ident’s CLI can be an excellent fit. You can just run it against a list of urls (see below).
  • Automation of Ident can be a lot easier than automating against the whole Intrigue Core platform. Feel free to drop the library into your project, and reach out if we can help you do so!
  • By building from the ground up, we can integrate CPE support, ensuring vulnerability inference against the CVE database “just works” and we don’t need to do anything special to determine vulnerabilities for a given version.

To give you a quick run through and some examples of what it can do, here’s an example of the CLI running against a single URL:
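Using the Dockerhub image described later in this post, a single-URL run looks something like this (the target URL is just a placeholder):

```shell
# Fingerprint a single URL with the Ident CLI, via the Dockerhub image
docker run -t intrigueio/intrigue-ident --url https://www.example.com
```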

And here’s one against a file of URLs (one per line), which automatically saves results into a CSV file:
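A sketch of the file-based mode is below. Note that the --url-file flag name is an assumption for illustration – check the project README for the actual option – and the volume mount simply makes the host’s list visible inside the container:

```shell
# urls.txt on the host contains one URL per line.
# NOTE: --url-file is a hypothetical flag name; consult the README.
docker run -t -v $(pwd)/urls.txt:/data/urls.txt \
  intrigueio/intrigue-ident --url-file /data/urls.txt
```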

A cool thing about the CLI tool is how it handles “content checks” – a special type of check that always runs and prints output – versus “fingerprint checks,” which also always run but only show up in output if they match. The CLI generates an output.csv file that makes each content check a column, and it is smart enough to pick up newly added checks. Simply drop a new check into the “checks/content” folder if you want its output in the CSV.

Here’s an example content check; this one checks for directory indexing in the content of the tested page:

Fingerprint checks are also pretty simple to write; this one matches Axis webcams, and as you can see, it checks the body contents for a unique string. You can regex against the contents of the body, headers, cookies, title, and generator.

Ident is also tightly integrated with Google’s “Chrome Headless”, so if you add the ‘-b’ flag, you’ll notice that some additional checks are run (and it may run a little more slowly … ~10s per url on a recent machine), but this is because it’s parsing and fingerprinting against the full DOM. Very handy!
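Putting the pieces above together, a browser-enabled run via Docker might look like the following – the -b flag is taken from the description above, the URL is a placeholder, and the rest mirrors the Dockerhub one-liner later in this post:

```shell
# Enable the headless-Chrome (DOM) checks with -b.
# Expect roughly ~10s per URL on a recent machine.
docker run -t intrigueio/intrigue-ident -b --url https://www.example.com
```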

To keep the library speedy and minimize the number of queries made while a given application is fingerprinted, each Ident check takes a “paths” parameter that is used as a template for the requested pages, and this is pre-processed at runtime to ensure only ONE request is made for each unique path. This keeps the fingerprinting FAST, and we will endeavor to minimize the number of unique paths going forward. Fortunately, the standard “#{url}” path is often still VERY verbose about the running software.

If you’d like to get started using it right away, you can pull the latest Dockerhub image and start testing using the following command:

docker pull intrigueio/intrigue-ident && docker run -t intrigueio/intrigue-ident --url [YOUR URL HERE]

That’s it for now. Reach out if you have questions! You can check out the full set of checks here: https://github.com/intrigueio/intrigue-ident/tree/master/checks. If you’re interested in helping out, or have ideas on how to improve the project, pop into the Slack channel and say hello, or reach out on Twitter: @intrigueio.