EQL Injection (not a typo) and Oracle Endeca

Oracle Endeca is used by a number of online retailers to implement search functionality. This post introduces the concept of EQL injection attacks and how to defend against them.

Introduction

Recently, MWR have been involved in testing e-commerce web applications, made up of a complex hierarchy of largely Oracle proprietary products. Broadly these systems are called the Oracle ATG Web Commerce platform. This includes an ATG application server, which is a customised JBoss instance that hosts application content and provides the functionality and authentication for the ATG Web Commerce platform. However, a separate component is often introduced for search functionality.

Oracle Endeca is the product catalogue of choice for ATG platforms, storing a large number (i.e. hundreds of thousands, or even millions) of products which users of an ATG-hosted application can search for. But prior to some of this recent work, Endeca was not a system that we had encountered. Similarly, this was not a system that seemed to have any notable security research conducted against it.

Testing of such systems revealed that, by default, they are not particularly hardened or secured. As a result, an attacker who is sufficiently familiar with Endeca queries can often achieve SQL injection-like results through the search functionality of ATG Web Commerce applications that use Oracle Endeca. This post therefore details the basics of Endeca and the steps necessary to achieve “EQL injection”, as well as what can be done to defend against this new attack vector.

What is Endeca?

Oracle Endeca is a hybrid search-analytical database. It is designed to allow easy, lightweight searching of large product catalogues. Endeca differs from standard SQL-based databases in its flat, columnar data model; Endeca does not employ tables. An Endeca data store (called an index) comprises records (or products), which contain multiple properties (e.g. price, name, etc.) and dimensions (i.e. categories it belongs to, e.g. film, book, etc.). To search an index, Endeca uses custom search filters and its own declarative query language, Endeca Query Language (EQL).

There are several high quality guides for EQL and Endeca’s various search parameters, including:

If faced with a penetration test of a web application that includes Oracle Endeca systems, it is worth identifying if the client has a JSP reference (JSPref) application. JSPref is a rudimentary GUI to access the Endeca index, which allows a user to execute Endeca queries and view the names of properties and dimensions within the index. JSPref applications are usually hosted on the internal host Endeca runs on, for example:

Using JSPref can be a good way to identify possible fields and syntax when attempting to achieve Endeca injection. In most cases it shows that property and dimension names are straightforward and predictable, much as table and column names are in SQL databases. For example, the results may include things like:

  • LISTING_ID: The record’s unique ID (effectively a primary key)
  • P_Price: A record’s price property
  • P_Best_Seller: A record’s best seller ranking property
  • D_Item_Category: A record’s categorisation dimension (e.g. video games, tablets)

Common column names are generally valid, with properties prefixed by P_ and dimensions prefixed by D_. To identify property and dimension names through enumeration, it is generally a good idea to take common column names and try them in lowercase, uppercase, camel case, sentence case and title case, both with and without underscores.
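
The enumeration approach described above can be sketched in a few lines of Java. This is only an illustration: the CandidateNames class and its variants method are hypothetical names introduced here, and the word list passed in is whatever common column names the tester chooses to try.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper: expands a list of words into candidate property/dimension names.
final class CandidateNames {
    // Capitalises the first letter of an already lowercased word.
    private static String cap(String w) {
        return w.isEmpty() ? w : Character.toUpperCase(w.charAt(0)) + w.substring(1);
    }

    static Set<String> variants(List<String> words) {
        Set<String> out = new LinkedHashSet<>();
        for (String sep : List.of("_", "")) {            // with and without underscores
            StringBuilder lower = new StringBuilder();
            StringBuilder title = new StringBuilder();
            StringBuilder camel = new StringBuilder();
            StringBuilder sentence = new StringBuilder();
            for (int i = 0; i < words.size(); i++) {
                String w = words.get(i).toLowerCase(Locale.ROOT);
                if (i > 0) {
                    lower.append(sep);
                    title.append(sep);
                    camel.append(sep);
                    sentence.append(sep);
                }
                lower.append(w);                          // display_name
                title.append(cap(w));                     // Display_Name
                camel.append(i == 0 ? w : cap(w));        // displayName
                sentence.append(i == 0 ? cap(w) : w);     // Display_name
            }
            List<String> bases = List.of(
                    lower.toString(),
                    lower.toString().toUpperCase(Locale.ROOT),
                    title.toString(),
                    camel.toString(),
                    sentence.toString());
            for (String base : bases) {
                // Try the bare name plus the P_ (property) and D_ (dimension) prefixes.
                for (String prefix : List.of("", "P_", "D_")) {
                    out.add(prefix + base);
                }
            }
        }
        return out;
    }
}
```

For example, CandidateNames.variants(List.of("display", "name")) includes P_Display_Name, DISPLAY_NAME and displayName among its candidates.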

Endeca Injection

As in SQL injection, Endeca (or EQL) injection is the process of attempting to break out of the context of a search query to execute arbitrary attacker-defined queries. Here, the goal would be to extract sensitive data from the Endeca index.

In certain very obvious cases (as discussed further in the “Where else can we find it?” section below), applications may directly use Endeca-specific parameters to populate HTTP GET requests. This makes it trivial to know how and where to inject Endeca queries.

In other instances however, search terms (i.e. as entered into the search bar of an application) are translated into Endeca-specific parameters which are then used on the back-end. For example, consider the following generic URL:

https://www.someoracle-ecommerceapp.com/searchresults.page?sortOrder=1&searchquery=ipad&filterBy=Price|%25A310+to+%25A320

These parameters would then be translated by the application server into the relevant Endeca parameters. For the above example, these parameters include Ntt, Ntk, Nf and Ns, which are described in detail in the Endeca reference material given above. A brief summary of key Endeca parameters is given below:

  • Ntt: Record Search Terms. The search term to query Endeca for, e.g. “iPad”.
  • Ntk: Record Search Key. The property or dimension Endeca will try to find the search term specified by Ntt in, e.g. product name (P_Display_Name).
  • Nf: Range Filter. Filters the results returned by the Endeca query or search by one or more range operations, e.g. price between £5 and £10.
  • Ns: Sort Key. One or more properties or dimensions to order the results returned by the Endeca query or search by.
  • Nrs: Endeca Query Language (EQL) Filter. Uses a fully formed EQL query to search Endeca.

While the legitimate functionality of these applications is to translate the contents of searchquery, sortOrder and other parameters to populate the Endeca parameters, we have found that these applications are often configured to effectively append all HTTP request parameters onto the Endeca query string. While this blind appending of values is generally ill-advised, it is particularly dangerous in this case, as it means an attacker could simply add Endeca-specific parameters into the URL to trivially execute Endeca commands. A simple Proof-of-Concept (PoC) is included below:

https://www.someoracle-ecommerceapp.com/searchresults.page?Nrs=collection()/record[LISTING_ID%3D"T123456"]

Note that the only special consideration beyond constructing an EQL query here is the need to encode the equals character (=) as %3D. This distinguishes it from the equals character in the standard parameter=value structure of HTTP request parameters, so that it is treated as part of the broader Nrs Endeca query.
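
As a rough illustration of the encoding step, the sketch below uses Java's standard URLEncoder to percent-encode an EQL filter before placing it in the Nrs parameter. The NrsEncoding class and encodeParam method are hypothetical names. Note that URLEncoder encodes more than the PoC above strictly requires (brackets, quotes, slashes and parentheses as well as the equals sign); whether a given application accepts the fully encoded form is an assumption to verify during testing.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper showing how an EQL filter can be encoded as a parameter value.
final class NrsEncoding {
    // Returns "name=value" with the value percent-encoded, so any '=' inside
    // the EQL query cannot be mistaken for the parameter=value separator.
    static String encodeParam(String name, String value) {
        return name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String eql = "collection()/record[LISTING_ID=\"T123456\"]";
        // prints Nrs=collection%28%29%2Frecord%5BLISTING_ID%3D%22T123456%22%5D
        System.out.println(encodeParam("Nrs", eql));
    }
}
```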

The above request would execute an EQL search for all products in the Endeca index with a LISTING_ID of T123456. Viewing the page source would confirm that the resulting product matches the specified LISTING_ID – generally found in a hidden form value. This can also be confirmed by using the JSPref application mentioned above and searching by this LISTING_ID.

While searching for products is obviously intended behaviour, it is almost never possible to search by LISTING_ID (the equivalent of a database primary key) using the search functionality directly. This PoC – assuming it is given a valid ID value and that LISTING_ID is the correct field name – could be used to indicate whether EQL injection is possible. This is similar to attempts to inject sleep() commands to cause a demonstrative delay in SQL injection PoCs.

Similarly benign PoCs could be achieved using more advanced EQL payloads, including reference to dimensions, such as:

Nrs=collection()/record[%20D_Item_Category%20%3D%20collection("dimensions")/dval[name%3D"D_Item_Category"]/dval[name%3D"iPad"]//id]

This would return all products that fall under the category “iPad”, specified in the D_Item_Category dimension. Again, the end-result is intended behaviour, but searching for specific Endeca properties and dimensions directly is not.

The challenge then becomes identifying the most damaging information an attacker could retrieve from the Endeca index. The risk of EQL injection is limited by the fact that customer data is generally not stored in Endeca indexes, which are used instead to store large product catalogues. However, sensitive data can still be retrieved. For instance, we have observed Endeca indexes that include categorisations for products being marked for emergency withdrawal – e.g. food products with discovered contamination, defective electronics, etc. The following EQL injection payload would return a list of all products which had been marked for immediate emergency product withdrawal:

https://www.someoracle-ecommerceapp.com/searchresults.page?Nrs=collection()/record[endeca:matches(.,"P_Marked_For_Emergency_Withdrawal","Y")]

An attacker could use this to identify products designated for withdrawal which had not yet been actioned by the company, thus having the potential to cause reputational damage.

Similarly, an attacker could attempt to limit the company’s sales, by identifying the app’s best seller products which also had low stock quantities. Such a list of low-stock, high-interest products could be obtained by an attacker by using the following EQL injection payload:

https://www.someoracle-ecommerceapp.com/searchresults.page?Ns=P_Stock_Availability|0||P_Best_Seller|0

The above query would return a list of items which were near the top of the best seller list, but had few items left in stock. The Ns parameter sorts the returned results (in this case, all results) by a given key. Additional configuration for each key is specified after a pipe character (|), as demonstrated in the example above, where 0 instructs Endeca to sort results by P_Stock_Availability in ascending order, so items with the smallest quantity are displayed at the top of the list. Here, we are specifying two keys, separated by the double pipe (||) in the middle of the query. The second key further sorts the results by P_Best_Seller in ascending order.
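
To make the structure of the Ns value concrete, the following minimal sketch splits it into its key and direction pairs. The SortKeys class is a hypothetical name, and the sketch assumes that 0 denotes ascending order (as in the example above) and that any other value denotes descending order.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: parses an Ns value such as "P_Stock_Availability|0||P_Best_Seller|0".
final class SortKeys {
    // Returns sort keys in order, mapped to true for ascending and false otherwise.
    static Map<String, Boolean> parse(String ns) {
        Map<String, Boolean> keys = new LinkedHashMap<>();
        for (String part : ns.split("\\|\\|")) {       // "||" separates sort keys
            String[] bits = part.split("\\|");         // "|" separates key and direction
            boolean ascending = bits.length < 2 || bits[1].equals("0");
            keys.put(bits[0], ascending);
        }
        return keys;
    }

    public static void main(String[] args) {
        parse("P_Stock_Availability|0||P_Best_Seller|0")
                .forEach((key, asc) -> System.out.println(key + " ascending=" + asc));
    }
}
```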

As above, the results here can be verified by using the JSPref application: we can search the name of the top returned item and verify how many items are in stock and its best seller ranking. Assuming this matches the information we can infer from the above EQL injection payload, this information could prove useful to a competitor. An attacker could create multiple accounts for the website and add the maximum number of those items to their checkout basket, which will generally hold the product for that user for a set amount of time, effectively marking that popular product as sold out and preventing legitimate customers from buying it from the company. This could therefore be used by a competitor to divert business, by forcing customers to buy the product elsewhere.

Denial of Service

While Endeca is designed to perform efficient and lightweight searches over large product catalogues, if an attacker is able to trigger a full lookup of the entire index, it will likely take a long time to execute. Performing multiple parallel full lookups at the same time is therefore capable of causing a Denial of Service.

For example, the back-end of the Endeca system may be creating a search query in a manner similar to the following:

StringBuilder query = new StringBuilder();
query.append(someParameters);
query.append("&");
query.append("Ntt");
query.append("=");
query.append(searchquery);
query.append("&");
query.append("Ntk");
query.append("=");
query.append("All");

If an attacker then performed a blank search, this would be translated into a search of all keys (Ntk) with a blank term (Ntt). This would therefore cause a full search of all possible properties, which would match all records. This introduces significant computational complexity for the application, which puts strain on Endeca.

To defend against this, companies may decide to use URL-based Content Delivery Network (CDN) caching mechanisms, or rate-limiting Web Application Firewalls (WAFs). However, using systems like AWS to deploy multiple separate instances, an attacker could invoke a high number of requests with randomly-generated Universally Unique Identifiers (UUIDs) (“cachebusters”) on the end of a blank search query from various hosts. This would bypass the CDN caching mechanisms and trigger a full lookup, and each host would likely stay below the per-IP request threshold enforced by many WAFs. Endeca injection therefore becomes a viable means for performing a Denial of Service (DoS) against load balanced and DDoS-protected applications.

Where else can we find it?

It turns out Endeca is quite common amongst similar large-scale retailers, for product catalogues. Some quick Googling allowed us to identify that it’s in use by Office Depot, Hasbro, National Geographic and others – including, unsurprisingly, Oracle. Several of these actually include the native Endeca parameters listed above in their URLs, rather than using application-specific parameters and translating these into Endeca on the server-side.

The following Google searches can be used to identify common sites using Oracle Endeca search features:

  • inurl:ntt inurl:nty
  • inurl:ntk inurl:p_price

Mitigations

So how can companies with vulnerable Endeca search functionality harden it? Broadly, the following general injection recommendations are applicable:

  • Do not blindly include user input in queries
  • Whitelist expected characters
  • Blacklist known malicious/dangerous characters
  • Use prepared statements/existing frameworks

More specifically however, there are several Endeca-specific recommendations, which are outlined below.

Input validation

The most trivial cases of EQL injection described above occur when HTTP parameters are being blindly appended to the Endeca search query string. In such cases, an attacker can simply provide a fully-formatted EQL query and it will be executed. It is recommended instead that applications populate the relevant Endeca parameters based on the values of the application-specific search parameters (e.g. search term, price range, etc.). This should be done on the server-side and should not be visible to users.

If for whatever reason the above is not directly feasible and certain HTTP parameters provided in the URL do need to be blindly forwarded to the Endeca query string, it is recommended to whitelist specifically the allowed parameters. Notably, this list should not include any Endeca-specific parameters, such as Nrs or Ntt.
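
A minimal sketch of such a whitelist is shown below. The ParamAllowList class is a hypothetical name, and the allowed parameter names are simply those from the generic example URL earlier in this post; a real application would substitute its own.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical filter: only explicitly allowed application parameters are forwarded.
final class ParamAllowList {
    // Assumed application-level names; deliberately excludes Nrs, Ntt, Ntk, Nf and Ns.
    static final Set<String> ALLOWED = Set.of("searchquery", "sortOrder", "filterBy");

    static Map<String, String> filter(Map<String, String> requestParams) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : requestParams.entrySet()) {
            if (ALLOWED.contains(entry.getKey())) {
                kept.put(entry.getKey(), entry.getValue());
            }
            // Anything else, including injected Endeca parameters, is dropped.
        }
        return kept;
    }
}
```

With this in place, a request carrying both searchquery=ipad and an injected Nrs value would result in only searchquery being passed on.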

The above mitigations will help protect against direct EQL injection. However, if an attacker attempts to break out of those mitigations, more advanced protections are necessary. In such cases, it is recommended to have a whitelist of expected characters for Endeca queries. This can be configured within the server-side code (e.g. the JSP files or equivalent), and with the help of Endeca’s own search_chars.xml file.

search_chars.xml is described in further detail in http://ravihonakamble.blogspot.co.uk/2016/05/how-to-handle-special-characters-in.html. Essentially, Endeca by default should only accept alphanumeric characters, with specific exceptions (e.g. if a single quote or equals sign is necessary, they should be listed here). Thorough auditing of the search_chars.xml file can be used to help protect against direct EQL injection and more advanced breakout attempts.

Denial of Service protection

While it is never possible to completely remove the risk of Denial of Service (DoS) attacks, the practical risk posed by Endeca search functionality can be reduced. Specifically, sensibly populating the values of Endeca parameters with appropriate non-empty search parameters will reduce this risk.

Ntt should be set to the specific textual search term (e.g. iPad); input validation routines should ensure this value is not empty and does not contain a wildcard character. However, it is strongly recommended input validation is based primarily on a whitelist approach, as described above, and not just blacklisting known malicious characters, such as *.

Ntk should be set to a small subset of keys this search term applies to. This might be static (e.g. just P_Display_Name), or context-dependent (e.g. price, item category, etc.) and based on parameters from the application. Mandating a non-blank and non-broad search term and only allowing this to apply to a specific set of keys will drastically reduce the risk of DoS – at the least, this will shift the DoS risk away from Endeca-specific concerns and into traditional load balancing ones.
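
The Ntt and Ntk recommendations above could be enforced with checks along the following lines. This is a sketch under stated assumptions: the SearchInputValidator class is a hypothetical name, the character whitelist (letters, digits, spaces and hyphens, up to 64 characters) is an arbitrary example, and the single allowed key is taken from the P_Display_Name example in this post.

```java
import java.util.Set;
import java.util.regex.Pattern;

// Hypothetical validator for the values that will populate Ntt and Ntk.
final class SearchInputValidator {
    // Assumed whitelist: must start with a letter or digit, so blank terms and
    // wildcard characters such as '*' are rejected without needing a blacklist.
    private static final Pattern TERM = Pattern.compile("[A-Za-z0-9][A-Za-z0-9 -]{0,63}");

    // Assumed set of keys the application is willing to search on.
    static final Set<String> ALLOWED_KEYS = Set.of("P_Display_Name");

    static boolean isValidTerm(String ntt) {
        return ntt != null && TERM.matcher(ntt.trim()).matches();
    }

    static boolean isValidKey(String ntk) {
        return ntk != null && ALLOWED_KEYS.contains(ntk);
    }
}
```

Here a blank search term, a lone wildcard, or a key of All would each be rejected before any Endeca query is built.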

Prepared statements

Prepared (or parameterised) statements are features which can help defeat traditional SQL injection vulnerabilities. This is done by the application pre-creating a statement or query to be executed by the Database Management System (DBMS), with placeholders left to be filled in by user-defined parameters. This statement is effectively pre-compiled before any potentially malicious user input can be considered.

The same mindset described above applies to EQL injection. Rather than creating a query string as demonstrated in the code snippet in the “Denial of Service” section, it is recommended to set all relevant parameters using the setParam method of the UrlState Endeca class. A UrlState instance is then passed into the buildQuery method of QueryBuilder, which marshals this into an “ENEQuery”, which is then executed.
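
The difference from string concatenation can be illustrated without the Endeca libraries. The SearchParams class below is a hypothetical stand-in written for this post, not the UrlState or QueryBuilder API, whose exact signatures are not reproduced here. It only demonstrates the principle: each value is bound to a single named parameter and encoded, so user input cannot introduce additional parameters.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical stand-in (NOT the Endeca UrlState class): binds values to named parameters.
final class SearchParams {
    private final Map<String, String> params = new LinkedHashMap<>();

    SearchParams setParam(String name, String value) {
        params.put(name, value);
        return this;
    }

    // Serialises the parameters, encoding every name and value individually.
    String toQueryString() {
        StringJoiner joined = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            joined.add(URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8)
                    + "="
                    + URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8));
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        String userInput = "ipad&Nrs=x"; // attempt to smuggle in an extra parameter
        // prints Ntt=ipad%26Nrs%3Dx&Ntk=P_Display_Name
        System.out.println(new SearchParams()
                .setParam("Ntt", userInput)
                .setParam("Ntk", "P_Display_Name")
                .toQueryString());
    }
}
```

In the output, the injected ampersand and equals sign remain part of the Ntt value rather than starting a new Nrs parameter.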

While obviously this approach is not foolproof, explicit setting of input parameters in a prepared statement manner, when combined with input validation, will drastically reduce the risk of EQL injection. The resources described above can be found here:

Future work

We are in the process of creating a Python script which automates several Endeca-specific checks, to quickly verify if a site is using Endeca and if it is vulnerable to EQL injection. Broadly this does the following:

  • Takes an example search URL
  • Checks if Endeca is in use
    1. Check for Endeca parameters in the URL
    2. If none are present, try inserting Endeca parameters (e.g. Ntt=test&Ntk=All) to see if a valid response is retrieved
    3. Look for “Endeca” and other related key terms in the HTML source
  • If Endeca is in use, try to identify valid property and dimension names (i.e. Ntt=&Ntk=P_<common column name>)
  • If valid properties/dimensions are identified, can we get any useful information back?

This has not been completed yet, but will be available in the future.
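
In the meantime, the first check in the list above (looking for Endeca parameters in a URL) is simple enough to sketch. The following is an illustration only and is not the script referred to above; the EndecaParamCheck class is a hypothetical name, and the parameter list is limited to the names summarised earlier in this post. The query string is split by hand because example URLs in this post contain characters such as | and [ that stricter URL parsers may reject.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical check: reports which known Endeca parameter names appear in a URL.
final class EndecaParamCheck {
    static final Set<String> KNOWN = Set.of("Ntt", "Ntk", "Nf", "Ns", "Nrs");

    static Set<String> endecaParamsIn(String url) {
        Set<String> found = new TreeSet<>();
        int queryStart = url.indexOf('?');
        if (queryStart < 0) {
            return found;                      // no query string at all
        }
        String query = url.substring(queryStart + 1);
        int fragment = query.indexOf('#');
        if (fragment >= 0) {
            query = query.substring(0, fragment);
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            if (KNOWN.contains(name)) {        // exact, case-sensitive match
                found.add(name);
            }
        }
        return found;
    }
}
```

A URL that uses application-specific parameters such as searchquery yields an empty set, in which case the later checks in the list would still be needed.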

There are also various areas to do with Endeca and broader Oracle products that could benefit from further security scrutiny. For Endeca, one of the main disappointments from a testing perspective has been the inability to write to the index. If an attacker was able to modify the data stored by Endeca, the impact of EQL injection would obviously be vastly increased.

However, it is not possible to easily update data using EQL – i.e. there is no equivalent to an SQL UPDATE statement. Instead, Endeca uses “partials”, which merge a batch file of updates with the master index and are triggered with a request to a URL similar to the following:

Similarly, there are other locally-invoked admin operations, such as:

  • admin?op=exit
  • admin?op=restart

All signs from testing so far have indicated that partials are only possible locally. Certainly there does not seem to be an easy way to either point Endeca at a non-local batch file, or to invoke the command URL remotely. However, this could benefit from further exploration and attempts to bypass these restrictions, due to the severe security implications of a bypass.

MWR have now performed testing of both Oracle ATG and Endeca components, with both proving to contain specific insecurities which are non-standard and high impact. The Web Commerce platform also consists of other components which may have similar concerns, namely:

  • Business Control Centre (BCC): A CMS-style web application editor for ATG applications
  • Commerce Service Centre (CSC): CSC is an application used by customer service operators to action similar sales to those performed by users of the main applications

While ATG and Endeca are largely used in applications that are customer-facing portals, BCC and CSC are firmly back-end, providing explicit administrator roles, file upload functionality, direct SQL database access (with customer data in), among other features. So, more of what would generally look interesting on a web application test.

Similarly, BCC includes functionality to enable Endeca Workbench access, which allows administrative access to an Endeca instance. Endeca Workbench itself is interesting, with further detail on this provided here: https://docs.oracle.com/cd/E66320_01/common.11-2/EndecaAdmin/html/tcag_integrating_wb_with_bcc.xmltask_E1FE342CAE1841139361B05E9E3DA684.html.

Conclusions

Oracle Endeca is a powerful search index often used for storing product catalogues and sensitive business-specific information, which is used by a high number of high-profile companies. Endeca particularly seems to be in common use for search features. However, the lack of security attention paid to it previously means it is not often afforded the same level of input validation and security controls as standard SQL databases.

The use of Endeca is also often not noticed by penetration testers who would otherwise attempt standard SQL injection payloads. This leads to routinely tested systems having EQL injection vulnerabilities that are often trivial to exploit.

This post has introduced EQL injection as a new but simple attack vector for targeting applications that use unsecured Endeca instances for their search functionality. This should provide security testers with a basis for identifying EQL injection vulnerabilities and developers with the information necessary to harden their applications against such attacks.