Configuring Kerberos for Hive in Talend Open Studio for Big Data

Colm O hEigeartaigh - Thu, 09/21/2017 - 13:12
Earlier this year, I showed how to use Talend Open Studio for Big Data to access data stored in HDFS, where HDFS had been configured to authenticate users using Kerberos. A similar blog post showed how to read data from an Apache Kafka topic using Kerberos. In this tutorial I will show how to create a job in Talend Open Studio for Big Data to read data from an Apache Hive table using Kerberos. As a prerequisite, please follow a recent tutorial on setting up Apache Hadoop and Apache Hive using Kerberos.

1) Download Talend Open Studio for Big Data and create a job

Download Talend Open Studio for Big Data (6.4.1 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and create a job called "HiveKerberosRead". In the search bar under "Palette" on the right-hand side, enter "hive" and hit enter. Drag "tHiveConnection" and "tHiveInput" to the middle of the screen. Do the same for "tLogRow".

"tHiveConnection" will be used to configure the connection to Hive. "tHiveInput" will be used to perform a query on the "words" table we have created in Hive (as per the earlier tutorial linked above), and finally "tLogRow" will just log the data so that we can be sure that it was read correctly. The next step is to join the components up. Right-click on "tHiveConnection", select "Trigger/On Subjob Ok" and drag the resulting line to "tHiveInput". Right-click on "tHiveInput", select "Row/Main" and drag the resulting line to "tLogRow".

2) Configure the components

Now let's configure the individual components. Double click on "tHiveConnection". Select the following configuration options:
  • Distribution: Hortonworks
  • Version: HDP V2.5.0
  • Host: localhost
  • Database: default
  • Select "Use Kerberos Authentication"
  • Hive Principal: hiveserver2/localhost@hadoop.apache.org
  • Namenode Principal: hdfs/localhost@hadoop.apache.org
  • Resource Manager Principal: mapred/localhost@hadoop.apache.org
  • Select "Use a keytab to authenticate"
  • Principal: alice
  • Keytab: Path to "alice.keytab" in the Kerby test project.
  • Unselect "Set Resource Manager"
  • Set Namenode URI: "hdfs://localhost:9000"
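Talend hides the JDBC connection string behind these fields, but it may help to see how the host, database and Hive principal combine into the kind of Kerberos-enabled HiveServer2 URL that is passed to beeline in the "Securing Apache Hive - part VI" post on this page. The Python sketch below is only an illustration, not Talend's internal logic; the helper name `hive_jdbc_url` is hypothetical and port 10000 is assumed to be the usual HiveServer2 default.

```python
def hive_jdbc_url(host: str, port: int, database: str, hive_principal: str) -> str:
    """Build a HiveServer2 JDBC URL that requests Kerberos authentication.

    The ";principal=" parameter names the HiveServer2 service principal; the
    client's own identity comes from its Kerberos credentials (keytab or ticket).
    """
    return f"jdbc:hive2://{host}:{port}/{database};principal={hive_principal}"


if __name__ == "__main__":
    # Host, database and Hive principal from the tHiveConnection settings above;
    # port 10000 is an assumption (the usual HiveServer2 default).
    print(hive_jdbc_url("localhost", 10000, "default",
                        "hiveserver2/localhost@hadoop.apache.org"))
```

The client principal ("alice") does not appear in the URL: it is supplied separately, here through the keytab configured in the component.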

Now click on "tHiveInput" and select the following configuration options:
  • Select "Use an existing Connection"
  • Choose the tHiveConnection name from the resulting "Component List".
  • Click on "Edit schema". Create a new column called "word" of type String, and a column called "count" of type int. 
  • Table name: words
  • Query: "select * from words where word == 'Dare'"

Now the only thing that remains is to point to the krb5.conf file that is generated by the Kerby project. Click on "Window/Preferences" at the top of the screen. Select "Talend" and "Run/Debug". Add a new JVM argument: "-Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf".

Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. You should see the rows returned by the query logged by "tLogRow" in the Run window in the Studio.

Categories: Colm O hEigeartaigh

Securing Apache Hive - part VI

Colm O hEigeartaigh - Wed, 09/20/2017 - 17:13
This is the sixth and final blog post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked at how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. The fifth post looked at an alternative authorization solution called Apache Sentry.

In this post we will switch our attention from authorization to authentication, and show how we can authenticate Apache Hive users via Kerberos.

1) Set up a KDC using Apache Kerby

A GitHub project that uses Apache Kerby to start up a KDC is available here:
  • bigdata-kerberos-deployment: This project contains some tests which can be used to test kerberos with various big data deployments, such as Apache Hadoop etc.
The KDC is a simple JUnit test that is available here. To run it, just comment out the "org.junit.Ignore" annotation on the test method. It uses Apache Kerby to define the following principals for both Apache Hadoop and Apache Hive:
  • hdfs/localhost@hadoop.apache.org
  • HTTP/localhost@hadoop.apache.org
  • mapred/localhost@hadoop.apache.org
  • hiveserver2/localhost@hadoop.apache.org
  • alice@hadoop.apache.org 
Keytabs are created in the "target" folder. Kerby is configured to use a random port to launch the KDC each time, and it will create a "krb5.conf" file containing the random port number in the target directory.
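Because the KDC port changes on every run, it can be handy to read it back from the generated file. A minimal Python sketch, assuming the generated krb5.conf declares the KDC as a `kdc = host:port` entry (check your own file, as the exact layout is an assumption here); the function names are hypothetical.

```python
import re
from pathlib import Path

# Matches lines such as "kdc = localhost:54321" (assumed layout of the generated file).
KDC_LINE = re.compile(r"^\s*kdc\s*=\s*([^:\s]+):(\d+)\s*$", re.MULTILINE)


def kdc_addresses(krb5_conf_text: str) -> list[tuple[str, int]]:
    """Return every (host, port) pair declared with a 'kdc = host:port' entry."""
    return [(host, int(port)) for host, port in KDC_LINE.findall(krb5_conf_text)]


def kdc_port(krb5_conf_path: str) -> int:
    """Read a krb5.conf file and return the port of the first KDC entry."""
    addresses = kdc_addresses(Path(krb5_conf_path).read_text())
    if not addresses:
        raise ValueError(f"no 'kdc = host:port' entry found in {krb5_conf_path}")
    return addresses[0][1]
```

For example, `kdc_port("/pathtokerby/target/krb5.conf")` would return the port the test KDC is currently listening on, provided the file follows the assumed layout.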

2) Configure Apache Hadoop to use Kerberos

The next step is to configure Apache Hadoop to use Kerberos. As a pre-requisite, follow the first tutorial on Apache Hive so that the Hadoop data and Hive table are set up before we apply Kerberos to the mix. Next, follow the steps in section (2) of an earlier tutorial on configuring Hadoop with Kerberos that I wrote. Some additional steps are also required when configuring Hadoop for use with Hive.

Edit 'etc/hadoop/core-site.xml' and add:
  • hadoop.proxyuser.hiveserver2.groups: *
  • hadoop.proxyuser.hiveserver2.hosts: localhost
The previous tutorial on securing HDFS with Kerberos did not specify any Kerberos configuration for MapReduce, as it was not required. For Apache Hive we need to configure MapReduce appropriately. We will simplify things by using a single principal for the Job Tracker, Task Tracker and Job History. Create a new file 'etc/hadoop/mapred-site.xml' with the following properties:
  • mapreduce.framework.name: classic
  • mapreduce.jobtracker.kerberos.principal: mapred/localhost@hadoop.apache.org
  • mapreduce.jobtracker.keytab.file: Path to Kerby mapred.keytab (see above).
  • mapreduce.tasktracker.kerberos.principal: mapred/localhost@hadoop.apache.org
  • mapreduce.tasktracker.keytab.file: Path to Kerby mapred.keytab (see above).
  • mapreduce.jobhistory.kerberos.principal:  mapred/localhost@hadoop.apache.org
  • mapreduce.jobhistory.keytab.file: Path to Kerby mapred.keytab (see above).
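Hadoop's *-site.xml files all share the same `<configuration>`/`<property>`/`<name>`/`<value>` layout, so the properties above can also be generated rather than typed by hand. A minimal Python sketch using only the standard library; the keytab path is a placeholder and the helper name `hadoop_config_xml` is hypothetical. The same helper could render the two core-site.xml proxyuser properties.

```python
import xml.etree.ElementTree as ET

# Placeholder path: point this at the mapred.keytab generated by Kerby.
KEYTAB = "/path/to/kerby/target/mapred.keytab"
PRINCIPAL = "mapred/localhost@hadoop.apache.org"

# The properties listed above, with one principal shared by all three daemons.
MAPRED_SITE = {
    "mapreduce.framework.name": "classic",
    "mapreduce.jobtracker.kerberos.principal": PRINCIPAL,
    "mapreduce.jobtracker.keytab.file": KEYTAB,
    "mapreduce.tasktracker.kerberos.principal": PRINCIPAL,
    "mapreduce.tasktracker.keytab.file": KEYTAB,
    "mapreduce.jobhistory.kerberos.principal": PRINCIPAL,
    "mapreduce.jobhistory.keytab.file": KEYTAB,
}


def hadoop_config_xml(properties: dict[str, str]) -> str:
    """Render name/value pairs in Hadoop's <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(hadoop_config_xml(MAPRED_SITE))
```

Running the script and redirecting its output to 'etc/hadoop/mapred-site.xml' produces the file described above.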
Start Kerby by running the JUnit test as described in the first section. Now start HDFS via:
  • sbin/start-dfs.sh
  • sudo sbin/start-secure-dns.sh

3) Configure Apache Hive to use Kerberos

Next we will configure Apache Hive to use Kerberos. Edit 'conf/hiveserver2-site.xml' and add the following properties:
  • hive.server2.authentication: kerberos
  • hive.server2.authentication.kerberos.principal: hiveserver2/localhost@hadoop.apache.org
  • hive.server2.authentication.kerberos.keytab: Path to Kerby hiveserver2.keytab (see above).
Start Hive via 'bin/hiveserver2'. In a separate window, log on to beeline via the following steps:
  • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
  • kinit -k -t /pathtokerby/target/alice.keytab alice
  • bin/beeline -u "jdbc:hive2://localhost:10000/default;principal=hiveserver2/localhost@hadoop.apache.org"
At this point authentication is successful and we should be able to query the "words" table as per the first tutorial.
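The beeline steps above can also be scripted. This Python sketch assumes `kinit` is on the PATH and that it runs from the Hive installation directory; the paths reuse the placeholders from the steps above, and the helper names are hypothetical. As in the shell session, KRB5_CONFIG is set in the environment that both commands inherit.

```python
import os
import subprocess

KERBY_TARGET = "/pathtokerby/target"  # placeholder, as in the steps above
JDBC_URL = "jdbc:hive2://localhost:10000/default;principal=hiveserver2/localhost@hadoop.apache.org"


def kerberos_env(kerby_target: str) -> dict[str, str]:
    """Copy of the current environment with KRB5_CONFIG pointing at Kerby's krb5.conf."""
    env = dict(os.environ)
    env["KRB5_CONFIG"] = os.path.join(kerby_target, "krb5.conf")
    return env


def login_commands(kerby_target: str, principal: str = "alice") -> list[list[str]]:
    """The kinit and beeline invocations from the steps above, as argument lists."""
    keytab = os.path.join(kerby_target, f"{principal}.keytab")
    return [
        ["kinit", "-k", "-t", keytab, principal],
        ["bin/beeline", "-u", JDBC_URL],
    ]


def run(kerby_target: str = KERBY_TARGET) -> None:
    env = kerberos_env(kerby_target)
    for cmd in login_commands(kerby_target):
        # check=True stops before beeline if kinit fails to obtain a ticket.
        subprocess.run(cmd, env=env, check=True)
```

Calling `run()` performs the login; the command lists are kept separate so they can be inspected before anything is executed.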
Categories: Colm O hEigeartaigh

Securing Apache Hive - part V

Colm O hEigeartaigh - Fri, 09/15/2017 - 12:40
This is the fifth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked at how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In this post we will look at an alternative authorization solution called Apache Sentry.

1) Build the Apache Sentry distribution

First we will build and install the Apache Sentry distribution. Download Apache Sentry (1.8.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match. Now extract and build the source and copy the distribution to a location where you wish to install it:
  • tar zxvf apache-sentry-1.8.0-src.tar.gz
  • cd apache-sentry-1.8.0-src
  • mvn clean install -DskipTests
  • cp -r sentry-dist/target/apache-sentry-1.8.0-bin ${sentry.home}
I previously covered the authorization plugin that Apache Sentry provides for Apache Kafka. In addition, Apache Sentry provides an authorization plugin for Apache Hive. For the purposes of this tutorial we will just configure the authorization privileges in a configuration file local to the Hive Server. Therefore we don't need to do any further configuration to the distribution at this point.

2) Install and configure Apache Hive

Please follow the first tutorial to install and configure Apache Hadoop if you have not already done so. Apache Sentry 1.8.0 does not support Apache Hive 2.1.x, so we will need to download and extract Apache Hive 2.0.1. Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Then follow the steps as outlined in the first tutorial to create the table in Hive and make sure that a query is successful.

3) Integrate Apache Sentry with Apache Hive

Now we will integrate Apache Sentry with Apache Hive. We need to add three new configuration files to the "conf" directory of Apache Hive.

3.a) Configure Apache Hive to use authorization

Create a file called 'conf/hiveserver2-site.xml'. In this file we enable authorization and add the Sentry authorization plugin.

3.b) Add Sentry plugin configuration

Create a new file in the "conf" directory of Apache Hive called "sentry-site.xml". This is the configuration file for the Sentry plugin for Hive. It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. As we are not using Kerberos, the "testing.mode" configuration parameter must be set to "true".

3.c) Add the authorization privileges for our test-case

Next, we need to specify the authorization privileges. Create a new file in the config directory called "sentry.ini". In it we grant the user "alice" a role which allows her to perform a "select" on the table "words".
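To make the user, group and role chain concrete, here is a simplified Python model of how a file-based policy of this kind could be evaluated. It is not Sentry's implementation: the `[users]`/`[groups]`/`[roles]` layout, the `server=server1->db=...->table=...->action=...` privilege syntax, and the group and role names are assumptions for illustration, and only the "select" action is checked.

```python
import configparser

# Hypothetical sentry.ini content in the assumed format: users map to groups,
# groups map to roles, roles map to privileges.
POLICY = """
[users]
alice = analysts

[groups]
analysts = words_reader

[roles]
words_reader = server=server1->db=default->table=words->action=select
"""


def load_policy(text: str) -> configparser.ConfigParser:
    # Use '=' as the only key/value delimiter; configparser splits at the first
    # one, so privilege strings keep their own '=' signs.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)
    return parser


def _items(value: str) -> list[str]:
    return [item.strip() for item in value.split(",") if item.strip()]


def privileges_for(policy: configparser.ConfigParser, user: str) -> list[dict[str, str]]:
    """Resolve user -> groups -> roles -> privileges, each parsed into a dict."""
    privileges = []
    for group in _items(policy.get("users", user, fallback="")):
        for role in _items(policy.get("groups", group, fallback="")):
            for grant in _items(policy.get("roles", role, fallback="")):
                parts = (p.split("=", 1) for p in grant.split("->") if "=" in p)
                privileges.append({k.strip().lower(): v.strip().lower() for k, v in parts})
    return privileges


def can_select(policy: configparser.ConfigParser, user: str, db: str, table: str) -> bool:
    return any(
        p.get("db") == db.lower() and p.get("table") == table.lower() and p.get("action") == "select"
        for p in privileges_for(policy, user)
    )
```

With this policy, "alice" resolves to a select privilege on "words" while "bob" has no mapping at all, which matches the behaviour tested in section 4.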

3.d) Add Sentry libraries to Hive

Finally, we need to add the Sentry libraries to Hive. Copy the following files from ${sentry.home}/lib  to ${hive.home}/lib:
  • sentry-binding-hive-common-1.8.0.jar
  • sentry-core-model-db-1.8.0.jar
  • sentry*provider*.jar
  • sentry-core-common-1.8.0.jar
  • shiro-core-1.2.3.jar
  • sentry-policy*.jar
  • sentry-service-*.jar
In addition we need the "sentry-binding-hive-v2-1.8.0.jar" which is not bundled with the Apache Sentry distribution. This can be obtained from "http://repo1.maven.org/maven2/org/apache/sentry/sentry-binding-hive-v2/1.8.0/sentry-binding-hive-v2-1.8.0.jar" instead.
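The copy step can be scripted with the Python standard library. A minimal sketch, assuming the file-name patterns listed above and that both distributions keep their jars in a "lib" directory; `copy_sentry_jars` is a hypothetical helper, and the separately downloaded sentry-binding-hive-v2-1.8.0.jar still has to be added by hand.

```python
import glob
import os
import shutil

# File-name patterns from the list above.
SENTRY_JAR_PATTERNS = [
    "sentry-binding-hive-common-1.8.0.jar",
    "sentry-core-model-db-1.8.0.jar",
    "sentry*provider*.jar",
    "sentry-core-common-1.8.0.jar",
    "shiro-core-1.2.3.jar",
    "sentry-policy*.jar",
    "sentry-service-*.jar",
]


def copy_sentry_jars(sentry_home: str, hive_home: str) -> list[str]:
    """Copy matching jars from <sentry_home>/lib to <hive_home>/lib; return the copied names."""
    src_dir = os.path.join(sentry_home, "lib")
    dst_dir = os.path.join(hive_home, "lib")
    copied = []
    for pattern in SENTRY_JAR_PATTERNS:
        # Escape the directory part so only the pattern itself is treated as a glob.
        matches = sorted(glob.glob(os.path.join(glob.escape(src_dir), pattern)))
        if not matches:
            raise FileNotFoundError(f"nothing in {src_dir} matches {pattern}")
        for path in matches:
            shutil.copy2(path, dst_dir)
            copied.append(os.path.basename(path))
    return copied
```

Failing loudly on a pattern with no match makes a wrong ${sentry.home} obvious before Hive is restarted.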

4) Test authorization with Apache Hive

Now we can test authorization after restarting Apache Hive. The user 'alice' can query the table according to our policy:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
  • select * from words where word == 'Dare'; (works)
However, the user 'bob' is denied access:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
  • select * from words where word == 'Dare'; (fails)

Categories: Colm O hEigeartaigh

Securing Apache Hive - part IV

Colm O hEigeartaigh - Thu, 09/14/2017 - 14:02
This is the fourth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query.

In this post we will show how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In the second post, we showed how to create a "resource" based policy for "alice" in Ranger, by granting "alice" the "select" permission for the "words" table. Instead, we can grant a user "bob" the "select" permission for a given "tag", which is synced into Ranger from Apache Atlas. This means that we can avoid managing specific resources in Ranger itself.

1) Start Apache Atlas and create entities/tags for Hive

First let's look at setting up Apache Atlas. Download the latest released version (0.8.1) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
The distribution will then be available in 'distro/target/apache-atlas-0.8.1-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
  • export MANAGE_LOCAL_HBASE=true
  • export MANAGE_LOCAL_SOLR=true
Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "words_tag". Unlike for HDFS or Kafka, Atlas doesn't provide an easy way to create a Hive Entity in the UI. Instead we can use a JSON file ("hive-create.json"), based on the example given here, to create a Hive Entity for the "words" table that we are using in our example.
You can upload it to Atlas via:
  • curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json;  charset=UTF-8' -u admin:admin -d @hive-create.json http://localhost:21000/api/atlas/entities
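If curl is not available, the same request can be made from Python's standard library. This sketch mirrors the curl command above (same endpoint, basic authentication as admin/admin, same Accept and Content-Type headers, the hive-create.json file as the body); the function names are hypothetical.

```python
import base64
import json
import urllib.request

ATLAS_ENTITIES_URL = "http://localhost:21000/api/atlas/entities"


def atlas_entity_request(payload: bytes, user: str = "admin",
                         password: str = "admin") -> urllib.request.Request:
    """Build the same POST that the curl command sends (basic auth + JSON body)."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        ATLAS_ENTITIES_URL,
        data=payload,
        method="POST",
        headers={
            "Accept": "application/json, text/plain, */*",
            "Content-Type": "application/json; charset=UTF-8",
            "Authorization": f"Basic {token}",
        },
    )


def upload_entity(path: str = "hive-create.json"):
    with open(path, "rb") as f:
        payload = f.read()
    json.loads(payload)  # fail early if the file is not valid JSON
    with urllib.request.urlopen(atlas_entity_request(payload)) as resp:
        return json.loads(resp.read())
```

Validating the JSON locally before sending gives a clearer error than a rejected request from the server.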
Once the new entity has been uploaded, then you can search for it in the Atlas UI. Once it is found, then click on "+" beside "Tags" and associate the new entity with the "words_tag" tag.

2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

To create tag-based policies in Apache Ranger, we have to import the entity and tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. After building Apache Ranger, extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows:
  • Set TAG_SOURCE_ATLAS_ENABLED to "false"
  • Set TAG_SOURCE_ATLASREST_ENABLED to  "true" 
  • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
  • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
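These edits can also be applied with a short Python sketch, assuming 'install.properties' uses simple KEY=value lines. Whitespace around the '=' is tolerated when reading, but rewritten lines are written without it; commented-out lines are left alone and missing keys are appended. The helper names are hypothetical.

```python
from pathlib import Path

# Values from the list above.
TAGSYNC_SETTINGS = {
    "TAG_SOURCE_ATLAS_ENABLED": "false",
    "TAG_SOURCE_ATLASREST_ENABLED": "true",
    "TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS": "60000",
    "TAG_SOURCE_ATLASREST_USERNAME": "admin",
    "TAG_SOURCE_ATLASREST_PASSWORD": "admin",
}


def set_properties(text: str, settings: dict[str, str]) -> str:
    """Set KEY=value lines from `settings`, appending any keys not already present."""
    missing = dict(settings)
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        key = stripped.split("=", 1)[0].strip()
        if "=" in stripped and not stripped.startswith("#") and key in settings:
            out.append(f"{key}={settings[key]}")
            missing.pop(key, None)
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in missing.items())
    return "\n".join(out) + "\n"


def update_install_properties(path: str = "install.properties") -> None:
    p = Path(path)
    p.write_text(set_properties(p.read_text(), TAGSYNC_SETTINGS))
```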
Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". Start the Apache Ranger admin service via "sudo ranger-admin start" and then the tagsync service via "sudo ranger-tagsync-services.sh start".

3) Create Tag-based authorization policies in Apache Ranger

Now let's create a tag-based authorization policy in the Apache Ranger admin UI (http://localhost:6080). Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "HiveTagService". Create a new policy for this service called "WordsTagPolicy". In the "TAG" field enter a "w" and the "words_tag" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "bob" with the "select" permission for "Hive".
We also need to go back to the Resource based policies and edit "cl1_hive" that we created in the second tutorial, and select the tag service we have created above. Once our new policy (including tags) has synced to '/etc/ranger/cl1_hive/policycache' we can test authorization in Hive. Previously, the user "bob" was denied access to the "words" table, as only "alice" was assigned a resource-based policy for the table. However, "bob" can now access the table via the tag-based authorization policy we have created:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
  • select * from words where word == 'Dare';
Categories: Colm O hEigeartaigh

The Real Data Processing with Apache Beam and Tika

Sergey Beryozkin - Tue, 09/12/2017 - 18:20
If we talk about data ingestion in big data streaming pipelines, it is fair to say that in the vast majority of cases the source data is provided by CSV files and other easy-to-parse text formats.

Things become more complex when the task is to read and parse files in formats such as PDF. One would need to create a reader/receiver capable of parsing PDF files and feeding the content fragments (the regular text, the text found in embedded attachments and the file metadata) into the processing pipelines. That was tricky to do right, but you did it just fine.

The next morning you get a call from your team lead letting you know that the customer actually needs the content ingested not only from PDF files but also from files in a format you've never heard of before. You spend the rest of the week looking for a library which can parse such files, and when you finish writing the code against that library's poorly documented API, all you can think about is that the weekend has arrived just in time.

On Monday your new task is to ensure that the pipelines are initialized from the same network folder where the PDF files and files in other formats will be dropped. You end up writing front-end reader code which reads the file, checks the extension, and then chooses a more specific reader.

The next day, when you are told that Microsoft Excel and Word documents, which may or may not be zipped, will have to be parsed as well, you report back asking for a holiday...

I'm sure you already know I've been preparing you for a couple of pieces of good news.

The first is the well-known fact that Apache Tika lets you write generic code which can collect data from a massive number of text, binary, image and video formats. One only has to prepare or update the dependencies and configuration to have the same code serve data from a variety of formats.

The other and main news is that Apache Beam 2.2.0-SNAPSHOT now ships a new TikaIO module (thanks to my colleague JB for reviewing and merging the PR). With Apache Beam capable of running the pipelines on top of Spark, Flink and other runners and Apache Tika taking care of various file formats, you get the most flexible data streaming system.

Do give it a try, help to improve TikaIO with new PRs, and if you are really serious about supporting a variety of data formats in the pipelines, start planning on integrating it into your products :-)

Enjoy!



Categories: Sergey Beryozkin

Integrating JSON Web Tokens with Kerberos using Apache Kerby

Colm O hEigeartaigh - Mon, 09/11/2017 - 13:28
JSON Web Tokens (JWTs) are a standard way of encapsulating a number of claims about a particular subject. Kerberos is a long-established and widely-deployed SSO protocol, used extensively in the Big-Data space in recent years. An interesting question is to examine how a JWT could be used as part of the Kerberos protocol. In this post we will consider one possible use-case, where a JWT is used to convey additional authorization information to the kerberized service provider.

This use-case is based on a document available at HADOOP-10959, called "A Complement and Short Term Solution to TokenAuth Based on Kerberos Pre-Authentication Framework", written by Kai Zheng and Weihua Jiang of Intel (also see here).

1) The test-case

To show how to integrate JWTs with Kerberos we will use a concrete test-case available in my github repo here:
  • cxf-kerberos-kerby: This project contains a number of tests that show how to use Kerberos with Apache CXF, where the KDC used in the tests is based on Apache Kerby
The test-case relevant to this blog entry is the JWTJAXRSAuthenticationTest. Here we have a trivial "double it" JAX-RS service implemented using Apache CXF, which is secured using Kerberos. An Apache Kerby-based KDC is launched which the client code uses to obtain a service ticket using JAAS (all done transparently by CXF), which is sent to the service code as part of the Authorization header when making the invocation.

So far this is just a fairly typical example of a kerberized web-service request. What is different is that the service configuration requires a level of authorization above and beyond the kerberos ticket, by insisting that the user must have a particular role to access the web service. This is done by inserting the CXF SimpleAuthorizingInterceptor into the service interceptor chain. An authenticated user must have the "boss" role to access this service. 

So we need some way to convey the role of the user as part of the kerberized request. We can do this using a JWT, as will be explained in the next few sections.

2) High-level overview of JWT use-case with Kerberos
 
As stated above, we need to convey some additional claims about the user to the service. This can be done by including a JWT containing those claims in the Kerberos service ticket. Let's assume that the user is in possession of a JWT that is issued by an IdP that contains a number of claims relating to that user (including the "role" as required by the service in our test-case). The token must be sent to the KDC when obtaining a service ticket.

The KDC must validate the token (checking that the signature is correct, that the signing identity is trusted, etc.). The KDC must then extract some relevant information from the token and insert it into the service ticket. The Kerberos spec defines a structure that can be used for this purpose, called AuthorizationData, which consists of a "type" along with some data to be interpreted according to the "type". We can use this structure to insert the encoded JWT as part of the data.

On the receiving side, the service can extract the AuthorizationData structure from the received ticket, parse it to retrieve the JWT, and obtain whatever claims it needs from this token.

3) Sending a JWT Token to the KDC

Let's take a look at how the test-case works in more detail, starting with the client. The test code retrieves a JWT for "alice" by invoking on the JAX-RS interface of the Apache CXF STS. The token contains the claim that "alice" has the "boss" role, which is required to invoke on the "double it" service. Now we need to send this token to the KDC to retrieve a service ticket for the "double it" service, with the JWT encoded in the ticket.

This cannot be done by the built-in Java GSS implementation. Instead we will use Apache Kerby. Apache Kerby has been covered extensively on this blog (see for example here). As well as providing the implementation for the KDC used in our test-case, Apache Kerby provides a complete GSS implementation that supports tokens in the forthcoming 1.1.0 release. To use the Kerby GSS implementation we need to register the KerbyGssProvider as a Java security provider.

To actually pass the JWT we got from the STS to the Kerby GSS layer, we need to use a custom version of the CXF HttpAuthSupplier interface. The KerbyHttpAuthSupplier implementation takes the JWT String, and creates a Kerby KrbToken class using it. This class is added to the private credential list of the current JAAS Subject. This way it will be available to the Kerby GSS layer, which will send the token to the KDC using Kerberos pre-authentication as defined in the document which is linked at the start of this post.

4) Processing the received token in the KDC

The Apache Kerby-based KDC extracts the JWT token from the pre-authentication data entry and verifies that it is signed and that the issuer is trusted. The KDC is configured in the test-case with a certificate to use for this purpose, and also with an issuer String against which the issuer of the JWT must match. If there is an audience claim in the token, then it must match the principal of the service for which we are requesting a ticket. 
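The claim checks described above can be sketched in Python. This is an illustration rather than Kerby's code: it decodes the claims segment of a compact JWT and applies the issuer and audience rules, but it deliberately omits signature verification against the configured certificate, which must happen first and would normally use a JOSE library. The function names are hypothetical.

```python
import base64
import json


def jwt_claims(token: str) -> dict:
    """Decode the claims segment of a compact JWT (header.claims.signature) WITHOUT verifying it."""
    payload = token.split(".")[1]
    payload += "=" * (-len(payload) % 4)  # restore base64url padding
    return json.loads(base64.urlsafe_b64decode(payload))


def check_claims(claims: dict, trusted_issuer: str, service_principal: str) -> None:
    """The issuer must match the configured value; an audience, if present, must name the service."""
    if claims.get("iss") != trusted_issuer:
        raise ValueError(f"untrusted issuer: {claims.get('iss')!r}")
    aud = claims.get("aud")
    if aud is not None:
        # The JWT "aud" claim may be a single string or a list of strings.
        audiences = aud if isinstance(aud, list) else [aud]
        if service_principal not in audiences:
            raise ValueError(f"audience {aud!r} does not include {service_principal!r}")
```

A token without an "aud" claim passes the audience rule, mirroring the "if there is an audience claim" wording above.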

If the verification of the received JWT passes, then it is inserted into the AuthorizationData structure in the issued service ticket. The type that is used is a custom value defined here, as this behaviour is not yet standardized. The JWT is serialized and added to the data part of the token. Note that this behaviour is fully customizable.

5) Processing the AuthorizationData structure on the service end

After the service successfully authenticates the client, we have to access the AuthorizationData part of the ticket to extract the JWT. This can all be done using the Java APIs; Kerby is not required on the receiving side. The standard CXF interceptor for Kerberos is subclassed in the tests, to set up a custom CXF SecurityContext using the GSSContext. By casting it to an ExtendedGSSContext, we can access the AuthorizationData and hence the JWT. The role claim is then extracted from the JWT and used to implement the standard "isUserInRole" method of the CXF SecurityContext.
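The service-side idea can be sketched in Python as a rough analogue of that custom SecurityContext. A hypothetical `JwtRoleSecurityContext` reads the role from a JWT that is assumed to have been taken from the ticket's AuthorizationData (and therefore already verified by the KDC), and a stand-in for the "double it" operation requires the "boss" role. The claim name "role" and all names here are assumptions; the real test-case does this in Java with CXF.

```python
import base64
import json


class JwtRoleSecurityContext:
    """Answers isUserInRole-style questions from a role claim inside a JWT.

    The token is assumed to come from the ticket's AuthorizationData, so it
    has already been verified by the KDC; here it is only decoded.
    """

    def __init__(self, principal: str, jwt: str, role_claim: str = "role"):
        self.principal = principal
        payload = jwt.split(".")[1]
        payload += "=" * (-len(payload) % 4)  # restore base64url padding
        claims = json.loads(base64.urlsafe_b64decode(payload))
        roles = claims.get(role_claim, [])
        # The claim may carry a single role string or a list of roles.
        self.roles = {roles} if isinstance(roles, str) else set(roles)

    def is_user_in_role(self, role: str) -> bool:
        return role in self.roles


def double_it(ctx: JwtRoleSecurityContext, value: int) -> int:
    """Stand-in for the "double it" operation, restricted to the "boss" role."""
    if not ctx.is_user_in_role("boss"):
        raise PermissionError(f"{ctx.principal} does not have the 'boss' role")
    return value * 2
```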

If you are interested in exploring this topic further, please get involved with the Apache Kerby project, and help us to further improve and expand this integration between JWT and Kerberos.
Categories: Colm O hEigeartaigh

Securing Apache Hive - part III

Colm O hEigeartaigh - Thu, 09/07/2017 - 12:45
This is the third in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. In this post we will extend the authorization scenario by showing how Apache Ranger can be used to create policies to both mask and filter data returned in the Hive query.

1) Data-masking with Apache Ranger

As a pre-requisite to this tutorial, please follow the previous post to set up Apache Hive and to enforce an authorization policy for the user "alice" using Apache Ranger. Now let's imagine that we would like "alice" to be able to see the "counts", but not the actual words themselves. We can create a data-masking policy in Apache Ranger for this. Open a browser and log in at "http://localhost:6080" using "admin/admin" and click on the "cl1_hive" service that we have created in the previous tutorial.

Click on the "Masking" tab and add a new policy called "WordMaskingPolicy", for the "default" database, "words" table and "word" column. Under the mask conditions, add the user "alice" and choose the "Redact" masking option. Save the policy and wait for it to be synced over to Apache Hive.


Now try to log in to beeline as "alice" and view the first five entries in the table:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
  • select * from words LIMIT 5;
You should see that the characters in the "word" column have been masked (replaced by "x"s).
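To make the effect of "Redact" concrete, here is a small Python approximation. It assumes that Ranger implements the "Redact" option with Hive's mask() function, whose default behaviour turns uppercase letters into "X", lowercase letters into "x" and digits into "n"; the real masking happens inside Hive, and `redact` is a hypothetical name.

```python
def redact(value: str) -> str:
    """Approximate Hive's default mask(): uppercase -> 'X', lowercase -> 'x', digits -> 'n'.

    Assumption: Ranger's "Redact" option is applied through Hive's mask()
    function; any other character is passed through unchanged.
    """
    out = []
    for ch in value:
        if ch.isupper():
            out.append("X")
        elif ch.islower():
            out.append("x")
        elif ch.isdigit():
            out.append("n")
        else:
            out.append(ch)
    return "".join(out)


print(redact("Dare"))  # Xxxx
```

The "count" column is untouched because the policy only targets the "word" column.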



2) Row-level filtering with Apache Ranger 

Now let's imagine that we are happy for "alice" to view the "words" in the table, but that we would like to restrict her to words that start with a "D". The previous "access" policy we created for her allows her to view all "words" in the table, so we need to specify a row-level filter policy. Click on the "Masking" tab in the UI and disable the policy we created in the previous section.

Now click on the "Row-level Filter" tab and create a new policy called "AliceFilterPolicy" on the "default" database, "words" table. Add a Row Filter condition for the user "alice" with row filter "word LIKE 'D%'". Save the policy and wait for it to be synced over to Apache Hive.


Now try to log in to beeline as "alice" as above. "alice" can successfully retrieve all entries where the words start with "D", but no other entries, via:
  • select * from words where word like 'D%';
Categories: Colm O hEigeartaigh

Mostly On Topic: CXF and Swagger Integration Keeps Getting Better

Sergey Beryozkin - Wed, 09/06/2017 - 14:50
While thinking about a title for this post, I thought the current title, with the "Keeps Getting Better" finishing touch, might work well; I knew I had used a similar line before, and after looking through my posts I found it.

Oh dear. I'm transported back to 2008. I can see myself, 9 years younger, walking to the Iona Technologies office, completely wired on trying to stop the Jersey JAX-RS domination :-), spotting an ad for Christina Aguilera's latest album at the exit of the Lansdowne DART station and thinking it would be fun to blog about it and link it to CXF: welcome to the start of the [OT] series. I'm not sure now if I'm more surprised that it was actually me who wrote that post or that 9 years later I'm still here, talking about CXF :-).

Let me get back to the actual subject of this post. CXF was quite late in embracing Swagger, and I still get nervous whenever I remind myself that Swagger does not support 'matrix' parameters :-). But the Swagger team has made a massive effort through the years, and my CXF hat is off to them.

I'm happy to say that Apache CXF now offers one of the best Swagger2 integrations around, at both the JSON-only and UI levels, and it just keeps getting better.

Following a recent discussion with Dennis Kieselhorst, one can now configure Swagger2Feature with an external properties file, which can be especially handy when this feature is auto-discovered.

Just at the last minute we resolved an issue reported by a CXF user to do with accessing Swagger UI from the reverse proxies.

Finally, Freeman contributed a java2swagger Maven plugin.

Swagger 3 will be supported as soon as possible too.

Enjoy!
Categories: Sergey Beryozkin

Apache CXF 3.2.0 NIO Extension

Sergey Beryozkin - Thu, 08/31/2017 - 13:35
In CXF 3.2.0 we have also introduced a server-side NIO extension which is based on the very first JAX-RS API prototype done by Santiago Pericas-Geertsen. The client NIO API prototype was not ready, but the server one had a promising start. It was implemented in CXF as soon as the long-awaited first 2.1 API jar was published to Maven.

However, once the JAX-RS 2.1 group finally resumed its work and started finalizing the NIO API, the early NIO API was unfortunately dropped (IMHO it could've stayed as an entry-point, 'easy' NIO API), while the new NIO API did not materialize, primarily due to the time constraints of the JCP process.

The spec leads did all they could, but the schedule was too tight for them to get it right. As sad as it was, they made the right decision: rather than do something in a hurry, it is better to do it right at some later stage...

It was easily the major omission from the final 2.1 API. How long will JAX-RS users have to wait until a new JAX-RS version is finalized and the new NIO API becomes available to them, given that it takes years for the major Java EE umbrella of specs to be done?

In the meantime, the engineering minds in the Spring Boot, RxJava and other teams will come up with some brilliant new ways of doing it. They will be not one but several steps ahead.

Which brings me to this point: if I were to offer a single piece of advice to Java EE process designers, I'd recommend making sure that new features can easily be added after the EE release date, with minor EE releases embracing these new features following soon, without waiting for N years. If that were an option, then we could've seen a JAX-RS 2.2 NIO in, say, 6 months - just a dream at the moment, I know. The current mechanism, where EE users wait several years for new features, is out of sync with the competitive reality of the software industry and only works because of the great teams doing EE, the loyalty of EE users and the power of the term 'standard'.

Anyway, throwing away our own implementation of that NIO API prototype, now gone from the 2.1 API, just because it had suddenly become code supporting a non-standard feature, would not have been a good idea.

It offers an easy link from JAX-RS code to the Servlet 3.1 NIO extensions and delivers real value. So the code stayed, and it is now available for CXF users to experiment with.

It's not very shiny, but it will deliver. Seriously, if you need to copy a massive InputStream to or from the HTTP connection with NIO and asynchronous callbacks involved, what else do you need but a simple and easy way to do it from the code? Well, nothing can be simpler than this option, for sure.

Worried a bit that it is not a standard feature? No, it is fine: doing it the CXF way is a standard :-)
  
Categories: Sergey Beryozkin

JAX-RS 2.1 is Released

Sergey Beryozkin - Thu, 08/31/2017 - 13:33
JAX-RS 2.1 (JSR 370) has finally been released, and JAX-RS users can now look forward to experimenting with the new features very soon, with a number of final JAX-RS 2.1 implementations either already available (such as Jersey) or nearly ready to be released.

Apache CXF 3.2.0 is about to be released, and all of the new JAX-RS 2.1 features have been implemented: reactive client API extensions, client/server Server Sent Events support, returning CompletableFuture from resource methods, and other minor improvements.

As part of the 2.1 work (and also in response to a CXF JIRA request), we also introduced RxJava Observable and, more recently, RxJava2 Flowable/Observable client and server extensions. You can use them as an alternative to CompletableFuture on the client and/or the server side. Note that the combination of RxJava2 Flowable with JAX-RS AsyncResponse on the server is quite cool.

The other new CXF extension introduced as part of the JAX-RS 2.1 work is the NIO extension, which will be the topic of the next post.

Pavel Bucek and Santiago Pericas-Geertsen were great JAX-RS 2.1 spec leads. Andriy Redko spent a lot of his time getting CXF 3.2.0 ready for JAX-RS 2.1.
Categories: Sergey Beryozkin

Oracle and JAVA EE

Daniel Kulp - Fri, 08/18/2017 - 15:17
Those of you in the “Java EE” community may have already seen the announcement from Oracle, posted yesterday, concerning the future of Java EE. This is potentially very exciting news, particularly for the various Apache projects that implement some of the Java EE specs. Since Apache CXF implements a couple of the specs (JAX-WS […]
Categories: Daniel Kulp

Securing Apache Hive - part II

Colm O hEigeartaigh - Tue, 08/01/2017 - 13:06
This is the second post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. In this post we will show how to add authorization to the previous example using Apache Ranger.

1) Install the Apache Ranger Hive plugin

If you have not done so already, please follow the first post to install and configure Apache Hadoop and Apache Hive. Next, download Apache Ranger and verify that the signature is valid and that the message digests match. Because some bugs in the installation process are only fixed in the development version, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-hive-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-hive-plugin ${ranger.hive.home}
Now go to ${ranger.hive.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "cl1_hive".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hive installation
Save "install.properties" and install the plugin as root via "sudo -E ./enable-hive-plugin.sh". The Apache Ranger Hive plugin should now be successfully installed. Make sure that the default policy cache for the Hive plugin '/etc/ranger/cl1_hive/policycache' is readable by the user who is running the Hive server. Then restart the Apache Hive server to enable the authorization plugin.

2) Create authorization policies in the Apache Ranger Admin console

Next we will use the Apache Ranger admin console to create authorization policies for Apache Hive. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Ranger admin service via 'sudo ranger-admin start' and open a browser at 'http://localhost:6080', logging on with the credentials 'admin/admin'. Click the "+" button next to the "HIVE" logo and enter the following properties:
  • Service Name: cl1_hive
  • Username/Password: admin
  • jdbc.url: jdbc:hive2://localhost:10000
Note that "Test Connection" won't work as the "admin" user will not have the necessary authorization to invoke on Hive at this point. Click "Add" to create the service. If you have not done so in a previous tutorial, click on "Settings" and then "Users/Groups" and add two new users called "alice" and "bob", who we will use to test authorization. Then go back to the newly created "cl1_hive" service, and click "Add new policy" with the following properties:
  • Policy Name: SelectWords
  • database: default
  • table: words
  • Hive column: *
Then under "Allow Conditions", give "alice" the "select" permission and click "Add".


3) Test authorization with Apache Hive

Once our new policy has synced to '/etc/ranger/cl1_hive/policycache' we can test authorization in Hive. The user 'alice' can query the table according to our policy:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
  • select * from words where word == 'Dare'; (works)
However, the user 'bob' is denied access:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
  • select * from words where word == 'Dare'; (fails)
Categories: Colm O hEigeartaigh

Third party SSO support for Apache Syncope REST services

Colm O hEigeartaigh - Fri, 07/28/2017 - 17:11
A recent blog post covered SSO support for Apache Syncope REST services. This was a new feature added in the 2.0.3 release, which allows a user to obtain a JWT from the Syncope "accessTokens/login" REST endpoint. This token can then be used to repeatedly invoke on a Syncope REST service. However, what if you wish to allow a user to invoke on a Syncope REST service using a (JWT) token issued by a third-party IdP instead? From Syncope 2.0.5, this will be possible.

In this post we will cover how to use a JWT issued by a third party to invoke on an Apache Syncope REST service. The code is available on GitHub here:
  • cxf-syncope2-webapp: A pre-configured web application of the Syncope core for use in the tests.
  • cxf-syncope2: Some integration tests that use cxf-syncope2-webapp for authentication and authorization purposes. JWTTestIT illustrates third party SSO integration with Syncope as covered in this post.
1) Configuring Apache Syncope to accept third-party JWTs

Naturally, if we invoke on an Apache Syncope REST service using an arbitrary third-party token, access will be denied as Syncope will not be able to validate the signature on the token correctly. By default, Syncope uses the following properties defined in 'security.properties' to both issue and validate signed tokens:
  • jwtIssuer: The issuer of the token
  • jwsKey: The Hex-encoded (symmetric) verification key
The default signature algorithm is the symmetric algorithm HS512. To allow third-party tokens we need to implement the JWTSSOProvider interface provided in Syncope. By default, Syncope searches for JWTSSOProvider implementations on the classpath under the package name "org.apache.syncope.core", so no explicit configuration changes are required to plug in a custom JWTSSOProvider implementation.

When Syncope receives a signed JWT, it queries which of the configured JWTSSOProvider implementations can verify the token, by matching the value returned by the 'getIssuer()' method against the issuer of the token. The 'getAlgorithm()' method should match the signature algorithm of the received token, and the 'verify' method should validate the signature of the received token. The implementation used in the tests is available here: it reads in a keystore and uses the certificate contained in it to verify the signature on the received token.

One final interesting point is that we need to map the authenticated JWT subject to a user in Syncope somehow. This is done in the JWTSSOProvider implementation via the 'resolve' method. In our test implementation, we map the JWT subject directly to a Syncope username.
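The Java sketch below makes the selection logic above concrete using only the JDK. The interface ThirdPartyJwtProvider and the classes CertificateJwtProvider and ThirdPartyJwtAuthenticator are hypothetical names. They are loosely modelled on the getIssuer/getAlgorithm/verify/resolve methods described above and are not Syncope's actual JWTSSOProvider signatures. The sketch makes three assumptions:

- The third-party IdP signs with RS256 (assumed).
- The provider takes the public key that would be read from the certificate in the keystore. Keystore loading is left out.
- The subject maps straight to a username.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical provider contract, loosely modelled on the methods described above. */
interface ThirdPartyJwtProvider {
    String getIssuer();
    String getAlgorithm();
    boolean verify(byte[] signingInput, byte[] signature);
    Optional<String> resolve(String subject);
}

/** Verifies RS256 signatures (assumed algorithm) with the public key of the IdP's certificate. */
final class CertificateJwtProvider implements ThirdPartyJwtProvider {
    private final String issuer;
    private final PublicKey idpKey;
    private final List<String> knownUsers;

    CertificateJwtProvider(String issuer, PublicKey idpKey, List<String> knownUsers) {
        this.issuer = issuer;
        this.idpKey = idpKey;
        this.knownUsers = knownUsers;
    }

    @Override public String getIssuer() { return issuer; }
    @Override public String getAlgorithm() { return "RS256"; }

    @Override
    public boolean verify(byte[] signingInput, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA"); // RS256
            verifier.initVerify(idpKey);
            verifier.update(signingInput);
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            return false; // any crypto failure counts as "not verified"
        }
    }

    /** Map the JWT subject directly to a username, as the test implementation does. */
    @Override
    public Optional<String> resolve(String subject) {
        return subject != null && knownUsers.contains(subject) ? Optional.of(subject) : Optional.empty();
    }
}

final class ThirdPartyJwtAuthenticator {
    /** Simplification: reads a flat string claim from compact JSON; not a real JSON parser. */
    static String claim(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*\"([^\"]*)\"").matcher(json);
        return m.find() ? m.group(1) : null;
    }

    /** Returns the resolved username, or empty if no provider accepts the token. */
    static Optional<String> authenticate(String jwt, List<ThirdPartyJwtProvider> providers) {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            return Optional.empty();
        }
        Base64.Decoder b64 = Base64.getUrlDecoder();
        String header = new String(b64.decode(parts[0]), StandardCharsets.UTF_8);
        String payload = new String(b64.decode(parts[1]), StandardCharsets.UTF_8);
        byte[] signingInput = (parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII);
        byte[] signature = b64.decode(parts[2]);
        String issuer = claim(payload, "iss");
        String alg = claim(header, "alg");
        for (ThirdPartyJwtProvider provider : providers) {
            // Select the provider by issuer and algorithm, then check the signature.
            if (provider.getIssuer().equals(issuer)
                    && provider.getAlgorithm().equals(alg)
                    && provider.verify(signingInput, signature)) {
                return provider.resolve(claim(payload, "sub"));
            }
        }
        return Optional.empty();
    }
}
```

A real provider would delegate JSON parsing and key handling to a proper JOSE library. The regex-based claim lookup here only copes with flat string claims.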

2) Obtain a JWT from the Apache CXF STS using REST

Now that we have set up Apache Syncope to allow third-party JWTs, we need to obtain such a token to get our test case to work. We will use the Apache CXF Security Token Service (STS) to obtain a JWT. For simplicity, we will leverage the REST interface of the CXF STS, which allows us to obtain a token with a simple REST call. The STS is configured via Spring to issue signed JWTs, and user authentication to the STS is enforced via Basic Authentication. In the test code, we use the CXF WebClient to invoke on the STS and get a JWT back.
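As a rough stand-in for the WebClient call, the sketch below uses the JDK's java.net.http client (Java 11+) rather than CXF. The following are all hypothetical:

- the STS URL,
- the credentials,
- the assumption that the STS returns the serialized JWT as the plain response body.

Check the CXF STS REST documentation for the actual endpoint and the media type to request.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class StsTokenClient {
    /** Build a GET request that authenticates to the STS with HTTP Basic credentials. */
    static HttpRequest jwtRequest(URI stsTokenUri, String user, String password) {
        String credentials = Base64.getEncoder()
                .encodeToString((user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder(stsTokenUri)
                .header("Authorization", "Basic " + credentials)
                .GET()
                .build();
    }

    /** Send the request and return the body, assumed here to be the serialized JWT. */
    static String fetchJwt(HttpClient client, HttpRequest request)
            throws IOException, InterruptedException {
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("STS returned HTTP " + response.statusCode());
        }
        return response.body().trim();
    }
}
```

For example, `StsTokenClient.fetchJwt(HttpClient.newHttpClient(), StsTokenClient.jwtRequest(URI.create("https://localhost:8080/sts/token/jwt"), "alice", "security"))` would return the token, given a running STS at that (hypothetical) address.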

Now we can use this token with the Syncope client API to call the user "self service" successfully.


Categories: Colm O hEigeartaigh

Securing Apache Hive - part I

Colm O hEigeartaigh - Thu, 07/20/2017 - 18:30
This is the first post in a series of articles on securing Apache Hive. In this article we will look at installing Apache Hive and doing some queries on data stored in HDFS. We will not consider any security requirements in this post, but the test deployment will be used by future posts in this series on authenticating and authorizing access to Hive.

1) Install and configure Apache Hadoop

The first step is to install and configure Apache Hadoop. Please follow section 1 of this earlier tutorial for information on how to do this. In addition, we need to configure two extra properties in 'etc/hadoop/core-site.xml':
  • hadoop.proxyuser.$user.groups: *
  • hadoop.proxyuser.$user.hosts: localhost
where "$user" above should be replaced with the user that is going to run the hive server below. As we are not using authentication in this tutorial, this allows the $user to impersonate the "anonymous" user, who will connect to Hive via beeline and run some queries.

Once HDFS has started, we need to create some directories for use by Apache Hive, and change the permissions appropriately:
  • bin/hadoop fs -mkdir -p /user/hive/warehouse /tmp
  • bin/hadoop fs -chmod g+w /user/hive/warehouse /tmp
  • bin/hadoop fs -mkdir /data
The "/data" directory will hold a file which represents the output of a map-reduce job. For the purposes of this tutorial, we will use a sample output of the canonical "Word Count" map-reduce job on some text. The file consists of two columns separated by a tab character, where the left column is the word, and the right column is the total count associated with that word in the original document.

I've uploaded such a sample output here. Download it and upload it to the HDFS data directory:
  • bin/hadoop fs -put output.txt /data
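Since the file is just two tab-separated columns, its structure can be sketched in plain Java. So can the `word == 'Dare'` query that is run later against the Hive table. WordCountFile is a hypothetical helper. It is stricter than Hive, which simply moves the file on LOAD DATA and does not reject malformed rows at load time.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.LinkedHashMap;
import java.util.Map;

final class WordCountFile {
    /** Parse "word<TAB>count" lines into a map, mirroring the (word STRING, count INT) schema. */
    static Map<String, Integer> parse(Reader source) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // tolerate a trailing blank line
                }
                String[] columns = line.split("\t", -1);
                if (columns.length != 2) {
                    throw new IOException("Expected two tab-separated columns: " + line);
                }
                // A non-numeric count raises NumberFormatException.
                counts.put(columns[0], Integer.parseInt(columns[1].trim()));
            }
        }
        return counts;
    }
}
```

For example, `WordCountFile.parse(Files.newBufferedReader(Path.of("output.txt"))).get("Dare")` gives the same count that the Hive query for 'Dare' reports.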
2) Install and configure Apache Hive

Now we will install and configure Apache Hive. Download and extract Apache Hive (2.1.1 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Now we will initialize the metastore schema and start HiveServer2:
  • bin/schematool -dbType derby -initSchema
  • bin/hiveserver2
In a separate window, start beeline to connect to the Hive server, where $user is the user who is running Hadoop (this is necessary because we are going to create some data in HDFS, and would otherwise not have the correct permissions):
  • bin/beeline -u jdbc:hive2://localhost:10000 -n $user
Once connected, create a Hive table called "words" and load the map-reduce output data into it:
  • create table words (word STRING, count INT) row format delimited fields terminated by '\t' stored as textfile;
  • LOAD DATA INPATH '/data/output.txt' INTO TABLE words;
Now we can run some queries on the data as the anonymous user. Log out of beeline, log back in without specifying a user, and run some queries via:
  • bin/beeline -u jdbc:hive2://localhost:10000
  • select * from words where word == 'Dare';
Categories: Colm O hEigeartaigh

[OT] I Work with CXF and I Want It That Way

Sergey Beryozkin - Thu, 07/13/2017 - 11:11
The time has come for a regular OT post.

The journey of the software developer is always about finding a home where he or she enjoys being every day and looks forward to contributing to the bigger effort every day.

In addition, the journey of the web services developer is always about finding the web services framework that will help with creating the coolest HTTP service on the Web. We all know there are many quality HTTP service frameworks around.

My software developer's journey so far has been mostly about supporting one such web services framework, Apache CXF. It has been a great journey.

Some of you helped by using and contributing to Apache CXF earlier; some of you are long-term Apache CXF users and contributors, preparing the ground for the new users and contributors who are yet to discover CXF.

No matter which group you are in, even if you're no longer with CXF, I'm sure you've had that feeling at least once that you'd like your CXF experience to last forever :-).

Listen to a message from the best boy band in the world. Enjoy :-)

  


Categories: Sergey Beryozkin

Multiple JWE Encryptions POC With Apache CXF in two hours

Sergey Beryozkin - Mon, 07/03/2017 - 15:12
The summer has been great so far, and as usual, instead of watching yet another sport event final, you've decided to catch up with your colleagues after work and do a new round of the Apache CXF JOSE coding. Nice idea they said.

The idea of creating an application that processes content encrypted for multiple recipients has captured your imagination.

After reviewing the CXF JWE JSON documentation, you've decided to start with client code that creates a client proxy which posts some text.

The JWE JSON filter registered with the proxy will encrypt whatever content the proxy is sending (it does not have to be only text) only once, and the content encryption key (CEK) will be encrypted with each recipient-specific key encryption key. Thus if you have 2 recipients, the CEK will be encrypted twice.

Registering jwejson1.properties and jwejson2.properties with the proxy instructs the JWE JSON filter:

- that a JWE JSON container for 2 recipients needs to be created,
- that the content encryption algorithm is A128GCM and the key encryption algorithm is A128KW,
- that each recipient uses its own symmetric key encryption key.

Each recipient-specific entry will also include a 'kid' key identifier of the key encryption key, so that the service can figure out which JWE JSON entry is targeted at which recipient.

Setting up the client took you one hour.

The next task was to prototype the service code. That was even easier: loading the recipient-specific properties, locating the recipient-specific entry and getting the decrypted content was all that was needed.
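The structure the filter produces can be illustrated with the JDK's own crypto providers:

- The content is encrypted once with AES-GCM under a random 128-bit CEK (A128GCM).
- The CEK is wrapped separately for each recipient with RFC 3394 AES key wrap (A128KW), keyed by 'kid'.

MultiRecipientEnvelope is a hypothetical class and not a CXF API. It skips the JWE JSON serialization and uses a raw byte array as the additional authenticated data in place of the protected header.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Simplified stand-in for a JWE JSON container: one ciphertext, one wrapped CEK per recipient. */
final class MultiRecipientEnvelope {
    private static final SecureRandom RANDOM = new SecureRandom();

    final byte[] iv;
    final byte[] aad;                          // stands in for the protected header
    final byte[] ciphertext;                   // AES-GCM output with the authentication tag appended
    final Map<String, byte[]> wrappedCekByKid; // one entry per recipient, keyed by 'kid'

    private MultiRecipientEnvelope(byte[] iv, byte[] aad, byte[] ciphertext, Map<String, byte[]> wrappedCekByKid) {
        this.iv = iv;
        this.aad = aad;
        this.ciphertext = ciphertext;
        this.wrappedCekByKid = wrappedCekByKid;
    }

    /** Client side: encrypt the content once with a fresh CEK, then wrap that CEK per recipient. */
    static MultiRecipientEnvelope encrypt(byte[] content, byte[] aad, Map<String, SecretKey> kekByKid)
            throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);                   // A128GCM uses a 128-bit CEK
        SecretKey cek = generator.generateKey();

        byte[] iv = new byte[12];              // 96-bit IV for AES-GCM
        RANDOM.nextBytes(iv);
        Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
        gcm.init(Cipher.ENCRYPT_MODE, cek, new GCMParameterSpec(128, iv));
        gcm.updateAAD(aad);
        byte[] ciphertext = gcm.doFinal(content);

        Map<String, byte[]> wrapped = new LinkedHashMap<>();
        for (Map.Entry<String, SecretKey> recipient : kekByKid.entrySet()) {
            Cipher keyWrap = Cipher.getInstance("AESWrap"); // RFC 3394 key wrap (A128KW with a 128-bit KEK)
            keyWrap.init(Cipher.WRAP_MODE, recipient.getValue());
            wrapped.put(recipient.getKey(), keyWrap.wrap(cek));
        }
        return new MultiRecipientEnvelope(iv, aad, ciphertext, wrapped);
    }

    /** Service side: locate this recipient's entry by 'kid', unwrap the CEK and decrypt. */
    byte[] decryptFor(String kid, SecretKey kek) throws GeneralSecurityException {
        byte[] wrappedCek = wrappedCekByKid.get(kid);
        if (wrappedCek == null) {
            throw new IllegalArgumentException("No recipient entry for kid " + kid);
        }
        Cipher keyWrap = Cipher.getInstance("AESWrap");
        keyWrap.init(Cipher.UNWRAP_MODE, kek);
        Key cek = keyWrap.unwrap(wrappedCek, "AES", Cipher.SECRET_KEY); // fails if the KEK is wrong
        Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
        gcm.init(Cipher.DECRYPT_MODE, cek, new GCMParameterSpec(128, iv));
        gcm.updateAAD(aad);
        return gcm.doFinal(ciphertext);        // also verifies the authentication tag
    }
}
```

In the POC itself, these steps are handled by the CXF JWE JSON filter configured through the properties files, including the JSON layout and the 'kid' entries.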

Two hours in total. Note that I did not promise it would take you 30 minutes to do the whole POC; that would have been child's play, which is not realistic. A two-hour project is more complex, yet it felt like a walk in the park :-)



 
Categories: Sergey Beryozkin

Securing Apache Solr - part III

Colm O hEigeartaigh - Fri, 06/30/2017 - 16:18
This is the third post in a series of articles on securing Apache Solr. The first post looked at setting up a sample SolrCloud instance and securing access to it via Basic Authentication. The second post looked at how the Apache Ranger admin service can be configured to store audit information in Apache Solr. In this post we will extend the example in the first article to include authorization, by showing how to create and enforce authorization policies using Apache Ranger.

1) Install the Apache Ranger Solr plugin

The first step is to install the Apache Ranger Solr plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-${version}-solr-plugin.tar.gz
  • mv ranger-${version}-solr-plugin ${ranger.solr.home}
Now go to ${ranger.solr.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "solr_service".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Solr server directory
Save "install.properties" and install the plugin as root via "sudo -E ./enable-solr-plugin.sh". Make sure that the user who is running Solr can read the "/etc/ranger/solr_service/policycache". Now follow the first tutorial to get an example SolrCloud instance up and running with a "gettingstarted" collection. We will not enable the authorization plugin just yet.

2) Create authorization policies for Solr using the Apache Ranger Admin service

Now follow the second tutorial to download and install the Apache Ranger admin service. To avoid conflicting with the Solr example we are securing, we will skip the section about auditing to Apache Solr (sections 3 and 4). In addition, in section 5 the "audit_store" property can be left empty, and the Solr audit properties can be omitted. Start the Apache Ranger admin service via: "sudo ranger-admin start", and open a browser at "http://localhost:6080", logging on with "admin/admin" credentials. Click on the "+" button for the Solr service and create a new service with the following properties:
  • Service Name: solr_service
  • Username: alice
  • Password: SolrRocks
  • Solr URL: http://localhost:8983/solr
Hit the "Test Connection" button and it should show that it has successfully connected to Solr. Click "Add" and then click on the "solr_service" link that is subsequently created. We will create a policy that allows "alice" to read the "gettingstarted" collection. If "alice" does not exist yet, go to "Settings/User+Groups" and create a new user there. Delete the default policy that is created in "solr_service", then click on "Add new policy" and create a new policy called "gettingstarted_policy". For "Solr Collection", enter "g" and the "gettingstarted" collection should pop up. Add a new "allow condition" granting the user "alice" the "others" and "query" permissions.




3) Test authorization using the Apache Ranger plugin for Solr

Now we are ready to enable the Apache Ranger authorization plugin for Solr. Download the security configuration which enables Basic Authentication in Solr as well as the Apache Ranger authorization plugin.
Now upload this configuration to the Apache Zookeeper instance that is running with Solr:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json
Now let's try to query the "gettingstarted" collection as 'alice':
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
This should be successful. However, authorization will fail for "bob":
  • curl -u bob:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
In addition, although "alice" can query the collection, she can't write to it, so the following update request will return 403:
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book4", "title_t" : "Hamlet", "author_s" : "William Shakespeare"}]'
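The expected outcomes above follow from a simple allow-only decision, sketched below as a toy model. CollectionPolicy is hypothetical and is not how Ranger evaluates policies internally. It only encodes the "gettingstarted_policy" rule that grants "alice" the "query" and "others" permissions, with everything else denied once the user has authenticated. The name "update" for the write permission is an assumption here.

```java
import java.util.Map;
import java.util.Set;

/**
 * Toy model of a Ranger-style allow policy on one Solr collection.
 * Hypothetical: this is not Ranger's API, only the decision the policy above encodes.
 */
final class CollectionPolicy {
    private final String collection;
    private final Map<String, Set<String>> allowedAccessByUser;

    CollectionPolicy(String collection, Map<String, Set<String>> allowedAccessByUser) {
        this.collection = collection;
        this.allowedAccessByUser = allowedAccessByUser;
    }

    /** HTTP status expected for an already-authenticated user: 200 if allowed, otherwise 403. */
    int decide(String user, String requestedCollection, String accessType) {
        if (!collection.equals(requestedCollection)) {
            return 403; // no policy covers this collection
        }
        Set<String> granted = allowedAccessByUser.getOrDefault(user, Set.of());
        return granted.contains(accessType) ? 200 : 403; // no matching allow condition means deny
    }
}
```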
Categories: Colm O hEigeartaigh

Securing Apache Solr - part II

Colm O hEigeartaigh - Tue, 06/27/2017 - 14:10
This is the second post in a series of articles on securing Apache Solr. The first post looked at setting up a sample SolrCloud instance and securing access to it via Basic Authentication. In this post we will temporarily deviate from the concept of "securing Apache Solr", and instead look at how the Apache Ranger admin service can be configured to store audit information in Apache Solr.

1) Download and extract the Apache Ranger admin service

The first step is to download the source code, as well as the signature file and associated message digests (all available on the download page). Verify that the signature is valid and that the message digests match. Now extract and build the source, and copy the resulting admin archive to a location where you wish to install the UI:
  • tar zxvf apache-ranger-incubating-1.0.0.tar.gz
  • cd apache-ranger-incubating-1.0.0
  • mvn clean package assembly:assembly 
  • tar zxvf target/ranger-1.0.0-admin.tar.gz
  • mv ranger-1.0.0-admin ${rangerhome}
2) Install MySQL

The Apache Ranger Admin UI requires a database to keep track of users/groups as well as policies for various big data projects that you are securing via Ranger. For the purposes of this tutorial, we will use MySQL. Install MySQL in $SQL_HOME and start MySQL via:
  • sudo $SQL_HOME/bin/mysqld_safe --user=mysql
Now log on to MySQL as the root user and create two users for Ranger: a user with root-level admin privileges (let's call this user "admin") and a user for the Ranger schema (we'll call this user "ranger"):
  • CREATE USER 'admin'@'localhost' IDENTIFIED BY 'password';
  • GRANT ALL PRIVILEGES ON * . * TO 'admin'@'localhost' WITH GRANT OPTION;
  • CREATE USER 'ranger'@'localhost' IDENTIFIED BY 'password';
  • FLUSH PRIVILEGES;
Finally, download the JDBC driver jar for MySQL and put it in ${rangerhome}.

3) Configure Apache Solr to support auditing from Ranger

Before installing the Apache Ranger admin service we will need to configure Apache Solr. The Apache Ranger admin service ships with a script to make this easier to configure. Edit 'contrib/solr_for_audit_setup/install.properties' with the following properties:
  • SOLR_USER/SOLR_GROUP - the user/group you are running solr as
  • SOLR_INSTALL_FOLDER - Where you have extracted Solr to as per the first tutorial.
  • SOLR_RANGER_HOME - Where to install the Ranger configuration for Solr auditing.
  • SOLR_RANGER_PORT - The port to be used (8983 as per the first tutorial).
  • SOLR_DEPLOYMENT - solrcloud
  • SOLR_HOST_URL - http://localhost:8983
  • SOLR_ZK - localhost:2181
Make sure that the user running Solr has permission to write to the value configured for "SOLR_LOG_FOLDER" (/var/log/solr/ranger_audits). Now in 'contrib/solr_for_audit_setup' run 'sudo -E ./setup.sh'. The Solr configuration is now copied to $SOLR_RANGER_HOME.

4) Start Apache Zookeeper and SolrCloud

Before starting Apache Solr, we will need to start Apache Zookeeper. Download Apache Zookeeper and start it on port 2181 (this step was not required in the previous tutorial, as we were launching SolrCloud with an embedded Zookeeper instance):
  • bin/zkServer.sh start
As per the first post, we want to secure access to SolrCloud via Basic Authentication (note that this is only recently fixed in Apache Ranger). So follow the steps in this post to upload the security.json to Zookeeper via:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /security.json security.json
Start Solr as follows in the '${SOLR_RANGER_HOME}/ranger_audit_server/scripts' directory:
  • ./add_ranger_audits_conf_to_zk.sh 
  • ./start_solr.sh
Edit 'create_ranger_audits_collection.sh' and change 'curl --negotiate -u :' to 'curl -u "alice:SolrRocks"'. Save it and then run:
  • ./create_ranger_audits_collection.sh
5) Install the Apache Ranger Admin UI

Edit ${rangerhome}/install.properties and make the following changes:
  • Change SQL_CONNECTOR_JAR to point to the MySQL JDBC driver jar that you downloaded above.
  • Set (db_root_user/db_root_password) to (admin/password)
  • Set (db_user/db_password) to (ranger/password)
  • audit_solr_urls: http://localhost:8983/solr/ranger_audits
  • audit_solr_user: alice
  • audit_solr_password: SolrRocks
  • audit_solr_zookeepers: localhost:2181
Now you can run the setup script via "sudo -E ./setup.sh". When this is done then start the Apache Ranger admin service via: "sudo ranger-admin start".

6) Test that auditing is working correctly in the Ranger Admin service

Open a browser and navigate to "http://localhost:6080". Try to log on first using some made-up credentials. Then log in using "admin/admin". Click on the "Audit" tab and then select "Login Sessions". You should see both the incorrect and the correct login attempts, meaning that Ranger is successfully storing and retrieving audit information in Solr.


Categories: Colm O hEigeartaigh

Securing Apache Solr - part I

Colm O hEigeartaigh - Mon, 06/26/2017 - 11:46
This is the first post in a series of articles on securing Apache Solr. In this post we will look at deploying an example SolrCloud instance and securing access to it via basic authentication.

1) Install and deploy a SolrCloud example

Download and extract Apache Solr (6.6.0 was used for the purpose of this tutorial). Now start SolrCloud via:
  • bin/solr -e cloud
Accept all of the default options. This creates a cluster of two nodes, with a collection "gettingstarted" split into two shards and two replicas per shard. A web interface is available after startup at: http://localhost:8983/solr/.

Once the cluster is up and running we can post some data to the collection we have created via the REST interface:
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book1", "title_t" : "The Merchant of Venice", "author_s" : "William Shakespeare"}]'
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book2", "title_t" : "Macbeth", "author_s" : "William Shakespeare"}]'
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book3", "title_t" : "Death of a Salesman", "author_s" : "Arthur Miller"}]'
We can search via the REST interface, for example to return all entries by William Shakespeare:
  • curl http://localhost:8983/solr/gettingstarted/query?q=author_s:William+Shakespeare
2) Authenticating users to our SolrCloud instance

Now that our SolrCloud instance is up and running, let's look at how we can secure access to it by using HTTP Basic Authentication to authenticate our REST requests. Download the security configuration which enables Basic Authentication in Solr.
Two users are defined - "alice" and "bob" - both with password "SolrRocks". Now upload this configuration to the Apache Zookeeper instance that is running with Solr:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json
Now try to run the search query above again using curl. A 401 error will be returned. Once we specify the correct credentials, the request will work as expected, e.g.:
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
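For a Java client rather than curl, the same authenticated requests can be built with the JDK's java.net.http API (Java 11+). SolrRequests is a hypothetical helper. Two details differ from curl:

- URLEncoder turns the space in "Arthur Miller" into '+' and the colon into %3A.
- The update request sets an explicit JSON content type, whereas curl -d sends application/x-www-form-urlencoded by default.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

final class SolrRequests {
    private static final String BASE = "http://localhost:8983/solr/gettingstarted";

    /** HTTP Basic credentials, as curl -u builds them. */
    static String basicAuth(String user, String password) {
        byte[] raw = (user + ":" + password).getBytes(StandardCharsets.UTF_8);
        return "Basic " + Base64.getEncoder().encodeToString(raw);
    }

    /** GET /query with a URL-encoded q parameter. */
    static HttpRequest query(String q, String user, String password) {
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(BASE + "/query?q=" + encoded))
                .header("Authorization", basicAuth(user, password))
                .GET()
                .build();
    }

    /** POST a JSON array of documents to /update, like the curl -d examples. */
    static HttpRequest update(String jsonDocs, String user, String password) {
        return HttpRequest.newBuilder(URI.create(BASE + "/update"))
                .header("Authorization", basicAuth(user, password))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonDocs))
                .build();
    }
}
```

Sending these with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` should give the same responses as the corresponding curl commands.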
Categories: Colm O hEigeartaigh

SSO support for Apache Syncope REST services

Colm O hEigeartaigh - Thu, 06/22/2017 - 18:05
Apache Syncope has recently added SSO support for its REST services in the 2.0.3 release. Previously, access to the REST services of Syncope was via HTTP Basic Authentication. From the 2.0.3 release, SSO support is available using JSON Web Tokens (JWT). In this post, we will look at how this works and how it can be configured.

1) Obtaining an SSO token from Apache Syncope

As stated above, in the past it was necessary to supply HTTP Basic Authentication credentials when invoking on the REST API. Let's look at an example using curl. Assume we have a running Apache Syncope instance with a user "alice" with password "ecila". We can make a GET request to the user self service via:
  • curl -u alice:ecila http://localhost:8080/syncope/rest/users/self
It may be inconvenient to supply user credentials on each request or the authentication process might not scale very well if we are authenticating the password to a backend resource. From Apache Syncope 2.0.3, we can instead get an SSO token by sending a POST request to "accessTokens/login" as follows:
  • curl -I -u alice:ecila -X POST http://localhost:8080/syncope/rest/accessTokens/login
The response contains two headers:
  • X-Syncope-Token: A JWT token signed according to the JSON Web Signature (JWS) spec.
  • X-Syncope-Token-Expire: The expiry date of the token
The token in question is signed using the (symmetric) "HS512" algorithm. It contains the subject "alice" and the issuer of the token ("ApacheSyncope"), as well as a random token identifier, and timestamps that indicate when the token was issued, when it expires, and when it should not be accepted before.

The signing key and the issuer name can be changed by editing 'security.properties' and specifying new values for 'jwsKey' and 'jwtIssuer'. Please note that it is critical to change the signing key from the default value! It is also possible to change the signature algorithm from the next 2.0.4 release via a custom 'securityContext.xml' (see here). The default lifetime of the token (120 minutes) can be changed via the "jwt.lifetime.minutes" configuration property for the domain.

2) Using the SSO token to invoke on a REST service

Now that we have an SSO token, we can use it to invoke on a REST service instead of specifying our username and password as before, e.g.:
  • curl -H "X-Syncope-Token: eyJ0e..." http://localhost:8080/syncope/rest/users/self
The signature on the token is checked first, then the issuer is verified against the configured value, and then the expiry and not-before dates are checked. If the token identifier also matches that of a saved access token, authentication is successful.
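The validation order described above can be sketched with the JDK alone. SsoTokenValidator is a hypothetical class, not Syncope's implementation. It assumes:

- an HS512 token,
- a Hex-encoded 'jwsKey'-style secret,
- a plain set of token identifiers standing in for the saved access tokens.

It also only understands flat JSON claims.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.time.Instant;
import java.util.Base64;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

final class SsoTokenValidator {
    private final byte[] key;
    private final String expectedIssuer;
    private final Set<String> savedTokenIds; // stands in for the stored access tokens

    SsoTokenValidator(String hexKey, String expectedIssuer, Set<String> savedTokenIds) {
        this.key = hexToBytes(hexKey);
        this.expectedIssuer = expectedIssuer;
        this.savedTokenIds = savedTokenIds;
    }

    static byte[] hexToBytes(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    /** HS512 is HMAC-SHA512 over the ASCII "header.payload" signing input. */
    byte[] hs512(String signingInput) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA512");
        mac.init(new SecretKeySpec(key, "HmacSHA512"));
        return mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
    }

    /** Simplification: reads a flat string or integer claim; not a general JSON parser. */
    static String claim(String json, String name) {
        Matcher m = Pattern.compile("\"" + Pattern.quote(name) + "\"\\s*:\\s*(?:\"([^\"]*)\"|(-?\\d+))")
                .matcher(json);
        if (!m.find()) {
            return null;
        }
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    boolean isValid(String jwt, Instant now) throws GeneralSecurityException {
        String[] parts = jwt.split("\\.");
        if (parts.length != 3) {
            return false;
        }
        Base64.Decoder b64 = Base64.getUrlDecoder();
        String header = new String(b64.decode(parts[0]), StandardCharsets.UTF_8);
        String payload = new String(b64.decode(parts[1]), StandardCharsets.UTF_8);
        // 1) Signature: only HS512 is accepted, compared in constant time.
        byte[] expected = hs512(parts[0] + "." + parts[1]);
        if (!"HS512".equals(claim(header, "alg")) || !MessageDigest.isEqual(expected, b64.decode(parts[2]))) {
            return false;
        }
        // 2) Issuer must match the configured value.
        if (!expectedIssuer.equals(claim(payload, "iss"))) {
            return false;
        }
        // 3) Expiry and not-before (NumericDate values are seconds since the epoch).
        long nowSeconds = now.getEpochSecond();
        String exp = claim(payload, "exp");
        String nbf = claim(payload, "nbf");
        if (exp == null || nowSeconds >= Long.parseLong(exp)) {
            return false;
        }
        if (nbf != null && nowSeconds < Long.parseLong(nbf)) {
            return false;
        }
        // 4) The token identifier must match a saved access token.
        String jti = claim(payload, "jti");
        return jti != null && savedTokenIds.contains(jti);
    }
}
```

Rejecting tokens that do not declare HS512 keeps a caller from switching the token to a different algorithm. MessageDigest.isEqual avoids an early-exit comparison of the signature bytes.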

Finally, SSO tokens can be seen in the admin console under "Dashboard/Access Token", where they can be manually revoked by the admin user.


Categories: Colm O hEigeartaigh
