Latest Activity

The Apache Sentry security service - part II

Colm O hEigeartaigh - Wed, 02/14/2018 - 15:37
This is the second in a series of blog posts on the Apache Sentry security service. The first post looked at how to get started with the Apache Sentry security service, both from scratch and via a docker image. The next logical question is how we can define the authorization privileges held in the Sentry security service. In this post we will briefly cover what those privileges look like, and how we can query them using two different tools that ship with the Apache Sentry distribution.

1) Apache Sentry privileges

The Apache Sentry docker image we covered in the previous tutorial ships with a 'sentry.ini' configuration file (see here) that is used to retrieve the groups associated with a given user. A user must be a member of the "admin" group to invoke on the Apache Sentry security service, as configured in 'sentry-site.xml' (see here). Note that 'sentry.ini' also contains "[groups]" and "[roles]" sections; to avoid confusion, be aware that these are not used by the Sentry security service.

In Apache Sentry, a user is associated with one or more groups, which in turn are associated with one or more roles, which in turn are associated with one or more privileges. Privileges are made up of a number of different components that vary slightly depending on what service the privilege is associated with (e.g. Hive, Kafka, etc.). For example:
  • Host=*->Topic=test->action=ALL - This Kafka privilege grants all actions on the "test" topic on all hosts.
  • Collection=logs->action=* - This Solr privilege grants all actions on the "logs" collection.
  • Server=sqoopServer1->Connector=c1->action=* - This Sqoop privilege grants all actions on the "c1" connector on the "sqoopServer1" server.
  • Server=server1->Db=default->Table=words->Column=count->action=select - This Hive privilege grants the "select" action on the "count" column of the "words" table in the "default" database on the "server1" server.
For more information on the Apache Sentry privilege model, please consult the official wiki.

2) Querying the Apache Sentry security service using 'sentryShell'

Follow the steps outlined in the previous tutorial to get the Apache Sentry security service up and running using either the docker image or by setting it up manually. The Apache Sentry distribution ships with a "sentryShell" command line tool that we can use to query that Apache Sentry security service. So depending on which approach you followed to install Sentry, either go to the distribution or else log into the docker container.

We can query the roles, groups and privileges via:
  • bin/sentryShell -conf sentry-site.xml -lr
  • bin/sentryShell -conf sentry-site.xml -lg
  • bin/sentryShell -conf sentry-site.xml -lp -r admin_role
We can create an "admin_role" role and add it to the "admin" group via:
  • bin/sentryShell -conf sentry-site.xml -cr -r admin_role
  • bin/sentryShell -conf sentry-site.xml -arg -g admin -r admin_role
We can grant a (Hive) privilege to the "admin_role" role as follows:
  • bin/sentryShell -conf sentry-site.xml -gpr -r admin_role -p "Server=*->action=ALL"
If we are adding a privilege for anything other than Apache Hive, we need to explicitly specify the "type", e.g.:
  • bin/sentryShell -conf sentry-site.xml -gpr -r admin_role -p "Host=*->Cluster=kafka-cluster->action=ALL" -t kafka
  • bin/sentryShell -conf sentry-site.xml -lp -r admin_role -t kafka
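Putting these pieces together, a minimal sketch of bootstrapping a role for the Solr privilege shown in section 1 might look as follows (the role name is illustrative, and "solr" is assumed to be the accepted value for the type flag, by analogy with the kafka example above):
  • bin/sentryShell -conf sentry-site.xml -cr -r solr_role
  • bin/sentryShell -conf sentry-site.xml -arg -g admin -r solr_role
  • bin/sentryShell -conf sentry-site.xml -gpr -r solr_role -p "Collection=logs->action=*" -t solr
  • bin/sentryShell -conf sentry-site.xml -lp -r solr_role -t solr
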
3) Querying the Apache Sentry security service using 'sentryCli'

A rather more user-friendly alternative to the 'sentryShell' is available in Apache Sentry 2.0.0. The 'sentryCli' can be started with 'bin/sentryCli'. Typing '?l' lists the available commands, and the Apache Sentry security service can be queried using any of them.
Categories: Colm O hEigeartaigh

Adieu Talend, Hello Red Hat !

Sergey Beryozkin - Tue, 02/13/2018 - 22:28
It seems like it was only yesterday that I joined Talend, seven years ago. Time has flown so fast... Next week I will be returning to Red Hat, but first I will talk a bit about my years with Talend.

I'd like to believe that working for Talend has helped me become a better engineer and grow in confidence. And what about those unforgettable Talend R&D events :-) ? No doubt, it has been an interesting and exciting journey.

It has not been easy to find a link to a piece of music that I would associate with the company, but I think I've got it in the end. The lyrics are a bit sombre, but the music reflects well what I'd like to remember about Talend, the energy and the style: enjoy Ave Cesaria by Stromae. Thank you Talend, goodbye.

And now I'll be heading back to Red Hat :-). I will be joining a WildFly Swarm team, and I'm looking forward to it and optimistic about the new challenge. I'll have to learn new things, and I will enjoy that too. In time, once I've settled in, I will return to this blog and talk about WildFly Swarm and other related projects.

Stay Tuned !
Categories: Sergey Beryozkin

The Apache Sentry security service - part I

Colm O hEigeartaigh - Mon, 02/12/2018 - 17:30
Apache Sentry is a role-based authorization solution for a number of big-data projects. I have previously blogged about how to install the authorization plugin to secure various deployments, e.g. Apache Hive, Apache Kafka, Apache Solr and Apache Sqoop.
For all of these tutorials, the authorization privileges were stored in a configuration file local to the deployment. However this is just a "test configuration" to get simple examples up and running quickly. For production scenarios, Apache Sentry offers a central security service, which stores the user roles and privileges in a database, and provides an RPC service that the Sentry authorization plugins can invoke on. In this article, we will show how to set up the Apache Sentry security service in a couple of different ways.

1) Installing the Apache Sentry security service manually

Download the binary distribution of Apache Sentry (2.0.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match, and extract it to ${sentry.home}. In addition, download a compatible version of Apache Hadoop (2.7.5 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Hadoop distribution.

First we need to specify two configuration files, "sentry-site.xml" which contains the Sentry configuration, and "sentry.ini" which defines the user/group information for the user who will be invoking on the Sentry security service. You can download sample configuration files here. Copy these files to the root directory of "${sentry.home}". Edit the 'sentry.ini' file and replace 'user' with the user who will be invoking on the security service (such as "kafka" or "solr"). The other entries will be ignored - 'sentry-site.xml' defines that a user must belong to the "admin" group to invoke on the security service successfully.
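For reference, a minimal sketch of what 'sentry.ini' might look like after this edit is shown below (the group-to-role mapping and the privilege are purely illustrative; only the user-to-"admin"-group mapping matters to the security service itself):

[users]
kafka = admin

[groups]
admin = admin_role

[roles]
admin_role = server=server1->action=ALL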

Finally configure the database and start the Apache Sentry security service via:
  • bin/sentry --command schema-tool --conffile sentry-site.xml --dbType derby --initSchema
  • bin/sentry --command service -c sentry-site.xml
2) Installing the Apache Sentry security service via docker

Instead of having to download and configure Apache Sentry and Hadoop, a simpler way to get started is to download a pre-made docker image that I created. The DockerFile is available here and the docker image is available here. Note that this docker image is only for testing use, as the security service is not secured with kerberos and it uses the default credentials. Download and run the docker image with:
  • docker pull coheigea/sentry
  • docker run -p 8038:8038 coheigea/sentry
Once the container has started, we need to update the 'sentry.ini' file with the username that we are going to use to invoke on the Apache Sentry security service. Get the "id" of the running container via "docker ps" and then run "docker exec -it <id> bash". Edit 'sentry.ini' and change 'user' to the username you are using.

In the next tutorial we will look at how to manually invoke on the security service.

Categories: Colm O hEigeartaigh

Apache CXF Story Will Continue

Sergey Beryozkin - Thu, 02/08/2018 - 17:57
When I started working on Apache CXF full time it was already a well-established project, shipping a production-quality JAX-WS implementation and an early JAX-RS one.

During the next N years, with some short breaks, all of us put a lot of effort into supporting the CXF community, kept enhancing JAX-RS and the various security features, fixed lots and lots of bugs, and tried to support the idea that "CXF was more than just a library" :-).

I'm curious: how many SOAP or plain HTTP calls have been made over the years with the help of CXF? Sometimes one reads: "This product supports thousands of transactions per minute". It would be fun to read somewhere that "CXF has supported several million service calls over 10 years" :-). Or how many downloads have been made? Who knows...

It is satisfying to see that today users keep coming to CXF, asking questions and opening new issues. No doubt it has helped many users, and helped to make JAX-WS and then JAX-RS completely mainstream, alongside its Jersey and RestEasy 'colleague' frameworks.

No doubt the Apache CXF story will continue and I've been happy to be part of this story. Thank You !

Categories: Sergey Beryozkin

Securing Apache Sqoop - part III

Colm O hEigeartaigh - Thu, 02/08/2018 - 15:35
This is the third and final post about securing Apache Sqoop. The first post looked at how to set up Apache Sqoop to perform a simple use-case of transferring a file from HDFS to Apache Kafka. The second post showed how to secure Apache Sqoop with Apache Ranger. In this post we will look at an alternative way of implementing authorization in Apache Sqoop, namely using Apache Sentry.

1) Install the Apache Sentry Sqoop plugin

If you have not done so already, please follow the steps in the earlier tutorial to set up Apache Sqoop. Download the binary distribution of Apache Sentry (2.0.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match, and extract it to ${sentry.home}.

a) Configure sqoop.properties

We need to configure Apache Sqoop to use Apache Sentry for authorization. Edit 'conf/sqoop.properties' and add the following properties:
  • org.apache.sqoop.security.authentication.type=SIMPLE
  • org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
  • org.apache.sqoop.security.authorization.handler=org.apache.sentry.sqoop.authz.SentryAuthorizationHandler
  • org.apache.sqoop.security.authorization.access_controller=org.apache.sentry.sqoop.authz.SentryAccessController
  • org.apache.sqoop.security.authorization.validator=org.apache.sentry.sqoop.authz.SentryAuthorizationValidator
  • org.apache.sqoop.security.authorization.server_name=SqoopServer1
  • sentry.sqoop.site.url=file:./conf/sentry-site.xml
In addition, we need to add some of the Sentry jars to the Sqoop classpath. Add the following property to 'conf/sqoop.properties', substituting the value for "${sentry.home}":
  • org.apache.sqoop.classpath.extra=${sentry.home}/lib/sentry-binding-sqoop-2.0.0.jar:${sentry.home}/lib/sentry-core-common-2.0.0.jar:${sentry.home}/lib/sentry-core-model-sqoop-2.0.0.jar:${sentry.home}/lib/sentry-provider-file-2.0.0.jar:${sentry.home}/lib/sentry-provider-common-2.0.0.jar:${sentry.home}/lib/sentry-provider-db-2.0.0.jar:${sentry.home}/lib/shiro-core-1.4.0.jar:${sentry.home}/lib/sentry-policy-engine-2.0.0.jar:${sentry.home}/lib/sentry-policy-common-2.0.0.jar

b) Add Apache Sentry configuration files

Next we will configure the Apache Sentry authorization plugin. Create a new file in the Sqoop "conf" directory called "sentry-site.xml" with the following content (substituting the correct directory for "sentry.sqoop.provider.resource"):

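A minimal sketch of such a file is shown below; the "sentry.sqoop.provider" property name and the provider class are assumptions based on Sentry's file-based bindings (check the Sentry documentation for the exact names), while "sentry.sqoop.provider.resource" should point at the 'sentry.ini' policy file created below:

<configuration>
  <property>
    <name>sentry.sqoop.provider</name>
    <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
  </property>
  <property>
    <name>sentry.sqoop.provider.resource</name>
    <value>file:///path/to/sqoop/conf/sentry.ini</value>
  </property>
</configuration>
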
It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. Finally, we need to specify the authorization privileges. Create a new file in the config directory called "sentry.ini" with the following content, substituting "colm" for the name of the user running the Sqoop shell:

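A sketch of a suitable 'sentry.ini' is shown below; the group and role names are illustrative, and the privilege grants the "ALL" action on the "SqoopServer1" server configured in sqoop.properties above (this is the permission that is changed to "WRITE" in the test below):

[users]
colm = admin

[groups]
admin = admin_role

[roles]
admin_role = server=SqoopServer1->action=ALL
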
2) Test authorization 

Now start Apache Sqoop ("bin/sqoop2-server start") and start the shell ("bin/sqoop2-shell"). "show connector" should list the full range of Sqoop Connectors, as authorization has succeeded. To test that authorization is correctly disabling access for unauthorized users, change the "ALL" permission in 'conf/sentry.ini' to "WRITE", and restart the server and shell. This time access is not granted and a blank list should be returned for "show connector".
Categories: Colm O hEigeartaigh

Securing Apache Sqoop - part II

Colm O hEigeartaigh - Mon, 01/29/2018 - 18:55
This is the second in a series of posts on how to secure Apache Sqoop. The first post looked at how to set up Apache Sqoop to perform a simple use-case of transferring a file from HDFS to Apache Kafka. In this post we will look at securing Apache Sqoop with Apache Ranger, such that only authorized users can interact with it. We will then show how to use the Apache Ranger Admin UI to create authorization policies for Apache Sqoop.

1) Install the Apache Ranger Sqoop plugin

If you have not done so already, please follow the steps in the earlier tutorial to set up Apache Sqoop. First we will install the Apache Ranger Sqoop plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-sqoop-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-sqoop-plugin ${ranger.sqoop.home}
Now go to ${ranger.sqoop.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "SqoopTest".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Sqoop installation
Save "install.properties" and install the plugin as root via "sudo -E ./enable-sqoop-plugin.sh". Make sure that the user you are running Sqoop as has permission to access '/etc/ranger/SqoopTest', which is where the Ranger plugin for Sqoop will download authorization policies created in the Ranger Admin UI.

In the Apache Sqoop directory, copy 'conf/ranger-sqoop-security.xml' to the root directory (or else add the 'conf' directory to the Sqoop classpath). Now restart Apache Sqoop and try to see the Connectors that were installed:
  • bin/sqoop2-server start
  • bin/sqoop2-shell
  • show connector
You should see an empty list here as you are not authorized to see the connectors. Note that "show job" should still work OK, as you have permission to view jobs that you created.

2) Create authorization policies in the Apache Ranger Admin console

Next we will use the Apache Ranger admin console to create authorization policies for Sqoop. Follow the steps in this tutorial (except use at least Ranger 1.0.0) to install the Apache Ranger admin service. Start the Apache Ranger admin service with "sudo ranger-admin start" and open a browser and go to "http://localhost:6080/" and log on with "admin/admin". Add a new Sqoop service with the following configuration values:
  • Service Name: SqoopTest
  • Username: admin
  • Sqoop URL: http://localhost:12000
Note that "Test Connection" is not going to work here, as the "admin" user is not authorized at this stage to read from the Sqoop 2 server. However, once the service is created and the policies synced to the Ranger plugin in Sqoop (roughly every 30 seconds by default), it should work correctly.

Once the "SqoopTest" service is created, we will create some authorization policies for the user who is using the Sqoop Shell.
Click on "Settings" and "Users/Groups" and add a new user corresponding to the user for whom you wish to create authorization policies. When this is done then click on the "SqoopTest" service and edit the existing policies, adding this user (for example):


Wait 30 seconds for the policies to sync to the Ranger plugin that is co-located with the Sqoop service. Now re-start the Shell and "show connector" should list the full range of Sqoop Connectors, as authorization has succeeded. Similar policies could be created to allow only certain users to run jobs created by other users.
Categories: Colm O hEigeartaigh

Securing Apache Sqoop - part I

Colm O hEigeartaigh - Fri, 01/26/2018 - 18:16
This is the first in a series of posts on how to secure Apache Sqoop. Apache Sqoop is a tool to transfer bulk data mainly between HDFS and relational databases, but also supporting other projects such as Apache Kafka. In this post we will look at how to set up Apache Sqoop to perform a simple use-case of transferring a file from HDFS to Apache Kafka. Subsequent posts will show how to authorize this data transfer using both Apache Ranger and Apache Sentry.

Note that we will only use Sqoop 2 (current version 1.99.7), as this is the only version that both Sentry and Ranger support. However, this version is not (yet) recommended for production deployment.

1) Set up Apache Hadoop and Apache Kafka

First we will set up Apache Hadoop and Apache Kafka. The use-case is that we want to transfer a file from HDFS (/data/LICENSE.txt) to a Kafka topic (test). Follow part (1) of an earlier tutorial I wrote about installing Apache Hadoop. The following change is also required for 'etc/hadoop/core-site.xml' (in addition to the "fs.defaultFS" setting that is configured in the earlier tutorial):

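A sketch of the proxy-user entries that the Sqoop 2 server typically needs is shown below; this is an assumption based on the standard Sqoop 2 setup instructions, and "colm" should be replaced with the operating system user that runs the Sqoop 2 server:

<property>
  <name>hadoop.proxyuser.colm.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.colm.groups</name>
  <value>*</value>
</property>
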
Make sure that LICENSE.txt is uploaded to the /data directory as outlined in the tutorial. Now we will set up Apache Kafka. Download Apache Kafka and extract it (1.0.0 was used for the purposes of this tutorial). Start Zookeeper with:
  • bin/zookeeper-server-start.sh config/zookeeper.properties
and start the broker and then create a "test" topic with:
  • bin/kafka-server-start.sh config/server.properties
  • bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
Finally let's set up a consumer for the "test" topic:
  • bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --consumer.config config/consumer.properties
2) Set up Apache Sqoop

Download Apache Sqoop and extract it (1.99.7 was used for the purposes of this tutorial).

2.a) Configure + start Sqoop

Before starting Sqoop, edit 'conf/sqoop.properties' and change the following property to point instead to the Hadoop configuration directory (e.g. /path.to.hadoop/etc/hadoop):
  • org.apache.sqoop.submission.engine.mapreduce.configuration.directory
Then configure and start Apache Sqoop with the following commands:
  • export HADOOP_HOME=path to Hadoop home
  • bin/sqoop2-tool upgrade
  • bin/sqoop2-tool verify
  • bin/sqoop2-server start (stop)
2.b) Configure links/job in Sqoop

Now that Sqoop has started we need to configure it to transfer data from HDFS to Kafka. Start the Shell via:
  • bin/sqoop2-shell
"show connector" lists the connectors that are available. We first need to configure a link for the HDFS connector:
  • create link -connector hdfs-connector
  • Name: HDFS
  • URI: hdfs://localhost:9000
  • Conf directory: Path to Hadoop conf directory
Similarly, for the Kafka connector:
  • create link -connector kafka-connector
  • Name: KAFKA
  • Kafka brokers: localhost:9092
  • Zookeeper quorum: localhost:2181
"show link" shows the links we've just created. Now we need to create a job from the HDFS link to the Kafka link as follows (accepting the default values if they are not specified below):
  • create job -f HDFS -t KAFKA
  • Name: testjob
  • Input Directory: /data
  • Topic: test
We can see the job we've created with "show job". Now let's start the job:
  • start job -name testjob 
You should see the content of the HDFS "/data" directory (i.e. the LICENSE.txt) appear in the window of the Kafka "test" consumer, thus showing that Sqoop has transferred data from HDFS to Kafka.
Categories: Colm O hEigeartaigh

Securing Apache Solr with Apache Sentry

Colm O hEigeartaigh - Tue, 01/23/2018 - 17:15
Last year I wrote a series of posts on securing Apache Solr, firstly using basic authentication and then using Apache Ranger for authorization. In this post we will look at an alternative authorization solution called Apache Sentry. Previously I have blogged about using Apache Sentry to secure Apache Hive and Apache Kafka.

1)  Install and deploy a SolrCloud example

Download and extract Apache Solr (7.1.0 was used for the purpose of this tutorial). Now start SolrCloud via:
  • bin/solr -e cloud
Accept all of the default options. This creates a cluster of two nodes, with a collection "gettingstarted" split into two shards and two replicas per-shard. A web interface is available after startup at: http://localhost:8983/solr/. Once the cluster is up and running we can post some data to the collection we have created via:
  • bin/post -c gettingstarted example/exampledocs/books.csv
We can then perform a search for all books with author "George R.R. Martin" via:
  • curl http://localhost:8983/solr/gettingstarted/query?q=author:George+R.R.+Martin
2) Authenticating users to our SolrCloud instance

Now that our SolrCloud instance is up and running, let's look at how we can secure access to it, by using HTTP Basic Authentication to authenticate our REST requests. Download the following security configuration which enables Basic Authentication in Solr:
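A sketch of such a file is shown below; the credential strings are placeholders for the salted SHA-256 hash and salt (both Base64-encoded) of "SolrRocks", so generate real values rather than copying these literally:

{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": {
      "alice": "<base64 sha256(password+salt)> <base64 salt>",
      "bob": "<base64 sha256(password+salt)> <base64 salt>"
    }
  }
}
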
Two users are defined - "alice" and "bob" - both with password "SolrRocks". Now upload this configuration to the Apache Zookeeper instance that is running with Solr:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json
Now try to run the search query above again using Curl. A 401 error will be returned. Once we specify the correct credentials then the request will work as expected, e.g.:
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author:George+R.R.+Martin 
3) Using Apache Sentry for authorization

a) Install the Apache Sentry distribution

Download the binary distribution of Apache Sentry (2.0.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match. Now extract it to ${sentry.home}. Apache Sentry provides an RPC service which stores authorization privileges in a database. For the purposes of this tutorial we will just configure the authorization privileges in a configuration file local to the Solr distribution. Therefore we don't need to do any further configuration to the Apache Sentry distribution at this point.

b) Copy Apache Sentry jars into Apache Solr 

To get Sentry authorization working in Apache Solr, we need to copy some jars from the Sentry distribution into Solr. Copy the following jars from ${sentry.home}/lib into ${solr.home}/server/solr-webapp/webapp/WEB-INF/lib:
  • sentry-binding-solr-2.0.0.jar
  • sentry-core-model-solr-2.0.0.jar
  • sentry-core-model-db-2.0.0.jar
  • sentry-core-common-2.0.0.jar
  • shiro-core-1.4.0.jar
  • sentry-policy*.jar
  • sentry-provider-*
c) Add Apache Sentry configuration files

Next we will configure Apache Solr to use Apache Sentry for authorization. Create a new file in the Solr distribution called "sentry-site.xml" with the following content (substituting the correct directory for "sentry.solr.provider.resource"):
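A minimal sketch of such a file is shown below; the "sentry.solr.provider" property name and the provider class are assumptions based on Sentry's file-based bindings, while "sentry.solr.provider.resource" should point at the 'sentry.ini' file created below:

<configuration>
  <property>
    <name>sentry.solr.provider</name>
    <value>org.apache.sentry.provider.file.LocalGroupResourceAuthorizationProvider</value>
  </property>
  <property>
    <name>sentry.solr.provider.resource</name>
    <value>file:///path/to/solr/sentry.ini</value>
  </property>
</configuration>
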
This is the configuration file for the Sentry plugin for Solr. It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. Finally, we need to specify the authorization privileges. Create a new file in the config directory called "sentry.ini" with the following content:
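A sketch of such a 'sentry.ini' is shown below; the group and role names are illustrative, the privilege grants all actions on the "gettingstarted" collection, and "bob" is deliberately left unmapped, which is why access fails for that user in the testing section below:

[users]
alice = admin

[groups]
admin = admin_role

[roles]
admin_role = collection=gettingstarted->action=*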

This configuration file contains three separate sections. The "[users]" section maps the authenticated principals to local groups. The "[groups]" section maps the groups to roles, and the "[roles]" section lists the actual privileges.

d) Update security.json to add authorization

Next we need to update the security.json to reference Apache Sentry for authorization. Use the following content, substituting the correct path for the "authorization.sentry.site" parameter. Also change the "superuser" to the user running Sentry:

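The general shape of the updated file is sketched below; the authorization class name is a placeholder for the Sentry plugin class shipped in the sentry-binding-solr jar, the authentication section is unchanged from the earlier step, and the parameter values are examples only:

{
  "authentication": {
    "blockUnknown": true,
    "class": "solr.BasicAuthPlugin",
    "credentials": { ... as before ... }
  },
  "authorization": {
    "class": "<Sentry authorization plugin class from the sentry-binding-solr jar>",
    "authorization.sentry.site": "file:///path/to/sentry-site.xml",
    "superuser": "colm"
  }
}
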
Upload this file via:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json

4) Testing authorization

We need to restart Apache Solr to enable authorization with Apache Sentry. Stop Solr via:
  • bin/solr stop -all
Next edit 'bin/solr.in.sh' and add the following properties:
  • SOLR_AUTH_TYPE="basic"
  • SOLR_AUTHENTICATION_OPTS="-Dbasicauth=colm:SolrRocks"
Now restart Apache Solr and test authorization. When "bob" is used, an error should be returned (either 403 or in our case 500, as we have not configured a group for "bob"). "alice" should be able to query the collection, due to the authorization policy we have created for her.

Categories: Colm O hEigeartaigh

[OT] Work Next Year the way Arsenal and Liverpool played their football

Sergey Beryozkin - Wed, 12/27/2017 - 18:26
I've been lacking ideas on how to write the regular, and last, off-topic post of the year, thinking about which piece of music I should link to.

And then the inspiration came from a completely unexpected source.

Those of you who follow Arsenal FC in the English Premier League know that Arsenal can either draw against Liverpool in such a way that fans will remember it for years (yes, that Liverpool 4 to Arsenal 4 draw), or, more likely, lose badly to the Merseyside team, something like 1:4 or similar.

So less than a week ago, Arsenal were playing Liverpool in London, losing 0:2. Oh well, most Arsenal fans thought, one of those days which can only be described in Fever Pitch. Then, a few minutes into the 2nd half, while the fans were having mince pies and tea, Arsenal were 3:2 up, with Liverpool managing to equalize. The game saw many mistakes and brilliant moves, and fans just had the day of the year watching it. I liked this summary.

How would I translate that to a New Year wish for you, the software engineers ? Here it is:

Enjoy your work next year, try to do something extraordinary, something new, and don't be afraid to make mistakes :-)

Happy New Year !


Categories: Sergey Beryozkin

Get OpenAPI v3 JSON with CXF Now !

Sergey Beryozkin - Wed, 12/27/2017 - 17:56
Apache CXF has done some initial work to have OpenAPI v3 JSON reported from JAX-RS endpoints.

Andriy Redko has started with the OpenApiFeature implementation, which depends on the latest OpenAPI v3 aware swagger-core and swagger-jaxrs libraries, and demoed it here.

In the meantime I did a Swagger2 to OpenAPI v3 JSON conversion filter, which reacts to openapi.json queries by converting the Swagger2 swagger.json produced by Swagger2Feature into openapi.json. The idea is to make it easier for existing upstream code (which has already integrated Swagger2Feature) to start experimenting with OpenAPI v3 before switching to the new feature (and dependencies).

This effort is still a work in progress, but the results in both cases are promising. The new feature and the conversion filter will require some more improvements, but you can start experimenting with them right now. And if you are someone like me, then you will be positively surprised that SwaggerUI 3.6.1 and newer can handle both Swagger2 and OpenAPI v3 JSON :-).

Enjoy !
Categories: Sergey Beryozkin

Simple Reuse of org.reactivestreams in CXF

Sergey Beryozkin - Wed, 12/27/2017 - 17:36
I mentioned earlier that one could link an RxJava2 Flowable with a JAX-RS AsyncResponse via a Subscriber which makes a best effort at streaming the data pieces, converted to JSON array elements; see this example.

That works, but it requires the application code to refer to both the JAX-RS AsyncResponse and the CXF-specific JsonStreamingAsyncSubscriber (RxJava2-specific at that earlier stage), as opposed to simply returning a Flowable from the resource method.

In the meantime, John Ament added the initial Reactor integration code, and as part of this work John also provided an org.reactivestreams-compatible JsonStreamingAsyncSubscriber that can optionally be used with the CXF Reactor invoker.

As a result we've found an opportunity to do some refactoring and introduce a simple org.reactivestreams utility module which is now reused between the CXF RxJava2 invoker and the Reactor invoker: the common code that both invokers delegate to checks whether JSON is expected and, if so, registers JsonStreamingAsyncSubscriber as an org.reactivestreams.Subscriber with the org.reactivestreams.Publisher, which can be either an RxJava2 Flowable or a Reactor Flux or, in fact, a Java 9 Flow.

The end result is that users can now write simpler code by returning Flowable or Flux from the service methods. 

It is an interesting but simple example of reusing the org.reactivestreams aware code between different org.reactivestreams implementations.
Categories: Sergey Beryozkin

A fast way to get membership counts in Apache Syncope

Colm O hEigeartaigh - Wed, 12/13/2017 - 17:47
Apache Syncope is a powerful open source Identity Management project, covered extensively on this blog. Amongst many other features, it allows the management of three core types - Users, Groups and "Any Objects", the latter which can be used to model arbitrary types. These core types can be accessed via a flexible REST API powered by Apache CXF. In this post we will explore the concept of "membership" in Apache Syncope, as well as a new feature that was added for Syncope 2.0.7 which allows an easy way to see membership counts.

1) Membership in Apache Syncope

Users and "Any Objects" can be members of Groups in two ways - statically and dynamically. "Static" membership is when the User or "Any Object" is explicitly assigned membership of a given Group. "Dynamic" membership is when the Group is defined with a set of rules, which if they evaluate to true for a given User or "Any Object", then that User or "Any Object" is a member of the group. For example, a User could be a dynamic member of a group based on the value for a given User attribute. So we could have an Apache group with a dynamic User membership rule of "*@apache.org" matching an "email" attribute.

2) Exploring group membership via the REST API

Let's examine group membership with some practical examples. Start Apache Syncope and log in to the admin console. Click on "Groups" and add a new group called "employee", accepting the default options. Now click on the "User" tab and add new Users called "alice" and "bob", with static membership of the "employee" group.

Using a tool like "curl", we can access the REST API using the admin credentials to obtain information on "alice":
  • curl -u admin:password http://localhost:9080/syncope/rest/users/alice
Note that "alice" has a "memberships" attribute pointing to the "employee" group. Next we can see information on the "employee" group via:
  • curl -u admin:password http://localhost:9080/syncope/rest/groups/employee
3) Obtaining membership counts

Now consider obtaining the membership count of a given group. Let's say we are interested in finding out how many employees we have - how can this be done? Prior to Apache Syncope 2.0.7, we had to leverage the power of FIQL, which underpins the search capabilities of the REST API of Apache Syncope:
  • curl -u admin:password http://localhost:9080/syncope/rest/users?fiql=%24groups==employee
In other words, search for all Users who are members of the "employee" group. This returns a long list of all Users, even though all we care about is the count (which is encoded in the "totalCount" attribute). There is a new way to do this in Apache Syncope 2.0.7. Instead of having to search for Users, membership counts are now encoded in groups, so we can see the total membership counts for a given group just by doing a GET call:
  • curl -u admin:password http://localhost:9080/syncope/rest/groups/employee
Following the example above, you should see a "staticUserMembershipCount" attribute with a value of "2". Four new attributes are defined for GroupTO (a quick way to extract one of these counts is sketched after the list below):
  • staticUserMembershipCount: The static user membership count of a given group
  • dynamicUserMembershipCount: The dynamic user membership count of a given group
  • staticAnyObjectMembershipCount: The static "Any Object" membership count of a given group
  • dynamicAnyObjectMembershipCount: The dynamic "Any Object" membership count of a given group.
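For example, assuming the REST call is made with a JSON "Accept" header and the 'jq' tool is available, the static user membership count can be extracted directly:
  • curl -H "Accept: application/json" -u admin:password http://localhost:9080/syncope/rest/groups/employee | jq '.staticUserMembershipCount'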
Some consideration was given to returning the Any Object counts associated with a given Any Object type, but this was abandoned due to performance reasons.
Categories: Colm O hEigeartaigh

SAML SSO support for the Apache Syncope web console

Colm O hEigeartaigh - Fri, 12/08/2017 - 18:09
Apache Syncope is a powerful open source Identity Management project that has recently celebrated 5 years as an Apache top level project. Until recently, a username and password had to be supplied to log onto either the admin or enduser web console of Apache Syncope. However, SAML SSO login is supported since the 2.0.3 release: instead of supplying a username/password, the user is redirected to a third party IdP for login, before being redirected back to the Apache Syncope web console. In 2.0.5, support for the IdP-initiated flow of SAML SSO was added.

In this post we will show how to configure Apache Syncope to use SAML SSO as an alternative to logging in using a username and password. We will use Apache CXF Fediz as the SAML SSO IdP. In addition, we will show how to achieve IdP-initiated SSO using Okta. Please also refer to this tutorial on achieving SAML SSO with Syncope and Shibboleth.

1) Logging in to Apache Syncope using SAML SSO

In this section, we will cover setting up Apache Syncope to re-direct to a third party IdP so that the user can enter their credentials. The next section will cover the IdP-initiated case.

1.a) Enable SAML SSO support in Apache Syncope

First we will configure Apache Syncope to enable SAML SSO support. Download and extract the most recent standalone distribution release of Apache Syncope (2.0.6 was used in this post). Start the embedded Apache Tomcat instance and then open a web browser and navigate to "http://localhost:9080/syncope-console", logging in as "admin" and "password".

Apache Syncope is configured with some sample data to show how it can be used. Click on "Users" and add a new user called "alice" by clicking on the subsequent "+" button. Specify a password for "alice" and then select the default values wherever possible (you will need to specify some required attributes, such as "surname"). Now in the left-hand column, click on "Extensions" and then "SAML 2.0 SP". Click on the "Service Provider" tab and then "Metadata". Save the resulting Metadata document, as it will be required to set up the SAML SSO IdP.

1.b) Set up the Apache CXF Fediz SAML SSO IdP

Next we will turn our attention to setting up the Apache CXF Fediz SAML SSO IdP. Download the most recent source release of Apache CXF Fediz (1.4.3 was used for this tutorial). Unzip the release and build it using maven ("mvn clean install -DskipTests"). In the meantime, download and extract the latest Apache Tomcat 8.5.x distribution (tested with 8.5.24). Once Fediz has finished building, copy all of the "IdP" wars (e.g. in fediz-1.4.3/apache-fediz/target/apache-fediz-1.4.3/apache-fediz-1.4.3/idp/war/fediz-*) to the Tomcat "webapps" directory.

There are a few configuration changes to be made to Apache Tomcat before starting it. Download the HSQLDB jar and copy it to the Tomcat "lib" directory. Next edit 'conf/server.xml' and configure TLS on port 8443:

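A sketch of the TLS connector is shown below; the keystore and truststore file names and passwords are assumed to be those of the Fediz sample keys, so adjust them to match the keys you actually copy in the next step:

<Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
           keystoreFile="idp-ssl-key.jks" keystorePass="tompass"
           truststoreFile="idp-ssl-trust.jks" truststorePass="ispass"
           clientAuth="want" sslProtocol="TLS"/>
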
The two keys referenced here can be obtained from 'apache-fediz/target/apache-fediz-1.4.3/apache-fediz-1.4.3/examples/samplekeys/' and should be copied to the root directory of Apache Tomcat. Tomcat can now be started.

Next we have to configure Apache CXF Fediz to support Apache Syncope as a "service" via SAML SSO. Edit 'webapps/fediz-idp/WEB-INF/classes/entities-realma.xml' and add the following configuration:

In addition, we need to make some changes to the "idp-realmA" bean in this file:
  • Add a reference to this bean in the "applications" list: <ref bean="srv-syncope" />
  • Change the "idpUrl" property to: https://localhost:8443/fediz-idp/saml
  • Change the port for "stsUrl" from "9443" to "8443".
Now we need to configure Fediz to accept Syncope's signing cert. Edit the Metadata file you saved from Syncope in step 1.a. Copy the Base-64 encoded certificate in the "KeyDescriptor" section, and paste it (including line breaks) into 'webapps/fediz-idp/WEB-INF/classes/syncope.cert', enclosing it in between "-----BEGIN CERTIFICATE-----" and "-----END CERTIFICATE-----".

Now restart Apache Tomcat. Open a browser and save the Fediz metadata which is available at "http://localhost:8080/fediz-idp/metadata?protocol=saml", which we will require when configuring Apache Syncope.

1.c) Configure the Apache CXF Fediz IdP in Syncope

The final configuration step takes place in Apache Syncope again. In the "SAML 2.0 SP" configuration screen, click on the "Identity Providers" tab, click the "+" button and select the Fediz metadata that you saved in the previous step. Now log out, and an additional login option can be seen on the login page.


Select the URL for the SAML SSO IdP and you will be redirected to Fediz. Select the IdP in realm "A" as the home realm and enter credentials of "alice/ecila" when prompted. You will be successfully authenticated to Fediz and redirected back to the Syncope admin console, where you will be logged in as the user "alice". 

2) Using IdP-initiated SAML SSO

Instead of the user starting with the Syncope web console, being redirected to the IdP for authentication, and then redirected back to Syncope - it is possible instead to start from the IdP. In this section we will show how to configure Apache Syncope to support IdP-initiated SAML SSO using Okta.

2.a) Configuring a SAML application in Okta

The first step is to create an account at Okta and configure a SAML application. This process is mapped out at the following link. Follow the steps listed on this page with the following additional changes:
  • Specify the following for the Single Sign On URL: http://localhost:9080/syncope-console/saml2sp/assertion-consumer
  • Specify the following for the audience URL: http://localhost:9080/syncope-console/
  • Specify the following for the default RelayState: idpInitiated
When the application is configured, you will see an option to "View Setup Instructions". Open this link in a new tab and find the section about the IdP Metadata. Save this to a local file and set it aside for the moment. Next you need to assign the application to the username that you have created at Okta.

2.b) Configure Apache Syncope to support IdP-Initiated SAML SSO

Log on to the Apache Syncope admin console using the admin credentials, and add a new IdP Provider in the SAML 2.0 SP extension as before, using the Okta metadata file that you have saved in the previous section. Edit the metadata and select the 'Support Unsolicited Logins' checkbox. Save the metadata and make sure that the Okta user is also a valid user in Apache Syncope.

Now go back to the Okta console and click on the application you have configured for Apache Syncope. You should be seamlessly logged into the Apache Syncope admin console.

Categories: Colm O hEigeartaigh

Relocating (again :-) )

Olivier Lamy - Sat, 12/02/2017 - 00:50
It's been a long time since I posted anything here... :-)
So after 4.5 years of a very happy life @ Melbourne, we have decided to move further north.
In mid January we will start new adventures in Brisbane. A very exciting time for us (well, the kids are a bit sad to leave their friends, but happy to try something different).
Some have been asking us why, if we're happy here :-)
Well, it's a bit difficult to answer...
Maybe we're tired of the famous Melbourne weather...
Yes, the Brisbane weather looks so great: shorts and thongs all year round :-)
Housing is ridiculously expensive @ Melbourne, so it's time to stop paying someone else's mortgage.
After a long time, it's maybe a good idea to leave the comfort zone and try something different.

Cya soon up north :-)
Categories: Olivier Lamy

Kerberos cross-realm support in Apache Kerby 1.1.0

Colm O hEigeartaigh - Fri, 12/01/2017 - 13:52
A recent blog post covered how to install the Apache Kerby KDC. In this post we will build on that tutorial to show how to get a major new feature of Apache Kerby 1.1.0 to work - namely kerberos cross-realm support. Cross-realm support means that the KDCs in realm "A" and realm "B" are configured in such a way that a user who is authenticated in realm "A" can obtain a service ticket for a service in realm "B" without having to explicitly authenticate to the KDC in realm "B".

1) Configure the KDC for the "EXAMPLE.COM" realm

First we will configure the Apache Kerby KDC for the "EXAMPLE.COM" realm. Follow the previous tutorial to install and configure the KDC for this (default) realm. We need to follow some additional steps to get cross-realm support working with a second KDC in realm "EXAMPLE2.COM". Edit 'conf/krb5.conf' and replace the "realms" section with the following configuration:
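A sketch of the relevant sections is shown below, assuming the first KDC is still running on port 12345 (as configured in the previous tutorial) and that the second KDC will run on port 54321 (as configured below); the domain_realm mappings are illustrative:

[realms]
  EXAMPLE.COM = {
    kdc = localhost:12345
  }
  EXAMPLE2.COM = {
    kdc = localhost:54321
  }

[domain_realm]
  .example.com = EXAMPLE.COM
  .example2.com = EXAMPLE2.COM

[capaths]
  EXAMPLE.COM = {
    EXAMPLE2.COM = .
  }
  EXAMPLE2.COM = {
    EXAMPLE.COM = .
  }
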
Next, after restarting the KDC, we need to add a special principal to the KDC to enable cross-realm support via:
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw security krbtgt/EXAMPLE2.COM@EXAMPLE.COM
2) Configure the KDC for the "EXAMPLE2.COM" realm

Now we will configure a second KDC for the "EXAMPLE2.COM" realm. Download the Apache Kerby source code as before. Unzip the source and build the distribution via:
  • mvn clean install -DskipTests
  • cd kerby-dist
  • mvn package -Pdist
Copy "kdc-dist" to a location where you wish to install the second KDC. In this directory, create a directory called "keytabs" and "runtime". Edit 'conf/backend.conf' and change the value for 'backend.json.dir' to avoid conflict with the first KDC instance. Then create some keytabs via:
  • sh bin/kdcinit.sh conf keytabs
For testing purposes, we will change the port of the KDC from the default "88" to "54321" to avoid having to run the KDC with administrator privileges. Edit "conf/krb5.conf" and "conf/kdc.conf" and change "88" to "54321". In addition, change the realm from "EXAMPLE.COM" to "EXAMPLE2.COM" in both of these files. As above, edit 'conf/krb5.conf' and replace the "realms" section with the same cross-realm configuration that was sketched for the first KDC.
Next start the KDC via:
  • sh bin/start-kdc.sh conf runtime
We need to add a special principal to the KDC to enable cross-realm support, as in the KDC for the "EXAMPLE.COM" realm. Note that it must be the same principal name and password as for the first realm. We will also add a principal for a service in this realm:
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw security krbtgt/EXAMPLE2.COM@EXAMPLE.COM
  • addprinc -pw password service@EXAMPLE2.COM
3) Obtaining a service ticket for service@EXAMPLE2.COM as alice@EXAMPLE.COM

Now we can obtain a service ticket for the service we have configured in the "EXAMPLE2.COM" realm as a user who is authenticated to the "EXAMPLE.COM" realm. Configure the "tool-dist" distribution as per the previous tutorial, updating 'conf/krb5.conf' with the same "realms", "domain_realm" and "capaths" information as shown above. Now we can authenticate as "alice" and obtain a service ticket as follows:
  • sh bin/kinit.sh -conf conf alice@EXAMPLE.COM
  • sh bin/kinit.sh -conf conf -c /tmp/krb5cc_1000 -S service@EXAMPLE2.COM
If you run "klist" then you should see that a ticket for "service@EXAMPLE2.COM" was obtained successfully.
    Categories: Colm O hEigeartaigh

    Authorizing access to Apache Yarn using Apache Ranger

    Colm O hEigeartaigh - Wed, 11/29/2017 - 17:46
    Earlier this year, I wrote a series of blog posts on how to secure access to the Apache Hadoop filesystem (HDFS), using tools like Apache Ranger and Apache Atlas. In this post, we will go further and show how to authorize access to Apache Yarn using Apache Ranger. Apache Ranger allows us to create and enforce authorization policies based on who is allowed to submit applications to run on Apache Yarn. Therefore it can be used to enforce authorization decisions for Hive on Yarn or Spark on Yarn jobs.

    1) Installing Apache Hadoop

    First, follow the steps outlined in the earlier tutorial (section 1) on setting up Apache Hadoop, except that in this tutorial we will work with Apache Hadoop 2.8.2. In addition, we will need to follow some additional steps to configure Yarn (see here for the official documentation). Create a new file called 'etc/hadoop/mapred-site.xml' with the content:
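    A sketch of the standard content, per the official single-node Yarn instructions referenced above, is:

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
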
    Next edit 'etc/hadoop/yarn-site.xml' and add:
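    Again per the official instructions, a sketch of the property to add is:

    <configuration>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    </configuration>
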
    Now we can start Apache Yarn via 'sbin/start-yarn.sh'. We are going to submit jobs as a local user called "alice" to test authorization. First we need to create some directories in HDFS:
    • bin/hdfs dfs -mkdir -p /user/alice/input
    • bin/hdfs dfs -put etc/hadoop/*.xml /user/alice/input
    • bin/hadoop fs -chown -R alice /user/alice
    • bin/hadoop fs -mkdir /tmp
    • bin/hadoop fs -chmod og+w /tmp
    Now we can submit an example job as "alice" via:
    • sudo -u alice bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar grep input output 'dfs[a-z.]+'
    The job should run successfully and store the output in '/user/alice/output'. Delete this directory before trying to run the job again ('bin/hadoop fs -rm -r /user/alice/output').

    2) Install the Apache Ranger Yarn plugin

    Next we will install the Apache Ranger Yarn plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
    • mvn clean package assembly:assembly -DskipTests
    • tar zxvf target/ranger-1.0.0-SNAPSHOT-yarn-plugin.tar.gz
    • mv ranger-1.0.0-SNAPSHOT-yarn-plugin ${ranger.yarn.home}
    Now go to ${ranger.yarn.home} and edit "install.properties". You need to specify the following properties:
    • POLICY_MGR_URL: Set this to "http://localhost:6080"
    • REPOSITORY_NAME: Set this to "YarnTest".
    • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hadoop installation
    Save "install.properties" and install the plugin as root via "sudo -E ./enable-yarn-plugin.sh". Make sure that the user who is running Yarn has the permission to read the policies stored in '/etc/ranger/YarnTest'. There is one additional step to be performed in Hadoop before restarting Yarn. Edit 'etc/hadoop/ranger-yarn-security.xml' and add a property called "ranger.add-yarn-authorization" with value "false". This means that if Ranger policy authorization fails, it doesn't fall back to the default Yarn ACLs (which allow all users to submit jobs to the default queue).

    Finally, re-start Yarn and try to resubmit the job as "alice" as per the previous section. You should now see an authorization error: "User alice cannot submit applications to queue root.default".

    3) Create authorization policies in the Apache Ranger Admin console

    Next we will use the Apache Ranger admin console to create authorization policies for Yarn. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Apache Ranger admin service with "sudo ranger-admin start" and open a browser and go to "http://localhost:6080/" and log on with "admin/admin". Add a new Yarn service with the following configuration values:
    • Service Name: YarnTest
    • Username: admin
    • Password: admin
    • Yarn REST URL: http://localhost:8088
    Click on "Test Connection" to verify that we can connect successfully to Yarn + then save the new service. Now click on the "YarnTest" service that we have created. Add a new policy for the "root.default" queue for the user "alice" (create this user if you have not done so already under "Settings, Users/Groups"), with a permission of "submit-app".

    Allow up to 30 seconds for the Apache Ranger plugin to download the new authorization policy from the admin service. Then try to re-run the job as "alice". This time it should succeed due to the authorization policy that we have created.
    Categories: Colm O hEigeartaigh

    Installing the Apache Kerby KDC

    Colm O hEigeartaigh - Tue, 11/28/2017 - 16:16
    Apache Kerby is a subproject of the Apache Directory project, and is a complete open-source KDC written entirely in Java. Apache Kerby 1.1.0 has just been released. This release contains two major new features: a GSSAPI module (covered previously here) and cross-realm support (the subject of a forthcoming blog post).

    I have previously used Apache Kerby in this blog as a KDC to illustrate some security-based test-cases for big data components such as Apache Hadoop, Hive, Storm, etc, by pointing to some code on github that shows how to launch a Kerby KDC using Apache maven. This is convenient as a KDC can be launched with the principals already created via a single maven command. However, it is not suitable if the KDC is to be used in a standalone setting.

    In this post, we will show how to create a Kerby KDC distribution without writing any code.

    1) Install and configure the Apache Kerby KDC

    The first step is to download the Apache Kerby source code. Unzip the source and build the distribution via:
    • mvn clean install -DskipTests
    • cd kerby-dist
    • mvn package -Pdist
    The "kerby-dist" directory contains the KDC distribution in "kdc-dist", as well as the client tools in "tool-dist". Copy both "kdc-dist" and "tool-dist" directories to another location instead of working directly in the Kerby source. In "kdc-dist" create a directory called "keytabs" and "runtime". Then create some keytabs via:
    • sh bin/kdcinit.sh conf keytabs
    This will create keytabs for the "kadmin" and "protocol" principals, and store them in the "keytabs" directory. For testing purposes, we will change the port of the KDC from the default "88" to "12345" to avoid having to run the KDC with administrator privileges. Edit "conf/krb5.conf" and "conf/kdc.conf" and change "88" to "12345".

    The Kerby principals are stored in a backend that is configured in "conf/backend.conf". By default this is a JSON file that is stored in "/tmp/kerby/jsonbackend". However, Kerby also supports other more robust backends, such as LDAP, Mavibot, Zookeeper, etc.

    We can start the KDC via:
    • sh bin/start-kdc.sh conf runtime
    Let's create a new user called "alice":
    • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
    • addprinc -pw password alice@EXAMPLE.COM
    2) Install and configure the Apache Kerby tool dist

    We can check that the KDC has started properly using the MIT kinit tool, if it is installed locally:
    • export KRB5_CONFIG=/path.to.kdc.dist/conf/krb5.conf
    • kinit alice (use "password" for the password when prompted)
    Now you can see the ticket for alice using "klist". Apache Kerby also ships a "tool-dist" distribution that contains implementations of "kinit", "klist", etc. First call "kdestroy" to remove the ticket previously obtained for "alice". Then go into the directory where "tool-dist" was installed to in the previous section. Edit "conf/krb5.conf" and replace "88" with "12345". We can now obtain a ticket for "alice" via:
    • sh bin/kinit.sh -conf conf alice
    • sh bin/klist.sh


    Categories: Colm O hEigeartaigh

    Configuring Kerberos for Hive in Talend Open Studio for Big Data

    Colm O hEigeartaigh - Thu, 09/21/2017 - 13:12
    Earlier this year, I showed how to use Talend Open Studio for Big Data to access data stored in HDFS, where HDFS had been configured to authenticate users using Kerberos. A similar blog post showed how to read data from an Apache Kafka topic using kerberos. In this tutorial I will show how to create a job in Talend Open Studio for Big Data to read data from an Apache Hive table using kerberos. As a prerequisite, please follow a recent tutorial on setting up Apache Hadoop and Apache Hive using kerberos. 

    1) Download Talend Open Studio for Big Data and create a job

    Download Talend Open Studio for Big Data (6.4.1 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and call the job "HiveKerberosRead". In the search bar under "Palette" on the right hand side enter "hive" and hit enter. Drag "tHiveConnection" and "tHiveInput" to the middle of the screen. Do the same for "tLogRow".

    "tHiveConnection" will be used to configure the connection to Hive. "tHiveInput" will be used to perform a query on the "words" table we have created in Hive (as per the earlier tutorial linked above), and finally "tLogRow" will just log the data so that we can be sure that it was read correctly. The next step is to join the components up. Right click on "tHiveConnection" and select "Trigger/On Subjob Ok" and drag the resulting line to "tHiveInput". Right click on "tHiveInput" and select "Row/Main" and drag the resulting line to "tLogRow":



    2) Configure the components

    Now let's configure the individual components. Double click on "tHiveConnection". Select the following configuration options:
    • Distribution: Hortonworks
    • Version: HDP V2.5.0
    • Host: localhost
    • Database: default
    • Select "Use Kerberos Authentication"
    • Hive Principal: hiveserver2/localhost@hadoop.apache.org
    • Namenode Principal: hdfs/localhost@hadoop.apache.org
    • Resource Manager Principal: mapred/localhost@hadoop.apache.org
    • Select "Use a keytab to authenticate"
    • Principal: alice
    • Keytab: Path to "alice.keytab" in the Kerby test project.
    • Unselect "Set Resource Manager"
    • Set Namenode URI: "hdfs://localhost:9000"

    Now click on "tHiveInput" and select the following configuration options:
    • Select "Use an existing Connection"
    • Choose the tHiveConnection name from the resulting "Component List".
    • Click on "Edit schema". Create a new column called "word" of type String, and a column called "count" of type int. 
    • Table name: words
    • Query: "select * from words where word == 'Dare'"

    Now the only thing that remains is to point to the krb5.conf file that is generated by the Kerby project. Click on "Window/Preferences" at the top of the screen. Select "Talend" and "Run/Debug". Add a new JVM argument: "-Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf":
    Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. You should see the results of the query appear in the Run window of the Studio.

    Categories: Colm O hEigeartaigh

    Securing Apache Hive - part VI

    Colm O hEigeartaigh - Wed, 09/20/2017 - 17:13
    This is the sixth and final blog post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked at how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. The fifth post looked at an alternative authorization solution called Apache Sentry.

    In this post we will switch our attention from authorization to authentication, and show how we can authenticate Apache Hive users via kerberos.

    1) Set up a KDC using Apache Kerby

    A github project that uses Apache Kerby to start up a KDC is available here:
    • bigdata-kerberos-deployment: This project contains some tests which can be used to test kerberos with various big data deployments, such as Apache Hadoop etc.
    The KDC is a simple junit test that is available here. To run it just comment out the "org.junit.Ignore" annotation on the test method. It uses Apache Kerby to define the following principals for both Apache Hadoop and Apache Hive:
    • hdfs/localhost@hadoop.apache.org
    • HTTP/localhost@hadoop.apache.org
    • mapred/localhost@hadoop.apache.org
    • hiveserver2/localhost@hadoop.apache.org
    • alice@hadoop.apache.org 
    Keytabs are created in the "target" folder. Kerby is configured to use a random port to launch the KDC each time, and it will create a "krb5.conf" file containing the random port number in the target directory.
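
    For reference, a minimal sketch of what such a Kerby-based test KDC looks like in code, using Kerby's SimpleKdcServer API (the port, password and the two principals shown here are illustrative; the actual test picks a random port and creates the full list of principals above):

        import java.io.File;
        import org.apache.kerby.kerberos.kerb.server.SimpleKdcServer;

        public class SimpleKdcSketch {
            public static void main(String[] args) throws Exception {
                SimpleKdcServer kdc = new SimpleKdcServer();
                kdc.setKdcRealm("hadoop.apache.org");
                kdc.setKdcHost("localhost");
                kdc.setAllowUdp(false);
                kdc.setKdcTcpPort(54321);            // illustrative; the test picks a free port
                kdc.setWorkDir(new File("target"));  // krb5.conf is written here
                kdc.init();
                kdc.start();

                // Create a service principal and export it to a keytab in "target"
                kdc.createPrincipal("hiveserver2/localhost", "secret");
                kdc.exportPrincipal("hiveserver2/localhost", new File("target", "hiveserver2.keytab"));

                // Create a user principal for 'alice' in the same way
                kdc.createPrincipal("alice", "secret");
                kdc.exportPrincipal("alice", new File("target", "alice.keytab"));

                // Keep the KDC running while Hadoop and Hive are tested against it
                Thread.sleep(Long.MAX_VALUE);
            }
        }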

    2) Configure Apache Hadoop to use Kerberos

    The next step is to configure Apache Hadoop to use Kerberos. As a prerequisite, follow the first tutorial on Apache Hive so that the Hadoop data and Hive table are set up before we add Kerberos to the mix. Next, follow the steps in section (2) of an earlier tutorial I wrote on configuring Hadoop with Kerberos. Some additional steps are required when configuring Hadoop for use with Hive.

    Edit 'etc/hadoop/core-site.xml' and add:
    • hadoop.proxyuser.hiveserver2.groups: *
    • hadoop.proxyuser.hiveserver2.hosts: localhost
    The previous tutorial on securing HDFS with Kerberos did not specify any Kerberos configuration for Map-Reduce, as it was not required. For Apache Hive we need to configure Map-Reduce appropriately. We will simplify things by using a single principal for the Job Tracker, Task Tracker and Job History. Create a new file 'etc/hadoop/mapred-site.xml' with the following properties (a full XML sketch follows the list):
    • mapreduce.framework.name: classic
    • mapreduce.jobtracker.kerberos.principal: mapred/localhost@hadoop.apache.org
    • mapreduce.jobtracker.keytab.file: Path to Kerby mapred.keytab (see above).
    • mapreduce.tasktracker.kerberos.principal: mapred/localhost@hadoop.apache.org
    • mapreduce.tasktracker.keytab.file: Path to Kerby mapred.keytab (see above).
    • mapreduce.jobhistory.kerberos.principal: mapred/localhost@hadoop.apache.org
    • mapreduce.jobhistory.keytab.file: Path to Kerby mapred.keytab (see above).
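
    As a sketch, the resulting 'mapred-site.xml' looks something like this (the keytab path is a placeholder for wherever Kerby exported 'mapred.keytab'):

        <configuration>
          <property>
            <name>mapreduce.framework.name</name>
            <value>classic</value>
          </property>
          <property>
            <name>mapreduce.jobtracker.kerberos.principal</name>
            <value>mapred/localhost@hadoop.apache.org</value>
          </property>
          <property>
            <name>mapreduce.jobtracker.keytab.file</name>
            <value>/path/to/kerby/target/mapred.keytab</value>
          </property>
          <property>
            <name>mapreduce.tasktracker.kerberos.principal</name>
            <value>mapred/localhost@hadoop.apache.org</value>
          </property>
          <property>
            <name>mapreduce.tasktracker.keytab.file</name>
            <value>/path/to/kerby/target/mapred.keytab</value>
          </property>
          <property>
            <name>mapreduce.jobhistory.kerberos.principal</name>
            <value>mapred/localhost@hadoop.apache.org</value>
          </property>
          <property>
            <name>mapreduce.jobhistory.keytab.file</name>
            <value>/path/to/kerby/target/mapred.keytab</value>
          </property>
        </configuration>
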
    Start Kerby by running the JUnit test as described in the first section. Now start HDFS via:
    • sbin/start-dfs.sh
    • sudo sbin/start-secure-dns.sh
    3) Configure Apache Hive to use Kerberos

    Next we will configure Apache Hive to use Kerberos. Edit 'conf/hiveserver2-site.xml' and add the following properties (an XML sketch follows the list):
    • hive.server2.authentication: kerberos
    • hive.server2.authentication.kerberos.principal: hiveserver2/localhost@hadoop.apache.org
    • hive.server2.authentication.kerberos.keytab: Path to Kerby hiveserver2.keytab (see above).
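
    In XML form, this amounts to something like the following sketch (the keytab path is a placeholder for wherever Kerby exported 'hiveserver2.keytab'):

        <configuration>
          <property>
            <name>hive.server2.authentication</name>
            <value>kerberos</value>
          </property>
          <property>
            <name>hive.server2.authentication.kerberos.principal</name>
            <value>hiveserver2/localhost@hadoop.apache.org</value>
          </property>
          <property>
            <name>hive.server2.authentication.kerberos.keytab</name>
            <value>/path/to/kerby/target/hiveserver2.keytab</value>
          </property>
        </configuration>
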
    Start Hive via 'bin/hiveserver2'. In a separate window, log on to beeline via the following steps:
    • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
    • kinit -k -t /pathtokerby/target/alice.keytab alice
    • bin/beeline -u "jdbc:hive2://localhost:10000/default;principal=hiveserver2/localhost@hadoop.apache.org"
    At this point authentication is successful and we should be able to query the "words" table as per the first tutorial.

    Securing Apache Hive - part V

    Colm O hEigeartaigh - Fri, 09/15/2017 - 12:40
    This is the fifth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked at how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In this post we will look at an alternative authorization solution called Apache Sentry.

    1) Build the Apache Sentry distribution

    First we will build and install the Apache Sentry distribution. Download Apache Sentry (1.8.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match. Now extract and build the source and copy the distribution to a location where you wish to install it:
    • tar zxvf apache-sentry-1.8.0-src.tar.gz
    • cd apache-sentry-1.8.0-src
    • mvn clean install -DskipTests
    • cp -r sentry-dist/target/apache-sentry-1.8.0-bin ${sentry.home}
    I previously covered the authorization plugin that Apache Sentry provides for Apache Kafka. In addition, Apache Sentry provides an authorization plugin for Apache Hive. For the purposes of this tutorial we will just configure the authorization privileges in a configuration file local to the Hive server. Therefore we don't need any further configuration of the distribution at this point.

    2) Install and configure Apache Hive

    Please follow the first tutorial to install and configure Apache Hadoop if you have not already done so. Apache Sentry 1.8.0 does not support Apache Hive 2.1.x, so we will need to download and extract Apache Hive 2.0.1. Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Then follow the steps as outlined in the first tutorial to create the table in Hive and make sure that a query is successful.
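
    For example (the download URL assumes the standard Apache archive layout for Hive 2.0.1, and the Hadoop path is a placeholder):
    • wget https://archive.apache.org/dist/hive/hive-2.0.1/apache-hive-2.0.1-bin.tar.gz
    • tar zxvf apache-hive-2.0.1-bin.tar.gz
    • cd apache-hive-2.0.1-bin
    • export HADOOP_HOME=/path/to/apache/hadoop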

    3) Integrate Apache Sentry with Apache Hive

    Now we will integrate Apache Sentry with Apache Hive. We need to add three new configuration files to the "conf" directory of Apache Hive.

    3.a) Configure Apache Hive to use authorization

    Create a file called 'conf/hiveserver2-site.xml'.
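    A minimal sketch of what this file can contain, assuming the Sentry 1.8.0 "v2" Hive binding (the class names and the 'hive.sentry.conf.url' path are assumptions to verify against your installation):

        <configuration>
          <!-- Class names below assume the sentry-binding-hive-v2 1.8.0 jars -->
          <property>
            <name>hive.security.authorization.enabled</name>
            <value>true</value>
          </property>
          <property>
            <name>hive.security.authorization.manager</name>
            <value>org.apache.sentry.binding.hive.v2.SentryAuthorizerFactory</value>
          </property>
          <property>
            <name>hive.security.authenticator.manager</name>
            <value>org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator</value>
          </property>
          <property>
            <name>hive.sentry.conf.url</name>
            <value>file:///path/to/hive/conf/sentry-site.xml</value>
          </property>
        </configuration>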
    Here we are enabling authorization and adding the Sentry authorization plugin.

    3.b) Add Sentry plugin configuration

    Create a new file in the "conf" directory of Apache Hive called "sentry-site.xml" with the following content:
    This is the configuration file for the Sentry plugin for Hive. It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. As we are not using Kerberos, the "testing.mode" configuration parameter must be set to "true".

    3.c) Add the authorization privileges for our test-case

    Next, we need to specify the authorization privileges. Create a new file in the "conf" directory called "sentry.ini".
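    A minimal sketch of such a policy file (the group and role names are illustrative, and the server name must match the one configured in 'sentry-site.xml'):

        [users]
        alice = select_group

        [groups]
        select_group = select_role

        [roles]
        select_role = server=server1->db=default->table=words->action=select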
    Here we are granting the user "alice" a role which allows her to perform a "select" on the table "words".

    3.d) Add Sentry libraries to Hive

    Finally, we need to add the Sentry libraries to Hive. Copy the following files from ${sentry.home}/lib to ${hive.home}/lib:
    • sentry-binding-hive-common-1.8.0.jar
    • sentry-core-model-db-1.8.0.jar
    • sentry*provider*.jar
    • sentry-core-common-1.8.0.jar
    • shiro-core-1.2.3.jar
    • sentry-policy*.jar
    • sentry-service-*.jar
    In addition, we need "sentry-binding-hive-v2-1.8.0.jar", which is not bundled with the Apache Sentry distribution. It can be obtained from "http://repo1.maven.org/maven2/org/apache/sentry/sentry-binding-hive-v2/1.8.0/sentry-binding-hive-v2-1.8.0.jar".

    4) Test authorization with Apache Hive

    Now we can test authorization after restarting Apache Hive. The user 'alice' can query the table according to our policy:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
    • select * from words where word == 'Dare'; (works)
    However, the user 'bob' is denied access:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
    • select * from words where word == 'Dare'; (fails)

