Latest Activity

SAML SSO support for the Apache Syncope web console

Colm O hEigeartaigh - Fri, 12/08/2017 - 18:09
Apache Syncope is a powerful open source Identity Management project, which recently celebrated 5 years as an Apache top level project. Until recently, a username and password had to be supplied to log in to either the admin or enduser web console of Apache Syncope. However, SAML SSO login has been supported since the 2.0.3 release. Instead of supplying a username/password, the user is redirected to a third party IdP for login, before being redirected back to the Apache Syncope web console. In 2.0.5, support for the IdP-initiated flow of SAML SSO was added.

In this post we will show how to configure Apache Syncope to use SAML SSO as an alternative to logging in using a username and password. We will use Apache CXF Fediz as the SAML SSO IdP. In addition, we will show how to achieve IdP-initiated SSO using Okta. Please also refer to this tutorial on achieving SAML SSO with Syncope and Shibboleth.

1) Logging in to Apache Syncope using SAML SSO

In this section, we will cover setting up Apache Syncope to re-direct to a third party IdP so that the user can enter their credentials. The next section will cover the IdP-initiated case.

1.a) Enable SAML SSO support in Apache Syncope

First we will configure Apache Syncope to enable SAML SSO support. Download and extract the most recent standalone distribution release of Apache Syncope (2.0.6 was used in this post). Start the embedded Apache Tomcat instance and then open a web browser and navigate to "http://localhost:9080/syncope-console", logging in with the username "admin" and password "password".

Apache Syncope is configured with some sample data to show how it can be used. Click on "Users" and add a new user called "alice" by clicking on the "+" button. Specify a password for "alice" and then select the default values wherever possible (you will need to specify some required attributes, such as "surname"). Now in the left-hand column, click on "Extensions" and then "SAML 2.0 SP". Click on the "Service Provider" tab and then "Metadata". Save the resulting Metadata document, as it will be required to set up the SAML SSO IdP.

1.b) Set up the Apache CXF Fediz SAML SSO IdP

Next we will turn our attention to setting up the Apache CXF Fediz SAML SSO IdP. Download the most recent source release of Apache CXF Fediz (1.4.3 was used for this tutorial). Unzip the release and build it using maven ("mvn clean install -DskipTests"). In the meantime, download and extract the latest Apache Tomcat 8.5.x distribution (tested with 8.5.24). Once Fediz has finished building, copy all of the "IdP" wars (e.g. in fediz-1.4.3/apache-fediz/target/apache-fediz-1.4.3/apache-fediz-1.4.3/idp/war/fediz-*) to the Tomcat "webapps" directory.

There are a few configuration changes to be made to Apache Tomcat before starting it. Download the HSQLDB jar and copy it to the Tomcat "lib" directory. Next edit 'conf/server.xml' and configure TLS on port 8443:
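The connector configuration itself is not reproduced here; a minimal sketch for Tomcat 8.5 is given below. The keystore and truststore file names and passwords are assumptions based on the Fediz sample keys, so adjust them to whatever keys you copy over in the next step:

  <Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
             maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
             clientAuth="want" sslProtocol="TLS"
             keystoreFile="idp-ssl-key.jks" keystorePass="tompass"
             truststoreFile="idp-ssl-trust.jks" truststorePass="ispass" />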

The two keys referenced here can be obtained from 'apache-fediz/target/apache-fediz-1.4.3/apache-fediz-1.4.3/examples/samplekeys/' and should be copied to the root directory of Apache Tomcat. Tomcat can now be started.

Next we have to configure Apache CXF Fediz to support Apache Syncope as a "service" via SAML SSO. Edit 'webapps/fediz-idp/WEB-INF/classes/entities-realma.xml' and add the following configuration:
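The bean definition itself is not reproduced here. The sketch below is an assumption based on the sample "Application" entries shipped with Fediz: check the property names and the SAML protocol URI against the other beans in your entities-realma.xml. The realm value is the Syncope SP entity ID used later in this post, and "syncope.cert" is the certificate file created in the next step:

  <bean id="srv-syncope" class="org.apache.cxf.fediz.service.idp.domain.Application">
      <property name="realm" value="http://localhost:9080/syncope-console/" />
      <property name="protocol" value="urn:oasis:names:tc:SAML:2.0:profiles:SSO:browser" />
      <property name="serviceDisplayName" value="Apache Syncope" />
      <property name="serviceDescription" value="Apache Syncope web console" />
      <property name="validatingCertificate" value="syncope.cert" />
  </bean>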

In addition, we need to make some changes to the "idp-realmA" bean in this file:
  • Add a reference to this bean in the "applications" list: <ref bean="srv-syncope" />
  • Change the "idpUrl" property to: https://localhost:8443/fediz-idp/saml
  • Change the port for "stsUrl" from "9443" to "8443".
Now we need to configure Fediz to accept Syncope's signing cert. Edit the Metadata file you saved from Syncope in step 1.a. Copy the Base-64 encoded certificate in the "KeyDescriptor" section, and paste it (including line breaks) into 'webapps/fediz-idp/WEB-INF/classes/syncope.cert', enclosing it between "-----BEGIN CERTIFICATE-----" and "-----END CERTIFICATE-----".
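The resulting 'syncope.cert' file should look something like the following, where the middle lines are placeholders for the Base-64 content copied from the "KeyDescriptor" element:

  -----BEGIN CERTIFICATE-----
  MIIC... (Base-64 encoded certificate from the Syncope metadata)
  ...
  -----END CERTIFICATE-----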

Now restart Apache Tomcat. Open a browser and save the Fediz metadata which is available at "http://localhost:8080/fediz-idp/metadata?protocol=saml", which we will require when configuring Apache Syncope.

1.c) Configure the Apache CXF Fediz IdP in Syncope

The final configuration step takes place in Apache Syncope again. In the "SAML 2.0 SP" configuration screen, click on the "Identity Providers" tab and click the "+" button and select the Fediz metadata that you saved in the previous step. Now logout and an additional login option can be seen:


Select the URL for the SAML SSO IdP and you will be redirected to Fediz. Select the IdP in realm "A" as the home realm and enter credentials of "alice/ecila" when prompted. You will be successfully authenticated to Fediz and redirected back to the Syncope admin console, where you will be logged in as the user "alice". 

2) Using IdP-initiated SAML SSO

Instead of the user starting with the Syncope web console, being redirected to the IdP for authentication, and then redirected back to Syncope - it is possible instead to start from the IdP. In this section we will show how to configure Apache Syncope to support IdP-initiated SAML SSO using Okta.

2.a) Configuring a SAML application in Okta

The first step is to create an account at Okta and configure a SAML application. This process is mapped out at the following link. Follow the steps listed on this page with the following additional changes:
  • Specify the following for the Single Sign On URL: http://localhost:9080/syncope-console/saml2sp/assertion-consumer
  • Specify the following for the audience URL: http://localhost:9080/syncope-console/
  • Specify the following for the default RelayState: idpInitiated
When the application is configured, you will see an option to "View Setup Instructions". Open this link in a new tab and find the section about the IdP Metadata. Save this to a local file and set it aside for the moment. Next you need to assign the application to the username that you have created at Okta.

2.b) Configure Apache Syncope to support IdP-Initiated SAML SSO

Log on to the Apache Syncope admin console using the admin credentials, and add a new Identity Provider in the SAML 2.0 SP extension as before, using the Okta metadata file that you saved in the previous section. Edit the metadata and select the 'Support Unsolicited Logins' checkbox. Save the metadata and make sure that the Okta user is also a valid user in Apache Syncope.

Now go back to the Okta console and click on the application you have configured for Apache Syncope. You should seamlessly be logged into the Apache Syncope admin console.




Categories: Colm O hEigeartaigh

Relocating (again :-) )

Olivier Lamy - Sat, 12/02/2017 - 00:50
Long time since I posted anything here... :-)
So after 4.5 years of a very happy life @ Melbourne, we have decided to move further north..
Mid January we will start new adventures in Brisbane. Very exciting times for us (well, the kids are a bit sad to leave their friends, but happy to try something different).
Some have been asking us why, if we're happy here :-)
Well, a bit difficult to answer....
Maybe we're tired of the so famous Melbourne weather....
Yes, the Brisbane weather looks so great: short pants and thongs all year round :-)
Housing is ridiculously expensive @ Melbourne, so it's time to stop paying someone else's mortgage.
After a long time, it's maybe a good idea to leave the comfort zone and try something different.

Cya soon up north :-)
Categories: Olivier Lamy

Kerberos cross-realm support in Apache Kerby 1.1.0

Colm O hEigeartaigh - Fri, 12/01/2017 - 13:52
A recent blog post covered how to install the Apache Kerby KDC. In this post we will build on that tutorial to show how to get a major new feature of Apache Kerby 1.1.0 to work - namely kerberos cross-realm support. Cross-realm support means that the KDCs in realm "A" and realm "B" are configured in such a way that a user who is authenticated in realm "A" can obtain a service ticket for a service in realm "B" without having to explicitly authenticate to the KDC in realm "B".

1) Configure the KDC for the "EXAMPLE.COM" realm

First we will configure the Apache Kerby KDC for the "EXAMPLE.COM" realm. Follow the previous tutorial to install and configure the KDC for this (default) realm. We then need to follow some additional steps to get cross-realm support working with a second KDC in realm "EXAMPLE2.COM". Edit 'conf/krb5.conf' and replace the "realms" section with the following configuration:
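The configuration itself is not reproduced here; a sketch is given below. The port numbers are assumptions: the first KDC uses the port chosen in the previous tutorial (12345), while the second KDC will be configured to use 54321 later in this post:

  [realms]
    EXAMPLE.COM = {
      kdc = localhost:12345
    }
    EXAMPLE2.COM = {
      kdc = localhost:54321
    }

  [domain_realm]
    .example.com = EXAMPLE.COM
    .example2.com = EXAMPLE2.COM

  [capaths]
    EXAMPLE.COM = {
      EXAMPLE2.COM = .
    }
    EXAMPLE2.COM = {
      EXAMPLE.COM = .
    }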
Next we need to add a special principal to the KDC to enable cross-realm support (after restarting the KDC) via:
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw security krbtgt/EXAMPLE2.COM@EXAMPLE.COM
2) Configure the KDC for the "EXAMPLE2.COM" realm

Now we will configure a second KDC for the "EXAMPLE2.COM" realm. Download the Apache Kerby source code as before. Unzip the source and build the distribution via:
  • mvn clean install -DskipTests
  • cd kerby-dist
  • mvn package -Pdist
Copy "kdc-dist" to a location where you wish to install the second KDC. In this directory, create a directory called "keytabs" and "runtime". Edit 'conf/backend.conf' and change the value for 'backend.json.dir' to avoid conflict with the first KDC instance. Then create some keytabs via:
  • sh bin/kdcinit.sh conf keytabs
For testing purposes, we will change the port of the KDC from the default "88" to "54321" to avoid having to run the KDC with administrator privileges. Edit "conf/krb5.conf" and "conf/kdc.conf" and change "88" to "54321". In addition, change the realm from "EXAMPLE.COM" to "EXAMPLE2.COM" in both of these files. As above, edit 'conf/krb5.conf' and replace the "realms" section with the following configuration:
Next start the KDC via:
  • sh bin/start-kdc.sh conf runtime
We need to add a special principal to the KDC to enable cross-realm support, as in the KDC for the "EXAMPLE.COM" realm. Note that it must be the same principal name and password as for the first realm. We will also add a principal for a service in this realm:
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw security krbtgt/EXAMPLE2.COM@EXAMPLE.COM
  • addprinc -pw password service@EXAMPLE2.COM
3) Obtaining a service ticket for service@EXAMPLE2.COM as alice@EXAMPLE.COM

Now we can obtain a service ticket for the service we have configured in the "EXAMPLE2.COM" realm as a user who is authenticated to the "EXAMPLE.COM" realm. Configure the "tool-dist" distribution as per the previous tutorial, updating 'conf/krb5.conf' with the same "realms", "domain_realm" and "capaths" information as shown above. Now we can authenticate as "alice" and obtain a service ticket as follows:
  • sh bin/kinit.sh -conf conf alice@EXAMPLE.COM
  • sh bin/kinit.sh -conf conf -c /tmp/krb5cc_1000 -S service@EXAMPLE2.COM
If you run "klist" then you should see that a ticket for "service@EXAMPLE2.COM" was obtained successfully.
    Categories: Colm O hEigeartaigh

    Authorizing access to Apache Yarn using Apache Ranger

    Colm O hEigeartaigh - Wed, 11/29/2017 - 17:46
    Earlier this year, I wrote a series of blog posts on how to secure access to the Apache Hadoop filesystem (HDFS), using tools like Apache Ranger and Apache Atlas. In this post, we will go further and show how to authorize access to Apache Yarn using Apache Ranger. Apache Ranger allows us to create and enforce authorization policies based on who is allowed to submit applications to run on Apache Yarn. Therefore it can be used to enforce authorization decisions for Hive on Yarn or Spark on Yarn jobs.

    1) Installing Apache Hadoop

    First, follow the steps outlined in the earlier tutorial (section 1) on setting up Apache Hadoop, except that in this tutorial we will work with Apache Hadoop 2.8.2. In addition, we will need to follow some additional steps to configure Yarn (see here for the official documentation). Create a new file called 'etc/hadoop/mapred-site.xml' with the content:
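    The file content itself is not reproduced here; the minimal setting from the standard Hadoop Yarn setup would be along these lines:
      <configuration>
          <property>
              <name>mapreduce.framework.name</name>
              <value>yarn</value>
          </property>
      </configuration>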
    Next edit 'etc/hadoop/yarn-site.xml' and add:
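    Again following the standard Hadoop Yarn setup, the property to add would be along these lines:
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>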
    Now we can start Apache Yarn via 'sbin/start-yarn.sh'. We are going to submit jobs as a local user called "alice" to test authorization. First we need to create some directories in HDFS:
    • bin/hdfs dfs -mkdir -p /user/alice/input
    • bin/hdfs dfs -put etc/hadoop/*.xml /user/alice/input
    • bin/hadoop fs -chown -R alice /user/alice
    • bin/hadoop fs -mkdir /tmp
    • bin/hadoop fs -chmod og+w /tmp
    Now we can submit an example job as "alice" via:
    • sudo -u alice bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar grep input output 'dfs[a-z.]+'
    The job should run successfully and store the output in '/user/alice/output'. Delete this directory before trying to run the job again ('bin/hadoop fs -rm -r /user/alice/output').

    2) Install the Apache Ranger Yarn plugin

    Next we will install the Apache Ranger Yarn plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
    • mvn clean package assembly:assembly -DskipTests
    • tar zxvf target/ranger-1.0.0-SNAPSHOT-yarn-plugin.tar.gz
    • mv ranger-1.0.0-SNAPSHOT-yarn-plugin ${ranger.yarn.home}
    Now go to ${ranger.yarn.home} and edit "install.properties". You need to specify the following properties:
    • POLICY_MGR_URL: Set this to "http://localhost:6080"
    • REPOSITORY_NAME: Set this to "YarnTest".
    • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hadoop installation
    Save "install.properties" and install the plugin as root via "sudo -E ./enable-yarn-plugin.sh". Make sure that the user who is running Yarn has the permission to read the policies stored in '/etc/ranger/YarnTest'. There is one additional step to be performed in Hadoop before restarting Yarn. Edit 'etc/hadoop/ranger-yarn-security.xml' and add a property called "ranger.add-yarn-authorization" with value "false". This means that if Ranger policy authorization fails, it doesn't fall back to the default Yarn ACLs (which allow all users to submit jobs to the default queue).

    Finally, re-start Yarn and try to resubmit the job as "alice" as per the previous section. You should now see an authorization error: "User alice cannot submit applications to queue root.default".

    3) Create authorization policies in the Apache Ranger Admin console

    Next we will use the Apache Ranger admin console to create authorization policies for Yarn. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Apache Ranger admin service with "sudo ranger-admin start" and open a browser and go to "http://localhost:6080/" and log on with "admin/admin". Add a new Yarn service with the following configuration values:
    • Service Name: YarnTest
    • Username: admin
    • Password: admin
    • Yarn REST URL: http://localhost:8088
    Click on "Test Connection" to verify that we can connect successfully to Yarn + then save the new service. Now click on the "YarnTest" service that we have created. Add a new policy for the "root.default" queue for the user "alice" (create this user if you have not done so already under "Settings, Users/Groups"), with a permission of "submit-app".

    Allow up to 30 seconds for the Apache Ranger plugin to download the new authorization policy from the admin service. Then try to re-run the job as "alice". This time it should succeed due to the authorization policy that we have created.
    Categories: Colm O hEigeartaigh

    Installing the Apache Kerby KDC

    Colm O hEigeartaigh - Tue, 11/28/2017 - 16:16
    Apache Kerby is a subproject of the Apache Directory project, and is a complete open-source KDC written entirely in Java. Apache Kerby 1.1.0 has just been released. This release contains two major new features: a GSSAPI module (covered previously here) and cross-realm support (the subject of a forthcoming blog post).

    I have previously used Apache Kerby in this blog as a KDC to illustrate some security-based test-cases for big data components such as Apache Hadoop, Hive, Storm, etc, by pointing to some code on github that shows how to launch a Kerby KDC using Apache maven. This is convenient as a KDC can be launched with the principals already created via a single maven command. However, it is not suitable if the KDC is to be used in a standalone setting.

    In this post, we will show how to create a Kerby KDC distribution without writing any code.

    1) Install and configure the Apache Kerby KDC

    The first step is to download the Apache Kerby source code. Unzip the source and build the distribution via:
    • mvn clean install -DskipTests
    • cd kerby-dist
    • mvn package -Pdist
    The "kerby-dist" directory contains the KDC distribution in "kdc-dist", as well as the client tools in "tool-dist". Copy both "kdc-dist" and "tool-dist" directories to another location instead of working directly in the Kerby source. In "kdc-dist" create a directory called "keytabs" and "runtime". Then create some keytabs via:
    • sh bin/kdcinit.sh conf keytabs
    This will create keytabs for the "kadmin" and "protocol" principals, and store them in the "keytabs" directory. For testing purposes, we will change the port of the KDC from the default "88" to "12345" to avoid having to run the KDC with administrator privileges. Edit "conf/krb5.conf" and "conf/kdc.conf" and change "88" to "12345".

    The Kerby principals are stored in a backend that is configured in "conf/backend.conf". By default this is a JSON file that is stored in "/tmp/kerby/jsonbackend". However, Kerby also supports other more robust backends, such as LDAP, Mavibot, Zookeeper, etc.

    We can start the KDC via:
    • sh bin/start-kdc.sh conf runtime
    Let's create a new user called "alice":
    • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
    • addprinc -pw password alice@EXAMPLE.COM
    2) Install and configure the Apache Kerby tool dist

    We can check that the KDC has started properly using the MIT kinit tool, if it is installed locally:
    • export KRB5_CONFIG=/path.to.kdc.dist/conf/krb5.conf
    • kinit alice (use "password" for the password when prompted)
    Now you can see the ticket for alice using "klist". Apache Kerby also ships a "tool-dist" distribution that contains implementations of "kinit", "klist", etc. First call "kdestroy" to remove the ticket previously obtained for "alice". Then go into the directory where "tool-dist" was installed in the previous section. Edit "conf/krb5.conf" and replace "88" with "12345". We can now obtain a ticket for "alice" via:
    • sh bin/kinit.sh -conf conf alice
    • sh bin/klist.sh


    Categories: Colm O hEigeartaigh

    Configuring Kerberos for Hive in Talend Open Studio for Big Data

    Colm O hEigeartaigh - Thu, 09/21/2017 - 13:12
    Earlier this year, I showed how to use Talend Open Studio for Big Data to access data stored in HDFS, where HDFS had been configured to authenticate users using Kerberos. A similar blog post showed how to read data from an Apache Kafka topic using kerberos. In this tutorial I will show how to create a job in Talend Open Studio for Big Data to read data from an Apache Hive table using kerberos. As a prerequisite, please follow a recent tutorial on setting up Apache Hadoop and Apache Hive using kerberos. 

    1) Download Talend Open Studio for Big Data and create a job

    Download Talend Open Studio for Big Data (6.4.1 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and call the job "HiveKerberosRead". In the search bar under "Palette" on the right hand side enter "hive" and hit enter. Drag "tHiveConnection" and "tHiveInput" to the middle of the screen. Do the same for "tLogRow":

    "tHiveConnection" will be used to configure the connection to Hive. "tHiveInput" will be used to perform a query on the "words" table we have created in Hive (as per the earlier tutorial linked above), and finally "tLogRow" will just log the data so that we can be sure that it was read correctly. The next step is to join the components up. Right click on "tHiveConnection" and select "Trigger/On Subjob Ok" and drag the resulting line to "tHiveInput". Right click on "tHiveInput" and select "Row/Main" and drag the resulting line to "tLogRow":



    3) Configure the components

    Now let's configure the individual components. Double click on "tHiveConnection". Select the following configuration options:
    • Distribution: Hortonworks
    • Version: HDP V2.5.0
    • Host: localhost
    • Database: default
    • Select "Use Kerberos Authentication"
    • Hive Principal: hiveserver2/localhost@hadoop.apache.org
    • Namenode Principal: hdfs/localhost@hadoop.apache.org
    • Resource Manager Principal: mapred/localhost@hadoop.apache.org
    • Select "Use a keytab to authenticate"
    • Principal: alice
    • Keytab: Path to "alice.keytab" in the Kerby test project.
    • Unselect "Set Resource Manager"
    • Set Namenode URI: "hdfs://localhost:9000"

    Now click on "tHiveInput" and select the following configuration options:
    • Select "Use an existing Connection"
    • Choose the tHiveConnection name from the resulting "Component List".
    • Click on "Edit schema". Create a new column called "word" of type String, and a column called "count" of type int. 
    • Table name: words
    • Query: "select * from words where word == 'Dare'"

    Now the only thing that remains is to point to the krb5.conf file that is generated by the Kerby project. Click on "Window/Preferences" at the top of the screen. Select "Talend" and "Run/Debug". Add a new JVM argument: "-Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf":
    Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. You should see the following output in the Run Window in the Studio:

    Categories: Colm O hEigeartaigh

    Securing Apache Hive - part VI

    Colm O hEigeartaigh - Wed, 09/20/2017 - 17:13
    This is the sixth and final blog post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked at how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. The fifth post looked at an alternative authorization solution called Apache Sentry.

    In this post we will switch our attention from authorization to authentication, and show how we can authenticate Apache Hive users via kerberos.

    1) Set up a KDC using Apache Kerby

    A github project that uses Apache Kerby to start up a KDC is available here:
    • bigdata-kerberos-deployment: This project contains some tests which can be used to test kerberos with various big data deployments, such as Apache Hadoop etc.
    The KDC is a simple junit test that is available here. To run it just comment out the "org.junit.Ignore" annotation on the test method. It uses Apache Kerby to define the following principals for both Apache Hadoop and Apache Hive:
    • hdfs/localhost@hadoop.apache.org
    • HTTP/localhost@hadoop.apache.org
    • mapred/localhost@hadoop.apache.org
    • hiveserver2/localhost@hadoop.apache.org
    • alice@hadoop.apache.org 
    Keytabs are created in the "target" folder. Kerby is configured to use a random port to lauch the KDC each time, and it will create a "krb5.conf" file containing the random port number in the target directory.

    2) Configure Apache Hadoop to use Kerberos

    The next step is to configure Apache Hadoop to use Kerberos. As a pre-requisite, follow the first tutorial on Apache Hive so that the Hadoop data and Hive table are set up before we apply Kerberos to the mix. Next, follow the steps in section (2) of an earlier tutorial on configuring Hadoop with Kerberos that I wrote. Some additional steps are also required when configuring Hadoop for use with Hive.

    Edit 'etc/hadoop/core-site.xml' and add:
    • hadoop.proxyuser.hiveserver2.groups: *
    • hadoop.proxyuser.hiveserver2.hosts: localhost
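    Expressed in XML, the two proxy-user properties above would look like this:
      <property>
          <name>hadoop.proxyuser.hiveserver2.groups</name>
          <value>*</value>
      </property>
      <property>
          <name>hadoop.proxyuser.hiveserver2.hosts</name>
          <value>localhost</value>
      </property>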
    The previous tutorial on securing HDFS with kerberos did not specify any kerberos configuration for Map-Reduce, as it was not required. For Apache Hive we need to configure Map Reduce appropriately. We will simplify things by using a single principal for the Job Tracker, Task Tracker and Job History. Create a new file 'etc/hadoop/mapred-site.xml' with the following properties:
    • mapreduce.framework.name: classic
    • mapreduce.jobtracker.kerberos.principal: mapred/localhost@hadoop.apache.org
    • mapreduce.jobtracker.keytab.file: Path to Kerby mapred.keytab (see above).
    • mapreduce.tasktracker.kerberos.principal: mapred/localhost@hadoop.apache.org
    • mapreduce.tasktracker.keytab.file: Path to Kerby mapred.keytab (see above).
    • mapreduce.jobhistory.kerberos.principal:  mapred/localhost@hadoop.apache.org
    • mapreduce.jobhistory.keytab.file: Path to Kerby mapred.keytab (see above).
    Start Kerby by running the JUnit test as described in the first section. Now start HDFS via:
    • sbin/start-dfs.sh
    • sudo sbin/start-secure-dns.sh
    3) Configure Apache Hive to use Kerberos

    Next we will configure Apache Hive to use Kerberos. Edit 'conf/hiveserver2-site.xml' and add the following properties:
    • hive.server2.authentication: kerberos
    • hive.server2.authentication.kerberos.principal: hiveserver2/localhost@hadoop.apache.org
    • hive.server2.authentication.kerberos.keytab: Path to Kerby hiveserver2.keytab (see above).
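    Expressed in XML, with the keytab path pointing at the Kerby "target" folder, these properties would look like this:
      <property>
          <name>hive.server2.authentication</name>
          <value>kerberos</value>
      </property>
      <property>
          <name>hive.server2.authentication.kerberos.principal</name>
          <value>hiveserver2/localhost@hadoop.apache.org</value>
      </property>
      <property>
          <name>hive.server2.authentication.kerberos.keytab</name>
          <value>/pathtokerby/target/hiveserver2.keytab</value>
      </property>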
    Start Hive via 'bin/hiveserver2'. In a separate window, log on to beeline via the following steps:
    • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
    • kinit -k -t /pathtokerby/target/alice.keytab alice
    • bin/beeline -u "jdbc:hive2://localhost:10000/default;principal=hiveserver2/localhost@hadoop.apache.org"
    At this point authentication is successful and we should be able to query the "words" table as per the first tutorial.
    Categories: Colm O hEigeartaigh

    Securing Apache Hive - part V

    Colm O hEigeartaigh - Fri, 09/15/2017 - 12:40
    This is the fifth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked at how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In this post we will look at an alternative authorization solution called Apache Sentry.

    1) Build the Apache Sentry distribution

    First we will build and install the Apache Sentry distribution. Download Apache Sentry (1.8.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match. Now extract and build the source and copy the distribution to a location where you wish to install it:
    • tar zxvf apache-sentry-1.8.0-src.tar.gz
    • cd apache-sentry-1.8.0-src
    • mvn clean install -DskipTests
    • cp -r sentry-dist/target/apache-sentry-1.8.0-bin ${sentry.home}
    I previously covered the authorization plugin that Apache Sentry provides for Apache Kafka. In addition, Apache Sentry provides an authorization plugin for Apache Hive. For the purposes of this tutorial we will just configure the authorization privileges in a configuration file local to the Hive Server. Therefore we don't need to do any further configuration to the distribution at this point.

    2) Install and configure Apache Hive

    Please follow the first tutorial to install and configure Apache Hadoop if you have not already done so. Apache Sentry 1.8.0 does not support Apache Hive 2.1.x, so we will need to download and extract Apache Hive 2.0.1. Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Then follow the steps as outlined in the first tutorial to create the table in Hive and make sure that a query is successful.

    3) Integrate Apache Sentry with Apache Hive

    Now we will integrate Apache Sentry with Apache Hive. We need to add three new configuration files to the "conf" directory of Apache Hive.

    3.a) Configure Apache Hive to use authorization

    Create a file called 'conf/hiveserver2-site.xml' with the content:
    Here we are enabling authorization and adding the Sentry authorization plugin.

    3.b) Add Sentry plugin configuration

    Create a new file in the "conf" directory of Apache Hive called "sentry-site.xml" with the following content:
    This is the configuration file for the Sentry plugin for Hive. It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. As we are not using Kerberos, the "testing.mode" configuration parameter must be set to "true".

    3.c) Add the authorization privileges for our test-case

    Next, we need to specify the authorization privileges. Create a new file in the config directory called "sentry.ini" with the following content:
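    The exact contents are not reproduced here; a sketch of such a policy file is given below. The group and role names are made up for illustration, and the "server1" server name is an assumption that must match the server name the Sentry Hive binding is configured with:
      [users]
      alice = user_group

      [groups]
      user_group = select_words_role

      [roles]
      select_words_role = server=server1->db=default->table=words->action=select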
    Here we are granting the user "alice" a role which allows her to perform a "select" on the table "words".

    3.d) Add Sentry libraries to Hive

    Finally, we need to add the Sentry libraries to Hive. Copy the following files from ${sentry.home}/lib  to ${hive.home}/lib:
    • sentry-binding-hive-common-1.8.0.jar
    • sentry-core-model-db-1.8.0.jar
    • sentry*provider*.jar
    • sentry-core-common-1.8.0.jar
    • shiro-core-1.2.3.jar
    • sentry-policy*.jar
    • sentry-service-*.jar
    In addition we need the "sentry-binding-hive-v2-1.8.0.jar" which is not bundled with the Apache Sentry distribution. This can be obtained from "http://repo1.maven.org/maven2/org/apache/sentry/sentry-binding-hive-v2/1.8.0/sentry-binding-hive-v2-1.8.0.jar" instead.

    4) Test authorization with Apache Hive

    Now we can test authorization after restarting Apache Hive. The user 'alice' can query the table according to our policy:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
    • select * from words where word == 'Dare'; (works)
    However, the user 'bob' is denied access:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
    • select * from words where word == 'Dare'; (fails)

    Categories: Colm O hEigeartaigh

    Securing Apache Hive - part IV

    Colm O hEigeartaigh - Thu, 09/14/2017 - 14:02
    This is the fourth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query.

    In this post we will show how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In the second post, we showed how to create a "resource" based policy for "alice" in Ranger, by granting "alice" the "select" permission for the "words" table. Instead, we can grant a user "bob" the "select" permission for a given "tag", which is synced into Ranger from Apache Atlas. This means that we can avoid managing specific resources in Ranger itself.

    1) Start Apache Atlas and create entities/tags for Hive

    First let's look at setting up Apache Atlas. Download the latest released version (0.8.1) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
    • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
    The distribution will then be available in 'distro/target/apache-atlas-0.8.1-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
    • export MANAGE_LOCAL_HBASE=true
    • export MANAGE_LOCAL_SOLR=true
    Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "words_tag". Unlike for HDFS or Kafka, Atlas doesn't provide an easy way to create a Hive Entity in the UI. Instead we can use the following JSON file to create a Hive Entity for the "words" table that we are using in our example, which is based on the example given here:
    You can upload it to Atlas via:
    • curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json;  charset=UTF-8' -u admin:admin -d @hive-create.json http://localhost:21000/api/atlas/entities
    Once the new entity has been uploaded, then you can search for it in the Atlas UI. Once it is found, then click on "+" beside "Tags" and associate the new entity with the "words_tag" tag.

    2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

    To create tag based policies in Apache Ranger, we have to import the entity + tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. After building Apache Ranger then extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows:
    • Set TAG_SOURCE_ATLAS_ENABLED to "false"
    • Set TAG_SOURCE_ATLASREST_ENABLED to  "true" 
    • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
    • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
    Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". Start the Apache Ranger admin service via "sudo ranger-admin start" and then the tagsync service via "sudo ranger-tagsync-services.sh start".

    3) Create Tag-based authorization policies in Apache Ranger

    Now let's create a tag-based authorization policy in the Apache Ranger admin UI (http://localhost:6080). Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "HiveTagService". Create a new policy for this service called "WordsTagPolicy". In the "TAG" field enter a "w" and the "words_tag" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "bob" with the "select" permissions for "Hive":
    We also need to go back to the Resource based policies and edit "cl1_hive" that we created in the second tutorial, and select the tag service we have created above. Once our new policy (including tags) has synced to '/etc/ranger/cl1_hive/policycache' we can test authorization in Hive. Previously, the user "bob" was denied access to the "words" table, as only "alice" was assigned a resource-based policy for the table. However, "bob" can now access the table via the tag-based authorization policy we have created:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
    • select * from words where word == 'Dare';
    Categories: Colm O hEigeartaigh

    The Real Data Processing with Apache Beam and Tika

    Sergey Beryozkin - Tue, 09/12/2017 - 18:20
    When we talk about data ingestion in big data streaming pipelines, it is fair to say that in the vast majority of cases the source data comes from files in CSV and other easy-to-parse text formats.

    Things become more complex when the task is to read and parse files in a format such as PDF. One would need to create a reader/receiver capable of parsing PDF files and feeding the content fragments (the regular text, the text found in embedded attachments, and the file metadata) into the processing pipelines. That was tricky to do right, but you did it just fine.

    The next morning you get a call from your team lead letting you know the customer actually needs the content ingested not only from PDF files but also from files in a format you've never heard of before. You spend the rest of the week looking for a library which can parse such files, and by the time you finish writing the code against that library's poorly documented API, all you can think of is that the weekend has arrived just in time.

    On Monday your new task is to ensure that the pipelines have to be initialized from the same network folder where the files in PDF and other format will be dropped. You end up writing a frontend reader code which reads the file, checks the extension, and then chooses a more specific reader.   

    Next day, when you are told that Microsoft Excel and Word documents which may or may not be zipped will have to be parsed as well, you report back asking for the holidays...

    I'm sure you already know I've been preparing you for a couple of pieces of good news.

    The first one is the well-known fact that Apache Tika allows you to write generic code which can collect data from a massive number of text, binary, image and video formats. One only has to prepare or update the dependencies and configuration, and the same code can serve data from a variety of formats.

    The other and main news is that Apache Beam 2.2.0-SNAPSHOT now ships a new TikaIO module (thanks to my colleague JB for reviewing and merging the PR). With Apache Beam capable of running the pipelines on top of Spark, Flink and other runners and Apache Tika taking care of various file formats, you get the most flexible data streaming system.

    Do give it a try, help to improve TikaIO with new PRs, and if you are really serious about supporting a variety of the data formats in the pipelines, start planning on integrating it into your products :-)

    Enjoy!



    Categories: Sergey Beryozkin

    Integrating JSON Web Tokens with Kerberos using Apache Kerby

    Colm O hEigeartaigh - Mon, 09/11/2017 - 13:28
    JSON Web Tokens (JWTs) are a standard way of encapsulating a number of claims about a particular subject. Kerberos is a long-established and widely-deployed SSO protocol, used extensively in the Big-Data space in recent years. An interesting question is to examine how a JWT could be used as part of the Kerberos protocol. In this post we will consider one possible use-case, where a JWT is used to convey additional authorization information to the kerberized service provider.

    This use-case is based on a document available at HADOOP-10959, called "A Complement and Short Term Solution to TokenAuth Based on Kerberos Pre-Authentication Framework", written by Kai Zheng and Weihua Jiang of Intel (also see here).

    1) The test-case

    To show how to integrate JWTs with Kerberos we will use a concrete test-case available in my github repo here:
    • cxf-kerberos-kerby: This project contains a number of tests that show how to use Kerberos with Apache CXF, where the KDC used in the tests is based on Apache Kerby
    The test-case relevant to this blog entry is the JWTJAXRSAuthenticationTest. Here we have a trivial "double it" JAX-RS service implemented using Apache CXF, which is secured using Kerberos. An Apache Kerby-based KDC is launched which the client code uses to obtain a service ticket using JAAS (all done transparently by CXF), which is sent to the service code as part of the Authorization header when making the invocation.

    So far this is just a fairly typical example of a kerberized web-service request. What is different is that the service configuration requires a level of authorization above and beyond the kerberos ticket, by insisting that the user must have a particular role to access the web service. This is done by inserting the CXF SimpleAuthorizingInterceptor into the service interceptor chain. An authenticated user must have the "boss" role to access this service. 

    So we need somehow to convey the role of the user as part of the kerberized request. We can do this using a JWT as will be explained in the next few sections.

    2) High-level overview of JWT use-case with Kerberos
     
    As stated above, we need to convey some additional claims about the user to the service. This can be done by including a JWT containing those claims in the Kerberos service ticket. Let's assume that the user is in possession of a JWT that is issued by an IdP that contains a number of claims relating to that user (including the "role" as required by the service in our test-case). The token must be sent to the KDC when obtaining a service ticket.

    The KDC must validate the token (checking the signature is correct, and that the signing identity is trusted, etc.). The KDC must then extract some relevant information from the token and insert it somehow into the service ticket. The kerberos spec defines a structure that can be used for this purpose called the AuthorizationData, which consists of a "type" along with some data to be interpreted according to the "type". We can use this structure to insert the encoded JWT as part of the data.

    On the receiving side, the service can extract the AuthorizationData structure from the received ticket and parse it accordingly to retrieve the JWT, and obtain whatever claims are desired from this token accordingly.

    3) Sending a JWT Token to the KDC

    Let's take a look at how the test-case works in more detail, starting with the client. The test code retrieves a JWT for "alice" by invoking on the JAX-RS interface of the Apache CXF STS. The token contains the claim that "alice" has the "boss" role, which is required to invoke on the "double it" service. Now we need to send this token to the KDC to retrieve a service ticket for the "double it" service, with the JWT encoded in the ticket.

    This cannot be done by the built-in Java GSS implementation. Instead we will use Apache Kerby. Apache Kerby has been covered extensively on this blog (see for example here). As well as providing the implementation for the KDC used in our test-case, Apache Kerby provides a complete GSS implementation that supports tokens in the forthcoming 1.1.0 release. To use the Kerby GSS implementation we need to register the KerbyGssProvider as a Java security provider.

    To actually pass the JWT we got from the STS to the Kerby GSS layer, we need to use a custom version of the CXF HttpAuthSupplier interface. The KerbyHttpAuthSupplier implementation takes the JWT String, and creates a Kerby KrbToken class using it. This class is added to the private credential list of the current JAAS Subject. This way it will be available to the Kerby GSS layer, which will send the token to the KDC using Kerberos pre-authentication as defined in the document which is linked at the start of this post.

    4) Processing the received token in the KDC

    The Apache Kerby-based KDC extracts the JWT token from the pre-authentication data entry and verifies that it is signed and that the issuer is trusted. The KDC is configured in the test-case with a certificate to use for this purpose, and also with an issuer String against which the issuer of the JWT must match. If there is an audience claim in the token, then it must match the principal of the service for which we are requesting a ticket. 

    If the verification of the received JWT passes, then it is inserted into the AuthorizationData structure in the issued service ticket. The type that is used is a custom value defined here, as this behaviour is not yet standardized. The JWT is serialized and added to the data part of the token. Note that this behaviour is fully customizable.

    5) Processing the AuthorizationData structure on the service end

    After the service successfully authenticates the client, we have to access the AuthorizationData part of the ticket to extract the JWT. This can all be done using the Java APIs; Kerby is not required on the receiving side. The standard CXF interceptor for Kerberos is subclassed in the tests, to set up a custom CXF SecurityContext using the GssContext. By casting it to an ExtendedGSSContext, we can access the AuthorizationData and hence the JWT. The role claim is then extracted from the JWT and used to enforce the standard "isUserInRole" method of the CXF SecurityContext.

    If you are interested in exploring this topic further, please get involved with the Apache Kerby project, and help us to further improve and expand this integration between JWT and Kerberos.
    Categories: Colm O hEigeartaigh

    Securing Apache Hive - part III

    Colm O hEigeartaigh - Thu, 09/07/2017 - 12:45
    This is the third in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. In this post we will extend the authorization scenario by showing how Apache Ranger can be used to create policies to both mask and filter data returned in the Hive query.

    1) Data-masking with Apache Ranger

    As a pre-requisite to this tutorial, please follow the previous post to set up Apache Hive and to enforce an authorization policy for the user "alice" using Apache Ranger. Now let's imagine that we would like "alice" to be able to see the "counts", but not the actual words themselves. We can create a data-masking policy in Apache Ranger for this. Open a browser and log in at "http://localhost:6080" using "admin/admin" and click on the "cl1_hive" service that we have created in the previous tutorial.

    Click on the "Masking" tab and add a new policy called "WordMaskingPolicy", for the "default" database, "words" table and "word" column. Under the mask conditions, add the user "alice" and choose the "Redact" masking option. Save the policy and wait for it to by synced over to Apache Hive:


    Now try to login to beeline as "alice" and view the first five entries in the table:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
    • select * from words LIMIT 5;
    You should see that the characters in the "word" column have been masked (replaced by "x"s).



    2) Row-level filtering with Apache Ranger 

    Now let's imagine that we are happy for "alice" to view the "words" in the table, but that we would like to restrict her to words that start with a "D". The previous "access" policy we created for her allows her to view all "words" in the table. We can do this by specifying a row-level filter policy. Click on the "Masking" tab in the UI and disable the policy we created in the previous section.

    Now click on the "Row-level Filter" tab and create a new policy called "AliceFilterPolicy" on the "default" database, "words" table. Add a Row Filter condition for the user "alice" with row filter "word LIKE 'D%'". Save the policy and wait for it to by synced over to Apache Hive:


    Now try to login to beeline as "alice" as above. "alice" can successfully retrieve all entries where the words start with "D", but no other entries via:
    • select * from words where word like 'D%';
    Categories: Colm O hEigeartaigh

    Mostly On Topic: CXF and Swagger Integration Keeps Getting Better

    Sergey Beryozkin - Wed, 09/06/2017 - 14:50
    While thinking about a title for this post, I thought the current title, with the "Keeps Getting Better" finishing touch, might work well; I knew I had used a similar line before, and after looking through my posts I found it.

    Oh dear. I'm transported back to 2008: I can see myself, 9 years younger, walking to the Iona Technologies office, completely wired on trying to stop the Jersey JAX-RS domination :-), spotting an ad for the latest Christina Aguilera album at the exit from the Lansdowne Dart station and thinking it would be fun to blog about it and link to CXF; welcome to the start of the [OT] series. I'm not sure now if I'm more surprised that it was actually me who wrote that post or that 9 years later I'm still here, talking about CXF :-).

    Let me get back to the actual subject of this post. You know CXF started quite late with embracing Swagger, and I'm still getting nervous whenever I remind myself Swagger does not support 'matrix' parameters :-). But the Swagger team has done a massive effort through the years, my CXF hat is off to them.

    I'm happy to say that now Apache CXF offers one of the best Swagger2 integrations around, at the JSON only and UI levels and it just keeps getting better.

    We've talked recently with Dennis Kieselhorst and one can now configure Swagger2Feature with the external properties file which can be especially handy when this feature is auto-discovered.

    Just at the last minute we resolved an issue reported by a CXF user to do with accessing Swagger UI from the reverse proxies.

    Finally, Freeman contributed a java2swagger Maven plugin.

    Swagger 3 will be supported as soon as possible too.

    Enjoy!
    Categories: Sergey Beryozkin

    Apache CXF 3.2.0 NIO Extension

    Sergey Beryozkin - Thu, 08/31/2017 - 13:35
    In CXF 3.2.0 we have also introduced a server-side NIO extension which is based on the very first JAX-RS API prototype done by Santiago Pericas-Geertsen. The client NIO API prototype was not ready but the server one had some promising start. It was immediately implemented in CXF once a long-awaited 1st 2.1 API jar got published to Maven.

    However, once the JAX-RS 2.1 group finally resumed its work and started working on finalizing NIO API, the early NIO API was unfortunately dropped (IMHO it could've stayed as an entry point, 'easy' NIO API), while the new NIO API did not materialize primarily due to the time constraints of the JCP process.

    The spec leads did all they could but the timeline was too tight for them to make it right. As sad as it was, they made the right decision: rather than do something in a hurry, better to do it right at some later stage...

    It was easily the major omission from the final 2.1 API. How long will JAX-RS users have to wait until a new JAX-RS version is finalized and the new NIO API becomes available to them, given that it takes years for a major Java EE umbrella of various specs to be done?

    In the meantime the engineering minds in the SpringBoot, RxJava and other teams will come up with some brilliant new ways of doing it. They will be not one but several steps ahead.

    Which brings me to this point: if I were to offer a single piece of advice to Java EE process designers, I'd recommend they make sure that new features can easily be added after the EE release date, with minor EE releases embracing these new features following soon after, without waiting for N years. If it were an option then we could've seen a JAX-RS 2.2 NIO in, say, 6 months - just a dream at the moment, I know. The current mechanism, where EE users wait several years for some new features, is out of sync with the competitive reality of the software industry and only works because of the great teams doing EE, the EE users' loyalty and the power of the term 'standard'.

    Anyway, throwing away our own implementation of that NIO API prototype now gone from 2.1 API just because it immediately became the code supporting a non-standard feature was not a good idea.

    It offers an easy link to the Servlet 3.1 NIO extensions from the JAX-RS code and offers the real value. Thus the code stayed and is now available for the CXF users to experiment with.

    It's not very shiny but it will deliver. Seriously, if you need to have a massive InputStream copied to/from the HTTP connection with NIO and asynchronous callbacks involved, what else do you need but a simple and easy way to do it from the code? Well, nothing can be simpler than this option for sure.

    Worried a bit it is not a standard feature ? No, it is fine, doing it the CXF way is a standard :-)
      
    Categories: Sergey Beryozkin

    JAX-RS 2.1 is Released

    Sergey Beryozkin - Thu, 08/31/2017 - 13:33
    JAX-RS 2.1 (JSR 370) has been finally released and JAX-RS users can now start looking forward to experimenting with the new features very soon, with a number of final JAX-RS 2.1 implementations being already available (such as Jersey) or nearly ready to be released.

    Apache CXF 3.2.0 is about to be released shortly, and all of the new JAX-RS 2.1 features have been implemented:  reactive client API extensions, client/server Server Sent Events support, returning CompletableFuture from the resource methods and other minor improvements.

    As part of the 2.1 work (but also based on a CXF JIRA request) we also introduced RxJava Observable and, recently, RxJava2 Flowable/Observable client and server extensions. One can use them as an alternative to using CompletableFuture on the client and/or the server side. Note that the combination of RxJava2 Flowable with JAX-RS AsyncResponse on the server is quite cool.

    The other new CXF extension which was introduced as part of the JAX-RS 2.1 work is the NIO extension, this will be a topic of the next post.

    Pavel Bucek and Santiago Pericas-Geertsen were the great JAX-RS 2.1 spec leads. Andriy Redko spent a lot of his time with getting CXF 3.2.0 JAX-RS 2.1 ready.
    Categories: Sergey Beryozkin

    Oracle and JAVA EE

    Daniel Kulp - Fri, 08/18/2017 - 15:17
    Those of you in the “Java EE” may have already seen the announcement from Oracle that was posted yesterday concerning the future of Java EE. This is potentially very exciting news, particularly for the various Apache projects that implement some of the Java EE specs. Since Apache CXF implements a couple of the specs (JAX-WS […]
    Categories: Daniel Kulp

    Securing Apache Hive - part II

    Colm O hEigeartaigh - Tue, 08/01/2017 - 13:06
    This is the second post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. In this post we will show how to add authorization to the previous example using Apache Ranger.

    1) Install the Apache Ranger Hive plugin

    If you have not done so already, please follow the first post to install and configure Apache Hadoop and Apache Hive. Next download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
    • mvn clean package assembly:assembly -DskipTests
    • tar zxvf target/ranger-1.0.0-SNAPSHOT-hive-plugin.tar.gz
    • mv ranger-1.0.0-SNAPSHOT-hive-plugin ${ranger.hive.home}
    Now go to ${ranger.hive.home} and edit "install.properties". You need to specify the following properties:
    • POLICY_MGR_URL: Set this to "http://localhost:6080"
    • REPOSITORY_NAME: Set this to "cl1_hive".
    • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hive installation
    Save "install.properties" and install the plugin as root via "sudo -E ./enable-hive-plugin.sh". The Apache Ranger Hive plugin should now be successfully installed. Make sure that the default policy cache for the Hive plugin '/etc/ranger/cl1_hive/policycache' is readable by the user who is running the Hive server. Then restart the Apache Hive server to enable the authorization plugin.

    2) Create authorization policies in the Apache Ranger Admin console

    Next we will use the Apache Ranger admin console to create authorization policies for Apache Hive. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Ranger admin service via 'sudo ranger-admin start' and open a browser at 'http://localhost:6080', logging on with the credentials 'admin/admin'. Click the "+" button next to the "HIVE" logo and enter the following properties:
    • Service Name: cl1_hive
    • Username/Password: admin
    • jdbc.url: jdbc:hive2://localhost:10000
    Note that "Test Connection" won't work as the "admin" user will not have the necessary authorization to invoke on Hive at this point. Click "Add" to create the service. If you have not done so in a previous tutorial, click on "Settings" and then "Users/Groups" and add two new users called "alice" and "bob", who we will use to test authorization. Then go back to the newly created "cl1_hive" service, and click "Add new policy" with the following properties:
    • Policy Name: SelectWords
    • database: default
    • table: words
    • Hive column: *
    Then under "Allow Conditions", give "alice" the "select" permission and click "Add".


    3) Test authorization with Apache Hive

    Once our new policy has synced to '/etc/ranger/cl1_hive/policycache' we can test authorization in Hive. The user 'alice' can query the table according to our policy:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
    • select * from words where word == 'Dare'; (works)
    However, the user 'bob' is denied access:
    • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
    • select * from words where word == 'Dare'; (fails)
    Categories: Colm O hEigeartaigh

    Third party SSO support for Apache Syncope REST services

    Colm O hEigeartaigh - Fri, 07/28/2017 - 17:11
    A recent blog post covered SSO support for Apache Syncope REST services. This was a new feature added in the 2.0.3 release, which allows a user to obtain a JWT from the Syncope "accessTokens/login" REST endpoint. This token can then be used to repeatedly invoke on a Syncope REST service. However, what if you wish to allow a user to invoke on a Syncope REST service using a (JWT) token issued by a third party IdP instead? From Syncope 2.0.5 this will be possible.

    In this post we will cover how to use a JWT issued by a third-party to invoke on an Apache Syncope REST service. The code is available on github here:
    • cxf-syncope2-webapp: A pre-configured web application of the Syncope core for use in the tests.
    • cxf-syncope2: Some integration tests that use cxf-syncope2-webapp for authentication and authorization purposes. JWTTestIT illustrates third party SSO integration with Syncope as covered in this post.
    1) Configuring Apache Syncope to accept third-party JWTs

    Naturally, if we invoke on an Apache Syncope REST service using an arbitrary third-party token, access will be denied as Syncope will not be able to validate the signature on the token correctly. By default, Syncope uses the following properties defined in 'security.properties' to both issue and validate signed tokens:
    • jwtIssuer: The issuer of the token
    • jwsKey: The Hex-encoded (symmetric) verification key
    The default signature algorithm is the symmetric algorithm HS512. To allow third-party tokens we need to implement the JWTSSOProvider interface provided in Syncope. By default, Syncope searches for JWTSSOProvider implementations on the classpath under the package name "org.apache.syncope.core", so no explicit configuration changes are required to plug in a custom JWTSSOProvider implementation.

    When Syncope receives a signed JWT, it queries the configured JWTSSOProvider implementations to find one whose 'getIssuer()' method matches the issuer of the token. The 'getAlgorithm()' method should return the signature algorithm of the received token, and the 'verify' method should validate the signature of the received token. The implementation used in the tests is available here. A keystore is read in and the certificate contained in it is used to verify the signature on the received token.
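    The essential work behind such a 'verify' method can be pictured with plain JDK APIs as in the following simplified sketch (not the actual test implementation), assuming the third-party token is signed with RS256 and using illustrative keystore name, password and alias:

        import java.io.FileInputStream;
        import java.nio.charset.StandardCharsets;
        import java.security.KeyStore;
        import java.security.Signature;
        import java.security.cert.X509Certificate;
        import java.util.Base64;

        public final class JWTSignatureCheck {

            // Verify the signature of a compact JWT ("header.payload.signature") against
            // the certificate stored in a keystore (all names below are illustrative)
            public static boolean verify(String jwt) throws Exception {
                KeyStore keyStore = KeyStore.getInstance("JKS");
                try (FileInputStream fis = new FileInputStream("stsstore.jks")) {
                    keyStore.load(fis, "stsspass".toCharArray());
                }
                X509Certificate cert = (X509Certificate) keyStore.getCertificate("mystskey");

                // Split the compact token and base64url-decode the signature
                String[] parts = jwt.split("\\.");
                byte[] signature = Base64.getUrlDecoder().decode(parts[2]);
                byte[] signedContent = (parts[0] + "." + parts[1]).getBytes(StandardCharsets.US_ASCII);

                // RS256 corresponds to SHA256withRSA over the JWS signing input
                Signature verifier = Signature.getInstance("SHA256withRSA");
                verifier.initVerify(cert.getPublicKey());
                verifier.update(signedContent);
                return verifier.verify(signature);
            }
        }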

    One final interesting point is that we need to map the authenticated JWT subject to a user in Syncope somehow. This is done in the JWTSSOProvider implementation via the 'resolve' method. In our test implementation, we map the JWT subject directly to a Syncope username.

    2) Obtain a JWT from the Apache CXF STS using REST

    Now that we have set up Apache Syncope to allow third-party JWTs, we need to obtain such a token to get our test case to work. We will use the Apache CXF Security Token Service (STS) to obtain a JWT. For simplicity we will leverage the REST interface of the CXF STS, which allows us to obtain a token with a simple REST call. The STS is configured via Spring to issue signed JWTs, and user authentication to the STS is enforced via basic authentication. In the test code, we use the CXF WebClient to invoke on the STS and get a JWT back:
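    A minimal sketch of that call is shown below, assuming the STS REST interface is deployed at "https://localhost:8081/SecurityTokenService/token" and that "alice" authenticates with basic authentication (both the address and the credentials are illustrative rather than the values from the actual test):

        import org.apache.cxf.jaxrs.client.WebClient;

        public class ObtainTokenFromSTS {
            public static String obtainJWT() {
                // Illustrative address of the CXF STS REST interface
                String address = "https://localhost:8081/SecurityTokenService/token";

                // Authenticate to the STS with basic authentication; the last argument is an
                // optional CXF client configuration location (e.g. for TLS trust settings)
                WebClient client = WebClient.create(address, "alice", "security", null);
                client.accept("text/plain");

                // The REST interface exposes the requested token type as a path segment
                client.path("jwt");
                return client.get(String.class);
            }
        }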

    Now we can use this token with the Syncope client API to call the user "self service" successfully:
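    A rough sketch of that step follows, assuming the Syncope core is deployed at "http://localhost:9080/syncope/rest" and that the 2.0.x client factory can be given the JWT directly via a create(String) overload (the address and the exact factory method should be checked against the release in use):

        import org.apache.syncope.client.lib.SyncopeClient;
        import org.apache.syncope.client.lib.SyncopeClientFactoryBean;

        public class CallSelfService {
            public static void callSelf(String jwt) {
                // Build a client that authenticates with the third-party JWT
                // instead of a username/password (illustrative address)
                SyncopeClient client = new SyncopeClientFactoryBean()
                    .setAddress("http://localhost:9080/syncope/rest")
                    .create(jwt);

                // Invoke the user "self service"; this succeeds once the custom
                // JWTSSOProvider has verified the token and resolved the JWT subject
                // to a Syncope user
                client.self();
            }
        }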


    Categories: Colm O hEigeartaigh

    Securing Apache Hive - part I

    Colm O hEigeartaigh - Thu, 07/20/2017 - 18:30
    This is the first post in a series of articles on securing Apache Hive. In this article we will look at installing Apache Hive and doing some queries on data stored in HDFS. We will not consider any security requirements in this post, but the test deployment will be used by future posts in this series on authenticating and authorizing access to Hive.

    1) Install and configure Apache Hadoop

    The first step is to install and configure Apache Hadoop. Please follow section 1 of this earlier tutorial for information on how to do this. In addition, we need to configure two extra properties in 'etc/hadoop/core-site.xml':
    • hadoop.proxyuser.$user.groups: *
    • hadoop.proxyuser.$user.hosts: localhost
    where "$user" above should be replaced with the user that is going to run the hive server below. As we are not using authentication in this tutorial, this allows the $user to impersonate the "anonymous" user, who will connect to Hive via beeline and run some queries.

    Once HDFS has started, we need to create some directories for use by Apache Hive, and change the permissions appropriately:
    • bin/hadoop fs -mkdir -p /user/hive/warehouse /tmp
    • bin/hadoop fs -chmod g+w /user/hive/warehouse /tmp
    • bin/hadoop fs -mkdir /data
    The "/data" directory will hold a file which represents the output of a map-reduce job. For the purposes of this tutorial, we will use a sample output of the canonical "Word Count" map-reduce job on some text. The file consists of two columns separated by a tab character, where the left column is the word, and the right column is the total count associated with that word in the original document.

    I've uploaded such a sample output here. Download it and upload it to the HDFS data directory:
    • bin/hadoop fs -put output.txt /data
    2) Install and configure Apache Hive

    Now we will install and configure Apache Hive. Download and extract Apache Hive (2.1.1 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Now we will configure the metastore and start Hiveserver2:
    • bin/schematool -dbType derby -initSchema
    • bin/hiveserver2
    In a separate window, we will start beeline to connect to the Hive server, where $user is the user who is running Hadoop (this is necessary because we are going to create some data in HDFS and would otherwise not have the correct permissions):
    • bin/beeline -u jdbc:hive2://localhost:10000 -n $user
    Once connected, create a new Hive table called "words" and load the map-reduce output data into it:
    • create table words (word STRING, count INT) row format delimited fields terminated by '\t' stored as textfile;
    • LOAD DATA INPATH '/data/output.txt' INTO TABLE words;
    Now we can run some queries on the data as the anonymous user. Log out of beeline, then log back in (this time without the "-n" flag) and run some queries via:
    • bin/beeline -u jdbc:hive2://localhost:10000
    • select * from words where word == 'Dare';
    Categories: Colm O hEigeartaigh

    [OT] I Work with CXF and I Want It That Way

    Sergey Beryozkin - Thu, 07/13/2017 - 11:11
    The time has come for a regular OT post.

    The journey of the software developer is always about finding the home where he or she can enjoy being every day, can look forward to contributing to the bigger effort every day.

    In addition to that the journey of the web services developer is always about finding the web services framework which will help with creating the coolest HTTP service on the Web. We all know there are many quality HTTP service frameworks around.

    My software developer's journey so far has been mostly about supporting one such web services framework, Apache CXF. It has been a great journey.

    Some of you helped by using and contributing to Apache CXF earlier, some of you are long term Apache CXF users and contributors, preparing the ground for the new users and contributors who are yet to discover CXF.

    No matter which group you are in, even if you're no longer with CXF, I'm sure you've had that feeling at least once that you'd like your CXF experience to last forever :-).

    Listen to a message from the best boys band in the world. Enjoy :-)

      


    Categories: Sergey Beryozkin
