Latest Activity

Oracle and JAVA EE

Daniel Kulp - Fri, 08/18/2017 - 15:17
Those of you in the “Java EE” space may have already seen the announcement from Oracle that was posted yesterday concerning the future of Java EE. This is potentially very exciting news, particularly for the various Apache projects that implement some of the Java EE specs. Since Apache CXF implements a couple of the specs (JAX-WS […]
Categories: Daniel Kulp

Securing Apache Hive - part II

Colm O hEigeartaigh - Tue, 08/01/2017 - 13:06
This is the second post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. In this post we will show how to add authorization to the previous example using Apache Ranger.

1) Install the Apache Ranger Hive plugin

If you have not done so already, please follow the first post to install and configure Apache Hadoop and Apache Hive. Next download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-hive-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-hive-plugin ${ranger.hive.home}
Now go to ${ranger.hive.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "cl1_hive".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hive installation
Save "install.properties" and install the plugin as root via "sudo -E ./enable-hive-plugin.sh". The Apache Ranger Hive plugin should now be successfully installed. Make sure that the default policy cache for the Hive plugin '/etc/ranger/cl1_hive/policycache' is readable by the user who is running the Hive server. Then restart the Apache Hive server to enable the authorization plugin.

2) Create authorization policies in the Apache Ranger Admin console

Next we will use the Apache Ranger admin console to create authorization policies for Apache Hive. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Ranger admin service via 'sudo ranger-admin start' and open a browser at 'http://localhost:6080', logging on with the credentials 'admin/admin'. Click the "+" button next to the "HIVE" logo and enter the following properties:
  • Service Name: cl1_hive
  • Username/Password: admin
  • jdbc.url: jdbc:hive2://localhost:10000
Note that "Test Connection" won't work as the "admin" user will not have the necessary authorization to invoke on Hive at this point. Click "Add" to create the service. If you have not done so in a previous tutorial, click on "Settings" and then "Users/Groups" and add two new users called "alice" and "bob", who we will use to test authorization. Then go back to the newly created "cl1_hive" service, and click "Add new policy" with the following properties:
  • Policy Name: SelectWords
  • database: default
  • table: words
  • Hive column: *
Then under "Allow Conditions", give "alice" the "select" permission and click "Add".


3) Test authorization with Apache Hive

Once our new policy has synced to '/etc/ranger/cl1_hive/policycache' we can test authorization in Hive. The user 'alice' can query the table according to our policy:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
  • select * from words where word == 'Dare'; (works)
However, the user 'bob' is denied access (a JDBC-based version of the same check is sketched after these commands):
  • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
  • select * from words where word == 'Dare'; (fails)
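
If you prefer to drive the same check from Java rather than beeline, the following is a minimal sketch using the Hive JDBC driver - it assumes the hive-jdbc client jars are on the classpath, and the class name is illustrative only:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveAuthzCheck {
    public static void main(String[] args) throws Exception {
        // Connect to HiveServer2 as "alice" (no password, as authentication is not enabled)
        try (Connection con = DriverManager.getConnection("jdbc:hive2://localhost:10000", "alice", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("select * from words where word = 'Dare'")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getInt(2));
            }
        }
        // Repeating the query as "bob" should fail with an authorization error,
        // as no Ranger policy grants "bob" the "select" permission on "words"
    }
}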
Categories: Colm O hEigeartaigh

Third party SSO support for Apache Syncope REST services

Colm O hEigeartaigh - Fri, 07/28/2017 - 17:11
A recent blog post covered SSO support for Apache Syncope REST services. This was a new feature added in the 2.0.3 release, which allows a user to obtain a JWT from the Syncope "accessTokens/login" REST endpoint. This token can then be used to repeatedly invoke on a Syncope REST service. However, what if you wish to allow a user to invoke on a Syncope REST service using a (JWT) token issued by a third party IdP instead? From Syncope 2.0.5 this will be possible.

In this post we will cover how to use a JWT issued by a third-party to invoke on an Apache Syncope REST service. The code is available on github here:
  • cxf-syncope2-webapp: A pre-configured web application of the Syncope core for use in the tests.
  • cxf-syncope2: Some integration tests that use cxf-syncope2-webapp for authentication and authorization purposes. JWTTestIT illustrates third party SSO integration with Syncope as covered in this post.
1) Configuring Apache Syncope to accept third-party JWTs

Naturally, if we invoke on an Apache Syncope REST service using an arbitrary third-party token, access will be denied as Syncope will not be able to validate the signature on the token correctly. By default, Syncope uses the following properties defined in 'security.properties' to both issue and validate signed tokens:
  • jwtIssuer: The issuer of the token
  • jwsKey: The Hex-encoded (symmetric) verification key
The default signature algorithm is the symmetric algorithm HS512. To allow third-party tokens we need to implement the JWTSSOProvider interface provided in Syncope. By default, Syncope searches for JWTSSOProvider implementations on the classpath under the package name "org.apache.syncope.core", so no explicit configuration changes are required to plug in a custom JWTSSOProvider implementation.

When Syncope receives a signed JWT it will query which of the configured JWTSSOProvider implementations can verify the token, by matching the 'getIssuer()' method to the issuer of the token. The 'getAlgorithm()' method should match the signature algorithm of the received token. The 'verify' method should validate the signature of the received token. The implementation used in the tests is available here. A keystore is read in and the certificate contained in it is used to verify the signature on the received token. 
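
The core of such a 'verify' method can be sketched with the CXF JOSE API. The snippet below is only an illustration of the verification step (it is not the actual test implementation), and it assumes the third-party issuer signs with RS256 and that the verification certificate has already been loaded from the keystore:

import java.security.cert.X509Certificate;

import org.apache.cxf.rs.security.jose.jwa.SignatureAlgorithm;
import org.apache.cxf.rs.security.jose.jws.JwsJwtCompactConsumer;
import org.apache.cxf.rs.security.jose.jwt.JwtToken;

public class ThirdPartyTokenVerifier {

    // "cert" is the certificate extracted from the configured keystore
    public JwtToken verify(String rawToken, X509Certificate cert) {
        JwsJwtCompactConsumer consumer = new JwsJwtCompactConsumer(rawToken);
        // Check the JWS signature with the public key of the third-party issuer
        if (!consumer.verifySignatureWith(cert, SignatureAlgorithm.RS256)) {
            throw new SecurityException("Invalid token signature");
        }
        JwtToken token = consumer.getJwtToken();
        // The subject claim of the verified token will later be mapped to a Syncope username
        System.out.println("Authenticated subject: " + token.getClaims().getSubject());
        return token;
    }
}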

One final interesting point is that we need to map the authenticated JWT subject to a user in Syncope somehow. This is done in the JWTSSOProvider implementation via the 'resolve' method. In our test implementation, we map the JWT subject directly to a Syncope username.

2) Obtain a JWT from the Apache CXF STS using REST

Now that we have set up Apache Syncope to allow third-party JWTs, we need to obtain such a token to get our test-case to work. We will use the Apache CXF Security Token Service (STS) to obtain a JWT. For simplicity we will leverage the REST interface of the CXF STS, which allows us to obtain a token with a simple REST call. The STS is configured via spring to issue signed JWTs. User authentication to the STS is enforced via basic authentication. In the test code, we use the CXF WebClient to invoke on the STS and to get a JWT back:
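
A minimal sketch of such a call is shown below; the STS address, port, path and user credentials are assumptions made for illustration:

import org.apache.cxf.jaxrs.client.WebClient;

public class ObtainJwtFromSts {
    public static void main(String[] args) {
        // Create a client for the REST interface of the STS, authenticating with basic auth as "alice"
        WebClient client = WebClient.create(
            "https://localhost:8443/SecurityTokenService/token", "alice", "security", null);
        // Request a JWT in its compact (JWS) form
        String jwt = client.accept("text/plain").path("jwt").get(String.class);
        System.out.println("Issued JWT: " + jwt);
    }
}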

Now we can use this token with the Syncope client API to call the user "self service" successfully:
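
A rough equivalent using a plain CXF WebClient, passing the token in the 'X-Syncope-Token' header (the REST address is an assumption; the actual test uses the Syncope client API):

import org.apache.cxf.jaxrs.client.WebClient;

public class CallSyncopeSelfService {
    public static void main(String[] args) {
        String jwt = "...";   // the token obtained from the STS above
        WebClient client = WebClient.create("http://localhost:8080/syncope/rest/users/self");
        // Authenticate with the third-party JWT instead of a username/password
        String self = client.accept("application/json")
            .header("X-Syncope-Token", jwt)
            .get(String.class);
        System.out.println(self);
    }
}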


Categories: Colm O hEigeartaigh

Securing Apache Hive - part I

Colm O hEigeartaigh - Thu, 07/20/2017 - 18:30
This is the first post in a series of articles on securing Apache Hive. In this article we will look at installing Apache Hive and doing some queries on data stored in HDFS. We will not consider any security requirements in this post, but the test deployment will be used by future posts in this series on authenticating and authorizing access to Hive.

1) Install and configure Apache Hadoop

The first step is to install and configure Apache Hadoop. Please follow section 1 of this earlier tutorial for information on how to do this. In addition, we need to configure two extra properties in 'etc/hadoop/core-site.xml':
  • hadoop.proxyuser.$user.groups: *
  • hadoop.proxyuser.$user.hosts: localhost
where "$user" above should be replaced with the user that is going to run the hive server below. As we are not using authentication in this tutorial, this allows the $user to impersonate the "anonymous" user, who will connect to Hive via beeline and run some queries.

Once HDFS has started, we need to create some directories for use by Apache Hive, and change the permissions appropriately:
  • bin/hadoop fs -mkdir -p /user/hive/warehouse /tmp
  • bin/hadoop fs -chmod g+w /user/hive/warehouse /tmp
  • bin/hadoop fs -mkdir /data
The "/data" directory will hold a file which represents the output of a map-reduce job. For the purposes of this tutorial, we will use a sample output of the canonical "Word Count" map-reduce job on some text. The file consists of two columns separated by a tab character, where the left column is the word, and the right column is the total count associated with that word in the original document.

I've uploaded such a sample output here. Download it and upload it to the HDFS data directory:
  • bin/hadoop fs -put output.txt /data
2) Install and configure Apache Hive

Now we will install and configure Apache Hive. Download and extract Apache Hive (2.1.1 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Now we will configure the metastore and start Hiveserver2:
  • bin/schematool -dbType derby -initSchema
  • bin/hiveserver2
In a separate window, we will start beeline to connect to the hive server, where $user is the user who is running Hadoop (necessary as we are going to create some data in HDFS, and otherwise wouldn't have the correct permissions):
  • bin/beeline -u jdbc:hive2://localhost:10000 -n $user
Once we are connected, create a new Hive table called "words" and load the map-reduce output data into it:
  • create table words (word STRING, count INT) row format delimited fields terminated by '\t' stored as textfile;
  • LOAD DATA INPATH '/data/output.txt' INTO TABLE words;
Now we can run some queries on the data as the anonymous user. Log out of beeline and then back in and run some queries via:
  • bin/beeline -u jdbc:hive2://localhost:10000
  • select * from words where word == 'Dare';
Categories: Colm O hEigeartaigh

[OT] I Work with CXF and I Want It That Way

Sergey Beryozkin - Thu, 07/13/2017 - 11:11
The time has come for a regular OT post.

The journey of a software developer is always about finding the home where he or she can enjoy being every day, and can look forward to contributing to the bigger effort every day.

In addition to that the journey of the web services developer is always about finding the web services framework which will help with creating the coolest HTTP service on the Web. We all know there are many quality HTTP service frameworks around.

My software developer's journey so far has been mostly about supporting one of such web services frameworks, Apache CXF. It has been a great journey.

Some of you helped by using and contributing to Apache CXF earlier, some of you are long term Apache CXF users and contributors, preparing the ground for the new users and contributors who are yet to discover CXF.

No matter which group you are in, even if you're no longer with CXF, I'm sure you've had that feeling at least once that you'd like your CXF experience to last forever :-).

Listen to a message from the best boy band in the world. Enjoy :-)

  


Categories: Sergey Beryozkin

Multiple JWE Encryptions POC With Apache CXF in two hours

Sergey Beryozkin - Mon, 07/03/2017 - 15:12
The summer has been great so far, and as usual, instead of watching yet another sport event final, you've decided to catch up with your colleagues after work and do a new round of the Apache CXF JOSE coding. Nice idea they said.

The idea of  creating an application processing the content encrypted for the multiple recipients has captured your imagination.

After reviewing the CXF JWE JSON documentation you've decided to start with the following client code. This code creates a client proxy which posts some text.

The JWE JSON filter registered with the proxy will encrypt whatever content the proxy is sending (it does not have to be text) only once, and the content encryption key (CEK) will be encrypted with the recipient-specific key encryption keys. Thus if you have 2 recipients, the CEK will be encrypted twice.

Registering the jwejson1.properties  and jwejson2.properties with the proxy instructs the JWE JSON filter that a JWE JSON container for 2 recipients needs to be created, that the content encryption algorithm is A128GCM and key encryption algorithm is A128KW, and each recipient is using its own symmetric key encryption key. Each recipient specific entry will also include a 'kid' key identifier of the key encryption key for the service to figure out which JWE JSON entry is targeted at which recipient.

Setting up the client took you all one hour.

Next task was to prototype a service code. That was even easier. Loading the recipient specific properties, locating a recipient specific entry and getting the decrypted content was all what was needed.

Two hours in total. Note I did not promise it would take you 30 minutes to do the whole POC - that would've been child's play, which is not realistic. A two-hour project is more involved, yet it still felt like a walk in the park :-)



 
Categories: Sergey Beryozkin

Securing Apache Solr - part III

Colm O hEigeartaigh - Fri, 06/30/2017 - 16:18
This is the third post in a series of articles on securing Apache Solr. The first post looked at setting up a sample SolrCloud instance and securing access to it via Basic Authentication. The second post looked at how the Apache Ranger admin service can be configured to store audit information in Apache Solr. In this post we will extend the example in the first article to include authorization, by showing how to create and enforce authorization policies using Apache Ranger.

1) Install the Apache Ranger Solr plugin

The first step is to install the Apache Ranger Solr plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-${version}-solr-plugin.tar.gz
  • mv ranger-${version}-solr-plugin ${ranger.solr.home}
Now go to ${ranger.solr.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "solr_service".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Solr server directory
Save "install.properties" and install the plugin as root via "sudo -E ./enable-solr-plugin.sh". Make sure that the user who is running Solr can read the "/etc/ranger/solr_service/policycache". Now follow the first tutorial to get an example SolrCloud instance up and running with a "gettingstarted" collection. We will not enable the authorization plugin just yet.

2) Create authorization policies for Solr using the Apache Ranger Admin service

Now follow the second tutorial to download and install the Apache Ranger admin service. To avoid conflicting with the Solr example we are securing, we will skip the section about auditing to Apache Solr (sections 3 and 4). In addition, in section 5 the "audit_store" property can be left empty, and the Solr audit properties can be omitted. Start the Apache Ranger admin service via: "sudo ranger-admin start", and open a browser at "http://localhost:6080", logging on with "admin/admin" credentials. Click on the "+" button for the Solr service and create a new service with the following properties:
  • Service Name: solr_service
  • Username: alice
  • Password: SolrRocks
  • Solr URL: http://localhost:8983/solr
Hit the "Test Connection" button and it should show that it has successfully connected to Solr. Click "Add" and then click on the "solr_service" link that is subsequently created. We will grant a policy that allows "alice" the ability to read the "gettingstarted" collection. If "alice" is not already created, go to "Settings/User+Groups" and create a new user there. Delete the default policy that is created in the "solr_service" and then click on "Add new policy" and create a new policy called "gettingstarted_policy". For "Solr Collection" enter "g" here and the "gettingstarted" collection should pop up. Add a new "allow condition" granting the user "alice" the "others" and "query" permissions.




3) Test authorization using the Apache Ranger plugin for Solr

Now we are ready to enable the Apache Ranger authorization plugin for Solr. Download the following security configuration which enables Basic Authentication in Solr as well as the Apache Ranger authorization plugin:
Now upload this configuration to the Apache Zookeeper instance that is running with Solr:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json
 Now let's try to query the "gettingstarted" collection as 'alice':
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
This should be successful. However, authorization will fail for the case of "bob":
  • curl -u bob:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
In addition, although "alice" can query the collection, she can't write to it, and the following query will return 403:
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book4", "title_t" : "Hamlet", "author_s" : "William Shakespeare"}]'
Categories: Colm O hEigeartaigh

Securing Apache Solr - part II

Colm O hEigeartaigh - Tue, 06/27/2017 - 14:10
This is the second post in a series of articles on securing Apache Solr. The first post looked at setting up a sample SolrCloud instance and securing access to it via Basic Authentication. In this post we will temporarily deviate from the concept of "securing Apache Solr", and instead look at how the Apache Ranger admin service can be configured to store audit information in Apache Solr.

1) Download and extract the Apache Ranger admin service

The first step is to download the source code, as well as the signature file and associated message digests (all available on the download page). Verify that the signature is valid and that the message digests match. Now extract and build the source, and copy the resulting admin archive to a location where you wish to install the UI:
  • tar zxvf apache-ranger-incubating-1.0.0.tar.gz
  • cd apache-ranger-incubating-1.0.0
  • mvn clean package assembly:assembly 
  • tar zxvf target/ranger-1.0.0-admin.tar.gz
  • mv ranger-1.0.0-admin ${rangerhome}
2) Install MySQL

The Apache Ranger Admin UI requires a database to keep track of users/groups as well as policies for various big data projects that you are securing via Ranger. For the purposes of this tutorial, we will use MySQL. Install MySQL in $SQL_HOME and start MySQL via:
  • sudo $SQL_HOME/bin/mysqld_safe --user=mysql
Now you need to log on as the root user and create two users for Ranger. We need a root user with admin privileges (let's call this user "admin") and a user for the Ranger Schema (we'll call this user "ranger"):
  • CREATE USER 'admin'@'localhost' IDENTIFIED BY 'password';
  • GRANT ALL PRIVILEGES ON * . * TO 'admin'@'localhost' WITH GRANT OPTION;
  • CREATE USER 'ranger'@'localhost' IDENTIFIED BY 'password';
  • FLUSH PRIVILEGES;
Finally,  download the JDBC driver jar for MySQL and put it in ${rangerhome}.

3) Configure Apache Solr to support auditing from Ranger

Before installing the Apache Ranger admin service we will need to configure Apache Solr. The Apache Ranger admin service ships with a script to make this easier to configure. Edit 'contrib/solr_for_audit_setup/install.properties' with the following properties:
  • SOLR_USER/SOLR_GROUP - the user/group you are running solr as
  • SOLR_INSTALL_FOLDER - Where you have extracted Solr to as per the first tutorial.
  • SOLR_RANGER_HOME - Where to install the Ranger configuration for Solr auditing.
  • SOLR_RANGER_PORT - The port to be used (8983 as per the first tutorial).
  • SOLR_DEPLOYMENT - solrcloud
  • SOLR_HOST_URL - http://localhost:8983
  • SOLR_ZK - localhost:2181
Make sure that the user running Solr has permission to write to the value configured for "SOLR_LOG_FOLDER" (/var/log/solr/ranger_audits). Now in 'contrib/solr_for_audit_setup' run 'sudo -E ./setup.sh'. The Solr configuration is now copied to $SOLR_RANGER_HOME.

4) Start Apache Zookeeper and SolrCloud

Before starting Apache Solr we will need to start Apache Zookeeper. Download Apache Zookeeper and start it on port 2181 via (this step was not required in the previous tutorial as we were launching SolrCloud with an embedded Zookeeper instance):
  • bin/zkServer.sh start
As per the first post, we want to secure access to SolrCloud via Basic Authentication (note that this is only recently fixed in Apache Ranger). So follow the steps in this post to upload the security.json to Zookeeper via:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /security.json security.json
Start Solr as follows in the '${SOLR_RANGER_HOME}/ranger_audit_server/scripts' directory:
  • ./add_ranger_audits_conf_to_zk.sh 
  • ./start_solr.sh
Edit 'create_ranger_audits_collection.sh' and change 'curl --negotiate -u :' to 'curl -u "alice:SolrRocks"'. Save it and then run:
  • ./create_ranger_audits_collection.sh
5) Install the Apache Ranger Admin UI

Edit ${rangerhome}/install.properties and make the following changes:
  • Change SQL_CONNECTOR_JAR to point to the MySQL JDBC driver jar that you downloaded above.
  • Set (db_root_user/db_root_password) to (admin/password)
  • Set (db_user/db_password) to (ranger/password)
  • audit_solr_urls: http://localhost:8983/solr/ranger_audits
  • audit_solr_user: alice
  • audit_solr_password: SolrRocks
  • audit_solr_zookeepers: localhost:2181
Now you can run the setup script via "sudo -E ./setup.sh". When this is done then start the Apache Ranger admin service via: "sudo ranger-admin start".

6) Test that auditing is working correctly in the Ranger Admin service

Open a browser and navigate to "http://localhost:6080". Try to log on first using some made up credentials. Then log in using "admin/admin". Click on the "Audit" tab and then select "Login Sessions". You should see the incorrect and the correct login attempts, meaning that ranger is successfully storing and retrieving audit information in Solr:


Categories: Colm O hEigeartaigh

Securing Apache Solr - part I

Colm O hEigeartaigh - Mon, 06/26/2017 - 11:46
This is the first post in a series of articles on securing Apache Solr. In this post we will look at deploying an example SolrCloud instance and securing access to it via basic authentication.

1) Install and deploy a SolrCloud example

Download and extract Apache Solr (6.6.0 was used for the purpose of this tutorial). Now start SolrCloud via:
  • bin/solr -e cloud
Accept all of the default options. This creates a cluster of two nodes, with a collection "gettingstarted" split into two shards and two replicas per-shard. A web interface is available after startup at: http://localhost:8983/solr/.

Once the cluster is up and running we can post some data to the collection we have created via the REST interface:
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book1", "title_t" : "The Merchant of Venice", "author_s" : "William Shakespeare"}]'
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book2", "title_t" : "Macbeth", "author_s" : "William Shakespeare"}]'
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book3", "title_t" : "Death of a Salesman", "author_s" : "Arthur Miller"}]'
We can query the REST interface to, for example, return all entries by William Shakespeare as follows:
  • curl http://localhost:8983/solr/gettingstarted/query?q=author_s:William+Shakespeare
2) Authenticating users to our SolrCloud instance

Now that our SolrCloud instance is up and running, let's look at how we can secure access to it, by using HTTP Basic Authentication to authenticate our REST requests. Download the following security configuration which enables Basic Authentication in Solr:
Two users are defined - "alice" and "bob" - both with password "SolrRocks". Now upload this configuration to the Apache Zookeeper instance that is running with Solr:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json
Now try to run the search query above again using curl. A 401 error will be returned. Once we specify the correct credentials the request will work as expected (a SolrJ equivalent is sketched below), e.g.:
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
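
The same authenticated query can also be made programmatically with SolrJ - a minimal sketch, assuming the solr-solrj 6.x dependency is on the classpath:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SolrBasicAuthQuery {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr/gettingstarted").build()) {
            SolrQuery query = new SolrQuery("author_s:\"Arthur Miller\"");
            QueryRequest request = new QueryRequest(query);
            // Supply the Basic Authentication credentials defined in security.json
            request.setBasicAuthCredentials("alice", "SolrRocks");
            QueryResponse response = request.process(client);
            response.getResults().forEach(doc -> System.out.println(doc.getFieldValue("title_t")));
        }
    }
}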
Categories: Colm O hEigeartaigh

SSO support for Apache Syncope REST services

Colm O hEigeartaigh - Thu, 06/22/2017 - 18:05
Apache Syncope has recently added SSO support for its REST services in the 2.0.3 release. Previously, access to the REST services of Syncope was via HTTP Basic Authentication. From the 2.0.3 release, SSO support is available using JSON Web Tokens (JWT). In this post, we will look at how this works and how it can be configured.

1) Obtaining an SSO token from Apache Syncope

As stated above, in the past it was necessary to supply HTTP Basic Authentication credentials when invoking on the REST API. Let's look at an example using curl. Assume we have a running Apache Syncope instance with a user "alice" with password "ecila". We can make a GET request to the user self service via:
  • curl -u alice:ecila http://localhost:8080/syncope/rest/users/self
It may be inconvenient to supply user credentials on each request, or the authentication process might not scale very well if we are authenticating the password against a backend resource on every call. From Apache Syncope 2.0.3, we can instead get an SSO token by sending a POST request to "accessTokens/login" as follows:
  • curl -I -u alice:ecila -X POST http://localhost:8080/syncope/rest/accessTokens/login
The response contains two headers:
  • X-Syncope-Token: A JWT token signed according to the JSON Web Signature (JWS) spec.
  • X-Syncope-Token-Expire: The expiry date of the token
The token in question is signed using the (symmetric) "HS512" algorithm. It contains the subject "alice" and the issuer of the token ("ApacheSyncope"), as well as a random token identifier, and timestamps that indicate when the token was issued, when it expires, and when it should not be accepted before.
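
To see these claims for yourself, the returned token can be parsed (without verifying the signature) using the Apache CXF JOSE API - a minimal sketch:

import org.apache.cxf.rs.security.jose.jws.JwsJwtCompactConsumer;
import org.apache.cxf.rs.security.jose.jwt.JwtClaims;

public class InspectSyncopeToken {
    public static void main(String[] args) {
        String token = "...";   // the value of the X-Syncope-Token response header
        JwsJwtCompactConsumer consumer = new JwsJwtCompactConsumer(token);
        JwtClaims claims = consumer.getJwtToken().getClaims();
        System.out.println("Subject:   " + claims.getSubject());
        System.out.println("Issuer:    " + claims.getIssuer());
        System.out.println("Issued at: " + claims.getIssuedAt());
        System.out.println("Expiry:    " + claims.getExpiryTime());
    }
}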

The signing key and the issuer name can be changed by editing 'security.properties' and specifying new values for 'jwsKey' and 'jwtIssuer'. Please note that it is critical to change the signing key from the default value! It is also possible to change the signature algorithm from the next 2.0.4 release via a custom 'securityContext.xml' (see here). The default lifetime of the token (120 minutes) can be changed via the "jwt.lifetime.minutes" configuration property for the domain.

2) Using the SSO token to invoke on a REST service

Now that we have an SSO token, we can use it to invoke on a REST service instead of specifying our username and password as before, e.g.:
  • curl -H "X-Syncope-Token: eyJ0e..." http://localhost:8080/syncope/rest/users/self
The signature is first checked on the token, then the issuer is verified so that it matches what is configured, and then the expiry and not-before dates are checked. If the identifier matches that of a saved access token then authentication is successful.

Finally, SSO tokens can be seen in the admin console under "Dashboard/Access Token", where they can be manually revoked by the admin user:


Categories: Colm O hEigeartaigh

Querying Apache HBase using Talend Open Studio for Big Data

Colm O hEigeartaigh - Mon, 06/19/2017 - 18:23
Recent blog posts have described how to set up authorization for Apache HBase using Apache Ranger. However, the posts just covered inputting and reading data using the HBase Shell. In this post, we will show how Talend Open Studio for Big Data can be used to read data stored in Apache HBase. This post is along the same lines as other recent tutorials on reading data from Kafka and HDFS.

1) HBase setup

Follow this tutorial on setting up Apache HBase in standalone mode, and creating a 'data' table with some sample values using the HBase Shell.

2) Download Talend Open Studio for Big Data and create a job

Now we will download Talend Open Studio for Big Data (6.4.0 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and call the job "HBaseRead". In the search bar on the right-hand side, enter "hbase" and hit enter. Drag "tHBaseConnection" and "tHBaseInput" onto the palette, as well as "tLogRow".

"tHBaseConnection" is used to set up the connection to "HBase", "tHBaseInput" uses the connection to read data from HBase, and "tLogRow" will log the data that was read so that we can see that the job ran successfully. Right-click on "tHBaseConnection" and select "Trigger/On Subjob Ok" and drag the resulting arrow to the "tHBaseInput" component. Now right click on "tHBaseInput" and select "Row/Main" and drag the arrow to "tLogRow".
3) Configure the components

Now let's configure the individual components. Double click on "tHBaseConnection" and select the distribution "Hortonworks" and Version "HDP V2.5.0" (from an earlier tutorial we are using HBase 1.2.6). We are not using Kerberos here so we can skip the rest of the security configuration. Now double click on "tHBaseInput". Select the "Use an existing connection" checkbox. Now hit "Edit Schema" and add two entries to map the column we created in two different column families: "c1" which matches DB "col1" of type String, and "c2" which matches DB "col1" of type String.


Select "data" for the table name back in tHBaseInput and add a mapping for "c1" to "colfam1", and "c2" to "colfam2".


Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. You should see "val1" and "val2" appear in the console window.
Categories: Colm O hEigeartaigh

How to do JOSE in Apache CXF service code

Sergey Beryozkin - Fri, 06/16/2017 - 17:47
This blog entry continues the series started with the introduction to Apache CXF JOSE implementation followed recently with the post talking about the signing of HTTP attachments.

So CXF ships JOSE filters which can protect the application data by wrapping it into JOSE JWS or JWE envelopes, or verify that the data has been properly encrypted and/or signed. In these cases the application code is not even aware that the JOSE processors are involved.

How would one approach the task of signing/verifying and/or encrypting/decrypting the data directly in the application code ? For example, what if an individual property of the bigger payload needs to be JOSE protected ?

The most obvious approach is to use either CXF JOSE or the preferred 3rd party library to deal with the JOSE primitives in the application code. This is Option 1. It is a must if one needs closer control over the JOSE envelope creation process.

Or you can basically do nearly nothing at all and let CXF handle it for you, this is Option 2. This is a CXF Way Option - make it as easy as possible for the users to embrace the advanced technologies fast. It is not though only about making it easy - but is also about having a more flexible and even portable JOSE-aware code.

In this case such requirements as "sign only" or "encrypt only" or "sign and encrypt" (and similarly for "verify/decrypt") are not encoded in the code - they are managed at the time of configuring the JOSE helpers from the application contexts (by default they only sign/verify).

Likewise, the signature and encryption algorithm and key properties are controlled externally.

I know, it is hard to believe that it can be so easy. Try it to believe it. Enjoy !



Categories: Sergey Beryozkin

Securing Apache HBase - part II

Colm O hEigeartaigh - Wed, 06/14/2017 - 18:42
This is the second (and final for now) post in a short series of blog posts on securing Apache HBase. The first post looked at how to set up a standalone instance of HBase and how to authorize access to a table using Apache Ranger. In this post, we will look at how Apache Ranger can create "tag" based authorization policies for Apache HBase using Apache Atlas.

1) Start Apache Atlas and create entities/tags for HBase

First let's look at setting up Apache Atlas. Download the latest released version (0.8-incubating) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
The distribution will then be available in 'distro/target/apache-atlas-0.8-incubating-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
  • export MANAGE_LOCAL_HBASE=true
  • export MANAGE_LOCAL_SOLR=true
Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "customer_data". Now click on "Search" and then follow the "Create new entity" link of type "hbase_table" with the following parameters:
  • Name: data
  • QualifiedName: data@cl1
  • Uri: data
Now add the 'customer_data' tag to the entity that we have created.

2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

To create tag based policies in Apache Ranger, we have to import the entity + tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. After building Apache Ranger, extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows:
  • Set TAG_SOURCE_ATLAS_ENABLED to "false"
  • Set TAG_SOURCE_ATLASREST_ENABLED to  "true" 
  • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
  • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". Start the Apache Ranger admin service via "sudo ranger-admin start" and then the tagsync service via "sudo ranger-tagsync-services.sh start".

3) Create Tag-based authorization policies in Apache Ranger

Now let's create a tag-based authorization policy in the Apache Ranger admin UI. Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "HBaseTagService". Create a new policy for this service called "CustomerDataPolicy". In the "TAG" field enter a "c" and the "customer_data" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "bob" with the "Read" permission for the "HBase" component.

We also need to go back to the Resource based policies, edit "cl1_hbase" and select the tag service we have created above. Now we are ready to test the authorization policy we have created with HBase. Start the shell as "bob" and we should be able to read the table we created in the first tutorial:
  • sudo -E -u bob bin/hbase shell
  • scan 'data'
Categories: Colm O hEigeartaigh

Securing Apache HBase - part I

Colm O hEigeartaigh - Tue, 06/13/2017 - 14:18
This is the first in a short series of blog posts on securing Apache HBase. HBase is a column-based database that facilitates random read/write access to data stored in the Hadoop FileSystem (HDFS). In this post we will focus on setting up a standalone instance of Apache HBase, and then demonstrate how to use Apache Ranger to authorize access to a HBase table.

1) Install Apache HBase

Download Apache HBase (version 1.2.6 was used for the purposes of this tutorial) and extract it. As stated above, we will set up a standalone version of HBase, which means that HBase itself and Apache Zookeeper run in a single JVM, and data is stored in the local filesystem instead of HDFS. Normally we would authenticate users via Kerberos, but as we are just running HBase in standalone mode, we will focus solely on authorization in this series of tutorials. Start HBase via:
  • bin/start-hbase.sh
Then start the shell and create a sample table called "data", with two column families, and add some rows to the table:
  • bin/hbase shell
  • create 'data', 'colfam1', 'colfam2'
  • put 'data', 'row1', 'colfam1:col1', 'val1'
  • put 'data', 'row1', 'colfam2:col1', 'val2'
  • scan 'data'
The latter command will print out the values stored in the table. Next we will look at using Apache Ranger to restrict access to the 'data' table to authorized users only.

2) Install the Apache Ranger HBase plugin 

Download Apache Ranger and verify that the signature is valid and that the message digests match. Extract and build the source, and copy the resulting plugin to a location where you will configure and install it, e.g.:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-hbase-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-hbase-plugin ${ranger.hbase.home}
Now go to ${ranger.hbase.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "cl1_hbase".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache HBase installation
Save "install.properties" and install the plugin as root via "sudo ./enable-hbase-plugin.sh". The Apache Ranger HBase plugin should now be successfully installed. The ranger plugin will try to store policies by default in "/etc/ranger/cl1_hbase/policycache". As we installed the plugin as "root" make sure that this directory is accessible to the user that is running HBase.

3) Configure authorization policies in the Apache Ranger Admin UI 

The next step is to create some authorization policies for Apache HBase in the Apache Ranger admin service. Please refer to this blog post for information on how to install the Apache Ranger admin service. Assuming the admin service is already installed, start it via "sudo ranger-admin start". Open a browser and log on to "localhost:6080" with the credentials "admin/admin".

Create a new HBase service, adding the following configuration items to the default values:
  • Service Name: cl1_hbase
  • Username/Password: admin
  • hbase.zookeeper.quorum: localhost
Click on "Test Connection" (if HBase is running) to verify that the connection is successful (note: only works from 1.0.0 onwards - see RANGER-1640) and then save the service. Click on "cl1_hbase" and edit the default policy which has been created, and add the user running HBase to the "Allow Condition" permission.

Now we will add a new authorization policy to test access to HBase. Under "Settings + Users/Groups" add two new users called "alice" and "bob", and also create these new users in your local system. Now we can create a new authorization policy to grant "alice" the "Read" permission for the "data" table (all column families and columns).



4) Testing authorization in HBase

The policy we have created above will get downloaded and enforced by the Ranger HBase plugin we installed into HBase. Restart HBase before proceeding further (if HBase was running with the Ranger plugin before the policy granting the user running HBase "admin" privileges was downloaded, it might not be working properly). Now start the shell as "alice" and try to read the table we created earlier:
  • sudo -E -u alice bin/hbase shell
  • scan 'data'
This should work due to the authorization policy we created. However, "alice" should not be allowed to write to "data", e.g. the following should result in an "AccessDeniedException" (a programmatic version of these checks is sketched after this command):
  • put 'data', 'row1', 'colfam1:col1', 'val3'
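
For reference, the same read can be performed with the HBase Java client API - a minimal sketch, to be run as the "alice" OS user (with no Kerberos in place, HBase derives the client identity from the logged-in user):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;

public class ScanDataTable {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        conf.set("hbase.zookeeper.quorum", "localhost");
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("data"));
             ResultScanner scanner = table.getScanner(new Scan())) {
            // Print each row of the 'data' table - the "Read" permission makes this succeed
            for (Result result : scanner) {
                System.out.println(result);
            }
        }
        // A Put against the same table by "alice" would fail with an AccessDeniedException
    }
}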
Categories: Colm O hEigeartaigh

Securing Apache Storm - part IV

Colm O hEigeartaigh - Tue, 06/06/2017 - 16:20
This is the fourth and final post in a series of blog posts on securing Apache Storm. The first post looked at setting up a simple Storm cluster that authenticates users via Kerberos, and deploying a topology. The second post looked at deploying the Storm UI using Kerberos, and accessing it via a REST client. The third post looked at how to use Apache Ranger to authorize access to Apache Storm.  In this post, we will look at how Apache Ranger can create "tag" based authorization policies for Apache Storm using Apache Atlas.

1) Start Apache Atlas and create entities/tags for Storm

First let's look at setting up Apache Atlas. Download the latest released version (0.8-incubating) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
    The distribution will then be available in 'distro/target/apache-atlas-0.8-incubating-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
    • export MANAGE_LOCAL_HBASE=true
    • export MANAGE_LOCAL_SOLR=true
    Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "user_topologies".  Unlike for HDFS or Kafka, Atlas doesn't provide an easy way to create a Storm Entity in the UI. Instead we can use the following json file to create a Storm Entity for "*" topologies:

    You can upload it to Atlas via:
    • curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json;  charset=UTF-8' -u admin:admin -d @storm-create.json http://localhost:21000/api/atlas/entities
    Once the new entity has been uploaded, then you can search for it in the Atlas UI, then click on "+" beside "Tags" and associate the new entity with the "user_topologies" tag.

    2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

    To create tag based policies in Apache Ranger, we have to import the entity + tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. After building Apache Ranger, extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows:
    • Set TAG_SOURCE_ATLAS_ENABLED to "false"
    • Set TAG_SOURCE_ATLASREST_ENABLED to  "true" 
    • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
    • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
    Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". Start the Apache Ranger admin service via "sudo ranger-admin start" and then the tagsync service via "sudo ranger-tagsync-services.sh start".

    3) Create Tag-based authorization policies in Apache Ranger

    Now let's create a tag-based authorization policy in the Apache Ranger admin UI. Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "StormTagService". Create a new policy for this service called "UserTopologiesPolicy". In the "TAG" field enter a "u" and the "user_topologies" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "alice" with all of the component permissions for "Storm":


    We also need to go back to the Resource based policies, edit "cl1_storm" and select the tag service we have created above. Finally, edit the existing "cl1_storm" policy created as part of the previous tutorials, and remove the permissions for "alice" there, so that we can be sure that authorization is working correctly. Then follow the first tutorial and verify that "alice" is authorized to deploy a topology as per the tag-based authorization policy we have created in Ranger.
    Categories: Colm O hEigeartaigh

    Securing Apache Storm - part III

    Colm O hEigeartaigh - Fri, 06/02/2017 - 18:41
    This is the third in a series of blog posts on securing Apache Storm. The first post looked at setting up a simple Storm cluster that authenticates users via Kerberos, and deploying a topology. The second post looked at deploying the Storm UI using Kerberos, and accessing it via a REST client. Thus far we have only looked at how to authenticate users to Storm using Kerberos. In this post we will look at how to use Apache Ranger to authorize access to Apache Storm.

    1) Install the Apache Ranger Storm plugin
     
    Follow the steps in the first tutorial (parts 1 - 3) to setup the Apache Kerby testcase, Apache Zookeeper instance, and the Apache Storm distribution, if you have not done this already. Now we will install the Apache Ranger Storm plugin. If you want to be able to download the topologies from Storm to Ranger when creating policies, then follow the second tutorial to start the Storm UI.

    Download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
    • mvn clean package assembly:assembly -DskipTests
    • tar zxvf target/ranger-1.0.0-SNAPSHOT-storm-plugin.tar.gz
    • mv ranger-1.0.0-SNAPSHOT-storm-plugin ${ranger.storm.home}
    Now go to ${ranger.storm.home} and edit "install.properties". You need to specify the following properties:
    • POLICY_MGR_URL: Set this to "http://localhost:6080"
    • REPOSITORY_NAME: Set this to "cl1_storm".
    • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Storm installation
    Save "install.properties" and install the plugin as root via "sudo ./enable-hdfs-plugin.sh". The Apache Ranger Storm plugin should now be successfully installed. Now start Kerby, Zookeeper and Storm as covered in the first tutorial.

    2) Create authorization policies in the Apache Ranger Admin console

    Next we will use the Apache Ranger admin console to create authorization policies for Apache Storm. Follow the steps in this tutorial to install the Apache Ranger admin service. To retrieve the running topologies from Apache Storm, you need to configure Kerberos appropriately for Apache Ranger. You can first point to the Kerby krb5.conf via:
    • export JAVA_OPTS="-Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf"
    Start the Apache Ranger admin service with "sudo -E ranger-admin start" and open a browser and go to "http://localhost:6080/" and log on with "admin/admin". Add a new Storm service with the following configuration values:
    • Service Name: cl1_storm
    • Username: storm-client
    • Password: storm-client
    • Nimbus URL: http://localhost:8080
    Click on "Test Connection" to verify that we can connect successfully to Storm  + then save the new service. Now click on the "cl1_storm" service that we have created. Edit the existing policy for the "*" Storm topology, adding the user "alice" (create this user if you have not done so already under "Settings, Users/Groups") to all of the available permissions.

    3) Testing authorization in Storm

    Now let's test the Ranger authorization policy we created above in action. The Ranger authorization plugin will pull policies from the Admin service every 30 seconds by default. For the "cl1_storm" example above, they are stored in "/etc/ranger/cl1_storm/policycache/" by default. Make sure that the user you are running Storm as can access this directory. To test authorization follow step 4 in the first tutorial, but use the user "storm-client" instead (and "storm_client.keytab"). You should see an authorization exception. Now try again with user "alice" (and "alice.keytab") and authorization should succeed.
    Categories: Colm O hEigeartaigh

    Securing Apache Storm - part II

    Colm O hEigeartaigh - Wed, 05/31/2017 - 13:16
    This is the second in a series of tutorials on securing Apache Storm. The first post looked at setting up a simple Storm cluster that authenticates users via Kerberos, and deploying a topology. Apache Storm also ships with a UI (and REST API) that can be used to download configuration, start/stop topologies, etc. This post looks at deploying the Storm UI using Kerberos, and accessing it via a REST client.

    1) Configure the Apache Storm UI

    The first step is to follow the previous tutorial to deploy the Apache Kerby KDC, to configure Apache Zookeeper, and to download and deploy Apache Storm (sections 1-3). Note that there is a bug in Kerby that is not yet fixed in the 1.0.0 release that you might run into when using curl (see below), depending on whether the MIT libraries are installed or not. In addition to the principals listed in the last post, the Kerby deployment test for Storm also contains a principal for the Storm UI (HTTP/localhost@storm.apache.org).

    Now edit 'conf/storm.yaml' and add the following properties:
    • ui.filter: "org.apache.hadoop.security.authentication.server.AuthenticationFilter"
    •  ui.filter.params:
      • "type": "kerberos"
      • "kerberos.principal": "HTTP/localhost@storm.apache.org"
      • "kerberos.keytab": "/path.to.kerby.project/target/http.keytab"
      • "kerberos.name.rules": "RULE:[2:$1@$0]([jt]t@.*EXAMPLE.COM)s/.*/$MAPRED_USER/ RULE:[2:$1@$0]([nd]n@.*EXAMPLE.COM)s/.*/$HDFS_USER/DEFAULT"
    Start the UI with:
    • bin/storm ui
    2) Invoke on the Storm UI REST API

    We will invoke on the Storm UI REST API using "curl" on the command line. This can be done as follows:
    • export KRB5_CONFIG=/path.to.kerby.project/target/krb5.conf
    • kinit -k -t /path.to.kerby.project/target/alice.keytab alice
    • curl --negotiate -u : -b ~/cookiejar.txt -c ~/cookiejar.txt http://localhost:8080/api/v1/cluster/configuration
    You should see the cluster configuration in JSON format if the call is successful.
    Categories: Colm O hEigeartaigh

    Securing Apache Storm - part I

    Colm O hEigeartaigh - Fri, 05/26/2017 - 18:01
    This is the first tutorial in a planned three part series on securing Apache Storm. In this post we will look at setting up a simple Storm cluster that authenticates users via Kerberos, and how to run a simple topology on it. Future posts will cover authorization using Apache Ranger. For more information on how to setup Kerberos for Apache Storm, please see the following documentation.

    1) Set up a KDC using Apache Kerby

    As for other kerberos-related tutorials that I have written on this blog, we will use a github project I wrote that uses Apache Kerby to start up a KDC:
    • bigdata-kerberos-deployment: This project contains some tests which can be used to test kerberos with various big data deployments, such as Apache Hadoop etc.
    The KDC is a simple junit test that is available here. To run it just comment out the "org.junit.Ignore" annotation on the test method. It uses Apache Kerby to define the following principals:
    • zookeeper/localhost@storm.apache.org
    • zookeeper-client@storm.apache.org
    • storm/localhost@storm.apache.org
    • storm-client@storm.apache.org
    • alice@storm.apache.org
    Keytabs are created in the "target" folder. Kerby is configured to use a random port to launch the KDC each time, and it will create a "krb5.conf" file containing the random port number in the target directory.

    2) Download and configure Apache Zookeeper

    Apache Storm uses Apache Zookeeper to help coordinate the cluster. Download Apache Zookeeper (this tutorial used 3.4.10) and extract it to a local directory. Configure Zookeeper to use Kerberos by adding a new file 'conf/zoo.cfg' with the following properties:
    • dataDir=/tmp/zookeeper
    • clientPort=2181
    • authProvider.1 = org.apache.zookeeper.server.auth.SASLAuthenticationProvider
    • requireClientAuthScheme=sasl 
    • jaasLoginRenew=3600000 
    Now create 'conf/zookeeper.jaas' with the following content:

    Server {
            com.sun.security.auth.module.Krb5LoginModule required refreshKrb5Config=true useKeyTab=true keyTab="/path.to.kerby.project/target/zookeeper.keytab" storeKey=true principal="zookeeper/localhost";
    };

    Before launching Zookeeper, we need to point to the JAAS configuration file above and also to the krb5.conf file generated in the Kerby test-case above. Add a new file 'conf/java.env', setting the SERVER_JVMFLAGS property to include:
    • -Djava.security.auth.login.config=/path.to.zookeeper/conf/zookeeper.jaas
    • -Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf
    Start Zookeeper via:
    • bin/zkServer.sh start
    3) Download and configure Apache Storm

    Now download and extract the Apache Storm distribution (1.1.0 was used in this tutorial). Edit 'conf/storm.yaml' and edit the following properties:
    • For "storm.zookeeper.servers" add "- localhost"
    • nimbus.seeds: ["localhost"]
    • storm.thrift.transport: "org.apache.storm.security.auth.kerberos.KerberosSaslTransportPlugin"
    • java.security.auth.login.config: "/path.to.storm/conf/storm.jaas"
    • storm.zookeeper.superACL: "sasl:storm"
    • nimbus.childopts: "-Djava.security.auth.login.config=/path.to.storm/conf/storm.jaas -Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf" 
    • ui.childopts: "-Djava.security.auth.login.config=/path.to.storm/conf/storm.jaas -Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf" 
    • supervisor.childopts: "-Djava.security.auth.login.config=/path.to.storm/conf/storm.jaas -Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf"
    Create a file called 'conf/storm.jaas' with the content:

    Client {
        com.sun.security.auth.module.Krb5LoginModule required refreshKrb5Config=true useKeyTab=true keyTab="/path.to.kerby.project/target/zookeeper_client.keytab" storeKey=true principal="zookeeper-client";
    };

    StormClient {  
        com.sun.security.auth.module.Krb5LoginModule required refreshKrb5Config=true useKeyTab=true keyTab="/path.to.kerby.project/target/storm_client.keytab" storeKey=true principal="storm-client" serviceName="storm";
    };

    StormServer {
        com.sun.security.auth.module.Krb5LoginModule required refreshKrb5Config=true useKeyTab=true keyTab="/path.to.kerby.project/target/storm.keytab" storeKey=true principal="storm/localhost@storm.apache.org";
    };

    'Client' is used to communicate with Zookeeper, 'StormClient' is used by the supervisor nodes and 'StormServer' is used by nimbus. Now start Nimbus and a supervisor node via:
    • bin/storm nimbus
    • bin/storm supervisor
    4) Deploy a Topology

    As we have the Storm cluster up and running, the next task is to deploy a Topology to it. For this we will need to use another Storm distribution, so extract Storm again to another directory. Edit 'conf/storm.yaml' and edit the following properties:
    • For "storm.zookeeper.servers" add "- localhost"
    • nimbus.seeds: ["localhost"]
    • storm.thrift.transport: "org.apache.storm.security.auth.kerberos.KerberosSaslTransportPlugin"
    • java.security.auth.login.config: "/path.to.storm.client/conf/storm.jaas"
    Create a file called 'conf/storm.jaas' with the content:

    StormClient {
                com.sun.security.auth.module.Krb5LoginModule required refreshKrb5Config=true useTicketCache=true serviceName="storm";
    };

    Note that we are not using keytabs here, but instead a ticket cache. Now edit 'conf/storm_env.ini' and add:
    • STORM_JAR_JVM_OPTS:-Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf
    Now that we have everything set up, it's time to deploy a topology to our cluster. I have a simple Storm topology in github here that wires a WordSpout + WordCounterBolt together and can be used for this (a sketch of the wiring is shown at the end of this section). Check this project out from github and build it via "mvn assembly:assembly". We will need a Kerberos ticket stored in our ticket cache to deploy the job:
    • export KRB5_CONFIG=/path.to.kerby.project/target/krb5.conf
    • kinit -k -t /path.to.kerby.project/target/alice.keytab alice
    Finally we can submit our topology:
    • bin/storm jar /path.to.storm.project/target/bigdata-storm-demo-1.0-jar-with-dependencies.jar  org.apache.coheigea.bigdata.storm.StormMain /path.to.storm.project/target/test-classes/words.txt
    If you take a look at the logs in the nimbus distribution you should see that the topology has run correctly, e.g. 'logs/workers-artifacts/mytopology-1-1495813912/6700/worker.log'.
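
    For reference, the wiring inside such a topology boils down to something like the sketch below. WordSpout and WordCounterBolt come from the github project above, and their constructor arguments here are assumptions made for illustration:

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    public class StormMainSketch {
        public static void main(String[] args) throws Exception {
            // WordSpout and WordCounterBolt are assumed to come from the github project above
            TopologyBuilder builder = new TopologyBuilder();
            // The spout reads words from the supplied input file and emits them one at a time
            builder.setSpout("words", new WordSpout(args[0]));
            // The bolt counts the words it receives from the spout
            builder.setBolt("counter", new WordCounterBolt()).shuffleGrouping("words");

            // Submit to the Kerberos-secured cluster using the ticket cache populated by kinit
            StormSubmitter.submitTopology("mytopology", new Config(), builder.createTopology());
        }
    }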

    Categories: Colm O hEigeartaigh

    Configuring Kerberos for Kafka in Talend Open Studio for Big Data

    Colm O hEigeartaigh - Tue, 05/23/2017 - 17:23
    A recent blog post showed how to use Talend Open Studio for Big Data to access data stored in HDFS, where HDFS had been configured to authenticate users using Kerberos. In this post we will follow a similar setup, to see how to create a job in Talend Open Studio for Big Data to read data from an Apache Kafka topic using kerberos.

    1) Kafka setup

    Follow a recent tutorial to setup an Apache Kerby based KDC testcase and to configure Apache Kafka to require kerberos for authentication. Create a "test" topic and write some data to it, and verify with the command-line consumer that the data can be read correctly.

    2) Download Talend Open Studio for Big Data and create a job

    Now we will download Talend Open Studio for Big Data (6.4.0 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and call the job "KafkaKerberosRead".
    In the search bar under "Palette" on the right hand side enter "kafka" and hit enter. Drag "tKafkaConnection" and "tKafkaInput" to the middle of the screen. Do the same for "tLogRow":
    We now have all the components we need to read data from the Kafka topic. "tKafkaConnection" will be used to configure the connection to Kafka. "tKafkaInput" will be used to read the data from the "test" topic, and finally "tLogRow" will just log the data so that we can be sure that it was read correctly. The next step is to join the components up. Right click on "tKafkaConnection" and select "Trigger/On Subjob Ok" and drag the resulting line to "tKafkaInput". Right click on "tKafkaInput" and select "Row/Main" and drag the resulting line to "tLogRow":

    3) Configure the components

    Now let's configure the individual components. Double click on "tKafkaConnection". If a message appears that informs you that you need to install additional jars, then click on "Install". Select the version of Kafka that corresponds to the version you are using (if it doesn't match then select the most recent version). For the "Zookeeper quorum list" property enter "localhost:2181". For the "broker list" property enter "localhost:9092".

    Now we will configure the kerberos related properties of "tKafkaConnection". Select the "Use kerberos authentication" checkbox and some additional configuration properties will appear. For "JAAS configuration path" you need to enter the path of the "client.jaas" file as described in the tutorial to set up the Kafka test-case. You can leave "Kafka brokers principal name" property as the default value ("kafka"). Finally, select the "Set kerberos configuration path" property and enter the path of the "krb5.conf" file supplied in the target directory of the Apache Kerby test-case.



    Now click on "tKafkaInput". Select the checkbox for "Use an existing connection" + select the "tKafkaConnection" component in the resulting component list. For "topic name" specify "test". The "Consumer group id" can stay as the default "mygroup".

    Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. Send some data via the producer to the "test" topic and you should see the data appear in the Run Window in the Studio.
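
    Under the hood the job is doing much the same as the following standalone consumer sketch - run it with -Djava.security.auth.login.config pointing at the same client.jaas file and -Djava.security.krb5.conf pointing at the Kerby krb5.conf:

    import java.util.Collections;
    import java.util.Properties;

    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    public class KerberosKafkaRead {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "mygroup");
            props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
            // Kerberos settings - the JAAS login configuration comes from the system property above
            props.put("security.protocol", "SASL_PLAINTEXT");
            props.put("sasl.kerberos.service.name", "kafka");

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(Collections.singletonList("test"));
                while (true) {
                    ConsumerRecords<String, String> records = consumer.poll(1000);
                    for (ConsumerRecord<String, String> record : records) {
                        System.out.println(record.value());
                    }
                }
            }
        }
    }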
    Categories: Colm O hEigeartaigh

    Signing HTTP Attachments with Apache CXF JOSE

    Sergey Beryozkin - Tue, 05/23/2017 - 12:17
    JOSE, the primary mechanism for securing various OAuth2/OIDC tokens, slowly but surely is becoming the main technology for securing the data in the wider contexts. JOSE, alongside COSE, will become more and more visible going forward.

    I talked about Apache CXF JOSE implementation in this post. One of the practical aspects of this implementation is that one can apply JOSE to securing regular HTTP payloads, with the sender side filters doing their best to keep the content streaming, and with the JOSE protection of these payloads (JWS signature or JWE encryption) able to 'stay' with the data even beyond the HTTP request-response exchange if needed.

    In CXF 3.1.12 I have enhanced this feature to support the signing of HTTP attachments. It depends on the JWS Detached Content and Unencoded Content features, which make it possible to integrity-protect a payload that can continue flowing to its destination in clear form.

    Combining it with the super-flexible mechanism of processing the attachments in Apache CXF, and particularly with the newly introduced Multipart filters which let one pre-process individual multipart attachment streams, helped produce the final solution.

    Besides, as part of this effort, the optional binding of the outer HTTP headers to the secure JWS or JWE payloads has also been realized.

    Be the first in experimenting with this IMHO very cool feature, try it and provide the feedback, enjoy !


    Categories: Sergey Beryozkin

    Pages

    Subscribe to Talend Community Coders aggregator