loudvchar: May 2010

Thursday, May 27, 2010

User managed access

While attending the 10th Internet Identity Workshop last week, I came across an interesting initiative that I had absolutely no idea about. I am glad that I attended. It is called User Managed Access (UMA). It is about letting a user choose the access control policies for resources offering his/her personal data (personal information, address book, bookmarks, blog posts, comments, etc.). It is explained much better in UMA Explained.

"... a web user (authorizing user) can authorize a web app (requester) to gain one-time or ongoing access to a resource containing his home address stored at a "personal data store" service (host), by telling the host to act on access decisions made by his authorization decision-making service (authorization manager)."

If you are familiar with enterprise access control, the following might help.

"In enterprise settings, application access management often involves letting back-office applications serve only as policy enforcement points (PEPs), depending entirely on access decisions coming from a central policy decision point (PDP) to govern the access they give to requesters. This separation eases auditing and allows policy administration to scale in several dimensions. UMA makes use of this separation, letting the authorizing user serve as a policy administrator crafting authorization strategies on his or her own behalf."
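The PEP/PDP separation described in the quote can be sketched in a few lines. This is only an illustration of the idea, not the UMA protocol: the class names, the policy shape and the sample data are all assumptions made for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class AuthorizationManager:
    """Plays the PDP role: holds the user's policies and makes access decisions."""
    # Hypothetical policy shape: (requester, resource) -> set of allowed actions.
    policies: dict = field(default_factory=dict)

    def allow(self, requester, resource, action):
        self.policies.setdefault((requester, resource), set()).add(action)

    def decide(self, requester, resource, action):
        return action in self.policies.get((requester, resource), set())


@dataclass
class Host:
    """Plays the PEP role: stores the data but defers every decision to the AM."""
    am: AuthorizationManager
    data: dict = field(default_factory=dict)

    def read(self, requester, resource):
        if not self.am.decide(requester, resource, "read"):
            raise PermissionError(f"{requester} may not read {resource}")
        return self.data[resource]


# The user acts as policy administrator by configuring the AM, not the host.
am = AuthorizationManager()
host = Host(am=am, data={"home-address": "221B Baker Street"})
am.allow("travel-app", "home-address", "read")
print(host.read("travel-app", "home-address"))
```

In a real deployment the call to decide() would cross the network to a third-party AM, which is exactly where the performance question raised below comes from.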

The UMA work group has a protocol spec, scenarios and a reference implementation.

I plan to follow this work. Having just checked in the default set of permissions for resources for CollectionSpace, I wonder how and when a Host (such as CollectionSpace) would make permissions for a user's data available to an Authorization Manager (AM). UMA envisions that the AM could be a third party web-based service, but would that be practical from a performance perspective?

Sunday, May 23, 2010

Relationship management in RESTful web services - Part II

In Part I on this topic, I described an approach that makes the relationship between two resources a first-class resource. In this post, I describe the second approach I mentioned in Part I: the relationship as a sub-resource.

Relationship as a sub-resource
Here, the relationship is managed as a sub-resource of the resource that acts as the object in the relationship. Let's take an example.

Role ROLE_COLLECTIONMANAGER (id: 814ed9d4-3315-44fc-993b) has permission (id: e62d2306-518b-414b-9591) to access a resource named intakes

Permission as object in relationship

If the object of the relationship is a permission and the subject is a role, i.e. a permission to access the resource named intakes is given to the role ROLE_COLLECTIONMANAGER, the relationship could be created using the following API. We assume here that both the permission and the role have already been created using their respective POST operations and that we are only relating them here.

Here, e62d2306-518b-414b-9591 is the id of the permission. The sub-resource is permroles, which indicates the context of the relationship, i.e. the relationship is between a permission and a role and not, for example, between a permission and a user.

As you might notice, we have added some additional data about the permission (resourceName) as well as the role (roleName). This is optional and for convenience purposes only. Also, it would be possible to associate one or more roles with the same permission in a single POST request. So, associating both ROLE_COLLECTIONMANAGER and ROLE_CURATOR (id: 3772624d-1ab3-4e47-a26d-191fc6437410) with the same permission would look like the following.

Role as object in the relationship

The same data structure could also be used with the role as the object in the relationship. That is, ROLE_COLLECTIONMANAGER has permission(s) to access the intakes (and collectionobjects) resources. This could be accomplished using the following API. Note that id 814ed9d4-3315-44fc-993b represents ROLE_COLLECTIONMANAGER.

Here, POST does return an id to comply with the RESTful architecture. However, because the relationship is not treated as a first-class resource, this id is meaningless for subsequent operations such as GET and DELETE. Let's assume for now that we received id 123. Also, more subjects could be associated with the same object by using the POST operation again as follows. That means POST also acts as PUT.

The GET operation would use the id of the object (permission:e62d2306-518b-414b-9591 or role:814ed9d4-3315-44fc-993b) in the relationship to access the relationships with all the subject(s). The GET operation would return the same data as was posted but with a union of all subject(s).

Note that the last element of the URI, 123, is just a filler for an id, as the relationship is not a first-class RESTful resource.

All 3 roles associated with a permission are returned.

According to the RESTful architecture, the DELETE should only take an id of the resource to be deleted. The following operation would delete relationships between the given object and all its subjects.

This is both good and bad. It is good for efficiency, in the sense that all the relationships between the subjects and the given object can be deleted in one shot. However, it is bad that an individual relationship between a subject and the given object cannot be deleted, as there is no way to uniquely identify such a relationship.
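The POST, GET and DELETE semantics described above can be modelled with a small in-memory sketch. It is not the CollectionSpace API: the class and method names are hypothetical, and only the ids come from the example in this post.

```python
from collections import defaultdict


class SubResourceRelationships:
    """In-memory model of relationships kept under their object, e.g. permroles."""

    def __init__(self):
        # object id -> ordered list of subject ids (no duplicates)
        self._subjects = defaultdict(list)

    def post(self, object_id, subject_ids):
        """Associate subjects with the object; repeated POSTs accumulate."""
        current = self._subjects[object_id]
        for subject_id in subject_ids:
            if subject_id not in current:
                current.append(subject_id)
        # Filler id: the relationship itself is not addressable.
        return "123"

    def get(self, object_id):
        """Return the union of all subjects related to the object."""
        return list(self._subjects.get(object_id, []))

    def delete(self, object_id):
        """Remove every relationship of the object; there is no finer grain."""
        self._subjects.pop(object_id, None)


PERMISSION = "e62d2306-518b-414b-9591"
COLLECTION_MANAGER = "814ed9d4-3315-44fc-993b"
CURATOR = "3772624d-1ab3-4e47-a26d-191fc6437410"

permroles = SubResourceRelationships()
permroles.post(PERMISSION, [COLLECTION_MANAGER])
permroles.post(PERMISSION, [CURATOR])  # a second POST adds to the first
print(permroles.get(PERMISSION))       # both role ids
permroles.delete(PERMISSION)           # removes all of them at once
print(permroles.get(PERMISSION))
```

Because subjects are keyed only by the object id, there is nothing in this model that could address a single permission-role pair, which is the limitation with DELETE.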

Overall, I like the sub-resource based approach, as it is more convenient to navigate to the relationship(s) from an object in the relationship. It also supports bulk operations: more than one subject can be associated with an object in a single request, and GET returns the relationships between an object and all the subjects it is related to in the context of that relationship.

However, this is a somewhat un-RESTful approach, as the identifier returned from POST is meaningless for the GET and DELETE operations. Also, DELETE is not fine-grained, i.e. deleting a relationship between an object and a single subject is not possible. It would be better if there was a way to make that possible (see an alternative). The approach described in Part I does not have this problem. Perhaps a hybrid is possible? Would a hybrid approach be more complex? I am eager to hear your viewpoints. Feel free to send me comments if you have taken an approach which does not suffer from the limitations I have described.

Monday, May 17, 2010

would you use webfinger?

I was at the Internet Identity Workshop #10 today. I have been to an unconference only once before. It looked like chaos initially, but it was organized chaos. I liked the format. Moreover, I liked how open the participation was, from both the presenters and the listeners.

Anyway, while I was slightly familiar with OpenID and OAuth, I am just getting familiar with some of the problems of the initial versions of OpenID and OAuth 1.0a. I came across several initiatives ... one of which is WebFinger.

"WebFinger is about making email addresses more valuable, by letting people attach public metadata to them. That metadata might include:

public profile data
pointer to identity provider (e.g. OpenID server)
a public key
other services used by that email address (e.g. Flickr, Picasa, Smugmug, Twitter, Facebook, and usernames for each)
a URL to an avatar
profile data (nickname, full name, etc)
whether the email address is also a JID, or explicitly declare that it's NOT an email, and ONLY a JID, or any combination to disambiguate all the addresses that look like something@somewhere.com
or even a public declaration that the email address doesn't have public metadata, but has a pointer to an endpoint that, provided authentication, will tell you some protected metadata, depending on who you authenticate as."

Eran Hammer-Lahav describes the rationale for the same over here. I like a number of the arguments he makes; however, I am stuck on the following ...

"The arguments against email as identifiers usually include concerns over spam and privacy ..."

At least with an HTTP URI, I don't have to worry about spam. Indeed, there is a phishing problem, but as long as one knows how to protect against it, it might be manageable. How do I know that the email address I am giving to some site in order to enable it to fetch my public metadata won't be misused? Am I missing something here?

Monday, May 10, 2010

User profile management in a social application

In my opinion, user profile management is one of the most important aspects of any social application. Social graphs are made of people and relationships between people. Not only does the user profile allow an application developer to capture information about a person, it is also an important dimension in deriving various kinds of user- and usage-focused analytics. Such analytics are very important for a social application. If designed well, they could also help increase the application's user base. I will explain this further later.

User profile management should include the following functionality.

Account management including lifecycle management
Support for multiple identity providers (IdP)
Login / Logout
Single sign-on (OpenID, Facebook Connect)
Profile management
Authorization (Native, OAuth)
?

Account management
From an application developer's perspective, every user of the application must have an account regardless of how the user logs in (user/password, OpenID or Facebook auth). An account could have minimal information such as display name, email, status, and timestamps for creation and modification. An account need not contain the user's profile information.

The lifecycle events associated with an account could be registration, activation, deactivation and finally deletion.
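A minimal sketch of such an account and its lifecycle follows. The four states come from the events listed above; the allowed transitions between them, and all class and field names, are assumptions made for this illustration.

```python
from enum import Enum


class AccountStatus(Enum):
    REGISTERED = "registered"
    ACTIVE = "active"
    INACTIVE = "inactive"
    DELETED = "deleted"


# Assumed transitions; the post names the events but not their ordering rules.
ALLOWED = {
    AccountStatus.REGISTERED: {AccountStatus.ACTIVE, AccountStatus.DELETED},
    AccountStatus.ACTIVE: {AccountStatus.INACTIVE, AccountStatus.DELETED},
    AccountStatus.INACTIVE: {AccountStatus.ACTIVE, AccountStatus.DELETED},
    AccountStatus.DELETED: set(),
}


class Account:
    """Minimal account: identity-provider details and profile live elsewhere."""

    def __init__(self, display_name, email):
        self.display_name = display_name
        self.email = email
        self.status = AccountStatus.REGISTERED

    def transition(self, new_status):
        """Move to new_status, rejecting transitions not listed in ALLOWED."""
        if new_status not in ALLOWED[self.status]:
            raise ValueError(
                f"cannot go from {self.status.value} to {new_status.value}"
            )
        self.status = new_status


account = Account("Ada", "ada@example.org")
account.transition(AccountStatus.ACTIVE)    # activation
account.transition(AccountStatus.INACTIVE)  # deactivation
print(account.status.value)
```

Keeping the transition table as data makes it easy to adjust if, for example, an application wants to allow reactivation only within a grace period.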

Support multiple IdP
Any social application should support at least three types of identity providers, in my opinion: a local IdP, an OpenID IdP and Facebook.

The local IdP comes in handy when potential users of the application do not have any OpenID or do want to create and maintain a profile with the application. The local IdP is usually implemented by identifying users with a username and password. A database is generally used as the realm.

The application should also support one or more OpenID IdPs. Many users may not want to create one more online identity to log in to the application. They may want to use one of their existing identities managed by a 3rd party identity provider (Yahoo!, Google, myOpenId, etc.) using the OpenID protocol. The application would act as a relying party (RP) that relies on identities asserted by the 3rd party IdPs.

Lastly, any social application developer may want to tap into Facebook's user base of 400 million. Unfortunately, Facebook does not support an open standard such as OpenID. So, the application has to support the proprietary FB authentication protocol (FB Connect).

See Stackoverflow or Plaxo login screens to check how these applications support multiple IdPs.

Login / Logout
This is an obvious feature of any web application that wants to offer personalized services. There is no need to describe anything here except that a user should be able to log in using an id managed by the local IdP or by a foreign IdP after due assertion of that id. Other functions would include the ability to reset a password (for the local IdP only) and "remember me", among other things. Session management (session expiration, persistent sessions, etc.) would be required, as well as protection against session fixation attacks.

Logout for a user logged in using OpenID or FB Connect would require logging out locally from the application and destroying the application's session context for the user logging out. Note that for a social application, the shortcomings of a logout feature would expose not only the user but the user's social activities and social graph as well.

Single sign-on
For a social application, user experience is very important. Offering single sign-on could improve the user experience right away. If the application user has logged into some other web application using OpenID or FB Connect, that user may not have to sign in again to your application (within the timeframe set by the OpenID IdP or FB) if you support OpenID or FB. Again, check out Stackoverflow or Plaxo.

Profile management

The profile data of a user is important for any social application, as it acts as a very important dimension in the various analytical services the application could provide to its users, in insights to improve its own services, and in services to interested 3rd parties. An account may hold minimal information about the user. An application may have a user profile to hold other user-specific information. This depends on the application, but it could include information such as first and last name, nickname, address(es) (land and web), land and mobile phone(s), email(s), instant messenger id(s) and other demographic information as required by your application.

If the user logs in using OpenID, it is possible to populate some of these using OpenID's attribute exchange protocol at the time of login. FB Connect also has APIs to retrieve FB user's profile data per user's privacy settings. Such data could also be retrieved securely with user's consent after the login using the OAuth protocol.

Authorization
Authorization deserves its own post. I will cover authorization in my subsequent post.

If you are using a Java-based platform on the server side of your application, you may want to look at the Apache Shiro-based Nimble project. Nimble is a Grails plugin that uses Shiro underneath. It provides most of the features I have mentioned here except OAuth. It also provides a customizable user interface and security tags to insert into the user interface.

Thursday, May 6, 2010

Social identity theft - protecting your personal brand

I recently attended a talk given by an OpenID Foundation member. I will not say here where it was or who gave it. What I came to know is that the big web-based identity providers (IdPs) such as Google and Yahoo! have embraced OpenID for almost a couple of years now, and they want you to use your id/account in as many places as OpenID is supported. Facebook wants you to do the same, but their protocol is very proprietary. Anyway, this is good news!

This helps a lot in increasing user registration at a relying party (a social media application) web site, because users can sign in using their OpenID.
It also helps those folks who want to use OpenID to sign in wherever they can, because they don't want to keep track of and maintain various social identities and profiles at various social media web sites.
Lastly, it helps in further service authorization and sharing of content using OAuth.

However, this also makes these folks vulnerable to social identity theft. If your social identity is stolen, not only are you vulnerable, but your social graph might be vulnerable too (through no fault of its own). This is more dangerous.
I came across an article, "How to Combat Social Identity Theft and Strengthen Your Online Personal Brand", where the author recommends creating a separate profile and identity at each social media site. This is like having a separate password for each web site: hard to remember and maintain, but it restricts the vulnerability to a single profile/identity/website if stolen. Some folks even provide a service to manually go and create profiles in your name at some 150 web sites! But what guarantee do they give that they won't misuse this information? Disgruntled employees can be found everywhere, right? And finally there are tools like KnowEm which automate the process by helping find the availability of a username at more than 350 websites and also help create profiles (with a subscription service). Many other such tools are listed here.

I would think that one should use identities maintained by the big guys carefully. I would also keep more than one OpenID handy to use at various social media web sites, so that if any one of them is stolen, at least the vulnerability is contained to only those sites where it was used. However, there could indeed be a better solution...looking forward to your comments, thoughts and suggestions.

Monday, May 3, 2010

Relationship management in RESTful web services - Part I

CollectionSpace offers various RESTful web services for managing metadata for physical/digital objects. Defining RESTful interfaces for the various entities in the system is straightforward. Create, Read, Update, Delete, List and Search (CRUDLS) operations on these entities can easily be mapped to the POST (create), GET (read, list and search), PUT (update) and DELETE (delete) methods of HTTP. For example, see the RESTful APIs for the Role service.

CollectionSpace also has a requirement to support relationships between several entities. For example, there could be a one-to-many relationship between a collection object and a loan object. On the more domain-agnostic side of the services, a permission might be for one or more roles and a role might be related to one or more permissions. Implementing relationships between RESTful resources is not so straightforward and might need some due diligence. Let's take each of the above two use cases to model relationships in two different ways and talk about the pros and cons of each. The two approaches are:

Relationship as a first-class RESTful resource
Relationship as a sub-resource

I'll cover only the first part in this entry. Feel free to comment on the blog if you have implemented relationships in a different and better way.

Relationship as a first-class RESTful resource
With this approach, a relationship becomes a RESTful resource that supports CRUDLS operations. It is a bit unintuitive to think about relationships like this, but it is the most flexible approach. Here, the metadata for each relationship has the following components:

Object of the relationship
Subject of the relationship
Type of the relationship

For example, a relationship between a collection object entity and a loan entity in English could be:

The object in the relationship is the collection object entity and the subject of the relationship is the loan entity. The types should be namespace qualified. This is derived from the RDF model, but perhaps only in parts. The relationship web service would support the following methods:

The advantage of such a data structure is that one can relate anything to anything without worrying about semantics. It is also the biggest disadvantage. Relationship being a first-class RESTful resource, each relationship has a distinct id of its own using which it could be retrieved and updated. However, there are many problems.

Unlike the RDF model, the predicate is missing, so it is hard for a machine to learn how entities are related. The web service would have to do a lot of validation to make sure that relationships between the given entities are semantically possible and correct.
It is not possible to determine cardinality in the relationship.
If RESTful URIs are used, there is no need to provide separate type and id. For example, http://collectionspace.org/services/collectionobjects/814ed9d4-3315-44fc-993b would uniquely identify the object in this relationship.
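The first-class approach described in this entry can be sketched as an in-memory service in which every relationship gets its own id. This is an illustration only, not the CollectionSpace relationship service: the class, field and method names, the type strings and the loan ids are all hypothetical.

```python
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    object_type: str        # e.g. a namespace-qualified entity type
    object_id: str
    subject_type: str
    subject_id: str
    relationship_type: str


class RelationshipService:
    """Each relationship is a resource with its own id (create/read/delete/search)."""

    def __init__(self):
        self._by_id = {}

    def create(self, relationship):
        rel_id = str(uuid.uuid4())
        self._by_id[rel_id] = relationship
        return rel_id

    def read(self, rel_id):
        return self._by_id[rel_id]

    def delete(self, rel_id):
        del self._by_id[rel_id]

    def search(self, object_id):
        """Return the ids of all relationships that have the given object."""
        return [rid for rid, rel in self._by_id.items()
                if rel.object_id == object_id]


service = RelationshipService()
first = service.create(Relationship(
    "collectionobjects", "814ed9d4-3315-44fc-993b",
    "loans", "loan-1", "example:collectionobject-loan"))
second = service.create(Relationship(
    "collectionobjects", "814ed9d4-3315-44fc-993b",
    "loans", "loan-2", "example:collectionobject-loan"))

service.delete(first)  # one relationship removed, the other is untouched
print(service.search("814ed9d4-3315-44fc-993b") == [second])
```

Note that nothing in create() checks whether the two types may legitimately be related; that validation burden is the first problem listed above.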

I'll cover the 2nd approach in my subsequent blog entry.